Calc/Macros/Linear Regression

From Apache OpenOffice Wiki
Jump to: navigation, search

This tutorial is a brief overview of how to create a Java-based macro in OpenOffice 2.2. The macro itself is a semi-advanced one: it requires the user to select two adjacent columns in the software, and then allows you to calculate the line of best fit between them. It contains a function that writes the y-intercept and the slope of the line of best fit (b and m in y = mx + b, respectively). These two values are put into the cells beside the two selected columns, without the user's choice.

Based on the above, this tutorial covers the following:

  • Creating a Java-based macro.
  • Getting information on selected cells within Calc.
  • Outputting values through the macro.


If this is your first tutorial, please be advised that you should have the OpenOffice SDK installed, and set the environment variables prior to compiling any of the source code. See the OOSDK documentation for more information.

You should also have a working version of Java installed, and familiar with writing and compiling .java files.

The Code

Programming macros in Java is relatively easy and requires only one .java file. Since we're doing linear regressions, we'll call the java file, and our function will be called compute(). Here is the beginning of the file:

[java,N] import; import; import; import; import*; import*; import;

public class LinReg {

    public static void compute(XScriptContext xScriptContext) {

The code above simply imports the various classes and interfaces we need. The compute() class requires an XScriptContext object, which is what allows us to connect to Calc and obtain information.

Now, we'll use an object called xDocModel to obtain the document from Calc. This is used for any documents, and the second line in the code below acquires the actual spreadsheet document. Note that the "UnoRuntime.queryInterface" is very important when programming macros, and anyone interested in learning more about Calc macros should read up on it in the developer documentation. Once we have an object representing the spreadsheet document, we use getSheets() to gain access to each spreadsheet.


         // Loads the spreadsheets into Java.
         XModel xDocModel = xScriptContext.getDocument();
         XSpreadsheetDocument xsd = (XSpreadsheetDocument)UnoRuntime.queryInterface(XSpreadsheetDocument.class, xDocModel);
         XSpreadsheets xSheets = xsd.getSheets();
         // We'll use the variable below to represent the spreadsheet.
         XSpreadsheet xs = null;
         try {

The actual implementation of the function will take place in two steps: first we'll get the data, and then we'll calculate the regression. While this may not be the most optimal way, it'll allow you to see how we do each part separately (so you can skip linear regressions!).

First, we need to be able to access the spreadsheet. The code below allows you to access a spreadsheet by its index, start from 0. If you want to get the active sheet rather than the first one, you'll have to read the developers' guide.


              // Gets the first sheet in the document.			
              XIndexAccess xSheetsIA = (XIndexAccess)UnoRuntime.queryInterface(XIndexAccess.class, xSheets);
              xs = (XSpreadsheet)UnoRuntime.queryInterface(XSpreadsheet.class, xSheetsIA.getByIndex(0));	

Now that we have the specific spreadsheet, we need to go through a similar process to get the selected range of cells. The first part gets the actual selected range, while the second part allows us to get its address in terms of the spreadsheet (e.g. A1:B10).


              // Gets an interface to the selected cells.
              XInterface xi = (XInterface)xDocModel.getCurrentSelection();
              XCellRange xcr = (XCellRange)UnoRuntime.queryInterface(XCellRange.class, xi);	
              // Gets the selected range's address/location.
              XCellRangeAddressable xAdd = (XCellRangeAddressable)UnoRuntime.queryInterface(XCellRangeAddressable.class, xcr);
              CellRangeAddress address = xAdd.getRangeAddress();

Using the address object, we can find where the columns and rows start and end.


              int sRow = address.StartRow;
              int sCol = address.StartColumn;
              int eRow = address.EndRow;
              int eCol = address.EndColumn;

Now that we have a range, we can ensure that we have only two columns and can then input the data into arrays of numbers. This is done below, with the only useful OO-based code near the end. We use getCellByPosition(<col>, <row>) to get a cell, and then call getValue() to get its numerical value. If it doesn't have a number, the function returns a 0.

[java, N]

              // Ensure two columns.
              if (eCol - sCol == 1) {
              int range = eRow - sRow;
              double[] vals1 = new double[range+1];
              double[] vals2 = new double[range+1];
              double sum = 0;
              for (int i = 0; i <= range; i++) {
                   // Stores the values in an array.
                   vals1[i] = xs.getCellByPosition(sCol, sRow + i).getValue();
                   vals2[i] = xs.getCellByPosition(eCol, sRow + i).getValue();

Now comes the linear regression code. I won't explain the statistics - Google does that well enough if you ask. Skip to the end of this code to get back to Calc programming.

[Java, N]

              // Get means.
              double xmean = 0;
              double ymean = 0;
              for (int i = 0; i <= range; i++) {
                   xmean += vals1[i];
                   ymean += vals2[i];
              xmean /= vals1.length;
              ymean /= vals2.length;
              // Get the coefficients.
              double numerator = 0;
              double denominator = 0;
              for (int i = 0; i <= range; i++) {
                   numerator += (vals1[i] - xmean)*(vals2[i] - ymean);
                   denominator += (vals1[i] - xmean)*(vals1[i] - xmean);
              double m = numerator / denominator;
              double b = ymean - m*xmean;

Finally, we put the values we calculated into a cell in Calc. This works in the same manner as our getValue() code earlier in the tutorial. Closing braces and a catch statement are added to finish the file.

[Java, N]

              xs.getCellByPosition(eCol + 1, sRow).setValue(b);
              xs.getCellByPosition(eCol + 1, sRow + 1).setValue(m);
         } catch (Exception e) {
              // Do nothing. :)


XML Description

While the java file is complete, we still need to describe what we're doing for Calc, so it knows that there are functions it should be running. This is done in an XML file called parcel-descriptor.xml. The XML code makes reference to the jar file we're creating, and also tells Calc which functions are called, and what they do (through a description). The code is fairly self-explanatory.

[XML, N] <?xml version="1.0" encoding="UTF-8"?> <parcel language="Java" xmlns:parcel="scripting.dtd">

    <script language="Java">
         <locale lang="en">
               <displayname value="Linear Regression"/>
                    Linear regression.
          <functionname value="LinReg.compute"/>
          <logicalname value="LinReg.compute"/>
               <prop name="classpath" value="linreg.jar"/>


Packaging It

Now you've got your XML file and your Java class... This part is easy. Compile your java class, put it in a jar file with the same name you included in the XML description above (in this case, it's linreg.jar). Compile the Java file with javac and create a jar using jar -cf lingreg.jar LinReg.class.

Finally, to include the macro is a Calc file, open a new Calc file and save it. Then use an archive manager to open the file as if it were a zip file. OpenOffice files are actually packages, so it's possible to create a set of new folders inside with the following path: /Scripts/java/LinReg.

Within this folder, include linreg.jar and parcel-descriptor.xml and you're done!

Personal tools