Difference between revisions of "Documentation/How Tos/Using Arrays"

From Apache OpenOffice Wiki
Jump to: navigation, search
(→‎Issues: different dimension arrays - example; re-ordered)
(→‎Array formula calculations: details about size constraints on unexpected arrays, and add example.)
Line 90: Line 90:
 
The process in effect works thus:
 
The process in effect works thus:
  
#All ‘unexpected arrays’ in the same array calculation should be the same size (see this issue: Issue 46681).
+
#All ‘unexpected arrays’ in the same array calculation must either
 +
#*a) have the same number of rows and columns, or
 +
#*b) have the same number of rows and some arrays may have only one column, or
 +
#*c) have the same number of rows and some arrays may have only one column.
 +
#Where some of the ‘unexpected arrays’ have just one row or column, that row or column is replicated enough times to give that array the same number of rows and columns as the other arrays. (This was tracked in [https://issues.apache.org/ooo/show_bug.cgi?id=46681 Issue 46681] and fixed in OpenOffice.org v3.1).
 
#The result will be returned in an array of that size.
 
#The result will be returned in an array of that size.
 
#The calculation is done for each element in turn, with the result returned in the corresponding element of the output array.
 
#The calculation is done for each element in turn, with the result returned in the corresponding element of the output array.
Line 123: Line 127:
 
#The result of the <tt>'''IF()'''</tt> array calculation will be an array of that size, which <tt>'''SUM()'''</tt> will add up.
 
#The result of the <tt>'''IF()'''</tt> array calculation will be an array of that size, which <tt>'''SUM()'''</tt> will add up.
 
#If A1>0 the first element is <tt>'''B1'''</tt>; else <tt>'''0'''</tt>. If A2>0 the second element is <tt>'''B2'''</tt>; else <tt>'''0'''</tt> .... The array presented to <tt>'''SUM()'''</tt> has the values in B1:B4 where the adjacent value in A1:A4 is >0. The final output is the sum of the values in B1:B4 for which the adjacent value in A1:A4 is >0.
 
#If A1>0 the first element is <tt>'''B1'''</tt>; else <tt>'''0'''</tt>. If A2>0 the second element is <tt>'''B2'''</tt>; else <tt>'''0'''</tt> .... The array presented to <tt>'''SUM()'''</tt> has the values in B1:B4 where the adjacent value in A1:A4 is >0. The final output is the sum of the values in B1:B4 for which the adjacent value in A1:A4 is >0.
 +
 +
''Example:''
 +
 +
With the array formula <tt>'''=((({6|8})>({1;3;5|7;9;10}))*({1;3;5|7;9;10}))'''</tt>
 +
 +
# The first array has two rows and one column. The other two arrays have two rows and three columns.
 +
# The single column of the first array is replicated three times, to make an array with two rows and three columns.
 +
# The result is <tt>'''{1;3;5|7;0;0}'''</tt>.
 +
# If the first array had more than two rows, or a different number of columns than one or three, it would not be sized correctly for this formula.
  
 
== Array functions ==
 
== Array functions ==

Revision as of 01:27, 12 October 2014

Introduction

Arrays may be helpful in reducing the number of cells you use, but they can be a little complex to understand. In some ways, they extend what a spreadsheet can do beyond what spreadsheets were intended for. Almost all tasks are possible without them.

An array is simply a rectangular block of information that Calc can manipulate in a formula - that is, it is information organised in rows and columns. An array may be cells on a spreadsheet, or may be held internally by Calc.

There are 2 ways to specify an array in a formula:

  • as a range - for example A2:C3.
Calc array4.png
  • as an "inline array", for example {1; 5; 3 | 6; 2; 4} (these are fully functional from OOo2.4, but do exist in earlier versions - see Array Issues). You type curly braces { } around an inline array. Entries on a row are separated by a semicolon ‘;’, and rows are separated by the pipe character ‘|’. Each row must have the same number of elements (it is wrong to write {1; 2; 3 | 4; 5} because there are 3 elements in the top row and only 2 in the next row). Inline arrays may have mixed contents, for example {4; 2; "dog" | -22; "cat"; 0}. However an inline array may not contain references (eg A4), or formulae (eg PI(), 2*3 ), or percentages (eg 5%).


Calc array5.png

You can give a name to a range of cells: select the range and choose Insert - Names - Define.

You can give a name to an inline array: choose Insert - Names - Define; type the array (eg {1; 3; 2} including the curly braces) in the Assigned to box.


Functions which understand array parameters

Some functions, such as SUM(), AVERAGE(), MATCH(), LOOKUP(), accept one or more of their parameters as arrays.

For example:

SUM( A2:C3 ) returns the sum of the numbers in the range A2:C3.
SUM( {3; 2; 4} ) returns 9, the sum of the numbers in the inline array {3; 2; 4}.
SUM expects/understands single (‘scalar’) values too - SUM( B5; 7 ) returns the sum of B5 and 7.

Functions not expecting array parameters

Some functions, such as ABS(), SQRT(), COS(), LEN() expect their parameters to be ‘scalar’ - that is, a single value such as 354 or "dog" or the contents of a cell eg B5.

Example:

SQRT(4) returns 2.
LEN("dog") returns 3.

However, you may still use an array where a single value is expected - for example SQRT( {9; 4} ). If you enter the formula ‘normally’ by pressing Enter, Calc will then evaluate the formula using a single value from the array as follows:

If it is an inline array:

Calc will use the first value (the ‘top left’).
Example:
=SQRT( {9; 4 | 25; 16} ) then press Enter returns 3, the square root of the first element in the array (9).

If it is a range:

1. Calc will return an error unless the array is a single row or a single column.
2. For a single row or a single column range, Calc will use the value where the formula cell’s column/row intersects with the array (or return an error if there is no intersection).
Examples:
Calc array1.png
The formula =ABS(B2:B5) is entered 'normally' in cell D3, that is on row 3. Row 3 intersects B2:B5 at cell B3, thus the formula evaluated is =ABS(B3).
Calc array2.png
The formula =LEN(B5:D5) is entered 'normally' in cell B1, that is in column B. Column B intersects B5:D5 at cell B5, thus the formula evaluated is =LEN(B5).

Array formulas

The real power of arrays comes when you enter a formula in a special way, as an ‘array formula’. You do this by pressing Cntrl-Shift-Enter instead of the Enter button (or by ticking the Array checkbox if you are using the Formula Wizard).

If in cell B1 you enter ={3; 4} ‘normally’ by pressing Enter, the first value 3 is displayed in the cell.

If in cell B2 you type ={3; 4} but press Cntrl-Shift-Enter instead of Enter, the cell becomes an 'array formula'. The formula now returns the entire array {3; 4}. Cell B2 displays 3 and cell C2 displays 4.

Note that if you entered the formula using the Enter key, simply selecting the cell and pressing Cntrl-Shift-Enter will not convert the cell to an array expression - you must make an actual edit (such as adding then deleting a character), or click the Function Wizard and tick the Array checkbox.

If you now try to edit cell B2, you are told that "you cannot change only part of an array". To edit an array you must select the entire array, either with the mouse or by typing Cntrl-/ (hold the Cntrl key and press the slash key ‘/’).

Calc array6.png


The formula bar indicates that this is an array formula by enclosing it in curly braces {}. You do not need to type these - they will disappear while you edit the formula, and Calc will show them again when you have finished editing.

Array formula calculations

When Calc evaluates an array formula, it treats ‘unexpected arrays’ as a series of values (rather than using a single value), calculating a result for each of the array elements, and returning an array of results.

Example:

Calc array3.png
=SQRT( {16; 4; 25} ) when entered by pressing Cntrl-Shift-Enter instead of Enter returns an array of results, with 1 row and 3 columns - {4; 2; 5}. If the formula is in cell B2, Calc places the results in cells B2:D2. 4 is placed in B2, 2 in C2 and 5 in D2.


The process in effect works thus:

  1. All ‘unexpected arrays’ in the same array calculation must either
    • a) have the same number of rows and columns, or
    • b) have the same number of rows and some arrays may have only one column, or
    • c) have the same number of rows and some arrays may have only one column.
  2. Where some of the ‘unexpected arrays’ have just one row or column, that row or column is replicated enough times to give that array the same number of rows and columns as the other arrays. (This was tracked in Issue 46681 and fixed in OpenOffice.org v3.1).
  3. The result will be returned in an array of that size.
  4. The calculation is done for each element in turn, with the result returned in the corresponding element of the output array.

Example:

With the array formula =SQRT( {16; 4; 25} ):

  1. There is only one array, with 1 row and 3 columns.
  2. The result will be returned in an array with 1 row and 3 columns.
  3. The calculation is done for 16 first, then for 4, then for 25, giving the array result {4; 2; 5}.

Example:

With the array formula =SQRT( {8 | 18} * 2 ) in cell A5:

  1. There is only one array, with 2 rows and 1 column.
  2. The result will be returned in an array with 2 rows and 1 column.
  3. The calculation is: first element SQRT(8*2) = 4; second element SQRT(18*2) = 6; the array result is thus {4 | 6} - that is, 4 in cell A5 and 6 in cell A6.

The result of an array expression is an array, which can be used within the formula.

Example:

=SUM(SQRT( {16; 4; 25} )) as an array formula. The calculation of SQRT( {16; 4; 25} ), as before, yields a result of {4; 2; 5}, thus giving SUM( {4; 2; 5} ), returning a final result in the cell of 4+2+5 = 11.

Example:

With the array formula =SUM(IF(A1:A4>0; B1:B4; 0))

  1. The two arrays A1:A4 and B1:B4 both have 4 rows and 1 column.
  2. The result of the IF() array calculation will be an array of that size, which SUM() will add up.
  3. If A1>0 the first element is B1; else 0. If A2>0 the second element is B2; else 0 .... The array presented to SUM() has the values in B1:B4 where the adjacent value in A1:A4 is >0. The final output is the sum of the values in B1:B4 for which the adjacent value in A1:A4 is >0.

Example:

With the array formula =((({6|8})>({1;3;5|7;9;10}))*({1;3;5|7;9;10}))

  1. The first array has two rows and one column. The other two arrays have two rows and three columns.
  2. The single column of the first array is replicated three times, to make an array with two rows and three columns.
  3. The result is {1;3;5|7;0;0}.
  4. If the first array had more than two rows, or a different number of columns than one or three, it would not be sized correctly for this formula.

Array functions

Some functions calculate their result as an array. If they are to return that array, they must be entered as an array formula, by pressing Cntrl-Shift-Enter (or by ticking the Array checkbox if you are using the Formula Wizard). If, instead, they are entered 'normally' with the Enter button, just the (single) top left element of the calculated array will be returned. These funtions are listed in the Array function category, and are:

MINVERSE, MMULT, MUNIT, TRANSPOSE, FREQUENCY, GROWTH, LINEST, LOGEST, TREND


For example: =MUNIT(2) entered as an array formula in cell B1 returns the 2 x 2 unit (identity) matrix as the array {1; 0 | 0; 1} - that is, cells B1, B2, C1, C2 show 1, 0, 0, 1. If this formula is entered 'normally', just the 'top left' value (1) is returned in cell B1.


Some functions (including some listed above) take parameters that are forced to evaluate as an array formula, even when the formula is entered 'normally':

MDETERM, MINVERSE, MMULT, SUMPRODUCT, SUMX2MY2, SUMX2PY2, SUMXMY2, CORREL, COVAR, FORECAST, FTEST, INTERCEPT, MODE, PEARSON, PROB, RSQ, SLOPE, STEYX, TTEST


For example: MODE(ABS(A1:A3)) (in normal mode) forces ABS(A1:A3) to be evaluated as an array formula, returning an array of the absolute values of A1:A3 from which MODE selects the most common value to be returned.

For example: PROB(A1:A5; B1:B5/100; 3) (in normal mode) forces B1:B5/100 to be evaluated as an array formula, returning an array where each value is one hundredth of its value in B1:B5 (perhaps because the probabilities were expressed in percent).


A few of these functions have some (array) parameters that are forced to evaluate as an array formula, and other (single value) parameters that evaluate normally.

For example: PROB(A1:A5; B1:B5; {3; 4}) (in normal mode) does not evaulate {3; 4} as an array, because this would mean PROB returning an array. It evaluates PROB(A1:A5; B1:B5; 3) - that is, it interprets {3; 4} in scalar mode, taking the 'top left' value 3. However, if entered as an array formula it will of course return an array.

Tips and Tricks

All these examples are to be entered as an array expression, by pressing Cntrl-Shift-Enter.

Sum of entries matching multiple conditions

SUM( (A1:A6="red")*(B1:B6="big")*C1:C6 ) returns the sum of entries in C1:C6 whose A column entries are "red" AND whose B column entries are "big". A1:A6 and B1:B6 each produce a 6 element array of TRUE or FALSE - which in number calculations are 1 or 0. Thus if A2 contains "red" and B2 contains "big" the second element of the array is 1 * 1 * C2 = C2. If A2 contains "blue" instead, the second element of the array is 0 * 1 * C2 = 0.

SUM( ((A1:A6="red")+(B1:B6="big")>0)*C1:C6 ) returns the sum of entries in C1:C6 whose A column entries are "red" OR whose B column entries are "big".

SUM( MOD((A1:A6="red")+(B1:B6="big");2)*C1:C6 ) returns the sum of entries in C1:C6 either whose A column entries are "red" OR whose B column entries are "big" but not both (exclusive OR)

SUM( NOT((A1:A6="red")+(B1:B6="big"))*C1:C6 ) returns the sum of entries in C1:C6 where neither the A column entry is "red" NOR the B column entry is "big".

Count of entries matching multiple conditions

SUM( (A1:A6="red")*(B1:B6="big") ) returns the number of rows whose A column entries are "red" AND whose B column entries are "big".

Maximum in a particular month

MAX(IF(MONTH(B1:B9)=5; C1:C9; 0)) returns the maximum value in C1:C9 where the corresponding B1:B9 date is in May (month 5). The MIN() function would provide the minimum in the month.

Average of entries meeting a condition

AVERAGE(IF(A1:A9="red"; B1:B9; "")) returns the average of entries in B1:B9 whose A column entries are “red”. The AVERAGE function ignores any blank entries.

The MEDIAN function can be used similarly.

Dynamic sorting of a column

=LARGE(B3:B9;ROW(B3:B9)+1-ROW(B3)) entered in cell C3 returns an array in C3:C9 which is B3:B9 in descending order. +1-ROW(3) is a constant - you could write =LARGE(B3:B9;ROW(B3:B9)-2) instead.

=SMALL(B3:B9;ROW(B3:B9)-2) returns B3:B9 in ascending order.

Sum ignoring errors

SUM( IF(ISERROR(A1:A9); 0; A1:A9) ). Normally SUM will propagate any error found.

Average, ignoring zero entries

AVERAGE(IF(B1:B9<>0; B1:B9; ""))

Test a cell for one of a set of values

OR(B2={2; 5; 6}) as an array expression, or OR(B2=2; B2=5; B2=6) as a ‘normal’ formula.

Sum of the smallest 4 numbers

SUM(SMALL(B3:B9; {1;2;3;4})) or SUM(SMALL(B3:B9;ROW(A1:A4))). ROW(A1:A4) produces {1;2;3;4}.

Last used cell in a column

MAX(ROW(B1:B9)*(B1:B9<>"")) returns the row number of the last cell used.

Issues

  • Some functions cannot yet be used in array formulas: COUNTIF, SUMIF (issue 65866), and ISLOGICAL (issue 87219). MATCH (issue 8947) works from OOo3.0.
  • OFFSET has limited behaviour within an array formula.
  • Names defined by Insert - Names - Define can be used within an array formula, but labels (either Insert - Names - Labels or automatically recognised at the head of a column) should not be used (issue 87268).
  • In Excel, not all functions can be used in array formulas. For example CONCATENATE works in Calc but not in Excel. Be aware of Excel's limitations if you plan to use a Calc spreadsheet in Excel.
  • Volatile functions such as RAND, RANDBETWEEN, NOW may not be recalculated in array formulas; for example in {=A1:A3+RAND()} just one random number is generated (issue 67135).
  • Before OOo2.4 spaces and negative numbers in inline arrays failed (issue 82644).
  • Before OOo3.1 it was not possible to include different dimension arrays in a formula (issue 46681). Now { {1|2} + {10;20|30;40} } internally expands {1|2} to {1;1|2;2}, to correctly return the result {11;21|32;42}.
  • Array formulas may slow down your computer if you have large arrays.
Content on this page is licensed under the Public Documentation License (PDL).
Personal tools