Documentation/How Tos/Regular Expressions in Calc

From Apache OpenOffice Wiki
Jump to: navigation, search


In simple terms regular expressions are a clever way to find text.

A typical use for regular expressions is in finding text; for instance to locate all cells containing man or woman in your spreadsheet, you could search using a single regular expression.

Regular expressions in Calc and Writer

Regular expressions are available in Calc as follows:

  • Edit → Find & Replace dialog
  • Data → Filter → Standard filter & Data → Filter → Advanced filter
  • Certain functions, such as SUMIF, LOOKUP

The best way to learn about regular expressions in Calc is to start by understanding how to use them in Find & Replace. This is covered by the 'HowTo for Regular Expressions in Writer', which you should read.

In Calc, regular expressions are applied separately to each cell. (You'll see that regular expressions are applied separately to each paragraph in Writer.) So a search for 'r.d' will match 'red' in cell A1 but will not match 'r' in cell A2 with 'd' (or 'ed') in cell A3. (The regular expression 'r.d' means 'try to match 'r' followed by another character followed by 'd' ').

Regular expressions in Calc functions

There are a number of functions in Calc which allow the use of regular expressions:

Whether or not regular expressions are used is selected on the Tools → Options → OpenOffice Calc → Calculate dialog:

choosing to use regular expressions in Calc functions

For example =COUNTIF(A1:A6;"r.d")' with "Enable regular expressions in formulas" selected will count cells in A1:A6 which contain 'red' and 'ROD'.

Additionally if "Search criteria = and <> must apply to whole cells" is not selected then 'Fred', 'bride', and 'Ridge' will also be counted. If that setting is selected, then it can be overcome by wrapping the expression thus: '=COUNTIF(A1:A6;".*r.d.*")'.

regular expression in COUNTIF function

Regular expression searches within functions are always case insensitive, irrespective of the setting of the "Case sensitive" check box on the dialog above - so 'red' and 'ROD' will always be matched in the above example. This case-insensitivity also applies to the regular expression structures ([:lower:]) and ([:upper:]), which match characters irrespective of case. See issue 71000 for some background.

Regular expressions will not work in simple comparisons. For example: ' A1="r.d" ' will always return FALSE if A1 contains 'red', even if regular expressions are enabled - it will only return TRUE if A1 contains 'r.d' ('r' then a dot then 'd'). If you wish to test using regular expressions, try the 'COUNTIF' function - 'COUNTIF(A1; "r.d")' will return '1' or '0', interpreted as TRUE or FALSE in formulae like '=IF(COUNTIF(A1; "r.d");"hooray"; "boo")'

Activating the "Enable regular expressions in formulas" setting means all the above functions will require any regular expression special characters (such as parentheses) used in strings within formulas, to be "escaped" using a preceding backslash, despite not being part of a regular expression. These backslashes will need removing if the setting is later deactivated.

Regular expressions in Calc Find & Replace

Find & Replace in Calc is very similar to Find & Replace in Writer, as described in the 'HowTo for Regular Expressions in Writer'. The following points are interesting to Calc users:

  • Mind you that the option "Search criteria = and <> must apply to whole cells" has an effect on filtering (with regex it behaves like appending a "$").

  • When a match is found, the whole cell is shown highlighted, but only the text found will be replaced. For example, searching for 'brown' will highlight a cell containing 'redbrown clay', and replacing with nothing will leave the cell containing 'red clay'.

  • If Find is used twice in a row, the second time with "Current selection only" activated, then the second search will evaluate the whole of each selected cell, not just the strings found which caused the cells to be selected in the first search. For example, searching for 'joh?n', then activating "Current selection only" and searching for '', will find cells containing Jon Smith and Smythers, Johnathon.

  • If a cell contains a hard line break (entered by Cntrl-Enter), this may be found by '\n'. For example if a cell contains 'red hard_line_break clay' then searching for 'd\nc' and replacing with nothing leaves the cell containing 'relay'.

  • The hard line break acts to mark "end of text" as understood by the regular expression special character '$' (in addition of course to the end of text in the cell). For example if a cell contains 'red hard_line_break clay' then a search for 'd$' replacing with 'al' leaves the cell with 'real hard_line_break clay'. Note that with this syntax the hard line break is not replaced - it simply marks the end of text.

  • Using '\n' in the 'Replace with' box will replace with the literal characters '\n', not a hard line break.

  • The Find & Replace dialog has an option to search 'Formulas', 'Values', or 'Notes'. This applies to any search, not just one using regular expressions. Searching with the 'Formulas' option would find 'SUM' in a cell containing the formula '=SUM(A1:A6)'. If a cell contains text instead of a function, the text will still be found - so that the simple text 'SUMMARY' in a cell would also give a match to 'SUM' using the 'Formulas' option.

  • Searching for the regular expression '^$' will not find empty cells. This is intentional - the rationale being to avoid performance issues when selecting a huge number of cells. Note that empty cells will not be found even if you are only searching a selection.

  • Find '.+' (or similar) and Replace with '&' effectively re-enters the contents of cells. This can be used to 'strip' formatting automatically applied by Calc (often needed to 'clean' data imported from the clipboard or badly formatted files), for example, to convert text strings consisting of digits, into actual numbers (the cells must first be correctly formatted 'number'). The leading apostrophes, telling Calc to treat the numbers as text, are removed.

Content on this page is licensed under the Public Documentation License (PDL).
Personal tools
In other languages