Difference between revisions of "Documentation/How Tos/Regular Expressions in Calc"

From Apache OpenOffice Wiki
Jump to: navigation, search
m (added more links)
m (Regular expressions in Calc functions)
(33 intermediate revisions by 7 users not shown)
Line 1: Line 1:
 +
[[fr:Documentation/FR/Expressions_Regulieres_dans_Calc]]
 +
[[nl:Documentation/nl/How_Tos/Reguliere_expressies_in_Calc]]
 +
 
== Introduction ==
 
== Introduction ==
 
In simple terms regular expressions are a clever way to find text.
 
In simple terms regular expressions are a clever way to find text.
Line 12: Line 15:
 
*Edit - Find & Replace dialog
 
*Edit - Find & Replace dialog
  
*Data - Filter - Standard filter
+
*Data - Filter - Standard filter & Advanced filter
  
*Functions, such as SUMIF, LOOKUP
+
*Certain functions, such as SUMIF, LOOKUP
  
The best way to learn about regular expressions in Calc is to start by understanding how to use Find & Replace. This is covered by the '<b>[[Documentation/How_Tos/Regular Expressions in Writer|HowTo for Regular Expressions in Writer]]</b>', which you should read.
+
The best way to learn about regular expressions in Calc is to start by understanding how to use them in Find & Replace. This is covered by the '<b>[[Documentation/How_Tos/Regular Expressions in Writer|HowTo for Regular Expressions in Writer]]</b>', which you should read.
  
  
In Calc, regular expressions are applied separately to each cell. (You'll see that regular expressions are applied separately to each paragraph in Writer.) So a search for '<b>r.d</b>' will match red in cell A1 but will not match '<b>r</b>' in cell A2 with '<b>d</b>' (or '<b>ed</b>') in cell A3. (The regular expression '<b>r.d</b>' means 'try to match '<b>r</b>' followed by another character followed by '<b>d</b>' ').
+
In Calc, regular expressions are applied separately to each cell. (You'll see that regular expressions are applied separately to each paragraph in Writer.) So a search for '<b>r.d</b>' will match '<b>red</b>' in cell A1 but will not match '<b>r</b>' in cell A2 with '<b>d</b>' (or '<b>ed</b>') in cell A3. (The regular expression '<b>r.d</b>' means 'try to match '<b>r</b>' followed by another character followed by '<b>d</b>' ').
 
+
  
 
== Regular expressions in Calc functions ==
 
== Regular expressions in Calc functions ==
Line 46: Line 48:
  
  
Whether or not regular expressions are used is selected on the Tools - Options - OpenOffice.org Calc - Calculate dialog:
+
Whether or not regular expressions are used is selected on the {{menu|Tools|Options|OpenOffice Calc|Calculate}} dialog:
  
  
Line 52: Line 54:
  
  
For example ''''=COUNTIF(A1:A6;"r.d")'''' with "Enable regular expressions in formulas" selected will count cells in A1:A6 which contain ''''red'''' and ''''ROD''''. Additionally if "Search criteria = and <> must apply to whole cells" is ''not'' selected then ''''Fred'''', ''''bride'''', and ''''Ridge'''' will also be counted.
+
For example '''=COUNTIF(A1:A6;"r.d")'''' with "Enable regular expressions in formulas" selected will count cells in A1:A6 which contain ''''red'''' and ''''ROD''''.
  
  
[[Image:Regex_howto_4.png|regular expression in COUNTIF function]]
+
Additionally if "Search criteria = and <> must apply to whole cells" is ''not'' selected then ''''Fred'''', ''''bride'''', and ''''Ridge'''' will also be counted. If that setting ''is'' selected, then it can be overcome by wrapping the expression thus: ''''=COUNTIF(A1:A6;".*r.d.*")''''.
  
  
Note that regular expression searches ''within functions'' are <u>always case insensitive</u>, irrespective of the setting of the 'Case sensitive' check box on the dialog above - so ''''red'''' and ''''ROD'''' will always be matched in the above example. Note also that the regular expression structures ([:lower:]) and ([:upper:]) match characters irrespective of case.
+
[[Image:Regex_howto_4.png|regular expression in COUNTIF function]]
 +
 
  
 +
Regular expression searches ''within functions'' are <u>always case insensitive</u>, irrespective of the setting of the "Case sensitive" check box on the dialog above - so ''''red'''' and ''''ROD'''' will always be matched in the above example. This case-insensitivity also applies to the regular expression structures ([:lower:]) and ([:upper:]), which match characters irrespective of case. See [https://bz.apache.org/ooo/show_bug.cgi?id=71000 issue 71000] for some background.
  
 
Regular expressions will not work in simple comparisons. For example:
 
Regular expressions will not work in simple comparisons. For example:
 
' <b>A1="r.d"</b> ' will always return '''FALSE''' if A1 contains '<b>red</b>', even if regular expressions are enabled - it will only return '''TRUE''' if A1 contains '<b>r.d</b>' ('<b>r</b>' then a <b>dot</b> then '<b>d</b>'). If you wish to test using regular expressions, try the '<b>COUNTIF</b>' function - '<b>COUNTIF(A1; "r.d")</b>' will return '<b>1</b>' or '<b>0</b>', interpreted as '''TRUE''' or '''FALSE''' in formulae like '<b>=IF(COUNTIF(A1; "r.d");"hooray"; "boo")</b>'
 
' <b>A1="r.d"</b> ' will always return '''FALSE''' if A1 contains '<b>red</b>', even if regular expressions are enabled - it will only return '''TRUE''' if A1 contains '<b>r.d</b>' ('<b>r</b>' then a <b>dot</b> then '<b>d</b>'). If you wish to test using regular expressions, try the '<b>COUNTIF</b>' function - '<b>COUNTIF(A1; "r.d")</b>' will return '<b>1</b>' or '<b>0</b>', interpreted as '''TRUE''' or '''FALSE''' in formulae like '<b>=IF(COUNTIF(A1; "r.d");"hooray"; "boo")</b>'
  
 +
Activating the "Enable regular expressions in formulas" setting means all the above functions will require any regular expression special characters (such as parentheses) used in strings within formulas, to be "escaped" using a preceding backslash, despite not being part of a regular expression. These backslashes will need removing if the setting is later deactivated.
  
 
== Regular expressions in Calc Find & Replace ==
 
== Regular expressions in Calc Find & Replace ==
Line 69: Line 74:
 
Find & Replace in Calc is very similar to Find & Replace in Writer, as described in the '<b>[[Documentation/How_Tos/Regular Expressions in Writer|HowTo for Regular Expressions in Writer]]</b>'. The following points are interesting to Calc users:
 
Find & Replace in Calc is very similar to Find & Replace in Writer, as described in the '<b>[[Documentation/How_Tos/Regular Expressions in Writer|HowTo for Regular Expressions in Writer]]</b>'. The following points are interesting to Calc users:
  
* When a match is found, the whole cell is shown highlighted, but only the text found will be replaced. For example, searching for '<b>brown</b>' will highlight a cell containing '<b>redbrown clay</b>', and replacing with nothing will leave the cell containg '<b>red clay</b>'.
+
* Mind you that the option "'''Search criteria = and <> must apply to whole cells'''" has an effect on filtering (with regex it behaves like appending a "$").
 +
 
 +
 
 +
* When a match is found, the whole cell is shown highlighted, but only the text found will be replaced. For example, searching for '<b>brown</b>' will highlight a cell containing '<b>redbrown clay</b>', and replacing with nothing will leave the cell containing '<b>red clay</b>'.
 +
 
 +
 
 +
* If Find is used twice in a row, the second time with "Current selection only" activated, then the second search will evaluate the ''whole'' of each selected cell, ''not'' just the strings found which caused the cells to be selected in the first search. For example, searching for '<b>joh?n</b>', then activating "Current selection only" and searching for '<b>sm.th</b>', will find cells containing '''Jon Smith''' and '''Smythers, Johnathon'''.
 +
 
 +
 
 
* If a cell contains a hard line break (entered by Cntrl-Enter), this may be found by '<b>\n</b>'. For example if a cell contains '<b>red hard_line_break clay</b>' then searching for '<b>d\nc</b>' and replacing with nothing leaves the cell containing '<b>relay</b>'.
 
* If a cell contains a hard line break (entered by Cntrl-Enter), this may be found by '<b>\n</b>'. For example if a cell contains '<b>red hard_line_break clay</b>' then searching for '<b>d\nc</b>' and replacing with nothing leaves the cell containing '<b>relay</b>'.
 +
 +
 
* The hard line break acts to mark "end of text" as understood by the regular expression special character '<b>$</b>' (in addition of course to the end of text in the cell). For example if a cell contains '<b>red hard_line_break clay</b>' then a search for '<b>d$</b>' replacing with '<b>al</b>' leaves the cell with '<b>real hard_line_break clay</b>'. Note that with this syntax the hard line break is not replaced - it simply marks the end of text.
 
* The hard line break acts to mark "end of text" as understood by the regular expression special character '<b>$</b>' (in addition of course to the end of text in the cell). For example if a cell contains '<b>red hard_line_break clay</b>' then a search for '<b>d$</b>' replacing with '<b>al</b>' leaves the cell with '<b>real hard_line_break clay</b>'. Note that with this syntax the hard line break is not replaced - it simply marks the end of text.
 +
 +
 
* Using '<b>\n</b>' in the 'Replace with' box will replace with the literal characters '<b>\n</b>', not a hard line break.
 
* Using '<b>\n</b>' in the 'Replace with' box will replace with the literal characters '<b>\n</b>', not a hard line break.
 +
 +
 
* The Find & Replace dialog has an option to search '<b>Formulas</b>', '<b>Values</b>', or '<b>Notes</b>'. This applies to any search, not just one using regular expressions. Searching with the '<b>Formulas</b>' option would find '<b>SUM</b>' in a cell containing the formula '<b>=SUM(A1:A6)</b>'. If a cell contains text instead of a function, the text will still be found - so that the simple text '<b>SUMMARY</b>' in a cell would also give a match to '<b>SUM</b>' using the '<b>Formulas</b>' option.
 
* The Find & Replace dialog has an option to search '<b>Formulas</b>', '<b>Values</b>', or '<b>Notes</b>'. This applies to any search, not just one using regular expressions. Searching with the '<b>Formulas</b>' option would find '<b>SUM</b>' in a cell containing the formula '<b>=SUM(A1:A6)</b>'. If a cell contains text instead of a function, the text will still be found - so that the simple text '<b>SUMMARY</b>' in a cell would also give a match to '<b>SUM</b>' using the '<b>Formulas</b>' option.
* Searching for the regular expression '<b>^$</b>' will not find empty cells. This is intentional - the rationale being to avoid performance issues when selecting a huge number of cells. Note that empty cells will not be found even if you are only searching a selection.
+
 
 +
 
 +
* Searching for the regular expression '<b>^$</b>' will not find empty cells. This is intentional - the rationale being to avoid performance issues when selecting a huge number of cells. Note that empty cells will not be found even if you are only searching a selection.  
 +
 
 +
 
 +
* Find '<b>.+</b>' (or similar) and Replace with '<b>&</b>' effectively re-enters the contents of cells. This can be used to 'strip' formatting automatically applied by Calc (often needed to 'clean' data imported from the clipboard or badly formatted files), for example, to convert text strings consisting of digits, into actual numbers (the cells must first be correctly formatted 'number'). The leading apostrophes, telling Calc to treat the numbers as text, are removed.
 +
 
 +
 
 +
{{PDL1}}
 +
 
 +
[[Category:Calc]]
 +
[[Category:Documentation/How Tos/Calc]]

Revision as of 14:58, 18 May 2022


Introduction

In simple terms regular expressions are a clever way to find text.


A typical use for regular expressions is in finding text; for instance to locate all cells containing man or woman in your spreadsheet, you could search using a single regular expression.


Regular expressions in Calc and Writer

Regular expressions are available in Calc as follows:

  • Edit - Find & Replace dialog
  • Data - Filter - Standard filter & Advanced filter
  • Certain functions, such as SUMIF, LOOKUP

The best way to learn about regular expressions in Calc is to start by understanding how to use them in Find & Replace. This is covered by the 'HowTo for Regular Expressions in Writer', which you should read.


In Calc, regular expressions are applied separately to each cell. (You'll see that regular expressions are applied separately to each paragraph in Writer.) So a search for 'r.d' will match 'red' in cell A1 but will not match 'r' in cell A2 with 'd' (or 'ed') in cell A3. (The regular expression 'r.d' means 'try to match 'r' followed by another character followed by 'd' ').

Regular expressions in Calc functions

There are a number of functions in Calc which allow the use of regular expressions:
'SUMIF', 'COUNTIF',
'MATCH', 'SEARCH', 'LOOKUP', 'HLOOKUP', 'VLOOKUP',
'DCOUNT', 'DCOUNTA', 'DSUM', 'DPRODUCT',
'DMAX', 'DMIN', 'DAVERAGE', 'DSTDEV', 'DSTDEVP', 'DVAR', 'DVARP',
'DGET'


Whether or not regular expressions are used is selected on the Tools → Options → OpenOffice Calc → Calculate dialog:


choosing to use regular expressions in Calc functions


For example =COUNTIF(A1:A6;"r.d")' with "Enable regular expressions in formulas" selected will count cells in A1:A6 which contain 'red' and 'ROD'.


Additionally if "Search criteria = and <> must apply to whole cells" is not selected then 'Fred', 'bride', and 'Ridge' will also be counted. If that setting is selected, then it can be overcome by wrapping the expression thus: '=COUNTIF(A1:A6;".*r.d.*")'.


regular expression in COUNTIF function


Regular expression searches within functions are always case insensitive, irrespective of the setting of the "Case sensitive" check box on the dialog above - so 'red' and 'ROD' will always be matched in the above example. This case-insensitivity also applies to the regular expression structures ([:lower:]) and ([:upper:]), which match characters irrespective of case. See issue 71000 for some background.

Regular expressions will not work in simple comparisons. For example: ' A1="r.d" ' will always return FALSE if A1 contains 'red', even if regular expressions are enabled - it will only return TRUE if A1 contains 'r.d' ('r' then a dot then 'd'). If you wish to test using regular expressions, try the 'COUNTIF' function - 'COUNTIF(A1; "r.d")' will return '1' or '0', interpreted as TRUE or FALSE in formulae like '=IF(COUNTIF(A1; "r.d");"hooray"; "boo")'

Activating the "Enable regular expressions in formulas" setting means all the above functions will require any regular expression special characters (such as parentheses) used in strings within formulas, to be "escaped" using a preceding backslash, despite not being part of a regular expression. These backslashes will need removing if the setting is later deactivated.

Regular expressions in Calc Find & Replace

Find & Replace in Calc is very similar to Find & Replace in Writer, as described in the 'HowTo for Regular Expressions in Writer'. The following points are interesting to Calc users:

  • Mind you that the option "Search criteria = and <> must apply to whole cells" has an effect on filtering (with regex it behaves like appending a "$").


  • When a match is found, the whole cell is shown highlighted, but only the text found will be replaced. For example, searching for 'brown' will highlight a cell containing 'redbrown clay', and replacing with nothing will leave the cell containing 'red clay'.


  • If Find is used twice in a row, the second time with "Current selection only" activated, then the second search will evaluate the whole of each selected cell, not just the strings found which caused the cells to be selected in the first search. For example, searching for 'joh?n', then activating "Current selection only" and searching for 'sm.th', will find cells containing Jon Smith and Smythers, Johnathon.


  • If a cell contains a hard line break (entered by Cntrl-Enter), this may be found by '\n'. For example if a cell contains 'red hard_line_break clay' then searching for 'd\nc' and replacing with nothing leaves the cell containing 'relay'.


  • The hard line break acts to mark "end of text" as understood by the regular expression special character '$' (in addition of course to the end of text in the cell). For example if a cell contains 'red hard_line_break clay' then a search for 'd$' replacing with 'al' leaves the cell with 'real hard_line_break clay'. Note that with this syntax the hard line break is not replaced - it simply marks the end of text.


  • Using '\n' in the 'Replace with' box will replace with the literal characters '\n', not a hard line break.


  • The Find & Replace dialog has an option to search 'Formulas', 'Values', or 'Notes'. This applies to any search, not just one using regular expressions. Searching with the 'Formulas' option would find 'SUM' in a cell containing the formula '=SUM(A1:A6)'. If a cell contains text instead of a function, the text will still be found - so that the simple text 'SUMMARY' in a cell would also give a match to 'SUM' using the 'Formulas' option.


  • Searching for the regular expression '^$' will not find empty cells. This is intentional - the rationale being to avoid performance issues when selecting a huge number of cells. Note that empty cells will not be found even if you are only searching a selection.


  • Find '.+' (or similar) and Replace with '&' effectively re-enters the contents of cells. This can be used to 'strip' formatting automatically applied by Calc (often needed to 'clean' data imported from the clipboard or badly formatted files), for example, to convert text strings consisting of digits, into actual numbers (the cells must first be correctly formatted 'number'). The leading apostrophes, telling Calc to treat the numbers as text, are removed.


Content on this page is licensed under the Public Documentation License (PDL).
Personal tools
In other languages