Difference between revisions of "User:Hegen/Test"

Revision as of 14:49, 1 November 2009

1 Introduction
2 Where regular expressions may be used in OOo
3 A simple example
4 The least you need to know about regular expressions
5 How regular expressions are applied in OpenOffice.org
6 Literal characters
7 Special characters
8 Single character match . ?
9 Repeating match + * {m,n}
10 Positional match ^ $ \< \>
11 Alternative matches | [...]
12 POSIX bracket expressions [:alpha:] [:digit:] etc.
13 Grouping (...) and backreferences \x $x
14 Tabs, newlines, paragraphs \t \n $
15 Hexadecimal codes \xXXXX
16 The 'Replace with' box \t \n & $1 $2
17 Troubleshooting OOo regular expressions
18 Tips and Tricks

Introduction

In simple terms, regular expressions are a clever way to find and replace text (similar to 'wildcards'). Regular expressions can be both powerful and complex, and it is easy for inexperienced users to make mistakes. We describe the use of OpenOffice.org regular expressions, aiming to be clear enough for the novice while detailing the aspects that can confuse more experienced users.

A typical use for regular expressions is in finding text in a Writer document; for instance, to locate all occurrences of man or woman in your document, you could search using a single regular expression that finds both words.

Regular expressions are very common in some areas of computing, and are often known as regex or regexp. Not all regex implementations are the same, so reading the relevant manual is sensible.

Where regular expressions may be used in OOo

In Writer:

Edit - Find & Replace dialog

Edit - Changes - Accept/reject command (Filter tab)

In Calc:

Edit - Find & Replace dialog

Data - Filter - Standard filter & Advanced filter

Certain functions, such as SUMIF, LOOKUP

In Base:

Find Record command

The dialogs that appear when you use the above commands generally have an option to use regular expressions (which is off by default). For example:

Position of the regular expressions option in the dialog

You should check the status of the regular expression option each time you bring up the dialog, as it defaults to 'off'.

A simple example

If you have little or no experience of regular expressions, you may find it easiest to study them in Writer rather than, say, Calc.

In Writer, bring up the Find & Replace dialog from the Edit menu.

On the dialog, choose More Options and tick the Regular Expressions box.

In the Search box enter r.d - the dot here means 'any single character'.

Clicking the Find All button will now find all the places where an r is followed by another character followed by a d, for instance 'red' or 'hotrod' or 'bride' or 'your dog' (this last example is r followed by a space followed by d - the space is a character).

If you type xxx into the Replace with box, and click the Replace All button, these become 'xxx', 'hotxxx', 'bxxxe', 'youxxxog'.

That may not be very useful, but it shows the principle. We'll continue to use the Find & Replace dialog to explain in more detail.
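
The same find-and-replace can be tried outside OpenOffice.org. The following is a minimal sketch using Python's re module, chosen here purely for illustration; it is a different regex engine from the one in OOo, although the dot behaves the same way in this example.

```python
import re

# Sample texts from the example above; "r.d" means r, any one character, d.
samples = ["red", "hotrod", "bride", "your dog"]

# re.sub replaces every match, much like the Replace All button.
replaced = [re.sub(r"r.d", "xxx", s) for s in samples]
print(replaced)
```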

The least you need to know about regular expressions

If you don't want to find out exactly how regular expressions work, but just want to get a job done, you might find these common examples useful. Enter them in the 'Search for' box, and make sure that regular expressions are selected.

color|colour finds color and colour
sep.rate finds sep then any character then rate - eg separate, seperate, and indeed sepXrate
sep[ae]rate finds separate and seperate - [ae] means either an a or an e
changed? finds change and changed - the d is optional because it is followed by a question mark
s\> finds the s at the end of a word
\<. finds the first letter of a word.
^. finds the first letter of a paragraph.
^$ finds an empty paragraph
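
For readers who want to experiment with these patterns in a script, here is a rough sketch in Python's re module. This is an assumption for illustration, not the OOo engine: Python has no \< and \>, so the word boundary \b stands in for them, and \b\w is used for 'first letter of a word' because \b. would also match the space after a word.

```python
import re

text = "The color and colour were separate, not seperate."

either_spelling = re.findall(r"color|colour", text)
any_middle = re.findall(r"sep.rate", text)
a_or_e = re.findall(r"sep[ae]rate", text)
optional_d = re.findall(r"changed?", "change changed")

# \b is Python's word boundary; it replaces OOo's \< and \> here.
final_s = re.findall(r"s\b", "cats and dogs")
first_letters = re.findall(r"\b\w", "cats and dogs")

print(either_spelling, any_middle, a_or_e, optional_d, final_s, first_letters)
```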

How regular expressions are applied in OpenOffice.org

OpenOffice.org regular expressions appear to divide the text to be searched into portions and examine each portion separately.

In Writer, text appears to be divided into paragraphs. For example x.*z will not match x at the end of a paragraph with z beginning the next paragraph (x.*z means x then any or no characters then z). Paragraphs seem to be treated separately (although we discuss some special cases at the end of this HowTo).

the scope of regular expressions

In addition Writer considers each table cell and each text frame separately. Text frames are examined after all the other text / table cells on all pages have been examined.

In the Find & Replace dialog, regular expressions may be used in the Search for box. In general they may not be used in the Replace with box. The exceptions are discussed later.

Literal characters

If your regular expression contains characters other than the so-called 'special characters' . ^ $ * + ? \ [ ( { | then those characters are matched literally.

For example: red matches red, redraw and Freddie.

OpenOffice.org allows you to choose whether you care if a character is 'UPPER CASE' or 'lower case'. If you tick the box to 'match case' on the Find & Replace dialog, then red will not match Red or FRED; if you un-tick that box then the case is ignored and both will be matched.

Special characters

The special characters are . ^ $ * + ? \ [ ( { |

They have special meanings in a regular expression, as we're about to describe.

If you wish to match one of these characters literally, place a backslash '\' before it.

For example: to match $100 use \$100 - the \$ is taken to mean $ .
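
The effect of the backslash can be checked in a few lines. This sketch uses Python's re module as a stand-in (an assumption; it is not the OOo engine), where the dollar sign is likewise special and the backslash likewise makes it literal. re.escape is a Python convenience and has no counterpart in the OOo dialog.

```python
import re

price = "It costs $100."

# Unescaped, "$" is an end-of-text anchor, so "$100" can never match.
unescaped = re.search(r"$100", price)

# Escaped with a backslash, "\$" means a literal dollar sign.
escaped = re.search(r"\$100", price)

# re.escape adds the needed backslashes automatically.
auto_pattern = re.escape("$100")
auto = re.search(auto_pattern, price)

print(unescaped, escaped.group(), auto.group())
```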

Single character match . ?

The dot '.' special character stands for any single character (except newline).

For example: r.d matches 'red' and 'hotrod' and 'bride' and 'your dog'

The question mark '?' special character means 'match zero or one of the preceding character' - or 'match the preceding character if it is found'.

For example: rea?d matches 'red' and 'read' - 'a?' means 'match a single a if there is one'.

Special characters can be used in combination with each other. A dot followed by a question mark means 'match zero or one of any single character'.

For example: star.?ing matches 'staring', 'starring', 'starting', and 'starling', but not 'startling'
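
The examples above can be verified with a short script. This is only a sketch in Python's re module (assumed here for illustration; OOo uses its own engine), with re.fullmatch requiring the whole word to fit the pattern.

```python
import re

words = ["staring", "starring", "starting", "starling", "startling"]

# ".?" allows at most one extra character between "star" and "ing".
star_ing = [w for w in words if re.fullmatch(r"star.?ing", w)]

# "a?" makes the single letter a optional.
rea_d = [w for w in ["red", "read", "reaad"] if re.fullmatch(r"rea?d", w)]

print(star_ing, rea_d)
```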

Repeating match + * {m,n}

The plus '+' special character means 'match one or more of the preceding character'.

For example: re+d matches 'red' and 'reed' and 'reeeeed' - e+ means match one or more e's.

The star '*' special character means 'match zero or more of the preceding character'.

For example: rea*d matches 'red' and 'read' and 'reaaaaaaad' - 'a*' means match zero or more a's.

A common use for '*' is after the dot character - ie '.*' which means 'any or no characters'.

For example: rea.*d matches 'read' and 'reaXd' and 'reaYYYYd' but not 'red' or 'reXd'.

Use the star '*' with caution; it will grab everything it can:

For example: 'r.*d' matches 'red' but in Writer if your paragraph is actually 'The referee showed him the red card again' the match found is 'referee showed him the red card' - that is, the first 'r' and the last possible 'd'. Regular expressions are greedy by nature.

You may specify how many times you wish the match to be repeated, with curly brackets { }. For example a{1,4}rgh! will match argh!, aargh!, aaargh! and aaaargh! - in other words between 1 and 4 a's then rgh!.

Also note that a{3}rgh! will match precisely 3 a's, ie aaargh!, and a{2,}rgh! (with a comma) will match at least 2 a's, for example aargh! and aaaaaaaargh!.
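
Greedy matching and the curly-bracket counts are easy to observe in a script. The sketch below uses Python's re module, which is an assumption for illustration rather than the OOo engine; a single Python string stands in for one Writer paragraph.

```python
import re

paragraph = "The referee showed him the red card again"

# ".*" is greedy: the match runs from the first r to the last possible d.
greedy = re.search(r"r.*d", paragraph).group()

# a{1,4} accepts between 1 and 4 a's; try 0 to 5 of them.
counts = {n: bool(re.fullmatch(r"a{1,4}rgh!", "a" * n + "rgh!")) for n in range(6)}

exactly_three = bool(re.fullmatch(r"a{3}rgh!", "aaargh!"))
at_least_two = [bool(re.fullmatch(r"a{2,}rgh!", s)) for s in ["argh!", "aargh!", "aaaaaaaargh!"]]

print(greedy, counts, exactly_three, at_least_two)
```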

Positional match ^ $ \< \>

The circumflex '^' special character means 'match at the beginning of the text'.

The dollar '$' special character means 'match at the end of the text'.

Remember that OpenOffice.org regular expressions divide up the text to be searched - each paragraph in Writer is examined separately.

For example: ^red matches 'red' at the start of a paragraph (red night shepherd's delight).

For example: red$ matches 'red' at the end of a paragraph (he felt himself go red).

For example: ^red$ matches inside a table cell that contains just 'red'.

In addition a hard line break (entered by Shift-Enter) is considered the beginning / end of text, and will allow a ^ or $ match.

The backslash '\' special character gives special meaning to the character pairs '\<' and '\>', namely 'match at the beginning of a word', and 'match at the end of a word'.

For example: \<red matches red at the beginning of a word (she went redder than he did).

For example: red\> matches red at the end of a word (although neither of them cared much.)

The test used to define the beginning/end of a word seems to be that the previous/next character is a space, underscore (_), tab, newline, paragraph mark or any non-alphanumeric character.

For example: \<red matches 'person@rediton.com'

For example: red\> matches 'I said, "No-one dared" '
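
The positional matches can be imitated in a script, with some caveats. In this sketch (Python's re module, assumed for illustration only) each line of a string stands in for a Writer paragraph, re.MULTILINE makes ^ and $ work per line, and \b replaces \< and \>. Note one difference from the behaviour described above: Python counts the underscore as part of a word.

```python
import re

# Three "paragraphs", one per line.
doc = "red night shepherd's delight\nhe felt himself go red\nred"

at_start = len(re.findall(r"^red", doc, flags=re.MULTILINE))
at_end = len(re.findall(r"red$", doc, flags=re.MULTILINE))
whole = len(re.findall(r"^red$", doc, flags=re.MULTILINE))

# Word boundaries: \b stands in for \< and \>.
word_start = bool(re.search(r"\bred", "she went redder than he did"))
word_end = bool(re.search(r"red\b", "neither of them cared much"))
not_word_start = bool(re.search(r"\bred", "neither of them cared much"))
after_at_sign = bool(re.search(r"\bred", "person@rediton.com"))

# Unlike OOo as described above, Python treats "_" as a word character.
after_underscore = bool(re.search(r"\bred", "big_red"))

print(at_start, at_end, whole)
```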

Alternative matches | [...]

The pipe character '|' is a special character which allows the expression either side of the '|' to match.

For example: red|blue matches 'red' and 'blue'

Unfortunately, certain expressions when used after a pipe are not evaluated. This is so far known to affect ^ and backreferences, and is the subject of issue 46165.

For example: ^red|blue matches paragraphs beginning with 'red' and any occurrence of 'blue', but blue|^red incorrectly matches only any occurrence of 'blue', failing to match paragraphs beginning with 'red'.

The open square bracket character [ is a special character. Characters enclosed in square brackets are treated as alternatives - any one of them may match. You can also include ranges of characters, such as a-z or 0-9, rather than typing in abcdefghijklmnopqrstuvwxyz or 0123456789.

For example: r[eo]d matches 'red' and 'rod' but not 'rid'

For example: [m-p]ut matches 'mut' and 'nut' and 'out' and 'put'

For example: [hm-p]ut matches 'hut' and 'mut' and 'nut' and 'out' and 'put'

Special characters within alternative match square brackets do not have the same special meanings. The only characters which do have special meanings are ], -, ^ and \, and the meanings are:

] - a closing square bracket ends the alternative match set [abcdef]

- - a hyphen between two characters indicates a range, such as a-z or 0-9.

^ - if the caret is the first character in the square brackets, it negates the search. For example [^a-dxyz] matches any character except abcdxyz.

\ - the backslash is used to allow ], -, ^ and \ to be used literally in square brackets, and to allow hexadecimal codes.

For example, \] stands for a literal closing square bracket, so [[\]a] will match an opening square bracket [, a closing square bracket ] or an a. \\ stands for a literal backslash. \x0009 stands for a tab character.

Just to re-emphasise: these are the meanings of these characters inside square brackets, and any other characters are treated literally. For example [\t ] will match a 't' or a space - not a tab or a space. Use [\x0009 ] to match a tab or a space.
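
The bracket examples can be tried in a script. This sketch uses Python's re module, which is an assumption for illustration and differs from OOo in two ways shown in the comments: the opening bracket is escaped inside the set to avoid a Python warning, and [\t ] does match a tab in Python.

```python
import re

words = ["red", "rod", "rid", "hut", "mut", "nut", "out", "put", "cut"]

e_or_o = [w for w in words if re.fullmatch(r"r[eo]d", w)]
h_or_m_to_p = [w for w in words if re.fullmatch(r"[hm-p]ut", w)]

# A leading ^ negates the set: anything except a, b, c, d, x, y, z.
negated = re.findall(r"[^a-dxyz]", "abcdefxyz")

# Literal brackets inside a set; Python prefers "\[" here over a bare "[".
brackets = re.findall(r"[\[\]a]", "[a]b")

# Difference from OOo as described above: in Python, \t inside a set is a tab.
tab_in_set = bool(re.fullmatch(r"[\t ]", "\t"))

print(e_or_o, h_or_m_to_p, negated, brackets, tab_in_set)
```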

POSIX bracket expressions [:alpha:] [:digit:] etc.

There is much confusion in the OpenOffice.org community about these. The Help itself is also far from clear.

There are a number of 'POSIX bracket expressions' (sometimes called 'POSIX character classes') available in OpenOffice.org regular expressions, of the form [:classname:] which allow a match with any of the characters in that class. For instance [:digit:] stands for any of the digits 0123456789.

These (by definition) may only appear inside the square brackets of an alternative match - so a valid syntax would be [abc[:digit:]], which should match a, b, c, or any digit 0-9. A correct syntax to match just any one digit would be [[:digit:]].

Unfortunately this does not work as it should! The correct syntax does not work at all, but currently an incorrect syntax ([:digit:]) will actually match a digit, as long as it is outside the square brackets of an alternative match. (Obviously this is unsatisfactory, and is the subject of issue 64368).

The POSIX bracket expressions available are listed below. Note that the exact definition of each depends on locale - for example in a different language other characters may be considered 'alphabetic letters' in [:alpha:]. The meanings given here apply generally to English-speaking locales (and do not take into account any Unicode issues).

[:digit:]: stands for any of the digits 0123456789. This is equivalent to 0-9.

[:space:]: should stand for any whitespace character, including tab; however as currently implemented it stands simply for a space character. Note that the Help is currently misleading here. (This is the subject of issue 41706).

[:print:]: should stand for any printable character; however as currently implemented it does not match the single quote nor the double quote characters ‘ ’ “ ” (and some others such as « »). It matches space, but does not match tab (this latter is expected/defined behaviour). (This is the subject of issue 83290).

[:cntrl:]: stands for a control character. As far as a user is concerned, OpenOffice.org documents have very few control characters; tab and hard_line_break are both matched, but paragraph_mark is not.

[:alpha:]: stands for a letter (including a letter with an accent). For example in the phrase (often used in English, and here given with accents as in the original language) 'déjà vu' all 6 letters will match.

[:alnum:]: stands for a character that satisfies either [:alpha:] or [:digit:]

[:lower:]: stands for a lowercase letter (including a letter with an accent). The case matching does not work unless the Match case box is ticked; if this box is not ticked this expression is equivalent to [:alpha:].

[:upper:]: stands for an uppercase letter (including a letter with an accent). The case matching does not work unless the Match case box is ticked; if this box is not ticked this expression is equivalent to [:alpha:].

There seems to be little consistency in any implementation of POSIX bracket expressions (OOo or elsewhere). One approach is simply to use straightforward character classes - so instead of [[:digit:]] you use [0-9] for example.
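
The 'use plain character classes instead' approach can be sketched as a small helper that rewrites [:name:] tokens into explicit ranges. Everything below is hypothetical and written in Python for illustration: the POSIX_ASCII table and expand_posix function are not part of OOo or of Python's re module (which does not support POSIX bracket expressions), and the ranges are ASCII-only, so accented letters such as those in 'déjà vu' are not covered.

```python
import re

# Hypothetical ASCII-only stand-ins for a few POSIX class names.
POSIX_ASCII = {
    "digit": "0-9",
    "alpha": "A-Za-z",
    "alnum": "A-Za-z0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "space": " \\t",
}

def expand_posix(pattern: str) -> str:
    """Replace each [:name:] token with an explicit ASCII range.

    Unknown class names raise KeyError.
    """
    return re.sub(r"\[:([a-z]+):\]", lambda m: POSIX_ASCII[m.group(1)], pattern)

print(expand_posix("[abc[:digit:]]"))
print(expand_posix("[[:digit:]]"))
```

With this helper, the syntactically correct form [[:digit:]] becomes [0-9], which behaves the same in most regex engines.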

Grouping (...) and backreferences \x $x

Round brackets ( ) may be used to group terms.

For example: red(den)? will find 'red' and 'redden'; here (den)? means 'one or zero of den'.

For example: (blue|black)bird will find both 'bluebird' and 'blackbird'.

Each group enclosed in round brackets is also defined as a reference, and can be referred to later in the same expression using a 'backreference'. In the 'Search for' box, backreferences are written '\1', '\2', etc.; in the 'Replace with' box they are written '$1', '$2', etc.

'\1' or '$1' stands for 'whatever matched in the first round brackets'; '\2' or '$2' stands for 'whatever matched in the second round brackets'; and so on.

For example: (blue|black) \1bird in the 'Search for' box will find both 'blue bluebird' and 'black blackbird', because '\1' stands for either blue or black, whichever we found. Therefore 'black bluebird' does not match.

Backreferences in the 'Replace with' box only work from OOo 2.4 onwards. The use of $1 rather than \1 is consistent with Perl syntax, and more particularly with the ICU regex engine, which may at some time replace the existing OOo regex engine, thus resolving many issues.

For example: (gr..n)(blu.) in the 'Search for' box will find 'greenblue'; if the 'Replace with' box has $2$1 the replacement will be 'bluegreen'.

When regular expressions are selected, to replace text with the literal character '$' you must now use '\$'; similarly for '\' use '\\'.

For example: (1..) in the 'Search for' box and \$$1 in the 'Replace with' box replaces '100' with '$100', and '150' with '$150'.

$0 in the 'Replace with' box replaces with the entire text found.
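
Grouping and backreferences can be explored in a script as well. The sketch below uses Python's re module (an assumption for illustration, not the OOo engine). The notation differs on the replacement side: Python writes \1 and \2 where the OOo 'Replace with' box uses $1 and $2, writes \g<0> where OOo uses $0, and treats $ in a replacement as an ordinary character.

```python
import re

candidates = ["blue bluebird", "black blackbird", "black bluebird"]

# \1 must repeat whatever the first group actually matched.
same_colour = [s for s in candidates if re.search(r"(blue|black) \1bird", s)]

# Swap the two groups (OOo: search (gr..n)(blu.), replace with $2$1).
swapped = re.sub(r"(gr..n)(blu.)", r"\2\1", "greenblue")

# Prefix a literal dollar sign (OOo: replace with \$$1).
priced = [re.sub(r"(1..)", r"$\1", s) for s in ["100", "150"]]

# Insert the entire match (OOo: $0).
wrapped = re.sub(r"bird", r"[\g<0>]", "bluebird")

print(same_colour, swapped, priced, wrapped)
```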

Tabs, newlines, paragraphs \t \n $

The character pair '\t' has special meaning - it stands for a tab character.

For example: \tred will match a tab character followed by the word 'red'.

In Writer a newline may be entered by pressing Shift-Enter. A newline character is thereby inserted into the text, and the following text starts on a new line. This is not the same as a new paragraph; click View-Non printing characters to see the difference.

The OOo regular expression behaviour when matching paragraph marks and newline characters is 'unusual'. This is partly because regular expressions in other software usually deal with ordinary plain text, whereas OOo regular expressions divide the text at paragraph marks. For whatever reason, this is what you can do:

\n will match a newline (Shift-Enter) if it is entered in the Search box. In this context it is simply treated like a character, and can be replaced by say a space, or nothing. The regular expression red\n will match red followed by a newline character - and if replaced simply by say blue the newline will also be replaced. The regular expression red$ will match 'red' when it is followed by a newline. In this case, replacing with 'blue' will only replace 'red' - and will leave the newline intact.
red\ngreen will match 'red' followed by a newline followed by 'green'; replacing with say 'brown' will remove the newline. However neither red.green nor red.*green will match here - the dot . does not match newline.
$ on its own will match a paragraph mark - and can be replaced by say a 'space', or indeed nothing, in order to merge two paragraphs together. Note that red$ will match 'red' at the end of a paragraph, and if you replace it with say a space, you simply get a space where 'red' was - and the paragraphs are unaffected - the paragraph mark is not replaced. It may help to regard $ on its own as a special syntax, unique to OOo.
^$ will match an empty paragraph, which can be replaced by say nothing, in order to remove the empty paragraph. Note that ^red$ matches a paragraph with only 'red' in it - replacing this with nothing leaves an empty paragraph - the paragraph marks at either end are not replaced. It may help to regard ^$ on its own as a special syntax, unique to OOo. Unfortunately, because OOo has taken over this syntax, it seems you cannot use ^$ to find empty cells in a table (nor empty Calc cells).
If you wish to replace every newline with a paragraph mark, firstly you will search for \n with Find All to select the newlines. Then in the Replace box you enter \n, which in the Replace box stands for a paragraph mark; then choose Replace. This is somewhat bizarre, but at least now you know. Note that \r is interpreted as a literal 'r', not a carriage return.

To replace paragraph marks - as used to give lines a certain length in some html documents, for instance - with "normal" automatically wrapped lines and paragraphs, the following 3 steps should help.

1. So as not to lose "normal" paragraph marks at the end of "normal" paragraphs, replace two consecutive paragraph marks using a sequence of characters not occurring anywhere else in the text, like "*****" to replace an empty paragraph - this makes it easy to find and reinstate later. You do this by putting ^$ in the Find box and "*****" in the Replace box. (If you're only dealing with a limited chunk of text, don't forget to check "current selection only" under "more options" in the Find and Replace box.)

2. Search for the remaining line-end paragraph marks by putting $ in the Find box. To replace each mark with a space, just type a space in the Replace box.

3. Now that the text is ready for normal line-wrapping, put back the "normal" paragraph marks by typing "*****" in the Find box and \n in the Replace box. (Remember to check "current selection only" where appropriate!)

Before you try this, create a test document to practise on.

This is a good sequence to make into a macro.

It also helps deal indirectly with line-break problems.
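
The three steps above can be sketched on plain text, where a newline plays the role of the paragraph mark. This is only an illustration in Python under that assumption; it does not drive Writer itself, the unwrap function and PLACEHOLDER name are hypothetical, and the placeholder is assumed not to occur in the text. Step 3 also drops the spaces that step 2 leaves around the placeholder, a small tidy-up that the manual procedure does not perform.

```python
import re

PLACEHOLDER = "*****"  # assumed not to occur anywhere in the text

def unwrap(text: str) -> str:
    # Step 1: mark empty paragraphs so the real paragraph breaks survive.
    text = re.sub(r"^$", PLACEHOLDER, text, flags=re.MULTILINE)
    # Step 2: turn the remaining line ends into spaces.
    text = text.replace("\n", " ")
    # Step 3: put the real paragraph breaks back, dropping the spaces
    # that step 2 left on either side of the placeholder.
    return text.replace(" " + PLACEHOLDER + " ", "\n")

hard_wrapped = "line one\nline two\n\nnext para\ncontinues"
print(unwrap(hard_wrapped))
```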

Hexadecimal codes \xXXXX

The character sequence \x followed by a 4-digit hexadecimal number stands for the character with that code.

For example: \x002A stands for the star character '*'.

Hexadecimal codes can be seen on the 'Insert-Special Character' dialog.
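
To check which character a code stands for, a few lines of script are enough. This sketch uses Python (an assumption for illustration); note that Python's re module writes a 4-digit code as \uXXXX, because its \x escape takes only two hex digits, so the notation is not identical to OOo's \xXXXX.

```python
import re

# The character with hexadecimal code 002A is the star.
star = chr(0x002A)

# Python regex spelling of a 4-digit code: \u002A instead of \x002A.
hit = re.search(r"\u002A", "2 * 3").group()

# A tab (0009) or a space inside square brackets.
tab_or_space = re.findall(r"[\u0009 ]", "a\tb c")

print(star, hit, tab_or_space)
```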

The 'Replace with' box \t \n & $1 $2

Users are sometimes confused about what can be done using the 'Replace with' box in a Find & Replace dialog.

In general, regular expressions do not work in the 'Replace with' box. The characters you type replace the found text literally.

The four constructs that do work are:

\t inserts a tab, replacing the text found.
\n inserts a paragraph mark, replacing the text found. This may be unexpected, because \n in the 'Search for' box means 'newline'! In some operating systems it is possible to use unicode input to directly type a newline character (U+000A) in the 'Replace with' box, providing a workaround, but this is not universal.
$1, $2, etc are backreferences, which (from OOo2.4) insert text groups found. See under Grouping and backreferences. $0 inserts the entire text found.
& also inserts the entire text found.

For example, if you searched for bird|berry, you would find either 'bird' or 'berry'; replacing with black& would then give you either 'blackbird' or 'blackberry'.
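
The replacement constructs can be mimicked in a script. In this sketch (Python's re module, assumed for illustration) \g<0> plays the role of & or $0, and \t in the replacement inserts a tab as it does in OOo. The OOo quirk that \n in the 'Replace with' box means a paragraph mark has no equivalent here.

```python
import re

# OOo: search bird|berry, replace with black&
prefixed = [re.sub(r"bird|berry", r"black\g<0>", w) for w in ["bird", "berry"]]

# \t in the replacement inserts a tab in place of the text found.
tabbed = re.sub(r", ", r"\t", "a, b")

print(prefixed, repr(tabbed))
```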

Troubleshooting OOo regular expressions

If you are new to regular expressions, please realise that they can be tricky - if you are not getting the results you expect, you might need to check that you understand well enough. Try to keep regular expressions as simple and unambitious as possible.

Here are some further points of interest with OOo regular expressions:

If you find an unexpected behaviour, please check in the relevant section in this HowTo - many of the behaviour issues have been documented here.
Regular expressions are 'greedy' - that is they will match as much text as they can. Consider using curly and square brackets; for example [^ ]{1,5}\> matches 1 to 5 non-space characters at the end of a word.
Please be careful when using the Replace All button. There are a few rare occasions when this will give unexpected results. For example to remove the first character of every paragraph you might 'Search for' ^. and 'Replace with' nothing; clicking 'Replace All' now will wipe out *all* your text, instead of just the first character of each paragraph. Issue 82473 discusses this. The workaround is to 'Find All', then 'Replace'; perhaps the safest way is not to use the 'Replace All' button at all with regular expressions.
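
Two of the points above can be illustrated in a script. This is a sketch in Python's re module (an assumption; not the OOo engine): it shows how a negated set keeps a greedy match inside one word, and what removing the first character of every paragraph is expected to produce, with each line standing in for a paragraph. \b replaces \< and \> here.

```python
import re

sentence = "The referee showed him the red card again"

# Greedy: one long match from the first r to the last d.
greedy = re.findall(r"r.*d", sentence)

# Bounded: [^ ]* cannot cross a space, so the match stays within a word.
bounded = re.findall(r"\br[^ ]*d\b", sentence)

# Expected result of removing the first character of each "paragraph".
first_removed = re.sub(r"^.", "", "one\ntwo", flags=re.MULTILINE)

print(greedy, bounded, repr(first_removed))
```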

Tips and Tricks

Here are some examples that may be useful:

\<([^ ]+)[ ]+\1

finds duplicate words separated by spaces (note that there is a space before each ])

\<[:alpha:]*\>

finds any word in the whole document (note: the Regular expressions check box must be ticked)

\<[1-9][0-9]*\>

finds decimal whole numbers (without a leading zero)

\<0[0-7]*\>

finds octal (base 8) numbers

\<0x[A-Fa-f0-9]+\>

finds hexadecimal (base 16) numbers

[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-z]{2,6}

finds most email addresses (there is no perfect regular expression - this is a practical solution)
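
These patterns can be tested against sample text before using them on a real document. The sketch below translates them to Python's re module (an assumption for illustration), replacing \< and \> with \b; the constant names are hypothetical. The POSIX-based word pattern is left out because Python's re does not support [:alpha:].

```python
import re

DUPLICATE = r"\b([^ ]+) +\1"
DECIMAL = r"\b[1-9][0-9]*\b"
OCTAL = r"\b0[0-7]*\b"
HEX = r"\b0x[A-Fa-f0-9]+\b"
EMAIL = r"[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-z]{2,6}"

numbers = "10 017 0x1F 0"

duplicate = re.search(DUPLICATE, "this is is a test").group()
decimals = re.findall(DECIMAL, numbers)
octals = re.findall(OCTAL, numbers)
hexes = re.findall(HEX, numbers)
email = re.search(EMAIL, "write to person@rediton.com today").group()

print(duplicate, decimals, octals, hexes, email)
```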
