Difference between revisions of "Talk:Documentation/How Tos/Regular Expressions in Writer"

From Apache OpenOffice Wiki
m (ICU: reference)
m (External Links: Translation links & more on email REs)

Revision as of 09:43, 19 February 2008

References

Some of the matters arising with regex can be found in the OOo archives:

bugs:

http://www.oooforum.org/forum/viewtopic.phtml?t=39589

http://www.oooforum.org/forum/viewtopic.phtml?t=64265&highlight=regular

http://www.oooforum.org/forum/viewtopic.phtml?t=61200&highlight=regular


discussion:

http://www.oooforum.org/forum/viewtopic.phtml?t=8665&highlight=regular

http://www.oooforum.org/forum/viewtopic.phtml?t=61857&highlight=regular


issue:

http://qa.openoffice.org/issues/show_bug.cgi?id=15666 (now RESOLVED FIXED)


Examples

This is probably too arcane, but here's a discussion on the black art of finding octal, decimal, & hexadecimal numbers in Writer http://www.oooforum.org/forum/viewtopic.phtml?t=66319

Octal   \<0[0-7]*\>
Decimal \<[1-9][0-9]*\>
Hex     \<0x[A-Fa-f0-9]+\>
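For editors who want to try these three patterns outside Writer, here is a minimal sketch in Python. It assumes that Python's \b is a close enough stand-in for Writer's \< and \> word anchors, which Python's re module does not support; the function name find_numbers is made up for this illustration. Behaviour inside OOo may differ.

```python
import re

# Assumption: \b approximates the \< and \> anchors used in the Writer patterns.
OCTAL = re.compile(r"\b0[0-7]*\b")
DECIMAL = re.compile(r"\b[1-9][0-9]*\b")
HEX = re.compile(r"\b0x[A-Fa-f0-9]+\b")


def find_numbers(text):
    """Return the octal, decimal and hex literals found in text (hypothetical helper)."""
    return {
        "octal": OCTAL.findall(text),
        "decimal": DECIMAL.findall(text),
        "hex": HEX.findall(text),
    }


if __name__ == "__main__":
    print(find_numbers("Values: 017 42 0x1F 0 900"))
```

Note that the octal pattern also matches a bare 0, since [0-7]* may match nothing.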


Workarounds

In at least some versions of Linux it is possible to use Unicode input to type a newline (line feed / soft line break / U+000A) directly in the "Replace with" input box. There are no reports so far of this working on any other OS.

http://user.services.openoffice.org/en/forum/viewtopic.php?p=2842#p2842

This means that for some people it is possible to insert a newline using Find & Replace.

http://en.wikipedia.org/wiki/Unicode#Input_methods

http://www.fileformat.info/tip/microsoft/enter_unicode.htm

http://www.fileformat.info/info/unicode/char/000a/


Versioning of regex howto in future

See my comment under the same heading in Talk:Documentation/How Tos/Regular Expressions in Calc --Hgreenhough 12:05, 13 December 2007 (CET)

Thanks - I responded there --Drking 10:00, 18 January 2008 (GMT)

Backreferences

http://www.openoffice.org/issues/show_bug.cgi?id=15666#desc100 describes a glitch in the new backreference feature. It does not seem to have been reported as a separate issue, so may not get picked up. One to test for!

Capitalize words beginning with h:
s/\<h([a-z]+)/
r/H$1/
Match case = Yes

Starting text:   He heard quiet steps behind him.
Expected result: He Heard quiet steps behind Him.
Actual result:   He H$1 quiet steps behind H$1
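As a cross-check of the expected result, the same search and replace can be sketched in Python. This is only an illustration under stated assumptions: Python writes the backreference as \1 rather than $1, \b stands in for \<, and matching is case-sensitive by default (corresponding to Match case = Yes). The function name capitalize_h_words is hypothetical, and none of this says anything about how OOo's own engine behaves.

```python
import re

# Assumption: \b approximates Writer's \< anchor; \1 plays the role of $1.
PATTERN = re.compile(r"\bh([a-z]+)")


def capitalize_h_words(text):
    """Capitalize words that begin with a lowercase h (hypothetical helper)."""
    return PATTERN.sub(r"H\1", text)


if __name__ == "__main__":
    print(capitalize_h_words("He heard quiet steps behind him."))
```

The h inside "behind" is left alone because no word boundary precedes it, and "He" is skipped because the match is case-sensitive.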


http://qa.openoffice.org/issues/show_bug.cgi?id=84922 describes a situation where backreferences do not work in find, although I can't follow it myself.

>>>>>>

I've commented in issue 84922 - I don't think this is a bug in OOo

--Drking 7:00, 26 January 2008 (GMT)


External Links

Someone (Andrewz) has added a couple of external links. Absolutely great that people are getting involved, but I'm not sure that external links are a good idea - thoughts welcome...


The application Help is planned to be wiki based in the future, so that people like us can contribute easily. This page links from the Calc function Help (in preparation) and probably will link from Writer Help in the long term. It may or may not be included in Help, depending on space. I hope at the very least that it will be brought up in a browser on clicking in the Help.


As it stands clicking on the external links in this page takes the user away from the Help system and the Wiki altogether - the only way back is via the back button. That's a bad thing.


Another thing is that in my view the purpose of this HowTo is to give the user the information - not to present him with stuff within which information can be found. So if the info on the linked pages is useful it should be in the HowTo.


And a third thing is that there are plenty of external pages that could be linked to - everyone has their own favourite - but this shouldn't be a directory for them.


Interested to know any other views....

--Drking 05:00, 23 January 2008 (GMT)


Agreed. 'Concise, precise, complete', is best. I think the links would be better here in Talk, under a heading such as 'External Links', or 'Further Reading', so they could be used for research by editors (and anyone else interested).

--Hgreenhough 11:03, 23 January 2008 (CET)


Thanks - can't tackle it right now, but doubtless one of us will...

--Drking 8:45, 24 January 2008 (GMT)


I've changed my mind. How about an 'External Links' section at the end of the article, like on Wikipedia? That way they're easy to find (very few visitors will ever look at the discussion), all together (for easier maintenance), but clearly separate.

--Hgreenhough 13:52, 25 January 2008 (CET)


Still thinking about it - not quite convinced - at the moment our content is (I hope) pretty definitive. External pages might not be (for instance although the Andrewz links are pretty good, there are some things that I'd take issue with, like the e-address regex needs case-insensitive). Perhaps if our 'External Links' section made it clear they were simply additional pages, and took you away from the wiki?

I cannot find a way to open an external link in a new browser - which really ought to be possible. Any ideas?

--Drking 7:00, 25 January 2008 (GMT)


Is OpenOffice.org not as much a community as it is a product? While a link dump is bad, a moderated selection of links may benefit the end user. Wikipedia, which has an offline edition, includes a few links to external sites. Also, for what it's worth, even Microsoft Office's built-in help includes links.

Taking the argument further and generalizing it to the whole Documentation section of the wiki: if you remove all links, what else would you remove now or prevent from being included in the future?

Drking: about the email address regex. OOo does case insensitive matching by default. You suggest I change a-z to a-zA-Z?

--Andrewz 01:43, 28 January 2008 (CET)


Hi Andrewz

I intended to contact you directly to alert you to this thread - my apologies.

> Is OpenOffice.org not as much a community as it is a product?

I think it aims to be a product, supported by the community. I don't think it exists to create and nurture a community.

> what else would you remove now or prevent from being included in the future?

That allows quite a good illustration: the unstructured community approach has allowed either 4 or 5 Calc FAQs to be written; you can reach them all via different routes from the Doc front page. None of them are complete. I think all of them are out of date. A couple of them as I recall are un-indexed. A real mess. More is not better. But of course a lot of worthy people poured effort into writing them, and they did it because they enjoyed contributing.

My view is that good documentation has to be absolutely focused on the user - not the person writing the documentation as it often has been. It is a pernickety business; I go back through my stuff and reduce the word count / clarify if I possibly can, always reading it as a user. (You can tell by the way I write here that this does not come easily;) ).

'Concise, precise, complete' says it better.

Actually, yourself and Hgreenhough have convinced me that we should have an external links section, so that it is clear which links are external. A user should expect a mid-article link to lead to a page in the same style, with the same style of information. Pernickety, but that makes the documentation good.

Now, I've done a lot of work to get this page up and running, but I have no rights over content, and it's a thrill to see other contributors taking an interest. Do we now have a consensus that we introduce an external links section?

> email regex ... You suggest I change a-z to a-zA-Z

Yes that would be my take, for the info to be 'complete'. Or say "turn on case sensitive".

I'd also allow for the .museum domain - I think the limit on domain size is really intended for data entry, not searching within a document.

And I'd also point out that it is a (good) practical but not a perfect solution - doesn't catch every e-address.

Bit pernickety, that...

--Drking 20:30, 28 January 2008 (GMT)


Well, I'm now agreed with Drking regarding external links, i.e. to be included, in a section at the end.

On the other subject... I can no longer see any blogspot pages because they are now blocked by Websense at our corporate firewall, so I can't refer to Andrewz's pages. But talk of .museum and postcodes reminded me it must be regarding the subject of this old thread:

sort emailadresses with regular expression in calc?

I remember thinking there probably wasn't a perfect solution using regex.

--Hgreenhough 11:51, 29 January 2008 (CET)


I added the External Links section.

Re sorting by email addresses - an interesting challenge :). Defeated me so far. Probably one of those things that spreadsheets shouldn't be used for.

(edit) Incidentally, Andrewz, I'm afraid your email regex doesn't match 'the-dog@pets.com' - it finds 'dog@pets.com' only. OOo seems to treat '-' as a word boundary, although '+' is not treated as a word boundary. Hm.

--Drking 06:30, 02 February 2008 (GMT)


The email regex doesn't match 'the-dog@pets.com' because the '-' in [] was un-escaped. I also can't find a reason to include \< and \> in OOo. I've added my version to our tips and tricks list now.

--Drking 06:30, 07 February 2008 (GMT)
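To illustrate how an un-escaped '-' inside [] can cause this kind of miss, here is a small Python sketch. Both patterns are hypothetical and made up for this illustration; neither is the regex from the linked page nor the one added to the tips and tricks list, and OOo's engine may treat character classes differently. In the first pattern, '.-_' is read as a range from '.' to '_', which does not include '-'.

```python
import re

# Hypothetical patterns for illustration only.
# UNESCAPED: ".-_" is parsed as a character range, so '-' itself is not in the class.
UNESCAPED = re.compile(r"[a-z0-9.-_]+@[a-z0-9.]+")
# ESCAPED: "\-" is a literal hyphen, so '-' is accepted in the local part.
ESCAPED = re.compile(r"[a-z0-9.\-_]+@[a-z0-9.]+")


def first_match(pattern, text):
    """Return the first matching substring, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(first_match(UNESCAPED, "the-dog@pets.com"))
    print(first_match(ESCAPED, "the-dog@pets.com"))
```

With the un-escaped class the search skips over "the-" and finds only the part after the hyphen; with the escaped class the whole address is found.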


Flepennu has tried to add a link to French translations (so far unfinished) at the end of both the Writer & Calc How To. The links are not formed correctly and so do not appear on the article. Being as they are not external, should we rename the section 'Links' instead of 'External Links', or should the French translations be linked to elsewhere in the article?

On the subject of email addresses: http://www.regular-expressions.info/email.html

--Hgreenhough 10:43, 19 February 2008 (CET)

OOo 3.0 Help links

The application help of OOo 3.0 links to this Wiki page and the one for Calc. The links are below the list of regular expressions:


We would like to improve the application help by inserting many more links leading to Wiki pages.

>>>>>>>>>

(noting the above comment from 'Ufi')

Excellent! thank you Uwe

--Drking 8:45, 24 January 2008 (GMT)


ICU

Could a reference be added next to this statement, for authority and research?

The ICU regular expression package, a candidate to replace the existing OOo regular expression engine


The ICU question above was from Andrewz on 9/Feb/2008 (please could everyone use the 'sig+timestamp' button).

Do you mean where is it said that the ICU regex engine should replace the current one? If so, see: Regexp.

--Hgreenhough 10:04, 11 February 2008 (CET)
