Talk:NUMBERTEXT/MONEYTEXT development

From Apache OpenOffice Wiki

Discussion page of NUMBERTEXT/MONEYTEXT development

Start a new section for a new theme, bug report or a language module (Soros program). See also NUMBERTEXT.org.

License requirements: Soros programs of the NUMBERTEXT project are released under an LGPL/BSD dual license.

Use ~~~~ (four tildes) at the end of your comment to include your login name and a time stamp.

To indent your comment, use one or more colons at the beginning of it.

Some languages need male/female option for number to text

Hi, in Catalan the numbers 1 and 2 can be masculine or feminine, depending on the noun being counted. Example: cotxe (car) is masculine and flor (flower) is feminine. So 1 cotxe (one car) is spelled "un cotxe" and 1 flor (one flower) is spelled "una flor". So 1 --> un (masculine noun) or una (feminine noun), and 2 --> dos (masculine noun) or dues (feminine noun).

This gender change also applies to numbers ending in 1 and 2 other than 11 and 12 (21, 22, 31, 32, ...), and to the hundreds and thousands.

Spanish also has this gender distinction, but only in numbers ending in 1. In Spanish, 2 is always spelled "dos".

Finally, this gender issue also matters for currency to text. Many currencies are treated as masculine nouns: euro, dollar. But a few currencies are feminine: the pound sterling or the old Spanish peseta. So 1200 $ is spelled "mil dos-cents dòlars", but 1200 PTA is spelled "mil dues-centes pessetes".
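The agreement described above can be illustrated with a small Python sketch. It is only an illustration, not the Soros module: the function name `feminize_ca` is hypothetical, the forms "una", "dues" and "centes" are taken from the examples in this comment, and the sketch deliberately ignores millions and above (where "un milió" must stay masculine).

```python
import re

# Masculine -> feminine forms taken from the examples in this thread.
FEMININE = {"un": "una", "dos": "dues", "cents": "centes"}

def feminize_ca(number_words: str) -> str:
    """Switch Catalan number words below one million to feminine agreement.

    Hypothetical helper: it rewrites the whole words "un", "dos" and
    "cents", including inside hyphenated compounds such as "dos-cents".
    """
    return re.sub(r"\b(un|dos|cents)\b",
                  lambda match: FEMININE[match.group(1)],
                  number_words)

print(feminize_ca("mil dos-cents"))  # number words for 1200 before "pessetes"
```

Words such as "onze" and "dotze" are left alone because the pattern only matches complete words.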

I have fixed this with text converters. The ca_ES module uses manual arguments for the gender of the currency units and subunits; the es_ES module uses automatic gender detection (feminine units end with "a" or "as"):
# masculine to feminine conversion of "un" after millions,
# if "as?$" matches currency name

f:(.*ill)(.*),(.*) \1$(f:\2,\3)		# don't modify un in millions
f:(.*un)([^a].*,|,)(.*as?) $(f:\1a\2\3)	# un libra -> una libra
f:(.*),(.*) \1 \2

"([A-Z]{3}) ([-−]?1)" $(f:|$2,$(\1:us))
"([A-Z]{3}) ([-−]?\d+0{6,})" $2 de $(\1:up)
"([A-Z]{3}) ([-−]?\d+)" $(f:|$2,$(\1:up))
Thanks for your report. Nemeth 22:12, 3 September 2009 (UTC)
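For readers more familiar with Python than Soros, the three f: rules above can be approximated as follows. This is a rough rendering, not the interpreter's code. It assumes that Soros tries the rules in order, that each pattern must match the whole input, and that $(f:...) is a recursive call. The name `feminize_es` is made up for this sketch.

```python
import re

def feminize_es(text: str) -> str:
    """Rough Python rendering of the three es_ES "f:" rules.

    The input has the form "number words,currency name", e.g. "un,libra".
    """
    # Rule 1: leave everything up to the last "ill" (millón, millones) alone
    # and process only the remainder.
    match = re.fullmatch(r"(.*ill)(.*),(.*)", text)
    if match:
        return match.group(1) + feminize_es(match.group(2) + "," + match.group(3))
    # Rule 2: "un" not yet followed by "a", with a currency ending in "a"/"as".
    match = re.fullmatch(r"(.*un)([^a].*,|,)(.*as?)", text)
    if match:
        return feminize_es(match.group(1) + "a" + match.group(2) + match.group(3))
    # Rule 3: nothing left to change, so join number and currency with a space.
    match = re.fullmatch(r"(.*),(.*)", text)
    if match:
        return match.group(1) + " " + match.group(2)
    return text

print(feminize_es("un,libra"))
print(feminize_es("un,euro"))
```

The first rule is what keeps "un millón" masculine even when the currency name is feminine.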

Works fine with currency, thanks. But I'm thinking of an additional option for the NUMBERTEXT OOo Calc function. Currently we have =NUMBERTEXT(number) and =NUMBERTEXT(number,lang_code). What about =NUMBERTEXT(number,lang_code,gender_code), where gender_code can be 0, 1, 2, ...? Catalan only needs 2 variations, but other languages may use 3 or more. Of course, masculine (code 0) would be the default.

Or maybe better: =NUMBERTEXT_FEM(number) and =NUMBERTEXT_FEM(number,lang_code) for the "feminine" option.

Of course, we could use the MONEYTEXT function with a fake currency code that has a feminine tag but empty unit strings. But I think that is a workaround. --Jmontane 20:35, 6 September 2009 (UTC)

NUMBERTEXT is a string function. Numeric input is converted by Calc automatically. What about
NUMBERTEXT("ordinal:4545")
NUMBERTEXT("feminine:564")
NUMBERTEXT("ordinal-feminine:564")
NUMBERTEXT(CONCATENATE("ordinal-feminine:";$A1))
and similar expressions?
Maybe for the special handling of dates, we have to add a DATETEXT() function. Thanks for your suggestions. Nemeth 11:36, 10 November 2009 (UTC)
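To make the prefix idea concrete, here is a minimal Python sketch of how such an argument string could be split into options and a number. It is only an illustration of the proposal, not how Numbertext implements it. The function name `parse_request` and the set of accepted options are assumptions based on the examples above.

```python
# Options that appear in the examples of this thread (assumed list).
KNOWN_OPTIONS = {"ordinal", "feminine"}

def parse_request(text: str):
    """Split e.g. "ordinal-feminine:564" into (options, number string)."""
    prefix, separator, number = text.rpartition(":")
    if not separator:
        # Plain input such as "564": no options requested.
        return frozenset(), text
    options = frozenset(prefix.split("-"))
    unknown = options - KNOWN_OPTIONS
    if unknown:
        raise ValueError("unknown option(s): " + ", ".join(sorted(unknown)))
    return options, number

print(parse_request("ordinal-feminine:564"))
```

Keeping the options inside the string leaves the existing one- and two-argument forms of the Calc function unchanged.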


Minor bug in Spanish language definition

Spanish has gender variation in numbers containing the string "ientos" (doscientos/as, quinientos/as, novecientos/as, etc.). The module generates "doscientos libras", but the correct form is "doscientas libras". I think this line should solve it:

f:(.*ient)o(s.*),(.*as?) $(f:\1a\2,\3)   # doscientos libra/libras -> doscientas

--Roebek 16:24, 25 September 2009 (UTC)
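The effect of the proposed rule can be tried out in Python. This is a sketch under the assumption that the Soros pattern matches the whole input and that the recursive $(f:...) call repeats until nothing changes. The helper name `feminize_hundreds` is hypothetical.

```python
import re

# Same groups as in the proposed Soros line: "...ient" + "o" + "s...", currency.
PATTERN = re.compile(r"(.*ient)o(s.*),(.*as?)")

def feminize_hundreds(text: str) -> str:
    """Turn "-ientos" into "-ientas" before a currency ending in "a"/"as".

    Input keeps the "number words,currency name" form used by the f: rules.
    """
    while True:
        match = PATTERN.fullmatch(text)
        if not match:
            return text
        text = match.group(1) + "a" + match.group(2) + "," + match.group(3)

print(feminize_hundreds("doscientos,libras"))
```

Each pass changes one "-ientos" (the last one first, because the first group is greedy), so inputs with several hundreds words are handled by the loop.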

Thanks for your patch. It is included in the new Numbertext 0.7 release. Nemeth 11:36, 10 November 2009 (UTC)

Some fixes on Catalan definition

__numbertext__ 

^0 zero
1$ u
1 un
2 dos
3 tres
4 quatre
5 cinc
6 sis
7 set
8 vuit
9 nou
10 deu
11 onze
12 dotze
13 tretze
14 catorze
15 quinze
16 setze
17 disset
1(\d) di$1
20 vint
2(\d) vint-i-$1
30 trenta
40 quaranta
50 cinquanta
60 seixanta
70 setanta
80 vuitanta
90 noranta
(\d)(\d) $(\10)-$2
1(\d\d) cent $1
(\d)(\d\d) $1-cents $2
1(\d{3}) mil $1
(\d{1,3})(\d{3}) $1 mil $2
1(\d{6}) un milió $1
(\d{1,6})(\d{6}) $1 milions $2
1(\d{9}) mil milions $1
1(\d{12}) un bilió $1
(\d{1,6})(\d{12}) $1 bilions $2
1(\d{18}) un trilió $1
(\d{1,6})(\d{18}) $1 trilions $2
1(\d{24}) un quadrilió $1
(\d{1,6})(\d{24}) $1 quadrilions $2  

# negative number?

[-−](\d+) menys |$1

# decimals

"([-−]?\d+)[.,]" $1| coma
"([-−]?\d+[.,]\d*)(\d)" $1| |$2

# currency

# unit/subunit singular/plural

us:([^,]*),([^,]*),([^,]*),([^,]*) \1
up:([^,]*),([^,]*),([^,]*),([^,]*) \2
ss:([^,]*),([^,]*),([^,]*),([^,]*) \3
sp:([^,]*),([^,]*),([^,]*),([^,]*) \4
CHF:(\D+) $(\1: franc suís, francs suís, cèntim, cèntims)
EUR:(\D+) $(\1: euro, euros, cèntim, cèntims)
GBP:(\D+) $(\1: lliura esterlina, lliures esterlines, penic, penics)
JPY:(\D+) $(\1: ien, iens, sen, sen)
USD:(\D+) $(\1: dòlar EUA, dòlar EUA, cent, cents)
"([A-Z]{3}) ([-−]?1)([.,]00?)?" $2 $(\1:us)
"([A-Z]{3}) ([-−]?\d+)([.,]00?)?" $2 $(\1:up)
"(([A-Z]{3}) [-−]?\d+)[.,](01)" $1 amb $(1) $(\2:ss)
"(([A-Z]{3}) [-−]?\d+)[.,](\d)" $1 amb $(\30) $(\2:sp)
"(([A-Z]{3}) [-−]?\d+)[.,](\d\d)" $1 amb $3 $(\2:sp) 
Fixed in Numbertext 0.6. Many thanks for your help. Nemeth 22:16, 3 September 2009 (UTC)

Thanks for your work. I've updated the Catalan Soros code on Launchpad (bug #425374) with some additional fixes and improvements. --Jmontane 20:36, 6 September 2009 (UTC)

French numbering remarks

Congratulations on this fantastic extension! It has been needed for many years!

These remarks are still valid for version 0.7.


MONEYTEXT

a) Not language specific: When there are more than two decimals, MONEYTEXT rounds the value to 2 decimals, which is correct behaviour, I think. But currently it rounds up only above decimal 5, instead of from decimal 5 upward, and not even in every case.

Compare with the rounding of Calc when formatted with 2 decimals :

Value 9,9949 is displayed 10 by Calc, but MONEYTEXT will treat it like 9,99
MONEYTEXT produces 10 only for a value strictly greater than 9,995, for example 9,995001

Value 5,995 euros in en-US gives: six euro and zero cents

Rounding up is correct, but...
the text should be: six euros
(plural for euros, no mention of cents)

Value 9,995 euros in en-US gives: nine euro and ninety-nine cents

No rounding up this time! Rounding up occurs only with a slightly greater value.
I believe Python (the implementation language of the Numbertext extension) uses a different rounding algorithm, but I will check it. Nemeth 11:44, 10 November 2009 (UTC)
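One possible explanation, not confirmed in this thread, is binary floating point: the float closest to 9.995 is slightly below 9.995, while the float closest to 5.995 is slightly above it, which would match the two results reported above. The Python sketch below shows the difference and rounds half up on the decimal text instead. The helper name `round_money` is hypothetical and says nothing about how the extension actually rounds.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_money(amount: str) -> Decimal:
    """Round a decimal string (comma or point) to 2 places, half up."""
    value = Decimal(amount.replace(",", "."))
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Binary floats: the stored values are not exactly 9.995 and 5.995.
print(Decimal(9.995))    # exact value held by the float, just below 9.995
print(round(9.995, 2))   # 9.99
print(round(5.995, 2))   # 6.0

# Decimal arithmetic on the text as typed by the user.
print(round_money("9,995"))  # 10.00
```

Working from the text of the number avoids the float conversion entirely, at the cost of parsing the decimal separator by hand.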


b) not language specific, case of rounding down :

MONEYTEXT value 7,004 gives in fr-FR : "sept euros et zéro centimes" instead of : "sept euros"

MONEYTEXT value 0,004 gives in fr-FR : "zéro euros et zéro centimes" instead of : "zéro euro"

BMarcelly 10:43, 10 November 2009 (UTC)

I will fix it. Many thanks for your great bug reports, especially for the earlier one about the missing 0.x decimals. It was a bug in the interpreter's handling of complemented character groups. Nemeth 11:44, 10 November 2009 (UTC)
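The behaviour requested in b) can be sketched in Python: round first, then mention the subunit only if it is non-zero. This is an illustration only. It prints digits instead of number words, the function name `money_phrase_fr` is hypothetical, and the singular for amounts below 2 follows the "zéro euro" example given above.

```python
from decimal import Decimal, ROUND_HALF_UP

def money_phrase_fr(amount: str) -> str:
    """Build a simplified French euro phrase for a non-negative amount."""
    value = Decimal(amount.replace(",", "."))
    if value < 0:
        raise ValueError("negative amounts are outside this sketch")
    # Round to whole centimes (half up), then split into euros and centimes.
    total_cents = int((value * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP))
    euros, cents = divmod(total_cents, 100)
    phrase = f"{euros} {'euro' if euros < 2 else 'euros'}"
    if cents:
        # The subunit is mentioned only when something is left after rounding.
        phrase += f" et {cents} {'centime' if cents < 2 else 'centimes'}"
    return phrase

print(money_phrase_fr("7,004"))  # 7 euros
print(money_phrase_fr("0,004"))  # 0 euro
```

Deciding about the subunit after rounding is what removes the "et zéro centimes" tail in both reported cases.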

Monetary units

These monetary units are listed in file numbertext_fr_FR.py (and other French variants) but are not recognized by MONEYTEXT:

BIF, DJF, DZD, GNF, HTF, KMF, MAD, MUR, SCR, VUV, XOF


For fr-FR, fr-BE, fr-CH you should add XPF: franc Pacifique

singular : 1 franc Pacifique ; plural : 2 francs Pacifique


In file numbertext_ro_RO.py the monetary unit RON is listed but not recognized by MONEYTEXT.

BMarcelly 10:46, 10 November 2009 (UTC)

I think, this is the fault of the i18n database of OpenOffice.org. Nemeth 11:44, 10 November 2009 (UTC)

Turkish language source

Hello,

First, thanks to the developers of this extension. I made a Turkish version, numbertext_tr_TR.py. Here is the source:


File:Numbertext tr TR.txt


I hope the Turkish version will be added to the project in a newer release.


In Turkish, number texts are written with spaces, like "one hundred twenty five", but money texts are written without spaces, like "onehundredtwentyfive Turkish lira".

Is it possible to do this?
Ramdem 20:01, 12 September 2009 (UTC)

Yes, it's possible with a space deletion call. I will add it, and you can check the result. Nemeth 13:09, 27 September 2009 (UTC)
I have integrated the Turkish description into Numbertext 0.7, with some small fixes. See http://NUMBERTEXT.org, too. Thanks, Nemeth 11:45, 10 November 2009 (UTC)

Thanks, Nemeth. I will announce this Numbertext release on the Turkish OpenOffice.org forum. Ramdem 17:27, 11 November 2009 (UTC)
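In Python terms, the space deletion amounts to something like the sketch below. It is not the Soros rule that went into the release. The function name `join_for_money` and the sample words ("yüz yirmi beş" for 125, "Türk lirası" as the currency name) are illustrative assumptions.

```python
def join_for_money(number_words: str, currency_name: str) -> str:
    """Remove the spaces inside the number words, then append the currency."""
    joined = "".join(number_words.split())
    return joined + " " + currency_name

# 125 written as separate words versus the joined money form.
print(join_for_money("yüz yirmi beş", "Türk lirası"))
```

Only the number part is joined; the space before the currency name stays.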

Minor Bug in Thai BAHTTEXT or NUMBERTEXT/MONEYTEXT

In OOo, all numbers ending in '-01' are spelled with 'หนึ่ง' instead of 'เอ็ด', which is wrong. OOo spells them correctly in only two cases: when the number is 1, and when another nonzero digit directly precedes the final 1, as in '-21' or '-51'.

The rule in Thai is that when '1' is the last digit of the integral part of a number, it is spelled 'เอ็ด', not 'หนึ่ง'. For example, 31 is spelled 'สามสิบเอ็ด' not 'สามสิบหนึ่ง', 201 is spelled 'สองร้อยเอ็ด' not 'สองร้อยหนึ่ง', 50001 is spelled 'ห้าหมื่นเอ็ด' not 'ห้าหมื่นหนึ่ง', and so on.

The only case in which it is spelled 'หนึ่ง' is when the number is 1.
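The rule can be checked against the examples above with a short Python sketch for integers below one million. It is an illustration of the rule as stated in this report, not the BAHTTEXT or Numbertext code. The function name `thai_number` is hypothetical, and the digit and place words, including the special forms for 10 and 20, are standard Thai numerals that are not listed in this thread.

```python
DIGITS = ["ศูนย์", "หนึ่ง", "สอง", "สาม", "สี่", "ห้า", "หก", "เจ็ด", "แปด", "เก้า"]
PLACES = ["", "สิบ", "ร้อย", "พัน", "หมื่น", "แสน"]

def thai_number(n: int) -> str:
    """Spell an integer from 0 to 999999 in Thai."""
    if not 0 <= n < 1_000_000:
        raise ValueError("only 0..999999 is covered by this sketch")
    if n == 0:
        return DIGITS[0]
    text = str(n)
    words = []
    for index, char in enumerate(text):
        digit = int(char)
        place = len(text) - 1 - index
        if digit == 0:
            continue
        if place == 0 and digit == 1 and n != 1:
            words.append("เอ็ด")      # final 1 of any number other than 1
        elif place == 1 and digit == 1:
            words.append("สิบ")       # 10 is not prefixed with "one"
        elif place == 1 and digit == 2:
            words.append("ยี่สิบ")     # 20 has its own form
        else:
            words.append(DIGITS[digit] + PLACES[place])
    return "".join(words)

for value in (1, 31, 201, 50001):
    print(value, thai_number(value))
```

The check `n != 1` is the whole fix: the final digit 1 takes the special form whether or not the tens digit is zero.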

See the issue at OO.o Bug Tracker
