Difference between revisions of "Talk:NUMBERTEXT/MONEYTEXT development"

From Apache OpenOffice Wiki
(Some languages need male/female option for number to text)
(Greek language needs male/female option: new section)
 
(48 intermediate revisions by 11 users not shown)
== Latvian language ==
 +
It would be nice to also include code for Latvian:
 +
<pre>
 +
__numbertext__
 +
^0 nulle
 +
1 viens
 +
2 divi
 +
3 trīs
 +
4 četri
 +
5 pieci
 +
6 seši
 +
7 septiņi
 +
8 astoņi
 +
9 deviņi
 +
10 desmit
 +
11 vienpadsmit
 +
12 divpadsmit
 +
13 trīspadsmit
 +
14 četrpadsmit
 +
15 piecpadsmit
 +
16 sešpadsmit
 +
17 septiņpadsmit
 +
18 astoņpadsmit
 +
19 deviņpadsmit
 +
([2])(\d) divdesmit $2
 +
([23456789])(\d) $1|desmit $2
 +
1(\d\d) simts $1
 +
(\d)(\d\d) $1 simti $2
 +
1(\d{3}) viens tūkstotis $1
 +
(\d{1,3})(\d{3}) $1 tūkstoši $2
 +
1(\d{6}) viens miljons $1
 +
(\d{1,3})(\d{6}) $1 miljoni $2
 +
1(\d{9}) viens miljards $1
 +
(\d{1,3})(\d{9}) $1 miljardi $2
 +
1(\d{12}) viens triljons $1
 +
(\d{1,3})(\d{12}) $1 triljoni $2
 +
1(\d{15}) viens kvadriljons $1
 +
(\d{1,3})(\d{15}) $1 kvadriljoni $2
 +
1(\d{18}) viens kvintiljons $1
 +
(\d{1,3})(\d{18}) $1 kvintiljoni $2
 +
1(\d{21}) viens sekstiljons $1
 +
(\d{1,3})(\d{21}) $1 sekstiljoni $2
 +
1(\d{24}) viens septiljons $1
 +
(\d{1,3})(\d{24}) $1 septiljoni $2
  
"(([A-Z]{3}) [-−]?\d+)[.,](01)" $1 amb $(1) $(\2:ss)
+
# negative numbers
  
"(([A-Z]{3}) [-−]?\d+)[.,](\d)" $1 amb $(\30) $(\2:sp)
+
[-−](\d+) mīnus |$1
  
"(([A-Z]{3}) [-−]?\d+)[.,](\d\d)" $1 amb $3 $(\2:sp)
+
# decimals
  
== French numbering remarks ==
 
  
Congratulations for this fantastic extension ! It was needed for many years !
+
([-−]?\d+)[.,] $1| komats
 +
([-−]?\d+[.,]\d*)(\d) $1| |$2
  
I checked numbertext-0.5.oxt, which does not yet support fr-BE and fr-CH.
 
  
For your tests, the french web site [http://www.leconjugueur.com/frnombre.php?nombre=3%2C14 Le Conjugueur] is probably a reference.  
+
# female conversion
 +
f:(.*)viens viena
 +
f:(.*)i \1as
 +
f:(.*) \1
  
==== Special cases ====
+
# currency
100 should be written : cent  instead of : un cent (error only in NUMBERTEXT)
+
  
1000 should be written : mille  instead of : un mille
+
# unit/subunit
  
Same for 137, 1284, etc.
+
us:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*) \1
 +
up:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*) \2
 +
ug:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*) \3
 +
ss:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*) \4
 +
sp:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*) \5
 +
sg:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*) \6
  
 +
LVL:(\D+) $(\1: lats, lati,latu, santīms, santīmi, santīmu)
 +
EUR:(\D+) $(\1: eiro, eiro, eiro, cents, centi, centu)
 +
RUB:(\D+) $(\1: rublis, rubļi, rubļu, kapeika, kapeikas, kapeiku)
 +
USD:(\D+) $(\1: ASV dolārs, ASV dolāri, ASV dolāru, cents, centi, centu)
  
1000000000 should be written : un milliard  instead of : un milliarde
 
  
2000000000 should be written : deux milliards  instead of : deux milliardes
+
"([A-Z]{3}) ([-−]?1)([.,]00?)?" $2| $(\1:us)
 +
"([A-Z]{3}) ([-−]?\d*[02-9]1)([.,]00?)?" $2| $(\1:us)
 +
"([A-Z]{3}) ([-−]?[23456789])([.,]00?)?" $2| $(\1:up)
 +
"([A-Z]{3}) ([-−]?\d*[02-9][23456789])([.,]00?)?" $2| $(\1:up)
 +
"([A-Z]{3}) ([-−]?\d+)([.,]00?)?" $2| $(\1:ug)
  
etc
+
"((RUB) [-−]?\d+)[.,]([02-9])1" $1 $(\30) |$(f:$(1)) $(\2:ss)
 +
"((RUB) [-−]?\d+)[.,]([02-9][23456789])" $1 $(f:$3)  $(\2:sp)
  
==== Decimals ====
+
"(([A-Z]{3}) [-−]?\d+)[.,](01)" $1 |$(1) $(\2:ss)
NUMBERTEXT systematically writes the decimal separator as : "comma", this is incorrect.
+
"(([A-Z]{3}) [-−]?\d+)[.,]([02-9])1" $1 $(\30) |$(1)  $(\2:ss)
  
fr-FR and fr-BE use the comma as separator. The term "comma" translates in french as "virgule"
+
"(([A-Z]{3}) [-−]?\d+)[.,]([02-9][23456789])" $1 |$3 $(\2:sp)
  
fr-CH use a dot as separator. The term "dot" translates in french as "point"
+
"(([A-Z]{3}) [-−]?\d+)[.,](\d)" $1 |$(\30) $(\2:sg)
 +
"(([A-Z]{3}) [-−]?\d+)[.,](\d\d)" $1 |$3 $(\2:sg)
 +
</pre>
 +
--[[User:Asterisks|Asterisks]] 23:38, 17 March 2012 (UTC)
  
 +
== Ukrainian language ==
  
In french we write (and say) decimals as if it were a number :
+
I look forward to any feedback and to the addition of the Ukrainian language.
  
3,14 should be written : trois virgule quatorze  instead of : trois virgule un quatre
+
[[File:Numbertext_uk.txt]]
  
3,1415 should be written : trois virgule mille quatre cent quinze
+
[[User:Ivanmelwise|Ivanmelwise]] ([[User talk:Ivanmelwise|talk]]) 10:29, 9 January 2013 (UTC)
  
3,141592 should be written : trois virgule cent quarante et un mille cinq cent quatre-vingt-douze
+
== Chinese DAXIE (大写) BUG ==
  
3,1415926535 should be written : trois virgule un milliard quatre cent quinze millions neuf cent vingt-six mille cinq cent trente-cinq
+
There is an error in Chinese. 
 +
zh-ZH-2 (banking writing): digits 2 and 4 are written the same (贰). This is correct for 2 but not for 4; 4 should be 肆.
  
NUMBERTEXT only : 5,000375 should be written : cinq virgule zéro zéro zéro trois cent soixante-quinze
+
== Greek language needs male/female option ==
  
==== MONEYTEXT ====
+
The MONEYTEXT function needs some improvements for the Greek language. I can participate or provide more info.
3,1 in Euros should be written : trois euros et dix centimes instead of : trois euros y dix centimes
+

Latest revision as of 12:01, 10 December 2013

Discussion page of NUMBERTEXT/MONEYTEXT development

Start a new section for a new theme, bug report or a language module (Soros program). See also NUMBERTEXT.org.

License requirements: Soros programs of the NUMBERTEXT project are released under an LGPL/BSD dual license.

Use ~~~~ (four tildes) at the end of your comment to include your login name and a time stamp.

To indent your comment, use one or more colons at the beginning of it.

Some languages need male/female option for number to text

Hi, in Catalan the numbers 1 and 2 can be masculine or feminine, depending on the noun being counted. For example, cotxe (car) is masculine and flor (flower) is feminine, so 1 cotxe (one car) is spelled "un cotxe" and 1 flor (one flower) is spelled "una flor". That is, 1 --> un (masculine noun) or una (feminine noun), and 2 --> dos (masculine noun) or dues (feminine noun).

The same gender change also happens in numbers ending in 1 or 2 other than 11 and 12 (21, 22, 31, 32, ...), and in the hundreds and thousands.

Spanish also has this masculine/feminine distinction, but only in numbers ending in 1. In Spanish, 2 is always spelled "dos".

Finally, this masculine/feminine issue is also important for currency to text. Many currencies are treated as masculine nouns: euro, dollar. But a few currencies are feminine: the pound sterling or the old Spanish peseta. So 1200 $ is spelled "mil dos-cents dòlars", but 1200 PTA is spelled "mil dues-centes pessetes".

I have fixed them with text converters. The ca_ES module uses manual arguments for the gender of the currency units and subunits; the es_ES module uses automatic gender detection (feminine units end with "a" or "as"):
# masculine to feminine conversion of "un" after millions,
# if "as?$" matches currency name

f:(.*ill)(.*),(.*) \1$(f:\2,\3)		# don't modify un in millions
f:(.*un)([^a].*,|,)(.*as?) $(f:\1a\2\3)	# un libra -> una libra
f:(.*),(.*) \1 \2

"([A-Z]{3}) ([-−]?1)" $(f:|$2,$(\1:us))
"([A-Z]{3}) ([-−]?\d+0{6,})" $2 de $(\1:up)
"([A-Z]{3}) ([-−]?\d+)" $(f:|$2,$(\1:up))
Thanks for your report. Nemeth 22:12, 3 September 2009 (UTC)
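
For readers less familiar with Soros, the automatic gender detection described above can be approximated in Python. This is only a sketch of the idea behind the three `f:` rules, not the es_ES module's actual code; the function name `feminize_un` is hypothetical, and the regular expressions are simplified.

```python
import re

def feminize_un(number_text, unit):
    """Join number text and unit, turning "un" into "una" for feminine units."""
    # Assumption taken from the discussion: feminine units end in "a" or "as".
    if not re.search(r"as?$", unit):
        return f"{number_text} {unit}"
    # Leave everything up to the last "ill" (millón, millones) untouched,
    # mirroring the "don't modify un in millions" rule.
    head, sep, tail = number_text.rpartition("ill")
    tail = re.sub(r"un\b", "una", tail)
    return f"{head}{sep}{tail} {unit}"

print(feminize_un("un", "libra"))
print(feminize_un("ciento un", "libras"))
```

Unlike the Soros rules, this version does not insert "de" after round millions; it only shows where the "un" to "una" substitution is allowed to happen.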

Works fine with currency, thanks. But I'm thinking of an additional option for the NUMBERTEXT OOo Calc function. Currently we have =NUMBERTEXT(number); and =NUMBERTEXT(number,lang_code);. What about =NUMBERTEXT(number,lang_code,gender_code); where gender_code can be 0, 1, 2, ...? Catalan only needs 2 variations, but other languages may use 3 or more. Of course, masculine (code 0) would be the default.

Or maybe better: =NUMBERTEXT_FEM(number); and =NUMBERTEXT_FEM(number,lang_code); for the "feminine" option.

Of course, we could use the MONEYTEXT function with a fake currency code that has a feminine tag but empty unit strings. But I think that is a workaround. --Jmontane 20:35, 6 September 2009 (UTC)

NUMBERTEXT is a string function. The numeric input is converted by Calc automatically. What about
NUMBERTEXT("ordinal:4545")
NUMBERTEXT("feminine:564")
NUMBERTEXT("ordinal-feminine:564")
NUMBERTEXT(CONCATENATE("ordinal-feminine:";$A1))
and similar expressions?
Maybe for the special handling of dates, we have to add a DATETEXT() function. Thanks for your suggestions. Nemeth 11:36, 10 November 2009 (UTC)
Yes, I think it's fine. I looked at the en_US_2 code in the Numbertext IDE. But will these prefixes (ordinal, feminine, ...) be language dependent? Can we define them freely?
I think it's a good option.

--Jmontane 12:10, 28 April 2010 (UTC)
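
To make the prefix idea concrete, here is a small Python sketch of how an input such as "ordinal-feminine:564" could be split into its options and the number. The prefix names come from the examples above; the function `parse_prefixed` and its return shape are hypothetical and are not part of Numbertext.

```python
def parse_prefixed(text):
    """Split 'ordinal-feminine:564' into ({'ordinal', 'feminine'}, '564')."""
    prefix, sep, number = text.rpartition(":")
    if not sep:
        # No prefix at all, e.g. plain "564".
        return frozenset(), text
    return frozenset(prefix.split("-")), number

options, number = parse_prefixed("ordinal-feminine:564")
print(sorted(options), number)
```

Because the options are carried inside the string argument, the Calc function signature would not need a new parameter, which appears to be the point of the suggestion.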

Minor bug in Spanish language definition

Spanish has gender variation in numbers containing the string "ientos" (doscientos/as, quinientos/as, novecientos/as, etc). It generates "doscientos libras", but the correct form would be "doscientas libras". I think that this line should solve it:

f:(.*ient)o(s.*),(.*as?) $(f:\1a\2,\3)   # doscientos libra/libras -> doscientas

--Roebek 16:24, 25 September 2009 (UTC)

Thanks for your patch. It is in the new Numbertext 0.7 release. Nemeth 11:36, 10 November 2009 (UTC)
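
The effect of the patch can be illustrated with a short Python sketch. It is an approximation written for this page, not the Soros rule itself; the function name `feminize_hundreds` is hypothetical, and it assumes, as the rule does, that feminine units end in "a" or "as".

```python
import re

def feminize_hundreds(number_text, unit):
    """Turn "...ientos" into "...ientas" when the unit looks feminine."""
    if re.search(r"as?$", unit):
        number_text = re.sub(r"ientos\b", "ientas", number_text)
    return f"{number_text} {unit}"

print(feminize_hundreds("doscientos", "libras"))
print(feminize_hundreds("doscientos", "euros"))
```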

Some fixes on Catalan definition

__numbertext__ 

^0 zero
1$ u
1 un
2 dos
3 tres
4 quatre
5 cinc
6 sis
7 set
8 vuit
9 nou
10 deu
11 onze
12 dotze
13 tretze
14 catorze
15 quinze
16 setze
17 disset
1(\d) di$1
20 vint
2(\d) vint-i-$1
30 trenta
40 quaranta
50 cinquanta
60 seixanta
70 setanta
80 vuitanta
90 noranta
(\d)(\d) $(\10)-$2
1(\d\d) cent $1
(\d)(\d\d) $1-cents $2
1(\d{3}) mil $1
(\d{1,3})(\d{3}) $1 mil $2
1(\d{6}) un milió $1
(\d{1,6})(\d{6}) $1 milions $2
1(\d{9}) mil milions $1
1(\d{12}) un bilió $1
(\d{1,6})(\d{12}) $1 bilions $2
1(\d{18}) un trilió $1
(\d{1,6})(\d{18}) $1 trilions $2
1(\d{24}) un quadrilió $1
(\d{1,6})(\d{24}) $1 quadrilions $2  

# negative number?

[-−](\d+) menys |$1

# decimals

"([-−]?\d+)[.,]" $1| coma
"([-−]?\d+[.,]\d*)(\d)" $1| |$2

# currency

# unit/subunit singular/plural

us:([^,]*),([^,]*),([^,]*),([^,]*) \1
up:([^,]*),([^,]*),([^,]*),([^,]*) \2
ss:([^,]*),([^,]*),([^,]*),([^,]*) \3
sp:([^,]*),([^,]*),([^,]*),([^,]*) \4
CHF:(\D+) $(\1: franc suís, francs suís, cèntim, cèntims)
EUR:(\D+) $(\1: euro, euros, cèntim, cèntims)
GBP:(\D+) $(\1: lliura esterlina, lliures esterlines, penic, penics)
JPY:(\D+) $(\1: ien, iens, sen, sen)
USD:(\D+) $(\1: dòlar EUA, dòlar EUA, cent, cents)
"([A-Z]{3}) ([-−]?1)([.,]00?)?" $2 $(\1:us)
"([A-Z]{3}) ([-−]?\d+)([.,]00?)?" $2 $(\1:up)
"(([A-Z]{3}) [-−]?\d+)[.,](01)" $1 amb $(1) $(\2:ss)
"(([A-Z]{3}) [-−]?\d+)[.,](\d)" $1 amb $(\30) $(\2:sp)
"(([A-Z]{3}) [-−]?\d+)[.,](\d\d)" $1 amb $3 $(\2:sp) 
Fixed in Numbertext 0.6. Many thanks for your help. Nemeth 22:16, 3 September 2009 (UTC)

Thanks for your work. I've updated the Catalan Soros code at Launchpad (bug #425374) with some additional fixes and improvements.--Jmontane 20:36, 6 September 2009 (UTC)
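
The `us`/`up`/`ss`/`sp` rules in the listing above each pick one field out of a comma-separated list of four currency forms. A rough Python equivalent is sketched below; the two table entries are copied from the listing, while the names `CURRENCY_FORMS`, `currency_form` and `unit_name` are hypothetical.

```python
# Forms in the same order as the Soros lists:
# unit singular, unit plural, subunit singular, subunit plural.
CURRENCY_FORMS = {
    "EUR": ("euro", "euros", "cèntim", "cèntims"),
    "GBP": ("lliura esterlina", "lliures esterlines", "penic", "penics"),
}
SELECTORS = {"us": 0, "up": 1, "ss": 2, "sp": 3}

def currency_form(code, selector):
    """Return the requested form, e.g. currency_form("GBP", "sp")."""
    return CURRENCY_FORMS[code][SELECTORS[selector]]

def unit_name(code, amount):
    """Singular for 1 and -1, plural otherwise, as in the listing."""
    return currency_form(code, "us" if abs(amount) == 1 else "up")

print(unit_name("EUR", 1), unit_name("EUR", 2))
```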

French numbering remarks

Congratulations on this fantastic extension! It has been needed for many years!

These remarks are still valid for version 0.9


MONEYTEXT

a) Not language specific: when there are more than two decimals, MONEYTEXT rounds the value to 2 decimals, which is the correct behaviour, I think. But currently it rounds up only above decimal 5, instead of from decimal 5, and not even in every case.

Compare with the rounding of Calc when formatted with 2 decimals:

Value 9,9949 is displayed 10 by Calc, but MONEYTEXT will treat it like 9,99
MONEYTEXT produces 10 only for a value strictly greater than 9,995, for example 9,995001

Value 5,995 Euros in en-US gives : six euro and zero cents

rounding up is correct but...
the text should be : six euros
(plural for euros, no mention of cents)

Value 9,995 Euros in en-US gives : nine euro and ninety-nine cents

No rounding up this time! Rounding up occurs only with a slightly greater value.
I believe Python (the implementation language of the Numbertext extension) uses a different rounding algorithm, but I will check it. Nemeth 11:44, 10 November 2009 (UTC)
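
One possible explanation, offered here only as an assumption, is that the value passes through binary floating point, which cannot represent most decimal fractions such as 9,995 exactly. The Python sketch below shows the "round up from decimal 5" behaviour requested above, using the standard `decimal` module on the decimal string; the function name `round_money` is hypothetical, and this is not the extension's code.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_money(value):
    """Round a decimal string to two places, with ties rounded away from zero."""
    normalized = value.replace(",", ".")  # accept a decimal comma
    return Decimal(normalized).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

for sample in ("9,995", "5,995", "7,004"):
    print(sample, "->", round_money(sample))
```

Working from the string rather than from a float keeps the tie at exactly 5 in the third decimal place, so 9,995 and 5,995 are treated alike.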


b) not language specific, case of rounding down :

MONEYTEXT value 7,004 gives in fr-FR : "sept euros et zéro centimes" instead of : "sept euros"

MONEYTEXT value 0,004 gives in fr-FR : "zéro euros et zéro centimes" instead of : "zéro euro"

I will fix it. Many thanks for your great bug reports, especially for the earlier one about missing 0.x decimals. It was a bug in the interpreter's handling of complemented (negated) character groups. Nemeth 11:44, 10 November 2009 (UTC)

Still present in version 0.9 / BMarcelly 07:03, 26 May 2010 (UTC)
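
The expected results in b) amount to two rules: leave out the subunit part when it rounds to zero, and use the singular unit for 0 and 1. A minimal Python sketch of that logic follows. It takes an already rounded, non-negative amount and prints digits instead of number words; the function name `fr_amount` is hypothetical.

```python
def fr_amount(units, cents):
    """Format an already-rounded, non-negative amount in euros."""
    # "zéro euro" and "un euro" are singular, per the expected results above.
    text = f"{units} euro" if units in (0, 1) else f"{units} euros"
    if cents:
        # The subunit part is only mentioned when it is not zero.
        text += f" et {cents} centime" if cents == 1 else f" et {cents} centimes"
    return text

print(fr_amount(7, 0))
print(fr_amount(3, 10))
```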

Turkish language source

Hello,

First, I thank the developers of this extension. I made a Turkish version, numbertext_tr_TR.py. Here is the source:


File:Numbertext tr TR.txt


I hope the Turkish version will be added to the project in a newer release.


In Turkish, number texts are written with spaces, like "one hundred twenty five", but money texts are written without spaces, like "onehundredtwentyfive Turkish lira".

Is it possible to do this?
Ramdem 20:01, 12 September 2009 (UTC)

Yes, it's possible with a space deletion call. I will add it, and you can check the result. Nemeth 13:09, 27 September 2009 (UTC)
I have integrated the Turkish description, with some small fixes, into Numbertext 0.7. See also http://NUMBERTEXT.org. Thanks, Nemeth 11:45, 10 November 2009 (UTC)

Thanks, Nemeth. I will announce this Numbertext release on the Turkish OpenOffice.org forum. Ramdem 17:27, 11 November 2009 (UTC)
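
As an illustration of the space deletion mentioned above, the following Python sketch removes the spaces from the number text only and then appends the currency name. It reuses the English stand-in from the question rather than real Turkish output, and the function name `money_text_without_spaces` is hypothetical.

```python
def money_text_without_spaces(number_text, unit):
    """Drop all whitespace inside the number text, then add the unit."""
    return "".join(number_text.split()) + " " + unit

print(money_text_without_spaces("one hundred twenty five", "turkish lira"))
```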

Minor Bug in Thai BAHTTEXT or NUMBERTEXT/MONEYTEXT

OOo spells every number ending in '-01' with 'หนึ่ง' instead of 'เอ็ด', which is always wrong. There are only two cases that OOo spells correctly: when the number is 1, and when another digit precedes the 1, as in '-21' or '-51'.

The rule in Thai is that when '1' is the last digit of the integer part of a number, it is spelled 'เอ็ด', not 'หนึ่ง'. For example, 31 is spelled 'สามสิบเอ็ด' not 'สามสิบหนึ่ง', 201 is spelled 'สองร้อยเอ็ด' not 'สองร้อยหนึ่ง', 50001 is spelled 'ห้าหมื่นเอ็ด' not 'ห้าหมื่นหนึ่ง', and so on.

The only case where it is spelled 'หนึ่ง' is when the number is 1 itself.

See the issue at the OO.o Bug Tracker (http://www.openoffice.org/issues/show_bug.cgi?id=83490).

And now I find that NUMBERTEXT.org also gets it wrong.

What a surprise! I have fixed it in version 0.8. Thanks for your report! László (Nemeth 06:43, 20 April 2010 (UTC))
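
The rule reported above is compact enough to state as code. The Python sketch below covers only the choice of word for a final digit 1 in a non-negative integer, exactly as described in the report; the function name `thai_final_one` is hypothetical, and the rest of Thai number spelling is out of scope.

```python
def thai_final_one(n):
    """Word for the last digit of a non-negative integer that ends in 1."""
    if n % 10 != 1:
        raise ValueError("n does not end in 1")
    # Only the number 1 itself keeps 'หนึ่ง'; 11, 31, 201, 50001 ... use 'เอ็ด'.
    return "หนึ่ง" if n == 1 else "เอ็ด"

for n in (1, 31, 201, 50001):
    print(n, thai_final_one(n))
```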


What is the longest string numbertext can parse?

Just for info: what scale is the limit of numbertext? [http://en.wikipedia.org/wiki/Long_and_short_scales] Is there any limit on the input or output string? --Jmontane 12:14, 28 April 2010 (UTC)

There is no limitation for the input and output size (null-terminated strings). Nemeth 07:14, 30 April 2010 (UTC)

language / money codes

Hi. It works great, but where can I find the language / money codes? --Adam majewski 15:27, 30 June 2010 (UTC)

"un" [1] varies gender in french

Hello. Thanks a lot for this great and smart extension! For French, as for most Romance languages, the MONEYTEXT() function needs gender variation for 1 ("un/une"), since currencies can be masculine or feminine. However, the word ending is not a reliable indicator of gender in French.

Here is a proposal (based on fr-xx from release 0.9.3), which uses f/m attributes attached to each currency. Since I have not yet figured out all the Soros subtleties, I guess there could be a better way to achieve this.
__numbertext__

[...]

# currency

# unit/subunit singular/plural

us:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),m?|f \1
up:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),m?|f \2
ud:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),m?|f \3
ss:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),m?|f \4
sp:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),m?|f \5

# masculine/feminine

mf:.*(,f) e

BIF:(\D+) $(\1: franc burundais, francs burundais, de francs burundais, centime, centimes,m)
CAD:(\D+) $(\1: dollar canadien, dollars canadiens, de dollars canadiens, cent, cents,m)
CDF:(\D+) $(\1: franc congolais, francs congolais, de francs congolais, centime, centimes,m)
CHF:(\D+) $(\1: franc suisse, francs suisses, de francs suisses, centime, centimes,m)
DJF:(\D+) $(\1: franc de Djibouti, francs de Djibouti, de francs de Djibouti, centime, centimes,m)
DZD:(\D+) $(\1: dinar algérien, dinars algériens, de dinars algériens, centime, centimes,m)
EUR:(\D+) $(\1: euro, euros, d’euros, centime, centimes,)
GBP:(\D+) $(\1: livre sterling, livres sterling, de livres sterling, penny, pennies,f)
GNF:(\D+) $(\1: franc guinéen, francs guinéens, de francs guinéens,,,m)
HTF:(\D+) $(\1: gourde, gourde, de gourde, centime, centimes,f)
KMF:(\D+) $(\1: franc des Comores, francs des Comores, de francs des Comores, centime, centimes,m)
LBP:(\D+) $(\1: livre libanaise, livres libanaises, de livres libanaises,,,f)
MAD:(\D+) $(\1: dirham marocain, dirhams marocains, de dirhams marocains, centime, centimes,m)
MGA:(\D+) $(\1: ariary, ariarys, d’ariarys, iraimbilanja, iraimbilanja,m)
MRO:(\D+) $(\1: ouguiya, ouguiya, d’ouguiya, khoum, khoums,m)
MUR:(\D+) $(\1: roupie mauricienne, roupies mauriciennes, de roupies mauriciennes, cent, cents,f)
RWF:(\D+) $(\1: franc rwandais, francs rwandais, de francs rwandais, centime, centimes,m)
SCR:(\D+) $(\1: roupie seychelloise, roupies seychelloises, de roupies seychelloises, cent, cents,f)
TND:(\D+) $(\1: dinar tunisien, dinars tunisiens, de dinars tunisiens, millime, millimes,m)
USD:(\D+) $(\1: dollar américain, dollars américains, de dollars américains, cent, cents,m)
VUV:(\D+) $(\1: vatu, vatus, de vatus,,,m)
X[AO]F:(\D+) $(\1: franc CFA, francs CFA, de francs CFA, centime, centimes,m)
XPF:(\D+) $(\1: franc Pacifique, francs Pacifique, de francs Pacifique, centime, centimes,m)

"(GNF|LBP|VUV) ([-−]?[01](.0+)?)" $2 $(\1:us)
"(GNF|LBP|VUV) ([-−]?\d+0{6,})" $2 $(\1:ud)
"(GNF|LBP|VUV) ([-−]?\d+[.,]\d+)" $2 $(\1:up)

"([A-Z]{3}) ([-−]?1)([.,]00?)?" $2$(\1:mf) $(\1:us)              # un/une
"([A-Z]{3}) ([-−]?\d*[02-9]1)([.,]00?)?" $2$(\1:mf) $(\1:up)     # cent un/une mais pas cent onze
"([A-Z]{3}) ([-−]?[0])([.,]00?)?" $2 $(\1:us)
"([A-Z]{3}) ([-−]?\d+0{6,})([.,]00?)?" $2 $(\1:ud)
"([A-Z]{3}) ([-−]?\d+)([.,]00?)?" $2 $(\1:up)

"((MGA|MRO) [-−]?\d+)[.,]0" $1
"((MGA|MRO) [-−]?\d+)[.,]2" $1 et |$(1) $(\2:ss)
"((MGA|MRO) [-−]?\d+)[.,]4" $1 et |$(2) $(\2:sp)
"((MGA|MRO) [-−]?\d+)[.,]6" $1 et |$(3) $(\2:sp)
"((MGA|MRO) [-−]?\d+)[.,]8" $1 et |$(4) $(\2:sp)

"((TND) [-−]?\d+)[.,](001)" $1 et |$(1) $(\2:ss)
"((TND) [-−]?\d+)[.,](\d)" $1 et |$(\300) $(\2:sp)
"((TND) [-−]?\d+)[.,](\d\d)" $1 et |$(\30) $(\2:sp)
"((TND) [-−]?\d+)[.,](\d\d\d)" $1 et |$3 $(\2:sp)

"(([A-Z]{3}) [-−]?\d+)[.,](01)" $1 et |$(1) $(\2:ss)
"(([A-Z]{3}) [-−]?\d+)[.,](\d)" $1 et |$(\30) $(\2:sp)
"(([A-Z]{3}) [-−]?\d+)[.,](\d\d)" $1 et |$3 $(\2:sp)

[...]

jmzambon 14:42, 3 September 2010 (UTC)
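
For readers less familiar with Soros, the idea behind the proposal can be sketched in Python: each currency record carries five unit strings plus a gender flag, the us:/up:/ud:/ss:/sp: rules pick one field, and the mf: rule yields "e" only for feminine currencies so that "un" becomes "une". This is only an illustration under stated assumptions, not the Soros implementation; the names `CURRENCIES`, `field`, `feminine_suffix` and `one_unit` are hypothetical, and only three entries of the table are reproduced.

```python
import re

# Hypothetical three-entry subset of the table above: five unit strings
# followed by a gender flag ("m", "f", or empty).
CURRENCIES = {
    "EUR": "euro, euros, d’euros, centime, centimes,",
    "GBP": "livre sterling, livres sterling, de livres sterling, penny, pennies,f",
    "USD": "dollar américain, dollars américains, de dollars américains, cent, cents,m",
}

# Position of each selector in the comma-separated record.
FIELDS = {"us": 0, "up": 1, "ud": 2, "ss": 3, "sp": 4}


def field(code: str, selector: str) -> str:
    """Return one unit string, like the us:/up:/ud:/ss:/sp: rules."""
    parts = [part.strip() for part in CURRENCIES[code].split(",")]
    return parts[FIELDS[selector]]


def feminine_suffix(code: str) -> str:
    """Like the mf: rule: "e" when the record ends in ",f", otherwise ""."""
    return "e" if re.fullmatch(r".*,f", CURRENCIES[code]) else ""


def one_unit(code: str) -> str:
    """Spell an amount of exactly one unit: "un" or "une" plus the singular."""
    return "un" + feminine_suffix(code) + " " + field(code, "us")
```

With these definitions, `one_unit("GBP")` gives "une livre sterling", while `one_unit("EUR")` gives "un euro" because the euro record has an empty gender flag.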

Latvian language

It would be nice to also include code for Latvian:

__numbertext__
^0 nulle
1 viens
2 divi
3 trīs
4 četri
5 pieci
6 seši
7 septiņi
8 astoņi
9 deviņi
10 desmit
11 vienpadsmit
12 divpadsmit
13 trīspadsmit
14 četrpadsmit
15 piecpadsmit
16 sešpadsmit
17 septiņpadsmit
18 astoņpadsmit
19 deviņpadsmit
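2(\d) divdesmit $1
3(\d) trīsdesmit $1
4(\d) četrdesmit $1
5(\d) piecdesmit $1
6(\d) sešdesmit $1
7(\d) septiņdesmit $1
8(\d) astoņdesmit $1
9(\d) deviņdesmit $1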
1(\d\d) simts $1
(\d)(\d\d) $1 simti $2
1(\d{3}) viens tūkstotis $1
(\d{1,3})(\d{3}) $1 tūkstoši $2
1(\d{6}) viens miljons $1
(\d{1,3})(\d{6}) $1 miljoni $2
1(\d{9}) viens miljards $1
(\d{1,3})(\d{9}) $1 miljardi $2
1(\d{12}) viens triljons $1
(\d{1,3})(\d{12}) $1 triljoni $2
1(\d{15}) viens kvadriljons $1
(\d{1,3})(\d{15}) $1 kvadriljoni $2
1(\d{18}) viens kvintiljons $1
(\d{1,3})(\d{18}) $1 kvintiljoni $2
1(\d{21}) viens sekstiljons $1
(\d{1,3})(\d{21}) $1 sekstiljoni $2
1(\d{24}) viens septiljons $1
(\d{1,3})(\d{24}) $1 septiljoni $2

# negative numbers

[-−](\d+) mīnus |$1

# decimals


([-−]?\d+)[.,] $1| komats
([-−]?\d+[.,]\d*)(\d) $1| |$2


# female conversion
f:(.*)viens \1viena
f:(.*)i \1as
f:(.*) \1

# currency

# unit/subunit

us:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*) \1
up:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*) \2
ug:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*) \3
ss:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*) \4
sp:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*) \5
sg:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*) \6

LVL:(\D+) $(\1: lats, lati, latu, santīms, santīmi, santīmu)
EUR:(\D+) $(\1: eiro, eiro, eiro, cents, centi, centu)
RUB:(\D+) $(\1: rublis, rubļi, rubļu, kapeika, kapeikas, kapeiku)
USD:(\D+) $(\1: ASV dolārs, ASV dolāri, ASV dolāru, cents, centi, centu)


"([A-Z]{3}) ([-−]?1)([.,]00?)?" $2| $(\1:us)
"([A-Z]{3}) ([-−]?\d*[02-9]1)([.,]00?)?" $2| $(\1:us)
"([A-Z]{3}) ([-−]?[23456789])([.,]00?)?" $2| $(\1:up)
"([A-Z]{3}) ([-−]?\d*[02-9][23456789])([.,]00?)?" $2| $(\1:up)
"([A-Z]{3}) ([-−]?\d+)([.,]00?)?" $2| $(\1:ug)

"((RUB) [-−]?\d+)[.,]([02-9])1" $1 $(\30) |$(f:$(1)) $(\2:ss)
"((RUB) [-−]?\d+)[.,]([02-9][23456789])" $1 $(f:$3)  $(\2:sp)

"(([A-Z]{3}) [-−]?\d+)[.,](01)" $1 |$(1) $(\2:ss)
"(([A-Z]{3}) [-−]?\d+)[.,]([02-9])1" $1 $(\30) |$(1)  $(\2:ss)

"(([A-Z]{3}) [-−]?\d+)[.,]([02-9][23456789])" $1 |$3 $(\2:sp)

"(([A-Z]{3}) [-−]?\d+)[.,](\d)" $1 |$(\30) $(\2:sg)
"(([A-Z]{3}) [-−]?\d+)[.,](\d\d)" $1 |$3 $(\2:sg)

--Asterisks 23:38, 17 March 2012 (UTC)
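
The two less obvious parts of the Latvian rules, choosing the unit form (us/up/ug) from the number and converting a spelled-out number to the feminine, can be sketched in Python as follows. This is a rough illustration, not the Soros implementation: it assumes that rules are tried in order, that a pattern must match the whole input, and that the first feminine rule is meant to keep any preceding words (`\1viena`). The function names are hypothetical.

```python
import re

# Feminine conversion rules in the order of the f: rules; first match wins.
FEMININE_RULES = [
    (r"(.*)viens", r"\1viena"),  # viens -> viena (prefix kept: an assumption)
    (r"(.*)i", r"\1as"),         # divi -> divas, pieci -> piecas
    (r"(.*)", r"\1"),            # anything else is left unchanged
]


def to_feminine(words: str) -> str:
    """Apply the first feminine rule whose pattern matches the whole input."""
    for pattern, replacement in FEMININE_RULES:
        match = re.fullmatch(pattern, words)
        if match:
            return match.expand(replacement)
    return words


def unit_form(number: int) -> str:
    """Pick the unit selector the currency rules would use for a whole amount.

    "us" (singular) for numbers ending in 1 except 11,
    "up" (plural) for numbers ending in 2-9 except 12-19,
    "ug" (genitive plural) for everything else (0, 10-19, round tens).
    """
    last_two = abs(number) % 100
    last = last_two % 10
    if last == 0 or 10 <= last_two <= 19:
        return "ug"
    if last == 1:
        return "us"
    return "up"
```

For example, `unit_form(21)` selects the singular ("divdesmit viens lats"), `unit_form(11)` the genitive ("vienpadsmit latu"), and `to_feminine("divdesmit divi")` gives "divdesmit divas", as needed for "kapeikas".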

Ukrainian language

I look forward to any feedback and to the addition of the Ukrainian language.

File:Numbertext uk.txt

Ivanmelwise (talk) 10:29, 9 January 2013 (UTC)

Chinese DAXIE (大写) BUG

There is an error in the Chinese module. In zh-ZH-2 (banking writing), the digits 2 and 4 are both rendered as 贰. This is correct for 2 but not for 4, which should be 肆.
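
As a reference for the fix, the ten financial (大写) digit characters are all distinct. A minimal Python sketch of the expected digit mapping follows; the names `DAXIE_DIGITS` and `daxie_digits` are hypothetical, and the function only maps digits one by one for non-negative integers, without the place-value characters that a full number reading would need.

```python
# Financial (banking) numerals for the digits 0-9, indexed by digit value.
DAXIE_DIGITS = "零壹贰叁肆伍陆柒捌玖"


def daxie_digits(number: int) -> str:
    """Map each decimal digit of a non-negative integer to its financial numeral."""
    if number < 0:
        raise ValueError("only non-negative integers are supported in this sketch")
    return "".join(DAXIE_DIGITS[int(ch)] for ch in str(number))
```

Here `daxie_digits(24)` gives "贰肆", so 2 and 4 map to different characters.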

Greek language needs male/female option

The MONEYTEXT function needs some improvements for the Greek language. I can participate or provide more information.
