Talk:NUMBERTEXT/MONEYTEXT development


Discussion page of NUMBERTEXT/MONEYTEXT development

Start a new section for a new theme, bug report or a language module (Soros program). See also NUMBERTEXT.org.

License requirements: Soros programs of the NUMBERTEXT project are released under an LGPL/BSD dual license.

Use ~~~~ (four tildes) at the end of your comment to include your login name and a time stamp.

To indent your comment, use one or more colons at the beginning of it.

Some languages need male/female option for number to text

Hi, in Catalan the numbers 1 and 2 can be masculine or feminine, depending on the noun being counted. Example: cotxe (car) is masculine and flor (flower) is feminine. So 1 cotxe (one car) is spelled "un cotxe" and 1 flor (one flower) is spelled "una flor". So 1 --> un (masculine noun) or una (feminine noun), and 2 --> dos (masculine noun) or dues (feminine noun).

This gender change also happens in numbers ending in 1 and 2 other than 11 and 12 (21, 22, 31, 32, ...), and also in the hundreds and thousands.

Spanish also has this gender distinction, but only in numbers ending in 1. In Spanish, 2 is always spelled "dos".

Finally, this gender issue is also important for currency to text. Many currencies are treated as masculine nouns: euro, dollar. But a few currencies are feminine: the pound sterling or the old Spanish peseta. So 1200 $ is spelled "mil dos-cents dòlars", but 1200 PTA is spelled "mil dues-centes pessetes".
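
A minimal Python sketch of the agreement described above, covering only the words quoted in this comment. The table `FORMS` and the function `spell_with_noun` are hypothetical names used for illustration; they are not part of Numbertext.

```python
# Gendered forms of 1 and 2, taken from the examples in this comment.
# Hypothetical table: a real module needs far more than these two numbers.
FORMS = {
    "ca": {1: ("un", "una"), 2: ("dos", "dues")},
    "es": {1: ("un", "una"), 2: ("dos", "dos")},  # Spanish 2 never changes
}


def spell_with_noun(number: int, noun: str, lang: str, feminine: bool) -> str:
    """Return the number word that agrees with the noun's gender, plus the noun."""
    masculine_form, feminine_form = FORMS[lang][number]
    word = feminine_form if feminine else masculine_form
    return f"{word} {noun}"


print(spell_with_noun(1, "cotxe", "ca", feminine=False))  # un cotxe
print(spell_with_noun(1, "flor", "ca", feminine=True))    # una flor
```

The gender is passed in by the caller here, which corresponds to the "manual" approach; detecting it from the noun is a separate problem.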

I have fixed these with text converters. The ca_ES module uses manual arguments for the gender of the currency units and subunits, while the es_ES module uses automatic gender detection (feminine units end with "a" or "as"):
# masculine to feminine conversion of "un" after millions,
# if "as?$" matches currency name

f:(.*ill)(.*),(.*) \1$(f:\2,\3)		# don't modify un in millions
f:(.*un)([^a].*,|,)(.*as?) $(f:\1a\2\3)	# un libra -> una libra
f:(.*),(.*) \1 \2

"([A-Z]{3}) ([-−]?1)" $(f:|$2,$(\1:us))
"([A-Z]{3}) ([-−]?\d+0{6,})" $2 de $(\1:up)
"([A-Z]{3}) ([-−]?\d+)" $(f:|$2,$(\1:up))
Thanks for your report. Nemeth 22:12, 3 September 2009 (UTC)
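
For readers unfamiliar with Soros, the automatic detection idea can be approximated in Python as follows. This is a simplified sketch, not a translation of the rules above: the function name `agree_un` is hypothetical, and it only handles a trailing "un", whereas the Soros rules operate on the whole comma-separated string.

```python
import re


def agree_un(number_words: str, unit: str) -> str:
    """Join number words and unit name.

    A final "un" becomes "una" when the unit name ends in "a" or "as",
    which this sketch treats as the sign of a feminine noun.
    """
    if re.search(r"as?$", unit):
        number_words = re.sub(r"\bun$", "una", number_words)
    return f"{number_words} {unit}"


print(agree_un("un", "libra"))                # una libra
print(agree_un("un", "euro"))                 # un euro
print(agree_un("un millón de", "libras"))     # un millón de libras
```

The last call shows why the millions case needs no special treatment in this sketch: the number words do not end in "un", so nothing is rewritten.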

Works fine with currency, thanks. But I'm thinking of an additional option for the NUMBERTEXT OOo Calc function. Currently we have =NUMBERTEXT(number) and =NUMBERTEXT(number,lang_code). What about =NUMBERTEXT(number,lang_code,gender_code), where gender_code can be 0, 1, 2, ...? Catalan only needs 2 variations, but other languages may use 3 or more. Of course, masculine (code 0) would be the default.

Or maybe better: =NUMBERTEXT_FEM(number) and =NUMBERTEXT_FEM(number,lang_code) for the "feminine" option.

Of course, we could use the MONEYTEXT function with a fake currency code that has a feminine tag but empty unit strings. But I think that is a workaround. --Jmontane 20:35, 6 September 2009 (UTC)

NUMBERTEXT is a string function; numeric input is converted by Calc automatically. What about
NUMBERTEXT("ordinal:4545")
NUMBERTEXT("feminine:564")
NUMBERTEXT("ordinal-feminine:564")
NUMBERTEXT(CONCATENATE("ordinal-feminine:";$A1))
and similar expressions?
Maybe for the special handling of dates, we have to add a DATETEXT() function. Thanks for your suggestions. Nemeth 11:36, 10 November 2009 (UTC)
Yes, I think that's fine. I looked at the en_US_2 code in the Numbertext IDE. But will these prefixes (ordinal, feminine, ...) be language-dependent? Can we define them freely?
I think it's a good option.

--Jmontane 12:10, 28 April 2010 (UTC)
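
To make the prefix proposal concrete, here is a small Python sketch of how such an input string could be split into options and a number. The function name `parse_request` and the returned structure are assumptions for illustration only; the thread does not say how Numbertext handles prefixes internally.

```python
def parse_request(text: str) -> tuple[frozenset[str], str]:
    """Split "ordinal-feminine:564" into ({"ordinal", "feminine"}, "564")."""
    prefix, separator, number = text.partition(":")
    if not separator:
        # No prefix given: treat the whole input as a plain number.
        return frozenset(), text
    return frozenset(prefix.split("-")), number


print(parse_request("feminine:564"))
print(parse_request("4545"))
```

Because the options arrive as a set of free-form strings, each language module could in principle accept its own prefix names, which is the question raised in the last comment.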

Minor bug in Spanish language definition

Spanish has gender variation in numbers containing the string "ientos" (doscientos/as, quinientos/as, novecientos/as, etc.). The module generates "doscientos libras", but the correct form is "doscientas libras". I think this line should solve it:

f:(.*ient)o(s.*),(.*as?) $(f:\1a\2,\3)   # doscientos libra/libras -> doscientas

--Roebek 16:24, 25 September 2009 (UTC)

Thanks for your patch. It is included in the new Numbertext 0.7 release. Nemeth 11:36, 10 November 2009 (UTC)
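
The effect of the patch can be illustrated with a short Python sketch. It is an approximation under the same assumption as the Spanish module (units ending in "a" or "as" are feminine), and `agree_hundreds` is a hypothetical name.

```python
import re


def agree_hundreds(number_words: str, unit: str) -> str:
    """Turn "...ientos" into "...ientas" before a unit ending in "a" or "as"."""
    if re.search(r"as?$", unit):
        number_words = re.sub(r"ientos\b", "ientas", number_words)
    return f"{number_words} {unit}"


print(agree_hundreds("doscientos", "libras"))  # doscientas libras
print(agree_hundreds("doscientos", "euros"))   # doscientos euros
```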

Some fixes on Catalan definition

__numbertext__ 

^0 zero
1$ u
1 un
2 dos
3 tres
4 quatre
5 cinc
6 sis
7 set
8 vuit
9 nou
10 deu
11 onze
12 dotze
13 tretze
14 catorze
15 quinze
16 setze
17 disset
1(\d) di$1
20 vint
2(\d) vint-i-$1
30 trenta
40 quaranta
50 cinquanta
60 seixanta
70 setanta
80 vuitanta
90 noranta
(\d)(\d) $(\10)-$2
1(\d\d) cent $1
(\d)(\d\d) $1-cents $2
1(\d{3}) mil $1
(\d{1,3})(\d{3}) $1 mil $2
1(\d{6}) un milió $1
(\d{1,6})(\d{6}) $1 milions $2
1(\d{9}) mil milions $1
1(\d{12}) un bilió $1
(\d{1,6})(\d{12}) $1 bilions $2
1(\d{18}) un trilió $1
(\d{1,6})(\d{18}) $1 trilions $2
1(\d{24}) un quadrilió $1
(\d{1,6})(\d{24}) $1 quadrilions $2  

# negative number?

[-−](\d+) menys |$1

# decimals

"([-−]?\d+)[.,]" $1| coma
"([-−]?\d+[.,]\d*)(\d)" $1| |$2

# currency

# unit/subunit singular/plural

us:([^,]*),([^,]*),([^,]*),([^,]*) \1
up:([^,]*),([^,]*),([^,]*),([^,]*) \2
ss:([^,]*),([^,]*),([^,]*),([^,]*) \3
sp:([^,]*),([^,]*),([^,]*),([^,]*) \4
CHF:(\D+) $(\1: franc suís, francs suís, cèntim, cèntims)
EUR:(\D+) $(\1: euro, euros, cèntim, cèntims)
GBP:(\D+) $(\1: lliura esterlina, lliures esterlines, penic, penics)
JPY:(\D+) $(\1: ien, iens, sen, sen)
USD:(\D+) $(\1: dòlar EUA, dòlar EUA, cent, cents)
"([A-Z]{3}) ([-−]?1)([.,]00?)?" $2 $(\1:us)
"([A-Z]{3}) ([-−]?\d+)([.,]00?)?" $2 $(\1:up)
"(([A-Z]{3}) [-−]?\d+)[.,](01)" $1 amb $(1) $(\2:ss)
"(([A-Z]{3}) [-−]?\d+)[.,](\d)" $1 amb $(\30) $(\2:sp)
"(([A-Z]{3}) [-−]?\d+)[.,](\d\d)" $1 amb $3 $(\2:sp) 
Fixed in Numbertext 0.6. Many thanks for your help. Nemeth 22:16, 3 September 2009 (UTC)

Thanks for your work. I've updated the Catalan Soros code on Launchpad (bug #425374) with some additional fixes and improvements. --Jmontane 20:36, 6 September 2009 (UTC)
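
The rule list above is tried top to bottom and the first matching pattern wins, with the right-hand side able to spell captured digits recursively. The following Python sketch imitates that behaviour for 0 to 99 only. It is not the Soros interpreter: the names `EXACT`, `PATTERNS` and `spell` are hypothetical, and the separate `1$ u` rule is ignored, so 1 is always spelled "un" here.

```python
import re

# Exact spellings copied from the rule list above.
EXACT = {
    "0": "zero", "1": "un", "2": "dos", "3": "tres", "4": "quatre",
    "5": "cinc", "6": "sis", "7": "set", "8": "vuit", "9": "nou",
    "10": "deu", "11": "onze", "12": "dotze", "13": "tretze",
    "14": "catorze", "15": "quinze", "16": "setze", "17": "disset",
    "20": "vint", "30": "trenta", "40": "quaranta", "50": "cinquanta",
    "60": "seixanta", "70": "setanta", "80": "vuitanta", "90": "noranta",
}

# Pattern rules, tried in order; each builder may call spell() recursively.
PATTERNS = [
    (r"1(\d)", lambda m: "di" + spell(m.group(1))),        # 1(\d) di$1
    (r"2(\d)", lambda m: "vint-i-" + spell(m.group(1))),   # 2(\d) vint-i-$1
    (r"(\d)(\d)",                                          # (\d)(\d) $(\10)-$2
     lambda m: spell(m.group(1) + "0") + "-" + spell(m.group(2))),
]


def spell(digits: str) -> str:
    """Spell a decimal string between "0" and "99" without leading zeros."""
    if not re.fullmatch(r"\d|[1-9]\d", digits):
        raise ValueError(f"unsupported input: {digits!r}")
    if digits in EXACT:
        return EXACT[digits]
    for pattern, build in PATTERNS:
        match = re.fullmatch(pattern, digits)
        if match:
            return build(match)
    raise ValueError(f"no rule for {digits!r}")


print(spell("18"))  # divuit
print(spell("42"))  # quaranta-dos
```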

French numbering remarks

Congratulations on this fantastic extension! It has been needed for many years!

These remarks are still valid for version 0.9.


MONEYTEXT

a) Not language specific: when there are more than two decimals, MONEYTEXT rounds the value to 2 decimals, which I think is the correct behaviour. But currently it rounds up only when the third decimal is above 5, instead of from 5 upwards, and not even in every case.

Compare with the rounding of Calc when formatted with 2 decimals:

Value 9,9949 is displayed as 10 by Calc, but MONEYTEXT treats it as 9,99
MONEYTEXT produces 10 only for a value strictly greater than 9,995, for example 9,995001

Value 5,995 Euros in en-US gives: six euro and zero cents

The rounding up is correct, but
the text should be: six euros
(plural for euros, no mention of cents)

Value 9,995 Euros in en-US gives: nine euro and ninety-nine cents

No rounding up this time! Rounding up occurs only with a slightly greater value.
I believe Python (the implementation language of the Numbertext extension) uses a different rounding algorithm, but I will check it. Nemeth 11:44, 10 November 2009 (UTC)
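
One possible explanation, not confirmed in this thread, is binary floating point: 9.995 cannot be stored exactly as a float and ends up slightly below the tie, while 5.995 ends up slightly above it. The Python sketch below reproduces the two reported cases with the built-in `round` and contrasts them with decimal arithmetic. The helper name `round_cents` is hypothetical, and whether the extension rounds this way internally is an assumption.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_cents(text: str) -> Decimal:
    """Round a decimal string to 2 places, with ties going up."""
    return Decimal(text).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Binary floats: the stored value decides the direction, not the literal.
print(round(5.995, 2))       # 6.0
print(round(9.995, 2))       # 9.99

# Decimal arithmetic on the original text rounds both ties up.
print(round_cents("5.995"))  # 6.00
print(round_cents("9.995"))  # 10.00
```

Since NUMBERTEXT is described above as a string function, rounding the decimal text directly would avoid the float conversion altogether.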


b) Not language specific, case of rounding down:

MONEYTEXT value 7,004 gives in fr-FR: "sept euros et zéro centimes" instead of "sept euros"

MONEYTEXT value 0,004 gives in fr-FR: "zéro euros et zéro centimes" instead of "zéro euro"

I will fix it. Many thanks for your great bug reports, especially for the earlier one about the missing 0.x decimals. That was caused by a bug in the interpreter's handling of complemented character groups. Nemeth 11:44, 10 November 2009 (UTC)

Still present in version 0.9. BMarcelly 07:03, 26 May 2010 (UTC)
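
The expected behaviour in case b) can be sketched in Python as follows: round to whole cents first, then mention the subunit only if it is non-zero, and use the singular for 0 and 1. The function `describe_eur_fr` is hypothetical, handles non-negative amounts only, and prints digits instead of number words to keep the example short.

```python
from decimal import Decimal, ROUND_HALF_UP


def describe_eur_fr(text: str) -> str:
    """Describe a non-negative EUR amount, omitting zero centimes."""
    rounded = Decimal(text).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    units, cents = divmod(int(rounded * 100), 100)
    result = f"{units} {'euro' if units < 2 else 'euros'}"
    if cents:
        result += f" et {cents} {'centime' if cents < 2 else 'centimes'}"
    return result


print(describe_eur_fr("7.004"))  # 7 euros
print(describe_eur_fr("0.004"))  # 0 euro
print(describe_eur_fr("7.05"))   # 7 euros et 5 centimes
```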

Turkish language source

Hello,

First, I thank the developers of this extension. I made a Turkish version, numbertext_tr_TR.py. Here is the source:


File:Numbertext tr TR.txt


I hope the Turkish version will be added to the project in newer versions.


In Turkish, number text is written with spaces, like "one hundred twenty five", but money text is written without spaces, like "onehundredtwentyfive Turkish lira".

Is it possible to do this?
Ramdem 20:01, 12 September 2009 (UTC)

Yes, it's possible with a space deletion call. I will add it, and you can check the result. Nemeth 13:09, 27 September 2009 (UTC)
I have integrated the Turkish description into Numbertext 0.7, with some small fixes. See http://NUMBERTEXT.org, too. Thanks, Nemeth 11:45, 10 November 2009 (UTC)

Thanks, Nemeth. I will announce this Numbertext release on the Turkish OpenOffice.org forum. Ramdem 17:27, 11 November 2009 (UTC)
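
As an illustration of the space deletion discussed above, a minimal Python sketch. The function name `money_text_tr` is hypothetical, and the actual Soros call used in the Turkish module is not shown in this thread.

```python
def money_text_tr(number_words: str, currency: str) -> str:
    """Delete the spaces inside the number words, then append the currency."""
    return "".join(number_words.split()) + " " + currency


print(money_text_tr("yüz yirmi beş", "Türk lirası"))  # yüzyirmibeş Türk lirası
```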

Minor Bug in Thai BAHTTEXT or NUMBERTEXT/MONEYTEXT

OOo spells the final digit of all numbers ending in '-01' as 'หนึ่ง' instead of 'เอ็ด', which is wrong in every case. OOo spells the final 1 correctly in only two cases: when the number is 1, and when another non-zero digit comes directly before the 1, as in '-21' or '-51'.

The rule in Thai is that when '1' is the last digit of the integer part of a number, it is spelled 'เอ็ด', not 'หนึ่ง'. For example, 31 is spelled 'สามสิบเอ็ด', not 'สามสิบหนึ่ง'; 201 is spelled 'สองร้อยเอ็ด', not 'สองร้อยหนึ่ง'; 50001 is spelled 'ห้าหมื่นเอ็ด', not 'ห้าหมื่นหนึ่ง'; and so on.

The only case in which it is spelled 'หนึ่ง' is when the number is 1.

See the issue at OO.o Bug Tracker

And now I find that NUMBERTEXT.org also gets it wrong.

What a surprise! I have fixed in the version 0.8. Thanks for your report! László (Nemeth 06:43, 20 April 2010 (UTC))
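
The rule stated in this report is small enough to express directly. The Python sketch below encodes only that rule for the final digit; `thai_final_one` is a hypothetical helper and does not spell the rest of the number.

```python
def thai_final_one(n: int) -> str:
    """Word for a final digit 1, following the rule stated in this report."""
    if n < 0 or n % 10 != 1:
        raise ValueError("expected a non-negative integer ending in 1")
    # Only the number 1 itself keeps the ordinary word for "one".
    return "หนึ่ง" if n == 1 else "เอ็ด"


for value in (1, 31, 201, 50001):
    print(value, thai_final_one(value))
```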


What is the longest string numbertext can parse?

Just for info: what scale is the limit of numbertext? [1] Is there any limit on the input or output string? --Jmontane 12:14, 28 April 2010 (UTC)

There is no limitation for the input and output size (null-terminated strings). Nemeth 07:14, 30 April 2010 (UTC)

language / money codes

Hi. It works great, but where can I find the language / money codes? --Adam majewski 15:27, 30 June 2010 (UTC)

"un" [1] varies gender in french

Hello. Thanks a lot for this great and smart extension! In French, as in most Latin languages, the MONEYTEXT() function needs gender variability for 1 ("un/une"), since currencies can be masculine or feminine. However, the word ending is not significant in French.

Here is a proposal (based on fr-xx from release 0.9.3), which uses f/m attributes attached to each currency. Since I have not yet figured out all the Soros subtleties, there may be a better way to achieve this.
__numbertext__

[...]

# currency

# unit/subunit singular/plural

us:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),m?|f \1
up:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),m?|f \2
ud:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),m?|f \3
ss:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),m?|f \4
sp:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),m?|f \5

# masculine/feminine

mf:.*(,f) e

BIF:(\D+) $(\1: franc burundais, francs burundais, de francs burundais, centime, centimes,m)
CAD:(\D+) $(\1: dollar canadien, dollars canadiens, de dollars canadiens, cent, cents,m)
CDF:(\D+) $(\1: franc congolais, francs congolais, de francs congolais, centime, centimes,m)
CHF:(\D+) $(\1: franc suisse, francs suisses, de francs suisses, centime, centimes,m)
DJF:(\D+) $(\1: franc de Djibouti, francs de Djibouti, de francs de Djibouti, centime, centimes,m)
DZD:(\D+) $(\1: dinar algérien, dinars algériens, de dinars algériens, centime, centimes,m)
EUR:(\D+) $(\1: euro, euros, d’euros, centime, centimes,)
GBP:(\D+) $(\1: livre sterling, livres sterling, de livres sterling, penny, pennies,f)
GNF:(\D+) $(\1: franc guinéen, francs guinéens, de francs guinéens,,,m)
HTF:(\D+) $(\1: gourde, gourdes, de gourdes, centime, centimes,f)
KMF:(\D+) $(\1: franc des Comores, francs des Comores, de francs des Comores, centime, centimes,m)
LBP:(\D+) $(\1: livre libanaise, livres libanaises, de livres libanaises,,,f)
MAD:(\D+) $(\1: dirham marocain, dirhams marocains, de dirhams marocains, centime, centimes,m)
MGA:(\D+) $(\1: ariary, ariarys, d’ariarys, iraimbilanja, iraimbilanja,m)
MRO:(\D+) $(\1: ouguiya, ouguiya, d’ouguiya, khoum, khoums,m)
MUR:(\D+) $(\1: roupie mauricienne, roupies mauriciennes, de roupies mauriciennes, cent, cents,f)
RWF:(\D+) $(\1: franc rwandais, francs rwandais, de francs rwandais, centime, centimes,m)
SCR:(\D+) $(\1: roupie seychelloise, roupies seychelloises, de roupies seychelloises, cent, cents,f)
TND:(\D+) $(\1: dinar tunisien, dinars tunisiens, de dinars tunisiens, millime, millimes,m)
USD:(\D+) $(\1: dollar américain, dollars américains, de dollars américains, cent, cents,m)
VUV:(\D+) $(\1: vatu, vatus, de vatus,,,m)
X[AO]F:(\D+) $(\1: franc CFA, francs CFA, de francs CFA, centime, centimes,m)
XPF:(\D+) $(\1: franc Pacifique, francs Pacifique, de francs Pacifique, centime, centimes,m)

"(GNF|LBP|VUV) ([-−]?[01](.0+)?)" $2 $(\1:us)
"(GNF|LBP|VUV) ([-−]?\d+0{6,})" $2 $(\1:ud)
"(GNF|LBP|VUV) ([-−]?\d+[.,]\d+)" $2 $(\1:up)

"([A-Z]{3}) ([-−]?1)([.,]00?)?" $2$(\1:mf) $(\1:us)              # un/une
"([A-Z]{3}) ([-−]?\d*[02-9]1)([.,]00?)?" $2$(\1:mf) $(\1:up)     # cent un/une mais pas cent onze
"([A-Z]{3}) ([-−]?[0])([.,]00?)?" $2 $(\1:us)
"([A-Z]{3}) ([-−]?\d+0{6,})([.,]00?)?" $2 $(\1:ud)
"([A-Z]{3}) ([-−]?\d+)([.,]00?)?" $2 $(\1:up)

"((MGA|MRO) [-−]?\d+)[.,]0" $1
"((MGA|MRO) [-−]?\d+)[.,]2" $1 et |$(1) $(\2:ss)
"((MGA|MRO) [-−]?\d+)[.,]4" $1 et |$(2) $(\2:sp)
"((MGA|MRO) [-−]?\d+)[.,]6" $1 et |$(3) $(\2:sp)
"((MGA|MRO) [-−]?\d+)[.,]8" $1 et |$(4) $(\2:sp)

"((TND) [-−]?\d+)[.,](001)" $1 et |$(1) $(\2:ss)
"((TND) [-−]?\d+)[.,](\d)" $1 et |$(\300) $(\2:sp)
"((TND) [-−]?\d+)[.,](\d\d)" $1 et |$(\30) $(\2:sp)
"((TND) [-−]?\d+)[.,](\d\d\d)" $1 et |$3 $(\2:sp)

"(([A-Z]{3}) [-−]?\d+)[.,](01)" $1 et |$(1) $(\2:ss)
"(([A-Z]{3}) [-−]?\d+)[.,](\d)" $1 et |$(\30) $(\2:sp)
"(([A-Z]{3}) [-−]?\d+)[.,](\d\d)" $1 et |$3 $(\2:sp)

[...]

jmzambon 14:42, 3 September 2010 (UTC)
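
The core idea of the proposal, a gender attribute stored with each currency rather than guessed from the word ending, can be sketched in Python. The dictionary `CURRENCIES` copies three entries from the list above (EUR is treated as masculine, as the empty attribute there implies); the function name `one_unit` is hypothetical.

```python
# (singular, plural, gender) per currency code, copied from the proposal.
CURRENCIES = {
    "EUR": ("euro", "euros", "m"),
    "GBP": ("livre sterling", "livres sterling", "f"),
    "USD": ("dollar américain", "dollars américains", "m"),
}


def one_unit(code: str) -> str:
    """Spell the amount 1 with the article agreeing with the currency gender."""
    singular, _plural, gender = CURRENCIES[code]
    return ("une" if gender == "f" else "un") + " " + singular


print(one_unit("GBP"))  # une livre sterling
print(one_unit("EUR"))  # un euro
```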

Latvian language

It would be nice to also include code for Latvian:

__numbertext__
^0 nulle
1 viens
2 divi
3 trīs
4 četri
5 pieci
6 seši
7 septiņi
8 astoņi
9 deviņi
10 desmit
11 vienpadsmit
12 divpadsmit
13 trīspadsmit
14 četrpadsmit
15 piecpadsmit
16 sešpadsmit
17 septiņpadsmit
18 astoņpadsmit
19 deviņpadsmit
([2])(\d) divdesmit $2
([23456789])(\d) $1|desmit $2
1(\d\d) simts $1
(\d)(\d\d) $1 simti $2
1(\d{3}) viens tūkstotis $1
(\d{1,3})(\d{3}) $1 tūkstoši $2
1(\d{6}) viens miljons $1
(\d{1,3})(\d{6}) $1 miljoni $2
1(\d{9}) viens miljards $1
(\d{1,3})(\d{9}) $1 miljardi $2
1(\d{12}) viens triljons $1
(\d{1,3})(\d{12}) $1 triljoni $2
1(\d{15}) viens kvadriljons $1
(\d{1,3})(\d{15}) $1 kvadriljoni $2
1(\d{18}) viens kvintiljons $1
(\d{1,3})(\d{18}) $1 kvintiljoni $2
1(\d{21}) viens sekstiljons $1
(\d{1,3})(\d{21}) $1 sekstiljoni $2
1(\d{24}) viens septiljons $1
(\d{1,3})(\d{24}) $1 septiljoni $2

# negative numbers

[-−](\d+) mīnus |$1

# decimals


([-−]?\d+)[.,] $1| komats
([-−]?\d+[.,]\d*)(\d) $1| |$2


# female conversion
f:(.*)viens viena
f:(.*)i \1as
f:(.*) \1

# currency

# unit/subunit

us:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*) \1
up:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*) \2
ug:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*) \3
ss:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*) \4
sp:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*) \5
sg:([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*) \6

LVL:(\D+) $(\1: lats, lati,latu, santīms, santīmi, santīmu)
EUR:(\D+) $(\1: eiro, eiro, eiro, cents, centi, centu)
RUB:(\D+) $(\1: rublis, rubļi, rubļu, kapeika, kapeikas, kapeiku)
USD:(\D+) $(\1: ASV dolārs, ASV dolāri, ASV dolāru, cents, centi, centu)


"([A-Z]{3}) ([-−]?1)([.,]00?)?" $2| $(\1:us)
"([A-Z]{3}) ([-−]?\d*[02-9]1)([.,]00?)?" $2| $(\1:us)
"([A-Z]{3}) ([-−]?[23456789])([.,]00?)?" $2| $(\1:up)
"([A-Z]{3}) ([-−]?\d*[02-9][23456789])([.,]00?)?" $2| $(\1:up)
"([A-Z]{3}) ([-−]?\d+)([.,]00?)?" $2| $(\1:ug)

"((RUB) [-−]?\d+)[.,]([02-9])1" $1 $(\30) |$(f:$(1)) $(\2:ss)
"((RUB) [-−]?\d+)[.,]([02-9][23456789])" $1 $(f:$3)  $(\2:sp)

"(([A-Z]{3}) [-−]?\d+)[.,](01)" $1 |$(1) $(\2:ss)
"(([A-Z]{3}) [-−]?\d+)[.,]([02-9])1" $1 $(\30) |$(1)  $(\2:ss)

"(([A-Z]{3}) [-−]?\d+)[.,]([02-9][23456789])" $1 |$3 $(\2:sp)

"(([A-Z]{3}) [-−]?\d+)[.,](\d)" $1 |$(\30) $(\2:sg)
"(([A-Z]{3}) [-−]?\d+)[.,](\d\d)" $1 |$3 $(\2:sg)

--Asterisks 23:38, 17 March 2012 (UTC)
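
The currency rules above pick one of three noun forms depending on how the number ends. A Python sketch of that selection, as read from the five unit patterns, is given below; `latvian_form` is a hypothetical name and the function handles non-negative integers only.

```python
def latvian_form(n: int, singular: str, plural: str, genitive: str) -> str:
    """Choose the noun form the way the unit rules above do."""
    last_two = n % 100
    last = n % 10
    if 11 <= last_two <= 19:
        return genitive   # the teens always take the genitive plural
    if last == 1:
        return singular   # 1, 21, 31, ... but not 11
    if 2 <= last <= 9:
        return plural     # 2-9, 22-29, ... but not 12-19
    return genitive       # 0, 10, 20, ...


for amount in (1, 5, 11, 20, 21):
    print(amount, latvian_form(amount, "lats", "lati", "latu"))
```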

Ukrainian language

I look forward to any feedback and Ukrainian language addition.

File:Numbertext uk.txt

Ivanmelwise (talk) 10:29, 9 January 2013 (UTC)

Chinese DAXIE (大写) BUG

There is an error in Chinese zh-ZH-2 (banking writing): digits 2 and 4 are both rendered as 贰. This is correct for 2 but not for 4, which should be 肆.
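
For reference, a Python sketch of the digit table with the two characters in question. It maps digits one by one and leaves out the place-value words, so it is a check of the digit characters only, not a replacement for the module; the names are hypothetical.

```python
# Banking (daxie) digit characters for 0-9, indexed by digit value.
DAXIE_DIGITS = "零壹贰叁肆伍陆柒捌玖"


def daxie_digits(number: str) -> str:
    """Map each decimal digit to its banking character, digit by digit."""
    return "".join(DAXIE_DIGITS[int(digit)] for digit in number)


print(daxie_digits("24"))  # 贰肆
```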

Greek language needs male/female option

The MONEYTEXT function needs some improvements for the Greek language. I can participate or provide more info.
