Difference between revisions of "Import of Hindi numbers from Microsoft Word documents"

From Apache OpenOffice Wiki
== Open Issues ==

* It seems that this specification should also cater to the needs of Persian and Urdu users.
*: Farsi and Urdu use a slightly different form of Hindi numerals; see e.g. http://behdad.org/download/Publications/persiancomputing/a007.pdf and http://www.microsoft.com/middleeast/arabicdev/windows/winxp/DigitsSupport.aspx. According to the Microsoft source, Farsi uses different forms for 4, 5 and 6, and Urdu uses different forms for 4, 5, 6 and 7. These digits are Unicode U+06F0..U+06F9 (where the Arabic "regular" Hindi numerals are U+0660..U+0669). I could not find different code points mentioned for Urdu numerals, and in fact I found references which made Urdu digits the same as the Persian ones. -- [[User:Shai2platonix|Shai2platonix]] 01:27, 22 April 2008 (CEST)

Latest revision as of 15:59, 23 June 2008

Specification Status
Author        Henning Brinkmann
Last Change   17.09.2007
Status        Preliminary

Abstract

Microsoft Word tags each number with a hint that names the script to use. In addition, Word offers an option to display numbers as Hindi, as Arabic, by context, or as determined by the system. This specification defines how the script hint and the display option shall be handled when Microsoft Word documents are imported.

References

Reference Document                              Check           Location (URL)
Specification Process Entry Check               passed          n/a
Product Requirement, RFE, Issue ID (required)   available       [1]
Product Concept Document                        not available
Test case specification (required)              not available
IDL Specification                               not available
Software Specification Rules                    n/a             n/a
Other, e.g. references to related specs

Contacts

Role                Name                E-Mail Address
Developer           Henning Brinkmann   Henning.Brinkmann@sun.com
Quality Assurance   Michael Rüß         Michael.Ruess@sun.com
Documentation       Uwe Fischer         Uwe.Fischer@sun.com
User Experience     (not specified)     (not specified)

Detailed Specification

When a digit in the imported Word document is marked as CTL (complex text layout) script, it shall be imported as a Hindi digit if and only if the bidi language is one of the languages listed below.


Language                Language Code
Arabic (Algeria)        0x1401
Arabic (Bahrain)        0x3c01
Arabic (Egypt)          0xc01
Arabic (Iraq)           0x801
Arabic (Jordan)         0x2c01
Arabic (Kuwait)         0x3401
Arabic (Lebanon)        0x3001
Arabic (Libya)          0x1001
Arabic (Morocco)        0x1801
Arabic (Oman)           0x2001
Arabic (Qatar)          0x4001
Arabic (Saudi Arabia)   0x401
Arabic (Syria)          0x2801
Arabic (Tunisia)        0x1c01
Arabic (U.A.E.)         0x3801
Arabic (Yemen)          0x2401


This feature shall be active if and only if the configuration item RegardHindiDigits (see below) is true.
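The import condition described above can be sketched as a small predicate. This is only an illustration in Python, not the Writer import filter's actual code; the names `HINDI_DIGIT_BIDI_LANGUAGES` and `should_import_as_hindi` and the three parameters are hypothetical, and the language codes are copied from the table above.

```python
# Language codes from the table in this specification (Arabic locales).
HINDI_DIGIT_BIDI_LANGUAGES = frozenset({
    0x1401,  # Arabic (Algeria)
    0x3C01,  # Arabic (Bahrain)
    0x0C01,  # Arabic (Egypt)
    0x0801,  # Arabic (Iraq)
    0x2C01,  # Arabic (Jordan)
    0x3401,  # Arabic (Kuwait)
    0x3001,  # Arabic (Lebanon)
    0x1001,  # Arabic (Libya)
    0x1801,  # Arabic (Morocco)
    0x2001,  # Arabic (Oman)
    0x4001,  # Arabic (Qatar)
    0x0401,  # Arabic (Saudi Arabia)
    0x2801,  # Arabic (Syria)
    0x1C01,  # Arabic (Tunisia)
    0x3801,  # Arabic (U.A.E.)
    0x2401,  # Arabic (Yemen)
})


def should_import_as_hindi(is_ctl_script: bool,
                           bidi_language: int,
                           regard_hindi_digits: bool) -> bool:
    """Return True if a digit should be imported as a Hindi digit.

    Hypothetical helper: all three conditions from the specification
    must hold (configuration item enabled, digit marked as CTL script,
    bidi language in the list of Arabic locales).
    """
    return (regard_hindi_digits
            and is_ctl_script
            and bidi_language in HINDI_DIGIT_BIDI_LANGUAGES)
```

For example, a CTL digit with bidi language 0xc01 (Egypt) would qualify only while the configuration item is true; the same digit with a bidi language outside the table would be left unchanged.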

If the configuration item RegardHindiDigits is set to true, the following mapping between Arabic and Hindi characters applies:

Arabic (Unicode) Hindi (Unicode)
0 (U+0030) ٠ (U+0660)
1 (U+0031) ١ (U+0661)
2 (U+0032) ٢ (U+0662)
3 (U+0033) ٣ (U+0663)
4 (U+0034) ٤ (U+0664)
5 (U+0035) ٥ (U+0665)
6 (U+0036) ٦ (U+0666)
7 (U+0037) ٧ (U+0667)
8 (U+0038) ٨ (U+0668)
9 (U+0039) ٩ (U+0669)
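The mapping table above amounts to a fixed offset between two runs of ten code points. A minimal Python sketch, assuming the text to convert has already been identified as eligible (the names `ARABIC_TO_HINDI` and `to_hindi_digits` are hypothetical and not part of the import filter):

```python
# Map U+0030..U+0039 to U+0660..U+0669, one entry per digit.
ARABIC_TO_HINDI = {ord("0") + i: 0x0660 + i for i in range(10)}


def to_hindi_digits(text: str) -> str:
    """Replace each ASCII digit with its Hindi counterpart.

    Characters that are not ASCII digits are passed through unchanged.
    """
    return text.translate(ARABIC_TO_HINDI)
```

Calling `to_hindi_digits("2008")` yields the four characters U+0662, U+0660, U+0660, U+0668.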

Migration

The specified feature improves interoperability with Microsoft Word.

Configuration

Configuration   Group                 Setting             Type      Default   Comment
Writer.xcs      FilterFlags/WinWord   RegardHindiDigits   xs:long   false     If true, digits marked as CTL script are imported as Hindi digits.

File Format

This specification covers import only and thus has no consequences regarding the file format.


Open Issues

Urdu (and Sindhi) uses the same Unicode code points for Extended Arabic-Indic (aka Persian) digits but has some glyph variation that is selected at the font rather than encoding level (using OpenType lang features and so), see Unicode book, Ch. 8.2 Arabic. --Khaled Hosny 17:59, 23 June 2008 (CEST)
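To illustrate the point about code points, the sketch below maps ASCII digits to the Extended Arabic-Indic block (U+06F0..U+06F9) that Persian, Urdu and Sindhi share. It is not part of this specification; the name `to_extended_arabic_indic` is hypothetical, and the glyph differences for Urdu and Sindhi are a font matter that no code-point mapping can express.

```python
import unicodedata

# Map U+0030..U+0039 to U+06F0..U+06F9 (Extended Arabic-Indic digits).
ASCII_TO_EXTENDED = {ord("0") + i: 0x06F0 + i for i in range(10)}


def to_extended_arabic_indic(text: str) -> str:
    """Replace each ASCII digit with its Extended Arabic-Indic counterpart."""
    return text.translate(ASCII_TO_EXTENDED)


# Both digit blocks carry the same numeric values in the Unicode database;
# only the code points (and the glyphs) differ.
assert unicodedata.digit("\u0664") == unicodedata.digit("\u06F4") == 4
```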