Calc/Features/Numbers import for plain text files

From Apache OpenOffice Wiki
< Calc‎ | Features
Jump to: navigation, search

Numbers import for plain text files

Specification Status
Author Kohei Yoshida
Last Change See wiki history
Status In progress in CWS koheicsvimport

Abstract

The CSV import options dialog has been enhanced to provide a language option, and two additional options for numbers import. In addition, the dialog's state is now stored persistently in user configuration. Calc's HTML import filter now provides a language option and an option of whether or not to detect special numbers in its new import options dialog.

References

Reference Document Check Location (URL)
Issue ID (required) available Issue 3687 Issue 97416 Issue 102141
Test case specification (required) n/a

Contacts

Role Name E-Mail Address
Developer Kohei Yoshida kyoshida@novell.com
Quality Assurance Oliver Craemer oliver.craemer@sun.com
Documentation up for grabs
User Experience up for grabs

Detailed Specification

CSV import options dialog

Modified CSV import options dialog with new controls.

The CSV import options dialog now has a Language list box to allow users to specify a non-default language to use for CSV import. How this language selection influences how numbers in the CSV document are imported is explained later in this section.

In Addition, the dialog also provides the following two additional check boxes in the Other options section:

  • Quoted field as text
  • Detect special numbers

Likewise, how these options influence numbers import is explained later in this section.

Persistence of dialog's state

The states of this dialog's controls (except for the character set) are stored persistently in user configuration so that the subsequent launch of this dialog restores the previous settings. The states of the dialog controls are saved even after the application process terminates.

HTML import options dialog

Import options dialog for HTML import.

An HTML import options dialog allows users to specify a non-default language, which influences how numbers are parsed, as well as an option of enabling or disabling special number detection. How this language selection and the special number detection option influence numbers import is explained later in this section.

Options for parsing numbers

Language

Language (and regions in case the language is associated with multiple regions) determines how the number strings are parsed during import. If the language option is set to Default (in the CSV Options dialog) or Automatic (in the HTML Import Options dialog), Calc will use the language that OOo uses globally. If the language option is set to a specific language, that language will be used when parsing numbers.

Documentation caution.png Note that, when importing an HTML document, this option may conflict with the global HTML option Use 'English (USA)' locale for numbers in the HTML Compatibility page of the Options dialog. This global option is effective only when the Automatic language option is selected; it is ignored when the user chooses a specific language in the HTML import options dialog.

Detect special numbers

When this option is enabled, Calc will automatically detect all number formats, including special number formats such as dates, time, and scientific notation. The selected language influences how such special numbers are detected, since different languages and regions many have different conventions for such special numbers.

When this option is disabled, Calc will detect and convert decimal numbers only while the rest will be imported as texts. A decimal number string can have digits 0-9, thousands separators (aka group separators), and a decimal separator. A decimal number is not allowed to have more than one decimal separator. Thousands separators and decimal separators may vary with the selected language and region. The term decimal number in this instance does not include scientific notation. When this option is disabled, numbers formatted in scientific notation will be imported as text.

Quoted field as text (CSV import only)

When this option is enabled, fields or cells whose values are quoted in their entirety (i.e. the first and last characters of the value equal the text delimiter character specified in the same dialog) are imported as texts irregardless of what their contents are.

Migration

N/A

Configuration

This feature introduces the following new configuration nodes under the Calc/Dialogs/CSVImport node path, to persistently store the states of controls in the CSV import options dialog.

Name Type Description
MergeDelimiters boolean status of Merge delimiters check box
QuotedFieldAsText boolean status of Quoted field as text check box
DetectSpecialNumbers boolean status of Detect special numbers check box
Language int Selected language. The number corresponds with the internal ID of the selected language.
Separators string the character that separates the fields.
TextSeparators string the text delimiter character used to quote texts.
FixedWidth boolean status of whether the Fixed width or Separated by radio box is checked.
FromRow int ID of the row where the data import begins.
CharSet int Numerical ID of the selected character set.
FixedWidthList string Set of numerical column positions where the fixed width separators are placed. The column positions are separated by semicolons (;).

File Format

N/A

Open Issues

Not directly caused by this feature, but opening an HTML file from File - Open opens the file in Writer, even if the file is opened from Calc's application frame. On the other hand, when opening an HTML file from the command line by scalc command correctly launches Calc to import the file.

Credits

The dialog state persistence feature was contributed by Muthu Subramanian.

Personal tools