Filter Options

From Apache OpenOffice Wiki



Loading and saving Apache OpenOffice API documents is described in Handling Documents. This section lists all the filter names for spreadsheet documents and describes the filter options for text file import.

The filter name and options are passed on loading or saving a document in a sequence of com.sun.star.beans.PropertyValues. The property FilterName contains the name and the property FilterOptions contains the filter options.
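As a rough illustration of how the two properties travel together, the sketch below builds such an argument list. It deliberately avoids a running office instance: the `PropertyValue` class here is a stand-in with only the `Name` and `Value` fields, and `make_filter_args` is a hypothetical helper, not part of the API. In a real UNO script the entries would be com.sun.star.beans.PropertyValue structs.

```python
from dataclasses import dataclass


@dataclass
class PropertyValue:
    """Stand-in for com.sun.star.beans.PropertyValue (Name and Value only)."""
    Name: str
    Value: object


def make_filter_args(filter_name, filter_options=None):
    """Return load/store arguments carrying the filter name and, if given, its options."""
    args = [PropertyValue("FilterName", filter_name)]
    if filter_options is not None:
        args.append(PropertyValue("FilterOptions", filter_options))
    return args


# Filter names are case-sensitive, so they are copied verbatim from the list.
args = make_filter_args("Text - txt - csv (StarCalc)", "44,34,0,1,1/5/2/1/3/1/4/1")
for prop in args:
    print(prop.Name, "=", prop.Value)
```

The resulting list corresponds to the sequence that is handed over when the document is loaded or stored.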

Note: This list is no longer current as of OpenOffice 4.1.x.
Note: All filter names are case-sensitive. For compatibility reasons the filter names will not be changed. Therefore, some of the filters seem to have "curious" names.

The list of filter names (the last two columns show the possible directions of the filters):

Filter name                          Description                             Import  Export
StarOffice XML (Calc)                Standard XML filter
calc_StarOffice_XML_Calc_Template    XML filter for templates
StarCalc 5.0                         The binary format of StarOffice Calc 5.x
StarCalc 5.0 Vorlage/Template        StarOffice Calc 5.x templates
StarCalc 4.0                         The binary format of StarCalc 4.x
StarCalc 4.0 Vorlage/Template        StarCalc 4.x templates
StarCalc 3.0                         The binary format of StarCalc 3.x
StarCalc 3.0 Vorlage/Template        StarCalc 3.x templates
HTML (StarCalc)                      HTML filter
calc_HTML_WebQuery                   HTML filter for external data queries
MS Excel 97                          Microsoft Excel 97/2000/XP
MS Excel 97 Vorlage/Template         Microsoft Excel 97/2000/XP templates
MS Excel 95                          Microsoft Excel 5.0/95
MS Excel 5.0/95                      Different name for the same filter
MS Excel 95 Vorlage/Template         Microsoft Excel 5.0/95 templates
MS Excel 5.0/95 Vorlage/Template     Different name for the same filter
MS Excel 4.0                         Microsoft Excel 2.1/3.0/4.0
MS Excel 4.0 Vorlage/Template        Microsoft Excel 2.1/3.0/4.0 templates
Lotus                                Lotus 1-2-3
Text - txt - csv (StarCalc)          Comma separated values
Rich Text Format (StarCalc)
dBase
SYLK                                 Symbolic Link
DIF                                  Data Interchange Format

Filter Options for Lotus, dBase and DIF Filters

These filters accept an option string that contains the numerical index of the character set to use for single-byte characters, for example 0 for the system character set.

The numerical indexes assigned to the character sets:

Character Set Index
Unknown 0
Windows-1252/WinLatin 1 (Western) 1
Apple Macintosh (Western) 2
DOS/OS2-437/US (Western) 3
DOS/OS2-850/International (Western) 4
DOS/OS2-860/Portuguese (Western) 5
DOS/OS2-861/Icelandic (Western) 6
DOS/OS2-863/Canadian-French (Western) 7
DOS/OS2-865/Nordic (Western) 8
System default 9
Symbol 10
ASCII/US (Western) 11
ISO-8859-1 (Western) 12
ISO-8859-2 (Central European) 13
ISO-8859-3 (Latin 3) 14
ISO-8859-4 (Baltic) 15
ISO-8859-5 (Cyrillic) 16
ISO-8859-6 (Arabic) 17
ISO-8859-7 (Greek) 18
ISO-8859-8 (Hebrew) 19
ISO-8859-9 (Turkish) 20
ISO-8859-14 (Western) 21
ISO-8859-15/EURO (Western) 22
DOS/OS2-737 (Greek) 23
DOS/OS2-775 (Baltic) 24
DOS/OS2-852 (Central European) 25
DOS/OS2-855 (Cyrillic) 26
DOS/OS2-857 (Turkish) 27
DOS/OS2-862 (Hebrew) 28
DOS/OS2-864 (Arabic) 29
DOS/OS2-866/Russian (Cyrillic) 30
DOS/OS2-869/Modern (Greek) 31
DOS/Windows-874 (Thai) 32
Windows-1250/WinLatin 2 (Central European) 33
Windows-1251 (Cyrillic) 34
Windows-1253 (Greek) 35
Windows-1254 (Turkish) 36
Windows-1255 (Hebrew) 37
Windows-1256 (Arabic) 38
Windows-1257 (Baltic) 39
Windows-1258 (Vietnamese) 40
Apple Macintosh (Arabic) 41
Apple Macintosh (Central European) 42
Apple Macintosh/Croatian (Central European) 43
Apple Macintosh (Cyrillic) 44
Not supported: Apple Macintosh (Devanagari) 45
Not supported: Apple Macintosh (Farsi) 46
Apple Macintosh (Greek) 47
Not supported: Apple Macintosh (Gujarati) 48
Not supported: Apple Macintosh (Gurmukhi) 49
Apple Macintosh (Hebrew) 50
Apple Macintosh/Icelandic (Western) 51
Apple Macintosh/Romanian (Central European) 52
Apple Macintosh (Thai) 53
Apple Macintosh (Turkish) 54
Apple Macintosh/Ukrainian (Cyrillic) 55
Apple Macintosh (Chinese Simplified) 56
Apple Macintosh (Chinese Traditional) 57
Apple Macintosh (Japanese) 58
Apple Macintosh (Korean) 59
Windows-932 (Japanese) 60
Windows-936 (Chinese Simplified) 61
Windows-Wansung-949 (Korean) 62
Windows-950 (Chinese Traditional) 63
Shift-JIS (Japanese) 64
GB-2312 (Chinese Simplified) 65
GBT-12345 (Chinese Traditional) 66
GBK/GB-2312-80 (Chinese Simplified) 67
BIG5 (Chinese Traditional) 68
EUC-JP (Japanese) 69
EUC-CN (Chinese Simplified) 70
EUC-TW (Chinese Traditional) 71
ISO-2022-JP (Japanese) 72
ISO-2022-CN (Chinese Simplified) 73
KOI8-R (Cyrillic) 74
Unicode (UTF-7) 75
Unicode (UTF-8) 76
ISO-8859-10 (Central European) 77
ISO-8859-13 (Central European) 78
EUC-KR (Korean) 79
ISO-2022-KR (Korean) 80
JIS 0201 (Japanese) 81
JIS 0208 (Japanese) 82
JIS 0212 (Japanese) 83
Windows-Johab-1361 (Korean) 84
GB-18030 (Chinese Simplified) 85
BIG5-HKSCS (Chinese Traditional) 86
TIS 620 (Thai) 87
KOI8-U (Cyrillic) 88
ISCII Devanagari (Indian) 89
Unicode (Java's modified UTF-8) 90
Adobe Standard 91
Adobe Symbol 92
PT 154 (Windows Cyrillic Asian codepage developed in ParaType) 93
Unicode UCS4 65534
Unicode UCS2 65535
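Because the option string for the Lotus, dBase and DIF filters is just the index written as text, a small lookup is enough to produce it. The sketch below copies a handful of single-byte entries from the table above; the shortened dictionary keys and the helper name `charset_filter_options` are assumptions made for this example only.

```python
# A few single-byte character sets and their indexes, copied from the table above.
CHARSET_INDEX = {
    "Windows-1252": 1,
    "DOS/OS2-437": 3,
    "DOS/OS2-850": 4,
    "ISO-8859-1": 12,
    "ISO-8859-15": 22,
    "Windows-1250": 33,
}


def charset_filter_options(charset_name):
    """Return the FilterOptions string (the index as text) for a known character set."""
    return str(CHARSET_INDEX[charset_name])


print(charset_filter_options("DOS/OS2-850"))  # 4
```

An unknown name raises a KeyError here, which seems preferable to silently passing a wrong index to the filter.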

Filter Options for the CSV Filter

This filter accepts an option string containing five to nine tokens, separated by commas. Tokens 6 to 9 are optional.

Tokens 1 to 5

The following table shows an example string for a file with four columns of type date - number - number - number. In the table the tokens are numbered from (1) to (5). Each token is explained below.

Example Filter Options String for the file format: four columns date-num-num-num

Token                                 In the file                 Token value
(1) Field Separator                   ,                           44
(2) Text Delimiter                    "                           34
(3) Character Set                     System                      0
(4) Number of First Line              line no. 1                  1
(5) Cell Format Codes for the         Column 1: YY/MM/DD = 5      1/5/2/1/3/1/4/1
    four Columns (column/code)        Column 2: Standard = 1
                                      Column 3: Standard = 1
                                      Column 4: Standard = 1

For the filter options above, set the PropertyValue FilterOptions in the load arguments to "44,34,0,1,1/5/2/1/3/1/4/1". There are a number of possible settings for the five tokens.

  1. Field separator(s) as ASCII values. Multiple values are separated by the slash sign ("/"). For example, if the values are separated by semicolons and horizontal tabulators, the token is 59/9. To treat several consecutive separators as one, append the four characters /MRG to the token. If the file contains fixed-width fields, use the three letters FIX instead.
  2. The text delimiter as ASCII value, for example 34 for double quotes and 39 for single quotes.
  3. The character set used in the file as described above.
  4. Number of the first line to convert. The first line in the file has the number 1.
  5. Cell format of the columns. The content of this token depends on the value of the first token.
  • If value separators are used, the form of this token is column/format[/column/format/…] where column is the number of the column, with 1 being the leftmost column. The format is explained below.
  • If the first token is FIX it has the form start/format[/start/format/…], where start is the number of the first character for this field, with 0 being the leftmost character in a line. The format is explained below.
Format specifies which cell format should be used for a field during import:
Format Code Meaning
1 Standard
2 Text
3 MM/DD/YY
4 DD/MM/YY
5 YY/MM/DD
6 -
7 -
8 -
9 ignore field (do not import)
10 US-English
The type code 10 indicates that the content of a field is US-English. This is useful if a field contains decimal numbers that are formatted according to the US system (using "." as decimal separator and "," as thousands separator). Using 10 as a format specifier for this field tells Apache OpenOffice API to correctly interpret its numerical content, even if the decimal and thousands separator in the current language are different.
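The rules for tokens 1 to 5 can be sketched as a small string builder. This is only an illustration of the token layout described above: the function names and parameters are hypothetical, and the builder does not validate character set indexes or format codes.

```python
def separator_token(separators=",", merge=False, fixed_width=False):
    """Token 1: ASCII codes joined by '/', optionally followed by /MRG, or FIX."""
    if fixed_width:
        return "FIX"
    token = "/".join(str(ord(ch)) for ch in separators)
    return token + "/MRG" if merge else token


def format_token(formats):
    """Token 5: (column or start position, format code) pairs joined by '/'."""
    return "/".join(f"{position}/{code}" for position, code in formats)


def csv_options(separators=",", delimiter='"', charset=0, first_line=1,
                formats=(), merge=False, fixed_width=False):
    """Join tokens 1 to 5 with commas into a FilterOptions string."""
    tokens = [
        separator_token(separators, merge, fixed_width),
        str(ord(delimiter)),   # token 2: text delimiter as ASCII value
        str(charset),          # token 3: character set index
        str(first_line),       # token 4: first line to convert, counted from 1
        format_token(formats), # token 5: cell formats
    ]
    return ",".join(tokens)


# The date-num-num-num example: column 1 is YY/MM/DD (5), the rest Standard (1).
print(csv_options(formats=[(1, 5), (2, 1), (3, 1), (4, 1)]))
# 44,34,0,1,1/5/2/1/3/1/4/1

# Semicolon and tab as separators, consecutive separators merged.
print(csv_options(separators=";\t", merge=True))
# 59/9/MRG,34,0,1,
```

For fixed-width files the same `formats` argument would carry start positions (counted from 0) instead of column numbers.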

Token 6 : Language identifier

This token is the equivalent of the "Language" listbox in the user interface for csv import.
It is a String expressed in decimal notation. If the value is 0 or omitted, the language identifier of the user interface is used.

The language identifier is based on the Microsoft language identifiers. For further information, see:

Language Identifier Constants and Strings ==DEPRECATED==
https://msdn.microsoft.com/en-us/library/windows/desktop/dd318693%28v=vs.85%29.aspx

Use decimal notation. For example, English (US) is 1033, whereas the Microsoft documentation uses the hexadecimal notation 0x0409.
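Since the Microsoft tables list the identifiers in hexadecimal, a one-line conversion gives the decimal text that token 6 expects. The helper name `language_token` is an assumption for this sketch.

```python
def language_token(hex_identifier):
    """Convert a hexadecimal language identifier such as '0x0409' to decimal text."""
    return str(int(hex_identifier, 16))


print(language_token("0x0409"))  # 1033 (English, US)
print(language_token("0x0407"))  # 1031 (German)
```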

Token 7, csv import

This token is the equivalent of the check box "Quoted field as text".

String, either false or true. Default value : false.

Token 7, csv export

This token is the equivalent of the check box "Quote all text cells".

String, either false or true. Default value : false.

Token 8, csv import

This token is the equivalent of the check box "Detect special numbers".

String, either false or true. Default value : false.

Token 8, csv export

This token has no UI equivalent. If true, the number cells are stored as numbers. If false, the numbers are stored as text, with text delimiters.

String, either false or true. Default value : true.

Token 9, csv import

Not used : only 8 tokens are used.

Token 9, csv export

This token is the equivalent of the check box "Save cell contents as shown".

String, either false or true. Default value : true.
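Tokens 6 to 9 differ between import and export, so it can help to build the two tails separately. The following sketch uses the defaults stated above; the function and parameter names are hypothetical, and the first five tokens are passed in as a ready-made string.

```python
def bool_token(flag):
    """Render a Python bool as the lowercase string the filter expects."""
    return "true" if flag else "false"


def import_options(first_five, language=0, quoted_field_as_text=False,
                   detect_special_numbers=False):
    """Append tokens 6 to 8 for csv import (token 9 is not used on import)."""
    return ",".join([first_five, str(language),
                     bool_token(quoted_field_as_text),
                     bool_token(detect_special_numbers)])


def export_options(first_five, language=0, quote_all_text_cells=False,
                   numbers_as_numbers=True, save_as_shown=True):
    """Append tokens 6 to 9 for csv export."""
    return ",".join([first_five, str(language),
                     bool_token(quote_all_text_cells),
                     bool_token(numbers_as_numbers),
                     bool_token(save_as_shown)])


print(import_options("44,34,76,1,", 1031, True, True))
# 44,34,76,1,,1031,true,true
print(export_options("44,34,ANSI,1,"))
# 44,34,ANSI,1,,0,false,true,true
```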

Examples

Import from UTF-8, language German, comma separated, text delimiter ", quoted field as text, detect special numbers:
44,34,76,1,,1031,true,true

Export to Windows-1252, field delimiter: comma, text delimiter: double quote, save cell contents as shown:
44,34,ANSI,1,,0,false,true,true
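To check an option string by eye, it can be split back into named tokens. The sketch below does this for the two example strings; the token names are informal labels chosen for this example and do not appear in the API.

```python
COMMON = ["field_separator", "text_delimiter", "character_set",
          "first_line", "cell_formats", "language"]
IMPORT_NAMES = COMMON + ["quoted_field_as_text", "detect_special_numbers"]
EXPORT_NAMES = COMMON + ["quote_all_text_cells", "numbers_as_numbers",
                         "save_cell_contents_as_shown"]


def describe(options, names):
    """Map each comma-separated token to its label; tokens past the fifth are optional."""
    tokens = options.split(",")
    if not 5 <= len(tokens) <= len(names):
        raise ValueError(f"expected 5 to {len(names)} tokens, got {len(tokens)}")
    return dict(zip(names, tokens))


imported = describe("44,34,76,1,,1031,true,true", IMPORT_NAMES)
exported = describe("44,34,ANSI,1,,0,false,true,true", EXPORT_NAMES)
print(imported["language"], imported["quoted_field_as_text"])
print(exported["character_set"], exported["save_cell_contents_as_shown"])
```

Splitting on commas is safe because a comma used as field separator appears as its ASCII value 44, never as a literal character inside a token.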


Content on this page is licensed under the Public Documentation License (PDL).