Difference between revisions of "Bibliographic/Hints and Tips/OOoRISExport.py"

From Apache OpenOffice Wiki

Revision as of 05:59, 18 November 2006

Description

OOoRISExport.py is a program to export the OpenOffice Bibliographic database contents in RIS format (RIS Specification PDF). The OpenOffice database access code is based on Michael Sowka's RISImport.py programme.

OOoRISExport.py works with the OpenOffice.org UNO, which means that it must be installed in the openoffice.org program directory and use the version of Python that is installed as part of OpenOffice. OpenOffice must also be running for the programme to work. If OpenOffice is not running you get an error message like this:

Saving RIS file: /home/dnw/test/test.ris
Traceback (most recent call last):
  File "/home/dnw/test/python/OOoRSIExport-1.py", line 203, in ?
    main(filename)
  File "/home/dnw/test/python/OOoRSIExport-1.py", line 115, in main
    ctx = resolver.resolve( url )
__main__.com.sun.star.connection.NoConnectException: Connector : couldn't connect to socket (Success)


So to run OOoRISExport.py you need to change the current directory to

openoffice.org2.0/program

and run

python.sh OOoRISExport.py filename (for Linux)
       or
python.bat OOoRISExport.py filename (for Windows, I think; I do not have the Windows version)

OOoRISExport.py takes the export file path name as a parameter, so the complete command looks like

dnw:/opt/openoffice.org2.0/program> ./python.sh /home/dnw/test/python/OOoRSIExport-1.py /home/dnw/test/test.ris

Details

Tag Mapping

The conversion method I have used to the RIS format is very basic. RIS allows many tags to flag data, but the OpenOffice Bibliographic database has only a few database columns. Basically the program maps the OOoBib columns (Identifier, Type, Address etc.) to the RIS tags (ID, TY, CY etc.) using the dictionary structure called 'mappings':

mappings = {'Identifier':'ID','Type':'TY','Address':'CY','Annote':'KW',
	'Author':'AU','Booktitle':'BT','Chapter':'CT','Edition':'ET',
	'Editor':'ED','Howpublish':'M2','Institutn':'AD','Journal':'JO',
	'Month':'Y2','Note':'N1','Number':'IS','Organizat':'AD',
	'Pages':'EP','Publisher':'PB','School':'AD','Series':'T3',
	'Title':'TI','RepType':'M3','Volume':'VL','Year':'PY',
	'URL':'UR','Custom1':'AB','Custom2':'U2','Custom3':'U3','Custom4':'M2',
	'Custom5':'M3','ISBN':'SN'}

In particular, I have not added name processing: whatever is in 'Author' goes into 'AU'.

If you want, you can modify the two-letter tags to better fit your needs. If you can suggest a better mapping to RIS tags, let me know.
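
To illustrate how a mapping like this can be applied, here is a minimal sketch that turns one database row into RIS lines. It is not taken from OOoRISExport.py: the helper name record_to_ris, the sample record, and the shortened mappings table are all hypothetical. It also assumes the 'Type' column already holds a RIS type such as BOOK, which may not match what the database actually stores. The 'TAG  - value' line layout, the TY tag coming first, and the closing ER tag follow the RIS format.

```python
# Hypothetical helper, not part of OOoRISExport.py.
# Only a subset of the full 'mappings' table is repeated here.
mappings = {'Identifier': 'ID', 'Type': 'TY', 'Author': 'AU',
            'Title': 'TI', 'Year': 'PY'}


def record_to_ris(record, mappings):
    """Return the RIS lines for one record given as a column -> value dict."""
    lines = []
    # RIS expects the TY tag to open each record.
    # Assumption: record['Type'] already holds a RIS type such as 'BOOK'.
    if record.get('Type'):
        lines.append('TY  - %s' % record['Type'])
    # Sort by column name so the output order is predictable.
    for column, tag in sorted(mappings.items()):
        value = record.get(column)
        if column == 'Type' or not value:
            continue  # TY is already written; empty columns are skipped
        lines.append('%s  - %s' % (tag, value))
    # ER closes the record.
    lines.append('ER  - ')
    return lines


# Made-up sample record for illustration only.
example = {'Identifier': 'Smith2006', 'Type': 'BOOK',
           'Author': 'Smith, J.', 'Title': 'An Example', 'Year': '2006'}
ris_lines = record_to_ris(example, mappings)
```

Changing a two-letter tag in the dictionary is then enough to change where a column ends up in the output.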

Character tidying

I had problems with non-alphanumeric characters (e.g. CR, LF) messing up the output, so I stripped them out by replacing all types of whitespace with standard spaces - ' '.
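
One way to do this kind of tidying is with a regular expression. This is only a sketch of the idea, not the code used in OOoRISExport.py; the function name tidy is hypothetical, and collapsing runs of whitespace into a single space and trimming the ends are extra choices made here.

```python
import re


def tidy(text):
    """Replace every run of whitespace (CR, LF, tab, ...) with one space."""
    # \s matches any whitespace character, including CR and LF.
    return re.sub(r'\s+', ' ', text).strip()


cleaned = tidy('A title\r\nsplit\tover lines ')
```

Each RIS field then stays on a single line, which is what line-oriented readers of the format expect.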

Output character coding

You can set the character coding for the output by changing the setting for

CODEC = 'iso-8859-1'

Some of the possible encodings include utf-8, utf-16, ascii, iso-8859-1, unicode-escape. The RIS Specification recommends the 'Windows ANSI Character Set', which I guess is iso-8859-1 (on Western European Windows the ANSI code page is cp1252, which is closely related). I am not an expert on this, so I will take any advice.
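
Codec names are easy to mistype, so it can help to check a name before using it. The sketch below is an illustration only, written for a current Python rather than the one bundled with OpenOffice; the helper names is_known_codec and encode_line are hypothetical. It uses codecs.lookup from the standard library, which raises LookupError for an unknown name, and the 'replace' error handler, which substitutes '?' for characters the chosen codec cannot represent.

```python
import codecs

CODEC = 'iso-8859-1'  # Latin-1 as named in Python's codec registry


def is_known_codec(name):
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def encode_line(line, codec=CODEC):
    """Encode one output line; unrepresentable characters become '?'."""
    return line.encode(codec, 'replace')


encoded = encode_line(u'AU  - M\u00fcller')
```

Using 'replace' avoids an exception halfway through the export, at the cost of silently losing characters that the codec does not cover, so utf-8 may be the safer setting if the importing program accepts it.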
