Bibliographic/Hints and Tips/OOoRISExport.py

From Apache OpenOffice Wiki
Revision as of 02:21, 19 November 2006 by Dnw


Description

OOoRISExport.py is a program to export the contents of the OpenOffice.org Bibliographic database in RIS format (see the RIS Specification, available as a PDF). The OpenOffice database access code is based on Michael Sowka's RISImport.py programme.

OOoRISExport.py works through the OpenOffice.org UNO interface, which means that it must be installed in the openoffice.org program directory and run with the version of Python that is installed as part of OpenOffice.org. OpenOffice.org must also be running for the programme to work. If it is not running, you get an error message like this:

Traceback 
[........]
__main__.com.sun.star.connection.NoConnectException: Connector : couldn't connect to socket (Success)


So to run OOoRISExport.py, change the current directory to

openoffice.org2.0/program

and run

python.sh OOoRISExport.py filename    (for Linux)
       or
python.bat OOoRISExport.py filename   (for Windows, I think; I do not have the Windows version)

OOoRISExport.py takes the path name of the export file as a parameter, so the complete command looks like:

/openoffice.org2.0/program> ./python.sh OOoRISExport.py /temp/test.ris
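A minimal sketch of how the file name argument might be read and checked. The helper get_output_path is hypothetical and not taken from the original program; it only illustrates printing a usage message and stopping when no file name is given.

```python
import sys

def get_output_path(argv):
    """Return the RIS output file path given on the command line.

    Hypothetical helper: the real OOoRISExport.py may handle its
    argument differently.
    """
    if len(argv) != 2:
        # Exactly one argument (the export file) is expected.
        sys.stderr.write("usage: python.sh OOoRISExport.py filename\n")
        sys.exit(1)
    return argv[1]
```

For the command above, get_output_path(sys.argv) would return '/temp/test.ris'.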

Debug

In the program code I have set

DEBUG = True

so that the output can be seen when you run the program via the command line. You can turn this off by setting

DEBUG = False
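One common way to implement such a switch is to route progress messages through a small helper that checks the flag. The debug function below is hypothetical; the original program may simply wrap its print statements in 'if DEBUG:'.

```python
import sys

DEBUG = True  # set to False to silence progress messages

def debug(message, stream=None):
    """Write a progress message, but only when DEBUG is True.

    Hypothetical helper, not taken from OOoRISExport.py.
    """
    if DEBUG:
        # Default to standard output so messages appear on the command line.
        out = stream if stream is not None else sys.stdout
        out.write(message + "\n")
```

Calls such as debug("exporting record Smith2005") then cost nothing to leave in the code once DEBUG is set to False.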

Details

Tag Mapping

The method I have used to convert to the RIS format is very basic. RIS allows many tags to flag data, but the OpenOffice Bibliographic database has only a few columns. Basically the program maps the OOoBib columns (Identifier, Type, Address, etc.) to the RIS tags (ID, TY, CY, etc.) using a dictionary in the program called 'mappings':

mappings = {'Identifier': 'ID', 'Type': 'TY', 'Address': 'CY', 'Annote': 'KW',
            'Author': 'AU', 'Booktitle': 'BT', 'Chapter': 'CT', 'Edition': 'ET',
            'Editor': 'ED', 'Howpublish': 'M2', 'Institutn': 'AD', 'Journal': 'JO',
            'Month': 'Y2', 'Note': 'N1', 'Number': 'IS', 'Organizat': 'AD',
            'Pages': 'EP', 'Publisher': 'PB', 'School': 'AD', 'Series': 'T3',
            'Title': 'TI', 'RepType': 'M3', 'Volume': 'VL', 'Year': 'PY',
            'URL': 'UR', 'Custom1': 'AB', 'Custom2': 'U2', 'Custom3': 'U3',
            'Custom4': 'M2', 'Custom5': 'M3', 'ISBN': 'SN'}
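To show how a table like 'mappings' is typically applied, here is a minimal sketch that turns one record into RIS lines. It assumes the record has already been read from the database into a Python dict keyed by column name (the real program reads it through UNO), it uses only a subset of the tags, and the function names ris_line and map_columns are hypothetical. Each RIS line is the two-letter tag, two spaces, a hyphen, a space, then the value.

```python
# Subset of the 'mappings' table: OOo Bibliographic column -> RIS tag.
MAPPINGS = {'Identifier': 'ID', 'Author': 'AU', 'Title': 'TI',
            'Journal': 'JO', 'Year': 'PY', 'Volume': 'VL', 'URL': 'UR'}

def ris_line(tag, value):
    """Format one RIS line: tag, two spaces, '-', one space, value."""
    return "%s  - %s" % (tag, value)

def map_columns(row, mappings=MAPPINGS):
    """Turn one database row (column name -> value) into RIS lines.

    Columns without a tag, and empty columns, are skipped. 'Type' is
    left out of this subset because it needs a type lookup, not a
    plain tag rename.
    """
    lines = []
    for column, value in row.items():
        tag = mappings.get(column)
        if tag and value not in (None, ""):
            lines.append(ris_line(tag, value))
    return lines
```

For example, map_columns({'Identifier': 'Smith2005', 'Title': 'A Title', 'Journal': ''}) returns ['ID  - Smith2005', 'TI  - A Title']; skipping empty columns keeps the output free of blank tags.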

In particular, I have not added any name processing: whatever is in 'Author' goes into 'AU' unchanged.
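If you want to add name processing, one option is to emit one AU line per author, since the RIS specification expects each author on a separate AU line. The sketch below is hypothetical and assumes the authors in the OOo 'Author' column are separated by semicolons; adjust the separator to match however your database has been filled in.

```python
def split_authors(author_field, separator=";"):
    """Split an OOo 'Author' field into one RIS 'AU' line per author.

    Hypothetical: assumes authors are separated by ';', which is a
    data-entry convention, not something OOo enforces.
    """
    names = [name.strip() for name in author_field.split(separator)]
    # Drop empty pieces left by stray or trailing separators.
    return ["AU  - %s" % name for name in names if name]
```

With this, split_authors("Smith, John; Doe, Jane") gives two lines, 'AU  - Smith, John' and 'AU  - Doe, Jane'.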

If you want, you can modify the two-letter tags to better fit your needs. If you can suggest a better mapping to RIS tags, let me know.

Type Mapping

I had similar problems mapping the document types. I have used the following mapping, which is hardly optimal:

TypeMapping = {0: 'CHAP', 1: 'BOOK', 2: 'PAMP', 3: 'CONF', 4: 'CHAP', 5: 'CHAP',
               6: 'CONF', 7: 'JOUR', 8: 'RPRT', 9: 'THES', 10: 'BOOK', 11: 'THES',
               12: 'CONF', 13: 'RPRT', 14: 'UNPB', 15: 'ICOMM', 16: 'ICOMM',
               17: 'BOOK', 18: 'BOOK', 19: 'BOOK', 20: 'BOOK', 21: 'BOOK'}

The OpenOffice bibliographic type coding is:

ARTICLE = 0; BOOK = 1; BOOKLET = 2; CONFERENCE = 3; INBOOK = 4;
INCOLLECTION = 5; INPROCEEDINGS = 6; JOURNAL = 7; MANUAL = 8;
MASTERSTHESIS = 9; MISC = 10; PHDTHESIS = 11; PROCEEDINGS = 12;
TECHREPORT = 13; UNPUBLISHED = 14; EMAIL = 15; WWW = 16;
CUSTOM1 = 17; CUSTOM2 = 18; CUSTOM3 = 19; CUSTOM4 = 20; CUSTOM5 = 21;

See page 6 of the RIS specification for the full list of RIS document types. I would be happy to get advice on improving this.
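The type code needs a lookup rather than a plain rename, and a RIS record must start with its TY line and finish with an ER line. A minimal sketch with hypothetical helper names ris_type and ris_record: it accepts the type code as an int or a numeric string (an assumption about what the database returns) and falls back to the generic RIS type GEN for codes missing from TypeMapping, a fallback chosen for this sketch rather than taken from the original program.

```python
# Copy of TypeMapping: OOo bibliographic type code -> RIS type.
TYPE_MAPPING = {0: 'CHAP', 1: 'BOOK', 2: 'PAMP', 3: 'CONF', 4: 'CHAP',
                5: 'CHAP', 6: 'CONF', 7: 'JOUR', 8: 'RPRT', 9: 'THES',
                10: 'BOOK', 11: 'THES', 12: 'CONF', 13: 'RPRT', 14: 'UNPB',
                15: 'ICOMM', 16: 'ICOMM', 17: 'BOOK', 18: 'BOOK',
                19: 'BOOK', 20: 'BOOK', 21: 'BOOK'}

def ris_type(ooo_type, default='GEN'):
    """Map an OOo type code (int or numeric string) to a RIS type.

    Unknown or unparsable codes fall back to 'default' (assumed 'GEN').
    """
    try:
        return TYPE_MAPPING.get(int(ooo_type), default)
    except (TypeError, ValueError):
        return default

def ris_record(ooo_type, body_lines):
    """Wrap already formatted tag lines: 'TY' comes first, 'ER' last."""
    return ["TY  - " + ris_type(ooo_type)] + list(body_lines) + ["ER  - "]
```

For example, ris_record(1, ['TI  - A Title']) gives ['TY  - BOOK', 'TI  - A Title', 'ER  - '].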

Character tidying

I had problems with non-printing characters (i.e. CR and LF) messing up the output, so I stripped them out by replacing every whitespace character other than a plain space with a standard space ' '.
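A sketch of that tidying step using Python's re module. The name tidy is hypothetical, and the original code may do the replacement differently (for example with a str.replace call per character).

```python
import re

def tidy(value):
    """Replace every whitespace character (CR, LF, tab, ...) with a plain space.

    Each character is replaced one for one, so CR LF becomes two spaces.
    """
    return re.sub(r"\s", " ", str(value))
```

So tidy("line one\r\nline two") returns "line one  line two", and non-string values such as a year are converted to text first.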

Output character coding

You can set the character coding for the output by changing the setting for

CODEC = 'iso-8859-1'

Some of the possible encodings include utf-8, utf-16, ascii, iso-8859-1 and unicode-escape. The RIS Specification recommends the 'Windows ANSI Character Set', which I think corresponds to Windows code page 1252 ('cp1252' in Python); it matches iso-8859-1 apart from the characters in the range 0x80 to 0x9F. I am not an expert on this, so I will take any advice.
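A minimal sketch of writing the output in the chosen encoding. It is written for Python 3; the older Python bundled with OpenOffice.org would need small changes, since its built-in open() has no encoding argument (codecs.open can be used instead). write_ris is a hypothetical helper, and errors='replace', which turns characters the codec cannot represent into '?' instead of stopping the export, is an assumption rather than the original program's behaviour.

```python
CODEC = 'iso-8859-1'  # alternatives include 'cp1252' and 'utf-8'

def write_ris(path, lines, codec=CODEC):
    """Write RIS lines to path, encoded with the chosen codec.

    Characters the codec cannot encode are written as '?'.
    """
    with open(path, "w", encoding=codec, errors="replace") as f:
        for line in lines:
            f.write(line + "\n")
```

Switching CODEC to 'utf-8' avoids the '?' substitutions entirely, at the cost of output that older Windows tools may not read correctly.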
