CalcParser

From Apache OpenOffice Wiki
Revision as of 17:48, 5 April 2007 by Jza


So this code's mission in life is to parse the XML of an OpenOffice.org spreadsheet. The code still needs more work; however, it works with Python 2.3.4, which is the version included in the OpenOffice.org installation.

The code uses SAX (Simple API for XML), an event-driven parsing API that was originally written for Java and later ported to other languages such as Python.
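Because SAX is event-driven, the parser never builds a document tree; it calls methods on a handler object as it reads the input. As a minimal sketch (written for a current Python 3 interpreter, since the bytes literal is not valid in the bundled Python 2.3.4), the hypothetical `EventLogger` handler just records each callback so you can see the order in which they fire:

```python
import xml.sax
from xml.sax import handler


class EventLogger(handler.ContentHandler):
    """Hypothetical handler: record each SAX callback as a tuple, in firing order."""

    def __init__(self):
        handler.ContentHandler.__init__(self)
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


logger = EventLogger()
xml.sax.parseString(b"<row><cell>A1</cell><cell>B1</cell></row>", logger)
for event in logger.events:
    print(event)
```

Each element produces a start and an end event, and the text in between arrives through `characters()`. One text node may be delivered in several `characters()` calls, which is why a handler should buffer the pieces rather than assume it gets the whole string at once.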

The first thing I faced here was that the SAX library changed between versions: the examples I researched before creating this code were for Python 2.4 and up, which wasn't compatible with 2.3.

Also, some of that code was written for the Excel XML file schema, and I needed to switch it to the OpenDocument spreadsheet XML.
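Switching schemas mostly means changing which element names the handler reacts to. As a sketch for Python 3, the hypothetical `TableHandler` takes the row and cell tag names as parameters and runs the same logic over two tiny made-up fragments: one shaped like Excel 2003's XML Spreadsheet format (`Row` and `Cell` elements, namespace declarations trimmed for brevity) and one shaped like OpenDocument (`table:table-row` and `table:table-cell`). Neither fragment is a real export.

```python
import xml.sax
from xml.sax import handler


class TableHandler(handler.ContentHandler):
    """Hypothetical handler: collect rows of cell text for configurable tag names."""

    def __init__(self, row_tag, cell_tag):
        handler.ContentHandler.__init__(self)
        self.row_tag = row_tag
        self.cell_tag = cell_tag
        self.chars = []  # text pieces of the current cell
        self.cells = []  # cell strings of the current row
        self.rows = []   # finished rows

    def characters(self, content):
        self.chars.append(content)

    def startElement(self, name, attrs):
        if name == self.cell_tag:
            self.chars = []
        elif name == self.row_tag:
            self.cells = []

    def endElement(self, name):
        if name == self.cell_tag:
            self.cells.append("".join(self.chars))
        elif name == self.row_tag:
            self.rows.append(self.cells)


# Made-up Excel-style fragment (namespace declarations trimmed).
excel_xml = b"<Table><Row><Cell><Data>A1</Data></Cell><Cell><Data>B1</Data></Cell></Row></Table>"

# Made-up OpenDocument-style fragment, with the real ODF namespace URIs.
ods_xml = (
    b'<table:table xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"'
    b' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    b"<table:table-row><table:table-cell><text:p>A1</text:p></table:table-cell>"
    b"<table:table-cell><text:p>B1</text:p></table:table-cell></table:table-row>"
    b"</table:table>"
)

excel = TableHandler("Row", "Cell")
xml.sax.parseString(excel_xml, excel)

ods = TableHandler("table:table-row", "table:table-cell")
xml.sax.parseString(ods_xml, ods)

print(excel.rows)  # [['A1', 'B1']]
print(ods.rows)    # [['A1', 'B1']]
```

Text inside nested elements (`Data` in one case, `text:p` in the other) is still collected, because the buffer is only reset when a new cell starts.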

The incompatibility problem was primarily with the handlers: the DefaultHandler originally used was deprecated in this library, and handlers are provided by a separate sub-module called handler (xml.sax.handler). The base class also changed from DefaultHandler to ContentHandler. Please check the comments in the code.
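In current Python versions, `xml.sax` re-exports the `ContentHandler` class defined in `xml.sax.handler`, so importing it through the handler module names the same base class. A minimal sketch of the subclassing pattern for Python 3 follows; `CellCounter` is a hypothetical handler, and it calls the base-class `__init__` so the parent's own setup still runs.

```python
import xml.sax
from xml.sax import handler

# Both spellings refer to one class in the standard library.
assert xml.sax.ContentHandler is handler.ContentHandler


class CellCounter(handler.ContentHandler):
    """Hypothetical handler: count table:table-cell elements."""

    def __init__(self):
        handler.ContentHandler.__init__(self)  # keep the base-class setup
        self.count = 0

    def startElement(self, name, attrs):
        if name == "table:table-cell":
            self.count += 1


counter = CellCounter()
xml.sax.parseString(
    b'<table:table-row xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0">'
    b"<table:table-cell/><table:table-cell/></table:table-row>",
    counter,
)
print(counter.count)  # 2
```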

[python]

#!/usr/bin/env python

import sys
from xml.sax import saxutils  # originally in Python 2.4
from xml.sax import parse
from xml.sax import handler   # Python 2.3 uses the handler module for ContentHandler

The next, more interesting step was to insert the proper tag names into the handler. A SAX content handler relies on three callbacks by default: startElement, endElement and characters.
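The tag names these callbacks receive depend on a parser setting. With Python's defaults, namespace processing is off, so `startElement` gets the prefixed name exactly as written (for example `"table:table-cell"`), which is what a comparison like `name=="table:table-cell"` relies on. With namespace processing turned on, the parser calls `startElementNS` with a `(namespace URI, local name)` pair instead. This sketch for Python 3 shows both; the handler class names are hypothetical, and the URI is the OpenDocument table namespace.

```python
import io
import xml.sax
from xml.sax import handler

ODS_TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
DOC = (
    '<table:table-row xmlns:table="%s"><table:table-cell/></table:table-row>'
    % ODS_TABLE_NS
).encode("utf-8")


class QNameHandler(handler.ContentHandler):
    """Default mode: startElement receives the prefixed name as written."""

    def __init__(self):
        handler.ContentHandler.__init__(self)
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


class NSHandler(handler.ContentHandler):
    """Namespace mode: startElementNS receives a (uri, localname) pair."""

    def __init__(self):
        handler.ContentHandler.__init__(self)
        self.names = []

    def startElementNS(self, name, qname, attrs):
        self.names.append(name)


# Default parser settings: namespace processing is off.
plain = QNameHandler()
xml.sax.parseString(DOC, plain)
print(plain.names)

# Turn namespace processing on before parsing.
ns_handler = NSHandler()
parser = xml.sax.make_parser()
parser.setFeature(handler.feature_namespaces, True)
parser.setContentHandler(ns_handler)
parser.parse(io.BytesIO(DOC))
print(ns_handler.names)
```

Matching on prefixed names is simpler but assumes the file uses the `table` prefix, as OpenOffice.org's own files do; matching on `(URI, local name)` pairs would also accept a file written with a different prefix.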

[python]

#!/usr/bin/env python

import sys
from xml.sax import saxutils
from xml.sax import parse
from xml.sax import handler  # Python 2.3 uses the handler module

# Replace DefaultHandler with ContentHandler
# from the handler module
class CalcHandler(handler.ContentHandler):

    def __init__(self):
        handler.ContentHandler.__init__(self)
        self.chars = []  # text pieces of the current cell
        self.cells = []  # cell strings of the current row
        self.rows = []   # finished rows

    def characters(self, content):
        # may be called several times per cell, so buffer the pieces
        self.chars.append(content)

    def startElement(self, name, atts):
        if name == "table:table-cell":
            self.chars = []
        elif name == "table:table-row":
            self.cells = []

    def endElement(self, name):
        if name == "table:table-cell":
            self.cells.append(''.join(self.chars))
        elif name == "table:table-row":
            self.rows.append(self.cells)

calcHandler = CalcHandler()
parse(sys.argv[1], calcHandler)
print(calcHandler.rows)
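The script expects `sys.argv[1]` to name an XML file. An OpenDocument spreadsheet (`.ods`) is a ZIP archive whose sheet data lives in `content.xml`, so one option is to read that member with `zipfile` and hand the bytes to the same handler logic. This is a sketch for Python 3, not the page's own method: `rows_from_ods` is a hypothetical helper, and the in-memory archive contains only `content.xml`, unlike a real `.ods` file, which also carries a manifest, styles and other parts.

```python
import io
import zipfile
import xml.sax
from xml.sax import handler


class CalcHandler(handler.ContentHandler):
    """Same logic as the handler on this page: rows of cell strings."""

    def __init__(self):
        handler.ContentHandler.__init__(self)
        self.chars = []
        self.cells = []
        self.rows = []

    def characters(self, content):
        self.chars.append(content)

    def startElement(self, name, atts):
        if name == "table:table-cell":
            self.chars = []
        elif name == "table:table-row":
            self.cells = []

    def endElement(self, name):
        if name == "table:table-cell":
            self.cells.append("".join(self.chars))
        elif name == "table:table-row":
            self.rows.append(self.cells)


def rows_from_ods(path_or_file):
    """Hypothetical helper: read content.xml out of an .ods ZIP and parse it."""
    with zipfile.ZipFile(path_or_file) as archive:
        content = archive.read("content.xml")
    calc_handler = CalcHandler()
    xml.sax.parseString(content, calc_handler)
    return calc_handler.rows


# Tiny stand-in .ods built in memory (content.xml only).
CONTENT = (
    b"<office:document-content"
    b' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    b' xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"'
    b' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    b'<office:body><office:spreadsheet><table:table table:name="Sheet1">'
    b"<table:table-row><table:table-cell><text:p>Name</text:p></table:table-cell>"
    b"<table:table-cell><text:p>Qty</text:p></table:table-cell></table:table-row>"
    b"<table:table-row><table:table-cell><text:p>Apples</text:p></table:table-cell>"
    b"<table:table-cell><text:p>3</text:p></table:table-cell></table:table-row>"
    b"</table:table></office:spreadsheet></office:body></office:document-content>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as out:
    out.writestr("content.xml", CONTENT)
buffer.seek(0)

print(rows_from_ods(buffer))  # [['Name', 'Qty'], ['Apples', '3']]
```

Real files often compress runs of repeated or empty cells with the `table:number-columns-repeated` attribute, which this handler does not expand, so such a run shows up as a single entry in the row.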
