Odt2txt.py

From Apache OpenOffice Wiki
Revision as of 04:42, 9 June 2007 by Jza (Talk | contribs)

Official page: http://www.freewisdom.org/projects/python-markdown/odt2txt.php

odt2txt.py is a Python script that converts OpenDocument Text (ODT) files to plain text. The output is marked up using Markdown syntax, which preserves some of the most important formatting. In other words, you get the best of both worlds. The result is plain text, so you can use your favorite text-processing tools on it, e.g.

   odt2txt.py myDoc.odt | less

On the other hand, enough formatting is preserved that the resulting text can be converted into HTML using markdown.py:

   odt2txt.py myDoc.odt > tmp.txt
   markdown.py tmp.txt > myDoc.html

You might want to have a look at a sample ODT document and the corresponding text and HTML files.
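
For readers curious about what such a conversion starts from: an ODT file is a ZIP archive, and its body text lives in a member named content.xml. The sketch below is not taken from odt2txt.py. It is a minimal illustration using only the Python standard library, the helper name `odt_paragraphs` is hypothetical, and it drops all formatting, returning only the plain text of each paragraph (headings, which ODT stores as separate elements, are skipped).

```python
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument namespace for text elements such as <text:p>.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odt_paragraphs(odt_file):
    """Return the plain text of every <text:p> element in an ODT file.

    `odt_file` may be a path or a binary file-like object.
    Hypothetical helper for illustration; not part of odt2txt.py.
    """
    with zipfile.ZipFile(odt_file) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    # itertext() flattens nested spans, so their formatting is lost.
    return ["".join(p.itertext()) for p in root.iter("{%s}p" % TEXT_NS)]
```

Calling `odt_paragraphs("myDoc.odt")` would return a list with one string per paragraph. Everything odt2txt.py adds on top of this, such as emphasis, lists and footnotes, comes from interpreting the styles and structure that this sketch ignores.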

Status

The following ODT formatting is converted to corresponding Markdown syntax:

  • italics (becomes `_italics_`)
  • bold (becomes `**bold**`)
  • bold italics (`***bold italics***`)
  • simple ordered and unordered lists
  • block quotes (indented paragraphs become Markdown blockquotes)
  • code blocks (monospace paragraphs become Markdown code-blocks)
  • hyperlinks
  • footnotes
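
As a rough illustration of the inline mappings in the list above: ODT stores character formatting in named styles, and a run of formatted text is a `text:span` element that refers to a style by name. The sketch below is an assumption about how such a mapping could be written, not the odt2txt.py source. It looks up bold and italic in the document's automatic styles and wraps each span in the corresponding marker. The function names are hypothetical, and only spans that are direct children of a paragraph are handled.

```python
import xml.etree.ElementTree as ET

# Namespaces used by content.xml in an OpenDocument file.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "fo": "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0",
}


def q(prefix, name):
    """Build the '{namespace}name' form ElementTree uses for tags and attributes."""
    return "{%s}%s" % (NS[prefix], name)


def emphasis_markers(root):
    """Map each automatic style name to a Markdown marker ('_', '**' or '***')."""
    by_flags = {(True, True): "***", (True, False): "**", (False, True): "_"}
    markers = {}
    for style in root.iterfind("office:automatic-styles/style:style", NS):
        props = style.find("style:text-properties", NS)
        if props is None:
            continue
        bold = props.get(q("fo", "font-weight")) == "bold"
        italic = props.get(q("fo", "font-style")) == "italic"
        marker = by_flags.get((bold, italic))
        if marker:
            markers[style.get(q("style", "name"))] = marker
    return markers


def paragraph_to_markdown(paragraph, markers):
    """Render one <text:p>, wrapping styled <text:span> children in markers."""
    parts = [paragraph.text or ""]
    for child in paragraph:
        marker = ""
        if child.tag == q("text", "span"):
            marker = markers.get(child.get(q("text", "style-name")), "")
        parts.append(marker + "".join(child.itertext()) + marker)
        parts.append(child.tail or "")  # text between this child and the next
    return "".join(parts)
```

Given the parsed root of content.xml, `emphasis_markers(root)` is computed once and then passed to `paragraph_to_markdown` for each paragraph. Block-level features in the list (lists, block quotes, code blocks, footnotes) would need further handling that this sketch does not attempt.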

The following ODT features are not supported but hopefully will be soon:

  • simple tables
  • images

Installation and Usage

Download odt2txt.py then run it from the command line:

   python odt2txt.py myDoc.odt > myDoc.txt

To convert the file to HTML, use markdown.py:

   python markdown.py -footnotes myDoc.txt > myDoc.html

License

The code is dual-licensed under the GPL and the BSD License. Other licensing arrangements can be discussed.

Change Log

  • April 7, 2006: First version.