Sv.openoffice.org/OpenDocumentWhitePaper


OASIS/ISO OpenDocument Format

=== Using open standards to promote competition and close the Digital Divide ===

© 2006 OpenDocument Fellowship Creative Commons license: Attribution ShareAlike 2.5

Executive Summary

1. Proprietary formats, those controlled by a single vendor, are generally designed to be used by one product and hence mean that users of the formats are tied or ‘locked-in’ to the one vendor. Open international standard formats are publicly available and implementable and are agreed upon by many interests. They can be used by many products and are designed to be vendor-neutral.

2. The OpenDocument format (ISO 26300) is the only open international standard intended or suitable for office documents.

3. This paper looks at the need for such a standard, the history behind the creation of ISO 26300 and the support for it by both applications and governments worldwide. This paper focuses on how OpenDocument meets government policies to increase access to digital information and to close the digital divide. It then suggests policies for the support of ISO 26300 in government agencies, in accordance with the government mandate to support international standards.


Proprietary formats vs open standards

1. Most people have had the experience of receiving a document that they could not open, or that opened with formatting errors. The sender and recipient might use different office products (for example, one might use Microsoft Word and the other Corel WordPerfect or OpenOffice.org Writer), or they might just have different versions of the same product. Most people have come to accept this incompatibility as a fact of life, but is it?

2. When you send an email, you don’t have to know what email software the recipient has. When you purchase a music CD, you don’t wonder whether it is compatible with your CD player at home. Why do we have to worry about compatibility with office software but not with email software? The reason is that the former uses proprietary formats and the latter uses open standards.

3. Proprietary formats are controlled by a single vendor. They are generally secret and are universally designed to be used in only one product. Because of this, competitors cannot read the format effectively. Hence, they cause vendor lock-in[1].

4. Open International Standards[2] are those approved by a treaty organization such as ISO, which has nations rather than companies as its voting membership. These formats are agreed upon by several different interests in a vendor-neutral fashion and are designed for interoperability.

5. Most governments already have a mandate[3] to use open international standards, such as OpenDocument (ISO 26300), whenever possible. There is good reason for this: proprietary formats skew the market in favour of one vendor and tend to cause monopolies. Contrast the proliferation of email software (Outlook, Eudora, Gmail, Hotmail, Thunderbird, Apple Mail, Lotus Notes, Pegasus, Sylpheed, Netscape, etc.) with the limited choice in office software (Microsoft Office has most of the market).

6. Proprietary data formats can be a particular problem when the monopoly belongs to a foreign company. Technological dependency on a monopoly outside the sovereignty of a nation presents significant risk in terms of both national security and the national economy: money spent on a foreign monopoly does not create local jobs or promote local economic growth.

Data formats and the Digital Divide[4]

7. The principal effect of proprietary data formats is vendor lock-in. If most people use Microsoft Office and that product uses a proprietary format, that puts pressure on everyone else to use the same product. This in turn creates a software monopoly or near monopoly. As everyone is forced to use the same product, those people who cannot afford to pay the monopoly rent for the product are excluded from the digital world.

[1] See “vendor lock-in” in the glossary.
[2] See “GATT” and “open international standard” in the glossary.
[3] See “GATT” in the glossary.
[4] See “Digital Divide” in the glossary.

8. In contrast, open standards help reduce the digital divide. Open standards open the door to products that do not disadvantage those in the lower income brackets. Unlike proprietary file formats, OpenDocument is supported by many software products, many of which can be downloaded from the Internet at no cost at all, thus bringing them within reach of those who cannot afford expensive proprietary software products like Microsoft Office.

How governments inadvertently contribute to the office software monopoly and the digital divide

9. Most governments inadvertently contribute to skewing the market in favour of one vendor and, in turn, widen the digital divide. This happens whenever a government unintentionally requires the public to purchase from the software monopoly in order to communicate with government officials. There are many examples of this:

10. In the United Kingdom, the Specialist Schools Programme uses a complex application template in .doc format. Any school or consultant participating in this programme must purchase Microsoft Office to apply or else risk formatting errors when transferring the document between different computers.

11. The Chilean government, like many others, publishes its bills in .doc format. Only people who purchase Microsoft Office can reliably read these files. Indeed, citizens using older versions of MS Office might be forced to upgrade to a newer version in order to stay up to date with government laws.

12. Job applications for government positions often require CVs in .doc format. People who choose a software product other than Microsoft Office may not be able to apply for government positions, or may have to go through a more complicated process.

13. Grant applications often have to be in .doc format. A grant applicant must buy from Microsoft in order to apply for a grant.

14. The list is endless. The simple fact is that almost everybody must interact with the public sector. When the public sector inadvertently forces the public to use a specific product, it is helping build a private monopoly. The result is that citizens who, for one reason or another, do not have a copy of this particular product become disenfranchised.

What can be done?

15. The situation is not hopeless. Indeed, the problem is not difficult to correct. Simply enable the public to communicate with the government using an open standard format that any software maker can support reliably.

16. By publishing public documents in an appropriate open standard format, and by allowing the public to submit documents using an open format, governments place all software makers on an equal footing. Those in lower income brackets can use low-cost software that supports the chosen open standard, hence reducing the digital divide.

Which format to choose?

17. It is impractical to support every open standard format that exists. Indeed, many such formats may not be suitable for government publications or may not have application support. Here are some characteristics to look for:

18. Features. Is the format able to represent all the complex documents that a government agency is likely to use? The format chosen should be able to represent complex text documents, spreadsheets, presentations, diagrams, and all the multitude of documents that a public sector agency must deal with.[5]

19. Industry support. To gain the most benefit, it is best to choose a standard that is widely supported by many applications. Since the goal is to encourage competition and close the digital divide, the ideal format should be supported by a wide spectrum of software applications.

20. ISO standard. ISO is the International Organization for Standardization, the largest international standard-setting body[6]. ISO brings an assurance of quality and longevity. In the European Union, only ISO standards are considered open standards. Most countries have a national standards body (ANSI in the USA, BSI in the UK, DIN in Germany, etc.) that is an ISO member, giving ISO standards the force of law.

OpenDocument Format

Introduction

21. The OASIS OpenDocument Format (ODF) is an open, XML-based[7] file format for office documents. It includes support for text documents, spreadsheets, drawings, presentations and more. OpenDocument is developed by OASIS, an independent standards group. OpenDocument is also an ISO standard, ISO 26300.

22. The OASIS Technical Committee in charge of developing the OpenDocument format comprises experts from a wide range of backgrounds. The format was developed by representatives from software vendors (IBM, Sun Microsystems, Corel, Arbortext, etc.), volunteer organizations (KDE, OpenOffice.org, OpenDocument foundations), large customers with particular needs (Boeing, the Society of Biblical Literature, archivists) and government bodies (the National Archives of Australia). As such, OpenDocument reflects a wide industry consensus. It is designed for interoperability and to meet the most demanding needs.

23. OpenDocument is freely available for software makers to use and implement and does not favour any one vendor over all the others. The specification can be obtained from ISO, as ISO 26300.
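
To make “XML-based” concrete, the following minimal Python sketch builds a simplified OpenDocument-style package in memory and reads its text back using only generic ZIP and XML tools. It relies on details that this paper does not spell out: OpenDocument files are normally ZIP packages whose main content lives in an entry named content.xml, with element names in the namespaces shown. The helper names build_sample_package and read_paragraphs are hypothetical, and the sample package omits parts that a conformant file needs (such as the manifest), so treat it as an illustration rather than a reference implementation.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used by OpenDocument content (assumed from the ODF specification).
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
ODT_MEDIA_TYPE = "application/vnd.oasis.opendocument.text"


def build_sample_package(paragraphs):
    """Return the bytes of a simplified ODF-style ZIP package (illustration only)."""
    root = ET.Element(f"{{{OFFICE_NS}}}document-content")
    body = ET.SubElement(root, f"{{{OFFICE_NS}}}body")
    text = ET.SubElement(body, f"{{{OFFICE_NS}}}text")
    for paragraph in paragraphs:
        ET.SubElement(text, f"{{{TEXT_NS}}}p").text = paragraph

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        # The media type entry is written first and stored uncompressed.
        package.writestr("mimetype", ODT_MEDIA_TYPE,
                         compress_type=zipfile.ZIP_STORED)
        package.writestr("content.xml", ET.tostring(root, encoding="utf-8"))
    return buffer.getvalue()


def read_paragraphs(package_bytes):
    """Read the media type and paragraph text with generic ZIP and XML tools."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        media_type = package.read("mimetype").decode("ascii")
        root = ET.fromstring(package.read("content.xml"))
    paragraphs = ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]
    return media_type, paragraphs


sample = build_sample_package(["Hello, OpenDocument.", "Second paragraph."])
media_type, paragraphs = read_paragraphs(sample)
print(media_type)
print(paragraphs)
```

Because the content is ordinary XML inside an ordinary ZIP archive, a reader like this can be written by any software maker without access to a particular vendor's product.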

[5] For a comparison of open standards, see Appendix A.
[6] See “ISO” and “open standard” in the glossary.
[7] XML is the eXtensible Markup Language. See “XML” in the glossary.

History

24. The OpenDocument format has a 5-year history. Here is a basic timeline:

25. 2001: Sun Microsystems develops an XML-based file format to be used in its office suite StarOffice and its open source off-shoot OpenOffice.org. The format includes support for text documents, spreadsheets, vector graphics and presentations.

26. 2002: Sun donates the OpenOffice.org 1.0 format to OASIS. OASIS creates a Technical Committee (TC) to develop an international standard for office documents. KDE and Corel join the TC.

27. 2003: The European Union Telematics between Administrations Committee[8] (TAC) commissions Valoris to report on open file formats. The Valoris consulting group is tasked with exploring the use of open standards as a way to create a more competitive marketplace.

28. 2003: OpenOffice.org and KOffice both commit to making the OASIS format their native (default) file format.

29. 2004: The Valoris report is published.

30. 2004: The TAC issues a set of recommendations[9], in particular noting that, “Because of its specific role in society, the public sector must avoid [a situation where] a specific product is forced on anyone interacting with it electronically. Conversely, any document format that does not discriminate against market actors and that can be implemented across platforms should be encouraged. Likewise, the public sector should avoid any format that does not safeguard equal opportunities to market actors to implement format-processing applications, especially where this might impose product selection on the side of citizens or businesses. In this respect standardisation initiatives will ensure not only a fair and competitive market but will also help safeguard the interoperability of implementing solutions whilst preserving competition and innovation.”

31. The TAC included the following recommendations:
• Recommendations for OASIS:
  • Add custom-defined schemas to OpenDocument.
  • Submit the format to ISO to become a formal standard.
• Recommendations for other industry players:
  • Participate in the OpenDocument standardization process.
  • Include support for OpenDocument in products.
  • Provide tools to help the public sector migrate its documents to XML formats.
• Recommendations for the general public:
  • Provide documents in multiple formats, or alternatively, in an open format with industry consensus and adoption.

32. 2005: OASIS adds support for custom schemas to OpenDocument in the form of the XForms W3C standard and submits OpenDocument to ISO.

33. May 2006: ISO approves the format and it becomes ISO 26300.

[8] See “TAC” in the glossary.
[9] http://ec.europa.eu/idabc/en/document/3439/5585


Current status of OpenDocument

34. The OpenDocument format is the only international open standard for office documents (ISO 26300). It enjoys rapidly growing support from the software industry and government bodies.

Application support

35. The OpenDocument format is supported by every major industry player including IBM, Sun Microsystems, Novell, etc. Microsoft, under market pressure, has begun work on a plugin to allow Microsoft Office to read and write OpenDocument files.

36. OpenDocument is now the native (default) file format of six office suites[10] covering the five most popular operating systems[11]. The OpenDocument Fellowship maintains a list with over 30 applications that either support OpenDocument natively or are working on OpenDocument support[12]. This list includes traditional office suites, web-based products (such as Google Writely), desktop publishing applications, content management systems, search utilities, and others.

Government support

37. The OpenDocument format has also seen rapid uptake in governments around the world. Since it became an ISO standard (ISO 26300), uptake has only accelerated.

38. The US state of Massachusetts has decided to migrate the IT infrastructure of its Executive branch to OpenDocument. Starting in January 2007, all internal documents will be in OpenDocument format. Naturally, the state will also be in a position to accept files from the public in OpenDocument format.

39. Starting in September 2006, the Danish Ministry of Science will publish all its documents in OpenDocument format. In addition, the Danish parliament has approved a bill recommending the use of open standards in government whenever possible.

40. In Belgium, ISO 26300 (OpenDocument) is now the standard format for internal communication within the government. Starting in September 2008, all document exchanges within the Belgian government will have to be in OpenDocument format.

41. OpenDocument is set to become a national standard in Malaysia by the end of 2006. In July 2006 Malaysia's standards body voted to propose ODF as a national standard; the approval process will last until year’s end.

[10] OpenDocument is the native (default) file format of Sun’s StarOffice and OpenOffice.org, IBM Workplace, KDE’s KOffice, Mobile Office (for mobile devices) and NeoOffice.

[11] There are OpenDocument-based office suites for Windows, Linux, Mac OS, Solaris (Unix) and Symbian (for mobile devices).

[12] http://opendocumentfellowship.org/applications

Proposal: Offer public documents in ODF format

42. It is important that the public be able to access government documents without having to purchase from one particular vendor. Public documents include tax forms, grant applications, tenders, government reports, bills, etc. When a citizen depends on a particular private vendor in order to communicate with the government, the result is a distorted market.

43. In 2005, Hurricane Katrina, a category 5 hurricane, devastated the city of New Orleans, Louisiana in the United States. It was the costliest and one of the deadliest hurricanes in the history of the USA. When Katrina victims went to seek relief through on-line forms, many found that they could not apply because the on-line form required the use of Microsoft's Internet Explorer.

44. Is it really justified that only customers of a certain company can apply for hurricane relief? No. This is both morally wrong and illegal[13]. No government should require the purchase of a specific product to access public services. But this is exactly what happens every time a government agency publishes its documents (grants, tenders, etc.) in a format that is tied to one vendor.

45. Some governments are starting to realize the role that open international standards have to play. For example, from September 2006 onwards, all documents published by the Danish Ministry of Science will be available in the OpenDocument format. This is precisely the type of policy that all agencies should adopt. By choosing ISO 26300 (OpenDocument) all suppliers are placed on an equal footing and no group is disenfranchised.

46. In addition to their WTO obligations[14], most governments have in place other policies that support the use of OpenDocument. For example, the United Kingdom’s e-Government Interoperability Framework (e-GIF) specifically adopts XML as “the core standard for data integration and presentation”. As an XML-based ISO standard, OpenDocument fits perfectly within the e-GIF policy. At the European Union level, the TAC[15] specifically recommends the use of the ISO 26300 OpenDocument format[16].

47. In light of all this, we recommend a policy that reads: All public agencies must publish their public documents in the OpenDocument format (ISO 26300). This does not preclude the agency from publishing in other formats (such as Portable Document Format), but ISO 26300 must be an option. Furthermore, it must not be more difficult to obtain an ISO 26300 copy of public documents than it is to obtain a copy in another format.

Proposal: Accept documents in ODF format

48. It is important that citizens be able to choose the software they use to communicate with government entities. One of the most important ways to enable choice is for the government agency to accept documents in OpenDocument format.

[13] See “GATT” in the glossary.
[14] See “GATT” in the glossary.
[15] See “TAC” in the glossary.
[16] http://ec.europa.eu/idabc/en/document/3439/5585

49. Before preparing a policy of accepting documents in OpenDocument format, there is an important subtlety to discuss. As an example, we will consider the policy set up by the British Educational Communications and Technology Agency (Becta). Becta is the UK government's key partner in developing IT policy for schools.

50. In 2005 Becta published new guidelines for schools. In the new policy, schools must use software that can save files in an open format. The policy reads: “There is a vast array of [office applications], which are often packaged together as an ‘office suite’. Many of the formats that applications save data to are proprietary... Any office application used by institutions must be able to be saved to using a commonly agreed format that ensures that an institution is not locked into using specific software.”

51. Becta's list of approved open formats is:
• For text documents: OpenDocument (.odt), plain text, RTF[17].
• For spreadsheets: OpenDocument (.ods), CSV.
• For presentations: OpenDocument (.odp), HTML, SMIL.

52. While this is a well-meaning policy, it fails to have the desired outcome. The reason for this is subtle: most schools use Microsoft Office, which can read and save HTML, plain text, RTF and CSV. Hence, MS Office satisfies Becta’s policy. So far so good. The problem is that these formats are only suitable for the simplest documents. If a pupil makes a newsletter and saves it as RTF or TXT, much or all of the formatting and the images are lost. The only open format that is suitable for this is OpenDocument, but the school is not required to accept OpenDocument files (it merely has to accept some open format). As a result, the pupil still needs to buy a copy of MS Office and submit the newsletter in the proprietary .doc format.

53. The OpenDocument format is the only editable open standard that is capable of representing all of the documents that can be produced by office software. Whether it’s newsletters, presentations, or vector drawings, OpenDocument will work. Because OpenDocument is the only open standard that can express complex documents with high fidelity, it is necessary that government agencies (and schools) accept documents in the OpenDocument format.

54. We recommend a policy that reads: All institutions must be able to accept files stored in the OpenDocument format (ISO 26300). While it is not required that the agency use an OpenDocument-based application as its primary software, the agency must have available software that can read ISO 26300 documents and accept such documents from the public.

[17] Note that RTF (Rich Text Format) is not an open standard, but Becta chose to include it because it is documented and has a high degree of application support.

Appendix A: Comparison of open standard formats

The OpenDocument format (ISO 26300) is not the only open standard data format. What makes it stand out is that it is the only open standard intended or suitable for office documents. Other open standard formats include the ISO versions of the Portable Document Format (PDF), HTML and plain text. PDF is a high-quality format, but it is not intended to be an editable format, hence it is not suitable for day to day office applications. HTML and plain text are editable, but they can’t represent the complex formatting found in office documents with any degree of fidelity. This relationship is summarized in the following table:

Common name     Official name   Editable   High fidelity
OpenDocument    ISO 26300       yes        yes
PDF/A           ISO 19005       no         yes
HTML            ISO 15445       yes        no
Plain text      ISO 8859        yes        no

Therefore, for situations that require non-trivial documents (more complex than a letter) which must be edited later, OpenDocument is the only suitable open standard. For example, to apply for a grant in most agencies, you must submit the application in an editable format so that the agency can add notes and suggestions, which the applicant must then act upon. When a government agency works with a private sector company on a joint proposal, the document will be sent back and forth many times to be edited and re-edited. For these applications, the OpenDocument format is the only alternative that does not give preference to the products from a single company.

Glossary

Digital Divide
The Digital Divide is the gap between those with regular, effective access to digital technologies and those without. The term was coined by Larry Irving, former US Assistant Secretary of Commerce, and is now used in government policy throughout the world. It refers to those groups who, rather than being empowered by technology, become disenfranchised and lose access to services available to the rest of society. Some groups particularly at risk include lower income households and senior citizens. See http://en.wikipedia.org/wiki/Digital_divide

GATT
The General Agreement on Tariffs and Trade (GATT) is an international treaty, under the auspices of the World Trade Organization (WTO). It establishes rules for international trade in goods and services. Under GATT, an international standard is one approved by a treaty organization such as ISO, which has nations rather than companies as its voting membership. Through GATT, members of the WTO have a mandate to use open international standards, such as OpenDocument (ISO 26300), whenever possible:
• Article VI of the Agreement on Government Procurement: http://www.wto.org/english/res_e/booksp_e/analytic_index_e/gpa_02_e.htm
  • Section 2: Technical specifications prescribed by procuring entities shall, where appropriate, be based on international standards, where such exist.
  • Section 3: There shall be no requirement or reference to a particular trademark or trade name. For example, requiring tenders in Microsoft Word™ files would conflict with section 3.
• Article II of the Agreement on Technical Barriers to Trade: http://www.wto.org/english/docs_e/legal_e/17-tbt_e.htm
  • Section 2.4: Where technical regulations are required and relevant international standards exist... Members shall use them... as a basis for their technical regulations [except when said standard is inappropriate].
See also: ISO, open international standard.

HTML
The HyperText Markup Language (HTML) is the format used in web pages. HTML is maintained by the W3C and is an ISO standard (ISO 15445). It is a low-bandwidth, low-fidelity format, leaving much to the interpretation of the web browser (this is partly why web pages can look different on different browsers). It also does not support typical office documents such as spreadsheets and vector diagrams. See also: W3C.

IDABC
IDABC stands for Interoperable Delivery of European eGovernment Services to public Administrations, Businesses and Citizens. The goal of this programme is to improve the exchange of information using ICT within the public sector and between the public and private sectors in the European Union. See also: TAC.

ICT
Information and Communications Technology (ICT) is the art of using computers to aid the information process. It deals with the use of computers and software to manipulate and transfer information.

Interoperability
Interoperability is the ability of two different products to work together. If Alice sends a document to Bob, and Bob can read it without any loss of information, they have interoperability. See also: open standard; proprietary format.

ISO
The International Organization for Standardization (ISO) is the largest international standard-setting body. It is composed of representatives from national standards bodies. The organization produces world-wide industrial and commercial standards, the so-called ISO standards. ISO is unique among standards bodies in that ISO standards have the force of law through treaties or through national standards bodies that are members of ISO. See also: GATT, open international standard.

ISO 26300
The ISO code for the OpenDocument format, an open XML-based format for office applications. It includes support for text documents, spreadsheets, vector graphics, presentations and other features found in standard office documents. See also: open international standard.

KDE
The K Desktop Environment (KDE) is a large volunteer-based open source project. It produces desktop software for the Linux operating system. KDE participates in the OASIS Technical Committee (TC) that maintains the OpenDocument format. David Faure represents the interests of KDE in the OASIS OpenDocument TC. See also: KOffice.

KOffice
KOffice is an office suite for the K Desktop Environment (KDE). One of its developers, David Faure, serves in the OASIS Technical Committee (TC) that maintains the OpenDocument format. KOffice uses OpenDocument as its native (default) file format. See also: KDE.

OASIS
The Organization for the Advancement of Structured Information Standards (OASIS) is a global consortium that drives the development, convergence and adoption of e-business and web service standards. It is the standards body that hosts the OpenDocument Technical Committee, in charge of developing the standard. See also: open standard.

Open international standard
Under GATT, an international standard is one approved by a treaty organization such as ISO, which has nations rather than companies as its voting membership. As a result, these international standards are agreed upon by several different interests. The objective is to reach a consensus on the data format and to maintain interoperability. Each government has unique laws regarding open standards. For example, the only standards body recognized by the European Commission is the International Organization for Standardization (ISO). Most national governments have a national standards body which also represents that government at ISO. For example, in the USA this role is served by the American National Standards Institute (ANSI), in the UK it is the British Standards Institution (BSI), and in Germany it is the Deutsches Institut für Normung (DIN). All of these entities are ISO members and represent their respective countries at ISO. See also: GATT; interoperability; ISO; proprietary format.

OpenOffice.org
OpenOffice.org is an open source office suite maintained by Sun Microsystems with a volunteer component. Sun built an XML-based file format for OpenOffice.org version 1.0 and later submitted it to OASIS, which then used it as a starting point for the creation of a new open standard for office applications. The new format was named the OpenDocument format.

PDF
The Portable Document Format (PDF) is a file format developed by Adobe Systems for two-dimensional documents. While PDF itself is not an open standard in the strictest sense, some subsets of it are. For example, PDF/A is a subset of PDF intended for long-term archival use and is maintained by ISO as ISO 19005. The main drawback of PDF is that it is not intended for editable documents, but as a final-output format. Hence, it is not suitable for day to day office work.

Plain text
A computer file “in plain text” is one that contains only ordinary textual characters. A Word document without any images is not “plain text”. Word files are encapsulated in a proprietary format, so that they contain a lot of binary information beyond the characters you see on the screen. For that matter, an OpenDocument file without any images is also not plain text. Plain text is a specific format, maintained at ISO, known as ISO 8859.

Proprietary format
A proprietary format is a data format controlled by a single vendor. It is generally secret, and universally designed with only one product in mind. Proprietary formats are never designed to ease interoperability; they are usually designed to prevent it. As a result, competing products are unable to support the format effectively. Proprietary formats result in vendor lock-in. As competing applications are unable to read the data format reliably, the consumer loses the ability to choose suppliers freely. See also: interoperability; open standard; vendor lock-in.

RTF
Rich Text Format (RTF) is a proprietary document format developed by Microsoft in 1987. Although proprietary, this format is documented, with a public specification available. The main drawback of RTF (besides being controlled by one company) is that it cannot represent the complex documents that are used in today's office environment. RTF cannot represent spreadsheets, presentations or vector drawings, for example.

TAC
The European Union's Telematics between Administrations Committee (TAC) is a committee composed of representatives of the EU member states. TAC works with the European Commission on the IDABC programme, with the goal of easing the exchange of digital information across Europe. See also: IDABC.

Vendor lock-in
In economics, vendor lock-in is a situation in which a customer is so dependent on a vendor for products and services that the customer cannot move to another vendor without substantial costs. This creates a situation which favours the vendor at the expense of the customer. Vendor lock-in creates a market barrier to entry for competing products. In the software world, vendor lock-in can arise from the use of proprietary data formats. Because the format is kept secret, competing products cannot read it reliably, and sometimes not at all. This creates a significant switching cost as the customer's data can only be read reliably by one product. See also: proprietary format.

WTO
The World Trade Organization (WTO) is an international, multilateral organization, which sets the rules for the global trading system. Members of WTO are all signatories to its approximately 30 agreements, including the GATT, which provides a mandate for the use of open international standards such as the OpenDocument format (ISO 26300). See also: GATT.

W3C
The World Wide Web Consortium (W3C) is an international consortium that develops standards for the World Wide Web. The Consortium is headed by Tim Berners-Lee (the “inventor of the web”). Its mission is “To lead the World Wide Web to its full potential by developing protocols and guidelines that ensure long-term growth for the Web”. The Consortium is responsible for the standards (like HTTP, HTML and XML) on which the World Wide Web is based. See also: XML.

XML
The eXtensible Markup Language (XML) is a W3C standard for creating special-purpose file formats. It is capable of describing many different kinds of data. Its primary purpose is to facilitate the sharing of data across different systems, particularly systems connected via the Internet. Languages based on XML are defined in a formal way, allowing programs to modify and validate documents in these languages without prior knowledge of their form. It is important to note that XML is not itself a file format, but a set of rules for making file formats. Using XML to make a format does not automatically make the format an open standard. Think of XML as a language, like English. English is “open” in the sense that everyone can learn the rules; it is not secret. But using English to write a novel does not automatically make the novel “open” in any sense. See also: W3C.
