Chart2

From Apache OpenOffice Wiki
Revision as of 14:49, 17 January 2007 by Bm@openoffice.org

Chart2 is a sub-project of the OpenOffice.org Chart Project. Our goal is to develop a new Chart component for (presumably) OOo 2.3.

Charts are used to visualize data sets, e.g. from spreadsheets, as two- or three-dimensional diagrams, and there are many different chart types to choose from. This page gathers information about the new chart implementation of OpenOffice.org. It is especially written to help newcomers to development in the new OOo chart module.

If you would like to participate, or if you have comments or questions about the chart, you are welcome on the graphics mailing lists: (users,dev,features,bugs,cvs)@graphics.openoffice.org. See also the section "Some useful information" at the bottom of this page.

Development in the New Chart

As the information on how to compile the new chart and how to develop in this project has become quite lengthy, it can now be found on separate pages:

Open Technical Issues in the New Chart

There are still some architectural issues left that have to be solved. This section lists those problems and tracks the progress in finding solutions.

Application-Framework-Related Problems

Identify the XDataProvider reliably

If the chart is embedded in a container document that provides the data for this chart, the chart's content.xml stream contains range-strings at several places that the container understands, so that the data can be located there. In Calc such a string looks like this: "Sheet1.A2:A7". Such range-strings can appear in the chart:plot-area element of a chart or in a chart:series, chart:categories or chart:domain element.
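To make the shape of these range-strings concrete, here is a minimal Python sketch of a parser for the simple "Sheet1.A2:A7" form. The names RangeRef and parse_range are hypothetical and only illustrate the format; they are not part of the UNO API, and the sketch ignores absolute references ($), quoted sheet names and sheet names containing dots.

```python
import re
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class RangeRef:
    """Hypothetical parsed range-string (not the UNO CellRangeAddress)."""
    table: str       # sheet name, or "local_table" for the chart's own data
    start_col: int   # 0-based column index
    start_row: int   # 0-based row index
    end_col: int
    end_row: int

_CELL = re.compile(r"([A-Z]+)([0-9]+)")

def _col_index(letters: str) -> int:
    """Convert column letters (A .. Z, AA, ...) to a 0-based index."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n - 1

def _cell(text: str) -> Tuple[int, int]:
    m = _CELL.fullmatch(text)
    if m is None:
        raise ValueError(f"not a cell reference: {text!r}")
    return _col_index(m.group(1)), int(m.group(2)) - 1

def parse_range(rep: str) -> RangeRef:
    """Parse 'Table.A2:A7' or a single cell 'Table.A2'."""
    table, sep, cells = rep.rpartition(".")
    if not sep or not table:
        raise ValueError(f"missing table name: {rep!r}")
    start, _, end = cells.partition(":")
    c0, r0 = _cell(start)
    c1, r1 = _cell(end or start)
    return RangeRef(table, c0, r0, c1, r1)
```

A range into the local table and a range into a Calc sheet that happens to be named local_table parse to structurally identical results, so the range-string alone cannot reveal where the data comes from.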

If a chart has its own data, this data is stored in a local table in the XML file. The range-strings then have a similar form, with the fixed table name "local_table". However, the local table is also stored, as a kind of cache, when the data comes from outside. Therefore the existence of the local table does not imply that a chart has its own data.

So neither the existence of local data nor the existence of range addresses (nor even their content, considering that a sheet may also be named local_table) determines whether a chart uses its own data or data from the container document.

In the old chart implementation, the chart always reads the internal data and uses it. If the data comes from the container, the container has an additional attribute that contains the complete data range as one string. The container then sets this string at the chart, which tells the chart that the data comes from outside. (Actually, in the old implementation there is always local data: if data in Calc changes, Calc prepares an SchMemChart object that contains the new local data and sets this object at the chart.)

In the new implementation, we must get rid of this badly designed approach, as we no longer have one complete address range for the entire chart.

Solution: We have to introduce a new XML attribute at the chart that identifies the data provider. This may be a flag denoting "own data" or "external data", or even a URL or similar that uniquely identifies the data provider document.

Problem: Currently, when loading a chart, the container document must be set as XParent before the chart itself is loaded. Otherwise there is no data provider, the ranges are not valid for the internal data provider, and they are therefore forgotten. Setting the data provider later does not re-establish the correct range-strings.
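A minimal sketch of the proposed decision, assuming a hypothetical data_provider attribute whose value is either "own" or an identifier (such as a URL) of the provider document; the actual attribute name and values are not fixed on this page. The point is that the decision comes only from the explicit attribute, never from the local table or the range-strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class LoadedChart:
    has_local_table: bool                    # also written as a cache
    range_strings: List[str] = field(default_factory=list)
    data_provider: Optional[str] = None      # hypothetical new XML attribute

def uses_own_data(chart: LoadedChart) -> Optional[bool]:
    """True for own data, False for external data, None if undecidable."""
    if chart.data_provider is None:
        # Files without the attribute: the local table and the range-strings
        # look the same either way, so any guess would be unreliable.
        return None
    return chart.data_provider == "own"
```

Two charts with byte-identical local tables and range-strings can thus still be told apart, e.g. one marked "own" and one marked with the URL of a Calc document.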


Update of the Chart on changed Data

When a chart gets its data from the container document and the data changes, the changes are notified via a listener mechanism: when the chart queries the XDataProvider for data, it gets XLabeledDataSequences and starts listening at them for changes. If a change event is fired, the chart sets itself modified, which makes the view repaint, as the view listens for model changes via exactly this mechanism.
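The notification chain described above is an observer pattern. The following Python sketch models it with simplified stand-ins: in UNO the roles are played by XLabeledDataSequence, the chart model and com.sun.star.util.XModifyBroadcaster/XModifyListener, but the class and method names here are hypothetical.

```python
from typing import Callable, List

Listener = Callable[[], None]

class LabeledDataSequence:
    """Stand-in for an XLabeledDataSequence that broadcasts modifications."""

    def __init__(self, values: List[float]) -> None:
        self._values = list(values)
        self._listeners: List[Listener] = []

    def add_modify_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def set_values(self, values: List[float]) -> None:
        # A change in the container's cells ends up here.
        self._values = list(values)
        for listener in list(self._listeners):
            listener()

    @property
    def values(self) -> List[float]:
        return list(self._values)

class ChartModel:
    def __init__(self) -> None:
        self.modified = False
        self._listeners: List[Listener] = []

    def attach(self, seq: LabeledDataSequence) -> None:
        # The chart starts listening at every sequence it got from the provider.
        seq.add_modify_listener(self.set_modified)

    def add_modify_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def set_modified(self) -> None:
        self.modified = True
        for listener in list(self._listeners):
            listener()

class ChartView:
    def __init__(self, model: ChartModel) -> None:
        self.repaints = 0
        model.add_modify_listener(self.repaint)  # the view listens to the model

    def repaint(self) -> None:
        self.repaints += 1
```

If the ChartModel is never created because the chart was not loaded, attach() is never called, and a change in set_values() reaches nobody.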

However, when a Calc document containing charts is loaded, the charts are not loaded immediately, either because they are not visible or because they contain a replacement image that shows the last rendered view.

In both cases, changes in the spreadsheet, like a change in data (which should be visible via a newly rendered view) or moving around ranges (which must change the ranges that are stored in the model), are not propagated to the chart, as it is not loaded and therefore is not listening.

In the old implementation, this problem was again solved by the fact that Calc knew the data range of each chart: Calc could load all unloaded charts affected by a change and then notify them. However, this requires internal knowledge of the chart to be exposed to the container document.

Solution: When an OLE object is loaded, it only shows the replacement image. This state is called the "LOADED" state: only a small "stub" is loaded, not the entire object. The XModel of an object (with the content of the XML streams fully applied) is loaded in the "RUNNING" state. So if all charts were initially set into the RUNNING state, all XModels would exist, and the listening would work.
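A minimal Python sketch of this idea, using only the states named on this page in increasing order of activation. The numeric values are illustrative and are not the constants of com.sun.star.embed.EmbedStates (which also defines further states); has_model and run_all are hypothetical helpers.

```python
from enum import IntEnum
from typing import List

class EmbedState(IntEnum):
    # Illustrative ordering only, not the UNO EmbedStates values.
    LOADED = 0      # stub only, shows the replacement image
    RUNNING = 1     # XModel exists, XML streams applied
    ACTIVE = 2
    UI_ACTIVE = 3   # additionally owns menus and toolbars

def has_model(state: EmbedState) -> bool:
    """An XModel, and therefore data listeners, exists from RUNNING upward."""
    return state >= EmbedState.RUNNING

def listening_charts(states: List[EmbedState]) -> int:
    return sum(1 for s in states if has_model(s))

def run_all(states: List[EmbedState]) -> List[EmbedState]:
    """Proposed solution: raise every LOADED chart to RUNNING, keep higher states."""
    return [max(s, EmbedState.RUNNING) for s in states]
```

The cost this raises is exactly the open question in the next paragraph: run_all() creates one XModel per chart, whether or not the chart is ever shown.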

See also: Chart2 Single-Click Concept. There, the issue of loading models at the right time is also handled.

Problem: It has to be evaluated whether loading the XModel objects of all charts makes loading a document take too long, or consumes too much memory.

Context Menus via Framework API

The ChartController implements XDispatchProvider, which is used (via queryDispatch) to find out whether menu entries are enabled. The menu is created from an XML file defined in chart2/uiconfig.

The content of a context menu, of course, depends on the context, so a generic mechanism (like an XML file) for defining it may not always be possible, but in most cases it is. In any case, it should be possible to use context menus via the UNO API and to share the XDispatchProvider to find out whether certain commands are available. Disabled menu items are not shown in context menus, but this should be handled by the implementation of the popup menu.

Currently, the main menu shows icons next to some menu entries, but the context menu doesn't. Applications should not have to take care of the icons; these should also work via the framework.

Solution: In the long run, it should also be possible to create context menus via XML files, maybe with the possibility to extend them dynamically with more entries (e.g. for a list of suggested corrections for misspelled words). As a short-term solution, the framework will offer a function that takes an XPopupMenu and an XFrame and takes care of enabling the commands via the dispatch found at the frame.

SOLVED in Milestone 9.
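A minimal sketch of the short-term approach: showing or hiding popup entries by asking the frame's dispatch provider. queryDispatch is modeled as a one-argument Python method returning a callable or None; the real XDispatchProvider.queryDispatch takes a URL struct, a target frame name and search flags. The command URLs are illustrative.

```python
from typing import Callable, Dict, List, Optional

Dispatch = Callable[[], None]

class DispatchProvider:
    """Stand-in for the frame's XDispatchProvider (simplified signature)."""

    def __init__(self, handlers: Dict[str, Dispatch]) -> None:
        self._handlers = handlers

    def query_dispatch(self, command_url: str) -> Optional[Dispatch]:
        # None means: the command is not available in the current context.
        return self._handlers.get(command_url)

def build_context_menu(entries: List[str], provider: DispatchProvider) -> List[str]:
    """Return the entries to show; disabled ones are left out entirely."""
    return [url for url in entries if provider.query_dispatch(url) is not None]
```

Because the menu only asks the provider, the same XDispatchProvider serves the main menu (to enable entries) and the context menu (to hide them).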

Range Selection Component

In the chart there is a dialog for assigning range addresses to data series, to categories or to the whole chart. The range addresses must be in the correct format of the XDataProvider. Instead of typing long and cryptic strings into the edit field, users would rather select the ranges visually in Calc with the mouse (or keyboard).

For this purpose the XDataProvider offers the method getRangeSelection(), which returns a sheet.XRangeSelection. This component provides a small dialog that contains the range address and lets the user select ranges with the mouse (which are then put into the dialog's edit field). The chart (which opened the range selection tool) can query the content of this small dialog and put the string into the edit field of its own dialog.

The problem here is that charts are OLE objects. The chart has to be in the UI_ACTIVE state to be able to open the dialog. When you press the button to open the range selection tool and click into the Calc spreadsheet, the chart is automatically deactivated to the RUNNING state (which closes the dialog?) and Calc is activated, i.e. Calc is now UI_ACTIVE.

Being UI_ACTIVE also means that the content of the menu bar, as well as the existence and content of toolbars, changes. For the toolbars, this may also shift the visual area of the document, and therefore the chart object.

What we need here is a way for the chart to stay active in some form (maybe ACTIVE instead of UI_ACTIVE) while still being able to select a range of cells in Calc.

SOLVED in Milestone 9.

Calc- and Writer-Related Problems

The com.sun.star.table.XTableCharts object is based on the old chart

When you want to insert a new chart into Calc or Writer, this can be done via the com.sun.star.table.XTableChartSupplier interface that returns an XTableCharts container. At this container there is a method:

void addNewByName(
   [in] string                            aName,
   [in] ::com::sun::star::awt::Rectangle  aRect,
   [in] sequence< CellRangeAddress >      aRanges,
   [in] boolean                           bColumnHeaders,
   [in] boolean                           bRowHeaders );

You pass a sequence of CellRangeAddresses as the range, plus flags stating whether the first column or row is used for labels. Compare this to the XDataProvider method:

XDataSource createDataSource(
   [in] sequence< ::com::sun::star::beans::PropertyValue > aArguments );

The parameters differ from aArguments as follows: instead of a sequence< CellRangeAddress >, aArguments contains only one string. This is no problem, as a conversion exists. However, addNewByName lacks a "DataRowSource" parameter. So you can only say whether the first column or row is used for labels, but not whether the data series come from rows or columns. Without this information the label flags are useless as well, which renders the entire interface useless.
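The conversion, and the missing piece, can be sketched in Python. CellRangeAddress mirrors the field names of com.sun.star.table.CellRangeAddress. The argument names CellRangeRepresentation, DataRowSource, FirstCellAsLabel and HasCategories follow the later com.sun.star.chart2.data.TabularDataProviderArguments service and may differ from what existed when this page was written; DataRowSource is modeled as a plain string rather than the UNO enum, and both the ";" separator for several ranges and the mapping of the two header flags are assumptions. The sketch shows that the header flags can only be mapped once the caller also says whether series come from columns or rows.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Union

@dataclass(frozen=True)
class CellRangeAddress:
    # Field names as in com.sun.star.table.CellRangeAddress (0-based indices).
    Sheet: int
    StartColumn: int
    StartRow: int
    EndColumn: int
    EndRow: int

def col_letters(index: int) -> str:
    """Convert a 0-based column index to letters: 0 -> A, 25 -> Z, 26 -> AA."""
    letters = ""
    index += 1
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def to_range_string(ranges: Sequence[CellRangeAddress],
                    sheet_names: Sequence[str]) -> str:
    """Convert CellRangeAddresses into one range-string."""
    parts = []
    for r in ranges:
        start = f"{col_letters(r.StartColumn)}{r.StartRow + 1}"
        end = f"{col_letters(r.EndColumn)}{r.EndRow + 1}"
        parts.append(f"{sheet_names[r.Sheet]}.{start}:{end}")
    return ";".join(parts)  # assumption: ';' separates several ranges

def data_source_arguments(ranges: Sequence[CellRangeAddress],
                          sheet_names: Sequence[str],
                          column_headers: bool, row_headers: bool,
                          data_row_source: str) -> Dict[str, Union[str, bool]]:
    """Map addNewByName-style flags to createDataSource-style arguments."""
    if data_row_source == "COLUMNS":
        # Series run down the columns: the header row labels them,
        # the header column holds the categories.
        first_cell_as_label, has_categories = column_headers, row_headers
    elif data_row_source == "ROWS":
        first_cell_as_label, has_categories = row_headers, column_headers
    else:
        raise ValueError(f"unknown DataRowSource: {data_row_source!r}")
    return {
        "CellRangeRepresentation": to_range_string(ranges, sheet_names),
        "DataRowSource": data_row_source,
        "FirstCellAsLabel": first_cell_as_label,
        "HasCategories": has_categories,
    }
```

With the same two flags, "COLUMNS" and "ROWS" produce swapped FirstCellAsLabel and HasCategories values, which is why the flags alone are not enough.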

As a work-around, you can create the chart with some dummy data and then attach new data using the data provider interface. However, a new interface, probably one that only takes the name and the rectangle, would be more appropriate than this work-around.

Live-Preview when inserting a new Chart

In the old implementation, when a chart was inserted into Calc or Writer, you got a dialog that had its own preview in a little window. After pressing OK, the chart was created and inserted into the document.

In the new implementation, the container inserts a new chart first, then creates the wizard dialog (via UNO) and shows all changes directly in the newly inserted chart. When a user presses OK, everything is fine. But when a user presses Cancel, the inserted chart has to be removed again.

This approach is new to the applications. You insert an object (modifying the document) and change it, and after pressing Cancel you expect to get back the unmodified document you had before, without the chart. Currently, the implementations create Undo actions for inserting a chart, so after pressing Cancel you might still have an Undo or Redo action that brings the chart back. That is not what you would expect here, although the undo action might be the correct object to use for undoing the chart insertion. Maybe it is sufficient to remove the undo/redo action from the undo manager's stack.

SOLVED in Milestone 10.
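A minimal Python sketch of the insert-first, roll-back-on-Cancel flow, in which Cancel also removes the Undo action the insertion created. Document, UndoManager and insert_chart_with_wizard are hypothetical models, not the Calc or Writer undo API. Restoring the previous redo entries goes slightly beyond the text and reflects the idea that Cancel should leave no trace.

```python
from typing import Callable, List

class UndoManager:
    def __init__(self) -> None:
        self.undo_stack: List[str] = []
        self.redo_stack: List[str] = []

    def add(self, action: str) -> None:
        self.undo_stack.append(action)
        self.redo_stack.clear()  # a new action invalidates redo

class Document:
    def __init__(self) -> None:
        self.charts: List[str] = []
        self.modified = False
        self.undo = UndoManager()

def insert_chart_with_wizard(doc: Document, name: str,
                             run_wizard: Callable[[str], bool]) -> bool:
    """Insert first, let the wizard edit the live chart, roll back on Cancel."""
    # Snapshot everything the insertion is about to change.
    was_modified = doc.modified
    undo_depth = len(doc.undo.undo_stack)
    saved_redo = list(doc.undo.redo_stack)

    doc.charts.append(name)
    doc.modified = True
    doc.undo.add(f"Insert chart {name}")

    if run_wizard(name):  # True means the user pressed OK
        return True

    # Cancel: remove the chart and every Undo/Redo trace of the insertion.
    doc.charts.remove(name)
    del doc.undo.undo_stack[undo_depth:]
    doc.undo.redo_stack[:] = saved_redo
    doc.modified = was_modified
    return False
```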

Drag & Drop in Calc: Identify Drag-Source

If you want to drag some selected cells from a spreadsheet and drop them onto a chart, so that the chart uses this new data, the chart gets an XTransferable. The format contained in the XTransferable is currently the DDE format. The chart has to make sure that the dropped content comes from the same document, or more precisely from the same XDataProvider. To do that, there must be a way to find out whether an XTransferable belongs to the XParent of the current chart, or to the attached XDataProvider.

One idea was to compare the "Topic" part of the DDE message with the title of the Calc document's window. Currently, these are the same, and at least during Drag & Drop the title should not change either. So we would need a way to query the title of the parent XModel in exactly the format used in the DDE topic (for an unsaved document the name is Untitled1, Untitled2, ...).

Maybe there is a better approach. Something like

 XDataProvider::isValidTransferable( [in] XTransferable xObject ) 

This way, the data provider can implement what fits best to find out if the XTransferable comes from the same document.
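A sketch of the DDE-topic comparison, placed behind a hypothetical is_valid_transferable() of a data provider as proposed above. It assumes the payload holds application, topic and item separated by NUL characters (as in the usual Windows DDE "Link" clipboard format) and that the parent model's title is available as a plain string; the sample payload and class name are illustrative.

```python
from typing import Optional, Tuple

def parse_dde_link(payload: str) -> Optional[Tuple[str, str, str]]:
    """Split 'application NUL topic NUL item NUL ...' into its three parts."""
    parts = payload.split("\0")
    if len(parts) < 3:
        return None
    return parts[0], parts[1], parts[2]

class CalcDataProvider:
    """Hypothetical provider that knows the title of its own document."""

    def __init__(self, document_title: str) -> None:
        self.document_title = document_title

    def is_valid_transferable(self, payload: str) -> bool:
        link = parse_dde_link(payload)
        # Accept only drops whose DDE topic names this provider's document.
        return link is not None and link[1] == self.document_title
```

Keeping the check inside the provider means a different provider could use another criterion, e.g. a private clipboard format, without the chart having to know.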

Drag & Drop works only in ACTIVE mode

Drag & Drop only works when the chart is ACTIVE. So, unless charts are always ACTIVE, drag & drop does not work, as the drop target is not the chart but the surrounding embedded object. Alternatively, we need a mechanism that forwards the drop request to the underlying OLE object, which would require activating it at least at that moment. This could be flag-driven, to avoid loading every OLE object the user hovers over with drop content.

Graphic-Framework-Related Problems

The type of the com.sun.star.drawing.BitmapTable

For gradients, transparency-gradients, hatches, line-dashes and bitmaps there are tables in the chart model that contain those elements together with names. E.g., you may have a gradient with the name "My Gradient". When you set the "FillGradientName" Property of an object to "My Gradient", the object will display the corresponding gradient found in this table. The same holds for fill-bitmaps.

However, there is a problem: the table for bitmaps maps a name to a URL rather than to a real graphic object. So when you add an element to this table, the graphic itself is not used anywhere in the document at that time. As the graphic manager, which deals with all graphics, keeps graphics only while they are used (refcount > 0), the graphics are dropped in this case.

Solution: Change this map to a mapping from names (strings) to objects of type com.sun.star.graphic.XGraphic. As the existence of an XGraphic object ensures that the graphic stays available in the graphic manager, this solves the problem. The underlying graphic is dropped by the graphic manager only once an element is no longer used by any model object and has also been removed from the list.

  • see Issue 66558. Kai Ahrens is working on this now. He will change the type of the bitmap table for all applications, as we now have the XGraphic API.
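A minimal Python sketch of why the proposed mapping helps, using weak references to stand in for the graphic manager's reference counting. Graphic, GraphicManager and the URL are hypothetical; the sketch relies on CPython freeing an object as soon as its last reference disappears.

```python
import weakref

class Graphic:
    """Stand-in for an XGraphic object."""

    def __init__(self, url: str) -> None:
        self.url = url

class GraphicManager:
    """Keeps a graphic only while someone still holds a reference to it."""

    def __init__(self) -> None:
        self._cache = weakref.WeakValueDictionary()

    def load(self, url: str) -> Graphic:
        graphic = self._cache.get(url)
        if graphic is None:
            graphic = Graphic(url)
            self._cache[url] = graphic
        return graphic

    def is_alive(self, url: str) -> bool:
        return url in self._cache
```

Storing only the URL (name to URL) lets the graphic vanish right after loading, while storing the returned object (name to graphic) keeps it alive until the entry is removed from the table.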

Chart Problems

Autoscaling of Text relative to Diagram Size

In the old chart implementation, when auto-scaling was on, resizing a chart OLE object, or resizing the window of a standalone chart, changed the actual font sizes in the model, i.e. the model was modified. For an OLE object this is OK, as the resize also changes the visual area, which is a model property as well. However, resizing a document window should not modify the model.

To overcome this bad design, in the new chart we decided to add a reference size to font sizes in the model. When auto-scaling should be on for an object, the size of the page (the whole chart, which equals the visual area size) is set as ReferencePageSize along with the font. If the chart is resized, the view can calculate a new font size by comparing the new page size with this reference size and adapting the stored font size. So the font becomes bigger without the model having to change (apart from the fact that resizing the OLE object modifies the model, because the visual area size changes). If a font should not be scaled, the reference size is simply set to void. So far this concept works very well. It also solves the problem of getting exactly the same font size back after resizing the chart to an extreme (e.g. very small) size and back to the original size, which was a mess in the old chart.
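A minimal sketch of the view-side calculation, assuming sizes are (width, height) pairs and that the scale factor is the smaller of the width and height ratios; the page does not say how the two ratios are combined, and displayed_font_size is a hypothetical name. Because only the view computes the scaled value and the stored size is never overwritten, shrinking the chart to an extreme size and back yields exactly the stored size again.

```python
from typing import Optional, Tuple

Size = Tuple[float, float]  # (width, height)

def displayed_font_size(stored_pt: float, reference: Optional[Size],
                        page: Size) -> float:
    """Font size the view draws; the model's stored_pt is never changed."""
    if reference is None:
        return stored_pt  # void reference size: no auto-scaling
    ref_w, ref_h = reference
    if ref_w <= 0 or ref_h <= 0:
        return stored_pt  # unusable reference, treated like void
    # Assumption: use the smaller ratio so text never outgrows the page.
    factor = min(page[0] / ref_w, page[1] / ref_h)
    return stored_pt * factor
```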

However, there are objects whose fonts are supposed to scale with the diagram size rather than the page size, e.g. the axes. The problem here is that the diagram size may be void, which means the view has to calculate it by itself. The actual size of the diagram is therefore not known unless a view is created that does the auto-calculation. So if you want to enable auto-scaling for an axis, you would have to set the actual diagram size as reference size at the axis object in the model. However, you don't know that size when the diagram is auto-sized.

In the file format we should also have the currently visible font size plus a boolean flag saying whether an object should auto-scale or not. So here we have a similar problem: after a chart is loaded, we would need a view that calculates the size of the diagram (if it is auto-sized) in order to translate the boolean flag into a fitting diagram reference size. And for saving, we would need the current diagram size to calculate the currently visible font size.

So, for the case that the diagram is auto-sized, we need a solution that does not require a view for switching on auto-scaling of diagram-dependent font sizes, or for saving and loading files.

Solution ideas:

  • In case the diagram is auto-sized, set the page size as reference size. Unless the diagram is resized, the mechanism would then work for resizes of the whole chart like before. The font size written to the file would be the real font size in the model, which is ok. The view would have to use the page size for comparison when it finds out that the diagram's model has no size.
  • What happens if a user resizes the diagram? In this scenario we would need a mechanism that sets the currently calculated auto-size of the diagram (before the resize) as a fixed size at all model objects that use a diagram reference size and have the page size set as reference. Then it has to set the new diagram size at the diagram. This is error-prone, as you might forget objects. Comparing the current reference size with the page size is not very elegant either.
  • Drop the feature of diagram-related font sizes and use only the page as reference size. This would simplify things but would break the feature of getting smaller fonts when you resize the diagram inside a chart document. I currently prefer this solution for several reasons:
    • First, we only have one concept: a reference page size, not reference sizes that refer to different objects.
    • Second, we always know the page size, as it is equal to the visual area size of the model. Apart from that an auto-page size wouldn't make much sense. So, we solve this problem.
    • Third, the downside is that resizing the diagram would not scale the fonts at axes (axis titles?) and data points. On the other hand, a chart document should lay out the chart nicely, so that a user should not need to resize the diagram. If a user still wants to change the diagram size, he can change the font sizes manually. I think this is a valid compromise, as this probably does not happen very often. Also, for small changes of the diagram size, a non-scaling font does not matter much.
    • It simply does not make sense to set a reference size that is not found at the corresponding object (the diagram) in the model but can only be obtained from a view. If you replaced your view implementation or changed the view's auto-calculation, you would get different results, so it is not reliable.
  • On resize change the model.
    • Setting re-scaled fonts is a bad idea as the old chart showed. You get rounding errors with sometimes drastic results.
    • It would be ok for OLE objects, as they are modified anyway (the visual area size)
    • If we introduced a stand-alone chart some day, we would have to change the behaviour to be similar to Draw or Writer: the chart has a fixed page size, and resizing the window only changes the visual area of the document. However, this conflicts with using the visual area as page size in the OLE scenario, where we definitely want the page size to equal the visual area size.
    • Therefore, this seems like a bad idea.
  • Do not allow an empty diagram size if there are objects with DiagramReferenceSizes.
    • This is not a good idea either, as the diagram would need knowledge about all objects with diagram reference sizes in order to behave accordingly when its size is set.
    • If we want auto-resize to be the default, the default would also have to be a concrete diagram size, which is not reasonable.

In summary, dropping the feature of fonts scaling with the diagram seems the most appropriate thing to do. Of course we would lose a feature, and we have to decide whether it is acceptable for users to no longer have it. As I said, I think resizing the diagram to something much smaller than the auto-calculated size is not a very common or typical scenario.

(article by Bm@openoffice.org)

Chart Types

This section will describe various chart types and also provide possible solutions for implementation with emphasis on external packages. It will also try to establish a realistic priority list.

This list is based primarily on the List of wished enhancements for Charts. However, I will also present a number of new features not covered in the original list. Some other issues are detailed on the statistics wiki page, too.

Intro: External Programs

Implementing all these features directly in OOo would be both difficult and unnecessary. It is not always sensible to reinvent the wheel, and I will present a better solution later.


Why not implement everything in OOo?

DISADVANTAGES

global disadvantages

  • Resources:
    • many resources are needed (coders, persons to test the new features, financial constraints and time delays)
    • code becomes more and more complex, more difficult to understand by new developers, more testing time needed
    • most experienced coders already work on other open source projects; less availability for a new project

specific disadvantages

  • scripting (see the asymptote program below)
    • most of the current implementation has very limited scripting support
    • automation will therefore be difficult;
    • few users will develop new macros/ enhance the existing functionality of Chart because they will need to learn a new scripting language and/or OOo code
  • NO scripting means new functionality must be hardcoded into the Chart module
    • only users with advanced OOo knowledge will be able to do that

ALTERNATIVES

The main idea is:

  • to break the monolithic structure of Chart into various modules and
  • export the functionality into various external packages;
    • the new OXT extension architecture should ease this process;

A very good alternative is to do all advanced things with dedicated external software, when such free alternatives exist. Below are listed some existing programs suitable for this task: (see also Links section below)

    • gnuplot: powerful scientific package
    • asymptote: powerful scripting capabilities
      • powerful descriptive vector graphics language for technical drawing
    • R (with over 500 packages):
      • extensive data visualization capabilities: see examples below for details;
      • also ideal for scripting;
    • ggobi: data visualisation system for exploring high-dimensional data
      • see also the R-package rggobi
    • YALE: a data mining application
      • for specialized data visualization techniques, see the screenshots;
      • examples include: 2D and 3D scatter plots, the Self-Organising Maps (SOM) and many other advanced techniques
    • octave: a high level mathematical language
    • various other packages, e.g. FreeMind and many more, each suitable for some specific task.

ADVANTAGES

  • New Features: once the external program is embedded into OOo, one can easily implement many new features (from the specific program) with minimal effort (see also Automation)
  • Advanced Solutions: dedicated programs offer more advanced solutions than any OOo implementation (or even that of competitive software)
  • Resources:
    • fewer resources needed
    • programs are developed by their own groups,
    • tested by appropriate folks (mostly professionals) and therefore
    • NO major delays and less propensity for bugs
    • new features are implemented more easily (we at OOo need only make a new Menu/Gui and paste the correct syntax/call to the external program)
  • CODE: smaller code, smaller program
    • only users that need that option will run it (or download it; see also the note below)
    • most other users will not have it installed, therefore OOo should also run faster; not so many resources loaded
  • Automation:
    • some of these programs come with very powerful scripting capabilities, e.g. gnuplot, asymptote, R-software
    • the previous programs are used by millions of users, many of whom already know the scripting/ programming language, and therefore do NOT need to learn a new language (see downloads on sourceforge.net for the first 2 programs)
    • easy creation of new macros/scripts; no need to hardcode new functionality into OOo


Another strong reason why external software is a better alternative:

"If the coders aren't working with people who know how to do numerical methods, then what are the odds that it'll even come out correctly?" [quote from a user]


A special note:

  • because some licenses may not be compatible with OOo (although they are still open source), I DO NOT MEAN to include the code in OOo
  • what I mean is a general mechanism that allows OOo to communicate (bidirectionally) with the external software
  • instruct users what external programs (extensions) do exist and where to find them (provide url)
  • allow users to easily access the functionality of external programs through OOo Menus (or functions)
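To illustrate what pasting "the correct syntax/call to the external program" could look like, here is a minimal Python sketch that turns a data series into a gnuplot script using inline data (the special file name '-', terminated by a line containing e) and pipes it to gnuplot on stdin. The function names and output file are hypothetical, and running it assumes gnuplot is installed and built with the png terminal.

```python
import shutil
import subprocess
from typing import Sequence, Tuple

def gnuplot_script(points: Sequence[Tuple[float, float]], title: str,
                   output: str = "chart.png") -> str:
    """Build a gnuplot script that plots the points from inline data."""
    def quote(text: str) -> str:
        # Inside single-quoted gnuplot strings a quote is written twice.
        return "'" + text.replace("'", "''") + "'"

    lines = [
        "set terminal png",
        f"set output {quote(output)}",
        f"plot '-' using 1:2 with linespoints title {quote(title)}",
    ]
    lines += [f"{x:g} {y:g}" for x, y in points]
    lines.append("e")  # ends the inline data block
    return "\n".join(lines) + "\n"

def render_with_gnuplot(script: str) -> bool:
    """Run gnuplot if it is installed; return False otherwise."""
    exe = shutil.which("gnuplot")
    if exe is None:
        return False
    subprocess.run([exe], input=script, text=True, check=True)
    return True
```

The OOo side would then only need a menu entry that collects the selected data and calls something like render_with_gnuplot(gnuplot_script(...)); all plotting logic stays in the external program.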




Chart Types

These are mainly covered in the List of wished enhancements for Charts (see points 6.a through 6.m). There are a number of other chart-types NOT covered in that document, and I will briefly describe those which are critical:

Higher Priority


In Original List:

Other Chart Types (less priority)

Complex Conditions:

Time Series

Receiver Operating Characteristics

  • ROC curves: see stat wiki page

Links

see also:

External Software

External Chart Types examples

Chart Annotation

VERY HIGH PRIORITY: Significance (p-Values)

One of the most useful pieces of information on a chart, and one that is far too often forgotten, is the significance of the displayed difference.

  • a trivial (i.e. non-significant) difference might look frighteningly big through a poor choice of axes
  • while a real difference might be overlooked


This conveys genuinely additional information (unlike some other techniques, such as voluminizing)!

It is so important that it deserves the highest priority. This feature would also push users to comply with best-practice guidelines and avoid misrepresenting their data. [I do NOT mean by this to enforce p-value display, but when confronted with the possibility to automatically display the significance levels, many users will feel that they should do it.] In the next paragraphs I will describe some common scenarios.

When performing a statistical analysis we obtain a number of p-values for the various data groups. Depending on the purpose of our Chart we may choose different values to display. There are generally two approaches for displaying p-values:

  • explicit: like p = 0.002 (provide actual number)
  • code the p-value: NS (not significant), * (e.g. p< 0.05), ** (p< 0.01), and so on (the cutoffs are not universal, but might be changed)


Here are the various possibilities/ options that should be available:

  • Display Non-Significant results: useful when one wants to highlight the absence of an effect
    • display as NS
    • display full p value (e.g. p = 0.10)
    • display full p value only when p is marginal (e.g. p = 0.06; marginal cutoff is usually 0.05 < p < 0.10)
  • Display Significant Results: (usual cutoff p < 0.05)
    • explicit p: e.g. p = 0.02
      • if p > 0.01, display only 2 decimal places (rounded)
      • if p = 0, display as p < 0.001 (set the value explicitly)
      • if p<0.01:
        • display as p<0.01
        • display 3 decimals
        • if p<0.001: display as p<0.001
        • for p of the form p < 1E-8, display in scientific format (i.e. like 1E-8, or 1E-3)
    • coded
      • NS see above
      • 0.01 < p < 0.05: code as one *
      • 0.001 < p < 0.01: code as two *, (aka **); (cutoff: 0.001 or 0.005)
      • p < 0.001, code as ***
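A minimal sketch of the display options listed above, choosing one option wherever the list offers alternatives: explicit values use 2 decimals for p >= 0.01, 3 decimals below that, and "p < 0.001" below 0.001 (the scientific-format option is left out). Intervals are taken as half-open, e.g. p = 0.01 is coded as "*", because the list leaves the exact boundaries open. The function names and the cutoff table are hypothetical.

```python
import math

def _check(p: float) -> None:
    if math.isnan(p) or not 0.0 <= p <= 1.0:
        raise ValueError(f"not a p-value: {p!r}")

def format_p_explicit(p: float) -> str:
    """Explicit style: 2 decimals from 0.01, 3 decimals below, then a floor."""
    _check(p)
    if p < 0.001:
        return "p < 0.001"  # also used for p = 0
    if p < 0.01:
        return f"p = {p:.3f}"
    return f"p = {p:.2f}"

# (upper cutoff, code), checked from the strictest cutoff upward.
DEFAULT_CODES = ((0.001, "***"), (0.01, "**"), (0.05, "*"))

def format_p_coded(p: float, codes=DEFAULT_CODES, ns: str = "NS") -> str:
    """Coded style; the cutoffs are a parameter, as they are not universal."""
    _check(p)
    for cutoff, code in codes:
        if p < cutoff:
            return code
    return ns

def format_p_nonsignificant(p: float, mode: str = "ns",
                            marginal_upper: float = 0.10) -> str:
    """For p >= 0.05: 'ns', 'full', or 'marginal' (full value only if marginal)."""
    _check(p)
    if mode == "full" or (mode == "marginal" and p < marginal_upper):
        return f"p = {p:.2f}"
    if mode in ("ns", "marginal"):
        return "NS"
    raise ValueError(f"unknown mode: {mode!r}")
```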


TO BE EXPANDED

Legend

see points 7.a-7.f

  • Modify Legend Text: point 7.e - [important]
  • Legend on one/more Lines: point 7.a
  • Forced BREAK/ Disable Text Break: point 7.f
  • Delete Series from Legend: point 7.c
  • Direction of Legend: point 7.b
  • Split Legend into more parts: <-- NEW
    • for more complex Legends, it might be useful to split the legend into 2-3 parts and apply advanced sorting and formatting for the individual groups
    • e.g. we might have a legend detailing the p-values (when they are coded as NS, *, ** and ***, see the Significance (p-Values) section above)
    • and a second Legend detailing the actual data
  • Legend Styles: <-- some are NEW
    • shadow: see point 7.d
    • highlight individual Legend entries <-- NEW
    • ability to Import/Save such Legend Styles

Data Labels

see points 8.a-8.h


IMPORTANT: Outliers

Occasionally we have to represent data that contains outliers. This currently cannot be done easily. Consider the following data series: 1, 2, 3, ..., 9, and 200.

Obviously 200 is far beyond the rest of the data. If we plot a bar graph, the values 1-9 will be barely visible (and look almost the same), while the value 200 will stand out.

Workaround

To solve this issue, we need to be able to split the Y-axis into 2 sections:

  • the first section could range from 0 to 20
  • then follows a short empty space (the Y-axis is discontinuous)
  • and then the second section, which contains values around 200

e.g.

    Y-axis
      ^
  210 |
  200 |
  190 |
      //   (a strike through the Y-axis, OR "...")
   20 |
   10 |
    0 |


This way we will display accurately both the outlier (200 value) as well as the rest of the data, and enhance the visualisation of the important data.
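A minimal sketch of the coordinate mapping such a broken axis needs, assuming the sections are given as value ranges stacked bottom to top, separated by gaps of a fixed relative height, and that each section gets a share of the remaining axis height proportional to its value span. The function name and the gap parameter are hypothetical.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float]  # (low, high) value range of one axis section

def broken_axis_position(value: float, segments: List[Segment],
                         gap: float = 0.05) -> Optional[float]:
    """Map a value to a relative height in [0, 1] on a broken Y-axis.

    Returns None for values that fall into a gap between sections.
    """
    if not segments or any(high <= low for low, high in segments):
        raise ValueError("need sections with low < high")
    usable = 1.0 - gap * (len(segments) - 1)   # height left for the sections
    total_span = sum(high - low for low, high in segments)
    bottom = 0.0
    for low, high in segments:
        height = (high - low) / total_span * usable
        if low <= value <= high:
            return bottom + (value - low) / (high - low) * height
        bottom += height + gap                  # skip this section and the gap
    return None
```

With the sections 0 to 20 and 190 to 210 and a gap of 10% of the axis height, the value 10 lands at 22.5% of the height and 200 at 77.5%, so the bar for 200 is drawn through the break while the small values remain clearly distinguishable.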

Documentation

Some useful information
