Calc/To-Dos/Statistics/Miscellaneous Data Analysis
Front matter....
Goal
One of the most important tasks in data analysis is to describe the data optimally. Other important tasks include extracting all the useful information from the original data set and describing complex relationships and variability. Some of these are accomplished using statistical techniques (especially in the biomedical sciences, see Calc/To-Dos/Statistical Data Analysis Tool), yet most rely on different techniques, which are described here.
Unfortunately, I lost all contact with mathematics more than 10 years ago. Therefore, my comments will be very brief, and I hope that people with interest and knowledge will develop this page further.
...
Specific Techniques
Summarizing Data
Methods to summarize the information in a limited number of components, e.g. linear dimension reduction
- Principal Component Analysis:
  - the first few components capture most of the variability in the original data;
  - the resulting variables are non-correlated;
  - optimal linear transformation (in the least-squares sense)
  - disadvantage: the resulting variables might be difficult to interpret (they need not have any logical meaning)
  - see http://en.wikipedia.org/wiki/Principal_components_analysis
- Varimax
  - see http://de.wikipedia.org/wiki/Varimax
  - see also: http://sekhon.berkeley.edu/stats/html/varimax.html (an R-implementation: package stats)
- Simple Component Analysis:
  - the resulting variables are not necessarily non-correlated
  - but they are easier to understand and to interpret
  - see Rousson V, Gasser T. Simple component analysis. Appl. Statist. 2004; 53:539–555, http://www.unizh.ch/biostat/Manuscripts/simpcomp.pdf
  - R-implementation: http://www.maths.lth.se/help/R/.R/library/sca/html/00Index.html (package sca)
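The properties listed above for Principal Component Analysis (components ordered by variance, non-correlated resulting variables) can be illustrated with a minimal sketch. It assumes Python with NumPy, which this page does not otherwise mention; the function name pca and the synthetic data are hypothetical and serve only as an illustration, not as a proposed implementation for Calc.

```python
import numpy as np


def pca(data, n_components):
    """PCA of a samples-by-variables array via singular value decomposition.

    Hypothetical helper for illustration only.
    Returns (scores, components, explained_variance).
    """
    # Center each variable (column) on its mean.
    centered = data - data.mean(axis=0)
    # The rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Project the centered data onto the retained directions.
    scores = centered @ components.T
    # Sample variance captured by each retained component.
    explained_variance = singular_values[:n_components] ** 2 / (data.shape[0] - 1)
    return scores, components, explained_variance


# Synthetic example: the second column is almost a multiple of the first,
# the third column is independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([
    x,
    2 * x + rng.normal(scale=0.1, size=200),
    rng.normal(size=200),
])

scores, components, explained_variance = pca(data, n_components=2)
print(explained_variance)
print(np.cov(scores, rowvar=False))  # off-diagonal entries are numerically zero
```

The covariance matrix of the scores is diagonal up to rounding error, which is the "non-correlated" property. The rows of components are weighted mixtures of all original variables, which is why the resulting variables can be hard to interpret.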
Energy-Frequency Analysis
- Fourier Transform (limited to stationary and linear data)
- Wavelet analysis
- Wigner-Ville distribution
- Empirical Mode Decomposition: more robust (also applicable to nonlinear and non-stationary data)
  - a detailed description is freely available here: Huang et al. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. A 1998; 454:903-995
  - see also this document for another good description of the algorithm and this accompanying PowerPoint presentation (see emd.ppt)
  - further articles to download can be found here, like this one
  - search also for "Empirical Mode Decomposition" to find additional material
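As an illustration of the simplest entry in the list above, the Fourier Transform, the following minimal sketch computes an energy spectrum for a stationary signal. It assumes Python with NumPy; the sample rate, the signal length and the two test frequencies (5 Hz and 12 Hz) are arbitrary assumptions made for this example.

```python
import numpy as np

# Assumed sampling setup: 200 samples at 100 Hz (2 seconds of data).
sample_rate = 100.0
n_samples = 200
t = np.arange(n_samples) / sample_rate

# Stationary test signal: a 5 Hz sine plus a weaker 12 Hz sine.
signal = np.sin(2 * np.pi * 5.0 * t) + 0.5 * np.sin(2 * np.pi * 12.0 * t)

# Discrete Fourier transform of a real-valued signal and its frequency axis.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)

# Energy per frequency bin (squared magnitude of the spectrum).
energy = np.abs(spectrum) ** 2

# Frequency bin that carries the most energy.
dominant_frequency = freqs[np.argmax(energy)]
print(dominant_frequency)
```

The spectrum has no time axis: it reports which frequencies are present in the whole record, not when they occur. This is one way to see why the method is listed as limited to stationary data, and why the other entries in the list are used for signals whose frequency content changes over time.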
Data Mining
see also http://en.wikipedia.org/wiki/Data_mining
Resources
Links
- ...