Acta Univ. Agric. Silvic. Mendelianae Brun. 2011, 59(2), 75-80 | DOI: 10.11118/actaun201159020075

Time series clustering in large data sets

Jiří Fejfar, Jiří Šťastný
Department of Informatics, Mendel University in Brno, Zemědělská 1, 613 00 Brno, Czech Republic

The clustering of time series is a widely researched area, and many methods exist for this task. We currently use the self-organizing map (SOM) with an unsupervised learning algorithm to cluster time series.
After the first experiment (Fejfar, Weinlichová, Šťastný, 2009), the overall concept of the clustering algorithm appears to be correct, but we have to perform time series clustering on a much larger dataset to obtain more accurate results and to determine more precisely the correlation between the configured parameters and the results. A second requirement is a well-defined evaluation of the results. It again seems useful to use sound recordings as instances of time series: digital libraries offer many recordings, and many interesting features and patterns can be found in this area. In this experiment we search for recordings with a similar development of information density. This can be used for musical form investigation, cover song detection, and many other applications.
The objective of this paper is to compare clustering results obtained with different parameters of the feature vectors and of the SOM itself. We describe the time series in a simple way, by evaluating standard deviations for separate parts of the recordings. The resulting feature vectors are clustered with the SOM in batch training mode, with topologies varying from a few neurons to large maps.
Other algorithms usable for finding similarities between time series are discussed, and conclusions for further research are presented. We also present an overview of the related current literature and projects.
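
The two steps named in the abstract, per-segment standard deviations as feature vectors and batch-mode SOM training, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (std_features, train_batch_som, best_matching_units), the rectangular grid, the Gaussian neighborhood, the exponentially shrinking radius and all default parameter values are assumptions chosen for illustration only.

```python
import numpy as np


def std_features(signal, n_parts):
    """Split a 1-D signal into n_parts nearly equal segments and return
    the standard deviation of each segment as a feature vector."""
    segments = np.array_split(np.asarray(signal, dtype=float), n_parts)
    return np.array([seg.std() for seg in segments])


def best_matching_units(data, weights):
    """Index of the closest codebook vector (Euclidean) for every sample."""
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def train_batch_som(data, rows, cols, n_epochs=20, sigma_end=0.5, seed=0):
    """Batch SOM training on a rows x cols rectangular grid.

    Assumed setup (not taken from the paper): Gaussian neighborhood whose
    radius shrinks exponentially from max(rows, cols) / 2 to sigma_end.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_units = rows * cols

    # Grid coordinates of the neurons and their squared pairwise distances.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)],
                    dtype=float)
    grid_d2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)

    # Initialize the codebook with randomly chosen samples.
    idx = rng.choice(len(data), size=n_units, replace=len(data) < n_units)
    weights = data[idx].copy()

    sigma_start = max(rows, cols) / 2.0
    for epoch in range(n_epochs):
        frac = epoch / max(n_epochs - 1, 1)
        sigma = sigma_start * (sigma_end / sigma_start) ** frac

        # Batch step: assign all samples first, then update every neuron
        # as a neighborhood-weighted mean of the whole data set.
        bmu = best_matching_units(data, weights)
        h = np.exp(-grid_d2[:, bmu] / (2.0 * sigma ** 2))  # (units, samples)
        denom = h.sum(axis=1, keepdims=True)
        updated = h @ data

        # Keep the old weight where the neighborhood weight underflows to 0.
        mask = denom[:, 0] > 0
        weights[mask] = updated[mask] / denom[mask]
    return weights
```

A call such as train_batch_som(features, rows=1, cols=2) yields a very small map, while larger rows and cols values give the larger topologies mentioned above; best_matching_units then assigns each recording to a neuron, which serves as its cluster.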

Keywords: time series, self-organizing map, clustering
Grants and funding:

This paper is supported by IGA project 64/2010.

Received: December 17, 2010; Published: July 7, 2014

Fejfar, J., & Šťastný, J. (2011). Time series clustering in large data sets. Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, 59(2), 75-80. doi: 10.11118/actaun201159020075

References

  1. ABDALLAH, S. A., PLUMBLEY, M. D., 2007: Information Dynamics. Technical Report C4DM-TR07-01, Centre for Digital Music, Queen Mary University of London.
  2. FEJFAR, J., WEINLICHOVÁ, J., ŠŤASTNÝ, J., 2010: Musical Form Retrieval. In: MENDEL 2010, 16th International Conference on Soft Computing. Brno: Brno University of Technology. ISSN 1803-3814.
  3. KOHONEN, T., 2001: Self-Organizing Maps. Secaucus, NJ, USA: Springer-Verlag New York, Inc. ISBN 3540679219.
  4. LAW, E., VON AHN, L., 2009: Input-agreement: A New Mechanism for Data Collection using Human Computation Games. Proc. of CHI, Boston, Massachusetts, USA. ACM Press, ISBN 978-1-60558-247-4, pp. 1197-1206.
  5. ŠTENCL, M., ŠŤASTNÝ, J., 2009: Advanced approach to numerical forecasting using artificial neural networks. Acta Universitatis agriculturae et silviculturae Mendelianae Brunensis, vol. 6, no. 2, pp. 297-304, ISSN 1211-8516. DOI: 10.11118/actaun200957060297
  6. WEI, L., KEOGH, E. J., 2006: Semi-supervised time series classification. In: Proceedings of the Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, USA. ISBN 1-59593-339-5.

This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0), which permits non-commercial use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.