Acta Univ. Agric. Silvic. Mendelianae Brun. 2015, 63(6), 2229-2237 | DOI: 10.11118/actaun201563062229
Customers' Opinion Mining from Extensive Amount of Textual Reviews in Relation to Induced Knowledge Growth
- 1 Department of Informatics, Faculty of Business and Economics, Mendel University in Brno, Zemědělská 1, 613 00 Brno, Czech Republic
- 2 Department of Applied Mathematics and Computer Science, Faculty of Economics and Administration, Masaryk University, Žerotínovo nám. 617/9, 601 77 Brno, Czech Republic
Customers of various services are often invited to write a summarizing review via an Internet portal. Such reviews, written in natural language, are typically unstructured and are accompanied by a numeric rating on a scale between "bad" and "good." The more reviews are collected, the better the feedback available for improving the service. However, once massive data have accumulated, the non-linearly growing processing complexity may exceed the computational capacity available to analyze the text contents. Decision tree inducers such as C5 can reveal understandable knowledge from data, but they need the data as a whole. This article describes an application of windowing, a technique for generating dataset subsamples that provide enough information for an inducer to train a classifier and obtain results similar to those achieved by training a model on the entire dataset. The windowing results, which significantly reduce the complexity of the learning problem, are demonstrated on hundreds of thousands of reviews written in English by hotel-service customers. The user obtains knowledge represented by significant words. The results show classification errors, training and testing times, tree sizes, and the words relevant to the meaning of a review as a function of the training subsample size. Finally, a method for estimating a suitable training-set size is suggested.
Keywords: text mining, customer opinion analysis, decision trees, decision rules, windowing, large data volumes, machine learning, computational complexity, training-set size
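The windowing idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a small Gini-based tree over word-presence features stands in for C5, the rule of adding every misclassified review to the window is a simplification, and all function names, parameters, and the toy reviews are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, depth=0, max_depth=5):
    """rows: list of (word_set, label). Returns a label (leaf) or a
    tuple (word, subtree_if_present, subtree_if_absent)."""
    labels = [y for _, y in rows]
    leaf = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return leaf
    best = None
    for word in sorted(set().union(*(w for w, _ in rows))):
        yes = [y for w, y in rows if word in w]
        no = [y for w, y in rows if word not in w]
        if not yes or not no:
            continue  # the word does not split this node
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
        if best is None or score < best[0]:
            best = (score, word)
    if best is None:
        return leaf
    word = best[1]
    return (word,
            build_tree([r for r in rows if word in r[0]], depth + 1, max_depth),
            build_tree([r for r in rows if word not in r[0]], depth + 1, max_depth))

def predict(tree, words):
    """Follow word-presence tests until a leaf label is reached."""
    while isinstance(tree, tuple):
        tree = tree[1] if tree[0] in words else tree[2]
    return tree

def tree_words(tree):
    """Words tested anywhere in the tree (the 'significant words')."""
    if not isinstance(tree, tuple):
        return set()
    return {tree[0]} | tree_words(tree[1]) | tree_words(tree[2])

def windowing(rows, initial_size, max_rounds=10, seed=0):
    """Train on a random window, then repeatedly add the reviews outside
    the window that the current tree misclassifies and retrain."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    window = set(order[:initial_size])
    tree = build_tree([rows[i] for i in sorted(window)])
    for _ in range(max_rounds):
        wrong = [i for i in range(len(rows))
                 if i not in window and predict(tree, rows[i][0]) != rows[i][1]]
        if not wrong:
            break
        window.update(wrong)
        tree = build_tree([rows[i] for i in sorted(window)])
    return tree, len(window)

# Toy data (hypothetical): 300 short reviews, each with one opinion word.
opinions = {"clean": "good", "friendly": "good", "great": "good",
            "dirty": "bad", "noisy": "bad", "rude": "bad"}
reviews = [(set(f"hotel was {word}".split()), label)
           for word, label in opinions.items() for _ in range(50)]

tree, window_size = windowing(reviews, initial_size=20)
accuracy = sum(predict(tree, w) == y for w, y in reviews) / len(reviews)
print(f"window: {window_size} of {len(reviews)} reviews, accuracy: {accuracy:.2f}")
print("words used by the tree:", sorted(tree_words(tree)))
```

On this toy data the tree trained on the final window classifies every review correctly while seeing only part of the data. On real reviews the window would typically grow more, and the stopping rule would need tuning.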
Prepublished online: December 26, 2015; Published: January 1, 2016
This is an open access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY NC ND 4.0), which permits non-commercial use, distribution, and reproduction in any medium, provided the original publication is properly cited. No use, distribution or reproduction is permitted which does not comply with these terms.