Acta Univ. Agric. Silvic. Mendelianae Brun. 2018, 66, 1431-1439

https://doi.org/10.11118/actaun201866061431
Published online 2018-12-19

Analysis of the Association Between Topics in Online Documents and Stock Price Movements

František Dařena, Jan Přichystal

Department of Informatics, Faculty of Business and Economics, Mendel University in Brno, Zemědělská 1, 61300 Brno, Czech Republic

ISSN 1211-8516 (Print)

ISSN 2464-8310 (Online)
