Massively Parallel Feature Extraction Framework Application in Predicting Dangerous Seismic Events
Citation: Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 8, pages 225–229 (2016)
Abstract. In this paper we introduce an automated mechanism for knowledge discovery from data streams. As part of this work, we also present a new approach to creating an ensemble of classifiers based on a wide variety of models. Furthermore, we describe an innovative, highly scalable feature extraction and selection framework designed to work with the MapReduce programming model. We then apply this framework to build an ensemble of classifiers that takes into account both the quality and the diversity of the individual models. The effectiveness of the solution was verified through participation in an open data mining competition on predicting periods of increased seismic activity that cause life-threatening accidents in coal mines. The submitted solution obtained the highest AUC score of all the solutions uploaded by the 106 participating research teams.
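The abstract does not describe the framework's internals. The sketch below is only a rough illustration of how window-based feature extraction can be expressed in the MapReduce model. It assumes hypothetical sensor readings of the form `(sensor_id, timestamp, value)` and a fixed window length. The map step assigns each reading to a window, and the reduce step aggregates each window into simple statistics. The names `map_reading`, `reduce_window`, `run_mapreduce`, and `WINDOW`, as well as the chosen statistics, are illustrative assumptions rather than the authors' implementation. The single-process driver simulates only the shuffle phase that a real cluster would perform.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Tuple

# Hypothetical window length (time units per window); an illustrative assumption.
WINDOW = 4

Reading = Tuple[str, int, float]  # (sensor_id, timestamp, value)
Key = Tuple[str, int]             # (sensor_id, window_index)


def map_reading(reading: Reading) -> Tuple[Key, float]:
    """Map step: key each reading by its sensor and the window it falls into."""
    sensor_id, timestamp, value = reading
    return (sensor_id, timestamp // WINDOW), value


def reduce_window(key: Key, values: List[float]) -> Dict[str, float]:
    """Reduce step: summarize one sensor window with simple statistics."""
    sensor_id, window_idx = key
    prefix = f"{sensor_id}_w{window_idx}"
    return {
        f"{prefix}_mean": mean(values),
        f"{prefix}_max": max(values),
        f"{prefix}_min": min(values),
        f"{prefix}_std": pstdev(values),
    }


def run_mapreduce(readings: Iterable[Reading]) -> Dict[str, float]:
    """Single-process stand-in for a MapReduce job: map, shuffle, reduce."""
    groups: Dict[Key, List[float]] = defaultdict(list)
    for reading in readings:            # map + shuffle: group values by key
        key, value = map_reading(reading)
        groups[key].append(value)
    features: Dict[str, float] = {}
    for key in sorted(groups):          # reduce: one independent call per key
        features.update(reduce_window(key, groups[key]))
    return features


if __name__ == "__main__":
    data = [("s1", t, float(t)) for t in range(8)] + [("s2", 0, 5.0), ("s2", 1, 7.0)]
    for name, val in run_mapreduce(data).items():
        print(name, round(val, 3))
```

Each reduce call sees only the values of its own key, so windows can be processed independently on different machines. In an actual MapReduce deployment, the framework's shuffle phase would handle the grouping that `run_mapreduce` performs here in memory.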