Predicting Dangerous Seismic Events in Coal Mines under Distribution Drift
Citation: Proceedings of the 2016 Federated Conference on Computer Science and Information Systems, M. Ganzha, L. Maciaszek, M. Paprzycki (eds). ACSIS, Vol. 8, pages 221–224 (2016)
Abstract. We describe our submission to the AAIA'16 Data Mining Competition, where the objective is to devise a reliable prediction model for detecting periods of increased seismic activity in coal mines. Our solution relies on a selective naive Bayes classifier, with optimal preprocessing, variable selection and model averaging, together with an automatic variable construction method that builds many variables from time series records. One challenging aspect of the competition is that the input variables are not independent and identically distributed (i.i.d.) between the train and test datasets, since the train and test data come from different coal mines and different time periods. We apply a drift-aware methodology to alleviate this problem, which enabled us to reach a final score of 0.9246 (team marcb), within 0.015 of the challenge winner's score.