Dropout rates in European countries are one of the major issues to be faced in the near future, as stated in the Europe 2020 strategy. In 2017, an average of 10.6% of young people (aged 18-24) in the EU-28 were early leavers from education and training, according to Eurostat statistics. The main aim of this review is to identify studies that use educational data mining techniques to predict university dropout in traditional courses. In the Scopus and Web of Science (WoS) catalogues, we identified 241 studies related to this topic, from which we selected 73, focusing on which data mining techniques are used to predict university dropout. We identified 6 data mining classification techniques, 53 data mining algorithms and 14 data mining tools.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.