2. Bibliography
- AA17
Aderemi O Adewumi and Andronicus A Akinyelu. A survey of machine-learning and nature-inspired based credit card fraud detection techniques. International Journal of System Assurance Engineering and Management, 8(2):937–953, 2017.
- AHJ+20
Ayman Alazizi, Amaury Habrard, François Jacquenet, Liyun He-Guelton, and Frédéric Oblé. Dual sequential variational autoencoders for fraud detection. In IDA, 14–26. 2020.
- AFR97
Emin Aleskerov, Bernd Freisleben, and Bharat Rao. Cardwatch: a neural network based database mining system for credit card fraud detection. In Proceedings of the IEEE/IAFE 1997 computational intelligence for financial engineering (CIFEr), 220–226. IEEE, 1997.
- ASH+19
Haseeb Ali, Mohd Najib Mohd Salleh, Kashif Hussain, Arshad Ahmad, Ayaz Ullah, Arshad Muhammad, Rashid Naseem, and Muzammil Khan. A review on data preprocessing methods for class imbalance problem. International Journal of Engineering & Technology, 8:390–397, 2019.
- AC15
Jinwon An and Sungzoon Cho. Variational autoencoder based anomaly detection using reconstruction probability. Special Lecture on IE, 2(1):1–18, 2015.
- BCB14
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
- BASO16
Alejandro Correa Bahnsen, Djamila Aouada, Aleksandar Stojanovic, and Björn Ottersten. Feature engineering strategies for credit card fraud detection. Expert Systems with Applications, 51:134–142, 2016.
- Ban20
European Central Bank. 6th report on card fraud. August 2020. [Online; Last consulted 09-October-2020]. URL: https://www.ecb.europa.eu/pub/cardfraud/html/ecb.cardfraudreport202008~521edb602b.en.html#toc2.
- BBM+03
Gustavo EAPA Batista, Ana LC Bazzan, Maria Carolina Monard, and others. Balancing training data for automated annotation of keywords: a case study. In WOB, 10–18. 2003.
- BPM04
Gustavo EAPA Batista, Ronaldo C Prati, and Maria Carolina Monard. A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD explorations newsletter, 6(1):20–29, 2004.
- BTK+21
Marijan Beg, Juliette Taka, Thomas Kluyver, Alexander Konovalov, Min Ragan-Kelley, Nicolas M Thiéry, and Hans Fangohr. Using jupyter for reproducible scientific workflows. Computing in Science & Engineering, 23(2):36–46, 2021.
- BentejacCsorgoMartinezMunoz21
Candice Bentéjac, Anna Csörgő, and Gonzalo Martínez-Muñoz. A comparative analysis of gradient boosting algorithms. Artificial Intelligence Review, 54(3):1937–1967, 2021.
- BB12
James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. Journal of machine learning research, 2012.
- Bis06
Christopher M Bishop. Pattern recognition and machine learning. Springer, 2006.
- Bon21
Gianluca Bontempi. Statistical foundations of machine learning, 2nd edition. Université Libre de Bruxelles, 2021.
- BEP13
Kendrick Boyd, Kevin H Eng, and C David Page. Area under the precision-recall curve: point estimates and confidence intervals. In Joint European conference on machine learning and knowledge discovery in databases, 451–466. Springer, 2013.
- Bre96
Leo Breiman. Bagging predictors. Machine learning, 24(2):123–140, 1996.
- Bre01
Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001.
- Car18
Fabrizio Carcillo. Beyond Supervised Learning in Credit Card Fraud Detection: A Dive into Semi-supervised and Distributed Learning. Université libre de Bruxelles, 2018.
- CDPLB+18
Fabrizio Carcillo, Andrea Dal Pozzolo, Yann-Aël Le Borgne, Olivier Caelen, Yannis Mazzer, and Gianluca Bontempi. Scarff: a scalable framework for streaming credit card fraud detection with spark. Information fusion, 41:182–194, 2018.
- CLBCB18
Fabrizio Carcillo, Yann-Aël Le Borgne, Olivier Caelen, and Gianluca Bontempi. Streaming active learning strategies for real-life credit card fraud detection: assessment and visualization. International Journal of Data Science and Analytics, 5(4):285–300, 2018.
- CLBC+19
Fabrizio Carcillo, Yann-Aël Le Borgne, Olivier Caelen, Yacine Kessaci, Frédéric Oblé, and Gianluca Bontempi. Combining unsupervised and supervised learning in credit card fraud detection. Information Sciences, 2019.
- CTMozetivc20
Vitor Cerqueira, Luis Torgo, and Igor Mozetič. Evaluating time series forecasting models: an empirical study on performance estimation methods. Machine Learning, 109(11):1997–2028, 2020.
- Cha09
Nitesh V Chawla. Data mining for imbalanced datasets: an overview. In Data mining and knowledge discovery handbook, pages 875–886. Springer, 2009.
- CBHK02
Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. Smote: synthetic minority over-sampling technique. Journal of artificial intelligence research, 16:321–357, 2002.
- CCHJ08
Nitesh V Chawla, David A Cieslak, Lawrence O Hall, and Ajay Joshi. Automatically countering imbalance and its empirical relationship to cost. Data Mining and Knowledge Discovery, 17(2):225–252, 2008.
- CJK04
Nitesh V Chawla, Nathalie Japkowicz, and Aleksander Kotcz. Special issue on learning from imbalanced data sets. ACM SIGKDD explorations newsletter, 6(1):1–6, 2004.
- CLB+04
Chao Chen, Andy Liaw, Leo Breiman, and others. Using random forest to learn imbalanced data. University of California, Berkeley, 110(1-12):24, 2004.
- CG16
Tianqi Chen and Carlos Guestrin. Xgboost: a scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, 785–794. 2016.
- Cyb89
George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of control, signals and systems, 2(4):303–314, 1989.
- DP15
Andrea Dal Pozzolo. Adaptive machine learning for credit card fraud detection. Université libre de Bruxelles, 2015.
- DPBC+17
Andrea Dal Pozzolo, Giacomo Boracchi, Olivier Caelen, Cesare Alippi, and Gianluca Bontempi. Credit card fraud detection: a realistic modeling and a novel learning strategy. IEEE transactions on neural networks and learning systems, 29(8):3784–3797, 2017.
- DPCLB+14
Andrea Dal Pozzolo, Olivier Caelen, Yann-Ael Le Borgne, Serge Waterschoot, and Gianluca Bontempi. Learned lessons in credit card fraud detection from a practitioner perspective. Expert systems with applications, 41(10):4915–4928, 2014.
- DJS+20
Kanishka Ghosh Dastidar, Johannes Jurgovsky, Wissam Siblini, Liyun He-Guelton, and Michael Granitzer. Nag: neural feature aggregation framework for credit card fraud detection. In 2020 IEEE International Conference on Data Mining (ICDM), 92–101. IEEE, 2020.
- DG06
Jesse Davis and Mark Goadrich. The relationship between precision-recall and roc curves. In Proceedings of the 23rd international conference on Machine learning, 233–240. 2006.
- DCLT18
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
- DH00
Pedro Domingos and Geoff Hulten. Mining high-speed data streams. In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, 71–80. 2000.
- DEG18
Anna Veronika Dorogush, Vasily Ershov, and Andrey Gulin. Catboost: gradient boosting with categorical features support. arXiv preprint arXiv:1810.11363, 2018.
- Elk01
Charles Elkan. The foundations of cost-sensitive learning. In International joint conference on artificial intelligence, volume 17, 973–978. Lawrence Erlbaum Associates Ltd, 2001.
- EMH19
Thomas Elsken, Jan Hendrik Metzen, and Frank Hutter. Neural architecture search: a survey. The Journal of Machine Learning Research, 20(1):1997–2017, 2019.
- FZ11
Guangzhe Fan and Mu Zhu. Detection of rare items with target. Statistics and Its Interface, 4(1):11–17, 2011.
- Faw04
Tom Fawcett. Roc graphs: notes and practical considerations for researchers. Machine learning, 31(1):1–38, 2004.
- Faw06
Tom Fawcett. An introduction to roc analysis. Pattern recognition letters, 27(8):861–874, 2006.
- FernandezGarciaG+18
Alberto Fernández, Salvador García, Mikel Galar, Ronaldo C Prati, Bartosz Krawczyk, and Francisco Herrera. Learning from imbalanced data sets. Springer, 2018.
- FK15
Peter Flach and Meelis Kull. Precision-recall-gain curves: pr analysis done right. In Advances in neural information processing systems, 838–846. 2015.
- FS97
Yoav Freund and Robert E Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of computer and system sciences, 55(1):119–139, 1997.
- FHT01
Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The elements of statistical learning. Volume 1. Springer series in statistics New York, 2001.
- Fri01
Jerome H Friedman. Greedy function approximation: a gradient boosting machine. Annals of statistics, pages 1189–1232, 2001.
- FCTZ16
Kang Fu, Dawei Cheng, Yi Tu, and Liqing Zhang. Credit card fraud detection using convolutional neural networks. In International conference on neural information processing, 483–490. Springer, 2016.
- GvZliobaiteB+14
João Gama, Indrė Žliobaitė, Albert Bifet, Mykola Pechenizkiy, and Abdelhamid Bouchachia. A survey on concept drift adaptation. ACM computing surveys (CSUR), 46(4):1–37, 2014.
- GR94
Sushmito Ghosh and Douglas L Reilly. Credit card fraud detection with a neural-network. In System Sciences, 1994. Proceedings of the Twenty-Seventh Hawaii International Conference on, volume 3, 621–630. IEEE, 1994.
- GBC16
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT press, 2016.
- GTM+20
Akhilesh Gupta, Nesime Tatbul, Ryan Marcus, Shengtian Zhou, Insup Lee, and Justin Gottschlich. Class-weighted evaluation metrics for imbalanced data classification. arXiv preprint arXiv:2010.05995, 2020.
- HYS+17
Guo Haixiang, Li Yijing, Jennifer Shang, Gu Mingyun, Huang Yuanyue, and Gong Bing. Learning from class-imbalanced data: review of methods and applications. Expert Systems with Applications, 73:220–239, 2017.
- HWM05
Hui Han, Wen-Yuan Wang, and Bing-Huan Mao. Borderline-smote: a new over-sampling method in imbalanced data sets learning. In International conference on intelligent computing, 878–887. Springer, 2005.
- HBGL08
Haibo He, Yang Bai, Edwardo A Garcia, and Shutao Li. Adasyn: adaptive synthetic sampling approach for imbalanced learning. In 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence), 1322–1328. IEEE, 2008.
- HN92
Robert Hecht-Nielsen. Theory of the backpropagation neural network. In Neural networks for perception, pages 65–93. Elsevier, 1992.
- Imb21
Imblearn. Imbalanced learning library for python. 2021. [Online; Last consulted 26-June-2021]. URL: https://imbalanced-learn.org/.
- Ins18
Statistic Brain Research Institute. Credit card fraud statistics. April 2018. [Online; Last consulted 30-March-2021]. URL: https://www.statisticbrain.com/credit-card-fraud-statistics/.
- JGZ+18
Johannes Jurgovsky, Michael Granitzer, Konstantin Ziegler, Sylvie Calabretto, Pierre-Edouard Portier, Liyun He-Guelton, and Olivier Caelen. Sequence classification for credit-card fraud detection. Expert Systems with Applications, 100:234–245, 2018.
- Kag16
Kaggle. Credit card fraud detection dataset. November 2016. [Online; Last consulted 09-March-2021]. URL: https://www.kaggle.com/mlg-ulb/creditcardfraud.
- Kag19
Kaggle. Ieee-cis fraud detection - can you detect fraud from customer transactions? September 2019. [Online; Last consulted 26-August-2021]. URL: https://www.kaggle.com/c/ieee-fraud-detection.
- KMF+17
Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. Lightgbm: a highly efficient gradient boosting decision tree. Advances in neural information processing systems, 30:3146–3154, 2017.
- KonevcnyMY+16
Jakub Konečný, H Brendan McMahan, Felix X Yu, Peter Richtárik, Ananda Theertha Suresh, and Dave Bacon. Federated learning: strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492, 2016.
- Kri10
M. Krivko. A hybrid model for plastic card fraud detection systems. Expert Systems with Applications, 37(8):6070–6076, 2010. URL: http://www.sciencedirect.com/science/article/pii/S0957417410001582, doi:10.1016/j.eswa.2010.02.119.
- LRT14
Balaji Lakshminarayanan, Daniel M Roy, and Yee Whye Teh. Mondrian forests: efficient online random forests. Advances in neural information processing systems, 27:3140–3148, 2014.
- LDB17
Felix Last, Georgios Douzas, and Fernando Bacao. Oversampling for imbalanced learning based on k-means and smote. arXiv preprint arXiv:1711.00837, 2017.
- Lau01
Jorma Laurikkala. Improving identification of difficult small classes by balancing class distribution. In Conference on Artificial Intelligence in Medicine in Europe, 63–66. Springer, 2001.
- LNC+11
Quoc V Le, Jiquan Ngiam, Adam Coates, Abhik Lahiri, Bobby Prochnow, and Andrew Y Ng. On optimization methods for deep learning. In ICML. 2011.
- LLBHG+19
Bertrand Lebichot, Yann-Aël Le Borgne, Liyun He-Guelton, Frédéric Oblé, and Gianluca Bontempi. Deep-learning domain adaptation techniques for credit cards fraud detection. In INNS Big Data and Deep Learning conference, 78–88. Springer, 2019.
- LPS+21
Bertrand Lebichot, Gian Marco Paldino, W Siblini, L He-Guelton, F Oblé, and G Bontempi. Incremental learning strategies for credit cards fraud detection. International Journal of Data Science and Analytics, pages 1–10, 2021.
- LVLB+21
Bertrand Lebichot, Théo Verhelst, Yann-Aël Le Borgne, Liyun He-Guelton, Frédéric Oblé, and Gianluca Bontempi. Transfer learning strategies for credit card fraud detection. IEEE access, 9:114754–114766, 2021.
- LemaitreNA17
Guillaume Lemaître, Fernando Nogueira, and Christos K. Aridas. Imbalanced-learn: a python toolbox to tackle the curse of imbalanced datasets in machine learning. Journal of Machine Learning Research, 18(17):1–5, 2017. URL: http://jmlr.org/papers/v18/16-365.html.
- LS08
Charles X Ling and Victor S Sheng. Cost-sensitive learning and the class imbalance problem. Encyclopedia of machine learning, 2011:231–235, 2008.
- LJ20
Yvan Lucas and Johannes Jurgovsky. Credit card fraud detection using machine learning: a survey. arXiv preprint arXiv:2010.06479, 2020.
- MO97
Richard Maclin and David Opitz. An empirical evaluation of bagging and boosting. AAAI/IAAI, 1997:546–551, 1997.
- MAT+19
Sara Makki, Zainab Assaghir, Yehia Taher, Rafiqul Haque, Mohand-Said Hacid, and Hassan Zeineddine. An experimental study with imbalanced classification approaches for credit card fraud detection. IEEE Access, 7:93010–93022, 2019.
- MZ03
Inderjeet Mani and I Zhang. Knn approach to unbalanced data distributions: a case study involving information extraction. In Proceedings of workshop on learning from imbalanced datasets, volume 126. ICML United States, 2003.
- McK17
Wes McKinney. Python for data analysis: Data wrangling with Pandas, NumPy, and IPython - 2nd Edition. O'Reilly Media, Inc., 2017.
- MekterovicBrkicBaranovic18
Igor Mekterović, Ljiljana Brkić, and Mirta Baranović. A systematic review of data mining approaches to credit card fraud detection. WSEAS Transactions on Business and Economics, 15:437–444, 2018.
- Mus19
John Muschelli. Roc and auc with a binary predictor: a potentially misleading metric. Journal of Classification, pages 1–13, 2019.
- MullerG16
Andreas C Müller and Sarah Guido. Introduction to machine learning with Python: a guide for data scientists. O'Reilly Media, Inc., 2016.
- NH10
Vinod Nair and Geoffrey E Hinton. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning, 807–814. 2010.
- NCK11
Hien M Nguyen, Eric W Cooper, and Katsuari Kamei. Borderline over-sampling for imbalanced data classification. International Journal of Knowledge Engineering and Soft Data Paradigms, 3(1):4–21, 2011.
- PGC+17
Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in pytorch. In NIPS-W. 2017.
- PL18
Vipul Patil and Umesh Kumar Lilhore. A survey on different data mining & machine learning methods for credit card fraud detection. International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 3(5):320–325, 2018.
- PC18
Rimpal R Popat and Jayesh Chaudhary. A survey on credit card fraud detection using machine learning. In 2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI), 1120–1125. IEEE, 2018.
- PP19
C Victoria Priscilla and D Padma Prabha. Credit card fraud detection: a systematic review. In International Conference on Information, Communication and Computing Technology, 290–303. Springer, 2019.
- PGV+17
Liudmila Prokhorenkova, Gleb Gusev, Aleksandr Vorobev, Anna Veronika Dorogush, and Andrey Gulin. Catboost: unbiased boosting with categorical features. arXiv preprint arXiv:1706.09516, 2017.
- RWC+19
Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, and others. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
- rep19
Nilson report. Nilson report issue 1164. November 2019. [Online; Last consulted 09-October-2020]. URL: https://nilsonreport.com/upload/content_promo/The_Nilson_Report_Issue_1164.pdf.
- Rud16
Sebastian Ruder. An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747, 2016.
- RHW86
David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.
- SSB18
Imane Sadgali, Nawal Sael, and Faouzia Benabbou. Detection of credit card fraud: state of art. International Journal of computer science and network security, 18(11):76–83, 2018.
- SR15
Takaya Saito and Marc Rehmsmeier. The precision-recall plot is more informative than the roc plot when evaluating binary classifiers on imbalanced datasets. PloS one, 10(3):e0118432, 2015.
- SCF+21
Wissam Siblini, Guillaume Coter, Rémy Fabry, Liyun He-Guelton, Frédéric Oblé, Bertrand Lebichot, Yann-Aël Le Borgne, and Gianluca Bontempi. Transfer learning for credit card fraud detection: A journey from research to production. In Proceedings of Data Science and Advanced Analytics (DSAA 2021). 2021. URL: https://arxiv.org/abs/2107.09323.
- SKK18
Janvier Omar Sinayobye, Fred Kiwanuka, and Swaib Kaawaase Kyanda. A state-of-the-art review of machine learning techniques for fraud detection research. In 2018 IEEE/ACM Symposium on Software Engineering in Africa (SEiA), 11–19. IEEE, 2018.
- SHK+14
Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 15(1):1929–1958, 2014.
- STZY18
Yu Sun, Ke Tang, Zexuan Zhu, and Xin Yao. Concept drift adaptation by exploiting historical knowledge. IEEE transactions on neural networks and learning systems, 29(10):4822–4832, 2018.
- SVL14
Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. In Advances in neural information processing systems, 3104–3112. 2014.
- Tha20
Alaa Tharwat. Classification assessment methods. Applied Computing and Informatics, 2020.
- T+76
Ivan Tomek. Two modifications of cnn. IEEE Transactions on Systems, Man, and Cybernetics, SMC-6(11):769–772, 1976.
- VVBC+15
Véronique Van Vlasselaer, Cristián Bravo, Olivier Caelen, Tina Eliassi-Rad, Leman Akoglu, Monique Snoeck, and Bart Baesens. Apate: a novel approach for automated credit card transaction fraud detection using network-based extensions. Decision Support Systems, 75:38–48, 2015.
- VSP+17
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Advances in neural information processing systems, 5998–6008. 2017.
- VAK+16
Kalyan Veeramachaneni, Ignacio Arnaldo, Vamsi Korrapati, Constantinos Bassias, and Ke Li. AI^2: training a big data machine to defend. In 2016 IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS), 49–54. IEEE, 2016.
- WHJ+09
Christopher Whitrow, David J Hand, Piotr Juszczak, David Weston, and Niall M Adams. Transaction aggregation as a strategy for credit card fraud detection. Data mining and knowledge discovery, 18(1):30–55, 2009.
- Wil72
Dennis L Wilson. Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, and Cybernetics, pages 408–421, 1972.
- YL09
Show-Jane Yen and Yue-Shi Lee. Cluster-based under-sampling approaches for imbalanced data distributions. Expert Systems with Applications, 36(3):5718–5727, 2009.
- YAG19
Niloofar Yousefi, Marie Alaghband, and Ivan Garibay. A comprehensive survey on machine learning techniques and user authentication approaches for credit card fraud detection. arXiv preprint arXiv:1912.02629, 2019.
- ZP17
Chong Zhou and Randy C Paffenroth. Anomaly detection with robust deep autoencoders. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, 665–674. 2017.
- Zho21
Zhi-Hua Zhou. Ensemble learning. In Machine Learning, pages 181–210. Springer, 2021.
- ZAM+16
Zahra Zojaji, Reza Ebrahimi Atani, Amir Hassan Monadjemi, and others. A survey of credit card fraud detection techniques: data and technique oriented perspective. arXiv preprint arXiv:1611.06439, 2016.