2. Book contributions

Original contributions of the book with respect to the current literature:

  • Simulator of synthetic transaction data: This book proposes a simulator that generates synthetic transaction datasets of varying complexity. In particular, the simulator makes it possible to vary the degree of class imbalance (that is, the proportion of fraudulent transactions, which is typically low), includes both numerical and categorical variables (with categorical features that have a very high number of possible values), and features time-dependent fraud scenarios. The simulator is presented in Chapter 3, Section 2.

  • Reproducibility: This book is a Jupyter Book, which allows readers to interactively execute or modify the sections that contain code. Together with the synthetic data generator, this makes all the experiments and results presented in the book reproducible. Instructions for running the book in the cloud or on your own computer are provided in Chapter 2, Section 3.

  • State-of-the-art review: The book synthesizes recent surveys on machine learning for credit card fraud detection (ML for CCFD). It highlights the core principles presented in these surveys and summarizes the main challenges faced by fraud detection systems. The review is presented in Chapter 3, Section 3.

  • Evaluation methodology: A major contribution of this book is a detailed presentation and discussion of the performance metrics and validation methodologies that can be used to assess the effectiveness of fraud detection systems. Performance metrics are addressed in Chapter 4. Validation methodologies and strategies for model selection are addressed in Chapter 5.

  • Imbalanced learning: The book provides an extensive experimental evaluation of imbalanced learning approaches, spanning cost-sensitive, resampling, and ensemble techniques. For each approach, the experimental evaluation includes a toy example, a dataset of simulated transactions, and a real-world dataset. The topic is covered in Chapter 6. The main takeaway of this experimental evaluation is that the benefits of imbalanced learning techniques are mixed and depend closely on the targeted performance metrics.

  • Deep learning: The recent advent of deep learning techniques has led the research community to become increasingly interested in their applications to fraud detection. This book is the first to go into the details of the use and implementation of these methods for the problem of credit card fraud detection. Chapter 7 covers the implementation and evaluation of fully connected feed-forward neural networks, as well as more advanced techniques such as representation learning with autoencoders and sequential models such as convolutional or long short-term memory networks.
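
To make the first contribution above more concrete, the following is a minimal sketch of what a transaction simulator with the three listed properties could look like: a controllable fraud rate, a high-cardinality categorical feature, and a time-dependent fraud scenario. It is not the simulator presented in Chapter 3. All names (`Transaction`, `simulate_transactions`, `fraud_rate`) and all parameters and their default values are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Transaction:
    day: int          # day index on which the transaction occurs
    terminal_id: int  # categorical feature with many possible values
    amount: float     # numerical feature
    is_fraud: bool    # label


def simulate_transactions(n_days=30, tx_per_day=200, n_terminals=1000,
                          base_fraud_rate=0.002, compromised_rate=0.5,
                          compromise_start=10, compromise_length=7,
                          n_compromised=5, seed=0):
    """Generate synthetic transactions (illustrative assumptions only).

    Fraud occurs at a low base rate everywhere, and at a higher rate on a
    few "compromised" terminals during a limited window of days, which is
    one simple way to obtain a time-dependent fraud scenario.
    """
    rng = random.Random(seed)
    compromised = set(rng.sample(range(n_terminals), n_compromised))
    window_end = compromise_start + compromise_length
    transactions = []
    for day in range(n_days):
        in_window = compromise_start <= day < window_end
        for _ in range(tx_per_day):
            terminal_id = rng.randrange(n_terminals)
            # Log-normal amounts are an assumption made for this sketch.
            amount = round(rng.lognormvariate(3.0, 1.0), 2)
            if in_window and terminal_id in compromised:
                rate = compromised_rate
            else:
                rate = base_fraud_rate
            is_fraud = rng.random() < rate
            transactions.append(
                Transaction(day, terminal_id, amount, is_fraud))
    return transactions


def fraud_rate(transactions):
    """Return the proportion of fraudulent transactions."""
    return sum(t.is_fraud for t in transactions) / len(transactions)


if __name__ == "__main__":
    # Varying base_fraud_rate changes the degree of class imbalance.
    for rate in (0.001, 0.05):
        txs = simulate_transactions(base_fraud_rate=rate)
        print(rate, len(txs), fraud_rate(txs))
```

In this sketch, `base_fraud_rate` controls the class imbalance, `n_terminals` controls the cardinality of the categorical feature, and the `compromise_*` parameters define when the time-dependent scenario is active. Fixing `seed` makes the generated dataset reproducible from one run to the next.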