6. Summary
Neural networks have long played an important role in automatic fraud detection systems [GR94]. They are, however, rarely the first choice in production: transactions are tabular data, and many practitioners still rely on feature engineering and classical machine learning techniques such as random forests or XGBoost. Nevertheless, neural networks can not only reach very competitive performance, but they also have many advantages for the credit card fraud detection problem: they provide an additional model with a different behavior, they can easily be used in federated settings, they automate feature engineering through representation learning, and they are differentiable and can be trained incrementally.
In this chapter, the objective was to cover the methodology to build neural networks for fraud detection, from general considerations on the design of a deep learning pipeline to the implementation of several architectures: a feed-forward neural network, an autoencoder, a convolutional neural network, a long short-term memory (LSTM) network, and an LSTM with attention.
This methodology gives an overview of the major elements in the design of a neural network. Compared to classical methods, there is a virtually unlimited set of hyperparameters and design choices, which entails a time-consuming tuning process but allows great expressivity.
The different architectures developed in this chapter belong to different families of techniques:
The regular feed-forward network is the simplest but most widely used deep learning architecture. It is made only of fully connected layers and is the standard choice for classification or regression problems on tabular data described by numerical features.
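To make this concrete, a minimal feed-forward fraud classifier of this kind could look like the following PyTorch sketch. The layer sizes, dropout rate, and input dimension are illustrative assumptions, not the exact values used in the chapter.

```python
import torch
import torch.nn as nn

class FraudFeedForward(nn.Module):
    """Simple fully connected classifier for tabular transaction features."""

    def __init__(self, input_dim: int, hidden_dim: int = 100, dropout: float = 0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden_dim, 1),
            nn.Sigmoid(),  # probability that the transaction is fraudulent
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)

# Example usage: a batch of 32 transactions described by 15 numerical features
model = FraudFeedForward(input_dim=15)
scores = model(torch.randn(32, 15))  # fraud probabilities in [0, 1]
```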
The autoencoder’s goal is to learn representations from which the descriptive variables can be reconstructed, which is why it has been widely used for unsupervised learning problems. This is interesting because anomaly detection, and in particular fraud detection, can be tackled with unsupervised or semi-supervised techniques. One way to use the autoencoder for that purpose is to consider its reconstruction error as an indicator of fraud risk. This error can be used on its own to detect outliers, although this generally leads to low precision, or as an extra variable in a supervised classifier.
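As an illustration, the sketch below shows an autoencoder of this kind together with a helper that turns the per-transaction reconstruction error into a fraud-risk score. The layer sizes and code dimension are assumptions, and in practice the model would be trained first (e.g. with a mean squared error loss) before scoring.

```python
import torch
import torch.nn as nn

class TransactionAutoencoder(nn.Module):
    """Encode transactions into a low-dimensional code and reconstruct them."""

    def __init__(self, input_dim: int, code_dim: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 64), nn.ReLU(),
            nn.Linear(64, code_dim), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 64), nn.ReLU(),
            nn.Linear(64, input_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

def reconstruction_error(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Per-transaction mean squared reconstruction error, usable as a fraud-risk score."""
    with torch.no_grad():
        x_hat = model(x)
    return ((x - x_hat) ** 2).mean(dim=1)

# Example usage: score a batch of 32 transactions with 15 features each
model = TransactionAutoencoder(input_dim=15)
risk_scores = reconstruction_error(model, torch.randn(32, 15))
```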
The CNN and the LSTM (with or without attention) can be used as sequential models. They make it possible to automatically build features from contextual data. To classify a transaction as fraudulent or genuine, it is generally useful to refer to the regular behavior of the cardholder in order to detect a discrepancy. A manual way to integrate this contextual information is feature engineering, i.e. the creation of expert aggregated features. Instead, sequential models take as input the sequence of transactions that precede the current one, linked to it through a landmark variable such as the Customer ID, and automatically compute a representation that summarizes this sequence.
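The sketch below illustrates this idea with a plain LSTM (the dimensions and sequence length are assumptions): the hidden state at the last time step serves as the automatically computed representation of the cardholder's recent transactions, which is then fed to a small classifier.

```python
import torch
import torch.nn as nn

class SequentialFraudLSTM(nn.Module):
    """Classify the current transaction from the cardholder's recent transaction sequence."""

    def __init__(self, input_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, sequence_length, input_dim), ordered from oldest to current transaction
        _, (h_n, _) = self.lstm(seq)   # h_n: (1, batch, hidden_dim)
        summary = h_n[-1]              # representation summarizing the sequence
        return self.classifier(summary).squeeze(-1)

# Example usage: 32 cardholders, each with their 5 most recent transactions (15 features each)
model = SequentialFraudLSTM(input_dim=15)
fraud_probs = model(torch.randn(32, 5, 15))
```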
In conclusion, deep learning methods for fraud detection are varied, appear to be competitive with classical machine learning approaches on both simulated and real-world data, and have multiple practical advantages. Therefore, they definitely deserve a place in the fraud detection practitioner’s toolbox.