Credit Card Fraud

Credit card fraud is a persistent problem that credit card companies and banks must constantly guard against. Many of them offer free monitoring and alerts to identify potentially fraudulent activity as soon as possible. Because customers are normally reimbursed for fraudulent charges on their cards and accounts, it is in the best interest of credit card companies and banks to identify, stop, and prevent fraudulent activity.

ETL Process

Data was downloaded from Kaggle. Initial cleanup and transformation were done in Colab, and Amazon RDS was used to store the data. The data set contains 284,807 credit card transactions.

Challenge

The data was highly imbalanced: only 0.17% of the transactions in the data set were fraudulent. A model that always predicts 'non-fraud' would therefore be 99.83% accurate while catching no fraud at all, so accuracy alone is a misleading metric here.
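The accuracy trap described above can be checked with a few lines of plain Python. This is a minimal sketch, not code from the project: the 10,000-row label list is made up for illustration and only mirrors the 0.17% fraud rate.

```python
# Hypothetical labels: 10,000 transactions at a 0.17% fraud rate (1 = fraud).
n_total = 10_000
n_fraud = 17
y_true = [1] * n_fraud + [0] * (n_total - n_fraud)

# A "model" that always predicts non-fraud (0).
y_pred = [0] * n_total

# Accuracy: share of predictions that match the true label.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / n_total

# Recall: share of actual fraud cases that were flagged as fraud.
caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = caught / n_fraud

print(f"accuracy = {accuracy:.2%}")
print(f"recall   = {recall:.2%}")
```

The accuracy looks excellent even though not a single fraud case is detected, which is why precision and recall are the more useful metrics for this data.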

Solution

We resampled the data to have an equal number of fraud and non-fraud transactions. We then tried several models and compared their performance based on precision and recall.
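One way to get equal class counts is random undersampling of the majority class. The sketch below assumes that approach; the project may have used a different resampling method. The `undersample` helper, the `Class` label key, and the toy rows are illustrative assumptions, not taken from the project code.

```python
import random

def undersample(rows, label_key="Class", seed=42):
    """Return a shuffled list with as many non-fraud rows as fraud rows.

    Assumes label 1 = fraud, 0 = non-fraud, and that fraud is the
    minority class (otherwise rng.sample would raise ValueError).
    """
    rng = random.Random(seed)
    fraud = [r for r in rows if r[label_key] == 1]
    normal = [r for r in rows if r[label_key] == 0]
    # Draw, without replacement, one non-fraud row per fraud row.
    sampled_normal = rng.sample(normal, k=len(fraud))
    balanced = fraud + sampled_normal
    rng.shuffle(balanced)
    return balanced

# Toy data: 1,000 rows, of which every 100th is marked as fraud.
rows = [{"Amount": float(i), "Class": 1 if i % 100 == 0 else 0}
        for i in range(1000)]
balanced = undersample(rows)
print(len(balanced), sum(r["Class"] for r in balanced))
```

Undersampling discards most of the non-fraud rows, so the balanced set is small. Resampling should normally be applied only to the training split so that test metrics reflect the real class balance.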

Models

K-Nearest Neighbour

Logistic Regression

Simple Neural Network

Deep Neural Network

Support Vector Machines

Random Forest

Best Model

The best model was found to be the Random Forest model, which predicts fraud transactions with 93% precision.

Confusion Matrix - '1' represents fraud transactions
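For reference, precision and recall for the fraud class can be read off a 2x2 confusion matrix as sketched below. The layout assumed here is rows = true label, columns = predicted label, with labels ordered 0, 1. The counts are hypothetical and are not the project's results; `precision_recall` is an illustrative helper.

```python
def precision_recall(cm):
    """Compute precision and recall for class '1' (fraud).

    cm is [[tn, fp], [fn, tp]]: rows are true labels, columns are
    predicted labels, in the order 0 (non-fraud), 1 (fraud).
    """
    (tn, fp), (fn, tp) = cm
    # Precision: of everything flagged as fraud, how much really was fraud.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all real fraud, how much was flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical counts, for illustration only.
cm = [[45, 10],
      [5, 40]]
precision, recall = precision_recall(cm)
print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```

High precision means few legitimate transactions are wrongly flagged. High recall means few fraudulent ones slip through.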

Used: Colab, Amazon RDS, PySpark, scikit-learn, TensorFlow, Keras, PostgreSQL

Github: https://github.com/lumindak/Credit-Card-Fraud-Detection-Project

Implementation

Confusion Matrix