
Create a Neural Network to decide if a Bank client will leave or not!!!

  • Grigoris Athanasiadis
  • Mar 9, 2020
  • 5 min read

Updated: Mar 11, 2020


The purpose of this article is to use the "Churn_Modelling.csv" dataset from Kaggle (https://www.kaggle.com/shrutimechlearn/churn-modelling) in order to make some predictions.

Customer churn is the percentage of customers that stopped using your company's product or service during a certain time frame. For example, if a bank starts the quarter with 1,000 customers and 50 of them close their accounts, the churn rate for that quarter is 50/1000 = 5%. So our target in this process is to predict customer churn based on different customer attributes such as age, gender, geography, and more.


Firstly, as with all machine and deep learning projects, we have to examine and interpret the dataset, which anybody can obtain from the above link. This dataset contains details of a bank's customers, and the target variable is a binary variable reflecting whether the customer left the bank (closed their account) or continues to be a customer.



Each row of the data corresponds to one client. Overall, we have 10000 rows and 14 columns. The dependent variable is the last one, the Exited column: a binary flag that shows 1 if the customer closed their account with the bank and 0 if the customer is retained. The independent variables are:

  • RowNumber: row numbers from 1 to 10000
  • CustomerId: unique ID for bank customer identification
  • Surname: customer's last name
  • CreditScore: credit score of the customer
  • Geography: the country the customer belongs to
  • Gender: Male or Female
  • Age: age of the customer
  • Tenure: number of years the customer has been with the bank
  • Balance: bank balance of the customer
  • NumOfProducts: number of bank products the customer is utilising
  • HasCrCard: binary flag for whether the customer holds a credit card with the bank or not
  • IsActiveMember: binary flag for whether the customer is an active member of the bank or not
  • EstimatedSalary: estimated salary of the customer in dollars
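To make this concrete, here is a minimal sketch of loading and inspecting the data with pandas, assuming the CSV file has been downloaded from the Kaggle link above into the working directory:

import pandas as pd

# Load the dataset (assumes Churn_Modelling.csv is in the working directory)
dataset = pd.read_csv('Churn_Modelling.csv')

print(dataset.shape)             # (10000, 14)
print(dataset.head())            # the first few records
print(dataset['Exited'].mean())  # observed churn rate, roughly 0.20 here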

In this project we will use Python, which comes with a variety of data science and machine learning libraries that can be used to make predictions based on different features or attributes of a dataset. Python's scikit-learn and TensorFlow libraries are two such tools, and we'll use them in this article for customer churn prediction.


In the first lines I import the modules we will need for the project, and I will explain each one as I use it. Our first module is the pandas library, which we use to import our dataset. We then separate the dependent variable as y and the independent variables as X, removing the first three columns (RowNumber, CustomerId and Surname) because they obviously don't affect the dependent variable. Next, I use the LabelEncoder module, which encodes target labels with values between 0 and n_classes-1. So in our case the Geography column, which has the three values France, Spain and Germany, will be encoded as 0, 1 and 2 (LabelEncoder assigns codes in alphabetical order, so France=0, Germany=1, Spain=2), and the Gender column takes the values 0 and 1 for Female and Male.
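The code for these first steps is not shown above, so here is a sketch of what they might look like, reusing the dataset loaded earlier:

from sklearn.preprocessing import LabelEncoder

# Keep CreditScore through EstimatedSalary as features; Exited is the target
X = dataset.iloc[:, 3:13].values
y = dataset.iloc[:, 13].values

# Encode the categorical columns as integers
labelencoder_geo = LabelEncoder()
X[:, 1] = labelencoder_geo.fit_transform(X[:, 1])    # France/Germany/Spain -> 0/1/2
labelencoder_gender = LabelEncoder()
X[:, 2] = labelencoder_gender.fit_transform(X[:, 2])  # Female/Male -> 0/1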

The problem now is that, since there are different numbers in the same column, the model will misunderstand the data to be in some kind of order: 0 < 1 < 2. But this isn't the case at all. To overcome this problem, we use One Hot Encoder. The input to this transformer should be an array-like of integers or strings, denoting the values taken on by categorical (discrete) features. The features are encoded using a one-hot (aka 'one-of-K' or 'dummy') encoding scheme. This creates a binary column for each category and returns a sparse matrix or dense array (depending on the sparse parameter).


from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# OneHotEncoder's categorical_features argument has been removed from
# scikit-learn, so a ColumnTransformer now selects the Geography column (index 1)
ct = ColumnTransformer([('geography', OneHotEncoder(), [1])],
                       remainder='passthrough')
X = ct.fit_transform(X)

To avoid the dummy variable trap we remove the first column from the dataset X. The dummy variable trap is a scenario where attributes are highly correlated (multicollinear) and one variable predicts the value of the others. When we use one-hot encoding to handle categorical data, one dummy variable (attribute) can be predicted with the help of the other dummy variables. Hence, one dummy variable is highly correlated with the others. Using all dummy variables in regression models leads to the dummy variable trap, so regression models should be designed excluding one dummy variable.


X = X[:, 1:]

Subsequently, I imported the train_test_split() function from sklearn's model_selection module in order to split the dataset into a training and a test set. The test_size=0.2 argument indicates the percentage of the data that should be held out for testing; the split is usually around 80/20 or 70/30.


from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2)

The next step is to perform feature scaling, which standardizes features by removing the mean and scaling to unit variance.


from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)  # fit the scaler on the training set only
X_test = sc.transform(X_test)        # apply the same scaling to the test set

Let's now create the neural network by writing the function build_classifier(), with one parameter which we will use in the next phase to find the best hyperparameters. The Sequential API allows you to create models layer by layer for most problems. It is limited in that it does not allow you to create models that share layers or have multiple inputs or outputs. In this function we create an object called classifier and then add the hidden layers. Each hidden layer has 6 units, and we chose the ReLU activation function. The output layer has a single unit, and as the final activation we use the sigmoid function.

from keras.models import Sequential
from keras.layers import Dense

def build_classifier(optimizer):
    # Initialising the ANN
    classifier = Sequential()
    # Adding the input layer and the first hidden layer
    classifier.add(Dense(units=6, kernel_initializer='uniform',
                         activation='relu', input_dim=11))
    # Adding the second hidden layer
    classifier.add(Dense(units=6, kernel_initializer='uniform',
                         activation='relu'))
    # Adding the output layer
    classifier.add(Dense(units=1, kernel_initializer='uniform',
                         activation='sigmoid'))
    # Compiling the ANN
    classifier.compile(optimizer=optimizer, loss='binary_crossentropy',
                       metrics=['accuracy'])
    return classifier

Essentially, I created our neural network in the function build_classifier with two hidden layers of 6 units each, and it's time to evaluate and optimize my choices with parameter tuning. As you noticed, in this neural network we have two types of parameters. We have the parameters that are learned by the model during training, which are the weights, and we have the parameters that stay fixed, the so-called hyperparameters. In our project these hyperparameters are, for example, the number of epochs, the batch size, the optimizer, and the number of neurons in the layers. Parameter tuning consists of finding the best values of these hyperparameters, and the technique we are going to use is called grid search; it basically tests several combinations of these values and eventually returns the best selection.


from sklearn.model_selection import GridSearchCV
from keras.wrappers.scikit_learn import KerasClassifier

classifier = KerasClassifier(build_fn=build_classifier)
parameters = {'batch_size': [25, 32],
              'epochs': [100, 500],
              'optimizer': ['adam', 'rmsprop']}

We created an object called classifier, passing our build_classifier function as the build_fn argument, and then we created a dictionary called parameters with a selection of values for batch_size, epochs and optimizer. Subsequently, we created another object called grid_search using the GridSearchCV() function, fit it on our training set, and finally saved the best parameters and the best accuracy.

# cv=10 (10-fold cross-validation) is an assumed, typical choice here
grid_search = GridSearchCV(estimator=classifier, param_grid=parameters,
                           scoring='accuracy', cv=10)
grid_search = grid_search.fit(X_train, y_train)
best_parameters = grid_search.best_params_
best_accuracy = grid_search.best_score_

We are now ready to run the neural network. My results are that the best hyperparameters are batch_size=25, epochs=100 and optimizer='adam', and the best cross-validated accuracy is about 0.795.
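As a final step, one could rebuild the classifier with the best hyperparameters found and check it against the held-out test set. This is a minimal sketch, not part of the original run; the 0.5 threshold for turning predicted probabilities into 0/1 labels is a common default:

from sklearn.metrics import accuracy_score, confusion_matrix

# Rebuild and train a model using the best hyperparameters found above
best_model = build_classifier(optimizer='adam')
best_model.fit(X_train, y_train, batch_size=25, epochs=100)

# Predict churn probabilities on the test set and threshold them at 0.5
y_pred = (best_model.predict(X_test) > 0.5).astype(int).ravel()
print(confusion_matrix(y_test, y_pred))
print(accuracy_score(y_test, y_pred))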


