Optimization of Automated Teller Machine (ATM) Locations Using Location Data and K-Means Clustering

Musa Phiri
7 min read · Apr 8, 2021

Prologue

In recent months, I decided to further my understanding of data science by enrolling in the IBM Data Science Professional Certificate Course. Owing to my particular interest in data and how it is used to draw various insights, I found this course to be both interesting and quite challenging. While taking this course, I was able to build upon some of the skills that I have acquired over the years, and I also learnt how to use various tools and methods such as IBM Watson Studio, CRISP and Python with its diverse libraries, some of which I came across for the first time. Being relatively new to the data science field, my aim in pursuing this course was to deepen my understanding of machine learning algorithms and concepts, and to acquire the tools I need as I move towards a full-time career in data science over the coming years.

The final project of this course applied many of the tools and methods learned over the preceding months to a self-chosen challenge using real-world data, centred on the general idea of a “Battle of Neighbourhoods”, that is, the use of location data to explore trending areas in different cities. I chose the city of Vancouver, Canada, as my area of study for this project. To complete this final project, I was required to create a Jupyter Notebook with Python as the programming language, along with a blog post or presentation. This is the required blog post. I hope you find it insightful.

Introduction

The Economist Intelligence Unit is the research and analysis division of The Economist Group. It provides forecasting and advisory services such as monthly country reports, five-year country economic forecasts, country risk service reports and industry reports. Recently, the Economist Intelligence Unit released its “Global Liveability report”, which ranked three of Canada’s cities among the top 10. These were Toronto, Vancouver and Calgary.

These cities were recognised as being some of the most welcoming, with strong intercultural populations. They offer thriving cultural services, restaurants and recreational facilities, a stable economy and excellent health care. All of these conditions are very favourable for a variety of institutions, for example financial institutions.

Financial service industries face increased regulation and disruption from new technology. Banks and financial institutions that want to maintain their competitive edge need to leverage location data. Retail banks that cannot meet their customers' expectations will be left behind in this fast-paced industry. To provide the best experience for their customers, one consideration is the strategic placement of Automated Teller Machines (ATMs). Too many ATMs at one location can drive up operational costs, while too few limit the ability to provide adequate service to customers in the area.

In this project, I explored, segmented and clustered the neighbourhoods in Vancouver to make a recommendation for the optimum location for ATM placement.

Data Description

The data and tools used for this problem are listed below:

  • For the neighbourhood data on Vancouver, I used the overview of boroughs, postal codes and neighbourhoods of the city found on Wikipedia.
  • I used the Foursquare API to get the venues with the most foot traffic (train stations, gas stations, supermarkets and malls) in each borough of Vancouver.
  • I used geopy to geocode my data.
  • I used the K-Means clustering algorithm to group the boroughs.

Methodology

In this section, I built the code to scrape the Wikipedia page “List of postal codes of Canada: V” in order to obtain the table of postal codes, and transformed the data into a pandas data frame. The resulting data frame can be seen below:
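The scraping code itself is not shown in this post. As a rough sketch of the idea only, the snippet below turns an HTML table into a pandas data frame using Python's standard-library parser. The sample HTML, the column names and the helper names (`TableParser`, `html_table_to_frame`) are assumptions made for illustration; the real Wikipedia table is laid out differently and would need extra cleaning.

```python
from html.parser import HTMLParser

import pandas as pd


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # row being built, or None outside a <tr>
        self._cell = None  # text fragments of the cell being built

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_frame(html):
    """Use the first row as column names and the remaining rows as records."""
    parser = TableParser()
    parser.feed(html)
    header, *records = parser.rows
    return pd.DataFrame(records, columns=header)


# Illustrative rows only; this is not the real table from the page.
sample_html = """
<table>
  <tr><th>PostalCode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>V0A</td><td>Borough A</td><td>Neighbourhood 1</td></tr>
  <tr><td>V0B</td><td>Borough B</td><td>Neighbourhood 2</td></tr>
</table>
"""
postal_df = html_table_to_frame(sample_html)
print(postal_df)
```

In practice the page source would be downloaded first, and `pandas.read_html` is a common shortcut for this step when an HTML parsing backend such as lxml is installed.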

Once that was done, I used geopy to add geospatial data to the data frame, namely the latitude and longitude of each entry, as can be seen in the data frame below:
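A minimal sketch of how this geocoding step could look, assuming the data frame has a `PostalCode` column and that each query is the postal code followed by ", Vancouver, Canada". The column name, the query format and the helper names are illustrative assumptions, not the notebook's actual code. The geocoder is passed in as a function so the logic can be tried offline.

```python
import pandas as pd


def make_nominatim_geocoder(user_agent):
    """Build a rate-limited Nominatim geocode function (needs geopy and network access)."""
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    # The public Nominatim server asks for at most one request per second.
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)


def add_coordinates(df, geocode, query_column="PostalCode", suffix=", Vancouver, Canada"):
    """Return a copy of df with Latitude and Longitude columns filled from geocode()."""
    latitudes, longitudes = [], []
    for value in df[query_column]:
        location = geocode(f"{value}{suffix}")
        # geocode() returns None when nothing matches; keep the row with NaN.
        latitudes.append(location.latitude if location else float("nan"))
        longitudes.append(location.longitude if location else float("nan"))
    return df.assign(Latitude=latitudes, Longitude=longitudes)


# Example usage (requires network access; the user agent string is a placeholder):
# geocode = make_nominatim_geocoder("atm-location-demo")
# located_df = add_coordinates(postal_df, geocode)
```

Rows that cannot be geocoded keep `NaN` coordinates, so they can be inspected or dropped before mapping.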

Using Python’s folium library and the data frame obtained above, I visualised the geographic details of Vancouver. I created a map of Vancouver showing all of its boroughs.

In this project, my main focus was on the use of mobility data to optimize the location of ATMs around the city of Vancouver. I considered four venue types with a high density of footfall traffic: train stations, gas stations, supermarkets and malls. I used the Foursquare API to fetch all the ATMs, train stations, gas stations, supermarkets and malls. The map below shows the high-footfall venues as grey dots and the ATMs in the area as red dots.

From the map view, we can see that in most cases ATMs are located near train stations, gas stations, supermarkets and malls. The target areas are those that have more gas stations, train stations, supermarkets and malls but fewer ATMs nearby. Before identifying them, we need to group all these venues by the borough in which they are found. To do this, I used geopy to make API requests to Nominatim (the geocoding service built on OpenStreetMap data). I then merged all this data with the original data frame I created at the beginning.

We can see that the boroughs share some common venue categories. It is for this reason that I used the K-Means algorithm to cluster the boroughs. K-Means is one of the most common clustering methods in unsupervised learning.

K-Means Clustering

To find the clusters in the different boroughs, I first transformed the data frame with venues, associated with boroughs, by one-hot encoding (0/1), as seen in the picture below.
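As a small illustration of one-hot encoding, the sketch below uses `pandas.get_dummies` on a made-up venue table. The borough names, the column names and the rows are placeholders, not the project's data.

```python
import pandas as pd

# Hypothetical venues, one row per venue returned for a borough.
venues = pd.DataFrame({
    "Borough": ["Borough A", "Borough A", "Borough B", "Borough B", "Borough B"],
    "Venue Category": ["ATM", "Supermarket", "Gas Station", "ATM", "Gas Station"],
})

# One 0/1 column per venue category; each row has a single 1.
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)

# Put the borough back as the first column so rows can be grouped later.
onehot.insert(0, "Borough", venues["Borough"])
print(onehot)
```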

Next, I used grouping to show the frequency of each category of venue in each borough.

I used this information to create a data frame in which you can see the most common venue type for each borough.
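The grouping and ranking steps can be sketched as follows, again on made-up data. Taking the mean of the 0/1 columns per borough gives the share of that borough's venues in each category, and sorting each row gives the most common venue types. The helper name `most_common` and the output column labels are assumptions for illustration.

```python
import pandas as pd

# Hypothetical one-hot encoded venues (one row per venue).
onehot = pd.DataFrame({
    "Borough": ["Borough A"] * 3 + ["Borough B"] * 3,
    "ATM": [1, 0, 0, 0, 0, 1],
    "Gas Station": [0, 0, 0, 1, 1, 0],
    "Supermarket": [0, 1, 1, 0, 0, 0],
})

# Mean of the 0/1 columns = frequency of each venue category per borough.
frequency = onehot.groupby("Borough").mean()


def most_common(row, top_n=2):
    """Return the top_n category names of a frequency row, most frequent first."""
    ranked = row.sort_values(ascending=False).index[:top_n]
    labels = [f"Most common #{i + 1}" for i in range(top_n)]
    return pd.Series(list(ranked), index=labels)


top_venues = frequency.apply(most_common, axis=1)
print(frequency.round(2))
print(top_venues)
```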

Now, with all this data, I could run an unsupervised machine learning algorithm, more specifically, a k-means clustering algorithm from the scikit-learn package. I used the elbow method to systematically define the k-value.
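For readers unfamiliar with the elbow method, here is a minimal sketch using scikit-learn. It fits K-Means for several values of k and records the inertia (the within-cluster sum of squared distances); the "elbow" is the k after which the inertia stops dropping sharply. The synthetic data and the helper name `inertia_by_k` are assumptions for illustration, so the elbow here falls at 3, not at the value found in the project.

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia_by_k(features, k_values, random_state=0):
    """Fit K-Means for each k and return {k: inertia}."""
    inertias = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        model.fit(features)
        inertias[k] = model.inertia_
    return inertias


# Synthetic stand-in for the borough frequency table: three well-separated groups.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
features = np.vstack([c + rng.normal(scale=0.3, size=(20, 2)) for c in centers])

inertias = inertia_by_k(features, range(1, 7))
for k, value in inertias.items():
    print(k, round(value, 2))
```

Plotting these values against k makes the bend easier to spot than reading the numbers.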

From the figure above, we can see that the optimum k-value is 5. Having obtained the k-value, I was able to run the algorithm and merge the resulting cluster labels with the data frame.
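A sketch of fitting the model and attaching the labels, under the assumption that the features are per-borough venue frequencies. The toy table has only four boroughs, so it uses k=2 rather than the k=5 found above; all names, coordinates and values are placeholders.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical venue frequencies per borough.
frequency = pd.DataFrame({
    "Borough": ["Borough A", "Borough B", "Borough C", "Borough D"],
    "ATM": [0.1, 0.1, 0.6, 0.7],
    "Gas Station": [0.5, 0.6, 0.2, 0.1],
    "Supermarket": [0.4, 0.3, 0.2, 0.2],
})

# Cluster on the numeric columns only.
features = frequency.drop(columns="Borough")
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
clustered = frequency.assign(**{"Cluster Label": kmeans.labels_})

# Hypothetical borough table with placeholder coordinates.
boroughs = pd.DataFrame({
    "Borough": ["Borough A", "Borough B", "Borough C", "Borough D"],
    "Latitude": [49.25, 49.26, 49.27, 49.28],
    "Longitude": [-123.10, -123.11, -123.12, -123.13],
})

# Attach each borough's cluster label to its location data.
merged = boroughs.merge(clustered[["Borough", "Cluster Label"]], on="Borough", how="left")
print(merged)
```

The label numbers themselves are arbitrary; only which boroughs share a label is meaningful.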

We can now use the cluster labels to show the boroughs marked with their different clusters.

Now, what is the final result of this exercise? We can now show the five clusters.

Cluster 1

Cluster 2

Cluster 3

Cluster 4

Cluster 5

Results and Discussion

From the data analysis and visualization, we can see that ATMs are usually located near shopping malls, train stations, gas stations and supermarkets, which inspired me to find out the areas with more footfall traffic and fewer ATMs.

After applying the K-Means clustering algorithm, I identified the clusters with the most footfall traffic and the fewest ATMs on average. For more accurate guidance, the data set can be expanded and neighbourhood details such as population census data, crime data and other trends can be explored in more depth.
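One possible way to compare clusters on these two criteria is sketched below. It is not necessarily how the notebook does it: the ranking rule (lowest average ATM share first, then highest combined share of footfall venues), the column names and the numbers are all assumptions for illustration.

```python
import pandas as pd

# Hypothetical per-borough venue frequencies with their cluster labels.
clustered = pd.DataFrame({
    "Cluster Label": [0, 0, 1, 1, 2],
    "ATM": [0.30, 0.40, 0.05, 0.05, 0.20],
    "Gas Station": [0.20, 0.20, 0.30, 0.35, 0.20],
    "Mall": [0.10, 0.10, 0.20, 0.20, 0.20],
    "Supermarket": [0.30, 0.20, 0.30, 0.25, 0.20],
    "Train Station": [0.10, 0.10, 0.15, 0.15, 0.20],
})
footfall_columns = ["Gas Station", "Mall", "Supermarket", "Train Station"]

# Average venue mix of each cluster.
summary = clustered.groupby("Cluster Label").mean()

# Combined share of the high-footfall venue types.
summary["Footfall"] = summary[footfall_columns].sum(axis=1)

# Candidate clusters first: few ATMs, then lots of footfall venues.
ranked = summary.sort_values(["ATM", "Footfall"], ascending=[True, False])
print(ranked[["ATM", "Footfall"]].round(2))
```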

Conclusion

The purpose of this project was to explore areas in Vancouver and their respective distribution of ATM facilities, and to find areas in which to install new ATMs.

After wrangling data from several sources, cleaning it and applying the K-Means clustering algorithm, I picked the clusters with more footfall traffic and fewer ATMs on average. Clusters 2 and 4 showed that ATMs were the least common venue/facility in those neighbourhood areas. This could be used as a starting point for further exploration by financial institutions wishing to expand their services.

Of course, the final decision on optimal ATM location will be made by the stakeholders based on specific characteristics of neighbourhoods, taking into consideration additional factors like the crime rate in the area, population distribution, etc.

Code

For those who would like to follow along with my code, the notebook is publicly available on my GitHub account. Feel free to contact me if you have any questions or comments.

Musa Phiri

Machine Learning Enthusiast | Data Analyst | Statistician | Research Analyst