14 February, 2022
ML to accurately cluster the banking customer

Is it possible to gain a deeper understanding of a customer portfolio without invading the privacy of the people in it, even in a field as sensitive as banking? A national bank asked us this question, and our Data Intelligence and Machine Learning solution gave them an answer.

One of the most important challenges for any company is to offer products that match its customers' needs as closely as possible. To do so, it is essential to understand those needs in depth: knowing customers' demands, habits and other personal characteristics makes it possible to offer products and services that satisfy them. However, this information is not always at hand. In most cases it involves personal and sensitive data, especially in sectors such as healthcare or banking.

One of the main Spanish banking institutions wondered how to make better use of its databases so that it could offer products and services better adapted to its customers' profiles. And it wanted to go further: how could it do the same for customers about whom it had little or no information?

The answer to this challenge lay in implementing Data Intelligence and Machine Learning. That is why the bank turned to eºmergya, a Google Cloud ML specialist partner that had already delivered solutions to earlier challenges in Business Intelligence and Conversational Intelligence.

Improving the clustering of the known customer

The challenge involved several objectives. First, an advanced analytical model had to be developed to improve the clustering and segmentation of the known customer portfolio, that is, the customers about whom the bank had plenty of information in its database. Until then, clustering had been performed through the bank's CRM.
However, the resulting customer groups were not precise enough and, as a consequence, neither were the sales actions based on this segmentation. Better clustering was essential to fine-tune sales actions in the known customer portfolio.

In addition, the model would then be extended to the unknown customer portfolio. In other words, the goal was not only to improve the clustering of customer segments with plenty of information, but also to build a model able to detect patterns and similarities in portfolios for which there was hardly any data.

The next challenge concerned data volume. Machine Learning models need large amounts of information to be effective. However, as mentioned above, the model was to be applied to a customer segment for which little data was available, so external sources were considered to fill this gap. In parallel, the project aimed to make the most of the public cloud, taking advantage of Google Cloud's most advanced Artificial Intelligence tools.

The importance of data volume and data quality

Before developing the model, the team needed to understand what data was available and how it was structured. It was also essential to clean and normalize the various datasets before working with them. A system for exporting data from the bank's internal systems to the cloud was also proposed, ensuring that the anonymized data was processed securely. In other words, the ML model would work with fully anonymized data, so that no specific individual could be identified, only groups of customers sharing similar characteristics. In total, almost 1,200,000 user records were exported from the bank's history.

With the infrastructure defined and the data exported, the next task was to tackle the clustering of known customers. We started by analyzing some 25 user variables to identify the most suitable ones for clustering.
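The article does not name the clustering algorithm or the variables the team finally selected. To illustrate the step just described, here is a minimal sketch that assumes k-means on z-score-standardized variables. The three variables (age, average balance, number of products) and all the data are hypothetical. The sketch uses plain Python so it runs without dependencies. It seeds the centroids with a deterministic farthest-point rule, which is a swapped-in choice made only for reproducibility.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so variables in euros and in counts weigh equally."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # constant column: avoid dividing by zero
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_farthest(points, k):
    """Deterministic farthest-point seeding: start from the first point, then
    repeatedly add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids, inertia)."""
    centroids = init_farthest(points, k)
    labels = []
    for _ in range(max_iter):
        # Assign every point to its closest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [mean(col) for col in zip(*members)]
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


# Hypothetical, already-anonymized records: (age, average balance in EUR, products held)
rows = [
    [25, 1200, 1], [27, 1500, 2], [24, 1100, 1],
    [61, 42000, 5], [58, 39000, 4], [65, 45000, 6],
]
X = standardize(rows)
for k in (2, 3):
    labels, _, inertia = kmeans(X, k)
    print(k, labels, round(inertia, 3))
```

Printing the inertia for several values of k is one simple, elbow-style way to weigh the number of segments against how tightly they fit. It is not necessarily the criterion the team used to compare models on cost versus results.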
Different clustering models were also studied to find the one offering the best balance between cost and results. The outcome was a set of customer groups on which to build well-targeted commercial campaigns.

Applying the ML model to the unknown customer

One of the main challenges of the project was obtaining enough data to apply the solution to the unknown customer portfolio. For example, purchasing power is a vital piece of information for marketing banking products, but it was not known for this portfolio. To estimate income levels, we therefore turned to public sources of information, evaluating a total of 10 tools from Spain's National Statistics Institute (INE). By processing the data from these sources with the ML model developed, the income level of 96.1% of the unknown customers was obtained with a high degree of reliability.

A scalable solution with great potential

The main result of the Machine Learning development is that the unknown customer portfolio can now be clustered. This was achieved by associating this portfolio with the known customer portfolio, which gives the model high potential for use in real campaigns. In addition, thanks to the external sources and the incorporation of Artificial Intelligence tools, it has been possible to project the income of little-known customers solely from their address.

The architecture was built on Google Cloud, chosen for its efficiency, and a secure export system was created from the bank's Big Data systems. The ML models implemented, which have also improved the clustering of known customers, do not require retraining. Therefore, they can be put into production at a relatively low cost.
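The article does not describe how the income projection from an address or the association with known-customer clusters was implemented. As a rough illustration only, the sketch below assumes three things:

- Average income is looked up by postcode, falling back to the province (the first two digits of a Spanish postcode).
- The result is standardized with the known portfolio's scaling parameters.
- Each customer is assigned to the nearest centroid of the known-customer clusters.

The lookup tables stand in for INE data and contain invented figures. The centroids, cluster names and two-variable feature space are also hypothetical.

```python
# Hypothetical stand-ins for income statistics published by the INE;
# the figures are invented for illustration, not real INE values.
INCOME_BY_POSTCODE = {"28001": 31000.0, "41001": 21000.0}
# Spanish postcodes start with the two-digit province code.
INCOME_BY_PROVINCE = {"28": 26000.0, "41": 18000.0}

# Hypothetical output of clustering the known portfolio: scaling parameters
# for (age, annual income) and centroids in that standardized space.
KNOWN_MEANS = (45.0, 24000.0)
KNOWN_SDS = (15.0, 8000.0)
CENTROIDS = {"young_lower_income": (-1.0, -0.8), "senior_higher_income": (1.0, 1.2)}


def estimate_income(postcode):
    """Income from the finest geography available, or None if not covered."""
    if postcode in INCOME_BY_POSTCODE:
        return INCOME_BY_POSTCODE[postcode]
    return INCOME_BY_PROVINCE.get(postcode[:2])


def assign_cluster(age, postcode):
    """Map a sparsely documented customer onto the nearest known-customer centroid."""
    income = estimate_income(postcode)
    if income is None:
        return None  # no public-data estimate, so the customer stays unassigned
    # Standardize with the known portfolio's parameters so distances are comparable.
    z = [(x - m) / s for x, m, s in zip((age, income), KNOWN_MEANS, KNOWN_SDS)]
    return min(CENTROIDS, key=lambda name: sum((a - b) ** 2 for a, b in zip(z, CENTROIDS[name])))


# Hypothetical unknown customers: (id, age, postcode)
unknown = [("a", 29, "41001"), ("b", 63, "28001"), ("c", 40, "28045"), ("d", 52, "99999")]
assignments = {cid: assign_cluster(age, pc) for cid, age, pc in unknown}
coverage = sum(v is not None for v in assignments.values()) / len(assignments)
print(assignments)
print(f"income coverage: {coverage:.0%}")
```

In this toy run, customer "c" is covered only through the province fallback and customer "d" stays unassigned, so coverage is 75%. Tracking coverage as its own metric mirrors the project's reported figure of income estimates for 96.1% of unknown customers.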