Python code for the final thesis for the computer engineering degree at Pompeu Fabra University, titled "Automation of risk profiling and asset allocation processes with machine learning".

simoncraf/skarb

Skarb - Automation of risk profiling and asset allocation processes with machine learning

This repository hosts the code of my final thesis for the computer engineering degree at Pompeu Fabra University, titled "Automation of risk profiling and asset allocation processes with machine learning".

Description

This end-of-degree project applies several clustering algorithms to classify the population by risk profile and then uses Reinforcement Learning models to execute investment strategies matched to the risk tolerance level of each individual.

Abstract

Asset allocation is one of the essential components of wealth management. It consists of advising a client on the best investment strategy given their risk profile, which depends on their current personal and economic situation, their goals for the future and the general behavior of the markets. It is a complicated task because there are many factors to take into account, and there is no industry standard for making this type of prediction, despite the insistence of institutions on regulating and standardizing the process to make it more transparent for consumers. To predict the risk profile, advisors usually ask their clients to fill in forms with standardized questions, which sometimes do not reflect the client's real situation because they leave out relevant information. The advisor then takes this information and develops an investment strategy that often does not match the level of risk previously predicted. The objective of this work is to simplify the full process by automating it with the help of machine learning.

With this goal in mind, the task has been divided into two phases. The first phase focuses on identifying the customer's risk tolerance. For this, demographic, financial and psychological traits have been extracted from the database containing the results of the 2017 Financial Survey of Spanish Families. Different techniques have been used to reduce the dimensionality of the data and to visualize it. First, the data has been processed by selecting the features that are most useful according to the current financial literature and combining some of them to make them more relevant while simplifying the task. Some outliers in the financial features have been removed to improve the clustering process. For visualization, classic Python plots have been used, as well as t-SNE, an algorithm for visualizing data with more than three dimensions.
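The outlier step can be illustrated with a short sketch. The text does not say which rule was used to drop outliers, so the interquartile-range (IQR) filter below is an assumption, and the function name `iqr_mask` and the sample values are hypothetical.

```python
import numpy as np


def iqr_mask(values, k=1.5):
    """Return a boolean mask that is True for values inside the IQR fences.

    Assumption: the IQR rule stands in for whatever outlier criterion
    the thesis actually applied to the financial features.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values >= q1 - k * iqr) & (values <= q3 + k * iqr)


# Hypothetical net-wealth values; the last one is an extreme outlier.
wealth = np.array([12_000, 35_000, 48_000, 51_000, 60_000, 75_000, 90_000, 5_000_000])
keep = iqr_mask(wealth)
filtered = wealth[keep]
print(filtered)
```

In a real pipeline the same mask would be applied to every row of the feature table, so that a household dropped for one financial feature is removed from all of them.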

The next step has been to evaluate different clustering algorithms, tuning the parameters of each one as well as possible in order to choose the best one. Spectral clustering has turned out to be the most effective procedure for the research data. The clusters have then been analyzed and the main characteristics of each one extracted. By comparing these characteristics with the current literature on risk tolerance, each group has been assigned a level of tolerance to market volatility. Among the clusters, five different risk profiles have been detected. To further personalize the risk tolerance level of each individual, the z-score of each feature within each cluster has been calculated to assess the total deviation between an individual's responses and the mean values of their cluster. This z-score has been added to the cluster score to create a final, individual score.
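The z-score personalization can be sketched as follows. This is one possible reading of the description, not the thesis code: it assumes that the per-feature z-scores are summed into a single deviation, that every feature is oriented so that a higher value means higher risk tolerance, and that each cluster already has a numeric base score. The names `personalized_scores` and `cluster_scores` are hypothetical.

```python
import numpy as np


def personalized_scores(X, labels, cluster_scores):
    """Add each individual's within-cluster z-score total to the cluster score."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    scores = np.empty(len(X))
    for c in np.unique(labels):
        members = labels == c
        mean = X[members].mean(axis=0)
        std = X[members].std(axis=0)
        std[std == 0] = 1.0  # constant features contribute a z-score of 0
        z = (X[members] - mean) / std
        # Assumption: "total deviation" is the sum of the feature z-scores.
        scores[members] = cluster_scores[int(c)] + z.sum(axis=1)
    return scores


# Hypothetical data: four individuals, two features, two clusters.
X = [[1, 10], [3, 30], [10, 5], [20, 5]]
labels = [0, 0, 1, 1]
cluster_scores = {0: 1.0, 1: 3.0}
scores = personalized_scores(X, labels, cluster_scores)
print(scores)
```

Summing raw z-scores makes the adjustment grow with the number of features, so averaging or rescaling the total may be preferable if the adjustment should stay small relative to the gap between cluster scores.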

The second phase has consisted of applying reinforcement learning techniques to create an agent capable of managing a portfolio while keeping its risk below the limit assigned to each individual in the clustering phase. To simplify the task, only two types of assets have been considered for the allocation model: stocks and bonds. A custom environment has been created in which an agent can change the distribution of assets within the portfolio and is rewarded or penalized based on the portfolio's return and the risk assumed relative to a given risk tolerance level. The agent observes the prices of the instruments over a given number of days and the current value of the portfolio and, based on this, takes actions in the form of allocation weights. The result is a set of weights for each asset in the portfolio.
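A minimal sketch of such an environment is shown below. It is not the environment from the RL folder: the class name `TwoAssetEnv`, the use of the rolling standard deviation of portfolio returns as the risk measure, and the linear penalty are all assumptions made for illustration. The class only mimics the usual `reset`/`step` interface and does not depend on any RL library.

```python
import numpy as np


class TwoAssetEnv:
    """Toy stock/bond allocation environment (hypothetical sketch)."""

    def __init__(self, prices, risk_tolerance, window=3, penalty=1.0):
        self.prices = np.asarray(prices, dtype=float)  # shape (T, 2): stocks, bonds
        self.returns = self.prices[1:] / self.prices[:-1] - 1.0
        self.risk_tolerance = risk_tolerance
        self.window = window
        self.penalty = penalty

    def reset(self):
        self.t = self.window  # index of the next price the agent has not seen
        self.value = 1.0
        self.history = []
        return self._observation()

    def _observation(self):
        # Prices over the last `window` days plus the current portfolio value.
        recent = self.prices[self.t - self.window:self.t].ravel()
        return np.append(recent, self.value)

    def step(self, action):
        # Normalize the action into non-negative weights that sum to 1.
        weights = np.clip(np.asarray(action, dtype=float), 0.0, None)
        total = weights.sum()
        weights = weights / total if total > 0 else np.full(2, 0.5)

        portfolio_return = float(weights @ self.returns[self.t - 1])
        self.value *= 1.0 + portfolio_return
        self.history.append(portfolio_return)

        # Assumption: risk is the std of recent portfolio returns.
        recent = self.history[-self.window:]
        risk = float(np.std(recent)) if len(recent) > 1 else 0.0
        excess = max(0.0, risk - self.risk_tolerance)
        reward = portfolio_return - self.penalty * excess

        self.t += 1
        done = self.t >= len(self.prices)
        return self._observation(), reward, done, {"risk": risk, "weights": weights}


# Hypothetical prices: stocks rise by 1 per day, bonds stay flat.
prices = np.column_stack([np.linspace(100, 110, 11), np.full(11, 100.0)])
env = TwoAssetEnv(prices, risk_tolerance=0.01)
obs = env.reset()
obs, reward, done, info = env.step([1.0, 0.0])  # allocate everything to stocks
print(reward, env.value)
```

The penalty only applies when the measured risk exceeds the tolerance, so an agent with a high tolerance is free to chase returns while an agent with a low tolerance is pushed toward the less volatile asset.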

A small graphical interface has been developed as a web application so that a user can carry out the entire process online. From this interface it is possible to fill in the information needed to identify the user's risk level and then see the best distribution of assets given the maximum level of risk the user tolerates.

In conclusion, the objective of this final degree project was to verify that machine learning techniques make it possible to automate complex and sometimes opaque processes such as risk profiling and asset allocation, and this goal has been achieved. With these methods and the graphical interface provided, a user can obtain a realistic estimate of their risk profile and an investment strategy consistent with it, without the intervention of a human advisor. Even so, the parameters used in the different processes, as well as the criteria for selecting certain elements, could be improved with greater knowledge of the financial sector. This work therefore does not mean that human intervention is no longer necessary in this type of process, but it is a good starting point for an initial orientation of the user.

Files

  • The Financial Risk Tolerance folder contains the files that preprocess the financial survey dataset and run the clustering algorithms. The algorithms tested, and tuned following the most recent literature, are K-Means, DBSCAN, OPTICS, BIRCH, Gaussian Mixtures, Agglomerative Clustering and Spectral Clustering.
  • The RL folder contains the files that create the environment and train the RLlib agents.
  • The Skarb folder contains a Streamlit webapp to interact with the trained models.
  • The eff folder contains the financial surveys used in the clustering step.
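As an illustration of how several of the listed clustering algorithms can be compared, the sketch below fits three of them with scikit-learn and ranks them by silhouette score. The synthetic `make_blobs` data, the choice of five clusters for every model, and the silhouette score as the only validation metric are assumptions; the repository works on the survey data and may use other metrics and settings.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans, SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the preprocessed survey features.
X, _ = make_blobs(n_samples=300, centers=5, random_state=0)
X = StandardScaler().fit_transform(X)

models = {
    "K-Means": KMeans(n_clusters=5, n_init=10, random_state=0),
    "Agglomerative Clustering": AgglomerativeClustering(n_clusters=5),
    "Spectral Clustering": SpectralClustering(n_clusters=5, random_state=0),
}

# Fit each model and score its labels; higher silhouette means tighter,
# better separated clusters.
scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    scores[name] = silhouette_score(X, labels)

best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
print("Best by silhouette:", best)
```

Which algorithm scores best depends on the data, so the ranking on synthetic blobs says nothing about the ranking on the survey features.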

Results

This research project has shown that risk profiling and asset allocation can be automated from start to finish with machine learning. It began by finding and cleaning relevant data in order to segment the population into different risk profiles. To achieve this, numerous clustering algorithms were used, together with different cluster validation metrics and techniques for tuning the hyperparameters of each algorithm. Spectral clustering turned out to be the most effective method for the data used. Once the clusters were obtained, a further step personalized the process using z-scores. Subsequently, different Reinforcement Learning agents were trained in an environment created specifically to train agents that not only prioritize the return on investment but also take into account the risk tolerance and risk capacity of each individual. The results of the agents were then compared with a reference benchmark, the Dow Jones 30, and the Proximal Policy Optimization (PPO) algorithm was shown to perform much better at protecting the user against risk according to their risk profile, without sacrificing returns. Finally, a small but practical application has been created to make these models accessible to the public.

Clustering

Result of Spectral Clustering visualized with t-SNE.

Trading strategy results depending on tolerance to financial turbulence

Skarb webapp built with Streamlit
