Investment banks have been slow to embrace big data, but the sector is finally starting to realise the benefits. For the investment world, big data provides technologies that help managers solve non-trivial problems and work on huge datasets to produce new or better information. It’s also creating more jobs across the financial sector.
Throughout my career, I have worked largely on big data projects, including some within investment banks. I’m therefore well placed to tell you about the tools, technologies and skills that, from my point of view, are the most important.
Until recently, quants took the key role in crunching the data used by investment banks. More recently, data scientists have moved in: smart developers who combine IT skills with finance and mathematics. In the past, developers only implemented algorithms created by quants. Data scientists are able to create new kinds of information and implement them as well, which is why they are becoming vital for banks and financial institutions.
Most investment banks keep their big data work under wraps, but we can say that there are two main groups of projects related to this topic:
1. Data analysis: Analysis of a huge quantity of unstructured or structured data to provide effective advice about investments. Take for example the natural language processing of news that generates indicators for traders (or trading platforms).
2. Predictive analysis: This mainly serves to forecast how a market, index or other product will behave in the future. This is done by analysing thousands of pieces of information and correlating them with each other. Some of the projects I have seen in this area extend tools that were originally built on traditional databases so that they can now be used on a massive scale.
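To make the first kind of project more concrete, here is a minimal sketch of how news text might be turned into a numeric indicator. It is only an illustration: the word lists, the function names `headline_score` and `news_indicator`, and the sample headlines are all hypothetical, and a real system would rely on a trained language model or a curated lexicon rather than simple word counts.

```python
import re

# Hypothetical word lists, chosen only for this illustration.
POSITIVE = {"beats", "growth", "upgrade", "profit", "record"}
NEGATIVE = {"misses", "loss", "downgrade", "lawsuit", "default"}


def headline_score(headline: str) -> int:
    """Return the count of positive words minus the count of negative words."""
    words = re.findall(r"[a-z]+", headline.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def news_indicator(headlines: list[str]) -> float:
    """Average the headline scores; return 0.0 when there are no headlines."""
    if not headlines:
        return 0.0
    return sum(headline_score(h) for h in headlines) / len(headlines)


# Made-up headlines about a fictional company.
headlines = [
    "Acme beats forecasts on record profit",
    "Regulator files lawsuit against Acme",
    "Analysts issue downgrade for Acme",
]
indicator = news_indicator(headlines)
print(f"News indicator: {indicator:.2f}")
```

A positive value suggests that the news flow is, on balance, favourable; what a trader or trading platform does with that number is a separate question.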
‘Data scientist’ can sometimes be something of a nebulous term and, predictably, investment banks are more selective than most in their recruitment process.
There are a number of skills that you need to have in order to be considered. On the IT front, Hadoop is clearly an important tool to know. However, I have only seen it used in a few projects, since greater importance is still given to custom environments, usually created in C/C++ or Java.
As for programming languages, C/C++/Java are still the best ones to know, even though Python and R are emerging in projects and are quite effective. Apart from the skills that can be learned at university, I think that a key IT skill is the ability to combine and analyse massive datasets that are unrelated, for example combining financial data with geographical information.
As for the operating environment, it is important to remember that investment banks are mostly a *nix world (Linux etc.), so some knowledge of these systems is essential, together with knowledge of SQL (which will not disappear in the near future) and NoSQL databases.
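As a small-scale illustration of combining financial data with geographical information, the sketch below joins a list of positions with a separate country lookup and totals the exposure per country. The tickers, values, country codes and the function name `exposure_by_country` are all made up for the example; at real scale the same join would run in a database or a distributed framework rather than in plain Python.

```python
from collections import defaultdict

# Hypothetical financial data: (ticker, market value in USD).
positions = [
    ("AAA", 1_000_000.0),
    ("BBB", 250_000.0),
    ("CCC", 750_000.0),
    ("DDD", 500_000.0),
]

# Hypothetical geographical data from an unrelated source: ticker -> country.
headquarters = {"AAA": "US", "BBB": "DE", "CCC": "US"}


def exposure_by_country(positions, headquarters):
    """Join positions with country data and sum market value per country.

    Tickers missing from the geographical dataset are grouped under
    "UNKNOWN" instead of being silently dropped.
    """
    totals = defaultdict(float)
    for ticker, value in positions:
        country = headquarters.get(ticker, "UNKNOWN")
        totals[country] += value
    return dict(totals)


exposure = exposure_by_country(positions, headquarters)
for country, value in sorted(exposure.items()):
    print(f"{country}: {value:,.0f}")
```

Keeping the unmatched records visible matters in practice, because unrelated datasets rarely line up perfectly.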
Then there are the soft skills; people must be very curious and constantly interested in deep investigation of new ideas and technologies to find possible answers to complex and evolving problems.
Then, of course, there are the all-important maths skills. Ideally, candidates should have a clear understanding of statistical analysis, predictive modelling and possibly calculus (including differential, integral and stochastic forms).
Anyone wishing to become a data scientist should be excited about the opportunities available now. It’s still difficult to find suitable candidates straight out of university, so many specialised courses are available to develop the necessary skills. It’s the first wave of big data technology, and the truth is that investment banks are only just discovering how best to leverage it for their own purposes. It’s therefore a good time to try to get into the sector.
Marco Visibelli is a data scientist who worked for IBM before founding Kuldat, a Big Data application that companies use to gain useful sales and marketing insights, analyze their feasibility, and present possible outcomes.