Same Same but Different
After talking to over 300 Finance professionals over the past couple of years either face to face or online there still appears to be a lot of uncertainty and confusion out there relating to analytics and modeling.
Despite writing a blog on the differences a year ago (albeit FM vs Analytics), many were not clear on the differences.
It's almost analysis paralysis with all the jargon and especially when misused and twisted by software vendors around what it means to do modelling and analytics within the packages they are selling.
The purpose of this blog is really to go a little deeper into predictive analytics and financial modeling.
Now, I might be a financial modeling expert but I will be the first to say that I am not an expert in Predictive Analytics. Why not??? you probably asking, and that's because it's actually fundamentally a different skill set.
The software applications and languages / coding used at times are also significantly different to each other.
So, to give this blog a bit more credibility and balance I have decided to co-author this blog with Igor Panivko. Igor is the CFO if Konika Minolta and sits on the board of both the Ukraine and Russian entities.
He has impressively largely replaced the finance "human" team with automated BI and analytics capabilities meaning that he is then able to capitalise on AI and ML to perform advanced predictive analytics.
I have already learnt a lot from Igor, we both sit on an AI/ML committee alongside Larysa Melnychuk and Irina Steenbeek (amongst a few others) on behalf of FPA Trends. I hope those reading this will all also learn something from this blog and more of it to come.
For the purpose of this blog we are completely excluding generic data analytics terms and associated systems like PowerBI, Tableau, Qlik etc as none of these contain any native modelling or more complex predictive capabilities.
Our Personal Definitions
Financial modeling (FM)
I believe it's the overall skill of making better business decisions using well-constructed Excel model by applying business logic, accounting and problem solving.
At the basic end this might include revenue or cost analysis and dashboards with simple formulas and at the more complex end an integrated 3-way, structured referenced, advanced scenario management and Monte Carlo simulation based model. . This could even be coupled with some VBA coding and advanced technology to enhance visual influencing dynamically and live.
Predictive analytic (PA)
Igor believes it’s a set of statistical tools designed for finding patterns in the behavior of the targeted outcomes by finding dependencies between dataset predictors and the target.
The PA process is based on the search for optimal statistical algorithms which fit the best into the prediction of an outcome through either the highest accuracy rate for a classification problem (discrete outcome prediction) or the minimum error rate for a regression problem (prediction of continuous values of an outcome).
Predictive analytics is actually the same as a supervised machine learning and classic statistical learning and any of these concepts can be used interchangeably.
Differences / Similarities
However, definitions don’t always make it clear as some words can still be confusing. So, to make this even easier to dissect we have summarised this into key segments namely
· Computational complexity,
· Data requirements,
· and Computing power / hardware
Understanding of finance and economics of the key value drivers of performance.
Having an intuitive gut feel whether the projected results appear logically reasonable.
Having a healthy dose and professional scepticism on the projections. Irrespective which is used neither is 100% accurate, but some are pretty close.
PA has to deal with cleaned and well-prepared datasets whereas FM may be applied on very small pieces of unconnected data or without data at all.
PA has a very heavy reliance on statistics and mathematical calculations including complex algorithms and coding. This is coupled with certain program languages like Python and R.
FM may require you to use more advanced systems and functions. For example, this might include writing or reading VBA and perhaps Index Match vs Lookups, offset, nested if statements, goal seek and randbetween etc. However, this isn't an explicit requirement unlike PA which simply isn't possible without complex mathematical algorithms and coding.
I think that FM more relies on the intuition of a modeler rather than patterns derived from data.
Both approaches use software tools such as data frames: data with columns and rows.
In FM language, it’s a spreadsheet with either simulated or manually prepared data.
In PA language it’s either a matrix or a data frame as in Excel uploaded from another system, database or another Excel file.
Both heavily rely on important software techniques as a data transformation, for example combining several tables, multiplying the whole table or one column (row) by either a number or another column, row or even table.
Both use cross-tabulation, like pivoting and unpivoting. All these functions are embedded in Excel and in both R and Python programming environments.
PA deals with the programming environment rich with embedded statistical libraries. For example, the R statistical language has more than 10,000 libraries.
FM, for those really true diehard fans and experts will still largely be Microsoft Excel. Yes, Excel on Apple, Google sheets and other knockoffs are just not up to scratch for experts. You certainly won't see any of the Modeloff competitors using these other spreadsheet programs.
You also won't be able to do any VBA automation, run Monte Carlo simulation and other complex FM techniques. You may even struggle to find goal seek which is incredibly useful for modeling.
Not a lot.
Highly complex statistical based algorithms probably considered to be the highest form of complexity there is from a pure mathematical perspective. This is why this article is co-authored with a expert, like Igor.
The level of complexity is sometimes already beyond human understanding as there are certain PA techniques which work like black boxes and do not show how they reached the highest level of prediction.
Certainly some formulas can get reasonable complex like deep nested if statements, or reverse engineered goal seeks or XMIRR calculations etc...but for the most part complexity can also be avoided by breaking out logic into simplified IF statements which are reasonably simple to write and follow. VBA can also be avoided completely by simply repeating a process with the F4 key.
Both require accurate data to provide any sense of reasonable insight to support/validate assumptions.
Most importantly would be actual data against which projections/forecasts can be compared to for confirming the accuracy of the predictions.
They both therefore need to be tweaked, updated, refined based on real live data to also gain increased confidence on the predictive outcomes.
The data collected should have a reasonable level of controls to provide assurances of its accuracy i.e. garbage in garbage out.
PA may work well both on small and big datasets, it depends on the skills of a modeler to choose the optimal algorithm. Sometimes, algorithms prepared on large datasets do not work better than smaller data sets that are well-prepared.
Many PA algorithms can work well on small datasets and show quite reliable results, but it’s not possible without any data.
Processing big volumes of data maybe be beneficial but is not always necessary. Usually, the dataset preparation takes 70-80% of all the research time and it may shrink the initial raw data ten-fold. Quite often, it is enough to run a test model on a subset of data before launching a powerful server for enormous calculations.
For the purpose of analysis, PA modeler often uses mathematical transformation of the raw data, like rescaling, logarithmic transformation of numbers, turning words into numbers, etc.
For FM, it is possible and very often the case that almost no data is required to build a financial model purely based on someone's professional judgement and business idea.
Many startups have great ideas but struggle to pull this succinctly into a story of numbers...sort of like paint by numbers. It is therefore quiet common to build a financial model forecast with no historical data.
This where the art vs the science of FM comes into play.
Computing power / hardware
Both run better on higher performance machines and processors. Especially when the model contains multiple workbooks each 5-10mb (xlsb format) and all need to be opened at the same time...meaning sometimes even 8gig of memory isn't even enough.
It is possible, albeit not advised, to build and run a financial model on a Pentium 1 machine with Windows XP. I kid you not, someone attended one of my courses with this setup. I politely explained coming to a future of financial modeling course with such an old setup isn't going to be fun. It's like turning up to a cricket game with a baseball bat. It's possible to hit the ball, just don't know if it's the best idea.
It is possible to run most of the PA algorithms on a single machine, like a standard laptop with 8-12 RAM. A dataset with size of 1-5 Gb can be run on most of personal laptops. It is quite rarely that such huge volume of data is needed to be analyzed for a research purpose. If the data is properly cleaned the required dataset can be downsized ten-fold and successfully be run on a standard laptop.
In some cases, PA might need to process huge volumes (many gigabytes or terabytes) of data then cloud solutions can be used.
Cloud based solutions for Excel based modeling is rare. There are some forecasting tools that use cloud, but these tools don’t provide the sort of complex financial modeling techniques like Monte Carlo simulation or multi-dimensional scenario management.
Although there are some substantial differences between FM and PA, both help us cope with uncertainties and make better decisions.
We can make financial models without data using our experience and business intuition and be very accurate with our forecasts and we can also make very good models based on proven predictive algorithms built on various data sources.
It would be nice to have both approaches working and supplementing each other.
There is no a competition between them, rather an opportunity not to rely heavily on only one way of decision-making but to combine and innovate.
Written by Lance Rubin and Igor Panivko