The Start-up File

11 June 2014

Big Data, FICO, and Start-ups

Data Science pivotal in the next battle in financial services

[Disclaimer - I know this area and am active in all of these components. I have a stake in this game, and a long history here, so am about as biased as one can be.]

In the news recently is "Levchin’s Payments Startup Affirm Has Raised $45 Million". It is one of many - see "Programmers Size Up Bank Borrowers With Algorithms Rather Than FICO Scores". Even more are in the offing, according to my business contacts. We are on the verge of another spate of "me too" investments on both coasts, not to mention internal acquisitions already under way. FICO scoring is under attack over long-standing issues, exacerbated by the need to apply Big Data, and poor application of data science is where the battle lies.

Where to start? FICO scoring for rating has been around a long time, as has my history of supporting and consulting for financial services, and of being a constant pain in the ass to a variety of attempts to further misuse the area to channel interests into "ultraconsumers".

The irony is that my involvement was by accident. It came about because of my background in an unusual area of mathematics, and a need to justify massive databases to address long-term needs in financial services. I was the product manager of a technology to allow petabyte-scale databases, at a time when the largest commercial databases were only a fraction of a terabyte.

The question wasn't about growth, but about how to apply the growth, because the scoring techniques got worse with more data. Why? Those involved couldn't answer the question. But I had seen this many times before. It was an artefact of the underlying mathematics.

So why care about this now, you ask? In short, every direction you look in has this problem in one form or another.

We are supposedly in an era of "Big Data", which to most is a marketing term. The knee-jerk reaction is to ignorantly throw more data at problems, because, hey, "Big Data" is good, right?

Wrong. Suddenly things get worse. There have been hysterical "dumb uses" of big data lately, like Target sending out photos of a hypothetical new baby to customers its data mining had flagged, ahead of time, as likely to conceive one. That triggered a negative cascade from the customer base and a loss of business as a direct result (see: How Your Data Are Being Deeply Mined). If you are a consultant in this area, you are well employed, as most in management are under pressure to use more data and are constantly provoking this kind of thing through poor application and misuse.

Back to FICO decades earlier. Adding data, either intentionally or through growth, tended to make things worse. Then add the deliberate misuse of scoring on top - to favour a group that consumed more, the "ultraconsumers", who would drive up your profitability more yet, being fewer in number, drive up your costs less. Sounds "simple" and great, right? So "cherry pick" the best scores?

Wrong. This could and did drive up default rates, because many who scored well got locked in by updating delays, overbought because they could, and defaulted. Then the model got too proscriptive and went the other way, locking out the "normal" purchasers, who then "under bought". All the while data growth was happening, and different kinds of data, like social data, started entering the area.

FICO is under attack from within, by efforts to make it more relevant with offsetting means that "band aid" the past into the present, and from outside, by new start-ups trying to horn in. The latest is Max Levchin's "Affirm" startup, and his approach is to partner with the likes of Palantir to massively analyse data from all sources to promote under-represented consumers, especially the youngest, and so offset the FICO lag times that lock them out due to things like student loans.

I could spend hours describing the hundreds of new approaches here with their different aspects, upsides and downsides. Let me boil it down. It's not about the scale of data, but the mathematics. If the math and policy don't match, all of these may be equally doomed. So if it's as easy as being "data science", why so much competition?

Math has always been hard to do. And to sell. For it to be effective, you must prove it formally. Most of what passes for data science right now is simply poorly justified marketing. One approach seems as good as another, and may provide enough apparent benefit, so people cling to it.

Back to FICO - "seemed good enough" was the answer I got decades back. Yet it's been flailing for years. What would likely happen with a replacement for FICO? It would look good, and then flail for years. Why? Because the poor data science would not track "invariants".

The key to good data science is invariants. Things that don't change. If you don't see them prominently, they probably aren't present at all. How do you sell a businessman on proving the need to determine things that don't change, when all he sells off of ... is change?

To measure anything, you need a good, stable reference. Getting and maintaining this is terrifically hard, because in financial services all manner of ways of working the numbers are in play.
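To make "stable reference" a little more concrete, here is a minimal sketch of one common drift check, the population stability index (PSI), which compares how scores fall into fixed bands in a reference period against a later period. This is my illustration only, not a description of how FICO or any start-up actually works: the function name, the band edges and the sample scores are all hypothetical, and the rule of thumb that a PSI above roughly 0.25 signals a significant shift is a convention, not a proof.

```python
import math


def population_stability_index(reference, current, edges):
    """PSI between two score samples over fixed band edges (hypothetical helper).

    A value near 0 means the current scores fall into the bands in about the
    same proportions as the reference scores; larger values mean drift.
    Scores outside the edges are ignored.
    """
    def shares(scores):
        counts = [0] * (len(edges) - 1)
        for s in scores:
            for i in range(len(edges) - 1):
                is_last = i == len(edges) - 2
                # Bands are [low, high), except the last, which includes its top edge.
                if edges[i] <= s < edges[i + 1] or (is_last and s == edges[-1]):
                    counts[i] += 1
                    break
        total = sum(counts)
        # A small floor avoids log(0) when a band is empty.
        return [max(c / total, 1e-6) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical score bands and samples, for illustration only.
EDGES = [300, 500, 600, 700, 850]
reference_scores = [350, 550, 650, 750]
drifted_scores = [350, 350, 350, 750]

psi_same = population_stability_index(reference_scores, reference_scores, EDGES)
psi_drift = population_stability_index(reference_scores, drifted_scores, EDGES)
```

A check like this only tells you that the reference has moved, not why. Finding what should not move in the first place is the harder, mathematical part.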

The good thing about all this is that we finally have a way to make things better, because past practices aren't good enough. I look forward to the battle and its many new combatants. At least the financial services side now gets the chance to increase its GDP contribution tenfold as the potential long-term benefit. Who actually profits from it is far from clear at this point.

Battle on.


	William Jolitz on the newest businesses