The Signal and The Noise

I’ve recently finished reading Nate Silver’s The Signal and the Noise.  Trained as an engineer and now working as a data scientist, I found that this book appealed to me.  I figured that I could reap some important lessons that I might be able to leverage professionally.  I found that the book drove through several crossroads of behaviour and incentive, belief and evidence.  Nate Silver can tie statistical analysis into a compelling narrative, similar to what he has been doing on FiveThirtyEight.


I found that the two most applicable lessons for business from Nate’s book were the tale of the fox and the hedgehog, and a revisiting of Frequentism vs Bayes.

Foxes and Hedgehogs

In the world of business intelligence, there are elements of awareness and understanding of what leadership in the organization is thinking.  These thought processes manifest themselves in the types of questions that are asked, or perhaps the questions that aren’t asked.  To be a successful analyst in this space, you often have to question the question.

There’s a long list of organizations that were not asking the right questions until it was too late.  Some of it’s a function of leadership and direction, big egos and hubris, or simply the wrong incentives.  Maybe you’re Blockbuster turning your nose up at Netflix, when the former was making money and the latter was struggling.  Maybe you’re HP, acquiring Compaq and investing in the desktop when you should have been investing in the cloud.  Perhaps you’re BlackBerry, and you want to build a keyboard-style phone when the market has clearly gone touchscreen.  You get the idea.  The point is, these organizations were so deeply invested in an ideology and a business model that they could not acknowledge the red flags in front of them and pivot.  A sunk ideology is a sunk cost; it takes chutzpah to admit you were wrong, especially at a leadership level.

Nate’s interpretation of Isaiah Berlin’s The Hedgehog and the Fox is very apt in the business world.  Foxes and hedgehogs come from a Berlin essay on the Russian novelist Leo Tolstoy.  “The fox knows many little things, but the hedgehog knows one big thing” – Archilochus.

From the book:

Hedgehogs have more trouble distinguishing their rooting interests from their analysis.  Big, bold predictions.  Hedgehogs are type A personalities who believe in big ideas – in governing principles about the world that behave as though they were physical laws and undergird virtually every interaction in society.  Too stubborn to learn from their mistakes.  They fantasize that they will make daring, audacious, outside-the-box predictions, breaking from consensus.

Foxes, on the other hand…

Foxes are scrappy little creatures who believe in a plethora of little ideas and in taking a multitude of approaches towards a problem.  They tend to be more tolerant of nuance, uncertainty, complexity, and dissenting opinion.  If hedgehogs are hunters, always looking out for the big kill, then foxes are gatherers.

With some context, it becomes easier to break down leadership and constituent individuals in an organization into these categories.  Hedgehogs are by their nature specialized, stalwart, stubborn, order-seeking, confident, and ideological.  They are individuals who’ve become specialized in a domain and have a sunk ideology in that domain, where the rules abide by simple governing relationships (order-seeking), and they are slow to change their perception based on new information.  In modelling terms, the model has been overfitted.

Foxes happen to make better predictions.  They are quicker to recognize how noisy the data can be, and they are less inclined to chase false signals.  They know more about what they don’t know.  Foxes also have trouble fitting into type A careers like business, where their acknowledgement of uncertainty is taken as weakness.  Quite a lot of evidence suggests that aggregate or group forecasts are 15-20% more accurate than individual ones.  Foxes emulate this consensus process.  Keep asking questions of yourself and your organization.  How often do you work in an organization where you hear people say “I don’t know”?

The reason I find this applicable in the data analytics field is in the manufacture of consensus.  Individuals who have big ideas (hedgehogs) are challenging to convince, even with compelling data, because they’ve already sunk so much into their ideology.  Those who are filled with some doubt or cynicism are easier to align with, because they may have the sophistication to know that they don’t know.  You may have the data to paint a picture of the business circumstance that’s 90% accurate, leaving the hedgehog to dismiss the 90% known on account of the 10% unknown.  When you deal with a fox, the difference will be that the fox will want to know more about the missing 10%, and will seek to understand.  These people are the ones that are curious and will allow you to iterate quickly and refine your predictions.  They are the ones who enable Bayes.

So there you have it: those who keep asking questions can easily fall into the fox category, and those who stick with “that’s the way it is” will remain in the hedgehog category.  Seek out the former, ignore the latter.

Frequentism vs Bayes

Nobody in business wants to hear about big data any more.  It’s a worn-out promise that has met its expiration date.  It sounds obvious, but a lot of data does not mean a lot of knowledge.  These days, business is looking for good analysts to find that value, which is a hard skill to quantify or qualify.  There’s a lesson for the analyst in discussing these statistical ideologies.

Frequentist probability is the belief that more trials in your sample space will yield a better approximation of the true frequency of an event occurring.  In other words, the more data you have, the more accurate your predictions.  Frequentism is subject to the Wicked Bible problem: bad data or bad ideas are easy to replicate, and easily pollute the sample set.  Frequentism is the big data promise that organizations have lost faith in.
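To make the frequentist promise and its weak spot a bit more concrete, here is a minimal simulation sketch.  Everything in it is an assumption made up for illustration: the true rate of 30%, the 20% share of replicated bad records, and the helper name `observed_frequency`.  It is not an example from the book.

```python
import random

def observed_frequency(n_trials, true_rate, rng, bad_copy_rate=0.0):
    """Fraction of records that report the event across n_trials.

    bad_copy_rate is a hypothetical share of records that are copies of a
    bad source which always reports the event.
    """
    hits = 0
    for _ in range(n_trials):
        if rng.random() < bad_copy_rate:
            hits += 1          # replicated bad record: always reports the event
        elif rng.random() < true_rate:
            hits += 1          # honest observation of the real process
    return hits / n_trials

TRUE_RATE = 0.30               # assumed true frequency for the demo
rng = random.Random(42)        # fixed seed so runs are repeatable

results = {}
for n in (100, 10_000, 200_000):
    clean = observed_frequency(n, TRUE_RATE, rng)
    polluted = observed_frequency(n, TRUE_RATE, rng, bad_copy_rate=0.20)
    results[n] = (clean, polluted)
    print(f"n={n:>7}: clean={clean:.3f}  polluted={polluted:.3f}")
```

With clean data, more trials pull the estimate towards the assumed 0.30.  With the polluted source, the estimate settles near 0.44 (0.20 + 0.80 × 0.30) instead, and adding more trials only makes the wrong answer look more certain.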

Bayesian probability assumes a prior probability.  Fundamentally, it’s an iterative ideology.  The following explanation may be a bit much for you; perhaps the Ashley Madison leaks will help keep your attention…

Bayesian probability is as follows:

\Pr(A \mid X) = \frac{\Pr(X \mid A)\,\Pr(A)}{\Pr(X \mid A)\,\Pr(A) + \Pr(X \mid \text{not } A)\,\Pr(\text{not } A)}

The probability of A given X (the probability that A is true, given that X has occurred) is equal to the probability of X given A multiplied by the probability of A, divided by the probability of X.  The denominator in the formula is just the probability of X, split across the two cases where A is true and where it is not.  The probability of A, P(A), is called the prior probability.  This means you have some past experience of how likely this event is.  With every iteration of Bayes, the result you just computed becomes the new prior, thus refining the probability.  Nate’s book explains it best.

Suppose you are living with a partner and come home from a business trip to discover a strange pair of underwear in your dresser drawer. You will probably ask yourself: what is the probability that your partner is cheating on you?

We’ll let X be the event that the panties appear, and A be the event that he is cheating on you.

If he’s cheating on you, it’s certainly easy enough to imagine how the panties got there. Then again, even (and perhaps especially) if he is cheating on you, you might expect him to be more careful. Let’s say the probability of the panties appearing, conditional on him cheating on you, is 50 percent.  So P(X|A)=0.5.

So what if the underwear was a gift, or was left by a friend, or there was a luggage mix-up (tee hee!)?  This is the probability of the panties appearing, conditional on him not cheating on you.  Nate puts this probability at 5%.  So P(X|not A)=0.05.

The final piece is the prior probability.  What is the probability you would have assigned to your partner cheating on you, had you never found the underwear?  Most studies show that around 4% of married partners cheat on their spouses in any given year.  So our prior P(A)=0.04.  This means the complement of the prior, P(not A), is 0.96.

So this means the probability of cheating given the panties is P(A|X) = (0.5*0.04) / ((0.5*0.04) + (0.05*0.96)) = 0.2941, a 29.41% probability that your husband is cheating on you.
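The arithmetic is easy to check in a few lines of Python.  This is only a sketch of the calculation described in the text; the function name `posterior` and its parameter names are my own and do not come from the book.

```python
def posterior(prior, p_x_given_a, p_x_given_not_a):
    """Bayes' theorem for a yes/no hypothesis A and a piece of evidence X."""
    numerator = p_x_given_a * prior
    # Total probability of seeing X: either A is true, or it is not.
    evidence = numerator + p_x_given_not_a * (1.0 - prior)
    return numerator / evidence

# Numbers from the underwear example: P(A)=0.04, P(X|A)=0.5, P(X|not A)=0.05.
p_cheating = posterior(prior=0.04, p_x_given_a=0.5, p_x_given_not_a=0.05)
print(f"{p_cheating:.2%}")  # 29.41%
```

Writing the denominator out as two terms makes the grouping explicit, which is easy to lose when the formula is typed on a single line.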

Now to expand on Nate’s example.  Say there are fancy meals and flowers, and all is forgiven.  Say you then find another pair of panties two weeks later.  Well, now your prior probability in this case is 29.41%, much higher than the last prior of 4%.  So now the probability is:

P(A|X) = (0.5*0.2941) / ((0.5*0.2941) + (0.05*0.7059)) = 0.806 or 80.6%.  As you can see, the likelihood of infidelity very rapidly approaches 100%.  In other words, if the partners of cheaters were rational and used Bayesian probability to assess their relationships, flower vendors would soon be out of business.
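The iteration can be sketched as a loop in which each posterior becomes the next prior.  This assumes, purely for illustration, that every new discovery carries the same 50% and 5% likelihoods and counts as independent evidence; the function name `update` and the third discovery are my own additions, not part of Nate’s example.

```python
def update(prior, p_x_given_a=0.5, p_x_given_not_a=0.05):
    """One Bayesian update: return P(A|X) from the current prior P(A)."""
    numerator = p_x_given_a * prior
    return numerator / (numerator + p_x_given_not_a * (1.0 - prior))

belief = 0.04      # starting prior: 4% base rate from the text
history = []
for discovery in range(1, 4):
    belief = update(belief)        # the posterior becomes the next prior
    history.append(belief)
    print(f"after discovery {discovery}: {belief:.1%}")
# after discovery 1: 29.4%
# after discovery 2: 80.6%
# after discovery 3: 97.7%
```

Carrying the prior at full precision rather than rounding it, the second update comes out near 80.6%, and a hypothetical third discovery would push the estimate to roughly 97.7%.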

So in other words, the prior matters: a smaller prior probability yields a smaller posterior, and each new piece of evidence moves the estimate on from there.  What this iterative process really does is weed out the outliers.  While that may be a little too boring for some, it should be squarely within the realm of the analyst.  Instead of thinking of Bayes as a method for improving spam filtering, or understanding a data set, we can give it another narrative in business that will better serve the analyst.

Frequentism vs Bayes – The Lay Explanation

If frequentism is big data, then Bayes is the iterative process.  Big Data implies more of a result, such that we’ve got all of this information, we’ll have an analyst sift through it, come to a conclusion, et voila!… business intelligence.  I’ve seen big BI services and systems built to answer “the question”, when in fact the question itself has changed by the time the system is deployed.  Business intelligence is better narrated as a journey.  Strive to build an architecture that can answer the question quickly, even if you incur technical debt doing so.  Business intelligence and analytics need to be sold to the business as an iterative process, not a result.  We’ll throw some spaghetti against the wall, and see if it sticks.  Answering one question may pose several more.  We might have an iteration that is a complete wash, where nothing at all is learned, other than that the approach should not have been taken.  Business intelligence needs to be sold as fail fast, fail early, and fail often.  If you can garner this support in your organization, you will be able to build the trust you need in order to pursue your analytics projects.



Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise. - John Tukey
The plural of anecdote is not data. - John Myles White
