
The Problem with Big Data: Lies, Damn Lies, and Statistics

I’ve used this subtitle in a previous post, but it fits the content of this post well enough to be worth using again. I was reading a post from Tim Ferriss the other day and it made me think of statistics. The post is about alternative medicine, but you don’t need to understand that to follow the point I’m making. Here’s some context:

Imagine you catch a cold or get the flu. It’s going to get worse and worse, then better and better until you are back to normal. The severity of symptoms, as is true with many injuries, will probably look something like a bell curve.

The bottom flat line, representing normalcy, is the mean. When are you most likely to try the quackiest shit you can get your hands on? That miracle duck extract Aunt Susie swears by? The crystals your roommate uses to open his heart chakra? Naturally, when your symptoms are the worst and nothing seems to help. This is the very top of the bell curve, at the peak of the roller coaster before you head back down. Naturally heading back down is regression toward the mean.

If you are a fallible human, as we all are, you might misattribute getting better to the duck extract, but it was just coincidental timing.

The body had healed itself, as could be predicted from the bell curve–like timeline of symptoms. Mistaking correlation for causation is very common, even among smart people.

And the important part of the quote [Emphasis Added]:

In the world of “big data,” this mistake will become even more common, particularly if researchers seek to “let the data speak for themselves” rather than test hypotheses.

Spurious connections galore–that’s what the data will say, among other things.  Caveat emptor.

This analogy reminded me of the first time I learned about correlation and causation in my first psychology class as an undergraduate. It had to do with ice cream, hot summer days, and swimming pools. In fact, here’s a quick summary from wiki:

An example of a spurious relationship can be illuminated by examining a city’s ice cream sales. These sales are highest when the rate of drownings in city swimming pools is highest. To allege that ice cream sales cause drowning, or vice-versa, would be to imply a spurious relationship between the two. In reality, a heat wave may have caused both. The heat wave is an example of a hidden or unseen variable, also known as a confounding variable.
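To see how a hidden variable can manufacture a correlation, here is a minimal Python sketch of the ice cream example. Everything in it is an assumption made up for illustration: the temperature range, the coefficients, the noise levels, and the helper names (`pearson`, `residuals`, `simulate_summer`) are hypothetical, not real city data. Temperature drives both ice cream sales and drownings, and neither one affects the other.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [y - mean_y - slope * (x - mean_x) for x, y in zip(xs, ys)]


def simulate_summer(days=1000, seed=0):
    """Made-up data: temperature causes both series; they never touch each other."""
    rng = random.Random(seed)
    temps, sales, drownings = [], [], []
    for _ in range(days):
        temp = rng.uniform(10, 40)                       # hypothetical daily temperature
        temps.append(temp)
        sales.append(20 * temp + rng.gauss(0, 100))      # hotter day, more ice cream
        drownings.append(0.1 * temp + rng.gauss(0, 1))   # hotter day, more swimmers
    return temps, sales, drownings


temps, sales, drownings = simulate_summer()

# Correlation between ice cream sales and drownings, ignoring temperature.
raw = pearson(sales, drownings)

# The same correlation after removing the effect of temperature from both.
controlled = pearson(residuals(sales, temps), residuals(drownings, temps))

print(f"ignoring temperature:        r = {raw:.2f}")
print(f"controlling for temperature: r = {controlled:.2f}")
```

In this toy setup the first correlation comes out clearly positive even though ice cream has nothing to do with drowning, and it mostly disappears once the heat is accounted for. That is the confounding variable at work.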

Getting back to what Ferriss was saying near the end of his quote: as “Big Data” grows in popularity (and use), there may be an increased likelihood of making errors in the form of spurious relationships. One way to mitigate this error is education. That is, if the people who are handling Big Data know and understand things like correlation vs. causation and spurious relationships, these errors may be less likely to occur.

I suppose it’s also possible that some people, knowing about these kinds of errors and how little the average person might know about statistics, could deliberately report misleading statistics. I’d like to think that people aren’t doing this and that it has more to do with confirmation bias.

Regardless, one way to guard against this inaccurate reporting would be to use hypotheses. That is, before you look at the data, make a prediction about what you’ll find in the data. It’s certainly not going to solve all the issues, but it’ll go a long way towards doing so.
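Here is a rough Python sketch of why predicting first matters. It is an illustration under made-up assumptions, not anyone’s real analysis: every variable is pure random noise, and the function names (`pearson`, `random_series`, `dredge`, `test_one_hypothesis`) and the sizes (200 variables, 20 observations) are hypothetical. The first approach “lets the data speak” by scanning every variable for the strongest correlation with an outcome. The second commits to a single variable before looking.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def random_series(rng, n):
    """A list of n independent random numbers: pure noise."""
    return [rng.gauss(0, 1) for _ in range(n)]


def dredge(n_variables=200, n_samples=20, seed=1):
    """Scan many unrelated variables and keep the strongest correlation found."""
    rng = random.Random(seed)
    outcome = random_series(rng, n_samples)
    strongest = 0.0
    for _ in range(n_variables):
        r = pearson(random_series(rng, n_samples), outcome)
        if abs(r) > abs(strongest):
            strongest = r
    return strongest


def test_one_hypothesis(n_samples=20, seed=1):
    """Pick one variable in advance and report its correlation, whatever it is."""
    rng = random.Random(seed)
    outcome = random_series(rng, n_samples)
    chosen = random_series(rng, n_samples)
    return pearson(chosen, outcome)


strongest = dredge()
single = test_one_hypothesis()

print(f"strongest of 200 unrelated variables: r = {strongest:.2f}")
print(f"one variable chosen in advance:       r = {single:.2f}")
```

Nothing in this data is related to anything else, yet scanning 200 variables will almost always turn up one that looks impressively correlated with the outcome. The single pre-chosen variable can still look correlated by chance, but you only get one roll of the dice instead of 200.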

What if Predicting the Future is a Skill?

In the shower this morning, I was thinking about some old research. “Old” meaning it’s almost 10 years old, so not old, per se. Anyway, I was thinking about those experiments where a person’s body reacted before a startling picture appeared in front of them. In layman’s terms: predicting the future. Then I thought (because all great ideas start in the shower, right?) what if predicting the future is a skill… and we just have to develop it!

I’ve written before about the evidence for predicting the future (precognition), and there’s lots of evidence to support that this phenomenon exists. There’s also lots of research suggesting that, at birth, we have the capacity to speak every language. I should say, we have the capacity to develop the ability to speak every language. It has to do with connectivity in our brain, phonemes, and the like. So, isn’t it possible that there are also neural pathways that could be developed to improve our ability to predict the future?

And if this were the case, isn’t it possible that we can also develop this skill later in life? There are countless examples of people learning new languages after the so-called “do or die” time when they’re babies, so isn’t it possible that people could then develop the ability to predict the future later in life, too?

I don’t have any definitive answers to the questions I’m asking, but it’s certainly a thought worth entertaining this Friday morning.

Higher Education is More Like Telecommuting and Less Like Newspapers, Part 1

I came across an interesting article in The American Interest magazine a couple of days ago. It was by way of tweet (as it most often is). This tweet came from one of the professors at George Mason University, Prof. Auerswald. He’s done some really cool stuff, so be sure to check ’em out! The tweet which led me to the article:

Intriguing, yes? Well, it was to me, so I proceeded to read the article from the magazine. As for the argument that universities are going the way of the newspaper because of the internet — I don’t necessarily agree with it.

In fact, I think that higher education will go the way of telecommuting more than it will the way of newspapers. What do I mean? Well, telecommuting has been a possibility since about the 1970s, and it first became popular last century. By now, you’d expect that lots of people would telecommute, right? Depending on your definition of lots…

[Figure: Total number of US teleworkers]

This graphic shows that only about 3 million employees telecommuted in 2011. If I had been asked in the 1990s to guess how many folks would be telecommuting in the 2010s, I would have guessed waay more than 3 million, as I’m sure most people would.

Higher education — learning — has, for the most part, been an in-person thing. People enroll in university and spend the next 4-5 years living on- (or off-) campus taking classes. In that time, they may also join student organizations, hold internships, and meet a whole bunch of new people. Some of those people become their friends for the rest of their lives.

MOOCs do not have the same qualities as in-person education. Learning online (or on your own) won’t necessarily bring the same benefits as attending university.

I understand the argument, and the comparison between newspapers and higher education makes sense, but I just don’t buy it. I don’t believe that higher education will go the same way as Newsweek or other publications. Higher education is more than just the degree. That’s not to say that some consumers won’t choose to go the way of online learning, but I don’t think that it will pull enough folks away from wanting the in-person learning. This is why I think MOOCs and online education are more likely to go the way of telecommuting.

That being said, I do think that MOOCs present a major threat to the higher education market because consumers will perceive it as a shortcut to a degree.

And more than that, I think that advances in telecommuting could shift the way we telecommute — and by extension — higher education. In fact, I remember during the 2008 election, CNN had a “virtual presence” technology wherein one of their guests was somewhere else entirely, but there was a holographic representation of them in the studio (with which Wolf Blitzer was interacting). That was 4 years ago!

I don’t know what happened to that technology (if it’s being developed for commercial use, etc.), but I think that could seriously change the way we interact. I think if that technology were introduced on a larger scale, that would certainly increase the number of telecommuters. Similarly, I think that would have a chance at seriously changing the face of higher education. This technology, assuming it’s “just as good as being there,” would allow folks to be in the comfort of their basements (or virtual presence studio?), while still being at work or in a classroom.

Just as a closing: anything written about the future is inherently flawed. There’s no way to know (for sure) what will happen or won’t happen in the future. So, while these are some predictions or guesses I’m making about the future, they may turn out to be wildly wrong (or surprisingly right).

Note: After writing this, I realized that there were a few more things I wanted to touch on. Look for Part 2 tomorrow!