The Problem with Big Data: Lies, Damn Lies, and Statistics

I’ve used this subtitle in a previous post, and it fits the content of this post well enough that it’s worth using again. I was reading a post from Tim Ferriss the other day and it made me think of statistics. The post is about alternative medicine, but understanding that isn’t entirely necessary for the point I’m making. Here’s some context:

Imagine you catch a cold or get the flu. It’s going to get worse and worse, then better and better until you are back to normal. The severity of symptoms, as is true with many injuries, will probably look something like a bell curve.

The bottom flat line, representing normalcy, is the mean. When are you most likely to try the quackiest shit you can get your hands on? That miracle duck extract Aunt Susie swears by? The crystals your roommate uses to open his heart chakra? Naturally, when your symptoms are the worst and nothing seems to help. This is the very top of the bell curve, at the peak of the roller coaster before you head back down. Naturally heading back down is regression toward the mean.

If you are a fallible human, as we all are, you might misattribute getting better to the duck extract, but it was just coincidental timing.

The body had healed itself, as could be predicted from the bell curve–like timeline of symptoms. Mistaking correlation for causation is very common, even among smart people.

And the important part of the quote [Emphasis Added]:

In the world of “big data,” this mistake will become even more common, particularly if researchers seek to “let the data speak for themselves” rather than test hypotheses.

Spurious connections galore–that’s what the data will say, among other things.  Caveat emptor.

This analogy reminded me of the first time I learned about correlation and causation in my first psychology class as an undergraduate. It had to do with ice cream, hot summer days, and swimming pools. In fact, here’s a quick summary from wiki:

An example of a spurious relationship can be illuminated by examining a city’s ice cream sales. These sales are highest when the rate of drownings in city swimming pools is highest. To allege that ice cream sales cause drowning, or vice-versa, would be to imply a spurious relationship between the two. In reality, a heat wave may have caused both. The heat wave is an example of a hidden or unseen variable, also known as a confounding variable.

Getting back to what Ferriss was saying near the end of his quote: as “Big Data” grows in popularity (and use), there may be an increased likelihood of making errors in the form of spurious relationships. One way to mitigate this error is education. That is, if the people who are handling Big Data know and understand things like correlation vs. causation and spurious relationships, these errors may be less likely to occur.
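To make the ice cream example above concrete, here’s a minimal sketch in Python. Everything in it is invented for illustration: the temperatures, the coefficients, and the noise levels are my assumptions, not real sales or drowning data. Temperature drives both made-up series, so they correlate strongly with each other, but once the part explained by temperature is removed from each, the leftover correlation should sit close to zero.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def residuals(ys, xs):
    """What is left of ys after subtracting a straight-line fit on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical daily temperatures (the hidden, confounding variable).
temps = [rng.uniform(10, 35) for _ in range(1000)]

# Both series depend on temperature plus independent noise.
# Neither one depends on the other.
ice_cream = [20 * t + rng.gauss(0, 50) for t in temps]
drownings = [0.3 * t + rng.gauss(0, 1) for t in temps]

raw = pearson(ice_cream, drownings)
controlled = pearson(residuals(ice_cream, temps), residuals(drownings, temps))

print(f"correlation, ignoring temperature:        {raw:.2f}")
print(f"correlation, after removing temperature:  {controlled:.2f}")
```

The point of the sketch is only that a strong correlation can come entirely from a third variable. Real data are messier, and you often don’t know what the hidden variable is.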

I suppose it’s also possible that some people, knowing about these kinds of errors and how little the average person might know about statistics, could deliberately report misleading statistics. I’d like to think that people aren’t doing this and that it has more to do with confirmation bias.

Regardless, one way to guard against this inaccurate reporting would be to use hypotheses. That is, before you look at the data, make a prediction about what you’ll find in the data. It’s certainly not going to solve all the issues, but it’ll go a long way towards doing so.
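Here’s a rough sketch of why looking first and explaining later is risky. It builds 200 columns of pure random noise plus one random “outcome” and counts how many columns look related to the outcome by chance alone. The sizes (30 observations, 200 columns) and the 0.361 cutoff (approximately the two-sided 5% critical value of a correlation with 30 observations) are assumptions I picked for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


rng = random.Random(1)  # fixed seed so the sketch is repeatable
N_OBS = 30       # observations per variable (assumed)
N_VARS = 200     # unrelated variables to "let speak for themselves" (assumed)
CUTOFF = 0.361   # approx. 5% two-sided critical |r| for 30 observations

# The outcome and every candidate variable are independent noise,
# so any "relationship" found below is spurious by construction.
outcome = [rng.gauss(0, 1) for _ in range(N_OBS)]
correlations = [
    pearson([rng.gauss(0, 1) for _ in range(N_OBS)], outcome)
    for _ in range(N_VARS)
]

spurious_hits = sum(1 for r in correlations if abs(r) > CUTOFF)
strongest = max(abs(r) for r in correlations)

print(f"'significant' correlations found in pure noise: {spurious_hits}")
print(f"strongest correlation found: {strongest:.2f}")
```

If you had committed to testing one specific column before looking, you’d be fooled about 5% of the time. Scanning all 200 and reporting whatever stands out almost guarantees finding something.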

To Tech or Not To Tech: Hiking the Appalachian Trail

It’s hard to believe that it’s only been 1 month since my last post. It feels like the last time I wrote something was ages ago. In March, I said that I intended to write something once a week, but I suppose having an infant, moving, and preparing to start a new job have made that a little harder than I imagined. Nonetheless, I stole away some time today to write about technology and the Appalachian Trail (AT).

A few summers ago (actually, now that I think about it, it was 6 years ago), I had the good fortune to spend some time hiking on the Appalachian Trail. It was my first time on an extended hike and I really enjoyed it. While on the hike, I learned that the trail spans 14 states including the beginning/end in Maine/Georgia. Many folks try to hike the whole thing in a summer. Lots succeed, but many more give up. When I hiked part of the AT in 2008, technology wasn’t as advanced as it is today (obviously), but I was wondering how I might want to approach this subject when I decide to hike the AT again.

This thought was sparked by a post in Scientific American bemoaning the use of technology on the trail. I can certainly see where the author is coming from. Most people decide to go into nature to get away from technology. She also makes some good points as to how technology can help in an emergency (read: bear eats pack).

I think if I were to hike the AT tomorrow, I might bring along a MacBook Air for the sole purpose of writing. That is, I’d intend to do like David Roberts did and take a hiatus from social media (which for me, mainly means Twitter). I say intend because I’ve learned that hard-and-fast rules can sometimes be more difficult to uphold. I suppose I could skip the data plan altogether, which would make it quite difficult to check things like Twitter.

When I do decide to hike the whole of the AT (sometime in the next 30 years), our relationship to technology may be very different. Maybe Google Glass (or an iteration thereof) might be more user-friendly. Maybe it’ll be ingrained in the way we live our days like smartphones have become. Maybe there’ll be something after Google Glass and something beyond the impending smartwatches. Regardless of how technology evolves, we’ll always be left with the choice: to tech or not to tech.

How to Solve the Password Problem: Teach Kids When They’re Young

I came across an article a few days ago that explained how to teach humans to remember really complex passwords. As I was reading it, I couldn’t help but think that there’s an important piece of the solution that deserves more attention: habit.

When we first started using computers, coming up with a super-difficult password wasn’t necessary as we were usually just trying to keep our stuff protected from our family members. Then, it was trying to keep things protected from our co-workers. Slowly, that grew and grew until now, someone (or something!) on the other side of the planet can figure out your password and hack into your online accounts.

I wonder, if we were taught how to come up with complex passwords when we were younger, would there still be such a high percentage of people using easy-to-crack passwords? That is, if we only knew passwords to be in the form of “passphrases,” would someone still try to use a word as their password? While there would still probably be some, my guess is that the percentage would drop.
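For a sense of what a passphrase habit could look like, here’s a minimal sketch in Python. The eight-word list is hypothetical and far too small for real use, and the function names are my own. The 7,776 figure is the size of a standard Diceware-style word list, which I’ve included only for comparison.

```python
import math
import secrets

# Hypothetical mini word list, for illustration only.
# A real passphrase should draw from a list of thousands of words.
WORDS = ["maple", "trail", "river", "candle",
         "orbit", "pickle", "violin", "harbor"]


def make_passphrase(words, n_words=4, sep="-"):
    """Join n_words randomly chosen words using a secure random source."""
    return sep.join(secrets.choice(words) for _ in range(n_words))


def entropy_bits(list_size, n_words):
    """Bits of entropy when each word is picked uniformly at random."""
    return n_words * math.log2(list_size)


print(make_passphrase(WORDS))
print(f"4 words from this 8-word list:  {entropy_bits(len(WORDS), 4):.1f} bits")
print(f"1 word from a 7,776-word list:  {entropy_bits(7776, 1):.1f} bits")
print(f"4 words from a 7,776-word list: {entropy_bits(7776, 4):.1f} bits")
```

The comparison is the part worth teaching: each extra randomly chosen word adds the same number of bits, so a short phrase from a big list is much harder to guess than any single word from that list.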

So, how do we teach our kids to use smarter passwords? Well, assuming that kids at some point are still taught how to type in school, I see this as the perfect opportunity to also teach them about how to use passphrases for accounts. Assuming that students will have to log on to a computer to use the program that teaches them how to type, this is the best time to imprint the habit of using an effective password.

Of course, this won’t solve the problem of all the people out there today who still use “password” or “1234password” for their password, but it will help to correct the problem by not adding to the number of people with poor password habits.

~

Extending this idea, there may still be some adults or teens out there who are still learning how to type. In these cases, we could have the software that is teaching them how to type also teach them about good password habits. If the adults are learning how to type in some sort of class, this could also be a good place to teach them about good password habits.

How Smartphones Can Lead to Better Parents

Over three years ago, I wrote a post about cell phone etiquette. At the time I wrote that, I wouldn’t have guessed that three years later, I’d be considering the possibility that smartphones could actually lead to better parents.

But that’s exactly what this post is about.

The stereotype goes that many parents will bring their children to the park (and/or some activity) and upon arriving, they shoo away their children only to peer down at their cell phone. Some folks do this while out to dinner with friends (even though they don’t have kids, see here). Many will cringe upon seeing parents sitting on the bench enwrapped in the goings on of their cell phone. Farhad Manjoo, however, points out how smartphones can actually make for more available parents [Emphasis Added]:

But we rarely consider how, by liberating us from the office, smartphones have greatly expanded the opportunity for certain kinds of workers to increase their involvement in their children’s lives. Because you can work from anywhere thanks to your phone, you can be present and at least partly attentive to your children in scenarios where, in the past, you’d have had to be totally absent. Even though my son had to yell for my attention once when I was fixed to my phone, if I didn’t have that phone, I would almost certainly not have been able to be with him that day — or at any one of numerous school events or extracurricular activities. I would have been in an office. And he would have been with a caretaker.

Stop and consider that for a moment: having a smartphone can actually make you more available as a parent. Now, this isn’t a commercial for smartphones, but it’s certainly something that should give you pause for consideration. I know it did for me when I read it. This idea put forth by Manjoo is exactly the kind of thing that I’m talking about when I say putting a new perspective on things. Someone who is so focused on how smartphones are bad for parents and how they keep parents from their children wouldn’t be able to see the possibility that for a small population, having a smartphone can actually allow a parent to be away from the office and with their children.

This idea isn’t meant to invalidate the idea that smartphones are changing the relationship we have with our children, but the idea that smartphones are allowing us to be with our children more is, to be hyperbolic for a moment, paradigm-altering. A key step to being a better parent is being able to be with your children. So, if smartphones can get us out of the office and next to our kids, isn’t that an important step?

~

There still might be some of you out there that unequivocally think we shouldn’t be on our phones when we’re with our kids and that’s okay, but I hope that you’ll at least consider (reflect, think about, ponder, etc.) the possibility that the opposite may be true. It’ll put you one step closer to defending against the confirmation bias.

The Problem With Facebook: Young People Really Are Social Networking Elsewhere

Remember yesterday when I was talking about Facebook’s “young person” problem? It turns out there’s actually data to back this up. An article in TIME, which I didn’t realize included data when I was writing my post yesterday, reports:

According to iStrategy, Facebook has 4,292,080 fewer high-school aged users and 6,948,848 college-aged users than it did in 2011.

That amounts to more than 11 million users gone in the past 3 years. Facebook has more than 1 billion users, so 11 million might not seem like much, but is it a trend? That is, is this something that the folks over at Facebook should be worried about? Well, there’s a handy graphic in the TIME article (though it comes from iStrategy):

Two of the cells I want to draw your attention to are already conveniently highlighted in red: the ages 13-17 and 18-24. If you’ll notice, both of these age groups are experiencing negative growth. Of particular note is the 13-17 age group, which is down 25% over the last 3 years. Again, as I said earlier, Facebook’s user base is rather large right now, so it might not have that big of an effect anytime soon, but it is something to watch out for.
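For anyone who wants to check the arithmetic behind the “more than 11 million” figure, here’s a quick sketch. The two losses are the iStrategy numbers quoted above. The user base of exactly 1 billion is my assumption, a round lower bound standing in for “more than 1 billion,” so the share it prints is an upper bound.

```python
# Figures quoted from iStrategy (via TIME) earlier in this post.
high_school_lost = 4_292_080
college_lost = 6_948_848

total_lost = high_school_lost + college_lost

# Assumption: 1 billion is a round lower bound for "more than 1 billion",
# so the true share of the user base is no larger than this.
user_base = 1_000_000_000
share = total_lost / user_base

print(f"{total_lost:,} users lost")
print(f"at most about {share:.1%} of the user base")
```

That small share is why the raw number alone isn’t alarming. The concern is which age groups the losses are coming from.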

In the article, the author also points out that part of the reason people advertise with Facebook isn’t necessarily for the volume of its users, but because of all the information that it has on its users making microtargeting that much more effective. Maybe this information is enough to overcome the decline in new users, who knows. As I said yesterday, if I were part of Facebook’s team, I would be worried about the continued decline in my user base — especially because it’s the younger folks who are leaving. Why?

Pretty soon, these young folks are going to be reaching those prime marketing age groups (18-34) and if they’re already not using Facebook, that could be bad news. In fact, if they’re not using Facebook, they’re probably using some other social network to communicate and that is where the marketing dollars are going to go. I suppose only time will tell.

The Problem With Facebook: Is It Really Out of Room to Grow?

I rarely read the front page of YouTube, but today when I typed in YouTube to my address bar (with the intention of finding some music to listen to while I worked), one of the videos I saw on the front page was titled “The Problem With Facebook.” Truth be told, I thought it was a video by MinutePhysics and thought that there was going to be some scientific explanation of Facebook’s problems, but it turns out the video was by 2veritasium. (I guess MinutePhysics may have liked the video, so that’s why I saw their name or maybe they had just come out with another video, who knows.)

Anyway, if you have Facebook (or had Facebook) or know anything about Facebook, I’d say it’s worth the 6 and a half minutes to watch it:

I’m not sure what the fellow’s name is, but it reminds me of when George Takei went on a bit of a rant about Facebook not letting him reach all of his fans on Facebook. At the time, I think I still had a Facebook profile (rather than the page I have now) and I thought that was strange that your posts weren’t reaching all of your friends — by design.

The fellow in this video makes that same point, but he does it in a more thorough way than I remember Takei doing it (which is not to say Takei didn’t do it), and he also juxtaposes Facebook with YouTube. He makes a rather compelling argument, but something I don’t think he highlights is that he kind of has a vested interest in YouTube being more successful — his videos are hosted on YouTube! Now, this doesn’t really take anything away from the argument — it’s sound — but I think it’s worth noting.

Throughout the video, he talks about the incentives. I wonder what Michael Sandel would say about the incentives in this situation. Would he say that the incentives have been perverted? It’s tough to say because Facebook is trying to make money and there’s nothing inherently wrong with that, but I wonder if maybe they’ve strayed a bit too far from the original purpose of the site.

There’s one last thing I want to highlight from the video, in part because it dovetails nicely with something that I’ve been trumpeting on here for a while. He argues that Facebook has already maxed out with regard to the amount of time people spend on the site per day (approximately 30 minutes) and that Facebook has already reached just about everyone in the developed world. When it comes to online video, however, he argues that there is still lots of room to grow based on the fact that people still don’t watch that much of it when compared to television. I might not put it in those words exactly, but I think he’s on the right track.

If even the President of the United States knows that Facebook is becoming or already is unpopular with young folks, I have to think that the smart people over at Facebook know this, too. As they’ve got a fiduciary duty to their shareholders, I’m sure they’ve been hard at work trying to figure out just how they’re going to capture more value — translation: how they are going to make more money.

Who knows… maybe Facebook will soon go the way of the social networks that have gone before it. Remember MySpace?

Facebook is a Poor Predictor of Performance of Job Applicants

A few months ago, I planned on writing more posts about academic research. I wrote one about spending your bonus on others making you happier (than if you’d spent it on yourself), but haven’t got around to it since. My intentions were good as anyone can see from looking at the list of tweets I’ve favourited over the last 100 days. Just about all the tweets I’ve “bookmarked” to read are academic in nature.

I came across an academic article the other day that seemed quite interesting and reminded me of much of what you hear when you’re in university: be careful what you put online! Even after you’ve graduated, you often hear that your employer (or potential employer) will be watching to see what you put online, so be careful what you put on Facebook. We’re told that it can have an adverse effect on our ability to be hired (or maintain our current employment).

This particular study tried to address a gaping hole in empirical research. That is, the popular press often talk about how important it is to have a pared down social media profile, but there hasn’t been much research studying the effects of potential employers using social media profiles in screening candidates. Before we take a look at some of the results, I wanted to share three important points from the article:

First, as discussed, SM [Social Media] platforms such as Facebook are designed to network with friends and family rather than to measure job-relevant attributes. Indeed, most SM information pertains to applicants’ outside-of-work interests and activities, which may have little bearing on work behavior. This factor, in and of itself, may be enough to suggest that criterion-related validity for SM assessments may be low. [Emphasis added]

The researchers raise an important point that — no doubt — you’ve seen elsewhere. Most people use Facebook in order to connect with friends & family and as a result, it may not be the best measure of how one would function at work.

Second, the sheer volume of SM information also may inhibit decision makers from drawing valid inferences. . . This large amount of information may put demands on decision makers’ ability to process all the potential cues and to determine what information (if any) is relevant and what is not. This situation may cause decision makers to rely on biases and cognitive heuristics may reduce validity. [Emphasis added]

I’ve written extensively about cognitive biases. The researchers’ mention of the volume of social media information makes me wonder how long it will be before organizations use Big Data to analyze all that social media data and paint a portrait of a candidate.

Finally, inaccurate information may undermine the criterion-related validity of SM assessments. For example, the desire to be perceived as socially desirable may lead applicants to embellish or fabricate information they post on SM, such as experience, qualifications, and achievements. Furthermore, because other people can post information about applicants on SM platforms (e.g., Facebook), applicants do not have complete control of their information. As such, applicants may be unduly “penalized” for what others post. In fact, one study found that comments posted by others on one’s Facebook profile had a greater effect on observers’ impressions than did one’s own comments (Walther, Van Der Heide, Kim, Westerman, & Tong, 2008).

~

In this study, the researchers had recruiters rate Facebook profiles of potential job candidates and then followed up with those job candidates after they’d secured employment. As you might expect from where this post has led, the evaluations the recruiters gave of the potential job candidates based on their Facebook profiles were unrelated to the ratings issued by supervisors on a number of factors: job performance, turnover intentions, and actual turnover. Moreover, these predictions based on Facebook profiles aren’t more useful than other, more common methods: cognitive ability, personality, self-efficacy, or even GPA. What’s more, they found that Facebook ratings were higher for females (vs. males) and that ratings were higher for White candidates (vs. Black and/or Hispanic candidates).

I understand that many managers think more data will help them make better decisions, but as has been demonstrated in this article, when it comes to job candidates, maybe checking their Facebook profiles could lead managers to make the wrong decisions.

Chad H. Van Iddekinge, Stephen E. Lanivich, Philip L. Roth, & Elliott Junco (2013). Social Media for Selection? Validity and Adverse Impact Potential of a Facebook-Based Assessment. Journal of Management. DOI: 10.1177/0149206313515524

Global Museum Attendance has Doubled in the Last Two Decades

A little more than a week ago, The Economist published an article about museums. In particular, they drew attention to the fact that the number of museums isn’t in decline. Instead, it’s quite the opposite. Would you have guessed that today, not only are museums not in decline, but that there are more than double the number of museums there were two decades ago?

As a soon-to-be parent, I can’t help but be pleased with this fact. I’m very much looking forward to taking my little one(s) to the museum to learn about the natural world around them. It seems I’m not the only one pleased by this either, with museum attendance way up.

I suppose what’s surprising to me about this is that I figured that with the advances we have had in technology, most people would be more inclined to explore the natural world around them from the convenience of their couch. While I’m glad that this is not the case, I wish someone would do some sort of study to better understand this behaviour. The article ties in the idea of higher education. That is, more and more people are going to university and graduates are more likely to visit museums. This makes sense, but I don’t think that it explains the whole story.

Another point raised in the article is the burgeoning growth in other countries. If you look at the graph embedded above, you’ll see that there’s quite a bit of growth planned for the Southeast Asian countries. [As an aside, in The Economist’s “The World in 2014,” you may be surprised to know that over 40% of the world’s population will be voting in a national election next year.] While this growth may help explain an additional piece of the growth in museums, it still doesn’t quite feel like it’s explained the whole puzzle. Of course, in science, especially the social sciences, we know that it’s not always possible to completely explain behaviour, but I’d like to think that one aspect of this has to do with technology.

That is, I’d like to think that as a species, we’re recognizing that technology is a useful tool for helping us navigate the world around us, but that it’s not the be-all and end-all of human existence. Don’t get me wrong, I absolutely appreciate technology. Without it, I wouldn’t be able to type on this external keyboard connected to my laptop, while looking at an external monitor connected to my laptop. Beyond that, you wouldn’t be able to read this article on your smartphone or on your laptop/computer, if it weren’t for technology.

With that being said, technology, in my opinion, hasn’t been able to capture the visceral experience of being there and seeing something. Technology can’t (at least not yet) involve all five of our senses in experiencing. Until it does, I’m happy to continue visiting museums.

Still Looking for a Christmas Present? Try These Projects on Kickstarter Canada

It’s the last weekend before Christmas, so there’s a good chance that a lot of you out there are out in the hustle and bustle trying to find last-minute gifts for friends and family. If the weather forecasts are to be believed, some of you might not be able to make it out into the madness that is last-minute shopping before Christmas. That’s great! Why? Well, that means that you’ll have to be a bit more creative with your gift ideas.

So, why don’t you make someone’s day (in addition to the person you’re giving the gift to) by making a donation in their name to one of these projects on Kickstarter Canada? You could also just donate anyway, on your own behalf rather than someone else’s.

Note: I’ve only included projects that — at the time of writing this post — hadn’t reached their goal.

NASH: The Movie

“You may have heard of Steve Nash, the NBA superstar and multiple MVP winner. You may also know that he’s Canadian. A Vancouver documentary crew secured unparalleled access to Nash, and they’re in the middle of raising money for production and editing costs for the final film. Unlike many film projects, tiers of this project include a physical and digital copy of the final product, which gives potential backers a tangible reward for their donation.” (Source)

Stratus Watch

“The concept is as simple as it is unprecedented; a titanium wristwatch with a face that you can choose. You can choose from dozens of patterns and colours from the manufacturer, or design your own and submit it to them. The watches exude a clean, straightforward charm, and even the lowest funding tier gifts you one of them.” (Source)

Shot Time

“In what could easily be the ruin of many a young soul, this is a shot glass that measures the amount of liquor consumed over a period of time; a potent mix of a stopwatch and a case of acute alcohol poisoning. The consequences of such a device are best left to the imagination, but if it meets its funding goals, the consequences may become very real, very quickly. Hooray for progress?” (Source)

Canadian Black Garlic

“Exactly what it says on the tin; backers are funding the creation and shipping of various black-garlic-based condiments and seasonings. The majority of the project’s funding goal will go to securing a large batch of Canadian-grown garlic, and the rest will go into the blackening and production/packaging process. Is there anything more Canadian than authentic Northern delicacies?” (Source)

SpecShot

“Like the mirror universe version of the Shot Time, the SpecShot is a two-in-one system that scans your drinking water for contaminants and then posts the results online. This process could be equal parts fascinating and harrowing, depending on your results, but the ultimate goal is to spread awareness through hard data, and hopefully inspire some change to our water quality standards.” (Source)

How Americans Get to Work: Is It Time to Change Incentives?

This past Friday, there was a rather startling chart from The Atlantic. The chart illustrated how Americans get to work, by volume. That is, the total number of people who take the bus, the total number of people who drive, the total number of people who walk — you get the idea. Before clicking through to read the post, I was hopeful… afterwards, not so much:

In case the numbers are too small to read, the effect should still stand: well over half of Americans drive alone to work. Now, it’s not that there’s anything inherently wrong with this, but now that we’ve seen things like the image below, which illustrates the space needed to transport 60 people in various ways, it seems more reasonable that people shouldn’t drive alone in their car.

Of course, some folks might jump to the argument that more people live in rural areas in America, but that’s not true. “In 2010, a total of 80.7 percent of Americans lived in urban areas, up from 79 percent in 2000.” However, just because the vast majority of Americans live in urban areas, that doesn’t mean that they have access to viable alternative means of transportation. Maybe it’s time for Americans to reconsider the emphasis on car culture.