What is Data Science?

There’s no question that “data science” is becoming more and more popular. In fact, Booz Allen Hamilton (a consultancy) found:

The term Data Science appeared in the computer science literature throughout the 1960s-1980s. It was not until the late 1990s, however, that the field as we describe it here, began to emerge from the statistics and data mining communities. Data Science was first introduced as an independent discipline in 2001. Since that time, there have been countless articles advancing the discipline, culminating with Data Scientist being declared the sexiest job of the 21st century.

Unsurprisingly, there are countless graduate and undergraduate programs in data science (Harvard, Berkeley, Waterloo, etc.), but what is data science, exactly?

Given that the field is still in its proverbial infancy, there are a number of different perspectives. Booz Allen offers the following in their Field Guide to Data Science from 2015: “Describing Data Science is like trying to describe a sunset — it should be easy, but somehow capturing the words is impossible.”

Pithiness aside, there does seem to be consensus around some of the pertinent themes within data science. For instance, a key component is usually "Big Data" (both unstructured and structured data). Dovetailing with Big Data, "statistics" is often cited as an important component. In particular, this means an understanding of the science of statistics (hypothesis testing, etc.), the ability to manipulate data and, almost always, the ability to turn that data into something that non-data scientists can understand (e.g., charts, graphs, etc.). The other big component is "programming." Given the size of the datasets, Excel often isn't the best option for interacting with the data. As a result, most data scientists need to have their programming skills up to snuff (often in more than one language).

What’s a Data Scientist?

Now that we know the three major components of data science are statistics, programming, and data visualization, do you think you could distinguish data scientists from statisticians, programmers, or data visualization experts? It's a trick question: they're all data scientists (broadly speaking).

A few years ago, O’Reilly Media conducted research on data scientists:

Why do people use the term “data scientist” to describe all of these professionals?


We think that terms like “data scientist,” “analytics,” and “big data” are the result of what one might call a “buzzword meat grinder.” The people doing this work used to come from more traditional and established fields: statistics, machine learning, databases, operations research, business intelligence, social or physical sciences, and more. All of those professions have clear expectations about what a practitioner is able to do (and not do), substantial communities, and well-defined educational and career paths, including specializations based on the intersection of available skill sets and market needs. This is not yet true of the new buzzwords. Instead, ambiguity reigns, leading to impaired communication (Grice, 1975) and failures to efficiently match talent to projects.

So… the ambiguity in understanding the meaning of data science stems from a failure to communicate? Classic movie references aside, the research from O’Reilly identified four main “clusters” of data scientists (and roles within said “clusters”):

Within these clusters fit some of the components described earlier, along with two additional components: math/operations research (including things like algorithms and simulations) and business (including things like product development, management, and budgeting). The graphic below demonstrates the T-shaped nature of data scientists: they have depth of expertise in one area and knowledge of other closely related areas. NOTE: ML is an acronym for machine learning.


NOTE: This post originally appeared on GCconnex.


Where on the Internet is Jeremiah Stanghini – June 2016

One of the first posts I ever wrote was a collection of the different places I could be found on the internet. I wrote that post more than five (!) years ago. The other day, I came across it almost by accident. Even though I had written two 'updates' to that post, it turns out that I also wrote a second post on the subject almost a year and a half later. Looking back at those posts, I thought it might be fun to write an update to the series.

Even though I've already written an update to the first post, I thought I'd still look back at some of the places I used to frequent when I wrote it five years ago.

Five years ago, it looks like I had planned on developing a presence on YouTube:

I have a channel on YouTube where I upload videos of presentations. You’ll also find videos that I “like” on YouTube along with videos that I have commented on.

As it happens, there really isn’t much more to my YouTube profile than links back to other places you can find me. I do have some things on YouTube, but that’s only if you’re a student in one of my classes (and have access to the lectures I’ve uploaded).

Similarly, I used to do a lot of writing for Squidoo. It's been so long since I visited any of the things I wrote for that site that it isn't even called Squidoo (!) anymore; HubPages acquired it.

I also let my BodyTalk certification lapse, as my career went in a different direction.

It looks like I used to be a frequent commenter at other sites. In particular, I had profiles with IntenseDebate and Disqus (two popular commenting services). I haven't posted a comment through either of those services in more than 2 years (almost 3.5 years with IntenseDebate).

Lastly, I highlighted two Toronto sports blogs where I used to be an active member: Bluebird Banter and Pension Plan Puppets. When I check in on my comment history for both of those sites, I can't even tell when I last made a post (that's how long it's been).


Looking at the second post I wrote (in late 2012), the only carryover from the first post (among the places where I'm no longer that active) is the two commenting services: IntenseDebate and Disqus.

Now, let’s look at some of the places that I still frequent (in one way or another).

In that first post, I talked about writing posts for this site (I'm nearly up to 600 here). I also highlighted my LinkedIn profile (it's up to date!) and my Twitter account (I like to share articles that I think people will find useful).

In the second post, I added two other places: Facebook and Quora. At the time, I was a frequent contributor to Facebook. As with Twitter, I liked to share articles that I thought people would find useful. I also liked to share pictures I found on the Internet that were either beautiful or provided a different perspective. Somewhere along the way, Facebook changed its algorithms, and the people who "liked/followed" your page no longer received all of your updates. As a result, I stopped actively contributing there. However, whenever I publish a new post, a link to it is automatically posted to Facebook.

As for the second place, Quora, I did spend some time trying to build a presence there. I wrote more than 60 answers, but it looks like I haven't written anything for Quora in almost 3 years. I didn't realize this until writing this post, but a number of the answers I wrote for Quora have more views than some of the things I've written for this website.


So, in the last 3+ years, how have my internet frequenting habits changed? Well, the best place to find me is still here on this site. Twitter and LinkedIn are also places that I continue to update. There are also two new places: Business 2 Community and Research Blogging. Business 2 Community is one of the top business blogs, and Research Blogging is a community and collection of posts written about academic research.