‘Big data’ has become the marketing buzzword of 2012. As more and more information about the world is stored digitally, some collections of data that are invaluable for marketers are simply too big for a company to process and analyse by conventional means. Jim Sterne, founder of the eMetrics Summit and president of the Digital Analytics Association, looks at what every marketer and retailer needs to know about big data, including how to store it, how to split up the processing and what to do with the information.
“Big data.” There’s no escaping it. It’s catchy. It’s generic enough that everybody is using it for everything. It’s a one-size-fits-all phrase.
It’s so all-encompassing that the best definition I’ve seen recently is from Stephane Hamel, Director of Strategic Services at Cardinal Path, who put it this way: “The simplest definition of big data is ‘it doesn’t fit in Excel.’”
So with “big data” on everybody’s lips, here’s all you (the marketing executive) need to know to keep up your end of the conversation.
1. Disk drives got cheaper so we can store more data. The ways and means of collecting all sorts of data have proliferated faster than Twitter traffic or TSA lines at the airport. We have more data, more types of data, and it’s coming at us faster (real time) than we ever dreamed possible. That’s what makes up “volume, variety, velocity.”
So, the ability to replace big disk drives with many smaller, cheaper drives that we can wire together is the first, significant technical advance.
2. We can split up the processing. The second advance is the ability to augment the great big processors with many smaller, cheaper servers. We have distributed the processing to the data instead of waiting for the data to rocket back and forth from disk farm to processor.
So what?
So, there are two things to keep in mind when your marketing budget is being allocated to what seem like pure IT projects.
The more data you throw into the pot, the more likely you are to find some sort of relationship (correlation) to act on.
This practice of splitting up the data, solving smaller problems, and bringing the results back together is very useful for some specific types of processing. Getting this under your belt gives you voting rights when discussing options.
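For readers who like to see the idea in motion, the split, solve and recombine pattern described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer comments are made up, the function names are hypothetical, and everything runs on one machine, whereas a real cluster would hand each chunk to a separate server.

```python
from collections import Counter
from functools import reduce

def split_into_chunks(records, n_chunks):
    """Deal the records out into n_chunks roughly equal piles."""
    return [records[i::n_chunks] for i in range(n_chunks)]

def map_chunk(chunk):
    """Solve the small problem: count the words in one pile."""
    counts = Counter()
    for text in chunk:
        counts.update(text.lower().split())
    return counts

def combine(left, right):
    """Bring two partial answers back together."""
    return left + right

# Made-up customer comments, purely for illustration.
comments = [
    "great price great service",
    "slow delivery",
    "great delivery",
    "price too high",
]

chunks = split_into_chunks(comments, n_chunks=2)
# On a cluster, each chunk would be counted by a different server.
partials = [map_chunk(chunk) for chunk in chunks]
totals = reduce(combine, partials, Counter())
print(totals.most_common(3))
```

The point to notice is that the combined totals are the same as if one big machine had counted everything at once. That only works for problems that can be cut into independent pieces, which is why the approach suits some types of processing and not others.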
Big analytics processors are very good at finding hidden pieces in a hurry. (Show me all the customers who have bought in the past three months after clicking on these special offers and abandoning their shopping carts.)
But those types of questions are known unknowns. You know the things you’re going to ask and the entire database is set up that way. You know you’ll want to see things by date, by region, by product line, etc. That is what gives these enterprise data warehouses their power: they are designed in advance to answer the questions you know you might ask, and they can answer them very quickly so you can refine your questions – as long as you have deep knowledge about what data you have and how it is structured in the database.
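As a rough sketch of what a “known unknown” looks like in practice, the shopping-cart question above might be put to a structured database like this. The table layout, column names, event labels, sample rows and the fixed “today” date are all assumptions made up for illustration; a real data warehouse would have its own schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema, decided in advance: one row per customer event.
conn.execute(
    "CREATE TABLE events (customer_id INTEGER, event_type TEXT, event_date TEXT)"
)
# The index reflects a question we already know we will ask (by type and date).
conn.execute("CREATE INDEX idx_events ON events (event_type, event_date)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (1, "offer_click", "2012-09-01"),
        (1, "cart_abandon", "2012-09-02"),
        (1, "purchase", "2012-10-15"),
        (2, "offer_click", "2012-09-05"),
        (2, "purchase", "2012-09-20"),   # never abandoned a cart
        (3, "offer_click", "2012-01-10"),
        (3, "cart_abandon", "2012-01-11"),
        (3, "purchase", "2012-02-01"),   # bought more than three months ago
    ],
)

QUERY = """
SELECT DISTINCT p.customer_id
FROM events AS p
WHERE p.event_type = 'purchase'
  AND p.event_date >= date(:today, '-3 months')
  AND EXISTS (SELECT 1 FROM events AS c
              WHERE c.customer_id = p.customer_id
                AND c.event_type = 'offer_click'
                AND c.event_date <= p.event_date)
  AND EXISTS (SELECT 1 FROM events AS a
              WHERE a.customer_id = p.customer_id
                AND a.event_type = 'cart_abandon'
                AND a.event_date <= p.event_date)
ORDER BY p.customer_id
"""

# "today" is fixed so the example gives the same answer whenever it is run.
matches = [row[0] for row in conn.execute(QUERY, {"today": "2012-11-01"})]
print(matches)
```

The query is only possible because someone decided beforehand that events would be stored with a type and a date. That is the deep knowledge of structure the paragraph above refers to.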
But the other data – the messy data – is chock-full of unknown unknowns. We know the information might be valuable, but we don’t know what to ask.
MapReduce, usually run on top of a distributed file system such as Hadoop’s HDFS, provides a low-cost way to store unstructured data and refine it into a more structured form for heavy analysis. Social media data, call center transcripts, clickstream data, website content, and sensor data all start out unstructured.
MapReduce is ideal for pre-processing text, turning all those tweets into numerical models of opinion (sentiment analysis), which can then be fed to the big analytics machines for correlation discovery and problem solving. It’s great for asking slower questions of larger amounts of data. It’s great for finding a representative sample of data so the big processors don’t have to juggle all of the bits at once.
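To make “turning tweets into numbers” a little more concrete, here is a deliberately tiny Python sketch. The word lists, the tweets and the scoring rule (positive words minus negative words) are all assumptions for illustration; real sentiment analysis uses far larger vocabularies and more careful language handling. The last step draws a random sample, a simple stand-in for handing the big processors a smaller, manageable slice of the data.

```python
import random

# Toy word lists: an assumption for illustration, not a real sentiment lexicon.
POSITIVE = {"love", "great", "fast", "happy"}
NEGATIVE = {"hate", "slow", "broken", "awful"}

def sentiment_score(text):
    """Return the number of positive words minus the number of negative words."""
    words = [w.strip(".,!?#@").lower() for w in text.split()]
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives

# Made-up tweets, purely for illustration.
tweets = [
    "Love the new store, great service!",
    "Checkout is slow and the app is broken.",
    "Ordered yesterday, arrived today.",
    "Awful support, but fast delivery.",
]

# Messy text in, tidy rows with a numeric score out.
scored = [{"text": t, "score": sentiment_score(t)} for t in tweets]
for row in scored:
    print(row["score"], row["text"])

# Draw a small random sample to pass on for heavier analysis.
rng = random.Random(42)  # fixed seed so the sample is repeatable
sample = rng.sample(scored, k=2)
```

Once every tweet carries a number, the scores can sit in a table next to sales or campaign data, and that is the structured form the analytics machines are built to work with.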
I’ll be discussing big data at this year’s eMetrics Summit in London on 27-28 November. No doubt we’ll be discussing these issues further and, more importantly, where big data and analytics are going next.
By Jim Sterne
Founder of the eMetrics Summit
www.emetrics.org/London
About the author
Jim Sterne is founder of the eMetrics Summit, president of the Digital Analytics Association and author of the book “Social Media Metrics”.