Information Management published a useful article, Big Data is Scaling BI and Analytics, on how information overload is changing the way organizations use business intelligence and analytics. The article starts by noting that, according to IDC research on digital data, the amount of digital information in the world passed one zettabyte in 2010, which equals one trillion gigabytes. It adds that a zettabyte is roughly the capacity of 125 billion fully loaded 8GB iPods.
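To make the scale concrete, here is a minimal sketch that reproduces the arithmetic behind those two comparisons. It assumes decimal (SI) units, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes; the function name is my own, chosen for illustration.

```python
# Decimal (SI) units are an assumption; the article does not say which convention it uses.
GIGABYTE = 10**9
ZETTABYTE = 10**21


def ipods_per_zettabyte(ipod_capacity_gb: int = 8) -> int:
    """Number of fully loaded iPods of the given capacity needed to hold one zettabyte."""
    return ZETTABYTE // (ipod_capacity_gb * GIGABYTE)


gigabytes_per_zettabyte = ZETTABYTE // GIGABYTE

print(f"{gigabytes_per_zettabyte:,} GB per ZB")        # 1,000,000,000,000 GB per ZB
print(f"{ipods_per_zettabyte():,} 8GB iPods per ZB")   # 125,000,000,000 8GB iPods per ZB
```

Under those assumptions both figures check out: one trillion gigabytes, and 125 billion 8GB iPods.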
Apple's sales figures have been growing, but that is not the answer to the storage issue. More important than where to put this data is how to use it: as an opportunity to make effective business decisions, not simply as a storage challenge.
They go on to write that the term "big data" has emerged to describe this growth along with the related systems technology. There is still some fuzziness on the definition of big data. Information Management defines it as “data sets that can no longer be easily managed or analyzed with traditional or common data management tools, methods and infrastructures.”
After demonstrating how much data a jet engine can generate, they move on to social media. Twitter has more than 200 million users who produce more than 90 million "tweets" per day, or 800 per second. This traffic leads Twitter to produce a total of eight terabytes of data per day. By comparison, the New York Stock Exchange produces only about one terabyte of data per day.
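As a rough check on these rates, the sketch below converts the daily figures into per-second averages. It assumes a uniform 86,400-second day and decimal terabytes, and the variable names are mine. Averaged this way, 90 million tweets per day comes to a little over 1,000 per second, somewhat above the quoted 800, which may reflect a different measurement window in the source.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
TERABYTE = 10**12               # decimal terabyte (assumption)


def per_second(daily_total: float) -> float:
    """Average per-second rate for a daily total, assuming an even spread over the day."""
    return daily_total / SECONDS_PER_DAY


# Figures as quoted in the article.
tweets_per_day = 90_000_000
twitter_bytes_per_day = 8 * TERABYTE
nyse_bytes_per_day = 1 * TERABYTE

avg_tweets_per_second = per_second(tweets_per_day)
twitter_mb_per_second = per_second(twitter_bytes_per_day) / 10**6
twitter_to_nyse_ratio = twitter_bytes_per_day / nyse_bytes_per_day

print(f"Average tweets per second: {avg_tweets_per_second:.0f}")
print(f"Twitter data rate: {twitter_mb_per_second:.1f} MB/s")
print(f"Twitter vs. NYSE daily volume: {twitter_to_nyse_ratio:.0f}x")
```

The last line is the comparison that matters for the argument: on these numbers, a single social media service generates about eight times the daily data volume of the stock exchange.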
So far, companies have just started to deal with big data. Information Management reports, based on anecdotal references, that less than 10 percent of enterprises appear to have deployed a big data project. One of the tools to confront this data challenge is the open source platform Hadoop. It consists of three projects, including “Hadoop Common, a utility layer that provides access to the Hadoop Distributed File System and Hadoop subprojects. HDFS acts as the data storage platform for the Hadoop framework and can scale to massive size when distributed over numerous computing nodes.” The third project is MapReduce, Hadoop's framework for distributed processing. Facebook is now operating the “world's largest Hadoop analytic data warehouse, using HDFS to store more than 30 petabytes of data.” A number of other tools are mentioned.
Going back to the original definition, “data sets that can no longer be easily managed or analyzed with traditional or common data management tools, methods and infrastructures,” I want to suggest another related but separate issue: big content. If we simply substitute content for data, we get a related challenge. Big content is “content that can no longer be easily managed or analyzed with traditional or common data management tools, methods and infrastructures.”
While big data is a challenge for many large organizations, big content is a challenge that all Web users face every day. How do you deal with the fire hose of content coming at you from the Web, especially from all the user-generated content sites like Twitter or blogs? It is one challenge to find a place to store it and another to make sense of it. Even if an individual is not processing terabytes of information, we all still have to deal with the problem of too much content to absorb and understand. This is why I am adding the term content, so that the focus is on the meaning and not the number of bytes. The two terms complement each other.
Here is where the Darwin Awareness Engine™ comes into play. I have covered it a bit on this blog, so here is a brief summary. The Awareness Engine creates content visualizations that allow you to quickly scan across the themes contained in the content within your topic of interest and find what is meaningful to you. With the Scan Cloud, the top 100 themes within a set of content are displayed in a manner that allows for easy sorting and investigation. With Buzz tape, the topics of rising interest appear like a stock ticker, and you can make the ones that interest you the center of a Scan Cloud.
There is much more about the Awareness Engine on the Darwin web site. Here are Awareness Engine FAQs and here is an overview.