September 21, 2011

Comments

AlanMorrison

Hi, Bill. I'm a bit weary of what seems to be whining and just plain ignorance on the part of the BI community that seems evident in these survey results. Many BI analysts are unaware of more modern, open source, content-centric integration and analytic approaches that scale better than ETL + data warehousing and deal with less structured data. For example:

(1) Standardized graph data stores (RDF or comparable triple or quad stores) for Web-scale integration: Graphs are more articulated and much easier to join than tabular, relational databases. Some vendors like InSilico Discovery now serve as on-the-fly report integration SaaSes for banks. Data description via inferencing and ontologies scales, as ISD and others have proven. The Semantic Web stack (RDF/RDFS/OWL) is in use at numerous media companies such as the BBC, NYT, Reuters, Wolters Kluwer, Lexis-Nexis--i.e., content companies. Lately, software vendors like Cisco and Amdocs are basing their products on these triple stores for scalability reasons. Many BI specialists just haven't worked much with content or are averse to trying a method that initially seems alien to them. See http://www.pwc.com/us/en/technology-forecast/spring2009/semantic-web-technologies.jhtml and my Sem Web Quora answers at http://www.quora.com/Alan-Morrison/Semantic-Web/answers for more detail.

(2) Parallel processing a la Hadoop (derived from the Google Cluster Architecture, Bigtable and MapReduce) or its NoSQL cousins: This method speeds up high-volume data crunching and makes it cost effective. Companies like Backtype (bought by Twitter) and FlightCaster have been analyzing scads of Web data on the cheap, and started just with a handful of staff and EC2 clusters. Others like Disney just kept servers they were going to retire and with the help of a few savvy staffers made them into Hadoop clusters. See http://www.pwc.com/us/en/technology-forecast/2010/issue3/features/big-data-pg1.jhtml for more detail.
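As a rough illustration of the map/reduce pattern mentioned in (2), here is a minimal single-machine sketch in Python. It is not Hadoop code and uses no Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample records are made up for illustration. The point is only that each record is mapped independently, which is what lets a real cluster spread the work across many cheap machines.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (word, 1) pairs for a single input record (hypothetical helper)."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Illustrative input only; real jobs would read from distributed storage.
counts = word_count(["big data big content", "big content"])
print(counts)
```

Because map_phase looks at one record at a time and reduce_phase looks at one key at a time, both steps could in principle run in parallel on separate nodes; this sketch simply runs them in order.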

In other words, the large-scale methods are out there, but just aren't evenly distributed. Shades of William Gibson.... There are ways to do large-scale integration and high-volume, fast crunching of less-structured data, and companies like Google and the BBC have paved the way. Other companies just need to pay attention.

Social information will actually help machines make the connections, but data in graph form is what will enable sufficiently context-rich, large-scale integration. A brief animation at http://www.pwc.com/us/en/technology-forecast/2011/issue3/index.jhtml explains the phenomenon. We also interviewed your pal Sameer Patel in this issue of our journal.

Hope this helps for background, and that your painting is going well....

@AlanMorrison

Bill Ives

Alan - Thanks for your lengthy comment. I'm in the middle of a two-day event but will give it proper attention over the weekend. Bill

Bill Ives

Alan - Thanks for your extensive commentary and useful links. The handling of big data from a technical side seems to be a problem that can be addressed, as you point out. It just takes the will to do it. There are ways to store it and there are ways to visualize it to discover meaning. It is the discovery of meaning that is key, as you note.

In a parallel way we, as individuals, also have to deal with an expanded set of stuff to look at. I used the term "big content" as a complement to the term big data. Big data affects certain organizations that deal with massive data sets. Big content affects all of us. Here is my post - “Big Data” vs. “Big Content” Complementary Sides of Information Overload - http://billives.typepad.com/portals_and_km/2011/09/big-data-vs-big-content-and-information-overload.html

Alan Morrison

Bill,

I like the Big Content meme you've elaborated on, but am wondering if it's ultimately more useful to consider ways to analyze less and more structured data together. ("Less structured" data includes content.) Cassandra's being used for a range of different data types--see Bill Bosworth's comments here: http://venturebeat.com/2011/09/21/datastax-lands-11-million-to-further-the-nosql-data-store-revolution/

OpenLink's using SPARQL to query a blend of XBRL financial data mapped onto RDF along with less structured sources such as DBpedia: http://www.openlinksw.com/dataspace/dav/wiki/Main/VOSArticleRDFandMappedBI.
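To sketch why blending sources is straightforward once everything is in triple form, here is a toy example in plain Python. It is not SPARQL and does not use any real RDF library, XBRL filing, or DBpedia data; the triples, the predicate names (reportedRevenue, headquarteredIn, locatedIn), and the helpers match and revenue_by_state are all hypothetical. It assumes both sources already share identifiers for the same things, which in practice is the hard part.

```python
# Hypothetical "internal" financial facts, as (subject, predicate, object) triples.
internal = {
    ("acme", "reportedRevenue", "120"),
    ("acme", "fiscalYear", "2010"),
}

# Hypothetical "external" reference facts about the same entities.
external = {
    ("acme", "headquarteredIn", "Boston"),
    ("Boston", "locatedIn", "Massachusetts"),
}

# Merging two graphs is a set union: no schema alignment step is needed here
# because every fact has the same three-part shape.
merged = internal | external


def match(graph, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def revenue_by_state(graph):
    """Follow revenue -> headquarters -> state across both sources."""
    results = []
    for company, _, revenue in match(graph, p="reportedRevenue"):
        for _, _, city in match(graph, s=company, p="headquarteredIn"):
            for _, _, state in match(graph, s=city, p="locatedIn"):
                results.append((company, revenue, state))
    return sorted(results)


print(revenue_by_state(merged))
```

The query in revenue_by_state only returns a row against the merged graph, since the revenue fact lives in one source and the location facts in the other. A real triple store would express the same chain of patterns as a SPARQL query rather than nested loops.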

These methods are directly relevant to conventional BI. We quoted Doug Lenat of quad-store provider Cycorp a while back, who pointed out that many BI folks are looking for their keys underneath the lamppost because that's where the light is. Integrating more sources--particularly blending external with internal data--makes it possible to light a larger area and query a broader footprint of information in one fell swoop.

Bill Ives

Alan

You raise good points. What I was referring to was a complement to heavy-duty BI that puts the ability to find the unexpected in the hands of the topic expert. I am not suggesting it replace traditional BI or even the more new-wave approaches that you are describing. Thanks for the additional links to useful material in this space.

The comments to this entry are closed.
