Data Summit 2.0 – Emerging Trends

It’s 2012 and we’re talking about Data. But this isn’t your grandmother’s data. This is Data with a big ‘D’. As dull as the concept might sound, there are some amazing things happening in the realm of Big Data right now. Patti Chan spent a day last week at the Data 2.0 Summit in San Francisco, and she has reported back with some of the highlights!

If you’re not already familiar with what big data is and why it’s a popular topic, read through our recent posts on the topic!

« Democratized Data and the Missing Interface

« Simplified Relational Hierarchy Visualization


In the “Monetizing the Data Revolution” panel, speakers from Microsoft, DataSift and other leading companies discussed the ways in which people are trying to offer “data as a service” (DaaS) and why lingering confusion over standards and protocols is preventing DaaS offerings from being viable at this point.

Additionally, it was pointed out that however valuable data may be, we cannot sell “data” alone. Creating a business model based on the “data revolution” requires three things: data, analysis and workflow (i.e., hardware and processes).

And finally, it is difficult to monetize the data revolution because we do not yet have a well-built portfolio of tools for exposing data and offering value-added services as part of the package.

After the session, Patti pointed out (via Twitter) that there are three components to data accessibility:

  1. Data – the actual raw data.
  2. Tools to leverage that data.
  3. Well-designed user interfaces to gain insights from data.


Given the nature of the work that Google does, it naturally had a strong presence at the summit. Navneet Joneja, Product Manager at Google, talked in depth about the useful and interesting things Google is doing with data.

  • Google BigQuery: enables you to perform SQL-like queries over large datasets. We’re talking billions of rows of data. It calculates meaningful insights in just seconds. It’s useful for things like building interactive tools, filtering spam, detecting trends, and making web dashboards and network optimization tools.
  • Google App Engine: App Engine can be coupled with a NoSQL datastore for supreme awesomeness. This is what Google Spreadsheet runs on. App Engine data logs can be easily exported to Cloud Storage and then analyzed with BigQuery.
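To give a feel for the kind of “SQL-like” question mentioned above (here, spotting the top spam senders), the following is a minimal sketch. It does not use BigQuery at all: Python’s built-in sqlite3 module stands in for the query engine, and the `messages` table, its columns, the `top_spam_senders` helper and the sample rows are all made up for illustration.

```python
import sqlite3

def top_spam_senders(rows, limit=2):
    """Return the senders with the most spam-flagged messages.

    sqlite3 is only a local stand-in for an SQL-like engine; the
    table and column names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (sender TEXT, is_spam INTEGER)")
    conn.executemany("INSERT INTO messages VALUES (?, ?)", rows)
    query = """
        SELECT sender, COUNT(*) AS spam_count
        FROM messages
        WHERE is_spam = 1
        GROUP BY sender
        ORDER BY spam_count DESC, sender ASC
        LIMIT ?
    """
    result = conn.execute(query, (limit,)).fetchall()
    conn.close()
    return result

# Made-up sample data: (sender, is_spam flag)
SAMPLE_ROWS = [
    ("a@example.com", 1),
    ("b@example.com", 0),
    ("a@example.com", 1),
    ("c@example.com", 1),
    ("b@example.com", 1),
    ("a@example.com", 0),
]

if __name__ == "__main__":
    print(top_spam_senders(SAMPLE_ROWS))
```

The query is the same shape whether it runs over six rows or billions; what a hosted service adds is the ability to answer it in seconds at that scale.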

During the panel Patti asked Navneet:

App Engine obfuscates the underlying hardware / VMs / stack (this is their value proposition). In enterprise software we often dip down into that layer in order to optimize our app. Does App Engine have documentation or open source code that exposes those layers, in case we need it?

To which Navneet responded:

There is a good amount of documentation, but the goal and point of App Engine is to make that need obsolete, so that you can concentrate on the development and features and not worry about the hardware.

Other Interesting Tidbits

  • Salesforce had a team at the Summit talking about layering social graphs on top of the cloud contacts and data you already use, in order to build a more complete profile.
  • DataSift, a powerful social media data platform, did a great mashup of data sets to show how combining related sets of data can help users derive real meaning.
  • CrowdFlower CTO Chris Van Pelt was present to show off the work they are doing in distributed human computing.
  • Wishery is a new application which adds full customer profiles into existing point-of-contact apps like Gmail. It uses the customer’s email address as the canonical identifier. This presentation received the most questions from the VC panel and is definitely one to watch out for in the coming months.
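The idea of using an email address as the canonical identifier can be sketched in a few lines. This is not Wishery’s implementation, which was not described in detail; it is only an illustration under assumed inputs, where each source is a list of dictionaries with an `email` field. The function names and the sample records are hypothetical.

```python
from collections import defaultdict

def normalize_email(email):
    """Trim whitespace and lowercase so the same address always maps to one key."""
    return email.strip().lower()

def merge_profiles(*sources):
    """Merge records from several sources into one profile per email address.

    When two sources disagree on a field, the value from the earlier
    source is kept (an arbitrary choice made for this sketch).
    """
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            key = normalize_email(record["email"])
            for field, value in record.items():
                if field == "email":
                    continue
                merged[key].setdefault(field, value)
    return dict(merged)

# Made-up sample records from two hypothetical systems
crm = [{"email": "Ada@Example.com", "name": "Ada", "plan": "pro"}]
social = [{"email": "ada@example.com ", "twitter": "@ada", "name": "Ada L."}]

if __name__ == "__main__":
    print(merge_profiles(crm, social))
```

Keying everything on the normalized address is what lets records from unrelated systems line up without any shared internal ID.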

Final Thoughts

People and organizations are making huge leaps of progress in the field of Big Data. While we still need more hardware, software, and UI tools built to make big data more accessible, it’s clear that there is important work being done in this area. It will be exciting to watch developers and designers collaborate in the coming months and years to help unleash the inherent power of Data with a big ‘D’.