Michael is a mid-tier specialist and Adam is with Greenplum. What is responsible for the big jump in data? It is the proliferation of smart devices recording and uploading images and data around the globe.
How are companies using big data?
Getting to know their customers
Budget and planning exercises
Performance management of workloads
Pricing and costing exercises
This explosion in data introduces new opportunities for business. For example, a retailer can tailor smartphone ads to a customer as they enter the store, based on their prior buying habits. This provides a localized experience for consumers.
The first thing you need is lots of space. For example, the Broad Institute is using Isilon to store data for genome sequencing. Isilon has a single management interface that amalgamates petabytes of potential storage space.
Once you have all that data in one place, what do you do with it? You need to apply analytics to the data to turn it into business value. Breaking massive amounts of data down into small, targeted groups is called micro-segmentation.
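To make micro-segmentation a little more concrete, here is a minimal sketch of the idea, not anything shown in the session. The purchase records, the spend thresholds and the function names are all made up for illustration; a real system would run something like this inside the analytics platform over far more data.

```python
from collections import defaultdict

# Hypothetical purchase records: (customer_id, category, amount_spent)
PURCHASES = [
    ("c1", "shoes", 120.0),
    ("c1", "shoes", 80.0),
    ("c2", "books", 15.0),
    ("c3", "shoes", 30.0),
    ("c3", "books", 10.0),
    ("c4", "books", 300.0),
]


def spend_tier(total):
    """Map total spend to a coarse tier; the thresholds are illustrative only."""
    if total >= 200:
        return "high"
    if total >= 50:
        return "medium"
    return "low"


def micro_segments(purchases):
    """Group customers into small segments keyed by (favorite category, spend tier)."""
    # Total spend per customer per category
    per_customer = defaultdict(lambda: defaultdict(float))
    for customer, category, amount in purchases:
        per_customer[customer][category] += amount

    segments = defaultdict(list)
    for customer, by_category in sorted(per_customer.items()):
        # Favorite category is the one with the highest spend;
        # ties go to the alphabetically first category.
        favorite = max(sorted(by_category), key=lambda c: by_category[c])
        total = sum(by_category.values())
        segments[(favorite, spend_tier(total))].append(customer)
    return dict(segments)


SEGMENTS = micro_segments(PURCHASES)
for key, customers in sorted(SEGMENTS.items()):
    print(key, customers)
```

Each segment is small and specific (for example, high-spending shoe buyers), which is what lets a business target it differently from the rest.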
Greenplum provides data analytics for structured data, and for unstructured data through Hadoop. You add nodes to get linear scalability and performance. Queries run in parallel and are tuned to scale.
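The parallel query idea can be sketched as a scatter-gather: split the rows across nodes, let each node aggregate its own slice, then combine the partial results. This is only an illustration of the pattern and not how Greenplum is implemented; the function names and sample rows are hypothetical, and threads stand in for separate nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_nodes):
    """Distribute rows across n_nodes slices round-robin."""
    return [rows[i::n_nodes] for i in range(n_nodes)]


def local_sum(rows):
    """Partial aggregate that one node computes over its own slice."""
    return sum(amount for _, amount in rows)


def parallel_total(rows, n_nodes):
    """Scatter the aggregate to every node, then gather and combine the partials."""
    slices = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_sum, slices))
    return sum(partials)


# Hypothetical order rows: (order_id, amount)
ROWS = [(f"order-{i}", i) for i in range(1, 101)]
TOTAL = parallel_total(ROWS, 4)
print(TOTAL)
```

The answer does not depend on how many nodes share the work, which is why adding nodes can shorten query time without changing the results. Python threads will not show a real speedup here; the point is the shape of the computation.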
The layers of this model are the Isilon platform and the presentation of the data using the HDFS protocol. Greenplum uses an HDFS API to access this data.
Big data analytics requires data science; essentially you are running mathematical algorithms to predict what will be needed next. In the example, these algorithms were used to predict what the customer would be interested in and to target marketing at them.
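As a rough idea of what "predicting what the customer would be interested in" can look like, here is a small sketch using item co-occurrence: items that often appear in the same basket as something the customer already bought get recommended. The session did not say which algorithms were used, so this is a stand-in technique, and the baskets and function names are invented for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical past shopping baskets
BASKETS = [
    ["tent", "sleeping bag", "lantern"],
    ["tent", "sleeping bag"],
    ["tent", "lantern"],
    ["sleeping bag", "pillow"],
    ["tent", "sleeping bag", "stove"],
]


def build_cooccurrence(baskets):
    """Count how often each ordered pair of distinct items shares a basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            co[a][b] += 1
    return co


def recommend(co, owned, k=2):
    """Return up to k items most often bought alongside the owned items."""
    scores = Counter()
    for item in owned:
        for other, count in co.get(item, {}).items():
            if other not in owned:
                scores[other] += count
    # Highest score first; ties broken by name so the output is deterministic
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


CO = build_cooccurrence(BASKETS)
print(recommend(CO, {"tent"}))  # ['sleeping bag', 'lantern']
```

Real data science work would use richer models and much more data, but the principle is the same: past behavior is turned into a score for what to offer next.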
How many packaged apps are built around big data? Very few, so they are all custom built at this point in time. Developing these custom interfaces can be very advantageous: as an example, there are a few online retailers enabling partners to query their customer data to understand shopping habits.
Greenplum Chorus brings a social networking type interface that lets you interact with big data. You can create a workspace, create a team of users to interact with it and add a sandbox to store your data. Once your workspace is created you can grab an instance of data to interact with. You can tag it to associate it with your workspace (vs. moving it). This is beneficial because you can associate data with a workspace without having to create copies or move large amounts of data. You also have the ability to join relational and Hadoop-based data sets. The point of this flexibility is to enable a business to do things like customer profiling.
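Joining relational and Hadoop-based data sets boils down to matching structured rows against records parsed out of raw files. The sketch below shows that idea with a simple hash join in plain Python. It does not use Chorus or any Greenplum API; the customer table, the log format and the function names are all assumptions made for the example.

```python
# Hypothetical relational table of customers
CUSTOMERS = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Raj"},
]

# Hypothetical raw click log, as it might sit in a file on HDFS
CLICK_LOG = """\
2012-05-21T10:00:01 customer=1 page=/shoes
2012-05-21T10:00:07 customer=2 page=/books
2012-05-21T10:01:15 customer=1 page=/checkout
malformed line
"""


def parse_click(line):
    """Parse one raw log line into a dict, or return None if it is malformed."""
    parts = line.split()
    if len(parts) != 3:
        return None
    timestamp, customer, page = parts
    if not (customer.startswith("customer=") and page.startswith("page=")):
        return None
    customer_id = customer[len("customer="):]
    if not customer_id.isdigit():
        return None
    return {
        "timestamp": timestamp,
        "customer_id": int(customer_id),
        "page": page[len("page="):],
    }


def join_clicks_to_customers(customers, log_text):
    """Hash join: index the table by customer_id, then probe it with each click."""
    by_id = {row["customer_id"]: row for row in customers}
    joined = []
    for line in log_text.splitlines():
        click = parse_click(line)
        if click is None or click["customer_id"] not in by_id:
            continue  # skip malformed lines and unknown customers
        joined.append((by_id[click["customer_id"]]["name"], click["page"]))
    return joined


JOINED = join_clicks_to_customers(CUSTOMERS, CLICK_LOG)
print(JOINED)
```

The joined result ties a known customer to their browsing behavior, which is the raw material for the customer profiling mentioned above.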
Pivotal Labs helped build these interfaces, and EMC liked them so much that they essentially acquired the company. They bring the application development piece that was missing in the EMC big data message.
To reiterate the layers: Petabyte Storage (Isilon), Analytics Platform (Greenplum), Data Science (Greenplum analytics lab) and the ability to develop new applications (Pivotal Labs).