Big Bets On Big Data
Guest post written by Pat Gelsinger
Pat Gelsinger is President and COO of information infrastructure products at EMC.
Massachusetts Gov. Deval Patrick and leaders from technology firms, venture capital and research universities recently announced a joint effort to position the Bay State as a hub for job creation in data science. It is a bet worth making.
Thirty-five years ago, computer science didn’t exist as an academic discipline. Now, it is taught on a global basis. Within years, data science will become the new “hot” discipline for the next generation, driven by the hottest buzzword in technology today: Big Data.
Big Data refers to the growth of data sets that have become so large that they defy the ability of conventional forms of information technology to store, manage, analyze and gather important insights from them.
According to the research firm IDC, all of the new information generated in the world in the year 2000 amounted to about 2 million terabytes. The digital universe now generates more than twice that much in a single day.
Driving the data deluge are technologies and applications that permeate our daily lives: mobile sensors, smart phones and social networking.
A utility that reads a smart meter on your home every 15 minutes generates 3,000 times more information about your electricity usage than the old way of reading meters once a month. Multiply that by millions of customers in a metro area, and a smart grid can produce a data set large enough to monitor power demand in real time and predict probable outages before they occur.
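The "3,000 times" figure can be checked with back-of-the-envelope arithmetic. The sketch below is only an illustration: the 30-day month and the one-million-customer metro area are assumptions chosen for the example, and the function names are hypothetical rather than anything a utility actually uses.

```python
# Rough check of the smart-meter arithmetic described above.
# Assumptions (not from the article): a 30-day month and a
# metro area of one million customers.

MINUTES_PER_DAY = 24 * 60
DAYS_PER_MONTH = 30  # assumed month length


def readings_per_month(interval_minutes: int = 15, days: int = DAYS_PER_MONTH) -> int:
    """Readings one meter produces in a month at the given polling interval."""
    return (MINUTES_PER_DAY // interval_minutes) * days


def metro_readings_per_month(customers: int, interval_minutes: int = 15) -> int:
    """Total monthly readings across every customer in a metro area."""
    return customers * readings_per_month(interval_minutes)


if __name__ == "__main__":
    per_meter = readings_per_month()  # 96 readings a day for 30 days
    print(f"Readings per meter per month: {per_meter}")
    print(f"Increase over one manual reading a month: {per_meter}x")
    print(f"Readings for 1,000,000 customers: {metro_readings_per_month(1_000_000):,}")
```

Under those assumptions a single meter produces 2,880 readings a month, which is where the round figure of 3,000 comes from, and a million customers produce close to 2.9 billion readings a month.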
Until recently, the cost to store, access and analyze big data was so high that only government and a few business models could justify the cost. Law enforcement has discovered innovative ways to match fingerprints and ballistics against centralized, digitized databases. Counter-terrorism agencies and the gaming industry have adopted facial recognition technology to identify the bad guys before they step on an airplane or onto the casino floor.
Today, real-time analysis of Big Data is within the reach of almost any business model in any industry. The cost to sequence one human genome, for example, has fallen from $100 million in 2001 to less than $10,000, putting personalized medical treatments within the reach of the masses.
Online retailers can sift through billions of purchase observations to recommend additional items that a particular consumer might want to buy at the point of sale. Pretty soon, location data on your smartphone matched with shopping records gathered from your loyalty card will enable a supermarket to send you personalized, digital coupons for items as you walk down the aisle.
Crowd sourcing (think tens of thousands of drivers on your local highways) already generates enough traffic data for the GPS in your smartphone to suggest alternative routes when a stretch of road is clogged. Machine learning, the ability of computers to get smarter as they gather more information and more feedback from users, will lead to even greater precision in information exchange.
To accelerate job creation in data science, Massachusetts announced matching grants for Big Data projects to be funded by public/private partnerships; funding for technology research at the Massachusetts Institute of Technology; a Big Data internship program modeled after a successful program in the life sciences; and support for an innovative non-profit community presence in Boston, where data scientists can share infrastructure and knowledge.
This is an exciting time for innovators in this arena. Last month, EMC held our second annual summit for data scientists in government, academia, biotechnology, retailing, marketing and other fields.
There, John Brownstein from Harvard Medical School told how an online innovation he co-founded, healthmap.org, seeks to be the weather.com for infectious diseases, using crowd sourcing to provide a global view of outbreaks – two weeks ahead of the Centers for Disease Control or the U.N.
He told how the SARS outbreak in China was first detected by someone examining a stock performance chart of a Chinese company that sold herbal remedies, and how the H1N1 virus outbreak was first reported by a local TV station in Veracruz, Mexico. The mobile app OutbreaksNearMe will disseminate information back to users who report in, providing an incentive for people to input their health information anonymously into a pool of data large enough to analyze. Information that once took months to flow up the chain of public health bureaucracy now reaches the public in weeks.
Entrepreneur Tarek Kamil spoke of how 6,000 sensors in a basketball can identify telemetry data that a human cannot see, feeding information for analysis that can recommend drills to improve technique. Business consultant Piyanka Jain explained how some of the quickest returns on Big Data analytics can be found in uncovering pain points in customer relationships to improve loyalty and profitability.
Making use of Big Data, however, requires new technology architectures and algorithms that go beyond conventional database management and business intelligence. Most new information created is not transaction data that resides in neat rows and columns in traditional office databases managed by an administrator. Eighty percent of new information is generated outside of enterprise data centers, and traditional databases simply cannot scale to the challenges of data sets 100 to 1,000 times larger.
Data science also requires a new breed of scientist. Imagine a Ph.D. in life sciences who manages petabytes of information for a pharmaceutical company that conducts drug research. These people combine business acumen and subject matter expertise with an understanding of the newest technology tools, and they possess the curiosity to help an organization recognize predictive patterns, or anomalies in patterns. Their skills are valuable and rare.
As academics from Columbia, Stanford and UC Berkeley told our summit last month, data science is a curriculum that did not exist five years ago. It is a science that requires multi-disciplinary skills, and most universities are not organized to teach in multi-disciplinary ways. Partnerships like the ones announced in Massachusetts will change that.
Data is the new science. Big Data holds the answers. Are you asking the right questions?