Researcher, industry analyst, strategy consultant for all things data
Mark Madsen is a researcher, consultant and former CTO. Mark designs, builds and uses analytics and decision support systems and the data management and infrastructure behind them. His research focuses on emerging technology and practices in analytics, information management, and user experience for decision support and use of data. He consults and speaks internationally on all topics related to data.
YOW! 2014 Sydney
Following Google: Don’t Follow the Followers, Follow the Leaders
TALK
It makes good sense to follow Google’s lead with technology. Not because what Google does is particularly complex – it isn’t. We follow Google for two reasons:
- Google is operating at an unprecedented scale and every mistake they make related to scale is one we don’t have to repeat, while every good decision they make (defined as “decisions that stick”) is one we should probably evaluate;
- Google is as strong an attractor of talent as IBM’s labs once were; that much brainpower – even if a large part of it is frittered away on the likes of Wave, Buzz and Aardvark – produces value for all of us.
Using Hadoop is not following Google’s lead. It’s following Yahoo’s lead, or more precisely, the lead of venture capitalists who took a weak idea and made an industry of it. MapReduce is behind the state of the art, to the point that Google discarded it as a cornerstone technology years ago.
The problems of scale, speed, persistence and context are the most important design problems we’ll have to deal with during the next decade.
We must work through what we mean by “big data”, what we mean by “structured” and “unstructured” and why we need new technologies to solve some of our data problems. But “new technologies” doesn’t mean reinventing old technologies while ignoring the lessons of the past. There are reasons relational databases survived while hierarchical, document and object databases were market failures, and those technologies may be poised to fail again, 20 years later.
What can following-Google, as a design principle, tell us about scale, speed, persistence and context? Perhaps that workloads are broader than a single application. That synthetic activities downstream from the point where data is recorded are as important as that initial point. Or that relational models of some sort will be in your future.