Data Lake and Data Refinery – Gartner Controversy!
There has been much discussion of the new term “Data Lake”. Gartner
wrote a report on the ‘Data Lake’ fallacy, urging caution about the
‘data lake’, or ‘data swamp’. Then Andrew Oliver opened his InfoWorld
article with these words: “For $200, Gartner tells you ‘data
lakes’ are bad and advises you to try real hard, plan far in advance, and
get governance correct”. Wow, what an insight!
During my days at IBM and Oracle, Gartner would ask for time on my calendar
to talk about database futures. Afterwards, I realized that I had paid a
significant fee to attend the Gartner conference, only to hear back what I
had told them. It is a good business: gather information, then sell it back.
Without meaning any disrespect, many analysts like to make controversial
statements to stay relevant. Here is such a case with Gartner.
The ...
I attended a Meetup yesterday in Mountain View, hosted by The Hive group, on
the subject of Lambda Architecture. Since I had never heard of this new
phrase, curiosity took me there. There was a panel discussion, and the
panelists came from Hortonworks, Cloudera, MapR, Teradata, etc.
Lambda Architecture is a useful framework to think about designing big data
applications. Nathan Marz designed this generic architecture addressing
common requirements for big data based on his experience working on
distributed data processing systems at Twitter. Some of the key requirements
in buildi...
Last week Informatica, a public company, was acquired by two private equity
investors, the Permira fund and Canada Pension Plan Investment Board (CPPIB),
for $5.3B. This is the biggest leveraged buyout so far this year.
I am happy for my friend Sohaib Abbasi (we were colleagues at Oracle during
the 1990s), who became CEO of Informatica after serving as a board member for
a couple of years. During Sohaib’s time, the company took on a bigger
role in data archiving and life cycle management. It also made progress in
offering cloud-based services.
Gaurav Dhillon (founder, SnapLogic) ...
Recently I listened to a discussion on Big Data Visualization hosted by Bill
McKnight of the McKnight Consulting group. The panelists agreed that Big Data
is shifting from the hype state to an “imperative” state. Start-up
companies run more Big Data projects, whereas true Big Data is still a
small part of enterprise practice. At many companies, Big Data is moving
from POC (Proof of Concept) to production. Interest in visualizing data
from different sources is certainly increasing. There is growth in
data-driven decision-making as evidenced by the increasing u...
Fast Data vs. Big Data – How to Combine?
Today, all the discussion on Big Data centers on “static data” in a
data lake (the old Data Warehouse), accessed for analysis by BI tools, SQL on
Hadoop (HAWQ, Impala), or Map/Reduce algorithms (MapR). This means looking at
historical data and finding trends. Some new tools are trying to provide
predictive analysis based on past trends. This area deals mostly with the
volume and variety aspects of Big Data, but not with velocity or “data in
motion”.
The term “Fast Data” is applied to data that is in motion. This component
is getting more ...