Real-Time Hadoop queries will be a reality in 2013
Real-Time Hadoop queries will be a reality in 2013 thanks to two new projects from Cloudera: Impala and Trevni.
Impala is an open source query engine inspired by Dremel, Google’s proprietary big data query solution. A first beta is available and the production version is expected in Q1 2013.
Impala allows you to run real-time queries on top of Hadoop’s HDFS, HBase and Hive. No migration is necessary.
However, the revolution will get even better once Trevni, from Doug Cutting (the creator of Lucene, Hadoop, etc.), is integrated into Impala. Trevni is a new columnar data storage format that promises superior performance when reading large data sets stored by column.
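To make the columnar idea concrete, here is a minimal sketch in Python. It is a toy illustration only, not Trevni's actual file format or API. The sample records and the function names (`to_columns`, `total_clicks_row_layout`, `total_clicks_column_layout`) are invented for this example. The point is that an aggregate over one field only has to touch that field's values when the data is laid out by column.

```python
# Toy comparison of row-oriented and column-oriented layouts.
# Assumption: this only illustrates the general idea of columnar storage;
# it does not reflect Trevni's on-disk format or any Impala API.

# Hypothetical sample data, stored row by row.
rows = [
    {"user": "a", "country": "BE", "clicks": 3},
    {"user": "b", "country": "NL", "clicks": 5},
    {"user": "c", "country": "BE", "clicks": 7},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_clicks_row_layout(records):
    """Row layout: every record is visited to pick out a single field."""
    return sum(record["clicks"] for record in records)


def total_clicks_column_layout(columns):
    """Column layout: only the 'clicks' values are read."""
    return sum(columns["clicks"])


columns = to_columns(rows)
print(total_clicks_row_layout(rows))
print(total_clicks_column_layout(columns))
```

Both functions return the same total. On disk, the column layout would let a query skip the `user` and `country` values entirely, which is roughly where the promised read performance comes from.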
Impala+Trevni promises real-time big data queries with multiple joins that are on par with Google’s Dremel in performance but offer more functionality…
How will this affect Storm/Trident? Your thoughts?
Very good question. I think they are complementary. Storm/Trident deals with real-time events, and when combined with Impala’s real-time queries those events can be enriched tremendously.
Hi Maarten, thanks for sharing your thoughts.
Presently, I have a Storm Topology+Cassandra Counters app that pushes RT events/alarms to a dashboard. To retrieve the alarm details, I have to query my Hadoop ecosystem.
So it seems from your reading that with Impala I can get rid of Cassandra completely? Will I even be able to store my RT counters and the associated alarm/event definitions directly in Hadoop?
Thanks,
Pranabesh
I wouldn’t replace Cassandra with Impala; instead I would replace Cassandra with HBase and add Impala. HBase will give you real-time storage and Impala real-time complex queries. Impala is for queries (as in a data warehouse), not for maintaining counters [reading them is possible and would be a perfect scenario for Impala]. HBase would be superior for this setup because all the data would be in Hadoop instead of spread over multiple systems. Impala would also be needed for complex queries. If your queries do not need joins, then Cassandra or HBase is good enough and Impala is not needed.
Thanks a lot.