A Big Data-Base that is fast but inaccurate: BlinkDB


The idea might sound strange at first: why would you want a database that delivers inaccurate data? The answer is that BlinkDB trades accuracy for speed. When you query data, you can specify either how quickly you want the answer, e.g. within 2 seconds, or how accurate it must be, e.g. 1% error with 95% confidence.
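To make the accuracy-for-speed trade concrete, here is a minimal Python sketch of one common way to get an approximate answer with an error bound: compute the aggregate on a uniform random sample and report a confidence interval around it. This illustrates the general idea only, not BlinkDB's actual implementation; the function name `approximate_mean` and its parameters are made up for this example.

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, z=1.96, seed=0):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, half_width). With z=1.96, estimate +/- half_width
    is an approximate 95% confidence interval for the true mean.
    """
    rng = random.Random(seed)
    # Sample without replacement; only these rows are ever aggregated.
    sample = rng.sample(values, sample_size)
    estimate = statistics.fmean(sample)
    # Standard error of the sample mean (finite-population correction ignored).
    std_error = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, z * std_error


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical data: one million synthetic price changes.
    data = [rng.gauss(0.5, 2.0) for _ in range(1_000_000)]
    estimate, half_width = approximate_mean(data, sample_size=10_000)
    print(f"approximate mean: {estimate:.3f} +/- {half_width:.3f}")
```

The half-width shrinks roughly with the square root of the sample size, so halving the error costs about four times as many rows. That is why a loose error bound can be answered so much faster than an exact query.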

So if you have very large amounts of data (tens to hundreds of terabytes, or even petabytes) and you want quick, good-enough answers, then BlinkDB is for you. An early adopter is Facebook. Would you rather have Justin Bieber's follower count exactly right after minutes of waiting, or 99% right with a page that loads almost instantly? If you prefer fast, reasonably accurate answers over slow, exact ones, BlinkDB is worth checking out.

What can you use BlinkDB for?

  • The obvious use case is real-time reporting. If you need to make decisions in the blink of an eye, as day traders do, and a 5-10% error is acceptable, you can ask questions such as: what is the average change of all commodity prices in the last 2 seconds?
  • Real-time bookings or price comparisons in which users want to know the best possible offer but accept a small error margin, e.g. mobile bar-code scanners that deliver product price comparisons in 1 second instead of 10 could dominate the App Store.
  • Any counter on a large website: visitors, friends, tweets, total search results, etc.
  • Any Power Law or Long Tail data in which there are some extremely popular cases, e.g. Justin Bieber's followers, or a very large set of infrequent cases, e.g. the number of blogs that have under 1000 visitors per month.
  • Machine Learning solutions and recommendation engines that use Collaborative Filtering and other types of algorithms that compare an item or user with large groups of other items and users.
  • and many other use cases…
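Several of the use cases above are counters. As a rough illustration of how a count can be approximated, the Python sketch below estimates how many items match a condition by scaling up the matching fraction found in a uniform sample. It is a sketch under stated assumptions, not how BlinkDB does it; `approximate_count` and the sample data are hypothetical.

```python
import math
import random


def approximate_count(items, predicate, sample_size, z=1.96, seed=0):
    """Estimate how many `items` satisfy `predicate` from a uniform sample.

    Returns (estimate, half_width), both expressed as a number of items.
    """
    rng = random.Random(seed)
    sample = rng.sample(items, sample_size)
    hits = sum(1 for item in sample if predicate(item))
    p = hits / sample_size
    # Normal approximation to the binomial; unreliable when p is near 0 or 1.
    std_error = math.sqrt(p * (1 - p) / sample_size)
    total = len(items)
    # Scale the sample proportion and its error up to the full data set.
    return p * total, z * std_error * total


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical data: monthly visitor counts for 500,000 blogs.
    visitors = [int(rng.paretovariate(1.2) * 100) for _ in range(500_000)]
    estimate, half_width = approximate_count(
        visitors, lambda v: v < 1000, sample_size=5_000
    )
    print(f"blogs under 1000 visitors: {estimate:.0f} +/- {half_width:.0f}")
```

A plain uniform sample works well for common cases like this one, but it can miss rare groups entirely, so the sampling strategy matters for long-tail data.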
  1. May 4, 2013 at 6:35 pm

    Hi Maarten,
    I just read your blogpost and agree with your observations about the need for scalable machine learning and at the same time user friendly analytical workbench.
    I have started an open source project (http://www.saarus.org) with an initial goal to integrate KNIME with Hadoop and Mahout. Our effort is to keep the platform language independent (Java, R, Pig…) and retain the user interface/drag-and-drop feature of KNIME to run MR jobs on Hadoop.
    Let us know if you’d be interested.
    -Rohit

    • May 5, 2013 at 5:06 pm

      Definitely interested in knowing more about the current state, future direction and writing a post about Saarus…

      Maarten
