MapD – Massively Parallel GPU-based database

An MIT student recently created a new type of massively parallel database, one that runs on graphics processors (GPUs) instead of CPUs. MapD, as it has been called, makes use of the immense computational power available in the off-the-shelf graphics cards found in most laptops and PCs. MapD is especially suitable for real-time querying, data analysis, machine learning and data visualization. MapD is probably only one of many databases that will try new hardware configurations to cater for specific application use cases.

Alternative approaches could focus on large sets of cheap mobile processors, Parallella boards, Raspberry Pis, etc., all stitched together. The idea would be to create massive processing clouds based on cheap specialized hardware that could beat traditional CPU clouds in both price and performance, at least for some specific use cases…

A Big Data-Base that is fast but inaccurate: BlinkDB

April 6, 2013

The idea might sound strange at first. Why would you want a database that delivers inaccurate data? The answer is that BlinkDB trades accuracy for speed. When you query data you can specify either how quickly you want the answer, e.g. within 2 seconds, or how accurate you want the answer to be, e.g. 1% error with 95% confidence.
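The trade-off can be illustrated with plain random sampling. The sketch below is not BlinkDB's implementation or query syntax; it only shows, under a normal-approximation assumption, how a mean estimated from a sample comes with an error margin at roughly 95% confidence, and how a target error translates into a sample size. The function names `approximate_mean` and `sample_size_for` are made up for this illustration.

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, z=1.96, seed=None):
    """Estimate the mean of `values` from a uniform random sample.

    Returns (estimate, margin): the true mean lies within
    estimate +/- margin with roughly 95% confidence when z=1.96
    (normal approximation; assumes sample_size >= 2).
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)
    estimate = statistics.fmean(sample)
    # Standard error of the mean, from the sample standard deviation.
    std_err = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, z * std_err


def sample_size_for(stdev, max_error, z=1.96):
    """Rows to sample so the margin is about `max_error` (absolute)."""
    return math.ceil((z * stdev / max_error) ** 2)


if __name__ == "__main__":
    # Synthetic data standing in for a large table column.
    data_rng = random.Random(42)
    data = [data_rng.uniform(0, 100) for _ in range(1_000_000)]
    estimate, margin = approximate_mean(data, sample_size=10_000, seed=1)
    print(f"approximate mean: {estimate:.2f} +/- {margin:.2f}")
    print(f"exact mean:       {statistics.fmean(data):.2f}")
```

Reading 10,000 rows instead of a million is where the speed comes from. The price is the margin, which shrinks only with the square root of the sample size.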

So if you have very large amounts of data (tens to hundreds of terabytes, or even petabytes) and you want quick, good-enough answers, then BlinkDB is for you. An early adopter is Facebook. Would you rather have Justin Bieber’s follower count exactly right in minutes, or 99% right with a page that loads almost instantly? If you need fast, reasonably accurate answers rather than slow, correct ones, BlinkDB is worth checking out.

What can you use BlinkDB for?

  • The obvious use case is real-time reporting. If you need to take decisions in the blink of an eye, e.g. as a day trader, and a 5-10% error is acceptable, you could ask, for example, what the average change of all commodity prices was in the last 2 seconds.
  • Real-time bookings or price comparison in which users want to know the best possible offer but accept some small error margin, e.g. mobile bar-code scanners that deliver product price comparisons in 1 second instead of 10 will dominate the App Store.
  • Any counter on a large website: visitors, friends, tweets, total search results, etc.
  • Any Power Law or Long Tail data in which there are some extremely popular cases, e.g. Justin Bieber followers, or a very large set of infrequent cases, e.g. the number of blogs that have under 1000 visitors per month.
  • Machine Learning solutions and recommendation engines that are using Collaborative Filtering and other types of algorithms that compare an item or user with large groups of other items and users.
  • and many other use cases…

Build your own 4G LTE picocell, GPS receiver, Bluetooth, ZigBee, etc.

Software defined radio is like software defined networking, but for radio: you can build whatever you want by updating the software. Recently a new project got funded on Kickstarter that allows radio amateurs to build anything they want related to radio. BladeRF is an open source USB 3.0 software defined radio for $400.

The usual suspects will be existing standards such as 4G, GPS, ZigBee and Wi-Fi, but what if some innovators start thinking outside of the box? White spaces would be one option. But what if 5G or 6G is no longer defined in standards bodies but by a community of open source amateurs working together? That is probably a step too far, but M2M (machine to machine) / IoT (Internet of Things) could still use more efficient standards. Federated ad-hoc networks that circumvent local censorship or solve outages could also become options. Let’s just hope Chinese suppliers can bring down the price of the BladeRF…

Amazon AWS awkward features to fix and enterprise features to add

AWS is used by more and more enterprises today but Amazon should work on several awkward “features” that make daily usage by enterprises difficult.

AWS console consistency
The console is not very consistent and could be made a lot easier for users. Why do elastic load balancers not have tags? Why do VPCs, subnets, route tables, etc. not have names, so that you need to work with their IDs? Why are network ACLs stateless and security groups stateful? Why are the VPC security group administration pages in VPC and EC2 different? Why can I not see the name of a security group when I use it in an inbound or outbound rule? Why can I give a temporary role to an API but not give a user or group a temporary role, similar to sudo or delegated administration? Why do RDS tags not filter out CloudFormation tags when editing, while EC2 tags do?

IAM and the console
End users who are limited to a small subset of services and resources are in for a surprise. They will see the same options as an administrator, but after clicking they will get a no-permission message. It would be so much easier if the services, buttons, menus, etc. you don’t have permission for were not visible.

Java AWS API and Eclipse plugin
Probably the worst Java API of the last 10 years. You have to go through reservations to see your on-demand instances. You have list after list after list to go through to get somewhere. Sometimes you call getTags, sometimes you work with request and response objects. You have to use the RDS ARN to get to the tags, but you only get the ID from the RDS instance. Etc. etc. etc. Amazon should run a 100K competition on who can create a better API. Whoever gets more than 1 million users for their API wins.

Installing the Eclipse plugin
If you don’t use Eclipse JEE, you will need to fight with several plugins, because nobody told you that the plugin is only compatible with JEE. If you do not have the Android SDK installed, you cannot accept the Eclipse license.

CloudFormation
It seems like few people are using it, because there are no support posts when you Google for it. Then again, you can understand why people do not use it. The parameters page has several limitations. Try creating a secure password for your RDS master user and you can only use letters and numbers. Only have three valid values for a parameter? Why not put them in a drop-down? Wait, there is no drop-down. You go to the end of the wizard before it complains about a problem on the first page. Start a stack name with a number and it will complain at the end as well. Inside CloudFormation scripts you will find several inconsistencies too, e.g. there are no tags for security groups, you cannot use underscores in names, and if you try using the instance ID in the tag for the name you get a circular error.

Missing enterprise functionality
Try encrypting your EBS volumes: good luck. You have finally managed to set up a VPN in your VPC and your IT department is ready to start opening it to multiple departments. Wait, how are we going to charge them? Linked accounts are not an option because we are not going to set up a VPN for each department. Adding tags to each instance to include them in your usage report? Good luck with automating tags, given the referential errors, etc. in CloudFormation, or with rebuilding a custom portal based on the API. What about limiting department X to instances A, B and C? Inconsistently implemented, if available at all for the service you want to use. Migrating instances between VPC subnets? Stop, create AMI, start new instance. Forgot to add a security group to an instance? Stop, create AMI, new instance. Why?
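For the tag-based charging idea, the aggregation itself is simple once usage records carry tags; the hard part described above is getting the tags onto every resource. The sketch below uses made-up in-memory records rather than any real AWS billing format or API; the `department` tag key, the record layout and the `cost_by_tag` helper are all assumptions for illustration.

```python
from collections import defaultdict


def cost_by_tag(records, tag_key="department", untagged="(untagged)"):
    """Sum the cost of usage records per value of one tag.

    Records without the tag are grouped under `untagged`, so missing
    tags show up in the report instead of silently disappearing.
    """
    totals = defaultdict(float)
    for record in records:
        owner = record.get("tags", {}).get(tag_key, untagged)
        totals[owner] += record["cost"]
    return dict(totals)


# Hypothetical records; not a real AWS billing export format.
usage = [
    {"resource": "instance-a", "cost": 10.0, "tags": {"department": "sales"}},
    {"resource": "instance-b", "cost": 2.5, "tags": {"department": "sales"}},
    {"resource": "instance-c", "cost": 4.0, "tags": {"department": "hr"}},
    {"resource": "instance-d", "cost": 1.5, "tags": {}},
]

report = cost_by_tag(usage)
for department, cost in sorted(report.items()):
    print(f"{department}: {cost:.2f}")
```

Keeping an explicit untagged bucket makes it visible how much spend cannot yet be charged to anyone.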

Conclusion
Is AWS a bad service or product? Not at all. Is it ready for global enterprise deployment? It will be in the next 24 months. Should I wait till then? If you are not using the Cloud today, then you are already a year late. Elastic scaling, instant provisioning, pay per use, etc. beat any awkward “features”. But some API design competitions, customer usability studies and a community roadmap driven by votes would go a long way…

5 hardware trends to watch…

      1. Open Compute

        Open Compute is focusing on creating a new type of server: an open source server based on open source storage, motherboards, racks, data center designs, etc. Instead of keeping designs proprietary, Open Compute makes them open source. Expect prices for these “commoditized” servers to be substantially lower, enabling web-scale data centers on a scale not seen before. The big driver behind the initiative is Facebook.

      2. Printing everything

        Imprint Energy is a start-up that is putting research from the University of California into practice. By printing batteries, they can make them bendable and very thin. A new series of applications becomes possible that was previously unimaginable. 3D printing will probably become mainstream in 2013-2014 via manufacturing-as-a-service, with consumers buying their first printer in 2014-2015. Bio printing can also open the door to innovation.

      3. Wearable Tech or Fashion Electronics

        Google Glass, smart clothes, Nike’s Fuelband, etc. are all examples of wearable tech. However, expect printable batteries to make the tech really flat (clothes) or really small (glasses). This means that we haven’t seen anything yet. Also expect the explosion of sensor data to include a lot of “human performance data”.

      4. Miniature Arduino

        RFDuino is a good example of how Arduinos are shrinking. Open source intelligent miniature hardware will revolutionize many industries, e.g. garden and pool computers, bike computers, etc.

      5. FPGAs and other open source hardware

        Mojo is a good example of how not only micro-controllers but also FPGAs and other hardware controllers can be made open source. Due to their parallel processing and multimedia processing capabilities, expect revolutionary products in this domain.

How Intel’s Hadoop distribution wants to be different

February 27, 2013

Intel announced its Big Data strategy this week, with its own Hadoop distribution as the crown jewel. Many people will be surprised that a chip maker wants to be your Hadoop supplier as well. McAfee is Intel’s most visible enterprise software offering, and it was an acquisition, not an offering based on organic growth.
Intel’s Hadoop distribution on the other hand was a Chinese project some years ago that turned into a product.
So how is Intel going to compete with Cloudera, Hortonworks, MapR, IBM, EMC/Greenplum, etc.?
Intel’s Hadoop distribution has real-time queries just like Cloudera’s Impala. But instead of being a separate product, they will be embedded in Hive. Intel also looked at Cloudera Manager for inspiration on how to make Hadoop management easy. This part will however only be available for enterprise customers.
One of the main selling points will be performance. Intel’s Hadoop will be fully optimized for Intel’s processors and SSDs. Another selling point is security. Intel is launching Project Rhino, which will include more fine-grained security and faster encryption. Furthermore, Intel’s Hadoop is based on YARN, the latest Hadoop branch, which comes with extra features like support for frameworks other than MapReduce and advanced resource management.
Finally, unlike Cloudera, MapR and Hortonworks, Intel is a blue chip company with a global footprint and big name partnerships with the likes of Cisco, SAP, Teradata, Wipro, SAS, Dell, Red Hat, etc.
Will it be enough to stop people from running Hadoop on large volumes of low-cost ARM chips? Only time can tell…

5 Strategies for Making Money with the Cloud

January 22, 2013

Everybody is hearing about Cloud Computing on television now. Operators will store your contacts in the Cloud. Hosting companies will host your website in the Cloud. Others will store your photos in the Cloud.

But how do you make money with the Cloud?

The first thing is to forget about infrastructure and virtualization. If you think that the world needs more IaaS providers in 2013, then you haven’t seen what is currently on offer (Amazon, Microsoft, Google, Rackspace, Joyent, Verizon/Terremark, IBM, HP, etc.).

So what are the alternative strategies?

1) Rocket Internet SaaS Cloning

Your best hope is SaaS and PaaS. The best markets are non-English speaking markets. We have seen an explosion of SaaS in the USA, but most of it has not made it to the rest of the world yet. Only some bigger SaaS solutions (Webex, GoToMeeting, Office 365, etc.) and PaaS platforms (Salesforce, Workday, etc.) are available outside of the US and the UK, and most SaaS and PaaS solutions are currently still English-only. So the quickest way to make some money is to just copy, translate and paste some successful English-only SaaS product. If you do not know how to copy dotcoms, take a look at how the Rocket Internet team is doing it. Of course you should always stay open to those annoying problems everybody has that could use an innovative new solution, and create your own SaaS around them.

2) SaaSification

During the gold rush, be the restaurant, hotel or tool shop. While everybody is looking for the SaaS gold, offer solutions that will save gold diggers time and money. SaaSification allows others to focus on building their SaaS business, not on reinventing for the millionth time a web page, web store, email server, search, CRM, monthly subscription billing, reporting, BI, etc. Instead of a “Use Shopify to create your online store”, it should be “Use <YOUR PRODUCT> to create a SaaS Business”.

3) Mobile & Cloud

Everybody has, or is at least thinking about buying, a smartphone. However, there are very few really good mobile services that fully exploit the Cloud. Yes, I can get a shopping list app, but most are just glorified to-do lists. None recommends where to go and buy based on current promotions and comparisons with other buyers. None helps me find products inside a large supermarket. None learns from my shopping habits and suggests items for the list. None lets me take a number in the seafood queue. These are just examples for one mobile + cloud app. Think about any other field and you are sure to find great ideas.

4) Specialized IaaS

As I mentioned before, IaaS is already overcrowded, but there is one exception: specialized IaaS. You can focus on specialized hardware, e.g. virtualized GPUs, DSPs or mobile ARM processors; on network virtualization like SDN and OpenFlow; on mobile and tablet virtualization; on embedded device virtualization; on Machine Learning IaaS; or on car software virtualization.

5) Disruptive Innovations + Cloud

Sell disruptive innovations by offering them as Cloud services. Examples could be 3D printing services, wireless sensor networks / M2M, Big Data, Wearable Tech, Open Source Hardware, etc. The Cloud will lower your costs and give you a global, elastically scalable solution.
