Presto is Facebook's answer to Cloudera's Impala, Hortonworks' Stinger and Google's Dremel. Presto is an ANSI-SQL-compatible real-time data warehouse query engine, so existing data tools should work with it, unlike Hive, which needed special integration. Presto runs in memory, executing simple queries in a few hundred milliseconds and complex queries in a few minutes, making it ideal for interactive data warehousing. Unfortunately Presto will not be open sourced until later this year [probably fall], so the Big Data community will have to be patient.
Open source real-time massive-scale data warehousing is likely to disrupt existing players like Teradata, Oracle, etc., who until recently were able to charge $100K per terabyte…
An MIT student recently created a new type of massively distributed database, one that runs on graphics processors (GPUs) instead of CPUs. Mapd, as it is called, makes use of the immense computational power available in the off-the-shelf graphics cards found in any laptop or PC. Mapd is especially suitable for real-time querying, data analysis, machine learning and data visualization. Mapd is probably only the first of many databases that will try new hardware configurations to cater for specific application use cases.
Alternative approaches could focus on large sets of cheap mobile processors, Parallella processors, Raspberry Pis, etc., all stitched together. The idea would be to create massive processing clouds based on cheap specialized hardware that could beat traditional CPU clouds in both price and performance, at least for some specific use cases…
Software defined radio is like software defined networking but for radio: you can build whatever you want by updating the software. Recently a new project got funded on Kickstarter that lets radio amateurs build anything they want related to radio. BladeRF is an open source USB 3.0 software defined radio for $400.
So the usual suspects will be the existing 4G, GPS, Zigbee, Wifi, etc. standards, but what if some innovators start thinking outside of the box? White spaces would be one option. But what if 5G or 6G is no longer defined in standards bodies but by a community of open source amateurs working together? That is probably a step too far, but M2M (machine to machine) / IoT (Internet of Things) could still use more efficient standards. Federated ad-hoc networks that circumvent local censorship or solve outages could become options as well. Let's just hope Chinese suppliers can bring down the price of the BladeRF…
AWS is used by more and more enterprises today, but Amazon should fix several awkward “features” that make daily usage by enterprises difficult.
AWS console consistency
The console is not very consistent and could be made a lot easier for users. Why do elastic load balancers not have tags? Why do VPCs, subnets, route tables, etc. not have names, forcing you to work with their IDs? Why are network ACLs stateless and security groups stateful? Why are the VPC security group administration pages in VPC and EC2 different? Why can I not see the name of a security group when I use it in an inbound or outbound rule? Why can I give a temporary role to an API but not give a user or group a temporary role, similar to sudo or delegated administration? Why do RDS tags not filter out Cloudformation tags when editing, while EC2 tags do?
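The stateless/stateful distinction above trips up many newcomers. A minimal sketch of the difference, with made-up function names and a toy packet model (not real AWS behavior or API):

```python
# Illustrative only: contrasts a stateful security group with a
# stateless network ACL. All names and the packet model are invented.

def security_group_allows(inbound_rules, packet, established):
    """Stateful: replies on an established connection are allowed
    automatically; only new inbound traffic is checked against rules."""
    if established:
        return True
    return packet["port"] in inbound_rules

def network_acl_allows(inbound_rules, outbound_rules, packet, direction):
    """Stateless: every packet is checked against the rules for its
    direction, even replies to connections you initiated."""
    rules = inbound_rules if direction == "in" else outbound_rules
    return packet["port"] in rules

# A security group with only port 443 open still lets the reply
# traffic of an established connection through:
print(security_group_allows({443}, {"port": 50123}, established=True))   # True

# A network ACL with only 443 allowed outbound drops the reply on the
# ephemeral port unless you add an explicit outbound rule for it:
print(network_acl_allows({443}, {443}, {"port": 50123}, direction="out"))  # False
```

This is why network ACLs need extra rules for ephemeral port ranges while security groups do not.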
IAM and the console
End users that are limited to a small subset of services and resources are in for a surprise. They will see the same options as an administrator, but after clicking will get a permission error. It would be so much easier if services, buttons, menus, etc. that you don't have permission to use were simply hidden.
Java AWS API and Eclipse plugin
Probably the worst Java API of the last 10 years. You have to go through reservations to see your on-demand instances. You have list after list after list to go through to get anywhere. Sometimes you call getTags; sometimes you build a request and parse a response. You have to use the RDS ARN to get to tags, but you only get the ID from the RDS instance. Etc., etc., etc. Amazon should run a $100K competition for whoever can create a better API. Whoever gets more than 1 million users for their API wins.
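The RDS tag complaint boils down to this: the API hands you only the instance identifier, while the tag call wants a full ARN, so you must assemble it yourself. A sketch of that hoop (the region and account values are made up for illustration):

```python
# Hypothetical helper building the ARN that RDS tag calls require,
# from the pieces the instance listing actually gives you.
def rds_arn(region: str, account_id: str, db_instance_id: str) -> str:
    # RDS database instance ARNs follow arn:aws:rds:<region>:<account>:db:<id>
    return f"arn:aws:rds:{region}:{account_id}:db:{db_instance_id}"

print(rds_arn("us-east-1", "123456789012", "mydb"))
# arn:aws:rds:us-east-1:123456789012:db:mydb
```

A friendlier API would simply accept the instance ID, or return the ARN alongside it.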
Installing the Eclipse plugin
If you don't use Eclipse JEE you will end up fighting with several plugins, because nobody tells you the plugin is only compatible with JEE. And if you do not have the Android SDK installed, you cannot accept the Eclipse license.
Cloudformation
It seems like few people are using it, because Googling for it turns up no support posts. Then again, you can understand why people do not use it. There are several limitations in the parameters page. Try creating a secure password for your RDS master user: you can only use letters and numbers. Only three valid values for a parameter? Why not put them in a drop-down? Wait, there is no drop-down. You get to the end of the wizard before it complains about a problem on the first page. Start a stack name with a number and it will complain at the end as well. Inside Cloudformation scripts you will find several inconsistencies too, e.g. no tags for security groups, you cannot use underscores in names, and try using the instance ID in the tag for the name and you get a circular dependency error, etc.
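The stack-name complaint is a textbook case of validation that should happen up front. A minimal sketch of the client-side check the wizard could run on page one, assuming the constraint described above (names must start with a letter and contain only letters, numbers and hyphens):

```python
import re

# Hypothetical up-front validator for Cloudformation stack names,
# based on the constraints the wizard only reports at the very end.
STACK_NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9-]*$")

def valid_stack_name(name: str) -> bool:
    return bool(STACK_NAME_RE.match(name))

print(valid_stack_name("web-tier-prod"))  # True
print(valid_stack_name("1web"))           # False: starts with a number
print(valid_stack_name("web_tier"))       # False: underscore not allowed
```

One regex at the start of the wizard would save the whole trip to the last page.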
Missing enterprise functionality
Try encrypting your EBS volumes: good luck. You have finally managed to set up a VPN in your VPC, and your IT department is ready to start opening it to multiple departments. But wait, how are we going to charge them? Linked accounts are not an option, because we are not going to set up a VPN for each department. Adding tags to each instance to include them in your usage report? Good luck automating tags in Cloudformation with its referential errors, or rebuilding a custom portal based on the API. What about limiting department X to instances A, B and C? Inconsistently implemented, if available at all for the service you want to use. Migrating instances between VPC subnets? Stop, create an AMI, start a new instance. Forgot to add a security group to an instance? Stop, create an AMI, start a new instance. Why?
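The chargeback problem above reduces to aggregating per-instance costs by a department tag, with untagged instances as the failure mode. A toy sketch with invented instance data and prices (in practice these would come from the EC2 API and the usage report):

```python
from collections import defaultdict

# Made-up instance records; the "Department" tag drives the chargeback.
instances = [
    {"id": "i-111", "tags": {"Department": "Sales"},     "cost": 120.0},
    {"id": "i-222", "tags": {"Department": "Marketing"}, "cost": 80.0},
    {"id": "i-333", "tags": {},                          "cost": 45.0},  # untagged!
]

def charge_back(instances):
    """Sum costs per department; untagged instances end up in an
    UNALLOCATED bucket that somebody has to chase down."""
    totals = defaultdict(float)
    for inst in instances:
        dept = inst["tags"].get("Department", "UNALLOCATED")
        totals[dept] += inst["cost"]
    return dict(totals)

print(charge_back(instances))
# {'Sales': 120.0, 'Marketing': 80.0, 'UNALLOCATED': 45.0}
```

The logic is trivial; the pain is keeping the tags themselves complete and consistent, which is exactly what the tooling makes hard.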
Is AWS a bad service or product? Not at all. Is it ready for global enterprise deployment? It will be in the next 24 months. Should you wait until then? If you are not using the Cloud today, you are already a year late. Elastic scaling, instant provisioning, pay-per-use, etc. beat any awkward “features”. But some API design competitions, customer usability studies and a community roadmap driven by votes would go a long way…
- Open Compute
Open Compute is focused on creating a new type of server: an open source server based on open source storage, motherboards, racks, data center designs, etc. Instead of proprietary designs, Open Compute makes the designs open source. Expect prices for these “commoditized” servers to be substantially lower, enabling web-scale data centers of a size not seen before. The big driver behind the initiative is Facebook.
- Printing everything
Imprint Energy is a start-up that is putting University of California research into practice. Printed batteries are bendable and can take very thin shapes, making possible a new series of applications that were previously unimaginable. 3D printing will probably become mainstream in 2013-2014 via manufacturing-as-a-service, with consumers buying their first printers in 2014-2015. Bioprinting, too, can enable new innovations.
- Wearable Tech or Fashion Electronics
Google Glass, smart clothes, Nike's Fuelband, etc. are all examples of wearable tech. However, expect printable batteries to make the tech really flat (clothes) or really small (glasses). This means that we haven't seen anything yet. Also expect the explosion of sensor data to include a lot of “human performance data”.
- Miniature Arduino
RFDuino is a good example of how Arduinos are shrinking. Open source intelligent miniature hardware will revolutionize many industries, e.g. garden & pool computers, bike computers, etc.
- FPGAs and other open source hardware
Mojo is a good example of how not only micro-controllers but also FPGAs and other hardware controllers can be made open source. Given FPGAs' parallel processing and multimedia capabilities, expect revolutionary products in this domain.
Intel announced its Big Data strategy this week, with its own Hadoop distribution as the crown jewel. Many people will be surprised that a chip maker wants to be your Hadoop supplier as well. McAfee is Intel's most visible enterprise software offering, and it was an acquisition, not an offering based on organic growth.
Intel's Hadoop distribution, on the other hand, started as a Chinese project some years ago and turned into a product.
So how is Intel going to compete with Cloudera, Hortonworks, MapR, IBM, EMC/Greenplum, etc.?
The Intel Hadoop Distribution has real-time queries, just like Cloudera's Impala, but instead of being a separate product they are embedded in Hive. Intel also looked at Cloudera Manager for inspiration on how to make Hadoop management easy. This part, however, will only be available to enterprise customers.
One of the main selling points will be performance: Intel's Hadoop will be fully optimized for Intel's processors and SSDs. Another selling point is security. Intel is launching project Rhino, which will bring more fine-grained security and faster encryption. Furthermore, Intel's Hadoop is based on YARN, the latest Hadoop branch, which comes with extra features like support for frameworks other than MapReduce and advanced resource management.
Finally, unlike Cloudera, MapR and Hortonworks, Intel is a blue-chip company with a global footprint and big-name partnerships with Cisco, SAP, Teradata, Wipro, SAS, Dell, Redhat, etc.
Will it be enough to stop people from running Hadoop on large volumes of low-cost ARM chips? Only time will tell…