Have you ever counted the number of Linux devices at home or work that haven’t been updated since they came out of the factory? Your cable/fibre/ADSL modem, your WiFi access point, television sets, NAS storage, routers/bridges, media centres, etc. Typically this class of device hosts a proprietary hardware platform, an embedded proprietary Linux and a proprietary application. If you are lucky you are able to log into a web GUI, often using the admin/admin credentials, and upload a new firmware blob. This firmware blob is frequently hard to locate on hardware suppliers’ websites. No wonder the NSA and others love to look into potential firmware bugs. They are the ideal source of undetected wiretapping.
The next IT revolution: micro-servers
The next IT revolution is about to happen, however. Those proprietary hardware platforms will soon make room for commodity multi-core processors from ARM, Intel, etc. General purpose operating systems will replace their legacy proprietary and embedded predecessors. Proprietary and static single purpose apps will be replaced by marketplaces and multiple apps running on one device. Security updates will be sent regularly. Devices and apps will be easy to manage remotely. The next revolution will be around managing millions of micro-servers and the apps on top of them. These micro-servers will behave like a mix of phone apps, Docker containers, and cloud servers. Managing them will be like managing a “local cloud”, sometimes also called fog computing.
Micro-servers and IoT?
Are micro-servers some form of Internet of Things? Yes, they can be, but not all the time. If you have a smarthub that controls your home or office then it is pure IoT. However, if you have a router, firewall, fibre modem, micro-antenna station, etc. then the micro-server will just be an improved version of its predecessor.
Why should you care about micro-servers?
If you are a mobile app developer then the micro-servers revolution will be your next battlefield. Local clouds need “Angry Birds”-like successes.
If you are a telecom or network developer then the next generation of micro-servers will give you unprecedented potential to combine traffic shaping with parental control with QoS with security with …
If you are a VC then micro-server solution providers are the type of startups you want to invest in.
If you are a hardware vendor then these are the type of devices or SoCs you want to build.
If you are a Big Data expert then imagine the new data tsunami these devices will generate.
If you are a machine learning expert then you might want to look at algorithms and models that are easy to execute on constrained devices once they have been trained on potentially thousands of cloud servers and petabytes of data.
If you are a DevOps engineer then your next challenge will be managing and operating millions of constrained servers.
If you are a cloud innovator then you are likely to want to look into SaaS and PaaS management solutions for micro-servers.
If you are a service provider then these are the type of solutions you want to be able to manage at scale and easily integrate with.
If you are a security expert then you should start to think about micro-firewalls, anti-micro-viruses, etc.
If you are a business manager then you should think about how new “mega micro-revenue” streams can be obtained or how disruptive “micro-innovations” can give you a competitive advantage.
If you are an analyst or consultant then you can start predicting the next IT revolution and the billions the market will be worth in 2020.
The next steps…
It is still early days but expect some major announcements around micro-servers in the coming months…
Data volumes are growing exponentially. Unstructured data from Twitter, LinkedIn, mailing lists, etc. has the potential to transform many industries if it could be combined with structured data. Machine learning, natural language processing, sentiment analysis, etc.: everybody talks about them, hardly anybody is really using them at scale. Unfortunately, too many people who talk about Big Data start with the answer and then ask what the problem is. The answer seems to be Hadoop. News flash: Hadoop is not the answer, and if you start from the answer to look for problems then you are doing it wrong.
What are Common Data Problems?
Most Big Data problems are about storage and reporting. How do I store all the exponentially growing data in such a way that business managers can get to it in seconds when they need it? Ad-hoc reporting, adequate prediction, and making sense of the exponentially growing data stream are the key problems.
Big Data Storage?
Do you have relational data, unstructured data, graph data, etc.? How do you store different types of data and make it available inside an enterprise? The basis for big data storage is cloud storage technology. You want to store any type of data and be able to quickly scale up storage. Red Hat did not buy Inktank for $175M because traditional storage has solved all of today’s problems. Premium SAN and other storage technologies are old school. They are too expensive for Big Data. They were designed with the idea that each byte of data is critical for an enterprise. Unfortunately this is no longer the case. You mind losing transactional sales data. You don’t mind so much losing sample tweets you bought from Datasift or Apache log files from an internal low-impact server. This is where cloud storage solutions like Inktank’s Ceph allow commodity storage to be built that is reliable, scalable and extremely cost effective. Does this mean you don’t need SANs any more? Wrong again. TV did not kill Radio. Same here.
Cloud storage technologies are needed because each type of data behaves differently. If you have log data that is only appended then HDFS is fine. If you have read-mostly data then a relational database is ideal. If you have write-mostly data then you need to look at NoSQL. If you need heavy read-and-write then you need strong Big Data architecture skills. What is more important: low latency, consistency, reliability, cheap storage, etc.? Each of these means that the solution is different. Low latency means in-memory or SSD. Consistency means transactional. Reliability means replication. You can now even find approximate databases like BlinkDB, which trade exact answers for faster queries. There is no longer one size fits all. Oracle is no longer the answer to everybody’s data questions.
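The rules of thumb above can be written down as a small decision function. This is only a sketch of the paragraph's reasoning: the Workload class, its field names and the returned labels are hypothetical, and a real choice would also weigh latency, consistency, reliability and cost.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of how a dataset is accessed."""
    append_only: bool = False
    read_heavy: bool = False
    write_heavy: bool = False


def suggest_storage(workload: Workload) -> str:
    """Map an access pattern to the storage family named in the text."""
    if workload.append_only:
        return "HDFS"
    if workload.read_heavy and workload.write_heavy:
        # No off-the-shelf answer: this case needs architecture work.
        return "custom Big Data architecture"
    if workload.read_heavy:
        return "relational database"
    if workload.write_heavy:
        return "NoSQL"
    return "undecided"


print(suggest_storage(Workload(append_only=True)))
print(suggest_storage(Workload(read_heavy=True)))
print(suggest_storage(Workload(write_heavy=True)))
print(suggest_storage(Workload(read_heavy=True, write_heavy=True)))
```

The point of the sketch is that the access pattern, not the vendor, drives the choice.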
What will companies need? Companies need cloud storage solutions that offer these different storage capabilities like a service. Amazon’s RDS, DynamoDB, S3 and Redshift are examples of what companies need. However companies need more flexibility. They need to be able to migrate their data between public cloud providers to optimise their costs and have added security. They also need to be able to store data in private local clouds or nearby hosted private clouds for latency or regulatory reasons.
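One way to keep that flexibility is to code against a thin storage interface instead of a single provider's API. The sketch below is an assumption-laden illustration: BlobStore, InMemoryStore and migrate are hypothetical names, and the in-memory class merely stands in for adapters that would wrap a real provider's SDK.

```python
from typing import Dict, Iterable, Protocol


class BlobStore(Protocol):
    """Hypothetical minimal interface every storage back-end must offer."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...
    def keys(self) -> Iterable[str]: ...


class InMemoryStore:
    """Stand-in for a public or private cloud store (illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def keys(self) -> Iterable[str]:
        return list(self._blobs)


def migrate(source: BlobStore, target: BlobStore) -> int:
    """Copy every blob from source to target and verify each copy."""
    copied = 0
    for key in source.keys():
        data = source.get(key)
        target.put(key, data)
        if target.get(key) != data:
            raise IOError(f"verification failed for {key!r}")
        copied += 1
    return copied


public_cloud = InMemoryStore("public-cloud")
private_cloud = InMemoryStore("private-cloud")
public_cloud.put("logs/access.log", b"GET /index.html 200")
public_cloud.put("tweets/sample.json", b'{"text": "hello"}')
print(migrate(public_cloud, private_cloud), "blobs migrated")
```

Because applications only see the interface, moving between providers becomes a copy job rather than a rewrite.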
The future of ETL & BI
Traditional ETL will see a revolution. ETL never worked. Business managers don’t want to go and ask their IT department to make a change in a star schema in order to import some extra data from the Internet, followed by updates to reports and dashboards. Business managers want an easy-to-use tool that can answer their ad-hoc queries. This is the reason why Tableau Software + Amazon Redshift are growing like crazy. However, if your organisation is starting to pump terabytes of data into Redshift, please be warned: the day will come when Amazon sends you a bill that your CxO will not want to pay and he/she will want you to move out of Amazon. What will you do then? Do you have an exit strategy?
The future of ETL and BI will be web tools that any business manager can use to create ad-hoc reports. The Office generation wants to see dynamic HTML5 GUIs that allow them to drag-and-drop data queries into ad-hoc reports and dashboards. If you need training then the tool is too difficult.
These next-generation BI tools will need dynamic back-office solutions that allow storing real-time, graph, blob, historical relational, unstructured, etc. data into a commonly accessible cloud storage solution. Each one will be hosted by a different cloud service but they will all be an API away. Software will be packaged in such a way that it knows how to export its own data. Why do you need to know where Apache stores the access and error logs and in which format? Apache should be able to export whatever interesting information it contains in a standardised way into some deep storage. Machine learning should be used to make decisions on how best to store that data for ad-hoc reporting afterwards. Humans should no longer be involved in this process.
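To make the Apache example concrete, here is a minimal sketch of what a self-describing export could look like: one access log line in Apache's Common Log Format is turned into a JSON record. The function name export_record and the output field names are assumptions for illustration, not an existing Apache feature.

```python
import json
import re
from typing import Optional

# Common Log Format: host ident user [time] "request" status size
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def export_record(line: str) -> Optional[dict]:
    """Parse one access log line into a dict, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # Apache writes "-" when no response body was sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')
print(json.dumps(export_record(line), indent=2))
```

If every package shipped such an exporter, the deep storage layer would never need to know file locations or log formats.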
Talking about machine learning. With the volumes of data growing from gigabytes into petabytes, traditional data scientists will not scale. In many companies a data scientist is similar to a report monkey: “Find out why in region X we sold Y% less”, etc. Data scientists should not be synonymous with dynamic report generators. Data scientists should be machine learning experts. They should tell the computer what they want, not how they want it. Today’s data scientists pride themselves on knowing R, Python, etc. These tools are too low-level to be usable at scale. There are just not enough people in the world to learn R. Data is growing exponentially; the number of R experts can at best grow linearly. What we need are machine learning GUI solutions like RapidMiner Studio but supported by petabyte-scale cloud solutions. A short-term solution could be an HTML5 GUI version of RapidMiner Studio that connects to a back-end set of cloud services that use some of the nice Apache Spark extensions for machine learning, streaming, Big Data warehousing/SQL, graph retrieval, etc. or solutions based on Druid.io. For sure there are other solutions possible.
What is important is that companies start realising that data is becoming a strategic weapon. Those companies that are able to collect more of it and convert it into valuable knowledge and wisdom will be tomorrow’s giants. Most average machine learning algorithms become substantially better just by throwing more and more data at them. This means that having a Big Data architecture is not as critical as having the best trained models in the industry and continuing to train them. There will be a data divide between the haves and have-nots. Google, Facebook, Microsoft and others have been buying any startup that smells like Deep Belief Networks. They have done this with good reason. They know that tomorrow’s algorithms and models will be more valuable than diamonds and gold. If you want to be one of the haves then you need to invest in cloud storage now. You need to have massive historical data volumes to train tomorrow’s algorithms and start building the foundations today…
An MIT student recently created a new type of massively distributed database, one that runs on graphics processors instead of CPUs. MapD, as it has been called, makes use of the immense computational power available in off-the-shelf graphics cards that can be found in any laptop or PC. MapD is especially suitable for real-time querying, data analysis, machine learning and data visualization. MapD is probably only one of many databases that will try new hardware configurations to cater for specific application use cases.
Alternative approaches could focus on large sets of cheap mobile processors, Parallella processors, Raspberry Pis, etc. all stitched together. The idea would be to create massive processing clouds based on cheap specialized hardware that could beat traditional CPU Clouds both in price and performance, at least for some specific use cases…
Everybody is hearing about Cloud Computing on the television now. Operators will store your contacts in the Cloud. Hosting companies will host your website in the Cloud. Others will store your photos in the Cloud.
However how do you make money with the Cloud?
The first thing is to forget about infrastructure and virtualization. If you think that in 2013 the world needs more IaaS providers then you haven’t seen what is currently on offer (Amazon, Microsoft, Google, Rackspace, Joyent, Verizon/Terremark, IBM, HP, etc.).
So what are the alternative strategies?
1) Rocket Internet SaaS Cloning
Your best hope is SaaS and PaaS. The best markets are non-English speaking markets. We have seen an explosion of SaaS in the USA but most have not made it to the rest of the world yet. Only some bigger SaaS solutions (Webex, GoToMeeting, Office 365, etc.) and PaaS platforms (Salesforce, Workday, etc.) are available outside of the US and the UK. However, most SaaS and PaaS solutions are currently still English-only. So the quickest way to make some money is to just copy, translate and paste some successful English-only SaaS product. If you do not know how to copy dotcoms, take a look at how the Rocket Internet team is doing it. Of course you should always be open to those annoying problems everybody has that could use a new innovative solution, and as such create your own SaaS.
2) SaaSification
During the gold rush, be the restaurant, hotel or tool shop. While everybody is looking for the SaaS gold, offer solutions that will save gold diggers time and money. SaaSification allows others to focus on building their SaaS business, not on reinventing for the millionth time a web page, web store, email server, search, CRM, monthly subscription billing, reporting, BI, etc. Instead of a “Use Shopify to create your online store”, it should be “Use <YOUR PRODUCT> to create a SaaS Business”.
3) Mobile & Cloud
Everybody has, or is at least thinking about buying, a smartphone. However, there are very few really good mobile services that fully exploit the Cloud. Yes, I can get a shopping list app, but most are just glorified to-do lists. None recommends where to go and buy based on current promotions and comparison with other buyers. None helps me find products inside a large supermarket. None learns from my shopping habits and suggests items for the list. None allows me to take a number at the seafood queue. These are just examples for one mobile + cloud app. Think about any other field and you are sure to find great ideas.
4) Specialized IaaS
As I mentioned before, IaaS is already overcrowded, but there is one exception: specialized IaaS. You can focus on specialized hardware, e.g. virtualized GPU, DSP, mobile ARM processors. On network virtualization like SDN and OpenFlow. Mobile and tablet virtualization. Embedded device virtualization. Machine Learning IaaS. Car Software virtualization.
5) Disruptive Innovations + Cloud
Sell disruptive innovations and offer them as Cloud services. Examples could be 3D printing services, wireless sensor networks / M2M, Big Data, Wearable Tech, Open Source Hardware, etc. The Cloud will lower your costs and give you a global, elastically scalable solution.
If you just invested a lot of money in a Big Data solution from any of the traditional BI vendors (Teradata, IBM, Oracle, SAS, EMC, HP, etc.) then you are likely to see a sub-optimal ROI in 2013.
Several innovations will come in 2013 that will change the value of Big Data exponentially. Other technology innovations are just waiting for smart start-ups to put them into good use.
The first major innovation will be Google Dremel-like solutions such as Impala, Drill, etc. coming of age. They will allow real-time queries on Big Data and be open source. So you will get, for free, a superior offering compared to what is currently available.
Cloud-Based Big Data Solutions
The absolute market leader is Amazon with EMR. Elastic MapReduce is not so much about being able to run a MapReduce operation in the Cloud but about paying for what you use and not more. The traditional BI vendors are still getting their heads around usage-based licensing for the Cloud. Expect a lot of smart startups to come up with really innovative Big Data and Cloud solutions.
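A back-of-the-envelope calculation shows why paying only for what you use matters. The numbers below (20 nodes, a 2-hour nightly job, 0.50 per node hour) are purely hypothetical and are not Amazon prices; the sketch also assumes the same hourly rate for both options.

```python
def on_demand_cost(node_hours: float, price_per_node_hour: float) -> float:
    """Cost when you pay only for the node hours actually consumed."""
    return node_hours * price_per_node_hour


def always_on_cost(nodes: int, hours_in_period: float,
                   price_per_node_hour: float) -> float:
    """Cost when the whole cluster is paid for around the clock."""
    return nodes * hours_in_period * price_per_node_hour


# Hypothetical workload: a nightly job on 20 nodes for 2 hours, 30 days.
NODES = 20
JOB_HOURS_PER_DAY = 2
DAYS = 30
PRICE = 0.50  # hypothetical price per node hour

used_node_hours = NODES * JOB_HOURS_PER_DAY * DAYS
pay_per_use = on_demand_cost(used_node_hours, PRICE)
always_on = always_on_cost(NODES, 24 * DAYS, PRICE)

print("pay per use:", pay_per_use)
print("always on:  ", always_on)
print("ratio:      ", always_on / pay_per_use)
```

For a cluster that is busy two hours a day, the always-on option pays for 24 hours while using 2, so under these assumptions it costs twelve times as much.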
Big Data Appliances
You can buy some really expensive Big Data Appliances, but here too disruptive players are likely to change the market. GPUs are relatively cheap. Stack them into servers and use something like Virtual OpenCL to make your own GPU virtualization cluster solution. These types of home-made GPU clusters are already being used for security-related Big Data work.
Finally, Parallella will put a 16-core supercomputer into everybody’s hands for $99. Their 2013 supercomputer challenge is definitely something to keep your eyes on. Their roadmap talks about 64 and 1000 core versions. If Adapteva can keep their promises and flood the market with Parallellas then expect Parallella Clusters to be the 2013 Big Data Appliance.
Distributed Machine Learning
Mahout is a cool project but Map Reduce might not be the best possible architecture to run iterative distributed backpropagation or any other machine learning algorithms. Jubatus looks promising. Also algorithm innovations like HogWild could really change the dynamics for efficient distributed machine learning. This space is definitely ready for more ground-breaking innovations in 2013.
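To see why iterative algorithms sit awkwardly on Map Reduce, consider plain gradient descent written in map/reduce style. In the sketch below (a toy linear regression, with hypothetical function names and in-process "partitions" standing in for cluster nodes), every single iteration needs a full map pass plus a reduce; on a real cluster each of those rounds would be a separate job with its own start-up and I/O overhead.

```python
from functools import reduce
from typing import List, Tuple

Point = Tuple[float, float]           # (x, y)
Partial = Tuple[float, float, int]    # (slope gradient, intercept gradient, count)


def map_partition(partition: List[Point], slope: float,
                  intercept: float) -> Partial:
    """Map step: each partition computes its own partial gradient."""
    d_slope = d_intercept = 0.0
    for x, y in partition:
        error = (slope * x + intercept) - y
        d_slope += error * x
        d_intercept += error
    return d_slope, d_intercept, len(partition)


def combine(a: Partial, b: Partial) -> Partial:
    """Reduce step: sum the partial gradients."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]


def fit(partitions: List[List[Point]], iterations: int,
        learning_rate: float) -> Tuple[float, float, int]:
    """Gradient descent where every iteration is one map/reduce round."""
    slope = intercept = 0.0
    jobs = 0
    for _ in range(iterations):
        partials = [map_partition(p, slope, intercept) for p in partitions]
        d_slope, d_intercept, count = reduce(combine, partials)
        slope -= learning_rate * d_slope / count
        intercept -= learning_rate * d_intercept / count
        jobs += 1  # on a real cluster: one full job per iteration
    return slope, intercept, jobs


# Toy data on the line y = 2x + 1, split over two "nodes".
data = [(float(x), 2.0 * x + 1.0) for x in range(8)]
slope, intercept, jobs = fit([data[:4], data[4:]], 1000, 0.05)
print(round(slope, 3), round(intercept, 3), "after", jobs, "map/reduce rounds")
```

A thousand rounds for a two-parameter model is exactly the kind of overhead that asynchronous schemes such as HogWild try to avoid.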
Easier Big Data Tools
This is still a big gap in the Open Source field. Having Open Source and easy-to-use drag-and-drop tools for Big Data Analytics would really accelerate adoption. We already have some good commercial examples (Radoop = RapidMiner + Mahout, Tableau, Datameer, etc.) but we are missing good Open Source tools.
I am currently looking for new challenges, so if you are active in the Big Data space and are looking for a knowledgeable senior executive be sure to contact me at maarten at telruptive dot com.