Every 6 months at Canonical, the company behind Ubuntu, I work on something technical to test our tools firsthand and to show others new ideas. This time around I created an Instant Big Data solution, more concretely “Instant Storm”.
Storm is now part of the Apache Foundation, but it was originally built by Nathan Marz during his time at Twitter. Storm is a stream processing engine for real-time, distributed computation. You can use Storm to aggregate real-time flows of events, to do machine learning, for analytics, for distributed ETL, etc.
Storm is built out of several services and requires Zookeeper. It is a complex solution and non-trivial to deploy, integrate and scale. The first technical project I did at Canonical was to create a Storm Juju charm. Although I was able to automate the deployment of Storm, one problem remained: users still had to read up on how to actually use Storm.
Instant Storm is the first effort to resolve this problem. I created a StormDeployer charm that can read a yaml file in which a developer can specify multiple topologies. For each topology you specify its name, the jar file, the location on GitHub, how to package the jar file, etc. Afterwards, by uploading the yaml file to GitHub or any public web server and giving it the extension .storm, anybody in the world is able to reuse the topologies instantly in two steps:
Alternatively you can use the Juju command line:
juju set stormdeployer "deploy=http://somedomain/somefile.storm"
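As a rough illustration of the workflow above, the following Python sketch checks a topology descriptor and builds the Juju command shown. The field names (`name`, `jar`, `repository`, `package_command`) are assumptions made for this example only; the actual keys that the StormDeployer charm expects in a .storm file are not listed in this post. Only the `juju set stormdeployer "deploy=..."` form and the .storm extension come from the text.

```python
from urllib.parse import urlparse

# Hypothetical field names: the real .storm schema is not given in this post.
REQUIRED_FIELDS = ("name", "jar", "repository", "package_command")


def validate_topologies(topologies):
    """Return a list of problems found; an empty list means nothing is missing."""
    problems = []
    for index, topology in enumerate(topologies):
        for field in REQUIRED_FIELDS:
            if not topology.get(field):
                problems.append(f"topology {index}: missing '{field}'")
    return problems


def juju_set_command(url, service="stormdeployer"):
    """Build the argument list for: juju set stormdeployer "deploy=<url>"."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError("the descriptor must be reachable over http(s)")
    if not parsed.path.endswith(".storm"):
        raise ValueError("the descriptor must use the .storm extension")
    return ["juju", "set", service, f"deploy={url}"]


if __name__ == "__main__":
    # Example descriptor content, already parsed from yaml into Python objects.
    example = [{
        "name": "wordcount",
        "jar": "wordcount.jar",
        "repository": "https://github.com/example/wordcount",
        "package_command": "mvn package",
    }]
    print(validate_topologies(example))
    print(" ".join(juju_set_command("http://somedomain/somefile.storm")))
```

The argument list can be passed to `subprocess.run` on a machine where Juju is installed and the stormdeployer service is already deployed.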
There are several examples already available on GitHub but here is one that definitely works:
This is the easiest way to deploy Storm on any public cloud, on any private cloud or, if Ubuntu’s Metal-as-a-Service (MaaS) is used, on any bare metal server (x86, ARM64, POWER8). See here for Juju installation instructions.
This is a first version with some limitations. One of the really nice things to add would be to use Juju to make integrations between a topology and other charms dynamic. You can for instance create a spout or bolt that connects to the Kafka or Cassandra charms. Juju can automatically tell the topology the connection information and make updates to the running topologies should anything change. This would make it a lot more robust to run long running Storm topologies.
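To make the idea of dynamic integrations a bit more concrete, here is a minimal Python sketch of the comparison step such a feature would need: given the connection settings a topology currently uses and the settings most recently reported, decide which services changed and therefore require the topology to be updated. This is not Juju's API and not part of the StormDeployer charm; the function name and the dictionary layout are assumptions for illustration only.

```python
def changed_services(previous, current):
    """Return the sorted names of services whose connection settings differ.

    Both arguments map a service name (for example "kafka") to a dictionary
    of connection settings. A service that appears in only one of the two
    mappings counts as changed.
    """
    names = set(previous) | set(current)
    return sorted(
        name for name in names if previous.get(name) != current.get(name)
    )


if __name__ == "__main__":
    # Hypothetical settings; the addresses are made up for the example.
    before = {
        "kafka": {"host": "10.0.0.5", "port": 9092},
        "cassandra": {"host": "10.0.0.7", "port": 9042},
    }
    after = {
        "kafka": {"host": "10.0.0.9", "port": 9092},  # Kafka moved
        "cassandra": {"host": "10.0.0.7", "port": 9042},
    }
    for service in changed_services(before, after):
        print(f"reconfigure spouts/bolts that use {service}")
```

A charm hook could run a check like this whenever related services report new settings and only touch the topologies that depend on a changed service.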
I am happy to donate my work to the Apache Foundation and guide anybody who wants to take ownership…
In 1999 you could easily spend $1M on having a company build a static web site. A few years later any student could make you a web site. HTML became a commodity. The same commodity effect needs to happen to Big Data.
The past: build your own petabyte solution
A few years back only the happy few extremely technically gifted companies were able to create solutions to store TBs and even PBs of data. Google started to write papers. Yahoo and Facebook started to release open source solutions. Shortly after, Big Data became a buzzword and anybody who was somebody in the IT consultancy space was talking about Hadoop.
Now: open source solutions and lots of handholding
In 2014 it is possible to download Hadoop, Spark, Storm, etc. You can even find prepackaged solutions from Hortonworks, Cloudera, MapR, Pivotal, IBM, etc. But Big Data projects are still hard. You need very bright people or have to spend quite a lot to get anywhere. Many projects run over budget and underdeliver.
Future: instant Big Data solutions
We are ready for the next step: converting Big Data into a commodity. Several startups are launching Big Data solutions as a service. Unfortunately for many SaaS providers, having a Big Data SaaS solution is not enough. Big Data means lots of data. Data that can hold sensitive information. Data that can grow by GBs a day. This is the reason why any SaaS Big Data solution that wants to be successful also needs an on-premise alternative.
We are also missing a portable Big Data logic container. The industry is raving about Docker. Several startups are working on making Docker containers the way to share your map-reduce logic. I predict that much more Big Data logic can be containerised and made portable. Any data scientist should be able to reuse Deep Belief or Random Forest algorithms by just reusing a container.
The other part of the puzzle that is still missing is data visualisation and manipulation tools. There are many Big Data key-value stores and map-reduce engines. However, the data visualisation and reporting space is still wide open. The Apache Foundation does not [yet] provide a drag-and-drop tool to set up dashboards, generate reports, schedule notifications, run workflows, automate data imports, etc.
Industry-specific reusable assets are another part that is missing. Nobody wants to go and reinvent eCommerce recommendation algorithms every time a new Big Data platform becomes available.
However, all of this is coming at enormous speed. As soon as all the pieces of the puzzle come together, cloud orchestration solutions like Juju, ServiceMesh, Brooklyn, etc. will allow enterprises to start consuming Big Data solutions as a commodity. Instant Big Data solutions are 6-36 months away depending on your requirements.
Have you ever counted the number of Linux devices at home or work that haven’t been updated since they came out of the factory? Your cable/fibre/ADSL modem, your WiFi point, television sets, NAS storage, routers/bridges, media centres, etc. Typically this class of devices hosts a proprietary hardware platform, an embedded proprietary Linux and a proprietary application. If you are lucky you are able to log into a web GUI often using the admin/admin credentials and upload a new firmware blob. This firmware blob is frequently hard to locate on hardware supplier’s websites. No wonder the NSA and others love to look into potential firmware bugs. They are the ideal source of undetected wiretapping.
The next IT revolution: micro-servers
The next IT revolution is about to happen, however. Those proprietary hardware platforms will soon make room for commodity multi-core processors from ARM, Intel, etc. General purpose operating systems will replace legacy proprietary and embedded predecessors. Proprietary and static single purpose apps will be replaced by marketplaces and multiple apps running on one device. Security updates will be sent regularly. Devices and apps will be easy to manage remotely. The next revolution will be around managing millions of micro-servers and the apps on top of them. These micro-servers will behave like a mix of phone apps, Docker containers, and cloud servers. Managing them will be like managing a “local cloud”, sometimes also called fog computing.
Micro-servers and IoT?
Are micro-servers some form of Internet of Things? Yes, they can be, but not all the time. If you have a smarthub that controls your home or office then it is pure IoT. However, if you have a router, firewall, fibre modem, micro-antenna station, etc. then the micro-server will just be an improved version of its predecessor.
Why should you care about micro-servers?
If you are a mobile app developer then the micro-servers revolution will be your next battlefield. Local clouds need “Angry Birds”-like successes.
If you are a telecom or network developer then the next generation of micro-servers will give you unprecedented potential to combine traffic shaping with parental control with QoS with security with …
If you are a VC then micro-server solution providers are the type of startups you want to invest in.
If you are a hardware vendor then this is the type of devices or SoCs you want to build.
If you are a Big Data expert then imagine the new data tsunami these devices will generate.
If you are a machine learning expert then you might want to look at algorithms and models that are easy to execute on constrained devices once they have been trained on potentially thousands of cloud servers and petabytes of data.
If you are a DevOps engineer then your next challenge will be managing and operating millions of constrained servers.
If you are a cloud innovator then you are likely to want to look into SaaS and PaaS management solutions for micro-servers.
If you are a service provider then these are the type of solutions you want to be able to manage at scale and easily integrate with.
If you are a security expert then you should start to think about micro-firewalls, anti-micro-viruses, etc.
If you are a business manager then you should think about how new “mega micro-revenue” streams can be obtained or how disruptive “micro-innovations” can give you a competitive advantage.
If you are an analyst or consultant then you can start predicting the next IT revolution and the billions the market will be worth in 2020.
The next steps…
It is still early days but expect some major announcements around micro-servers in the next months…
At TADHack some months ago it was clear that SMS and phone calls are out and WebRTC is the new hot technology for developers. Via your browser you can talk to your salesman, doctor and coach. Your browser can be mobile. This means that video calls will be universal as soon as 4G is everywhere. Bad news for operators that will see data on their networks balloon without new revenues. Good news for users that will have a whole new world of communication opening up with voice, video, screen sharing, web apps, etc. all seamlessly integrated.
How can business be generated with WebRTC?
Per-minute call billing is out. Unless of course you are talking to a highly paid consultant who charges you by the second or minute. One-time payments, as with mobile apps, are only viable if you can embed WebRTC technology in a mobile app, not if you need to support an ongoing business. This means that we need a new subscription model for WebRTC. We need a micro-subscription model. Especially for services that will be used on a long-term basis, e.g. conference facilities, next generation voice mails, etc. As always, operators will be hesitant to cannibalise a juicy per-minute business for a low-margin subscription service of 1-99 cents per month. So are there others that could bill micro-subscriptions? The obvious choice would be cloud providers. They can already do hourly micro-billing on monthly cycles, hence adding some recurring element would be straightforward. So my prediction is that WebRTC will accelerate the operators’ problems: the cloud will no longer deliver only your IT solutions but also your communication services.
We all have “enjoyed” working with some software that was purchased because “You can’t get fired because you bought…”. This software is known for being the industry leader. Not because it is easy to use, easy to integrate, easy to scale, easy to do anything with,… It often is quite the opposite.
So why do people buy it? First of all it is easy to find experts. There are people out there who have been “enjoying” working with this solution for the last 10 years. It is relatively stable and reliable. There is a complete ecosystem around it with hundreds or thousands of partner solutions. People have just given up on trying to convince their bosses to try something different.
5 steps to disrupt the Dinosaur
Step 1: the basic use cases
Apply the Pareto rule. Which 80% of the use cases rely on only 20% of the functionality?
Step 2: the easy & beautiful & horizontally scalable & multi-tenant clone
Make a solution that covers this 80% of the use cases but make it beautiful and incredibly easy to use. Use the latest horizontally scalable backends, e.g. Cassandra. Build multi-tenancy into the software from day 1.
Step 3: make it open source
Release the “improved clone” as an open source product.
Step 4: the plugin economy
Add a plugin mechanism that allows others to create plugins to fill in the 20% use case gap. Create a marketplace so others can make money with their plugins. Make money by being the broker. Think App Store but ideally improve the model.
Step 5: the SaaS version
Create a SaaS version and attack the bottom part of the market. Focus on the enterprises that could never afford the original product. Slowly move upwards towards the top segment.
The expected result
You will have a startup or a new business unit that will make money pretty quickly and will soon be the target of a big purchase offer from the Dinosaur or one of its competitors. You will spend far fewer sleepless nights trying to make money this way than via the creation of the next Angry Birds, Uber or Facebook clone.
How do you know if your company is making billions but is about to be disrupted? Imagine you were working at Nokia some years back and you had just had a record year, but at the same time both the iPhone and Android were going viral. If you had known back then what the future had in store, you would have switched to Samsung, Google or Apple and would now be an affluent star instead of a jobless dinosaur. What are the 5 signs you should have picked up?
1. Viral competitors
If your competitors have more potential customers than they can cater for and your company doesn’t: red alert.
2. Lack of leadership
Can you name any Nokia CEO before Elop? [Author of the worst CEO email ever, the one about leaving the burning oil platform but offering no place to go].
3. Many new products but no successes
Remember the first touchscreen Nokia phone? I cannot believe anybody liked that product.
4. Growth by expansion
Nokia was growing revenues not because they sold more units in Europe or the US but because they expanded very aggressively globally. Their money maker was their most basic product line that was sold in developing countries. This was in contrast with their competitors that were growing like crazy in Nokia’s key markets.
5. Old technology that is not user friendly
Remember those J2ME times? You wrote apps and packed them in a format that in theory could work everywhere. However, users had to be very determined to actually install your application because they had to go through several scary dialogues asking whether they were really sure they wanted to install this package.
Who is working in the next Nokia?
Any telecom employee!
1. Viral competitor: viral Facebook/WhatsApp and Google/Hangouts
2. Leadership: Except for Cesar Alierta, name 3 telecom CEOs?
3. New products: any new products your operator launched that you were not ashamed to show your friends? Anybody???
4. Growth by expansion: Telefonica’s cash cow = Latin America. Spain is economically dead for them. WhatsApp is growing strong in Spain.
5. Old technology: SS7. No further argument needed.
Any other industries?
Retail vs. eCommerce [Bezos against the world]. Retail banking vs. PayPal/Stripe/Square/etc. HP/Dell/IBM vs. AWS/Azure/etc. VMware vs. OpenStack.
If you think/know your company or industry is on the list, then there is nothing better to do than to start crafting your CV and to get up to speed on the competitors’ innovations. Several ex-Nokia experts found good jobs at Apple and Google in the early days. Waiting means you get to see how a new CEO can burn down a successful empire in 24 months…