Every day a new orchestration solution is being presented to the world. This post is not about which one is better but about what will happen if you embrace these new technologies.
The traditional scale-up architecture
Before understanding the new solutions, let’s understand what is broken with the current solutions. Enterprise IT vendors have traditionally made software that was sold based on the number of processors. If you were a small company you would have 5 servers; if you were big you would have 50-1000 servers. With the cloud anybody can boot up 50 servers in minutes, so reality has changed. Small companies can easily manage 10,000 servers, e.g. think of successful social or mobile startups.
Software was also written to be optimised for performance per CPU. Much traditional software comes with a long list of exact specifications that need to be followed in order for you to get enterprise support.
Big bloated frameworks are used to manage the thousands of features that are found in traditional enterprise solutions.
The container micro services future
Enterprise software is often hard to use, integrate, scale, etc. This is all the consequence of creating a big monolithic system that contains solutions for as many use cases as possible.
In come cloud, containers, micro-services, orchestration, etc. and all rules change.
The best micro services architecture is one where each important use case is reflected in one service, e.g. the shopping cart service deals with your list of purchases. However, it relies on the session storage service and the identity service to be able to work.
Each service is run in a micro services container and services can be integrated and scaled in minutes or even seconds.
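To make the shopping cart example concrete, here is a minimal Python sketch. The class and method names are hypothetical, and the in-process calls stand in for what would normally be network calls between separately deployed containers. The point is only that the cart owns one use case and delegates identity and session handling to other services.

```python
class IdentityService:
    """Hypothetical identity service: maps a login token to a user id."""

    def __init__(self, tokens):
        self._tokens = dict(tokens)

    def user_for(self, token):
        return self._tokens.get(token)


class SessionStorageService:
    """Hypothetical session storage: keeps per-user state in memory."""

    def __init__(self):
        self._sessions = {}

    def load(self, user_id):
        # Create an empty session the first time a user is seen.
        return self._sessions.setdefault(user_id, {"items": []})


class ShoppingCartService:
    """Owns one use case (the cart) and delegates everything else."""

    def __init__(self, identity, sessions):
        self._identity = identity
        self._sessions = sessions

    def add_item(self, token, item):
        user_id = self._identity.user_for(token)
        if user_id is None:
            raise PermissionError("unknown token")
        cart = self._sessions.load(user_id)["items"]
        cart.append(item)
        return list(cart)


cart = ShoppingCartService(IdentityService({"t-1": "alice"}), SessionStorageService())
cart.add_item("t-1", "book")
```

Because the cart only talks to the other two services through small interfaces, either of them could be replaced or scaled without touching the cart code.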
What benefits do micro services and orchestration bring?
In a monolithic world change means long regression tests and risks. In a micro services world, change means innovation and fast time to market. You can easily upgrade a single service. You can make it scale elastically. You can implement alternative implementations of a service and see which one beats the current implementation. You can do rolling upgrades and rolling rollbacks.
So if enterprise solutions were available as many reusable services that can all be instantly integrated, upgraded, scaled, etc. then time to market would become incredibly fast. You have an idea. You implement five alternative versions. You test them. You combine the best three in a new alternative or you use two implementations based on a specific customer segment. All this is impossible with monolithic solutions.
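Running two implementations side by side, one per customer segment, can be as simple as a routing table in front of the services. The sketch below is only an illustration: the recommendation functions and the segment names are made up, and in a real deployment the routing would sit in a load balancer or orchestration layer rather than in a dictionary.

```python
def recommend_v1(basket):
    """Hypothetical baseline implementation: suggest nothing extra."""
    return []


def recommend_v2(basket):
    """Hypothetical challenger: suggest gift wrap for any non-empty basket."""
    return ["gift wrap"] if basket else []


# Which implementation serves which customer segment (assumed segment names).
ROUTES = {"business": recommend_v1, "consumer": recommend_v2}


def recommend(segment, basket, routes=ROUTES, default=recommend_v1):
    """Dispatch the request to the implementation chosen for this segment."""
    return routes.get(segment, default)(basket)
```

Swapping an entry in `ROUTES` is the whole "upgrade", and putting the old entry back is the rollback.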
This sounds like we reinvented SOA
Not quite. SOA focused on reusable services but it never embraced containers, orchestration and cloud. By having a container like Docker or a service in the form of a Juju Charm, people can exchange best practices instantly. They can be deployed, integrated, scaled, upgraded, etc. SOA only focused on the way services were discovered and consumed. Micro services focus additionally on global reuse, scaling, integration, upgrading, etc.
We are not quite there yet. Standards are still being defined, not in the traditional standardisation bodies but via market adoption. However, expect in the next 12 months to see micro services being orchestrated at large scale via open source solutions. As soon as the IT world has the solution, industry specific solutions will emerge. You will see communication solutions, retail solutions, logistics solutions, etc. Traditional vendors will not be able to keep pace with the innovation speed of a micro services orchestrated industry specific solution. Expect the SAPs, Oracles, etc. of this world to be in shock when all of a sudden nimble HR, recruiting, logistics, inventory, supplier relationship management solutions, etc. emerge that are offered as SaaS and on-premise, often as open source. Super easy to use, integrate, manage, extend, etc. It will be like LEGO starting a war against custom made toys. You already know who will be able to be more nimble and flexible…
Cisco came up with the term of Fog Computing and The Wall Street Journal has endorsed it, so I guess Fog Computing will become the next hype.
What is Fog Computing?
The Internet of Things will embed connectivity into billions of devices. Common thinking says your IoT device is connected to the cloud and shares data for Big Data analytics. However, if your Fitbit starts sending your heartbeat every 5 seconds, your thermometer tells the cloud every minute that it is still 23.4 degrees, your car tells the manufacturer its hourly statistics, farmers measure thousands of acres, hospitals measure remote patients’ health continuously, etc. then your telecom operator will go bankrupt because their network is not designed for this IoT Data Tsunami.
Fog Computing is about taking decisions as close to the data as possible. Hadoop and other Big Data solutions have started the trend to bring processing close to where the data is and not the other way around. Now Fog Computing is about doing the same on a global scale. You want decisions to be taken as close as possible to where the data is generated, and you want to stop that data from reaching global networks. Only valuable data should be travelling on global networks. Your Fitbit could send average heartbeat reports every hour or day and only send alerts when your heartbeat has passed a threshold for some amount of time.
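The Fitbit idea can be sketched in a few lines of Python. This is an illustration under assumptions, not how any real wearable works: the class name, the threshold of 150 bpm and the window sizes are all invented. The device buffers raw readings locally and only returns a message for the network when a summary is due or when the heartbeat has stayed above the threshold for several readings in a row.

```python
from statistics import mean


class HeartbeatFilter:
    """Runs on the device: buffers readings, emits only summaries and alerts."""

    def __init__(self, threshold_bpm=150, sustained_readings=3, report_every=5):
        self.threshold_bpm = threshold_bpm
        self.sustained_readings = sustained_readings
        self.report_every = report_every
        self._buffer = []
        self._over = 0  # consecutive readings above the threshold

    def add(self, bpm):
        """Return the messages worth sending upstream (usually none)."""
        out = []
        self._buffer.append(bpm)
        self._over = self._over + 1 if bpm > self.threshold_bpm else 0
        # Alert once, at the moment the streak reaches the required length.
        if self._over == self.sustained_readings:
            out.append({"type": "alert", "bpm": bpm})
        # Replace the raw readings by one average per reporting window.
        if len(self._buffer) == self.report_every:
            out.append({"type": "summary", "avg_bpm": mean(self._buffer)})
            self._buffer.clear()
        return out
```

With `report_every=5`, five raw readings become at most one summary message plus an occasional alert, which is the traffic reduction the paragraph above argues for.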
How to implement Fog Computing?
Fog Computing is best done via machine learning models that get trained on a fraction of the data in the Cloud. After a model is considered adequate, the model gets pushed to the devices. Having a Decision Tree or some Fuzzy Logic or even a Deep Belief Network run locally on a device to take a decision is a lot cheaper than setting up an infrastructure in the Cloud that needs to deal with raw data from millions of devices. So there are economic advantages to using Fog Computing. What is needed are easy to use solutions to train models and send them to highly optimised and low resource intensive execution engines that can be easily embedded in devices, mobile phones and smart hubs/gateways.
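As a rough illustration of "train in the Cloud, decide on the device", the sketch below treats a trained decision tree as plain data that can be shipped to a device as JSON and evaluated by a tiny engine. The JSON layout, the feature names and the thresholds are all assumptions made up for this example; a real system would use whatever export format its training tool provides.

```python
import json

# A tiny decision tree as it might arrive from the cloud (hypothetical format).
MODEL_JSON = """
{"feature": "temp_c", "threshold": 30.0,
 "low": {"decision": "ignore"},
 "high": {"feature": "minutes_above", "threshold": 10.0,
          "low": {"decision": "ignore"},
          "high": {"decision": "alert"}}}
"""


def decide(node, reading):
    """Walk the tree to a leaf; values <= threshold follow the 'low' branch."""
    while "decision" not in node:
        branch = "low" if reading[node["feature"]] <= node["threshold"] else "high"
        node = node[branch]
    return node["decision"]


model = json.loads(MODEL_JSON)
decide(model, {"temp_c": 23.4, "minutes_above": 0})
```

The execution engine is a single loop, so it fits on very small hardware, and updating the device's behaviour only means sending a new JSON document.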
Fog Computing is also useful for Non-IoT
Network elements should also become a lot more intelligent. When was the last time you were at a large event with many people around you? Can you imagine any event in the last 24 months where WiFi was working brilliantly? Most of the time WiFi works in the morning when people are still getting in, but soon after it stops working. Fog Computing can be the answer here. You only need to analyse data patterns and take decisions on what takes up lots of data. Chances are that all the mobiles, tablets and laptops that are connected to the event WiFi have Dropbox or some other large file sharing service enabled. You take some pictures of things at the event and, since you are on WiFi, the network gets saturated by a photo sharing service that is not really critical for the event. Fog Computing would detect this type of bandwidth abuse and would limit it or even block it. At the moment this has to be done manually but computers would do a much better job of it. So Software Defined Networking should be all over Fog Computing.
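A very simplified version of the event WiFi decision could look like the following Python sketch. It assumes the access point can already label traffic per service and count bytes, which is the hard part in practice. The function name, the service names and the 20% limit are invented for illustration.

```python
from collections import defaultdict


def plan_throttling(flows, critical_services, share_limit=0.2):
    """Return the services to rate-limit, sorted by name.

    flows: iterable of (service_name, bytes_sent) observations.
    A non-critical service is throttled once it exceeds share_limit
    of all observed traffic.
    """
    totals = defaultdict(int)
    for service, sent in flows:
        totals[service] += sent
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    return sorted(
        service
        for service, sent in totals.items()
        if service not in critical_services and sent / grand_total > share_limit
    )


plan_throttling(
    [("photo-sync", 700), ("event-app", 200), ("mail", 100)],
    critical_services={"event-app"},
)
```

The decision is taken on the access point itself, so none of the measurement data has to leave the venue.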
Telecom Operators and Equipment Manufacturers Should Embrace Fog Computing
Telecom operators should heavily invest in Fog Computing by making Open Source standards that can be easily embedded in any device and managed from any cloud. When I say standards, I don’t mean ETSI. I mean organise a global Fog Computing competition with a $10 million award for the best open source Fog Computing solution. Make a foundation around it with a very open license, e.g. Apache License. Invite and if necessary oblige all telecom and general network suppliers to embed it.
The alternatives are…
Not solving this problem will provoke heavy investment in global networks that carry 90% junk data and an IoT Data Tsunami. Solving this problem via network traffic shaping is a dangerous play in which privacy and net neutrality will come up sooner rather than later. You cannot block Dropbox, YouTube or Netflix traffic globally. It is a lot easier if everybody blocks what is not needed or at least minimises such traffic themselves. Most people have no idea how to do it. Creating easy to use open source tools would be a good first step…
An online bookstore not only redefined retail and content distribution and gave the postal services a second chance, it is also becoming the world’s data centre. The best way to find out if the hot school girl is open for a new relationship is now showing IT companies how to build servers & routers and telecom giants how people like to communicate. An online search and advertisement company has revolutionised how you find anything from text, images, location, etc. It redefined mobile computing together with a fruit-like branded company. It has global networks that even the biggest telecom incumbents can only dream of. It has cars that drive alone. It has body accessories that put science fiction authors next to historians.
At the same time stamps, travel agents, maps, telephone books, book publishers, bill boards, broadcasters, movie theatres, journalists, photo film, media storage, video cameras, taxi services, estate agents, high street shops, etc. have changed and not always for better.
If you work for a “traditional” company, are you sure that in five years your company will still be in business, or can it be that some unknown small company launches a product that makes your company’s best products look like they belong in the history museum? Remember Nokia phones!!! Five years ago they had record sales…
If software disruptors have so much power, why aren’t companies hiring chief disruption officers? These would be senior executives whose goal it is to set up disruptive new product families that are owned by traditional players but are allowed to question any industry rules and launch cannibalising offerings, often as independent companies.
It is a lot better that a big bank owns a Bitcoin exchange, a peer to peer lender, a crowd funded venture capitalist, a mobile payment provider, a micro payment cloud broker, a mobile app currency exchange, a machine learning financial adviser, etc. than being put out of business by any disruptive challenger.
Of course you can always copy the telecom model. Have everybody in your company look for potential cost reductions in the form of virtualized networks, squeezing (and killing) suppliers, etc. while your (mobile) broadband network is 12-36 months away from a data tsunami in the form of 4K streaming video, free mobile video calls, Fitbits telling the cloud every minute (or second) your average heartbeat and twenty other vital signs, free frequency crowd sourced mobile networks, etc. At a time when your business model has not seen a margin improvement in 10 years, your costs are exploding and your revenue will melt faster than ice in the Sahara.
Why don’t you think about hiring a chief disruption officer before you need to hire a chief miracle officer…
Why is it that a 5-person startup can bring an industry to its knees? There are many answers but open source and horizontal scaling are good answers. Traditionally companies have made solutions that were proprietary and optimised for deployments on a small number of expensive servers. It took traditional IT departments quite some time to integrate those solutions and they would not touch them for multiple years. The result is that software companies would add a long list of features because customers wanted to be sure the future would be assured. These solutions would be “featureware”. The market leader would have a long list of features and could solve any problem given enough time and money. The more the better.
There is no better example than the telecom industry. Telecom solutions are overloaded with features, hard to use & integrate and as a result very expensive.
If you see this pattern as a disruptor or challenger then you should be extremely pleased. It means that brains can beat the dinosaurs.
Make your solution open source and make it horizontally scalable. Why? Traditional software vendors optimised for specific expensive hardware. Their thinking is that to grow you need a bigger box. Their licences are expensive per socket so customers would be buying the biggest and most expensive servers possible.
If your solution, however, installs on any public or private cloud, scales horizontally and is open source, then customers that want to save costs (almost everybody in almost all industries!!!) will have their R&D departments try your solution. The temptation is just too big. Make your software easy to use by using the latest web technologies and by focusing on less is more, and you will be a winner.
Let me give you an example. Metaswitch is by all means a traditional telecom solution provider that has been playing according to the rules. One day, however, they decided that they wanted to be different. They made an open source IMS solution (something all telecoms use to handle calls) and used the latest dotcom solutions like memcached and Cassandra. The result is that any telecom R&D department is now testing Clearwater. Via working with Canonical and their award winning open source product Juju, Clearwater will be able to be deployed, integrated and scaled in minutes everywhere. So what traditional vendors do in 12 months for many millions you can now do for free in minutes. However, nobody will put the free version into production without support, hence customers will pay for a commercially supported version.
Does this only apply to telecom? No! In industrial domains, banking, retail, media, etc. there are many similar potential examples that are coming. Brains will beat the dinosaurs. So if you are willing to be a challenger and convert a billion dollar market into many millions but flowing to only one company, there has never been a better time to become a blue ocean strategist…
Normally I write blog posts in which I answer questions. This time I would like to have somebody else provide the answer. Why is IT solving problems nobody has experienced yet?
I attend a lot of professional events around cloud, big data, IoT, etc. I hardly ever meet customers there. Mostly I meet suppliers that show me the solution to a problem that perhaps Google will experience in 5 years. I am exaggerating, but most IT problems being discussed are about scaling beyond terabytes. The problem is that most enterprises can’t find a quick way to set up a sub domain or to provision a new user in a central identity management system. Most enterprises need weeks if not months to do tasks that IT companies solved 5 or even 10 years ago in minutes. So why is it that trivial problems seem to capture enterprise attention? Just look at what is currently hot! Tableau Software, Amazon Redshift and dotCloud’s Docker. You would say that SAS, IBM, Teradata, Canonical, RedHat, Solaris/Sun/Oracle, etc. would have solved reporting, data storage for analytics and packaging Linux software. The market does not seem to agree. Can it be that the initial solutions were aimed at early adopters and more and more features were added? And that the result is that by the time the majority started to use the “solution” it was already too complex?
Why do companies like complex solutions? Why are early adopters the drivers of people’s roadmap and not the majority? What does the IT industry need to do to better understand its enterprise customers? What are enterprise customers telling the IT industry? Are they saying one thing and doing another?
Data volumes are growing exponentially. Unstructured data from Twitter, LinkedIn, mailing lists, etc. has the potential to transform many industries if it could be combined with structured data. Machine learning, natural language processing, sentiment analysis, etc.: everybody talks about them, hardly anybody is really using them at scale. Too many people, when they talk about Big Data, unfortunately start with the answer and then ask what the problem is. The answer seems to be Hadoop. News flash: Hadoop is not the answer and if you start from the answer to look for problems then you are doing it wrong.
What are Common Data Problems?
Most Big Data problems are about storage and reporting. How do I store all the exponentially growing data in such a way that business managers can get to in seconds when they need it? Ad-hoc reporting, adequate prediction, and making sense of the exponentially growing data stream are the key problems.
Big Data Storage?
Do you have relational data, unstructured data, graph data, etc.? How do you store different types of data and make it available inside an enterprise? The basis for big data storage is cloud storage technology. You want to store any type of data and be able to quickly scale up storage. RedHat did not buy Inktank for $175M because traditional storage has solved all of today’s problems. Premium SAN and other storage technologies are old school. They are too expensive for Big Data. They were designed with the idea that each byte of data is critical for an enterprise. Unfortunately this is no longer the case. You mind losing transactional sales data. You don’t mind so much losing sample tweets you bought from Datasift or Apache log files from an internal low-impact server. This is where cloud storage solutions like Inktank’s Ceph allow commodity storage to be built that is reliable, scalable and extremely cost effective. Does this mean you don’t need SANs any more? Wrong again. TV did not kill Radio. Same here.
Cloud storage technologies are needed because each type of data behaves differently. If you have log data that is only appended then HDFS is fine. If you have read-mostly data then a relational database is ideal. If you have write-mostly data then you need to look at NoSQL. If you need heavy read-and-write then you need strong Big Data architecture skills. What is more important: short latency, consistency, reliability, cheap storage, etc.? Each of these means that the solution is different. Low latency means in-memory or SSD. Consistency means transactional. Reliability means replication. You can now even find databases that trade exactness for speed, like BlinkDB. There is no longer one size fits all. Oracle is no longer the answer to everybody’s data questions.
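The rules of thumb in the paragraph above can be written down as two small lookup tables. The Python sketch below does nothing more than encode the text; the function and key names are invented, and a real architecture decision obviously needs more inputs than a single label.

```python
# Access pattern -> storage family, as stated in the text above.
STORAGE_BY_PATTERN = {
    "append-only": "HDFS",
    "read-mostly": "relational database",
    "write-mostly": "NoSQL store",
    "heavy-read-write": "custom Big Data architecture",
}

# Priority -> technique, again taken from the text above.
TECHNIQUE_BY_PRIORITY = {
    "low-latency": "in-memory or SSD",
    "consistency": "transactions",
    "reliability": "replication",
}


def suggest_storage(access_pattern, priority):
    """Combine both rules of thumb into one (storage, technique) suggestion."""
    try:
        return STORAGE_BY_PATTERN[access_pattern], TECHNIQUE_BY_PRIORITY[priority]
    except KeyError as missing:
        raise ValueError(f"no rule of thumb for {missing}") from None


suggest_storage("append-only", "reliability")
```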
What will companies need? Companies need cloud storage solutions that offer these different storage capabilities like a service. Amazon’s RDS, DynamoDB, S3 and Redshift are examples of what companies need. However companies need more flexibility. They need to be able to migrate their data between public cloud providers to optimise their costs and have added security. They also need to be able to store data in private local clouds or nearby hosted private clouds for latency or regulatory reasons.
The future of ETL & BI
Traditional ETL will see a revolution. ETL never worked. Business managers don’t want to go and ask their IT department to make a change in a star schema in order to import some extra data from the Internet, followed by updates to reports and dashboards. Business managers want an easy to use tool that can answer their ad-hoc queries. This is the reason why Tableau Software + Amazon Redshift are growing like crazy. However, if your organisation is starting to pump terabytes of data into Redshift, please be warned: the day will come when Amazon sends you a bill that your CxO will not want to pay and he/she will want you to move out of Amazon. What will you do then? Do you have an exit strategy?
The future of ETL and BI will be web tools that any business manager can use to create ad-hoc reports. The Office generation wants to see dynamic HTML5 GUIs that allow them to drag-and-drop data queries into ad-hoc reports and dashboards. If you need training then the tool is too difficult.
These next-generation BI tools will need dynamic back-office solutions that allow storing real-time, graph, blob, historical relational, unstructured, etc. data into a commonly accessible cloud storage solution. Each one will be hosted by a different cloud service but they will all be an API away. Software will be packaged in such a way that it knows how to export its own data. Why do you need to know where Apache stores the access and error logs and in which format? Apache should be able to export whatever interesting information it contains in a standardised way into some deep storage. Machine learning should be used to make decisions on how best to store that data for ad-hoc reporting afterwards. Humans should no longer be involved in this process.
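As a small illustration of software exporting its own data, the Python sketch below turns a line in Apache's Common Log Format into a JSON document that any downstream store could ingest. The regular expression follows the documented field order of that format; the shape of the JSON output is an assumption for this example, not an existing standard.

```python
import json
import re

# Apache Common Log Format: host ident user [time] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def export_line(line):
    """Turn one access-log line into a JSON document, or None if it does not match."""
    match = CLF.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # Apache writes "-" when no bytes were sent.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return json.dumps(record, sort_keys=True)


export_line('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326')
```

If the web server shipped something like this itself, nobody downstream would need to know where the log files live or how they are formatted.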
Talking about machine learning: with the volumes of data growing from gigabytes into petabytes, traditional data scientists will not scale. In many companies a data scientist is similar to a report monkey: “Find out why in region X we sold Y% less”, etc. Data scientists should not be synonymous with dynamic report generators. Data scientists should be machine learning experts. They should tell the computer what they want, not how they want it. Today’s data scientists pride themselves on knowing R, Python, etc. These tools are too low-level to be usable at scale. There are just not enough people in the world to learn R. Data is growing exponentially; the number of R experts can at best grow linearly. What we need are machine learning GUI solutions like RapidMiner Studio but supported by petabyte cloud solutions. A short term solution could be an HTML5 GUI version of RapidMiner Studio that connects to a back-end set of cloud services that use some of the nice Apache Spark extensions for machine learning, streaming, Big Data warehousing/SQL, graph retrieval, etc. or solutions based on Druid.io. For sure there are other solutions possible.
What is important is that companies start realising that data is becoming a strategic weapon. Those companies that are able to collect more of it and convert it into valuable knowledge and wisdom will be tomorrow’s giants. Most average machine learning algorithms become substantially better just by throwing more and more data at them. This means that having a Big Data architecture is not as critical as having the best trained models in the industry and continuing to train them. There will be a data divide between the haves and have-nots. Google, Facebook, Microsoft and others have been buying any startup that smells like Deep Belief Networks. They have done this for a good reason. They know that tomorrow’s algorithms and models will be more valuable than diamonds and gold. If you want to be one of the haves then you need to invest in cloud storage now. You need to have massive historical data volumes to train tomorrow’s algorithms and start building the foundations today…
This week a new Juju Lab was launched: Instant Single Sign-on and 2-Factor Authentication. The Juju Lab is a new direction for Juju Innovation in which a community of contributors builds a revolutionary solution for a common problem. This time the problem is how to make the world more secure instantly. Juju Labs works like Kickstarter. Either goals are met and the project becomes a full Juju solution or the project dies.
Future Juju Labs are being considered. Everything from enterprise Java auto-tuning, instantly scaling PHP, instant legacy integration, instant BI, etc. As long as it solves a common problem in an exponentially better way, creating a Juju Lab is an option.
The main problem is how to quickly evaluate which common problem to tackle first. Any ideas are welcome…