Sentiment analysis beyond Tweets

October 18, 2014

Deep belief networks have made it possible to train computers to predict whether a sentence is positive, negative or neutral. Most sentiment analysis makes the headlines because it is applied to tweets. However, are there business applications beyond social networking analytics?

Here are five examples:
1) Investment banking: reading complex reports
The financial industry is shaving microseconds off high-frequency trading. However, these algorithms assume that they can predict what a single big trade will be like. What if supercomputers analysed every governmental report, news feed, etc. in real time, in a fraction of the time a human needs? Initially these algorithms could put the most important data in front of analysts, but there is no reason why automatic algorithms would not be able to make trades. Some algorithms could look for natural disasters, others at the sentiment of national bank reports.
2) Telecom: detecting defects and reading complaints
What do you do when call quality is bad? You send an SMS to the other person with your message plus some insult about your mobile provider. If your bill is too high, then you call their call centre or open a complaint on the website. Computers can detect patterns in this behaviour more efficiently than humans and can raise alerts before large groups of customers start to complain on Twitter.
3) IT: log processing and intrusion detection
Strange user behaviour can often be detected by analysing the commands that are entered on a command line. Are they neutral, positive or negative? A hacker who exploits a bug and afterwards goes into the log files to cover their tracks could be caught because their commands are highly negative.
4) Retail: product reviews
What if a customer starts leaving bad reviews? Or, even worse, average reviews because they feel bad about a certain feature or service but not about the overall experience? Would you rather have a computer tell you in advance or wait until a crowd gathers enough tweets?
5) Politics: election sentiment
Real-time dashboards with sentiment for different candidates, built by analysing all of the written press. Find out what voters feel strongly about.
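
To make the idea of labelling text as positive, negative or neutral concrete, here is a minimal sketch. It is not a deep belief network: it swaps in a naive word-list scorer purely for illustration. The word lists and the function names `classify` and `negative_share` are invented for this example and do not come from any library.

```python
import re

# Hypothetical word lists, invented for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "expensive"}


def classify(sentence):
    """Return 'positive', 'negative' or 'neutral' by counting word-list hits."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def negative_share(messages):
    """Fraction of messages classified as negative (0.0 for an empty list)."""
    if not messages:
        return 0.0
    return sum(classify(m) == "negative" for m in messages) / len(messages)
```

In the telecom example, an alert could be raised when `negative_share` over the latest batch of complaints passes a threshold you choose. A real system would replace the word lists with a trained model.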

The Cloud Winners and Losers?

October 15, 2014

The cloud is revolutionising IT. However, there are two sides to every story: the winners and the losers. Who are they going to be and why? If you can’t wait, here are the losers: HP, Oracle, Dell, SAP, RedHat, Infosys, VMWare, EMC, Cisco, etc. Survivors: IBM, Accenture, Intel, Apple, etc. Winners: Amazon, Salesforce, Google, CSC, Workday, Canonical, Metaswitch, Microsoft, ARM, ODMs.

Now the question is why, and is this list set in stone?

What has cloud changed?
If you are working in a hardware business (storage, networking, etc. included) then cloud computing is a value destroyer. Your organisation assumes that small, medium and large enterprises run, and always will run, their own data centre. As such you have been blown out of the water by the fact that cloud has changed this fundamental rule. All of a sudden Amazon, Google and Facebook go and buy specialised webscale hardware from your suppliers, the ODMs. Facebook all of a sudden open sources hardware, networking, rack and data centre designs and makes it possible for anybody to compete with you. Cloud is all about scale out and open source, hence commodity storage, software defined networks and network functions virtualisation are converting your portfolio into commodity products.
If you are an enterprise software vendor then you always assumed that companies will buy an instance of your product, customise it and manage it themselves. You did not expect that software can be offered as a service and that one platform can offer individual solutions to millions of enterprises. You also did not expect that software can be sold by the hour instead of licensed forever.
If you are an outsourcing company then you assume that companies that have invested in customising Siebel will want you to run this forever and not move to Salesforce.

Reviewing the losers
HP’s Cloud Strategy
HP has been living off printers and hardware. Meg has rightly taken the decision to separate the cash cow, stop subsidising other less profitable divisions and let it be milked till it dies. The other group will focus on Cloud, Big Data, etc. However, HP Cloud is more expensive and slower moving than any of the big three, so economies of scale will push it into niche areas or make it die. HP’s OpenStack is a product that came 2-3 years late to the market, a market that, as we will see later, is about to be commoditised. HP’s Big Data strategy? Overpay for Vertica and Autonomy and focus your marketing around the lawsuits with former owners, not any unique selling proposition. Also, Big Data can only be sold if you have an open source solution that people can test. Big Data customers are small startups that have quickly become large dotcoms. Most enterprises would not know what to do with Hadoop even if they could download it for free [YES you can actually download it for free!!!].
Oracle’s Cloud Strategy
Oracle denied that Cloud existed until their most laggard customer started asking questions. Until very recently you could only buy Oracle databases by the hour from Amazon. Oracle has been milking the enterprise software market for years, paying surprise visits to audit your usage of their database and sending you an unexpected bill. Recently they have started to cloud-wash [and Big Data wash] their software portfolio, but Salesforce and Workday are already too far ahead to be caught. A good Christmas book Larry could buy from Amazon would be “The Innovator’s Dilemma”.
Dell’s Cloud Strategy
Go to the main Dell page and you will not find the words Big Data or Cloud. I rest my case.
SAP’s Cloud Strategy
Workday is working hard on making SAP irrelevant. Salesforce overtook Siebel. Workday is likely to do the same with SAP. People don’t want to manage their ERP themselves.
RedHat’s Cloud Strategy
[I work for their biggest competitor] RedHat salesperson to its customers: There are three versions. Fedora if you need innovation but don’t want support. CentOS if you want free but no security updates. RHEL is expensive and old but with support. Compare this to Canonical. There is only one Ubuntu, it is innovative, free to use and if you want support you can buy it extra.
For Cloud the story is that RedHat is three times cheaper than VMWare and your old stuff can be made to work as long as you want it according to a prescribed recipe. Compare this with an innovator that wants to completely commoditise OpenStack [ten times cheaper] and bring the most innovative and flexible solution [any SDN, any storage, any hypervisor, etc.] that instantly solves your problems [deploy different flavours of OpenStack in minutes without needing any help].
Infosys or any outsourcing company
If the data centre is going away then the first thing that will go away is that CRM solution we bought in the 90’s from a company that no longer exists.
VMWare’s Cloud Strategy
For the company that brought virtualisation into the enterprise it is hard to admit that by putting a REST API in front of it, you don’t need their solution in each enterprise any more.
EMC’s Cloud Strategy
Commodity storage means that scale out storage can be offered at a fraction of the price of a regular EMC SAN solution. However, the big killer is Amazon’s S3, which can give you unlimited storage in minutes without worries.
Cisco’s Cloud Strategy
A Cisco router is an extremely expensive device that is hard to manage and built on top of proprietary hardware, a proprietary OS and proprietary software. What do you think will happen in a world where cheap ASIC + commodity CPU, general purpose OS and many thousands of network apps from an app store become available? Or worse, a network will no longer need many physical boxes because most of it is virtualised.
What does a cloud loser mean?
A cloud loser means that their existing cash cows will be crunched by disruptive innovations. Does this mean that losers will disappear or cannot recover? Some might disappear. However, if smart executives in these losing companies are given the freedom to bring to market new solutions that build on top of the new reality, then they might come out stronger. IBM has shown many times that it is able to do so.

Let’s look at the cloud survivors.
IBM has shown over and over again that it can reinvent itself. It sold its x86 servers in order to show its employees and the world that the future is no longer there. In the past it bought PWC’s consultancy, which will keep on inventing new service offerings for customers that are lost in the cloud.
Just like PWC’s consultancy arm within IBM, Accenture will have consultants that help people make the transition from data centre to the cloud. Accenture will not be leading the revolution but will be a “me-too” player that can deploy more people faster than others.
X86 is not going to die soon. The cloud just means others will be buying it. Intel will keep on trying to innovate in software and go nowhere [e.g. Intel's Hadoop was going to eat the world] but at least its processors will keep it above the water.
Apple knows what consumers want but they still need to prove they understand enterprises. Having a locked-in world is fine for consumers but enterprises don’t like it. Either they come up with a creative solution or the billions will not keep on growing.
What does a cloud survivor mean?
A cloud survivor means that the key cash cows will not be killed by the cloud. It does not give a guarantee that the company will grow. It just means that in this revolution, the eye of the tornado rushed over your neighbour’s house, not yours. You can still have lots of collateral damage…

IaaS = Amazon. No further words needed. Amazon will extend Gov Cloud into Health Cloud, Bank Cloud, Energy Cloud, etc. and remove the main laggard’s argument: “for legal & security reasons I can’t move to the cloud”. Amazon currently has 40-50 Anything-as-a-Service offerings; in 36 months they will have 500.
PaaS & SaaS = Salesforce. Salesforce will become more than a CRM on steroids, it will be the world’s business solutions platform. If there is no business solution for it on Salesforce then it is not a business problem worth solving. They are likely to buy competitors like Workday.
Google is the king of the consumer cloud. Google Apps has taken the SME market by storm. Its enterprise cloud, however, is not going anywhere soon. Google was too late with IaaS and, unlike its competitors, is not solving on-premise transitional problems. With Kubernetes Google will re-educate the current star programmers and over time will revolutionise the way software is written and managed, and might win in the long run. Google’s cloud future will be decided in 5-10 years. They invented most of it and showed the world 5 years later in a paper.
CSC has moved away from being a bodyshop to having several strategically important products for cloud orchestration and big data. They have a long-term future focus, employing cloud visionaries like Simon Wardley, that few others match. You don’t win a cloud war in the next quarter. It took Simon 4 years to take Ubuntu from 0% to 70% on public clouds.
What Salesforce did to Oracle’s Siebel, Workday is doing to SAP. Companies that have bought into Salesforce will easily switch to Workday in phase 2.
Since RedHat is probably reading this blog post, I can’t be explicit. But a company of 600 people that controls up to 70% of the operating systems on public clouds, more than 50% of OpenStack, brings out a new server OS every 6 months, a phone OS in the next months, a desktop every 6 months, a complete cloud solution every 6 months, can convert bare-metal into virtual-like cloud resources in minutes, enables anybody to deploy/integrate/scale any software on any cloud or bare-metal server [Intel, IBM Power 8, ARM 64] and is on a mission to completely commoditise cloud infrastructure via open source solutions in 2015 deserves to make it to the list.
Metaswitch has been developing network software for the big network guys for years. These big network guys would put it in a box and sell it at an extremely high price. In a world of commodity hardware, open source and scale out, Clearwater and Calico have catapulted Metaswitch onto the list of most innovative telecom suppliers. Telecom providers will be like cloud providers: they will go to the ODM that really knows how things work and will ignore the OEM that just puts a brand on the box. The Cloud still needs WAN networks. Google Fibre will not rule the world in one day. Telecom operators will have to spend their billions with somebody.
If you are into Windows you will be on Azure and it will be business as usual for Microsoft.
In an ODM dominated world, ARM processors are likely to move from smart phones into network and into cloud.
Nobody knows the ODMs, but they are the ones designing everybody’s hardware. Over time Amazon, Google and Microsoft might make their own hardware, but for the foreseeable future they will keep on buying it “en masse” from ODMs.
What does a cloud winner mean?
Billions and fame for some, large take-overs or IPOs for others. But the cloud war is not over yet. It is not because the first battles were won that enemies can’t invent new weapons or join forces. So the war is not over, it is just beginning. History is written today…

Amazon AWS will continue to compete with its best customers…

If you thought Amazon’s Prime Instant Films is just an exception of Amazon trying to compete with its best customer then you are wrong. This is not an exception but a rule. Simon Wardley just explained why Amazon is fast following their best customers and why more companies should do it too, even in the physical world. The summary is:

If you don’t want to launch 100 new services and assume failure on 90-95, then let others launch thousands and you commoditise the successful innovations.

So what does this mean?

It means that if you are a young startup that builds everything on AWS then they will just look at the traffic that goes through your servers. If all of a sudden they see that you are picking up more traffic than anybody else, then they will shortly launch a competing solution that commoditises your business. Since they have access to your solution they can actually look inside, see how it works and design a more optimised solution.

How do you avoid having your service commoditised by a fast follower?

First of all, move faster than anybody else. Full automation is key. If you are faster to respond to customers’ needs then you will attract all customers in a winner-takes-all market. Also follow lean startup and A/B testing. Do continuous experiments and only scale up engineering on a new feature after it has been demonstrated to be successful with customers in a small-scale test.

Second, don’t build for one cloud, build for multiple clouds. If you use a cloud orchestration solution that allows your solution to be moved from one cloud to another, then you are less likely to be trackable by one cloud provider. Treat the cloud providers as a commodity and move your workloads to wherever it makes the most financial sense. Whatever you do, don’t get locked in by proprietary services because you will have a hard time moving out. Just ask Netflix how they feel about having their platform run on top of their biggest competitor’s infrastructure without a chance of moving away soon. Don’t want to be in their shoes? Use a cloud orchestration solution. Don’t know any open source ones? Check out Juju.

Third, assume you will have fast followers when you start, so try to put barriers to entry in place. A good strategy would be to build a business on top of a network effect. Example: Facebook has over 1 billion users. The more users, the more synergies. Even if you stole all the code from Facebook and launched Headbook you would not be successful. As a consequence, network effect businesses tend to be winner-takes-all markets. The other, counterintuitive, strategy is to strategically open source parts of your solution. If you open source parts of your solution then nobody can offer a “cheaper” solution than your freely available one, so the incentive to build another solution to compete with a free solution is low. Additionally, you will get contributions from others, hence your team will be able to run faster than anybody else. Finally, open source does not mean zero revenue. Netflix has open sourced their architecture. This lowers their cost and increases their innovation speed, but since you don’t have access to their content library and the content they create themselves, it is extremely hard to compete with them. So open source those parts that help your strategy…

Instant Big Data Stream Processing = Instant Storm

September 2, 2014

Every 6 months at Canonical, the company behind Ubuntu, I work on something technical to test our tools first hand and to show others new ideas. This time around I created an Instant Big Data solution, more concretely “Instant Storm”.

Storm is now part of the Apache Foundation, but it was originally built by Nathan Marz during his time at Twitter. Storm is a stream processing engine for real-time and distributed computation. You can use Storm to aggregate real-time flows of events, to do machine learning, for analytics, for distributed ETL, etc.

Storm is built out of several services and requires Zookeeper. It is a complex solution and non-trivial to deploy, integrate and scale. The first technical project I did at Canonical was to create a Storm Juju charm. Although I was able to automate the deployment of Storm, there were still problems because users still had to read about how to actually use Storm.

Instant Storm is the first effort to resolve this problem. I created a StormDeployer charm that can read a yaml file in which a developer can specify multiple topologies. For each you specify the name of the topology, the jar file, the location in Github, how to package the jar file, etc. Afterwards by uploading the yaml file to Github or any public web server and giving it the extension .storm anybody in the world is able to reuse the topologies instantly in two steps:

1. Deploy the Storm bundle that comes with Storm + Zookeeper + StormDeployer via a simple drag and drop in Juju.

2. Get a URL to a storm file and put it into the deploy field of the service settings of the StormDeployer.


Alternatively you can use the Juju command line: 

juju set stormdeployer "deploy=http://somedomain/somefile.storm"
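
If you want to script this step, a small helper can assemble the command for you. The sketch below only builds the argument list for the `juju set` call shown above. The function name `build_deploy_command` and the checks on the URL scheme and the `.storm` extension are assumptions made for this example, not part of the StormDeployer charm.

```python
from urllib.parse import urlparse


def build_deploy_command(storm_url, service="stormdeployer"):
    """Build the argument list for `juju set <service> deploy=<url>`.

    Hypothetical helper: it only checks that the URL is http(s) and that
    its path ends in .storm, then returns the command as a list of strings.
    """
    parsed = urlparse(storm_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("expected an http(s) URL: %r" % (storm_url,))
    if not parsed.path.endswith(".storm"):
        raise ValueError("expected a file with the .storm extension")
    return ["juju", "set", service, "deploy=" + storm_url]
```

Passing the resulting list to `subprocess.check_call` would run the command, provided a working Juju environment with the StormDeployer service is available.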

There are several examples already available on Github but here is one that works for sure:

[Screenshot of the example]

The StormDeployer will download the project from Github, package the jar with Maven and upload the jar to Storm. You can check progress in the logs (/opt/storm/latest/log/deploy.log).

This is the easiest way to deploy Storm on any public cloud, on any private cloud or, if Ubuntu’s Metal-as-a-Service (MaaS) is used, on any bare metal server (X86, ARM64, Power 8). See here for Juju installation instructions.

This is a first version with some limitations. One of the really nice things to add would be to use Juju to make integrations between a topology and other charms dynamic. You can for instance create a spout or bolt that connects to the Kafka or Cassandra charms. Juju can automatically tell the topology the connection information and make updates to the running topologies should anything change. This would make it a lot more robust to run long running Storm topologies.

I am happy to donate my work to the Apache Foundation and guide anybody who wants to take ownership…

Commoditizing Big Data via Instant Big Data Solutions

September 1, 2014

In 1999 you could easily spend $1M on having a company build a static web site. A few years later any student could make you a web site. HTML became a commodity. The same commodity effect needs to happen to Big Data.

The past: build your own petabyte solution

A few years back only the happy few, extremely technically gifted companies were able to create solutions to store TBs and even PBs of data. Google started to write papers. Yahoo and Facebook started to release open source solutions. Shortly after, Big Data became a buzzword and anybody that was somebody in the IT consultancy space was talking about Hadoop.

Now: open source solutions and lots of handholding

In 2014 it is possible to download Hadoop, Spark, Storm, etc. You can even find prepackaged solutions from Hortonworks, Cloudera, MapR, Pivotal, IBM, etc. But Big Data projects are still hard. You need very bright people or you need to spend quite a lot to get anywhere. Many projects run over budget and under-deliver.

Future: instant Big Data solutions

We are ready for the next step: converting Big Data into a commodity. Several startups are launching Big Data solutions as a service. Unfortunately for many SaaS providers, having a Big Data SaaS solution is not enough. Big Data means lots of data. Data that can hold sensitive information. Data that can grow by GBs a day. This is why any SaaS Big Data solution that wants to be successful also needs an on-premise alternative.

We are also missing a portable Big Data logic container. The industry is raving about Docker. Several startups are working on making Docker containers the way to share your map-reduce logic. I predict that much more Big Data logic can be containerised and made portable. Any data scientist should be able to reuse Deep Belief or Random Forest algorithms by just reusing a container.

The other part of the puzzle that is still missing is data visualisation and manipulation tools. There are many Big Data key-value stores and map-reduce engines. However, the data visualisation and reporting space is still wide open. The Apache Foundation does not [yet] provide a drag-and-drop tool to set up dashboards, generate reports, schedule notifications, run workflows, automate data imports, etc.

Industry specific reusable assets are another part that is missing. Nobody wants to go and reinvent eCommerce recommendation algorithms every time a new Big Data platform becomes available.

However, all of this is coming at enormous speed. As soon as all the pieces of the puzzle come together, cloud orchestration solutions like Juju, ServiceMesh, Brooklyn, etc. will allow enterprises to start consuming Big Data solutions as a commodity. Instant Big Data solutions are 6-36 months away depending on your requirements.


The next IT revolution: micro-servers and local cloud

Have you ever counted the number of Linux devices at home or work that haven’t been updated since they came out of the factory? Your cable/fibre/ADSL modem, your WiFi point, television sets, NAS storage, routers/bridges, media centres, etc. Typically this class of devices combines a proprietary hardware platform, an embedded proprietary Linux and a proprietary application. If you are lucky you are able to log into a web GUI, often using the admin/admin credentials, and upload a new firmware blob. This firmware blob is frequently hard to locate on hardware suppliers’ websites. No wonder the NSA and others love to look into potential firmware bugs. They are the ideal source of undetected wiretapping.

The next IT revolution: micro-servers
However, the next IT revolution is about to happen. Those proprietary hardware platforms will soon make room for commodity multi-core processors from ARM, Intel, etc. General purpose operating systems will replace legacy proprietary and embedded predecessors. Proprietary and static single purpose apps will be replaced by marketplaces and multiple apps running on one device. Security updates will be sent regularly. Devices and apps will be easy to manage remotely. The next revolution will be around managing millions of micro-servers and the apps on top of them. These micro-servers will behave like a mix of phone apps, Docker containers, and cloud servers. Managing them will be like managing a “local cloud”, sometimes also called fog computing.

Micro-servers and IoT?
Are micro-servers some form of Internet of Things? Yes, they can be, but not all the time. If you have a smarthub that controls your home or office then it is pure IoT. However, if you have a router, firewall, fibre modem, micro-antenna station, etc. then the micro-server will just be an improved version of its predecessor.

Why should you care about micro-servers?
If you are a mobile app developer then the micro-servers revolution will be your next battlefield. Local clouds need “Angry Birds”-like successes.
If you are a telecom or network developer then the next generation of micro-servers will give you unprecedented potential to combine traffic shaping with parental control with QoS with security with …
If you are a VC then micro-server solution providers are the type of startup you want to invest in.
If you are a hardware vendor then this is the type of device or SoC you want to build.
If you are a Big Data expert then imagine the new data tsunami these devices will generate.
If you are a machine learning expert then you might want to look at algorithms and models that are easy to execute on constrained devices once they have been trained on potentially thousands of cloud servers and petabytes of data.
If you are in DevOps then your next challenge will be managing and operating millions of constrained servers.
If you are a cloud innovator then you are likely to want to look into SaaS and PaaS management solutions for micro-servers.
If you are a service provider then this is the type of solution you want to be able to manage at scale and easily integrate with.
If you are a security expert then you should start to think about micro-firewalls, anti-micro-viruses, etc.
If you are a business manager then you should think about how new “mega micro-revenue” streams can be obtained or how disruptive “micro-innovations” can give you a competitive advantage.
If you are an analyst or consultant then you can start predicting the next IT revolution and the billions the market will be worth in 2020.

The next steps…
It is still early days but expect some major announcements around micro-servers in the next months…

The next communication challenge: making money with WebRTC

At TADHack some months ago it was clear that SMS and phone calls are out and WebRTC is the new hot technology for developers. Via your browser you can talk to your salesman, doctor and coach. Your browser can be mobile. This means that video calls will be universal as soon as 4G is everywhere. Bad news for operators that will see data on their networks balloon without new revenues. Good news for users that will have a whole new world of communication opening up with voice, video, screen sharing, web apps, etc. all seamlessly integrated.

How can business be generated with WebRTC?

Per-minute call billing is out. Unless of course you are talking to a highly paid consultant who charges you by the second or minute. One-time payments, as with mobile apps, are only viable if you can embed WebRTC technology in a mobile app, not if you need to support an ongoing business. This means that we need a new subscription model for WebRTC. We need a micro-subscription model, especially for services that will be used on a long-term basis, e.g. conference facilities, next generation voice mail, etc. As always, operators will be hesitant to cannibalise a juicy per-minute business for a low margin subscription service of 1-99 cents per month. So are there others that could bill micro-subscriptions? The obvious choice would be cloud providers. They can already do hourly micro-billing on monthly cycles, hence adding some recurring element would be straightforward. So my prediction is that WebRTC will accelerate operators’ problems, and that cloud will deliver not only your IT solutions but also your communication services.
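
To see why operators hesitate, it helps to compare the two billing models side by side. The sketch below is simple arithmetic and nothing more. The function names are made up for this example, and any prices you feed in are your own assumptions; only the 1-99 cents per month range comes from the text above.

```python
def per_minute_revenue(minutes_per_month, cents_per_minute):
    """Monthly revenue in cents under per-minute billing."""
    return minutes_per_month * cents_per_minute


def break_even_minutes(subscription_cents, cents_per_minute):
    """Minutes per month at which per-minute billing earns the same as
    a flat monthly subscription. Above this usage, the subscription
    earns the operator less than per-minute billing would."""
    if cents_per_minute <= 0:
        raise ValueError("cents_per_minute must be positive")
    return subscription_cents / cents_per_minute
```

With a 99 cent subscription and a purely hypothetical per-minute price of 5 cents, the break-even point is just under 20 minutes a month, so any user who talks longer than that is worth less to the operator under the subscription.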

