Archive

Posts Tagged ‘big data’

The next IT revolution: micro-servers and local cloud

Have you ever counted the number of Linux devices at home or work that haven’t been updated since they came out of the factory? Your cable/fibre/ADSL modem, your WiFi point, television sets, NAS storage, routers/bridges, media centres, etc. Typically this class of devices hosts a proprietary hardware platform, an embedded proprietary Linux and a proprietary application. If you are lucky you are able to log into a web GUI often using the admin/admin credentials and upload a new firmware blob. This firmware blob is frequently hard to locate on hardware supplier’s websites. No wonder the NSA and others love to look into potential firmware bugs. They are the ideal source of undetected wiretapping.

The next IT revolution: micro-servers
The next IT revolution is about to happen however. Those proprietary hardware platforms will soon give room for commodity multi-core processors from ARM, Intel, etc. General purpose operating systems will replace legacy proprietary and embedded predecessors. Proprietary and static single purpose apps will be replaced by marketplaces and multiple apps running on one device. Security updates will be sent regularly. Devices and apps will be easy to manage remotely. The next revolution will be around managing millions of micro-servers and the apps on top of them. These micro-servers will behave like a mix of phone apps, Docker containers, and cloud servers. Managing them will be like managing a “local cloud” sometimes also called fog computing.

Micro-servers and IoT?
Are micro-servers some form of Internet of Things. Yes they can be but not all the time. If you have a smarthub that controls your home or office then it is pure IoT. However if you have a router, firewall, fibre modem, micro-antenna station, etc. then the micro-server will just be an improved version of its predecessor.

Why should you care about micro-servers?
If you are a mobile app developer then the micro-servers revolution will be your next battlefield. Local clouds need “Angry Bird”-like successes.
If you are a telecom or network developer then the next-generation of micro-servers will give you unseen potentials to combine traffic shaping with parental control with QoS with security with …
If you are a VC then micro-server solution providers is the type of startups you want to invest in.
If you are a hardware vendor then this is the type of devices or SoCs you want to build.
If you are a Big Data expert then imagine the new data tsunami these devices will generate.
If you are a machine learning expert then you might want to look at algorithms and models that are easy to execute on constraint devices once they have been trained on potentially thousands of cloud servers and petabytes of data.
If you are a Devop then your next challenge will be managing and operating millions of constraint servers.
If you are a cloud innovator then you are likely to want to look into SaaS and PaaS management solutions for micro-servers.
If you are a service provider then this is the type of solutions you want to have the capabilities to manage at scale and easily integrate with.
If you are a security expert then you should start to think about micro-firewalls, anti-micro-viruses, etc.
If you are a business manager then you should think about how new “mega micro-revenue” streams can be obtained or how disruptive “micro- innovations” can give you a competitive advantage.
If you are an analyst or consultant then you can start predicting the next IT revolution and the billions the market will be worth in 2020.

The next steps…
It is still early days but expect some major announcements around micro-servers in the next months…

Software Defined Everything

The other day Taxis in London where on strike because Uber was setting up shop in London. Do you know a lot of people that still send paper letters? Book holiday flights via a travel agent?  Buy books in book stores? Rent DVD movies?

5 smart programmers can bring down a whole multi-billion industry and change people’s habits. It has long been known that any company that changes people habits becomes a multi-billion company. Cereals for breakfast, brown coloured sweet water, throw-away shaving equipment, online bookstore, online search & ads, etc. You probably figured out the name of the brand already.

Software Defined Everything is Accelerating

The Cloud, crowd funding, open source, open hardware, 3D printing, Big Data, machine learning, Internet of Things, mobile, wearables, nanotechnology, social networks, etc. all seem individual technology innovations. However things are changing.

Your Fitbit will send your vital signs via your mobile to the cloud where deep belief networks analyse it and find out that you are stressed. Your smart hub detects you are approaching your garage and your Arduino controller linked to your IP camera encased in a 3D printing housing detects that you brought a visitor. A LinkedIn and Facebook image scan finds that your visitor is your boss’s boss. Your Fitbit and Google Calendar have given away over the last months that whenever you have a meeting with your boss’s boss, you get stressed. Your boss’s boss music preferences are guesses based on public information available on social networks. Your smart watch gets a push notification with the personal profile data that could be gathered from your boss’s boss: he has two boys and a girl, got recently divorced, the girl recently won a chess award, a facebook tagged picture shows your boss in a golf tournament three weeks ago, an Amazon book review indicates that he likes Shakespeare but only the early work, etc. All of a sudden your house shows pictures of that one time you plaid golf. Music plays according to what 96.5% of Shakespeare lovers like from a crowd-funded bluetooth in-house speaker system…

It might be a bit farfetched but what used to be disjoint technologies and innovations are fast coming together. Those companies that can both understand the latest cutting-edge innovations and be able to apply them to improve their customer’s life or solve business problems will have a big competitive edge.

Software is fast defining more and more industries. Media, logistics, telecom, banking, retail, industrial, even agriculture will see major changes due to software (and hardware) innovations.

What should you do? If you are technology savvy?

You should look for customers that want faster horses and draw a picture of a car. Make a slide deck. Get feedback and adjust. Build a prototype. Get feedback and adjust. Create a minimum valuable product. Get feedback and adjust… Change the world.

If you have a business problem and money but are not technology savvy?  

Organise a competition in which you ask people to solve your problem and give prices to the best solution. You will be amazed by what can come out of these.

If you work in a traditional industry and think software is not going to redefine what you do?

Call your investment manager and ask them if you have enough money in the bank to retire in case you would get fired next year and wouldn’t be able to find a job any more. If the answer is no! Then start reading the top of the blog post again…

The future of Big Data is linked to Cloud

Data volumes are growing exponentially. Unstructured data from Twitter, LinkedIn, Mailling Lists, etc. has the potential to transform many industries if it could be combined with structured data. Machine learning, natural language processing, sentiment analysis, etc. everybody talks about them, hardly anybody is really using them at scale. Too many people when they talk about Big Data unfortunately start with the answer and then ask what the problem it. The answer seems to be Hadoop. News flash: Hadoop is not the answer and if you start from the answer to look for problems then you are doing it wrong.

What are Common Data Problems?

Most Big Data problems are about storage and reporting. How do I store all the exponentially growing data in such a way that business managers can get to in seconds when they need it? Ad-hoc reporting, adequate prediction, and making sense of the exponentially growing data stream are the key problems.

Big Data Storage?

Do you have relational data, unstructured data, graph data, etc.? How do you store different types of data and make it available inside an enterprise? The basics for big data storage is cloud storage technology. You want to store any type of data and be able to quickly scale up storage. RedHat did not buy Inktank for $175M because traditional storage has solved all of today’s problems. Premium SAN and other storage technologies are old school. They are too expensive for Big Data. They were designed with the idea that each byte of data is critical for an enterprise. Unfortunately this is no longer the case. You mind loosing transactional sales data. You don’t mind so much loosing sample tweets you bought from Datasift or Apache log files from an internal low-impact server. This is where cloud storage solutions like Inktank’s Ceph allow commodity storage to be built that is reliable, scalable and extremely cost effective. Does this mean you don’t need SANs any more? Wrong again. TV did not kill Radio. Same here.

Cloud storage technologies are needed because each type of data behaves differently. If you have log data that only is appended then HDFS is fine. If you have read-mostly data then a relational database is ideal. If you have write-mostly data then you need to look at NoSQL. If you need heavy read-and-write then you need strong Big Data architecture skills. What is more important: short latency, consistency, reliability, cheap storage, etc.? Each of these means that the solution is different. No latency means in-memory or SSD. Consistency means transactional. Reliability means replication. You can even now find inconsistent databases like BlinkDB. There is no longer one size fits all. Oracle is no longer the answer to everybody’s data questions.

What will companies need? Companies need cloud storage solutions that offer these different storage capabilities like a service. Amazon’s RDS, DynamoDB, S3 and Redshift are examples of what companies need. However companies need more flexibility. They need to be able to migrate their data between public cloud providers to optimise their costs and have added security. They also need to be able to store data in private local clouds or nearby hosted private clouds for latency or regulatory reasons.

The future of ETL & BI

Traditional ETL will see a revolution. ETL never worked. Business managers don’t want to go and ask their IT department to make a change in a star schema in order to import some extra data from the Internet followed by updates to reports and dashboards. Business managers want an easy to use tool that can answer their ad-hoc queries. This is the reason why Tableau Software + Amazon Redshift are growing like crazy. However if your organisation is starting to pump terabytes of data into Redshift, please be warned: The day will come that Amazon sends you a bill that your CxO will not want to pay and he/she will want you to move out of Amazon. What will you do then? Do you have an exit strategy?

The future of ETL and BI will be web tools that any business manager can use to create ad-hoc reports. The Office generation wants to see dynamic HTML5 GUIs that allow them to drag-and-drop data queries into ad-hoc reports and dashboards. If you need training then the tool is too difficult.

These next-generation BI tools will need dynamic back-office solutions that allow storing real-time, graph, blob,  historical relational, unstructured, etc. data into a commonly accessible cloud storage solution. Each one will be hosted by a different cloud service but they will all be an API away. Software will be packaged in such a way that it knows how to export its own data. Why do you need to know where Apache stores the access and error logs and in which format? Apache should be able to export whatever interesting information it contains in a standardised way into some deep storage. Machine learning should be used to make decisions on how best to store that data for ad-hoc reporting afterwards. Humans should no longer be involved in this process.

Talking about machine learning. With the volumes of data growing from gigabytes into petabytes, traditional data scientists will not scale. In many companies a data scientist is similar to a report monkey: “Find out why in region X we sold Y% less”, etc. Data scientist should not be synonymous for dynamic report generators. Data scientists should be machine learning experts. They should tell the computer what they want, not how they want it. Today’s data scientists pride themselves they know R, Python, etc. These tools are too low-level to be usable at scale. There are just not enough people in the world to learn R. Data is growing exponentially, R experts at best can grow linear. What we need are machine learning GUI solutions like RapidMiner Studio but supported by Petabyte cloud solutions. A short term solution could be an HTML5 GUI version of RapidMiner Studio that connects to a back-end set of cloud services that use some of the nice Apache Spark extensions for machine learning, streaming, Big Data warehousing/SQL, graph retrieval, etc. or solutions based on Druid.io. For sure there are other solutions possible.

What is important is that companies start realising that data is becoming a strategic weapon. Those companies that are able to collect more of it and convert it into valuable knowledge and wisdom will be tomorrow’s giants.  Most average machine learning algorithms become substantially better just by throwing more and more data at them. This means that having a Big Data architecture is not as critical as having the best trained models in the industry and continue to train them. There will be a data divide between the have’s and have-not’s. Google, Facebook, Microsoft and others have been buying any startup that smells like Deep Belief Networks. They have done this with a good reason. They know that tomorrow’s algorithms and models will be more valuable than diamonds and gold. If you want to be one of the have’s then you need to invest in cloud storage now. You need to have massive historical data volumes to train tomorrow’s algorithms and start building the foundations today…

 

Solving the pressing need for Linux talent…

September 17, 2013 Leave a comment

The Linux Foundation shared the below infographics recently.  Click on it and you get the associated report. The short message is, if you are an expert in Linux you are in high demand because companies don’t find enough experts due to the Cloud and Big Data boom.

Unless cloning machines are discovered later this year, quickly expanding the number of Linux experts is unlikely to happen. This means total cost of ownership for enterprises is likely to rise. This is ironic since Linux is all about open source and providing some of the most amazing solutions for free.

The obvious alternative is to focus on Microsoft products. They are relatively cheap in total cost of ownership since licenses are “payable” and average Windows skills can be easier found.

However Microsoft is loosing the server war, especially in the web application space. So this is not a winning strategy if you are going to do Cloud.

How to solve the pressing need for Linux talent?

The only possible strategy is to lower the number of experts needed per company. Larger companies always will need some but they should be focused on the “interesting high-value tasks”. This concept of interesting and high-value is key. With the number of cloud servers exploding, we can not expect the number of experts to explode.

Open source products like Puppet and Chef have helped to alleviate the pain for the more “skilled” companies. One DevOp was able to manage more than ten times as many machines as before. Unfortunately these server provisioning tools are not for the faint of heart. They  require experts that know both administration and coding.

It is time for the next generation of tools. Ubuntu, the number #1 Cloud operating system, is leading the way with Juju. If Linux wants to continue to be successful then the common problems, the boring problems, the repetitive problems, etc. should be solved. Solved by Linux gurus in such a way that we, the less IT gifted, can get instant solutions for these common problems.

We need a Linux democracy in which the lesser skilled, but unfortunately the majority, can instantly reuse best-in-class blueprint solutions. Juju is a new class of tools that gives you instant solutions. For all those common problems: scaling a web application, monitoring your infrastructure, sharding MongoDB, replicating a database, installing a Hadoop cluster, setting up continuous integration, etc. Juju can offer solutions. The individual software components have been “charmed”. A Charm allows the software to be instantly deployed, integrated and scaled. However the real revolution is just starting. Juju will have bundles pretty soon. Technically speaking, a bundle is a collection of pre-configured and integrated Charms. In lays speak, a bundle is an instant solution for a common problem. You instantly deploy a bundle [one command or drag-and-drop] and you get a blue-print solution. Since Juju is open source, the community can create as many instant solutions as there are common problems.

So if you want to scale your IT solutions without stretching neither your budget or cloning your employees and without the lock-in of any proprietary and expensive commercial software, then you should try Juju today. Play with the GUI or install Juju today.

MapD – Massively Parallel GPU-based database

An MIT student recently created a new type of massively distributed database, one that runs on graphical processors instead of CPUs. Mapd, as it has been called, makes use of the immense computational power available in off-the-shelf graphics cards that can be found in any laptop or PC. Mapd is especially suitable for real-time quering, data analysis, machine learning and data visualization. Mapd is probably only one of many databases that will try new hardware configurations  to cater for specific application use cases.

Alternative approaches could focus on large sets of cheap mobile processors, Parallella processors, Raspberry PIs, etc. all stitched together. The idea would be to create massive processing clouds based on cheap specialized hardware that could beat traditional CPU Clouds both in price and performance at least for some specific use cases…

5 Strategies for Making Money with the Cloud

January 22, 2013 1 comment

Everybody is hearing Cloud Computing on the television now. Operators will store your contacts in the Cloud. Hosting companies will host your website in the Cloud. Others will store your photos in the Cloud.

However how do you make money with the Cloud?

The first thing is to forget about infrastructure and virtualization. If you are thinking that in 2013, the world needs more IaaS providers then you haven’t seen what is currently on offer (Amazon, Microsoft, Google, Rackspace, Joyent, Verizon/Terramark, IBM, HP, etc.).

So what are alternative strategies:

1) Rocket Internet SaaS Cloning

Your best hope is SaaS and PaaS. The best markets are non-English speaking markets. We have seen an explosion of SaaS in the USA but most have not made it to the rest of the world yet. Only some bigger SaaS solutions (Webex, GoToMeeting, Office 365, etc.)  and PaaS platforms (Salesforce, Workday, etc.) are available outside of the US and the UK. However most SaaS and PaaS solutions are currently still English-only. So the quickest solution to make some money is to just copy, translate and paste some successful English-only SaaS product. If you do not know how to copy dotcoms, take a look at how the Rocket Internet team is doing it. Of course you should always be open for those annoying problems everybody has that could use a new innovative solution and as such create your own SaaS.

2) SaaSification

During the gold rush, be the restaurant, hotel or tool shop. While everybody is looking for the SaaS gold, offer solutions that will save gold diggers time and money. SaaSification allows others to focus on building their SaaS business, not on reinventing for the millionth time a web page, web store, email server, search, CRM, monthly subscription billing, reporting, BI, etc. Instead of a “Use Shopify to create your online store”, it should be “Use <YOUR PRODUCT> to create a SaaS Business”.

3) Mobile & Cloud

Everybody is having, or at least thinking about buying, a Smartphone. However there are very few really good mobile services that fully exploit the Cloud. Yet I can get a shopping list app but most are just glorified to-do lists. None is recommending me where to go and buy based on current promotions and comparison with other buyers. None is helping me find products inside a large supermarket. None is learning from my shopping habits and suggesting items on the list. None is allowing me to take a number at the seafood queue. These are just examples for one mobile + cloud app. Think about any other field and you are sure to find great ideas.

4) Specialized IaaS

I mentioned it before, IaaS is already overcrowded but there is one exception: specialized IaaS. You can focus on specialized hardware, e.g. virtualized GPU, DSP, mobile ARM processors. On network virtualization like SDN and Openflow. Mobile and tablet virtualization. Embedded device virtualization. Machine Learning IaaS. Car Software virtualization.

5) Disruptive Innovations + Cloud

Selling disruptive innovations and offering them as Cloud services. Examples could be 3D printing services, wireless sensor networks / M2M, Big Data, Wearable Tech, Open Source Hardware, etc. The Cloud will lower your costs and give you a global elastically scalable solution.

Big Data 2013 Predictions

January 1, 2013 5 comments

If you just invested a lot of money in a Big Data solution from any of the traditional BI vendors (Teradata, IBM, Oracle, SAS, EMC, HP, etc.) then you are likely to see a sub-optimal ROI in 2013.

Several innovations will come in 2013 that will change the value of Big Data exponentially. Other technology innovations are just waiting for smart start-ups to put them into good use.

Real-Time Hadoop

The first major innovation will be Google’s Dremel-like solutions coming of age like Impala, Drill, etc. They will allow real-time queries on Big Data and be open source. So you will get a superior offering compared to what is currently available for free.

Cloud-Based Big Data Solutions

The absolute market leader is Amazon with EMR. Elastic Map Reduce is not so much about being able to run a Map Reduce operation in the Cloud but about paying for what you use and not more. The traditional BI vendors are still getting their head around a usage-based licensing for the Cloud. Except a lot of smart startups to come up with really innovative Big Data and Cloud solutions.

Big Data Appliances

You can buy some really expensive Big Data Appliances but also here disruptive players are likely to change the market. GPUs are relatively cheap. Stack them into servers and use something like Virtual OpenCL to make your own GPU virtualization cluster solution. These type of home-made GPU clusters are already being used for security Big Data related work.

Also expect more hardware vendors to pack mobile ARM processors into server boxes. Dell, HP, etc. are already doing it. Imagine the potential for Distributed Map Reduce.

Finally Parallella will put a 16-core supercomputer into everybody’s hands for $99. Their 2013 supercomputer challenge is definitely something to keep your eyes on. Their roadmap talks about 64 and 1000 core versions. If Adapteva can keep their promises and flood the market with Parallella’s then expect Parallella Clusters to be 2013 Big Data Appliance.

Distributed Machine Learning

Mahout is a cool project but Map Reduce might not be the best possible architecture to run iterative distributed backpropagation or any other machine learning algorithms. Jubatus looks promising. Also algorithm innovations like HogWild could really change the dynamics for efficient distributed machine learning. This space is definitely ready for more ground-breaking innovations in 2013.

Easier Big Data Tools

This is still a big white spot in the Open Source field. Having Open Source and easy to use drag-and-drop tools for Big Data Analytics would really excel the adoption. We already have some good commercial examples (Radoop = RapidMiner + Mahout, Tableau, Datameer, etc.) but we are missing good Open Source tools.

I am currently looking for new challenges so if you are active in the Big Data space and are looking for a knowledgable senior executive be sure to contact me at maarten at telruptive dot com.

Follow

Get every new post delivered to your Inbox.

Join 301 other followers

%d bloggers like this: