Archive for the ‘High Scalability’ Category

Cloudify, an Open Source PaaS from GigaSpaces, is making Big Data Clouds easier to manage

Cloudify, from the scalability experts GigaSpaces, is still in its early stages. Unlike Google App Engine, Azure, Heroku, etc., this PaaS focuses on the application life cycle rather than on being a “transparent” application server and database. Its main focus is automating application and service deployment, monitoring, autoscaling, etc. The closest competitor would be Scalr.

Unlike Scalr, Cloudify focuses on cloud neutrality. Instead of relying on specific Amazon services for scalability, it aims to be a neutral cloud platform. The advantage is that any cloud, private or public, can be used, and scenarios like hybrid clouds with cloud bursting from private to public cloud become possible. GigaSpaces’ deep understanding of large-scale architectures is a strong sign that Cloudify will scale in the future.

Cloudify is still missing some important functionality, such as security, multi-tenancy, integration with lower-level automation frameworks (e.g. Chef and Puppet) and complex upgrade management (e.g. rolling upgrades, MySQL schema upgrades, A/B testing of new features). However, the roadmap points towards most of these items.

Software architects should understand the possibilities that Cloudify, Scalr, etc. bring. With a reusable automation framework, companies can spend more development and operations time on delivering new business features and less on reinventing the wheel.

Rainbird could be Hadoop for Real-Time Analytics if only Twitter would open source it…

May 24, 2012

Twitter has a real-time analytics solution, Rainbird, that could easily become as important as Hadoop. They have talked about open sourcing it but so far have not done so.

This post is an open invitation to Twitter to open source Rainbird and accelerate the adoption of real-time analytics worldwide. Hadoop has changed thousands, if not millions, of companies. Rainbird could do a similar thing.

To gather people around this subject, I am proposing that you include #TWOSRB in your tweets. #TWOSRB stands for Twitter please Open Source RainBird.

Open Source Big Data Reporting & ETL show promise

With Hadoop/HBase/Hive, Cassandra, etc. you can store and manipulate petabytes of data. But what if you want nice-looking reports, or want to compare data held in a NoSQL solution with data held elsewhere? Two market leaders in the open source business intelligence space are now putting all their firepower into Big Data.

Pentaho Big Data seems to be a bit further ahead. They offer a graphical ETL tool, a report designer and a business intelligence server. These are existing tools, but support for Hadoop HDFS, MapReduce, HBase, Hive, Pig, Cassandra, etc. has been added.

Jaspersoft’s Open Source Big Data strategy is a little behind: the connectors are not yet included in the main product, and several are still of beta quality and lack documentation.

Both companies will accelerate the adoption of Big Data, since its main problem is the lack of easy reporting. Unstructured data is harder to format into a very structured report than structured data. Any solution that makes this possible and is also open source is very welcome in times of cost cutting…

What comes after SaaSification?

Fujitsu just presented SaaSification at CeBIT. Existing applications can be easily brought to the Cloud and sold via App Stores and SaaS marketplaces. IBM is also working on SaaSification and even adds multi-tenancy.

What is next?

Everybody wants to have a full App Store or SaaS Marketplace, so SaaSification is the next step after launching your store. However, converting a client/server application to the Cloud is only step 1. Step 2 is creating new services that are specifically built for the Cloud.

What does Built-for-the-Cloud mean?

Application design is changing. Traditional Web applications are built on a LAMP architecture. New Cloud-Ready applications should be Big Data ready and should be looking at SMAQ (storage, MapReduce and query) architectures.

Cloud-Ready applications should also accept the new reality of APIs, both for exposure and for consumption. This means that applications need to be redesigned according to application slices.

So if SaaSification wants to be successful, it needs to add quick enablers for multi-tenancy, big data, integration with external APIs as well as API exposure, etc. This integration concept can be called iPaaS or integration Platform-as-a-Service. iPaaS should not only focus on exposing or integrating APIs but also on providing complex services by integrating multiple SaaS solutions.

Other enablers should be added as well. Basically, 80% of a SaaS solution consists of the same elements or tries to solve the same problems. These could all be provided via a SaaSification PaaS:

  • Blog – to describe the newest ideas.
  • Forum – for people to get answers from the community.
  • IT PaaS – where you run the actual business logic and UI. Data storage is assumed to be provided by the Big Data elements.
  • Portal and Mobile Portal – allows you to quickly define the “static” content for the web and mobile site.
  • Deployment management – ideally continuous deployment or integration tools that allow fast feature by feature deployment.
  • A/B testing – allow new features to be deployed to subsets of users and check which version of a feature has the highest impact on the bottom line. A/B testing was made popular by Amazon.
  • Automated testing – a lot of testing can be automated, but the focus should be on the harder tests, especially end-to-end and performance testing.
  • Configuration management – manage the version control of the code.
  • Metering and billing – be able to meter the resource usage by users, companies or any other element you want to meter and be able to bill users both for subscriptions as well as for usage, ideally with advanced set-up with overage, etc.
  • Marketplace listing and provisioning – automate the listing of products on the marketplace as well as the provisioning of new services.
  • Single sign-on & identity management – allow companies to use their own user credentials (e.g. SAML), authorization for third parties (e.g. OAuth), etc.
  • Reporting and data warehousing – this can be part of the big data stack, but it is especially important to be able to create ad-hoc reports, for instance for A/B testing. Of course, regular business reporting needs to be included as well.
  • ERP – accounting, resource management, etc.
  • CRM – sales and lead management
  • Operations & Maintenance – automation of back-ups and monitoring, covering performance and fault management as well as business monitoring.
  • Support – helpdesk, ticketing system, SLA management, etc.
  • Social integration – tools to add social aspects like Facebook apps, Twitter feeds, etc.
  • etc.
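The A/B testing enabler in the list above can be illustrated with a minimal sketch. It assumes a deterministic, hash-based assignment of users to variants; the function name `assign_variant`, the experiment label and the percentage parameter are all hypothetical and do not come from any particular PaaS.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, rollout_percent: int = 50) -> str:
    """Deterministically place a user in 'B' (new feature) or 'A' (control).

    Hypothetical helper: hashing the experiment name together with the user id
    gives every user a stable bucket, so the same user always sees the same
    variant without any assignment having to be stored.
    """
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range [0, 100)
    return "B" if bucket < rollout_percent else "A"


if __name__ == "__main__":
    # Expose a hypothetical new checkout flow to roughly 10% of users.
    for uid in ("user-1", "user-2", "user-3"):
        print(uid, assign_variant(uid, "new-checkout", rollout_percent=10))
```

Raising `rollout_percent` step by step is one way to combine A/B testing with feature-by-feature deployment: users who were already in variant B stay there, and more users join as the threshold grows.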

The idea is not that a SaaSification PaaS offers all these solutions through custom development. Instead, the SaaSification PaaS should allow startups to assemble an ideal architecture by combining solutions from different providers. For example, you would be able to select the support solution you prefer, e.g. desk.com, zendesk.com, etc., and this solution would be completely integrated into the overall stack, e.g. CRM integration with help desk and fault management, together with single sign-on.

SaaSification 2.0 should make sure that 2-5 people can start a new dotcom solution and focus on creating a killer service, not on building yet another stack of solutions for configuration management, support, billing, etc. If a SaaSification PaaS can shorten the time to launch by months and reduce the number of people needed to operate the solution, then startups will see the value. Instead of SaaSification PaaS, a good term could be Incubation PaaS: a platform to incubate SaaS solutions. Once the business model and solution are proven, there will be money to move to a custom-built stack, but during incubation and crossing-the-chasm, entrepreneurs should be able to focus on delivering value to their customers and not on reinventing the startup wheel.

NextGen Hadoop, beyond MapReduce

Hadoop has run into architectural limitations and the community has started working on the Next Generation Hadoop [NGN Hadoop]. NGN Hadoop has some new management features, of which multi-tenant application management is the major one. However, the key change is that MapReduce is no longer entangled with the rest of Hadoop. This will allow Hadoop to be used for MPI, Machine Learning, Master-Worker, Iterative Processing, Graph Processing, etc. New tools to better manage Hadoop are also being incubated, e.g. Ambari and HCatalog.

Why is this important for telecom?
Having one platform that allows massive data storage, petabyte data analytics, complex parallel computations, large-scale machine learning, big data MapReduce processing, etc., all in one multi-tenant set-up, means that telecom operators could see massive reductions in their architecture costs together with faster go-to-market, better data intelligence, etc.

Telecom applications that are redesigned around this new paradigm can all use one shared back-office architecture. Having data centralized in one large Hadoop cluster, instead of tens or hundreds of application-specific databases, will enable unprecedented data analytics possibilities and bring much-needed efficiencies.

Is this shared-architecture paradigm new? Not at all. Google has been using it since at least 2004, when it published the MapReduce paper; the BigTable paper followed in 2006.

What is needed is that several large operators define this approach as their standard architecture, so that telecom solution providers start incorporating it into their solutions. Commercial support can be easily acquired from companies like Hortonworks, Cloudera, etc.

Having one shared data architecture and multi-tenant application virtualization in the form of a Telco PaaS would allow third parties to launch new services quickly and cheaply: think days instead of years…

The competitive advantage of a shared architecture

January 24, 2012

Why can Facebook, Google, Salesforce and Twitter roll out new features every day while regular telecom operators manage it only every 6 months? Although they are dotcoms, they have thousands of employees and a lot of legacy systems as well. Yet they are able to roll out a new feature every day, if not every hour or minute, and large new systems every few months, weeks or even days.

How do they do it and how can the telecom industry learn from it?

On High Scalability you will find a lot of information on the architectures of large dotcoms. Looking across the different articles, you see that each of the larger dotcoms has an architecture that is shared among different products and services, e.g. scaling messages at Facebook.

This is the secret sauce of the dotcoms. They have built and continuously improved a highly distributed architecture that can handle millions of users and petabytes of information. On top of this “shared architecture” go the services. New employees are able to quickly create new services because they do not have to worry about scaling data, monitoring the service, deploying/upgrading versions, backing up data, versioning code, etc.

On the other hand, operators have no standardized shared architecture. Instead there is a puzzle of different solutions that often use totally different technologies, hardware, etc. Maintenance and upgrades are a nightmare.

Trying to launch any new service requires a massive amount of planning, lots of different skills, expensive investments in third-party licenses and hardware, etc.

How can you do it differently?

Building a private cloud with virtual servers and storage will not resolve operators’ problems. Just virtualizing the puzzle of solutions is not going to do away with complex integrations.

Operators need to make a bolder move. They need to separate the new from the old. Legacy systems should be kept and isolated. However, a new architecture should be built that works in parallel with the legacy systems. This new architecture should focus on launching new services and partner services at dotcom speeds. Everything should be handled as an independent service, and each service should get its own API: a storage service, a billing service, a monitoring service, a provisioning service, an identity service, a data warehouse service, a deployment service, a mobile shop service, an inventory service, a support service, etc.

All APIs should use a common technology. APIs for third parties could use REST. APIs for internal high-load usage could use Thrift or Protocol Buffers. Each API should have two versions: the easy version and the low-level version. The easy API offers the most commonly used, generally basic functionality, e.g. sendSMS(from, to, message). The low-level API offers the complete feature set, e.g. sendBinarySMS, sendSMSWithDeliveryConfirmation, etc. This allows most services to use the easy API while keeping access to the advanced functionality when needed.
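The easy versus low-level split can be sketched in a few lines. This is only an illustration under stated assumptions: the class names, the `SmsRequest` fields and the in-memory outbox are all hypothetical, and `from` is renamed to `sender` because `from` is a reserved word in Python.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SmsRequest:
    """Hypothetical low-level request carrying the full feature set."""
    sender: str
    recipient: str
    payload: bytes
    binary: bool = False
    delivery_confirmation: bool = False


class LowLevelSmsApi:
    """Assumed full-featured API; it records requests instead of calling a real gateway."""

    def __init__(self) -> None:
        self.outbox: List[SmsRequest] = []

    def submit(self, request: SmsRequest) -> int:
        self.outbox.append(request)
        return len(self.outbox)  # position in the outbox doubles as a message id


class EasySmsApi:
    """Thin facade exposing only the common case, with sensible defaults."""

    def __init__(self, low_level: LowLevelSmsApi) -> None:
        self._low_level = low_level

    def send_sms(self, sender: str, recipient: str, message: str) -> int:
        request = SmsRequest(sender, recipient, message.encode("utf-8"))
        return self._low_level.submit(request)


if __name__ == "__main__":
    low = LowLevelSmsApi()
    easy = EasySmsApi(low)
    easy.send_sms("+15550001", "+15550002", "Hello")  # the common case
    # Advanced callers bypass the facade and use the complete feature set.
    low.submit(SmsRequest("+15550001", "+15550002", b"\x01\x02",
                          binary=True, delivery_confirmation=True))
    print(len(low.outbox))
```

Because the easy API is only a facade over the low-level one, both paths share the same delivery logic, and a service can switch to the low-level API for a single call without changing anything else.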

Load balancing when using the services is key. The load balancer is the secret behind many rolling upgrades in the dotcom world. An application that uses a certain service will use client-based load balancing. By letting the load balancer receive events, it becomes possible to dynamically add or remove instances of an API, gradually move requests to a new version of the API, etc.

New service developers will now be able to focus on building the business logic for the new service and not on data migrations, scaling, monitoring, backups, etc. The service can have completely new ways of billing and charging, a complex deployment workflow, advanced monitoring requirements, large data storage requirements, etc. However, it is not the billing or charging system that has to be extended, nor a centralized EAI, nor the monitoring system. Instead it is the service that decides what is best for the service via the use of the easy or low-level APIs. By moving the peculiarities of every service into the service and not into generic OSS and BSS systems, these support systems can be drastically simplified.
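A minimal sketch of such an event-driven, client-side load balancer follows. The event format (`add`, `remove`, `set_weight`) and the class name are assumptions made for illustration; a real deployment would receive these events from whatever coordination or messaging system is in use.

```python
import random
from typing import Dict, Optional


class ClientLoadBalancer:
    """Hypothetical client-side balancer that picks instances by weight."""

    def __init__(self, rng: Optional[random.Random] = None) -> None:
        self._weights: Dict[str, float] = {}
        self._rng = rng or random.Random()

    def on_event(self, event: dict) -> None:
        """Apply an assumed event of type 'add', 'remove' or 'set_weight'."""
        kind = event["type"]
        instance = event["instance"]
        if kind == "add":
            self._weights[instance] = float(event.get("weight", 1.0))
        elif kind == "remove":
            self._weights.pop(instance, None)
        elif kind == "set_weight":
            if instance in self._weights:
                self._weights[instance] = float(event["weight"])
        else:
            raise ValueError(f"unknown event type: {kind}")

    def pick(self) -> str:
        """Return one instance, chosen with probability proportional to its weight."""
        candidates = {name: w for name, w in self._weights.items() if w > 0}
        if not candidates:
            raise RuntimeError("no instances available")
        names = list(candidates)
        weights = [candidates[name] for name in names]
        return self._rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    lb = ClientLoadBalancer()
    lb.on_event({"type": "add", "instance": "sms-v1", "weight": 9})
    lb.on_event({"type": "add", "instance": "sms-v2", "weight": 1})  # about 10% of traffic
    lb.on_event({"type": "set_weight", "instance": "sms-v2", "weight": 9})  # shift traffic
    lb.on_event({"type": "remove", "instance": "sms-v1"})  # retire the old version
    print(lb.pick())
```

A rolling upgrade then becomes a sequence of events: add the new version with a small weight, raise the weight while monitoring, and finally remove the old instances.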

Operators should try to focus on launching many more niche services and on opening up their infrastructure to a long tail of service suppliers. Instead of general services like PBX for SME, operators should think about hotel reservation services, doctor scheduling services, etc. The value of the operator should be in offering a reliable back-office architecture, assuring service quality and managing the support eco-system. The long tail of service suppliers should be put to work to launch competing niche offerings, and customers should decide which ones survive.

Creating an amazingly fast IaaS and PaaS platform, the Cloud OS

January 23, 2012

Universities are starting to explore the future of the cloud. This future starts by getting rid of the many layers that separate software from physical hardware or bare metal. Currently you need a hypervisor (e.g. Xen, KVM, VMware), an operating system (e.g. Linux, Windows, Mac OS), a language virtual machine (e.g. JVM), an application server (e.g. Tomcat, JBoss, etc.) and then the application.

In this article, researchers and academics argue that too many abstraction layers are in play and that removing them could deliver unprecedented performance. Projects like Open Mirage, Exokernel and Apache Mesos are examples.

If telecom operators want to offer IaaS and PaaS, they should focus on a competitive edge that is not currently offered by established providers like Amazon and Rackspace. This competitive edge could be a new Cloud OS with storage and processing nodes that run as close as possible to the bare metal. Building data storage solutions like Hadoop or Cassandra close to bare metal hardware and using the latest solid state drives would offer unprecedented performance. The cost per user would be substantially lower than with less optimized set-ups. Ideally, PaaS platforms can be delivered that allow “cloud application servers” to run on bare metal. The model would be Heroku on bare metal instead of on Xen+Linux+JVM+App Server+Java App.

Scaling Machine Learning

November 14, 2011

I stumbled upon GraphLab. GraphLab allows for scaling machine learning; it is sort of the Hadoop for Machine Learning. It recently changed to the very business-friendly open source Apache license. GraphLab is written in C++ but has Java and Python APIs. It can be used for things like PageRank, Collaborative Filtering, Clustering, etc.

So what is so important about GraphLab for telcos?

Having a scalable and business-friendly open source solution for machine learning will allow operators to use algorithms like PageRank for calls (who is the most important person among a set of subscribers?), Collaborative Filtering (recommending services or apps to a subscriber based on what similar subscribers have bought), Clustering (automatically grouping subscribers that have common features and targeting them with promotions), etc.
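To make the PageRank-for-calls idea concrete, here is a small sketch in plain Python. It does not use GraphLab’s API: it is a textbook power iteration over a toy call graph, and the function name, the subscriber names and the default parameters are all assumptions for illustration.

```python
from typing import Dict, Iterable, Set, Tuple


def call_pagerank(calls: Iterable[Tuple[str, str]],
                  damping: float = 0.85,
                  iterations: int = 50) -> Dict[str, float]:
    """Rank subscribers by PageRank over (caller, callee) pairs."""
    # Build the set of distinct callees for every subscriber seen in the data.
    out_links: Dict[str, Set[str]] = {}
    for caller, callee in calls:
        out_links.setdefault(caller, set()).add(callee)
        out_links.setdefault(callee, set())
    nodes = list(out_links)
    n = len(nodes)
    if n == 0:
        return {}

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Subscribers who never call anyone spread their rank evenly over everyone.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    calls = [("alice", "carol"), ("bob", "carol"),
             ("dave", "carol"), ("carol", "alice")]
    for name, score in sorted(call_pagerank(calls).items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.3f}")
```

In this toy graph the subscriber that everyone else calls ends up with the highest rank. On real call detail records the same computation would need a distributed engine, which is where a framework like GraphLab comes in.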

Mobicents is moving telecom development towards the cloud

September 4, 2011

Although the video seems to be ahead of the software, the vision of Mobicents for the cloud is disruptive. The economics of setting up a global communication infrastructure in the cloud and integrating it with Web 2.0, smartphones, the Internet of Things, etc. will change drastically when first-class open source solutions become available.

New NoSQL and similar products to keep on the radar.

August 12, 2011

Google has open sourced a low-level NoSQL storage engine that is authored by the creators of MapReduce and BigTable. It is definitely worthwhile to keep an eye on leveldb, especially on the products that will be incorporating it.

In a previous post I mentioned that open source graph databases were not ready yet. This one looks promising, especially because the authors are the number three in the social networking space, at least if they provide access to the code and use a business-friendly open source license like Apache’s: stigdb.

Twitter is open sourcing Storm on September 19th. It has been referred to as the Hadoop of real-time processing. All stream-related data is likely to see big advantages from using Storm. Update: Storm has been released on GitHub. Check out the wiki pages.
