Posts Tagged ‘social graph’

Social Graph for Big Data just got a new Open-Source member: Giraph

February 21, 2012 1 comment

The Big Data elephant just got a well-connected giraffe friend. Put differently, Yahoo and LinkedIn have open-sourced scalable social graph software. If Hadoop was the open source version of the Google File System and MapReduce, and HBase the open source version of Bigtable, now it is time for an open source version of Google’s Pregel: Giraph.

In “Open Source Social Graph Software Not Ready Yet” I complained that open source social graph software was not ready. Giraph should change this.

So why is this important for operators?

Any service that wants to “be social” needs a social graph solution. A social graph links the Twitter followers, the LinkedIn colleagues, the Facebook friends, etc. For operators, a mobile social graph can link callers and answer questions such as: who calls whom, who influences whom, who is going to churn with whom, who might also appreciate this marketing campaign, and who should definitely know about this new service?

The “Hello World” example of Giraph is Google’s PageRank. PageRank is the algorithm that powers Google search, and now it is available to everybody who has millions of users. Be sure to keep an eye on this Giraph because the “Apache Zoo” just acquired an important new animal in its Big Data Analytics department…
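To make the PageRank idea concrete, here is a minimal sketch in plain Python of the vertex-centric, superstep-by-superstep style that Pregel describes. It is not Giraph's Java API; the function name, the damping factor of 0.85 and the tiny call graph are assumptions made for illustration only.

```python
def pagerank(out_links, damping=0.85, supersteps=50):
    """Vertex-centric PageRank sketch.

    out_links maps each vertex to the list of vertices it points to
    (for an operator: each subscriber to the subscribers they call).
    """
    vertices = set(out_links)
    for targets in out_links.values():
        vertices.update(targets)
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}

    for _ in range(supersteps):
        inbox = {v: 0.0 for v in vertices}  # messages received this superstep
        dangling = 0.0                      # rank held by vertices without out-links
        for v in vertices:
            targets = out_links.get(v, [])
            if targets:
                share = rank[v] / len(targets)
                for t in targets:
                    inbox[t] += share
            else:
                dangling += rank[v]
        # Every vertex combines its messages with the "random jump" term.
        rank = {
            v: (1 - damping) / n + damping * (inbox[v] + dangling / n)
            for v in vertices
        }
    return rank


# Hypothetical call graph: alice and bob both call carol, carol calls alice.
calls = {"alice": ["carol"], "bob": ["carol"], "carol": ["alice"]}
ranks = pagerank(calls)
```

In this toy graph carol ends up with the highest rank because two subscribers call her, while bob, whom nobody calls, keeps only the baseline share.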

Open Source Social Graph Software Not Ready Yet

December 2, 2010 2 comments

UPDATE: There is a new social graph player that implements Pregel on Hadoop: Giraph

Lately there has been a lot of talk about graph databases and their main applications, such as social graphs. Google’s Pregel and the bulk synchronous parallel model are also important hints. Building on the mobile social graph idea, I am evaluating different graph databases. For revenue sharing engagements, cost is critical. As such, real “open source” solutions are preferable to expensive licenses.

What open source graph databases are available?

On paper the most promising one was Neo4j. After running some tests with it, however, I discovered a quite important limitation: there is no remote thread-safe API. This means that in a multi-threaded solution you run into problems when updating relationships between nodes. Under stress you are likely to want to update a relationship while another thread holds a lock on it.

Sones has a very restrictive open source version, so not really useful.

OrientDB looks very promising for some applications but is not really built to execute complex graph algorithms like large-scale PageRank.

Infogrid is extremely complex with a lot of individual components that are all in different stages of development. However there are some promising aspects.

Hama is one of the most promising technology-wise, but until you can actually store data in Hadoop and quickly manipulate large sets of matrices it is unusable. However, being backed by a group like Apache and, more importantly, having an Apache license should make it the best option, especially for businesses that want to evaluate graph databases and don’t want to spend fortunes on licenses or open source their complete solution when the graph database is only a minor part of it.

FlockDB is (still) very rough around the edges. It might fit Twitter’s needs, but most other people would like partitioning over multiple servers to be transparent and would like to traverse a graph.

In short there is no real solution yet; instead there are a lot of promises. Although commercial options exist, there are too few big ongoing graph projects in Telecom to justify expensive licenses. Telecom is not a mature graph market yet. It is just starting, or graph databases are used on side projects only. Since graph databases are an infrastructure element, having an open-source, business-friendly license is preferable. Money can still be made via consultancy, support, administrative tools and a revenue sharing marketplace for reusable algorithms. Right now it is more important to be the market leader in this developing market than to have the highest sales volume in a niche market.

Why is a graph database important to telecom?

If I call you and you call me then we have a relationship. If I am the key “connector”, “maven” or “salesman” (see The Tipping Point) among my friends or business contacts then I would be the perfect marketing target. Unfortunately, relational databases are not good at finding those profiles among millions of subscribers.

This is an open invitation for people to join forces and build tomorrow’s architecture, preferably with an Apache License, extremely scalable (billions not thousands) and with support for complex algorithms.

Facebook’s Seamless Messaging and what should operators do?

November 24, 2010 1 comment

Facebook is rolling out seamless messaging, which allows people to focus on what they want to communicate and not on how to communicate. This is another example of using the social graph to communicate better.

Under the hood Facebook is using HBase and Hadoop, so there is no reason why Telecom operators could not have launched a unified communication system. True, the operators don’t have an advanced social networking platform, but they can use the user’s mobile social graph as a substitute. If I call you and you call me then we are friends. In the operator’s systems (CRM, HLR, etc.) there is information about who is who. This information is not perfect, so operators would need to add a social address book in which users can update their own information and get other people’s updates, much like Plaxo. Add SMS, instant messaging and email to voice calls, store it all in the Cloud, and we would have a seamless messaging solution.

The problem is not how hard it would be to implement but why operators are not focusing on this type of solution. The focus is on market segmentation to find the right tariff plan and device to sell. However, operators that want to be around when their call and SMS revenues start to seriously decline will have to make a large mindset change: “Focus on why people want to communicate and not how!”. Find the why and you are likely to come up with alternative hows that are currently not available. A lot of buzz is being generated around Unified Communications Suites, but they are the telecom answer to the how, not the why. Facebook is definitely shooting in the right direction. Let’s see if operators can do so as well…

If you are not doing Cloud Computing now, then you are late!

September 14, 2010 Leave a comment

Any operator that has not started a project on Cloud Computing is late. The typical operator data center is filled with under-utilized servers, e.g. application servers and database servers running at 30% of memory, disk and CPU. Just by taking the first step towards Cloud Computing, virtualization, operators can save substantially on hardware, electricity, maintenance, etc. Virtualization means decoupling software from hardware, which makes it possible to run multiple operating systems on one server.

However this would only be focusing on the tip of the iceberg. Cloud Computing is so much more…

Private Clouds

Automatic Scaling

Let’s first focus on the internal systems of an operator. Once solutions have been virtualized, you are able to scale them to more or fewer servers. The first step is to automate this process. If you have an application server cluster, do you need 8 nodes all the time? You probably only need them the week before Christmas or during some other peak period. So the ideal is to measure the load and to automate the deployment of more or fewer cluster nodes based on that load. The same can be done with the database. During the night you have 2 nodes. In the morning 3. During the day 4. During peak moments 8. In the evening 3 again. You could save massive amounts of money if application servers and databases can be scaled in this way. Ideally you are also able to pay for licenses based on what you really use and not on your maximum number of nodes during a yearly peak.
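The load-based scaling decision described above can be sketched as a small function. This is only an illustration: the function name, the 25% headroom, the capacity of 500 requests per second per node and the load figures are all hypothetical, and a real deployment would feed in measurements from its own monitoring.

```python
import math


def nodes_needed(requests_per_second, capacity_per_node,
                 min_nodes=2, max_nodes=8, headroom=0.25):
    """Return how many cluster nodes to run for the measured load.

    headroom keeps spare capacity so that a sudden burst does not
    overload the cluster before new nodes have started.
    """
    wanted = math.ceil(requests_per_second * (1 + headroom) / capacity_per_node)
    # Never drop below the minimum (availability) or exceed the maximum (budget).
    return max(min_nodes, min(max_nodes, wanted))


# Hypothetical load over one day, in requests per second.
for period, load in [("night", 100), ("morning", 1200),
                     ("day", 1500), ("peak", 10000)]:
    print(period, nodes_needed(load, capacity_per_node=500))
```

A scheduler would call such a function every few minutes and start or stop virtual machines to match the result.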

Redesigning Applications and Data

Both Amazon and Google found out that by redesigning their applications they could gain even more than from virtualization alone. Amazon’s S3 service is a clear example. However, internally they started with services like Dynamo, on which S3 is built. The first step is to build general data stores. Multiple applications should use a common data store instead of each needing a separate database cluster.

Contrary to popular belief in the IT world, the dotcoms are not filling their data centers with Oracle RAC clusters. The dotcoms are designing special-purpose data stores. The data volumes any market-leading dotcom has to deal with are so massive that a SQL database cannot keep up. SQL databases are very good at running efficient queries on structured data or making sure transactions are consistent. However, they fail when data is unstructured, write operations are massive or data volumes grow by terabytes every day.

Relational Data

So for all low-volume applications that need transactional data and read more than they write, you could still use a unified Oracle RAC cluster to serve multiple applications. An alternative approach is to use the data stores that have been built by Amazon (Relational Database Service or SimpleDB) or Google’s App Engine (Datastore with JDO).

What other alternatives are there?

Read Mostly Data

Data that needs to be read a lot and is not updated frequently can get an enormous performance and scalability boost by using an in-memory data store. The dotcom standard is memcached. Facebook (800 servers and 28TB) and Twitter are addicted to memcached.
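The usual way to put an in-memory store in front of a database is the cache-aside pattern, sketched below. To stay self-contained the sketch uses a plain Python dictionary as a stand-in for memcached; the class name, the loader callback and the five-minute expiry are assumptions, not part of any memcached client library.

```python
import time


class CacheAside:
    """Read-mostly cache: look in memory first, fall back to the database."""

    def __init__(self, load_from_database, ttl_seconds=300, clock=time.monotonic):
        self._load = load_from_database   # slow lookup, e.g. a SQL query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}                # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]               # cache hit: no database access
        value = self._load(key)           # cache miss: read once, then remember
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Call this after the underlying record has been updated."""
        self._entries.pop(key, None)


database_reads = []


def load_profile(user_id):
    # Hypothetical database lookup; records each call so reads can be counted.
    database_reads.append(user_id)
    return {"id": user_id, "tariff": "prepaid"}


profiles = CacheAside(load_profile)
profiles.get("user-1")
profiles.get("user-1")   # served from memory, database_reads still has one entry
```

The rarer the updates, the longer the expiry can be and the fewer reads reach the database.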

Documents, Images & Videos

Binary and media files are best stored outside of a database. In small numbers they are often stored on a file system. However, they occupy a lot of disk space as well as network bandwidth when moved around. The ideal is a document store with a content-delivery network or CDN as a front-end. Amazon’s S3 and CloudFront are examples. Storing files in a compressed format, e.g. LZO, can save valuable space. Transcoding into different formats, e.g. thumbnails or previews, can also help save network bandwidth.

Unstructured Real-Time Data

Data that is unstructured and needs to be stored and accessed in real time in high volumes is best stored in special-purpose data stores. You can write a book about the latest NoSQL solutions. Write an email to maarten at telruptive dot com if you are interested.

Analytics Data

Twitter has described most extensively how they use all the unstructured data they get from their logs and other sources. They use technology from Facebook to stream it into a highly available file system from Yahoo. There they run massively parallel map-reduce operations to learn a lot more about what users are doing, who is influencing whom, etc.

Social Graph

The social graph is about who knows who and what kind of relationship you have. This data is best stored in graph data stores.

Collective Intelligence

Again a chapter by itself, but dotcoms are also heavy users of collective intelligence, which often means dedicated systems.

Accessing Data

Instead of keeping data in stovepipes, the dotcoms make data accessible to all their applications, either via search interfaces, web technology (e.g. REST and JSON) or efficient binary interfaces (Thrift and Protocol Buffers).

Messaging and Notification

Amazon has a simple queue service and a simple notification service to make sure applications communicate in a uniform manner.

Applications

If applications have access to all the above services then the architecture of an application is simplified enormously. Most of the famous dotcoms don’t use middleware. They prefer the SOA principle. However, unlike the IT SOA solutions, a dotcom would take an application and turn it into a chain of reusable services. Let’s take an IVR application as an example. There would be a service to do voice recognition. Another one for voice transcription. Another one for text-to-speech. A transcoding service to transcode between different media formats (e.g. high-quality voice and lower, phone-quality voice). And so on. Each service has independent load-balancing and can be scaled separately. Services can be reused between applications. An application is very short because it just needs to define which services need to work together and how.
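The “chain of reusable services” idea can be sketched as function composition. Everything below is hypothetical: the three services are local stand-in functions (a real dotcom would call separately deployed, load-balanced services over the network), and the keyword-based routing is only there to make the example runnable.

```python
from functools import reduce


def chain(*services):
    """Build an application that passes a request through each service in turn."""
    def application(request):
        return reduce(lambda data, service: service(data), services, request)
    return application


def transcode_to_phone_quality(call):
    # Stand-in for a transcoding service.
    return {**call, "codec": "narrowband"}


def recognise_speech(call):
    # Stand-in for a voice recognition service; here the "audio" is already text.
    return {**call, "text": call["audio"].lower()}


def pick_menu_option(call):
    # Stand-in for the IVR routing logic.
    route = "billing" if "bill" in call["text"] else "operator"
    return {**call, "route": route}


# The application itself is one line: which services, in which order.
ivr = chain(transcode_to_phone_quality, recognise_speech, pick_menu_option)
result = ivr({"audio": "My BILL is wrong", "codec": "wideband"})
```

Because each step only depends on the data it receives, any of the services can be reused in another application or scaled on its own.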

Application Deployment

The dotcoms deploy new features on a daily and even hourly basis. This means that all application deployment is fully automated. When a new feature is deployed it does not necessarily overwrite an existing feature. A new piece of functionality may have been implemented in five different ways. Dotcoms would split the total user base and let small groups of users try out the different approaches. Depending on the users’ feedback they would take the preferred approach and slowly scale up from 1% to 100%. If they detect that the feature has a performance problem or a bug then they can roll back or decrease the load, fix it and deploy gradually again.
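One common way to implement such a gradual rollout is to hash each user into a stable bucket between 0 and 99 and enable the feature for buckets below the current percentage. The sketch below assumes this approach; the function names and the feature name are made up, and the post does not say which mechanism any particular dotcom uses.

```python
import hashlib


def bucket(user_id, feature):
    """Map a user to a stable bucket in the range 0..99 for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(user_id, feature, rollout_percent):
    """True if this user is inside the current rollout percentage."""
    return bucket(user_id, feature) < rollout_percent


# Raising the percentage only ever adds users; nobody loses the feature.
users = [f"user-{i}" for i in range(10000)]
at_1_percent = {u for u in users if is_enabled(u, "new-voicemail", 1)}
at_10_percent = {u for u in users if is_enabled(u, "new-voicemail", 10)}
```

Rolling back is the same operation in reverse: lower the percentage to 0 and every user is served by the old code path again.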

The Network, OSS and BSS

Substantial effort is needed to redesign a network to be cloud-aware. Some components need latencies lower than 10 milliseconds (e.g. antennas), hence most of this logic will have to be processed locally. However, all systems that can live with latencies of 100 milliseconds can benefit from a cloud make-over.

Especially in the area of OSS and BSS there is room for optimizing applications and making them cloud-aware. Global services like a network inventory service, a user profile service, a device profile service, etc. would mean simpler applications and less data duplication.

Opening the Cloud

So the network and IT infrastructure are being redesigned to allow for faster innovation and lower costs. However, Cloud Computing can also be used to increase revenues.

Being a Cloud Infrastructure Provider

Many IT consultancies and software/hardware vendors will tell an operator that they could be a Cloud infrastructure provider. On slides this looks really nice. However, unless an operator already applies the cloud computing principles described in the first part to its own systems, it lacks substantial knowledge about how to manage such an infrastructure. Without this knowledge it would be hard to build a highly optimized solution and thus to be price competitive with the existing players.

Being a Cloud Platform Provider

Although closer to the operator’s core competencies, being a cloud platform provider would still be for those operators that are Cloud experts. A Cloud platform provider would allow others to use the infrastructure services to create applications on top. The complexity lies in the fact that malicious users try to break the platform which could have a very negative effect on the infrastructure if not handled correctly.

Being a Cloud Service Provider

This is the default option most operators should explore first before moving into the other areas. Being a service provider also has a roadmap:

Reselling SaaS

The easiest step is to be the storefront and to resell IT applications from others, e.g. cloud backup storage, security solutions, etc.

Offering Telco SaaS

The next step would be to offer specific telecom applications: applications that are built for the operator or, even better, applications that can be built by others based on the operator’s assets. An example would be a PBX in the Cloud.

Open Market for SaaS

Building all telecom applications yourself is hard. Attracting others to do it for you is easier. However, just putting a “Net App Store” and an SDK on the web will not get you to dominate the market. Only an open market with a large ecosystem of companies and developers can generate large quantities of “Net Apps”. If you are thinking about building an open market, why don’t we talk first? Send an email to maarten at telruptive dot com.

Mobile Social Graph

September 9, 2010 4 comments

Facebook is hot. Google is trying to create an equally successful social network. There are specialized social networks like LinkedIn, Plaxo, etc. focusing on specific social networking aspects.

If you are not social, you are not Web 2.0!

Why are telecom operators not social?

Telefonica launched Keteke and bought Tuenti, so operators must be social. However, launching or buying a social network does not make an operator social and Web 2.0 ready!

Social networking is all about the social graph and what you do with it. Operators have had access to one of the best social graphs for years. However they have chosen to ignore it: the mobile social graph.

If I call you and you call me then we know one another. If both of us call a third person then we have a contact in common. It is true that the operator does not know what the relationship is between two people who call one another. But that should not stop them from asking!
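A minimal sketch of how such a graph could be derived from call records: two subscribers are linked only when each has called the other, and shared contacts are the intersection of their neighbours. The record format (caller, callee) and the function names are assumptions for illustration; real call detail records carry many more fields.

```python
from collections import defaultdict


def build_mobile_social_graph(call_records):
    """Link two subscribers when each has called the other at least once."""
    called = defaultdict(set)
    for caller, callee in call_records:
        called[caller].add(callee)

    graph = defaultdict(set)
    for caller, callees in called.items():
        for callee in callees:
            if caller in called.get(callee, ()):   # the call was returned
                graph[caller].add(callee)
                graph[callee].add(caller)
    return dict(graph)


def common_contacts(graph, a, b):
    """People that both a and b have a reciprocal relationship with."""
    return graph.get(a, set()) & graph.get(b, set())


# Hypothetical call records: (caller, callee).
records = [
    ("ann", "bob"), ("bob", "ann"),
    ("ann", "cat"), ("cat", "ann"),
    ("bob", "cat"), ("cat", "bob"),
    ("dan", "ann"),                      # never returned, so no relationship
]
graph = build_mobile_social_graph(records)
```

Requiring a returned call is a deliberately strict rule; it filters out one-way traffic such as calls to a help desk.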

What can I do with a mobile social graph?

As soon as I provide an operator with the type of relationships I have with the people I call or send a message to, my mobile social graph will become useful both for me and the operator.

There is a whole list of applications that can be built on top of this mobile social graph. Let me give two examples.

A social addressbook

Once I tell the operator which of my calls are to business contacts, family and friends, collective intelligence can do the rest. If two of my colleagues have marked a person as a colleague and I call that person, then my addressbook can suggest adding this new number as a colleague. There are many more advanced features that could be added, but the basic idea is that a better addressbook generates more calls and SMS.
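The colleague suggestion can be sketched as a simple vote among my own contacts. The data layout (one dictionary of number-to-label per subscriber), the function name and the threshold of two votes are assumptions taken from the example in the text, not a description of an existing product.

```python
from collections import Counter


def suggest_label(my_address_book, address_books, number, min_votes=2):
    """Suggest a label for a new number based on how my contacts labelled it.

    A contact's vote only counts when I gave that contact the same label,
    e.g. two of my colleagues marking the number as "colleague".
    """
    votes = Counter()
    for contact, my_label in my_address_book.items():
        their_label = address_books.get(contact, {}).get(number)
        if their_label is not None and their_label == my_label:
            votes[their_label] += 1
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    return label if count >= min_votes else None


# Hypothetical data: my labels, and how my contacts labelled number "555".
mine = {"bob": "colleague", "cat": "colleague", "mum": "family"}
others = {
    "bob": {"555": "colleague"},
    "cat": {"555": "colleague"},
    "mum": {"555": "friend"},
}
suggestion = suggest_label(mine, others, "555")
```

With only one colleague vouching for the number the function returns nothing, so the addressbook stays quiet rather than guessing.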

Find the influencer

This service is more advanced. The assumption is that if I call a restaurant on Friday night and during the next two weeks five of my friends call the same restaurant, then I might have influenced my friends. Knowing who influences others is already used for churn analysis.

In our case the restaurant owner might be willing to pay a premium to contact me about a new promotion. Afterwards he or she can check whether I or one of my friends made a call in the weeks after the promotion was sent to me.
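The restaurant example can be sketched as follows: find the friends who called the same number within two weeks of my first call and who had not called it before me. The record format (caller, callee, day number), the function name and the sample data are hypothetical, and a real analysis would need far more care before claiming influence rather than coincidence.

```python
def influenced_friends(calls, friends, person, number, window_days=14):
    """Friends of person who first called number shortly after person did."""
    my_days = [day for caller, callee, day in calls
               if caller == person and callee == number]
    if not my_days:
        return set()
    first = min(my_days)

    # Friends who called the number inside the window after my first call.
    followers = {
        caller for caller, callee, day in calls
        if callee == number
        and caller in friends.get(person, set())
        and first < day <= first + window_days
    }
    # Friends who already knew the number cannot have been influenced by me.
    already_knew = {
        caller for caller, callee, day in calls
        if callee == number and caller != person and day <= first
    }
    return followers - already_knew


# Hypothetical records: (caller, callee, day number).
calls = [
    ("me", "resto", 0),
    ("ann", "resto", 3), ("bob", "resto", 10),
    ("cat", "resto", 20),                       # outside the two-week window
    ("dan", "resto", 5),                        # not one of my friends
    ("eve", "resto", -2), ("eve", "resto", 4),  # called before I did
]
friends = {"me": {"ann", "bob", "cat", "eve"}}
influenced = influenced_friends(calls, friends, "me", "resto")
```

Counting such followers per subscriber across many numbers gives a first, rough influencer score that a promotion could then be targeted with.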

If you want to learn more about the mobile social graph don’t hesitate to contact the author at maarten at telruptive dot com.
