In my last post I asked whether PHP being easy is a boon or a bane. Because it is easy, most developers just build systems and never pay attention to the architecture of the system they are developing on. They never ask questions like “is the system stable?”, “will it be able to handle a lot of concurrent users?”, “is my code optimized enough for what the system requires?”, “is the system that holds my application and data designed to ensure a good user experience?”, or “is the system properly architected / designed?”. Honestly, I never asked myself most of these questions until I started working in cloud computing environments. All that bothered me was the efficiency of my code, the efficiency of my SQL queries, the normalization status of my database, whether I needed to de-normalize, finding solutions in fewer lines of code or in a single complex SQL JOIN query, browser compatibility, and so on.
Encountering the cloud came as a shock! I thought I knew fairly well how to develop applications on the LAMP stack, and I did, but what I didn’t know was how to use the LAMP stack on the cloud, or what architecture my cloud environment should have in order to sustain a large volume of concurrent traffic. Nor did I know how I should write my SQL queries there, how to break a complex SQL query into multiple simple ones, or how to complement complex database routines with PHP code.
As I became familiar with cloud computing environments, a lot of different terms popped up that were vague to me until I understood their application in the cloud: “cloud computing”, “scalability”, “vertical scalability”, “horizontal scalability”, “redundancy”, “load balancing”, “caching”, “content distribution”, “CDN”, “high availability”, “compute unit”, “task scheduling”. The problem with these terms is that we think we understand them, but when it comes to practical implementation our understanding falls short!
As I explain these terms, we’ll understand more about architecting a scalable system for the cloud.
Cloud Computing
The problem with “Cloud Computing” is that everyone has a different definition. As long as it was only “the cloud”, people could take it as a metaphor for the Internet, but in combination with “computing” it gets bigger and fuzzier. Some analysts define cloud computing narrowly, as an updated version of utility computing: basically virtual servers available over the Internet. Others go very broad, arguing that anything you consume outside the firewall is “in the cloud”, including conventional outsourcing. However, Wikipedia’s definition is a good starting point for understanding cloud computing: “Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet). The name comes from the use of a cloud-shaped symbol as an abstraction for the complex infrastructure it contains in system diagrams.”
I would like to define “Cloud Computing” as IT infrastructure that you get on the fly. It is subscription-based and pay-per-use. If your expenditure plan is “I’ll spend as I grow”, cloud computing is the solution for you. It is a way to increase capacity or add capabilities on the fly without investing in new infrastructure, training new personnel, or licensing new software. It is still in a nascent stage, and there are a limited number of vendors who offer cloud computing services; Amazon was the first major cloud provider. Other cloud providers include Apple, Cisco, Citrix, IBM, Joyent, Google, Microsoft, Rackspace, Salesforce.com and Verizon/Terremark.
Scalability
Scalability is the ability of a system, network, or process to handle a growing amount of work in a capable manner, or its ability to be enlarged to accommodate that growth. In cloud computing, growth means growth in concurrent traffic volume and growing complexity from the application’s point of view. Broadly, scalability can be achieved by either of two means: 1) horizontal scalability (scaling out) and 2) vertical scalability (scaling up).
To scale horizontally (scale out) means adding more nodes to the system: if you have one machine, you add another to double your capacity. If you had 3 machines and added a fourth, your capacity would increase by about 33%.
To scale vertically (scale up) means adding more resources to a single node: if you add more RAM to your machine or add a new HDD, you are scaling your machine vertically by making it more powerful. Vertical scalability is undesirable for large systems, since a single machine has hard upper limits and remains a single point of failure.
Your system’s processing capacity depends on your cloud provider’s compute units. One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0–1.2 GHz 2007 Opteron or 2007 Xeon processor; the more compute units your instance has, the more powerful it is as far as processing is concerned.
Load Balancing: Scalability and Redundancy
The requirement is to create a scalable, fail-safe system that ideally doesn’t go down if one server fails, if there is a sudden surge in traffic volume, or if the computation becomes more complex. The question is: what do we need in order to build such a system? The answer is that we achieve scalability and redundancy via load balancing.
Load balancing is the process of distributing requests across multiple computers according to some algorithm (random, round robin, random with weighting for machine capacity, etc.) and their current status (available for requests, not responding, high error rate, etc.). “Load” originally meant workload in computer networking, but in cloud computing it refers to elevated traffic, high concurrency, and so on.
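As a minimal sketch, round-robin selection can be illustrated in a few lines of shell; the server IPs here are made-up examples, and a real load balancer would also track health status per backend:

```shell
#!/bin/sh
# Round-robin backend picker: each call returns the next server in turn,
# persisting a counter in a state file between calls.
SERVERS="10.0.0.1 10.0.0.2 10.0.0.3"   # hypothetical backend pool
STATE="${STATE:-$(mktemp)}"            # counter state file

next_backend() {
  set -- $SERVERS                       # positional params = server list
  i=$(cat "$STATE" 2>/dev/null || echo 0)
  n=$#
  idx=$(( i % n + 1 ))
  eval "echo \${$idx}"                  # print the selected server
  echo $(( (i + 1) % n )) > "$STATE"    # save counter for the next call
}

next_backend    # → 10.0.0.1
next_backend    # → 10.0.0.2
```

Weighted variants simply repeat higher-capacity servers in the pool, and the status checks from the paragraph above decide whether a picked server is actually eligible.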
CDN (Content Delivery Network or Content Distribution Network)
As per Wikipedia’s definition, “A content delivery network or content distribution network (CDN) is a large distributed system of servers deployed in multiple data centers in the Internet. The goal of a CDN is to serve content to end-users with high availability and high performance. CDNs serve a large fraction of the Internet content today, including web objects (text, graphics, URLs and scripts), downloadable objects (media files, software, documents), applications (e-commerce, portals), live streaming media, on-demand streaming media, and social networks”.
Amazon launched its cloud services back in 2006 with the Simple Storage Service (Amazon S3) for storing static content on the cloud. In 2008 it released the first beta of CloudFront, which made the term CDN (Content Delivery Network) accessible, or rather understandable, to semi-technical and non-technical users; however, content delivery networks had already been developed and deployed by Akamai Technologies as far back as 1998.
The Architecture – CDN, Web Server, Elastic Load Balancer, Database Load Balancer, Database Server
When a user / client sends a request to the server, the first thing the server should do is identify the request type, i.e., whether it is for a dynamic page or for static content. Requests for dynamic pages are routed to the web server, and static content requests are routed to the CDN.
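The routing decision itself is simple to sketch; in practice it usually lives in the web server, DNS, or URL rewriting layer, and the extension list below is purely illustrative:

```shell
# Decide whether a request path is static (serve from CDN) or dynamic
# (serve from the web servers). Extension list is an illustrative assumption.
route() {
  case "$1" in
    *.css|*.js|*.png|*.jpg|*.gif|*.ico) echo "cdn" ;;  # static assets
    *)                                  echo "web" ;;  # dynamic pages
  esac
}

route "/assets/style.css"    # → cdn
route "/index.php"           # → web
```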
That was the first level of architecting your system on the cloud. The next step is to define the architecture for serving dynamic page requests, which involves load balancing and achieving redundancy and scalability through it.
Load needs to be balanced between user requests and your web servers, but it must also be balanced at every stage to achieve full scalability and redundancy for your system. A moderately large system may balance load at three layers:
- Layer 1: from the user to your web servers
- Layer 2: from your web servers to an internal platform layer
- Layer 3: from your internal platform layer to your database
In Layer 1, users connect to the Elastic Load Balancer first, which distributes the load among the connected web servers. This can be achieved in any kind of cloud using an Elastic Load Balancer and connected VMs (virtual machines on the cloud). As you launch and configure new VMs and add them to your Elastic Load Balancer, follow these guidelines in order to achieve a reliable, redundant & scalable system:
- No matter how many web servers you connect to your Elastic Load Balancer, choose the same capacity and configuration for each of them
- Create a master-master RSync setup among all the connected servers so that they copy each other’s content almost on the fly (expect a delay of up to a minute), and write your RSync logic so that it is not IP- or machine-bound
- For auto scaling, create and save a machine image of one of the servers, so that you can launch a new VM instantly with the same configuration
- Write cleanup scripts for regular cleaning of logs and set them up as cron jobs (cron = the task scheduler for Linux; cron jobs = scheduled tasks)
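A minimal sketch of the last two guidelines; the host name, paths, schedules, and retention period are all hypothetical examples, not recommendations:

```shell
# Example crontab entries (hypothetical host, paths, and schedule):
#   * * * * *  rsync -az /var/www/html/ deploy@peer-web:/var/www/html/
#   0 3 * * *  /usr/local/bin/clean_logs.sh

# clean_logs.sh: remove rotated application logs older than 7 days.
LOG_DIR="${LOG_DIR:-/var/log/myapp}"
if [ -d "$LOG_DIR" ]; then
  find "$LOG_DIR" -name '*.log.*' -mtime +7 -delete
fi
```

In a real master-master RSync setup each server would run the same rsync job toward its peer, resolved by a role name rather than a hard-coded IP, per the guideline above.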
For Layer 2 I would add a Database Load Balancer. There are many DB load balancer products available, but I found only two of them good enough to recommend: ScaleArc iDB and ScaleBase. The first I have used in many of my applications; the second I have evaluated. When I evaluate database load balancer software, I look for the following features:
- It works as a proxy server to my Databases
- I can cache query patterns for better response time
- It load balances between the databases
- It has logical Sharding features
- It provides clustering
ScaleArc iDB offers all five of them, but ScaleBase does not offer caching. To start with, you can have one DB load balancer server between your application servers and database servers. If your applications are required to run with zero downtime, you can have one DB load balancer server active and one for High Availability (HA), which will not be used actively but kept as a backup to support zero downtime. This is more like investing in health insurance: you pay regularly but will hardly ever claim. So you’ll have to choose between the two architectural options, one with HA and one without.
In Layer 3 I have my databases connected to each other in a master-master architecture, where each database replicates the other with near-zero lag. Since most of my tables use the InnoDB engine, just setting up master-master replication won’t yield the best results; each MySQL database server needs to be tuned / optimized for InnoDB parameters so that the shared tablespace file (MySQL’s ibdata1 file) on each server doesn’t grow uncontrollably.
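As an illustration, here is a minimal my.cnf sketch for one node of such a master-master pair; the values are assumptions, not tuned recommendations. `innodb_file_per_table` is the usual way to keep the shared ibdata1 file from growing unbounded, and the `auto_increment_*` pair keeps the two masters from generating colliding keys:

```shell
# Write a sample replication config fragment (hypothetical values).
# On the second node, server-id and auto_increment_offset would be 2.
cat > /tmp/replication.cnf <<'EOF'
[mysqld]
server-id                = 1
log_bin                  = mysql-bin
auto_increment_increment = 2   # step by 2 so the two masters never collide
auto_increment_offset    = 1   # this node generates odd IDs
innodb_file_per_table    = 1   # per-table files keep ibdata1 from growing
EOF
cat /tmp/replication.cnf
```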
I wanted to discuss RSync and InnoDB configuration as well, but then figured they should be covered in separate articles. The scope of this article was to provide a solution for achieving scalability and redundancy in the cloud via load balancing.
© Kamalika Guha Roy