Changing India’s future byte by byte: The rising role of cloud based Big Data in big ticket Government projects
Posted on Monday, October 15, 2012 - Rahul Chitale, Director – Cloud Services, Microsoft India
The Government of India, with large-scale national programmes such as NREGA (National Rural Employment Guarantee Act), CPSMS (Central Plan Scheme Monitoring System) and NRLM (National Rural Livelihoods Mission), is the largest generator and consumer of Big Data anywhere in the world. Making sense of these petabytes of information calls for cloud computing, which can handle massive amounts of unstructured Big Data. Rahul Chitale, Director – Cloud Services, Microsoft India, takes a look at how cloud computing can help process the Big Data generated by the 27 Mission Mode Projects (MMPs) set up by the Government of India, which are expected to move high-priority citizen services from their current manual delivery to e-delivery.
The arrival of cloud-based data storage worldwide has brought millions of streams of data in its wake. As government processes increasingly turn to the cloud for data discovery, processing and analysis, the sheer scale of data generated by government-run programmes will rise sharply.
Big Data generation has reached a tipping point and has quickly outpaced Moore’s Law. The ICT industry is clearly at an inflection point, where cloud-based trends in database systems, storage costs, sensor networks and tablets are all converging with large ICT spends within India. This is giving rise to a ‘super-Moore’s Law’ scenario, where data storage and processing requirements are increasing at a much faster rate than the growth of ICT itself.
With the Government of India poised to become the single largest generator and consumer of Big Data in the country, the ICT industry has the biggest opportunity in its history to transform the computing scenario and support the Government in its effort to plan, run and monitor national programmes for its citizens effectively and economically.
Let’s take a look at two scenarios that can emerge for the Government with Big Data. In the first, systems like CPSMS (Central Plan Scheme Monitoring System) that usually collect summary data will soon face a data explosion once information is also looped in from the village, block and district levels. In the second, more dramatic scenario, the nature of the data collected itself changes because of the additional possibilities brought up by ICT. As an example, States conducting the Socio-Economic Caste Census could add precise BPL (Below Poverty Line) data, augmented with video data on each family, to the population survey data. The data storage requirement of each state would then be of the order of a petabyte (1 petabyte is ~1,000 terabytes). To put this size in perspective, the data collected by the entire US Library of Congress as recently as 2011 is a quarter of that!
What are the technologies in use for handling Big Data? First of all, new technologies such as Apache Hadoop have emerged that let customers store and analyze petabytes of unstructured data inexpensively. In addition, organizations can connect to data from hundreds of trusted data providers (including demographic, environmental, financial, retail, sports and social data) and combine it with their own data through self-service tools like Excel Power Pivot. Using Apache Hadoop, the free big data tool, users can filter hundreds of millions of rows in tools like Excel, which is several orders of magnitude more than was possible a few years back. Hadoop lets you process very large amounts of data in short bursts across hundreds or even thousands of servers, which will allow any Public Sector department to process previously unmanageable datasets. For example, this could be used to estimate the rate of deforestation in India from detailed satellite imagery.
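To make the processing model concrete, here is a minimal sketch of the map, shuffle and reduce steps that Hadoop distributes across servers, written as plain single-machine Python rather than against the Hadoop API. It follows the deforestation example above. The tile records, district names, hectare figures and function names are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input: one record per satellite image tile, already
# classified into (district, year, forested hectares in that tile).
TILES = [
    ("district_a", 2010, 120),
    ("district_a", 2010, 80),
    ("district_a", 2012, 150),
    ("district_b", 2010, 60),
    ("district_b", 2012, 45),
]


def map_tile(record):
    """Map step: emit one ((district, year), hectares) pair per tile."""
    district, year, hectares = record
    yield (district, year), hectares


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_cover(key, values):
    """Reduce step: total forest cover for one (district, year) key."""
    return key, sum(values)


def forest_cover(records):
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_tile(r) for r in records)
    return dict(reduce_cover(k, v) for k, v in shuffle(mapped).items())


def cover_change(totals, start_year, end_year):
    """Change in forested hectares per district between two years."""
    districts = {district for district, _ in totals}
    return {
        d: totals[(d, end_year)] - totals[(d, start_year)]
        for d in districts
    }


if __name__ == "__main__":
    totals = forest_cover(TILES)
    for district, change in sorted(cover_change(totals, 2010, 2012).items()):
        print(district, change)
```

In a real cluster, the map calls would run in parallel on the servers that hold each block of tiles, and each key would be reduced on whichever server it is routed to. The sketch only shows why the work splits cleanly: no map call depends on any other.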
Secondly, a core issue for a country like India is enabling a cheap, plentiful mechanism for storing large amounts of data. Take the example of an NREGA data collection in an average state, which would usually amount to around 6 terabytes. Looked at from an Enterprise Data Processing perspective, the cost of storing and accessing this data would rapidly escalate to crores of rupees for just this small database collection and reporting component. To tackle this, the cloud community has come up with the idea of NoSQL storage, also referred to as Table or BLOB storage, which offers massively scalable storage by using the attached storage of inexpensive commodity servers instead of expensive hardware. Combining the collective attached storage capacity then gives you massive data storage containers at a fraction of the cost. It is no surprise that massively scaled-out Internet systems, like SkyDrive, already use such approaches.
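The idea of pooling attached storage can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any particular cloud storage service is implemented: the class names, the 2-terabyte node size and the hash-based placement rule are all hypothetical, and real systems also replicate data and rebalance it when servers are added or fail.

```python
import hashlib


class CommodityNode:
    """One inexpensive server with its own attached storage (hypothetical)."""

    def __init__(self, name, capacity_tb):
        self.name = name
        self.capacity_tb = capacity_tb
        self.blobs = {}  # key -> bytes held on this node


class PooledStore:
    """Presents many small nodes as one large key/blob container."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    @property
    def capacity_tb(self):
        # The pool's capacity is simply the sum of what each node holds.
        return sum(node.capacity_tb for node in self.nodes)

    def _node_for(self, key):
        # A stable hash of the key decides which node stores the blob.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, key, data):
        self._node_for(key).blobs[key] = data

    def get(self, key):
        return self._node_for(key).blobs[key]


if __name__ == "__main__":
    # Four assumed 2 TB nodes together exceed the ~6 TB example dataset.
    store = PooledStore(CommodityNode(f"node-{i}", 2) for i in range(4))
    store.put("nrega/block-17/2012-10.csv", b"example rows")
    print(store.capacity_tb, store.get("nrega/block-17/2012-10.csv"))
```

Because placement depends only on the key, any client can work out which server holds a record without consulting a central index. That is one reason this style of storage scales out by adding servers rather than by buying larger ones.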
While concrete, demonstrable benefits of using cloud computing to handle Big Data will emerge over time in the Indian context, there is no doubt that Big Data is poised to play a significant, economically transformative role in how public sector organizations drive their priorities. How does Big Data contribute to this operational efficiency? Departments that try to solve problems using traditional Enterprise Application patterns will not meet their business objectives because of the scale of the data: storing and processing a petabyte can cost crores of rupees once the total cost of hosting, the database management system, storage and processing is taken into account.
Big Data management also makes information accessible and transparent, and allows innovation in service delivery. Most importantly, real-time tracking of dynamic information will allow projects to target their beneficiaries in a better, more transparent manner. This feature, amongst others, completes the Big Data offering and enables socio-economic impact in India.
Cloud-based computing for Big Data is the way forward for government programmes. Some of these employment and health programmes are among the largest of their kind in the world and can set the global standard for using Big Data management effectively to deliver real-world results.