Monday, June 18, 2012

Why Distributed Caching Matters ...


     My online shopping experience is limited to electronics and books, and the web site that provides a good, fast, and secure user experience invariably becomes my vendor of choice. For many web applications, consumers are likely to access the same data many times over the course of a day, and they all want fast concurrent access to, say, item prices and other product information. Behind the scenes, web and grid computing applications may repeatedly retrieve popular data, such as product descriptions, or they may create application-wide data, such as game scores, schedules, or interim stock trading results, all of which must be accessible to every application server deployed in a data grid. So, from a business perspective, it is important to recognize that slow page loads and frustrated access to information have an adverse impact on revenue.

     Challenges: A rising need to share users’ session data across different Web applications, domains, or application servers is a major challenge in designing distributed architectures. With the exponential growth of CPU speed tapering off over the past few years, the onus is on developers to wring out every microsecond of latency and to maximize application throughput. The paramount need in deploying high-performance web environments, then, is to move away from having applications query the database or metadata store directly each time data must be retrieved, updated, or passed around, toward a model where data lives closer to the application tier. This is where data grid solutions come in, and where their innovations in distributed caching make their mark.
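The move away from hitting the database on every request is often implemented as the cache-aside pattern: check an in-memory cache first, and only fall back to the data store on a miss. A minimal sketch in plain Java follows; the `database` lookup function and the key names are hypothetical stand-ins for a real data tier.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal cache-aside sketch: serve reads from memory near the
// application tier, touching the (simulated) database only on a miss.
public class CacheAside {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> database; // stand-in for a real data store
    private int databaseTrips = 0;                   // round trips actually made

    public CacheAside(Function<String, String> database) {
        this.database = database;
    }

    public String get(String key) {
        String value = cache.get(key);
        if (value == null) {           // miss: go to the data tier once...
            value = database.apply(key);
            databaseTrips++;
            cache.put(key, value);     // ...then keep the result in memory
        }
        return value;
    }

    public int databaseTrips() { return databaseTrips; }
}
```

Under concurrency a production cache would use `computeIfAbsent` (or a loading cache such as EhCache's) to avoid duplicate loads; the sketch keeps the two steps separate to make the miss path visible.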

     Business Solution: The main goal of distributed caching (or a data grid) is to serve as much data as possible from memory on every grid node while ensuring data coherency (distributed transactions). By reliably staging application data in memory across the compute grid, the data becomes simultaneously available to all compute nodes with very low latency. This makes it easier to distribute the application’s workloads, and the result is far fewer round trips to the database and data access at in-memory speeds.
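To make data simultaneously available across grid nodes, a data grid typically partitions the key space so that every node can compute which node owns a given key. One common technique is a consistent-hash ring, sketched below; the node names, key names, and the choice of three virtual points per node are illustrative assumptions, not any particular vendor's scheme.

```java
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of key-to-node routing with a consistent-hash ring:
// every node can locally compute which grid member owns a key.
public class HashRing {
    private final SortedMap<Integer, String> ring = new TreeMap<>();

    public HashRing(List<String> nodes) {
        for (String node : nodes) {
            // A few virtual points per node smooth out the key distribution.
            for (int i = 0; i < 3; i++) {
                ring.put((node + "#" + i).hashCode(), node);
            }
        }
    }

    // The owner is the first ring entry at or after the key's hash,
    // wrapping around to the start of the ring if necessary.
    public String nodeFor(String key) {
        SortedMap<Integer, String> tail = ring.tailMap(key.hashCode());
        return tail.isEmpty() ? ring.get(ring.firstKey()) : tail.get(tail.firstKey());
    }
}
```

The property that matters here is stability: adding or removing one node remaps only the keys adjacent to it on the ring, so most cached data stays put when the grid resizes.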

     Milliseconds translate to significant dollars spent or saved, and distributed caching reduces not only the cost but also the risk compared to scaling up the lower data tiers. For web applications, an in-memory data grid enables organizations to economically address data-demand use cases involving repetitive reads/writes and session state management. In the era of cloud computing, caching (access to in-memory data) plays a pivotal role in the design of distributed systems.
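The session-state use case amounts to keeping user sessions in a store outside the application server's heap, with a timeout so abandoned sessions expire. A minimal sketch is below, assuming a single-map store with an injected clock for deterministic expiry; the session IDs and attribute names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of an external session store: session state lives outside the
// app server's heap, so a server restart does not wipe user sessions.
// Time is passed in explicitly so expiry behavior is easy to verify.
public class SessionStore {
    private static final class Entry {
        final Map<String, String> attributes;
        final long expiresAtMillis;
        Entry(Map<String, String> attributes, long expiresAtMillis) {
            this.attributes = attributes;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> sessions = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public SessionStore(long ttlMillis) { this.ttlMillis = ttlMillis; }

    public void put(String sessionId, Map<String, String> attributes, long nowMillis) {
        sessions.put(sessionId, new Entry(attributes, nowMillis + ttlMillis));
    }

    // Returns null when the session is unknown or has timed out.
    public Map<String, String> get(String sessionId, long nowMillis) {
        Entry e = sessions.get(sessionId);
        if (e == null || nowMillis >= e.expiresAtMillis) {
            sessions.remove(sessionId);
            return null;
        }
        return e.attributes;
    }
}
```

A real grid product adds replication and overflow to disk behind the same idea; the payoff named above is that any application server, including a freshly restarted one, can look the session up.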

      Data Grid/Cache Vendors: Both commercial and open source offerings are available. JBoss Cache, GridGain, and EhCache are some of the popular open source Java options for processing in-memory data. On the commercial side, for example, Oracle Coherence lets session state be managed in a variety of caching topologies and enables session data to be stored outside of Java EE application servers. This frees up application server heap space and lets servers restart without losing session data.

     Browsing through these product offerings will give you an idea of the advantages and readily available solutions for achieving significant application performance gains, and of why distributed caching matters.