Monday, December 26, 2011

MapReduce and Grid Computing


There is a lot of interest and discussion around Hadoop MapReduce success stories these days, with the likes of Amazon, Yahoo, Facebook and Google advocating and adopting the framework for their production systems. I got curious to understand the core concept behind the MapReduce framework and what makes it so unique for the distributed processing of large data sets.

Reading some interesting articles online, I understand that the MapReduce framework at its core is a combination of two functions, map() and reduce(). The framework schedules each map task close to the data it has to process, so the map computation happens on the distributed nodes in a completely parallel manner, with each mapper emitting intermediate key/value pairs. The reduce function, on the other hand, operates on the mappers' intermediate output after it has been sorted and grouped by key across all computing nodes, and combines the list of values for each key into a result. Both the input and the output of the map/reduce tasks are stored in a file system, for example the proprietary Google File System (GFS), the Hadoop Distributed File System (HDFS) or something else. Typically, the compute nodes (MapReduce framework) and the storage nodes (HDFS) are co-located and run on the same set of nodes or physical boxes, based on the assumption that remote data can only be accessed with low bandwidth and high latency. This configuration allows the framework to effectively schedule tasks on the nodes where data is already present, resulting in very high aggregate bandwidth across the cluster. So, is this an extension of the architectural approach taken for storage grid computing?
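To make the map, sort/group and reduce flow concrete, here is a minimal single-process sketch in Python using the classic word-count example. It is only an illustration of the idea, not Hadoop's API: the function names (`map_fn`, `reduce_fn`, `run_mapreduce`) are hypothetical, and each inner list in `splits` stands in for a block of input that a real map task would process on the node holding it.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

def map_fn(line: str) -> Iterable[Tuple[str, int]]:
    # Emit an intermediate (word, 1) pair for every word in the input line.
    for word in line.lower().split():
        yield word, 1

def reduce_fn(word: str, counts: List[int]) -> Tuple[str, int]:
    # Combine all values that were grouped under the same key.
    return word, sum(counts)

def run_mapreduce(splits: List[List[str]],
                  mapper: Callable[[str], Iterable[Tuple[str, int]]],
                  reducer: Callable[[str, List[int]], Tuple[str, int]]) -> Dict[str, int]:
    # Map phase: in a cluster each split runs on its own node; here they run in turn.
    intermediate: List[Tuple[str, int]] = []
    for split in splits:
        for record in split:
            intermediate.extend(mapper(record))
    # Sort/group ("shuffle") phase: collect intermediate values by key, in key order.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in sorted(intermediate):
        grouped[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return dict(reducer(key, values) for key, values in grouped.items())

splits = [["the quick brown fox"], ["the lazy dog", "the end"]]
print(run_mapreduce(splits, map_fn, reduce_fn))
# {'brown': 1, 'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```

The point of the split between the two functions is that `map_fn` only ever sees one record and `reduce_fn` only ever sees one key, which is what lets a framework spread both phases across many machines.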

The idea of grid computing arose from the need to solve highly parallel computational problems that were beyond the processing capability of any single computer. Oracle has been offering its version of grid technology since 2000. The database grid, representing the approach taken with Oracle Database, deploys code redundantly on multiple servers (or nodes), which break up the workload based on an optimized scheme and execute tasks in parallel against common data resources. If any node fails, its work is taken up by a surviving node to ensure high availability. Simply put, the Oracle Real Application Clusters (RAC) database grid architecture assigns computing tasks to computing resources and data to storage resources in a way that lets those resources be easily added or removed, and gives tasks and data the flexibility to be moved as needed.
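The failover idea ("its work is taken up by a surviving node") can be sketched in a few lines. This is a toy model under stated assumptions, not how Oracle RAC actually balances or recovers work: it assumes a simple round-robin partitioning scheme, and the names `assign` and `fail_over` are hypothetical.

```python
from typing import Dict, List

def assign(work_units: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    # Spread work units across the available nodes round-robin (assumed scheme).
    plan: Dict[str, List[str]] = {node: [] for node in nodes}
    for i, unit in enumerate(work_units):
        plan[nodes[i % len(nodes)]].append(unit)
    return plan

def fail_over(plan: Dict[str, List[str]], failed: str) -> Dict[str, List[str]]:
    # Hand the failed node's units to the surviving nodes, round-robin.
    orphaned = plan[failed]
    survivors = [node for node in plan if node != failed]
    if not survivors:
        raise RuntimeError("no surviving node to take over the work")
    new_plan = {node: list(plan[node]) for node in survivors}
    for i, unit in enumerate(orphaned):
        new_plan[survivors[i % len(survivors)]].append(unit)
    return new_plan

plan = assign([f"task{i}" for i in range(6)], ["node1", "node2", "node3"])
plan = fail_over(plan, "node2")
print(plan)
# {'node1': ['task0', 'task3', 'task1'], 'node3': ['task2', 'task5', 'task4']}
```

No work unit is lost when `node2` goes away; the remaining nodes simply carry a larger share, which is the availability property the paragraph above describes.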

My take-away points from a computing perspective: both MapReduce and Oracle RAC computing environments harness the processing power of multiple interconnected computers and are promising technologies to invest in (depending on the business case) for solving data-intensive and resource-intensive computing problems. A key premise of MapReduce-style computing systems is that there is insufficient network bandwidth to move the data to the computation, so the computation must move to the data instead. The key differentiator, or limitation, I observed (at the time of writing) is high availability. Unlike a RAC database transaction processing system, MapReduce/HDFS-style computing systems do not provide high availability, because the NameNode server of an HDFS file-system instance is a single point of failure.
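The "move computation to the data" premise boils down to a scheduling preference. The sketch below assumes a hypothetical block-to-replica map (similar in spirit to the block locations the HDFS NameNode tracks) and one free task slot per node; `schedule` and `block_locations` are illustrative names, not Hadoop's scheduler API.

```python
from typing import Dict, List, Set, Tuple

def schedule(block_locations: Dict[str, Set[str]],
             free_nodes: List[str]) -> List[Tuple[str, str, bool]]:
    # Returns (block, node, is_local) for each block that could be placed.
    available = list(free_nodes)
    assignments: List[Tuple[str, str, bool]] = []
    for block, replicas in block_locations.items():
        if not available:
            break  # no free slots left; remaining blocks wait for a later round
        # Prefer a free node that already stores a replica of this block.
        local = [node for node in available if node in replicas]
        node = local[0] if local else available[0]
        available.remove(node)
        assignments.append((block, node, bool(local)))
    return assignments

block_locations = {
    "blk_1": {"nodeA", "nodeB"},
    "blk_2": {"nodeB", "nodeC"},
    "blk_3": {"nodeA", "nodeC"},
}
for block, node, is_local in schedule(block_locations, ["nodeA", "nodeB", "nodeD"]):
    print(block, "->", node, "(data-local)" if is_local else "(remote read)")
# blk_1 -> nodeA (data-local)
# blk_2 -> nodeB (data-local)
# blk_3 -> nodeD (remote read)
```

This greedy pass is deliberately simple and not optimal; it only shows why co-locating compute and storage lets most tasks read their input from local disk instead of across the network.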

I am looking forward to more advancement in these technologies, at an affordable cost, to address the growing data-intensive computing requirements of today’s business economics.

Tuesday, November 29, 2011

A peek into Sustainability Balanced Scorecard for Enterprise


“Sustainability” is a term I was familiar with, but my recent read of Paul Hawken’s “The Ecology of Commerce: A Declaration of Sustainability” helped me realize that it is a much bigger and more important concept. Two important things (the 2Rs of sustainability) caught my attention:
  1. Wise use of economic and natural Resources 
  2. Respect for people and other living things
A Balanced Scorecard, as I understand it, is a management tool for communicating the enterprise strategy and driving its execution. Pursuing sustainability goals may not be the top priority for most businesses, but I believe a strategy-based balanced scorecard system aligned with the principles of the sustainability ‘Triple Bottom Line’ will offer corporations a way to accomplish social and environmental goals while integrating them fully with financial performance and competitive advantage.

In an effort to understand how the ‘Sustainability’ theme can be described through each of the four perspectives of the balanced scorecard, I created a sustainability balanced scorecard as a six-step process.


In case you are wondering, the above six-step sustainability process, from strategic planning through execution, was created using MS PowerPoint. Creating a pretty balanced scorecard picture like this is easy; monitoring the impact of the corporate initiatives and measuring performance on a timely and regular basis is where the challenge lies. So, are there any software tools or solutions that will help management teams discover the power of enterprise performance management (EPM) to improve transparency, insight and decision-making?

Acquiring the right technology is key to improved enterprise planning, yet spreadsheets have been the most commonly used tool to support business intelligence and EPM processes. Oracle’s Hyperion Performance Scorecard is one solution I am aware of that provides a flexible approach to developing scorecards, supporting recognized scorecarding methodologies and industry benchmarks. You can read more about Oracle’s EPM solution here.

Sunday, October 30, 2011

Introduction to Oracle Enterprise Scheduling Services (ESS)


The much-awaited Oracle Fusion Applications release is finally out (in the latter half of 2011), and with it the Oracle Enterprise Scheduling Services (ESS) application enters the Oracle marketplace. So, what is Oracle ESS? A batch processing application? The official Oracle documentation describes ESS as an enterprise application that provides time- and schedule-based callbacks to other applications to run their jobs. In simple technical terms, ESS is primarily a J2EE application, deployed to Oracle WebLogic Server, that provides scheduling services for distributed job request processing across a grid of application servers.
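To illustrate what "time-based callbacks" means in general terms, here is a tiny in-process dispatcher sketch. It is not the Oracle ESS API and says nothing about how ESS is implemented: the class `TinyScheduler`, its `submit`/`tick` methods, the status values and the job names are all hypothetical. Time is passed in explicitly through `tick(now)` so the behaviour is easy to follow.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass(order=True)
class JobRequest:
    run_at: float                                  # scheduled time on a caller-supplied clock
    seq: int                                       # tie-breaker: equal times keep submission order
    name: str = field(compare=False)
    callback: Callable[[], None] = field(compare=False)

class TinyScheduler:
    """Hypothetical time-based dispatcher; not the Oracle ESS API."""

    def __init__(self) -> None:
        self._queue: List[JobRequest] = []        # min-heap ordered by (run_at, seq)
        self._seq = itertools.count()
        self.status: Dict[str, str] = {}          # job name -> PENDING/SUCCEEDED/FAILED

    def submit(self, name: str, run_at: float, callback: Callable[[], None]) -> None:
        heapq.heappush(self._queue, JobRequest(run_at, next(self._seq), name, callback))
        self.status[name] = "PENDING"

    def tick(self, now: float) -> None:
        # Call back every request whose scheduled time has arrived, earliest first.
        while self._queue and self._queue[0].run_at <= now:
            job = heapq.heappop(self._queue)
            try:
                job.callback()
                self.status[job.name] = "SUCCEEDED"
            except Exception:
                self.status[job.name] = "FAILED"

ran: List[str] = []
s = TinyScheduler()
s.submit("payroll_batch", run_at=10.0, callback=lambda: ran.append("payroll_batch"))
s.submit("gl_posting", run_at=5.0, callback=lambda: ran.append("gl_posting"))
s.tick(now=6.0)    # only gl_posting is due
print(ran, s.status)
# ['gl_posting'] {'payroll_batch': 'PENDING', 'gl_posting': 'SUCCEEDED'}
s.tick(now=12.0)   # payroll_batch is now due as well
print(ran, s.status)
# ['gl_posting', 'payroll_batch'] {'payroll_batch': 'SUCCEEDED', 'gl_posting': 'SUCCEEDED'}
```

The status map is a stand-in for the job-request monitoring mentioned below; a real scheduling service would persist requests and distribute them across servers rather than hold them in one process.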

Oracle Fusion Applications is a suite of application product offerings built on the Oracle Fusion Middleware technology stack and the Oracle Database. Oracle ESS is not currently shipped as a separate product offering, but is generally available with the Fusion Applications product offerings. All the Fusion Applications product families (e.g. Fusion HCM, CRM, Financials, Projects) rely heavily on ESS functionality to offload the processing of large business transactions to a defined future schedule and to monitor job requests.

In the Oracle E-Business Suite, the “Concurrent Manager” served several important administrative, batch processing and report generation functions to ensure that Oracle Applications are not overwhelmed with job requests. Similarly, in the context of Oracle Fusion Apps, “ESS” complements the functionality of ‘Concurrent Processing’ and is a key component of Fusion Applications, performing important transaction processing, monitoring and notification functions.

Oracle ESS provides the ability to run different job types, including Java, PL/SQL and spawned jobs. For now, I will leave you with a high-level snapshot of the possible ESS execution methods commonly used in Fusion Apps: