Welcome to the blog of Solomon Nelson, a software technologist by profession. This blog will contain my experiences with information technology and reflections on the management related topics that interest me. DISCLAIMER: The views expressed on this blog are my own and do not necessarily reflect the views of my employer.
Monday, December 26, 2011
MapReduce and Grid Computing
There is a lot of interest and discussion around Hadoop MapReduce success stories these days, with the likes of Amazon, Yahoo, Facebook and Google advocating and adopting the framework for their production systems. I got curious about the core concept behind the MapReduce framework and what makes it so unique for distributed processing of large data sets.
Reading some interesting articles online, I understand that the MapReduce framework at its core is a combination of two functions, map() and reduce(). The map function runs where the data lives, i.e. the computation happens on the distributed nodes in a completely parallel manner. The reduce function, on the other hand, operates on the sorted intermediate output of the mappers from each computing node, grouped by key, and applies a function to each list of values. Both the input and the output of the map/reduce tasks are stored in a file system, for example the proprietary Google File System (GFS), the Hadoop Distributed File System (HDFS) or something else. Typically, the compute nodes (MapReduce framework) and the storage nodes (HDFS) are co-located on the same set of machines, based on the assumption that remote data can only be accessed with low bandwidth and high latency. This configuration allows the framework to schedule tasks on the nodes where the data is already present, resulting in very high aggregate bandwidth across the cluster. So, is this an extension of the architectural approach for storage grid computing?
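To make the map and reduce steps concrete, here is a minimal single-process sketch in Python using the classic word-count example. It is not Hadoop code: the function names (map_words, shuffle, reduce_counts, word_count) are my own for illustration, and the loop over input splits stands in for work that a real cluster would run in parallel on different nodes.

```python
from collections import defaultdict


def map_words(split):
    """Map step: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs):
    """Group the intermediate values by key and return the keys in sorted order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_counts(key, values):
    """Reduce step: collapse the list of values for one key into a total."""
    return key, sum(values)


def word_count(splits):
    """Run map over every split, then shuffle, then reduce each key."""
    intermediate = []
    for split in splits:  # on a cluster, each split would be mapped on its own node
        intermediate.extend(map_words(split))
    return dict(reduce_counts(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The fox"]))
```

The point of the sketch is the shape of the computation: map sees one split at a time and needs no knowledge of the others, which is what lets the framework spread the map tasks across nodes.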
The idea of grid computing arose from the need to solve highly parallel computational problems that were beyond the processing capability of any single computer. Oracle has been offering its version of grid technology since 2000. The database grid, representing the approach taken with Oracle Database, deploys code redundantly on multiple servers (or nodes), which break up the workload based on an optimized scheme and execute tasks in parallel against common data resources. If any node fails, its work is taken up by a surviving node to ensure high availability. Simply put, the Oracle Real Application Clusters (RAC) database grid architecture assigns computing tasks to computing resources and data to storage resources in a way that lets such resources be easily added or removed, and gives the flexibility to move tasks and data as needed.
My take-away points from a computing perspective: Both MapReduce and Oracle RAC computing environments harness the processing power of multiple interconnected computers and are promising technologies to invest in (depending on the business case) for solving data-intensive and resource-intensive computing problems. A key premise of MapReduce-style computing systems is that there is insufficient network bandwidth to move the data to the computation, and thus the computation must move to the data instead. The key differentiator or limitation I observed (at the point of blogging) is high availability. Unlike a RAC database transaction processing system, MapReduce-HDFS-style computing systems do not provide high availability, because the NameNode server of an HDFS file-system instance is a single point of failure.
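The "move the computation to the data" premise can be sketched as a small scheduling routine. This is a simplified greedy illustration under my own assumptions, not Hadoop's actual scheduler: the function assign_tasks, the block ids and the node names are all hypothetical.

```python
def assign_tasks(block_locations, free_slots):
    """Assign each block's task to a node holding a replica when possible.

    block_locations: dict of block id -> list of nodes storing a replica
    free_slots: dict of node -> number of tasks the node can still accept
    Returns a dict of block id -> (node, is_local).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment = {}
    for block, replicas in block_locations.items():
        # Prefer a node that already stores the block (no network transfer).
        local = [node for node in replicas if slots.get(node, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fall back to any node with capacity; the data must then be fetched remotely.
            candidates = [node for node, free in slots.items() if free > 0]
            if not candidates:
                raise RuntimeError("no free task slots left")
            node, is_local = candidates[0], False
        slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment


if __name__ == "__main__":
    blocks = {"b1": ["n1", "n2"], "b2": ["n1"], "b3": ["n3"]}
    slots = {"n1": 2, "n2": 1, "n3": 0, "n4": 1}
    print(assign_tasks(blocks, slots))
```

In this example b1 and b2 run on n1, which holds their data, while b3 has to run remotely because n3, its only replica holder, has no free slot.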
I am looking forward to more advancement in these technologies at an affordable cost for addressing the growing data-intensive computing requirements of today’s business economics.
Tuesday, November 29, 2011
A peek into Sustainability Balanced Scorecard for Enterprise
“Sustainability” is a term I was familiar with, but my recent read of Paul Hawken’s “The Ecology of Commerce: A Declaration of Sustainability” helped me realize that it is a much bigger and more important concept. Two important things (or the 2Rs of sustainability) caught my attention:
- Wise use of economic and natural Resources
- Respect for people and other living things
A Balanced Scorecard, as I understand it, is a management tool for communicating the enterprise strategy for execution. Pursuing sustainability goals may not be the top priority for most businesses, but I believe a strategy-based balanced scorecard system aligned with the principles of the sustainability ‘Triple Bottom Line’ will offer corporations a way to accomplish social and environmental goals while integrating them fully with financial performance and competitive advantage.
In an effort to understand how the ‘Sustainability’ theme can be described through each of the four perspectives of the balanced scorecard, I created a sustainability balanced scorecard as a 6-step process.
In case you are wondering, the above 6-step sustainability strategic planning through execution process was created using MS PowerPoint. Creating a pretty balanced scorecard picture like this is easy; monitoring the impact of the corporate initiatives and measuring performance on a timely and regular basis is where the challenge lies. So, are there any software tools or solutions that will help management teams discover the power of enterprise performance management (EPM) to improve transparency, insight, and decision-making?
Acquiring the right technology is key to improved enterprise planning, and spreadsheets have been the most commonly used tool to support business intelligence and EPM processes. Oracle’s Hyperion Performance Scorecard is one solution I am aware of that provides a flexible approach to developing scorecards and supports recognized scorecarding methodologies and industry benchmarks. You can read more about Oracle’s EPM solution here
Sunday, October 30, 2011
Introduction to Oracle Enterprise Scheduling Services (ESS)
The much-awaited Oracle Fusion Applications release is finally out (in the latter half of 2011), and with it the Oracle Enterprise Scheduling Services (ESS) application enters the Oracle marketplace. So, what is Oracle ESS - a batch processing application? The official Oracle documentation describes ESS as an enterprise application that provides time- and schedule-based callbacks to other applications to run their jobs. In simple technical terminology, ESS is primarily a J2EE application that is deployed to the Oracle WebLogic Server, providing scheduling services for distributed job request processing across a grid of application servers.
Oracle Fusion Applications is a suite of application product offerings built on the Oracle Fusion Middleware technology stack and the Oracle Database. Oracle ESS is not currently shipped as a separate product offering, but is generally available with the Fusion Applications product offerings. All the Fusion Applications product families (e.g. Fusion HCM, CRM, Financials, Projects) rely heavily on ESS to offload the processing of larger business transactions to run on a defined future schedule and to monitor job requests.
In the Oracle E-Business Suite, the “Concurrent Manager” served several important administrative, batch processing and report generation functions to ensure that the Oracle Applications were not overwhelmed with job requests. Similarly, in the context of Oracle Fusion Apps, “ESS” complements the functionality of ‘Concurrent Processing’ and is a key component of the Fusion Applications, performing important transaction processing, monitoring and notification functions.
Oracle ESS provides the ability to run different job types, including Java, PL/SQL and spawned jobs. For now, I will leave you with a high-level snapshot of the possible ESS execution methods commonly used in Fusion Apps: