Data Warehouse Adoption of the Linux-Based Platform
A Study of Trends and Challenges

Data warehouse implementations represent one of the most challenging types of deployments for the enterprise. Several factors contribute to the challenge of deploying a successful data warehouse. Among these are large-scale and complex system configurations, sophisticated data modeling and analysis tools, and high visibility in a broad range of important business functions within the company.

Data warehouse workloads can serve as a litmus test to determine the enterprise readiness of a given deployment platform. For this reason it's interesting to determine how well Linux can support such challenging workloads. To that end I began a study, examining two interrelated aspects of enterprise readiness for a data warehouse on Linux:

  1. Is the solution stack supported on Linux?
  2. Are end-user companies actively deploying the stack to support their business needs?
To investigate this issue, I chose to work in cooperation with the Data Center Linux initiative at OSDL. Building on personal, practical experience with data warehouse deployments, I conducted an informal survey of the readiness of the Linux platform for this workload. This article is a summary of the findings of that survey.

Data Warehouse Solution Participants

The survey examined three types of participants in the data warehouse solution or ecosystem:
  1. Independent software vendors (ISV)
  2. Independent hardware vendors (IHV)
  3. End-user company deployments
A number of published studies already show that Linux is well accepted on a variety of industry-standard vendor platforms, so I took its base acceptance among hardware vendors as an assumption. My study focused instead on Linux readiness within the ISV and end-user communities.

I used Ralph Kimball's "High Level Warehouse Technical Architecture" as a reference for analysis and to provide common terminology for analysis of the solution stack. I broke down the list of vendors into "front room" and "back room" categories, based upon Kimball's architecture.

The study involved a total of 18 vendors. It's important to note that this roster was not selected to showcase Linux usage. Instead, the list represented the dominant vendors, chosen based on experience with deployments at a number of large companies.

Study Results - Data Warehouse Trends

The study found reasonable overall support for Linux among the ISVs that make up the data warehouse solution stack, with 14 of 18 vendors offering some level of support for the open source OS. Within Kimball's technical architecture, the vendors supplying "front room" products predominantly hosted their offerings on client platforms. They had weaker overall support for Linux than the "back room" vendors with products in areas such as extract, transform, and load (ETL) and database. Specifically, the ETL vendors tended to support one particular Linux distribution very well, while database vendors tended to support multiple Linux distributions.

The study further examined motivators and other issues driving (and inhibiting) Linux adoption and support by ISVs, with the following findings.

Motivators

  • Market demand for the Linux platform
Issues
  • How many and which distributions to support
  • Differences in packages across distributions
  • Lack of standardization among maintenance tools and lack of usability features
While issues exist with regard to supporting the Linux platform, a clear majority of ISVs within the data warehouse ecosystem felt that market demand was sufficiently compelling to deliver products for that platform.

In examining end-user company deployments, my study focused on companies with data warehouse and/or data mart implementations that would be considered medium-sized to large (i.e., total implementation data size of at least one terabyte), with a typical configuration around 60 terabytes. These types of configurations shared some common themes:

  • Overall configuration elements - medium to large data warehouse:
    - SAN disk - use of failover
    - Employ NFS
    - Use multiple file systems as well as raw disk partitions
    - Employ large file systems
    - Multi-CPU large servers dominant - use of partitioning
The study then surveyed a subset of the companies with data warehouse implementations within the target size. Initially, a small sample set was chosen to limit the scope of the study and get a first picture of the deployments. In the future, the set of companies studied will be expanded.

Of the seven companies surveyed, the responses broke down as shown in Table 1.

The following is a summary of the issues and motivators for the three groups above.

Group 1

  • While there are some potential motivators, such as cost consolidation, there are significant inhibitors in terms of the internal infrastructure to support Linux and the perceived immaturity of the platform.
Group 2
  • Flexibility in choice of hardware platforms drove decisions to build a development environment as a first step toward evolving a mature support infrastructure for Linux.
  • The primary inhibitor to moving to production was the lack of adequate support infrastructure within key ISVs for solutions on Linux.
Group 3
  • Migration to Linux represented a strategic move to take advantage of the flexibility of deploying the hardware and software solutions that Linux provides.
  • The primary production issue for IT infrastructure teams was providing systems integration services to ensure the success of such a demanding workload, such as the need to build customized monitoring scripts for the environment.
Based on the data above, the most important group to analyze in more detail was Group 1 because it was the dominant group. Moreover, I wanted to provide input to the OSDL Data Center Linux group regarding the strategic focus areas to drive acceptance of the data warehouse on Linux.

Group 1 reported the following motivations and issues in detail.

Motivators

  • Cost consolidation
  • H/W platform flexibility
  • Low-cost clustering
  • Consolidation of system administration skills
Issues
  • Weak internal support for Linux infrastructure
  • Lack of maturity of data warehouse solutions on Linux
    - Maturity defined: Referenceable and in production for at least one year
  • Lack of acceptance of Linux within DW
    - Acceptance defined: Deployments within Fortune 100 companies

Conclusion

The overall conclusion drawn from this survey of the data warehouse and Linux was that the solution stack is sufficient to support the workload on Linux. However, the Linux support infrastructure is often not mature enough to handle the large, complex configurations and demanding workloads of data warehouses.

End-User Highlights
Some very specific findings emerged from the study with regard to end-user deployment:

  • The majority of companies in Group 1 (no plans in the near future to migrate to Linux) will eventually move into Group 2 (development on Linux with a longer-term move to production). They fell into Group 1 because complexity, reliability, and scalability requirements proved too demanding for current deployments on Linux. Staffing and support issues were key inhibitors as well.
  • Groups 2 and 3 featured early adopters who leveraged the availability of H/W, database, and ETL server solutions to enable successful deployment.
ISV Highlights
Similarly, salient ISV data emerged from the study:
  • Market adoption of Linux in "back room" solutions is healthy and growing.
  • Market adoption of Linux in "front room" solutions is measured, due to limitations in current ISV offerings and challenges for ISVs to support multiple Linux distributions.
  • Opportunities exist for standardization across distributions, e.g., tools, packages, etc., to support the ISV community.
The information from this study has been incorporated into the prioritization of requirements for the OSDL Data Center Linux initiative, especially within the database and data warehouse tier. The OSDL intends to expand upon this informal study in the future to continue raising the visibility of the needs of this and other critical data center workloads.
About Lynn de la Torre
Lynn de la Torre is a member of OSDL and coordinates the activities of the DCL Working Group. Lynn has thirty years of experience in the data center, and has worked in operations, system administration, database administration, and software development. Prior to joining OSDL, Lynn was a project manager for a large data warehouse implementation.
