SUMMARY: In my consulting practice, I recommend an incremental, ‘bottom-up’ implementation methodology, similar to that advocated by Ralph Kimball. This has proven to be a successful deployment approach to ensure a short-term return on investment with minimal project risk, while still delivering a data warehousing architecture that provides a standardized, enterprise-wide view of information. This column describes a typical project plan based on a bottom-up implementation methodology.
* REQUIREMENT FOR BOTTOM-UP DEVELOPMENT
I receive numerous e-mails from both technical and business managers who have become extremely frustrated by the inordinate amount of time and effort required to build data warehouses using traditional, top-down development techniques. Due to the large amount of effort required up front to perform user interviews and enterprise data modeling, organizations using top-down techniques often wait 12 to 14 months to get any return on investment. In some cases, managers complain that they have worked with a systems integrator on a top-down development project for over two years without obtaining any solution to their business problem.
As described in previous articles, a bottom-up development approach directly addresses the need for a rapid solution to the business problem, at low cost and low risk. A typical requirement is to develop an operational data mart for a specific business area in 90 days, and to develop subsequent data marts in 60 to 90 days each. The bottom-up approach meets these requirements without compromising the technical integrity of the data warehousing solution. Data marts are constructed within a long-term enterprise data warehousing architecture, and the development effort is strictly controlled through the use of logical data modeling techniques and integration of all components of the architecture with central meta data.
* PROJECT PLAN
The representative project plan described below is based on an incremental, ‘bottom-up’ implementation methodology. In my experience, this has been the most successful deployment approach to ensure a short-term return on investment with minimal project risk, while still delivering a data warehousing architecture that provides a standardized, enterprise-wide view of information.
The project plan breaks down into three major phases, as follows:
1. Data Warehouse Requirements and Architecture – gather user requirements through a series of requirements interviews, assess the current IT architecture and infrastructure, along with current and long-range reporting requirements, and develop an enterprise data warehousing framework that will provide maximum flexibility, generality and responsiveness to change. Define functional requirements for the initial data mart within the enterprise architecture. Optionally, conduct a Proof-of-Concept demonstration to select enterprise-class ETL and/or BI tools.
2. Implement Initial Data Mart under the New Architecture – prove the new architecture works by implementing a fully operational data mart for a selected subject area within a 90-day timebox.
3. Develop Additional Data Marts under the New Architecture – on a phased basis, develop additional architected data marts within the new framework.
Timeboxes are placed around each phase of the project plan to strictly limit the duration of the development effort. Phase 1 (specification of architecture and functional requirements) is limited to four weeks. If proof-of-concept testing of ETL and BI tools is conducted, these tests occur outside the timebox for Phase 1, due to the uncertain timing of scheduling vendors to perform the tests. Phase 2 (development of the initial data mart) is limited to 90 days. Phase 3 (development of additional data marts) is limited to 60 to 90 days for each subsequent data mart.
* TASKS AND DELIVERABLES BY PHASE
The list below summarizes representative tasks, timeframe, and deliverables for each of these three phases.
**Phase / Task
1. Requirements and Architecture
—Conduct workshop to review/define strategic business drivers, review current application architecture, and define strategic data warehousing architecture
—Elapsed week: 1
—Deliverables: strategic requirements; document specifying the recommended long-term enterprise data warehousing architecture
—Conduct requirements interviews with personnel from multiple subject areas and document the results of the interviews
—Elapsed week: 1
—Deliverables: interview summaries; requirements inventory
—Conduct workshop to define functional specifications of initial data mart
—Elapsed weeks: 2
—Deliverables: scope statement; reporting and analysis specifications
—Develop high-level dimensional data model for initial data mart
—Elapsed weeks: 2
—Deliverables: logical data model
—Conduct workshop to prepare for a Proof-of-Concept test of leading ETL tool(s)
—Elapsed weeks: 3
—Deliverables: specifications for the ETL POC
—Conduct workshop to prepare for a Proof-of-Concept test of leading BI tool(s)
—Elapsed weeks: 3
—Deliverables: specifications for the BI POC
—Conduct workshop to develop Phase 2 project plan
—Elapsed weeks: 4
—Deliverables: Phase 2 project plan
**Phase / Task
2. Implement Initial Data Mart in 90-Day Timebox
—Elapsed weeks: 12
—Deliverables: live architected data mart; live OLAP cubes and reports; central meta data repository; local meta data repository; reusable ETL objects and conformed dimensions
**Phase / Task
3. Implement Additional Data Marts
—Elapsed weeks: 8-12 each
—Deliverables: additional architected data marts
* PHASE 1 IMPLEMENTATION
The initial task in Phase 1 is to conduct two on-site workshops, limited to one to two days each. The function of the first workshop is to bring all members of the development team up to speed on alternative enterprise data warehousing architectures and ‘best practices’ in data warehousing. The second workshop is used to achieve consensus on the specification of an enterprise data warehousing architecture capable of meeting the long-term business requirements of the organization.
Interviews with personnel from multiple subject areas are held to define high-level functional requirements for each subject area. Subject areas often correlate with proposed data marts, in areas such as finance, sales, marketing, HR, supply-chain management, customer touchpoints, etc. As described in a previous column, the interviews are kept deliberately short (one day or less per subject area). The deliverables from each interview include a short, concise requirement specification for the subject area and a top-level dimensional data model representing the data sources, source-to-target mappings, target database, and reports required for a specific subject area. The top-level data models from all subject areas are then synthesized to identify common data sources, conformed dimensions and facts, common transformations and aggregates, etc.
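The synthesis step can be sketched in a few lines. Below is a minimal illustration, using hypothetical subject areas and dimension names, of how top-level models from the interviews might be compared to surface candidate conformed dimensions:

```python
# Hypothetical sketch: each subject area's draft top-level model lists its
# dimensions; any dimension shared by two or more areas is a candidate to be
# conformed (standardized) across the architected data marts.
from collections import Counter

subject_area_models = {              # illustrative interview output, not real data
    "sales":     {"date", "product", "customer", "store"},
    "marketing": {"date", "customer", "campaign"},
    "finance":   {"date", "account", "cost_center"},
}

# Count how many subject areas reference each dimension.
usage = Counter(dim for dims in subject_area_models.values() for dim in dims)

# Dimensions used by two or more areas are candidates for conformance.
conformed_candidates = sorted(d for d, n in usage.items() if n >= 2)
print(conformed_candidates)   # ['customer', 'date']
```

In practice this comparison is done in modeling workshops rather than in code, but the principle is the same: shared dimensions drive the conformance effort.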
Following synthesis of functional requirements from all subject areas, a workshop is held to define the functional requirements for the initial data mart, lay out the project plan for the development of the initial data mart, identify required skill sets, and assign personnel to the project.
The next task is to define a high-level dimensional data model for the initial data mart, representing an expansion of the model prepared as part of the user interview process.
If the organization has decided to conduct a proof-of-concept test of competitive ETL tools and BI tools, the next two tasks represent preparation of functional specifications for the tests. Proof-of-concept testing may be required if multiple ETL or BI tools are being evaluated and it is politically expedient to conduct rigorous testing of competitive products prior to making a selection. I have found that an intensive two-day workshop is sufficient to specify the functionality of the tests to be conducted for a proof of concept for either an ETL or BI tool.
The final task in Phase 1 is to specify a detailed project plan for the implementation of the initial data mart.
* PHASE 2 IMPLEMENTATION
In the recommended bottom-up development methodology, the process of implementing the initial data mart is limited to 90 calendar days. Although 90 days is arbitrary, it fits the needs of business managers for a rapid solution to the business problem and meets the needs of CFOs for a 90-day return on investment. The 90-day timebox starts on the day that the ETL tool, the target DBMS, and the BI tool are successfully installed. To meet the challenge of a 90-day implementation process, use of an ETL tool, rather than hand-coding the extractions and transformations, is strongly recommended.
Implementation of the initial and subsequent data marts is ideally performed by a small team of data warehousing professionals, typically consisting of a data modeling specialist, an ETL specialist, and a BI tool specialist. In my own organization, I emphasize cross-training of personnel, which minimizes the number of personnel that need to be assigned to a project and maximizes their efficiency.
The first task for the development team is to design the target database for the initial data mart. Modeling of the target database proceeds through three steps: design of an entity-relationship diagram, then a logical dimensional model, and finally a physical model of the target database schema. Although the E-R diagram is not required for the initial data mart, it is required for subsequent data marts to ensure that all physical data models for multiple data marts are derived from a common logical specification.
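To make the final step concrete, here is a hedged sketch of a physical star-schema design for a hypothetical sales data mart (all table and column names are illustrative, not from any specific project), expressed as DDL and exercised against an in-memory SQLite database:

```python
# Illustrative physical model: one fact table surrounded by dimension tables,
# the classic star schema produced from a logical dimensional model.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,   -- surrogate key
    calendar_date TEXT NOT NULL,
    fiscal_month  TEXT NOT NULL
);
CREATE TABLE dim_product (
    product_key   INTEGER PRIMARY KEY,   -- surrogate key
    product_name  TEXT NOT NULL,
    category      TEXT NOT NULL
);
CREATE TABLE fact_sales (
    date_key      INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key   INTEGER NOT NULL REFERENCES dim_product(product_key),
    units_sold    INTEGER NOT NULL,      -- additive measures
    sales_amount  REAL NOT NULL
);
""")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)   # ['dim_date', 'dim_product', 'fact_sales']
```

The surrogate keys on the dimension tables are what later allow dimensions to be conformed and reused across subsequent data marts.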
The next major task is to specify and implement data mapping, extraction, transformation, and data cleansing rules. The data mapping and transformation rules are defined first in natural language, and then implemented using only the transformation objects supplied with the ETL tool. The objective is to avoid coding any ETL processes. It is hard to predict how long this task will take. For many applications, specification and implementation of the transformation rules should not take more than 3 to 4 weeks. However, some applications may require many months to specify and implement complex transformations. These applications are not likely to fit within a 90-day implementation timebox.
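The "natural language first, tool objects second" discipline can be illustrated with a small sketch. The rules below are hypothetical examples, and the pipeline function stands in for the chaining of reusable transformation objects that an ETL tool provides:

```python
# Hypothetical cleansing rules, first stated in natural language, then
# implemented as small reusable transforms -- mimicking how an ETL tool
# chains supplied transformation objects instead of hand-coded procedures.
from functools import reduce

def trim_fields(row):      # rule 1: strip stray whitespace from all text fields
    return {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}

def upper_country(row):    # rule 2: standardize country codes to upper case
    return {**row, "country": row["country"].upper()}

def has_customer_id(row):  # rule 3: reject rows missing the business key
    return bool(row.get("customer_id"))

def run_pipeline(rows, transforms, row_filter):
    # Apply each transform in order, then discard rows failing the quality rule.
    cleaned = [reduce(lambda r, t: t(r), transforms, row) for row in rows]
    return [r for r in cleaned if row_filter(r)]

source_rows = [
    {"customer_id": " C001 ", "country": "us"},
    {"customer_id": "",       "country": "de"},   # fails rule 3, never loaded
]
loaded = run_pipeline(source_rows, [trim_fields, upper_country], has_customer_id)
print(loaded)   # [{'customer_id': 'C001', 'country': 'US'}]
```

Keeping each rule small and declarative is what makes the transformation logic reusable across subsequent data marts.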
In parallel with implementation of the transformation logic, developers build aggregation, summarization, partition, and distribution functions. The ETL tool may be used to compute aggregates in one pass of the source data, using incremental aggregation techniques. Pre-computed aggregates are recommended for most data warehousing applications to provide rapid response to predictable queries, reports, and analysis.
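The idea behind incremental aggregation can be sketched as follows; this is a simplified illustration of the general technique, not any particular ETL vendor's feature:

```python
# Incremental aggregation sketch: stored aggregates are updated in a single
# pass over only the newly arrived source rows, rather than re-scanning the
# full history on every load.
from collections import defaultdict

# Persisted pre-computed aggregates, keyed by (product, month).
aggregates = defaultdict(lambda: {"units": 0, "amount": 0.0})

def apply_increment(new_rows):
    for row in new_rows:                          # one pass over the delta only
        key = (row["product"], row["month"])
        aggregates[key]["units"] += row["units"]
        aggregates[key]["amount"] += row["amount"]

# First nightly load, then a later incremental load, touching the same cell.
apply_increment([{"product": "P1", "month": "2024-01", "units": 3, "amount": 30.0}])
apply_increment([{"product": "P1", "month": "2024-01", "units": 2, "amount": 20.0}])
print(aggregates[("P1", "2024-01")])   # {'units': 5, 'amount': 50.0}
```

Because only the delta is scanned, load windows stay short even as the historical fact data grows.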
The final task in Phase 2 is delivery of a fully operational, architected data mart for the initial subject area, using an exact subset of the enterprise data warehousing architecture. Ideally, the development team delivers all of the functionality that was specified at the beginning of the 90-day timebox. However, the team is permitted to defer low priority functions in order to make the 90-day timebox. The team is self-managed and organizes its resources to deliver as much functionality as possible within the 90-day development window.
* PHASE 3 IMPLEMENTATION
The objective of Phase 3 is to build additional architected data marts. Additional data marts are built by the primary development team using common templates and components, such as conformed dimensions and facts, common transformation objects, data models, central meta data definitions, etc.
In the bottom-up methodology, a central data warehouse, an operational data store, and a persistent staging file are optional. Data marts are typically developed using a data warehouse bus architecture, as described by Ralph Kimball in his book ‘The Data Warehouse Toolkit, Second Edition.’ There may be good technical reasons to incorporate a central data warehouse and an operational data store in the enterprise data warehousing architecture. However, in the bottom-up methodology, development of the central data warehouse and ODS is deferred until they are clearly required. A central data warehouse is often required when detailed, atomic data from multiple data marts must be accessed to generate cross-business reports, and when there is a low percentage of conformed dimensions across the data marts.
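When dimensions are well conformed, cross-business reporting can often be handled by drill-across queries rather than a central warehouse. The following is a hedged sketch of that idea, using hypothetical fact tables in SQLite that share a conformed date dimension key:

```python
# Drill-across sketch: two data marts (sales and returns) share a conformed
# date dimension, so their aggregates can be combined on the common surrogate
# key -- the property that lets a central warehouse be deferred.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales   (date_key INTEGER, amount REAL);
CREATE TABLE fact_returns (date_key INTEGER, amount REAL);
INSERT INTO fact_sales   VALUES (20240101, 100.0), (20240102, 50.0);
INSERT INTO fact_returns VALUES (20240101, 10.0);
""")

row = conn.execute("""
    SELECT s.date_key, s.sales, COALESCE(r.returns, 0) AS returns
    FROM (SELECT date_key, SUM(amount) AS sales
            FROM fact_sales GROUP BY date_key) s
    LEFT JOIN (SELECT date_key, SUM(amount) AS returns
                 FROM fact_returns GROUP BY date_key) r
      ON s.date_key = r.date_key        -- join on the conformed dimension key
    WHERE s.date_key = 20240101
""").fetchone()
print(row)   # (20240101, 100.0, 10.0)
```

If the dimensions were not conformed (different keys or incompatible attribute values), this join would be impossible, which is exactly the situation where a central data warehouse becomes necessary.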
Maintenance and administration of the data warehousing application is an ongoing function. A secondary team may be used to enhance and maintain completed data marts. The primary team transfers transformation templates, data models, conformed dimensions, conformed facts, meta data, etc. to the secondary team to simplify the enhancement and administration of completed data marts.
My organization has used this methodology successfully for many clients and has a good deal of experience with it. However, successful implementation of the methodology depends on several critical success factors:
-a dedicated implementation team;
-consulting help from an experienced organization at the beginning of the project;
-backing of a business manager who is hungry for a solution to a painful business problem; and
-integration of all components of the architecture with central meta data.