BI Search engine

Sunday, May 17, 2009

Good practices for ETL


Four-layered approach for ETL architecture design

Functional layer: Core functional ETL processing (extract, transform, and load).

Operational management layer: Job-stream definition and management, parameters, scheduling, monitoring, communication and alerting.

Audit, balance and control (ABC) layer: Job-execution statistics, balancing and controls, rejects- and error-handling, codes management.

Utility layer: Common components supporting all other layers.

Use file-based ETL processing where possible

Storage costs relatively little

Intermediate files serve multiple purposes

Used for testing and debugging

Used for restart and recover processing

Used to calculate control statistics

Helps to reduce dependencies - enables modular programming.

Allows flexibility for job-execution and -scheduling

Better performance if coded properly, and can take advantage of parallel processing capabilities when the need arises.

Use data-driven methods and minimize custom ETL coding

Parameter-driven jobs, functions, and job-control

Code definitions and mapping in database

Consideration for data-driven tables to support more complex code-mappings and business-rule 
application.

Qualities of a good ETL architecture design

Performance

Scalable

Migratable

Recoverable (run_id, ...)

Operable (completion-codes for phases, re-running from checkpoints, etc.)

Auditable (in two dimensions: business requirements and technical troubleshooting)

0 comments:

Post a Comment