Four-layered approach for ETL architecture design
Functional layer: Core functional ETL processing (extract, transform, and load).
Operational management layer: Job-stream definition and management, parameters, scheduling, monitoring, communication and alerting.
Audit, balance and control (ABC) layer: Job-execution statistics, balancing and controls, rejects- and error-handling, codes management.
Utility layer: Common components supporting all other layers.
Use file-based ETL processing where possible
Storage costs relatively little
Intermediate files serve multiple purposes
Used for testing and debugging
Used for restart and recover processing
Used to calculate control statistics
Helps to reduce dependencies - enables modular programming.
Allows flexibility for job-execution and -scheduling
Better performance if coded properly, and can take advantage of parallel processing capabilities when the need arises.
Use data-driven methods and minimize custom ETL coding
Parameter-driven jobs, functions, and job-control
Code definitions and mapping in database
Consideration for data-driven tables to support more complex code-mappings and business-rule
application.
Qualities of a good ETL architecture design
Performance
Scalable
Migratable
Recoverable (run_id, ...)
Operable (completion-codes for phases, re-running from checkpoints, etc.)
Auditable (in two dimensions: business requirements and technical troubleshooting)