News ETL Pipeline
May 2025
News ETL Pipeline with SOLID Architecture
ETL
Airflow
PostgreSQL
Docker
SOLID
Data Engineering
An ETL pipeline for processing news on car accidents involving alcohol, designed with strict SOLID principles for maintainability and extensibility.
Overview
This project implements an ETL (Extract, Transform, Load) pipeline that collects news about car accidents involving alcohol consumption, filters and scores the articles for relevance, and persists the results. Each stage sits behind its own interface, following SOLID principles, so that sources and destinations can be added without changing existing code.
Architecture
Extractors
•Handle data collection from various sources
•Implemented sources: NewsAPI, GNews
•Easily extensible to additional sources
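To illustrate how a new source could plug in, here is a minimal sketch of a shared extractor interface. The names `Extractor`, `StaticExtractor`, and `extract_all` are hypothetical and are not taken from the project; the stand-in source serves canned records instead of calling NewsAPI or GNews.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Extractor(ABC):
    """Common interface every news source implements (hypothetical name)."""

    @abstractmethod
    def extract(self, query: str) -> list[dict]:
        """Return the articles matching the query as plain dicts."""


class StaticExtractor(Extractor):
    """Stand-in source that serves canned records instead of calling an API."""

    def __init__(self, name: str, records: list[dict]):
        self.name = name
        self._records = records

    def extract(self, query: str) -> list[dict]:
        needle = query.lower()
        # Keep records whose title mentions the query and tag them with the source.
        return [
            {"title": r["title"], "url": r["url"], "source": self.name}
            for r in self._records
            if needle in r["title"].lower()
        ]


def extract_all(extractors: list[Extractor], query: str) -> list[dict]:
    """Collect articles from every configured source, in order."""
    articles: list[dict] = []
    for extractor in extractors:
        articles.extend(extractor.extract(query))
    return articles
```

Under this assumption, supporting another source means writing one more `Extractor` subclass; `extract_all` stays unchanged.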
Transformers
•Apply filtering and relevance scoring
•Content analysis for alcohol-related accidents
•Standardization of data formats
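The filtering and scoring step might look roughly like the sketch below. The keyword sets, the scoring formula, and the `0.25` threshold are illustrative assumptions, not the project's actual rules; a real implementation could use a more sophisticated content analysis.

```python
from __future__ import annotations

import re

# Hypothetical keyword sets; the project's real vocabulary is not documented here.
ALCOHOL_TERMS = {"alcohol", "drunk", "dui", "intoxicated"}
ACCIDENT_TERMS = {"accident", "crash", "collision"}


def relevance_score(text: str) -> float:
    """Score text between 0 and 1; zero unless both topics are mentioned."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    alcohol_hits = len(words & ALCOHOL_TERMS)
    accident_hits = len(words & ACCIDENT_TERMS)
    if alcohol_hits == 0 or accident_hits == 0:
        return 0.0
    total_terms = len(ALCOHOL_TERMS) + len(ACCIDENT_TERMS)
    return (alcohol_hits + accident_hits) / total_terms


def transform(articles: list[dict], threshold: float = 0.25) -> list[dict]:
    """Drop low-scoring articles and emit records with a fixed set of keys."""
    kept = []
    for article in articles:
        text = f"{article.get('title', '')} {article.get('description', '')}"
        score = relevance_score(text)
        if score >= threshold:
            kept.append(
                {
                    "title": article.get("title", "").strip(),
                    "url": article.get("url", ""),
                    "relevance": round(score, 2),
                }
            )
    return kept
```

Requiring a hit in both keyword sets keeps articles about accidents in general, or about alcohol in general, out of the output.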
Loaders
•Manage data persistence
•Implemented destinations: PostgreSQL and S3
•Configurable output formats
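As a rough sketch of a loader with a configurable output format, the example below writes records to any text stream as JSON Lines or CSV. `Loader` and `StreamLoader` are hypothetical names, and the stream is a stand-in: the project's PostgreSQL and S3 loaders would implement the same `load` method against their own clients.

```python
from __future__ import annotations

import csv
import json
from abc import ABC, abstractmethod
from typing import TextIO


class Loader(ABC):
    """Common interface every destination implements (hypothetical name)."""

    @abstractmethod
    def load(self, records: list[dict]) -> int:
        """Persist the records and return how many were written."""


class StreamLoader(Loader):
    """Writes records to a text stream in a configurable format."""

    SUPPORTED = {"jsonl", "csv"}

    def __init__(self, stream: TextIO, fmt: str = "jsonl"):
        if fmt not in self.SUPPORTED:
            raise ValueError(f"unsupported format: {fmt}")
        self._stream = stream
        self._fmt = fmt

    def load(self, records: list[dict]) -> int:
        if not records:
            return 0
        if self._fmt == "jsonl":
            # One JSON document per line.
            for record in records:
                self._stream.write(json.dumps(record) + "\n")
        else:
            # Column order follows the keys of the first record.
            writer = csv.DictWriter(self._stream, fieldnames=list(records[0]))
            writer.writeheader()
            writer.writerows(records)
        return len(records)
```

For example, `StreamLoader(open("articles.csv", "w", newline=""), fmt="csv")` would write a CSV file, while the default format produces JSON Lines.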
Technical Implementation
Microservices-Friendly Structure
•Centralized configuration
•Comprehensive logging
•Database connection handling
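One way to centralize configuration and logging is a single settings object read once at startup, sketched below. The environment variable names (`DATABASE_URL`, `LOG_LEVEL`), the default connection string, and the logger name `news_etl` are assumptions made for illustration.

```python
from __future__ import annotations

import logging
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Single source of truth for runtime configuration (hypothetical fields)."""

    database_url: str
    log_level: str

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "Settings":
        # Accept an explicit mapping so tests do not depend on the real environment.
        env = os.environ if env is None else env
        return cls(
            database_url=env.get("DATABASE_URL", "postgresql://localhost:5432/news"),
            log_level=env.get("LOG_LEVEL", "INFO").upper(),
        )


def configure_logging(settings: Settings) -> logging.Logger:
    """Set up the root handler once and return the pipeline's logger."""
    logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")
    logger = logging.getLogger("news_etl")
    logger.setLevel(settings.log_level)
    return logger
```

Every service can then call `Settings.from_env()` and pass the result to the components that need it, rather than reading environment variables in several places.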
Orchestration
•Airflow for pipeline management
•Docker infrastructure
•Service-specific configurations
SOLID Principles Implementation
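•Single Responsibility: Each class handles one concern, such as extracting, transforming, or loading
•Open/Closed: New sources and destinations are added as new classes, without modifying existing ones
•Liskov Substitution: Any extractor, transformer, or loader can stand in for another of the same type
•Interface Segregation: Each component type exposes a small, focused interface
•Dependency Inversion: High-level pipeline code depends on abstractions rather than on concrete low-level modules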
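The principles above can be sketched in a few lines: a pipeline class that knows only three small interfaces and receives its concrete components from outside. All class names here are hypothetical, and the stub components exist only to make the example runnable.

```python
from __future__ import annotations

from typing import Protocol


class Extractor(Protocol):
    def extract(self, query: str) -> list[dict]: ...


class Transformer(Protocol):
    def transform(self, records: list[dict]) -> list[dict]: ...


class Loader(Protocol):
    def load(self, records: list[dict]) -> int: ...


class Pipeline:
    """High-level orchestration that depends only on the three protocols."""

    def __init__(self, extractors: list[Extractor], transformer: Transformer, loader: Loader):
        self._extractors = extractors
        self._transformer = transformer
        self._loader = loader

    def run(self, query: str) -> int:
        """Extract from every source, transform, load; return rows written."""
        records: list[dict] = []
        for extractor in self._extractors:
            records.extend(extractor.extract(query))
        return self._loader.load(self._transformer.transform(records))


# Stub components, used here in place of real API clients and databases.
class FixedExtractor:
    def __init__(self, records: list[dict]):
        self._records = records

    def extract(self, query: str) -> list[dict]:
        return list(self._records)  # ignores the query; returns canned data


class KeywordTransformer:
    def __init__(self, keyword: str):
        self._keyword = keyword.lower()

    def transform(self, records: list[dict]) -> list[dict]:
        return [r for r in records if self._keyword in r["title"].lower()]


class ListLoader:
    def __init__(self) -> None:
        self.rows: list[dict] = []

    def load(self, records: list[dict]) -> int:
        self.rows.extend(records)
        return len(records)
```

Because `Pipeline` never imports a concrete source or destination, swapping `ListLoader` for a database-backed loader requires no change to `Pipeline` itself.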
Benefits
•Clear separation of concerns
•Highly modular architecture
•Easily extensible to new data sources or destinations
•Robust error handling and logging
•Scalable for increased data volumes
Project Details
Project Type: News ETL Pipeline
Completed: May 2025
Technologies: ETL, Airflow, PostgreSQL, Docker, SOLID, Data Engineering