Extension AirCan DataFactory Integration


Extension Basics

Title
AirCan DataFactory Integration
Name
ckanext-aircan
Type
Public extension
Description
Apache Airflow integration for CKAN enabling automated data processing workflows and ETL operations.
CKAN versions

~2.8.0, ~2.9.0

Download-Url (zip)
Download-Url commit date
2024-08-19
Url to repo
Category
Specialized Tools


Background Infos

Description (long)

The AirCan DataFactory Integration extension connects CKAN to Apache Airflow for workflow orchestration and automated ETL operations. It replaces the DataPusher and XLoader data loading mechanisms with Airflow-based processing pipelines. When a resource is created or updated, the extension triggers an Airflow DAG that can transform, validate, and load the data. Workflows can use retry logic, parallel processing, dependency management, and monitoring dashboards. Both local Airflow instances and Google Cloud Composer environments are supported, with authentication and connection details set in the CKAN configuration. Processing steps can include format conversion, schema validation, data cleaning, aggregation, and loading into destinations such as the PostgreSQL DataStore and cloud data warehouses. The extension also provides real-time status tracking, error reporting, and notifications for monitoring workflow runs. It is aimed at organizations that need automated quality checks and scalable data pipeline management.
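To make the trigger step more concrete, here is a minimal sketch of how a DAG run could be requested through Airflow's experimental REST endpoint (the URL form shown in the CKAN settings). This is not the extension's actual code: the function name `build_dag_run_request`, the shape of the `conf` payload, and the use of HTTP Basic authentication are assumptions for illustration, and the authentication scheme in particular depends on how the Airflow instance is configured.

```python
import base64
import json
import urllib.request


def build_dag_run_request(airflow_url, username, password, resource):
    """Build (but do not send) a POST request asking Airflow to start a DAG run.

    Hypothetical helper: ckanext-aircan may structure its request differently.
    """
    # The experimental API takes a JSON body with a "conf" entry that is handed
    # to the DAG run. The keys inside "conf" are assumed here.
    payload = {"conf": {"resource": resource}}

    # Assumption: the Airflow webserver accepts HTTP Basic credentials.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")

    return urllib.request.Request(
        airflow_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_dag_run_request(
    "http://host.docker.internal:8080/api/experimental/dags/ckan_api_load_multiple_steps/dag_runs",
    "airflow_admin_username",
    "airflow_admin_password",
    {"id": "example-resource-id", "format": "CSV"},  # placeholder resource fields
)
```

Passing `request` to `urllib.request.urlopen` would send it. Building the request separately from sending it keeps the sketch testable without a running Airflow instance.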

Version
1.0.5
Version release date
2024-08-19
Contact name
Datopian Team
Contact email
Contact Url
(not set)


Installation Guide

Configuration hints

Replaces DataPusher and XLoader. Supports local Airflow instances and Google Cloud Composer environments.

Plugins to configure (ckan.ini)
aircan_connector
CKAN Settings (ckan.ini)
# CKAN__AIRFLOW__URL = 'http://host.docker.internal:8080/api/experimental/dags/ckan_api_load_multiple_steps/dag_runs'
# CKAN__AIRFLOW__USERNAME = 'airflow_admin_username'
# CKAN__AIRFLOW__PASSWORD = 'airflow_admin_password'
# CKAN__AIRFLOW__STORAGE_PATH = '/tmp/'
# CKAN__AIRFLOW__CLOUD = 'local'
# For Google Cloud Composer:
# CKAN__AIRFLOW__CLOUD = 'GCP'
# CKAN__AIRFLOW__CLOUD__PROJECT_ID = 'your_project_id'
# CKAN__AIRFLOW__CLOUD__LOCATION = 'us-east1'
# CKAN__AIRFLOW__CLOUD__COMPOSER_ENVIRONMENT = 'composer_env_name'
# CKAN__AIRFLOW__CLOUD__WEB_UI_ID = 'airflow_ui_id'
# CKAN__AIRFLOW__CLOUD__GOOGLE_APPLICATION_CREDENTIALS = '{"json":"credentials"}'
DB migration to be executed
(not set)
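Because the settings listed above differ between a local Airflow instance and Google Cloud Composer, a small pre-flight check can catch missing values before CKAN starts. The sketch below assumes the settings are supplied as environment variables under the names listed above. Which keys are required in each mode, and the default of `'local'` when `CKAN__AIRFLOW__CLOUD` is unset, are assumptions and not documented behaviour of the extension. The helper `missing_settings` is hypothetical.

```python
import os

# Assumed required keys for a local Airflow instance.
LOCAL_KEYS = [
    "CKAN__AIRFLOW__URL",
    "CKAN__AIRFLOW__USERNAME",
    "CKAN__AIRFLOW__PASSWORD",
]

# Assumed required keys for Google Cloud Composer.
GCP_KEYS = [
    "CKAN__AIRFLOW__CLOUD__PROJECT_ID",
    "CKAN__AIRFLOW__CLOUD__LOCATION",
    "CKAN__AIRFLOW__CLOUD__COMPOSER_ENVIRONMENT",
    "CKAN__AIRFLOW__CLOUD__WEB_UI_ID",
    "CKAN__AIRFLOW__CLOUD__GOOGLE_APPLICATION_CREDENTIALS",
]


def missing_settings(env=None):
    """Return the names of settings that are absent or empty for the chosen mode."""
    env = os.environ if env is None else env
    # Assumption: an unset CKAN__AIRFLOW__CLOUD means a local instance.
    cloud = env.get("CKAN__AIRFLOW__CLOUD", "local")
    required = GCP_KEYS if cloud == "GCP" else LOCAL_KEYS
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    for name in missing_settings():
        print(f"Missing setting: {name}")
```

Accepting the environment as a parameter lets the check be exercised with a plain dictionary instead of the real process environment.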