Extension Ingest


Extension Basics

Title
Ingest
Name
ckanext-ingest
Type
Public extension
Description
Framework for transforming data streams into tasks, reading CSV/JSON/XLSX files to create/update datasets
CKAN versions

~2.9, ~2.10, ~2.11

Download-Url (zip)
Download-Url commit date
2025-02-09
Url to repo
Category
Data Management & Quality


Background Infos

Description (long)

The ingest extension provides a framework for reading user-provided file-like objects and producing tasks from the input. It reads records from CSV, JSON, XLSX and other file formats and can use them to create or update datasets, remove users, organizations or datasets, send emails, collect statistics, or perform any other work that can be described as a series of steps. The framework uses a strategy-based architecture: extraction strategies parse sources and produce records, and the records perform the ingestion actions. It includes generic strategies for CSV (ingest:scheming_csv), ZIP archives (ingest:recursive_zip), and XLSX files (ingest:xlsx). It supports strategy autodetection based on mimetype, strategy delegation, record transformation with ckanext-scheming integration, and configuration options for allowed and disabled strategies.
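The split between strategies and records described above can be illustrated with a small standalone sketch. It does not import ckanext-ingest; the names Record, CsvStrategy, STRATEGIES, detect_strategy, import_records and the "demo:csv" strategy name are all hypothetical, and an in-memory dict stands in for CKAN's dataset storage.

```python
import csv
import io
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class Record:
    """One unit of work produced by a strategy (hypothetical, not the ckanext-ingest class)."""

    data: Dict[str, Any]

    def ingest(self, store: Dict[str, Dict[str, Any]]) -> str:
        # Create the dataset if its name is new, otherwise update it in place.
        name = self.data["name"]
        action = "update" if name in store else "create"
        store.setdefault(name, {}).update(self.data)
        return action


class CsvStrategy:
    """Parses a CSV source and yields one Record per row."""

    mimetypes = {"text/csv"}

    def extract(self, source: io.BytesIO) -> Iterator[Record]:
        text = io.TextIOWrapper(source, encoding="utf-8", newline="")
        for row in csv.DictReader(text):
            yield Record(dict(row))


# Hypothetical registry mapping strategy names to strategy classes.
STRATEGIES = {"demo:csv": CsvStrategy}


def detect_strategy(mimetype: str) -> Optional[str]:
    # Autodetection: pick the first registered strategy that claims the mimetype.
    for name, cls in STRATEGIES.items():
        if mimetype in cls.mimetypes:
            return name
    return None


def import_records(source: io.BytesIO, mimetype: str, store: Dict[str, Dict[str, Any]]) -> List[str]:
    # The strategy extracts records; each record performs its own ingestion.
    name = detect_strategy(mimetype)
    if name is None:
        raise ValueError(f"no strategy for {mimetype}")
    strategy = STRATEGIES[name]()
    return [record.ingest(store) for record in strategy.extract(source)]
```

Calling import_records with a CSV byte stream and the mimetype "text/csv" returns one action name per row, so a repeated dataset name shows up as a "create" followed by an "update".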

Version
1.4.6
Version release date
2025-02-09
Contact name
DataShades
Contact email
Contact Url
(not set)


Installation Guide

Configuration hints

Installation: pip install ckanext-ingest

With XLSX support: pip install 'ckanext-ingest[xlsx]'

Add to ckan.plugins: ingest

Configuration:

List of allowed ingestion strategies (empty = all allowed)

ckanext.ingest.strategy.allowed = ingest:recursive_zip

List of disabled ingestion strategies

ckanext.ingest.strategy.disabled = ingest:scheming_csv

Base template for WebUI (default: page.html)

ckanext.ingest.base_template = admin/index.html

Allow moving existing resources between packages (default: false)

ckanext.ingest.allow_resource_transfer = true

Rename strategies using JSON object mapping

ckanext.ingest.strategy.name_mapping = {"ckanext.ingest.strategy.zip:ZipStrategy": "zip"}
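One way to read the strategy options above is sketched below. This is an illustration only, not the extension's code: it assumes the allowed and disabled lists are space-separated, that a disabled entry wins over an allowed one, and that the name mapping is a JSON object of old name to new name. The function names enabled_strategies and apply_name_mapping are hypothetical.

```python
import json
from typing import Dict, Iterable, List


def enabled_strategies(registered: Iterable[str], allowed: str, disabled: str) -> List[str]:
    """Filter strategy names using space-separated allowed/disabled lists (assumed format)."""
    allow = set(allowed.split())
    deny = set(disabled.split())
    return [
        name
        for name in registered
        # An empty allowed list means every strategy is allowed.
        if (not allow or name in allow) and name not in deny
    ]


def apply_name_mapping(registered: Iterable[str], raw_mapping: str) -> List[str]:
    """Rename strategies according to a JSON object of {old_name: new_name}."""
    mapping: Dict[str, str] = json.loads(raw_mapping) if raw_mapping else {}
    # Names without a mapping entry are kept unchanged.
    return [mapping.get(name, name) for name in registered]
```

With the three built-in strategy names as input, an empty allowed list combined with disabled = ingest:scheming_csv leaves ingest:recursive_zip and ingest:xlsx in the result.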

Usage via CLI: ckanapi action ingest_import_records source@path/to/data.zip strategy="myext:extract_archive"

Implement IIngest interface to register custom strategies:

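import ckan.plugins as p
from ckanext.ingest.interfaces import IIngest  # import path assumed


class MyPlugin(p.SingletonPlugin):
    p.implements(IIngest)

    def get_ingest_strategies(self):
        # Map strategy names to strategy classes; CustomStrategy is defined elsewhere.
        return {
            "my:custom_strategy": CustomStrategy,
        }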

Built-in strategies:

- ingest:scheming_csv: CSV with ckanext-scheming field mapping
- ingest:recursive_zip: Process ZIP archives recursively
- ingest:xlsx: Process Excel spreadsheets (requires openpyxl)

API actions:

- ingest_extract_records: Extract records from source (debugging)
- ingest_import_records: Ingest records and create/update data

Strategy delegation, record options (update_existing, nested_strategy, locator, extras), and ckanext-scheming integration with field mapping via ingest_options are supported.

Testing: pytest --ckan-ini=test.ini

Plugins to configure (ckan.ini)
ingest
CKAN Settings (ckan.ini)
# ckanext.ingest.strategy.allowed = ingest:recursive_zip
# ckanext.ingest.strategy.disabled = ingest:scheming_csv
# ckanext.ingest.base_template = admin/index.html
# ckanext.ingest.allow_resource_transfer = true
# ckanext.ingest.strategy.name_mapping = {"ckanext.ingest.strategy.zip:ZipStrategy": "zip"}
DB migration to be executed
(not set)