Extension data.world Integration


Extension Basics

Title
data.world Integration
Name
ckanext-datadotworld
Type
Public extension
Description
Synchronize CKAN datasets to data.world platform with organization-level configuration and background job support
CKAN versions

~2.9, ~2.10, ~2.11

Download-Url (zip)
Download-Url commit date
2025-10-10
Url to repo
Category
Data Management & Quality


Background Infos

Description (long)

Extension for syncing CKAN datasets to the data.world platform. Organizations can configure automatic synchronization of their datasets to a data.world account. Background jobs run via Celery (CKAN 2.4-2.6) or RQ (CKAN 2.7+). Features: organization-level data.world sync configuration, retrying failed pushes via cron jobs, syncing remote resources, and template snippets for banners and labels that show a dataset's data.world status. Positioned as a production-ready integration for sharing CKAN data on the data.world platform.

Version
0.3.3
Version release date
2025-10-10
Contact name
Sergey Motornyuk
Contact email
Contact Url
(not set)


Installation Guide

Configuration hints

Requirements:

  • CKAN 2.4+ (including 2.7, 2.8)
  • Celery backend (for CKAN 2.4-2.6) or RQ backend (for CKAN 2.7+)
  • SQLAlchemy Migrate 0.9.1
  • Celery 3.1.23 (for CKAN 2.4-2.6)
  • data.world account and API credentials

Installation:

  1. Activate the CKAN virtualenv: . /usr/lib/ckan/default/bin/activate

  2. Install the extension: cd /usr/lib/ckan/default/src && pip install -e git+https://github.com/DataShades/ckanext-datadotworld#egg=ckanext-datadotworld

  3. Install requirements: pip install -r ckanext-datadotworld/requirements.txt

  4. Add the plugin to ckan.plugins: ckan.plugins = … datadotworld …

  5. Initialize the database: paster --plugin=ckanext-datadotworld datadotworld init -c /etc/ckan/default/production.ini

  6. Restart CKAN: sudo service apache2 restart
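The installation steps above can be collected into one script. The following is a minimal sketch, assuming the default source-install paths used in this guide. The `run` helper, the `install_datadotworld` function, and the `DRY_RUN` switch are illustrative names and not part of the extension. It calls the virtualenv's own `pip` and `paster` binaries instead of activating the virtualenv, and it relies on pip's default of checking out editable VCS installs into the virtualenv's `src` directory. Editing `ckan.plugins` and restarting the web server are left as manual steps.

```shell
#!/bin/sh
# Sketch of the installation steps as one script (assumed default paths).
CKAN_VENV="${CKAN_VENV:-/usr/lib/ckan/default}"
CKAN_INI="${CKAN_INI:-/etc/ckan/default/production.ini}"
REPO_URL="git+https://github.com/DataShades/ckanext-datadotworld#egg=ckanext-datadotworld"

# Print the command in dry-run mode (the default); execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_datadotworld() {
    # Install the extension in editable mode; inside a virtualenv pip checks
    # it out to $CKAN_VENV/src/ckanext-datadotworld.
    run "$CKAN_VENV/bin/pip" install -e "$REPO_URL" || return 1
    # Install the extension's Python requirements.
    run "$CKAN_VENV/bin/pip" install -r \
        "$CKAN_VENV/src/ckanext-datadotworld/requirements.txt" || return 1
    # Create the extension's tables. ckan.plugins in $CKAN_INI must already
    # list datadotworld before this step.
    run "$CKAN_VENV/bin/paster" --plugin=ckanext-datadotworld \
        datadotworld init -c "$CKAN_INI" || return 1
}

install_datadotworld
```

By default the script only prints the commands it would run; setting `DRY_RUN=0` executes them. Restart the web server afterwards, as in the last installation step.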

Configuration: No global configuration required - all settings are organization-level.

Organization Setup:

  1. Edit the organization in CKAN
  2. Navigate to the “data.world” tab
  3. Configure:

    • data.world Owner ID (required)
    • data.world API Token (required)
    • Sync immediately on dataset create/update (optional)
    • Include private datasets (optional)

Background Job Setup:

For CKAN 2.4-2.6 (Celery):

  1. Ensure the Celery backend is configured in production.ini: ckan.celery.queues = celery

  2. Start the Celery worker: paster --plugin=ckan celeryd -c /etc/ckan/default/production.ini

For CKAN 2.7+ (RQ):

  1. Start the RQ worker: paster --plugin=ckan jobs worker -c /etc/ckan/default/production.ini
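The choice between the two workers depends only on the CKAN version, which can be sketched as a small helper. `worker_command` is a hypothetical function written for illustration; it is not part of CKAN or of the extension. The 2.9+ branch is an addition to what this guide states: CKAN core replaced `paster` with the `ckan` command-line tool in 2.9, so the core RQ worker is started through it there. Whether the extension's own `datadotworld` commands are available through that tool is not stated in this listing.

```shell
#!/bin/sh
# Sketch: print the background worker command for a given CKAN version.
CKAN_INI="${CKAN_INI:-/etc/ckan/default/production.ini}"

worker_command() {
    # $1 is a CKAN version string such as "2.6.3" or "2.8".
    major="${1%%.*}"
    rest="${1#*.}"
    minor="${rest%%.*}"
    if [ "$major" -eq 2 ] && [ "$minor" -ge 4 ] && [ "$minor" -le 6 ]; then
        # CKAN 2.4-2.6: Celery worker
        echo "paster --plugin=ckan celeryd -c $CKAN_INI"
    elif [ "$major" -eq 2 ] && [ "$minor" -ge 7 ] && [ "$minor" -le 8 ]; then
        # CKAN 2.7-2.8: RQ worker started through paster
        echo "paster --plugin=ckan jobs worker -c $CKAN_INI"
    elif [ "$major" -eq 2 ] && [ "$minor" -ge 9 ]; then
        # CKAN 2.9+: paster was replaced by the ckan CLI in CKAN core
        echo "ckan -c $CKAN_INI jobs worker"
    else
        echo "unsupported CKAN version: $1" >&2
        return 1
    fi
}
```

For example, `worker_command 2.8` prints the RQ command from the step above, which can then be placed in whatever process supervisor the deployment uses.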

Cron Jobs (Optional):

Push Failed Datasets: Retry datasets that failed to sync to data.world.

Add to crontab:

# Retry failed syncs every hour
0 * * * * paster --plugin=ckanext-datadotworld datadotworld push_failed -c /etc/ckan/default/production.ini

Sync Remote Resources: Update remote resource URLs that have changed.

Add to crontab:

# Sync remote resources daily at 3 AM
0 3 * * * paster --plugin=ckanext-datadotworld datadotworld sync_resources -c /etc/ckan/default/production.ini
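cron usually runs jobs with a minimal PATH and without the CKAN virtualenv activated, so a bare `paster` may not be found. One way around this is a small wrapper that calls the virtualenv's `paster` by its full path. The sketch below assumes the default paths from this guide; the script name, the `datadotworld_cron` function, and the log location shown in the comments are hypothetical.

```shell
#!/bin/sh
# Hypothetical cron wrapper, for example saved as datadotworld-cron.sh.
CKAN_VENV="${CKAN_VENV:-/usr/lib/ckan/default}"
CKAN_INI="${CKAN_INI:-/etc/ckan/default/production.ini}"

datadotworld_cron() {
    # $1 is the subcommand to run: push_failed or sync_resources.
    case "$1" in
        push_failed|sync_resources) ;;
        *)
            echo "usage: datadotworld_cron push_failed|sync_resources" >&2
            return 2
            ;;
    esac
    # Timestamp each run so the log shows when a job started.
    echo "$(date '+%Y-%m-%d %H:%M:%S') datadotworld $1: start"
    # Use the virtualenv's paster directly; cron does not activate the virtualenv.
    "$CKAN_VENV/bin/paster" --plugin=ckanext-datadotworld datadotworld "$1" -c "$CKAN_INI"
}

# Run the requested subcommand when the script is called with an argument.
# Matching crontab entries might look like this (paths are assumptions):
#   0 * * * * /path/to/datadotworld-cron.sh push_failed >> /var/log/ckan/datadotworld-cron.log 2>&1
#   0 3 * * * /path/to/datadotworld-cron.sh sync_resources >> /var/log/ckan/datadotworld-cron.log 2>&1
if [ "$#" -gt 0 ]; then
    datadotworld_cron "$1"
fi
```

Redirecting both output streams to a log file in the crontab entry keeps a record of each run and of any errors paster reports.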

Paster Commands:

  1. Initialize database: paster --plugin=ckanext-datadotworld datadotworld init -c production.ini

  2. Upgrade database schema: paster --plugin=ckanext-datadotworld datadotworld upgrade -c production.ini

  3. Push failed datasets: paster --plugin=ckanext-datadotworld datadotworld push_failed -c production.ini

    Options: --organization ORG_ID: only retry for a specific organization

  4. Sync remote resources: paster --plugin=ckanext-datadotworld datadotworld sync_resources -c production.ini

    Options: --organization ORG_ID: only sync for a specific organization
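The `--organization` option makes it possible to process organizations one at a time, so a failure for one does not stop the others. The following is a sketch only: `run_for_orgs` and the `PASTER` variable are hypothetical names, the organization IDs in the usage note are placeholders, and the position of `--organization` after `-c` is an assumption about how the command parses its options.

```shell
#!/bin/sh
# Sketch: run one datadotworld subcommand per organization.
PASTER="${PASTER:-paster}"
CKAN_INI="${CKAN_INI:-production.ini}"

run_for_orgs() {
    # $1 is the subcommand (push_failed or sync_resources),
    # the remaining arguments are organization IDs.
    subcommand="$1"
    shift
    failures=0
    for org in "$@"; do
        if ! "$PASTER" --plugin=ckanext-datadotworld datadotworld "$subcommand" \
                -c "$CKAN_INI" --organization "$org"; then
            # Report the failure and continue with the next organization.
            echo "$subcommand did not complete for organization: $org" >&2
            failures=$((failures + 1))
        fi
    done
    # Succeed only if every organization was processed without error.
    [ "$failures" -eq 0 ]
}
```

A call such as `run_for_orgs push_failed org-a org-b` would retry failed datasets for the two placeholder organizations in turn and return a non-zero status if either run failed.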

Template Snippets:

Add data.world status banner to dataset page: Edit package/read_base.html template:

    {% block content_primary_nav %}
      {{ super() }}
      {% snippet 'datadotworld/snippets/banner.html', pkg_dict=pkg %}
    {% endblock %}

Add data.world label to dataset listings: Edit snippets/package_item.html template:

    {% block heading_title %}
      {{ super() }}
      {% snippet 'datadotworld/snippets/label.html', pkg_dict=package %}
    {% endblock %}

Troubleshooting:

  1. Datasets not syncing:

    • Check organization has valid data.world Owner ID and API Token
    • Verify background job worker (Celery/RQ) is running
    • Check CKAN logs for errors
  2. Failed syncs accumulating:

    • Run push_failed command manually to retry
    • Check data.world API credentials are valid
    • Verify datasets meet data.world requirements
  3. Remote resources not updating:

    • Run sync_resources command manually
    • Check resource URLs are accessible
    • Verify data.world API limits not exceeded

Development Setup:

  1. Clone repository: git clone https://github.com/DataShades/ckanext-datadotworld && cd ckanext-datadotworld

  2. Create test.ini from test-core.ini template

  3. Install dev requirements: pip install -r dev-requirements.txt

  4. Run tests: nosetests --nologcapture --with-pylons=test.ini ckanext/datadotworld/tests

Dependencies:

  • sqlalchemy-migrate==0.9.1
  • celery==3.1.23 (CKAN 2.4-2.6 only)

License: Apache License 2.0

Keywords: CKAN, data.world, sync, integration, celery, rq

Developer: Link Digital (Sergey Motornyuk)

Plugins to configure (ckan.ini)
datadotworld
CKAN Settings (ckan.ini)
# ckan.celery.queues = celery
DB migration to be executed
datadotworld