Requirements:
- CKAN 2.4+ (including 2.7, 2.8)
- Celery backend (for CKAN 2.4-2.6) OR RQ backend (for CKAN 2.7+)
- SQLAlchemy Migrate 0.9.1
- Celery 3.1.23 (for CKAN 2.4-2.6)
- data.world account and API credentials
Installation:
1. Activate CKAN virtualenv:
. /usr/lib/ckan/default/bin/activate
2. Install extension:
cd /usr/lib/ckan/default/src
pip install -e git+https://github.com/DataShades/ckanext-datadotworld#egg=ckanext-datadotworld
3. Install requirements:
pip install -r ckanext-datadotworld/requirements.txt
4. Add plugin to ckan.plugins:
ckan.plugins = … datadotworld …
5. Initialize database:
paster --plugin=ckanext-datadotworld datadotworld init -c /etc/ckan/default/production.ini
6. Restart CKAN:
sudo service apache2 restart
Configuration:
No global configuration is required; all settings are configured per organization.
Organization Setup:
1. Edit organization in CKAN
2. Navigate to “data.world” tab
3. Configure:
- data.world Owner ID (required)
- data.world API Token (required)
- Sync immediately on dataset create/update (optional)
- Include private datasets (optional)
Background Job Setup:
For CKAN 2.4-2.6 (Celery):
1. Ensure Celery backend configured in production.ini:
ckan.celery.queues = celery
2. Start Celery worker:
paster --plugin=ckan celeryd -c /etc/ckan/default/production.ini
For CKAN 2.7+ (RQ):
1. Start RQ worker:
paster --plugin=ckan jobs worker -c /etc/ckan/default/production.ini
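Which worker to start depends only on the CKAN version. As a rough sketch of that rule (the helper name worker_command and its version parsing are illustrative assumptions, not part of the extension; only the two paster commands come from this README):

```python
def worker_command(ckan_version, config="/etc/ckan/default/production.ini"):
    """Return the paster argv for the background worker matching a CKAN version.

    Hypothetical helper: it only encodes the version rule described above.
    """
    # Compare on (major, minor) so that "2.10" sorts after "2.7".
    major, minor = (int(part) for part in ckan_version.split(".")[:2])
    if (major, minor) < (2, 4):
        raise ValueError("ckanext-datadotworld requires CKAN 2.4 or newer")
    # CKAN 2.4-2.6 use the Celery backend; CKAN 2.7+ use RQ.
    subcommand = ["celeryd"] if (major, minor) < (2, 7) else ["jobs", "worker"]
    return ["paster", "--plugin=ckan"] + subcommand + ["-c", config]
```

For example, worker_command("2.8.2") yields the RQ worker command, which could then be handed to a process supervisor or to subprocess.check_call.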
Cron Jobs (Optional):
Push Failed Datasets:
Retry datasets that failed to sync to data.world.
Add to crontab:
# Retry failed syncs every hour
0 * * * * paster --plugin=ckanext-datadotworld datadotworld push_failed -c /etc/ckan/default/production.ini
Sync Remote Resources:
Update remote resource URLs that have changed.
Add to crontab:
# Sync remote resources daily at 3 AM
0 3 * * * paster --plugin=ckanext-datadotworld datadotworld sync_resources -c /etc/ckan/default/production.ini
Paster Commands:
Initialize database:
paster --plugin=ckanext-datadotworld datadotworld init -c production.ini
Upgrade database schema:
paster --plugin=ckanext-datadotworld datadotworld upgrade -c production.ini
Push failed datasets:
paster --plugin=ckanext-datadotworld datadotworld push_failed -c production.ini
Options:
--organization ORG_ID : Only retry for a specific organization
Sync remote resources:
paster --plugin=ckanext-datadotworld datadotworld sync_resources -c production.ini
Options:
--organization ORG_ID : Only sync for a specific organization
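When these commands are driven from a script rather than typed by hand, it can help to build the argument list in one place. The following is a minimal sketch, assuming a hypothetical helper named datadotworld_command; the subcommands and the --organization option are the ones listed above, while placing the option before -c is an assumption:

```python
SUBCOMMANDS = ("init", "upgrade", "push_failed", "sync_resources")
# Only these two subcommands are documented as accepting --organization.
ORG_AWARE = ("push_failed", "sync_resources")


def datadotworld_command(subcommand, config="production.ini", organization=None):
    """Build the paster argv for a datadotworld subcommand (illustrative only)."""
    if subcommand not in SUBCOMMANDS:
        raise ValueError("unknown subcommand: %s" % subcommand)
    argv = ["paster", "--plugin=ckanext-datadotworld", "datadotworld", subcommand]
    if organization is not None:
        if subcommand not in ORG_AWARE:
            raise ValueError("%s does not take --organization" % subcommand)
        argv += ["--organization", organization]
    return argv + ["-c", config]
```

The resulting list can be passed to subprocess.check_call, for example datadotworld_command("push_failed", organization="my-org"), where "my-org" stands in for a real organization ID.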
Template Snippets:
Add data.world status banner to dataset page:
Edit package/read_base.html template:
{% block content_primary_nav %}
{{ super() }}
{% snippet 'datadotworld/snippets/banner.html', pkg_dict=pkg %}
{% endblock %}
Add data.world label to dataset listings:
Edit snippets/package_item.html template:
{% block heading_title %}
{{ super() }}
{% snippet 'datadotworld/snippets/label.html', pkg_dict=package %}
{% endblock %}
Troubleshooting:
Datasets not syncing:
- Check organization has valid data.world Owner ID and API Token
- Verify background job worker (Celery/RQ) is running
- Check CKAN logs for errors
Failed syncs accumulating:
- Run push_failed command manually to retry
- Check data.world API credentials are valid
- Verify datasets meet data.world requirements
Remote resources not updating:
- Run sync_resources command manually
- Check resource URLs are accessible
- Verify data.world API limits not exceeded
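To check whether a resource URL is accessible, a quick probe from the CKAN host is often enough. This is a minimal standalone sketch using only the Python 3 standard library; the function name url_is_accessible is hypothetical, and it assumes the remote server answers HEAD requests (some do not, in which case a GET may be needed):

```python
import http.client
import urllib.request


def url_is_accessible(url, timeout=10):
    """Return True if a HEAD request to url succeeds with a status below 400."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx) and timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False
```

A result of False only says that this host could not fetch the URL; it does not by itself show how data.world would see the same resource.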
Development Setup:
Clone repository:
git clone https://github.com/DataShades/ckanext-datadotworld
cd ckanext-datadotworld
Create test.ini from test-core.ini template
Install dev requirements:
pip install -r dev-requirements.txt
Run tests:
nosetests --nologcapture --with-pylons=test.ini ckanext/datadotworld/tests
Dependencies:
- sqlalchemy-migrate==0.9.1
- celery==3.1.23 (CKAN 2.4-2.6 only)
License: Apache License 2.0
Keywords: CKAN, data.world, sync, integration, celery, rq
Developer: Link Digital (Sergey Motornyuk)