Requirements:
- CKAN 2.9+
- Python 3.8+
- rdflib >= 7.1.3, < 8.0.0
- geomet >= 1.1.0, < 2.0.0
- pdfkit >= 1.0.0, < 2.0.0
- wkhtmltopdf system package (for PDF export)
Installation:
Install system dependencies for PDF export:
sudo apt-get update
sudo apt-get install wkhtmltopdf
Activate CKAN virtualenv:
. /usr/lib/ckan/default/bin/activate
Install extension:
pip install ckanext-metaexport
Or from source:
git clone https://github.com/DataShades/ckanext-metaexport.git
cd ckanext-metaexport
python setup.py develop
Install Python dependencies:
pip install -r requirements.txt
Key dependencies:
- rdflib: RDF graph library for semantic metadata
- geomet: Geometry handling for spatial metadata
- pdfkit: PDF generation wrapper for wkhtmltopdf
Add the plugin to ckan.plugins in your CKAN config file (production.ini, or ckan.ini on default CKAN 2.9+ installs):
ckan.plugins = … metaexport …
Restart CKAN:
sudo service apache2 reload
Configuration:
No global configuration is required. Metadata formats are registered through the IMetaexport plugin interface (see Creating Custom Metadata Formats).
Usage:
Accessing Metadata Exports:
Metadata exports are available at the following URL pattern:
/dataset/{dataset_id}/metaexport/{format}
Examples:
- /dataset/my-dataset/metaexport/rdf
- /dataset/my-dataset/metaexport/dcat
- /dataset/my-dataset/metaexport/jsonld
- /dataset/my-dataset/metaexport/iso19115
- /dataset/my-dataset/metaexport/pdf
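As a small illustration of the URL pattern above, the following sketch builds export URLs for a dataset. The site URL https://ckan.example.org and the helper name build_export_url are assumptions made for the example; they are not part of the extension.

```python
from urllib.parse import quote


def build_export_url(site_url, dataset_id, fmt):
    """Return the metaexport URL for a dataset id and format name."""
    # Quote both path segments so unusual ids cannot break the path
    path = "/dataset/{}/metaexport/{}".format(
        quote(dataset_id, safe=""), quote(fmt, safe="")
    )
    # rstrip keeps a root path such as https://host/ckan intact
    return site_url.rstrip("/") + path


if __name__ == "__main__":
    for fmt in ("rdf", "dcat", "jsonld", "iso19115", "pdf"):
        print(build_export_url("https://ckan.example.org", "my-dataset", fmt))
```

Plain string concatenation is used instead of urljoin so that a CKAN instance mounted under a sub-path keeps its prefix.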
Supported Formats (built-in):
RDF (Resource Description Framework):
- Semantic web standard
- Machine-readable linked data
- Can be queried with SPARQL once loaded into a triple store
DCAT (Data Catalog Vocabulary):
- W3C standard for data catalogs
- Widely used in open data portals
- Interoperable across platforms
JSON-LD (JSON Linked Data):
- JSON-based linked data format
- Easy to parse and consume
- Compatible with semantic web tools
ISO 19115 (Geographic Metadata):
- International standard for geospatial metadata
- Used by GIS systems
- Comprehensive geographic information
Dublin Core:
- Simple metadata standard
- 15 core metadata elements
- Widely adopted for digital resources
PDF Export:
- Human-readable metadata document
- Suitable for archival
- Includes all dataset information
Creating Custom Metadata Formats:
Implement IMetaexport Interface:
from xml.sax.saxutils import escape

import ckan.plugins.toolkit as tk
from ckan.plugins import SingletonPlugin, implements
from ckanext.metaexport.interfaces import IMetaexport
from ckanext.metaexport.formatters import Format


class MyCustomFormat(Format):
    """Custom metadata format implementation."""

    def __init__(self):
        super().__init__()
        self.name = 'myformat'
        self.content_type = 'application/xml'
        self.file_extension = 'xml'

    def export(self, data):
        """Generate metadata in the custom format."""
        # Transform data into the custom format; escape() keeps the XML well-formed
        return '<metadata><title>{}</title></metadata>'.format(
            escape(data['title'])
        )


class MyMetaexportPlugin(SingletonPlugin):
    implements(IMetaexport)

    def register_metadata_format(self):
        """Register custom formats."""
        return {
            'myformat': MyCustomFormat()
        }

    def register_data_extractors(self, formats):
        """Configure data extraction for formats."""
        if 'myformat' in formats:
            formats['myformat'].set_data_extractor(
                self.extract_my_data
            )

    def extract_my_data(self, package_id):
        """Extract data for the custom format."""
        # Fetch package data
        package = tk.get_action('package_show')(
            {'ignore_auth': True},
            {'id': package_id}
        )
        # Transform to format-specific structure
        return {
            'title': package['title'],
            # 'my_custom_field' is a placeholder key; use a field from your schema
            'custom_field': package.get('my_custom_field'),
            # ... more fields
        }
Format Class Structure:
Inherit from ckanext.metaexport.formatters.Format:
class Format:
    """Base class for metadata formats"""
    name = ''            # Format identifier
    content_type = ''    # MIME type
    file_extension = ''  # File extension for downloads

    def __init__(self):
        self.data_extractor = None

    def set_data_extractor(self, extractor):
        """Set function to extract data"""
        self.data_extractor = extractor

    def export(self, data):
        """Generate metadata in format"""
        raise NotImplementedError
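How the extension invokes these two hooks at request time is not spelled out here. The sketch below assumes the simplest order, extractor first and export second, and uses a stand-in Format class so it runs without CKAN. TitleOnlyFormat and render are hypothetical names introduced for illustration.

```python
class Format:
    """Stand-in mirroring the base class described above."""
    name = ''
    content_type = ''
    file_extension = ''

    def __init__(self):
        self.data_extractor = None

    def set_data_extractor(self, extractor):
        self.data_extractor = extractor

    def export(self, data):
        raise NotImplementedError


class TitleOnlyFormat(Format):
    """Hypothetical format that renders only the dataset title."""
    name = 'titleonly'
    content_type = 'text/plain'
    file_extension = 'txt'

    def export(self, data):
        return 'Title: {}'.format(data['title'])


def render(fmt, package_id):
    """Assumed call order: extract the data, then export it."""
    if fmt.data_extractor is None:
        raise RuntimeError('no data extractor set for {!r}'.format(fmt.name))
    return fmt.export(fmt.data_extractor(package_id))


fmt = TitleOnlyFormat()
# Fake extractor standing in for a package_show call
fmt.set_data_extractor(lambda package_id: {'title': package_id.upper()})
print(render(fmt, 'my-dataset'))
```

Keeping extraction and rendering separate means the same extractor can feed several formats, and each half can be tested on its own.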
Data Extractor Pattern:
import ckan.plugins.toolkit as tk


def my_data_extractor(package_id):
    """Extract and transform package data."""
    # Get package data
    package = tk.get_action('package_show')(
        {'ignore_auth': True},
        {'id': package_id}
    )
    # Transform to format-specific structure
    return {
        'title': package['title'],
        # 'notes' can be absent or None
        'description': package.get('notes') or '',
        'resources': [
            {
                'name': r.get('name'),
                'url': r['url'],
                'format': r.get('format')
            }
            for r in package.get('resources', [])
        ],
        # Custom transformations; extract_spatial_info is shown under
        # Advanced Features, extract_temporal_info is a helper you supply
        'spatial_extent': extract_spatial_info(package),
        'temporal_extent': extract_temporal_info(package),
    }
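The extractor above calls extract_temporal_info without defining it. A minimal sketch follows, assuming the dataset stores its time range in two fields named temporal_start and temporal_end. Those field names are hypothetical; substitute whatever your schema defines.

```python
def extract_temporal_info(package):
    """Return {'start': ..., 'end': ...}, or None if neither bound is set."""
    # CKAN datasets carry free-form extras as a list of {'key', 'value'} dicts
    extras = {e['key']: e['value'] for e in package.get('extras', [])}

    def lookup(field):
        # Prefer a top-level field, fall back to extras
        return package.get(field) or extras.get(field)

    start = lookup('temporal_start')
    end = lookup('temporal_end')
    if not start and not end:
        return None
    return {'start': start, 'end': end}
```

Returning None for a missing range lets the format decide whether to omit the element or emit an empty one.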
Template-Based Export:
Use Jinja2 templates for format generation:
class XMLFormat(Format):
    def export(self, data):
        from jinja2 import Template
        # The XML declaration must be the first bytes of the document, so the
        # template starts on the same line as the opening quotes.
        # autoescape=True escapes &, < and > in values.
        template = Template('''<?xml version="1.0" encoding="UTF-8"?>
<metadata>
  <title>{{ title }}</title>
  <description>{{ description }}</description>
  {% for resource in resources %}
  <resource>
    <name>{{ resource.name }}</name>
    <url>{{ resource.url }}</url>
  </resource>
  {% endfor %}
</metadata>
''', autoescape=True)
        return template.render(**data)
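Where pulling in a template engine is not desirable, the same document can be produced with the standard library. The sketch below is an alternative to the template approach, not what the extension does internally; export_xml is a hypothetical name, and the input is assumed to have the same shape as the data passed to the template above.

```python
import xml.etree.ElementTree as ET


def export_xml(data):
    """Build the <metadata> document and return it as UTF-8 bytes."""
    root = ET.Element('metadata')
    ET.SubElement(root, 'title').text = data.get('title', '')
    ET.SubElement(root, 'description').text = data.get('description', '')
    for resource in data.get('resources', []):
        node = ET.SubElement(root, 'resource')
        ET.SubElement(node, 'name').text = resource.get('name', '')
        ET.SubElement(node, 'url').text = resource.get('url', '')
    # xml_declaration=True prepends the XML declaration (Python 3.8+)
    return ET.tostring(root, encoding='UTF-8', xml_declaration=True)


if __name__ == '__main__':
    sample = {
        'title': 'Rainfall & runoff',
        'description': 'Daily totals',
        'resources': [{'name': 'csv', 'url': 'https://example.org/?a=1&b=2'}],
    }
    print(export_xml(sample).decode('utf-8'))
```

ElementTree escapes text content itself, so values containing & or < cannot produce malformed output.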
Development:
Clone repository:
git clone https://github.com/DataShades/ckanext-metaexport.git
cd ckanext-metaexport
Install for development:
python setup.py develop
pip install -r dev-requirements.txt
Run tests:
pytest --ckan-ini test.ini
Run with coverage:
pytest --ckan-ini test.ini --cov=ckanext.metaexport
Testing Custom Formats:
- Register format in test plugin
- Create test dataset with known metadata
- Call export endpoint: /dataset/{id}/metaexport/{format}
- Validate output against format schema
- Test data transformations
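The output-validation step can be exercised without a running CKAN by calling the export function directly on a known dict. The sketch below is written in pytest style. The export function is a hypothetical stand-in for your format's export method, and "validation" here means only a well-formedness check plus a round trip of one field, not schema validation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def export(data):
    """Hypothetical export under test: a one-element XML document."""
    return '<metadata><title>{}</title></metadata>'.format(escape(data['title']))


def test_export_is_well_formed_and_round_trips_title():
    known = {'title': 'Rainfall <2020> & beyond'}
    output = export(known)
    # ET.fromstring raises ParseError if the output is not well-formed
    root = ET.fromstring(output)
    assert root.tag == 'metadata'
    assert root.findtext('title') == known['title']


# pytest would collect the test automatically; call it directly when run as a script
test_export_is_well_formed_and_round_trips_title()
```

Using a title with reserved characters catches missing escaping, a common cause of invalid XML exports.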
Troubleshooting:
PDF export not working:
- Verify wkhtmltopdf is installed: wkhtmltopdf --version
- Check PATH includes wkhtmltopdf
- Install: sudo apt-get install wkhtmltopdf
- For headless servers: install the xvfb package and run wkhtmltopdf through its xvfb-run wrapper
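The first two checks can be scripted from the same Python environment that CKAN runs in, which also confirms that the process sees the binary on its PATH. A minimal sketch using only the standard library; wkhtmltopdf_version is a hypothetical helper name.

```python
import shutil
import subprocess


def wkhtmltopdf_version():
    """Return the version text reported by wkhtmltopdf, or None if not on PATH."""
    binary = shutil.which('wkhtmltopdf')
    if binary is None:
        return None
    result = subprocess.run(
        [binary, '--version'], capture_output=True, text=True, check=False
    )
    # Some builds write the banner to stderr, so fall back to it
    return result.stdout.strip() or result.stderr.strip()


if __name__ == '__main__':
    version = wkhtmltopdf_version()
    print(version if version is not None else 'wkhtmltopdf not found on PATH')
```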
RDF export errors:
- Verify rdflib version: pip show rdflib
- Check namespace definitions
- Validate RDF syntax
- Review error logs for parsing issues
Custom format not appearing:
- Verify plugin implements IMetaexport
- Check register_metadata_format returns dict
- Ensure plugin is in ckan.plugins
- Restart CKAN after changes
Data extractor errors:
- Verify extractor function signature
- Check package_id is valid
- Review data transformations
- Add error handling for missing fields
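One way to handle missing fields is to read every optional key with a default instead of indexing directly. The sketch below assumes a package dict shaped like package_show output; safe_extract is a hypothetical name.

```python
def safe_extract(package):
    """Build the export dict without raising on absent optional fields."""
    return {
        # Fall back to the dataset name when no title is set
        'title': package.get('title') or package.get('name', ''),
        # 'notes' may be missing or None
        'description': package.get('notes') or '',
        'resources': [
            {
                'name': r.get('name') or '',
                'url': r.get('url') or '',
                'format': r.get('format') or '',
            }
            for r in package.get('resources') or []
        ],
    }
```

The "or" fallbacks also cover keys that are present but set to None, which a plain .get(key, default) would pass through unchanged.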
Best Practices:
Format Implementation:
- Use standard schemas when possible
- Validate output against schemas
- Handle missing/optional fields gracefully
- Provide clear error messages
Data Extraction:
- Cache expensive operations
- Handle custom fields properly
- Transform spatial/temporal data correctly
- Include all relevant metadata
Performance:
- Cache format instances
- Optimize data extraction queries
- Consider async generation for large datasets
- Implement pagination for large exports
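As an illustration of caching generated output, the sketch below keys an in-memory cache on the dataset id together with its metadata_modified timestamp, so an edited dataset is rebuilt automatically. ExportCache is a hypothetical helper, not something the extension provides.

```python
class ExportCache:
    """In-memory cache keyed by dataset id and metadata_modified stamp."""

    def __init__(self):
        self._store = {}

    def get_or_build(self, package, build):
        # A changed metadata_modified yields a new key, so stale output is not reused
        key = (package['id'], package.get('metadata_modified'))
        if key not in self._store:
            self._store[key] = build(package)
        return self._store[key]


if __name__ == '__main__':
    cache = ExportCache()
    pkg = {'id': 'abc', 'metadata_modified': '2024-01-01T00:00:00', 'title': 'T'}
    print(cache.get_or_build(pkg, lambda p: '<title>{}</title>'.format(p['title'])))
```

This sketch never evicts old entries and is local to one process; a production setup would need a size limit or a shared cache backend.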
Standards Compliance:
- Follow official specifications
- Use standard namespaces/vocabularies
- Validate against reference implementations
- Document any deviations
Advanced Features:
Geospatial Metadata (geomet):
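import json

from geomet import wkt


def extract_spatial_info(package):
    """Extract the spatial extent and convert it to WKT."""
    spatial = package.get('spatial')
    if not spatial:
        return None
    # 'spatial' is often stored as a GeoJSON string; geomet expects a dict
    if isinstance(spatial, str):
        spatial = json.loads(spatial)
    # Convert GeoJSON to WKT
    return wkt.dumps(spatial)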
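Some formats, ISO 19115 among them, describe spatial extent as a bounding box and not as a full geometry. A minimal standard-library sketch follows, assuming the spatial value has already been parsed into a GeoJSON geometry dict with a coordinates member (so not a GeometryCollection). The name geojson_bbox is hypothetical.

```python
def geojson_bbox(geometry):
    """Return (min_x, min_y, max_x, max_y) for a GeoJSON geometry dict."""

    def positions(coords):
        # A position is a list of numbers; anything else is a nested list
        if coords and isinstance(coords[0], (int, float)):
            yield coords
        else:
            for part in coords:
                yield from positions(part)

    points = list(positions(geometry['coordinates']))
    if not points:
        raise ValueError('geometry has no coordinates')
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == '__main__':
    square = {
        'type': 'Polygon',
        'coordinates': [[[0, 0], [2, 0], [2, 3], [0, 3], [0, 0]]],
    }
    print(geojson_bbox(square))
```

The recursive walk handles Point, LineString, Polygon and their Multi variants with the same code, because they differ only in nesting depth.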
RDF Graph Building (rdflib):
from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import DCTERMS, RDF


def build_rdf_graph(package_data):
    """Build RDF graph for dataset"""
    g = Graph()
    DCAT = Namespace('http://www.w3.org/ns/dcat#')
    dataset_uri = URIRef(package_data['url'])
    g.add((dataset_uri, RDF.type, DCAT.Dataset))
    g.add((dataset_uri, DCTERMS.title, Literal(package_data['title'])))
    return g.serialize(format='xml')
Development Status: Beta (4)
License: AGPL v3.0 or later
Keywords: CKAN, metadata, export, RDF, DCAT, JSON-LD, ISO 19115, Dublin Core
Developer: Link Digital (Sergey Motornyuk)
Related Extensions:
- ckanext-dcat: DCAT-specific functionality
- ckanext-spatial: Spatial metadata and search
- ckanext-scheming: Schema customization
- ckanext-harvest: Metadata harvesting