Extension Metadata Export


Extension Basics

Title
Metadata Export
Name
ckanext-metaexport
Type
Public extension
Description
Universal metadata export to different metadata standards (RDF, DCAT, JSON-LD, etc.)
CKAN versions

~2.9, ~2.10, ~2.11

Download-Url (zip)
Download-Url commit date
2025-10-10
Url to repo
Category
Data Management & Quality


Background Infos

Description (long)

Provides a universal framework for exporting dataset metadata into different metadata standards and formats. It implements an extensible, plugin-based architecture built on the IMetaexport interface, which lets other plugins register custom metadata formats. Supported standards include RDF, DCAT, JSON-LD, ISO 19115, and Dublin Core. Each format can have its own data extractor that transforms CKAN metadata into the target standard's schema. Export views are available at /dataset/{id}/metaexport/{format} URLs. The extension is useful for metadata interoperability, harvesting by external systems, and compliance with metadata standards. It also includes PDF export via pdfkit and geometry handling via the geomet library.

Version
0.2.0
Version release date
2025-10-10
Contact name
Sergey Motornyuk
Contact email
Contact Url
(not set)


Installation Guide

Configuration hints

Requirements:

    • CKAN 2.9+
    • Python 3.8+
    • rdflib >= 7.1.3, < 8.0.0
    • geomet >= 1.1.0, < 2.0.0
    • pdfkit >= 1.0.0, < 2.0.0
    • wkhtmltopdf system package (for PDF export)

Installation:

  1. Install system dependencies for PDF export:

    sudo apt-get update
    sudo apt-get install wkhtmltopdf

  2. Activate CKAN virtualenv: . /usr/lib/ckan/default/bin/activate

  3. Install extension: pip install ckanext-metaexport

    Or from source:

    git clone https://github.com/DataShades/ckanext-metaexport.git
    cd ckanext-metaexport
    python setup.py develop

  4. Install Python dependencies: pip install -r requirements.txt

    Key dependencies:

    • rdflib: RDF graph library for semantic metadata
    • geomet: Geometry handling for spatial metadata
    • pdfkit: PDF generation wrapper for wkhtmltopdf
  5. Add plugin to ckan.plugins in production.ini: ckan.plugins = … metaexport …

  6. Restart CKAN: sudo service apache2 reload

Configuration:

No global configuration required. Metadata formats are registered via plugin interface.

Usage:

Accessing Metadata Exports:

Metadata exports are available at URL pattern: /dataset/{dataset_id}/metaexport/{format}

Examples:

    • /dataset/my-dataset/metaexport/rdf
    • /dataset/my-dataset/metaexport/dcat
    • /dataset/my-dataset/metaexport/jsonld
    • /dataset/my-dataset/metaexport/iso19115
    • /dataset/my-dataset/metaexport/pdf
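
A client that harvests these exports needs to assemble the URL from the site address, the dataset ID, and the format name. The sketch below shows one way to do that with the standard library only. The function name build_export_url and the host ckan.example.org are illustrative assumptions, not part of the extension.

```python
from urllib.parse import quote, urljoin


def build_export_url(site_url, dataset_id, fmt):
    """Return the export URL for one dataset and one format.

    site_url is the (hypothetical) base URL of the CKAN instance.
    """
    # Percent-encode both path segments so unusual IDs stay valid.
    path = "dataset/{}/metaexport/{}".format(
        quote(dataset_id, safe=""), quote(fmt, safe="")
    )
    # Force a single trailing slash so urljoin appends instead of replacing.
    return urljoin(site_url.rstrip("/") + "/", path)


print(build_export_url("https://ckan.example.org", "my-dataset", "rdf"))
```

Encoding each segment separately keeps a slash or space inside an ID from changing the route.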

Supported Formats (built-in):

  1. RDF (Resource Description Framework):

    • Semantic web standard
    • Machine-readable linked data
    • Supports SPARQL queries
  2. DCAT (Data Catalog Vocabulary):

    • W3C standard for data catalogs
    • Widely used in open data portals
    • Interoperable across platforms
  3. JSON-LD (JSON Linked Data):

    • JSON-based linked data format
    • Easy to parse and consume
    • Compatible with semantic web tools
  4. ISO 19115 (Geographic Metadata):

    • International standard for geospatial metadata
    • Used by GIS systems
    • Comprehensive geographic information
  5. Dublin Core:

    • Simple metadata standard
    • 15 core metadata elements
    • Widely adopted for digital resources
  6. PDF Export:

    • Human-readable metadata document
    • Suitable for archival
    • Includes all dataset information
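
To make the JSON-LD and DCAT entries above more concrete, here is a minimal sketch of what a DCAT-flavoured JSON-LD document for one dataset could look like. It uses only the json module. The function to_jsonld and the choice of fields are assumptions for illustration; the output of the extension's built-in formats may be structured differently.

```python
import json


def to_jsonld(data):
    """Build a small DCAT-flavoured JSON-LD document from extracted data."""
    doc = {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": data["title"],
    }
    # Optional fields are only emitted when a value is present.
    if data.get("description"):
        doc["dct:description"] = data["description"]
    if data.get("url"):
        doc["@id"] = data["url"]
    return json.dumps(doc, indent=2, sort_keys=True)


print(to_jsonld({"title": "My dataset", "url": "https://ckan.example.org/dataset/my-dataset"}))
```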

Creating Custom Metadata Formats:

Implement IMetaexport Interface:

from xml.sax.saxutils import escape

import ckan.plugins.toolkit as tk
from ckan.plugins import SingletonPlugin, implements
from ckanext.metaexport.interfaces import IMetaexport
from ckanext.metaexport.formatters import Format


class MyCustomFormat(Format):
    """Custom metadata format implementation"""

    def __init__(self):
        super().__init__()
        self.name = 'myformat'
        self.content_type = 'application/xml'
        self.file_extension = 'xml'

    def export(self, data):
        """Generate metadata in custom format"""
        # Transform data into the custom format; escape values for XML
        return '<metadata><title>{}</title></metadata>'.format(
            escape(data['title'])
        )


class MyMetaexportPlugin(SingletonPlugin):
    implements(IMetaexport)

    def register_metadata_format(self):
        """Register custom formats"""
        return {
            'myformat': MyCustomFormat()
        }

    def register_data_extractors(self, formats):
        """Configure data extraction for formats"""
        if 'myformat' in formats:
            formats['myformat'].set_data_extractor(
                self.extract_my_data
            )

    def extract_my_data(self, package_id):
        """Extract data for custom format"""
        # Fetch package data
        package = tk.get_action('package_show')(
            {'ignore_auth': True},
            {'id': package_id}
        )
        # Transform to format-specific structure
        return {
            'title': package['title'],
            # 'custom_field' is a placeholder for one of your own fields
            'custom_field': package.get('custom_field'),
            # ... more fields
        }

Format Class Structure:

Inherit from ckanext.metaexport.formatters.Format:

class Format:
    """Base class for metadata formats"""

    name = ''  # Format identifier
    content_type = ''  # MIME type
    file_extension = ''  # File extension for downloads

    def __init__(self):
        self.data_extractor = None

    def set_data_extractor(self, extractor):
        """Set function to extract data"""
        self.data_extractor = extractor

    def export(self, data):
        """Generate metadata in format"""
        raise NotImplementedError
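
The sketch below illustrates how a format, its data extractor, and the export step could fit together. It runs without CKAN because it uses a stand-in Format class that copies the structure shown above. PlainTextFormat and run_export are hypothetical names introduced for this example; the extension's own view code may chain these steps differently.

```python
class Format:
    """Stand-in for the base class above, so the sketch runs without CKAN."""

    name = ""
    content_type = ""
    file_extension = ""

    def __init__(self):
        self.data_extractor = None

    def set_data_extractor(self, extractor):
        self.data_extractor = extractor

    def export(self, data):
        raise NotImplementedError


class PlainTextFormat(Format):
    """Hypothetical format that renders only the title."""

    name = "plaintext"
    content_type = "text/plain"
    file_extension = "txt"

    def export(self, data):
        return "Title: {}\n".format(data["title"])


def run_export(formats, format_name, package_id):
    """Hypothetical helper: look up a format, extract data, render it."""
    fmt = formats.get(format_name)
    if fmt is None:
        raise KeyError("unknown format: {}".format(format_name))
    if fmt.data_extractor is None:
        raise RuntimeError("no data extractor set for {}".format(format_name))
    return fmt.export(fmt.data_extractor(package_id))


formats = {"plaintext": PlainTextFormat()}
# A fake extractor replaces the package_show call for demonstration purposes.
formats["plaintext"].set_data_extractor(lambda package_id: {"title": package_id.upper()})
print(run_export(formats, "plaintext", "my-dataset"), end="")
```

Keeping extraction and rendering separate means one extractor can feed several formats, and each half can be tested on its own.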

Data Extractor Pattern:

def my_data_extractor(package_id):
    """Extract and transform package data"""
    import ckan.plugins.toolkit as tk

    # Get package data
    package = tk.get_action('package_show')(
        {'ignore_auth': True},
        {'id': package_id}
    )

    # Transform to format-specific structure
    return {
        'title': package['title'],
        'description': package['notes'],
        'resources': [
            {
                'name': r['name'],
                'url': r['url'],
                'format': r['format']
            }
            for r in package.get('resources', [])
        ],
        # Custom transformations: extract_spatial_info and
        # extract_temporal_info are helper functions you provide
        'spatial_extent': extract_spatial_info(package),
        'temporal_extent': extract_temporal_info(package),
    }
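
Real datasets often lack optional fields such as notes or resource names, and direct key access then fails with a KeyError. The following sketch shows a defensive variant of the transformation step that works on a plain dict, so it can be tried without a CKAN instance. The function name transform_package and the fallback values are assumptions for illustration.

```python
def transform_package(package):
    """Map a package_show-style dict to a flat export structure.

    Missing or empty fields fall back to empty strings or lists
    instead of raising KeyError.
    """
    resources = [
        {
            "name": r.get("name") or "",
            "url": r.get("url") or "",
            "format": r.get("format") or "",
        }
        for r in package.get("resources") or []
    ]
    return {
        # Fall back to the URL slug when no title was entered.
        "title": package.get("title") or package.get("name", ""),
        "description": package.get("notes") or "",
        "resources": resources,
    }


sample = {"name": "my-dataset", "resources": [{"url": "https://example.org/a.csv"}]}
print(transform_package(sample))
```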

Template-Based Export:

Use Jinja2 templates for format generation:

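from jinja2 import Template

from ckanext.metaexport.formatters import Format

# The XML declaration must be the very first thing in the output, so the
# template starts right after the opening quotes. autoescape=True escapes
# characters such as & and < in metadata values.
XML_TEMPLATE = Template('''<?xml version="1.0" encoding="UTF-8"?>
<metadata>
    <title>{{ title }}</title>
    <description>{{ description }}</description>
    {% for resource in resources %}
    <resource>
        <name>{{ resource.name }}</name>
        <url>{{ resource.url }}</url>
    </resource>
    {% endfor %}
</metadata>
''', autoescape=True)


class XMLFormat(Format):
    def export(self, data):
        return XML_TEMPLATE.render(**data)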
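
As an alternative to templates, the same document can be built with the standard library's xml.etree.ElementTree, which escapes text values automatically. This is a sketch of a different technique, not the extension's own approach; export_xml is a hypothetical function that mirrors the element names used in the template above.

```python
import xml.etree.ElementTree as ET


def export_xml(data):
    """Build the <metadata> document as an element tree and serialize it."""
    root = ET.Element("metadata")
    ET.SubElement(root, "title").text = data["title"]
    ET.SubElement(root, "description").text = data.get("description", "")
    for resource in data.get("resources", []):
        node = ET.SubElement(root, "resource")
        ET.SubElement(node, "name").text = resource["name"]
        ET.SubElement(node, "url").text = resource["url"]
    # Prepend the declaration by hand so the result is a str on every
    # Python version, independent of locale settings.
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


print(export_xml({"title": "Rain & snow", "resources": []}))
```

Building a tree guarantees well-formed output, while a template is usually easier to read and adjust for large, mostly static documents.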

Development:

  1. Clone repository:

    git clone https://github.com/DataShades/ckanext-metaexport.git
    cd ckanext-metaexport

  2. Install for development:

    python setup.py develop
    pip install -r dev-requirements.txt

  3. Run tests: pytest --ckan-ini test.ini

  4. Run with coverage: pytest --ckan-ini test.ini --cov=ckanext.metaexport

Testing Custom Formats:

  1. Register format in test plugin
  2. Create test dataset with known metadata
  3. Call export endpoint: /dataset/{id}/metaexport/{format}
  4. Validate output against format schema
  5. Test data transformations
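
Step 4 can start with a simple structural check before a full schema validation is in place. The sketch below verifies that an exported XML document is well-formed and contains a set of required elements. REQUIRED_ELEMENTS and check_export are hypothetical names, and the required tags are an assumed minimal schema that should be replaced with the rules of the real target format.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal schema: direct children that must exist and be non-empty.
REQUIRED_ELEMENTS = ("title", "description")


def check_export(xml_text):
    """Return a list of problems found in an exported XML document."""
    try:
        root = ET.fromstring(xml_text.encode("utf-8"))
    except ET.ParseError as exc:
        return ["not well-formed: {}".format(exc)]
    problems = []
    for tag in REQUIRED_ELEMENTS:
        node = root.find(tag)
        if node is None or not (node.text or "").strip():
            problems.append("missing or empty <{}>".format(tag))
    return problems


print(check_export("<metadata><title>T</title></metadata>"))
```

In a pytest test, the response body of the export endpoint would be passed to check_export and the result compared with an empty list.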

Troubleshooting:

  1. PDF export not working:

    • Verify wkhtmltopdf is installed: wkhtmltopdf --version
    • Check that PATH includes the wkhtmltopdf binary
    • Install: sudo apt-get install wkhtmltopdf
    • For headless servers: run wkhtmltopdf through the xvfb-run wrapper
  2. RDF export errors:

    • Verify rdflib version: pip show rdflib
    • Check namespace definitions
    • Validate RDF syntax
    • Review error logs for parsing issues
  3. Custom format not appearing:

    • Verify plugin implements IMetaexport
    • Check register_metadata_format returns dict
    • Ensure plugin is in ckan.plugins
    • Restart CKAN after changes
  4. Data extractor errors:

    • Verify extractor function signature
    • Check package_id is valid
    • Review data transformations
    • Add error handling for missing fields
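
The first two PDF checks above can be scripted. The sketch below looks for wkhtmltopdf on PATH and asks the binary for its version, using only the standard library. The function names are illustrative; run the script as the same user and in the same environment as the CKAN process, since PATH may differ between a login shell and the web server.

```python
import shutil
import subprocess


def find_wkhtmltopdf():
    """Return the path of wkhtmltopdf on PATH, or None if it is missing."""
    return shutil.which("wkhtmltopdf")


def wkhtmltopdf_version(path):
    """Return the version line of the binary at path, or None if the call fails."""
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.strip() or None


path = find_wkhtmltopdf()
if path is None:
    print("wkhtmltopdf not found on PATH")
else:
    print(path, wkhtmltopdf_version(path))
```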

Best Practices:

  1. Format Implementation:

    • Use standard schemas when possible
    • Validate output against schemas
    • Handle missing/optional fields gracefully
    • Provide clear error messages
  2. Data Extraction:

    • Cache expensive operations
    • Handle custom fields properly
    • Transform spatial/temporal data correctly
    • Include all relevant metadata
  3. Performance:

    • Cache format instances
    • Optimize data extraction queries
    • Consider async generation for large datasets
    • Implement pagination for large exports
  4. Standards Compliance:

    • Follow official specifications
    • Use standard namespaces/vocabularies
    • Validate against reference implementations
    • Document any deviations
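
The advice to cache expensive operations can be sketched with functools.lru_cache. In the example below, load_package is a stand-in for a slow package_show call, and the calls list exists only to show how often it runs. Both names are assumptions for illustration. A real deployment would also need to invalidate the cache when a dataset changes, for example with cached_extract.cache_clear().

```python
from functools import lru_cache

calls = []  # records every uncached lookup, for demonstration only


def load_package(package_id):
    """Stand-in for a slow package_show lookup."""
    calls.append(package_id)
    return {"title": package_id.replace("-", " ").title()}


@lru_cache(maxsize=256)
def cached_extract(package_id):
    """Extract data once per package ID and reuse the result afterwards."""
    package = load_package(package_id)
    # Return an immutable tuple of pairs so callers cannot alter the cached value.
    return tuple(sorted(package.items()))


first = dict(cached_extract("my-dataset"))
second = dict(cached_extract("my-dataset"))
print(first, len(calls))
```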

Advanced Features:

Geospatial Metadata (geomet):

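import json

from geomet import wkt


def extract_spatial_info(package):
    """Extract and transform spatial extent"""
    spatial = package.get('spatial')
    if not spatial:
        return None
    # The field may hold GeoJSON as a string; geomet expects a dict
    if isinstance(spatial, str):
        spatial = json.loads(spatial)
    # Convert GeoJSON to WKT
    return wkt.dumps(spatial)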
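
Geographic metadata standards such as ISO 19115 commonly describe a dataset's extent as a bounding box with west, east, south, and north limits. The sketch below derives such a box from a GeoJSON geometry using only the standard library. bounding_box and _positions are hypothetical helpers, not part of the extension or of geomet. The code assumes a non-empty geometry in longitude/latitude order and does not handle geometries that cross the antimeridian.

```python
def _positions(coords):
    """Yield every [x, y] position from nested GeoJSON coordinates."""
    if coords and isinstance(coords[0], (int, float)):
        # A single position such as [lon, lat]
        yield coords
    else:
        for part in coords:
            yield from _positions(part)


def bounding_box(geometry):
    """Return west/east/south/north limits of a GeoJSON geometry dict."""
    xs, ys = zip(*((p[0], p[1]) for p in _positions(geometry["coordinates"])))
    return {"west": min(xs), "east": max(xs), "south": min(ys), "north": max(ys)}


polygon = {
    "type": "Polygon",
    "coordinates": [[[10, 50], [12, 50], [12, 52], [10, 52], [10, 50]]],
}
print(bounding_box(polygon))
```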

RDF Graph Building (rdflib):

from rdflib import Graph, Namespace, Literal, URIRef
from rdflib.namespace import DCTERMS, RDF


def build_rdf_graph(package_data):
    """Build RDF graph for dataset"""
    g = Graph()
    DCAT = Namespace('http://www.w3.org/ns/dcat#')

    dataset_uri = URIRef(package_data['url'])
    g.add((dataset_uri, RDF.type, DCAT.Dataset))
    g.add((dataset_uri, DCTERMS.title, Literal(package_data['title'])))

    return g.serialize(format='xml')

Development Status: Beta (4)

License: AGPL v3.0 or later

Keywords: CKAN, metadata, export, RDF, DCAT, JSON-LD, ISO 19115, Dublin Core

Developer: Link Digital (Sergey Motornyuk)

Related Extensions:

    • ckanext-dcat: DCAT-specific functionality
    • ckanext-spatial: Spatial metadata and search
    • ckanext-scheming: Schema customization
    • ckanext-harvest: Metadata harvesting

Plugins to configure (ckan.ini)
# metaexport=ckanext.metaexport.plugin:MetaexportPlugin
CKAN Settings (ckan.ini)
(not set)
DB migration to be executed
(not set)