Extension Metadata Export

Extension Basics

Title	Metadata Export
Name	ckanext-metaexport
Type	Public extension
Description	Universal metadata export to different metadata standards (RDF, DCAT, JSON-LD, etc.)
CKAN versions	~2.9, ~2.10, ~2.11. Exactly matched CKAN versions: 2.10.0 2.10.1 2.10.2 2.10.3 2.10.4 2.10.5 2.10.6 2.10.7 2.10.8 2.11.0 2.11.1 2.11.2 2.11.3 2.12.0 2.9.0 2.9.1 2.9.10 2.9.11 2.9.2 2.9.3 2.9.4 2.9.5 2.9.6 2.9.7 2.9.8 2.9.9
Download-Url (zip)	https://github.com/DataShades/ckanext-metaexport.git#egg=ckanext-metaexport
Download-Url commit date	2025-10-10
Url to repo	https://github.com/DataShades/ckanext-metaexport
Category	Data Management & Quality

Background Infos

Description (long)	Provides a universal framework for exporting dataset metadata into different metadata standards and formats. Implements an extensible plugin-based architecture, built on the IMetaexport interface, that allows registration of custom metadata formats. Supports multiple metadata standards including RDF, DCAT, JSON-LD, ISO 19115, and Dublin Core. Each format can have a custom data extractor that transforms CKAN metadata into the target standard's schema. Export views are available at /dataset/{id}/metaexport/{format} URLs. Useful for metadata interoperability, harvesting by external systems, and compliance with metadata standards. Includes PDF export using pdfkit and geometric data handling with the geomet library.
Version	0.2.0
Version release date	2025-10-10
Contact name	Sergey Motornyuk
Contact email	sergey.motornyuk@linkdigital.com.au
Contact Url	(not set)

Installation Guide

Configuration hints	Requirements:
- CKAN 2.9+
- Python 3.8+
- rdflib >= 7.1.3, < 8.0.0
- geomet >= 1.1.0, < 2.0.0
- pdfkit >= 1.0.0, < 2.0.0
- wkhtmltopdf system package (for PDF export)

Installation:

Install system dependencies for PDF export:

    sudo apt-get update
    sudo apt-get install wkhtmltopdf

Activate the CKAN virtualenv:

    . /usr/lib/ckan/default/bin/activate

Install the extension:

    pip install ckanext-metaexport

Or from source:

    git clone https://github.com/DataShades/ckanext-metaexport.git
    cd ckanext-metaexport
    python setup.py develop

Install Python dependencies:

    pip install -r requirements.txt

Key dependencies:
- rdflib: RDF graph library for semantic metadata
- geomet: Geometry handling for spatial metadata
- pdfkit: PDF generation wrapper for wkhtmltopdf

Add the plugin to ckan.plugins in production.ini:

    ckan.plugins = … metaexport …

Restart CKAN:

    sudo service apache2 reload

Configuration:

No global configuration is required. Metadata formats are registered via the plugin interface.

Usage:

Accessing Metadata Exports:

Metadata exports are available at the URL pattern /dataset/{dataset_id}/metaexport/{format}

Examples:
- /dataset/my-dataset/metaexport/rdf
- /dataset/my-dataset/metaexport/dcat
- /dataset/my-dataset/metaexport/jsonld
- /dataset/my-dataset/metaexport/iso19115
- /dataset/my-dataset/metaexport/pdf

Supported Formats (built-in):

RDF (Resource Description Framework):
- Semantic web standard
- Machine-readable linked data
- Supports SPARQL queries

DCAT (Data Catalog Vocabulary):
- W3C standard for data catalogs
- Widely used in open data portals
- Interoperable across platforms

JSON-LD (JSON Linked Data):
- JSON-based linked data format
- Easy to parse and consume
- Compatible with semantic web tools

ISO 19115 (Geographic Metadata):
- International standard for geospatial metadata
- Used by GIS systems
- Comprehensive geographic information

Dublin Core:
- Simple metadata standard
- 15 core metadata elements
- Widely adopted for digital resources

PDF Export:
- Human-readable metadata document
- Suitable for archival
- Includes all dataset information

Creating Custom Metadata Formats:

Implement the IMetaexport interface:

    from xml.sax.saxutils import escape

    import ckan.plugins.toolkit as tk
    from ckan.plugins import SingletonPlugin, implements
    from ckanext.metaexport.interfaces import IMetaexport
    from ckanext.metaexport.formatters import Format


    class MyCustomFormat(Format):
        """Custom metadata format implementation"""

        def __init__(self):
            super().__init__()
            self.name = 'myformat'
            self.content_type = 'application/xml'
            self.file_extension = 'xml'

        def export(self, data):
            """Generate metadata in custom format"""
            # Transform data into the custom format; this one-element
            # document is only a placeholder.
            custom_xml_output = '<metadata><title>{}</title></metadata>'.format(
                escape(data['title'])
            )
            return custom_xml_output


    class MyMetaexportPlugin(SingletonPlugin):
        implements(IMetaexport)

        def register_metadata_format(self):
            """Register custom formats"""
            return {'myformat': MyCustomFormat()}

        def register_data_extractors(self, formats):
            """Configure data extraction for formats"""
            if 'myformat' in formats:
                formats['myformat'].set_data_extractor(self.extract_my_data)

        def extract_my_data(self, package_id):
            """Extract data for custom format"""
            # Fetch package data
            package = tk.get_action('package_show')(
                {'ignore_auth': True}, {'id': package_id}
            )
            # Transform to format-specific structure
            return {
                'title': package['title'],
                'custom_field': package.get('custom_field'),
                # ... more fields
            }

Format Class Structure:

Inherit from ckanext.metaexport.formatters.Format:

    class Format:
        """Base class for metadata formats"""

        name = ''            # Format identifier
        content_type = ''    # MIME type
        file_extension = ''  # File extension for downloads

        def __init__(self):
            self.data_extractor = None

        def set_data_extractor(self, extractor):
            """Set function to extract data"""
            self.data_extractor = extractor

        def export(self, data):
            """Generate metadata in format"""
            raise NotImplementedError

Data Extractor Pattern:

    import ckan.plugins.toolkit as tk


    def my_data_extractor(package_id):
        """Extract and transform package data"""
        # Get package data
        package = tk.get_action('package_show')(
            {'ignore_auth': True}, {'id': package_id}
        )

        # Transform to format-specific structure
        return {
            'title': package['title'],
            'description': package['notes'],
            'resources': [
                {'name': r['name'], 'url': r['url'], 'format': r['format']}
                for r in package.get('resources', [])
            ],
            # Custom transformations. Both helpers must be defined by the
            # plugin; extract_spatial_info is shown under Advanced Features.
            'spatial_extent': extract_spatial_info(package),
            'temporal_extent': extract_temporal_info(package),
        }

Template-Based Export:

Use Jinja2 templates for format generation:

    from jinja2 import Template

    # The XML declaration must be the first thing in the output, so the
    # template starts right after the opening quotes. autoescape=True
    # escapes &, < and > in metadata values.
    XML_TEMPLATE = Template('''\
    <?xml version="1.0" encoding="UTF-8"?>
    <metadata>
      <title>{{ title }}</title>
      <description>{{ description }}</description>
      {% for resource in resources %}
      <resource>
        <name>{{ resource.name }}</name>
        <url>{{ resource.url }}</url>
      </resource>
      {% endfor %}
    </metadata>
    ''', autoescape=True)


    class XMLFormat(Format):
        def export(self, data):
            return XML_TEMPLATE.render(**data)

Development:

Clone the repository:

    git clone https://github.com/DataShades/ckanext-metaexport.git
    cd ckanext-metaexport

Install for development:

    python setup.py develop
    pip install -r dev-requirements.txt

Run tests:

    pytest --ckan-ini test.ini

Run with coverage:

    pytest --ckan-ini test.ini --cov=ckanext.metaexport

Testing Custom Formats:
- Register the format in a test plugin
- Create a test dataset with known metadata
- Call the export endpoint: /dataset/{id}/metaexport/{format}
- Validate the output against the format schema
- Test data transformations

Troubleshooting:

PDF export not working:
- Verify wkhtmltopdf is installed: wkhtmltopdf --version
- Check that PATH includes wkhtmltopdf
- Install: sudo apt-get install wkhtmltopdf
- For headless servers: install the xvfb-run wrapper

RDF export errors:
- Verify the rdflib version: pip show rdflib
- Check namespace definitions
- Validate RDF syntax
- Review error logs for parsing issues

Custom format not appearing:
- Verify the plugin implements IMetaexport
- Check that register_metadata_format returns a dict
- Ensure the plugin is in ckan.plugins
- Restart CKAN after changes

Data extractor errors:
- Verify the extractor function signature
- Check that package_id is valid
- Review data transformations
- Add error handling for missing fields

Best Practices:

Format Implementation:
- Use standard schemas when possible
- Validate output against schemas
- Handle missing/optional fields gracefully
- Provide clear error messages

Data Extraction:
- Cache expensive operations
- Handle custom fields properly
- Transform spatial/temporal data correctly
- Include all relevant metadata

Performance:
- Cache format instances
- Optimize data extraction queries
- Consider async generation for large datasets
- Implement pagination for large exports

Standards Compliance:
- Follow official specifications
- Use standard namespaces/vocabularies
- Validate against reference implementations
- Document any deviations

Advanced Features:

Geospatial Metadata (geomet):

    import json

    from geomet import wkt


    def extract_spatial_info(package):
        """Extract and transform spatial extent"""
        spatial = package.get('spatial')
        if not spatial:
            return None
        # The value may be stored as a GeoJSON string; geomet expects a dict.
        if isinstance(spatial, str):
            spatial = json.loads(spatial)
        # Convert GeoJSON to WKT
        return wkt.dumps(spatial)

RDF Graph Building (rdflib):

    from rdflib import Graph, Namespace, Literal, URIRef
    from rdflib.namespace import DCTERMS, RDF


    def build_rdf_graph(package_data):
        """Build RDF graph for dataset and return it serialized as RDF/XML"""
        g = Graph()
        DCAT = Namespace('http://www.w3.org/ns/dcat#')

        dataset_uri = URIRef(package_data['url'])
        g.add((dataset_uri, RDF.type, DCAT.Dataset))
        g.add((dataset_uri, DCTERMS.title, Literal(package_data['title'])))

        return g.serialize(format='xml')

Development Status: Beta (4)
License: AGPL v3.0 or later
Keywords: CKAN, metadata, export, RDF, DCAT, JSON-LD, ISO 19115, Dublin Core
Developer: Link Digital (Sergey Motornyuk)

Related Extensions:
- ckanext-dcat: DCAT-specific functionality
- ckanext-spatial: Spatial metadata and search
- ckanext-scheming: Schema customization
- ckanext-harvest: Metadata harvesting
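
The format and extractor pattern described in the hints above can be tried without a CKAN installation. The following is a minimal sketch, not the extension's own code: the Format class is a stand-in copied from the structure given in the hints, SimpleXMLFormat, extract_from_package and the sample data are hypothetical, and the extractor takes an already-fetched package dict instead of a package_id. It builds the XML with the standard library so that special characters are escaped, and it falls back to empty values when optional fields are missing.

```python
import xml.etree.ElementTree as ET


class Format:
    """Stand-in for ckanext.metaexport.formatters.Format (assumed shape)."""

    name = ''
    content_type = ''
    file_extension = ''

    def __init__(self):
        self.data_extractor = None

    def set_data_extractor(self, extractor):
        self.data_extractor = extractor

    def export(self, data):
        raise NotImplementedError


class SimpleXMLFormat(Format):
    """Hypothetical format that serializes extracted data as XML."""

    name = 'simplexml'
    content_type = 'application/xml'
    file_extension = 'xml'

    def export(self, data):
        root = ET.Element('metadata')
        ET.SubElement(root, 'title').text = data['title']
        ET.SubElement(root, 'description').text = data['description']
        for res in data['resources']:
            node = ET.SubElement(root, 'resource')
            ET.SubElement(node, 'name').text = res['name']
            ET.SubElement(node, 'url').text = res['url']
        # ElementTree escapes &, < and > in text nodes when serializing.
        return ET.tostring(root, encoding='unicode')


def extract_from_package(package):
    """Map a package_show-style dict to the structure the format expects.

    Missing optional fields fall back to empty strings instead of raising.
    """
    return {
        'title': package.get('title') or package.get('name', ''),
        'description': package.get('notes') or '',
        'resources': [
            {'name': r.get('name') or '', 'url': r.get('url') or ''}
            for r in package.get('resources', [])
        ],
    }


# Hypothetical sample data; no CKAN instance is contacted.
sample_package = {
    'name': 'air-quality',
    'title': 'Air Quality & Emissions',
    'resources': [{'url': 'https://example.org/data.csv'}],
}

fmt = SimpleXMLFormat()
fmt.set_data_extractor(extract_from_package)
xml_output = fmt.export(fmt.data_extractor(sample_package))
print(xml_output)
```

In a real plugin the extractor would first fetch the package by id, for example through the package_show action, and the format instance would be returned from register_metadata_format.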
Plugins to configure (ckan.ini)	# metaexport=ckanext.metaexport.plugin:MetaexportPlugin
CKAN Settings (ckan.ini)	(not set)
DB migration to be executed	(not set)
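
The export URL pattern from the configuration hints, /dataset/{dataset_id}/metaexport/{format}, can be assembled on the client side as sketched below. The helper name metaexport_url, the portal address and the dataset name are assumptions made for illustration; the format names are the ones listed in the example URLs.

```python
from urllib.parse import quote, urljoin

# Formats that appear in the example URLs of the configuration hints.
EXAMPLE_FORMATS = ('rdf', 'dcat', 'jsonld', 'iso19115', 'pdf')


def metaexport_url(site_url, dataset_id, fmt):
    """Build /dataset/{dataset_id}/metaexport/{format} under site_url."""
    path = 'dataset/{}/metaexport/{}'.format(
        quote(dataset_id, safe=''), quote(fmt, safe='')
    )
    # A trailing slash makes urljoin keep any path prefix of the site.
    return urljoin(site_url.rstrip('/') + '/', path)


# Hypothetical portal address and dataset name.
urls = [
    metaexport_url('https://ckan.example.org', 'my-dataset', fmt)
    for fmt in EXAMPLE_FORMATS
]
for url in urls:
    print(url)
```

Percent-encoding the dataset id and format keeps the path valid even when a caller passes a value containing spaces or slashes.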
