Extension OAI-PMH Harvester


Extension Basics

Title
OAI-PMH Harvester
Name
ckanext-oaipmh
Type
Public extension
Description
CKAN harvester for OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) conforming repositories
CKAN versions
Download-Url (zip)
Download-Url commit date
2022-01-10
Url to repo
Category
Data Management & Quality


Background Infos

Description (long)

The oaipmh extension provides a CKAN harvester for OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting). It imports metadata from OAI-PMH conforming repositories into CKAN. Features include support for credentials (username/password), harvesting of specific sets, multiple metadata formats (oai_dc and oai_ddi are currently supported), and an option to enforce HTTP GET for sources that don’t support HTTP POST. The harvester integrates with the ckanext-harvest extension and requires a sysadmin user called ‘harvest’ on the CKAN instance. It is configured through the harvest source configuration JSON, which accepts the options username, password, set, metadata_prefix, and force_http_get.

Version
0.1
Version release date
2022-01-10
Contact name
Liip AG
Contact email
Contact Url
(not set)


Installation Guide

Configuration hints

Requirements:
- The ckanext-harvest extension must be installed.
- IMPORTANT: You need a sysadmin user called ‘harvest’ on your CKAN instance.

Installation:

    source /home/www-data/pyenv/bin/activate
    pip install -e git+https://github.com/openresearchdata/ckanext-oaipmh.git#egg=ckanext-oaipmh --src /var/www
    cd /var/www/ckanext-oaipmh
    pip install -r requirements.txt
    python setup.py develop

Add to ckan.plugins: oaipmh_harvester

Setup Harvester:
1. Navigate to /harvest/new.
2. Enter the base URL of the OAI-PMH repository (e.g., http://boris.unibe.ch/cgi/oai2).
3. Select Source type: OAI-PMH Harvester.
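
To see what a harvester typically asks of the base URL entered in step 2, the following minimal sketch builds an OAI-PMH ListRecords request URL with the Python standard library. The arguments verb, metadataPrefix, and set belong to the OAI-PMH protocol; the helper name list_records_url is hypothetical, and this is an illustration rather than the extension's own code.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords URL (hypothetical helper, not part of ckanext-oaipmh)."""
    # verb and metadataPrefix are required by the protocol for ListRecords.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # The set argument is optional and restricts the response to one set.
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Example base URL taken from the setup steps; "baz" is a placeholder set name.
print(list_records_url("http://boris.unibe.ch/cgi/oai2", set_spec="baz"))
```

Opening such a URL in a browser is a quick way to check that a repository answers OAI-PMH requests before registering it as a harvest source.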

Configuration options (JSON in Configuration section):

Credentials (if required)

{"username": "foo", "password": "bar"}

Harvest specific set only

{"set": "baz"}

Specify metadata format (currently oai_dc and oai_ddi supported)

{"metadata_prefix": "oai_dc"}

Enforce HTTP GET if source doesn’t support POST (default: false)

{"force_http_get": true}
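
The options above are shown one at a time. The sketch below assumes they can be combined in a single JSON object, which the source does not state explicitly, and assembles such an object in Python. The option names are taken from the hints above; the helper build_harvest_config and its checks are hypothetical and not part of the extension.

```python
import json

# Metadata formats named as supported in the configuration hints.
SUPPORTED_PREFIXES = {"oai_dc", "oai_ddi"}


def build_harvest_config(username=None, password=None, set_spec=None,
                         metadata_prefix=None, force_http_get=False):
    """Return a JSON string for the harvest source Configuration field (hypothetical helper)."""
    config = {}
    if username is not None or password is not None:
        # Credentials only make sense as a pair.
        if username is None or password is None:
            raise ValueError("username and password must be given together")
        config["username"] = username
        config["password"] = password
    if set_spec is not None:
        config["set"] = set_spec
    if metadata_prefix is not None:
        if metadata_prefix not in SUPPORTED_PREFIXES:
            raise ValueError("unsupported metadata_prefix: " + metadata_prefix)
        config["metadata_prefix"] = metadata_prefix
    if force_http_get:
        # Omitted when false, since false is the documented default.
        config["force_http_get"] = True
    return json.dumps(config, indent=2, sort_keys=True)


print(build_harvest_config(set_spec="baz", metadata_prefix="oai_dc", force_http_get=True))
```

The printed JSON can be pasted into the Configuration section of the harvest source form.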

Run Harvester:
1. Activate the Python environment.
2. cd to the CKAN directory (e.g., /usr/lib/ckan/default/src/ckan).
3. Start the consumers:
    paster --plugin=ckanext-oaipmh harvester gather_consumer &
    paster --plugin=ckanext-oaipmh harvester fetch_consumer &
4. Run the job:
    paster --plugin=ckanext-oaipmh harvester run

Development/Testing:

    . ~/default/bin/activate
    cd /var/www/ckanext-oaipmh
    nosetests --logging-filter=ckanext.oaipmh.harvester --ckan --with-pylons=test.ini ckanext/oaipmh/tests

OAI-PMH repositories: http://www.openarchives.org/Register/BrowseSites

Plugins to configure (ckan.ini)
oaipmh_harvester
CKAN Settings (ckan.ini)
(not set)
DB migration to be executed
(not set)