Extension Federated Index


Extension Basics

Title
Federated Index
Name
ckanext-federated-index
Type
Public extension
Description
Lightweight solution for storing and searching remote CKAN datasets locally while redirecting users to the original portal
CKAN versions

~2.9, ~2.10, ~2.11

Download-Url (zip)
Download-Url commit date
2024-10-17
Url to repo
Category
Specialized Tools


Background Infos

Description (long)
The federated-index extension provides a lightweight way to index datasets from remote CKAN instances into your local search index without creating full dataset copies. Unlike ckanext-harvest, which creates local datasets, this extension only adds remote datasets to the search index and redirects users to the original portal when they open a dataset's details. Configuration is profile-based, so multiple remote CKAN instances can be indexed side by side. Advanced features include schema alignment, incremental updates (fetching only newer datasets), custom search payloads, and multiple storage backends (database, filesystem, Redis, or SQLite). The extension also provides an IFederatedIndex interface for custom hooks that run before indexing.
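
The redirect behaviour described above can be pictured with a minimal Python sketch. This is not the extension's actual code: the function name redirect_target and the sample records are hypothetical, and the field name is the default value of ckanext.federated_index.index_url_field listed under Configuration hints.

```python
from typing import Optional

# Default of ckanext.federated_index.index_url_field (see Configuration hints)
URL_FIELD = "federated_index_remote_url"


def redirect_target(indexed_pkg: dict, url_field: str = URL_FIELD) -> Optional[str]:
    """Return the remote URL to redirect to, or None for a local dataset.

    Hypothetical helper: a federated record is assumed to carry the
    original portal's URL in `url_field`; local datasets lack it.
    """
    url = indexed_pkg.get(url_field)
    return url or None


# Hypothetical search-index records
remote = {"name": "air-quality", URL_FIELD: "https://demo.ckan.org/dataset/air-quality"}
local = {"name": "local-budget"}

print(redirect_target(remote))
print(redirect_target(local))
```

A dataset view would redirect whenever the helper returns a URL and render the local page otherwise.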

Version
0.1.1.post1
Version release date
2024-10-17
Contact name
DataShades
Contact email
Contact Url
(not set)


Installation Guide

Configuration hints

Installation: pip install ckanext-federated-index

Add to ckan.plugins: federated_index

Profile-based configuration (each remote portal needs a profile):

Basic profile (demo is the profile name)

ckanext.federated_index.profile.demo.url = https://demo.ckan.org
ckanext.federated_index.profile.demo.api_key = YOUR_API_KEY_OPTIONAL
ckanext.federated_index.profile.demo.timeout = 5

Profile with advanced features (JSON format for extras)

ckanext.federated_index.profile.demo.extras = {"search_payload": {"rows": 100, "fq": "organization:my-org"}, "storage": {"type": "redis"}}
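
To show how the profile options group together, here is a minimal Python sketch that collects ckanext.federated_index.profile.* options into one dictionary per profile and decodes the JSON extras. The function parse_profiles is hypothetical and is not how the extension itself reads its configuration; treating timeout as an integer is likewise an assumption.

```python
import json

PREFIX = "ckanext.federated_index.profile."


def parse_profiles(config: dict) -> dict:
    """Group profile options by profile name (hypothetical helper)."""
    profiles: dict = {}
    for key, value in config.items():
        if not key.startswith(PREFIX):
            continue
        # "demo.url" -> ("demo", ".", "url")
        name, _, option = key[len(PREFIX):].partition(".")
        if not name or not option:
            continue
        if option == "extras":
            value = json.loads(value)  # extras must be valid JSON
        elif option == "timeout":
            value = int(value)  # assumption: timeout is a whole number
        profiles.setdefault(name, {})[option] = value
    return profiles


config = {
    "ckan.plugins": "federated_index",
    "ckanext.federated_index.profile.demo.url": "https://demo.ckan.org",
    "ckanext.federated_index.profile.demo.timeout": "5",
    "ckanext.federated_index.profile.demo.extras":
        '{"search_payload": {"rows": 100}, "storage": {"type": "redis"}}',
}
profiles = parse_profiles(config)
print(profiles["demo"]["extras"]["storage"]["type"])
```

Note that json.loads rejects typographic quotes, so the extras value has to use plain double quotes.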

Global settings:

ckanext.federated_index.align_with_local_schema = false
ckanext.federated_index.redirect_missing_federated_datasets = true
ckanext.federated_index.dataset_read_endpoints = dataset.read
ckanext.federated_index.index_url_field = federated_index_remote_url
ckanext.federated_index.index_profile_field = federated_index_profile

Storage types (configured via profile extras):
- db (default): custom table via migration
- fs: filesystem JSON files in ckan.storage_path/federated_index/PROFILENAME
- redis: Redis storage
- sqlite: separate SQLite database per profile
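
The storage selection above can be sketched in Python as follows. Both helpers are hypothetical illustrations rather than the extension's API; they only encode what the list states: db is the default, and the fs backend keeps its files under ckan.storage_path/federated_index/PROFILENAME. The storage path used in the example is an assumed value.

```python
from pathlib import PurePosixPath

STORAGE_TYPES = ("db", "fs", "redis", "sqlite")


def storage_type(extras: dict) -> str:
    """Read the storage type from profile extras, defaulting to "db"."""
    kind = extras.get("storage", {}).get("type", "db")
    if kind not in STORAGE_TYPES:
        raise ValueError(f"unknown storage type: {kind!r}")
    return kind


def fs_storage_dir(storage_path: str, profile: str) -> PurePosixPath:
    """Directory used by the fs backend for one profile."""
    return PurePosixPath(storage_path) / "federated_index" / profile


print(storage_type({"storage": {"type": "redis"}}))
print(storage_type({}))
# "/var/lib/ckan" is an assumed ckan.storage_path value
print(fs_storage_dir("/var/lib/ckan", "demo"))
```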

Refresh datasets via CLI: ckanapi action federated_index_profile_refresh profile=demo index=true

Incremental updates (only newer datasets): ckanapi action federated_index_profile_refresh profile=demo index=true since_last_refresh=true
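
The same refresh can be triggered from Python. The sketch below only builds the action payload and hands it to a caller-supplied function, so it runs without a CKAN instance. The helper names are hypothetical, and passing booleans instead of the CLI's true strings is an assumption about what the action accepts.

```python
from typing import Any, Callable, Dict

ACTION = "federated_index_profile_refresh"


def build_refresh_payload(
    profile: str, index: bool = True, since_last_refresh: bool = False
) -> Dict[str, Any]:
    """Build the data dict for the refresh action (hypothetical helper)."""
    payload: Dict[str, Any] = {"profile": profile, "index": index}
    if since_last_refresh:
        # Incremental update: only datasets newer than the last refresh
        payload["since_last_refresh"] = True
    return payload


def refresh(
    call_action: Callable[[str, Dict[str, Any]], Any], profile: str, **options: Any
) -> Any:
    """Invoke the refresh action through any (action, data_dict) callable."""
    return call_action(ACTION, build_refresh_payload(profile, **options))


def print_call(action: str, data: Dict[str, Any]) -> str:
    # Stand-in for a real API client; it only prints what would be sent
    print(action, data)
    return "ok"


refresh(print_call, "demo", since_last_refresh=True)
```

With the ckanapi Python library installed, the call_action method of a RemoteCKAN object pointing at your own portal could be passed in place of print_call.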

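Implement the IFederatedIndex interface for custom hooks:

import ckan.plugins as p
# Assumed import path; confirm it against the extension's source
from ckanext.federated_index.interfaces import IFederatedIndex

class MyPlugin(p.SingletonPlugin):
    p.implements(IFederatedIndex)

    def federated_index_before_index(self, pkg_dict, profile):
        # Custom logic before indexing
        return pkg_dict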

Differences from ckanext-harvest:
- Works only with CKAN instances (not generic harvesters)
- Uses CKAN API, no background processes
- Adds to search index only, no local dataset copies
- Lighter weight and simpler architecture

Testing: pytest

Plugins to configure (ckan.ini)
federated_index
CKAN Settings (ckan.ini)
# ckanext.federated_index.profile.demo.api_key = YOUR_API_KEY_OPTIONAL
# ckanext.federated_index.profile.demo.timeout = 5
# ckanext.federated_index.profile.demo.extras = {"search_payload": {"rows": 100, "fq": "organization:my-org"}, "storage": {"type": "redis"}}
# ckanext.federated_index.align_with_local_schema = false
# ckanext.federated_index.redirect_missing_federated_datasets = true
# ckanext.federated_index.dataset_read_endpoints = dataset.read
# ckanext.federated_index.index_url_field = federated_index_remote_url
# ckanext.federated_index.index_profile_field = federated_index_profile
DB migration to be executed
federated-index