Extension Search Federation


Extension Basics

Title
Search Federation
Name
ckanext-searchfed
Type
Public extension
Description
Federated search across multiple CKAN instances to supplement local search results with datasets from remote portals
CKAN versions

~2.9, ~2.10, ~2.11

Download-Url (zip)
Download-Url commit date
2025-10-10
Url to repo
Category
Specialized Tools


Background Infos

Description (long)

Search federation extension that supplements local CKAN search results with datasets from remote CKAN instances using their package_search API. When local search results fall below a configurable threshold (min_search_results), the extension automatically queries configured remote data portals and adds their results below local datasets. Supports multiple remote portals with custom labels (e.g. ‘FROM DATA.BRISBANE.GOV.AU’), facet merging from remote results, API federation control, and source portal filtering. Useful for creating unified search experiences across federated data infrastructures, allowing users to discover relevant datasets from partner organizations without leaving the local portal. Includes smart filtering to avoid duplicate harvested content.

Version
0.0.1
Version release date
2025-10-10
Contact name
(not set)
Contact email
(not set)
Contact Url
(not set)


Installation Guide

Configuration hints

Requirements:

  • CKAN 2.6+
  • Python 2.6 or 2.7
  • Network access to remote CKAN instances
  • Remote portals must have public package_search API

Installation:

  1. Activate CKAN virtualenv: . /usr/lib/ckan/default/bin/activate

  2. Install extension: pip install ckanext-searchfed

    Or from source:

      git clone https://github.com/DataShades/ckanext-searchfed.git
      cd ckanext-searchfed
      python setup.py develop

  3. Install dependencies: pip install -r dev-requirements.txt

  4. Add plugin to ckan.plugins in production.ini: ckan.plugins = … searchfed …

  5. Configure remote portals (see Configuration below)

  6. Restart CKAN: sudo service apache2 reload

Configuration:

Required Settings:

  1. Remote Portal Configuration: Define labels and URLs for remote CKAN instances. Format: label URL [label URL …]

    Example with multiple remote portals:

    ckan.search_federation = data.brisbane.gov.au https://data.brisbane.qld.gov.au/data data.gov.au http://data.gov.au data.sa.gov.au https://data.sa.gov.au

    Label will appear next to dataset titles in search results: “FROM DATA.BRISBANE.GOV.AU”
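The "label URL [label URL …]" format above can be illustrated with a short Python sketch. It only shows how such a value splits into pairs; the function name parse_federation_setting is hypothetical and this is not necessarily how the extension parses the setting internally.

```python
def parse_federation_setting(value):
    """Split 'label URL [label URL ...]' into a {label: url} mapping.

    Hypothetical helper for illustration; the extension's own parsing
    may differ.
    """
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected an even number of tokens (label URL pairs)")
    # Even positions hold labels, odd positions hold the matching URLs.
    return dict(zip(tokens[0::2], tokens[1::2]))


portals = parse_federation_setting(
    "data.brisbane.gov.au https://data.brisbane.qld.gov.au/data "
    "data.gov.au http://data.gov.au"
)
for label, url in portals.items():
    print(label, "->", url)
```

A value with an odd number of tokens (for example a label without a URL) raises ValueError in this sketch, which is a convenient way to catch a malformed setting before restarting CKAN.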

Optional Settings:

  1. Filter Label (for excluding harvested content): Labels used to filter out already-harvested datasets. Default: empty string

    ckan.search_federation.label = data.sa.gov.au

    When set, creates filter query: -harvest_portal:data.sa.gov.au

  2. Extra Filter Keys: Field names used for filtering remote queries. Default: ‘harvest_portal’

    ckan.search_federation.extra_keys = harvest_portal search_federation_portal

    Prevents showing datasets already harvested from remote portals.

  3. Use Remote Facets: Use facets from remote search instead of local facets. Default: false

    ckan.search_federation.use_remote_facet_results = false

    If true and remote results include ‘search_facets’, they replace local facets.

  4. Minimum Search Results Threshold: Trigger federation when the number of local results falls below this value. Default: 20. Set to -1 to always run federation regardless of local results.

    ckan.search_federation.min_search_results = 3

    If local results < 3, fetch remote results. If set to -1, always fetch remote results.

  5. API Federation Control: Include remote datasets in API search results. Default: false

    ckan.search_federation.api_federation = false

    If true: API searches include remote datasets.
    If false: Only web UI shows federated results.

  6. Source Facet Field: Facet field identifying source portal for each dataset. Default: empty string

    ckan.search_federation.source_facet_field = vocab_source_portal

    Used for merging facet counts and building “Source” filter.

  7. Source Extras Key: Key in search_params[“extras”] carrying selected source portals. Default: empty string

    ckan.search_federation.source_extras_key = source_portal

    Searchfed checks this key to decide whether a remote portal should be queried. If the user hasn’t selected that source, the remote call is skipped.
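As an illustration of how the filter label and the extra filter keys might combine, the sketch below builds an exclusion filter query of the form documented for setting 1 (-harvest_portal:data.sa.gov.au). The function name build_exclusion_fq is hypothetical, and joining one negated clause per key with spaces is an assumption, not confirmed behaviour of the extension.

```python
def build_exclusion_fq(label, extra_keys="harvest_portal"):
    """Return a filter query that excludes datasets tagged with `label`.

    Assumption: one negated clause per key, joined by spaces. With an
    empty label (the documented default) no filter is produced.
    """
    if not label:
        return ""
    return " ".join("-{}:{}".format(key, label) for key in extra_keys.split())


print(build_exclusion_fq("data.sa.gov.au"))
print(build_exclusion_fq("data.sa.gov.au", "harvest_portal search_federation_portal"))
```

The first call reproduces the filter query shown for the default key; the second shows what two configured keys would yield under the stated assumption.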

Configuration Examples:

Basic Federation (always supplement with 2 portals):

ckan.plugins = … searchfed …
ckan.search_federation = data.brisbane.gov.au https://data.brisbane.qld.gov.au/data data.gov.au http://data.gov.au
ckan.search_federation.min_search_results = -1

Conditional Federation (only when < 5 local results):

ckan.plugins = … searchfed …
ckan.search_federation = data.gov.au http://data.gov.au
ckan.search_federation.min_search_results = 5

Advanced with Filtering:

ckan.plugins = … searchfed …
ckan.search_federation = data.brisbane.gov.au https://data.brisbane.qld.gov.au/data data.gov.au http://data.gov.au
ckan.search_federation.label = data.brisbane.gov.au
ckan.search_federation.extra_keys = harvest_portal search_federation_portal
ckan.search_federation.min_search_results = 10
ckan.search_federation.api_federation = true
ckan.search_federation.source_facet_field = vocab_source_portal
ckan.search_federation.source_extras_key = source_portal

Usage:

Automatic Federated Search:

  1. User performs search query
  2. CKAN executes local search
  3. If local results < min_search_results:
    • Extension queries configured remote portals
    • Filters out already-harvested datasets
    • Checks user’s source portal selection
    • Merges remote results with local results
  4. Remote results appear below local results with source labels
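The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the extension's code: the function names, the source_label key and the grouping of remote results by portal are all hypothetical, and only the threshold rule (fewer local results than min_search_results, or -1 for always) comes from the settings described earlier.

```python
def should_federate(local_count, min_search_results=20):
    """Decide whether remote portals should be queried."""
    if min_search_results == -1:
        # -1 means: always federate, regardless of local results.
        return True
    return local_count < min_search_results


def merge_results(local, remote_by_label):
    """Append remote datasets below local ones, tagging each with its source.

    `remote_by_label` maps a portal label to a list of dataset dicts.
    The 'source_label' key is a hypothetical name used for illustration.
    """
    merged = list(local)
    for label, datasets in remote_by_label.items():
        for dataset in datasets:
            tagged = dict(dataset)  # copy so the input is not mutated
            tagged["source_label"] = "FROM " + label.upper()
            merged.append(tagged)
    return merged


local = [{"title": "Dataset 1"}]
if should_federate(len(local), min_search_results=3):
    remote = {"data.gov.au": [{"title": "Dataset 4"}]}
    for item in merge_results(local, remote):
        print(item["title"], item.get("source_label", "(local)"))
```

Local datasets keep their original order and always come first; how the extension orders results from several remote portals relative to each other is not specified here.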

Search Result Display:

Local Results (top):

  • Dataset 1 (local)
  • Dataset 2 (local)
  • Dataset 3 (local)

Federated Results (below, with labels):

  • Dataset 4 FROM DATA.BRISBANE.GOV.AU
  • Dataset 5 FROM DATA.GOV.AU
  • Dataset 6 FROM DATA.BRISBANE.GOV.AU

Source Filtering:

If source_facet_field is configured, users can filter by source:

Facet: “Source Portal”

  • Local Portal (15)
  • data.brisbane.gov.au (8)
  • data.gov.au (12)

Selecting a source restricts the results to datasets from that source.
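Merging facet counts for a "Source Portal" facet like the one above could look roughly like this. The sketch assumes each portal contributes a simple mapping of facet value to count; the function name merge_facet_counts and the input shape are illustrative and not taken from the extension.

```python
from collections import Counter


def merge_facet_counts(local_counts, remote_counts_list):
    """Sum facet counts from the local portal and any number of remotes."""
    total = Counter(local_counts)
    for counts in remote_counts_list:
        # Counter.update adds counts for existing keys instead of replacing them.
        total.update(counts)
    return dict(total)


merged = merge_facet_counts(
    {"Local Portal": 15},
    [{"data.brisbane.gov.au": 8}, {"data.gov.au": 12}],
)
print(merged)
```

If two portals report the same facet value, their counts are added together, which is why the facet field names need to match across portals.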

API Behavior:

With api_federation = false (default):

  • Web UI: Shows federated results
  • API calls: Return only local results

With api_federation = true:

  • Web UI: Shows federated results
  • API calls: Include remote results in response

Development:

  1. Clone repository:

      git clone https://github.com/DataShades/ckanext-searchfed.git
      cd ckanext-searchfed

  2. Install for development:

      python setup.py develop
      pip install -r dev-requirements.txt

  3. Create test.ini from template

  4. Run tests: pytest --ckan-ini test.ini

Troubleshooting:

  1. No remote results appearing:

    • Verify remote portal URLs are accessible
    • Check min_search_results threshold
    • Test remote API: curl "https://remote-portal/api/3/action/package_search?q=test"
    • Review CKAN logs for connection errors
    • Verify ckan.search_federation is configured
  2. Duplicate results (harvested + federated):

    • Configure ckan.search_federation.label
    • Set ckan.search_federation.extra_keys
    • Verify harvest_portal field populated on harvested datasets
    • Check filter query is being applied
  3. Performance issues:

    • Reduce number of remote portals
    • Increase min_search_results threshold
    • Set api_federation = false for API performance
    • Consider caching remote results
    • Monitor remote API response times
  4. Facet merging problems:

    • Verify remote portals return search_facets
    • Check use_remote_facet_results setting
    • Ensure facet field names match
    • Review facet configuration on remote portals
  5. Source filtering not working:

    • Verify source_facet_field matches actual field
    • Check source_extras_key is correct
    • Ensure facet is configured in both local and remote
    • Review search parameters being sent
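The remote API check from item 1 can also be scripted. The Python 3 sketch below builds the same package_search URL as the curl example and, optionally, reads the result count from the standard CKAN action response. The helper names and the 5-second timeout are illustrative choices, and https://remote-portal is the same placeholder used above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def package_search_url(base_url, query, rows=10):
    """Build a package_search URL for a remote CKAN portal."""
    params = urlencode({"q": query, "rows": rows})
    return "{}/api/3/action/package_search?{}".format(base_url.rstrip("/"), params)


def count_remote_results(base_url, query, timeout=5):
    """Return the number of matching datasets, or None if the call did not succeed.

    Performs a real HTTP request; connection errors propagate to the caller.
    """
    with urlopen(package_search_url(base_url, query, rows=1), timeout=timeout) as response:
        payload = json.load(response)
    return payload["result"]["count"] if payload.get("success") else None


print(package_search_url("https://remote-portal", "test"))
```

Calling count_remote_results against each configured portal from the CKAN server itself is a quick way to separate network problems from configuration problems.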

Performance Considerations:

  1. Network Latency:

    • Remote API calls add latency to searches
    • Consider timeout settings
    • Monitor remote portal availability
    • Use CDN/caching for frequently accessed data
  2. Result Limit:

    • Federated results count towards total
    • Balance local vs. remote result proportions
    • Consider pagination implications
  3. API Load:

    • Frequent searches generate many remote API calls
    • Implement rate limiting if needed
    • Cache popular query results
    • Consider batch/async queries
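One way to approach the caching suggestions above is a small in-memory cache with a time-to-live, keyed by portal and query. The extension is not stated to ship such a cache; the TTLCache class below is a hypothetical sketch that would sit around whatever code performs the remote call.

```python
import time


class TTLCache:
    """Minimal in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._entries[key]
            return None
        return value

    def set(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)


cache = TTLCache(ttl_seconds=300)
cache.set(("data.gov.au", "water"), {"count": 12})
print(cache.get(("data.gov.au", "water")))
```

A short TTL keeps federated results reasonably fresh while sparing remote portals from repeated identical queries. In a multi-process deployment each worker would hold its own copy, so a shared store may be preferable.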

Best Practices:

  1. Configuration:

    • Start with min_search_results = 10 (don’t always federate)
    • Use harvest_portal filtering to avoid duplicates
    • Set api_federation = false unless specifically needed
    • Configure source filtering for user control
  2. Remote Portal Selection:

    • Choose reliable, fast remote portals
    • Verify APIs are public and stable
    • Test thoroughly before production
    • Monitor remote portal health
  3. User Experience:

    • Clearly label remote results
    • Explain source of federated datasets
    • Provide source filtering options
    • Consider separate tabs for local/remote
  4. Data Quality:

    • Validate remote result formats
    • Handle missing metadata gracefully
    • Filter inappropriate/irrelevant results
    • Monitor result quality

Use Cases:

  1. Regional Data Portals:

    • City portal federates with state/national portals
    • Users discover relevant datasets from all levels
    • Seamless cross-jurisdiction search
  2. Thematic Networks:

    • Health data portal federates with related portals
    • Research data across institutions
    • Collaborative data ecosystems
  3. Partner Organizations:

    • Organization federates with partner portals
    • Shared data discovery
    • Collaborative projects

Development Status: Beta (4)

License: AGPL v3.0 or later

Keywords: CKAN, search, federation, distributed, remote, API

Related Extensions:

  • ckanext-harvest: Metadata harvesting
  • ckanext-spatial: Spatial search
  • ckanext-cloudstorage: Remote storage integration
  • ckanext-scheming: Schema compatibility

Plugins to configure (ckan.ini)
# searchfed=ckanext.searchfed.plugin:SearchfedPlugin
CKAN Settings (ckan.ini)
# ckan.search_federation.label = data.sa.gov.au
# ckan.search_federation.extra_keys = harvest_portal search_federation_portal
# ckan.search_federation.use_remote_facet_results = false
# ckan.search_federation.min_search_results = 3
# ckan.search_federation.api_federation = false
# ckan.search_federation.source_facet_field = vocab_source_portal
# ckan.search_federation.source_extras_key = source_portal
DB migration to be executed
(not set)