Extension Resource Indexer


Extension Basics

Title
Resource Indexer
Name
ckanext-resource_indexer
Type
Public extension
Description
Index resource file content in dataset search
CKAN versions

~2.9, ~2.10, ~2.11

Download-Url (zip)
Download-Url commit date
2025-02-10
Url to repo
Category
Specialized Tools


Background Infos

Description (long)

Discover more results in dataset search by searching through the content of resources. This extension indexes the content of files attached to resources, giving users more chances to find relevant results when using site search. The indexation process can be customized for each file format via resource indexers. Supports plain text, PDF, and JSON out of the box.

Version
0.4.3
Version release date
2025-02-10
Contact name
Sergey Motornyuk
Contact email
Contact Url
(not set)


Installation Guide

Configuration hints

Installation:

pip install ckanext-resource-indexer

Add resource_indexer to the ckan.plugins setting in ckan.ini

Optionally add the built-in indexers to plugins as well: plain_resource_indexer, pdf_resource_indexer, json_resource_indexer

Configuration:

ckanext.resource_indexer.allow_remote = 1 # Index remote files (default: false)

ckanext.resource_indexer.remote_timeout = 10 # Download timeout in seconds (default: 2)

ckanext.resource_indexer.max_remote_size = 4 # Max remote file size in MB (default: 4)

ckanext.resource_indexer.indexable_formats = txt pdf # Formats to index (lowercase)

ckanext.resoruce_indexer.index_field = extras_res_attachment # Index field (default: text)

ckanext.resoruce_indexer.search_boost = 0.5 # Boost matches by content (default: 1)
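
The general options above interact: a format filter, a switch for remote files, and a size cap for remote downloads. The following is a minimal sketch of how such a check might combine them. The names IndexerSettings, parse_formats and should_index are hypothetical and are not part of the extension; the sketch also assumes that 1 MB means 1024 * 1024 bytes and that the size cap applies to remote files only.

```python
from dataclasses import dataclass, field


@dataclass
class IndexerSettings:
    """Hypothetical container mirroring the general ckan.ini options."""
    allow_remote: bool = False          # default: false
    max_remote_size_mb: int = 4         # default: 4
    indexable_formats: set = field(default_factory=set)


def parse_formats(raw: str) -> set:
    """Split a space-separated option value into a set of lowercase formats."""
    return {item.lower() for item in raw.split()}


def should_index(settings, res_format, is_remote, size_bytes):
    """Return True when a resource passes the format, remote and size checks."""
    if res_format.strip().lower() not in settings.indexable_formats:
        return False
    if not is_remote:
        # Assumption: local files are not subject to the remote size cap.
        return True
    if not settings.allow_remote:
        return False
    # Assumption: 1 MB is treated as 1024 * 1024 bytes.
    return size_bytes <= settings.max_remote_size_mb * 1024 * 1024


settings = IndexerSettings(
    allow_remote=True,
    max_remote_size_mb=4,
    indexable_formats=parse_formats("txt pdf"),
)
print(should_index(settings, "PDF", is_remote=True, size_bytes=1_000_000))
```

Formats are compared in lowercase here because the option expects lowercase values.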

Plain indexer config:

ckanext.resource_indexer.plain.indexable_formats = xml txt csv # Default: txt csv json yaml yml html

PDF indexer config:

pip install 'ckanext-resource-indexer[pdf]' or pip install pdftotext

Install system packages: poppler, poppler-utils, poppler-cpp-devel (CentOS) or libpoppler-cpp-dev (Debian) or poppler (macOS)

ckanext.resoruce_indexer.pdf.page_processor = custom.module:value_processor # Preprocess page text
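
The page_processor value uses the module:attribute notation. As a rough illustration, the sketch below resolves such a path with importlib and defines a possible page processor. The callable's signature is an assumption (one page of text in, cleaned text out) and is not confirmed by the source; resolve and value_processor are illustrative names only.

```python
import importlib
import re


def resolve(path: str):
    """Import 'package.module:attribute' and return the named attribute."""
    module_name, _, attr = path.partition(":")
    return getattr(importlib.import_module(module_name), attr)


def value_processor(page: str) -> str:
    """Hypothetical page processor: rejoin hyphenated words, collapse whitespace."""
    # Join words that were split by a hyphen at the end of a line.
    page = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", page)
    # Replace every run of whitespace with a single space.
    return re.sub(r"\s+", " ", page).strip()


# Resolving a stdlib path shows the notation without needing custom.module.
basename = resolve("os.path:basename")
print(basename("/data/report.pdf"))
print(value_processor("Index-\ning  PDF\n files "))
```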

JSON indexer config:

ckanext.resoruce_indexer.json.add_as_plain = true # Index as plain text too (default: false)

ckanext.resoruce_indexer.json.key_processor = custom.module:key_processor # Preprocess keys

ckanext.resoruce_indexer.json.value_processor = custom.module:value_processor # Preprocess values
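
To make the role of the key and value processors more concrete, here is a small sketch that flattens a JSON document and passes every key and value through a processor before joining the result into indexable text. It is not the extension's implementation: walk, extract_text and both processor signatures (one item in, one string out) are assumptions made for illustration.

```python
import json


def walk(node, prefix=""):
    """Yield (dotted_key, scalar_value) pairs from nested JSON data."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(node, list):
        for item in node:
            yield from walk(item, prefix)
    else:
        yield prefix, node


def key_processor(key: str) -> str:
    """Hypothetical key processor: turn separators into spaces."""
    return key.replace("_", " ").replace(".", " ")


def value_processor(value) -> str:
    """Hypothetical value processor: stringify and drop null values."""
    return "" if value is None else str(value).strip()


def extract_text(raw: str) -> str:
    """Build one searchable string from processed keys and values."""
    parts = []
    for key, value in walk(json.loads(raw)):
        parts.append(key_processor(key))
        parts.append(value_processor(value))
    return " ".join(part for part in parts if part)


print(extract_text('{"site_name": "Demo", "meta": {"rows": 3}}'))
```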

Disable indexation:

Set environment variable CKANEXT_RESOURCE_INDEXER_BYPASS or use context manager disabled_indexation()
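
The two bypass mechanisms are related in spirit: one is a process-wide flag, the other a scoped switch. The sketch below shows how a scoped bypass could be built on top of the environment variable. It does not reproduce disabled_indexation() from the extension; bypass_indexing and indexing_bypassed are hypothetical helpers, and treating any non-empty value as "set" is an assumption.

```python
import os
from contextlib import contextmanager

BYPASS_VAR = "CKANEXT_RESOURCE_INDEXER_BYPASS"


@contextmanager
def bypass_indexing():
    """Hypothetical helper: set the bypass variable, then restore the old state."""
    previous = os.environ.get(BYPASS_VAR)
    os.environ[BYPASS_VAR] = "1"
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(BYPASS_VAR, None)
        else:
            os.environ[BYPASS_VAR] = previous


def indexing_bypassed() -> bool:
    """Assumption: any non-empty value of the variable counts as set."""
    return bool(os.environ.get(BYPASS_VAR))


with bypass_indexing():
    # Bulk operations that should skip file indexing would run here.
    print(indexing_bypassed())
```

Restoring the previous value in the finally clause keeps the flag from leaking out of the block if the wrapped code raises.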

Plugins to configure (ckan.ini)
resource_indexer
CKAN Settings (ckan.ini)
# ckanext.resource_indexer.allow_remote = 1
# ckanext.resource_indexer.remote_timeout = 10
# ckanext.resource_indexer.max_remote_size = 4
# ckanext.resource_indexer.indexable_formats = txt pdf
# ckanext.resoruce_indexer.index_field = extras_res_attachment
# ckanext.resoruce_indexer.search_boost = 0.5
# ckanext.resource_indexer.plain.indexable_formats = xml txt csv
# ckanext.resoruce_indexer.pdf.page_processor = custom.module:value_processor
# ckanext.resoruce_indexer.json.add_as_plain = true
# ckanext.resoruce_indexer.json.key_processor = custom.module:key_processor
# ckanext.resoruce_indexer.json.value_processor = custom.module:value_processor
DB migration to be executed
(not set)