Configuration hints |
Installation:
pip install ckanext-resource-indexer
Add resource_indexer to the ckan.plugins setting
Optional built-in indexers: plain_resource_indexer, pdf_resource_indexer, json_resource_indexer
Configuration:
ckanext.resource_indexer.allow_remote = 1 # Index remote files (default: false)
ckanext.resource_indexer.remote_timeout = 10 # Download timeout in seconds (default: 2)
ckanext.resource_indexer.max_remote_size = 4 # Max remote file size in MB (default: 4)
ckanext.resource_indexer.indexable_formats = txt pdf # Formats to index (lowercase)
ckanext.resoruce_indexer.index_field = extras_res_attachment # Index field (default: text)
ckanext.resoruce_indexer.search_boost = 0.5 # Boost matches by content (default: 1)
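As a rough illustration of how the space-separated indexable_formats value and the remote size limit could be interpreted, the sketch below parses the option and checks a resource against it. The helpers parse_formats and is_indexable are hypothetical names, not part of the extension. Treating the limit as MiB and lowercasing the resource format before comparison are assumptions.

```python
def parse_formats(raw: str) -> set:
    """Split a space-separated option value into a set of lowercase formats."""
    return {item.lower() for item in raw.split()}


def is_indexable(resource_format: str, size_bytes: int, formats: set, max_mb: int = 4) -> bool:
    """Return True when the format is listed and the file fits the size limit.

    Assumption: the limit is counted in MiB (1024 * 1024 bytes).
    """
    within_limit = size_bytes <= max_mb * 1024 * 1024
    return resource_format.lower() in formats and within_limit


# Value copied from the indexable_formats example above.
formats = parse_formats("txt pdf")
print(is_indexable("PDF", 1_000_000, formats))  # format listed, size under the limit
print(is_indexable("csv", 1_000, formats))      # format not listed
```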
Plain indexer config:
ckanext.resource_indexer.plain.indexable_formats = xml txt csv # Default: txt csv json yaml yml html
PDF indexer config:
pip install 'ckanext-resource-indexer[pdf]' or pip install pdftotext
Install system packages: poppler, poppler-utils and poppler-cpp-devel (CentOS); libpoppler-cpp-dev (Debian); or poppler (macOS)
ckanext.resoruce_indexer.pdf.page_processor = custom.module:value_processor # Preprocess page text
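A page processor might look like the sketch below. It assumes the callable receives the extracted text of one page as a string and returns the string to index; the extension's actual call signature is not stated here, so treat this as an assumption. The function would live in your own module and be referenced in the module:callable form shown above.

```python
import re


def page_processor(text: str) -> str:
    """Hypothetical page processor: tidy up text extracted from one PDF page."""
    # Re-join words that were hyphenated across a line break ("index-\ning").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of whitespace, including newlines, into single spaces.
    return re.sub(r"\s+", " ", text).strip()


print(page_processor("Resource index-\ning  with\n\nCKAN "))
```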
JSON indexer config:
ckanext.resoruce_indexer.json.add_as_plain = true # Index as plain text too (default: false)
ckanext.resoruce_indexer.json.key_processor = custom.module:key_processor # Preprocess keys
ckanext.resoruce_indexer.json.value_processor = custom.module:value_processor # Preprocess values
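The JSON processors could be sketched in the same way. The example assumes that the key processor receives each key as a string, that the value processor receives each scalar value, and that both return the text to index. These signatures are assumptions, not something the list above confirms.

```python
import re


def key_processor(key: str) -> str:
    """Hypothetical key processor: turn camelCase or snake_case keys into words."""
    # Insert a space at each lower-to-upper boundary ("datasetTitle" -> "dataset Title").
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", key)
    return spaced.replace("_", " ").replace("-", " ").lower()


def value_processor(value) -> str:
    """Hypothetical value processor: stringify scalars and skip missing values."""
    if value is None:
        return ""
    return str(value).strip()


print(key_processor("datasetTitle"))
print(value_processor("  annual report  "))
```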
Disable indexation:
Set the environment variable CKANEXT_RESOURCE_INDEXER_BYPASS, or wrap the relevant code in the disabled_indexation() context manager
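For scripts and tests, the environment variable can be set for a limited scope only. The sketch below uses a hypothetical helper, bypass_via_env, that sets the variable and then restores the previous state. It assumes that any non-empty value counts as "set" and that the extension reads the variable at indexing time; neither detail is confirmed above.

```python
import os
from contextlib import contextmanager

BYPASS_VAR = "CKANEXT_RESOURCE_INDEXER_BYPASS"


@contextmanager
def bypass_via_env(value: str = "1"):
    """Hypothetical helper: set the bypass variable, then restore the old state."""
    previous = os.environ.get(BYPASS_VAR)
    os.environ[BYPASS_VAR] = value
    try:
        yield
    finally:
        if previous is None:
            # The variable was not set before, so remove it again.
            os.environ.pop(BYPASS_VAR, None)
        else:
            os.environ[BYPASS_VAR] = previous


with bypass_via_env():
    # Code that creates or updates datasets would run here.
    print(BYPASS_VAR, "is set to", os.environ[BYPASS_VAR])
```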
|