Extension Resource Type Validation


Extension Basics

Title
Resource Type Validation
Name
ckanext-resource-type-validation
Type
Public extension
Description
Stricter validation of resource formats ensuring file extension, contents, and format are compatible
CKAN versions
Download-Url (zip)
Download-Url commit date
2022-07-13
Url to repo
Category
Data Management & Quality


Background Infos

Description (long)

The resource-type-validation extension validates the format of files uploaded to CKAN. It checks that the file extension, the file contents (determined by magic/type sniffing), and the declared resource format are all compatible with each other. This reduces staff workload by preventing miscategorized files, and it tightens format restrictions: because every file is run through type sniffing, an invalid file cannot be uploaded simply by selecting an arbitrary format and changing the file extension. The extension supports whitelists of allowed file extensions and/or MIME types. A configuration file can define allowed_extensions, allowed_overrides (MIME types treated as subtypes of other types), equal_types (interchangeable types), archive_types (types needing special handling, such as ZIP), generic_types (supertypes, restricted to prevent content-sniffing attacks), and extra_mimetypes (custom extension mappings). Developed by the Queensland Government.
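The three-way comparison described above can be illustrated with a deliberately simplified sketch. It is not the extension's code: the function names are hypothetical, Python's standard mimetypes module stands in for the extension's own type tables, the sniffed type is passed in rather than obtained from python-magic, and plain equality is used where the real extension applies its configurable override and equivalence rules.

```python
import mimetypes


def type_from_extension(filename):
    """MIME type implied by the filename's extension, or None if unknown."""
    mimetype, _encoding = mimetypes.guess_type(filename)
    return mimetype


def find_mismatches(filename, declared_format, sniffed_type):
    """Compare the three views of a file and describe any disagreement.

    sniffed_type is whatever a content sniffer (for example python-magic)
    reported; it is a parameter so the sketch needs only the standard library.
    """
    extension_type = type_from_extension(filename)
    # Treat the declared format as if it were an extension: "CSV" -> "x.csv".
    format_type = type_from_extension("x." + declared_format.strip().lower())

    problems = []
    if extension_type != sniffed_type:
        problems.append("extension implies %s but contents look like %s"
                        % (extension_type, sniffed_type))
    if format_type != extension_type:
        problems.append("format implies %s but extension implies %s"
                        % (format_type, extension_type))
    return problems


if __name__ == "__main__":
    # A consistent upload, then one with a mismatched format and contents.
    print(find_mismatches("report.pdf", "PDF", "application/pdf"))
    print(find_mismatches("report.pdf", "CSV", "text/plain"))
```

An empty list means the extension, contents, and declared format agree under this strict rule; each string in a non-empty list names one pair that disagrees.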

Version
0.0.1
Version release date
2022-07-13
Contact name
Queensland Online
Contact email
Contact Url
(not set)


Installation Guide

Configuration hints

Requirements:

  • CKAN <= 2.8 (CKAN 2.9 compatibility not yet verified)
  • python-magic library for file type sniffing

Installation:

  1. Activate the CKAN virtualenv
  2. Install the extension: pip install -e git+https://github.com/qld-gov-au/ckanext-resource-type-validation.git#egg=ckanext-resource-type-validation
  3. Install dependencies: pip install -r ckanext-resource-type-validation/requirements.txt

Add to ckan.plugins: resource_type_validation

Configuration:

Path to configuration file for file types (optional)

Default: ckanext/resource_type_validation/resources/resource_types.json

ckanext.resource_validation.types_file = /path/to/file.json

Support contact for error messages (optional)

ckanext.resource_validation.support_contact = webmaster@example.com

Whitelist of allowed MIME types (optional)

ckan.mimetypes_allowed = application/pdf,text/plain,text/xml

Configuration file structure (all optional):

  1. allowed_extensions: List of allowed file extensions (case-insensitive) Example: ["pdf", "csv", "json", "xml"]

  2. allowed_overrides: MIME type subtype mappings Example: {"text/plain": ["application/xml", "text/*"], "application/octet-stream": ["*"]}

    • application/xml is treated as a subtype of text/plain
    • Wildcards: "*" for any type, "prefix/*" for any type with that prefix
  3. equal_types: Lists of interchangeable types Example: [["text/xml", "application/xml"], ["text/csv", "application/csv"]]

  4. archive_types: Types requiring special handling Example: ["application/zip", "application/x-tar", "application/gzip"]

    • Archives can specify any format (referring to contents)
    • Must be well-formed (extension and contents match)
  5. generic_types: Generic supertypes (prevents content-sniffing attacks) Example: ["text/plain", "application/octet-stream"]

    • File with text/plain content can specify CSV extension/format
    • File with .txt extension cannot specify CSV format
    • Prevents browser-based content-sniffing attacks
  6. extra_mimetypes: Custom extension to MIME type mappings Example: {".ttf": "text/plain", ".geojson": "application/geo+json"}
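As a rough illustration of how such a types file might be assembled and read, the sketch below builds one from the example values listed above, parses it back, and applies two of the simpler rules (the case-insensitive extension list and equal_types). The helper names load_types, extension_allowed, and types_equal are hypothetical and are not part of the extension; allowed_overrides is left out to show that keys are optional, and the behaviour of an empty allowed_extensions list is not modelled.

```python
import json

# Values copied from the examples above; a real file may use any subset of keys.
EXAMPLE_TYPES = {
    "allowed_extensions": ["pdf", "csv", "json", "xml"],
    "equal_types": [["text/xml", "application/xml"],
                    ["text/csv", "application/csv"]],
    "archive_types": ["application/zip", "application/x-tar",
                      "application/gzip"],
    "generic_types": ["text/plain", "application/octet-stream"],
    "extra_mimetypes": {".ttf": "text/plain",
                        ".geojson": "application/geo+json"},
}


def load_types(text):
    """Parse types-file JSON, giving every missing key an empty default."""
    types = {
        "allowed_extensions": [],
        "allowed_overrides": {},
        "equal_types": [],
        "archive_types": [],
        "generic_types": [],
        "extra_mimetypes": {},
    }
    types.update(json.loads(text))
    return types


def extension_allowed(filename, types):
    """Case-insensitive check of the file extension against allowed_extensions."""
    extension = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return extension in [item.lower() for item in types["allowed_extensions"]]


def types_equal(first, second, types):
    """True if both MIME types match or appear in the same equal_types group."""
    if first == second:
        return True
    return any(first in group and second in group
               for group in types["equal_types"])


if __name__ == "__main__":
    # json.dumps produces the text that would be saved as the types file.
    file_text = json.dumps(EXAMPLE_TYPES, indent=2)
    types = load_types(file_text)
    print(extension_allowed("Data.CSV", types))
    print(types_equal("text/xml", "application/xml", types))
```

The text produced by json.dumps is what would be saved to the path given in ckanext.resource_validation.types_file. Note that JSON requires straight double quotes around strings.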

Benefits:

  • Reduces staff workload fixing miscategorized files
  • Better format restrictions via type sniffing
  • Prevents invalid file uploads with fake extensions
  • Protects against content-sniffing attacks

Testing: python ckanext/resource_type_validation/test_mime_type_validation.py OR nosetests --ckan --with-pylons=test.ini ckanext/resource_type_validation

Plugins to configure (ckan.ini)
resource_type_validation
CKAN Settings (ckan.ini)
# ckanext.resource_validation.types_file = /path/to/file.json
# ckanext.resource_validation.support_contact = webmaster@example.com
# ckan.mimetypes_allowed = application/pdf,text/plain,text/xml
DB migration to be executed
(not set)