Extension Git-based Dataset Storage


Extension Basics

Title
Git-based Dataset Storage
Name
ckanext-gitdatahub
Type
Public extension
Description
Git-based metadata storage system with version control, branching, and distributed dataset management.
CKAN versions
Download-Url (zip)
Download-Url commit date
2020-06-02
Url to repo
Category
Cloud Infrastructure & Storage


Background Infos

Description (long)

The Git-based Dataset Storage extension stores all CKAN dataset metadata in Git repositories. Each dataset gets a full version history with branching and merging, so teams can edit metadata collaboratively with conflict resolution and change tracking.

Distributed CKAN instances can synchronize metadata changes through Git push/pull operations, which enables federated data management across multiple organizations or geographic locations. Further features include metadata branching for experimental changes, pull request workflows for collaborative editing, and automated synchronization between CKAN instances through Git hooks and API integration.

Change tracking covers detailed commit histories, author attribution, and rollback. Administrative tools include repository management interfaces, branch visualization, conflict resolution workflows, and automated backups that rely on Git’s distributed nature. The extension integrates with external Git services such as GitHub, GitLab, and Bitbucket for collaboration and backup infrastructure. Custom metadata schemas are supported, with schema evolution and migration handled through version control. Performance optimizations include efficient Git operations, metadata caching, and optimized synchronization protocols for large-scale deployments.

The extension is aimed at organizations that need collaborative metadata management, distributed CKAN deployments across multiple sites, research consortiums with shared data governance, and installations where version control, change attribution, and distributed resilience are critical for data integrity and collaborative data stewardship across organizational boundaries.
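
The idea of keeping dataset metadata in a Git repository can be illustrated with a minimal Python sketch. It is not the extension's actual implementation: the `datasets/<name>.json` layout and the function names `serialize_metadata`, `write_dataset`, `commit_commands`, and `run_commands` are assumptions made for illustration. The sketch writes one JSON file per dataset with sorted keys, so diffs between versions stay small, and builds the standard `git add`, `git commit --author`, and `git push` commands that would record the change with author attribution.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def serialize_metadata(metadata):
    """Serialize metadata deterministically so Git diffs stay small."""
    return json.dumps(metadata, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


def write_dataset(repo_dir, metadata):
    """Write one dataset to <repo_dir>/datasets/<name>.json (hypothetical layout)."""
    path = Path(repo_dir) / "datasets" / f"{metadata['name']}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(serialize_metadata(metadata), encoding="utf-8")
    return path


def commit_commands(repo_dir, path, author_name, author_email, message,
                    branch="main", push=False):
    """Build the git commands that record the change; nothing is executed here."""
    rel = Path(path).relative_to(repo_dir).as_posix()
    git = ["git", "-C", str(repo_dir)]
    commands = [
        git + ["add", rel],
        git + ["commit", "-m", message,
               "--author", f"{author_name} <{author_email}>"],
    ]
    if push:
        commands.append(git + ["push", "origin", branch])
    return commands


def run_commands(commands):
    """Execute the commands in order; requires git and an initialized repository."""
    for command in commands:
        subprocess.run(command, check=True)


# Demonstration in a temporary directory; the commands are built but not run.
with tempfile.TemporaryDirectory() as repo_dir:
    demo_path = write_dataset(repo_dir, {"title": "Demo dataset", "name": "demo"})
    demo_text = demo_path.read_text(encoding="utf-8")
    demo_commands = commit_commands(
        repo_dir, demo_path, "CKAN System", "ckan@example.com",
        "Update dataset demo", push=True,
    )
```

Building the command list separately from running it keeps the sketch testable without a Git installation; in a real repository, `run_commands(demo_commands)` would create the commit and push it.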

Version
Latest
Version release date
2020-06-02
Contact name
Datopian Team
Contact email
Contact Url
(not set)


Installation Guide

Configuration hints

Implements Git-based storage for dataset metadata with version control

Plugins to configure (ckan.ini)
gitdatahub
CKAN Settings (ckan.ini)
# ckanext.gitdatahub.git_repo_url = https://github.com/your-org/metadata-repo.git
# ckanext.gitdatahub.git_user_name = CKAN System
# ckanext.gitdatahub.git_user_email = ckan@example.com
# ckanext.gitdatahub.auto_push = true
# ckanext.gitdatahub.branch_name = main
# ckanext.gitdatahub.enable_webhooks = true
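
To check how these options look once uncommented, the following Python sketch parses a sample `ckan.ini` fragment with the standard-library `configparser`. The sample values are the placeholders listed above. The `[app:main]` section and the `ckan.plugins` key follow the usual CKAN config layout, while the helper `load_gitdatahub_settings` and the choice to treat `auto_push` and `enable_webhooks` as booleans are assumptions, not part of the extension.

```python
import configparser

# Sample ckan.ini fragment with the settings uncommented (placeholder values).
SAMPLE_INI = """
[app:main]
ckan.plugins = gitdatahub
ckanext.gitdatahub.git_repo_url = https://github.com/your-org/metadata-repo.git
ckanext.gitdatahub.git_user_name = CKAN System
ckanext.gitdatahub.git_user_email = ckan@example.com
ckanext.gitdatahub.auto_push = true
ckanext.gitdatahub.branch_name = main
ckanext.gitdatahub.enable_webhooks = true
"""

PREFIX = "ckanext.gitdatahub."
# Assumption: these two options are on/off switches.
BOOLEAN_KEYS = {"auto_push", "enable_webhooks"}


def load_gitdatahub_settings(ini_text):
    """Return the ckanext.gitdatahub.* options from [app:main] as a dict."""
    # interpolation=None because CKAN ini files often contain %(here)s placeholders.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    section = parser["app:main"]
    settings = {}
    for key in section:
        if not key.startswith(PREFIX):
            continue
        name = key[len(PREFIX):]
        if name in BOOLEAN_KEYS:
            settings[name] = section.getboolean(key)
        else:
            settings[name] = section[key]
    return settings


settings = load_gitdatahub_settings(SAMPLE_INI)
for name, value in sorted(settings.items()):
    print(f"{name} = {value!r}")
```

A check like this catches a misspelled key or a non-boolean value in the switches before CKAN is restarted.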
DB migration to be executed
gitdatahub initdb