At Gigantum, we are building an open-source tool for developing, executing, and sharing data science projects that automates the creation of versioned and containerized code. This way your work is always accessible, reproducible, and transparent. Our ultimate goal is to make science and data science more efficient and reproducible, and we want people to directly access and build on each other’s work without all of the technical hassles. You can learn more about Gigantum, try the Client in the cloud, or download and install it locally at our website: https://gigantum.com
A core concept of the platform is the Gigantum Project. Projects bundle data, code, and environment configuration into an augmented repository that is automatically managed by the Gigantum Client. Projects can be created from scratch, imported as a file, or shared via Gigantum Cloud, and each one contains a granular history of changes to data, code, and environment. This high-resolution history is accessible through the Activity Feed, a visual record of figures and searchable text that lets you find and inspect everything done by everyone who has worked on the Project.
When we started, we knew that we wanted to leverage Git to version changes to Projects because of Git’s distributed design and maturity. The main issue was that features like the Gigantum Client’s Activity Feed require rich metadata (e.g. figure thumbnails, code snippets, tags) attached to each commit, and storing that data directly in Git was impractical.
Gigantum captures these changes and processes them to extract metadata; in this case, a thumbnail and the code snippet that was executed. The Client can collect this extra data thanks to a tight integration with Jupyter, which will be discussed in another post. The extracted records are written to the embedded datastore, and their keys are collected.
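To make the flow concrete, here is a minimal sketch of the pattern described above: detail records are content-addressed, written to an embedded key-value store, and only their keys are collected for later reference. This is not Gigantum's actual implementation; the `DetailStore` class and record layout are hypothetical, and `sqlite3` stands in for whatever embedded datastore the Client uses.

```python
import hashlib
import json
import sqlite3

class DetailStore:
    """Hypothetical embedded key-value store for activity detail records."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS details (key TEXT PRIMARY KEY, value BLOB)"
        )

    def put(self, record: dict) -> str:
        # Content-address each record so identical payloads share a key.
        payload = json.dumps(record, sort_keys=True).encode("utf-8")
        key = hashlib.sha256(payload).hexdigest()
        self.db.execute(
            "INSERT OR IGNORE INTO details (key, value) VALUES (?, ?)",
            (key, payload),
        )
        self.db.commit()
        return key

    def get(self, key: str) -> dict:
        # Look up a detail record by its content-addressed key.
        row = self.db.execute(
            "SELECT value FROM details WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0])

# Store the two detail records from this example (thumbnail + executed code)
# and collect their keys; the small keys, rather than the bulky blobs,
# are what would be associated with the Git commit.
store = DetailStore()
keys = [
    store.put({"type": "code", "data": "df.plot(kind='bar')"}),
    store.put({"type": "thumbnail", "data": "<base64 PNG bytes>"}),
]
```

Keeping only compact keys alongside the Git history means large binary artifacts like thumbnails never bloat the repository itself, while the Activity Feed can still fetch the full record on demand.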