This bite-sized post is the first in a series that digs into using Git effectively from within Gigantum. We start with the most basic thing, which is importing an external Git repository (or "repo") with some data. Gigantum does a lot of Git automation under the hood. While that automation provides nice features like version control by default and the Activity Feed, naive inclusion of a Git repo in your project can lead to some hiccups! So how can we use a dataset that's published on GitHub?
This post is an overview for reviewers who are using Gigantum to inspect code for a manuscript.
Gigantum is a browser-based application that integrates with Jupyter & RStudio to streamline the creation and sharing of reproducible work in Python & R.
Working exclusively in a single cloud isn't possible for most people, and that is not just because it is expensive. Real work requires significant flexibility around deployment.
For example, sensitive data typically can't go in the cloud. Or maybe each of your three clients uses a different cloud, or maybe you spend significant time on a laptop.
It would be nice if things would "just work" wherever you want them to, but the barriers are many and large. Git & Docker skills are table stakes. Typos & hard-coded variables rule the day. No matter how careful you are, stuff goes wrong. Maybe your collaborators don't have the same level of care and technical skill you do.
Who knows? The possibilities are endless.
Well, it used to be hard. There is a new container-native system that moves reproducible work between machines (virtual or bare metal) with a few clicks.
No need to know Docker or Git. No need to be obsessive about best practices. No need to worry who is on what machine.
We will demo it here using Dask and DigitalOcean for context. In the demo we:
- Create a 32-core Droplet (i.e., an instance) on DigitalOcean
- Install the open source Gigantum Client on the Droplet
- Import a Dask Project from Gigantum Hub and run it
- Sync the work to Gigantum Hub to save it for later
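To make the "moves between machines" point concrete, here is a minimal sketch of code that adapts to whatever hardware it lands on instead of hard-coding a core count. It is not the Dask Project from the demo: it uses only the Python standard library as a stand-in for Dask, and the helper names (`pick_workers`, `split_range`, `parallel_sum_squares`) are hypothetical, chosen for illustration.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def pick_workers(available=None, reserve=1):
    """Choose a worker count from the cores present on this machine.

    `reserve` cores are left free for the notebook and the OS (an assumption,
    not a Gigantum or Dask setting).
    """
    if available is None:
        available = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, available - reserve)


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks of near-equal size."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def sum_squares(bounds):
    """CPU-bound toy task: sum of squares over one chunk."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_squares(n, workers):
    """Run one chunk per worker process and combine the partial sums."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_squares, split_range(n, workers)))


if __name__ == "__main__":
    # No hard-coded core count: a laptop and a 32-core Droplet
    # each get a pool sized to their own hardware.
    workers = pick_workers()
    print(workers, parallel_sum_squares(100_000, workers))
```

Because the worker count is derived at run time, the same file should run unchanged on a laptop or on the 32-core Droplet; only the degree of parallelism differs.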
At Gigantum, we are building an open-source tool for developing, executing, and sharing data science projects that automates the creation of versioned and containerized code. This way your work is always accessible, reproducible, and transparent. Our ultimate goal is to make science and data science more efficient and reproducible, and we want people to directly access and build on each other’s work without all of the technical hassles. You can learn more about Gigantum, try the Client in the cloud, or download and install it locally at our website: https://gigantum.com