Below, we’ll sketch out a smart approach for using lots of CPU cores without breaking the bank: using your laptop when feasible along with a DIY approach to working on bigger cloud resources as needed. We’ll use Gigantum to automate Git and Docker, along with most details of our cloud environment. With the following approach, you can be up and running Dask on 32 CPU cores on DigitalOcean in about 10 minutes - look at those tasks fly in parallel!
Computational reproducibility should be trivial but it is not. Though code and data are increasingly shared, the community has realised that many other factors affect reproducibility, a typical example of which is the difficulty in reconstructing a work’s original library dependencies and software versions. The required level of detail documenting such aspects scales with the complexity of the problem, making the creation of user-friendly solutions very challenging.
Today, we present Gigantum, an open source platform for creating and collaborating on computational and analytic work, complete with:
- Automated, high-resolution versioning of code, data and environment for reproducibility and rollback
- Work and version history illustrated in a browsable activity feed
- Streamlined environment management with customization via Docker snippets
- One-click transfer between laptop and cloud for easy sharing
- Seamless integration with development environments such as JupyterLab