Blog

Data from an External Git Repo in Gigantum

Posted by Dav Clark on Aug 7, 2020 4:45:35 PM

This bite-sized post is the first in a series that digs into using Git effectively from within Gigantum. We start with the most basic thing, which is importing an external Git repository (or "repo") with some data. Gigantum does a lot of Git automation under the hood. While that automation provides nice features like version control by default and the Activity Feed, naive inclusion of a Git repo in your project can lead to some hiccups! So how can we use a dataset that's published on GitHub?

Topics: Data Science, Open Science, Git

Webinar Recap: Data Science 2.0 and Scaling Remote Teams

Posted by Tyler Whitehouse on Jun 30, 2020 3:09:15 PM

This post recaps our first webinar, held on June 23, 2020. It was fun, and we wanted to share the video.

The webinar demoed creating portable and reproducible work in Jupyter and RStudio, as well as an easy system for transferring work between CPU and GPU resources. It further explained why decentralization, not centralization, is best for collaboration and productivity in data science. The current remote work situation makes this decentralized approach even more critical.

In the webinar Dean (CTO) and Tyler (CEO):

  • Outlined the technical problems of collaboration and managing data science work;
  • Related these problems to cost and productivity concerns;
  • Explained "centralized vs decentralized" and why decentralization is better;
  • Explained how local automation can make decentralization robust & scalable;
  • Demonstrated Gigantum's Client + Hub model for scaling collaboration and productivity.

Decentralization means letting data scientists work across resources in a self-service fashion. For us, it also means container native, not just cloud native. It is that simple.

The key to decentralization is providing automation and a UI at the local level, rather than through a monolithic, managed cloud service. We call this "Self Service SaaS", which is sort of a silly phrase but captures what we mean.

Self Service SaaS takes the good parts of the SaaS experience, i.e. nice UIs and automation around difficult tasks, and eliminates the bad parts, i.e. zero control over deployment and everything that entails.

Check out the video and let us know what you think. We love to talk about this stuff and we want to hear your story and your problems. You can watch by filling out the form below.

Topics: Data Science, Containers, Git, Jupyter, RStudio

Extending Git Commit Metadata In Gigantum

Posted by Dean Kleissas - Co-founder and CTO at Gigantum on Jul 20, 2018 12:27:00 PM

At Gigantum, we are building an open-source tool for developing, executing, and sharing data science projects that automates the creation of versioned and containerized code. This way your work is always accessible, reproducible, and transparent. Our ultimate goal is to make science and data science more efficient and reproducible, and we want people to directly access and build on each other’s work without all of the technical hassles. You can learn more about Gigantum, try the Client in the cloud, or download and install it locally at our website: https://gigantum.com

Topics: Open Science, Git, Software