Solving 3 Big Technical Questions of Data Science

Posted by Ken Sanford on Apr 2, 2021 11:45:16 AM

Data scientists waste a lot of time.

Data science is rife with technical challenges. Many of these problems have nothing to do with the core job of a data scientist. Processing speed, shared memory, security, and package management are all part of doing data science, but they are not central to what we do. They are merely necessary steps along the way to productive data work. Solve these issues and data scientists immediately have more time to work.


The technical challenges of doing data science are real, and until now the solution has been to centralize everything. That doesn't have to be the only future. We have developed a new way to solve the three big technical questions of data science. Keep reading to see how we did it.

1. How to customize machines?

How much time do you waste building a new data science machine? IT gives you a VM (or a new laptop) and it is your responsibility to install R or Python. Then you have to clone a Git repo and connect to your datasets. And this has to happen every time you need a new environment, sometimes once per project.
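Those manual steps can be captured once instead of being repeated on every new machine. Here is a minimal sketch of that idea as a Dockerfile; the base image, package workflow, and repo URL are illustrative assumptions, not a specific recommendation:

```dockerfile
# Illustrative sketch: bake the usual manual setup into an image
# so every new environment starts identical.
FROM python:3.10-slim

# System tools you would otherwise install by hand.
RUN apt-get update && apt-get install -y --no-install-recommends git \
    && rm -rf /var/lib/apt/lists/*

# The package environment, pinned in a requirements file
# (assumed to include jupyterlab for the CMD below).
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# Bring in the project code; this repo URL is hypothetical.
WORKDIR /work
RUN git clone https://github.com/example/project.git .

CMD ["jupyter", "lab", "--ip=0.0.0.0", "--no-browser"]
```

Build the image once and every teammate or fresh VM starts from the same environment, rather than repeating the install steps by hand.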

How much time would you save if this step was eliminated forever?

2. How to share work with colleagues?

Git is the best way to share code. But data science is more than Python or R code. Data science is also the environment and packages the work was done in. Data science needs data. And of course, data science is the script as well. Git is great for sharing code but insufficient for sharing data science work.

We have developed a new way. One that combines Code, Environments, and Data in a Git-style repository that can be checked in, checked out, branched, and forked.

What if you could share work without friction? 

3. How to move work across machines?

Zombie VMs are everywhere. Why? Because data science projects cannot easily be moved from one machine to another, and once a machine has produced useful insights there is a strong disincentive to ever destroy it. But this costs money and creates confusion. Why not begin each data science project knowing you can tear that machine down at any time?

Wouldn’t it be nice to be able to move data science projects between machines easily?

Gigantum was designed to remove these three technical hurdles of data science. Never again will data scientists have to wrestle with the technical nonsense of data science. If you are interested in seeing how Gigantum works and how it makes your life easier, explore our online trial.

Topics: Data Science, Jupyter, Multi-Cloud, Hybrid