Data Science on Ad-Hoc Infrastructure

Posted by Ken Sanford & Tyler Whitehouse on Sep 17, 2020 1:35:29 PM

Does your data science platform check the following boxes?

  1. Runs across Hybrid Infrastructure (on-premises/cloud/laptops)
  2. Enables Decentralized Collaboration and Reproducibility
  3. Deploys across Multiple Clouds (Azure, AWS, GCP & DigitalOcean)

If not, give Gigantum a test drive. We skipped cloud native and went straight to container native, which makes all three of the things above easy, cost-effective, and fun!

Container Native Beats Cloud Native

Enterprise data scientists use the infrastructure they get, not the one they choose. This creates pain points around resource competition, outdated tools, and governance. Remote work complicates things further by forcing new routines & breaking deployments.

As tightening budgets cut into SaaS purchases, small data science teams face tough working conditions across ad-hoc and hybrid resources, from laptops to clouds.

Cloud native is all the rage for big-budget teams, but direct containerization is the way to go for smaller teams that can't throw everything into a cloud platform. The problem is that containerization requires time & skill, which makes it next to impossible to scale across a team.

However, the right kind of automation can change the game.

Gigantum is an automated, container-native approach that deploys easily across virtual machines & bare metal. It handles the hard stuff so you can work & collaborate across hybrid infrastructure without sweating the technical details.

Get Collaboration Through Reproducibility

Sharing, auditing & running code is critical for collaboration but takes skill & diligence to do correctly. Mistakes get made, and they get made often.

Reproducibility is a robust and transformative approach to collaboration, but cost-effective options for reproducibility are limited and heavily stovepiped.

Data scientists must switch back & forth between packages and third-party services for parameter logging, data versioning & model management. Too many moving parts means stuff breaks, often. Worse still, stuff always breaks when environments are involved.
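To make the moving parts concrete, here is a minimal Python sketch of what tracking them by hand looks like: one record that captures the parameters, a fingerprint of each data file, and a few facts about the environment. This is not how Gigantum works internally; the function names (`file_sha256`, `build_run_manifest`, `save_manifest`) and the manifest layout are hypothetical and exist only for illustration.

```python
import hashlib
import json
import platform
import sys


def file_sha256(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_run_manifest(params, data_paths):
    """Collect parameters, data fingerprints, and environment facts in one record.

    Hypothetical layout, for illustration only.
    """
    return {
        "params": dict(params),
        "data": {path: file_sha256(path) for path in data_paths},
        "environment": {
            "python": platform.python_version(),
            "platform": platform.platform(),
            "executable": sys.executable,
        },
    }


def save_manifest(manifest, out_path):
    """Write the manifest as sorted, indented JSON so diffs stay readable."""
    with open(out_path, "w") as handle:
        json.dump(manifest, handle, indent=2, sort_keys=True)
```

Even this toy version only works if every collaborator remembers to call it on every run, and it records nothing about installed packages or the operating system image. That gap is where most hand-rolled approaches fall apart.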

To go beyond limited, low-budget approaches, you need to either adopt complicated developer processes or pay up to $100,000 for centralized SaaS environments.

Things are changing though. New approaches provide integrated reproducibility that greatly scales collaboration.

For example, Gigantum's reproducibility by default lets users author in their own environment, never lose any work, and seamlessly share & collaborate with people on different machines without lifting a finger.

Multi-Cloud Reduces Costs & Vendor Lock-In

While IT and management pursue multi-cloud deployments as a way to decrease costs and reduce vendor lock-in, data science teams don't seem to have a strategy. Platform vendors continue dangling single-cloud SaaS as bait, and data science teams keep biting.

Teams that have adopted single-cloud SaaS platforms know that they are both expensive and problematic for governance and privacy. Attempts to lower cloud costs or pull out workloads fail, leaving you at the vendor's mercy for assistance.

Regardless of whether the platform charges you for compute or passes costs through to the cloud vendor, the lack of portability is the basic problem with single-cloud SaaS platforms. Stuff gets captured in a platform or a single cloud and is basically locked there FOREVER.

As a result, you enter into a relationship with a vendor or cloud provider that seems impossible to get out of.

Systems providing portability allow you to work across different clouds and locations, letting you optimize costs and governance without needing help from the vendor.

Gigantum achieves this by going container native instead of settling for cloud native. It was a lot harder to do, but the results are worth it because we don't hard-code anything onto resources or lock you into a single cloud.

Everything is portable, all of the time.

Why We Are Different from the Rest

Our mission is to help you work wherever you like while providing seamless reproducibility & collaboration that beats single-cloud SaaS. Furthermore, we never hit you with seat licenses or charge you for cloud resources. Believe it or not, we actually want to help you save money because we've been there.

  • Want to move reproducible work between different clouds? No problem.
  • Data and code can't go in the cloud and need to be used on-prem? Easy.
  • Intrigued by Nvidia RAPIDS? Cool. Get it up and running wherever you want.

Curious?

To learn more, follow us on Twitter or check out the stuff below.

Topics: Data Science, Multi-Cloud