This paper outlines a new system for managing data science work and collaboration across machines. The system preserves the self-determination and flexibility of open source software and approaches, which are necessary for innovation, while eliminating the chaos typical of "do it yourself" systems.
In particular, the paper outlines the challenges of scaling team data science and introduces a low-cost, turnkey framework that provides:
- Easy deployment on premises or in the cloud
- Reduced IT effort for provisioning & managing infrastructure and environments
- One-click transfer of customized & reproducible Python or R work between machines & locations
- Automation and streamlining of versioning, containerization, and best practices