A review of the ML/DS development ecosystem: From a Software Engineer's perspective

Photo by Charles Deluvio on Unsplash

FYI: Everything I talk about concerning Machine Learning is also true of any Data Science work; while a Data Scientist’s work might be focussed internally, if it’s used to make decisions for the business then it should be reliable and dependable.

Software development has come a long way in the last 30 years, with thousands of tools developed to aid the complex task of writing code. While many of these tools have faded into obscurity, some, such as git, have become staples of software development worldwide. Many of these tools focused on abstracting…

Running large-scale ETL Jobs without an army of developers behind you

Photo by Pietro Jeng on Unsplash

ETL (Extract, Transform, Load) is a common pattern for processing incoming data. It makes efficient use of resources by batching the “transform” step into a single bulk operation, which often makes it far easier to develop and maintain than its stream-processing counterpart. It also lends itself well to one-off investigations, where the user writes custom code to analyse a dataset and exports the results for later use. This pattern underpins many data science explorations, but in my experience implementing it is often clunky and inefficient. …

Robbie Anderson

Cloud Engineer. Building tools and supporting Machine Learning + Data Science workflows.
