This article is the first in the “Machine learning in the home office” series. Read Part 2 for what’s important and what you need to consider. Part 3 tackles the practical how-tos and best practices.
Recent trends in machine learning reinforce the general belief that doing ML without access to extremely sophisticated hardware is impossible, and that we need infrastructure as powerful as that provided by the various cloud vendors (AWS, Google, Azure) to be serious in the field.
This is certainly true for particular tasks, but not all ML training involves web-scale datasets or targets as ambitious as AlphaGo, and such training can still power products with cutting-edge technology.
Why raise this issue?
An immediate answer is the rather singular, world-wide years that were 2020/2021, during which many worked remotely rather extensively. “Machine learners” need to be comfortable developing their models and their code at home. With the technology we have today, I am still not aware of a better way of checking the correctness of a program than running it and analysing its behaviour. For machine learners, this means running the training and data pipelines on various datasets, preferably ones that are representative of the learning task. The factors that accelerate development are not very different for machine learners and software developers; some of them are:
- Reducing the number of context switches: Running the pipelines should be well integrated in the development process and should ideally not require a change of technology or infrastructure.
- Reducing the number of technologies involved: In particular, partitioning skill sets across developers should be avoided where possible. Machine learners should be able to go as far as possible with the development and analysis of their models themselves. This reduces communication overhead, decreases the overall number of steps, and promotes an iterative development process.
- Increasing the precision and effectiveness of communication, including the various review processes. This is especially important in a remote-work setup, and it involves not only code but also a shift in the discussion towards mathematical models, the design and architecture of pipelines, and the interpretation of the learning curves.
There is, however, one element that departs from a pure software development process: the ability to run an ML pipeline as part of development, which usually requires processing power, storage, and other kinds of equipment.
A possible solution would be to go “full cloud”: the home equipment is merely a remote control for developing and processing on cloud machines, so there is no context switch. However, this approach introduces new technologies that collide with the main focus (the development of ML), does not improve communication (rather the contrary), and usually makes development harder, as the tools give a less native experience. Last and above all: this is extremely expensive!
Apart from a few exceptions, it should be noted that the processing frameworks are ubiquitous and widely available: we have access today to extremely powerful programming languages (Python, Lua, Go, C++, TypeScript) and tools (TensorFlow, PyTorch, OpenCV, SciPy, etc.) for free. We can do so much for just … $0. So why would the development of machine learning pipelines be costly?
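To make the $0 point concrete, here is a minimal, illustrative sketch of a complete train-and-inspect loop that runs on any laptop with a stock Python installation: fitting a toy linear model with plain gradient descent. The dataset and hyperparameters are made up for the example; real pipelines would swap in one of the free frameworks named above, but the workflow is the same.

```python
import random

# Toy dataset standing in for a small, representative local dataset:
# y = 3x + 2 plus a little Gaussian noise.
random.seed(0)
xs = [i / 10 for i in range(50)]
data = [(x, 3.0 * x + 2.0 + random.gauss(0, 0.1)) for x in xs]

# Two-parameter model trained with batch gradient descent on the
# mean squared error -- no GPU, no cloud, just the CPU at home.
w, b, lr = 0.0, 0.0, 0.05
for epoch in range(500):
    grad_w = grad_b = 0.0
    for x, y in data:
        err = (w * x + b) - y
        grad_w += 2 * err * x
        grad_b += 2 * err
    n = len(data)
    w -= lr * grad_w / n
    b -= lr * grad_b / n

print(f"learned w={w:.2f}, b={b:.2f}")  # recovers roughly (3, 2)
```

Running and analysing exactly this kind of loop locally, just scaled up, is what the rest of the series argues is both feasible and cheap for many real tasks.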
At some point in the product-development cycle, infrastructures and services on the cloud will be used for product deployment and consumer reach. The questions to ask are “What are the parts we can develop and run locally?” and “What are the benefits?”.
In the coming posts, we will share approaches to cost-benefit analysis and some practical how-tos for implementing ML at home. We will address the following questions:
- Is home office compatible with machine learning?
- Is the cloud a requirement for developing ML pipelines?
- What practices and methodologies work best when doing ML from home?
- What are the monetary levers when considering ML at home vs. ML in the cloud?
Read more on what you need to consider when thinking of and running machine learning at home in Part 2 of this article.
Want to learn more? Subscribe to Reasonal's newsletter and receive the next posts as well as exclusive content about AI, productivity, and more.