What matters is that edge computing is booming. There is growing interest from vendors, and ample coverage, for good reason. Although the definition of what constitutes edge computing is a bit fuzzy, the idea is simple. It's about taking compute out of the data center and bringing it as close to where the action is as possible.

Whether it's stand-alone IoT sensors, devices of all kinds, drones, or autonomous vehicles, there's one thing in common. Increasingly, data generated at the edge is used to feed applications powered by machine learning models. There's just one problem: machine learning models were never designed to be deployed at the edge. Not until now, at least. Enter TinyML.

Tiny machine learning (TinyML) is broadly defined as a fast-growing field of machine learning technologies and applications, including hardware, algorithms and software, capable of performing on-device sensor data analytics at extremely low power, typically in the mW range and below. That enables a variety of always-on use cases and targets battery-operated devices.

This week, the inaugural TinyML EMEA Technical Forum is taking place, and it was a good opportunity to talk with some key people in this domain. ZDNet caught up with Evgeni Gousev from Qualcomm, Blair Newman from Neuton, and Pete Warden from Google.

Hey Google

Pete Warden wrote the world's only mustache detection image processing algorithm. He was also the founder and CTO of startup Jetpac. He raised a Series A from Khosla Ventures, built a technical team, and created a unique data product that analyzed the pixel data of over 140 million photos from Instagram and turned them into in-depth guides for more than 5,000 cities around the world.

Jetpac was acquired by Google in 2014, and Warden has been a Google Staff Research Engineer since. Back then, Warden was feeling pretty good about himself for being able to fit machine learning models in two megabytes.

That was until he found that some of his new Google colleagues had a 13 kilobyte model that they were using to recognize wake words, running on an always-on DSP on Android devices. That way the main CPU wasn't burning battery listening out for "that" wake word: Hey Google.

"That really blew my mind, the fact that you could do something actually really useful in that smaller model. And it really got me thinking about all of the other applications that might be possible if we can run especially all these new machine learning, deep learning approaches," said Warden.

Although Warden is often credited by his peers with having kickstarted the TinyML subdomain of machine learning, he is quite modest about it. Much of what he did, he acknowledges, was based on things others were already working on: "A lot of my contribution has been helping publicize and document a bunch of these engineering practices that have emerged," he said.

TinyML, exponential growth

Evgeni Gousev came to the US from Russia more than 25 years ago for a short visit. He never planned to stay, but today he is here, serving as a Senior Director at Qualcomm. Gousev has a background in physics, with stints in academia and at IBM before Qualcomm.

Gousev met Warden in 2018, and he describes seeing what was possible using the techniques Warden was working on as an eye-opening experience.

Gousev and Warden teamed up, and they started thinking about spreading the word on what they felt had enormous potential. They wanted to create an ecosystem around TinyML. They started calling colleagues and friends, sharing and socializing their ideas.

The first step was to organize a TinyML session at the Google campus. Initially, they were worried whether they could get 30 people in the room. After a couple of months, the event was already oversubscribed at 200 people. From that point on, it's been pretty much exponential growth, and TinyML events have already reached the 5,000-participant mark.

Bringing services to the fingertips of your customers

The TinyML Foundation was set up, and the inaugural TinyML Summit in March 2019 showed very strong interest from the community, with active expert participation from 90 companies. It showed that TinyML-capable hardware is becoming "good enough" for many commercial applications and that new architectures (e.g. in-memory compute) are on the horizon.

It also showed that significant progress is being made on algorithms, networks and models down to 100 kB and below, and showcased some initial low-power applications in the vision and audio space.

There are a couple of points worth emphasizing here. First, the working definition of what constitutes TinyML was, and to some extent still is, debated. What matters is how devices can be deployed in the field and how they're going to perform, said Gousev. That will be different depending on the device and the use case, but the point is being always on and not having to change batteries every week. That can only happen in the mW range and below.

In order to be successful, you need to be able to bring your services to the fingertips of your customers. This is how Blair Newman put it, epitomizing Neuton's approach. Newman, another seasoned expert with stints at Sun Microsystems, Verizon and T-Systems, has been serving as the CTO of Neuton since 2015.

Back in 2018, Neuton caused a splash by announcing a neural network framework that it claimed was far more effective than any other framework and non-neural algorithm available on the market. Neuton's secret sauce is in how the network architecture is created, and this is what makes it relevant for TinyML, too, said Newman.

TinyML is just getting started

Newman described Neuton's proprietary approach as building the network architecture neuron by neuron, resulting in something that is optimized not only for performance, but also for size. The resulting models can be up to 100 times smaller, he went on to add, and the goal is to be able to deploy the same machine learning model in the data center and at the edge.

TinyML overall, however, has benefited greatly from the work of many people in the industry, as well as in research, with techniques such as binary networks, quantization, pruning, clustering, and distillation. The goal of the TinyML Foundation is to be open and inclusive, and to produce results available to everyone.

In conclusion, everyone agreed we're only beginning to scratch the surface of what is possible with TinyML. Even though today TinyML is used only for the inference part of machine learning, Gousev, Newman and Warden believe that within five years we'll start seeing training on the edge, too. Techniques such as federated learning will be increasingly important there.

Gousev, Newman and Warden are all seasoned experts, with skills and backgrounds that go beyond engineering. They all agreed that the caveats applying to any machine learning application, such as data quality and quantity, building data pipelines, and having the required expertise in the organization, apply to TinyML, too.

Whether it's monitoring beehives in Kenya or building smart factories, in the end, applications will go above and beyond what technologists can imagine. That's what happened before with foundational technology, and that's what people who believe in TinyML see happening here, too.
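
For readers curious what one of the model-shrinking techniques named above looks like in practice, here is a minimal sketch of quantization in plain Python. It is not taken from Google, Qualcomm or Neuton code. The function names, the sample weights and the choice of 8-bit affine quantization are all illustrative assumptions, and real toolchains do considerably more.

```python
from array import array


def quantize_int8(weights):
    """Map float weights to int8 codes plus a scale and zero point."""
    lo, hi = min(weights), max(weights)
    # Include 0.0 in the range so that zero is represented exactly.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    # Fall back to a scale of 1.0 when every weight is zero.
    scale = (hi - lo) / 255.0 or 1.0
    zero_point = round(-128 - lo / scale)
    codes = array(
        "b",
        (max(-128, min(127, round(w / scale) + zero_point)) for w in weights),
    )
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Recover approximate float weights from the int8 codes."""
    return [(c - zero_point) * scale for c in codes]


# Hypothetical weights, chosen only for illustration.
weights = [-0.62, -0.1, 0.0, 0.25, 0.48, 0.91]

codes, scale, zero_point = quantize_int8(weights)
restored = dequantize(codes, scale, zero_point)

# Storage needed as 32-bit floats versus 8-bit integers.
float32_bytes = array("f", weights).itemsize * len(weights)
int8_bytes = codes.itemsize * len(codes)

# Largest difference between an original weight and its restored value.
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print(f"float32: {float32_bytes} bytes, int8: {int8_bytes} bytes")
print(f"largest rounding error: {max_error:.4f}")
```

Storing each weight in one byte instead of four cuts storage by a factor of four, at the cost of a small rounding error per weight. The much larger reductions quoted in this article would come from combining several techniques and from smaller architectures, not from quantization alone.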