Tech from the Front Line

Deep Learning in the Real World: How to Deal with Non-differentiable Loss Functions

Over the past few years, deep learning has been taking many industries by storm. From voice recognition to image analysis and synthesis, neural networks have turned out to be very efficient at solving a vast number of problems. Concretely, if you can define your problem as the minimization of a differentiable objective function, there is a very good chance deep learning can do something for you. The neural net will perform poorly at first, but as the error is backpropagated from the objective function through all the layers of the network, the optimizer tweaks the weights until the error is minimized, at least locally.
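
To make this concrete, here is a minimal sketch of such a training loop (assuming PyTorch and a toy regression task; the architecture and data are purely illustrative): because the loss is differentiable, loss.backward() can propagate the error through every layer and the optimizer can tweak the weights.

```python
# Minimal sketch of the loop described above (PyTorch assumed, toy data).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # a differentiable objective function

for step in range(1000):
    x = torch.randn(32, 10)            # illustrative random batch
    y = x.sum(dim=1, keepdim=True)     # toy target the network must learn
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)        # forward pass + objective
    loss.backward()                    # backpropagate the error through all layers
    optimizer.step()                   # tweak the weights
```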

The problem is: can you always frame your problem as the minimization of a differentiable function? In many real-world scenarios, there is no obvious answer to this question. Here are a few examples:

 

 

 

In all those cases, it’s not hard to identify what to minimize, but the quantity of interest usually cannot be computed by a nice, differentiable Python function that you can just plug into your TensorFlow or PyTorch training loop. This doesn’t mean your objective function is not differentiable in theory (although that may also be the case); it just means that you don’t have a gradient computation mechanism in place.
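
As an illustration (a sketch assuming PyTorch; the tensors are made up), here is what happens when the quantity you actually care about, say classification accuracy, is not usable as a loss: the argmax breaks the gradient path, whereas a differentiable surrogate such as cross-entropy gives the optimizer something to work with.

```python
# Sketch (PyTorch assumed): accuracy gives no usable gradient, cross-entropy does.
import torch
import torch.nn.functional as F

logits = torch.randn(8, 5, requires_grad=True)   # fake model outputs
targets = torch.randint(0, 5, (8,))

# The metric you care about: accuracy. argmax kills the gradient, so there is
# nothing to backpropagate to the weights.
accuracy = (logits.argmax(dim=1) == targets).float().mean()
print(accuracy.requires_grad)   # False: no gradient path through argmax

# A differentiable surrogate you can plug into the training loop instead.
loss = F.cross_entropy(logits, targets)
loss.backward()
print(logits.grad.shape)        # torch.Size([8, 5]): gradients exist
```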

So, what are the strategies available?

 

 

 

 

I personally think these four techniques are going to fuel the deep learning revolution in the coming years, as people figure out how to embed neural networks in more and more real-world operations and business-specific software.

Mathematics is very often about establishing theorems and general results, and then finding ways to frame real-world problems so that those theorems apply. When people first started working on neural networks decades ago, a very powerful theorem was established: the universal approximation theorem, which states that any continuous function (so almost any phenomenon in nature and life) can be approximated by a neural network. Deep learning scientists then stacked more and more layers of neurons to increase the power of neural networks, and trained them to do ever more impressive things, always under the assumption that the objective function to minimize has to be differentiable (otherwise backpropagation is not possible).
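
As a toy illustration of that theorem (a sketch, assuming PyTorch and sin(x) as the continuous function to approximate), even a single hidden layer can fit a continuous function on a bounded interval, as long as the objective, here the mean squared error, is differentiable:

```python
# Toy illustration of universal approximation (PyTorch assumed): fit sin(x)
# on [-pi, pi] with a one-hidden-layer network and a differentiable loss.
import math
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 128), nn.Tanh(), nn.Linear(128, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

x = torch.linspace(-math.pi, math.pi, 512).unsqueeze(1)
y = torch.sin(x)

for _ in range(2000):
    opt.zero_grad()
    err = ((net(x) - y) ** 2).mean()   # differentiable objective (MSE)
    err.backward()
    opt.step()

print(f"final MSE: {err.item():.5f}")  # small after training: the net approximates sin
```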

To many, this limitation seems severe (many problems are not differentiable per se), but that is missing the point: differentiability is indeed a hypothesis on which deep learning methods are built, but it is then up to clever engineers to frame their problems as differentiable problems. We all know planets are not perfect spheres, yet NASA and ESA have been using approximate models for decades to successfully compute gravitational interactions and land spacecraft off-world.

To conclude, never forget that theoretical assumptions and hypotheses are not limitations; they are a framework for clever engineers to frame real-world problems and get results. As Galileo said, “The book of nature is written in the language of mathematics.”