Testing deep learning neural networks on public datasets is fun, but it's on unseen data that you really see how published techniques perform.
Recently, I was trying to detect human faces in Game of Thrones footage. I was surprised to see that the most widely used techniques didn't fare well.
I first tried the OpenCV Haar cascade detector, then the Dlib HOG frontal face detector. Both worked well only in ideal conditions (frontal face, even lighting, no occlusion). In a real-world scenario, this makes them useless.
I then came across this link that advertised the OpenCV DNN face detector as performant. The result was better than the OpenCV Haar cascade and Dlib HOG detectors, but still not really good on my Game of Thrones footage. I then understood that yes, human face detection is an academically solved problem on many datasets, but many of those datasets do not yet reflect the true "in-the-wild" diversity you find in real footage.
I then googled around to find the actual state of the art for human face detection in 2019. I finally came across this repo and its RetinaFace network, but it didn't ship a Dockerfile, so it was a bit of a pain to install and run. I wrote a Dockerfile, ran some tests, and the results are outstanding: on my Game of Thrones footage, RetinaFace performs really well, even on faces with odd angles, occlusion, and poor lighting.
I set up a GitHub repository (https://github.com/francoisruty/fruty_face-detection) with a ready-made Dockerfile, short instructions, and the pre-trained network weights. Feel free to use it to test the RetinaFace network on any footage, in five minutes max!