Traditional face recognition algorithms were based on local features. The algorithm first detected a series of relatively invariant points on the face (fiducial points: corners of the eyes and mouth, nostrils, etc.), typically between 10 and 300 of them. Then, usually after aligning the face, the algorithm extracted information from the local region around each point using certain visual features (e.g. wavelets, HOG, SIFT, SURF, or variations of them). The different pieces of information were concatenated into a single feature vector, and optionally its dimension was reduced to make it easier to store and faster to compare with other features.
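The steps of such a pipeline can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not any particular product's algorithm: the fiducial points are taken as already detected, a simplified HOG-like orientation histogram stands in for the real local features, PCA stands in for the dimensionality reduction, and all function names and the random "faces" are hypothetical.

```python
import numpy as np

def patch_descriptor(image, point, half=8, bins=8):
    """Simplified HOG-like descriptor: a magnitude-weighted histogram of
    gradient orientations in a square patch centred on one fiducial point.
    Assumes the point lies at least `half` pixels away from the image border."""
    row, col = point
    patch = image[row - half:row + half, col - half:col + half].astype(float)
    grad_rows, grad_cols = np.gradient(patch)
    magnitude = np.hypot(grad_rows, grad_cols)
    orientation = np.arctan2(grad_rows, grad_cols) % np.pi  # unsigned orientation
    hist, _ = np.histogram(orientation, bins=bins, range=(0.0, np.pi),
                           weights=magnitude)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def face_feature(image, points):
    """Concatenate one local descriptor per fiducial point into a single vector."""
    return np.concatenate([patch_descriptor(image, p) for p in points])

def fit_pca(features, n_components):
    """Learn a linear projection (PCA via SVD) from a set of feature vectors."""
    mean = features.mean(axis=0)
    _, _, vt = np.linalg.svd(features - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(feature, mean, components):
    """Reduce one feature vector to a short template."""
    return components @ (feature - mean)

# Placeholder data: random images and a fixed 3x3 grid of "fiducial points".
rng = np.random.default_rng(0)
faces = rng.random((20, 64, 64))
points = [(r, c) for r in (16, 32, 48) for c in (16, 32, 48)]

features = np.stack([face_feature(f, points) for f in faces])  # shape (20, 72)
mean, components = fit_pca(features, n_components=5)
template = project(features[0], mean, components)              # shape (5,)
```

Note that every stage here (where to look, which descriptor to compute, how to compress it) is chosen by hand, which is the point of contrast with the learned approach.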
In that kind of algorithm, the number of points became synonymous with the accuracy of the prediction. That is not entirely true: if the points are well aligned, more points generally give more accuracy for a given method, but the accuracy depends heavily on the particular feature and classifier.
Deep learning algorithms, which are now state-of-the-art in most computer vision applications, work differently. They apply banks of convolutional and non-linear filters repeatedly over an original image. Each layer processes the output of the previous one and extracts higher-order information. After many layers of these filter banks (typically between tens and hundreds), faces are encoded directly into small templates which are very fast to compare and yield much more accurate results.
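The structure described above, repeated filter banks followed by a non-linearity and ending in a small template, can be sketched as follows. This is a minimal sketch with several assumptions: the filters are random rather than trained (so the templates carry no real identity information), only three layers are used instead of tens or hundreds, max-pooling and global averaging are added as common but assumed design choices, and all names are hypothetical.

```python
import numpy as np

def conv2d(x, filters):
    """Apply a bank of filters. x: (C_in, H, W); filters: (C_out, C_in, k, k)."""
    k = filters.shape[-1]
    windows = np.lib.stride_tricks.sliding_window_view(x, (k, k), axis=(1, 2))
    # windows has shape (C_in, H-k+1, W-k+1, k, k)
    return np.einsum("chwij,ocij->ohw", windows, filters)

def relu(x):
    """Non-linear filter applied after each convolution."""
    return np.maximum(x, 0.0)

def max_pool2(x):
    """2x2 max-pooling; drops a trailing row/column if the size is odd."""
    c, h, w = x.shape
    x = x[:, :h // 2 * 2, :w // 2 * 2]
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def embed(image, layers):
    """Encode a grayscale image into a small unit-length template."""
    x = image[np.newaxis].astype(float)       # add a channel axis
    for filters in layers:
        x = max_pool2(relu(conv2d(x, filters)))
    template = x.mean(axis=(1, 2))            # one number per output channel
    return template / (np.linalg.norm(template) + 1e-12)

def similarity(a, b):
    """Cosine similarity of two unit-length templates: a single dot product."""
    return float(a @ b)

# Untrained, randomly initialised filter banks: 1 -> 8 -> 16 -> 32 channels.
rng = np.random.default_rng(0)
channels = [1, 8, 16, 32]
layers = [rng.normal(0.0, np.sqrt(2.0 / (c_in * 9)), size=(c_out, c_in, 3, 3))
          for c_in, c_out in zip(channels, channels[1:])]

face_a = rng.random((32, 32))
face_b = rng.random((32, 32))
template_a = embed(face_a, layers)            # shape (32,)
template_b = embed(face_b, layers)
score = similarity(template_a, template_b)
```

Comparing two faces reduces to one dot product between short vectors, which is why templates of this kind are fast to compare.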
The interesting thing about deep learning is that the way to extract visual features is not manually defined, as before, but is learned by the network itself during training. So all the processes of face alignment/frontalization, localization of interesting regions, etc. are done internally by the network. You do not need to tell the algorithm where the interesting points are, nor how to extract the information, as it learns this by itself.
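As a toy illustration of a filter being learned rather than handcrafted, the sketch below recovers a 3x3 kernel by gradient descent from examples alone. It is deliberately far simpler than a real network: there is a single linear filter, and the training targets are produced by a hidden Sobel kernel that merely stands in for a learning goal. A real face recognition network would instead learn many non-linear layers from labelled face images via backpropagation. All names and values here are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32))

# Every 3x3 neighbourhood of the image, flattened to a row: shape (900, 9).
patches = np.lib.stride_tricks.sliding_window_view(image, (3, 3)).reshape(-1, 9)

# Hidden "ground truth" used only to generate training targets.
target_kernel = np.array([[-1.0, 0.0, 1.0],
                          [-2.0, 0.0, 2.0],
                          [-1.0, 0.0, 1.0]])
targets = patches @ target_kernel.ravel()

# Start from an all-zero filter and minimise the mean squared error.
kernel = np.zeros(9)
learning_rate = 0.1
for step in range(300):
    error = patches @ kernel - targets
    gradient = 2.0 * patches.T @ error / len(patches)
    kernel -= learning_rate * gradient

learned_kernel = kernel.reshape(3, 3)
```

The training loop never sees the Sobel coefficients directly, only inputs and desired outputs, yet `learned_kernel` converges towards them. This is the same principle, at a much smaller scale, by which a network finds its own feature extractors.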
Deep learning is a branch of machine learning. It is particularly suited to certain learning tasks, as its accuracy and generalization tend to scale with the training data (thus benefiting from large amounts of it), and it automatically learns internal representations of data that optimize a learning goal, as opposed to some traditional learning techniques that require such representations to be handcrafted.
Use of NVIDIA GPUs
NVIDIA GPUs are well suited to training deep neural networks, cutting a process that could otherwise take a year or more down to just weeks or days. That is because GPUs perform many calculations at once, that is, in parallel. And once a system is “trained,” scientists and researchers can use GPUs to put that learning to work on tasks once thought impossible.
GPUs are used to train these deep neural networks using far larger training sets, in an order of magnitude less time, using far less datacenter infrastructure. GPUs are also being used to run these trained machine learning models to do classification and prediction in the cloud, supporting far more data volume and throughput with less power and infrastructure.
Early adopters of GPU accelerators for machine learning include many of the largest web and social media companies, along with top-tier research institutions in data science and machine learning. With thousands of computational cores and 10-100x application throughput compared to CPUs alone, GPUs have become the processor of choice for data scientists processing big data.
In 1999, NVIDIA sparked the growth of the PC gaming market, redefined modern computer graphics, and revolutionized parallel computing. More recently, GPU deep learning ignited modern AI, the next era of computing, with the GPU acting as the brain of computers, robots, and self-driving cars that can perceive and understand the world.