Face recognition and comparison for onboarding
Published on: 2024-08-10 18:48:28
Face recognition has changed fast in recent years. Deep learning drove that shift. It made remote identity verification practical. Face comparison against identity documents, which used to be manual, is now moving online.
Humans are good at recognizing and comparing faces because the brain has areas built for that task. Beyond face recognition, the brain also uses other signals to identify people. These include clothes, gender, location, context, and how someone walks or moves.
Getting artificial intelligence to reach similar results in full-person recognition would require multiple methods, so here we focus on the narrower problem of face recognition. Running face recognition online requires a process for validating inputs, running models, and making decisions from the data collected.
Photos used for decision making in face recognition usually fall into these types:
- Simple user-uploaded photo,
- Selfie taken during the recognition process,
- Photo captured during the liveness detection process.
The person's photo is then matched against another source. That source may be a government service such as NCIIC in China or Dukcapil in Indonesia, or a photo from an identity document such as an ID card, driving license, or passport.
Data input validation
The main methods for validating the authenticity of digital photos are:
- Error level analysis (ELA),
- Metadata analysis,
- Last saved quality analysis.
Only two of these can usually be automated well without creating too many false positives. Error level analysis is limited because someone often still needs to inspect the result; it works more as a visual tool. It also misses some manipulations that are simple but effective: for example, a screenshot of a manipulated photo will often not be flagged by ELA.
Metadata analysis provides useful information, including the camera used, timestamps, the location of objects in the photo, and sometimes even geolocation. That helps when you need to confirm the photo was taken in the right place, such as a point of sale, was taken recently, or was not edited in Photoshop or other software. If metadata is stripped, treat that as a warning sign. If metadata is missing from every photo, check with your developers how the photos are captured, processed, and stored.
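As a minimal sketch of these checks, assume the EXIF tags have already been extracted into a dict (for example with a library such as Pillow or exifread). The function name and the seven-day recency cutoff are illustrative assumptions, not from any specific product:

```python
from datetime import datetime, timedelta

def metadata_warnings(exif: dict, max_age_days: int = 7) -> list:
    """Hypothetical warning-sign checks on already-extracted EXIF tags.

    Keys follow standard EXIF tag names ("Software", "DateTimeOriginal").
    """
    warnings = []
    if not exif:
        # Stripped metadata is itself a warning sign.
        return ["metadata missing or stripped"]
    software = exif.get("Software", "")
    if "photoshop" in software.lower():
        warnings.append(f"edited with: {software}")
    taken = exif.get("DateTimeOriginal")
    if taken:
        # EXIF timestamps use this fixed "YYYY:MM:DD HH:MM:SS" format.
        dt = datetime.strptime(taken, "%Y:%m:%d %H:%M:%S")
        if datetime.now() - dt > timedelta(days=max_age_days):
            warnings.append("photo is too old")
    return warnings
```

Any non-empty result would feed into the warning handling described above, such as asking the developers how photos are captured and stored.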
Last saved quality is tied to compression, because stored or edited photos are often compressed and no longer kept at the original quality produced by the device.
Many fake-photo risks can be reduced by using liveness detection in the process. That usually means a mobile app runs liveness detection algorithms and captures a photo during the flow, which is then sent for recognition. At that point, the likely attack vector is the API that uploads the photo to the server. To reduce that risk, use more than standard hashing and encryption. Additional hardening methods can make API endpoints harder to abuse with fake data.
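One such hardening method is signing each upload with a per-device secret, so the server can reject payloads that were not produced by the app. A minimal sketch using an HMAC over the photo bytes and a timestamp; the field names and shared-secret scheme here are assumptions for illustration, not a specific protocol:

```python
import hashlib
import hmac
import time

def sign_upload(photo_bytes: bytes, device_secret: bytes, ts=None) -> dict:
    """Build a signed upload payload; the server recomputes the same HMAC."""
    ts = int(time.time()) if ts is None else ts
    msg = str(ts).encode() + hashlib.sha256(photo_bytes).digest()
    sig = hmac.new(device_secret, msg, hashlib.sha256).hexdigest()
    return {"ts": ts, "sha256": hashlib.sha256(photo_bytes).hexdigest(), "sig": sig}

def verify_upload(photo_bytes: bytes, device_secret: bytes, payload: dict,
                  max_skew: int = 300, now=None) -> bool:
    """Reject stale or tampered uploads; compare_digest avoids timing leaks."""
    now = int(time.time()) if now is None else now
    if abs(now - payload["ts"]) > max_skew:
        return False  # replayed or delayed request
    msg = str(payload["ts"]).encode() + hashlib.sha256(photo_bytes).digest()
    expected = hmac.new(device_secret, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, payload["sig"])
```

Binding the signature to both the photo hash and a timestamp means a captured request cannot be replayed later with a different image.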
Running comparison
Once the incoming data is trustworthy, the next step is comparison. Government services often perform well. As for building your own model, that now makes little sense in most cases because deep learning models need large amounts of training data, and third-party services are already advanced and relatively cheap.
You can select a service on several dimensions: price, speed, and comparison quality. It is also worth checking performance across the racial profile of the people being compared. For example, Microsoft services often perform well on Caucasian faces, but can perform poorly on Asian faces. For Asian faces, I have seen strong results from Face++. On Caucasian faces, those same services can sometimes miss finer facial detail.
In most cases, I recommend using two services for face recognition. One handles analysis, and one handles comparison. Some teams run comparison only and skip analysis. That is a mistake. Analysis helps check what is being compared. Sometimes algorithms are clearly wrong, such as classifying someone as male when the image obviously shows a female.
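A sanity check of this kind can be sketched as a gate in front of the comparison score. The field names below (`face_count`, `gender`) are assumed response shapes, not the schema of any real vendor API:

```python
def analysis_consistent(doc_analysis: dict, selfie_analysis: dict) -> list:
    """Flag obviously wrong inputs before the comparison score is trusted."""
    problems = []
    # Only one person should appear in each photo.
    for name, a in (("document", doc_analysis), ("selfie", selfie_analysis)):
        if a.get("face_count", 0) != 1:
            problems.append(f"{name}: expected exactly one face")
    # The detected gender should match between the two photos.
    if doc_analysis.get("gender") != selfie_analysis.get("gender"):
        problems.append("gender mismatch between photos")
    return problems
```

If this returns any problems, the comparison score from the second service should not be trusted on its own, no matter how high it is.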
Final decisioning
A practical process for face recognition decision making looks like this:
- Data validation - incoming data can be trusted
- Outlier or strange-result check - use analysis results for trouble detection
- Final decision - compare confidence results
Recommended detection rules for incoming data are as follows:
- No Photoshop or other software
- Camera make matches the phone model, cross-checked against other metadata such as the browser user agent
- Geolocation matches the expected location
- Photo is not too old
- Image metadata is present
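The geolocation rule above needs a concrete notion of "matches": one option is a great-circle distance check against the expected location, such as a point of sale. A minimal sketch using the haversine formula; the 5 km radius is an illustrative choice, not a recommendation from any standard:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def geolocation_ok(photo_latlon, expected_latlon, max_km=5.0):
    """True when the photo's EXIF geolocation lies within the allowed radius."""
    return haversine_km(*photo_latlon, *expected_latlon) <= max_km
```

A photo geotagged in another city than the expected point of sale would fail this rule and trigger the same warning handling as stripped metadata.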
Photo analysis
- Gender match
- Only one person detected in the photo
Comparison
- The comparison result is high, for example 99%
- The confidence threshold follows the vendor recommendation. Usually:
  - 80%+ high confidence that it is the same person
  - 60-80% some certainty
  - <60% not the same person
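Putting the thresholds above together, the final decision step can be sketched as a small band function. The function name and the `review` outcome for the middle band are my naming choices; the cutoffs are the ones listed above:

```python
def decide(confidence: float, warnings: list) -> str:
    """Map a vendor confidence score (0-100) plus earlier warnings to a decision."""
    if warnings:
        # Any validation or analysis warning forces a manual look,
        # regardless of the comparison score.
        return "review"
    if confidence >= 80:
        return "match"      # high confidence it is the same person
    if confidence >= 60:
        return "review"     # some certainty - escalate to a human
    return "no_match"       # likely not the same person
```

Keeping the warnings as an input makes the point of the post concrete: a 99% comparison score still ends in manual review when the input data could not be trusted.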
Final words…
The goal of this post is to explain the face recognition process in broad terms. It does not cover everything. At most, it should be the start of a policy or strategy for decision making in identity verification, not the end. Human identification is a multi-part problem. If you focus only on calling cognitive services and setting a cutoff, you can end up with poor decision making because you assumed the input data was better than it really was.