Email address profiling

Published on: 2024-08-10 18:48:28

Email profiling is a practical way to assess customer quality across industries. In fintech and insurance, it can support credit risk management and actuarial machine learning models.

Each part of an email address tells you something different. The username shows naming patterns, transparency, nicknames, and randomness. The domain, the part after @, provides operational and ownership signals.
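Profiling starts by separating the username from the domain. Below is a minimal sketch, assuming plain string input. The helper name `split_email` is hypothetical. It splits at the last `@` because a quoted username may itself contain `@`. It lowercases only the domain, since domain names are case-insensitive.

```python
def split_email(address: str) -> tuple[str, str]:
    """Split an address into (username, domain) at the last '@'."""
    username, sep, domain = address.strip().rpartition("@")
    if not sep or not username or not domain:
        raise ValueError(f"not an email address: {address!r}")
    # Domain names are case-insensitive, so normalise them; keep the username as written.
    return username, domain.lower()
```

For example, `split_email("John.Doe@Example.COM")` gives the username `John.Doe` and the domain `example.com`. Each part then feeds its own set of checks.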

Decisimo decision engine

Try our decision engine.

Username profiling produces features you can use for scoring and for detecting gibberish email addresses. Feature creation often focuses on these:

  • First name present
  • Last name present
  • Number present
  • First name only
  • Levenshtein distance from first name
  • Levenshtein distance from last name
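The features above can be computed with a few lines of code. Below is a minimal sketch, assuming you already hold the customer's first and last name from the application form. The feature names are hypothetical. "First name only" is interpreted here as the username's letters matching the first name exactly, so digits such as `john85` are ignored. Edit distances are computed on letters only, so separators and numbers do not inflate them.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def _letters(text: str) -> str:
    # Case-folded alphabetic characters only; works for non-ASCII names too.
    return "".join(ch for ch in text.casefold() if ch.isalpha())


def username_features(username: str, first_name: str, last_name: str) -> dict:
    letters = _letters(username)
    first, last = _letters(first_name), _letters(last_name)
    return {
        "first_name_present": bool(first) and first in letters,
        "last_name_present": bool(last) and last in letters,
        "number_present": any(ch.isdigit() for ch in username),
        "first_name_only": bool(first) and letters == first,
        "lev_first": levenshtein(letters, first),
        "lev_last": levenshtein(letters, last),
    }
```

For `john.doe85` with the name John Doe, both names are present, a number is present, and the distances are 3 from the first name and 4 from the last name. A random-looking username such as `xk7qz9` matches neither name.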

These indicators are useful, but they are not simple good or bad signals. Combine them with other attributes in predictive modeling.

Domain name profiling usually comes down to a few checks:

  • DNS and registration information (MX records, domain existence)
  • Disposable, junk, or temporary email detection
  • Email provider

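Some domain-based rules are simple: if an address is not deliverable, do not send to it. To check, look up the domain's MX DNS records and its WHOIS registration. MX records work like postal routing codes: they tell sending servers where mail for the domain should go. If a domain has no MX records, sending servers fall back to the domain's A/AAAA address record. If that is missing too, the email will not be delivered. WHOIS shows whether the domain is registered and, where the registry does not redact it, who owns it. If the domain does not exist, do not send the email.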
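The deliverability check can be sketched as a DNS lookup followed by a small decision function. This sketch assumes the third-party dnspython 2.x package, imported only inside `lookup_mx`, for MX queries. It uses the standard library's `socket.getaddrinfo` for the address-record fallback. The function names and verdict labels are hypothetical. The logic follows two standards. Under RFC 5321, a domain with no MX falls back to its A/AAAA record. Under RFC 7505, a single MX pointing to `.` (a "null MX") declares that the domain accepts no mail. Resolver timeouts are not handled here and should be treated as inconclusive. WHOIS is not covered. A non-existent domain shows up as NXDOMAIN in DNS.

```python
import socket


def lookup_mx(domain):
    """Return [(preference, exchange)], [] if no MX exists, None if the domain does not exist."""
    import dns.resolver  # dnspython (third-party), imported lazily

    try:
        answer = dns.resolver.resolve(domain, "MX")
    except dns.resolver.NXDOMAIN:
        return None
    except dns.resolver.NoAnswer:
        return []
    return [(r.preference, r.exchange.to_text()) for r in answer]


def has_address_record(domain):
    """True if the domain resolves to an IPv4/IPv6 address."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except socket.gaierror:
        return False


def classify_mx(mx_records, has_address):
    if mx_records is None:
        return "no_such_domain"
    if not mx_records:
        # RFC 5321 implicit MX: senders fall back to the A/AAAA record.
        return "implicit_mx" if has_address else "undeliverable"
    if len(mx_records) == 1 and mx_records[0][1].rstrip(".") == "":
        # RFC 7505 null MX: the domain explicitly accepts no mail.
        return "null_mx"
    return "has_mx"
```

Typical use is `classify_mx(lookup_mx(domain), has_address_record(domain))`. Keeping the decision in a pure function makes it easy to test without network access.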

Next, detect disposable or temporary mailboxes. Someone who signs up with a temporary mailbox such as Guerrilla Mail is unlikely to be planning a long-term relationship. Detecting these services is standard in anti-fraud work because they are easy to use and let users create many addresses quickly.
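Disposable detection is usually a lookup against a blocklist of known temporary-mail domains. Below is a minimal sketch. The three domains in `DISPOSABLE_DOMAINS` are only an illustrative sample, and a real deployment would load a maintained list. The helper name is hypothetical. It also checks parent domains, so a subdomain of a listed service is caught.

```python
# Illustrative sample only; production systems load a regularly updated list.
DISPOSABLE_DOMAINS = {"guerrillamail.com", "mailinator.com", "10minutemail.com"}


def is_disposable(domain: str, blocklist=DISPOSABLE_DOMAINS) -> bool:
    labels = domain.lower().rstrip(".").split(".")
    # Check the domain itself and every parent, e.g. x.mailinator.com -> mailinator.com.
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))
```

Because these services can create new domains, a static list ages quickly. Treat a miss as "not known to be disposable", not as proof of a permanent mailbox.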

Then identify the email provider, since free email differs from business-hosted services. Categories you can score include, for example:

  • Free email provider (Gmail, Hotmail, …)
  • Educational institution
  • Business Outlook (Microsoft 365)
  • Business G Suite (now Google Workspace)
  • Generic web hosting email provider
  • Self-hosted Outlook (Microsoft Exchange Server)
  • Other self-hosted email server
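A rough version of this classification can be built from the domain name and its MX hostnames. Below is a minimal sketch. The free-provider set is a small sample, and the category names are hypothetical. The MX patterns are assumptions to verify against each provider's current documentation: `*.google.com` for Google Workspace and `*.mail.protection.outlook.com` for Microsoft 365. Educational detection covers only `.edu` and academic second-level zones such as `ac.uk`. MX records alone cannot tell a self-hosted Exchange server from other self-hosted software. Telling them apart would need additional probing, which this sketch does not attempt.

```python
# Small sample of consumer providers; extend for your market.
FREE_PROVIDERS = {"gmail.com", "googlemail.com", "hotmail.com",
                  "outlook.com", "live.com", "yahoo.com"}


def classify_provider(domain: str, mx_hosts: list[str]) -> str:
    domain = domain.lower().rstrip(".")
    hosts = [h.lower().rstrip(".") for h in mx_hosts]
    labels = domain.split(".")
    if domain in FREE_PROVIDERS:
        return "free"
    # .edu, or academic second-level zones such as ox.ac.uk or uni.edu.au
    if labels[-1] == "edu" or (len(labels) >= 3 and labels[-2] in ("edu", "ac")):
        return "educational"
    if not hosts:
        return "unknown"
    if any(h.endswith(".google.com") for h in hosts):
        return "business_google"
    if any(h.endswith(".mail.protection.outlook.com") for h in hosts):
        return "business_microsoft"
    # MX host inside the customer's own domain suggests a self-run server.
    if any(h == domain or h.endswith("." + domain) for h in hosts):
        return "self_hosted"
    return "hosting_or_other"
```

The free-provider and educational checks run before the MX checks. A university whose mail is hosted by Microsoft is therefore still labelled educational.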

The service behind an address reflects what its owner pays for email and how mature their operations are. Free email has no direct cost. Business cloud suites add subscription fees. A self-hosted Exchange (Outlook) server requires licenses and IT staff to run it. For that reason, provider type can be a useful predictor in your models.

Lastly, consider catch-all domains. Some domains route all incoming email to a single mailbox, so any username at that domain is accepted. That can be a negative signal when fraudsters cycle through many such domains in a short period, because every made-up address looks valid. It can also be a positive signal when someone owns a personal domain but does not run a full mail system.
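A common way to test for catch-all behaviour is to ask the domain's mail server whether it would accept a random, almost certainly non-existent username. Below is a minimal sketch using the standard library's `smtplib`. The sender address `probe@example.org` and the verdict labels are hypothetical. Note three limitations. Many networks block outbound port 25. Repeated probing can get your IP flagged. Some servers accept every recipient at SMTP time and bounce later, so an accepting reply is consistent with catch-all but does not prove it. 4xx replies (for example greylisting) are treated as inconclusive.

```python
import secrets
import smtplib


def interpret_rcpt_code(code: int) -> str:
    """Map the reply to RCPT TO for a random username onto a verdict."""
    if code in (250, 251):
        return "accepts_any"      # consistent with a catch-all domain
    if 500 <= code < 600:
        return "rejects_unknown"  # server refuses unknown mailboxes
    return "inconclusive"         # 4xx (e.g. greylisting) or unexpected reply


def probe_catch_all(mx_host: str, domain: str,
                    sender: str = "probe@example.org", timeout: float = 10) -> str:
    random_user = "nx-" + secrets.token_hex(8)  # almost certainly non-existent
    with smtplib.SMTP(mx_host, 25, timeout=timeout) as smtp:
        smtp.ehlo_or_helo_if_needed()
        code, _ = smtp.mail(sender)
        if code != 250:
            return "inconclusive"  # sender rejected, so RCPT would tell us nothing
        code, _ = smtp.rcpt(f"{random_user}@{domain}")
    return interpret_rcpt_code(code)
```

Because the probe touches a live server, cache the result per domain instead of probing on every application.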
