Alternative data for credit scoring
Published on: 2024-08-10 18:37:05
Alternative data moved from pilots to production first in fintech, and many lenders now integrate these signals into their scoring models.
Banks and other financial institutions are catching up, using alternative data to supplement bureau files and internal records.
Fintechs used alternative data to launch products such as peer-to-peer lending and to price thin-file customers. Lenders that relied only on credit history could not reach many of these borrowers.
Alternative data as a proxy for traditional credit history
Alternative data usually serves as a proxy for traditional credit history. That creates risk: it is an indirect signal, not direct evidence of repayment.
Signals can correlate with default without any causal link. Some effects are seasonal or context-specific, and do not reflect true repayment behavior. In short, many correlations are spurious.
Using alternative data in scoring can also create new discrimination risks. For example, social media signals can encode ethnicity, gender, or other attributes that lead to unfair lending outcomes.
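One common way to look for this kind of outcome is to compare approval rates across groups. The sketch below is illustrative only: the group labels, the sample decisions, and the function names are hypothetical, and it assumes group membership is available for testing purposes. The 0.8 cutoff is a rule of thumb borrowed from US employment guidance (the "four-fifths rule"), not a lending standard.

```python
from collections import defaultdict

def approval_rates(decisions):
    """decisions: iterable of (group, approved) pairs -> {group: approval rate}."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for group, is_approved in decisions:
        total[group] += 1
        approved[group] += int(is_approved)
    return {g: approved[g] / total[g] for g in total}

def adverse_impact_ratio(decisions, reference_group):
    """Each group's approval rate divided by the reference group's rate."""
    rates = approval_rates(decisions)
    ref = rates[reference_group]
    return {g: r / ref for g, r in rates.items()}

# Hypothetical decisions: group A approved 80 of 100, group B approved 50 of 100.
sample = (
    [("A", True)] * 80 + [("A", False)] * 20
    + [("B", True)] * 50 + [("B", False)] * 50
)
ratios = adverse_impact_ratio(sample, reference_group="A")

# Flag groups whose ratio falls below the assumed 0.8 rule-of-thumb threshold.
flagged = [g for g, r in ratios.items() if r < 0.8]
print(ratios, flagged)
```

A low ratio does not prove that a specific attribute is the cause, but it is a signal to investigate which inputs drive the gap.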
Another challenge is format. Many sources are unstructured and hard to use in traditional scoring models. They require specialized processing, often with artificial intelligence (AI) and machine learning (ML) methods.
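As a rough illustration of what that processing does, the sketch below turns free-text transaction descriptions into numeric counts per category. It uses hand-written keyword rules as a simple stand-in for the AI and ML methods mentioned above; the categories, patterns, and sample descriptions are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical keyword rules. A production system would more likely use a
# trained text classifier than a fixed list like this.
CATEGORY_PATTERNS = {
    "rent": re.compile(r"\b(rent|landlord|lease)\b", re.IGNORECASE),
    "utilities": re.compile(r"\b(electric|water|utility)\b", re.IGNORECASE),
    "short_term_loan": re.compile(r"\b(payday|cash advance)\b", re.IGNORECASE),
}

def categorize(description):
    """Return the first category whose pattern matches, else 'other'."""
    for category, pattern in CATEGORY_PATTERNS.items():
        if pattern.search(description):
            return category
    return "other"

def category_counts(descriptions):
    """Structured features: number of transactions per category."""
    counts = Counter(categorize(d) for d in descriptions)
    return {c: counts.get(c, 0) for c in list(CATEGORY_PATTERNS) + ["other"]}

raw_descriptions = [
    "ACH RENT PAYMENT - OAK ST APTS",
    "City Electric Co autopay",
    "PAYDAY ADVANCE #4471",
    "Coffee shop",
]
text_features = category_counts(raw_descriptions)
print(text_features)
```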
Below is a summary of the main advantages and drawbacks.
Advantages:
- Alternative data helps reach more potential borrowers, including people without traditional credit files.
- Alternative data can provide a broader, more current view of a borrower's finances, which can improve lending decisions.
- Using alternative data can reduce scoring costs when traditional sources are expensive to obtain and process.
Disadvantages:
- Because many sources are unstructured and hard to integrate into traditional models, specialized processing is required, which can be costly.
- Alternative data can be misused in ways that discriminate against certain groups.
- Correlations between alternative signals and default can be accidental rather than indicative of repayment behavior.
Building a credit scoring model with alternative data
Select alternative data attributes with at least the same scrutiny applied to attributes in traditional credit scoring.
This means the attributes should be:
- Related to creditworthiness: attributes should help predict probability of default.
- Available at scale: attributes should exist for a large share of borrowers to train a reliable model.
- Practical to obtain and process: attributes should be straightforward to source and compute to keep costs down.
- Stable over time: relationships with default should hold across periods, including on an out-of-time sample.
- Unbiased: attributes should avoid introducing discrimination.
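Some of these criteria can be screened numerically. The sketch below assumes a single binned attribute and a 0/1 default flag, and checks coverage (availability at scale) and Information Value (relation to default) on a development sample and on a later, out-of-time sample. Information Value is one common screening statistic, not the only option; the data, the bin labels, and the smoothing constant are hypothetical.

```python
import math

def coverage(values):
    """Share of borrowers for whom the attribute is present (not None)."""
    return sum(v is not None for v in values) / len(values)

def information_value(bins, defaults, smoothing=0.5):
    """Information Value of a binned attribute against a 0/1 default flag.

    bins: one bin label per borrower; defaults: 1 = defaulted, 0 = repaid.
    smoothing is added to every cell count so empty cells do not cause log(0).
    """
    good, bad = {}, {}
    for b, d in zip(bins, defaults):
        good.setdefault(b, smoothing)
        bad.setdefault(b, smoothing)
        if d:
            bad[b] += 1
        else:
            good[b] += 1
    total_good, total_bad = sum(good.values()), sum(bad.values())
    iv = 0.0
    for b in good:
        pct_good = good[b] / total_good
        pct_bad = bad[b] / total_bad
        iv += (pct_good - pct_bad) * math.log(pct_good / pct_bad)
    return iv

# Hypothetical development sample: utility bills paid on time vs. late.
dev_bins = ["on_time"] * 70 + ["late"] * 30
dev_defaults = [1] * 5 + [0] * 65 + [1] * 10 + [0] * 20

# Hypothetical out-of-time sample from a later period.
oot_bins = ["on_time"] * 60 + ["late"] * 40
oot_defaults = [1] * 5 + [0] * 55 + [1] * 12 + [0] * 28

iv_dev = information_value(dev_bins, dev_defaults)
iv_oot = information_value(oot_bins, oot_defaults)
print(f"coverage={coverage(dev_bins):.2f} IV dev={iv_dev:.3f} IV oot={iv_oot:.3f}")
```

An attribute whose Information Value drops sharply between the two samples is a candidate for the unstable or spurious relationships described earlier.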
Examples of alternative data sources for credit scoring
Attributes from alternative sources that can meet these criteria include:
- Payment history on utility bills
- Rent payment history
- History of taking out short-term loans
- Social media activity
- Cell phone usage
- Ecommerce transaction data
- Ride-hailing history
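Raw records from sources like these usually have to be aggregated into borrower-level attributes before modeling. As a minimal sketch, assuming utility payment history arrives as pairs of due date and paid date (with `None` for an unpaid bill), the function below derives a few candidate attributes. The record layout and the feature names are hypothetical.

```python
from datetime import date

def payment_features(payments):
    """payments: list of (due_date, paid_date or None) for one borrower."""
    days_late = []
    missed = 0
    for due, paid in payments:
        if paid is None:
            missed += 1
        else:
            # Early payments count as zero days late.
            days_late.append(max((paid - due).days, 0))
    n = len(payments)
    on_time = sum(d == 0 for d in days_late)
    return {
        "n_bills": n,
        "share_on_time": on_time / n if n else None,
        "max_days_late": max(days_late) if days_late else None,
        "n_missed": missed,
    }

# Hypothetical history: one early, one five days late, one unpaid, one on the due date.
history = [
    (date(2024, 1, 15), date(2024, 1, 14)),
    (date(2024, 2, 15), date(2024, 2, 20)),
    (date(2024, 3, 15), None),
    (date(2024, 4, 15), date(2024, 4, 15)),
]
utility_features = payment_features(history)
print(utility_features)
```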
Credit scoring methods
After selecting attributes, build the credit scoring model. You can use traditional methods, such as logistic regression, or more advanced machine learning techniques.
Common methods for credit scoring with alternative data include:
- Random forest
- Gradient boosting (XGBoost, LightGBM)
- Neural networks
- Support vector machines (SVM)
- Logistic regression
- Other regression models such as multivariate adaptive regression splines (MARS)
Machine learning can learn complex relationships between attributes and default risk automatically, which reduces the need for manual feature specification. The tradeoff is lower explainability and a higher risk of overfitting. Apply strict scrutiny to each attribute that enters the final model.
After you build the model, validate it to assess predictive power and guard against overfitting. Use cross-validation and, where possible, out-of-time validation.
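The two validation steps can be sketched with the standard library alone. Everything here is an assumption made for illustration: the data is synthetic, `share_on_time` and `noise` are hypothetical attributes, and the small hand-rolled logistic regression stands in for whichever model was actually built. Predictive power is measured with AUC, one common choice among several.

```python
import math
import random

def predict_one(w, xi):
    """Predicted default probability; w[-1] is the intercept."""
    z = w[-1] + sum(wj * xj for wj, xj in zip(w, xi))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=200):
    """Logistic regression by batch gradient descent (no regularization)."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_one(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def auc(scores, labels):
    """Probability that a random defaulter scores above a random non-defaulter."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def k_fold_auc(X, y, k=5, seed=0):
    """Train on k-1 folds, score the held-out fold, repeat for each fold."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    results = []
    for fold in (idx[i::k] for i in range(k)):
        held = set(fold)
        train = [i for i in idx if i not in held]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        scores = [predict_one(w, X[i]) for i in fold]
        results.append(auc(scores, [y[i] for i in fold]))
    return results

def make_sample(n, rng):
    """Synthetic borrowers: default risk falls as share_on_time rises."""
    X, y = [], []
    for _ in range(n):
        share_on_time, noise = rng.random(), rng.random()
        p_default = 1.0 / (1.0 + math.exp(-(1.0 - 3.0 * share_on_time)))
        X.append([share_on_time, noise])
        y.append(1 if rng.random() < p_default else 0)
    return X, y

rng = random.Random(42)
X_dev, y_dev = make_sample(300, rng)  # development period
X_oot, y_oot = make_sample(200, rng)  # stands in for a later period

cv_aucs = k_fold_auc(X_dev, y_dev)
w_final = fit_logistic(X_dev, y_dev)
oot_auc = auc([predict_one(w_final, x) for x in X_oot], y_oot)
print("CV AUCs:", [round(a, 3) for a in cv_aucs], "OOT AUC:", round(oot_auc, 3))
```

In this synthetic setup both samples come from the same process, so the out-of-time AUC should sit close to the cross-validated values. On real data, a large gap between the two suggests overfitting or a relationship that does not hold over time.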
Conclusion
Alternative data can supplement traditional sources in credit scoring and expand access to borrowers without a credit history.
It can provide a broader picture of a borrower's finances and support better lending decisions.
It can also lower scoring costs when traditional sources are expensive to obtain and process.
However, the risks remain: potential discrimination, and correlations that are accidental rather than indicative of repayment behavior.