Calculating Probability of Default Using Logistic Regression
Analyze credit risk with statistical precision using the Logit model
Figure: Sigmoid Curve Visualization. The red dot marks the calculated PD at the borrower’s logit score.
What is Calculating Probability of Default Using Logistic Regression?
Calculating probability of default using logistic regression is a cornerstone of modern quantitative credit risk management. Unlike linear regression, which predicts continuous values, logistic regression is designed to predict binary outcomes: in this case, whether a borrower will “default” (1) or “not default” (0). By passing a linear combination of financial variables through a sigmoid function, banks and financial institutions can assign a probability of default, expressed as a percentage, to any loan or credit facility.
Financial analysts use this method because the sigmoid turns a linear score into a non-linear relationship between risk factors and default probability. For example, a drop in credit score from 800 to 750 might have a negligible impact on PD, whereas a drop from 600 to 550 could raise it sharply, even though both drops shift the log-odds by the same amount. Logistic regression captures this behavior, although it remains linear in the log-odds, so effects that are non-linear on that scale need transformed or binned inputs.
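The unequal effect of equal score drops can be checked numerically. The sketch below assumes a hypothetical single-variable model (intercept 8.0, credit-score coefficient -0.02); these values are chosen only for illustration and do not come from any fitted model. Each 50-point drop adds the same 1.0 to the log-odds, yet the change in PD differs widely depending on where the borrower starts.

```python
import math


def sigmoid(z: float) -> float:
    """Map a log-odds score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, for illustration only (not from a fitted model).
INTERCEPT = 8.0
BETA_SCORE = -0.02  # change in log-odds per credit-score point


def pd_for_score(score: float) -> float:
    """PD for a borrower described only by a credit score."""
    return sigmoid(INTERCEPT + BETA_SCORE * score)


# Same 50-point drop, two different starting points.
rise_prime = pd_for_score(750) - pd_for_score(800)
rise_subprime = pd_for_score(550) - pd_for_score(600)

print(f"800 -> 750: PD rises by {rise_prime:.4%}")
print(f"600 -> 550: PD rises by {rise_subprime:.4%}")
```

Under these assumed coefficients, the increase in PD for the lower-scored borrower is many times the increase for the higher-scored one.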
A common misconception is that a large coefficient always means high risk. In reality, coefficients must be interpreted alongside the scale of the input variable (e.g., a coefficient for “income in thousands” will be 1,000 times larger than one for “income in dollars”, while describing exactly the same effect).
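A small sketch can make the scaling point concrete. The intercept, income, and coefficient below are hypothetical values picked for illustration; the only claim is that re-expressing a variable in different units rescales its coefficient and leaves the PD unchanged.

```python
import math


def sigmoid(z: float) -> float:
    """Map a log-odds score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values, chosen only for illustration.
intercept = -1.0
income_dollars = 60_000.0
beta_per_dollar = -0.00002  # change in log-odds per extra dollar of income

# Same borrower, with income re-expressed in thousands of dollars.
income_thousands = income_dollars / 1_000.0
beta_per_thousand = beta_per_dollar * 1_000.0  # coefficient grows 1,000x

pd_dollars = sigmoid(intercept + beta_per_dollar * income_dollars)
pd_thousands = sigmoid(intercept + beta_per_thousand * income_thousands)

print(f"PD with income in dollars:   {pd_dollars:.4f}")
print(f"PD with income in thousands: {pd_thousands:.4f}")
```

The two printed probabilities agree (up to floating-point rounding), even though one coefficient looks 1,000 times "stronger" than the other.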
Calculating Probability of Default Using Logistic Regression Formula
The mathematical foundation of this model is the logit (log-odds) function and its inverse, the sigmoid. The process starts with a linear predictor (z), which is the log-odds of default; the sigmoid then maps z to a probability between 0 and 1.
| Variable | Meaning | Unit | Typical Range |
|---|---|---|---|
| β₀ (Intercept) | The baseline log-odds of default | Log-odds | -5.0 to -2.0 |
| X₁ (DTI) | Debt-to-Income Ratio | Percentage | 10% – 60% |
| X₂ (Credit Score) | Borrower’s Credit Rating | Points | 300 – 850 |
| βᵢ (Coefficients) | Change in the log-odds of default per one-unit increase in Xᵢ | Log-odds per unit of Xᵢ | -1.0 to 1.0 |
The calculation follows these steps:
- Calculate the Logit Score (z): z = β₀ + β₁X₁ + β₂X₂ + … + βₙXₙ
- Calculate the Odds: Odds = eᶻ
- Calculate the Probability: PD = Odds / (1 + Odds), or equivalently PD = 1 / (1 + e⁻ᶻ)
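The three steps above can be sketched in a few lines of Python. The function names are illustrative, not part of any library, and the borrower values in the usage lines are hypothetical.

```python
import math
from typing import Sequence


def logit_score(intercept: float,
                coefficients: Sequence[float],
                values: Sequence[float]) -> float:
    """Step 1: z = b0 + b1*x1 + ... + bn*xn."""
    if len(coefficients) != len(values):
        raise ValueError("each coefficient needs exactly one value")
    return intercept + sum(b * x for b, x in zip(coefficients, values))


def odds_from_logit(z: float) -> float:
    """Step 2: odds = e^z."""
    return math.exp(z)


def pd_from_odds(odds: float) -> float:
    """Step 3 (first form): PD = odds / (1 + odds)."""
    return odds / (1.0 + odds)


def pd_from_logit(z: float) -> float:
    """Step 3 (second form): PD = 1 / (1 + e^-z).

    The two branches are algebraically identical; splitting on the sign
    of z avoids overflow in exp() for very large |z|.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical borrower: DTI of 40 (%), credit score of 700.
z = logit_score(-3.0, [0.05, -0.002], [40.0, 700.0])
odds = odds_from_logit(z)
print(f"z = {z:.2f}, odds = {odds:.4f}, PD = {pd_from_odds(odds):.2%}")
```

Both forms of step 3 give the same probability; the odds form is convenient when odds are reported separately, while the sigmoid form skips the intermediate step.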
Practical Examples (Real-World Use Cases)
Example 1: Conservative Mortgage Lending
Suppose a bank is calculating probability of default using logistic regression for a mortgage applicant. The intercept is -4.0. The applicant has a DTI of 35% (β₁=0.06) and a Credit Score of 780 (β₂=-0.01).
z = -4.0 + (0.06 * 35) + (-0.01 * 780) = -4.0 + 2.1 - 7.8 = -9.7.
PD = 1 / (1 + e⁹·⁷) ≈ 0.006%. This represents an extremely low-risk borrower.
Example 2: Subprime Personal Loan
A fintech lender evaluates a borrower with an Intercept of -2.5, a DTI of 55% (β₁=0.08), and a Credit Score of 580 (β₂=-0.005).
z = -2.5 + (0.08 * 55) + (-0.005 * 580) = -2.5 + 4.4 - 2.9 = -1.0.
PD = 1 / (1 + e¹·⁰) ≈ 26.89%. This borrower has a high probability of default, likely requiring a higher interest rate or collateral.
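Both worked examples can be reproduced with a short script. The intercepts, coefficients, and borrower values are taken from the examples above; the function name and the dictionary layout are illustrative choices.

```python
import math


def probability_of_default(intercept: float,
                           beta_dti: float, dti: float,
                           beta_score: float, score: float):
    """Return (z, PD) for a two-variable logit model."""
    z = intercept + beta_dti * dti + beta_score * score
    return z, 1.0 / (1.0 + math.exp(-z))


# (intercept, beta_dti, DTI in %, beta_score, credit score)
examples = {
    "Example 1 (mortgage)": (-4.0, 0.06, 35.0, -0.01, 780.0),
    "Example 2 (personal loan)": (-2.5, 0.08, 55.0, -0.005, 580.0),
}

results = {}
for name, args in examples.items():
    z, pd_value = probability_of_default(*args)
    results[name] = (z, pd_value)
    print(f"{name}: z = {z:.2f}, PD = {pd_value:.4%}")
```

The printed values should agree with the hand calculations: a logit score of about -9.7 with a PD near 0.006% for the mortgage applicant, and a logit score of about -1.0 with a PD near 26.89% for the personal loan.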
How to Use This Calculating Probability of Default Using Logistic Regression Calculator
Follow these steps to generate a risk profile:
- Enter the Intercept: This value is usually estimated when the model is fitted to historical data. A more negative intercept implies a lower baseline probability of default.
- Define Your Variables: Input the coefficients (weights) and the actual values for the borrower. Ensure the weights match the units of the values.
- Interpret the PD: The result is shown as a percentage. Risk bands vary by institution and product; as a rough guide, a PD above 5% is often treated as “Medium Risk” and a PD above 10% as “High Risk.”
- Analyze the Sigmoid Chart: Observe where the borrower falls on the curve. If they are on the steep part of the S-curve, small changes in their financial health will result in large changes in default risk.
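The last two steps can be sketched in code. The 5% and 10% cut-offs come from the list above; labeling everything below 5% as "Low" is an assumption of this sketch. The slope of the sigmoid with respect to z is PD × (1 − PD), which is largest at PD = 50% and shrinks toward zero in both tails, and that is what "the steep part of the S-curve" refers to.

```python
import math


def sigmoid(z: float) -> float:
    """Map a log-odds score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def risk_label(pd_value: float) -> str:
    """Band a PD using the 5% / 10% cut-offs from the text.

    The "Low" label for PD <= 5% is an assumption of this sketch.
    """
    if pd_value > 0.10:
        return "High"
    if pd_value > 0.05:
        return "Medium"
    return "Low"


def pd_slope(z: float) -> float:
    """Derivative of the sigmoid: dPD/dz = PD * (1 - PD)."""
    p = sigmoid(z)
    return p * (1.0 - p)


# Logit scores chosen to land in each band, plus the midpoint of the curve.
for z in (-4.5, -2.7, -1.0, 0.0):
    p = sigmoid(z)
    print(f"z = {z:5.2f}  PD = {p:7.2%}  band = {risk_label(p):6s}  "
          f"slope = {pd_slope(z):.4f}")
```

Multiplying the slope by a coefficient βᵢ gives the approximate change in PD for a one-unit change in Xᵢ at that point on the curve, which is why the same change in DTI matters far more for a borrower near the middle of the curve than for one deep in the low-risk tail.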
Key Factors That Affect Calculating Probability of Default Using Logistic Regression
- Macroeconomic Conditions: During a recession, the intercept (β₀) typically shifts upward as systemic risk increases.
- Data Quality: If the historical data used to find the coefficients is biased, the resulting PD will be inaccurate.
- Multicollinearity: If variables like “Income” and “Credit Limit” are too closely correlated, the model coefficients can become unstable.
- Variable Selection: Choosing the right predictors (e.g., payment history vs. current employment length) is critical for model power.
- Sample Size: Logistic regression needs enough observed defaults, not just a large number of loans, for the coefficients to be estimated reliably. When defaults are rare, estimates can be unstable or fail to reach statistical significance.
- Time Horizon: PD is usually calculated for a 12-month period. A “Lifetime PD” requires different modeling techniques.
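One simple way to picture the macroeconomic factor from the list above is to shift the intercept and recompute the PD. The base intercept, the borrower's combined variable contribution, and the shift sizes below are all hypothetical; how large a shift a given downturn implies is an empirical question this sketch does not answer.

```python
import math


def sigmoid(z: float) -> float:
    """Map a log-odds score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values, for illustration only.
BASE_INTERCEPT = -4.0
BORROWER_TERMS = 1.5  # sum of beta_i * x_i for one borrower


def stressed_pd(intercept_shift: float) -> float:
    """PD after adding a stress shift to the intercept."""
    return sigmoid(BASE_INTERCEPT + intercept_shift + BORROWER_TERMS)


def odds(p: float) -> float:
    """Convert a probability to odds."""
    return p / (1.0 - p)


base = stressed_pd(0.0)
for shift in (0.0, 0.5, 1.0):
    p = stressed_pd(shift)
    print(f"shift = {shift:+.1f}  PD = {p:.2%}  "
          f"odds multiplier = {odds(p) / odds(base):.3f}")
```

An intercept shift of δ multiplies every borrower's odds of default by e^δ, so the effect on PD is small for borrowers deep in the low-risk tail and larger for those closer to the middle of the curve.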
Related Tools and Internal Resources
- Debt-to-Income Ratio Calculator – Calculate a primary input for PD models.
- Credit Score Impact Tool – Understand how score changes affect lending rates.
- Loan Loss Provision Model – Using PD and LGD to calculate bank reserves.
- Weighted Average Cost of Capital – How default risk impacts corporate funding costs.
- Effective Interest Rate Calculator – Calculate true costs after risk premiums.
- Risk-Adjusted Business Valuation – Incorporating default probability into company value.