A machine learning system for detecting fraudulent mule (intermediary) accounts across 160K accounts and 400M+ transactions, built on 208 engineered features, a 3-model ensemble, and two competition phases.
A 3-model gradient-boosted tree ensemble (LightGBM + XGBoost + CatBoost) combined by rank averaging, trained on 208 features derived from 160K accounts and ~400M transactions (16 GB).
| Model | Training | Ensemble Method |
|---|---|---|
| LightGBM | 3-seed x 5-fold CV | Rank Averaging |
| XGBoost | 3-seed x 5-fold CV | Rank Averaging |
| CatBoost | 3-seed x 5-fold CV | Rank Averaging |
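The rank-averaging step in the table can be sketched as follows. This is a minimal illustration, not the competition code: each model's probabilities are converted to normalized ranks before averaging, which discards per-model calibration and keeps only the ordering — a natural fit for an AUC-scored task.

```python
import numpy as np
from scipy.stats import rankdata

def rank_average(pred_lists):
    """Blend model score vectors by averaging their normalized ranks."""
    ranks = [rankdata(p) / len(p) for p in pred_lists]  # map scores to (0, 1]
    return np.mean(ranks, axis=0)

# Toy example: three models score five accounts.
lgb = np.array([0.90, 0.10, 0.55, 0.70, 0.20])
xgb = np.array([0.80, 0.05, 0.60, 0.75, 0.15])
cat = np.array([0.95, 0.20, 0.50, 0.65, 0.30])
blend = rank_average([lgb, xgb, cat])
print(blend)  # [1.  0.2 0.6 0.8 0.4]
```

Because all three models agree on the ordering here, the blend reproduces that ordering exactly; in practice rank averaging smooths out disagreements between differently-calibrated models.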
The 208 features were computed in 4 passes over the 16 GB dataset using memory-efficient batch processing.
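The batch-processing pattern can be sketched as below. This is a toy stand-in, assuming per-account sum/count aggregates: the real pipeline would stream Parquet row groups (e.g. via `pyarrow.parquet.ParquetFile.iter_batches`), but the core idea — accumulate partial aggregates per batch and finalize at the end, so the full dataset never sits in memory — is the same.

```python
import pandas as pd

def aggregate_in_batches(batches):
    """Stream transaction batches, accumulating per-account sums and counts,
    then finalize to a mean amount per account."""
    totals = {}
    for batch in batches:  # each batch is a small DataFrame
        grp = batch.groupby("account_id")["amount"].agg(["sum", "count"])
        for acct, row in grp.iterrows():
            s, c = totals.get(acct, (0.0, 0))
            totals[acct] = (s + row["sum"], c + row["count"])
    return {a: s / c for a, (s, c) in totals.items()}

# Two toy batches standing in for Parquet row groups.
b1 = pd.DataFrame({"account_id": ["A", "A", "B"], "amount": [100.0, 300.0, 50.0]})
b2 = pd.DataFrame({"account_id": ["B", "A"], "amount": [150.0, 200.0]})
means = aggregate_in_batches([b1, b2])
print(means)  # {'A': 200.0, 'B': 100.0}
```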
| Experiment | Public AUC | Outcome |
|---|---|---|
| V1: Baseline (LGB+XGB+CB) | 0.956 | Solid starting point with target encoding |
| V2: Optuna HPO (100 trials/model) | 0.956 | Found optimal near-zero regularization |
| V3: Freq encoding + rank avg + multi-seed | 0.968 | Best. Eliminated leakage, improved stability |
| V5: Feature interactions (26 derived) | 0.963 | Hurt. Trees discover interactions internally |
| V6: Pseudo-labeling (2-stage) | 0.787 | Catastrophic. Diluted mule signal from 2.8% to 1.8% |
| V7: Drop all branch features | 0.959 | Fixed RH7 but destroyed overall AUC |
| V8: Surgical branch_code drop | 0.958 | Precise RH7 fix, still too much AUC loss |
Key Lesson: With near-zero regularization and powerful tree models, the simplest feature set produces the best generalization. Feature engineering quality matters more than model complexity.
- High-velocity credits followed by rapid debits, with minimal balance retention
- High fan-in/fan-out with many unique counterparties, acting as an intermediary
- Long inactivity followed by sudden, intense transaction bursts
- Amounts deliberately fragmented below monitoring thresholds
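The behaviors above map naturally onto per-account features. A minimal sketch, with hypothetical column and feature names (the actual 208-feature pipeline is not shown in this document):

```python
import pandas as pd

def behavior_features(txns: pd.DataFrame) -> pd.Series:
    """Per-account features for the mule patterns above (illustrative names)."""
    credits = txns.loc[txns["direction"] == "credit", "amount"]
    debits = txns.loc[txns["direction"] == "debit", "amount"]
    return pd.Series({
        # Pass-through: credits roughly equal debits when money flows straight through.
        "pass_through_ratio": credits.sum() / max(debits.sum(), 1e-9),
        # Fan-in / fan-out: distinct counterparties on each side.
        "fan_in": txns.loc[txns["direction"] == "credit", "counterparty"].nunique(),
        "fan_out": txns.loc[txns["direction"] == "debit", "counterparty"].nunique(),
        # Structuring: share of transactions just below a 50K threshold.
        "near_50k_rate": txns["amount"].between(45_000, 49_999).mean(),
    })

txns = pd.DataFrame({
    "direction": ["credit", "credit", "debit", "debit"],
    "counterparty": ["c1", "c2", "c3", "c4"],
    "amount": [48_000.0, 49_500.0, 49_000.0, 48_500.0],
})
feats = behavior_features(txns)
print(feats["pass_through_ratio"])  # 1.0 — a classic pass-through profile
```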
| RH | Description | Detection Method | Score |
|---|---|---|---|
| #1 | Routine investigation false positives | Heuristic noise weight 0.6 | 0.995 |
| #2 | Missing alert_reason | Heuristic noise weight 0.8 | 0.990 |
| #3 | Future mule_flag_date | Dates after Jun 2025 downweighted | 0.993 |
| #4 | Very old flag dates | Dates before Jul 2020 downweighted | 0.978 |
| #5 | Boundary date artifacts | Exact boundary dates flagged | 0.993 |
| #6 | Frozen accounts as mules | Null flagged_by_branch detected | 0.999 |
| #7 | Flagged by own branch | Branch features carry signal + noise | 0.000 |
RH7 Analysis: Branch features carry genuine discriminative signal. Removing all branch features achieves RH7=1.000 but drops AUC from 0.968 to 0.959. The surgical V8 approach (drop only branch_code encodings) balances the tradeoff.
- **Confident learning:** 2 rounds of out-of-fold LightGBM probability estimation with per-class thresholds (Northcutt et al., 2021), identifying label noise.
- **Heuristic noise scoring:** rule-based detection targeting the 7 red herring categories, assigning noise scores from metadata signals.
- **Sample weighting:** max(CL score, heuristic score) mapped to sample weights in [0.2, 1.0], downweighting noisy labels without discarding data.
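The weighting step can be sketched directly from its description: take the elementwise max of the two noise scores, then map noise 0 to weight 1.0 and noise 1 to weight 0.2. The linear mapping is an assumption; the source specifies only the max and the [0.2, 1.0] range.

```python
import numpy as np

def sample_weights(cl_noise, heuristic_noise, w_min=0.2, w_max=1.0):
    """Map per-sample noise scores to training weights.

    A noisier label gets a smaller weight, so suspect samples are
    downweighted rather than discarded.
    """
    noise = np.maximum(np.asarray(cl_noise), np.asarray(heuristic_noise))
    return w_max - noise * (w_max - w_min)

cl = np.array([0.0, 0.9, 0.3])    # confident-learning noise scores
heur = np.array([0.6, 0.0, 0.3])  # heuristic red-herring noise scores
w = sample_weights(cl, heur)
print(w)  # [0.52 0.28 0.76]
```

These weights would then be passed to the boosters (e.g. as `sample_weight` in their scikit-learn style `fit` methods).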
Deep EDA on 24K accounts with 7.4M transactions. A LightGBM + XGBoost ensemble with 125 features achieved 0.985 OOF AUC-ROC.
| Model | OOF AUC-ROC | Mean Fold AUC | Std Dev |
|---|---|---|---|
| LightGBM | 0.9834 | 0.9831 | ±0.0058 |
| XGBoost | 0.9789 | 0.9785 | ±0.0067 |
| Ensemble | 0.9851 | - | - |
| Signal | Legitimate | Mule | Multiplier |
|---|---|---|---|
| Accounts Frozen | 3.0% | 58.9% | 19.6x |
| MCC 6051 (Wire Transfer) | 0.12% | 2.10% | 18x |
| Post-Mobile Txn Value | 127K | 903K | 7.1x |
| Txn-to-Balance Ratio | 68.5 | 473.9 | 6.9x |
| Near-50K Structuring | 1.1% | 5.9% | 5.3x |
| Median Txn Velocity | 336.8h | 78.3h | 4.3x faster |
| Unique Counterparties | 13.7 | 37.1 | 2.7x |
| Pass-Through Ratio | 1.184 | 1.015 | ~1:1 |
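The multipliers in the table are simple ratios of each signal's prevalence (or mean) in mule versus legitimate accounts. A minimal sketch, assuming a hypothetical `is_mule` label column:

```python
import pandas as pd

def signal_multiplier(df, signal_col, label_col="is_mule"):
    """Ratio of a signal's mean among mules to its mean among legitimate
    accounts, as in the signal table above."""
    rates = df.groupby(label_col)[signal_col].mean()
    return rates[1] / rates[0]

# Toy data: the frozen-account rate is 25% for legit accounts, 100% for mules.
df = pd.DataFrame({
    "is_mule":    [0, 0, 0, 0, 1, 1],
    "was_frozen": [0, 0, 0, 1, 1, 1],
})
m = signal_multiplier(df, "was_frozen")
print(m)  # 4.0
```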
Key model features: `mcc_6051_rate`, `was_frozen`, `ch_UPD_rate`, `cp_per_txn`, `days_since_kyc`, `mcc_5933_rate`, `p25_amount`, `ch_CHQ_rate`, `rel_years`, `ch_ATW_rate`.

All 12 known mule behavior patterns from the RBIH challenge specification were identified and validated with statistical evidence.
- Inactive accounts suddenly process high-value bursts
- Transactions just below the 50K INR reporting threshold
- Near-1:1 credit-to-debit ratio; money flows through untouched
- Many-to-one or one-to-many fund flows revealing network topology
- PIN code mismatches across customer, branch, and address records
- Young accounts with disproportionate transaction volume
- Transaction values vastly exceeding account balance patterns
- Activity surging 7x after a mobile number update
- Overuse of exact round amounts (1K, 5K, 10K, 50K)
- Weak multi-signal combinations that evade single-rule detection
- Laundering timed to coincide with salary credit windows
- Suspicious account clusters originating from the same branch
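One of the simpler patterns above — round-amount overuse — can be turned into a feature in a few lines. A sketch with an assumed amount list; the threshold set mirrors the 1K/5K/10K/50K amounts named in the pattern:

```python
import pandas as pd

def round_amount_rate(amounts, rounds=(1_000, 5_000, 10_000, 50_000)):
    """Share of an account's transactions at exact round amounts."""
    return pd.Series(amounts).isin(rounds).mean()

# 3 of 4 transactions hit an exact round amount.
rate = round_amount_rate([1_000, 5_000, 12_345, 50_000])
print(rate)  # 0.75
```

Per the table earlier in the document, legitimate accounts show a low base rate on such signals, so a high value here contributes to the mule score without being decisive on its own.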
47 statistical tables, 25 analytical plots, and a full written report covering every aspect of mule account behavior.
The full report includes 25 visualizations covering class distribution, channel analysis, temporal patterns, geographic analysis, and more.
1. **Data:** 160K accounts, 400M+ transactions spanning July 2020 – June 2025; memory-efficient batch processing over the 16 GB Parquet dataset.
2. **Features:** 208 features in 4 passes: transaction core, extended patterns, static account metadata, and graph/network metrics (PageRank, Louvain, betweenness).
3. **Label noise:** 2-round confident learning plus heuristic noise scoring for the 7 red herring categories; sample weights in [0.2, 1.0].
4. **Modeling:** LightGBM + XGBoost + CatBoost ensemble with 3-seed x 5-fold CV, rank averaging, and Optuna HPO; frequency encoding to prevent leakage.
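Frequency encoding, mentioned above as the leakage fix, replaces each category with its frequency in the training split. Unlike target encoding (used in the V1 baseline), the mapping never touches labels, so it cannot leak the target into the features. A minimal sketch with a hypothetical `branch_code` column:

```python
import pandas as pd

def frequency_encode(train, test, col):
    """Encode a categorical column by its relative frequency in TRAIN only.

    Unseen test categories map to 0.0; no label information is used,
    so there is no target leakage.
    """
    freq = train[col].value_counts(normalize=True)
    return train[col].map(freq), test[col].map(freq).fillna(0.0)

train = pd.DataFrame({"branch_code": ["B1", "B1", "B2", "B3"]})
test = pd.DataFrame({"branch_code": ["B2", "B9"]})
tr_enc, te_enc = frequency_encode(train, test, "branch_code")
print(te_enc.tolist())  # [0.25, 0.0] — unseen branch B9 maps to 0
```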