RBIH x IIT Delhi TRYST 2025

Catching Money Mules

A machine learning system that detects fraudulent intermediary (money mule) accounts across 160K accounts and 400M+ transactions. 208 engineered features. 3-model ensemble. Two competition phases.

Phase 2: Production Pipeline

Competition Results

3-model gradient-boosted tree ensemble (LightGBM + XGBoost + CatBoost) with rank averaging, trained on 208 features across 160K accounts and ~400M transactions (16 GB).

  • 0.968 Public AUC-ROC (leaderboard rank #37)
  • 0.956 Private AUC-ROC (hidden test set)
  • 208 engineered features (4 computation passes)
  • 160K accounts analyzed (~400M transactions)

3-Model Ensemble

Model    | Training           | Ensemble Method
LightGBM | 3-seed x 5-fold CV | Rank Averaging
XGBoost  | 3-seed x 5-fold CV | Rank Averaging
CatBoost | 3-seed x 5-fold CV | Rank Averaging
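
The blend is order-based rather than score-based. A minimal sketch of the rank-averaging step, assuming each model's test predictions have already been averaged over its 3 seeds and 5 folds (array names are illustrative):

```python
import numpy as np
from scipy.stats import rankdata

def rank_average(model_preds: dict[str, np.ndarray]) -> np.ndarray:
    """Blend several models by averaging normalized ranks instead of raw scores.

    Ranking removes calibration differences between LightGBM, XGBoost, and
    CatBoost outputs; AUC-ROC depends only on ordering, so nothing is lost.
    """
    ranks = [rankdata(p) / len(p) for p in model_preds.values()]
    return np.mean(ranks, axis=0)

# Illustrative usage with seed/fold-averaged test predictions per model.
blend = rank_average({
    "lightgbm": np.random.rand(1000),
    "xgboost": np.random.rand(1000),
    "catboost": np.random.rand(1000),
})
```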

Feature Engineering (208 Features)

Computed in 4 passes over the 16 GB dataset using memory-efficient batch processing.

  • Pass 1: Transaction Core (~100 features)
  • Pass 2: Transaction Extended (~40 features)
  • Pass 3: Static Account (~35 features)
  • Pass 4: Graph / Network (~33 features)
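
A minimal sketch of the memory-efficient batch pattern behind these passes, assuming a Parquet transaction dataset with account_id, amount, and is_credit columns (paths and column names are illustrative):

```python
import pandas as pd
import pyarrow.dataset as ds

def pass1_transaction_core(txn_path: str, batch_size: int = 2_000_000) -> pd.DataFrame:
    """Stream the transaction Parquet in batches, accumulate per-account
    partial aggregates, then reduce them so the 16 GB never sits in RAM at once."""
    dataset = ds.dataset(txn_path, format="parquet")
    partials = []
    for batch in dataset.to_batches(columns=["account_id", "amount", "is_credit"],
                                    batch_size=batch_size):
        df = batch.to_pandas()
        partials.append(df.groupby("account_id").agg(
            txn_count=("amount", "size"),
            total_amount=("amount", "sum"),
            credit_count=("is_credit", "sum"),
        ))
    # Second-stage reduce: sum the per-batch partials into final per-account features.
    return pd.concat(partials).groupby(level=0).sum()
```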

Experiment Log

Experiment                                | Public AUC | Outcome
V1: Baseline (LGB+XGB+CB)                 | 0.956      | Solid starting point with target encoding
V2: Optuna HPO (100 trials/model)         | 0.956      | Found optimal near-zero regularization
V3: Freq encoding + rank avg + multi-seed | 0.968      | Best. Eliminated leakage, improved stability
V5: Feature interactions (26 derived)     | 0.963      | Hurt. Trees discover interactions internally
V6: Pseudo-labeling (2-stage)             | 0.787      | Catastrophic. Diluted mule signal from 2.8% to 1.8%
V7: Drop all branch features              | 0.959      | Fixed RH7 but destroyed overall AUC
V8: Surgical branch_code drop             | 0.958      | Precise RH7 fix, still too much AUC loss

Key Lesson: With near-zero regularization and powerful tree models, the simplest feature set produces the best generalization. Feature engineering quality matters more than model complexity.

Mule Behavioral Archetypes

  • Pass-Through: High-velocity credits followed by rapid debits, with minimal balance retention
  • Network Hub: High fan-in/fan-out with many unique counterparties, acting as an intermediary
  • Dormant-Burst: Long inactivity followed by sudden, intense transaction bursts
  • Structuring: Deliberately fragmenting amounts below monitoring thresholds (a feature sketch follows this list)
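
As an example of how an archetype becomes a feature, a sketch of a near-threshold structuring ratio; the 50K INR threshold matches the Phase 1 red flag, while the 5K window below it is an assumption for illustration:

```python
import pandas as pd

def near_threshold_ratio(txns: pd.DataFrame,
                         threshold: float = 50_000,
                         window: float = 5_000) -> pd.Series:
    """Share of each account's transactions landing just under the reporting
    threshold (here 45K-50K INR), a classic structuring signal."""
    near = txns["amount"].between(threshold - window, threshold, inclusive="left")
    return near.groupby(txns["account_id"]).mean().rename("near_threshold_ratio")
```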

Red Herring Analysis

RH | Description                           | Score
#1 | Routine investigation false positives | 0.995
#2 | Missing alert_reason                  | 0.990
#3 | Future mule_flag_date                 | 0.993
#4 | Very old flag dates                   | 0.978
#5 | Boundary date artifacts               | 0.993
#6 | Frozen accounts as mules              | 0.999
#7 | Flagged by own branch                 | 0.000

RH7 Analysis: Branch features carry genuine discriminative signal. Removing all branch features achieves RH7=1.000 but drops AUC from 0.968 to 0.959. The surgical V8 approach (drop only branch_code encodings) balances the tradeoff.

Label Cleaning Strategy

Confident Learning

2 rounds of out-of-fold LightGBM probability estimation with per-class thresholds (Northcutt et al. 2021). Identifies label noise.
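
A simplified single-round sketch of that rule, assuming binary labels and out-of-fold probabilities from LightGBM (variable names are illustrative; the production pipeline runs two rounds):

```python
import numpy as np

def confident_learning_flags(labels: np.ndarray, oof_probs: np.ndarray) -> np.ndarray:
    """Flag likely mislabeled samples with the confident-learning rule:
    per-class threshold = mean OOF probability over samples given that label;
    a sample is suspect if its probability for the *other* class clears
    that class's threshold."""
    # oof_probs: shape (n, 2), columns = P(class 0), P(class 1)
    thresholds = np.array([oof_probs[labels == c, c].mean() for c in (0, 1)])
    other = 1 - labels
    return oof_probs[np.arange(len(labels)), other] >= thresholds[other]
```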

Heuristic Noise Scoring

Rule-based detection targeting 7 red herring categories. Assigns noise scores based on metadata signals.

Combined Weights

max(CL score, heuristic score) mapped to sample weights in [0.2, 1.0]. Downweights noisy labels without discarding data.
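
A minimal sketch of the weight combination; the max() rule and the [0.2, 1.0] range follow the description above, while the linear mapping is an assumption:

```python
import numpy as np

def combine_to_weights(cl_score: np.ndarray,
                       heuristic_score: np.ndarray,
                       w_min: float = 0.2,
                       w_max: float = 1.0) -> np.ndarray:
    """Take the more pessimistic noise estimate per account, then map it to a
    sample weight: clean labels keep 1.0, the noisiest drop to 0.2."""
    noise = np.maximum(cl_score, heuristic_score)   # max(CL score, heuristic score)
    return w_max - noise * (w_max - w_min)          # noise 0 -> 1.0, noise 1 -> 0.2
```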

  • Compute: GCP n2-highmem-8 (8 vCPU, 64 GB RAM)
  • Training time: ~12 min (3-seed x 5-fold CV)
  • Feature engineering: ~10 min (208 features, 160K accounts)
  • Dataset size: 16 GB (~400M transactions)
Phase 1: EDA + Initial Modeling

Exploration & Discovery

Deep EDA on 24K accounts with 7.4M transactions. LightGBM + XGBoost ensemble with 125 features achieving 0.985 OOF AUC-ROC.

  • 0.985 OOF AUC-ROC (LightGBM + XGBoost ensemble)
  • 125 engineered features (13 categories)
  • 12/12 mule patterns found (all validated statistically)
  • 1:90 class imbalance (263 mules / 23,760 legitimate)

Model Comparison

Model    | OOF AUC-ROC | Mean Fold AUC | Std Dev
LightGBM | 0.9834      | 0.9831        | ±0.0058
XGBoost  | 0.9789      | 0.9785        | ±0.0067
Ensemble | 0.9851      | -             | -

Red Flags: Mule vs Legitimate

Signal                   | Legitimate | Mule  | Multiplier
Accounts Frozen          | 3.0%       | 58.9% | 19.6x
MCC 6051 (Wire Transfer) | 0.12%      | 2.10% | 18x
Post-Mobile Txn Value    | 127K       | 903K  | 7.1x
Txn-to-Balance Ratio     | 68.5       | 473.9 | 6.9x
Near-50K Structuring     | 1.1%       | 5.9%  | 5.3x
Median Txn Velocity      | 336.8h     | 78.3h | 4.3x faster
Unique Counterparties    | 13.7       | 37.1  | 2.7x
Pass-Through Ratio       | 1.184      | 1.015 | ~1:1

Top 10 Features (Phase 1 SHAP)

  1. Wire Transfer Rate
  2. Account Freeze History
  3. UPI Debit Rate
  4. Counterparties/Txn
  5. KYC Recency
  6. Pawn Shop Rate
  7. P25 Amount
  8. Cheque Rate
  9. Relationship Tenure
  10. ATM Withdrawal Rate
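
The ranking above is derived from SHAP values; a minimal sketch of how a global importance of this kind is typically computed with TreeExplainer (model and data names are illustrative):

```python
import numpy as np
import pandas as pd
import shap

def top_shap_features(model, X_valid: pd.DataFrame, top_n: int = 10) -> pd.Series:
    """Rank features by mean |SHAP value| over the validation set."""
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X_valid)
    if isinstance(shap_values, list):   # binary classifiers may return [class0, class1]
        shap_values = shap_values[1]
    importance = pd.Series(np.abs(shap_values).mean(axis=0), index=X_valid.columns)
    return importance.sort_values(ascending=False).head(top_n)
```
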
Behavioral Analysis

12 Mule Patterns

All 12 known mule behavior patterns from the RBIH challenge specification were identified and validated with statistical evidence.

  • #1 Dormant Activation: Inactive accounts suddenly process high-value bursts
  • #2 Structuring: Transactions just below the 50K INR reporting threshold
  • #3 Rapid Pass-Through: Near 1:1 credit-to-debit ratio, money flows through untouched
  • #4 Fan-In / Fan-Out: Many-to-one or one-to-many fund flows reveal network topology
  • #5 Geographic Anomaly: PIN code mismatches across customer, branch, and address
  • #6 New Account High Value: Young accounts with disproportionate transaction volume
  • #7 Income Mismatch: Transaction values vastly exceed account balance patterns
  • #8 Post-Mobile-Change Spike: Activity surges 7x after mobile number update
  • #9 Round Amount Patterns: Overuse of exact round amounts (1K, 5K, 10K, 50K)
  • #10 Layered / Subtle: Weak multi-signal combinations that evade single-rule detection
  • #11 Salary Cycle Exploitation: Laundering timed to coincide with salary credit windows
  • #12 Branch-Level Collusion: Suspicious account clusters originating from the same branch

Exploratory Data Analysis

25 Visualizations

47 statistical tables, 25 analytical plots, and a full written report covering every aspect of mule account behavior.

The full report includes class distribution, channel analysis, temporal patterns, geographic analysis, and more.

Methodology

How It Works

01

Data Ingestion

160K accounts, 400M+ transactions spanning July 2020 - June 2025. Memory-efficient batch processing over 16 GB Parquet dataset.

02

Feature Engineering

208 features in 4 passes: transaction core, extended patterns, static account metadata, and graph/network metrics (PageRank, Louvain, betweenness).

03

Label Cleaning

2-round confident learning + heuristic noise scoring for 7 red herring categories. Sample weights in [0.2, 1.0].

04

Model Training

LightGBM + XGBoost + CatBoost ensemble with 3-seed x 5-fold CV, rank averaging, Optuna HPO. Frequency encoding to prevent leakage.
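
A minimal sketch of the frequency encoding used from V3 onward (column names are illustrative); because the mapping uses only category counts and never the label, it cannot leak target information across folds the way target encoding can:

```python
import pandas as pd

def frequency_encode(train: pd.DataFrame, test: pd.DataFrame, col: str) -> None:
    """Replace a categorical column with the relative frequency of each
    category in the training data; unseen test categories map to 0."""
    freq = train[col].value_counts(normalize=True)
    train[f"{col}_freq"] = train[col].map(freq)
    test[f"{col}_freq"] = test[col].map(freq).fillna(0.0)
```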

Statistical Methods

  • Confident Learning
  • Kolmogorov-Smirnov
  • SHAP TreeExplainer
  • Bonferroni correction
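
A minimal sketch of how the Kolmogorov-Smirnov and Bonferroni items above combine to validate a mule-vs-legitimate signal (column names and the significance level are illustrative):

```python
import pandas as pd
from scipy.stats import ks_2samp

def validate_signals(df: pd.DataFrame, feature_cols: list, label_col: str = "is_mule",
                     alpha: float = 0.05) -> pd.DataFrame:
    """Two-sample KS test per feature, with Bonferroni-corrected significance."""
    corrected = alpha / len(feature_cols)            # Bonferroni correction
    rows = []
    for col in feature_cols:
        mule = df.loc[df[label_col] == 1, col].dropna()
        legit = df.loc[df[label_col] == 0, col].dropna()
        stat, p = ks_2samp(mule, legit)
        rows.append({"feature": col, "ks_stat": stat, "p_value": p,
                     "significant": p < corrected})
    return pd.DataFrame(rows)
```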

ML Models

  • LightGBM (GBDT)
  • XGBoost
  • CatBoost
  • Rank Averaging Ensemble

Graph / Network

  • PageRank & HITS
  • Louvain Communities
  • Betweenness Centrality
  • Clustering Coefficients
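
A minimal sketch of the Pass 4 metrics on the counterparty graph with NetworkX; the edge-list construction is illustrative, and at the full ~400M-transaction scale these would realistically run on an aggregated or sampled account-to-account graph:

```python
import networkx as nx
import pandas as pd

def graph_features(edges: pd.DataFrame) -> pd.DataFrame:
    """Per-account graph metrics from a directed payer -> payee edge list
    with an 'amount' edge attribute."""
    G = nx.from_pandas_edgelist(edges, "payer", "payee",
                                edge_attr="amount", create_using=nx.DiGraph)
    pagerank = nx.pagerank(G, weight="amount")
    hubs, authorities = nx.hits(G)
    betweenness = nx.betweenness_centrality(G, k=min(1_000, len(G)))  # sampled for speed
    U = G.to_undirected()
    clustering = nx.clustering(U)
    communities = nx.community.louvain_communities(U, weight="amount")
    community_id = {node: i for i, com in enumerate(communities) for node in com}
    return pd.DataFrame({"pagerank": pagerank, "hub": hubs, "authority": authorities,
                         "betweenness": betweenness, "clustering": clustering,
                         "community": community_id})
```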

Tech Stack

  • Python 3.10+
  • Pandas / NumPy / NetworkX
  • Optuna HPO
  • GCP n2-highmem-8