RBIH x IIT Delhi TRYST 2025

Catching Money Mules

A machine learning system that detects fraudulent intermediary (money mule) accounts across 160K accounts and 400M+ transactions. 208 engineered features. 3-model ensemble. Two competition phases.

Phase 2: Production Pipeline

Competition Results

3-model gradient-boosted tree ensemble (LightGBM + XGBoost + CatBoost) with rank averaging, trained on 208 features across 160K accounts and ~400M transactions (16 GB).

  • 0.968 Public AUC-ROC (leaderboard rank #37)
  • 0.956 Private AUC-ROC (hidden test set)
  • 208 engineered features (4 computation passes)
  • 160K accounts analyzed (~400M transactions)

3-Model Ensemble

Model    | Training           | Ensemble Method
LightGBM | 3-seed x 5-fold CV | Rank Averaging
XGBoost  | 3-seed x 5-fold CV | Rank Averaging
CatBoost | 3-seed x 5-fold CV | Rank Averaging
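
The blend is order-based rather than score-based. A minimal sketch of the rank-averaging step, assuming each model's test predictions have already been averaged over its 3 seeds and 5 folds (array names are illustrative):

```python
import numpy as np
from scipy.stats import rankdata

def rank_average(model_preds: dict[str, np.ndarray]) -> np.ndarray:
    """Blend several models by averaging normalized ranks instead of raw scores.

    Ranking removes calibration differences between LightGBM, XGBoost, and
    CatBoost outputs; AUC-ROC depends only on ordering, so nothing is lost.
    """
    ranks = [rankdata(p) / len(p) for p in model_preds.values()]
    return np.mean(ranks, axis=0)

# Illustrative usage with seed/fold-averaged test predictions per model.
blend = rank_average({
    "lightgbm": np.random.rand(1000),
    "xgboost": np.random.rand(1000),
    "catboost": np.random.rand(1000),
})
```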

Feature Engineering (208 Features)

Computed in 4 passes over the 16 GB dataset using memory-efficient batch processing.

  • Pass 1: Transaction Core (~100 features)
  • Pass 2: Transaction Extended (~40 features)
  • Pass 3: Static Account (~35 features)
  • Pass 4: Graph / Network (~33 features)
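
A minimal sketch of the memory-efficient batch pattern behind these passes, assuming a Parquet transaction dataset with account_id, amount, and is_credit columns (paths and column names are illustrative):

```python
import pandas as pd
import pyarrow.dataset as ds

def pass1_transaction_core(txn_path: str, batch_size: int = 2_000_000) -> pd.DataFrame:
    """Stream the transaction Parquet in batches, accumulate per-account
    partial aggregates, then reduce them so the 16 GB never sits in RAM at once."""
    dataset = ds.dataset(txn_path, format="parquet")
    partials = []
    for batch in dataset.to_batches(columns=["account_id", "amount", "is_credit"],
                                    batch_size=batch_size):
        df = batch.to_pandas()
        partials.append(df.groupby("account_id").agg(
            txn_count=("amount", "size"),
            total_amount=("amount", "sum"),
            credit_count=("is_credit", "sum"),
        ))
    # Second-stage reduce: sum the per-batch partials into final per-account features.
    return pd.concat(partials).groupby(level=0).sum()
```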

Experiment Log

Experiment                                | Public AUC | Outcome
V1: Baseline (LGB+XGB+CB)                 | 0.956      | Solid starting point with target encoding
V2: Optuna HPO (100 trials/model)         | 0.956      | Found optimal near-zero regularization
V3: Freq encoding + rank avg + multi-seed | 0.968      | Best. Eliminated leakage, improved stability
V5: Feature interactions (26 derived)     | 0.963      | Hurt. Trees discover interactions internally
V6: Pseudo-labeling (2-stage)             | 0.787      | Catastrophic. Diluted mule signal from 2.8% to 1.8%
V7: Drop all branch features              | 0.959      | Fixed RH7 but destroyed overall AUC
V8: Surgical branch_code drop             | 0.958      | Precise RH7 fix, still too much AUC loss

Key Lesson: With near-zero regularization and powerful tree models, the simplest feature set produces the best generalization. Feature engineering quality matters more than model complexity.

Mule Behavioral Archetypes

  • Pass-Through: High-velocity credits followed by rapid debits, with minimal balance retention
  • Network Hub: High fan-in/fan-out with many unique counterparties, acting as an intermediary
  • Dormant-Burst: Long inactivity followed by sudden, intense transaction bursts
  • Structuring: Deliberately fragmenting amounts below monitoring thresholds (a feature sketch follows this list)
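
As an example of how an archetype becomes a feature, a sketch of a near-threshold structuring ratio; the 50K INR threshold matches the Phase 1 red flag, while the 5K window below it is an assumption for illustration:

```python
import pandas as pd

def near_threshold_ratio(txns: pd.DataFrame,
                         threshold: float = 50_000,
                         window: float = 5_000) -> pd.Series:
    """Share of each account's transactions landing just under the reporting
    threshold (here 45K-50K INR), a classic structuring signal."""
    near = txns["amount"].between(threshold - window, threshold, inclusive="left")
    return near.groupby(txns["account_id"]).mean().rename("near_threshold_ratio")
```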

Red Herring Analysis

RH | Description                           | Score
#1 | Routine investigation false positives | 0.995
#2 | Missing alert_reason                  | 0.990
#3 | Future mule_flag_date                 | 0.993
#4 | Very old flag dates                   | 0.978
#5 | Boundary date artifacts               | 0.993
#6 | Frozen accounts as mules              | 0.999
#7 | Flagged by own branch                 | 0.000

RH7 Analysis: Branch features carry genuine discriminative signal. Removing all branch features achieves RH7=1.000 but drops AUC from 0.968 to 0.959. The surgical V8 approach (drop only branch_code encodings) balances the tradeoff.

Label Cleaning Strategy

Confident Learning

2 rounds of out-of-fold LightGBM probability estimation with per-class thresholds (Northcutt et al. 2021). Identifies label noise.
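
A simplified single-round sketch of that rule, assuming binary labels and out-of-fold probabilities from LightGBM (variable names are illustrative; the production pipeline runs two rounds):

```python
import numpy as np

def confident_learning_flags(labels: np.ndarray, oof_probs: np.ndarray) -> np.ndarray:
    """Flag likely mislabeled samples with the confident-learning rule:
    per-class threshold = mean OOF probability over samples given that label;
    a sample is suspect if its probability for the *other* class clears
    that class's threshold."""
    # oof_probs: shape (n, 2), columns = P(class 0), P(class 1)
    thresholds = np.array([oof_probs[labels == c, c].mean() for c in (0, 1)])
    other = 1 - labels
    return oof_probs[np.arange(len(labels)), other] >= thresholds[other]
```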

Heuristic Noise Scoring

Rule-based detection targeting 7 red herring categories. Assigns noise scores based on metadata signals.

Combined Weights

max(CL score, heuristic score) mapped to sample weights in [0.2, 1.0]. Downweights noisy labels without discarding data.
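
A minimal sketch of the weight combination; the max() rule and the [0.2, 1.0] range follow the description above, while the linear mapping is an assumption:

```python
import numpy as np

def combine_to_weights(cl_score: np.ndarray,
                       heuristic_score: np.ndarray,
                       w_min: float = 0.2,
                       w_max: float = 1.0) -> np.ndarray:
    """Take the more pessimistic noise estimate per account, then map it to a
    sample weight: clean labels keep 1.0, the noisiest drop to 0.2."""
    noise = np.maximum(cl_score, heuristic_score)   # max(CL score, heuristic score)
    return w_max - noise * (w_max - w_min)          # noise 0 -> 1.0, noise 1 -> 0.2
```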

  • Compute: GCP n2-highmem-8 (8 vCPU, 64 GB RAM)
  • Training time: ~12 min (3-seed x 5-fold CV)
  • Feature engineering: ~10 min (208 features, 160K accounts)
  • Dataset size: 16 GB (~400M transactions)
Phase 1: EDA + Initial Modeling

Exploration & Discovery

Deep EDA on 24K accounts with 7.4M transactions. LightGBM + XGBoost ensemble with 125 features achieving 0.985 OOF AUC-ROC.

  • 0.985 OOF AUC-ROC (LightGBM + XGBoost ensemble)
  • 125 engineered features (13 categories)
  • 12/12 mule patterns found (all validated statistically)
  • 1:90 class imbalance (263 mules / 23,760 legitimate)

Model Comparison

Model    | OOF AUC-ROC | Mean Fold AUC | Std Dev
LightGBM | 0.9834      | 0.9831        | ±0.0058
XGBoost  | 0.9789      | 0.9785        | ±0.0067
Ensemble | 0.9851      | -             | -

Red Flags: Mule vs Legitimate

Signal                   | Legitimate | Mule  | Multiplier
Accounts Frozen          | 3.0%       | 58.9% | 19.6x
MCC 6051 (Wire Transfer) | 0.12%      | 2.10% | 18x
Post-Mobile Txn Value    | 127K       | 903K  | 7.1x
Txn-to-Balance Ratio     | 68.5       | 473.9 | 6.9x
Near-50K Structuring     | 1.1%       | 5.9%  | 5.3x
Median Txn Velocity      | 336.8h     | 78.3h | 4.3x faster
Unique Counterparties    | 13.7       | 37.1  | 2.7x
Pass-Through Ratio       | 1.184      | 1.015 | ~1:1

Top 10 Features (Phase 1 SHAP)

  1. Wire Transfer Rate
  2. Account Freeze History
  3. UPI Debit Rate
  4. Counterparties/Txn
  5. KYC Recency
  6. Pawn Shop Rate
  7. P25 Amount
  8. Cheque Rate
  9. Relationship Tenure
  10. ATM Withdrawal Rate
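
The ranking above is derived from SHAP values; a minimal sketch of how a global importance of this kind is typically computed with TreeExplainer (model and data names are illustrative):

```python
import numpy as np
import pandas as pd
import shap

def top_shap_features(model, X_valid: pd.DataFrame, top_n: int = 10) -> pd.Series:
    """Rank features by mean |SHAP value| over the validation set."""
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X_valid)
    if isinstance(shap_values, list):   # binary classifiers may return [class0, class1]
        shap_values = shap_values[1]
    importance = pd.Series(np.abs(shap_values).mean(axis=0), index=X_valid.columns)
    return importance.sort_values(ascending=False).head(top_n)
```
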
Behavioral Analysis

12 Mule Patterns

All 12 known mule behavior patterns from the RBIH challenge specification were identified and validated with statistical evidence.

  • #1 Dormant Activation: Inactive accounts suddenly process high-value bursts
  • #2 Structuring: Transactions just below the 50K INR reporting threshold
  • #3 Rapid Pass-Through: Near 1:1 credit-to-debit ratio, money flows through untouched
  • #4 Fan-In / Fan-Out: Many-to-one or one-to-many fund flows reveal network topology
  • #5 Geographic Anomaly: PIN code mismatches across customer, branch, and address
  • #6 New Account High Value: Young accounts with disproportionate transaction volume
  • #7 Income Mismatch: Transaction values vastly exceed account balance patterns
  • #8 Post-Mobile-Change Spike: Activity surges 7x after mobile number update
  • #9 Round Amount Patterns: Overuse of exact round amounts (1K, 5K, 10K, 50K)
  • #10 Layered / Subtle: Weak multi-signal combinations that evade single-rule detection
  • #11 Salary Cycle Exploitation: Laundering timed to coincide with salary credit windows
  • #12 Branch-Level Collusion: Suspicious account clusters originating from the same branch

Exploratory Data Analysis

25 Visualizations

47 statistical tables, 25 analytical plots, and a full written report covering every aspect of mule account behavior.

The full report includes class distribution, channel analysis, temporal patterns, geographic analysis, and more.

Methodology

How It Works

01

Data Ingestion

160K accounts, 400M+ transactions spanning July 2020 - June 2025. Memory-efficient batch processing over 16 GB Parquet dataset.

02

Feature Engineering

208 features in 4 passes: transaction core, extended patterns, static account metadata, and graph/network metrics (PageRank, Louvain, betweenness).

03

Label Cleaning

2-round confident learning + heuristic noise scoring for 7 red herring categories. Sample weights in [0.2, 1.0].

04

Model Training

LightGBM + XGBoost + CatBoost ensemble with 3-seed x 5-fold CV, rank averaging, Optuna HPO. Frequency encoding to prevent leakage.
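
A minimal sketch of the frequency encoding used from V3 onward (column names are illustrative); because the mapping uses only category counts and never the label, it cannot leak target information across folds the way target encoding can:

```python
import pandas as pd

def frequency_encode(train: pd.DataFrame, test: pd.DataFrame, col: str) -> None:
    """Replace a categorical column with the relative frequency of each
    category in the training data; unseen test categories map to 0."""
    freq = train[col].value_counts(normalize=True)
    train[f"{col}_freq"] = train[col].map(freq)
    test[f"{col}_freq"] = test[col].map(freq).fillna(0.0)
```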

Statistical Methods

  • Confident Learning
  • Kolmogorov-Smirnov
  • SHAP TreeExplainer
  • Bonferroni correction
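
A minimal sketch of how the Kolmogorov-Smirnov and Bonferroni items above combine to validate a mule-vs-legitimate signal (column names and the significance level are illustrative):

```python
import pandas as pd
from scipy.stats import ks_2samp

def validate_signals(df: pd.DataFrame, feature_cols: list, label_col: str = "is_mule",
                     alpha: float = 0.05) -> pd.DataFrame:
    """Two-sample KS test per feature, with Bonferroni-corrected significance."""
    corrected = alpha / len(feature_cols)            # Bonferroni correction
    rows = []
    for col in feature_cols:
        mule = df.loc[df[label_col] == 1, col].dropna()
        legit = df.loc[df[label_col] == 0, col].dropna()
        stat, p = ks_2samp(mule, legit)
        rows.append({"feature": col, "ks_stat": stat, "p_value": p,
                     "significant": p < corrected})
    return pd.DataFrame(rows)
```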

ML Models

  • LightGBM (GBDT)
  • XGBoost
  • CatBoost
  • Rank Averaging Ensemble

Graph / Network

  • PageRank & HITS
  • Louvain Communities
  • Betweenness Centrality
  • Clustering Coefficients
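
A minimal sketch of the Pass 4 metrics on the counterparty graph with NetworkX; the edge-list construction is illustrative, and at the full ~400M-transaction scale these would realistically run on an aggregated or sampled account-to-account graph:

```python
import networkx as nx
import pandas as pd

def graph_features(edges: pd.DataFrame) -> pd.DataFrame:
    """Per-account graph metrics from a directed payer -> payee edge list
    with an 'amount' edge attribute."""
    G = nx.from_pandas_edgelist(edges, "payer", "payee",
                                edge_attr="amount", create_using=nx.DiGraph)
    pagerank = nx.pagerank(G, weight="amount")
    hubs, authorities = nx.hits(G)
    betweenness = nx.betweenness_centrality(G, k=min(1_000, len(G)))  # sampled for speed
    U = G.to_undirected()
    clustering = nx.clustering(U)
    communities = nx.community.louvain_communities(U, weight="amount")
    community_id = {node: i for i, com in enumerate(communities) for node in com}
    return pd.DataFrame({"pagerank": pagerank, "hub": hubs, "authority": authorities,
                         "betweenness": betweenness, "clustering": clustering,
                         "community": community_id})
```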

Tech Stack

  • Python 3.10+
  • Pandas / NumPy / NetworkX
  • Optuna HPO
  • GCP n2-highmem-8