
Capital One’s 200ms Random Forest: Stopping Launderers Before the Wire Hits


Imagine spotting a money launderer before the wire transfer even settles. Capital One’s ML model turned a traditional 30-page rulebook into a 200 ms risk score that tells investigators which alert to open first.

The traditional, rule-based deterministic approach was a simple ledger of “if-else” alarms: if a cash deposit exceeded an amount X and its frequency exceeded Y within Z days, an alert went out, with no indication of the level of risk posed. This approach had the following drawbacks:

  • About 99% of alerts were false positive foam

  • Investigators had to open cases first-in, first-out, so the needle stayed hidden in the haystack

  • There was no way to focus precisely on the right combinations of triggers

Using machine learning to build a suspicious activity monitoring system at Capital One

In August 2020, Capital One began applying machine learning to suspicious activity monitoring, aiming to:

  • Use a wide range of data to enable better decisions and give more insight to the investigators

  • Prioritise risk-based investigations

This algorithm hunts for behavioral patterns that indicate financial crime. It detects Structuring, where criminals break a mountain of dirty cash into small, unnoticeable chunks to dodge the reporting threshold.

It also spots Velocity Spikes, identifying accounts that act like gateways: money wiring in and immediately wiring out, a typical attribute of shell companies trying to wash funds quickly.

Additionally, it flags Profile Deviation, noticing when a customer’s behavior doesn’t match their identity: a college student’s account suddenly moving large amounts of money for business activity is very unusual.
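To make the three patterns above concrete, here is a minimal sketch of how such signals might be turned into numeric features from a transaction history. The transaction data, feature names, and thresholds are all hypothetical illustrations, not Capital One's actual features.

```python
from datetime import datetime

# Hypothetical transactions: (timestamp, amount, direction)
txns = [
    (datetime(2024, 1, 1), 9500.0, "in"),
    (datetime(2024, 1, 2), 9400.0, "in"),
    (datetime(2024, 1, 2), 9300.0, "in"),
    (datetime(2024, 1, 3), 18000.0, "out"),
]

REPORT_THRESHOLD = 10_000.0  # illustrative reporting threshold

def structuring_count(txns, margin=0.15):
    """Count deposits that land just under the reporting threshold."""
    lo = REPORT_THRESHOLD * (1 - margin)
    return sum(1 for _, amt, d in txns
               if d == "in" and lo <= amt < REPORT_THRESHOLD)

def pass_through_ratio(txns):
    """Outflow divided by inflow; near 1.0 suggests a pass-through account."""
    inflow = sum(amt for _, amt, d in txns if d == "in")
    outflow = sum(amt for _, amt, d in txns if d == "out")
    return outflow / inflow if inflow else 0.0

features = {
    "structuring_count": structuring_count(txns),
    "pass_through_ratio": round(pass_through_ratio(txns), 2),
}
print(features)  # {'structuring_count': 3, 'pass_through_ratio': 0.64}
```

Features like these become columns in the training data that the forest described below learns from.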

Deconstructing the ML Algorithm Used

The core of the model is a random forest classifier built with scikit-learn, coded in Python and PySpark.

A Random Forest merges the decisions of many decision trees to produce an answer that best represents the average of all the trees.

Why is a single decision tree not a better fit?

A single decision tree is prone to a problem called high variance (overfitting). Think of a single gardener who memorizes every specific detail of the plants he saw during training. The flaw is that he creates complex rules based on noise rather than the general pattern. The tree captures the noise of the training data so perfectly that it fails to generalize to new data not seen during training.

Now think of a garden with 500 gardeners. A single pruner might lean the wrong way, but the collective decision of 500 gardeners keeps the flowers and weeds sorted. The Random Forest accepts that each individual tree might be a little wrong (high variance); by averaging them, the errors cancel out.

When a new transaction arrives:

  1. The Ballot: Each of the 500 trees casts a simple vote: Suspicious or Not.

  2. The Count: The majority isn’t just tallied; the majority voice is converted into a precise 0-100 probability score. Think of it as a confidence score:

    • Scenario A: 100 gardeners say "Weed", giving 100 / 500 = 0.20, a score of 20% (least suspicious)

    • Scenario B: 250 gardeners say "Weed", giving 250 / 500 = 0.50, a score of 50% (barely suspicious)

    • Scenario C: 450 gardeners say "Weed", giving 450 / 500 = 0.90, a score of 90% (highly suspicious)

By keeping the score as a number instead of just a label, the bank knows which alert to open first. In this case, scenario C will be prioritized over scenario B.
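The vote-to-score conversion above can be sketched in a few lines. This is a toy illustration of the scoring logic, not Capital One's code; the tree count matches the 500 mentioned in the article.

```python
def forest_score(suspicious_votes, n_trees=500):
    """Fraction of trees voting 'suspicious', scaled to a 0-100 score."""
    return 100 * suspicious_votes / n_trees

# The three scenarios from the text
alerts = {"A": forest_score(100), "B": forest_score(250), "C": forest_score(450)}

# Investigators open the highest-scoring alert first
queue = sorted(alerts, key=alerts.get, reverse=True)
print(alerts, queue)  # {'A': 20.0, 'B': 50.0, 'C': 90.0} ['C', 'B', 'A']
```

In scikit-learn terms, this is what `RandomForestClassifier.predict_proba` does under the hood: it averages the per-tree class votes into a probability.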

The Data Layer

For better decisions, they first engineered several features from customer transaction histories that could relate to suspicious activity, and used them to train the model. When presented with a customer’s identifiers, the system pulls the relevant customer attributes and transaction data, feeds them into the model, and generates a score representing the likelihood that the customer’s activity would be deemed suspicious.

To keep the model explainable, fast, and accurate, they regularly audit and prune (i.e., remove) features. Think of tending an indoor plant: snip the tiny twigs so the tree stays small, sharp, and always in season, letting the strongest branches get the sunlight.
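One common way to do this kind of pruning is to rank features by the importance scores a fitted forest reports and drop the weakest. A minimal sketch, with entirely made-up feature names and importance values:

```python
# Hypothetical importance scores, as a fitted forest's
# feature_importances_ might report them (values sum to ~1.0)
importances = {
    "cash_deposit_velocity": 0.31,
    "wire_in_out_ratio": 0.27,
    "near_threshold_deposits": 0.22,
    "account_age_days": 0.12,
    "branch_visits_last_month": 0.05,
    "card_color_preference": 0.03,  # noise: a candidate for pruning
}

PRUNE_BELOW = 0.05  # drop features contributing under 5% importance

kept = {f: w for f, w in importances.items() if w >= PRUNE_BELOW}
pruned = sorted(set(importances) - set(kept))
print(pruned)  # ['card_color_preference']
```

Auditing like this keeps the feature count small, which is what makes the 65-feature model described next both fast and explainable.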

The Model Layer

At the core sits a 500-tree Random Forest, trained on 65 laser-pruned features. Each tree casts a simple vote, suspicious or not, and the forest turns the majority voice into a 0-100 probability score. Thanks to PySpark, the whole ensemble digests 120k labeled cases in under 30 minutes on a 16-core cluster, so the model is fresh every morning.

That single score is the baton passed to the investigation team. Low-score alerts skip the queue for a quick one-click closure, mid-score alerts follow the traditional workflow, and high-score alerts are prioritized for review first.
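The three-way triage can be sketched as a simple routing function. The threshold values here are illustrative assumptions, not Capital One's actual cutoffs:

```python
def route_alert(score):
    """Route an alert by its 0-100 risk score (thresholds are illustrative)."""
    if score < 20:
        return "one-click closure"
    elif score < 70:
        return "standard workflow"
    return "priority review"

routes = [route_alert(s) for s in (10, 50, 90)]
print(routes)  # ['one-click closure', 'standard workflow', 'priority review']
```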

How does a Capital One gardener train his eye?

Now, let’s dive into the math of the random forest algorithm, starting with Mean Squared Error (MSE).

Before the forest can hunt for money launderers, each of the 500 individual gardeners (trees) must calibrate their vision. They look at 120,000 historical cases where the answer is already known and test their intuition.

  • Yi = The Truth (0 = Legitimate Customer, 1 = Money Launderer)

  • Fi = The Gardener’s Guess (His personal risk score)

The square of his mistake on case i is (Yi − Fi)².

He averages these over all n cases to find his focus:

MSE = (1/n) × Σ (Yi − Fi)²

This formula measures how far the tree’s predictions fall from the true values, helping to decide which split is better for the forest. Here, Yi is the true label of the data point at a given node, and Fi is the value returned by the decision tree.

A small MSE is good: a sharp gardener is ready for duty. A large MSE means the data needs to be split again.
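The MSE calibration step can be worked through numerically. A minimal sketch comparing a sharp gardener to a sloppy one on four cases with known answers (the scores are invented for illustration):

```python
def mse(y_true, y_pred):
    """Mean squared error between the truth (0/1) and the tree's risk scores."""
    return sum((y - f) ** 2 for y, f in zip(y_true, y_pred)) / len(y_true)

# Four known cases: two legitimate customers (0), two launderers (1)
y = [0, 0, 1, 1]
sharp = [0.1, 0.2, 0.8, 0.9]   # guesses close to the truth
sloppy = [0.6, 0.5, 0.4, 0.5]  # guesses near 0.5 everywhere

print(mse(y, sharp), mse(y, sloppy))  # 0.025 vs 0.305
```

The sharp gardener’s MSE of 0.025 versus the sloppy one’s 0.305 shows exactly what “a small MSE is good” means in numbers.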

Where does he snip the branch? (Classification)

The goal is to get pure, error-free buckets, and the method is Gini Impurity, which acts as a messiness meter.

The gardener has 65 laser-pruned features to help him decide where to cut. He wants to separate the mix of legitimate shoppers and fraudsters into pure buckets.

Thus, the Gini Impurity score is used to measure the chaos:

Gini = 1 − (share of Legit² + share of Fraud²)

More generally, Gini = 1 − Σ pi², where pi is the relative frequency of class i at the node and the sum runs over all c classes; the split whose branches have the lowest impurity wins.

  • 0.0 = Perfectly pure bed (All fraud or all legit)

  • 0.5 = Worst jungle (50% fraud, 50% legit)

Entropy is a related concept; intuitively, it asks how many yes/no questions are still unanswered about a node’s contents: Entropy = −Σ pi log₂(pi).

Either way, the gardener chops the branch exactly where the messiness drops the most, ensuring the strongest branches get the sunlight and the risk is isolated.
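Both messiness meters are tiny formulas. A minimal sketch, reproducing the pure-bed and worst-jungle values from the bullets above:

```python
import math

def gini(counts):
    """Gini impurity: 1 minus the sum of squared class shares."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Entropy in bits: how many yes/no questions remain unanswered."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

print(gini([10, 0]))           # 0.0 -- perfectly pure bed
print(gini([5, 5]))            # 0.5 -- worst jungle, 50/50 mix
print(round(gini([9, 1]), 2))  # 0.18 -- nearly pure
print(entropy([5, 5]))         # 1.0 -- one full yes/no question left
```

The gardener computes the impurity of each candidate split’s branches and picks the cut where the weighted impurity drops the most.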

This is how the model’s core statistical machinery is put together to produce a suspicious-activity score.

Conclusion

The vast majority of scores generated by the system result in the expected investigative output. Overall, the system developed by Capital One has proven less error-prone and more efficient than rules-based systems. Capital One overcame the traditional sticking points to create an industry-leading machine learning solution that focuses on truly suspicious activity, adding a powerful new tool to its arsenal against financial crime.