DeFi Labelling in Argos

Blockchain MEV Arbitrage DeFi Ethereum Formal Verification Rocq Sandwich Liquidation
Published on 2026/06/18
DeFi Labelling in Argos


This article dives into the DeFi labelling engine that powers Argos, our blockchain intelligence platform. We cover the full classification pipeline and focus on our original contribution: a protocol-agnostic arbitrage detection algorithm currently under peer review.

Why DeFi Labelling Matters

Every block on Ethereum contains hundreds of transactions, most of which interact with DeFi protocols. A raw transaction is just a sequence of EVM opcodes, token transfers, and event logs. Without classification, an analyst looking at a block sees noise. With classification, they see structure: this transaction is a sandwich attack, that one is an arbitrage, this other one is a liquidation on Aave.

Argos labels every transaction in real time through a three-tier classification pipeline. First, our OCaml analysis backend performs deep structural analysis (the most accurate tier, described below). Then a second pass classifies based on decoded function calls and known contract signatures. Finally, a fast heuristic layer uses method names for broad coverage. The result is visible everywhere in the platform: in the block explorer, on transaction pages, and as edge labels on investigation graphs.

This article focuses on how each category of MEV is detected, with a deep dive into our novel arbitrage detection approach.

Sandwich Attacks and Liquidations

For sandwich detection, the approach is well-established in the literature. A sandwich attack involves three transactions in the same block: a front-run that moves the price, a victim swap that executes at the manipulated price, and a back-run that captures the profit. Argos identifies these by scanning each block for this three-transaction pattern, matching on pool addresses and token pairs, and computing the extracted profit. Each transaction in the triplet is tagged with its role (front-runner, victim, back-runner) along with a link to its counterparts.

For liquidations, we detect collateral liquidation events on lending protocols such as Aave, Compound, and Maker. When a borrower's position falls below the health factor threshold, a liquidator repays part of the debt and seizes collateral at a discount. Argos identifies these events through known liquidation function signatures and event patterns, tagging each transaction with the liquidated position details.

These detection methods build on approaches described in the academic literature and adapted to production constraints. They work well for their respective categories, but they share a fundamental limitation: they require knowledge of the protocols involved. Every new lending protocol or DEX aggregator means new signatures to add.

For arbitrage, we took a fundamentally different approach.

Arbitrage Detection: Let the Structure Speak

Instead of asking "does this transaction match a known arbitrage pattern?", we ask a more fundamental question: does the flow of tokens in this transaction form a cycle?

Pipeline diagram: EVM Trace to Cash Flow Tree to Term Rewriting to Canonical Form to Cycle Detection
The detection pipeline: execution traces are transformed into Cash Flow Trees, reduced by 16 rewriting rules to a canonical form, where arbitrage cycles emerge at the fixpoint.

Our detection engine operates in three stages. First, we reconstruct the full token transfer hierarchy from the EVM execution trace, preserving the call stack structure. This produces what we call a Cash Flow Tree: an abstract syntax tree where each node represents a contract call and each leaf represents a token transfer.

Then we apply a convergent term rewriting system to this tree. Fifteen rewriting rules progressively simplify the structure: chaining adjacent transfers that share endpoints, merging parallel paths, collapsing redundant intermediaries. The rules are applied bottom-up, from the deepest call frames to the root, until no more rules can fire.

Left: tangled 4-address transfer graph. Right: clean 3-node cycle with positive WETH balance after rewriting.
Before and after rewriting: a tangled set of transfers between a bot, a router, and two DEXes reduces to a clean arbitrage cycle with a +0.3 ETH profit.

At the fixpoint, arbitrage cycles emerge naturally from the canonical form. A cycle means tokens left an address, flowed through a series of swaps, and returned to the same address in the same token with a positive balance. No protocol-specific knowledge is needed. We use only standard ERC token ABIs and the WETH interface, making the approach portable across any EVM chain.

This also handles complex cases like flash loan arbitrages, where the borrow-repay round-trip is recovered through structural pairing of matching transfers, without inspecting any lending protocol's events.

Results

We evaluated the arbitrage detection on 220,000 consecutive Ethereum mainnet blocks (November 2025 to January 2026).

Four statistics: 220K blocks analyzed, 457K confirmed detections, 83% overlap with Eigenphi, 64K exclusive detections
Key results from the evaluation on 220,000 Ethereum mainnet blocks.

Among our exclusive detections, 63% involve flash loan round-trips discovered purely from the structural pairing of transfers. These are arbitrages that profitability-based detectors miss because they don't reconstruct the full borrow-repay cycle.

The system classifies each detection into confidence tiers (confirmed, probable, attempted, uncertain) based on whether cycles are complete, profitable after gas, and free of unexplained leftover transfers. This gives analysts a clear signal-to-noise control depending on their use case.

Performance-wise, the rewriting algorithm runs in 0.04 ms median per transaction, with the full pipeline (excluding trace retrieval) at 0.19 ms median, making it suitable for real-time block-by-block analysis.

Formally Verified in Rocq

What sets the arbitrage detection apart is that the entire logic is formally verified in Rocq (formerly Coq). We proved five properties with zero admitted obligations:

  • Preservation: every chain in the final tree corresponds to a valid walk in the original transfer graph
  • Termination: the rewriting always reaches a fixpoint, in at most 3n-2 passes for n transfers
  • Soundness: if the system says "arbitrage", the canonical form contains a cycle with matching tokens and positive net balance
  • Confluence: the deterministic traversal order guarantees a unique canonical form for every transaction
  • Decidable equivalence: two transactions implement the same strategy if and only if their canonical forms coincide
  • Enabling query languages over canonical forms: Arbitrage detection is only an example of a query over this normalization.

The Rocq mechanization comprises 157 lemmas, 8 theorems, and 4 corollaries across 5,568 lines of proof. This means the detection verdicts are not just empirically accurate; they are mathematically guaranteed to be consistent and reproducible.

To our knowledge, this is the first formally verified MEV detection system.


The DeFi labelling engine runs on every Ethereum block in real time inside Argos. If you are a compliance team, forensic analyst, or researcher interested in MEV detection, visit argosint.com or reach out.