Articles

Every benchmark, explained

Deep dives on how we built each rewrite, what we measured, and the exact reason .NET pulls ahead.

Py vs .NET

17.4× peak
Jun 11, 2026

With the built-in html.parser backend, every BeautifulSoup call runs through pure-Python parsing code, wraps each node in a Python object, and traverses with Python iterators. AngleSharp parses straight into a C# DOM with no interpreter overhead. At 50,000 documents the gap peaks at 17.4×.

beautifulsoup4 → AngleSharp →
Py vs .NET

7.7× peak
Jun 10, 2026

scikit-image's anti-aliased resize allocates four intermediate float64 arrays per image and dispatches through Python for every frame. ImageSharp's Lanczos3 resampler is a single-pass SIMD loop. On 10,000 images the gap is 7.7×.

scikit-image → ImageSharp →
Py vs .NET

1.53× peak
May 31, 2026

At GPT-2 scale .NET and NumPy are tied. Scale to Llama-3's 128k vocab and .NET pulls 1.53× ahead, because TensorPrimitives reuses a single SIMD buffer while NumPy keeps allocating new arrays.

numpy → TensorPrimitives →
Py vs .NET

31× peak
May 27, 2026

Python's intervaltree stores intervals as Python objects in a balanced BST. An augmented interval tree in .NET backed by flat int[] arrays queries the same ranges 31× faster.

intervaltree → AugmentedIntervalTree →
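To make the flat-array idea in the card above concrete, here is a minimal Python sketch of a static augmented interval tree. It is an illustration under our own assumptions, not the benchmarked .NET code: the class name `FlatIntervalTree`, the half-open `[start, end)` convention, and the implicit "midpoint of the sorted array is the node" layout are all hypothetical choices.

```python
from array import array


class FlatIntervalTree:
    """Static augmented interval tree stored in three flat integer arrays."""

    def __init__(self, intervals):
        ordered = sorted(intervals)  # sort by start, then end
        self.starts = array("q", (s for s, _ in ordered))
        self.ends = array("q", (e for _, e in ordered))
        self.max_end = array("q", self.ends)  # overwritten by _build
        self._build(0, len(ordered))

    def _build(self, lo, hi):
        # The node for the slice [lo, hi) is its midpoint.
        # Returns the largest end value found in that slice.
        if lo >= hi:
            return None
        mid = (lo + hi) // 2
        best = self.ends[mid]
        for child in (self._build(lo, mid), self._build(mid + 1, hi)):
            if child is not None and child > best:
                best = child
        self.max_end[mid] = best
        return best

    def overlaps(self, qstart, qend):
        """Return every (start, end) that overlaps the half-open [qstart, qend)."""
        out = []
        stack = [(0, len(self.starts))]
        while stack:
            lo, hi = stack.pop()
            if lo >= hi:
                continue
            mid = (lo + hi) // 2
            if self.max_end[mid] <= qstart:
                continue  # nothing in this subtree ends after the query starts
            stack.append((lo, mid))  # left subtree may still overlap
            if self.starts[mid] < qend:
                if self.ends[mid] > qstart:
                    out.append((self.starts[mid], self.ends[mid]))
                # Right subtree only holds starts >= starts[mid].
                stack.append((mid + 1, hi))
        return out
```

For example, `FlatIntervalTree([(1, 5), (3, 8), (10, 12)]).overlaps(4, 7)` visits only the subtrees whose stored maximum end lies past 4. The pruning on `max_end` is what keeps a query from scanning every interval.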
Py vs .NET

21.3× peak
May 24, 2026

Python's difflib implements a Ratcliff/Obershelp-style matcher that allocates its working tables afresh on every call. We replaced it with a DP LCS over a preallocated buffer in .NET, which removes the per-call allocations and the GC pressure that comes with them.

difflib → Custom DP LCS →
Py vs .NET

19× peak
May 21, 2026

dateutil.parser.parse accepts most date formats without a format string by running a heuristic tokenizer over every input. DateTimeOffset.TryParseExact checks only a known set of formats, and is 19× faster at 1M timestamps.

dateutil → DateTimeOffset →
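The "known set of formats" approach in the card above can be sketched in Python with the standard library alone. This is a rough analogue of the idea, not the .NET call itself: the helper name `try_parse_exact` and the three formats in `KNOWN_FORMATS` are assumptions, since the card does not list the formats used in the benchmark.

```python
from datetime import datetime

# Hypothetical format list; the benchmark's actual formats are not given.
KNOWN_FORMATS = (
    "%Y-%m-%dT%H:%M:%S%z",
    "%Y-%m-%d %H:%M:%S",
    "%d/%m/%Y %H:%M",
)


def try_parse_exact(text, formats=KNOWN_FORMATS):
    """Return a datetime for the first format that matches, or None."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    return None
```

The trade-off is the one the card describes: anything outside the list is rejected (here, `None`) instead of being guessed at, which is what lets the parser skip the heuristic work.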
Py vs .NET

14× peak
May 18, 2026

Python's qrcode library generates QR codes through a pure-Python Reed-Solomon encoder. QRCoder in .NET compiles the same error correction math to native code — 14× faster at 50,000 codes.

qrcode → QRCoder →
Py vs .NET

11× peak
May 15, 2026

mistune is among the fastest pure-Python Markdown parsers. Markdig is its closest .NET counterpart. Both parse CommonMark-compatible Markdown to HTML, and Markdig does it 11× faster on 10,000 real-world documents.

mistune → Markdig →
Py vs .NET

71× peak
May 12, 2026

Python's textdistance allocates a fresh O(m×n) matrix for every pair. One preallocated ArrayPool row in .NET, reused across all calls, cuts runtime from 14 seconds to under 200 ms at 100k pairs.

textdistance → ArrayPool Wagner-Fischer →
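The reused-row idea in the card above can be sketched in Python. This is an illustration of the technique, not the benchmarked code: `make_levenshtein` and its `max_len` parameter are hypothetical names, and a plain list stands in for the row that the .NET version rents from ArrayPool.

```python
def make_levenshtein(max_len):
    """Return a distance function that reuses one preallocated row buffer."""
    row = [0] * (max_len + 1)  # allocated once, shared by every call

    def distance(a, b):
        if len(a) < len(b):
            a, b = b, a  # keep the shorter string on the row axis
        if len(b) > max_len:
            raise ValueError("input longer than the preallocated row")
        for j in range(len(b) + 1):
            row[j] = j  # distance from the empty prefix of a
        for i, ca in enumerate(a, start=1):
            diag = row[0]  # D[i-1][j-1] for j = 1
            row[0] = i
            for j, cb in enumerate(b, start=1):
                above = row[j]  # D[i-1][j], about to be overwritten
                row[j] = min(above + 1, row[j - 1] + 1, diag + (ca != cb))
                diag = above
        return row[len(b)]

    return distance
```

Calling `distance = make_levenshtein(64)` once and then `distance(x, y)` for every pair means the only O(n) buffer is created up front. The full O(m×n) matrix is never materialized, because each cell needs only the previous row and the cell to its left.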
Py vs .NET

22× peak
May 9, 2026

Whoosh is a pure-Python search engine with BM25 ranking. Lucene.NET implements the same algorithm in C#. Running 1,000 queries across 100,000 documents, Lucene.NET is 9× faster on indexing and 22× faster on search throughput.

whoosh → Lucene.NET →
Py vs .NET

8.3× peak
May 6, 2026

NLTK's Punkt tokenizer runs a trained ML model for sentence boundaries — smart but slow. A compiled regex pair in .NET gives equivalent quality 8× faster on 100 MB of plain text.

nltk → Compiled Regex →
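The card above does not show the regex pair, so the following Python sketch is only one plausible reading of it: one compiled pattern proposes sentence boundaries and a second vetoes boundaries that follow a known abbreviation. Both patterns, the abbreviation list, and the name `split_sentences` are assumptions made for illustration.

```python
import re

# Assumed pair: BOUNDARY proposes split points, ABBREVIATION vetoes them.
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")
ABBREVIATION = re.compile(r"\b(?:Mr|Mrs|Ms|Dr|Prof|vs|etc|e\.g|i\.e)\.$")


def split_sentences(text):
    """Split text at whitespace that follows ., ! or ? and precedes a capital."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        candidate = text[start:match.start()]
        if ABBREVIATION.search(candidate):
            continue  # the period belongs to an abbreviation, keep scanning
        sentences.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

A fixed abbreviation list is the obvious weak point of this sketch compared with a trained model: any abbreviation that is not in the list produces a false split.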
Py vs .NET

6.2× peak
May 3, 2026

pypdf is a pure-Python PDF parser — every byte of every page goes through the CPython interpreter. PdfPig is a pure C# equivalent. Same algorithm, same data, 4–6× faster.

pypdf → PdfPig →
Py vs .NET

47× peak
Apr 30, 2026

NetworkX stores graphs as nested Python dicts, so every iteration performs millions of dict lookups. A CSR sparse matrix in .NET with SIMD normalization runs the same algorithm up to 47× faster.

networkx → CSR + TensorPrimitives →
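For readers unfamiliar with CSR, here is a minimal Python sketch of the layout the card above refers to. It is a sketch under assumptions: the function names are hypothetical, plain lists stand in for the .NET arrays, and row normalization (each edge weighted by 1/out-degree) is our guess at what "normalization" means here.

```python
def build_csr(num_nodes, edges):
    """Pack a directed edge list into CSR arrays (row_ptr, col_idx)."""
    row_ptr = [0] * (num_nodes + 1)
    for src, _ in edges:
        row_ptr[src + 1] += 1  # count outgoing edges per node
    for i in range(num_nodes):
        row_ptr[i + 1] += row_ptr[i]  # prefix sum gives row offsets
    col_idx = [0] * len(edges)
    cursor = row_ptr[:-1].copy()  # next free slot in each row
    for src, dst in edges:
        col_idx[cursor[src]] = dst
        cursor[src] += 1
    return row_ptr, col_idx


def row_normalized_weights(row_ptr):
    """Weight each edge by 1/out_degree so every non-empty row sums to 1."""
    weights = []
    for i in range(len(row_ptr) - 1):
        degree = row_ptr[i + 1] - row_ptr[i]
        if degree:  # nodes without outgoing edges contribute no weights
            weights.extend([1.0 / degree] * degree)
    return weights
```

The neighbours of node `i` are the contiguous slice `col_idx[row_ptr[i]:row_ptr[i + 1]]`. Iterating over a graph then becomes a linear walk over two arrays instead of a chain of dict lookups, which is the property that makes the layout friendly to SIMD.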