Deep dives on how we built each rewrite, what we measured, and the exact reason .NET pulls ahead.
Python's difflib uses a Ratcliff/Obershelp-style matcher that builds fresh dictionaries on every call. We replaced it with a DP LCS over a preallocated buffer in .NET, which removes the per-call allocations and the GC pressure that came with them.
Python's intervaltree stores intervals as Python objects in a balanced BST. An augmented interval tree in .NET backed by flat int[] arrays queries the same ranges 31× faster.
Python's qrcode library generates QR codes through a pure-Python Reed-Solomon encoder. QRCoder in .NET runs the same error-correction math as JIT-compiled native code and is 14× faster at 50,000 codes.
dateutil.parser.parse accepts most common date formats automatically by running a heuristic, rule-based tokenizer over each string. DateTimeOffset.TryParseExact only matches a known set of formats, and that narrower job is 19× faster at 1M timestamps.
Python's textdistance allocates a fresh O(m×n) matrix for every pair. One preallocated ArrayPool row in .NET, reused across all calls, cuts runtime from 14 seconds to under 200 ms at 100k pairs.
mistune is among the fastest pure-Python Markdown parsers, and Markdig is its closest .NET counterpart. Both render CommonMark-style Markdown to HTML, and Markdig does it 11× faster on 10,000 real-world documents.
Whoosh is a pure-Python search engine with BM25-style ranking. Lucene.NET, the C# port of Apache Lucene, offers the same ranking model. Running 1,000 queries across 100,000 documents, Lucene.NET is 9× faster on indexing and 22× faster on search throughput.
NLTK's Punkt tokenizer runs a trained statistical model to find sentence boundaries, which is smart but slow. A pair of compiled regexes in .NET gives comparable results 8× faster on 100 MB of plain text.
pypdf is a pure-Python PDF parser, so every byte of every page goes through the CPython interpreter. PdfPig is a pure C# counterpart. Same task, same files, 4–6× faster.
NetworkX stores graphs as nested Python dicts, so every iteration performs millions of dictionary lookups. A CSR (compressed sparse row) matrix in .NET with SIMD normalization runs the same algorithm in a fraction of the time.
At GPT-2 scale (a vocabulary of roughly 50k tokens), .NET and NumPy are tied. Scale to Llama-3's 128k vocab and .NET pulls 1.53× ahead, because TensorPrimitives reuses a single SIMD buffer while NumPy allocates a new temporary array for each intermediate result.
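Several of the rewrites above lean on the same idea: allocate a working buffer once and reuse it across calls, instead of building a fresh O(m×n) matrix for every pair. The sketch below illustrates that idea for edit distance in plain Python. It is not the code from the articles: the class name `RowBufferLevenshtein` is hypothetical, and an ordinary Python list stands in for the row that the .NET version rents from ArrayPool.

```python
class RowBufferLevenshtein:
    """Levenshtein distance with one reusable row (hypothetical helper).

    Only the previous DP row is ever needed, so a single list of
    length len(shorter) + 1 can replace the full (m+1) x (n+1) matrix.
    """

    def __init__(self, max_len: int) -> None:
        # Allocate the row once; later calls overwrite it in place.
        self._row = [0] * (max_len + 1)

    def distance(self, a: str, b: str) -> int:
        # Keep the shorter string on the row axis to minimise the buffer.
        if len(b) > len(a):
            a, b = b, a
        n = len(b)

        # Grow only if an input exceeds the capacity assumed up front.
        if n + 1 > len(self._row):
            self._row = [0] * (n + 1)
        row = self._row

        # Row 0 of the DP table: distance from "" to each prefix of b.
        for j in range(n + 1):
            row[j] = j

        for i, ch_a in enumerate(a, start=1):
            diag = row[0]  # value of D[i-1][j-1] for j = 1
            row[0] = i     # distance from a[:i] to ""
            for j in range(1, n + 1):
                up = row[j]  # D[i-1][j], about to be overwritten
                cost = 0 if ch_a == b[j - 1] else 1
                row[j] = min(
                    up + 1,          # deletion
                    row[j - 1] + 1,  # insertion
                    diag + cost,     # substitution or match
                )
                diag = up
        return row[n]


if __name__ == "__main__":
    matcher = RowBufferLevenshtein(max_len=64)
    pairs = [("kitten", "sitting"), ("flaw", "lawn"), ("abc", "abc")]
    for left, right in pairs:
        print(left, right, matcher.distance(left, right))
```

Creating one `RowBufferLevenshtein` and calling `distance` in a loop keeps the per-pair allocation count near zero, which is the property the .NET versions exploit. In CPython the interpreter overhead of the inner loop still dominates, so this sketch shows the memory pattern rather than the speedups quoted above.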