Engineering Blog

Technical Insights & Tutorials

Deep dives into model optimization, HPC, MLOps, DevOps, and production-grade AI/ML engineering.

Blogs

Our most popular and in-depth technical guides

Engineering

Embedding + Rerank Gateway: Rust vs Python (28% Faster, 67% Less RAM)

We built the same embedding + rerank gateway in Python, Rust (ONNX), and a split architecture — then benchmarked all three on GCP. Rust hit 28% more RPS with 67% less memory. Same model, same API.

10 min read

Engineering

Self-Knowledge Distillation: Compress Orpheus TTS 3B with Unsloth + LoRA

Compress Orpheus-3B TTS to half its size with no quality loss. Step-by-step guide using self-knowledge distillation, Unsloth, SNAC tokenization, and LoRA — with full code examples you can run today.

25 min read

Engineering

Threads Beat Multiprocessing for RAG: 70% Faster, 75% Less Memory

We benchmarked RAG ingestion across Python 3.13, 3.14, and 3.14t. Threads are 70% faster than multiprocessing with 75% less memory — because NumPy and PyTorch already release the GIL. Your infra doesn't need more pods.

12 min read

7 articles published

Security

Claude Mythos and Project Glasswing: The Morning an AI Found a 27-Year-Old Bug in OpenBSD

Yesterday Anthropic previewed Claude Mythos and announced Project Glasswing. Somewhere in the middle of the report is a sentence about a 27-year-old OpenBSD bug and a 16-year-old FFmpeg flaw that five million fuzzing runs missed. This is our honest reaction, a walk through what the model actually did, and what we think you should be doing about it this week.

14 min read

Engineering

Designing Software for AI Agents: Why Your CLI and API Now Have Two Readers

A warning I saw in a CLI last week points at the biggest shift in software design since cloud. Software now has two readers: humans and AI agents. Here's what that actually means for your CLIs, APIs, docs, and cost structure, with the patterns and pitfalls we've learned building agent-native interfaces at NavyaAI.

12 min read

Engineering

Transformer Inference: Python vs Rust (Rust Wins by 50%)

We load-tested Python (FastAPI + HuggingFace) vs Rust (Axum + rust-bert) for GPU transformer inference. Rust delivered 50% lower latency and 81% higher throughput. Full benchmark code and results inside.

25 min read

Engineering

Python 3.14 No-GIL vs Rust: We Benchmarked Both (4x Speedup)

Free-threaded Python 3.14t hits a 4x speedup on 4 threads, narrowing the gap with Rust to just 3.4x. We ran head-to-head CPU-bound benchmarks with full code. Here are the results.

30 min read
