Implementing site-wide search using Aider and MiniLM

Motivation

For a long time, I have been wanting to implement my own site-wide search functionality.

Playing a poorly made word-guessing game at exocon25 made me want to recreate it myself, so I looked up the latest word2vec-style tech.

I found sentence-transformers and MiniLM. It’s supposed to be a language model, but mini. From reading the model config, it is a model in the BERT family. (Later, from reading the MiniLM paper, I learned that it is a distilled student of BERT.)

Since it is BERT, we can run it using llama.cpp from ggml. I even found the model converted for llama.cpp.

We move on to actual coding.

“Vibe coding”

The code of this project was mostly written by Aider and Gemini 2.5. Long story short, it made two versions: the first in Python, the second in Rust.

What the code does: it fetches a sitemap XML and indexes every paragraph using MiniLM. Then, the user can search for semantically similar paragraphs using freeform text.
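The pipeline can be sketched roughly like this. Note the `embed` step is not shown: the dummy vectors and function names here are illustrative stand-ins, not the actual project code, and the real version embeds each paragraph with MiniLM via llama.cpp.

```python
import xml.etree.ElementTree as ET
import numpy as np

# Sitemaps live in this XML namespace.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text):
    # Extract every <loc> entry from a sitemap.xml document.
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.iter(SITEMAP_NS + "loc")]

def top_k(query_vec, doc_vecs, k=3):
    # Cosine similarity = dot product of L2-normalized vectors.
    # In the real project, query_vec and doc_vecs come from MiniLM embeddings.
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    return np.argsort(d @ q)[::-1][:k]
```

So: crawl the URLs from the sitemap, split each page into paragraphs, embed them once at index time, then embed the user's freeform query at search time and rank paragraphs by cosine similarity.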

Why do people call asking an LLM to code “vibe coding”? To me, it feels more like parenting someone who likes to play with comments. Too many comments.

Maybe I should explain the setup a bit? You run aider, use /add, /read, /ask, and /code, and ask it to do things. The Tier 1 Gemini subscription (no cost) gives you enough tokens to play with each day. You should try it out if you are interested. It is really good at coding.
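A typical session looks something like this (the model flag and file names are just for illustration; use whatever model Aider supports on your account):

```
# start aider in the repo, pointing it at a Gemini model
aider --model gemini/gemini-2.5-pro

# then, inside the chat:
#   /add search.py        - put a file into the editable context
#   /read notes.md        - add read-only context
#   /ask how does the indexing loop work?
#   /code add a top-k limit to the search results
```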

Now, the jank.

The Python bindings for llama.cpp load extremely slowly compared to the Rust ones.

llama.cpp refuses to work within its own thread. The model pointer mysteriously turns into zero when multiple models are used within a process. I really have no idea what is happening, so I put each inference instance in its own process and used ZeroMQ for IPC.

Switching from Sourcehut Pages to Cloudflare Pages

Sourcehut’s hut is the best DevOps CLI tool I’ve used. However, their CSP does not allow cross-origin requests, so I had to move this website to another hosting provider.

First, I tried BunnyCDN. It… doesn’t even have an official CLI. Running bnycdn cp -h (the community CLI) made me wonder what I was looking at.

So, I switched to Cloudflare. Why does Cloudflare have so many CLI commands? I have no idea. Their main CLI is called wrangler. Would you have guessed the company from the name?

Setting it up is easy enough.

wrangler login
wrangler pages project create
wrangler pages deploy _site/

Then, add this domain to the domain list of the pages project.

Switching from Fly.io to Fly.io

Fly.io’s docker image builder has been broken for a year now, and they aren’t fixing it.

So, while fly deploy --depot=false was running, I…

  • Registered on railway.com.
  • Created a new project.
  • Ran pnpm i -g @railway/cli; railway link; railway up.
  • Added a volume by right-clicking on the Railway dashboard. I have no idea what I am doing, yet it works.
  • Ran railway up again.
  • Figured out that I cannot attach the volume to the app whatsoever.

Let me read the Railway home page again.

Shipping great products is hard.
Scaling infrastructure is easy.

Oh, fuck you.

OK. Let me try DigitalOcean.

DigitalOcean: You shall not create an app from a Codeberg repo or upload directly from the CLI.

Me: 🖕

fly deploy --depot=false finished running.

Use the thing

The search engine should be up at https://wtwt.fly.dev/. If your website has a sitemap.xml, you can use it to search within your site. See </search/> for a demo.

Aider, I don’t understand

A few more days of using Aider, and it has created bugs I don’t understand in code I don’t understand.

I have seen a CEO saying that they picked Python over Zig because of the AI tooling. Like, their whole team uses AI to code. How chaotic would that be?

Also, Turso’s Rust rewrite of SQLite, limbo, only writes to the WAL but doesn’t read from it. I guess the write-only database is the future.

So far, asking Aider and Gemini to write small Python scripts to rewrite file content is fine. Managing big projects… it makes the code worse and worse. And… it hallucinates incorrect usage of ZeroMQ.

Despite what I said, Gemini is quite good at writing code. It will write code for people who can’t write code (QA-driven development?). If the energy consumption of future models (hopefully not transformer-based models) goes down to 1/100 of today’s, I will hopefully be able to use it to write a whole project.