Small language, big world
I’m a big fan of Small Language Models. The technology to distill a Large Language Model down to something that still generates tokens at a decent speed on modest hardware is now here. I had a Python application costing me 17 cents per run in API calls, so I started mucking about with Ministral-3-3B-Instruct-2512 under llama.cpp, and after some tuning got good results. Ministral is energetic and verbose; I had to be explicit that inventing answers when the input was thin was not acceptable. Once we were past that, it started performing well.
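As a rough illustration of that "don't invent answers" rule, here is a minimal sketch of how one might pin it down in a system prompt when driving a local llama.cpp model from Python via the llama-cpp-python bindings. The prompt wording, model filename, and parameter values are my assumptions for the example, not a record of the exact setup described above.

```python
# Sketch: wrap a local llama.cpp model with a system prompt that
# forbids fabricated answers. Assumes the llama-cpp-python bindings
# (pip install llama-cpp-python) and a local GGUF file; both the
# prompt text and the model path are illustrative, not the actual setup.

SYSTEM_PROMPT = (
    "Answer only from the provided input. "
    "If the input does not contain enough information, reply exactly: "
    "INSUFFICIENT INPUT. Do not guess or invent details."
)


def build_messages(user_input: str) -> list[dict]:
    """Pair the anti-fabrication system prompt with the user's input."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]


def ask(user_input: str, model_path: str = "ministral-3b-instruct.gguf") -> str:
    """Run one chat completion against a local GGUF model.

    Requires a real model file on disk; only called when you have one.
    """
    from llama_cpp import Llama  # third-party bindings, assumed installed

    llm = Llama(model_path=model_path, n_ctx=4096)
    out = llm.create_chat_completion(
        messages=build_messages(user_input),
        temperature=0.2,  # a low temperature helps curb verbose invention
        max_tokens=512,
    )
    return out["choices"][0]["message"]["content"]
```

The interesting part is the refusal clause: giving the model a single exact fallback string ("INSUFFICIENT INPUT") makes it easy for the calling code to detect thin-input cases instead of parsing a creatively worded apology.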