Intro
I had been meaning to write this article about smaller, specialized LLMs for a while, but I kept putting it off.
In the meantime, though, two things happened that somehow connect to what I wanted to say.
- This article about Mistral and how they managed to build a small AI empire without having the most powerful models: How France’s Mistral built a $14 billion AI empire by not being American
- GitHub Copilot is moving to token-based billing instead of the request-based billing it has had until now: GitHub Copilot is moving to usage-based billing
The real price
I think we have been through a testing period that is now starting to come to an end. It was a period in which it was incredibly cheap to use top AI models and produce whatever kind of slop you wanted.
For example, I use GitHub Copilot Pro, which still works on a request-based model. It costs 10 USD a month; I allow up to 20 USD of overage on top if needed, and that is that. If you have a good prompt and files that are easy for an LLM to work through, you can pretty easily “abuse” the Pro plan and get more out of it, because the cost is not calculated from the tokens used.
Since I do not do everything with an LLM, that is more than enough. I have only rarely needed Opus, which costs 3x more; Sonnet 4.5/4.6 is more than okay in most cases.
I have seen a lot of discussion about how heavily the compute behind these models, and therefore their prices, is subsidized: probably somewhere between 90% and 100%. I would lean toward 100%, because Copilot Pro, for example, is ridiculously cheap.
Not to mention DeepSeek, which is the cheapest option available right now. You put in 10 USD and forget you even paid. The API is very slow in some situations, but at that price you cannot ask for much more.
OpenAI and Anthropic need to make money, and now the enshittification phase begins: token prices are going up, by as much as 9x in the case of top models like Opus, and I still do not think they are anywhere near their real cost. It is just one way they will try to focus on enterprise customers and then keep them locked in.
The cost is even higher: You’re about to feel the AI money squeeze
And this is only when we talk about the models themselves, without any other wrappers around them. Lovable or Figma Make are basically unusable now in trial mode, and the first paid tier does not help much either because the credits run out very quickly.
The future
I have the feeling that local LLMs will become more and more popular as prices for closed models rise closer to their real value.
Those who are not locked into using one of the cloud AI ecosystems will, I think, gradually migrate to specialized local models. And I do not mean only individual users or power users who constantly need LLMs for their work, but also mid-sized companies that pay close attention to long-term expenses.
That is, unless we see some major breakthrough in how LLMs operate on the same kinds of hardware, or a substantial software improvement that makes them much more efficient and drives costs down so much that nobody would have a reason not to use them.
And some of these tools will mostly end up in the hands of specialists in the field. If you are a designer, it will certainly cost the company less for you to use AI design tools, because you have experience, you know what you want to do, and you know how to reach the result through good prompts. At the same time, what you create is much more likely to be better than the generic thing I would make with access to the same tool.
The whole idea of “democratizing” fields like programming and design, that with AI everyone can do anything, sounds good, but there is a very good chance the results will be mediocre.
Just as a side note: on one project I am involved in, Lovable and Figma Make have more or less been dropped for people who are not actually in the field, and suddenly we have four designers available to talk to about what exactly we need.
Security
For a company that has certain business “secrets,” I do not see why you would use a US or Chinese cloud model.
You effectively have no guarantee whatsoever regarding the safety of the information you put into the system.
I am sure that everything you put in is used to train the models further, and regardless of whatever assurances they give you, somewhere there is probably a clause through which you gave them permission to do exactly that.
I do not think government institutions, banks, companies in the medical field, or those in other key sectors should be using AI this way.
It is still astonishing that all governments have used Microsoft up to now for operating systems and other applications while they were funneling information through servers in the US.
Local
I think local models would have been even more popular if we were not currently in the middle of a RAM shortage that will last until the end of the year, or maybe longer. It is pretty much the same situation with GPUs—you cannot really find anything at a reasonable price.
AMD and others have mini PC models with a lot of RAM, designed specifically to be used with local LLMs by taking advantage of unified memory.
And the selection of open-source models you can install is very large.
A lot of the time, all you need is autocomplete on steroids, not something that builds your entire application from a prompt.
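As a minimal sketch of what that “autocomplete on steroids” can look like with a local model, here is a short Python example against Ollama’s local HTTP API; the model name (qwen2.5-coder) and the snippet are my own assumptions, not a recommendation for any particular setup:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "qwen2.5-coder"  # assumption: any local code model you have pulled

def complete(code_prefix: str, max_tokens: int = 64) -> str:
    """Ask the local model to continue a snippet of code."""
    response = requests.post(
        OLLAMA_URL,
        json={
            "model": MODEL,
            "prompt": code_prefix,
            "stream": False,  # return the whole completion at once
            "options": {"num_predict": max_tokens, "temperature": 0.2},
        },
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["response"]

print(complete("def parse_csv_line(line: str) -> list[str]:\n"))
```

Wire something like this into an editor’s completion hook and nothing ever leaves your machine.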
But it is incredibly expensive to buy something with more RAM right now. For example, my laptop costs more now than it did last year when I bought it, and a lot of people are forced to stay on cloud subscriptions instead of running models locally.
Small and specialized variants
I do not think you need LLMs that do everything well.
I have never really understood why all the closed-source companies are competing to make one model that performs well on every benchmark instead of building separate, specialized ones.
One possibility is that those closed models are in fact built so that only a subset of their parameters is active depending on the task, in the style of a mixture-of-experts architecture. In that case, yes, it is one single model, but only part of it runs when the request is about programming, and maybe another part when you need a marketing plan.
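To make that concrete, here is a toy sketch of top-1 expert routing, loosely in the style of a mixture-of-experts layer; the two “experts” and the router here are invented purely for illustration, not how any specific closed model works:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # toy embedding size

# Two invented "experts": in a real MoE these are separate feed-forward blocks.
experts = {
    "code": rng.normal(size=(DIM, DIM)),
    "marketing": rng.normal(size=(DIM, DIM)),
}

# The router scores each expert for a given input; only the winner runs.
router = rng.normal(size=(DIM, len(experts)))

def forward(x: np.ndarray) -> np.ndarray:
    scores = x @ router                             # one score per expert
    probs = np.exp(scores) / np.exp(scores).sum()   # softmax over experts
    chosen = list(experts)[int(np.argmax(probs))]   # top-1 routing
    print(f"routing to: {chosen} (p={probs.max():.2f})")
    return experts[chosen] @ x                      # the other expert's weights stay idle

forward(rng.normal(size=DIM))
```

The point is that the total parameter count can be huge while the compute per request stays closer to that of a single specialized model.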
Still, I think that from a cost-efficiency standpoint, I would rather have a model that I know is dedicated to programming, and that is the one I pay for or have installed locally and connected to VSCode.
Such a model would be easier and cheaper to train and would cost less to use, but because OpenAI and Anthropic are racing to go public at the end of the year, we get these expensive models that are good at everything instead.
Mistral
In a way, I also wanted to get to Mistral, because it seems like an interesting case to me. They have not had top-tier models so far, but theirs have been fairly decent. There is no way to compete with the US or China when you do not have their money and you are also trying to respect copyright in the data used for training.
What they did instead was diversify, with a whole range of models specialized for various tasks that you can use for those specific actions.
I have been using Le Chat for quite a while for more general things, and their API with the Small 4 or Medium model for an application with an integrated bot that checks certain content. It is not Claude, but it is good enough; it does not get everything right on the first try, but you can make it work.
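For reference, the kind of call I mean is nothing exotic. Here is a hedged sketch against Mistral’s chat completions endpoint; the model alias and the moderation prompt are assumptions of mine, not the exact ones from that project:

```python
import os
import requests

API_URL = "https://api.mistral.ai/v1/chat/completions"
API_KEY = os.environ["MISTRAL_API_KEY"]

def check_content(text: str) -> str:
    """Ask a small Mistral model whether a piece of user content is acceptable."""
    response = requests.post(
        API_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "mistral-small-latest",  # assumption: whichever Small/Medium alias fits
            "messages": [
                {"role": "system", "content": "You moderate content. Answer only ALLOW or REJECT."},
                {"role": "user", "content": f"Is this content acceptable?\n\n{text}"},
            ],
            "temperature": 0,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

print(check_content("Totally normal product review, nothing shady."))
```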
Now, with the newer model, Medium 3.5, it is getting close to or even surpassing Sonnet 4.5 in certain situations. If those comparisons hold up in real use and not just in benchmarks, then we have a European model that can genuinely be used as an alternative to US offerings. What is still missing is good VSCode integration; in the meantime you can use Mistral Vibe, which has one of the coolest presentation pages out there. Whoever did Mistral’s design really nailed it.
Another thing I find interesting is Forge, also from them, for creating models specially trained on a client’s own data. It seems to be aimed at institutions and companies focused on security: in the end you host it yourself and do not depend on a cloud provider you cannot control, so you have absolute control over the information that goes into and out of the system.