The future is CLI, not MCPs

A couple of months ago I changed jobs, and at my current gig we use LLMs. Like, a lot.

We’ve seen great things come out of these tools, but also a lot of slop. We’ve learned where LLMs can help us become more productive and deliver more, while also seeing their limits and where we need to tread carefully. We also run our own MCP server to help integrate our platform with LLMs and AI tools, but we quickly ran into the challenge of managing context.

In late 2024, Anthropic released the Model Context Protocol (MCP), and I don’t think it has lived up to expectations. On paper it felt like a great way for tools to interact with each other and to easily provide context and actions to LLMs, but we’re finding out how quickly context management becomes hard when you roll out an MCP server for your product.

On one hand, you want to ship as much detail as possible through your MCP server; on the other, you’re taking up a lot of context that is pre-loaded on chat initiation.
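To get a feel for that trade-off, here is a rough back-of-the-envelope sketch. The tool definitions are made up (they only mimic the name/description/inputSchema shape an MCP server advertises), the `foo` CLI is a placeholder, and the "four characters per token" rule is a crude assumption, not a real tokenizer.

```python
import json

# Hypothetical tool definitions, shaped like the name/description/inputSchema
# entries an MCP server advertises. None of these tools are real.
TOOLS = [
    {
        "name": f"resource_{i}_search",
        "description": "Search resources by free-text query, with optional filters.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Free-text query"},
                "limit": {"type": "integer", "description": "Max results"},
            },
            "required": ["query"],
        },
    }
    for i in range(40)
]


def rough_tokens(text: str) -> int:
    """Crude estimate (assumption): about 4 characters per token."""
    return len(text) // 4


# Everything in TOOLS is sent up front, whether or not the chat ever uses it.
preloaded = rough_tokens(json.dumps(TOOLS))
# A CLI only needs a short pointer; details are fetched later, if at all.
cli_hint = rough_tokens("Use the `foo` CLI; run `foo --help` to discover commands.")

print(f"preloaded tool definitions: ~{preloaded} tokens")
print(f"one-line CLI hint: ~{cli_hint} tokens")
```

The exact numbers don't matter much; the point is that the first figure grows with every tool you add, while the second stays flat.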

As I’ve used LLMs more and more, I’m coming to the conclusion that the best interactions with these tools are actually CLI-driven. Off the top of my head: the GitHub CLI (gh), kubectl (the Kubernetes command-line tool for interacting with clusters), psql, and the bash toolset. The latter may sound weird, but it’s really amazing what LLMs can do with decades-old tools (awk/sed/grep/find -h) and, most of the time, the file manipulations they do with them. Not surprisingly, the best ones are developer-oriented tools, or at least tools aimed at tech-savvy users. That doesn’t mean your product can’t become more tech-savvy too, though I’m unsure if it’s worth it.

During this time, I’ve also seen some companies doubling down on APIs and OpenAPI specs to make it easier to interact with their product. This helps quite a lot, but a well-written CLI may be a lot easier for LLMs to work with than even a great OpenAPI spec.

What I think sets CLI approaches apart from classical MCP servers or APIs is mostly context. CLIs are usually small programs because they are aimed at humans across all skill levels (or so they should be). The possibilities branch out into separate commands, and each command does only one thing, or so it should, as per the Unix philosophy¹.

But more importantly for the context of this post, their help texts do not ship with the agent by default and are only used when needed.
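A minimal sketch of what "only when needed" can look like from the agent side: nothing about the tool sits in context until something asks for its help text. The `help_text` helper is my own illustration, and the Python interpreter is used as a stand-in command simply because it is guaranteed to be installed wherever this runs.

```python
import subprocess
import sys


def help_text(argv_prefix: list[str]) -> str:
    """Run `<command> --help` and return whatever it prints."""
    result = subprocess.run(
        [*argv_prefix, "--help"],
        capture_output=True,
        text=True,
        timeout=10,
        check=False,
    )
    # Some tools print help to stderr, so keep both streams.
    return result.stdout + result.stderr


# Stand-in command: the Python interpreter itself. An agent would call this
# with something like ["kubectl", "logs"] only at the moment it needs it.
text = help_text([sys.executable])
print(text.splitlines()[0])
```

The cost is one extra tool call when the help is needed, in exchange for zero context spent when it isn't.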

Because I use kubectl frequently, and need to go through the security layer to access a machine in staging or production, I’ve written a bunch of agent skills to perform common operations. In fact, we’ve adopted this practice where we write internal docs explaining to other engineers how to do these things, and link those docs to Claude’s skills — if it works for a human, it should work for the LLM too. This way, once the skills are enabled, they can easily be picked up on demand and used accordingly.

For instance, an internal doc could look like this:

    # Debugging Kubernetes Logs

    ## Authentication

    This is what you should do to authenticate:
    `foo bar login`

    ## Namespaces

    Assuming you've authenticated successfully, you should be able to issue any `kubectl` commands.
    In cluster `xyz`, these are the relevant namespaces to me:
    - namespace1: some type of environment or relevant context
    - namespace2: ...
    - ...: ...

This works for the LLM too.
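Here is a small sketch of the "picked up on demand" idea, under my own assumptions: one Markdown doc per skill in a folder, with the first heading doubling as the short description the agent sees up front. The file layout and the `index_skills` / `load_skill` helpers are hypothetical and are not how any particular agent product implements skills.

```python
from pathlib import Path
import tempfile


def index_skills(root: Path) -> dict[str, str]:
    """Map skill name -> first heading line; bodies are NOT loaded here."""
    index = {}
    for doc in sorted(root.glob("*.md")):
        with doc.open() as fh:
            first_line = fh.readline()
        # Drop the leading "#" markers and surrounding whitespace.
        index[doc.stem] = first_line.lstrip("# ").strip()
    return index


def load_skill(root: Path, name: str) -> str:
    """Read the full doc only when the agent decides it needs it."""
    return (root / f"{name}.md").read_text()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical skill doc, mirroring the internal doc example above.
    (root / "k8s-logs.md").write_text(
        "# Debugging Kubernetes Logs\n\n## Authentication\n\n`foo bar login`\n"
    )
    index = index_skills(root)      # cheap: one line per skill
    body = load_skill(root, "k8s-logs")  # full doc, loaded on demand
    print(index)
```

Only the index needs to live in context all the time; the doc body is pulled in when a task actually calls for it.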

kubectl specifically is part of the LLMs’ training data. However, as one explained to me:

Reasoning from patterns: More than raw memorization, I learned the structure:

kubectl (get, describe, logs, exec)

Common flags (-n, --previous, -o yaml)

Typical debugging workflow: find the broken pod → describe it → get logs → maybe exec in

Tools that have predictable patterns become very easy for the agent to reason with and perform the actions one would expect.
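The debugging workflow quoted above maps almost one-to-one onto commands, which is a big part of why it is easy to reason about. As an illustration, this sketch only builds the command lines (it does not run them); the pod and namespace names are placeholders.

```python
import shlex


def debug_commands(pod: str, namespace: str) -> list[list[str]]:
    """Build (not run) the kubectl calls for the usual debugging workflow."""
    ns = ["-n", namespace]
    return [
        # Find the broken pod.
        ["kubectl", "get", "pods", *ns],
        # Inspect its events and state.
        ["kubectl", "describe", "pod", pod, *ns],
        # Logs from the previous (crashed) container instance.
        ["kubectl", "logs", pod, *ns, "--previous"],
        # Maybe exec in.
        ["kubectl", "exec", "-it", pod, *ns, "--", "sh"],
    ]


for cmd in debug_commands("my-pod", "namespace1"):
    print(shlex.join(cmd))
```

Every step has the same shape: a verb, a resource, then the same namespace flag.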

Should this only work for tools in the training data? I don’t think so — as long as your CLI has a predictable pattern and approach, paired with good docs, you should be all set to leverage it with LLMs.
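As a sketch of what "predictable" could mean for a CLI that is not in anyone's training data, here is a tiny noun-verb layout built with Python's argparse. The `foo` tool, its resources, and its verbs are all hypothetical; the only point is that every resource accepts the same verbs and the same output flag, so learning one command teaches you the rest.

```python
import argparse

RESOURCES = ["project", "deploy"]  # hypothetical nouns
VERBS = ["list", "get", "delete"]  # the same verbs for every noun


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="foo", description="Hypothetical product CLI.")
    nouns = parser.add_subparsers(dest="resource", required=True)
    for resource in RESOURCES:
        noun = nouns.add_parser(resource, help=f"Manage {resource}s")
        verbs = noun.add_subparsers(dest="verb", required=True)
        for verb in VERBS:
            cmd = verbs.add_parser(verb, help=f"{verb} {resource}s")
            if verb != "list":
                # get/delete act on a single named resource.
                cmd.add_argument("name", help=f"{resource} name")
            # Same output flag everywhere, so scripts and agents can rely on it.
            cmd.add_argument("-o", "--output", choices=["text", "json"], default="text")
    return parser


args = build_parser().parse_args(["project", "get", "demo", "-o", "json"])
print(args.resource, args.verb, args.name, args.output)
```

Because argparse generates `--help` at every level (`foo --help`, `foo project --help`, `foo project get --help`), the docs can be discovered one step at a time instead of being loaded all at once.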

Now I’m wondering how this will turn out though. Command-line interfaces don’t necessarily have a framework or unique convention people follow — it’s very much each to their own. And, in fact, some CLIs and their corresponding man pages are notoriously bad² — even those that followed the Unix philosophy.

I don’t know how non-developer tools fit in this box right now either. How should the industry handle non-tech tools? I’m not sure investing in a proper CLI + docs is really worth it, just for the sake of LLMs leveraging them.

Perhaps non-tech tools will continue to embrace AI in a totally different manner — think Photoshop, Office Suite tools. These usually embed AI in their platform directly, allowing users to interact with an agent and request things to do in the work at hand. It doesn’t make sense, at all, to develop a CLI to perform these operations. But the developer tools industry? It absolutely does.