State and future of AI-assisted development
Over the last few months, I've been trying to program using large language models. I want to collect my impressions in a series of notes about which ideas seemed successful and which seemed inconvenient or even turned out to be dead ends.

Getting started
I've tried various tools for this, including open-source tools like Aider and Cline and commercial tools like Cursor and Windsurf. These tools have primarily shaped my experience. This isn't an endorsement of any particular product; they simply form the basis of my observations.
My experience comes from writing a RAG (retrieval-augmented generation) application in Rust. The code base I'm experimenting with is about 7,000 lines of Rust, and it went through many iterations (thousands of lines were changed repeatedly). Practically speaking, this is very different from one-off LLM requests like "Write me Tetris in Python": it isn't something that has been done a thousand times as student homework and mined from GitHub. The code base had to be maintained and improved through numerous iterations while never being fully built out or documented, much like a project would evolve in a human team. This focus matters because it illustrates where I'm coming from: the higher-level work with code, the ability to understand what needs to be done, analyze the code, develop changes, test them, think about data constraints in the database layer, and so on.
The core process of software development
The most crucial issue I saw is the core loop: the mismatch between the software development process and the chat-like UI of LLM tools. Software development has a graph-like structure of trial and error, experimentation, and rejection of patches. We try new approaches, see new constraints, backtrack, validate, and iterate. It maps well onto version control systems like Git, allowing us to try different ideas that eventually must be combined. Sometimes they conflict or block each other, and ultimately they need to be resolved from a zoomed-out perspective. This is not a series of linear interactions like a chat, where patches or actions are applied sequentially.
The glaring difference is that each conversation with the chat has its own context, which usually doesn't transfer well between multiple chats. You end up exhausting the token limit, only to face the chore of manually carrying context over into the next chat. It would work better to have a control flow that inherits part of the context and appends new information. It might be more expensive, but it is necessary for the developers these tools are targeted at.
Automating this as a graph structure with graph traversal operations would give the user back power and transparency.
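To make this concrete, here's a rough sketch of what I mean by a graph-shaped interaction, in Rust (the language of my experiments). All the types here are illustrative, not taken from any existing tool: each node of the development graph inherits a compressed slice of its parent's context within a token budget instead of replaying the whole chat.

```rust
use std::collections::HashMap;

#[derive(Clone)]
struct ContextItem {
    summary: String, // a compressed piece of project knowledge
    tokens: usize,   // rough cost of carrying it forward
}

struct Node {
    parent: Option<u64>,
    prompt: String,            // what we asked at this step
    patch: Option<String>,     // the diff produced at this step, if any
    context: Vec<ContextItem>, // the context visible at this node
}

struct DevGraph {
    nodes: HashMap<u64, Node>,
    next_id: u64,
}

impl DevGraph {
    fn new() -> Self {
        Self { nodes: HashMap::new(), next_id: 0 }
    }

    /// Start a new root node with freshly extracted project context.
    fn root(&mut self, prompt: &str, context: Vec<ContextItem>) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.nodes.insert(id, Node {
            parent: None,
            prompt: prompt.to_string(),
            patch: None,
            context,
        });
        id
    }

    /// Branch off an existing node, inheriting only as many context items
    /// as fit into a token budget instead of replaying the whole chat
    /// history into a fresh conversation.
    fn branch(&mut self, parent: u64, prompt: &str, token_budget: usize) -> u64 {
        let mut inherited = Vec::new();
        if let Some(p) = self.nodes.get(&parent) {
            let mut used = 0;
            for item in &p.context {
                used += item.tokens;
                if used > token_budget {
                    break;
                }
                inherited.push(item.clone());
            }
        }
        let id = self.next_id;
        self.next_id += 1;
        self.nodes.insert(id, Node {
            parent: Some(parent),
            prompt: prompt.to_string(),
            patch: None,
            context: inherited,
        });
        id
    }
}
```

With something like this, trying two competing approaches is just two `branch` calls off the same parent, and abandoning one of them doesn't contaminate the context of the other.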
How it may be done
The basic principle would be to build up some context by extracting information about our project's code before turning to the LLM to "solve" it. We extract that information and push it into the LLM's context window RAG-style, preferably in the most compressed form. Then we make a request explaining what needs to be done and what needs to be changed. Once it starts producing code, though (no different from a human producing code), we encounter various kinds of resistance from the codebase: compilation errors, warnings, and problems we didn't notice before, which must be solved to continue solving the original problem. This is where the tools start to break down and leak meaning into nothingness.
This all resembles control flow in programming, or a heavily branching graph structure. The task becomes a series of tasks and subtasks we must pay attention to, test against, and track.
In practice, however, we feed every new thing into the LLM's context window, effectively advancing the conversation thread. Once the context window runs out of tokens, we have to start fresh, carrying over only the most essential parts of the context.
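Here's a minimal sketch of that control flow. The `Retriever`, `Model`, and `Workspace` traits are hypothetical stand-ins for RAG-style context extraction, the LLM call, and the development environment that applies patches and reports compiler/test feedback.

```rust
struct Feedback {
    compiles: bool,
    errors: Vec<String>, // compiler errors, warnings, failing tests
}

trait Retriever {
    /// Extract the most relevant, compressed slices of the codebase.
    fn gather_context(&self, task: &str) -> String;
}

trait Model {
    /// Ask the LLM for a patch given the task and the current context.
    fn propose_patch(&self, task: &str, context: &str) -> String;
}

trait Workspace {
    fn apply(&mut self, patch: &str);
    fn check(&self) -> Feedback; // e.g. run the compiler and test suite
}

/// Each round of feedback becomes a subtask appended to the context.
/// Without a bound on rounds, this is exactly the runaway loop
/// discussed further below.
fn solve(
    task: &str,
    retriever: &impl Retriever,
    model: &impl Model,
    ws: &mut impl Workspace,
    max_rounds: usize,
) -> Result<(), Vec<String>> {
    let mut context = retriever.gather_context(task);
    for _ in 0..max_rounds {
        let patch = model.propose_patch(task, &context);
        ws.apply(&patch);
        let feedback = ws.check();
        if feedback.compiles && feedback.errors.is_empty() {
            return Ok(());
        }
        // The new problems become subtasks: fold them into the context
        // instead of silently growing an endless chat thread.
        context.push_str("\nResistance from the codebase:\n");
        context.push_str(&feedback.errors.join("\n"));
    }
    Err(vec![format!("gave up on '{task}' after {max_rounds} rounds")])
}
```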
My thesis is that this can be improved by using multiple agents that track the specifications, compose execution plans, execute them, and verify the solutions against them and all the other invariants we must keep. This way, we can better separate concerns and reduce the pressure on any individual context window. To be truly useful, though, the result has to be explorable and traversable by the user, so we can "bisect" the changes down to the point where things went wrong and prune the incorrect branch of development. That seems more straightforward than untangling the interaction between the incorrect code and all the changes that followed it.
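The "bisect" part can lean on the same machinery `git bisect` uses. A sketch, assuming the assistant records every accepted patch as a checkpoint that can be re-verified (for example, by checking out that revision and re-running the test suite); the names are illustrative.

```rust
struct Checkpoint {
    commit: String,          // VCS revision the agent produced
    spec_items: Vec<String>, // which requirements it claims to satisfy
}

/// Find the first checkpoint at which verification starts failing, so
/// the user can prune that branch instead of untangling everything that
/// was built on top of it. Assumes a single breaking point, as in
/// `git bisect`: earlier checkpoints pass, later ones fail.
fn bisect_failure(
    history: &[Checkpoint],
    verify: impl Fn(&Checkpoint) -> bool, // e.g. checkout + run the tests
) -> Option<usize> {
    let (mut lo, mut hi) = (0usize, history.len());
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        if verify(&history[mid]) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    if hi < history.len() { Some(hi) } else { None }
}
```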
How it felt
Programming with an LLM feels like working with a very enthusiastic junior developer who spits out code in seconds but frequently misses the bigger point or inevitably loses attention to detail. In fact, it spits out code so fast that reviewing it becomes a time-intensive chore, which intensifies even more when you try to follow up with all the feedback and suggestions you can give. It makes real progress on 60-70% of the coding challenges and creates a tangled web of issues across the rest. Importantly, reviewing this code is very different from writing it yourself: you have to infer the purpose and reasoning behind what's written. I lack the proper tools to do this competently, and the effect is overwhelming. LLM assistance produces a large number of changes that are difficult to track, and you rapidly lose familiarity with the code base, making those changes even harder to review and fix.
Hard constraints
For this reason, the interaction cycle between the user and the programming assistant must be carefully reconsidered. As a user, I'd rather answer the additional questions an architect or system analyst would ask me as a product owner than go through a wall of code searching for implicit decisions made without my input. These questions can be turned into verifiable constraints (type-checking, test running) and an agentic review computed before the machine comes back to me with a diff. I'd rather waste some compute cycles than tire myself out trying to review so much code.
In fact, you spend most of your time repeating "fix this"-style queries and pointing at mistakes. This is a sign that it's decent at writing code but doesn't get how the whole process fits together. For example, keeping the existing features of the software we're working on up and running doesn't come easily to the LLM's "mind" while it works on a new feature. Something should happen automatically before the flow of development returns to the user asking for feedback. It's not rocket science to run a type checker or some tests, yet this is lacking (or rarely performed) in most of the tools I've seen.
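For a Rust project like mine, such a pre-review gate can be as simple as shelling out to `cargo check` and `cargo test` and only surfacing the diff to the user when both pass. A sketch (the `GateReport` shape and the exact flags are just illustrative choices):

```rust
use std::process::Command;

struct GateReport {
    type_check_ok: bool,
    tests_ok: bool,
    log: String,
}

/// Run a command and collect its success flag plus combined output.
fn run(cmd: &str, args: &[&str]) -> (bool, String) {
    match Command::new(cmd).args(args).output() {
        Ok(out) => {
            let mut log = String::from_utf8_lossy(&out.stdout).into_owned();
            log.push_str(&String::from_utf8_lossy(&out.stderr));
            (out.status.success(), log)
        }
        Err(e) => (false, format!("failed to run {cmd}: {e}")),
    }
}

/// The cheap, machine-executable checks that should happen before the
/// diff ever reaches the user.
fn pre_review_gate() -> GateReport {
    let (type_check_ok, check_log) = run("cargo", &["check", "--all-targets"]);
    let (tests_ok, test_log) = run("cargo", &["test", "--quiet"]);
    GateReport {
        type_check_ok,
        tests_ok,
        log: format!("{check_log}\n{test_log}"),
    }
}

fn main() {
    let report = pre_review_gate();
    if report.type_check_ok && report.tests_ok {
        println!("ready for human review");
    } else {
        // Feed the log back to the assistant instead of the user.
        println!("send back to the agent:\n{}", report.log);
    }
}
```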
However, if you automate this, another catch arises:
Suppose your assistant changes code in a loop without supervision while getting feedback from the development environment (compiler, test suites). The changes accumulated between interaction points will grow, and they can turn into a pile that isn't practical to review. In my experience so far, these changes create more problems than they solve, or at least a comparable number.
This is a form of Sisyphean labor, or stepping on the same rake over and over because the LLM keeps making it and throwing it under your feet. It should motivate you to wrap the code with some unit or integration tests, but I've seen no tools that do this without user intervention, even though it should be the most basic project setting.
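One simple guard I'd like to see: a budget on how much change can accumulate between interaction points, so the loop pauses for human review before the diff grows beyond what a person can absorb. A sketch with made-up names and thresholds:

```rust
/// Tracks how much change has piled up since the user last saw a diff.
struct ChangeBudget {
    max_changed_lines: usize,
    max_touched_files: usize,
    changed_lines: usize,
    touched_files: usize,
}

enum LoopDecision {
    KeepGoing,
    StopForReview(String),
}

impl ChangeBudget {
    fn new(max_changed_lines: usize, max_touched_files: usize) -> Self {
        Self { max_changed_lines, max_touched_files, changed_lines: 0, touched_files: 0 }
    }

    /// Called after every patch the unsupervised loop applies.
    fn record_patch(&mut self, lines: usize, files: usize) -> LoopDecision {
        self.changed_lines += lines;
        self.touched_files += files;
        if self.changed_lines > self.max_changed_lines
            || self.touched_files > self.max_touched_files
        {
            LoopDecision::StopForReview(format!(
                "accumulated {} changed lines across {} files; pausing for human review",
                self.changed_lines, self.touched_files
            ))
        } else {
            LoopDecision::KeepGoing
        }
    }
}
```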
After observing these different approaches and how they stumble in slightly different places, my core observation is that we're building a faster horse instead of making a paradigm shift. If we're going to apply our new computing toys to software development, let's not try to make a better programmer; let's teach the machine how a team of human experts would approach the work and productize it. I'd rather pay for extra compute cycles and keep my energy and sanity. Ideally, I want to create a task, answer architectural questions to give the specification some rigor, and only interact further when needed.
Industry lessons in dealing with complexity
Just think about how real-world software development teams cope with complexity. Projects grow large and complex quickly, and it's easy to reach the point where no single person knows the whole project inside out. At that point we scale, making roles explicit and separating concerns. Why? Because we have limited context, time, compute, and attention. Does that remind you of something? Context size, attention, compute: these are the same terms we use to describe LLMs and their constraints. That's why it's essential to recognize how this job should be divided into multiple roles (agents) to tackle the complexity that trips up existing tools. It's an abstraction at the organizational layer, the one we use to scale labor when it's no longer possible to scale it any other way.
Instead of creating a helper for the programmer, we can create a virtual programmer for pair programming or, even better, an entire virtual team, which may work much the same way. Along the way, we leverage the knowledge about software development methodologies that is already in the data corpus used to train every relevant LLM offering.
What happens when we onboard a new person to a team? We provide helpful information in documentation, contribution guidelines, Continuous Integration (CI) pipelines, and various machine-executable tests. Why would we onboard an AI member any other way? Maybe in the future we'll be able to delegate inferring all of that to the AI, but the current state suggests there are limitations, and we don't really know how to help it without a major revision of the UI.
Organization abstraction
One can even argue that software developers can be viewed functionally as one-person teams:
- They need to act as analysts, considering the requirements of what they're implementing and working through details that weren't explicitly described anywhere.
- They need to think like QA engineers to understand how to verify the changes they're making without breaking critical logical paths and/or existing functionality.
- They need to think like architects, noting what else needs to be done and which changes will hinder or complicate other steps required to implement future tasks and features.
If we look at this as the interaction of a real team, we see that different people perform different roles to critically examine the same changes from various points of view. QA engineers, developers, architects, and operations people (DevOps) pay attention to different nuances that together balance the quality of the produced intellectual material. This is a good starting point for several reasons. First, it's easier to conceptualize and integrate at the application level: if you already have a living, evolving project, these actors most likely already exist, and adding virtual actors of this kind can supplement the development process rather than reinventing it from scratch.
Another point is that we have an extensive list of methods for maintaining quality during further development, and the same techniques can be applied through multi-agent LLM usage. First, the models are all trained on datasets that describe these methodologies. Second, the methodologies are well enough understood by the people who will guide the agents in the right direction.
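As a sketch of how that could look in code: the same diff gets passed through several role-specific reviewers, each an LLM call with its own prompt and its own slice of context, and anything blocking goes back to the executing agent before a human ever sees it. The trait and types are my own illustration, not an existing API.

```rust
#[derive(Debug)]
enum Role {
    Analyst,   // are the requirements actually covered?
    Developer, // is the code correct and idiomatic?
    Qa,        // how do we verify it without breaking existing paths?
    Architect, // does it hinder or complicate future work?
    DevOps,    // can we build, deploy, and observe it?
}

struct Finding {
    role: Role,
    blocking: bool,
    note: String,
}

trait RoleReviewer {
    fn role(&self) -> Role;
    /// Each reviewer is an LLM call with a role-specific prompt and its
    /// own slice of context, so no single context window carries it all.
    fn review(&self, diff: &str, spec: &str) -> Vec<Finding>;
}

/// Collect every role's findings; anything blocking goes back to the
/// executing agent before the human ever sees the diff.
fn team_review(diff: &str, spec: &str, team: &[Box<dyn RoleReviewer>]) -> Vec<Finding> {
    team.iter()
        .flat_map(|member| member.review(diff, spec))
        .collect()
}
```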
Benefits for the end user
Imagine yourself as a business owner. To get an edge, you must employ software specialists to write and maintain your systems. Sometimes the engineering lags behind the business, and you look for ways to scale it. At this point, you start hiring more developers, outstaffing, and outsourcing, which comes with a material cost in time and money. These processes take time because you must make sure new people fit the team profile, onboard them, and give them time to become great contributors. The time benefit and organizational ease of having a virtual teammate are enormous. We still have to explain things to it and onboard it, but all of that is an order of magnitude faster than finding people to do the work!
You can disagree by saying: "I'm a software developer, not a business owner; I care about simplifying my work." Here's a take for you: it's precisely the same! You build the software product by producing intellectual labor, and most of that labor may end up directed at conversing with an AI instead of fiddling with code directly. As software developers, we have to learn new tools all the time, so there's nothing new in that regard.
This also composes if you think of a person, or a small development team, as a unit. We already have processes for adding such units to a team, and synchronization and consensus protocols between teams. Let's leverage them!
There's a generational potential here. AI progress will eventually lift us up and improve our offerings, but we must build smarter to bootstrap smart enough builders!