AI-Assisted Coding Field Reports

Experience with AI-assisted coding varies widely across projects, teams, and individual developers. Laura Tacho's talk How Orgs Actually Win With AI at the Pragmatic Summit presented some striking statistics on the bimodal distribution of outcomes.

Stats are insightful but can be hard to relate to. A recent Hacker News discussion, How is AI-assisted coding going for you professionally?, offers a direct and candid window into the current state of things.

If you don't want to wade through the whole 500+ comment thread yourself, here's a summary:

What works

Greenfield and small-scope projects

The clearest wins are on new, small projects: personal projects, prototypes, throwaway scripts, and internal tools. Simon Willison reports the majority of his code since November 2025 has been agent-written, much of it from his iPhone, with years-old project ideas becoming afternoon hacks. A brewery owner and former developer built five business apps, cutting monthly bookkeeping from 16 hours to about 3, plus production reports, a rewards tracker, and a TV menu system. A freelancer estimates roughly 3x solo productivity. One commenter says they've taken on a second part-time job entirely thanks to the tools.

(Related: Vibe-coding a startup MVP from scratch)

Well-structured codebases with strong conventions

In codebases like these, incremental feature work can often be one-shotted when the developer plans carefully and reviews everything. One team invested heavily in making a large Elixir app (3,000+ modules) "AI-native" with detailed skill files and tooling, and reports that incremental changes mostly succeed on the first try using plan mode. Another developer describes a workflow on two greenfield web apps (Preact, Go, PostgreSQL) where 80% of the time he writes a paragraph describing what he wants, reviews the output, and it's ready to ship. In both cases, the developers explicitly note this is not vibecoding: they maintain a strong architectural vision and guide the AI like a junior programmer.

Codebase navigation and investigation

Multiple developers report that asking questions like "which functions touch this table" or "what are all the authentication paths in this Rails monolith" returns reliably useful answers. Bug-hunting in large codebases, framework boilerplate, dependency-update breakages, test maintenance after refactors, and regex generation are all areas with reported clear speedups. Cross-language context switching (jumping between TypeScript, Python, and Go without the cognitive cost of remembering idioms and APIs) saves one developer 30–40 minutes daily. AI-assisted code review catches things humans miss, particularly as a complement to human review rather than a replacement.

What doesn't

Work spanning multiple systems

The most consistent failure is on anything that spans system boundaries: bugs crossing client-server divides, queue producers and consumers, frontend state propagating through API calls to cache invalidation. The AI addresses one layer confidently and silently ignores the rest. On large existing codebases at big companies, several experienced engineers report near-zero success getting working commits. One FAANG engineer adds that they don't personally know any ICs who have successfully used the tools at work, despite endless internal posts claiming 10x productivity.

Superficially correct code

AI-generated code reliably looks correct on the surface but hides problems underneath: unnecessary complexity, reinvention of existing functions, over-cautious exception handling, and sometimes dangerous substitutions. One team lead gives a specific example: Claude replaced an HTML sanitizer with a custom regex that passed all tests and matched the spec, but was fundamentally unsafe. A veteran freelancer says careful review of AI code always turns up conceptual errors, performance issues, and maintainability problems.
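To make the sanitizer anecdote concrete, here's a hypothetical sketch (not the team's actual code) of why a regex "sanitizer" can pass every obvious test while remaining fundamentally unsafe:

```python
import re

def regex_sanitize(html: str) -> str:
    """Hypothetical regex 'sanitizer' of the kind described above:
    it strips <script> tags and looks safe on simple inputs."""
    return re.sub(r'<script\b[^>]*>.*?</script>', '', html,
                  flags=re.IGNORECASE | re.DOTALL)

# Passes the obvious test case:
print(regex_sanitize('<p>hi</p><script>alert(1)</script>'))
# -> <p>hi</p>

# But an event-handler payload contains no <script> tag at all,
# so it sails through untouched and still executes JavaScript:
print(regex_sanitize('<img src=x onerror=alert(1)>'))
# -> <img src=x onerror=alert(1)>
```

A real sanitizer operates on a parsed DOM with an allowlist of tags and attributes (the approach libraries like DOMPurify or bleach take), which is exactly why "it passed all the tests" is weak evidence of safety here.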

Review overload

The review burden this creates is a major pain point. Juniors and less careful developers now produce large, plausible-looking changesets that take enormous effort to properly vet. One developer describes spending over a week untangling an AI-coded feature that didn't respect the main project's API design, a task that made them look slow compared to the team that generated the original code quickly. AI-generated documentation fares even worse: one commenter reports teammates' AI-written docs are 100% hallucinated, requiring painstaking cross-checking against the actual code, and concludes that no doc is sometimes better than a bad doc.

(Related: Rethinking code reviews in the age of AI)

Takeaways

Expertise is (still) a prerequisite

Nearly every positive report comes with the caveat that the developer has deep domain knowledge, understands their codebase, and has a mental model of the desired implementation before prompting. The counter-case is a 4-year developer on an "AI-First" team told not to write code themselves, now shipping PRs they don't understand, in technologies they've never used, on a codebase where even the original architects acknowledge they don't understand parts of their own AI-built stack.

Model generation and harness quality matter greatly

One developer reports Opus 4.5+ with 1M context was a tipping point — before that, they agreed entirely with negative assessments. Another notes Amazon restricts developers to internal tools while Meta allows best-of-breed external ones, producing very different outcomes within the same industry tier. Teams that invest in making repos "AI-native" with CLAUDE.md files, skills, and automated review pipelines report much better results than those that don't.
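For readers who haven't seen one, a CLAUDE.md is just a plain markdown file at the repository root that Claude Code reads as standing instructions at the start of a session. The contents below are purely illustrative, not from any team in the thread:

```markdown
# Project conventions

- Run the full test suite before proposing any change.
- Follow the existing module layout; do not invent new top-level directories.
- Every new endpoint needs a test and a changelog entry.
- Prefer small diffs; ask before touching the auth layer.
```

The point the thread makes is less about any specific file format and more that encoding conventions somewhere the agent reliably reads them is what separates the "mostly one-shot" teams from the rest.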

Skill atrophy is a recurring concern

Multiple developers independently flag that they feel they're not learning anything when using AI tools. One notes that after a vibecoded throwaway script, they couldn't reproduce what the AI did without asking it again. Another describes the tools as addictive, requiring conscious effort to stop. The addictive quality, the laziness effect ("sometimes it's just easier to fire up Claude than to focus"), and the sense that individual coding skill is becoming invisible run through multiple accounts as a quiet undercurrent beneath the productivity gains.

Looking for more insight, or wondering how this might apply to your organisation? Get in touch for a free 30-minute consultation.