Glasseye

Issue 17: September 2025

Sep 24, 2025

In this month’s issue:

Some annotated AI sophistry in the dunghill
Semi-supervised asks you to package things up
The secrets of icy-clear technical prose in the white stuff

Plus Cursor on the command line, a plot twist for last month’s dunghill, and at last some industry interest in the synthetic respondents brouhaha.

The dunghill

One of the British newspapers used to run a regular feature in which they would annotate a political speech or interview to explain to those outside the Westminster bubble what was really going on beneath the surface. I thought it would be interesting to do something similar for a sample of Silicon Valley AI speak.

So here’s a short segment from an interview with Anthropic’s CEO, Dario Amodei, that featured on the BBC’s Radical with Amol Rajan. It’s less than three minutes long, but it is nevertheless crammed full of verbal manoeuvring. I’ve numbered the sections I’m going to comment on.

Dario Amodei: And of course, AI models are starting to write code. About 90% of code at Anthropic is written now or at least suggested by AI models.(1) We use our own AI models internally, and I’ve heard the CEOs of large companies say the same. So we’ve gone from barely being able to put together a sentence to writing a lot of the production code at some of the biggest technology companies in the world, at the level of a scientist, a PhD level scientist.
And the fact that there’s this exponential, that, you know, we’ve gotten to this stage… if the progress continues for even a couple of years beyond that, we may get to levels where the models are capable of doing things like making new biomedical discoveries, proposing a new molecular structure of a drug. We’ve already seen some of that with things like AlphaFold that you’ve seen Google did in the UK. (2) And we’re starting to see LLMs participate in things like this. It’s small, but we’re already working with pharmaceutical companies to use LLMs to speed the approval of clinical trials or something called a clinical study report. (3) And usually that takes nine or ten weeks to do. It’s kind of a summary of the results of a clinical trial. We’ve gotten that time down to less than one week with LLMs. So this is now compressed by eight weeks, the amount of time it takes to approve a drug.
Amol Rajan: And there’s a very good practical example I actually covered on the radio this morning, the day we’re speaking, which is that artificial intelligence is being used to identify the causes of a stroke in people who come to a hospital very, very quickly.(4) Was it a burst blood vessel or was it a blocked blood vessel? That often requires very specific knowledge and artificial intelligence can be used to augment what a doctor does.
You know, it’s interesting just listening to you, right? Because if you follow this field closely as I’ve done, you report on it, you read the books, you listen to the podcast (5), different big players, and you are one of the biggest players, are known for different things. And Anthropic, the clue is in the name, is trying to make through Claude a more humane AI, something that has kind of something that’s a bit more the complete human.

Commentary

(1) “At least suggested by AI models” - this might seem like an innocent enough qualification, but in fact it is doing most of the work. It would be nice to know, for example, what percentage of the 90% was suggested as opposed to taken as read. The reason this matters is that the big sell (and, for many, the big fear) with AI is that it will be deployed in situations where it can act autonomously. For this to become a reality, we need to reach a point where certain types of error (hallucinations and the kind of common-sense errors that occur because LLMs do not inhabit our world) are near enough eradicated. Now it looks like many in Silicon Valley are coming round to the idea that this is not going to be possible, not just practically but also in principle, and this has prompted some quiet back-pedalling. A rather brilliant pivot is to place the responsibility for being right on human beings, and limit the role of an LLM to that of an insightful but unreliable critic, who provides interesting suggestions, all of which must be taken with a pinch of salt. That way, if they suggest something that’s clearly nuts, it can be brushed off with no harm done. This seems to be exactly what Jack Clark, another Anthropic key player, is saying in a recent interview on The News Agents podcast: “You know, today lots of people use these systems [LLMs] to learn, but some of them use these systems to do junk food learning and some of them use these systems to do effective learning. Junk food learning is: upload a research paper to the system and say, tell me what this research paper is about and then read the output. You haven’t actually learned anything there. You’ve just become dependent on the machine in a way that doesn’t help anyone. The way that I use these systems, and many do, is I read a research paper, I write out what I understand that paper to mean and when I upload the paper and my understanding of it to the system and say, do I have this right? And if I don’t have it right, explain to me. That’s useful learning because the system reads the paper, reads my explanation and tells me whether I got it right or wrong, just like a colleague. If we use these things in the right way, they can help us be a lot more capable and a lot smarter.” This is all very true. It is exactly how we use these systems to code, and they are enormously useful. But saying that the LLM is “just like a colleague” is misleading. There is a reason they have been relegated to the passenger seat and are not allowed to touch the steering wheel.
(2) AlphaFold is not an LLM. It does share with LLMs a transformer architecture but one that is specifically designed for the job of predicting protein structures. All credit to Amodei, he doesn’t do the usual thing, which is to pretend that AlphaFold and LLMs are essentially the same thing (see point 4 below) but, as his next sentence shows, he is undoubtedly leaning on its achievements to create the impression that similar successes for LLMs are just around the corner.
(3) A report-writing tool then? Something that suggests or summarises text? Well, it better not be the junk food kind that Jack Clark has just warned us about. But if it is not going to be junk food then, as Clark points out, it needs a human being as the primary authority. This is a very different proposition to AlphaFold, which can operate without supervision. It is a stretch then to say that LLMs are starting to “participate in things like this”, unless you qualify the kind of participation.
(4) This one is on Amol, and just about every journalist who has written or spoken about AI since 2020. The fact is that over the last decade or so the word “AI” has meant:
a. A multidisciplinary project to create autonomous agents that can act intelligently in complex environments (pre-2010).
b. A small but suddenly very successful sub-discipline within that project, i.e. machine learning using deep neural networks (roughly 2010-2015).
c. Just about everything that sat inside a computer (roughly 2015-2020: if I remember rightly, there was a brief interregnum before the arrival of LLMs where no one was quite sure what AI was and anything could be sold under its banner without any mention of a chatbot).
d. An even smaller but phenomenally successful sub-field (LLMs) of the previously mentioned sub-discipline (2020 to now).
So much overuse has left the term very slippery indeed. The most common slip is between what it means to most people now (ChatGPT or a variant) and the many things it has been in the past (and thanks to sense c there’s not much it hasn’t been). This is very fortunate for those who wish to make a strong case for the practical usefulness of AI (in sense d) because they can draw on a, b and c for myriad examples, despite the fact that these examples are only distantly related. I think most people would be surprised to learn that the technology at work in every NHS AI success story (scanning, detecting, imaging, etc), including the one mentioned by Rajan, has nothing to do with generative AI and everything to do with slow and steady progress in a field (machine learning) that has been trundling along for twenty or so years.
(5) From a journalistic point of view, I think this gets to the heart of the problem. The podcasts, the books, the speakers, these are all an obfuscating layer between the journalist and the truth about the technology. And who can blame them for not digging deep enough? There are lots of big names, with convincing qualifications, some with Nobel prizes, telling fascinating, newsworthy stories about the end of humanity. Who would want the truth?

If you have some particularly noxious bullshit that you would like to share then I’d love to hear from you. DM me on substack or email me at simon@coppelia.io.

The white stuff

It is a lazy stereotype about technical people that they are poor communicators. When I get a three-word email with a hundred possible meanings it has invariably come from a “people person” whose charisma has not made it into the text. Nevertheless there is always room for improvement, which is why I’m recommending two books to sharpen your prose. The first is Steven Pinker’s The Sense of Style: The Thinking Person’s Guide to Writing in the 21st Century. Whatever your views on Pinker’s relentless enlightenment positivism, this is an incredibly useful book, for at least two reasons: first, Pinker’s academic background was originally linguistics, which means that he can convincingly give you the ‘why’ as well as the ‘what’ for each linguistic rule (and show pedants the door); second, he is himself a writer of clear and engaging scientific prose (an understatement), and can explain how he has achieved this.

The second book, Clear and Simple as the Truth: Writing Classic Prose, is heavily referenced by Pinker and with good reason - it is an argument for something the authors call the ‘classic style’ - a conversational writing style that emphasises brevity and directness.

It doesn’t matter that you are not personally writing a book, or an article. Both of the above will change the way you write an email or text message; even - under some circumstances - the words that come out of your mouth.

Semi-supervised

Of the data scientists I work with my assumption is that one half started out as software engineers and drifted data-wards, and the other half came from a variety of numerate but not necessarily IT-intensive disciplines: natural sciences (a lot of physics students for some reason), economics, social sciences etc. My unproven, armchair theory is that the latter group gravitates towards Jupyter (or R) notebooks as a way of working since the format here is closest to the more familiar paper or essay format. If this is true then these are the people I’m talking to today. The first group will need no convincing.

Because I’d like to talk up the merits of package building as an approach to modelling, analysis and data science in general. By package building I mean assembling a core of reusable, configurable, documented, tested code with a decent interface. If you are one of the many who entered data science through the notebooks of online data science courses then it might not have even occurred to you that this is a way of working. Or you may have dismissed it as hugely inefficient given your limited, one-off goals. But hear me out.

First, I’m not suggesting you write a Python package for every ad-hoc analysis request that comes your way. That would be silly. Nevertheless there are signs that a package is calling out to you from the pages of your notebook: repeated blocks of code with minor variations, sometimes coagulating into functions that you run at the top of the notebook; a growing feeling that you lack control over a complex problem; the multiplication of notebooks with titles that sound like software processes. (Note that once you are in the habit of package building things will never again get this out of hand - the problem type, or its objectives, will have demanded a package way in advance.)
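To make the first of those signs concrete, here is a minimal sketch of what a run of near-identical notebook cells tends to turn into once it is pulled out into a single configurable function. The function name, field names and data are all hypothetical, invented purely for illustration.

```python
from statistics import mean


def summarise(rows, group_key, value_key, min_count=1):
    """Average value_key per group_key, dropping groups with fewer than min_count rows.

    In a notebook this logic typically appears several times over, each copy
    differing only in the column names or the threshold.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row[value_key])
    return {
        key: mean(values)
        for key, values in groups.items()
        if len(values) >= min_count
    }


# Hypothetical data standing in for whatever the notebook loaded.
sales = [
    {"region": "north", "channel": "web", "revenue": 10.0},
    {"region": "north", "channel": "shop", "revenue": 20.0},
    {"region": "south", "channel": "web", "revenue": 30.0},
]

# Two former copy-and-paste cells become two calls with different arguments.
by_region = summarise(sales, "region", "revenue")
by_channel = summarise(sales, "channel", "revenue", min_count=2)
```

Once the variation lives in the arguments rather than in the copies, the function is something you can document, test and move into a module.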

Second, let me point out that by building a package you will be availing yourself of problem solving techniques that have been refined over fifty years of software design, using simple but powerful concepts such as object-orientation, encapsulation, separation of concerns, statelessness, chaining. These are invaluable for organising your thoughts and then your code. If you doubt this then consider how they are all used in the analysis and modelling packages you already find so useful.
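As a rough illustration of three of those concepts working together (encapsulation, statelessness and chaining), here is a small sketch. The Pipeline class and the two step functions are hypothetical and not taken from any particular library; they only show the shape such code might take.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Pipeline:
    """Encapsulates an ordered sequence of transformation steps."""

    steps: Tuple[Callable[[list], list], ...] = ()

    def then(self, step):
        # Stateless: return a new Pipeline instead of mutating this one,
        # which is also what makes the calls chainable.
        return Pipeline(self.steps + (step,))

    def run(self, data):
        # Apply each step in the order it was added.
        for step in self.steps:
            data = step(data)
        return data


def drop_missing(values):
    """Remove None entries."""
    return [v for v in values if v is not None]


def scale_to_max(values):
    """Divide every value by the largest one (assumes a non-empty list)."""
    top = max(values)
    return [v / top for v in values]


# Each step is a separate concern; the pipeline just strings them together.
clean = Pipeline().then(drop_missing).then(scale_to_max)
result = clean.run([2, None, 4, 8])
```

Because every call to then returns a fresh object, a half-built pipeline can be shared between analyses without one of them quietly changing what the other runs.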

Third, note that by entering the world of software design you will be helping yourself to a much richer toolset for managing your work - tools for managing virtual environments, tracking changes, testing and debugging your code.

Fourth, you’ll see that software development comes with a well-honed set of processes for managing people and projects. They were invented to prevent well-meaning meddlers from wrecking development projects, so how could they not be useful to you in managing upwards? Log issues, separate them into bugs and enhancements, provide your output in a bundled release of data and model after some thorough unit and system testing. Insist people wait for the next release.

Fifth, are you planning on collaborating with others on your project? Ever tried collaborating on a notebook? I don’t think I need say more.

Finally, delivering a package at the end of a project means delivering a reusable, modifiable, living, breathing thing, hopefully with a nice API that will make it easy to use. The client gets a tool rather than a dead-on-arrival report or presentation, you get some new skills and some job satisfaction.

Please do send me your questions and work dilemmas. You can DM me on substack or email me at simon@coppe

From Coppelia

The flight to the command line mentioned in last month’s Glasseye has been given an extra boost by the discovery of the Cursor CLI, a newish CLI tool that comes with the standard Cursor subscription. I’m too tight to fork out for Claude Code so was very happy to find this. It’s a little erratic but so far so good.
A plot twist for last month’s dunghill: when I feed the offending Campaign article into ZeroGPT (the tool for detecting LLM-written prose mentioned in the white stuff) I get the response: “We are highly confident this text was AI generated”. I think the Campaign article is bad but it’s not (I hope) written by genAI. So how do we interpret that result? Does it imply that the Campaign article is very human, in the sense of very average/bland and of a style highly represented in a training data set that contains millions of similar articles, and therefore quite typical of what a model trained on that data would spit out? Or, to take Dominic Bates’s point about fine tuning steps producing less human responses, is the article very un-human? Or is ZeroGPT just not good at its job?
I’m very happy to report that someone, somewhere is pushing back on the synthetic respondents bullshit that we have been railing against for well over a year now. I have been talking to a couple of people in large research agencies about how to gently point out to well-meaning peers that the emperor is butt-naked. If anyone else is having such problems, I’m offering my services for free on this one!

If you’ve enjoyed this newsletter or have any other feedback, please leave a comment.
