Glasseye

Issue 23: March 2026

Mar 25, 2026

In this month’s issue:

Data science has a weight problem. The dunghill offers some advice.
Semi-supervised walks you through my tool choices for a post-LLM working environment.
Boring unravels the myth of the normal law, interestingly. Not a crossword clue, but the subject of this month’s white stuff.

Semi-supervised

I can’t tell you that your job is safe from AI. My strong feeling is that if you feed your curiosity and keep your problem-solving skills sharp, then it almost certainly is (for reasons I go into below). One thing I do know for sure: standing still is not an option. But then that’s nothing new - just ask the SAS guys who thought they could reach retirement writing SAS macros. (If you don’t know what I’m talking about, then that only proves my point!)

You will always be running to catch up. The trick is to choose as wisely as possible who and what you are chasing; to figure out which aspects of the current tech revolution are wasting your time, and which are rich with possibility. With that in mind, I present you with one set of choices - mine - and their rationale.

Terminal over GUI: Not being particularly original here (the devs have been here forever) but now, more than ever, it feels like a good time to go full Matthew Broderick. Why? Three reasons: First, it is a decisive victory in the battle against distraction. There are no ads (yet) in the terminal. No endless attempts to get you to engage with new features. Second, unlike the old days, it is now a tool-rich environment, and if the CLI for what you want does not exist, then you can vibe it into existence. Which brings me to the third reason: it’s the natural home of Claude Code (or your preferred substitute). It’s in the very nature of the prompt to draw our attention away from the two dimensions of a GUI and back towards the single dimension of a dialogue stream. This has always been the way of the terminal. We should add to all this the apparent psychological benefit of a minimalistic, infinitely customisable terminal environment, where you call the shots, rather than run around after every scroll bar and flashing icon. Over the last couple of years, the GUI has started to feel like a side-stepped middleman.
Keyboard over mouse: The wisest decision I made over the last year was to learn to touch type. Not only did it stretch parts of my brain that have never been stretched, but it has also paid off handsomely at the terminal and the prompt. But to really leave the mouse behind takes modal editing - vim or emacs. I’ve gone vim (fond memories of vi). Why am I touting 90s technology at the same time as telling you to keep up with the times? Because technological revolutions can revitalise earlier technologies, and I have a feeling this is a case in point. Vim, of course, has never really been out of style among devs, but its new relevance is driven by the fact that agents are pushing us back to the terminal. Now that we are settled in, Vim allows us to edit code and documents without the jolt of returning to a GUI. (One last point is that the headache of Vim configuration is handled effortlessly by LLM agents.)
Raw markdown over notebooks: When Jupyter notebooks arrived in 2014 they were a wonderfully liberating technology for data scientists. But they also cut us off from the rest of the coding ecosystem. Unlike the raw text files used for most programming languages, they required specialised rendering. With hindsight, a better solution is to be able to execute the code-fenced blocks in markdown files. This is the direction I’m now heading, using Quarto and Neovim. It means my markdown notebooks are readable with just about any editor, and they are more elegantly handled by LLM agents. I think this is the future. Sorry Jupyter.
Raw markdown over just about every other text document format: While we are on the topic of markdown, it has for many years been my policy to prefer it over any other document format and to prefer tools that store content as markdown. If others want another format then I write in markdown and convert. The cost in terms of layout restrictions is more than compensated for by the freedom to move my work wherever I want. Often the markdown files can stay where they are, and I will simply switch the tool that sits on top. For example, my vault of markdown files is managed with Obsidian but I can switch to Neovim as my preferred editor. I see this as all part of the same trend: once again, away from the GUI and towards a combination of raw data and the terminal.
Agent over autocomplete: I’ve been on a bit of a journey here. Probably the same one as everyone else. A year ago, coding agents seemed bolshy and incompetent, but I was impressed by the efficiency of LLM-driven autocomplete. Now the situation seems to have reversed. I put this down to Claude Code getting the partnership between human and LLM just right. I’ve turned off the auto-complete. It got annoying.
Agent as teacher and critic over agent as author: I’ve noticed something recently that I hadn’t expected, and that goes against the idea that coding agents will turn us all into project managers: using an agent is leading me deep into the detail of things that I previously didn’t have the patience for. The most dramatic example is that I’m enjoying using git at the command line and finding out about all the weird things it can do. It’s a similar experience with shell scripting. Last month’s white stuff reviewed a paper on the impact of LLMs on skill formation that concluded that skill acquisition is retained if the agent is used for instruction rather than code generation. I’m for holding onto such skills since I’m betting on agents being irredeemably flawed and therefore, in all but the most trivial cases, in need of a human to talk to about the details. So my policy at the moment is not to grab at the productivity increases delivered by coding agents, but instead to take it slow and benefit from some efficient, context-based training. I figure my clients will thank me in the long term. The exceptions are menial tasks where I’d learn nothing anyway, or occasions where I’m using an agent to code very precisely defined tools or components, whose inner workings I have no interest in.
CLIs over MCPs: Again it looks like I’m not alone here. There’s a debate going on online about performance, but my preference is rooted in something more basic - the need for control. The CLI is, after all, an interface, and that means well-defined inputs and outputs. If I’m letting a coding agent loose on my GitHub repo or my emails, I want to see exactly what it is doing there, and I want some understanding of the limits.
Roll my own over find someone else’s: Hours used to be lost in search of the plugin, extension, app, repository, or CLI that did exactly what I wanted. Then hours more in replacing those tools that were incompatible with the newcomer. All that has gone since I can use an agent to code the exact tool I want. I usually don’t share the result. Partly because I’m lazy, but partly because there is no need since others can create their own. This feels new.
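To make the "raw markdown over notebooks" idea concrete, here is a minimal sketch of executing the Python code fences in a markdown document. It is an illustration only, not how Quarto works internally: the function names `extract_python_blocks` and `run_markdown` are hypothetical, the sketch only recognises fences tagged `python` or `{python}`, and it runs every block in one shared namespace with `exec`, so it should only ever be pointed at files you trust.

```python
import re

# Three backticks, built programmatically so this example can sit inside a fence itself.
TICKS = "`" * 3

# Matches a fence opened with "python" or "{python}" and captures its body.
BLOCK_PATTERN = re.compile(
    "^" + TICKS + r"\{?python\}?[ \t]*\n(.*?)^" + TICKS + r"[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_python_blocks(markdown_text):
    """Return the source of each Python code fence, in document order."""
    return [match.group(1) for match in BLOCK_PATTERN.finditer(markdown_text)]


def run_markdown(markdown_text):
    """Execute each Python fence in turn, sharing one namespace like a notebook."""
    namespace = {}
    for block in extract_python_blocks(markdown_text):
        exec(block, namespace)  # trusted input only
    return namespace


# A tiny markdown "notebook": prose with two code fences.
sample = "\n".join([
    "# Analysis notes",
    "",
    "Set up some data.",
    "",
    TICKS + "python",
    "values = [1, 2, 3, 4]",
    TICKS,
    "",
    "Then summarise it.",
    "",
    TICKS + "{python}",
    "total = sum(values)",
    TICKS,
    "",
])

namespace = run_markdown(sample)
print(namespace["total"])
```

The point of the sketch is that the document stays plain text: any editor or agent can read and edit it, and execution is a separate step layered on top.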

Some things that are not for me: agents anywhere near what I write (here or even in the simplest of emails); using LLMs to summarise papers (I like skim reading and sometimes it’s the small details that are important); agents replacing the kind of grunt work that gives me a feel for the problem or internalises information. (I would never, for example, ask the agent to draw me a concept map - the value is in the thinking, not the result.)

As I said, these are my personal choices. They might be wrong. I might change my mind.

Please do send me your questions and work dilemmas. You can DM me on substack or email me at simon@coppelia.io.

The white stuff

Edwin G Boring - with a name like that, how could you fail to be interesting? And so he was. Boring was a prolific experimental psychologist who began his career working on intelligence testing in the army in World War I, but soon became critical of their methods, drawing attention to a lack of scientific objectivity. His interest to us is in a brilliant paper, published in The American Journal of Psychology in 1920. The Logic of the Normal Law of Error in Mental Measurement ripped to shreds the notion that the normal distribution is all-pervasive in nature. Since that myth is as alive as it ever was, and since it is a cornerstone of IQ testing (which is once again being talked about as though it were a serious metric), the paper is worth our attention. It is also superbly written, as you can see in this excerpt in which Boring defines the problem:

The normal law of error has been both an inspiration and a limitation in statistical measurement… There is a bit of magic in the formula. The law came to play the part of a first principle of nature, of an ideal, given a priori, to which nature seeks to conform. The mathematicians wrought slowly, but they wrought a god. Against such blind faith later statisticians have protested. They call the normal law a “fetish” and its a priori use a “superstition.” Nevertheless the “superstition” still lingers and is mixed up with mental measurement. For this reason we are going to enquire, concerning the law of error, what real value it has for us to-day as a scientific tool.

The paper is also an object lesson in intellectual history, tracing the development of the normal distribution and its gradual scope creep from games of chance, to the measurement of error, to the modelling of variation. The last step was taken in the hope that the normal distribution would turn out to be a fundamental scientific law describing variation in nature, which would allow it to be used to make inferences, including about mental ability. But this turned out to be wishful thinking, the argument easily knocked down by switching the measurement. (If the diameter of a spherical seed is normally distributed, then what about its volume?) The dream of a transcendental law for biology was baseless. As Boring concludes, interestingly:

Here we may leave the question of the a priori nature of the normal law. There is, after all, no magic in it. It gives us back always what we put into it. If we know from experience what nature is up to, as we do with the coin, then we can proceed upon cogent reasons to apply the law and we get results. If we do not know, we must appeal to nature and see.
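The seed example can be checked numerically. The sketch below is a simulation under assumed numbers, not anything taken from Boring's paper: the mean diameter of 10, the standard deviation of 2, and the helper names `skewness` and `simulate_seeds` are all made up for illustration. It draws normally distributed diameters, converts each to the volume of a sphere, and compares the skewness of the two sets of measurements. A normal distribution has zero skewness, so a clearly positive value for the volumes shows that both measurements cannot be normal at once.

```python
import math
import random
import statistics


def skewness(values):
    """Population skewness: the mean cubed z-score (zero for a symmetric distribution)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values, mu=mean)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def simulate_seeds(n=100_000, mean_diameter=10.0, sd_diameter=2.0, seed=0):
    """Draw normal diameters (assumed parameters) and derive sphere volumes."""
    rng = random.Random(seed)
    diameters = [rng.gauss(mean_diameter, sd_diameter) for _ in range(n)]
    # Volume of a sphere in terms of its diameter: (pi / 6) * d^3.
    volumes = [math.pi / 6 * d ** 3 for d in diameters]
    return skewness(diameters), skewness(volumes)


diameter_skew, volume_skew = simulate_seeds()
print(f"skewness of diameters: {diameter_skew:.3f}")
print(f"skewness of volumes:   {volume_skew:.3f}")
```

The diameters come out close to symmetric while the volumes have a long right tail, because cubing stretches large values far more than small ones.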

The dunghill

Weights are routinely applied to data sets with little understanding of what they are for and what impact they have on analysis. But this is the dunghill, so I’m not going to get into how to do the job properly (that might be a topic for semi-supervised in some later issue) - instead, let’s wallow a little in the malpractice. We all know that’s more entertaining.

The most common and problematic use of weights is in the analysis of survey data. The literature will tell you about design weights, non-response weights, and other exotica, but in the overwhelming majority of cases, the weights applied to survey data, in industry and research, are post-stratification weights. We conduct a survey; the sampling is a long way from random but we are not too sure of the biases; we do however have population percentages for certain demographic strata. The weight for any observation falling into a particular stratum (say, female, 18-30) is equal to the proportion in the population falling into that stratum divided by the proportion in the sample falling into the same.
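The weight definition above is simple enough to sketch in a few lines. The strata labels, the sample counts, and the population shares below are invented for illustration, and `post_stratification_weights` is a hypothetical helper name rather than a function from any survey library.

```python
from collections import Counter


def post_stratification_weights(sample_strata, population_shares):
    """Weight per stratum = population proportion / sample proportion."""
    n = len(sample_strata)
    counts = Counter(sample_strata)
    return {
        stratum: population_shares[stratum] / (count / n)
        for stratum, count in counts.items()
    }


# Made-up survey of 100 respondents, each labelled with a stratum.
sample_strata = (
    ["F 18-30"] * 10 + ["F 31+"] * 30 + ["M 18-30"] * 20 + ["M 31+"] * 40
)
# Made-up population proportions for the same strata.
population_shares = {"F 18-30": 0.2, "F 31+": 0.3, "M 18-30": 0.2, "M 31+": 0.3}

weights = post_stratification_weights(sample_strata, population_shares)
for stratum, weight in sorted(weights.items()):
    print(f"{stratum}: {weight:.2f}")
```

Young women are under-represented here (10% of the sample against 20% of the population), so each gets a weight of about 2, while the over-represented older men are weighted down to about 0.75. Note that every respondent in a stratum receives the same weight.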

So with that in mind, here is my list of weighting crimes ranging from the unforgivable to the unbelievable. Knowledgeable as you are, dear reader, some might seem obviously ridiculous. But I can assure you they are happening… a lot.

At the extreme end of negligence are those who believe in the magic power of post-stratification weighting: once applied, it will resolve all problems related to survey representation. Never give it another thought; all analysis can proceed just as it would were the results obtained from simple random sampling. Now I’m classing this as unforgivable because it does not take a post-grad degree in statistics to spot the flaw. If the information we are interested in is unrelated to the demographic strata, then a re-weighting by these strata will achieve very little. Say we are using a survey to estimate the proportion of the UK population who like cheese. I doubt this has much to do with either age or gender but it might well affect whether someone volunteers to be part of an online survey by a pizza chain. So no, it’s not magic. The strata need to be related to the question at hand.
Another easy one, but again missed when we think weighting is magic: for any analysis that is restricted to a subset of the data that falls within one of the strata, weighting again does precisely nothing. Everyone in that stratum has the same weight. Did I need to point that out? Yes I did.
A more understandable error, since spotting it takes more than common sense, is the failure to adjust for weights when it comes to calculating the variance on results obtained from the survey. Any fool knows that in simple random sampling the variance of an estimate is related to sample size. With weighted data, however, the situation becomes more complicated. Oversampling in one stratum, while under-sampling in another, can lead to more uncertainty in the estimator, even if the actual sample size remains the same. In my experience, very few analysts make the necessary adjustments, preferring to once again see weighting as a magic cure-all.
Next, an error that I describe using the slogan: “You can’t weight your way out of complete lack of representation”. This was most glaring during the early days of online surveys when half the population were on the internet and the other half, still in the pub. The half on the internet were more tech-savvy and, yes, they were younger. But if we want the survey to be representative of the whole population, we can’t just upweight the older respondents to the online survey, for they are, of course, precisely the tech-savvy older people. The tech-averse remain stubbornly absent from the survey, which will be a disaster if the survey is related to technology.
If we are doing slogans then the next one is: “You can’t weight your way into another population”. You might wonder if I am making this one up - it’s so out there - but I promise you it happens, and it is getting more common as businesses try to resell their data. Let me spell it out. You cannot take a sample from one population (a country, a customer base, a marketing channel) and then, by re-weighting, transform it into a sample from another population. You cannot, for example, take Saga cruise customers and age-weight them into PlayStation users. At its most extreme, this is done not to surveys but to whole populations - an entire customer base gets re-weighted to “look like” customers of another business. Weird. Wrong.
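Two of the crimes above lend themselves to a quick numerical check. The sketch below uses invented data and hypothetical helper names. It first shows that a weighted mean inside a single stratum equals the unweighted mean, since everyone there shares one weight. It then computes Kish's effective sample size, (sum of weights) squared divided by the sum of squared weights, which is one common approximation for how much precision unequal weights cost; it is offered here as an assumption about which adjustment to use, not as the only one available.

```python
def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def kish_effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# 1. Within one stratum every respondent has the same weight,
#    so weighting changes nothing.
likes_cheese = [1, 0, 1, 1, 0]   # made-up answers from one stratum
stratum_weights = [2.0] * 5      # identical post-stratification weight
unweighted = sum(likes_cheese) / len(likes_cheese)
weighted = weighted_mean(likes_cheese, stratum_weights)
print(unweighted, weighted)      # both 0.6

# 2. Across strata, unequal weights shrink the effective sample size.
survey_weights = [2.0] * 10 + [1.0] * 50 + [0.75] * 40   # 100 respondents
n_eff = kish_effective_sample_size(survey_weights)
print(f"actual n = {len(survey_weights)}, effective n = {n_eff:.1f}")   # roughly 88.9
```

With equal weights the effective sample size equals the actual one; the more the weights vary, the further it falls, and variance estimates that ignore this will be too optimistic.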

And that’s just survey data. We’ve not even touched the use of weighted data in machine learning. Perhaps you can send me your horror stories.

If you have some particularly noxious bullshit that you would like to share then I’d love to hear from you. DM me on substack or email me at simon@coppelia.io.

From Coppelia

I’ve got the skills to pay the bills. I’m fully on board with skills now. My mistake initially was failing to differentiate between well-defined processes (make a tool) and situations where the agent was repeating the same steps over and over on tasks that it excelled at (make a skill). This meant I was building skills that weren’t working as well as old-school processes, and they were costing me money each time they ran! Anyway, I’ve got it now. Thanks, as usual, to my colleagues at Melt and, as usual, to Mark Bulling, for keeping me relevant!
Lemonheads. There needs to be a word for people who pass on LLM-generated content without even looking at it, let alone checking it (“lemons”, maybe?). Here’s a pattern I think I’m seeing:
- Agency has a meeting with a client, and the AI agent transcribes the meeting.
- Agency uses the agent to generate a proposal for work from meeting notes, but does not really read the proposal.
- Client senses an agent-generated proposal, so doesn’t really read it either.
- Both parties sign off on work without either knowing what it entails.
Good luck, lawyers!

If you’ve enjoyed this newsletter or have any other feedback, please leave a comment.
