I was a miserable biology student in high school. I never made it past the rote memorization stage of that scientific domain, where you learn "mitochondria are the powerhouse of the cell" but don't yet cover the origin of the word "mitochondria" or exactly how it's the powerhouse of the cell...where it gets its energy from or how it transfers that energy to anything else. I excelled in physics and chemistry, though. I could take apart the model molecule and move the atoms around, never creating or destroying, even if sometimes an electron got left on the table as 'energy'. I could see the transfer of energy as you compressed a spring or pulled a pendulum back, and once the mechanics of a given process were made clear, it became straightforward to describe it with math rather than with words.
No surprise that I, like thousands of others, found my way into IT, technology, and math. Some of my classmates really enjoyed the memorization, though to me it seemed boring: a crude parroting of facts without knowledge or understanding. I know several nurses and doctors, and I'm thankful they made it through that memorization phase and on to deeper understanding.
LLMs appear purpose-built for exactly that kind of memorization and synthesis of facts and responses, and many, if not all, of them are doomed never to make it past that stage of existence.
How LLMs Work: Prediction, Not Understanding
To understand how LLMs work, it's important to recognize two things that are common across nearly all models. First and foremost, they are predictive models, built by ingesting millions upon millions of works, whether written, sonic, or visual. LLMs have millions of examples of how sentences are punctuated, what a sunset looks like over an ocean horizon, or what a G diminished seventh chord sounds like. They have been shown zillions of pictures of wristwatches, from zillions of angles. Each input reinforces the LLM's 'understanding' of what something is supposed to be.
As an example, in the English language, essentially 100% of declarative sentences end with a period or an exclamation mark, and none with a question mark. As a result, no LLM will end a declarative sentence with a question mark in its response.
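Here's a toy sketch of that counting-and-predicting machinery (my own illustration, not any real model's internals); real LLMs use neural networks trained on billions of tokens, but the spirit is the same: predict whatever most often came next in the training data.

```python
from collections import Counter, defaultdict

# A toy next-token predictor: count which token follows each word in
# the "training data," then always predict the most frequent follower.
training_sentences = [
    "the sky is blue .",
    "the sky is blue .",
    "the grass is green .",
    "is the sky blue ?",
]

follows = defaultdict(Counter)
for sentence in training_sentences:
    tokens = sentence.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        follows[prev][nxt] += 1

# "green" only ever appeared in declarative sentences, so a question
# mark after it has zero observed probability and can never be predicted.
print(follows["green"])                       # Counter({'.': 1})
print(follows["green"].most_common(1)[0][0])  # '.'
```

A real model smooths those counts through a neural network, so probabilities are rarely exactly zero, but the lopsidedness carries through all the same.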
When a user engages an LLM, that user provides a 'prompt', and that prompt drives an analysis of what is wanted and how it's wanted; the LLM then draws upon its vast repository of information to predict what will best answer the prompt. Mechanically, it's not that far away from rote memorization.
Most interestingly (or infuriatingly, depending), it's very difficult to ask a truly unique question, especially when measured against the entire question-asking world, which includes everyone trying to stump an LLM or otherwise twist it into an error. It's not impossible, per se, but the odds are trending towards zero and accelerating every day.
The Confidence Problem
Second, they are all designed to provide some response, even if the confidence levels (which are difficult to tease out) are low. The builders and maintainers of these systems want to drive engagement, to the point of exclusivity, which means making them look like authoritative systems with The Correct Answers, Returned Quickly. Even if a user specifically prompts for a confidence level in the answer, it can be difficult to get one. The LLM is going to posture as if it is stone-cold sure that the answer is correct.
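To see why, here's a minimal sketch of what happens under the hood, with invented scores for illustration: the model really does compute a probability for every candidate answer, but a typical decoding step just emits the top pick, however weak it is.

```python
import math

def softmax(logits):
    """Turn raw model scores into probabilities that sum to 1."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Invented scores for four candidate answers, for illustration only.
confident = softmax([9.0, 2.0, 1.0, 0.5])  # ~99.9% on the top answer
shaky = softmax([1.2, 1.0, 0.9, 0.8])      # ~31% on the top answer

for name, probs in [("confident", confident), ("shaky", shaky)]:
    best = max(range(len(probs)), key=probs.__getitem__)
    # Either way the user sees the same assertive answer; the 0.999
    # versus 0.31 distinction never surfaces in the response.
    print(f"{name}: answer {best}, p={probs[best]:.3f}")
```

Those internal probabilities exist, but the chat interface discards them, which is part of why prompting for a confidence level so often yields confident-sounding boilerplate instead.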
So, in summary, we have an engine that predicts what the outcome should be, informed by billions of inputs and shaped by the user's prompt, and that will always respond as if it's 100% correct.
Real-World Examples: Six Fingers and 10:10
This is why we have AI-generated pictures of people with six fingers on a hand. The LLM used to generate that picture has seen roughly equal numbers of images of people with their hand like "this" and like "that": the positions of three of the five fingers are consistent across both, but just as many images have that fourth finger 'here' as have it 'there', and so the LLM proudly renders both.
At the other end of the "image generation problem" spectrum, LLMs have no real idea how to tell time. At some point in the print advertising timeline, watchmakers decided that 10:10 was by far the prettiest way to display a watch. As a result, there are vanishingly few pictures of wristwatches showing any other time, with any other arrangement of their hands. 10:10 has been reinforced so hard into all of the LLMs as 'what a watch looks like' that it is nearly impossible to get an AI to generate a picture any other way.
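A crude way to see that imbalance, with made-up numbers standing in for the training set:

```python
import random

# Made-up stand-in for a training set: nearly every watch photo ever
# published shows 10:10, because advertisers settled on that pose.
watch_times = ["10:10"] * 9970 + ["3:45"] * 10 + ["7:20"] * 10 + ["12:00"] * 10

# A generator that samples in proportion to its training data will
# produce 10:10 roughly 99.7% of the time, no matter what you ask for.
samples = random.choices(watch_times, k=1000)
print(samples.count("10:10") / len(samples))  # ~0.997
```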
This has been my personal experience with writing code as well. I have attempted to use several LLMs to generate code for me (I used to write code for a living, but it's been a long time). Each one confidently gave me code that failed somehow, whether "simply won't compile" or "the platform isn't designed that way at all". I went back each time with error codes or other research, and the responses would be updated, but each one ultimately continued to struggle and never delivered code that worked.
The Right Use Cases for LLMs
The broad spectrum of AI is not limited to generative AI. Computer vision, navigation systems, and more are remarkable technologies that work reliably well.
LLMs are great for brainstorming. They can generate new ideas, some of which are garbage and some of which are worth following up. They can write new songs, or haiku, or a new bedtime story every night...it's the Shahrazad of our age. However, like those 1,001 Arabian tales, even the best LLMs' best output should be treated as fiction, sometimes realistic and sometimes fantastic, and not something we can take at face value just yet.
