The well-intentioned know-it-all
Christopher Lash was a tall, imposing, spectacled figure with a long white beard. A very erudite man, he seemed to have a profound knowledge of just about everything.
I worked with him for many years, and for a long time, I took his word on most things. But for all his knowledge and confidence, I eventually learned enough about some of his favorite topics to start seeing holes in his arguments. When I questioned or pressed him on his spurious claims, he'd blush and apologize for misleading me.
For him, whether he was saying anything true was secondary to whether he appeared to be teaching me something. It was better to provide a wrong answer than no answer at all.
AI reminds me of him.
I is for…
AI is like that old friend of mine inasmuch as it's easy to be convinced that the information a generative AI model imparts is accurate because of the certainty with which it's communicated. But if you're even moderately versed in a subject, you might be surprised by the amount of error and bias there is in the information.
I say AI, but of course, like most people nowadays, what I'm actually referring to is the deep learning models known as LLMs. (Sorry, image models: I do like you, but you're not relevant to this particular conversation.)
These LLMs (large language models) have become the go-to solution for just about anything related to knowledge and communication. A veritable jack of all trades, ChatGPT is now the first thing people think of when it comes to creating emails, articles, tables, code, ideas, and a whole host of other things.
The problem is that no one actually knows what LLMs are really capable of, much less the best way to use them or, most worryingly of all, when and how they fail.
Despite the tsunami of prompt tutorials which followed the whirlwind that ChatGPT unleashed on us, there's no instruction manual for using generative AI models. They're incredibly effective in some tasks and total (or partial) failures in others. Sometimes, Gen AI has flashes of brilliance, and at other times, it will fail enough to make you think the I in AI stands for idiot.
This problem and confusion about the use of AI models in the workplace was the impetus for Navigating the Jagged Technological Frontier, released on September 15, 2023. This is the first working paper by a team of social scientists in collaboration with Boston Consulting Group. It contains the findings of an experiment conducted to measure the effects of AI on knowledge worker productivity and quality.
The evidence presented in the paper is pretty interesting, but it doesn't exactly make for easy reading. So, I'll unpack it for you without going into too much detail. By the end, you'll be able to determine whether you're a centaur or a cyborg.
The jagged technological frontier
758 consultants and 18 realistic tasks representing work done at a consulting company. That's what the experiment involved. Tasks were assigned according to three different conditions:
- no AI access,
- GPT-4 access,
- GPT-4 access with a prompt engineering overview.
The headline? Consultants who used GPT-4 outperformed those who didn't by a large margin and in every way.
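Consultants using AI:
- completed 12.2% more tasks on average,
- completed tasks 25.1% more quickly,
- produced results of more than 40% higher quality than the control group.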
But there's a lot more to it than that.
The conclusion of the study is that AI creates a “jagged technological frontier” where some tasks are easily done by AI while others, though seemingly similar in difficulty level, are outside the current capability of LLMs.
On tasks within the frontier, AI significantly improved human performance. Outside of it, humans relied too much on the AI and were more likely to make mistakes. Not all users navigated the jagged frontier with equal adeptness. While some completed their task incorrectly, others showcased a remarkable ability to harness the power of AI effectively.
Inside the frontier are the things AI is capable of, and outside the frontier are tasks it can't perform. When it came to the second category, consultants using AI were 19 percentage points less likely to produce correct solutions compared to those without AI.
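A quick note on that figure: a gap in percentage points is not the same as a relative drop in percent. The sketch below shows the difference using made-up success rates. The 80 and 61 are illustrative assumptions chosen only so the gap comes to 19 points; they are not numbers taken from the paper, and the function names are hypothetical.

```python
def percentage_point_gap(baseline_pct, other_pct):
    """Absolute difference between two rates, in percentage points."""
    return baseline_pct - other_pct


def relative_drop_pct(baseline_pct, other_pct):
    """Drop expressed as a percentage of the baseline rate."""
    return (baseline_pct - other_pct) * 100 / baseline_pct


# Hypothetical rates of correct answers (assumed for illustration only).
without_ai_pct = 80
with_ai_pct = 61

gap = percentage_point_gap(without_ai_pct, with_ai_pct)
drop = relative_drop_pct(without_ai_pct, with_ai_pct)

print(f"Gap: {gap} percentage points")
print(f"Relative drop: {drop:.2f}% of the baseline")
```

With these assumed rates, a 19-point gap is a relative drop of nearly a quarter, which is why the wording "percentage points" matters.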
The problem is that no one can see the frontier, and tasks you'd assume are jobs for a machine (like basic math) are things LLMs struggle with, while activities you'd associate with human creativity (like generating ideas or writing a sonnet) are things they can do pretty well.
What makes this even more surprising when you think carefully about it is that both activities should be the same for AI. Math obviously involves numbers, but for language models, so does writing a poem. AI perceives in tokens, not words.
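To make the "tokens, not words" point concrete, here is a toy sketch in Python. It is not how any real model tokenizes text: production LLMs use subword schemes such as byte-pair encoding, while this example simply splits on spaces. The function names and sample strings are hypothetical. The only point is that a sum and a line of verse both end up as lists of integers.

```python
def build_vocab(texts):
    """Give each distinct whitespace-separated piece an integer ID,
    in order of first appearance (a toy stand-in for a real tokenizer)."""
    vocab = {}
    for text in texts:
        for piece in text.split():
            if piece not in vocab:
                vocab[piece] = len(vocab)
    return vocab


def encode(text, vocab):
    """Turn a string into the list of integer IDs the toy model would see."""
    return [vocab[piece] for piece in text.split()]


samples = ["2 + 2 = 4", "shall I compare thee to a summer's day"]
vocab = build_vocab(samples)

math_ids = encode(samples[0], vocab)
poem_ids = encode(samples[1], vocab)

print(math_ids)  # the sum, as integers
print(poem_ids)  # the line of verse, also as integers
```

Both inputs reach the model in the same form, so from its side there is no built-in distinction between "doing math" and "writing poetry".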
Is AI a skills leveler?
It's becoming apparent in other studies of AI that using LLMs acts as a skills leveler in the workplace. The Jagged Frontier paper confirms this. Consultants who scored the lowest at the start of the experiment had a 43% jump in performance when they got to use AI. The consultants who started with the highest score also got a boost but a less significant one.
This suggests that companies should consider using AI technology more to raise workers to the top tiers of performance.
But there are two sides to every coin. While AI can act as a skills leveler, relying on it too much can backfire. A previous study had already demonstrated that recruiters who used high-quality AI became lazy, careless, and less skilled in their own judgment. They made worse decisions than those who used low-quality AI or none at all. Instead of using the LLM as a tool, they let it take over.
The Jagged Frontier study found the same thing to be true. On the task that fell outside the frontier, the workers who used GPT-4 had less accurate answers than those who were not allowed to use it.
This aptly demonstrates the problem of the invisible frontier. It's easy to be deceived by AI if you can't see the line between what it can and can't do.
Centaur or cyborg?
According to the frontier study, there are two types of workers who use AI most effectively: centaurs and cyborgs.
The mythical horse-human hybrid analogy refers to workers who divide activities between themselves and AI, delegating select tasks to the models. Work is thus part human, part AI.
Users with this strategy switch between AI and human tasks, allocating responsibilities based on the strengths and capabilities of each entity. They discern which tasks are best suited for human intervention and which can be efficiently managed by AI.
The cyborgs are workers who integrate their task flow with the AI and continually interact with the technology. Every task involves AI-assisted human labor.
Cyborg users don’t just delegate tasks; they intertwine their efforts with AI at the very frontier of capabilities. This strategy might manifest as alternating responsibilities at the subtask level, such as initiating a sentence for the AI to complete or working in tandem with the AI.
A centaur's thoughts on using LLMs for writing
Why should AI get to do all the fun stuff?
Here's something the frontier study doesn't consider:
I barely use AI for writing, especially for first drafts. I'm a writer, and as such, I love the writing process. For me, the blank sheet is where the magic begins. It's where I start to form ideas: attention-grabbing opening lines, interesting angles, clever structure, and thought-provoking ways to end before I've even begun.
Delegating the first draft to an AI model would take away one of the things I most enjoy about my work.
That poses the question: will using AI too much make people's work boring? Could it mean employees will become dissatisfied doing a job they chose because they enjoy its processes?
AI doesn't make me faster
Not only does AI make writing less interesting, but it also fails to make it quicker and more efficient.
I don't mean to brag, but I've always been an insanely fast writer. Despite the findings of that frontier study, I don't think using AI would help me churn out content at greater speed without sacrificing quality. The amount of time I'd have to spend prompting an LLM to produce something I'm happy with and then editing it is the same amount of time I'd take to write it myself.
Why I'm a centaur (how I use ChatGPT)
Notwithstanding the above, I have a confession to make: I have, at times, enlisted the help of an LLM with varying degrees of success. So, permit me to conclude with three examples of how I've used AI (ChatGPT in this case) for content.
1) Finalizing content ✅
Not long ago, I wrote this career story. I crafted the entire body of text without any AI assistance. I interviewed the subject of the story a couple of times and tried to find ways to put the information together in a fun and coherent way.
What I ended up with was one long block of text: no subheadings and no title. For some reason, I just couldn't figure out how to break it all up into sections or what to call the article. So, I copied and pasted the whole article into ChatGPT and asked it to create a title, identify suitable breaks in the text, and create catchy subheadings for each section.
The title and all the subheadings you see in that article (except for the emojis) were produced by ChatGPT, untouched and unadulterated. Not bad for a generative pre-trained transformer!
2) Writing and explaining code ❌
One of my first serious attempts at using ChatGPT for work was an article on web scraping with Selenium and Python. I didn't begin by using AI. Instead, I researched the topic to identify what aspects of web scraping with Selenium to focus on. Only then did I ask ChatGPT to draft the code and the explanations to go with it.
I checked the output against the Selenium documentation, video tutorials, and other articles and made some changes where the AI version seemed incomplete or too general. In the process, I discovered there was a new way to launch headless Chrome that ChatGPT couldn’t be aware of (for LLMs lived in the past until very recently), so I updated the code.
Now, with a basic structure for the tutorial ready, I began drafting the rest of the text. I tried to come up with some interesting things beyond the usual SEO content that often sounds even more robotic than ChatGPT. In the process, I decided to bring Monty Python into it. That’s when it occurred to me to use the Monty Python website as the target for the tutorial. I then updated the tutorial to demonstrate how to scrape that particular website.
Quite proud of what I'd managed to accomplish with the help of AI, I shared it with a developer to ensure accuracy. The first question the dev asked was, “Did you use ChatGPT for this?”
Developers pick up on AI-generated code the way writers detect those trite and hackneyed ChatGPT conclusions. Also, the code was too generic to be functional, so he made some small fixes to make it usable.
The moral of the story? Don't use AI to write things you wouldn't be able to write without it.
3) Fixing (or generating) tables ✅
Tables in articles can come in very handy, as you can provide a lot of information without long-winded writing. They're particularly popular amongst developers for comparing tools and methods. But sometimes, the various tools at hand being what they are, I run into problems.
A recent example is an HTML table that looked fine in light mode but was invisible in dark mode. I had three choices:
- a) spend ages trying to figure out how to fix it to work in both modes (I suck at CSS),
- b) interrupt a colleague to help me out,
- c) ask my underworked AI assistant to fix it for me.
I went for option c) since ChatGPT had nothing better to do. I copied and pasted the HTML and asked it to fix the table so it would appear in both light and dark modes. It produced the altered HTML in a flash, and it worked like a charm.
How do you use AI in the workplace?
The Navigating the Jagged Technological Frontier paper demonstrates that when AI is used for the right things, it makes workers faster, more efficient, and more productive. It can act as a skills leveler and can take a lot of the drudgery out of everyday tasks. But it can also make us lazy, inefficient, and downright stupid.
The frontier study suggests that the choice isn't whether or not to use AI but when and how to use it. The evidence points to two effective approaches to the problem: dividing tasks between human work and AI (centaurs) and combining human oversight and AI in every aspect of work (cyborgs).
Based on my own use of AI in the workplace and the findings of the frontier study, I know I'm a centaur.
Which one are you?