ZDNET’s key takeaways
- ChatGPT makes errors in long-form chats with many variables.
- Personal productivity isn’t as high as it could be because of this.
- Unless you have massive infrastructure, check AI’s work.
By now, everyone knows that generative AI can be flaky. If you’re using it to edit your novel or to create an image, the AI might add elements that are inconsistent with your narrative or lose track of what it’s supposed to do with the picture.
Sometimes, flakiness is worth the time taken if it helps you iterate on an idea. But, when it comes to detailed processes, such as a financial forecast, be prepared to spend enormous amounts of time double-checking and correcting the bot — or you’ll be led astray.
Also: If AI is so amazing, why does ChatGPT melt down over this simple image edit task?
Creating a business plan is an illustrative test of OpenAI’s ChatGPT, or any generative AI program. I’ve spent weeks working with ChatGPT on hypothetical business plans, and the results have been helpful, but also riddled with errors.
(Disclosure: Ziff Davis, ZDNET’s parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)
The lesson: The longer your ChatGPT sessions, the more errors sneak in. It makes the experience infuriating.
Crafting a business plan with ChatGPT
Working with ChatGPT, running OpenAI's newly released GPT-5 model, I started a chat to devise a plan to grow my budding newsletter publication to thousands of subscribers by spending money on advertisements.
That business plan involved creating and re-creating spreadsheet tables of subscriber counts, revenue, ad spending, and cash flow.
ChatGPT created Excel tables for me from scratch and let me play with assumptions such as the slope of subscriber growth.
Also: What is OpenAI’s GPT-5? Here’s everything you need to know about the company’s latest model
The process began with a prompt to ChatGPT: “What’s a good, simple business plan outline for growing a subscription business over three years from 250 subscribers to 10,000, where churn per year is assumed at 16%?”
We went back and forth for a while. I added new information, such as a $30-per-month subscription price, and we iterated repeatedly on tables and charts of how the business would develop. During this process, I was able to adjust different metrics and then see how each change affected the table. For example, changing the cost to acquire a subscriber, or “CAC,” a key assumption of a media business plan, led to new tables and charts of profit and loss.
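For readers who want to see the shape of the model we were building, here is a minimal sketch in Python. Only the figures the article mentions, 250 starting subscribers, 16% annual churn, and a $30 monthly price, come from the chat; the ad budget, CAC, and fixed costs are placeholders, and the real spreadsheet had many more moving parts.

```python
# A simplified monthly subscription model of the sort ChatGPT and I iterated on.
# The 250 starting subscribers, 16% annual churn, and $30 monthly price come from
# the chat described above; ad_budget, cac, and fixed_costs are placeholder guesses.

def run_model(start_subs=250, months=60, price=30.0, annual_churn=0.16,
              ad_budget=6000.0, cac=40.0, fixed_costs=9000.0):
    monthly_churn = 1 - (1 - annual_churn) ** (1 / 12)   # convert annual churn to monthly
    subs = float(start_subs)
    cumulative_cash = 0.0
    rows = []
    for month in range(1, months + 1):
        new_subs = ad_budget / cac                        # subscribers bought with ad spend
        subs = subs * (1 - monthly_churn) + new_subs
        revenue = subs * price
        cash_flow = revenue - ad_budget - fixed_costs
        cumulative_cash += cash_flow
        rows.append((month, round(subs), round(cash_flow), round(cumulative_cash)))
    return rows

for month, subs, cash, cum in run_model():
    print(f"month {month:2d}  subs {subs:6d}  cash flow {cash:8d}  cumulative {cum:9d}")
```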
Errors start to seep in
In hindsight, this was the halcyon part of working with generative AI, a golden time before dread and confusion.
The first error cropped up about a third of the way into this process. The table of profit and loss showed the business turning profitable in month 10. However, ChatGPT asserted that “By Month ~43–45, cumulative cash flow turns positive — the business recoups all acquisition + operating expenses.”
I challenged this, noting that ChatGPT’s own table had just shown the positive turn at month 10.
In response, ChatGPT explained the terms in detail and then offered to provide a graph of profitability.
I pointed out that the graph itself showed profit turning positive in month 10. ChatGPT conceded the error, with a chipper “and that’s on me.”
Several turns later, ChatGPT made a similar mistake, claiming the business’s break-even point occurred around month 11, not month 10.
When I pointed out the discrepancy, ChatGPT conceded the error, noting that it had failed to factor in that we had started from 250 subscribers, not zero. Sure enough, when I looked back over the exchange of the last several hours, somewhere along the line, ChatGPT had “forgotten” this fundamental assumption.
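To see why that forgotten assumption matters, you can rerun the sketch from earlier with the starting count set to 250 and then to zero. With my placeholder costs it won't reproduce the chat's month-10 figure, but the break-even month clearly shifts when the starting subscribers quietly disappear.

```python
# Using the run_model() sketch above: the break-even month moves depending on
# whether the model remembers the 250 starting subscribers or silently resets to zero.
def break_even_month(rows):
    return next((month for month, _, _, cum in rows if cum > 0), None)

print(break_even_month(run_model(start_subs=250)))   # earlier break-even
print(break_even_month(run_model(start_subs=0)))     # later break-even
```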
I was faced with a realization that must be familiar to many users of generative AI: ChatGPT had left out a key assumption, yet it could acknowledge the omission the moment I prompted it. The problem isn't the errors themselves so much as a flawed workflow in which pieces silently drop out and only come back when you catch them.
At this point, it became clear to me that one of us needed a good memory for the key details that were fixed assumptions amidst all the financial modeling — and it wasn’t going to be ChatGPT.
Oops! My bad!
As ChatGPT churned out new tables and graphs with each new assumption, strange little errors kept popping up.
To calculate the “terminal value” of the business, or how much a business will be worth when it’s no longer really acquiring or losing subscribers, I asked ChatGPT to tally the total subscribers in month 60 and the revenue they would generate in perpetuity.
Also: OpenAI’s fix for hallucinations is simpler than you think
ChatGPT asked if I wanted to use a precise value or a value rounded off.
The precise value it offered, 9,200 ending subscribers in month 60, was wrong. Moments earlier, ChatGPT had generated a table listing the figure as 10,228.15.
Once again, ChatGPT conceded the error — without explanation — and once again, I realized I was going to have to be some kind of anal-retentive fact checker.
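For context, the terminal-value arithmetic the chat was circling is simple enough to check by hand. The sketch below uses the 10,228 subscribers from ChatGPT's own table and the $30 monthly price; the 10% discount rate is purely a placeholder, since the article doesn't disclose the rate we actually agreed on, and it values the revenue stream, as the chat did, rather than free cash flow.

```python
# Back-of-the-envelope terminal value as a perpetuity. The subscriber count and
# price come from the chat's table; the discount rate here is a placeholder.
ending_subs = 10_228          # the figure ChatGPT's own table showed, not the 9,200 it cited
monthly_price = 30.0
annual_revenue = ending_subs * monthly_price * 12
discount_rate = 0.10          # placeholder assumption, not the rate used in the chat
terminal_value = annual_revenue / discount_rate
print(f"terminal value ≈ ${terminal_value:,.0f}")     # scales directly with ending_subs
```

The point of the exercise: the whole figure scales linearly with the ending subscriber count, which is why a 9,200-versus-10,228 slip is not a rounding error.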
No excuses, no explanations
One of the most frustrating aspects of working with ChatGPT or any bot — I’ve experienced the same issue running the same type of project with Google’s Gemini — is that there’s never any explanation for what goes wrong.
Bots offer verbiage such as “my bad” when what I really want to know is, How did you just confidently cite a number in direct contradiction with the number in the table that you, yourself, generated?
Also: Your favorite AI chatbot is full of lies
No answer will be forthcoming, but I scrolled back over everything we had done. Hours earlier, we had started to discuss the matter of the terminal value. ChatGPT had laid out a number of bullet points under the heading, “What we know,” which included the statement, “By Month 60, the business has about 9,200 subscribers.”
At the time, I hadn't noticed that the assertion came out of nowhere; nothing in the discussion to that point had suggested 9,200 as the number of subscribers.
Playing whack-a-mole
This is the ultimate kind of “whack-a-mole”: if you’re not on your toes, you’ll miss an incorrect assertion that might later throw everything off.
When you are working with a complex process that involves lots of shifting assumptions — subscriber growth, acquisition cost, “lifetime value of a subscriber,” etc. — you already have to keep track of the variables you’re dealing with, and to think carefully about all the ways those factors can be adjusted in your planning and forecasting, like levers you can pull to produce different possible results.
With ChatGPT, you’re getting a lot of help, but you’re also getting an assistant/collaborator who sometimes hands you the wrong lever or hides the levers you know should be at hand.
What is going on with AI, exactly?
Critics of today’s AI, such as scholar Gary Marcus, have long pointed out that ChatGPT doesn’t really have logic on its side. Instead of actual “reasoning,” which would mean being able to have a solid grip on assumptions, such as the agreed-upon variables, it will produce confident-sounding statements that let basic facts slip away.
On a simpler level, all these little slips point to a strange quirk of memory. Every large language model has what's called a context window, which lets the model look back over what has come before and draw connections between the current line of output and the past.
In the case of forgetting month 10’s break-even point, ChatGPT was not correctly recalling what was in the context window. It was up to me to be faithful to already-established fact.
Also: AI’s not ‘reasoning’ at all – how this team debunked the industry hype
In the case of 9,200 ending subscribers, ChatGPT accurately reached back to an assertion in the context window that was erroneous but had never been examined.
I won’t bore you with the details of the many, many moles I whacked from then on, but here’s a rundown of the errors that cropped up in the hours that followed:
- ChatGPT used the wrong monthly subscription price, leading to incorrect revenue calculations, twice.
- ChatGPT calculated the “steady state” of subscriber growth one month later than had been agreed.
- ChatGPT generated a chart whose numbers varied wildly from the table built on the current assumptions.
- ChatGPT generated an erroneous figure for free cash flow by mixing together unrelated assumptions.
- ChatGPT constructed tables with key values missing.
- ChatGPT forgot the agreed-upon “discount rate” for future cash flows and substituted some other discount rate.
- ChatGPT miscalculated specific equations.
All these errors meant new pots of coffee, and deep-breathing exercises, as I had now steeled myself to do less ideating and pondering and more error-checking. This was the latter stage of working with generative AI, from soaring to crawling.
I reached out via email to OpenAI, providing the link to the entire chat, and detailing all the points made above.
In response, an OpenAI spokesperson emailed back that ChatGPT, “like all current LLMs,” is strongest in “short-turn conversations,” and that the company is “continuously improving reliability in longer conversations.” OpenAI also noted that both the terms of use and warnings inside ChatGPT tell users that the program can make mistakes and that they should “check important information.”
It’s not only business math that’s perilous
Let me say definitively that the whack-a-mole practice is not limited to business plan generation, math, or complicated finance problems. I’ve had similar issues with, for example, translating a published book of poetry in PDF form.
Not only were there errors in scraping the original text, but whole passages were missing from poems that ran to a second page. And entirely new poems were inserted that were not in the original book.
Keep calm and prompt on
Taking a step back, what generative AI offers is quite remarkable.
Here is a program that can supply useful equations and important background information, such as the discount rate to apply to a business’s terminal value, all without my having to look up anything.
Moreover, the AI model can stay with the thread of a discussion and incorporate new information time after time. That is a substantial improvement over the automated chat of only a few years ago, which was incapable of doing so.
Chatbots used to “lose the plot,” as they say, departing wildly and inexplicably from the subject of discussion. ChatGPT does not.
That overarching relevance and consistency prove useful if you want to go from nothing to some substantial piece of work.
At the same time, the AI model inserts erroneous assumptions, forgets key assumptions, and sometimes fails to calculate a key variable, all obvious mistakes that can feel maddening.
Also: 8 ways to write better ChatGPT prompts – and get the results you want faster
It’s such a mixed bag that I’ve come up with my own measure of personal productivity that’s one part euphoria and one part lamentation.
“This project took me half as much time with ChatGPT as it would have taken me on my own,” is what I say to myself.
But then I say, too, “You know, half the time I spent was spent correcting things that ChatGPT shouldn’t have done in the first place.”
As a formal equation of productive work, I guess you could say I saved half the hours, but lost another quarter of what would have been time saved.
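To put made-up numbers on that arithmetic: suppose a project would have taken 10 hours on my own. With ChatGPT it took five, but roughly two and a half of those five hours went to catching and fixing the bot's mistakes. So of the seven and a half hours that might have been saved, only five actually were.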
Not measured in my calculus of personal productivity is the stress of having to watch everything very carefully to divine ChatGPT’s errors, wondering when the next “gotcha” would emerge.
I have no good way to really calculate that stress. It’s a lesson in patience — something like, “keep calm, and prompt on.”
A key technical issue
The key technical issue is that a large language model is really a database program, but it’s a sloppy database. It holds onto many pieces of data, but it can also wipe out key pieces of data or replace them with other data without warning.
There are technical solutions for those shortcomings. The approach known as “retrieval-augmented generation,” or RAG, is being implemented by enterprises to keep some variables stable by storing them in a database and then asking the model to retrieve them.
Also: Make room for RAG: How Gen AI’s balance of power is shifting
RAG can help ensure that individual variables, such as subscriber count or cash flow, stay fixed and constant once they are arrived at, unless the user and the machine explicitly agree to change them. (RAG has its own shortcomings, it should be noted.)
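In spirit, the pattern is simple, even if enterprise implementations are far more elaborate. Here is a minimal, purely illustrative sketch of the idea: the agreed-upon assumptions live outside the chat, and every prompt retrieves and restates them so the model cannot silently drop or rewrite them. None of the names or wording below comes from any particular product.

```python
# Illustrative sketch of pinning assumptions outside the chat, RAG-style.
# Every name here is hypothetical; real RAG systems use vector stores and
# document retrieval, but the principle of re-injecting fixed facts is the same.

FIXED_ASSUMPTIONS = {
    "starting subscribers": "250",
    "annual churn": "16%",
    "monthly price": "$30",
    "discount rate": "the value we agreed on, changed only by explicit decision",
}

def retrieve_assumptions(store: dict) -> str:
    """Render the pinned assumptions as text to inject into every prompt."""
    return "\n".join(f"- {key}: {value}" for key, value in store.items())

def build_prompt(user_question: str) -> str:
    return (
        "Use ONLY these fixed assumptions unless I explicitly change them:\n"
        f"{retrieve_assumptions(FIXED_ASSUMPTIONS)}\n\n"
        f"Question: {user_question}"
    )

print(build_prompt("Recalculate the month the business breaks even."))
```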
Most of us, though, don’t have that infrastructure. Without recourse to RAG and the like, the best we can do is to simply be vigilant, checking at every point to make sure an error has not crept in.
Also: RAG can make AI models riskier and less reliable, new research shows
Be warned: watch the model like a hawk, and keep a pot of coffee ready.