ZDNET’s key takeaways
- AI repeated major sections in a mission-critical coding plan.
- Mission-critical coding work is too risky to delegate to AI.
- AI is fine for new features, not core infrastructure.
I woke up in a cold sweat. In my nightmare, I was being chased by tens of thousands of people, all of whom were enraged because I destroyed their privacy. They were all holding laptops over their heads, swinging them like clubs intended for my head.
They say nightmares reflect whatever your subconscious is trying to tell you. Given the work I was planning to start in the morning, I knew exactly what my dark night brain was trying to say.
Also: I retested GPT-5’s coding skills using OpenAI’s guidance – and now I trust it even less
It was saying, “Stop! Don’t do it.” My inner knowing was screaming at the top of its lungs, “Don’t let the AI code for you.”
This, believe it or not, is not hyperbole. I was getting ready to start a coding project where I was planning on using an AI for help.
But unlike all those vibe coding stories you read about where someone makes a Pinterest clone in 12 days of code-free prompt calisthenics, I was planning on making a deep architectural change to mission-critical code used by more than 20,000 sites across the world to provide access security and site privacy.
If I shipped damaged code, it would, at best, break a bunch of sites. At worst, it would open the contents of those private sites to the public internet.
Also: How I saved myself $1200 a year in cloud storage – in 5 sobering steps
People use my code to create protected, private sites that they don’t want shared with the entire internet. Users can designate specific family members, schoolmates, and/or teachers who can log in.
It’s also used by developers for locking down projects in progress. Users set up private test sites behind login pages, which is great for publishing restricted-access preview sites for client review and use.
What the project entails
So, let me tell you about the update I was planning. Then I’ll share the work I did with the AI, and then why I changed my mind.
At the core of all of these sites is a series of settings records. These records capture how each site owner wants to configure their privacy. They include lists of pages to make private or public, tags and categories used for the same purpose, and a variety of other site-specific privacy settings.
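To make that concrete, here’s a hypothetical example of the shape of one of those records. The field names are invented for this article; they’re not my plugin’s actual option keys.

```php
<?php
// Hypothetical shape of one site's privacy settings record.
// Keys are illustrative only, not the plugin's real option names.
$privacy_settings = array(
    'private_pages'      => array( 12, 48, 301 ),        // page IDs kept behind the login
    'public_pages'       => array( 2, 7 ),               // pages explicitly left public
    'private_tags'       => array( 'family', 'drafts' ), // tags that mark posts as private
    'private_categories' => array( 'client-preview' ),
    'redirect_to_login'  => true,                         // one of several site-specific toggles
);
```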
Also: GPT-5 bombed my coding tests, but redeemed itself with code analysis
Settings are currently saved as something called serialized arrays. Serialization is a way of taking a big block of structured data and storing it in a database field. But there’s a problem, not with the serialization process, but with the unserialization process, where the data is reconstituted for use by code.
Most of the time, serialization goes on behind the scenes whenever WordPress saves or updates a settings value. This is a perfectly safe mechanism for settings management. However, there are some places in my existing code that explicitly serialize and unserialize unnecessarily.
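To illustrate the difference, here’s a minimal sketch. The option name is a placeholder, but the WordPress calls are real: the options API handles serialization for you, while the hand-rolled approach does it explicitly.

```php
<?php
// The normal path: the WordPress options API serializes arrays behind the scenes.
update_option( 'my_privacy_settings', $privacy_settings ); // stored as a serialized string
$settings = get_option( 'my_privacy_settings' );           // comes back as a PHP array

// The kind of hand-rolled handling I want to remove: serializing the data yourself,
// stashing it somewhere, then calling unserialize() on it when you read it back.
$raw = serialize( $privacy_settings );
// ... $raw gets written to, and later read from, a custom database field ...
$settings = unserialize( $raw );
```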
The problem is that unserialize() will reconstitute pretty much anything, including objects that can carry malicious logic. This is called PHP Object Injection, a class of vulnerability that hinges on feeding attacker-influenced data to unserialize(). My code already does some checks to prevent malicious behavior, but in a few places, my code does its own serialize/unserialize process that opens up a slight vector of risk.
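Since PHP 7, unserialize() has accepted an allowed_classes option that refuses to rebuild objects, which blunts this class of attack when the data is supposed to be a plain array. That’s not the fix I’m planning (I want to remove the explicit calls entirely), but it shows where the risk lives:

```php
<?php
// Risky: unserialize() will happily rebuild any object embedded in the data,
// which is the foothold PHP Object Injection attacks rely on.
$settings = unserialize( $raw );

// Safer: refuse to instantiate objects at all. Settings here are plain arrays,
// so nothing legitimate is lost; any injected object becomes an inert
// __PHP_Incomplete_Class stub instead of live code.
$settings = unserialize( $raw, array( 'allowed_classes' => false ) );
```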
It should be noted that this vulnerability only comes into play if another plugin or theme already installed on the site contains exploitable code. Tests using vulnerability scanners have never identified such a vulnerability in any of my code, but I’d rather be safe than sorry.
Also: I tested GPT-5’s coding skills, and it was so bad that I’m sticking with GPT-4o (for now)
I want to update my code to simply remove the few unnecessary uses of hand-coded serialization. This is a fairly straightforward process that involves reading the old settings data, updating it to the new format, and saving it back to the database.
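On paper, that’s a read-transform-write pass, something like the sketch below. The option name and the transform function are placeholders for illustration, not my plugin’s real code.

```php
<?php
// Hypothetical one-shot conversion: read the old-format settings, rewrite them
// in the new format, and save them back. Names are placeholders, not my real code.
function convert_privacy_settings() {
    $old = get_option( 'my_privacy_settings' );
    if ( false === $old ) {
        return; // nothing stored on this site, nothing to convert
    }
    $new = map_old_settings_to_new_format( $old ); // hypothetical transform
    update_option( 'my_privacy_settings', $new );
}
```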
Except… not so much.
A lot needs to be considered when making this change. First, of course, is that 20,000 sites use these settings. Any change has to be robust, redundant, recoverable, and fairly transparent.
It has to have some kind of pre-migration backup process and a failure recovery process. It has to work no matter what order the settings are accessed and saved. And every place a setting is updated, checked, or resaved, throughout 12,000+ lines of code, has to be converted.
Edge conditions need to be identified, tested for, and factored into the code so that no site fails. Some level of version management has to be added to the settings data so that newer versions of the code know what to convert, and older versions of code on other sites don’t break.
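To give a flavor of what the versioning and backup requirements imply, here’s a rough sketch. Again, the option names, version constant, and transform function are invented for illustration.

```php
<?php
// Rough sketch of version tracking plus a pre-migration backup.
// Option names, the version constant, and the transform are illustrative only.
define( 'MY_SETTINGS_SCHEMA_VERSION', 2 );

function maybe_migrate_privacy_settings() {
    $settings = get_option( 'my_privacy_settings', array() );
    $version  = isset( $settings['schema_version'] ) ? (int) $settings['schema_version'] : 1;

    if ( $version >= MY_SETTINGS_SCHEMA_VERSION ) {
        return; // already current; older code simply ignores fields it doesn't know
    }

    // Keep an untouched copy so a failed migration can be rolled back.
    update_option( 'my_privacy_settings_backup_v' . $version, $settings );

    $migrated                   = map_old_settings_to_new_format( $settings ); // hypothetical transform
    $migrated['schema_version'] = MY_SETTINGS_SCHEMA_VERSION;

    update_option( 'my_privacy_settings', $migrated );
}
```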
It’s a lot. This is not starting from scratch and making some sort of pretty site using AI. This is modifying code in existing installations and making sure every site is able to safely update.
Getting an AI overview
Before I considered modifying the settings code to remove the items I was concerned about, I asked four AIs to analyze how my existing settings code works: GPT-5 Thinking Mode Deep Research in ChatGPT, OpenAI Codex, Google Jules, and a lighter version of GPT-5 Deep Research.
(Disclosure: Ziff Davis, ZDNET’s parent company, filed an April 2025 lawsuit against OpenAI, alleging it infringed Ziff Davis copyrights in training and operating its AI systems.)
The last AI above was used because I apparently exceeded a limit, so my deep research query resulted in this message: “Your remaining queries are powered by a lighter version of deep research. Our full access resets on Saturday. Upgrade to ChatGPT Pro to continue using deep research.” Since I had a week to wait before I’d get the full Deep Research AI back, I decided to rerun the query in the lighter version and see what it would do.
OpenAI Codex and Google Jules both phoned in their answers. Codex gave me a short list of settings values in bullet form. Jules provided four short paragraphs essentially saying that my code requests and receives values back from the database. Neither answer impressed me.
Before GPT-5 Deep Research downgraded, I got a 13-page document that explained every mechanism, every field, and every option used in my settings code. To be honest, it was almost overwhelming. It was clear and comprehensive, but it was almost too complete. It presented the most in-the-weeds details at the same level of priority as the major concepts, making it difficult to get a truly good picture of operations.
The lighter version of Deep Research gave me what I’d consider the Goldilocks version. It was just right. It presented the high-order architecture and mentioned the tiny details, but didn’t get sidetracked by them. I found it quite useful.
Planning for the major code change
My intent was to get the AI to code this settings fix. Before Deep Research downgraded me to the lighter version, I had been working with its fully powered capability.
Also: I went hands-on with ChatGPT Codex and the vibe was not good – here’s what happened
At that point, I wanted Deep Research to produce a plan of action for making the change, then feed that plan to either Codex or Jules. Jules is known for developing a plan of action for any coding task, but given how little it provided for the initial analysis, I wasn’t confident it would be able to think through all the implications and stages necessary.
I had downloaded the aforementioned slightly overwhelming 13-page “how settings work” detail document created by the fully powered Deep Research before it downgraded. I passed that along to a new session.
The idea was to have one AI session analyze the existing code, and then have a completely different AI session take that analysis to plan the actual modification process.
Also: Google’s Jules AI coding tool exits beta with serious upgrades – and more free tasks
This time I asked it for a plan to initiate the upgrade. I gave it a very detailed prompt (in retrospect, possibly too detailed), and asked it to create a product requirements document (PRD) that could be given to Jules or Codex.
I got back an 11-page document with the following sections:
- Background and Objectives
- Data Structures Before and After
- Migration Strategy (with Version Tracking and Failover)
- Plugin Interoperability and Partial Upgrade Handling
- Settings Management Library API Design
- Edge Cases and Rollback Strategy
- Plugin Interoperability and Partial Upgrade Handling
- Settings Management Library API Design
- Edge Cases and Rollback Strategy
- Deployment Considerations
- Developer Notes for Codex/Jules
Do you notice anything in that list? Something about it ain’t right. Keep looking. You’ll see it.
Yep, it repeated three sections. Plugin interop, settings management, and edge cases each appear twice.
I don’t trust the AI to do this
Now, look. I’ve been guilty of cutting and pasting and leaving some content in two places, but I’m not an AI. I’m also not being “interviewed” for the job of modifying mission-critical code.
Yes, there’s no doubt I could have removed the duplicate sections and still fed the PRD to either Jules or Codex. But the presentation error raised the hairs on the back of my neck. That PRD was a set of instructions for one giant coding change. What else was wrong with it? What might I have missed?
Also: 9 programming tasks you shouldn’t hand off to AI – and why
After all, when I code, I do one small feature at a time. I test out every line, sweat every detail, and obsess over every change. But this was a big document that I could theoretically rubber-stamp and delegate the work to some pseudo-intelligence in the cloud.
I thought about this pretty deeply before making a decision.
I am fairly comfortable letting the AI add a new capability or build something from scratch. But diving deep into the bowels of mission-critical code? I’m not ready to give up the reins.
Also: Coding with AI? My top 5 tips for vetting its output – and staying out of trouble
The downside could be far too catastrophic. If the AI ran amok in my code, I might not even be able to figure out what went wrong. Sure, I could roll back all the way to before I delegated the task to the AI, but why take the chance?
I want more granular control. I’m happy to have the AI help with writing a specific routine, doing coding for well-documented interfaces, and adding some new non-mission-critical features.
But when it comes to core capabilities and things that could turn nightmares of torch-wielding, laptop-swinging, angry site operators into reality, I think I’ll do the coding myself.
Stay tuned. I will be using the AI to code. And I will tell you about it. But I’m not going to let the AI loose where it could do so much damage so quickly, for so little gain.
Also: 10 professional developers on vibe coding’s true promise and peril
Have you tried letting AI handle parts of your coding projects? Did you trust it with critical infrastructure or only non-essential features? Where do you draw the line between convenience and risk? Let us know in the comments below.