[D] Your Agent, Their Asset: Real-world safety evaluation of OpenClaw agents (CIK poisoning raises attack success to ~64

This paper presents a real-world safety evaluation of OpenClaw, a personal AI agent with access to Gmail, Stripe, and the local filesystem.

The authors introduce a taxonomy of persistent agent state:

- Capability (skills / executable code)

- Identity (persona, trust configuration)

- Knowledge (memory)

They evaluate 12 attack scenarios on a live system across multiple models.

Key results:

- baseline attack success rate: ~10–36.7%

- after poisoning a single dimension (CIK): ~64–74%

- even the strongest model shows >3× increase in vulnerability

- best defense still leaves Capability attacks at ~63.8%

- file protection reduces attacks (~97%) but also blocks legitimate updates at similar rates

The paper argues these vulnerabilities are structural, not model-specific.

One interpretation is that current defenses mostly operate at the behavior or context level:

- prompt-level alignment

- monitoring / logging

- state protection mechanisms

But execution remains reachable once the system state is compromised.

This suggests a different framing:

proposal -> authorization -> execution

where authorization is evaluated deterministically:

(intent, state, policy) -> ALLOW / DENY

and execution is only reachable if explicitly authorized.

Curious how others interpret this:

[D] Your Agent, Their Asset: Real-world safety evaluation of OpenClaw agents (CIK poisoning raises attack success to ~64–74%)