NVIDIA Project Arc & OpenShell: Sandboxed AI Agents Guide

NVIDIA and ServiceNow's Project Arc runs long, sandboxed AI agents on OpenShell. Here is what that means for mobile field ops and n8n automation work.

İlker Ulusoy 2026-06-16 8 min read

NVIDIA Project Arc, announced with ServiceNow at Knowledge 2026 and flagged again in the latest smol.ai newsletter, is a long-running autonomous desktop agent that runs inside a sandboxed, policy-governed runtime called NVIDIA OpenShell. Everything is denied by default, every action is audited, and the agent ties into ServiceNow Action Fabric for governance. For teams that already build with Halmob — mobile apps, n8n automations, and AI agent orchestration — Project Arc is less a new product to buy and more a blueprint to copy.

The interesting part of Project Arc is not that another vendor shipped an agent. It is the runtime model underneath. Most enterprise agents today run with broad credentials, open file system access, and audit trails bolted on afterwards. OpenShell flips that: the sandbox starts with zero permissions, IT defines what is allowed, and every tool call lands in a structured log before it touches a real system. That is the security posture mobile field-ops and regulated automation pipelines have been waiting for.

The 30-Second Version

Project Arc is a self-evolving desktop agent for knowledge workers, available in early preview. It runs on NVIDIA OpenShell, an open source sandbox where every action is denied until policy explicitly allows it, and it connects to ServiceNow Action Fabric for cross-system orchestration. The OpenShell pattern is portable — you can apply the same denial-by-default model to mobile agents and n8n workflows without buying ServiceNow.

What Project Arc Actually Does

Project Arc is an autonomous desktop agent aimed at developers, IT admins, and other knowledge workers whose day is a long chain of small tasks across many tools. Instead of a chatbot that answers one question, it is a worker that runs for hours, holds context, calls real tools, and reports back. The agent is "self-evolving" in the sense that it refines its plan as it gets feedback from the systems it touches.

  • Long-running. The agent persists across many tool calls over minutes or hours, rather than a single request and response.
  • Desktop-native. It operates on a real workstation environment — files, shells, browsers — instead of just a chat surface.
  • Sandboxed. Everything happens inside NVIDIA OpenShell, where the default answer to any action is no.
  • Policy-governed. IT defines what tools and actions the agent can touch, and ServiceNow Action Fabric records every decision.
  • Audit-ready. The trail is built in, not stitched together after the fact from raw logs.

The OpenShell Sandbox Model

NVIDIA OpenShell is the open source piece that matters most for the rest of us. It is a secure runtime for developing and deploying autonomous agents in sandboxed, policy-governed environments. The phrase "denied by default" is doing a lot of work in the announcement: it is the exact opposite of how most internal automation gets shipped, where an agent inherits the credentials of whoever launched it and the audit log is whatever the framework happened to print.

Think of OpenShell as the container philosophy applied to agents. A container does not see the host file system unless you mount it; an OpenShell agent does not see a tool unless policy mounts it. That single change — from "allow everything, then block what hurts" to "allow nothing, then permit what helps" — is what makes long-running agents acceptable to enterprise IT in the first place.
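To make the mounting analogy concrete, here is a minimal sketch in Python. It is not the OpenShell API: the `Sandbox` class, its `mount` and `call` methods, and the `read_ticket` tool are all hypothetical names invented for illustration. The only point it demonstrates is that the agent's tool surface starts empty and grows only through explicit grants.

```python
from typing import Callable, Dict, List


class ToolNotMounted(Exception):
    """Raised when an agent asks for a tool that policy has not mounted."""


class Sandbox:
    """Hypothetical deny-by-default tool surface (illustrative, not OpenShell)."""

    def __init__(self) -> None:
        # Starts empty: with no policy in place, the agent can see nothing.
        self._mounted: Dict[str, Callable[..., object]] = {}

    def mount(self, name: str, tool: Callable[..., object]) -> None:
        """Explicitly expose one tool to the agent."""
        self._mounted[name] = tool

    def visible_tools(self) -> List[str]:
        """Return the names of the tools the agent is allowed to see."""
        return sorted(self._mounted)

    def call(self, name: str, *args: object, **kwargs: object) -> object:
        """Run a tool only if it was mounted; otherwise refuse."""
        if name not in self._mounted:
            raise ToolNotMounted(f"tool {name!r} is denied by default")
        return self._mounted[name](*args, **kwargs)


def read_ticket(ticket_id: str) -> str:
    # Stand-in for a real integration; the return value is made up.
    return f"ticket {ticket_id}: printer offline"


sandbox = Sandbox()
sandbox.mount("read_ticket", read_ticket)
```

In this sketch `sandbox.call("read_ticket", "INC-1")` succeeds, while `sandbox.call("delete_ticket", "INC-1")` raises `ToolNotMounted` because nobody granted it. The absence of a rule is itself the denial, which is the inversion the paragraph above describes.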

Why This Matters for Mobile and Field Ops

Project Arc is sold as a desktop agent, but the runtime pattern is exactly what a mobile field-ops agent needs. A technician's phone agent that schedules visits, pulls inventory, and updates work orders looks small on screen and large in the threat model: it talks to the same systems a desktop agent does, often with weaker network controls. A denial-by-default sandbox around the mobile agent's tool calls — not just inside its UI — is the difference between "we built a cool prototype" and "we shipped it to a regulated industry."

That is the architecture we keep arriving at in our mobile development work at Halmob. The agent on the phone is the surface, but the trust boundary lives in the policy layer behind it. The Project Arc announcement is a useful reminder that the policy layer is no longer optional, and that an open source runtime like OpenShell removes the last excuse for ad-hoc agent permissions.

Project Arc vs. Earlier Enterprise Agent Patterns

Anyone who has shipped agents in the last two years has seen at least three different runtime philosophies. Project Arc is closer to the third than the first two, and the table below shows where it lands.

| Pattern | Permissions | Audit | Best fit |
| --- | --- | --- | --- |
| Inherited-credentials agent | Whatever the user can do | Whatever the framework prints | Internal prototypes only |
| Tool-allowlist agent | Curated list of API calls | Per-tool, hand-stitched | Single-team production |
| Sandboxed agent (OpenShell / Project Arc) | Denied by default, policy-granted | Built into the runtime | Regulated, cross-team rollouts |

Most teams sit in the middle row today. They have a working agent, they have a partial allowlist, and the audit story is a slide. Project Arc's contribution is mostly to make the bottom row look attainable without writing a sandbox from scratch.

Adapting the Pattern Without ServiceNow

You do not need a ServiceNow contract to copy the architecture. The OpenShell model — denial by default, policy-defined tool surface, audit baked into the runtime — translates cleanly into the stack we ship with most clients. The shape we recommend is below.

  1. Pick the orchestrator. n8n for workflow stitching, a custom harness for code-heavy agents, or both. The orchestrator owns the "what should run next" question; it does not own credentials.
  2. Put every tool behind a policy gate. No agent calls a tool directly. Every call goes through a thin server that checks "is this agent allowed to do this, now, against this resource."
  3. Start with an empty allowlist. Add tools as the use case requires them, not in case the agent might want them. Every addition is a deliberate decision, logged with a reason.
  4. Log the decision, not just the call. Audit needs "agent X tried action Y, policy Z allowed it because R", not just the raw HTTP request.
  5. Treat the mobile client the same as the desktop client. The policy layer does not care which surface called it. That is the only way you scale from one team to ten without rewriting the trust model.
  6. Wire human approval in early. A long-running agent will eventually want to do something risky. The cheapest place to add a human-in-the-loop is at the policy gate, not inside every tool.
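Steps 2 through 6 can be sketched as a single in-process policy gate. This is a simplified illustration under stated assumptions, not an OpenShell or n8n API: `PolicyGate`, `Rule`, `Decision`, and the example agent and resource names are hypothetical, and a real deployment would run the gate as a separate service with persistent storage.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    reason: str                   # why the grant exists (step 3)
    needs_approval: bool = False  # route risky actions to a human (step 6)


@dataclass(frozen=True)
class Decision:
    agent: str
    action: str
    resource: str
    surface: str         # "mobile", "desktop", ...; recorded but never trusted
    outcome: str         # "allow", "deny", or "needs_approval"
    rule: Optional[str]  # which rule matched, if any
    reason: str          # the "because R" part of the audit record (step 4)


@dataclass
class PolicyGate:
    # Empty allowlist by default: nothing is permitted until a rule is added.
    rules: Dict[Tuple[str, str, str], Rule] = field(default_factory=dict)
    audit_log: List[Decision] = field(default_factory=list)

    def allow(self, agent: str, action: str, resource: str,
              reason: str, needs_approval: bool = False) -> None:
        """Add one deliberate grant, always with a stated reason."""
        self.rules[(agent, action, resource)] = Rule(reason, needs_approval)

    def check(self, agent: str, action: str, resource: str,
              surface: str, approved_by: Optional[str] = None) -> Decision:
        """Decide, record the decision, and return it (step 2)."""
        rule = self.rules.get((agent, action, resource))
        rule_id = f"{agent}:{action}:{resource}"
        if rule is None:
            outcome, rule_id, reason = "deny", None, "no matching rule"
        elif rule.needs_approval and approved_by is None:
            outcome, reason = "needs_approval", rule.reason
        elif rule.needs_approval:
            outcome, reason = "allow", f"{rule.reason}; approved by {approved_by}"
        else:
            outcome, reason = "allow", rule.reason
        # The surface plays no part in the decision above (step 5).
        decision = Decision(agent, action, resource, surface,
                            outcome, rule_id, reason)
        self.audit_log.append(decision)  # denials are logged too
        return decision
```

A fresh `PolicyGate()` denies every request. After `gate.allow("dispatch-bot", "read", "inventory", reason="technicians need stock levels")`, that one action is allowed from any surface, and a rule added with `needs_approval=True` returns `"needs_approval"` until a caller supplies `approved_by`. Keeping the approval check in the gate rather than in each tool is the design choice step 6 argues for.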

Risks and Open Questions

  • Policy authoring becomes the bottleneck. Denial by default is only as useful as the policy library. Without a clear owner, you end up with an agent that cannot do anything.
  • Sandbox escape is still a research problem. OpenShell is new. Treat the boundary as "good enough for most internal work" and not as a substitute for network segmentation, secret management, and least-privilege accounts.
  • Long-running tasks need cancellation semantics. A self-evolving agent that runs for hours can also burn budget for hours. The kill switch is part of the design, not a feature you add when a finance team complains.
  • Audit volume scales fast. One policy decision per tool call across many agents is a lot of structured logs. Plan storage and retention before you turn on a fleet.
  • Lock-in via Action Fabric. The Project Arc value depends on tight integration with ServiceNow workflows. If you are not on ServiceNow, copy OpenShell and bring your own action layer.
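The cancellation point in the list above is the easiest of these risks to prototype. The sketch below assumes an agent loop that runs discrete steps and checks a budget before each one. `RunBudget` and `run_agent` are hypothetical names, and the step cap stands in for whatever cost measure (tokens, wall-clock time, spend) a real system would track.

```python
import threading
from typing import Callable, Iterable, Optional


class RunBudget:
    """Hypothetical kill switch: a step cap plus an operator cancel flag."""

    def __init__(self, max_steps: int) -> None:
        self.max_steps = max_steps
        self.steps_used = 0
        # threading.Event lets another thread (an operator UI, a watchdog)
        # request cancellation safely.
        self._cancelled = threading.Event()

    def cancel(self) -> None:
        """Ask the run to stop before its next step."""
        self._cancelled.set()

    def stop_reason(self) -> Optional[str]:
        """Return why the run must stop, or None if it may continue."""
        if self._cancelled.is_set():
            return "cancelled"
        if self.steps_used >= self.max_steps:
            return "budget_exhausted"
        return None


def run_agent(steps: Iterable[Callable[[], None]], budget: RunBudget) -> str:
    """Run steps in order, checking the budget before each one."""
    for step in steps:
        reason = budget.stop_reason()
        if reason is not None:
            return reason
        step()
        budget.steps_used += 1
    return "completed"
```

Checking before each step means a cancel request takes effect at the next step boundary rather than mid-call. That is usually the safest place to stop an agent that is partway through writing to a real system.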

Where Project Arc Fits Among Other Recent Agent Announcements

The shape of 2026 is becoming clear: every major platform is shipping its own opinion on how long-running agents should run. Salesforce's answer is multi-agent coordination on Atlas 3.0, covered in our Salesforce Agentforce multi-agent orchestration write-up. Cloudflare's answer is durable execution, which we walked through in the Cloudflare Project Think guide. OpenAI's recent transport change is covered in our WebSocket Responses API piece.

Project Arc is none of those things — it is a runtime story, not an orchestration, durability, or transport story. The four announcements stack: a faster pipe, a durable executor, a coordinated team, and a sandbox around each agent. A serious enterprise rollout in 2026 will end up using something from each layer.

The smartest thing OpenShell does is admit that "trust the agent" is not a security model. Default-deny is.

How It Fits the Halmob Stack

Most of what we ship at Halmob is the bridge between an AI agent and a real user on a phone — a field technician, a sales rep, a clinician on the floor. The pattern we recommend is the same one Project Arc validates: keep the mobile UI thin, push the agent into an orchestrator we control, and put every tool call through a policy gate. The technology stack underneath is usually some combination of an n8n automation layer, a mobile app, and a model provider — the same pieces, just composed with the OpenShell mindset.

Teams that are still picking a baseline agent platform will get more out of our OpenClaw 101 guide for new users than out of Project Arc itself. OpenClaw covers the primitives (tools, permissions, memory) that any sandboxed runtime assumes are already defined. Once those primitives exist, OpenShell-style governance is a configuration job, not a rewrite.

When to wait, when to copy now

Wait on Project Arc itself if you are not on ServiceNow — the integration is the value. Copy the OpenShell pattern now if you ship any agent that touches real systems. Denial-by-default does not need a vendor.

The Bottom Line

Project Arc matters less as a product than as a signal. Long-running autonomous agents, in 2026, are not getting deployed without a sandbox underneath them. NVIDIA shipping OpenShell as open source raises the floor for everyone else: there is no good reason left to wire an agent directly into a production system with ambient credentials and hope.

If you are building an AI agent that touches your customers' data, ask the OpenShell question first — what is denied by default, who writes the allow list, and where does the audit trail land. The right answers turn a demo into a system your security team will actually approve. At Halmob we pair mobile development with n8n automation and AI agent orchestration for teams that want to clear that bar on the first try.

For sources, see the official NVIDIA Project Arc and OpenShell announcement, the ServiceNow newsroom write-up, and the agent infrastructure coverage on the smol.ai AINews newsletter.