Introducing Amazon Nova Act

By Amazon AGI

Mar 31, 2025

A research preview for developers to build agents that take action in web browsers

Today, we’re excited to introduce Amazon Nova Act, a new AI model trained to perform actions within a web browser. We’re releasing a research preview of the Amazon Nova Act SDK (available via nova.amazon.com), which will allow developers to experiment with an early version of the new model. Using the SDK, developers can build agents that complete tasks in a web browser (e.g., submitting an out-of-office request in an internal system, putting a hold on your calendar to show you will be out of office, and setting up an ‘away from office’ email auto-reply).

Since large language models (LLMs) entered the public consciousness, “agents” have primarily referred to systems that respond to the user in natural language or draw on knowledge bases via Retrieval-Augmented Generation (RAG). We instead think of agents as systems that can complete tasks and act in a range of digital and physical environments on behalf of the user. Today, these systems are still new, and most are limited to use cases fully covered by APIs, of which there are few.

Our dream is for agents to perform wide-ranging, complex, multi-step tasks like organizing a wedding or handling complex IT tasks to increase business productivity. While some use cases are well-suited for today’s technology, multi-step agents prompted with high-level goals still require constant human hovering and supervision.

To address this shortcoming of today’s agents, the Nova Act SDK enables developers to break down complex workflows into reliable atomic commands (e.g., search, checkout, answer questions about the screen). It also enables them to add more detailed instructions to those commands where needed (e.g., “don’t accept the insurance upsell”), call APIs, and even alternate with direct browser manipulation through Playwright to further strengthen reliability (e.g., for entering passwords). You can interleave Python code, whether tests, breakpoints, asserts, or thread pools for parallelization, since even the fastest agents are limited by web page load times.
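To make this concrete, here is a minimal sketch of what such a composed workflow could look like. The travel site, the flow, and the placeholder password are hypothetical, and we assume the preview SDK’s NovaAct context manager, its act() commands, and its Playwright page handle as shown in the SDK’s examples:

```python
from nova_act import NovaAct  # research-preview SDK, available via nova.amazon.com

# Hypothetical booking flow, decomposed into reliable atomic commands.
with NovaAct(starting_page="https://example-travel-site.com") as nova:
    nova.act("search for a one-way flight from SEA to SFO next Friday")
    nova.act("select the cheapest nonstop option")
    # Attach more detailed instructions to a command where needed.
    nova.act("proceed to checkout, and don't accept the insurance upsell")
    # For sensitive input, bypass the model and drive the browser directly
    # via Playwright (nova.page is assumed to expose the Playwright page).
    nova.act("click on the password field")
    nova.page.keyboard.type("not-a-real-password")  # placeholder credential
    nova.act("submit the booking")
```

Because each act() call is an ordinary Python statement, tests, asserts, and breakpoints can be dropped between commands, and the same flow can be fanned out across sites with a standard ThreadPoolExecutor.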

Nova Act is focused on reliable building blocks that can be composed into more complex workflows. Many agent benchmarks measure model performance on high-level tasks, where state-of-the-art models achieve only 30% to 60% accuracy on completing tasks in web browsers. But agents must be reliable to be truly useful. We’ve therefore focused on scoring above 90% on internal evals of capabilities that trip up other models, such as date picking, drop-downs, and pop-ups, and on achieving best-in-class performance on benchmarks like ScreenSpot and GroundUI Web, which most directly measure our model’s ability to actuate the web:

| Benchmark | Description | Amazon Nova Act | Claude 3.7 Sonnet* | OpenAI CUA* |
| --- | --- | --- | --- | --- |
| ScreenSpot Web Text | Follow natural language instructions to interact with a textual element on screen (e.g., set font size to 50) | 0.939 | 0.900 | 0.883 |
| ScreenSpot Web Icon | Follow natural language instructions to interact with a visual element on screen (e.g., how many stars does this GitHub repo have?) | 0.879 | 0.854 | 0.806 |
| GroundUI Web | Understand and interact with various UI elements on the web | 0.805 | 0.825 | 0.823 |
* Benchmarked by our team. Prompts were generally kept simple, e.g., "click on <element>" for each element in these benchmarks. Alternative prompts did not improve performance in our testing, but further prompt engineering may be possible. Results measured internally by Amazon for evaluation purposes using (i) the Bedrock API for Claude 3.7 Sonnet and (ii) the OpenAI API for CUA.
Nova Act’s focus on reliability means that once you have things working, there’s no need to watch it perform each action: switch on headless mode, turn your agent into an API that you can integrate into your product, or even set it up to run asynchronously on whatever schedule you want. As one example, we’ve built an agent that runs behind the scenes to order a salad for delivery every Tuesday for dinner.
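As an illustrative sketch of that pattern (the delivery site and instruction are placeholders, and we assume the preview SDK supports a headless flag as in its examples), such an unattended agent might look like:

```python
from nova_act import NovaAct

def order_tuesday_salad() -> None:
    # headless=True (assumed supported by the preview SDK) hides the browser
    # window, so the agent can run unattended from cron or a task scheduler.
    with NovaAct(starting_page="https://example-delivery-site.com",
                 headless=True) as nova:
        nova.act("order a Greek salad for delivery at 6pm tonight")

if __name__ == "__main__":
    order_tuesday_salad()  # e.g., scheduled weekly via crontab: 0 16 * * 2
```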

While it is still early days for Nova Act, we’re excited about our model’s ability to transfer user interface understanding across environments. We were pleasantly surprised to find that our early Nova Act checkpoints appear to succeed in novel environments, like web games, despite zero video game experience.

With this combination of reliable building blocks and a flexible form factor, Nova Act is already being used in Alexa+ to navigate the internet in a self-directed way and complete tasks on your behalf when integrated services can’t provide all the necessary APIs.

Nova Act is the first step in our vision for building the key capabilities that will enable useful agents at scale. This is an early checkpoint from a much larger training curriculum we are pursuing with Nova models. To make agents truly smart and reliable at increasingly complex multi-step tasks, we think they need to be trained via reinforcement learning on a wide range of useful environments, not just via supervised fine-tuning of an LLM on simple demonstrations. We’re excited to share more of our research and results in this direction over time.

We think the most valuable use cases for agents have yet to be built. The best developers and designers will discover them. This research preview of our Nova Act SDK enables us to iterate alongside these builders through rapid prototyping and fast feedback. Thanks for joining us on our journey!

The Nova Act SDK is available in research preview through nova.amazon.com, a new website to easily explore Amazon Nova foundation models.