ScreenAgent

Open‑source VLM agent to control computer GUIs via mouse/keyboard planning and execution.

4.4 (5)

Reviewed by Daniel Nikulshyn · Updated May 2026

GUI Automation · Research · Open Source · Desktop · Multimodal · Computer Use · VLM Agent · Mouse & Keyboard

Overview

ScreenAgent is an open-source vision-language model (VLM) agent that controls computer GUIs by planning and executing mouse and keyboard actions.

Use cases

Automate repetitive desktop workflows

Use the VLM agent to plan and execute mouse and keyboard actions across GUI applications, handling routine multi-step tasks without scripting each interaction.

Research on vision-language agents

Leverage the open-source codebase to study, benchmark, and extend vision-language model agents that perceive screens and operate computers.

Cross-application task execution

Direct the agent to plan and carry out tasks spanning multiple GUI programs, navigating windows and controls via screen understanding.

Accessibility and assistive control

Enable natural-language-driven control of a computer's GUI, helping users perform actions through an agent that interprets the screen and acts on their behalf.

Reviews

4.4

Average of 5 reviews.


Margaret Whitfield

Skeptical, then convinced

I went in skeptical, since most tools in this space overpromise. It actually delivers on the integrations, and the real time savings caught me off guard. The shallow docs are why this isn't a perfect score. Still, I'd recommend giving it a real trial.

Camille Laurent

Does the job

Pretty happy overall. The automation just works and the value for money is strong. The few rough edges that remain can be annoying, but there are no dealbreakers. I'd recommend it to a friend without hesitating.

Diego Fernández

Years in this space

I've evaluated a lot of these over the years. What stands out here is the core workflow, which is handled better than most, and support is responsive. My one real gripe is that pricing gets steep at scale. Worth the time if this is your use case.

Wei Chen

Compared a few options

Evaluated this against two competitors. Where it wins: the automation, and it is genuinely easy to set up. Where it lags: pricing gets steep at scale. On balance the feature set, especially the onboarding, justifies the 4 stars for our use case.

Rina Desai

Use it every day

Honestly didn't expect to like it this much. The integrations are exactly what I needed, and the value for money is strong. I reach for it almost every day now and it just clicks.

Questions and answers

What are typical use cases for ScreenAgent?

It is suited to GUI automation tasks where an agent needs to perceive the screen and act via mouse and keyboard, such as automating desktop workflows, testing applications, or building research prototypes for computer-use agents.

What is ScreenAgent and what can it do?

ScreenAgent is an open-source vision-language model (VLM) agent that controls computer GUIs. It plans and executes mouse and keyboard actions to automate on-screen tasks across desktop applications.
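
To make the plan-and-execute idea concrete, here is a minimal Python sketch of a generic screen-agent loop: capture the screen, ask a planner for the next mouse or keyboard action, then execute it. This is not ScreenAgent's actual API. Every name here (`Action`, `run_agent`, `fake_capture`, `scripted_plan`) is hypothetical, and the planner is a scripted stand-in for a real VLM call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One GUI action. Hypothetical structure, not ScreenAgent's real format."""
    kind: str       # "click", "type", or "done"
    x: int = 0      # screen coordinates, used by "click"
    y: int = 0
    text: str = ""  # text to enter, used by "type"


def run_agent(
    task: str,
    capture: Callable[[], bytes],
    plan: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 10,
) -> List[Action]:
    """Loop: observe the screen, plan one action, execute it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()                    # perceive
        action = plan(task, screenshot, history)  # plan (a VLM call in practice)
        if action.kind == "done":                 # planner signals completion
            break
        execute(action)                           # act via mouse/keyboard
        history.append(action)
    return history


# Stand-ins so the sketch runs without a screen or a model (assumptions).
def fake_capture() -> bytes:
    return b""  # a real agent would return screenshot image bytes


def scripted_plan(task: str, screenshot: bytes, history: List[Action]) -> Action:
    script = [Action("click", x=100, y=200), Action("type", text="hello")]
    return script[len(history)] if len(history) < len(script) else Action("done")


executed: List[Action] = []  # records what would be sent to the mouse/keyboard
history = run_agent(
    "Type hello into the search box", fake_capture, scripted_plan, executed.append
)
```

In a real setup, `capture` would grab a screenshot, `plan` would prompt the model with the task, the screenshot, and the action history, and `execute` would drive the mouse and keyboard. The `max_steps` cap is a design choice that keeps a confused planner from looping forever.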

How much does ScreenAgent cost?

ScreenAgent is open-source, so the software itself is freely available. You may still incur costs for the underlying VLM (if using a paid model) and the hardware required to run it.