Browser Automation with AI: Using Browser Use for Seamless Control

Build smart agents that interact with the web like a human — code walkthrough included.

Jun 07, 2025


Introduction

AI tooling is growing quickly, and now is a good time to build something with it. In this blog post, you will learn how to use Browser-Use, a quick and easy way to connect your AI agent to a browser.

Creating a virtual environment

As usual, start by creating a virtual environment so you don't disturb your existing Python setup. Note that Browser-Use requires Python 3.11 or above; it won't work on older versions.

I also recommend using uv for package installation. It is a modern, high-performance Python package manager and installer written in Rust.

# Create a virtual environment with Python 3.11
uv venv --python 3.11

# Activate the environment (Mac/Linux)
source .venv/bin/activate

# Install browser-use in the environment
uv pip install browser-use

# Install Playwright
uv run playwright install

Note that the last command installs Playwright, a general-purpose browser automation tool designed for end-to-end testing of web applications. That completes the installation for Browser-Use.
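To confirm the setup before writing any agent code, a small check like the one below can help. It is a minimal sketch that uses only the standard library; the function name check_environment is my own, and the package names browser_use and playwright are assumed to be the import names of the packages installed above.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 11), packages=("browser_use", "playwright")):
    """Report whether the interpreter and the required packages look usable."""
    report = {"python_ok": sys.version_info >= min_version}
    for name in packages:
        # find_spec returns None when a top-level package cannot be imported.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item}: {'ok' if ok else 'missing'}")
```

Run it inside the activated virtual environment; any line marked as missing points to the step that needs to be repeated.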

Creating Agent

Browser Use supports various LangChain-based chat models. If you plan to use a different chat model, such as Anthropic or DeepSeek, install the corresponding LangChain module.

Here is a simple way to get started with Browser Use and a GPT-4o agent. This example assumes the langchain_openai package is installed.

Note: You have to set OPENAI_API_KEY in the .env file and load it with load_dotenv, as shown in the code below. Likewise, you have to set the matching API key for any other LLM provider you use.

from langchain_openai import ChatOpenAI
from browser_use import Agent
from dotenv import load_dotenv
load_dotenv()

import asyncio

llm = ChatOpenAI(model="gpt-4o")

async def main():
    agent = Agent(
        task="Compare the price of gpt-4o and DeepSeek-V3",
        llm=llm,
    )
    result = await agent.run()
    print(result)

asyncio.run(main())
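Since a missing API key is a common reason for the first run to fail, a small pre-flight check can be useful. This is a sketch under assumptions: only OPENAI_API_KEY comes from this post, while the other variable names follow common provider conventions and should be checked against each provider's documentation. The helper name missing_keys is hypothetical.

```python
import os

# Environment variable expected for each provider.
# Only OPENAI_API_KEY is taken from this post; the rest are assumptions.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def missing_keys(providers, env=None):
    """Return the variable names that are unset or empty for the given providers."""
    env = os.environ if env is None else env
    return [PROVIDER_KEYS[p] for p in providers if not env.get(PROVIDER_KEYS[p])]
```

Calling missing_keys(["openai"]) right after load_dotenv() and stopping early when the list is not empty gives a clearer error than a failed request later on.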

Open-Source Ollama Models

Not everyone has an API key for the closed-source models, so Browser-Use also supports Ollama, which runs open-source models locally.

You can set up Ollama by following the steps below:

Download Ollama from here
Run ollama pull model_name. Pick a model that supports tool-calling from here
Run ollama start

from langchain_ollama import ChatOllama
from browser_use import Agent

# Initialize the model
llm = ChatOllama(model="qwen2.5", num_ctx=32000)

# Create agent with the model
agent = Agent(
    task="Your task here",
    llm=llm,
)
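Before creating the agent, it can help to confirm that the Ollama server is running and that the model has been pulled. The sketch below assumes Ollama's default local address (http://localhost:11434) and its /api/tags endpoint for listing local models; the function name list_ollama_models is my own.

```python
import json
import urllib.request


def list_ollama_models(base_url="http://localhost:11434", timeout=2.0):
    """Return the names of locally pulled models, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers bad JSON.
        return None
    return [model.get("name") for model in payload.get("models", [])]


if __name__ == "__main__":
    models = list_ollama_models()
    print("Ollama is not reachable" if models is None else models)
```

If the result is None, start the server first; if the model you passed to ChatOllama is not in the list, pull it before running the agent.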

You can check all the supported models here. Since you are working with both a browser and an AI agent, the tool also exposes many parameters to control each of them.

Agent Parameters

use_vision: Enabled by default. It lets the agent work with visual information present on the website.
controller: Used to add custom functions that the agent can call.
Refer to the link for the other agent parameters.
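As a rough illustration of how these two parameters are passed, the sketch below collects them into keyword arguments and hands them to Agent. The helper names agent_kwargs and make_agent are hypothetical; only task, llm, use_vision, and controller are taken from this post. The import sits inside the function so the file can be loaded even where browser-use is not installed.

```python
def agent_kwargs(task, llm, use_vision=True, controller=None):
    """Collect Agent keyword arguments, leaving out the controller when none is given."""
    kwargs = {"task": task, "llm": llm, "use_vision": use_vision}
    if controller is not None:
        kwargs["controller"] = controller
    return kwargs


def make_agent(task, llm, use_vision=True, controller=None):
    """Create an Agent; requires browser-use to be installed."""
    from browser_use import Agent  # imported lazily on purpose

    return Agent(**agent_kwargs(task, llm, use_vision, controller))
```

For example, make_agent("Your task here", llm, use_vision=False) would create an agent that works from the page content alone, which may suit models without vision support.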

Browser Settings

BrowserSession is the Browser Use object that tracks a connection to a running browser.
You can connect to an already running browser using its PID (process ID).
You can reuse cookies from an existing browser as part of a browser session.
The remaining browser options are described here.
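A browser session can be set up along the following lines. This sketch uses only the options that appear in the sample later in this post (headless and executable_path on BrowserProfile, and browser_profile on BrowserSession); the helper names are my own, and other options should be taken from the documentation rather than from this example.

```python
def profile_options(headless=False, executable_path=None):
    """Build BrowserProfile options, skipping the browser path when it is not set."""
    options = {"headless": headless}
    if executable_path:
        options["executable_path"] = executable_path
    return options


def make_browser_session(headless=False, executable_path=None):
    """Create a BrowserSession; requires browser-use to be installed."""
    from browser_use.browser import BrowserProfile, BrowserSession

    profile = BrowserProfile(**profile_options(headless, executable_path))
    return BrowserSession(browser_profile=profile)
```

The resulting session is then passed to the agent as Agent(..., browser_session=session).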

Custom Functions

You can extend the default agent with custom action functions for specific tasks, such as uploading files, asking a human in the loop for help, or drawing a polygon with the mouse. You can see examples of custom functions here.

Action Function Registration

To register your custom functions (which can be sync or async), decorate them with the @controller.action(...) decorator. This saves them into the controller.registry.

from browser_use import Controller, ActionResult

controller = Controller()

@controller.action('Ask human for help with a question', domains=['example.com'])   # pass allowed_domains= or page_filter= to limit actions to certain pages
def ask_human(question: str) -> ActionResult:
    answer = input(f'{question} > ')
    return ActionResult(extracted_content=f'The human responded with: {answer}', include_in_memory=True)

Next, pass the controller to the agent as follows:

# Then pass your controller to the agent to use it
agent = Agent(
    task='...',
    llm=llm,
    controller=controller,
)
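As one more illustration of the same pattern, here is a sketch of a custom action that saves notes to a local file. The plain function append_note holds the logic so it can be tested without a browser, and build_controller registers it using the decorator shown above. Both names, the action description, and the default file name are my own choices, not part of Browser-Use.

```python
from pathlib import Path


def append_note(path, text):
    """Append one line of text to a notes file and return the number of lines in it."""
    file = Path(path)
    with file.open("a", encoding="utf-8") as handle:
        handle.write(text.rstrip("\n") + "\n")
    return len(file.read_text(encoding="utf-8").splitlines())


def build_controller(notes_path="notes.txt"):
    """Register the note-saving action; requires browser-use to be installed."""
    from browser_use import ActionResult, Controller

    controller = Controller()

    @controller.action("Save a short note to a local file")
    def save_note(text: str) -> ActionResult:
        count = append_note(notes_path, text)
        return ActionResult(extracted_content=f"Saved note number {count}")

    return controller
```

Keeping the file handling in a plain function makes the action easy to check on its own before an agent ever calls it.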

Sensitive Data

Since you will be interacting with websites that require you to log in, you must use Browser Use carefully to avoid sending PII to the LLM.

For this purpose, the Agent accepts a parameter called sensitive_data. When you're working with sensitive information like passwords or PII, you can use Agent(sensitive_data=...) to provide sensitive strings that the model can use in actions without ever seeing them directly.

agent = Agent(
    task='Log into example.com as user x_username with password x_password',
    llm=llm,
    sensitive_data={
        'https://example.com': {
            'x_username': 'abc@example.com',
            'x_password': 'abc123456',  # 'x_placeholder': '<actual secret value>',
        },
    },
)
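The idea behind the placeholders can be illustrated in plain Python. This is a conceptual sketch only, not Browser-Use's internal implementation: the model works with placeholder names, the real values are swapped in just before an action runs, and any real values in text going back to the model are masked again. The function names are hypothetical.

```python
def fill_placeholders(text, secrets):
    """Swap placeholder names for real values just before an action runs."""
    for name, value in secrets.items():
        text = text.replace(name, value)
    return text


def mask_secrets(text, secrets):
    """Swap real values back to placeholder names before text reaches the model."""
    # Longest values first, so a secret that contains another one is masked whole.
    for name, value in sorted(secrets.items(), key=lambda item: -len(item[1])):
        if value:
            text = text.replace(value, name)
    return text


secrets = {"x_username": "abc@example.com", "x_password": "abc123456"}
typed = fill_placeholders("type x_password", secrets)
seen_by_model = mask_secrets("Logged in as abc@example.com", secrets)
```

Here typed holds the real password for the browser action, while seen_by_model contains only the placeholder x_username.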

Browser-Use also provides examples to help you get started with different use cases.

Sample — Posting On Twitter

"""
Goal: Provides a template for automated posting on X (Twitter), including new tweets, tagging, and replies.

X Posting Template using browser-use
----------------------------------------

This template allows you to automate posting on X using browser-use.
It supports:
- Posting new tweets
- Tagging users
- Replying to tweets

Add your target user and message in the config section.

target_user="XXXXX"
message="XXXXX"
reply_url="XXXXX"

Any issues, contact me on X @defichemist95
"""

import asyncio
import os
import sys
from dataclasses import dataclass

sys.path.append(os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__)))))

from dotenv import load_dotenv

load_dotenv()

from langchain_openai import ChatOpenAI

from browser_use import Agent, Controller
from browser_use.browser import BrowserProfile, BrowserSession

if not os.getenv('OPENAI_API_KEY'):
 raise ValueError('OPENAI_API_KEY is not set. Please add it to your environment variables.')


# ============ Configuration Section ============
@dataclass
class TwitterConfig:
 """Configuration for Twitter posting"""

 openai_api_key: str
 chrome_path: str
 target_user: str  # Twitter handle without @
 message: str
 reply_url: str
 headless: bool = False
 model: str = 'gpt-4o-mini'
 base_url: str = 'https://x.com/home'


# Customize these settings
config = TwitterConfig(
 openai_api_key=os.getenv('OPENAI_API_KEY'),
 chrome_path='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',  # This is for MacOS (Chrome)
 target_user='XXXXX',
 message='XXXXX',
 reply_url='XXXXX',
 headless=False,
)


def create_twitter_agent(config: TwitterConfig) -> Agent:
 llm = ChatOpenAI(model=config.model, api_key=config.openai_api_key)

 browser_profile = BrowserProfile(
  headless=config.headless,
  executable_path=config.chrome_path,
 )
 browser_session = BrowserSession(browser_profile=browser_profile)

 controller = Controller()

 # Construct the full message with tag
 full_message = f'@{config.target_user} {config.message}'

 # Create the agent with detailed instructions
 return Agent(
  task=f"""Navigate to Twitter and create a post and reply to a tweet.

        Here are the specific steps:

        1. Go to {config.base_url}.
        2. Look for the text input field at the top of the page that says "What's happening?"
        3. Click the input field and type exactly this message:
        "{full_message}"
        4. Find and click the "Post" button (look for attributes: 'button' and 'data-testid="tweetButton"')
        5. Do not click on the '+' button which will add another tweet.

        6. Navigate to {config.reply_url}
        7. Before replying, understand the context of the tweet by scrolling down and reading the comments.
        8. Reply to the tweet under 50 characters.

        Important:
        - Wait for each element to load before interacting
        - Make sure the message is typed exactly as shown
        - Verify the post button is clickable before clicking
        - Do not click on the '+' button which will add another tweet
        """,
  llm=llm,
  controller=controller,
  browser_session=browser_session,
 )


async def post_tweet(agent: Agent):
 try:
  await agent.run(max_steps=100)
  agent.create_history_gif()
  print('Tweet posted successfully!')
 except Exception as e:
  print(f'Error posting tweet: {str(e)}')


async def main():
 agent = create_twitter_agent(config)
 await agent.run()


if __name__ == '__main__':
 asyncio.run(main())
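Before running a template like the one above, it may be worth checking that the configuration placeholders were actually replaced. The sketch below is one possible check; the function name config_problems is my own, and the 280-character limit is an assumption based on the standard post length on X.

```python
MAX_POST_LENGTH = 280  # assumption: standard character limit for a post on X
PLACEHOLDER = "XXXXX"  # the placeholder value used in the template


def config_problems(target_user, message, reply_url):
    """Return a list of human-readable problems; an empty list means the config looks usable."""
    problems = []
    fields = (("target_user", target_user), ("message", message), ("reply_url", reply_url))
    for field, value in fields:
        if not value or value == PLACEHOLDER:
            problems.append(f"{field} is not set")
    if target_user.startswith("@"):
        problems.append("target_user should not include the leading @")
    # The template posts the handle and the message together, so check their joint length.
    full_message = f"@{target_user} {message}"
    if len(full_message) > MAX_POST_LENGTH:
        problems.append(f"message is too long ({len(full_message)} characters)")
    return problems
```

Calling config_problems(config.target_user, config.message, config.reply_url) before creating the agent and stopping when the list is not empty avoids spending LLM calls on a run that cannot succeed.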

Interesting Demos

I would recommend watching these videos to get an idea of what can be done with this tool.

Applying for Jobs

Writing In Google Docs

Conclusion

I found Browser-Use very interesting, since working with it means understanding how the browser works and which actions and models can be combined to perform different tasks. I hope you are eager to experiment with Browser-Use. Do share your thoughts.


Reference

Browser-Use GitHub Repo

Browser-Use Documentation

MLWorks Newsletter
