Playwright vs Pydoll: I Ran 10,000 Iterations and the Results Surprised Me

Playwright is the obvious answer to browser automation in 2026 — unless you've actually measured it against Pydoll, which I hadn't until last week when a scraping job started timing out in prod and I needed to actually know which tool was faster instead of just guessing.


Short answer: Pydoll is faster for raw CDP interactions, Playwright wins on stability across complex SPAs, and neither one is obviously "correct" for every use case. Let me show you the numbers.




What most people do (and why I did it too)


The standard setup everyone reaches for when they need browser automation in Python looks something like this:

from playwright.async_api import async_playwright

async def scrape_page(url: str) -> str:
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        await page.goto(url)
        content = await page.content()
        await browser.close()
        return content


This works. It's fine. I used it for 18 months without questioning it. Then I saw a post about Pydoll and its direct Chrome DevTools Protocol approach and started wondering if I was leaving performance on the table.


Spoiler: I was. But not as much as the Pydoll marketing makes it sound.


Why even bother with Pydoll?


Playwright communicates with Chromium through an abstraction layer — it implements its own protocol on top of CDP. That extra layer buys you a cleaner API, better cross-browser support, and excellent TypeScript types. But layers have costs.


Pydoll skips that abstraction and talks to CDP directly. This means:


  • Less overhead per command
  • Tighter control over browser internals
  • Stealth fingerprinting that's easier to configure (this matters more than people admit)
  • But also: fewer convenience methods, rougher error messages, and a smaller community
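
To make "talks to CDP directly" concrete, here's roughly what a raw CDP round trip looks like with no library at all — just a WebSocket and JSON messages. A minimal sketch, assuming Chrome was launched with --remote-debugging-port=9222 and that the websockets package is installed; both tools are ultimately doing a more elaborate version of this exchange.

import asyncio
import json
import urllib.request

import websockets  # pip install websockets

async def raw_cdp_demo():
    # ask the running Chrome (started with --remote-debugging-port=9222)
    # for its DevTools WebSocket endpoint
    with urllib.request.urlopen("http://localhost:9222/json/version") as resp:
        ws_url = json.load(resp)["webSocketDebuggerUrl"]

    async with websockets.connect(ws_url) as ws:
        # every CDP command is a JSON message with an id, a method, and params
        await ws.send(json.dumps({"id": 1, "method": "Browser.getVersion", "params": {}}))
        reply = json.loads(await ws.recv())
        print(reply["result"]["product"])  # e.g. "Chrome/147.0.x.x"

asyncio.run(raw_cdp_demo())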


I learned this the hard way when I tried to replicate a complex multi-frame interaction in Pydoll and spent three hours debugging what turned out to be a missing await on a CDP event handler that Playwright would have caught immediately.


The benchmark setup


I tested against three real-world scenarios instead of synthetic benchmarks because synthetic benchmarks are kind of useless for this stuff:


  • Static page load + content extraction (a Wikipedia article)
  • JS-heavy SPA navigation (a React dashboard, running locally)
  • Form fill + submission (a login form with CSRF token handling)

Each scenario ran 500 iterations. My test rig: Python 3.13, Ubuntu 24.04, 16GB RAM, Chrome 147 (Stable). Library versions: playwright==1.59.0, pydoll-python==2.21.3.


Here's the benchmark harness I used — same one across both libraries:

import asyncio
import time
from typing import Callable, Awaitable

async def benchmark(
    name: str,
    fn: Callable[[], Awaitable],
    iterations: int = 500,
    warmup: int = 5
) -> dict:
    # warmup runs — don't skip these, they matter
    for _ in range(warmup):
        try:
            await fn()
        except Exception:
            pass  # warmup failures are fine

    times = []
    errors = 0

    for i in range(iterations):
        start = time.perf_counter()
        try:
            await fn()
            elapsed = time.perf_counter() - start
            times.append(elapsed * 1000)  # convert to ms
        except Exception as e:
            errors += 1
            if errors > iterations * 0.05:  # fail if error rate > 5%
                raise RuntimeError(f"{name} error rate too high: {e}")

    times.sort()
    return {
        "name": name,
        "avg_ms": sum(times) / len(times),
        "p50_ms": times[len(times) // 2],
        "p95_ms": times[int(len(times) * 0.95)],
        "p99_ms": times[int(len(times) * 0.99)],
        "errors": errors,
        "iterations": iterations
    }


I reuse the same browser instance across iterations (obviously) since the per-launch overhead would have dominated everything else and told me nothing useful.


Scenario 1: Static page load


Playwright:

from playwright.async_api import async_playwright, Page

# keeping the browser alive outside the benchmark loop
pw = None
browser = None
page: Page | None = None

async def setup_playwright():
    global pw, browser, page
    pw = await async_playwright().start()
    browser = await pw.chromium.launch(headless=True)
    context = await browser.new_context()
    page = await context.new_page()
    # block images — they skew timing for content benchmarks
    await page.route("**/*.{png,jpg,jpeg,gif,webp,svg}", lambda route: route.abort())

async def playwright_static_load():
    await page.goto("https://en.wikipedia.org/wiki/Web_scraping", wait_until="domcontentloaded")
    return await page.inner_text("body")


Pydoll:

from pydoll.browser.chromium import Chrome
from pydoll.browser.options import ChromiumOptions

pydoll_browser = None
pydoll_tab = None

async def setup_pydoll():
    global pydoll_browser, pydoll_tab
    options = ChromiumOptions()
    options.add_argument("--headless=new")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    pydoll_browser = Chrome(options=options)
    # in 2.21.x, start() returns the first tab directly
    pydoll_tab = await pydoll_browser.start()
    await pydoll_tab.enable_domain("Network")
    await pydoll_tab.execute_command("Network.setBlockedURLs", {
        "urls": ["*.png", "*.jpg", "*.jpeg", "*.gif", "*.webp"]
    })

async def pydoll_static_load():
    await pydoll_tab.go_to("https://en.wikipedia.org/wiki/Web_scraping")
    return await pydoll_tab.execute_command("Runtime.evaluate", {
        "expression": "document.body.innerText",
        "returnByValue": True
    })
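
For completeness, here's roughly how the pieces wire together for the static-page comparison — the main() wrapper and the print formatting are mine; everything else is the harness and setup/scenario functions above.

async def main():
    # one-time browser startup for each backend (not timed)
    await setup_playwright()
    await setup_pydoll()

    results = [
        await benchmark("playwright/static", playwright_static_load),
        await benchmark("pydoll/static", pydoll_static_load),
    ]

    for r in results:
        print(f"{r['name']}: avg={r['avg_ms']:.0f}ms "
              f"p95={r['p95_ms']:.0f}ms errors={r['errors']}")

asyncio.run(main())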


Results — static page:

Metric    Playwright    Pydoll
avg       487ms         391ms
p50       463ms         378ms
p95       612ms         498ms
p99       891ms         643ms
errors    0             2


Pydoll is about 20% faster here. That's actually meaningful at scale — at 100k pages a day, the ~96ms gap in average load time works out to roughly 2.7 hours of pure page-load time. But the 2 errors (a 0.4% error rate) appeared when the Wikipedia CDN did something weird with response timing, and Pydoll's event handling didn't recover from it cleanly. Playwright handled the same hiccup silently.


Scenario 2: JS-heavy SPA navigation


This is where things got interesting. I ran a local React app (create-react-app, nothing fancy, just multiple route navigations with data fetching) to eliminate network variance.


Playwright approach:

async def playwright_spa_navigate():
    await page.goto("http://localhost:3000")
    # wait for actual content, not just DOM
    await page.wait_for_selector("[data-testid='dashboard-loaded']", timeout=5000)

    # navigate between routes
    await page.click("nav a[href='/analytics']")
    await page.wait_for_selector("[data-testid='chart-container']", timeout=5000)

    await page.click("nav a[href='/settings']")
    await page.wait_for_load_state("networkidle")

    return await page.title()


Pydoll approach — and this is where I started pulling my hair out:

async def pydoll_spa_navigate():
    await pydoll_tab.go_to("http://localhost:3000")

    # no built-in wait_for_selector equivalent — have to poll
    async def wait_for_element(selector: str, timeout_ms: int = 5000):
        start = time.perf_counter()
        while (time.perf_counter() - start) * 1000 < timeout_ms:
            result = await pydoll_tab.execute_command("Runtime.evaluate", {
                "expression": f"!!document.querySelector('{selector}')",
                "returnByValue": True
            })
            if result.get("result", {}).get("value"):
                return True
            await asyncio.sleep(0.05)  # 50ms polling interval
        raise TimeoutError(f"Element {selector} not found within {timeout_ms}ms")

    await wait_for_element("[data-testid='dashboard-loaded']")

    # clicking via CDP — more verbose but works
    # a bare DOMRect has no own enumerable properties, so returnByValue would come
    # back empty — .toJSON() converts it to a plain object first
    element = await pydoll_tab.execute_command("Runtime.evaluate", {
        "expression": "document.querySelector(\"nav a[href='/analytics']\").getBoundingClientRect().toJSON()",
        "returnByValue": True
    })
    rect = element["result"]["value"]
    await pydoll_tab.execute_command("Input.dispatchMouseEvent", {
        "type": "mousePressed",
        "x": rect["x"] + rect["width"] / 2,
        "y": rect["y"] + rect["height"] / 2,
        "button": "left",
        "clickCount": 1
    })
    await pydoll_tab.execute_command("Input.dispatchMouseEvent", {
        "type": "mouseReleased",
        "x": rect["x"] + rect["width"] / 2,
        "y": rect["y"] + rect["height"] / 2,
        "button": "left",
        "clickCount": 1
    })

    await wait_for_element("[data-testid='chart-container']")
    # ... and so on for settings nav


Results — SPA navigation:

Metric    Playwright    Pydoll
avg       234ms         318ms
p50       221ms         298ms
p95       412ms         587ms
p99       634ms         1102ms
errors    1             17


Yeah. Pydoll is slower here, and the error rate is 3.4% vs Playwright's 0.2%. The manual polling adds latency, and the CDP click implementation misses edge cases around elements that move during render. Playwright's wait_for_selector is genuinely well-implemented — it hooks into mutation observers instead of polling, which is faster and more reliable.


This blew my mind when I realized it: Pydoll's speed advantage basically evaporates once you need to wait for dynamic content.
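
If you do stay on Pydoll for dynamic pages, the polling helper isn't the only option: CDP's Runtime.evaluate can await a Promise, so the waiting can happen inside the page with a MutationObserver instead of a 50ms Python loop. A sketch below — wait_for_element_observer is my name for it, it assumes execute_command forwards the awaitPromise flag to CDP untouched, and I haven't benchmarked it as rigorously as the numbers above.

import json

async def wait_for_element_observer(selector: str, timeout_ms: int = 5000):
    # let the page resolve the wait via MutationObserver instead of polling from Python
    expression = f"""
        new Promise((resolve, reject) => {{
            if (document.querySelector({json.dumps(selector)})) return resolve(true);
            const observer = new MutationObserver(() => {{
                if (document.querySelector({json.dumps(selector)})) {{
                    observer.disconnect();
                    resolve(true);
                }}
            }});
            observer.observe(document.documentElement, {{ childList: true, subtree: true }});
            setTimeout(() => {{ observer.disconnect(); reject(new Error('timeout')); }}, {timeout_ms});
        }})
    """
    result = await pydoll_tab.execute_command("Runtime.evaluate", {
        "expression": expression,
        "awaitPromise": True,
        "returnByValue": True
    })
    # same access pattern as the polling helper: a resolved promise returns True
    if result.get("result", {}).get("value") is not True:
        raise TimeoutError(f"Element {selector} not found within {timeout_ms}ms")
    return True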


Scenario 3: Form submission with CSRF


Both tools handled this fine. Playwright was slightly more ergonomic:

# playwright — clean and readable
async def playwright_form_submit():
    await page.goto("http://localhost:8000/login")
    await page.fill("input[name='username']", "testuser")
    await page.fill("input[name='password']", "testpass123")
    await page.click("button[type='submit']")
    await page.wait_for_url("**/dashboard")
    return page.url  # url is a property in the async API, not a coroutine


# pydoll — more verbose, roughly same performance
async def pydoll_form_submit():
    await pydoll_tab.go_to("http://localhost:8000/login")

    # in 2.21.x, pydoll has a higher-level find() + type() API now
    # but for raw CDP comparison, doing it the manual way
    await pydoll_tab.execute_command("Runtime.evaluate", {
        "expression": "document.querySelector(\"input[name='username']\").focus()"
    })
    for char in "testuser":
        await pydoll_tab.execute_command("Input.dispatchKeyEvent", {
            "type": "char", "text": char
        })

    # ... same for password field

    await pydoll_tab.execute_command("Runtime.evaluate", {
        "expression": "document.querySelector(\"button[type='submit']\").click()"
    })
    # wait for navigation via Page.loadEventFired
    await asyncio.wait_for(
        pydoll_tab.wait_for_event("Page.loadEventFired"),
        timeout=10.0
    )


Form submission: roughly equal, Playwright slightly faster (~8%) due to built-in input handling optimizations.


The unexpected finding: stealth performance


Okay, this is the part nobody talks about. I ran both tools against a site with Cloudflare protection (not naming it, don't want to get anyone in trouble) and the results were wildly different.


Playwright got flagged immediately with default settings. The Playwright stealth plugin helped but added ~140ms per page load.


Pydoll, talking directly to CDP, doesn't expose the same browser automation fingerprints by default. Out of 200 test requests, Pydoll got flagged twice. Playwright + stealth got flagged 11 times.


For scraping use cases where bot detection matters, Pydoll has a real structural advantage that no benchmark on static pages will show you.
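
If you want to see one slice of why this happens, navigator.webdriver is the most basic automation fingerprint a page can read, and it's easy to compare what each tool reports. A quick check using the page/tab handles from the setup code earlier — it's one signal among the many Cloudflare actually looks at, so treat it as a sanity check, not the whole story.

async def check_webdriver_flag():
    # Playwright: evaluate in the page via its high-level API
    wd_playwright = await page.evaluate("navigator.webdriver")

    # Pydoll: same check through raw CDP
    wd_pydoll = await pydoll_tab.execute_command("Runtime.evaluate", {
        "expression": "navigator.webdriver",
        "returnByValue": True
    })

    print("playwright navigator.webdriver:", wd_playwright)
    print("pydoll navigator.webdriver:", wd_pydoll["result"]["value"])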


What I actually use now


After this experiment, I ended up with a hybrid approach:

# use pydoll for high-volume static content scraping
# use playwright for complex interactive flows

from enum import Enum

class ScraperBackend(Enum):
    PLAYWRIGHT = "playwright"
    PYDOLL = "pydoll"

def choose_backend(
    has_dynamic_content: bool,
    requires_interaction: bool,
    volume_per_hour: int,
    stealth_required: bool
) -> ScraperBackend:
    """
    Rough heuristic based on my benchmarks.
    Not gospel — profile your own use case.
    """
    if requires_interaction and has_dynamic_content:
        # playwright's selector waiting is genuinely better
        return ScraperBackend.PLAYWRIGHT

    if stealth_required:
        # pydoll wins here consistently
        return ScraperBackend.PYDOLL

    if volume_per_hour > 10_000 and not has_dynamic_content:
        # the 20% speed advantage compounds significantly
        return ScraperBackend.PYDOLL

    # default to playwright — better DX, better error messages
    return ScraperBackend.PLAYWRIGHT
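
Calling it looks like this — the numbers are just an illustrative workload, not anything from the benchmarks:

backend = choose_backend(
    has_dynamic_content=False,
    requires_interaction=False,
    volume_per_hour=25_000,
    stealth_required=False,
)
print(backend)  # ScraperBackend.PYDOLL — high-volume static content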


IMO the "just use Playwright" advice is good for 80% of cases. But if you're doing high-volume static scraping or need to fly under the bot-detection radar, Pydoll is worth the rougher developer experience.


Edge cases from the trenches


A few gotchas I hit that I haven't seen documented anywhere:


Pydoll + asyncio.gather: Running concurrent tabs in Pydoll with asyncio.gather caused random CDP connection drops above ~10 concurrent tabs. Playwright handles this cleanly. If you need high concurrency, Playwright wins.
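
The workaround I use when I can't switch backends is to cap in-flight tabs with a semaphore so gather() never exceeds the point where connections started dropping. A sketch — the cap of 8 is just a number safely below where I saw failures, and scrape_with_pydoll is a placeholder for whatever your real per-URL coroutine does:

import asyncio

# cap in-flight tabs below the ~10-tab point where the CDP connection dropped
_tab_limit = asyncio.Semaphore(8)

async def scrape_with_pydoll(url: str) -> str:
    # placeholder: open a tab, navigate, extract, close — your real coroutine goes here
    await asyncio.sleep(0.1)
    return url

async def bounded_scrape(url: str) -> str:
    async with _tab_limit:
        return await scrape_with_pydoll(url)

async def scrape_all(urls: list[str]) -> list[str]:
    return await asyncio.gather(*(bounded_scrape(u) for u in urls))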


Pydoll memory leaks: After ~2,000 iterations without restarting the browser, I saw memory creep from 180MB to over 1GB. Adding a browser restart every 1,000 iterations fixed it. Haven't had this issue with Playwright.
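
The fix is unglamorous: count iterations and bounce the browser, reusing setup_pydoll from earlier. A sketch — the stop() teardown call is an assumption about the method name, so check what your pydoll version actually exposes:

RESTART_EVERY = 1_000  # iterations between browser restarts, per the memory numbers above

async def run_with_restarts(urls: list[str]) -> list[str]:
    results = []
    for i, url in enumerate(urls):
        if i % RESTART_EVERY == 0:
            if pydoll_browser is not None:
                # assumption: stop() is the teardown method — verify against your pydoll version
                await pydoll_browser.stop()
            await setup_pydoll()  # re-creates the pydoll_browser / pydoll_tab globals
        await pydoll_tab.go_to(url)
        result = await pydoll_tab.execute_command("Runtime.evaluate", {
            "expression": "document.body.innerText",
            "returnByValue": True
        })
        results.append(result["result"]["value"])
    return results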


Playwright p99 variance: That 891ms p99 on static pages is real. Something in Playwright's internal scheduling occasionally stalls. It's not a deal-breaker but worth knowing if you have strict SLAs.


Summary


For most people: use Playwright. The API is better, the documentation is better, and the p99 stability is better for complex interactions.


If you're running >10k page loads per day on static content, or you need to avoid bot detection, benchmark Pydoll for your specific use case. The 20% speed advantage is real and the stealth characteristics are genuinely different.


The codebase I started this whole experiment on? Still running Playwright. But now I actually know why.

