MCP-Crawl4AI

API Reference

Canonical tool, resource, and prompt reference for MCP-Crawl4AI

MCP-Crawl4AI exposes a canonical web-surface with envelope-based responses for crawl operations.

The server exposes 4 tools, 2 resources, and 3 prompts.

Connecting to the Server

| Transport | Command |
| --- | --- |
| stdio (default) | mcp-crawl4ai |
| HTTP | mcp-crawl4ai --transport http --host 127.0.0.1 --port 8000 |
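
As a minimal sketch, the launch commands listed above can be assembled programmatically, for instance before passing them to `subprocess`. The helper name `server_command` is hypothetical, and only the flags shown above are used.

```python
from typing import List


def server_command(transport: str = "stdio",
                   host: str = "127.0.0.1",
                   port: int = 8000) -> List[str]:
    """Build the argv list for launching mcp-crawl4ai (hypothetical helper)."""
    if transport == "stdio":
        # stdio is the default transport, so no flags are needed.
        return ["mcp-crawl4ai"]
    if transport == "http":
        return ["mcp-crawl4ai", "--transport", "http",
                "--host", host, "--port", str(port)]
    # Only the two transports documented on this page are handled here.
    raise ValueError(f"unsupported transport: {transport!r}")


print(" ".join(server_command("http")))
# prints: mcp-crawl4ai --transport http --host 127.0.0.1 --port 8000
```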

Tools

scrape

Scrape one URL or a bounded list of URLs (max 20) using canonical option groups.

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| targets | string \| string[] | Yes | One URL string or an array of 1-20 URLs. |
| options | object \| null | No | Canonical option groups: extraction, transformation, conversion, runtime, diagnostics, session, render. |

Example Request

{
  "name": "scrape",
  "parameters": {
    "targets": "https://example.com",
    "options": {
      "conversion": { "output_format": "markdown" },
      "runtime": { "bypass_cache": true }
    }
  }
}
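
The request above can also be built in code. The following sketch assumes a hypothetical helper, `build_scrape_request`, that mirrors the Example Request shape and enforces the documented 1-20 URL bound on the client side before anything is sent.

```python
from typing import Any, Dict, List, Optional, Union

MAX_TARGETS = 20  # upper bound stated in the scrape description


def build_scrape_request(targets: Union[str, List[str]],
                         options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return a scrape request shaped like the Example Request (hypothetical helper)."""
    count = 1 if isinstance(targets, str) else len(targets)
    if not 1 <= count <= MAX_TARGETS:
        raise ValueError(f"targets must contain 1-{MAX_TARGETS} URLs, got {count}")
    parameters: Dict[str, Any] = {"targets": targets}
    if options is not None:
        # options is optional, so it is omitted entirely when not supplied.
        parameters["options"] = options
    return {"name": "scrape", "parameters": parameters}


request = build_scrape_request(
    "https://example.com",
    {"conversion": {"output_format": "markdown"},
     "runtime": {"bypass_cache": True}},
)
```

Checking the bound locally is a design convenience only; the server remains the authority on what it accepts.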

Response Shape

{
  "schema_version": "scrape-crawl.v1",
  "tool": "scrape",
  "ok": true,
  "data": {
    "target": "https://example.com",
    "url": "https://example.com",
    "ok": true,
    "data": "# Example Domain ..."
  },
  "items": null,
  "meta": {
    "target_count": 1,
    "option_groups": ["conversion", "runtime"],
    "output_format": "markdown",
    "diagnostics": false,
    "session_id": null,
    "extraction_mode": null
  },
  "warnings": [],
  "error": null
}
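
A client will usually want to check the envelope before using the payload. The sketch below assumes that single-target calls fill `data`, that multi-target calls fill `items`, and that `error` is populated when the top-level `ok` is false. Only the single-target case is shown on this page, so the other two are assumptions. The names `unwrap` and `CrawlError` are hypothetical.

```python
from typing import Any, Dict, List


class CrawlError(RuntimeError):
    """Raised when an envelope reports ok == false (hypothetical name)."""


def unwrap(envelope: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the per-target results of a response envelope as a list."""
    if not envelope.get("ok"):
        raise CrawlError(str(envelope.get("error")))
    # Assumption: multi-target calls populate "items" instead of "data".
    if envelope.get("items") is not None:
        return list(envelope["items"])
    data = envelope.get("data")
    return [] if data is None else [data]


envelope = {
    "schema_version": "scrape-crawl.v1", "tool": "scrape", "ok": True,
    "data": {"target": "https://example.com", "url": "https://example.com",
             "ok": True, "data": "# Example Domain ..."},
    "items": None, "meta": {"target_count": 1}, "warnings": [], "error": None,
}
for result in unwrap(envelope):
    print(result["url"], result["ok"])
```

Note that each per-target result carries its own `ok` flag, so callers may still need to inspect individual results after the top-level check passes.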

crawl

Crawl with canonical traversal controls.

  • options.traversal.mode="list" for bounded URL traversal.
  • options.traversal.mode="deep" for recursive BFS/DFS traversal from one seed URL.

Parameters

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| targets | string \| string[] | Yes | One URL or list of URLs (deep mode requires exactly one). |
| options | object \| null | No | Same option groups as scrape, plus traversal. |

Traversal Option Highlights

| Field | Description |
| --- | --- |
| mode | list or deep |
| max_depth / max_pages | Deep traversal bounds |
| crawl_mode | bfs or dfs for deep mode |
| include_external / url_filters | Deep frontier controls |
| max_concurrency, rate_limit_*, dispatcher | List-mode dispatcher controls |
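
The traversal rules above lend themselves to a small client-side check. This sketch uses a hypothetical helper, `build_crawl_request`, and assumes the same request shape as the scrape example. The `max_depth` and `max_pages` values are illustrative, not documented defaults.

```python
from typing import Any, Dict, List, Union


def build_crawl_request(targets: Union[str, List[str]],
                        traversal: Dict[str, Any]) -> Dict[str, Any]:
    """Return a crawl request after basic traversal checks (hypothetical helper)."""
    mode = traversal.get("mode")
    # Assumption: this sketch insists on an explicit mode; the server may default it.
    if mode not in ("list", "deep"):
        raise ValueError("traversal.mode must be 'list' or 'deep'")
    urls = [targets] if isinstance(targets, str) else list(targets)
    if mode == "deep" and len(urls) != 1:
        raise ValueError("deep mode requires exactly one seed URL")
    crawl_mode = traversal.get("crawl_mode")
    if crawl_mode is not None and crawl_mode not in ("bfs", "dfs"):
        raise ValueError("traversal.crawl_mode must be 'bfs' or 'dfs'")
    return {
        "name": "crawl",
        "parameters": {"targets": targets,
                       "options": {"traversal": dict(traversal)}},
    }


deep = build_crawl_request(
    "https://example.com",
    {"mode": "deep", "crawl_mode": "bfs", "max_depth": 2, "max_pages": 25},
)
```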

close_session

Close a stateful session and purge associated runtime state/artifacts.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| session_id | string | Yes | Session identifier created via scrape/crawl session options. |

get_artifact

Retrieve metadata/content for artifacts captured by scrape or crawl.

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| session_id | string | Yes | Session identifier that owns the artifact. |
| artifact_id | string | Yes | Artifact identifier returned in run metadata. |
| include_content | boolean | No | Include bounded artifact content when true. |
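
Because close_session purges a session's artifacts, it is safest to retrieve any needed artifacts before closing the session. The sketch below builds the two requests in that order, assuming the same request shape as the scrape example. The helper `tool_request` is hypothetical, and both identifiers are placeholders, not real values.

```python
import json
from typing import Any, Dict


def tool_request(name: str, **parameters: Any) -> Dict[str, Any]:
    """Wrap a tool name and its parameters in a request dict (hypothetical helper)."""
    return {"name": name, "parameters": parameters}


session_id = "demo-session"                      # placeholder session identifier
artifact_id = "artifact-id-from-run-metadata"    # placeholder artifact identifier

# Fetch the artifact first, then close the session that owns it.
fetch = tool_request("get_artifact", session_id=session_id,
                     artifact_id=artifact_id, include_content=True)
close = tool_request("close_session", session_id=session_id)

print(json.dumps([fetch, close], indent=2))
```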

Canonical Option Groups

| Group | Fields |
| --- | --- |
| extraction | css_selector, word_count_threshold, schema, extraction_mode |
| transformation | js_code |
| conversion | output_format, capture_artifacts |
| runtime | wait_for, bypass_cache, timeout_ms, max_retries, retry_backoff_ms, max_content_chars |
| diagnostics | include_diagnostics |
| session | session_id, session_ttl_seconds, session_max_uses, artifact retention fields |
| render | viewport_width, viewport_height |
| traversal | mode, deep controls, dispatcher controls |
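
Since the groups are a fixed set, a client can catch misspelled or misplaced group names before sending a request. This sketch checks group names only, because the full field lists for session and traversal are not given here. The helper `unknown_groups` is hypothetical, and whether the server itself rejects unknown groups is not documented on this page.

```python
from typing import Any, Dict, List, Optional

# Group names taken from the scrape parameter description.
SCRAPE_GROUPS = frozenset({
    "extraction", "transformation", "conversion",
    "runtime", "diagnostics", "session", "render",
})
# crawl accepts the same groups as scrape, plus traversal.
CRAWL_GROUPS = SCRAPE_GROUPS | {"traversal"}


def unknown_groups(tool: str, options: Optional[Dict[str, Any]]) -> List[str]:
    """Return option group names that the given tool does not document."""
    allowed = CRAWL_GROUPS if tool == "crawl" else SCRAPE_GROUPS
    return sorted(set(options or {}) - allowed)


print(unknown_groups("scrape", {"runtime": {}, "traversal": {}}))
# prints: ['traversal']
```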

Resources

| URI | MIME Type | Description |
| --- | --- | --- |
| config://server | application/json | Server config and canonical tool inventory |
| crawl4ai://version | application/json | Server/dependency versions |
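
Clients read these resources by URI. As a rough sketch, assuming the standard MCP JSON-RPC method `resources/read`, the wire message for each resource might be built as follows. The helper name `read_resource_message` is hypothetical, and in practice an MCP client library would normally construct these messages.

```python
import json
from itertools import count

_request_ids = count(1)  # simple incrementing JSON-RPC request ids


def read_resource_message(uri: str) -> str:
    """Serialize a JSON-RPC resources/read request for the given URI."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "resources/read",
        "params": {"uri": uri},
    })


for uri in ("config://server", "crawl4ai://version"):
    print(read_resource_message(uri))
```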

Prompts

| Prompt | Parameters | Description |
| --- | --- | --- |
| summarize_page | url, focus | Suggests a scrape-first summarization workflow |
| build_extraction_schema | url, data_type | Guides schema authoring for scrape extraction options |
| compare_pages | url1, url2 | Suggests comparing two pages via scrape |
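
The parameter names above can double as a lookup for catching misspelled arguments. This sketch assumes the standard MCP JSON-RPC method `prompts/get`. The helper `get_prompt_message` is hypothetical, and it does not enforce which parameters are required, because that is not stated here.

```python
from typing import Any, Dict

# Parameter names copied from the prompt list above.
PROMPT_PARAMS = {
    "summarize_page": ("url", "focus"),
    "build_extraction_schema": ("url", "data_type"),
    "compare_pages": ("url1", "url2"),
}


def get_prompt_message(name: str, arguments: Dict[str, str],
                       request_id: int = 1) -> Dict[str, Any]:
    """Build a prompts/get message, rejecting undocumented names."""
    if name not in PROMPT_PARAMS:
        raise KeyError(f"unknown prompt: {name}")
    unexpected = sorted(set(arguments) - set(PROMPT_PARAMS[name]))
    if unexpected:
        raise ValueError(f"unexpected arguments for {name}: {unexpected}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "prompts/get",
        "params": {"name": name, "arguments": dict(arguments)},
    }


message = get_prompt_message(
    "compare_pages",
    {"url1": "https://example.com", "url2": "https://example.org"},
)
```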
