API Reference
Canonical tool, resource, and prompt reference for MCP-Crawl4AI
MCP-Crawl4AI exposes a canonical tool surface for web scraping and crawling, with envelope-based responses for crawl operations.
The server exposes 4 tools, 2 resources, and 3 prompts.
Connecting to the Server
| Transport | Command |
|---|---|
| stdio (default) | mcp-crawl4ai |
| HTTP | mcp-crawl4ai --transport http --host 127.0.0.1 --port 8000 |
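A minimal sketch of what a client writes to the server over the default stdio transport, assuming the standard MCP conventions: every message is a JSON-RPC 2.0 object sent as one line of JSON. The protocol revision string and the `example-client` name are assumptions for illustration. In practice an MCP client library performs this handshake for you, and the HTTP transport is not covered here.

```python
import json
from typing import Any, Dict, Optional

# Assumed MCP protocol revision; client and server agree on the actual
# revision during initialization.
PROTOCOL_VERSION = "2025-03-26"


def jsonrpc_request(request_id: int, method: str,
                    params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object, the message format MCP uses."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


def frame_stdio(message: Dict[str, Any]) -> bytes:
    """Encode one message for the stdio transport: compact JSON plus one newline.

    json.dumps escapes newlines inside strings, so the payload never contains
    a raw newline that would break line-delimited framing.
    """
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


def initialize_request(request_id: int = 0) -> Dict[str, Any]:
    """First request a client sends after spawning `mcp-crawl4ai`."""
    return jsonrpc_request(request_id, "initialize", {
        "protocolVersion": PROTOCOL_VERSION,
        "capabilities": {},
        # Hypothetical client name/version for illustration.
        "clientInfo": {"name": "example-client", "version": "0.1.0"},
    })


# After the server answers, the client sends a notification (no "id") and
# can then list the server's tools.
initialized = {"jsonrpc": "2.0", "method": "notifications/initialized"}
list_tools = jsonrpc_request(1, "tools/list")
```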
Tools
scrape
Scrape one URL or a bounded list of URLs (max 20) using canonical option groups.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| targets | string \| string[] | Yes | One URL string or an array of 1-20 URLs. |
| options | object \| null | No | Canonical option groups: extraction, transformation, conversion, runtime, diagnostics, session, render. |
Example Request
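A hedged sketch of a scrape call, expressed as the MCP `tools/call` request a client would send. The helper `scrape_call` is hypothetical, the 20-URL bound comes from the description above, and the option values (`"article"`, `"markdown"`, `30000`, `2`) are assumed examples rather than documented defaults or accepted values.

```python
import json
from typing import Any, Dict, List, Optional, Union

MAX_SCRAPE_TARGETS = 20  # documented upper bound for scrape targets


def scrape_call(request_id: int, targets: Union[str, List[str]],
                options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build an MCP tools/call request for the scrape tool (hypothetical helper)."""
    if isinstance(targets, list):
        if not all(isinstance(url, str) for url in targets):
            raise TypeError("every target must be a URL string")
        if not 1 <= len(targets) <= MAX_SCRAPE_TARGETS:
            raise ValueError(
                f"scrape accepts 1-{MAX_SCRAPE_TARGETS} URLs, got {len(targets)}")
    elif not isinstance(targets, str):
        raise TypeError("targets must be a URL string or a list of URL strings")
    arguments: Dict[str, Any] = {"targets": targets}
    if options is not None:
        arguments["options"] = options
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "scrape", "arguments": arguments},
    }


request = scrape_call(
    1,
    "https://example.com",
    {
        "extraction": {"css_selector": "article"},
        "conversion": {"output_format": "markdown"},  # assumed value
        "runtime": {"timeout_ms": 30000, "max_retries": 2},
    },
)
print(json.dumps(request, indent=2))
```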
Response Shape
crawl
Crawl with canonical traversal controls.
Use `options.traversal.mode="list"` for bounded URL traversal, or `options.traversal.mode="deep"` for recursive BFS/DFS traversal from one seed URL.
Parameters
| Name | Type | Required | Description |
|---|---|---|---|
| targets | string \| string[] | Yes | One URL or a list of URLs (deep mode requires exactly one). |
| options | object \| null | No | Same option groups as scrape, plus traversal. |
Traversal Option Highlights
| Field | Description |
|---|---|
| mode | list or deep |
| max_depth / max_pages | Deep traversal bounds |
| crawl_mode | bfs or dfs (deep mode) |
| include_external / url_filters | Deep frontier controls |
| max_concurrency, rate_limit_*, dispatcher | List-mode dispatcher controls |
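The traversal rules above can be checked client-side before sending a `crawl` call. This is a sketch only: `crawl_arguments` is a hypothetical helper, the numeric bounds and URLs are illustrative, and it validates just the documented constraints (mode is `list` or `deep`, deep mode takes exactly one seed URL, `crawl_mode` is `bfs` or `dfs`). The returned dict is what would go in `params.arguments` of a `tools/call` request named `crawl`.

```python
from typing import Any, Dict, List, Union


def crawl_arguments(targets: Union[str, List[str]], traversal: Dict[str, Any],
                    **option_groups: Dict[str, Any]) -> Dict[str, Any]:
    """Assemble crawl tool arguments, checking only the documented traversal rules."""
    mode = traversal.get("mode")
    if mode not in ("list", "deep"):
        raise ValueError('traversal.mode must be "list" or "deep"')
    urls = [targets] if isinstance(targets, str) else list(targets)
    if mode == "deep":
        if len(urls) != 1:
            raise ValueError("deep mode requires exactly one seed URL")
        crawl_mode = traversal.get("crawl_mode")
        # An omitted crawl_mode is not checked here; the server decides.
        if crawl_mode is not None and crawl_mode not in ("bfs", "dfs"):
            raise ValueError('crawl_mode must be "bfs" or "dfs"')
    options = dict(option_groups)  # e.g. runtime={...}, extraction={...}
    options["traversal"] = traversal
    return {"targets": targets, "options": options}


# Recursive BFS from one seed, bounded by depth and page count.
deep = crawl_arguments(
    "https://example.com/docs",
    {"mode": "deep", "crawl_mode": "bfs", "max_depth": 2, "max_pages": 50},
)

# Bounded list traversal with a dispatcher concurrency limit.
listing = crawl_arguments(
    ["https://example.com/a", "https://example.com/b"],
    {"mode": "list", "max_concurrency": 4},
    runtime={"bypass_cache": True},
)
```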
close_session
Close a stateful session and purge associated runtime state/artifacts.
| Name | Type | Required | Description |
|---|---|---|---|
| session_id | string | Yes | Session identifier created via scrape/crawl session options. |
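A sketch of a session lifecycle, assuming the client picks the `session_id` itself through the `session` option group and reuses it across calls before closing it. The identifier, URLs, TTL value, and the `tool_call` helper are hypothetical.

```python
import itertools
from typing import Any, Dict

_request_ids = itertools.count(1)


def tool_call(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap tool arguments in an MCP tools/call request with a fresh id."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


SESSION_ID = "docs-session"  # hypothetical id chosen by the client

# 1. Start the session through the session option group.
first = tool_call("scrape", {
    "targets": "https://example.com/account",
    "options": {"session": {"session_id": SESSION_ID, "session_ttl_seconds": 600}},
})
# 2. Later calls pass the same session_id to reuse the stateful session.
second = tool_call("scrape", {
    "targets": "https://example.com/account/orders",
    "options": {"session": {"session_id": SESSION_ID}},
})
# 3. Close the session so the server can purge its runtime state and artifacts.
close = tool_call("close_session", {"session_id": SESSION_ID})
```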
get_artifact
Retrieve metadata, and optionally bounded content, for artifacts captured by scrape or crawl.
| Name | Type | Required | Description |
|---|---|---|---|
| session_id | string | Yes | Session identifier that owns the artifact. |
| artifact_id | string | Yes | Artifact identifier returned in run metadata. |
| include_content | boolean | No | Include bounded artifact content when true. |
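A sketch of the two ways to call `get_artifact`, assuming that omitting `include_content` returns metadata only. `get_artifact_call` is hypothetical, and `"art-123"` is a placeholder for an `artifact_id` taken from a previous run's metadata (where that field sits in the response is not assumed here).

```python
from typing import Any, Dict


def get_artifact_call(request_id: int, session_id: str, artifact_id: str,
                      include_content: bool = False) -> Dict[str, Any]:
    """Build a tools/call request for get_artifact (hypothetical helper)."""
    arguments: Dict[str, Any] = {"session_id": session_id, "artifact_id": artifact_id}
    if include_content:
        # Content is opt-in and bounded by the server; the flag is sent only
        # when content is wanted.
        arguments["include_content"] = True
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "get_artifact", "arguments": arguments},
    }


# "art-123" stands in for an artifact_id read from a previous run's metadata.
meta_only = get_artifact_call(7, "docs-session", "art-123")
with_content = get_artifact_call(8, "docs-session", "art-123", include_content=True)
```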
Canonical Option Groups
| Group | Fields |
|---|---|
| extraction | css_selector, word_count_threshold, schema, extraction_mode |
| transformation | js_code |
| conversion | output_format, capture_artifacts |
| runtime | wait_for, bypass_cache, timeout_ms, max_retries, retry_backoff_ms, max_content_chars |
| diagnostics | include_diagnostics |
| session | session_id, session_ttl_seconds, session_max_uses, artifact retention fields |
| render | viewport_width, viewport_height |
| traversal | mode, deep controls, dispatcher controls |
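Because the table lists most field names, a client can catch typos before a call reaches the server. The sketch below is an assumption-laden lint, not the server's validation: `lint_options` is hypothetical, the `session` and `traversal` groups are left unchecked because their rows are not exhaustive, and value types and ranges are not checked.

```python
from typing import Any, Dict, FrozenSet, List, Optional

# Field names copied from the option-group table. None marks groups whose
# row is not a complete list (session retention fields, traversal controls).
OPTION_GROUPS: Dict[str, Optional[FrozenSet[str]]] = {
    "extraction": frozenset({"css_selector", "word_count_threshold", "schema", "extraction_mode"}),
    "transformation": frozenset({"js_code"}),
    "conversion": frozenset({"output_format", "capture_artifacts"}),
    "runtime": frozenset({"wait_for", "bypass_cache", "timeout_ms", "max_retries",
                          "retry_backoff_ms", "max_content_chars"}),
    "diagnostics": frozenset({"include_diagnostics"}),
    "session": None,
    "render": frozenset({"viewport_width", "viewport_height"}),
    "traversal": None,
}


def lint_options(options: Dict[str, Any], tool: str = "scrape") -> List[str]:
    """Return sorted problem messages; an empty list means nothing was flagged."""
    problems: List[str] = []
    for group, fields in options.items():
        if group not in OPTION_GROUPS:
            problems.append(f"unknown option group: {group}")
        elif group == "traversal" and tool != "crawl":
            problems.append("traversal is only documented for crawl")
        elif not isinstance(fields, dict):
            problems.append(f"{group} should be an object")
        elif OPTION_GROUPS[group] is not None:
            for name in fields:
                if name not in OPTION_GROUPS[group]:
                    problems.append(f"{group}.{name} is not a documented field")
    return sorted(problems)
```

A flagged name is only a hint to recheck the table; the server remains the authority on what it accepts.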
Resources
| URI | MIME Type | Description |
|---|---|---|
| config://server | application/json | Server config and canonical tool inventory |
| crawl4ai://version | application/json | Server/dependency versions |
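Resources are read with the standard MCP `resources/read` request. A minimal sketch, assuming the server returns each resource as a text item whose body is JSON; `read_resource` and `parse_json_resource` are hypothetical helpers, and no particular payload fields are assumed for either resource.

```python
import json
from typing import Any, Dict


def read_resource(request_id: int, uri: str) -> Dict[str, Any]:
    """Build an MCP resources/read request for a URI such as config://server."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "resources/read",
            "params": {"uri": uri}}


def parse_json_resource(result: Dict[str, Any]) -> Any:
    """Decode the first JSON text item in a resources/read result.

    A missing mimeType is treated as JSON, since both resources above are
    documented as application/json.
    """
    for item in result.get("contents", []):
        if item.get("mimeType", "application/json") == "application/json" and "text" in item:
            return json.loads(item["text"])
    raise ValueError("no application/json text content in result")


config_request = read_resource(2, "config://server")
version_request = read_resource(3, "crawl4ai://version")
```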
Prompts
| Prompt | Parameters | Description |
|---|---|---|
| summarize_page | url, focus | Suggests a scrape-first summarization workflow |
| build_extraction_schema | url, data_type | Guides schema authoring for scrape extraction options |
| compare_pages | url1, url2 | Suggests comparing two pages via scrape |
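Prompts are fetched with the standard MCP `prompts/get` request, whose arguments are string values. In this sketch `get_prompt` is a hypothetical helper that rejects parameter names not listed in the table above; it does not assume which parameters are required.

```python
from typing import Any, Dict, Tuple

# Parameter names copied from the prompt table.
PROMPT_PARAMETERS: Dict[str, Tuple[str, ...]] = {
    "summarize_page": ("url", "focus"),
    "build_extraction_schema": ("url", "data_type"),
    "compare_pages": ("url1", "url2"),
}


def get_prompt(request_id: int, name: str, **arguments: str) -> Dict[str, Any]:
    """Build an MCP prompts/get request after checking parameter names."""
    expected = PROMPT_PARAMETERS.get(name)
    if expected is None:
        raise ValueError(f"unknown prompt: {name}")
    unexpected = sorted(set(arguments) - set(expected))
    if unexpected:
        raise ValueError(f"{name} does not take: {', '.join(unexpected)}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "prompts/get",
        "params": {"name": name, "arguments": dict(arguments)},
    }


request = get_prompt(4, "compare_pages",
                     url1="https://example.com/a", url2="https://example.com/b")
```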