Firecrawl sends webhook events at each stage of a job's lifecycle, so you can track progress, receive results, and handle failures in real time without polling.
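| Event | Trigger |
|---|---|
| crawl.started | A crawl job starts processing |
| crawl.page | A page is scraped during a crawl |
| crawl.completed | A crawl job finishes and all pages have been processed |
| batch_scrape.started | A batch scrape job starts processing |
| batch_scrape.page | A URL is scraped during a batch scrape |
| batch_scrape.completed | All URLs in the batch have been processed |
| extract.started | An extract job starts processing |
| extract.completed | An extraction completes successfully |
| extract.failed | An extraction fails |
| agent.started | An agent job starts processing |
| agent.action | The agent executes a tool (scrape, search, and so on) |
| agent.completed | The agent completes successfully |
| agent.failed | The agent encounters an error |
| agent.cancelled | The agent job is cancelled by the user |
| monitor.page | A monitored page finishes scraping |
| monitor.check.completed | A monitor check finishes and page-level changes are available |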
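Because every event type in the table follows a resource.stage naming pattern, a receiver can route on the final segment. The following is a minimal sketch, not part of Firecrawl's SDK: the helper names `split_event_type` and `is_terminal` are hypothetical, and treating completed, failed, and cancelled as "terminal" stages is an assumption drawn from the table above.

```python
from typing import Tuple

# Assumption: these stages mark the end of a job or check, per the event table.
TERMINAL_STAGES = {"completed", "failed", "cancelled"}


def split_event_type(event_type: str) -> Tuple[str, str]:
    """Split an event type into (resource, stage) at the last dot.

    Splitting at the last dot keeps "monitor.check.completed" intact as
    ("monitor.check", "completed").
    """
    resource, _, stage = event_type.rpartition(".")
    return resource, stage


def is_terminal(event_type: str) -> bool:
    """Return True when the event marks the end of a job or check."""
    return split_event_type(event_type)[1] in TERMINAL_STAGES
```

A receiver could use `is_terminal` to decide when to stop tracking a job ID and release any per-job state.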
All webhook events share the following structure:
{
"success": true,
"type": "crawl.page",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [...],
"metadata": {}
}
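| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the operation succeeded |
| type | string | Event type (for example, crawl.page) |
| id | string | Job ID |
| data | array or object | Event-specific data (see the examples below) |
| metadata | object | Custom metadata from your webhook configuration |
| error | string | Error message (present when success is false) |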
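One way to work with this envelope is to parse the request body into a small typed structure before dispatching. This is a sketch under stated assumptions: the `WebhookEvent` class and `parse_event` function are hypothetical names, and any extra top-level keys in the payload are simply ignored.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class WebhookEvent:
    """Typed view of the fields listed in the table above."""
    success: bool
    type: str
    id: str
    data: Any  # array or object, depending on the event
    metadata: Dict[str, Any] = field(default_factory=dict)
    error: Optional[str] = None  # expected only when success is false


def parse_event(raw_body: str) -> WebhookEvent:
    """Parse a raw JSON request body into a WebhookEvent."""
    payload = json.loads(raw_body)
    return WebhookEvent(
        success=bool(payload["success"]),
        type=payload["type"],
        id=payload["id"],
        data=payload.get("data", []),
        metadata=payload.get("metadata") or {},
        error=payload.get("error"),
    )
```

Reading `error` with `.get` keeps the parser tolerant of successful events, where the field is absent.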
The crawl.started event is sent when a crawl job starts processing.
{
"success": true,
"type": "crawl.started",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [],
"metadata": {}
}
The crawl.page event is sent for each page scraped during a crawl. The data array contains the page content and metadata.
{
"success": true,
"type": "crawl.page",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [
{
"markdown": "# Page content...",
"metadata": {
"title": "Page Title",
"description": "Page description",
"url": "https://example.com/page",
"statusCode": 200,
"contentType": "text/html",
"scrapeId": "550e8400-e29b-41d4-a716-446655440001",
"sourceURL": "https://example.com/page",
"proxyUsed": "basic",
"cacheState": "hit",
"cachedAt": "2025-09-03T21:11:25.636Z",
"creditsUsed": 1
}
}
],
"metadata": {}
}
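To illustrate how the page payload above might be consumed, the sketch below flattens each item in data into a small record. The function name `extract_pages` and the output keys are hypothetical; the input field names (markdown, metadata.sourceURL, metadata.url, metadata.statusCode) come from the example payload.

```python
from typing import Any, Dict, List


def extract_pages(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the data array of a page event into simple records."""
    pages = []
    for item in event.get("data", []):
        meta = item.get("metadata", {})
        pages.append({
            # Prefer the originally requested URL, falling back to url.
            "url": meta.get("sourceURL") or meta.get("url"),
            "status_code": meta.get("statusCode"),
            "markdown": item.get("markdown", ""),
        })
    return pages
```

The same shape appears in batch_scrape.page payloads, so a helper like this could plausibly serve both event types.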
The crawl.completed event is sent when a crawl job finishes and all pages have been processed.
{
"success": true,
"type": "crawl.completed",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [],
"metadata": {}
}
The batch_scrape.started event is sent when a batch scrape job starts processing.
{
"success": true,
"type": "batch_scrape.started",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [],
"metadata": {}
}
The batch_scrape.page event is sent for each URL scraped in the batch. The data array contains the page content and metadata.
{
"success": true,
"type": "batch_scrape.page",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [
{
"markdown": "# Page content...",
"metadata": {
"title": "Page Title",
"description": "Page description",
"url": "https://example.com",
"statusCode": 200,
"contentType": "text/html",
"scrapeId": "550e8400-e29b-41d4-a716-446655440001",
"sourceURL": "https://example.com",
"proxyUsed": "basic",
"cacheState": "miss",
"cachedAt": "2025-09-03T23:30:53.434Z",
"creditsUsed": 1
}
}
],
"metadata": {}
}
The batch_scrape.completed event is sent after all URLs in the batch have been processed.
{
"success": true,
"type": "batch_scrape.completed",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [],
"metadata": {}
}
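The monitor.page event is sent each time a monitored page finishes scraping. It is emitted by the scrape worker process, so it arrives before the full monitor check has finished aggregating its summary.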
{
"success": true,
"type": "monitor.page",
"id": "019df960-5f2a-75fb-a98b-bd2d32ca67d4",
"webhookId": "f1e2d3c4-0000-0000-0000-000000000000",
"data": [
{
"monitorId": "019df960-06e7-7383-9d89-82c0113dc31a",
"checkId": "019df960-5f2a-75fb-a98b-bd2d32ca67d4",
"url": "https://example.com/blog",
"status": "changed",
"previousScrapeId": "019df94f-82c3-7e41-81f0-00c72b2d9c52",
"currentScrapeId": "019df960-73ee-7ac2-97a9-fb0e442c21f1",
"error": null,
"isMeaningful": true,
"judgment": {
"meaningful": true,
"confidence": "high",
"reason": "The page headline changed to announce a new release cadence.",
"meaningfulChanges": [
{
"type": "changed",
"before": "Welcome to our weekly update.",
"after": "Welcome to our weekly update — now with daily releases!",
"reason": "The headline changed in a way that matches the monitor goal."
}
]
},
"diff": {
"text": "--- previous\n+++ current\n@@ -1,3 +1,3 @@\n # Latest posts\n-Welcome to our weekly update.\n+Welcome to our weekly update — now with daily releases!\n"
}
}
],
"metadata": {
"environment": "production"
}
}
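A receiver that only cares about meaningful changes could filter the monitor.page payload above as follows. This is a sketch, assuming the field names shown in the example (isMeaningful, judgment.meaningfulChanges, before, after, reason); the function name `meaningful_changes` is hypothetical.

```python
from typing import Any, Dict, List


def meaningful_changes(event: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect before/after pairs from pages flagged as meaningful."""
    results = []
    for page in event.get("data", []):
        if not page.get("isMeaningful"):
            continue  # skip pages whose change was not judged meaningful
        judgment = page.get("judgment") or {}
        for change in judgment.get("meaningfulChanges", []):
            results.append({
                "url": page.get("url"),
                "before": change.get("before"),
                "after": change.get("after"),
                "reason": change.get("reason"),
            })
    return results
```

Each returned record could then be forwarded to a notification channel or stored alongside the checkId.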
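The monitor.check.completed event is sent when a monitor check finishes. The data payload contains the check status and summary counts. Page-level results are delivered only through monitor.page events or returned by the monitor check API.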
{
"success": true,
"type": "monitor.check.completed",
"id": "019df960-5f2a-75fb-a98b-bd2d32ca67d4",
"webhookId": "f1e2d3c4-0001-0000-0000-000000000000",
"data": [
{
"monitorId": "019df960-06e7-7383-9d89-82c0113dc31a",
"checkId": "019df960-5f2a-75fb-a98b-bd2d32ca67d4",
"status": "completed",
"summary": {
"totalPages": 2,
"same": 1,
"changed": 1,
"new": 0,
"removed": 0,
"error": 0
}
}
],
"metadata": {
"environment": "production"
}
}
success is true when the check completes with no page errors. For partially completed or failed checks, success is false, and error may contain an error message.
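The success rule and summary counts described above can be folded into a single status record. This is only a sketch: the function name `summarize_check` and the output keys are hypothetical, and it assumes data is a one-element array shaped like the example payload.

```python
from typing import Any, Dict


def summarize_check(event: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce a monitor.check.completed event to a small status record."""
    items = event.get("data") or []
    summary = items[0].get("summary", {}) if items else {}
    return {
        # success is false for partially completed or failed checks.
        "ok": bool(event.get("success")),
        "pages_with_changes": (
            summary.get("changed", 0)
            + summary.get("new", 0)
            + summary.get("removed", 0)
        ),
        "pages_with_errors": summary.get("error", 0),
        "error": event.get("error"),  # may be absent even when ok is False
    }
```

Counting new and removed pages together with changed ones is a design choice for this sketch; a receiver may prefer to keep the three counts separate.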
The extract.started event is sent when an extract job starts processing.
{
"success": true,
"type": "extract.started",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [],
"metadata": {}
}
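The extract.completed event is sent after an extraction completes successfully. The data array contains the extracted data and usage information.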
{
"success": true,
"type": "extract.completed",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [
{
"success": true,
"data": { "siteName": "Example Site", "category": "Technology" },
"extractId": "550e8400-e29b-41d4-a716-446655440000",
"llmUsage": 0.0020118,
"totalUrlsScraped": 1,
"sources": {
"siteName": ["https://example.com"],
"category": ["https://example.com"]
}
}
],
"metadata": {}
}
The extract.failed event is sent when an extraction fails. The error field contains the reason for the failure.
{
"success": false,
"type": "extract.failed",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [],
"error": "Failed to extract data: timeout exceeded",
"metadata": {}
}
The agent.started event is sent when an agent job starts running.
{
"success": true,
"type": "agent.started",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [],
"metadata": {}
}
The agent.action event is sent after each tool call (scrape, search, and so on).
{
"success": true,
"type": "agent.action",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [
{
"creditsUsed": 5,
"action": "mcp__tools__scrape",
"input": {
"url": "https://example.com"
}
}
],
"metadata": {}
}
The creditsUsed value in action events is an estimate of the total credits used so far. The final, accurate credit count is available only in the completed, failed, or cancelled events.
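The distinction between estimated and final credits can be kept explicit in receiver code. The sketch below is one possible approach, with a hypothetical `CreditTracker` class; it assumes creditsUsed sits in the first element of data, as in the example payloads, and it overwrites rather than sums the estimate because the value is described as a running total.

```python
from typing import Any, Dict, Optional

FINAL_AGENT_EVENTS = {"agent.completed", "agent.failed", "agent.cancelled"}


class CreditTracker:
    """Track estimated and final credit usage per agent job ID."""

    def __init__(self) -> None:
        self.estimated: Dict[str, float] = {}
        self.final: Dict[str, float] = {}

    def record(self, event: Dict[str, Any]) -> None:
        items = event.get("data") or []
        if not items:
            return
        credits = items[0].get("creditsUsed")
        if credits is None:
            return
        job_id = event["id"]
        if event["type"] == "agent.action":
            # Running total so far: replace the previous estimate.
            self.estimated[job_id] = credits
        elif event["type"] in FINAL_AGENT_EVENTS:
            self.final[job_id] = credits

    def credits_for(self, job_id: str) -> Optional[float]:
        """Return the final count if known, otherwise the latest estimate."""
        return self.final.get(job_id, self.estimated.get(job_id))
```

Keeping the two values in separate maps lets billing or reporting code tell whether a number is still provisional.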
The agent.completed event is sent when the agent completes successfully. The data array contains the extracted data and the total credits consumed.
{
"success": true,
"type": "agent.completed",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [
{
"creditsUsed": 15,
"data": {
"company": "Example Company",
"industry": "Technology",
"founded": 2020
}
}
],
"metadata": {}
}
The agent.failed event is sent when the agent encounters an error. The error field contains the reason for the failure.
{
"success": false,
"type": "agent.failed",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [
{
"creditsUsed": 8
}
],
"error": "Max credits exceeded",
"metadata": {}
}
The agent.cancelled event is sent when the user cancels the agent job.
{
"success": false,
"type": "agent.cancelled",
"id": "550e8400-e29b-41d4-a716-446655440000",
"data": [
{
"creditsUsed": 3
}
],
"metadata": {}
}
By default, you receive all events. To subscribe to specific events only, specify an events array in your webhook configuration:
{
"url": "https://your-app.com/webhook",
"events": ["completed", "failed"]
}
This is useful when you only care whether a job has finished and do not need page-level updates.
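A small helper can make the default explicit when building this configuration in code. This is a sketch only: `build_webhook_config` is a hypothetical name, and it passes event names through without validating them, since the accepted values are defined by Firecrawl rather than by this example.

```python
from typing import Any, Dict, List, Optional


def build_webhook_config(url: str, events: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build a webhook configuration object.

    Leaving events unset omits the key entirely, which corresponds to the
    default behavior of receiving all events.
    """
    config: Dict[str, Any] = {"url": url}
    if events:
        config["events"] = list(events)
    return config
```

Calling `build_webhook_config("https://your-app.com/webhook", ["completed", "failed"])` produces the same object as the JSON shown above.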