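Lessons Learned Monitoring MCP Servers
For the past few months I have been building Model Context Protocol (MCP) servers in production. It has been fun, but the protocol still has plenty of rough edges to work out.
Multi-tenancy aside, one big problem I ran into was implementing tracing and metrics. Most of the traditional APM packages I tried had poor support for MCP. I plan to open-source the solution I built, but let me know if you would like an early look.
What Makes MCP Monitoring Unique
MCP is not a typical REST or gRPC service. It relies on a rich message protocol in which AI clients request tools, perform actions, and retrieve resources over long-lived connections (stdio, HTTP/SSE, and so on). This creates three main monitoring challenges:
- Protocol message tracking: I need visibility into incoming requests, outgoing responses, message sizes, and error rates.
- Tool execution insight: every tool call has its own latency, success/failure pattern, and parameter pattern.
- Secure resource access: because MCP can expose files, data, and other sensitive operations, I need to monitor who accessed what and watch for suspicious sequences.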
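Defining My Core Metrics
Protocol Message Metrics
- Traffic by type: I count discovery requests, execution requests, access attempts, and error responses.
- Protocol version distribution: tracking client versions helps me identify outdated or incompatible tool calls.
- Message size histogram: oversized messages usually indicate inefficiency or abuse.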
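The traffic categories in this list can be derived from the JSON-RPC envelope of each message. The following is a minimal sketch, assuming the standard MCP method names (`tools/list`, `tools/call`, `resources/read`, and so on); the `classifyMessage` helper and its category labels are illustrative and not part of the SDK.

```javascript
// Hypothetical helper: maps a JSON-RPC message to a coarse traffic category.
const DISCOVERY_METHODS = new Set([
  "tools/list",
  "resources/list",
  "resources/templates/list",
  "prompts/list",
]);
const EXECUTION_METHODS = new Set(["tools/call"]);
const ACCESS_METHODS = new Set(["resources/read", "resources/subscribe"]);

function classifyMessage(msg) {
  // Responses carry no method; an "error" member marks a failed request.
  if (msg.method === undefined) {
    return msg.error !== undefined ? "error_response" : "response";
  }
  if (DISCOVERY_METHODS.has(msg.method)) return "discovery";
  if (EXECUTION_METHODS.has(msg.method)) return "execution";
  if (ACCESS_METHODS.has(msg.method)) return "access";
  // Requests have an id; notifications do not.
  return msg.id === undefined ? "notification" : "other_request";
}
```

Using the returned category as the `type` label keeps the label set small and fixed, whereas labelling by raw method name grows with every method a client happens to send.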
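Tool Execution Metrics
- Invocation counts: I record every tool call to understand popularity and how load is distributed.
- Execution time breakdown: I measure initialization, execution, and response formatting time separately.
- Error classification: I tag permission errors, invalid parameters, timeouts, and internal failures.
- Result size: payload sizes help me spot unexpectedly large outputs.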
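The four error classes named in this list can be assigned by a small function that inspects the thrown error. This is a sketch under assumptions: it relies on Node.js system error codes (`EACCES`, `EPERM`, `ETIMEDOUT`) and on the `ZodError` and `TimeoutError` error names; the `classifyError` helper and its label strings are hypothetical.

```javascript
// Hypothetical helper: buckets an error into one of four fixed labels.
const PERMISSION_CODES = new Set(["EACCES", "EPERM"]);
const TIMEOUT_CODES = new Set(["ETIMEDOUT"]);

function classifyError(err) {
  if (err === null || typeof err !== "object") return "internal";
  if (PERMISSION_CODES.has(err.code)) return "permission";
  if (TIMEOUT_CODES.has(err.code) || err.name === "TimeoutError") return "timeout";
  // zod validation failures are thrown as errors named "ZodError".
  if (err.name === "ZodError") return "invalid_parameters";
  // Anything unrecognized is treated as an internal failure.
  return "internal";
}
```

A fixed set of four labels is easier to alert on than raw `err.name` values, which vary by library and can grow without bound.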
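Resource Access Patterns
- Access frequency: which files or URIs are requested most often.
- Response size: I monitor large file transfers, which can become a bottleneck for clients.
- Anomaly detection: unexpected access sequences or spikes trigger a security review.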
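One simple way to flag the access spikes mentioned above is to compare each interval's access count against a rolling mean and standard deviation. This is only a sketch of that idea, not the detector used in production; the `SpikeDetector` class, its window size, minimum sample count, and the factor `k` are all assumed values.

```javascript
// Flags a count as a spike when it exceeds mean + k * stddev of recent counts.
class SpikeDetector {
  constructor(windowSize = 30, k = 3, minSamples = 5) {
    this.windowSize = windowSize;
    this.k = k;
    this.minSamples = minSamples;
    this.history = [];
  }

  // Call once per interval with that interval's access count.
  observe(count) {
    let isSpike = false;
    const n = this.history.length;
    if (n >= this.minSamples) {
      const mean = this.history.reduce((a, b) => a + b, 0) / n;
      const variance = this.history.reduce((a, b) => a + (b - mean) ** 2, 0) / n;
      // Floor the deviation at 1 so a perfectly flat baseline does not
      // turn every tiny change into an alert.
      const std = Math.max(Math.sqrt(variance), 1);
      isSpike = count > mean + this.k * std;
    }
    // Every observation, including spikes, enters the rolling window.
    this.history.push(count);
    if (this.history.length > this.windowSize) this.history.shift();
    return isSpike;
  }
}
```

Because spikes are added to the window too, a sustained shift in traffic becomes the new baseline after a while instead of alerting forever.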
Putting Monitoring into Practice
Here is how I instrument each layer of my MCP server with my metrics library of choice (Prometheus, Datadog, and so on).
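The snippets in this section import a `metrics` object from `./monitoring.js`, which is not shown. As a rough idea of what such a module might expose, here is a minimal in-memory stand-in. It assumes only the three calls used in this article (`increment`, `histogram`, `startTimer(...).end(tags)`); the `createMetrics` factory, the key format, and the millisecond unit are assumptions, and a real module would forward to a Prometheus or Datadog client instead of storing values in maps.

```javascript
// In-memory stand-in for the `metrics` facade (illustrative only).
function createMetrics(now = () => performance.now()) {
  const counters = new Map();   // "name|tags" -> count
  const histograms = new Map(); // "name|tags" -> observed values

  // Builds a stable key by sorting tag names.
  const key = (name, tags = {}) => {
    const sorted = Object.keys(tags).sort().map((k) => `${k}=${tags[k]}`);
    return `${name}|${sorted.join(",")}`;
  };
  const observe = (name, value, tags) => {
    const k = key(name, tags);
    if (!histograms.has(k)) histograms.set(k, []);
    histograms.get(k).push(value);
  };

  return {
    counters,
    histograms,
    increment(name, tags) {
      const k = key(name, tags);
      counters.set(k, (counters.get(k) ?? 0) + 1);
    },
    histogram(name, value, tags) {
      observe(name, value, tags);
    },
    startTimer(name) {
      const start = now();
      return {
        // Records the elapsed time (in the clock's unit) and returns it.
        end(tags) {
          const elapsed = now() - start;
          observe(name, elapsed, tags);
          return elapsed;
        },
      };
    },
  };
}
```

Injecting the clock through the `now` parameter makes the timer easy to test; in `monitoring.js` the object returned by `createMetrics()` would be exported as `metrics`.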
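1. Instrumenting Protocol Messages
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { metrics } from "./monitoring.js";

// Size of the serialized message in bytes (string length would count UTF-16 code units).
const byteSize = (msg) => Buffer.byteLength(JSON.stringify(msg), "utf8");

// The SDK's transport interface has no receive() method: inbound messages arrive
// through the onmessage callback, which the server assigns during connect().
// This wrapper delegates to an inner transport and records both directions.
class MonitoredTransport {
  constructor(inner) {
    this.inner = inner;
  }

  async start() {
    this.inner.onmessage = (msg, extra) => {
      const type = msg.method || 'response';
      metrics.increment('mcp.message_received', { type });
      metrics.histogram('mcp.message_size', byteSize(msg));
      this.onmessage?.(msg, extra);
    };
    this.inner.onclose = () => this.onclose?.();
    this.inner.onerror = (err) => this.onerror?.(err);
    await this.inner.start();
  }

  async send(msg, options) {
    const type = msg.method || 'response';
    metrics.increment('mcp.message_sent', { type });
    metrics.histogram('mcp.message_sent_size', byteSize(msg));
    return this.inner.send(msg, options);
  }

  async close() {
    return this.inner.close();
  }
}

const server = new McpServer({ name: "MonitoredMcpServer", version: "1.0.0" });
const transport = new MonitoredTransport(new StdioServerTransport());
await server.connect(transport);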
2. Capturing Tool Execution Details
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import { metrics } from "./monitoring.js";
// Assumption: database and isValidQuery are application-specific helpers;
// the module path here is a placeholder.
import { database, isValidQuery } from "./database.js";

const server = new McpServer({ name: "DatabaseToolServer", version: "1.0.0" });

server.tool(
  "executeQuery",
  { query: z.string(), parameters: z.record(z.any()).optional() },
  async ({ query, parameters = {} }) => {
    metrics.increment('mcp.tool.invoked', { tool: 'executeQuery' });
    const timer = metrics.startTimer('mcp.tool.execution_time');
    try {
      if (!isValidQuery(query)) {
        metrics.increment('mcp.tool.invalid_parameters', { tool: 'executeQuery' });
        // isError tells the client the call failed rather than returned data.
        return { content: [{ type: "text", text: "Error: Invalid query format" }], isError: true };
      }
      const result = await database.runQuery(query, parameters);
      const text = JSON.stringify(result);
      metrics.histogram('mcp.tool.result_size', Buffer.byteLength(text, "utf8"));
      metrics.increment('mcp.tool.success', { tool: 'executeQuery' });
      return { content: [{ type: "text", text }] };
    } catch (err) {
      metrics.increment('mcp.tool.error', { tool: 'executeQuery', error_type: err.name });
      return { content: [{ type: "text", text: `Error: ${err.message}` }], isError: true };
    } finally {
      // Runs on every path, so the timer also covers rejected and failed calls.
      timer.end({ tool: 'executeQuery' });
    }
  }
);

const transport = new StdioServerTransport();
await server.connect(transport);
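When a server has many tools, the counter and timer calls from the example above can be factored into a wrapper so each handler stays free of instrumentation code. The sketch below assumes the same metric names and a `metrics` object with `increment` and `startTimer`; the `instrumentTool` helper and the `tool_reported` error label are hypothetical.

```javascript
// Hypothetical wrapper: adds the invocation counter, timer, and outcome
// counters around any tool handler. `metrics` is passed in explicitly.
function instrumentTool(metrics, tool, handler) {
  return async (args, extra) => {
    metrics.increment("mcp.tool.invoked", { tool });
    const timer = metrics.startTimer("mcp.tool.execution_time");
    try {
      const result = await handler(args, extra);
      // A handler can report failure through isError without throwing.
      if (result?.isError) {
        metrics.increment("mcp.tool.error", { tool, error_type: "tool_reported" });
      } else {
        metrics.increment("mcp.tool.success", { tool });
      }
      return result;
    } catch (err) {
      metrics.increment("mcp.tool.error", { tool, error_type: err?.name ?? "unknown" });
      return {
        content: [{ type: "text", text: `Error: ${err?.message ?? err}` }],
        isError: true,
      };
    } finally {
      timer.end({ tool });
    }
  };
}
```

A registration would then look like `server.tool("executeQuery", schema, instrumentTool(metrics, "executeQuery", handler))`, where `schema` and `handler` are whatever the tool already uses.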
3. Monitoring Resource Access
import { McpServer, ResourceTemplate } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { metrics } from "./monitoring.js";
import fs from "fs/promises";

const server = new McpServer({ name: "FileServer", version: "1.0.0" });

server.resource(
  "data",
  // ResourceTemplate requires a callbacks object; list: undefined means the
  // matching resources are not enumerated.
  new ResourceTemplate("file://{filename}", { list: undefined }),
  async (uri, { filename }) => {
    // filename comes from the client: validate it against an allowed
    // directory before reading in a real deployment.
    // One label value per file can create many time series; consider
    // bucketing by directory or extension if the file set is large.
    const id = `file://${filename}`;
    metrics.increment('mcp.resource.accessed', { resource: id });
    const timer = metrics.startTimer('mcp.resource.access_time');
    try {
      const content = await fs.readFile(filename, 'utf-8');
      // Record bytes rather than characters so the "bytes" unit is accurate.
      metrics.histogram('mcp.resource.size', Buffer.byteLength(content, 'utf8'));
      metrics.increment('mcp.resource.success', { resource: id });
      return { contents: [{ uri: uri.href, text: content }] };
    } catch (error) {
      metrics.increment('mcp.resource.error', { resource: id, error_type: error.name });
      throw error;
    } finally {
      timer.end({ resource: id });
    }
  }
);

const transport = new StdioServerTransport();
await server.connect(transport);
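Designing Meaningful Alerts
I set up alert rules around the following:
- Security events: unauthorized tool calls or anomalous resource access.
- Performance breaches: tool execution time exceeding its P95 threshold.
- Reliability spikes: sudden increases in protocol errors or connection failures.
These alerts let me catch incidents before they affect users or compromise security.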
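To make the performance rule concrete, here is one possible reading of it: compute the p95 of recent execution times and fire when it rises above a configured limit. This is a sketch under that assumption; the `percentile` and `p95Breached` helpers and the nearest-rank method are illustrative choices, and in a Prometheus setup the same check would normally live in an alerting rule built on `histogram_quantile`.

```javascript
// Nearest-rank percentile; pct is an integer such as 95.
function percentile(samples, pct) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  // Integer arithmetic before dividing avoids floating-point rank errors.
  const rank = Math.max(Math.ceil((pct * sorted.length) / 100), 1);
  return sorted[rank - 1];
}

// Hypothetical rule: fire when the window's p95 exceeds the threshold.
function p95Breached(samplesMs, thresholdMs) {
  const p95 = percentile(samplesMs, 95);
  return Number.isFinite(p95) && p95 > thresholdMs;
}
```

An empty window never fires, which avoids alerting on tools that simply were not called during the interval.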
Advanced Monitoring Techniques
Session-Level Tracing
import { randomUUID } from "node:crypto";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { metrics, logger } from "./monitoring.js";

class SessionTransport extends StdioServerTransport {
  constructor() {
    super();
    this.sessionId = randomUUID();
    this.ended = false;
    logger.info('Session started', { sessionId: this.sessionId });
    this.timer = metrics.startTimer('mcp.session.duration');
  }

  // StdioServerTransport is not an EventEmitter, so there is no 'disconnect'
  // event to subscribe to; the session is closed out when close() runs.
  async close() {
    try {
      await super.close();
    } finally {
      if (!this.ended) {
        this.ended = true;
        // The session ID goes to the log, not a metric label, to keep cardinality bounded.
        metrics.increment('mcp.session.disconnected');
        this.timer.end();
        logger.info('Session ended', { sessionId: this.sessionId });
      }
    }
  }
}

const server = new McpServer({ name: "SessionTrackedServer", version: "1.0.0" });
const transport = new SessionTransport();
await server.connect(transport);
Grafana Dashboard
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": null,
"links": [],
"liveNow": false,
"panels": [
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 0
},
"id": 1,
"panels": [],
"title": "Protocol Health",
"type": "row"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "ops"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 1
},
"id": 2,
"options": {
"legend": {
"calcs": [
"mean",
"max",
"sum"
],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "sum(rate(mcp_message_received_total{type=~\"$message_type\"}[1m])) by (type)",
"legendFormat": "{{type}}",
"range": true,
"refId": "A"
}
],
"title": "Message Rate by Type",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"fillOpacity": 80,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineWidth": 1,
"scaleDistribution": {
"type": "linear"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 1
},
"id": 3,
"options": {
"barRadius": 0,
"barWidth": 0.97,
"fullHighlight": false,
"groupWidth": 0.7,
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"orientation": "auto",
"showValue": "auto",
"stacking": "none",
"tooltip": {
"mode": "single",
"sort": "none"
},
"xTickLabelRotation": 0,
"xTickLabelSpacing": 0
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "sum(mcp_protocol_version_count) by (version)",
"legendFormat": "{{version}}",
"range": true,
"refId": "A"
}
],
"title": "Protocol Version Distribution",
"type": "barchart"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 30
},
{
"color": "orange",
"value": 50
},
{
"color": "red",
"value": 80
}
]
},
"unit": "bytes"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 9
},
"id": 4,
"options": {
"displayMode": "gradient",
"minVizHeight": 10,
"minVizWidth": 0,
"orientation": "horizontal",
"reduceOptions": {
"calcs": [
"mean"
],
"fields": "",
"values": false
},
"showUnfilled": true,
"text": {}
},
"pluginVersion": "9.5.1",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "sum(rate(mcp_message_size_bucket{direction=\"received\"}[5m])) by (le)",
"format": "heatmap",
"legendFormat": "{{le}}",
"range": true,
"refId": "A"
}
],
"title": "Message Size Distribution",
"type": "bargauge"
},
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 17
},
"id": 5,
"panels": [],
"title": "Tool Performance",
"type": "row"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "ops"
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 18
},
"id": 6,
"options": {
"legend": {
"calcs": [
"sum",
"mean"
],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "sum(rate(mcp_tool_invoked_total{tool=~\"$tool\"}[1m])) by (tool)",
"legendFormat": "{{tool}}",
"range": true,
"refId": "A"
}
],
"title": "Tool Invocation Rate",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "ms"
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 18
},
"id": 7,
"options": {
"legend": {
"calcs": [
"mean",
"max",
"min"
],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "histogram_quantile(0.95, sum(rate(mcp_tool_execution_time_bucket{tool=~\"$tool\"}[5m])) by (tool, le))",
"legendFormat": "{{tool}} - p95",
"range": true,
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "histogram_quantile(0.50, sum(rate(mcp_tool_execution_time_bucket{tool=~\"$tool\"}[5m])) by (tool, le))",
"hide": false,
"legendFormat": "{{tool}} - p50",
"range": true,
"refId": "B"
}
],
"title": "Tool Execution Time",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
}
},
"mappings": []
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 27
},
"id": 8,
"options": {
"displayLabels": ["name", "percent"],
"legend": {
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"pieType": "pie",
"reduceOptions": {
"calcs": [
"sum"
],
"fields": "",
"values": false
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "sum(mcp_tool_error_total{tool=~\"$tool\"}) by (error_type)",
"legendFormat": "{{error_type}}",
"range": true,
"refId": "A"
}
],
"title": "Tool Error Distribution",
"type": "piechart"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"fillOpacity": 80,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineWidth": 1,
"scaleDistribution": {
"type": "linear"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "bytes"
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 27
},
"id": 9,
"options": {
"barRadius": 0,
"barWidth": 0.97,
"fullHighlight": false,
"groupWidth": 0.7,
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"orientation": "auto",
"showValue": "auto",
"stacking": "none",
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "histogram_quantile(0.95, sum(rate(mcp_tool_result_size_bucket{tool=~\"$tool\"}[5m])) by (tool, le))",
"legendFormat": "{{tool}}",
"range": true,
"refId": "A"
}
],
"title": "Tool Result Size (p95)",
"type": "barchart"
},
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 36
},
"id": 10,
"panels": [],
"title": "Resource Access",
"type": "row"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "ops"
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 12,
"x": 0,
"y": 37
},
"id": 11,
"options": {
"legend": {
"calcs": [
"sum",
"mean"
],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "topk(5, sum(rate(mcp_resource_accessed_total[5m])) by (resource))",
"legendFormat": "{{resource}}",
"range": true,
"refId": "A"
}
],
"title": "Top 5 Accessed Resources",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "bytes"
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 12,
"x": 12,
"y": 37
},
"id": 12,
"options": {
"legend": {
"calcs": [
"max",
"mean"
],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "histogram_quantile(0.95, sum(rate(mcp_resource_size_bucket{resource=~\"$resource\"}[5m])) by (resource, le))",
"legendFormat": "{{resource}}",
"range": true,
"refId": "A"
}
],
"title": "Resource Response Size (p95)",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "yellow",
"value": 0.5
},
{
"color": "red",
"value": 0.8
}
]
},
"unit": "percentunit"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 24,
"x": 0,
"y": 46
},
"id": 13,
"options": {
"minVizHeight": 75,
"minVizWidth": 75,
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showThresholdLabels": false,
"showThresholdMarkers": true
},
"pluginVersion": "9.5.1",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "sum(rate(mcp_resource_error_total[5m])) / sum(rate(mcp_resource_accessed_total[5m]))",
"legendFormat": "Error Rate",
"range": true,
"refId": "A"
}
],
"title": "Resource Access Error Rate",
"type": "gauge"
},
{
"collapsed": false,
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 54
},
"id": 14,
"panels": [],
"title": "Session Metrics",
"type": "row"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 55
},
"id": 15,
"options": {
"legend": {
"calcs": [
"mean",
"lastNotNull"
],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "sum(mcp_active_sessions)",
"legendFormat": "Active Sessions",
"range": true,
"refId": "A"
}
],
"title": "Active Sessions",
"type": "timeseries"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 10,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
},
"unit": "s"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 55
},
"id": 16,
"options": {
"legend": {
"calcs": [
"mean",
"max"
],
"displayMode": "table",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "histogram_quantile(0.95, sum(rate(mcp_session_duration_bucket[5m])) by (le))",
"legendFormat": "p95",
"range": true,
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"editorMode": "code",
"expr": "histogram_quantile(0.50, sum(rate(mcp_session_duration_bucket[5m])) by (le))",
"hide": false,
"legendFormat": "p50",
"range": true,
"refId": "B"
}
],
"title": "Session Duration",
"type": "timeseries"
}
],
"refresh": "10s",
"schemaVersion": 38,
"style": "dark",
"tags": ["mcp", "monitoring"],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "All",
"value": "$__all"
},
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"definition": "label_values(mcp_message_received_total, type)",
"hide": 0,
"includeAll": true,
"label": "Message Type",
"multi": true,
"name": "message_type",
"options": [],
"query": {
"query": "label_values(mcp_message_received_total, type)",
"refId": "StandardVariableQuery"
},
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"type": "query"
},
{
"current": {
"selected": false,
"text": "All",
"value": "$__all"
},
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"definition": "label_values(mcp_tool_invoked_total, tool)",
"hide": 0,
"includeAll": true,
"label": "Tool",
"multi": true,
"name": "tool",
"options": [],
"query": {
"query": "label_values(mcp_tool_invoked_total, tool)",
"refId": "StandardVariableQuery"
},
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"type": "query"
},
{
"current": {
"selected": false,
"text": "All",
"value": "$__all"
},
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"definition": "label_values(mcp_resource_accessed_total, resource)",
"hide": 0,
"includeAll": true,
"label": "Resource",
"multi": true,
"name": "resource",
"options": [],
"query": {
"query": "label_values(mcp_resource_accessed_total, resource)",
"refId": "StandardVariableQuery"
},
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 0,
"type": "query"
},
{
"current": {
"selected": true,
"text": "Prometheus",
"value": "Prometheus"
},
"description": "Prometheus data source",
"hide": 0,
"includeAll": false,
"label": "Data Source",
"multi": false,
"name": "DS_PROMETHEUS",
"options": [],
"query": "prometheus",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"type": "datasource"
}
]
},
"time": {
"from": "now-6h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
]
},
"timezone": "",
"title": "MCP Server Monitoring Dashboard",
"uid": "mcp-server-monitoring",
"version": 1,
"weekStart": ""
}
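Tool Chain Analysis and Anomaly Detection
I also aggregate tool call sequences and parameter distributions to surface unusual behavior patterns.
When I detect deviations from the established baseline, I feed them into a machine-learning-based detector that flags potential security or performance issues.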
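As an illustration of what a baseline for tool call sequences could look like, the sketch below counts tool-to-tool transitions seen in known-good sessions and reports transitions that fall below a minimum count. It is a plain frequency baseline, not the machine-learning detector described above; the `TransitionBaseline` class, the `minCount` parameter, and the tool names in the usage note are hypothetical.

```javascript
// Hypothetical baseline over tool-to-tool transitions (bigrams).
class TransitionBaseline {
  constructor(minCount = 1) {
    this.minCount = minCount;
    this.counts = new Map(); // "from->to" -> occurrences
  }

  // Adds every adjacent pair in a known-good sequence to the baseline.
  learn(sequence) {
    for (let i = 1; i < sequence.length; i++) {
      const key = `${sequence[i - 1]}->${sequence[i]}`;
      this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
    }
  }

  // Returns the transitions in `sequence` seen fewer than minCount times.
  unusual(sequence) {
    const flagged = [];
    for (let i = 1; i < sequence.length; i++) {
      const key = `${sequence[i - 1]}->${sequence[i]}`;
      if ((this.counts.get(key) ?? 0) < this.minCount) flagged.push(key);
    }
    return flagged;
  }
}
```

After `learn(["listFiles", "readFile", "summarize"])`, a session containing `readFile` followed by `deleteFile` would be reported because that transition was never observed. The flagged transitions are the kind of deviation that could then be handed to a more capable detector.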
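Visualizing Metrics with Dashboards
To tie all of this together, I built dashboards that show:
- Protocol health: message rates by type, error counts, and client versions.
- Tool performance: invocation rates, latency percentiles, and error breakdowns.
- Resource usage: access volume, data transfer sizes, and anomaly alerts.
These views give me a 360-degree picture of my MCP ecosystem.
Reflections and Next Steps
Monitoring MCP in production is a continuous learning process. By instrumenting message flow, tool execution, and resource access, I have built a resilient and secure visibility strategy.
As I add more tools and resources, I will keep refining my metrics, alerts, and dashboards to meet new challenges.
I hope my experience helps you design a monitoring solution that keeps your MCP deployment reliable, performant, and secure.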