Lessons Learned from Monitoring MCP Servers

Community Article, published May 4, 2025

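For the past few months I've been building Model Context Protocol (MCP) servers in production. It has been fun, but the protocol still has plenty of open problems.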

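Setting multi-tenancy aside, one of the biggest problems I ran into was implementing tracing and metrics. Most of the traditional APM packages I tried supported MCP poorly. I plan to open-source the solution I built, but if you'd like an early look, let me know.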

What Makes MCP Monitoring Different

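MCP is not a typical REST or gRPC service. It relies on a rich, JSON-RPC 2.0 based message protocol in which AI clients discover tools, invoke them, and retrieve resources over long-lived connections (stdio, HTTP/SSE, etc.). That creates three main monitoring challenges: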

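  1. Protocol message tracking: I need visibility into incoming requests, outgoing responses, message sizes, and error rates.
  2. Tool execution insight: every tool call has its own latency, success/failure patterns, and argument patterns.
  3. Secure resource access: because MCP can expose files, data, and other sensitive operations, I need to monitor who accessed what and watch for suspicious access sequences.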

Defining My Core Metrics

Protocol Message Metrics

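  • Traffic by type: I count discovery requests (e.g. tools/list), execution requests (tools/call), resource access attempts (resources/read), and error responses.
  • Protocol version distribution: tracking which protocol versions clients use helps me identify outdated or incompatible tool calls.
  • Message size histogram: oversized messages often indicate inefficiency or abuse.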
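These three metrics reduce to a few pure functions over each parsed JSON-RPC message. Below is a minimal sketch, assuming messages arrive as already-parsed objects. The kinds follow JSON-RPC 2.0: requests carry method and id, notifications carry only method, and responses carry result or error. The client's version is read from params.protocolVersion of the initialize request, as the MCP specification defines it. The function names, the method-to-group table, and the bucket boundaries are illustrative choices, not part of any SDK.

```javascript
// Map MCP method names onto the traffic groups from the list above (illustrative grouping).
const METHOD_GROUPS = {
  'tools/list': 'discovery',
  'resources/list': 'discovery',
  'resources/templates/list': 'discovery',
  'prompts/list': 'discovery',
  'tools/call': 'execution',
  'resources/read': 'access',
};

// Classify a parsed JSON-RPC 2.0 message by kind and traffic group.
function classifyMessage(msg) {
  if (typeof msg.method === 'string') {
    return {
      // Requests carry an id; notifications never do.
      kind: 'id' in msg ? 'request' : 'notification',
      group: METHOD_GROUPS[msg.method] ?? 'other',
      method: msg.method,
    };
  }
  // No method: this is a response, either successful (result) or failed (error).
  const kind = 'error' in msg ? 'error' : 'response';
  return { kind, group: kind, method: null };
}

// Size in UTF-8 bytes of the serialized message (excluding any transport framing).
function messageBytes(msg) {
  return new TextEncoder().encode(JSON.stringify(msg)).length;
}

// Hypothetical histogram bucket upper bounds in bytes; tune them to your own traffic.
const SIZE_BUCKETS = [256, 1024, 4096, 16384, 65536, 262144];

// Return the upper bound of the smallest bucket that fits, or "+Inf" for oversized messages.
function sizeBucket(bytes) {
  const le = SIZE_BUCKETS.find((bound) => bytes <= bound);
  return le === undefined ? '+Inf' : String(le);
}

// Clients announce their protocol version in the initialize request's params.
function clientProtocolVersion(msg) {
  if (msg.method !== 'initialize') return null;
  return msg.params?.protocolVersion ?? 'unknown';
}
```

In a transport hook, each inbound message would get one classifyMessage call for the traffic counter and one sizeBucket(messageBytes(msg)) call for the histogram. Only initialize messages yield a protocol version.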

Tool Execution Metrics

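  • Invocation counts: I record every tool call to understand popularity and to balance load.
  • Execution time breakdown: I measure initialization, execution, and response-formatting time separately.
  • Error classification: I tag permission errors, invalid arguments, timeouts, and internal failures.
  • Result size: payload sizes help me catch unexpectedly large outputs.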
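The execution-time breakdown and the error classification fit in a small wrapper. This sketch assumes each tool handler is split into three async phases (a hypothetical { init, execute, format } shape) and times them with performance.now(). The matching rules in classifyToolError are assumptions to adapt to whatever your tools actually throw. For reference, zod names its validation errors ZodError, and AbortSignal.timeout() aborts with a TimeoutError.

```javascript
// Map an error to one of the four categories from the list above (matching rules are illustrative).
function classifyToolError(err) {
  if (err?.name === 'PermissionError' || err?.code === 'EACCES') return 'permission';
  if (err?.name === 'ValidationError' || err?.name === 'ZodError') return 'invalid_arguments';
  if (err?.name === 'TimeoutError' || err?.code === 'ETIMEDOUT') return 'timeout';
  return 'internal';
}

// Run init -> execute -> format, recording each phase's duration in milliseconds.
// `phases` is a hypothetical shape: { init(args), execute(state), format(result) }.
async function runToolWithPhases(phases, args) {
  const durations = {};
  const time = async (name, fn) => {
    const start = performance.now();
    try {
      return await fn();
    } finally {
      // Recorded even when the phase throws, so failed calls still show where time went.
      durations[name] = performance.now() - start;
    }
  };
  try {
    const state = await time('init', () => phases.init(args));
    const result = await time('execute', () => phases.execute(state));
    const output = await time('format', () => phases.format(result));
    return { ok: true, output, durations };
  } catch (err) {
    return { ok: false, errorType: classifyToolError(err), durations };
  }
}
```

Each entry in durations would become one timer observation tagged with tool and phase. errorType becomes the error_type tag on the error counter.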

Resource Access Patterns

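  • Access frequency: which files or URIs are requested most often.
  • Response size: I monitor large file transfers, which can become a bottleneck for clients.
  • Anomaly detection: unexpected access sequences or spikes trigger a security review.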
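A simple starting point for anomaly detection is a per-client sliding window. It flags any client that reads an unusual number of distinct resources in a short period, a common signature of enumeration or bulk copying. This is a minimal sketch. The window length, the threshold, and keying on a client ID are all assumptions. A flagged result would feed the security review rather than block the client outright.

```javascript
// Flags clients that touch too many distinct resources within a sliding time window.
class AccessSpikeDetector {
  constructor({ windowMs = 60_000, maxDistinct = 50 } = {}) {
    this.windowMs = windowMs;
    this.maxDistinct = maxDistinct;
    this.events = new Map(); // clientId -> array of { at, uri }, oldest first
  }

  // Record one access; returns true if this client is now over the threshold.
  record(clientId, uri, at = Date.now()) {
    const list = this.events.get(clientId) ?? [];
    list.push({ at, uri });
    // Drop events that have fallen out of the window.
    while (list.length > 0 && list[0].at <= at - this.windowMs) list.shift();
    this.events.set(clientId, list);
    const distinct = new Set(list.map((e) => e.uri)).size;
    return distinct > this.maxDistinct;
  }
}
```

Repeated reads of the same resource never trip the detector, because it counts distinct URIs rather than raw requests. Raw request spikes are better caught by the rate panels.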

Putting Monitoring into Practice

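Here is how I instrument each layer of my MCP server with my metrics library of choice (Prometheus, Datadog, etc.). In every example, metrics comes from a local ./monitoring.js module that wraps that library.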
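The examples only call a few methods on that module: increment(name, tags), histogram(name, value, optional tags), startTimer(name) returning an object with end(tags), plus logger.info(message, fields). The actual ./monitoring.js is not shown, so below is a hypothetical in-memory stand-in with those shapes. It is enough to run the examples locally and inspect what they record, but it exports nothing to a real backend.

```javascript
// Hypothetical in-memory stand-in for ./monitoring.js (illustration only, no export backend).
// Series are keyed by metric name plus sorted tags, e.g. "mcp.tool.invoked{tool=executeQuery}".
function seriesKey(name, tags = {}) {
  const labels = Object.keys(tags)
    .sort()
    .map((k) => `${k}=${tags[k]}`)
    .join(',');
  return labels ? `${name}{${labels}}` : name;
}

const counters = new Map();
const histograms = new Map();

const metrics = {
  increment(name, tags) {
    const key = seriesKey(name, tags);
    counters.set(key, (counters.get(key) ?? 0) + 1);
  },
  histogram(name, value, tags) {
    const key = seriesKey(name, tags);
    if (!histograms.has(key)) histograms.set(key, []);
    histograms.get(key).push(value);
  },
  // Timers record elapsed milliseconds into a histogram when end() is called.
  startTimer(name) {
    const start = performance.now();
    return { end: (tags) => metrics.histogram(name, performance.now() - start, tags) };
  },
  snapshot() {
    return { counters: Object.fromEntries(counters), histograms: Object.fromEntries(histograms) };
  },
};

// With stdio transports stdout carries the protocol, so log to stderr via console.error.
const logger = {
  info: (msg, fields = {}) => console.error(JSON.stringify({ level: 'info', msg, ...fields })),
};
```

To use it with the ESM examples, save it as monitoring.js and append export { metrics, logger };. Calling metrics.snapshot() after a few tool calls shows exactly which series the dashboard would receive.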

1. Instrumenting Protocol Messages

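import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { metrics } from "./monitoring.js";

// Requests and notifications carry a `method`; responses do not (failed ones carry `error`).
const messageType = (msg) => msg.method ?? (msg.error ? 'error' : 'response');
// Count UTF-8 bytes, not UTF-16 code units, so sizes match the dashboard's "bytes" unit.
const byteSize = (msg) => Buffer.byteLength(JSON.stringify(msg), 'utf8');

class MonitoredTransport extends StdioServerTransport {
  constructor() {
    super();
    // The SDK transport has no receive() method: inbound messages are delivered via the
    // onmessage callback that server.connect() assigns, so intercept that assignment.
    let handler;
    Object.defineProperty(this, 'onmessage', {
      configurable: true,
      get: () => handler,
      set: (fn) => {
        handler = fn && ((msg, extra) => {
          metrics.increment('mcp.message_received', { type: messageType(msg) });
          // Assumes histogram(name, value, tags); `direction` matches the dashboard query.
          metrics.histogram('mcp.message_size', byteSize(msg), { direction: 'received' });
          return fn(msg, extra);
        });
      },
    });
  }

  // Every outbound message goes through send(), so a plain method override is enough.
  async send(msg, options) {
    metrics.increment('mcp.message_sent', { type: messageType(msg) });
    metrics.histogram('mcp.message_size', byteSize(msg), { direction: 'sent' });
    return super.send(msg, options);
  }
}

const server = new McpServer({ name: "MonitoredMcpServer", version: "1.0.0" });
const transport = new MonitoredTransport();
await server.connect(transport);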

2. Capturing Tool Execution Details

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import { metrics } from "./monitoring.js";
// Hypothetical application module: your own query validator and database client.
import { database, isValidQuery } from "./database.js";

const TOOL = 'executeQuery';
const server = new McpServer({ name: "DatabaseToolServer", version: "1.0.0" });

server.tool(
  TOOL,
  { query: z.string(), parameters: z.record(z.any()).optional() },
  async ({ query, parameters = {} }) => {
    metrics.increment('mcp.tool.invoked', { tool: TOOL });
    const timer = metrics.startTimer('mcp.tool.execution_time');
    try {
      if (!isValidQuery(query)) {
        metrics.increment('mcp.tool.invalid_parameters', { tool: TOOL });
        // isError marks the call as failed while keeping the message readable by the model.
        return { content: [{ type: "text", text: "Error: Invalid query format" }], isError: true };
      }
      const result = await database.runQuery(query, parameters);
      const text = JSON.stringify(result); // serialize once, reuse for size and response
      // Tag the size with the tool name so the dashboard can chart p95 result size per tool.
      metrics.histogram('mcp.tool.result_size', Buffer.byteLength(text, 'utf8'), { tool: TOOL });
      metrics.increment('mcp.tool.success', { tool: TOOL });
      return { content: [{ type: "text", text }] };
    } catch (err) {
      metrics.increment('mcp.tool.error', { tool: TOOL, error_type: err.name });
      return { content: [{ type: "text", text: `Error: ${err.message}` }], isError: true };
    } finally {
      timer.end({ tool: TOOL });
    }
  }
);

const transport = new StdioServerTransport();
await server.connect(transport);

3. Monitoring Resource Access

import { McpServer, ResourceTemplate } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { metrics } from "./monitoring.js";
import fs from "fs/promises";

const server = new McpServer({ name: "FileServer", version: "1.0.0" });

server.resource(
  "data",
  // ResourceTemplate requires a callbacks object; `list: undefined` means files are not enumerable.
  new ResourceTemplate("file://{filename}", { list: undefined }),
  async (uri, { filename }) => {
    const id = `file://${filename}`;
    // `filename` comes straight from the client: confine it to an allowed directory before
    // reading, and beware that one label value per file can blow up metric cardinality.
    metrics.increment('mcp.resource.accessed', { resource: id });
    const timer = metrics.startTimer('mcp.resource.access_time');
    try {
      const content = await fs.readFile(filename, 'utf-8');
      metrics.histogram('mcp.resource.size', Buffer.byteLength(content, 'utf8'), { resource: id });
      metrics.increment('mcp.resource.success', { resource: id });
      return { contents: [{ uri: uri.href, text: content }] };
    } catch (error) {
      // fs errors expose a specific code (ENOENT, EACCES); error.name is usually just "Error".
      metrics.increment('mcp.resource.error', { resource: id, error_type: error.code ?? error.name });
      throw error;
    } finally {
      timer.end({ resource: id });
    }
  }
);

const transport = new StdioServerTransport();
await server.connect(transport);
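Tagging every metric with the full file URI creates one Prometheus series per file, and that gets expensive quickly. One option is to collapse URIs into a bounded label before passing them to metrics, while the exact URIs still go to logs for the audit trail. The sketch below assumes the top-level directory and file extension are enough detail for your dashboards. The function name and label format are hypothetical.

```javascript
// Collapse a resource URI into a low-cardinality label: scheme + top-level directory + extension.
// e.g. "file://reports/2025/q1.csv" -> "file:reports/*.csv"
function resourceLabel(uri) {
  const match = /^([a-z][a-z0-9+.-]*):\/\/(.*)$/i.exec(uri);
  if (!match) return 'unknown';
  const [, scheme, rest] = match;
  const segments = rest.split('/').filter(Boolean);
  if (segments.length === 0) return `${scheme}:/`;
  const file = segments[segments.length - 1];
  const dot = file.lastIndexOf('.');
  // Dotfiles such as ".env" have no extension under this rule.
  const ext = dot > 0 ? file.slice(dot).toLowerCase() : '';
  const dir = segments.length > 1 ? segments[0] : '.';
  return `${scheme}:${dir}/*${ext}`;
}
```

In the handler above, { resource: resourceLabel(id) } would replace { resource: id } in the metric tags. The number of series then grows with the number of directories and file types, not the number of files.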

Designing Meaningful Alerts

I built my alerting rules around the following:

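  • Security events: unauthorized tool calls or anomalous resource access.
  • Performance breaches: P95 tool execution time rising above its latency threshold.
  • Reliability spikes: sudden increases in protocol errors or connection failures.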

These alerts let me catch incidents before they affect users or compromise security.
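The performance and reliability rules reduce to two checks. One compares P95 latency against a fixed limit. The other compares the current error rate against a baseline. With Prometheus you would typically express both as alerting rules built on histogram_quantile() and rate(). This standalone sketch over raw samples just makes the logic explicit. It uses the nearest-rank percentile, and the default limits are made-up values, not recommendations.

```javascript
// Nearest-rank percentile over raw latency samples (milliseconds).
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Evaluate the performance and reliability rules from the list above.
function evaluateAlerts(
  { latenciesMs, errors, requests, baselineErrorRate },
  { p95LimitMs = 2000, spikeRatio = 3, minRequests = 20 } = {},
) {
  const alerts = [];
  const p95 = percentile(latenciesMs, 95);
  // NaN (no samples) compares false, so an idle window never fires.
  if (p95 > p95LimitMs) alerts.push({ rule: 'performance', p95 });
  // Require a minimum volume so a single failure on a quiet server does not page anyone.
  if (requests >= minRequests) {
    const errorRate = errors / requests;
    if (errorRate > baselineErrorRate * spikeRatio) alerts.push({ rule: 'reliability', errorRate });
  }
  return alerts;
}
```

Security events fit less naturally into a threshold check. They come from explicit signals such as a denied tool call, or from detectors like the sliding-window one described under resource access patterns.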

Advanced Monitoring Techniques

Session-Level Tracing

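import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { randomUUID } from "node:crypto";
import { metrics, logger } from "./monitoring.js";

class SessionTransport extends StdioServerTransport {
  constructor() {
    super();
    this.sessionId = randomUUID();
    // stdout carries the protocol on stdio, so the logger must write to stderr.
    logger.info('Session started', { sessionId: this.sessionId });
    this.timer = metrics.startTimer('mcp.session.duration');

    // The transport is not an EventEmitter (there is no 'disconnect' event), so wrap
    // the onclose callback that server.connect() installs instead.
    let ended = false;
    let handler;
    Object.defineProperty(this, 'onclose', {
      configurable: true,
      get: () => handler,
      set: (fn) => {
        handler = () => {
          if (!ended) {
            ended = true;
            // Keep sessionId out of metric tags (unbounded cardinality); the logs carry it.
            metrics.increment('mcp.session.disconnected');
            this.timer.end();
            logger.info('Session ended', { sessionId: this.sessionId });
          }
          fn?.();
        };
      },
    });
  }
}

const server = new McpServer({ name: "SessionTrackedServer", version: "1.0.0" });
const transport = new SessionTransport();
await server.connect(transport);
// StdioServerTransport fires onclose only from close(), so close explicitly when stdin hits EOF.
process.stdin.on('end', () => transport.close());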
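The dashboard also charts active sessions and session duration. One transport-agnostic way to keep those numbers is a small tracker that the session start and end hooks call into. It can also tally tool calls per session, so the "Session ended" log line carries a summary. This is a sketch with hypothetical names. The clock is injectable so the logic can be checked without real time passing.

```javascript
// Hypothetical session bookkeeping: active-session gauge, durations, per-session tool counts.
class SessionTracker {
  constructor(now = () => Date.now()) {
    this.now = now;
    this.sessions = new Map(); // sessionId -> { startedAt, toolCalls }
    this.durationsSec = [];
  }

  start(sessionId) {
    this.sessions.set(sessionId, { startedAt: this.now(), toolCalls: 0 });
  }

  recordToolCall(sessionId) {
    const s = this.sessions.get(sessionId);
    if (s) s.toolCalls += 1;
  }

  // Ends a session and returns a log-ready summary (null if unknown or already ended).
  end(sessionId) {
    const s = this.sessions.get(sessionId);
    if (!s) return null;
    this.sessions.delete(sessionId);
    const durationSec = (this.now() - s.startedAt) / 1000;
    this.durationsSec.push(durationSec);
    return { sessionId, durationSec, toolCalls: s.toolCalls };
  }

  // Current value for an mcp_active_sessions-style gauge.
  activeSessions() {
    return this.sessions.size;
  }
}
```

Because end() returns null for a session that is already closed, calling it from both onclose and an EOF handler cannot double-count a session.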

Grafana Dashboard
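The panel queries assume that the dotted names used in the code reach Prometheus under its naming conventions. Dots, which are not valid in Prometheus metric names, become underscores. Counters end in _total. Histograms expose _bucket, _sum and _count series. Whether a given wrapper or exporter performs exactly this mapping is an assumption worth checking. This small sketch makes the expected mapping explicit so dashboard queries can be compared against it.

```javascript
// Map a dotted metric name from the code to the Prometheus series names the dashboard queries.
// Assumes the exporter replaces invalid characters with "_" and appends "_total" to counters.
function prometheusSeries(dottedName, kind) {
  const base = dottedName.replace(/[^a-zA-Z0-9_:]/g, '_');
  switch (kind) {
    case 'counter':
      return [base.endsWith('_total') ? base : `${base}_total`];
    case 'histogram':
      return [`${base}_bucket`, `${base}_sum`, `${base}_count`];
    case 'gauge':
      return [base];
    default:
      throw new Error(`unknown metric kind: ${kind}`);
  }
}
```

For example, mcp.message_received maps to mcp_message_received_total, which is the series the "Message Rate by Type" panel queries.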

{
  "annotations": {
    "list": [
      {
        "builtIn": 1,
        "datasource": {
          "type": "grafana",
          "uid": "-- Grafana --"
        },
        "enable": true,
        "hide": true,
        "iconColor": "rgba(0, 211, 255, 1)",
        "name": "Annotations & Alerts",
        "type": "dashboard"
      }
    ]
  },
  "editable": true,
  "fiscalYearStartMonth": 0,
  "graphTooltip": 0,
  "id": null,
  "links": [],
  "liveNow": false,
  "panels": [
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 0
      },
      "id": 1,
      "panels": [],
      "title": "Protocol Health",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "ops"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 1
      },
      "id": 2,
      "options": {
        "legend": {
          "calcs": [
            "mean",
            "max",
            "sum"
          ],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "sum(rate(mcp_message_received_total{type=~\"$message_type\"}[1m])) by (type)",
          "legendFormat": "{{type}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Message Rate by Type",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "fillOpacity": 80,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineWidth": 1,
            "scaleDistribution": {
              "type": "linear"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          }
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 1
      },
      "id": 3,
      "options": {
        "barRadius": 0,
        "barWidth": 0.97,
        "fullHighlight": false,
        "groupWidth": 0.7,
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "orientation": "auto",
        "showValue": "auto",
        "stacking": "none",
        "tooltip": {
          "mode": "single",
          "sort": "none"
        },
        "xTickLabelRotation": 0,
        "xTickLabelSpacing": 0
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "sum(mcp_protocol_version_count) by (version)",
          "legendFormat": "{{version}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Protocol Version Distribution",
      "type": "barchart"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 30
              },
              {
                "color": "orange",
                "value": 50
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "ops"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 24,
        "x": 0,
        "y": 9
      },
      "id": 4,
      "options": {
        "displayMode": "gradient",
        "minVizHeight": 10,
        "minVizWidth": 0,
        "orientation": "horizontal",
        "reduceOptions": {
          "calcs": [
            "mean"
          ],
          "fields": "",
          "values": false
        },
        "showUnfilled": true,
        "text": {}
      },
      "pluginVersion": "9.5.1",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "sum(rate(mcp_message_size_bucket{direction=\"received\"}[5m])) by (le)",
          "format": "heatmap",
          "legendFormat": "{{le}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Message Size Distribution",
      "type": "bargauge"
    },
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 17
      },
      "id": 5,
      "panels": [],
      "title": "Tool Performance",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "ops"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 0,
        "y": 18
      },
      "id": 6,
      "options": {
        "legend": {
          "calcs": [
            "sum",
            "mean"
          ],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "sum(rate(mcp_tool_invoked_total{tool=~\"$tool\"}[1m])) by (tool)",
          "legendFormat": "{{tool}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Tool Invocation Rate",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "red",
                "value": 80
              }
            ]
          },
          "unit": "ms"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 12,
        "y": 18
      },
      "id": 7,
      "options": {
        "legend": {
          "calcs": [
            "mean",
            "max",
            "min"
          ],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "histogram_quantile(0.95, sum(rate(mcp_tool_execution_time_bucket{tool=~\"$tool\"}[5m])) by (tool, le))",
          "legendFormat": "{{tool}} - p95",
          "range": true,
          "refId": "A"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "histogram_quantile(0.50, sum(rate(mcp_tool_execution_time_bucket{tool=~\"$tool\"}[5m])) by (tool, le))",
          "hide": false,
          "legendFormat": "{{tool}} - p50",
          "range": true,
          "refId": "B"
        }
      ],
      "title": "Tool Execution Time",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            }
          },
          "mappings": []
        },
        "overrides": []
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 0,
        "y": 27
      },
      "id": 8,
      "options": {
        "displayLabels": ["name", "percent"],
        "legend": {
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "pieType": "pie",
        "reduceOptions": {
          "calcs": [
            "sum"
          ],
          "fields": "",
          "values": false
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "sum(mcp_tool_error_total{tool=~\"$tool\"}) by (error_type)",
          "legendFormat": "{{error_type}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Tool Error Distribution",
      "type": "piechart"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "fillOpacity": 80,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineWidth": 1,
            "scaleDistribution": {
              "type": "linear"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "bytes"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 12,
        "y": 27
      },
      "id": 9,
      "options": {
        "barRadius": 0,
        "barWidth": 0.97,
        "fullHighlight": false,
        "groupWidth": 0.7,
        "legend": {
          "calcs": [],
          "displayMode": "list",
          "placement": "bottom",
          "showLegend": true
        },
        "orientation": "auto",
        "showValue": "auto",
        "stacking": "none",
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "histogram_quantile(0.95, sum(rate(mcp_tool_result_size_bucket{tool=~\"$tool\"}[5m])) by (tool, le))",
          "legendFormat": "{{tool}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Tool Result Size (p95)",
      "type": "barchart"
    },
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 36
      },
      "id": 10,
      "panels": [],
      "title": "Resource Access",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "ops"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 0,
        "y": 37
      },
      "id": 11,
      "options": {
        "legend": {
          "calcs": [
            "sum",
            "mean"
          ],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "topk(5, sum(rate(mcp_resource_accessed_total[5m])) by (resource))",
          "legendFormat": "{{resource}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Top 5 Accessed Resources",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "bytes"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 9,
        "w": 12,
        "x": 12,
        "y": 37
      },
      "id": 12,
      "options": {
        "legend": {
          "calcs": [
            "max",
            "mean"
          ],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "histogram_quantile(0.95, sum(rate(mcp_resource_size_bucket{resource=~\"$resource\"}[5m])) by (resource, le))",
          "legendFormat": "{{resource}}",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Resource Response Size (p95)",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "thresholds"
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              },
              {
                "color": "yellow",
                "value": 0.5
              },
              {
                "color": "red",
                "value": 0.8
              }
            ]
          },
          "unit": "percentunit"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 24,
        "x": 0,
        "y": 46
      },
      "id": 13,
      "options": {
        "minVizHeight": 75,
        "minVizWidth": 75,
        "orientation": "auto",
        "reduceOptions": {
          "calcs": [
            "lastNotNull"
          ],
          "fields": "",
          "values": false
        },
        "showThresholdLabels": false,
        "showThresholdMarkers": true
      },
      "pluginVersion": "9.5.1",
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "sum(rate(mcp_resource_error_total[5m])) / sum(rate(mcp_resource_accessed_total[5m]))",
          "legendFormat": "Error Rate",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Resource Access Error Rate",
      "type": "gauge"
    },
    {
      "collapsed": false,
      "gridPos": {
        "h": 1,
        "w": 24,
        "x": 0,
        "y": 54
      },
      "id": 14,
      "panels": [],
      "title": "Session Metrics",
      "type": "row"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          }
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 0,
        "y": 55
      },
      "id": 15,
      "options": {
        "legend": {
          "calcs": [
            "mean",
            "lastNotNull"
          ],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "sum(mcp_active_sessions)",
          "legendFormat": "Active Sessions",
          "range": true,
          "refId": "A"
        }
      ],
      "title": "Active Sessions",
      "type": "timeseries"
    },
    {
      "datasource": {
        "type": "prometheus",
        "uid": "${DS_PROMETHEUS}"
      },
      "fieldConfig": {
        "defaults": {
          "color": {
            "mode": "palette-classic"
          },
          "custom": {
            "axisCenteredZero": false,
            "axisColorMode": "text",
            "axisLabel": "",
            "axisPlacement": "auto",
            "barAlignment": 0,
            "drawStyle": "line",
            "fillOpacity": 10,
            "gradientMode": "none",
            "hideFrom": {
              "legend": false,
              "tooltip": false,
              "viz": false
            },
            "lineInterpolation": "linear",
            "lineWidth": 1,
            "pointSize": 5,
            "scaleDistribution": {
              "type": "linear"
            },
            "showPoints": "never",
            "spanNulls": false,
            "stacking": {
              "group": "A",
              "mode": "none"
            },
            "thresholdsStyle": {
              "mode": "off"
            }
          },
          "mappings": [],
          "thresholds": {
            "mode": "absolute",
            "steps": [
              {
                "color": "green",
                "value": null
              }
            ]
          },
          "unit": "s"
        },
        "overrides": []
      },
      "gridPos": {
        "h": 8,
        "w": 12,
        "x": 12,
        "y": 55
      },
      "id": 16,
      "options": {
        "legend": {
          "calcs": [
            "mean",
            "max"
          ],
          "displayMode": "table",
          "placement": "bottom",
          "showLegend": true
        },
        "tooltip": {
          "mode": "single",
          "sort": "none"
        }
      },
      "targets": [
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "histogram_quantile(0.95, sum(rate(mcp_session_duration_bucket[5m])) by (le))",
          "legendFormat": "p95",
          "range": true,
          "refId": "A"
        },
        {
          "datasource": {
            "type": "prometheus",
            "uid": "${DS_PROMETHEUS}"
          },
          "editorMode": "code",
          "expr": "histogram_quantile(0.50, sum(rate(mcp_session_duration_bucket[5m])) by (le))",
          "hide": false,
          "legendFormat": "p50",
          "range": true,
          "refId": "B"
        }
      ],
      "title": "Session Duration",
      "type": "timeseries"
    }
  ],
  "refresh": "10s",
  "schemaVersion": 38,
  "style": "dark",
  "tags": ["mcp", "monitoring"],
  "templating": {
    "list": [
      {
        "current": {
          "selected": false,
          "text": "All",
          "value": "$__all"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${DS_PROMETHEUS}"
        },
        "definition": "label_values(mcp_message_received_total, type)",
        "hide": 0,
        "includeAll": true,
        "label": "Message Type",
        "multi": true,
        "name": "message_type",
        "options": [],
        "query": {
          "query": "label_values(mcp_message_received_total, type)",
          "refId": "StandardVariableQuery"
        },
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "type": "query"
      },
      {
        "current": {
          "selected": false,
          "text": "All",
          "value": "$__all"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${DS_PROMETHEUS}"
        },
        "definition": "label_values(mcp_tool_invoked_total, tool)",
        "hide": 0,
        "includeAll": true,
        "label": "Tool",
        "multi": true,
        "name": "tool",
        "options": [],
        "query": {
          "query": "label_values(mcp_tool_invoked_total, tool)",
          "refId": "StandardVariableQuery"
        },
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "type": "query"
      },
      {
        "current": {
          "selected": false,
          "text": "All",
          "value": "$__all"
        },
        "datasource": {
          "type": "prometheus",
          "uid": "${DS_PROMETHEUS}"
        },
        "definition": "label_values(mcp_resource_accessed_total, resource)",
        "hide": 0,
        "includeAll": true,
        "label": "Resource",
        "multi": true,
        "name": "resource",
        "options": [],
        "query": {
          "query": "label_values(mcp_resource_accessed_total, resource)",
          "refId": "StandardVariableQuery"
        },
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "sort": 0,
        "type": "query"
      },
      {
        "current": {
          "selected": true,
          "text": "Prometheus",
          "value": "Prometheus"
        },
        "description": "Prometheus data source",
        "hide": 0,
        "includeAll": false,
        "label": "Data Source",
        "multi": false,
        "name": "DS_PROMETHEUS",
        "options": [],
        "query": "prometheus",
        "refresh": 1,
        "regex": "",
        "skipUrlSync": false,
        "type": "datasource"
      }
    ]
  },
  "time": {
    "from": "now-6h",
    "to": "now"
  },
  "timepicker": {
    "refresh_intervals": [
      "5s",
      "10s",
      "30s",
      "1m",
      "5m",
      "15m",
      "30m",
      "1h",
      "2h",
      "1d"
    ]
  },
  "timezone": "",
  "title": "MCP Server Monitoring Dashboard",
  "uid": "mcp-server-monitoring",
  "version": 1,
  "weekStart": ""
} 
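The Session Duration panel above plots p95 and p50 with `histogram_quantile()` over the `mcp_session_duration_bucket` series. As a rough illustration of what that function computes, here is a simplified JavaScript sketch of the bucket interpolation. It assumes cumulative counts sorted by upper bound (`le`), a final `+Inf` bucket, positive bounds (as with durations), and 0 < q ≤ 1. The `{ le, count }` input shape and the function name are hypothetical, and the real Prometheus implementation handles more edge cases, so treat this as an approximation.

```javascript
// Simplified sketch of Prometheus-style quantile estimation from histogram buckets.
// Assumes: buckets sorted by `le`, cumulative `count`s, last bucket is +Inf,
// positive bucket bounds, and 0 < q <= 1.
function histogramQuantile(q, buckets) {
  if (!(q > 0 && q <= 1) || buckets.length < 2) return NaN;
  const total = buckets[buckets.length - 1].count; // +Inf bucket holds every observation
  if (total === 0) return NaN;
  const rank = q * total;

  // First bucket whose cumulative count reaches the target rank.
  const i = buckets.findIndex((b) => b.count >= rank);

  // If the rank falls into the +Inf bucket, report the highest finite bound.
  if (i === buckets.length - 1) return buckets[buckets.length - 2].le;

  const upper = buckets[i].le;
  const lower = i === 0 ? 0 : buckets[i - 1].le;
  const countBelow = i === 0 ? 0 : buckets[i - 1].count;
  const countInBucket = buckets[i].count - countBelow;

  // Linear interpolation, assuming observations are spread evenly inside the bucket.
  return lower + (upper - lower) * ((rank - countBelow) / countInBucket);
}

const exampleBuckets = [
  { le: 1, count: 10 },
  { le: 5, count: 50 },
  { le: 10, count: 90 },
  { le: Infinity, count: 100 },
];
console.log(histogramQuantile(0.5, exampleBuckets)); // 5
console.log(histogramQuantile(0.95, exampleBuckets)); // 10 (rank lands in the +Inf bucket)
```

In the dashboard query, the same interpolation runs on `rate(...[5m])` values summed per `le` rather than on raw counts, which is why the panel shows recent percentiles instead of all-time ones.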

Tool-Chain Analysis and Anomaly Detection

I also aggregate tool-call sequences and parameter distributions to spot anomalous behavior patterns.
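One way to aggregate call sequences and parameter distributions is to count transitions between consecutive tools within each session, and to count the argument "shapes" (sets of argument keys) each tool receives. The sketch below assumes a hypothetical event shape `{ sessionId, tool, args }` delivered in arrival order. The function names are also hypothetical; this illustrates the idea rather than the author's implementation.

```javascript
// Counts "previousTool -> nextTool" transitions per session.
// Assumes events arrive in order and look like { sessionId, tool, args }.
function toolTransitions(events) {
  const lastTool = new Map(); // sessionId -> most recent tool in that session
  const transitions = new Map(); // "from -> to" -> count
  for (const { sessionId, tool } of events) {
    const prev = lastTool.get(sessionId);
    if (prev !== undefined) {
      const key = `${prev} -> ${tool}`;
      transitions.set(key, (transitions.get(key) ?? 0) + 1);
    }
    lastTool.set(sessionId, tool);
  }
  return transitions;
}

// Counts, per tool, how often each argument-key combination appears.
// A new or rare combination is a cheap signal of unusual usage.
function argumentShapes(events) {
  const shapes = new Map(); // tool -> Map(shape -> count)
  for (const { tool, args = {} } of events) {
    const shape = Object.keys(args).sort().join(",") || "(none)";
    if (!shapes.has(tool)) shapes.set(tool, new Map());
    const perTool = shapes.get(tool);
    perTool.set(shape, (perTool.get(shape) ?? 0) + 1);
  }
  return shapes;
}
```

Comparing these counts against a historical window makes it easy to notice, for example, a session that suddenly chains a search tool into many file reads.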

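When I detect deviations from established baselines, I feed them into a machine-learning-based detector that flags potential security or performance issues.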
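A simple way to decide what counts as a deviation is a z-score test against historical values, for example per-minute call counts for one tool. The sketch below is a swapped-in statistical filter for choosing which observations to forward, not the ML detector itself. The function names, the per-window count input, and the default threshold of 3 are assumptions.

```javascript
// Mean and population standard deviation of a baseline window.
function baseline(history) {
  const mean = history.reduce((sum, x) => sum + x, 0) / history.length;
  const variance =
    history.reduce((sum, x) => sum + (x - mean) ** 2, 0) / history.length;
  return { mean, std: Math.sqrt(variance) };
}

// True when `value` lies more than `threshold` standard deviations from the baseline mean.
function isDeviation(history, value, threshold = 3) {
  if (history.length === 0) return false; // no baseline yet, nothing to compare against
  const { mean, std } = baseline(history);
  if (std === 0) return value !== mean; // flat baseline: any change counts as a deviation
  return Math.abs(value - mean) / std > threshold;
}

const callsPerMinute = [10, 12, 11, 9, 10, 11, 10, 12, 9, 11];
console.log(isDeviation(callsPerMinute, 40)); // true
console.log(isDeviation(callsPerMinute, 11)); // false
```

Filtering with a cheap check like this keeps the heavier detector focused on the small fraction of windows that actually look unusual.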

Visualizing Metrics with Dashboards

To tie everything together, I built dashboards that show:

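  • Protocol health: message rates by type, error counts, and client versions.
  • Tool performance: invocation rates, latency percentiles, and error breakdowns.
  • Resource usage: access volume, data transfer sizes, and anomaly alerts.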
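The rate and error-breakdown panels are derived from cumulative counters. To show roughly how a per-second rate and an error ratio fall out of counter samples, here is a simplified sketch. It treats any drop in value as a counter reset (as happens after a server restart) but, unlike Prometheus's `rate()`, does not extrapolate to the edges of the window. The `{ t, value }` sample shape (with `t` in milliseconds) and the function names are hypothetical.

```javascript
// Total increase of a counter across ordered samples, handling resets.
function counterIncrease(samples) {
  let increase = 0;
  for (let i = 1; i < samples.length; i++) {
    const delta = samples[i].value - samples[i - 1].value;
    // A drop means the counter restarted from 0, so the new value is the increase.
    increase += delta >= 0 ? delta : samples[i].value;
  }
  return increase;
}

// Average per-second rate between the first and last sample (no extrapolation).
function perSecondRate(samples) {
  if (samples.length < 2) return 0;
  const seconds = (samples[samples.length - 1].t - samples[0].t) / 1000;
  return seconds > 0 ? counterIncrease(samples) / seconds : 0;
}

// Fraction of calls that failed over the same window, e.g. per tool.
function errorRatio(errorSamples, totalSamples) {
  const total = counterIncrease(totalSamples);
  return total > 0 ? counterIncrease(errorSamples) / total : 0;
}

const invocations = [
  { t: 0, value: 100 },
  { t: 30000, value: 130 },
  { t: 60000, value: 160 },
];
console.log(perSecondRate(invocations)); // 1
```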

Together, these views give me a 360-degree picture of my MCP ecosystem.

Reflections and Next Steps

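Monitoring MCP in production is an ongoing learning process. By instrumenting message flows, tool executions, and resource access, I've built an observability strategy that is both resilient and secure.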

As I add more tools and resources, I'll keep refining my metrics, alerts, and dashboards to meet new challenges.

I hope my experience helps you design a monitoring solution that keeps your MCP deployments reliable, performant, and secure.

Community

Hi @mclenhard

Do you have (Grafana) screenshots of your MCP dashboards?

I'm teaching a course on MCP dashboards/telemetry next week and would like to show some screenshots and mention https://www.mcpevals.io/features/mcp-monitoring

Thanks
David

Article author

Hey David,

Happy to share. What's the best way to reach you: LinkedIn, email, or Discord? My email is mclenhard at gmail.com, and my Discord username is mattlenhard.

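Hi, I'm working on logging for my MCP server, which exposes Oracle DB execution tools that MCP clients use to fetch data. From the MCP server side, is there a way to log the prompts end users type into the MCP client to fetch data from my database? I have no control over the MCP client.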
