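Using the MongoDB Database Profiler

What the Profiler Does

The MongoDB database profiler is a built-in tool that captures and records detailed information about database operations, including queries, writes, commands, and cursor operations. Analyzing this information helps you identify slow queries, performance bottlenecks, and optimization opportunities.

Capture slow and inefficient operations
Analyze query execution time and resource consumption
Identify queries that lack a supporting index
Monitor database load and performance trends
Inform performance optimization decisions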

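Profiler Levels

The MongoDB profiler has three levels:

Level	Description	Performance impact
0	Profiler disabled (slow operations are still written to the server log)	None
1	Records only slow operations	Low
2	Records all operations	High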
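The table can be read as a simple decision rule. The sketch below is a simplified model, not the server's code. It assumes an operation counts as slow when its duration exceeds slowms. It ignores sampleRate and the optional filter setting, both of which can further reduce what is recorded. `wouldProfile` is a hypothetical name.

```javascript
// Simplified model of which operations the profiler records at each level.
// Assumptions: "slow" means duration > slowms; sampleRate and the optional
// filter setting are ignored here.
function wouldProfile(level, slowms, millis) {
  if (level === 0) return false;           // profiler disabled
  if (level === 1) return millis > slowms; // slow operations only
  if (level === 2) return true;            // every operation
  throw new RangeError(`invalid profiling level: ${level}`);
}

const ops = [5, 120, 80, 450]; // hypothetical durations in ms
console.log(ops.filter((ms) => wouldProfile(1, 100, ms))); // [ 120, 450 ]
console.log(ops.filter((ms) => wouldProfile(2, 100, ms)).length); // 4
```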

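Profiler Configuration

1. Enabling the Profiler

From the mongo shell

javascript

// The profiling level is set per database: switch to the database you want to profile
use test

// Level 1: record only operations slower than 100 ms
// (slowms and sampleRate are instance-wide settings, not per database)
db.setProfilingLevel(1, { slowms: 100 })

// Level 2: record all operations
db.setProfilingLevel(2)

// Level 0: disable the profiler
db.setProfilingLevel(0)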

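From the configuration file

Add the following to mongod.conf. Settings made here apply to every database on the instance:

yaml

operationProfiling:
  mode: slowOp  # one of: off, slowOp, all
  slowOpThresholdMs: 100  # slow operation threshold (ms)
  slowOpSampleRate: 1.0  # fraction of slow operations to profile/log (0.0-1.0)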
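The configuration keys map directly onto the shell options:

- mode off/slowOp/all corresponds to levels 0/1/2
- slowOpThresholdMs corresponds to slowms
- slowOpSampleRate corresponds to sampleRate

The hypothetical helper below converts a parsed operationProfiling section into the arguments that db.setProfilingLevel() would take. It assumes the YAML has already been parsed into a plain object.

```javascript
// Hypothetical helper: translate an operationProfiling section (already parsed
// from mongod.conf into a plain object) into db.setProfilingLevel() arguments.
const MODE_TO_LEVEL = { off: 0, slowOp: 1, all: 2 };

function toSetProfilingLevelArgs(opProfiling) {
  const level = MODE_TO_LEVEL[opProfiling.mode];
  if (level === undefined) {
    throw new Error(`unknown mode: ${opProfiling.mode}`);
  }
  const options = {};
  if (opProfiling.slowOpThresholdMs !== undefined) {
    options.slowms = opProfiling.slowOpThresholdMs;
  }
  if (opProfiling.slowOpSampleRate !== undefined) {
    const rate = opProfiling.slowOpSampleRate;
    if (rate < 0 || rate > 1) throw new RangeError("sample rate must be between 0 and 1");
    options.sampleRate = rate;
  }
  return [level, options];
}

console.log(toSetProfilingLevelArgs({ mode: "slowOp", slowOpThresholdMs: 100, slowOpSampleRate: 1.0 }));
// [ 1, { slowms: 100, sampleRate: 1 } ]
```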

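2. Viewing the Current Configuration

javascript

// Show the profiler settings for the current database (level, slowms, sampleRate)
use test
db.getProfilingStatus()

// Show only the current profiling level
db.getProfilingLevel()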

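3. Setting the Slow Operation Threshold

javascript

// Set the slow operation threshold to 100 ms
use test
db.setProfilingLevel(1, { slowms: 100 })

4. Setting the Sample Rate

javascript

// Record only 50% of slow operations
// (sampleRate is a fraction between 0 and 1 and also applies to slow-operation logging)
use test
db.setProfilingLevel(1, { slowms: 100, sampleRate: 0.5 })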

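Profiler Data Storage

1. The system.profile Collection

Profiler data is stored in the system.profile collection of each profiled database:

It is a capped collection
Its default size is 1 MB
Its size can be customized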
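When choosing a size, it helps to estimate how many entries will fit. This sketch assumes an average entry size that you should measure on your own deployment, for example the avgObjSize value reported by db.system.profile.stats(). Storage overhead also affects real capacity, so treat the result as approximate. `estimateCapacity` is a hypothetical helper.

```javascript
// Rough estimate of how many profiler entries a capped system.profile
// collection can hold. avgEntryBytes is an assumption to be measured.
function estimateCapacity(sizeBytes, avgEntryBytes, maxDocs) {
  const bySize = Math.floor(sizeBytes / avgEntryBytes);
  // If the collection was created with a "max" document limit, the smaller bound wins.
  return maxDocs === undefined ? bySize : Math.min(bySize, maxDocs);
}

const MB = 1024 * 1024;
console.log(estimateCapacity(1 * MB, 2048));          // 512 (default 1 MB size)
console.log(estimateCapacity(128 * MB, 2048));        // 65536
console.log(estimateCapacity(128 * MB, 2048, 10000)); // 10000 (max limit applies)
```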

2. Inspecting the system.profile Collection

javascript

// Show statistics for the system.profile collection (size, count, avgObjSize, ...)
use test
db.system.profile.stats()

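3. Changing the Size of system.profile

javascript

// Disable the profiler first; system.profile cannot be dropped while profiling is active
use test
db.setProfilingLevel(0)

// Drop the existing collection
db.system.profile.drop()

// Create a new 128 MB capped collection holding at most 10,000 documents
db.createCollection(
  "system.profile",
  { capped: true, size: 128 * 1024 * 1024, max: 10000 }
)

// Re-enable the profiler
db.setProfilingLevel(1, { slowms: 100 })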

Querying Profiler Data

1. Basic Queries

javascript

// Show the 10 most recent profiler entries
use test
db.system.profile.find().sort({ ts: -1 }).limit(10)

// Same query with formatted output
db.system.profile.find().sort({ ts: -1 }).limit(10).pretty()

2. Querying by Operation Type

javascript

// All query operations
use test
db.system.profile.find({ op: "query" }).sort({ ts: -1 }).limit(10)

// All write operations (deletes are recorded with op: "remove")
db.system.profile.find({ op: { $in: ["insert", "update", "remove"] } }).sort({ ts: -1 }).limit(10)

// All command operations
db.system.profile.find({ op: "command" }).sort({ ts: -1 }).limit(10)

3. Querying by Execution Time

javascript

// Operations that took longer than 500 ms
use test
db.system.profile.find({ millis: { $gt: 500 } }).sort({ millis: -1 }).limit(10)

4. Querying by Collection

javascript

// Operations on a specific collection (ns has the form "database.collection")
use test
db.system.profile.find({ ns: "test.collection" }).sort({ ts: -1 }).limit(10)

5. Querying by Client

javascript

// Operations from a specific client (dots escaped so they match literally)
use test
db.system.profile.find({ client: /^192\.168\.1\.100/ }).sort({ ts: -1 }).limit(10)
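The criteria in this section can be combined into one filter. The hypothetical `buildProfileFilter` helper below builds such a filter document in plain JavaScript, and the result can be passed to db.system.profile.find() or to a driver's find(). It assumes the client field holds an IP address, optionally followed by a port. It escapes the address so that its dots match literally.

```javascript
// Hypothetical helper: build a system.profile filter from optional criteria.
function escapeRegExp(text) {
  // Escape regex metacharacters so the text is matched literally.
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function buildProfileFilter({ ops, minMillis, ns, clientIp } = {}) {
  const filter = {};
  if (ops && ops.length === 1) filter.op = ops[0];
  else if (ops && ops.length > 1) filter.op = { $in: ops };
  if (minMillis !== undefined) filter.millis = { $gt: minMillis };
  if (ns) filter.ns = ns;
  if (clientIp) {
    // Match the exact address, with or without a trailing ":port".
    filter.client = new RegExp(`^${escapeRegExp(clientIp)}(:\\d+)?$`);
  }
  return filter;
}

const writesFromClient = buildProfileFilter({
  ops: ["insert", "update", "remove"],
  minMillis: 500,
  clientIp: "192.168.1.100",
});
console.log(writesFromClient);
// In the shell: db.system.profile.find(writesFromClient).sort({ ts: -1 }).limit(10)
```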

Analyzing Profiler Data

1. Slow Query Analysis

javascript

// The 10 slowest queries
use test
db.system.profile.aggregate([
  { $match: { op: "query" } },
  { $sort: { millis: -1 } },
  { $limit: 10 },
  { $project: {
      ns: 1,
      millis: 1,
      command: 1,   // the find command, including its filter (MongoDB 3.6+)
      planSummary: 1,
      ts: 1
    }
  }
])
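If the profiler entries have already been fetched, the same ranking can be done in plain JavaScript. The entries might come from toArray() in a driver or from a mongoexport file. The sketch assumes each entry is a plain object that uses the system.profile field names from this guide. `slowestQueries` and the sample data are hypothetical.

```javascript
// Hypothetical helper: plain-JavaScript version of a $match/$sort/$limit/$project pipeline.
function slowestQueries(entries, n = 10) {
  return entries
    .filter((e) => e.op === "query")     // $match: { op: "query" }
    .sort((a, b) => b.millis - a.millis) // $sort: { millis: -1 } (filter() made a copy, input untouched)
    .slice(0, n)                         // $limit: n
    .map(({ ns, millis, command, planSummary, ts }) => ({ ns, millis, command, planSummary, ts })); // $project
}

const sample = [ // hypothetical profiler entries
  { op: "query", ns: "test.users", millis: 500, planSummary: "COLLSCAN", ts: new Date("2023-01-01T00:00:00Z") },
  { op: "insert", ns: "test.users", millis: 900, ts: new Date("2023-01-01T00:01:00Z") },
  { op: "query", ns: "test.orders", millis: 40, planSummary: "IXSCAN { status: 1 }", ts: new Date("2023-01-01T00:02:00Z") },
];
console.log(slowestQueries(sample, 1).map((e) => e.ns)); // [ 'test.users' ]
```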

2. Queries Without a Supporting Index

javascript

// Queries that performed a collection scan (COLLSCAN)
use test
db.system.profile.find({
  planSummary: /COLLSCAN/,
  op: "query"
}).sort({ millis: -1 }).limit(10)
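A COLLSCAN on its own does not show how wasteful a query was. A common heuristic is the ratio of docsExamined to nreturned. A value far above 1 means the server read many documents in order to return a few. The sketch below computes that ratio for collection scans in a list of profiler entries. The function name and the sample data are hypothetical.

```javascript
// Hypothetical helper: rank collection scans by documents examined per document returned.
function scanEfficiency(entries) {
  return entries
    .filter((e) => typeof e.planSummary === "string" && e.planSummary.includes("COLLSCAN"))
    .map((e) => ({
      ns: e.ns,
      millis: e.millis,
      // Divide by at least 1 so queries that returned nothing do not divide by zero.
      examinedPerReturned: (e.docsExamined ?? 0) / Math.max(e.nreturned ?? 0, 1),
    }))
    .sort((a, b) => b.examinedPerReturned - a.examinedPerReturned);
}

const profileEntries = [ // hypothetical profiler entries
  { ns: "test.users", millis: 500, planSummary: "COLLSCAN", docsExamined: 100000, nreturned: 200 },
  { ns: "test.logs", millis: 120, planSummary: "COLLSCAN", docsExamined: 50, nreturned: 0 },
  { ns: "test.orders", millis: 8, planSummary: "IXSCAN { customer_id: 1, status: 1 }", docsExamined: 3, nreturned: 3 },
];
console.log(scanEfficiency(profileEntries));
```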

3. Most Frequent Slow Queries

javascript

// Count slow queries per query shape (queryHash is recorded from MongoDB 4.2)
use test
db.system.profile.aggregate([
  { $match: { millis: { $gt: 100 } } },
  { $group: {
      _id: { queryHash: "$queryHash", ns: "$ns" },
      count: { $sum: 1 },
      avgMillis: { $avg: "$millis" },
      maxMillis: { $max: "$millis" }
    }
  },
  { $sort: { count: -1 } },
  { $limit: 10 }
])
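The same grouping can be done in memory, which is useful for exported data. This sketch groups by ns and queryHash, assuming queryHash is present as it is in MongoDB 4.2+ profiler entries. Entries without a queryHash share one bucket. It computes the count, average, and maximum that the $group stage above computes. `frequentSlowQueries` and the sample data are hypothetical.

```javascript
// Hypothetical helper: group slow entries by namespace and query shape.
function frequentSlowQueries(entries, thresholdMs = 100, limit = 10) {
  const groups = new Map();
  for (const e of entries) {
    if (!(e.millis > thresholdMs)) continue; // like { $match: { millis: { $gt: thresholdMs } } }
    const queryHash = e.queryHash ?? "(none)";
    const key = `${e.ns}|${queryHash}`;
    const g = groups.get(key) ?? { ns: e.ns, queryHash, count: 0, totalMillis: 0, maxMillis: 0 };
    g.count += 1;
    g.totalMillis += e.millis;
    g.maxMillis = Math.max(g.maxMillis, e.millis);
    groups.set(key, g);
  }
  return [...groups.values()]
    .map(({ totalMillis, ...g }) => ({ ...g, avgMillis: totalMillis / g.count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}

const slowEntries = [ // hypothetical profiler entries
  { ns: "test.users", queryHash: "A1B2C3D4", millis: 150 },
  { ns: "test.users", queryHash: "A1B2C3D4", millis: 250 },
  { ns: "test.users", queryHash: "A1B2C3D4", millis: 50 }, // below the threshold
  { ns: "test.orders", queryHash: "E5F6A7B8", millis: 300 },
];
console.log(frequentSlowQueries(slowEntries));
```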

4. Analysis by Time Window

javascript

// Slow operations in the last hour
use test
var oneHourAgo = new Date(Date.now() - 60 * 60 * 1000)
db.system.profile.find({
  ts: { $gte: oneHourAgo },
  millis: { $gt: 100 }
}).sort({ ts: -1 })
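To see when slow operations cluster, the entries in a time window can be counted per hour. This is a minimal sketch. It assumes entries carry ts as a Date or an ISO string and millis as a number, and it buckets by UTC hour. `slowOpsPerHour` and the sample timestamps are hypothetical.

```javascript
// Hypothetical helper: count slow operations per UTC hour since a given time.
function slowOpsPerHour(entries, sinceMs, thresholdMs = 100) {
  const HOUR_MS = 60 * 60 * 1000;
  const buckets = new Map();
  for (const e of entries) {
    const t = new Date(e.ts).getTime(); // accepts Date objects or ISO strings
    if (t < sinceMs || !(e.millis > thresholdMs)) continue;
    const hour = new Date(Math.floor(t / HOUR_MS) * HOUR_MS).toISOString();
    buckets.set(hour, (buckets.get(hour) ?? 0) + 1);
  }
  // ISO-8601 UTC strings sort chronologically, so a string comparison orders the hours.
  return Object.fromEntries([...buckets].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0)));
}

const windowEntries = [ // hypothetical profiler entries
  { ts: "2023-01-01T08:30:00Z", millis: 900 }, // before the window
  { ts: "2023-01-01T10:05:00Z", millis: 200 },
  { ts: "2023-01-01T10:50:00Z", millis: 150 },
  { ts: "2023-01-01T11:10:00Z", millis: 300 },
  { ts: "2023-01-01T11:20:00Z", millis: 20 },  // not slow
];
// Two slow operations in the 10:00 hour, one in the 11:00 hour.
console.log(slowOpsPerHour(windowEntries, Date.parse("2023-01-01T09:00:00Z")));
```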

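Interpreting Profiler Output

1. Output Fields

Field	Description
`ts`	Timestamp of the operation
`op`	Operation type (query, insert, update, remove, command, getmore)
`ns`	Namespace (database.collection)
`millis`	Execution time (milliseconds)
`query`	Query document (older server versions only; since MongoDB 3.6 queries are recorded in `command`)
`command`	Full command document, e.g. find with its filter or aggregate with its pipeline
`updateobj`	Update document (older server versions only; newer versions record updates in `command`)
`keysExamined`	Number of index keys scanned
`docsExamined`	Number of documents scanned
`nreturned`	Number of documents returned
`queryHash`	Hash of the query shape (MongoDB 4.2+)
`planSummary`	Query plan summary, e.g. COLLSCAN or IXSCAN { age: 1 }
`execStats`	Execution statistics
`client`	IP address or hostname of the client
`user`	Authenticated user who ran the operation
`shard`	Shard information (sharded clusters only)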

2. Sample Output

javascript

{
  "ts": ISODate("2023-01-01T00:00:00.000Z"),
  "op": "query",
  "ns": "test.users",
  "command": {
    "find": "users",
    "filter": { "age": { "$gt": 30 } },
    "sort": { "created_at": -1 },
    "$db": "test"
  },
  "planSummary": "COLLSCAN",
  "millis": 500,
  "client": "127.0.0.1",
  "user": "test_user"
}

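Profiler Best Practices

1. Production Configuration

Use level 1 so that only slow operations are recorded
Set a slow operation threshold that matches your latency requirements
Size the system.profile collection so it does not consume excessive disk space
Consider a sample rate below 1.0 to reduce overhead

2. Minimizing Performance Impact

Enable the profiler only while you need to analyze
Avoid level 2 during peak load
Because system.profile is capped, old entries are overwritten automatically; size it for the history you need instead of cleaning it manually
Consider using the server log for slow queries instead of the profiler (slow operations are logged even at level 0)

3. Combining with Other Tools

Use explain() to analyze query execution plans
Use MongoDB Atlas or other monitoring tools to analyze slow queries
Use log files to analyze long-term performance trends
Use db.currentOp() to inspect operations that are currently running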

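Slow Query Log

1. Enabling the Slow Query Log

Slow operations are written to the server log whenever they exceed the slow operation threshold, regardless of the profiling level. Configure the log destination and the threshold in mongod.conf:

yaml

systemLog:
  destination: file
  path: /var/log/mongodb/mongod.log
  logAppend: true
  logRotate: reopen
operationProfiling:
  mode: off  # the profiler stays off; slow operations are still logged
  slowOpThresholdMs: 100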

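2. Viewing the Slow Query Log

bash

# Follow slow query entries in real time (MongoDB 4.4+ log entries contain "Slow query")
tail -f /var/log/mongodb/mongod.log | grep -i "slow query"

# Search for slow queries on a given date
grep -i "slow query" /var/log/mongodb/mongod.log | grep "2023-01-01"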

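3. Slow Query Log Format

In versions before MongoDB 4.4, each slow operation is logged as a single text line like the following. From 4.4 on, the server writes structured JSON log entries with "msg":"Slow query" instead.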

2023-01-01T00:00:00.000+0000 I COMMAND  [conn12345] command test.users command: find { find: "users", filter: { age: { $gt: 30 } }, sort: { created_at: -1 }, $db: "test" } planSummary: COLLSCAN cursorid:1234567890 keysExamined:0 docsExamined:100000 hasSortStage:1 reslen:20000 locks:{ Global: { acquireCount: { r: 2 } }, Database: { acquireCount: { r: 1 } }, Collection: { acquireCount: { r: 1 } } } protocol:op_msg 500ms
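For MongoDB 4.4+ JSON logs, grep is only a first pass, and parsing each line makes the individual fields usable. The sketch below assumes the structured log layout in which a slow operation has "msg": "Slow query" and carries its attributes (ns, planSummary, durationMillis) under attr. Verify these names against your own log before relying on them. Text-format lines like the one above are skipped. `parseSlowQueryLines` is a hypothetical name.

```javascript
// Sketch: extract slow-operation records from MongoDB 4.4+ structured (JSON) log text.
// Assumes one JSON object per line, with msg "Slow query" and details under attr.
function parseSlowQueryLines(text) {
  const results = [];
  for (const line of text.split("\n")) {
    if (!line.trim().startsWith("{")) continue; // not a JSON log line (e.g. pre-4.4 text format)
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // truncated or malformed line
    }
    if (entry.msg !== "Slow query" || !entry.attr) continue;
    results.push({
      time: entry.t && entry.t.$date,
      ns: entry.attr.ns,
      planSummary: entry.attr.planSummary,
      durationMillis: entry.attr.durationMillis,
    });
  }
  return results;
}
```

Usage might look like `parseSlowQueryLines(require("fs").readFileSync("/var/log/mongodb/mongod.log", "utf8"))`. For large log files, reading line by line with Node's readline module avoids loading the whole file into memory.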

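Profiler and Slow Query Monitoring Tools

1. MongoDB Atlas

Built-in slow query analysis (Query Profiler)
Visual slow query dashboards
Automatic index recommendations (Performance Advisor)
Performance trend analysis

2. MongoDB Compass

Graphical profiler interface
Interactive query plan analysis
Slow query visualization
Index suggestions

3. Third-Party Tools

Prometheus + Grafana: collect slow query metrics with MongoDB Exporter
Datadog: automatic discovery and analysis of slow queries
New Relic: slow query tracing and analysis
Percona Monitoring and Management (PMM): open-source monitoring and analysis tool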

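Common Performance Problems and Fixes

1. Missing Indexes

Problem: the query performs a COLLSCAN (full collection scan)

Solution:

Create indexes on the fields used in query filters and sorts
Analyze query patterns and create suitable compound indexes
Avoid over-indexing

2. Inefficient Indexes

Problem: indexes are not used effectively

Solution:

Check index selectivity
Optimize the field order of compound indexes
Drop unused indexes

3. Long-Running Queries

Problem: queries take too long to execute

Solution:

Refine query filters and reduce the size of the result set
Use appropriate indexes
Consider paginating results
Optimize the data model

4. Heavy Write Load

Problem: write operations degrade query performance

Solution:

Batch write operations
Optimize write patterns
Consider sharding to distribute the write load
Tune the WiredTiger cache size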

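Alternatives to the Profiler

1. Log File Analysis

Rely on the slow query log and process it with log analysis tools
Low overhead, suitable for long-term monitoring
Can be analyzed with tools such as the ELK Stack or Splunk

2. Real-Time Monitoring Tools

Use tools such as MongoDB Atlas or Datadog for real-time monitoring
They provide visual dashboards and alerting
No profiler is needed, so the overhead is low

3. Sampling-Based Analysis

Enable the profiler periodically to take samples
Combine it with the slow query log and real-time monitoring
Balance performance impact against analysis needs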

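Version Differences

MongoDB 3.6+ Features

Introduced the sampleRate parameter (slowOpSampleRate in the configuration file) for sampling slow operations
Profiler entries record the full command document in the command field

MongoDB 4.2+ Features

Profiler entries and slow query log messages include queryHash and planCacheKey, which identify the query shape and its plan cache entry

MongoDB 4.4+ Features

The server log uses a structured JSON format; slow operations appear with "msg":"Slow query"
db.setProfilingLevel() accepts a filter option (from 4.4.2) that selects which operations are profiled and logged

Note that system.profile is not replicated: each replica set member records only the operations it executes itself.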

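Frequently Asked Questions (FAQ)

Q1: Does the profiler affect database performance?

A1: Yes, the profiler has some impact on database performance:

Level 0: no impact
Level 1: low impact (records only slow operations)
Level 2: high impact (records all operations)

In production, use level 1 with a sensible slow operation threshold.

Q2: How do I choose a suitable slow operation threshold?

A2: To choose a threshold:

Start from business requirements and user-experience targets
Analyze historical query performance data
Account for differences between collections and operation types
Start at 100-500 ms and adjust gradually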
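One way to use historical data is to take a high percentile of normal operation durations as the starting threshold. The sketch uses the nearest-rank percentile. The sample durations are made up for illustration; in practice they could come from a short level-2 session or from an export of system.profile. `percentile` is a hypothetical helper.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of the
// samples are less than or equal to it.
function percentile(values, p) {
  if (values.length === 0) throw new RangeError("no samples");
  if (p < 0 || p > 100) throw new RangeError("p must be between 0 and 100");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100)); // 1-based rank
  return sorted[rank - 1];
}

// Hypothetical durations (ms) observed during normal traffic.
const durations = [40, 12, 600, 22, 15, 35, 150, 20, 80, 30];
console.log(percentile(durations, 90)); // 150
console.log(percentile(durations, 50)); // 30
```

With these numbers, a slowms of 150 would record only the 600 ms outlier, about the slowest 10% of this sample.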

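Q3: How do I clear the system.profile collection?

A3: Disable the profiler, then drop and recreate the collection:

javascript

// Drop and recreate system.profile (16 MB here; the default size is 1 MB)
use test
db.setProfilingLevel(0)
db.system.profile.drop()
db.createCollection("system.profile", { capped: true, size: 16 * 1024 * 1024 })
// Re-enable profiling at the desired level
db.setProfilingLevel(1, { slowms: 100 })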

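Q4: How does the profiler differ from the slow query log?

A4: The differences are:

Feature	Profiler	Slow query log
Storage location	system.profile collection	Log file
Query flexibility	High (MongoDB query syntax)	Low (requires log analysis tools)
Performance impact	Medium to high	Low
Data retention	Limited by collection size	Limited by log rotation policy
Real-time availability	High	Medium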

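Q5: How do I use the profiler in a sharded cluster?

A5: In a sharded cluster:

Enable the profiler separately on each shard (mongos has no system.profile collection)
Use MongoDB Atlas or other monitoring tools to aggregate the analysis
Keep in mind the query routing overhead of a sharded cluster
Consider monitoring slow queries at the mongos level, which writes slow operations to its own log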

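Q6: How do I analyze performance problems with long-running transactions?

A6: To analyze long-running transactions:

javascript

// Find transactions that have been open for more than 1 second
// (uses the $currentOp aggregation stage rather than system.profile)
db.getSiblingDB("admin").aggregate([
  { $currentOp: { allUsers: true, idleSessions: true } },
  { $match: { "transaction.timeOpenMicros": { $gt: 1000 * 1000 } } },
  { $sort: { "transaction.timeOpenMicros": -1 } },
  { $limit: 10 }
])

Check which operations the transaction contains
Analyze how long the transaction holds locks
Consider splitting long transactions into several shorter ones
Optimize the queries and indexes used inside the transaction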

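Q7: How do I use the profiler to optimize aggregation queries?

A7: To optimize aggregations:

javascript

// Find slow aggregate commands
use test
db.system.profile.find({
  op: "command",
  "command.aggregate": { $exists: true },
  millis: { $gt: 500 }
}).sort({ millis: -1 }).limit(10)

Analyze the pipeline with explain("executionStats")
Make sure the early pipeline stages can use appropriate indexes
Place $match and $sort as early as possible to filter data sooner
Optimize the order of pipeline stages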
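The advice about stage order can be checked mechanically against slow aggregations found in the profiler. The hypothetical helper below flags aggregate commands whose first pipeline stage is neither $match nor $sort. It only inspects the pipeline as submitted. The server's pipeline optimizer can move some $match stages earlier on its own, so a hit is a prompt to run explain(), not proof of a problem.

```javascript
// Hypothetical helper: flag aggregate commands whose pipeline does not start with $match or $sort.
function aggregatesWithoutEarlyFilter(entries) {
  const flagged = [];
  for (const e of entries) {
    const cmd = e.command;
    if (!cmd || cmd.aggregate === undefined) continue; // not an aggregate command
    const pipeline = Array.isArray(cmd.pipeline) ? cmd.pipeline : [];
    const firstStage = pipeline.length > 0 ? Object.keys(pipeline[0])[0] : undefined;
    if (firstStage !== "$match" && firstStage !== "$sort") {
      flagged.push({ ns: e.ns, millis: e.millis, firstStage: firstStage ?? "(empty pipeline)" });
    }
  }
  return flagged;
}

const aggEntries = [ // hypothetical profiler entries
  { ns: "test.orders", millis: 800, command: { aggregate: "orders", pipeline: [
    { $group: { _id: "$status", n: { $sum: 1 } } },
    { $match: { n: { $gt: 10 } } },
  ] } },
  { ns: "test.orders", millis: 600, command: { aggregate: "orders", pipeline: [
    { $match: { status: "completed" } },
    { $group: { _id: "$customer_id", total: { $sum: "$amount" } } },
  ] } },
  { ns: "test.users", millis: 700, command: { find: "users", filter: {} } },
];
console.log(aggregatesWithoutEarlyFilter(aggEntries));
```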

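Q8: How do I monitor the profiler's own performance impact?

A8: To monitor the profiler's impact:

Monitor database CPU and memory usage
Monitor disk I/O load
Compare performance with the profiler enabled and disabled
Check the size of the system.profile collection regularly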

Q9: How do I export profiler data?

A9: To export profiler data:

bash

# Export the system.profile collection with mongodump
mongodump --db test --collection system.profile --out /backup/profiler

# Export to JSON with mongoexport
mongoexport --db test --collection system.profile --out profiler.json

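Q10: How do I automate performance analysis and optimization?

A10: To automate analysis and optimization:

Use the automatic index recommendations in MongoDB Atlas or other monitoring tools
Write scripts that analyze profiler data on a schedule
Set up slow query alerts to catch performance problems early
Build an automated performance optimization workflow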

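Performance Optimization Case Studies

Case 1: Slow Query Caused by a Missing Index

Problem: the query db.users.find({ age: { $gt: 30 } }).sort({ created_at: -1 }) takes too long

Analysis: the profiler shows the query used a COLLSCAN and took 500 ms

Solution: create the compound index db.users.createIndex({ age: 1, created_at: -1 })

Result: execution time dropped from 500 ms to 10 ms

Case 2: Slow Query Caused by an Inefficient Index

Problem: the query db.orders.find({ status: "completed", customer_id: ObjectId(...) }) takes too long

Analysis: the profiler shows the query used the index { status: 1 }, but the status field has low selectivity

Solution: create the compound index db.orders.createIndex({ customer_id: 1, status: 1 })

Result: execution time dropped from 200 ms to 5 ms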
