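Neo4j Prometheus Metrics Mapping

Integration Configuration

Enabling Prometheus Metrics Export

Neo4j 4.x and later can expose metrics through a built-in Prometheus endpoint. The endpoint is disabled by default; enable and tune it with the following settings: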

txt

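# Enable the Prometheus endpoint (Neo4j 5.x setting names;
# on 4.x drop the "server." prefix, e.g. metrics.prometheus.enabled)
server.metrics.prometheus.enabled=true

# Address and port the endpoint listens on
server.metrics.prometheus.endpoint=0.0.0.0:2004

# Metrics filter: export only the metrics that match these patterns
server.metrics.filter=neo4j.*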

Prometheus Configuration

Add the Neo4j target to the Prometheus configuration file:

yaml

scrape_configs:
  - job_name: 'neo4j'
    static_configs:
      - targets: ['neo4j-server:2004']
    scrape_interval: 15s
    scrape_timeout: 10s

Verifying Metrics Export

Use curl to confirm that the metrics endpoint responds:

bash

curl http://neo4j-server:2004/metrics
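
The endpoint answers in the Prometheus text exposition format, so the response can also be checked programmatically. The following is a minimal sketch of a parser for that format; the sample body is hypothetical and only reuses metric names from the tables in this document, and the parser deliberately ignores anything beyond plain `name{labels} value` lines.

```python
import re

# Matches one sample line of the Prometheus text format:
#   metric_name{label="value",...} 12.5
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simple parser does not understand
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Hypothetical response body, as it might be returned by the curl command above.
SAMPLE_BODY = """\
# HELP neo4j_transaction_active_total Active transactions
# TYPE neo4j_transaction_active_total gauge
neo4j_transaction_active_total{type="read"} 3
neo4j_transaction_active_total{type="write"} 1
neo4j_page_cache_hit_ratio 0.97
"""

for name, labels, value in parse_metrics(SAMPLE_BODY):
    print(name, labels, value)
```

An empty result from `parse_metrics` on a real response would suggest that the endpoint is reachable but the metrics filter excludes everything.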

Core Metrics Mapping

Core Database Metrics

Neo4j metric	Prometheus metric	Description	Labels
dbms.neo4j.version	neo4j_version	Neo4j version	version, edition
dbms.mode	neo4j_mode	Database operating mode	mode
dbms.neo4j.start_time	neo4j_start_time_seconds	Database start time
dbms.transaction.active_read	neo4j_transaction_active_total	Active read transactions	type="read"
dbms.transaction.active_write	neo4j_transaction_active_total	Active write transactions	type="write"
dbms.transaction.committed_read	neo4j_transaction_committed_total	Committed read transactions	type="read"
dbms.transaction.committed_write	neo4j_transaction_committed_total	Committed write transactions	type="write"
dbms.transaction.rollbacks_read	neo4j_transaction_rollbacks_total	Rolled-back read transactions	type="read"
dbms.transaction.rollbacks_write	neo4j_transaction_rollbacks_total	Rolled-back write transactions	type="write"
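
The transaction rows above follow one pattern: the `_read` or `_write` suffix of the Neo4j name becomes a `type` label, and the remainder becomes a `neo4j_transaction_..._total` metric. A small sketch of that pattern is shown below; `map_transaction_metric` is a hypothetical helper written for illustration and is not part of Neo4j or Prometheus.

```python
# Suffixes that the table folds into a "type" label.
TYPE_SUFFIXES = ("_read", "_write")


def map_transaction_metric(neo4j_name):
    """Map e.g. 'dbms.transaction.committed_read' to a (name, labels) pair.

    Hypothetical helper: it only mirrors the naming pattern of the
    transaction rows in the table.
    """
    prefix = "dbms.transaction."
    if not neo4j_name.startswith(prefix):
        raise ValueError(f"not a transaction metric: {neo4j_name}")
    stem = neo4j_name[len(prefix):]
    for suffix in TYPE_SUFFIXES:
        if stem.endswith(suffix):
            kind = stem[: -len(suffix)]      # e.g. "committed"
            tx_type = suffix.lstrip("_")     # "read" or "write"
            return f"neo4j_transaction_{kind}_total", {"type": tx_type}
    raise ValueError(f"no read/write suffix: {neo4j_name}")


print(map_transaction_metric("dbms.transaction.committed_read"))
print(map_transaction_metric("dbms.transaction.rollbacks_write"))
```

Folding read and write into one metric with a label is what makes queries such as `sum(neo4j_transaction_active_total)` cover both transaction types at once.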

Query Execution Metrics

Neo4j metric	Prometheus metric	Description	Labels
dbms.query.active	neo4j_query_active_total	Active queries
dbms.query.execution_time_median	neo4j_query_execution_time_seconds	Median query execution time	quantile="0.5"
dbms.query.execution_time_95th_percentile	neo4j_query_execution_time_seconds	95th-percentile query execution time	quantile="0.95"
dbms.query.execution_time_99th_percentile	neo4j_query_execution_time_seconds	99th-percentile query execution time	quantile="0.99"
dbms.query.rejected	neo4j_query_rejected_total	Rejected queries

Storage Metrics

Neo4j metric	Prometheus metric	Description
store.size	neo4j_store_size_bytes	Store size
store.nodes	neo4j_store_nodes_total	Number of nodes
store.relationships	neo4j_store_relationships_total	Number of relationships
store.properties	neo4j_store_properties_total	Number of properties
transaction_logs.total_size	neo4j_transaction_logs_size_bytes	Total transaction log size
page_cache.hit_ratio	neo4j_page_cache_hit_ratio	Page cache hit ratio
page_cache.misses	neo4j_page_cache_misses_total	Page cache misses
page_cache.hits	neo4j_page_cache_hits_total	Page cache hits

JVM Metrics

Neo4j metric	Prometheus metric	Description	Labels
jvm.memory.heap.used	jvm_memory_used_bytes	JVM heap memory used	area="heap"
jvm.memory.heap.committed	jvm_memory_committed_bytes	JVM heap memory committed	area="heap"
jvm.memory.heap.max	jvm_memory_max_bytes	JVM heap memory maximum	area="heap"
jvm.memory.nonheap.used	jvm_memory_used_bytes	JVM non-heap memory used	area="nonheap"
jvm.gc.pause	jvm_gc_pause_seconds	GC pause time	action, cause
jvm.gc.collection_count	jvm_gc_collection_count_total	GC collection count	collector
jvm.gc.collection_time	jvm_gc_collection_seconds_total	Total GC collection time	collector
jvm.threads.count	jvm_threads_count	JVM thread count	state

Network Metrics

Neo4j metric	Prometheus metric	Description	Labels
network.connections.active	neo4j_network_connections_active_total	Active network connections	protocol
network.connections.created	neo4j_network_connections_created_total	Network connections created	protocol
network.connections.closed	neo4j_network_connections_closed_total	Network connections closed	protocol
network.received_bytes	neo4j_network_bytes_total	Bytes received	direction="received"
network.sent_bytes	neo4j_network_bytes_total	Bytes sent	direction="sent"

Cluster Metrics

Neo4j metric	Prometheus metric	Description	Labels
causal_clustering.members.healthy	neo4j_causal_clustering_members_healthy_total	Healthy cluster members	role
causal_clustering.members.total	neo4j_causal_clustering_members_total	Total cluster members
causal_clustering.replication.lag	neo4j_causal_clustering_replication_lag_seconds	Replication lag
causal_clustering.leader_elections	neo4j_causal_clustering_leader_elections_total	Leader elections

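Example Metric Queries

Database Status Queries

txt

# Check the Neo4j version and operating mode
neo4j_version
neo4j_mode

# Database uptime in seconds (current time minus start time)
time() - neo4j_start_time_seconds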
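
The uptime expression returns a plain number of seconds. If a dashboard or script needs a readable duration, the conversion is simple arithmetic. The sketch below assumes the value has already been fetched from Prometheus; `format_uptime` is a hypothetical helper name.

```python
def format_uptime(seconds):
    """Render a duration in seconds as 'Nd HH:MM:SS'."""
    total = int(seconds)
    days, rest = divmod(total, 86_400)   # 86,400 seconds per day
    hours, rest = divmod(rest, 3_600)
    minutes, secs = divmod(rest, 60)
    return f"{days}d {hours:02d}:{minutes:02d}:{secs:02d}"


# 93,784 s = 1 day + 2 hours + 3 minutes + 4 seconds
print(format_uptime(93_784))
```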

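Transaction Performance Queries

txt

# Active transactions (read and write combined)
sum(neo4j_transaction_active_total)

# Write transactions committed per second
rate(neo4j_transaction_committed_total{type="write"}[5m])

# Transaction rollback ratio (rollbacks per committed transaction)
rate(neo4j_transaction_rollbacks_total[5m]) / rate(neo4j_transaction_committed_total[5m])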
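
To make the arithmetic behind these expressions concrete, the sketch below computes a per-second rate from two counter samples and then the rollback ratio. It is a simplification: the real `rate()` function also handles counter resets and extrapolates to the window boundaries, and the sample values here are invented.

```python
def per_second_rate(first, last):
    """Average per-second increase between two (timestamp, value) samples.

    Simplified view of rate() for a counter: no reset handling and
    no extrapolation to the edges of the range window.
    """
    (t0, v0), (t1, v1) = first, last
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    return (v1 - v0) / (t1 - t0)


# Hypothetical counter samples taken 300 s (5 minutes) apart.
committed = per_second_rate((0, 12_000), (300, 13_500))  # 1500 commits / 300 s
rollbacks = per_second_rate((0, 40), (300, 55))          # 15 rollbacks / 300 s

rollback_ratio = rollbacks / committed
print(committed, rollbacks, rollback_ratio)
```

With these numbers the database commits 5 transactions per second and rolls back roughly one transaction for every hundred committed.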

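Storage Performance Queries

txt

# Page cache hit ratio
neo4j_page_cache_hit_ratio

# Page cache miss ratio
1 - neo4j_page_cache_hit_ratio

# Store size growth in bytes per second over 24 hours (deriv, because store size is a gauge)
deriv(neo4j_store_size_bytes[24h])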
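
The hit and miss counters from the storage table can be used to cross-check the exported ratio. The sketch below assumes the ratio is defined as hits divided by total page accesses, which is the usual definition but is not confirmed by this document; the counter values are invented.

```python
def hit_ratio(hits, misses):
    """Return hits / (hits + misses), or None when there were no accesses."""
    total = hits + misses
    if total == 0:
        return None  # ratio is undefined without any page accesses
    return hits / total


hits, misses = 9_700, 300   # hypothetical counter increases over one window
ratio = hit_ratio(hits, misses)
print(f"hit ratio:  {ratio:.2%}")
print(f"miss ratio: {1 - ratio:.2%}")
```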

JVM Performance Queries

txt

# JVM heap usage (percent)
(jvm_memory_used_bytes{area="heap"} / jvm_memory_max_bytes{area="heap"}) * 100

# Time spent in GC pauses (seconds of pause per second)
rate(jvm_gc_pause_seconds_sum[5m])

# GC collections per second
rate(jvm_gc_collection_count_total[5m])

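Query Performance Queries

txt

# Active queries
sum(neo4j_query_active_total)

# Slow query ratio (queries that ran longer than 1 second)
rate(neo4j_query_slow_total[5m]) / rate(neo4j_query_committed_total[5m])

# 95th-percentile query execution time (a quantile is a gauge, so graph it directly)
neo4j_query_execution_time_seconds{quantile="0.95"}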

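Grafana Dashboard Configuration

Importing the Official Dashboards

Neo4j provides official Grafana dashboards that can be imported from the Grafana dashboard library:

Open Grafana, click the "+" button in the left-hand menu, and select "Import"
Enter the dashboard ID: 6753 (Neo4j Overview) or 6754 (Neo4j Cluster Overview)
Select the Prometheus data source
Click "Import" to finish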

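Custom Dashboards

When building a custom Grafana dashboard, consider including the following panels:

Database overview: database version, operating mode, start time
Transaction monitoring: active transactions, transactions per second, rollback ratio
Query performance: active queries, query execution time, slow query ratio
Storage usage: store size, page cache hit ratio, transaction log size
JVM monitoring: heap usage, GC pause time, thread count
Network monitoring: connection counts, throughput
Cluster status: cluster member count, replication lag, leader elections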

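Alert Configuration

Configure alert rules in Grafana, for example:

Warn when JVM heap usage exceeds 85%
Warn when the page cache hit ratio falls below 90%
Warn when the number of active queries exceeds 100
Warn when replication lag exceeds 5 seconds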
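
The four example rules above amount to simple threshold comparisons. The sketch below evaluates them against one set of readings, which can be handy for testing thresholds before creating the rules in Grafana. The function and the warning names are hypothetical; only the threshold values come from the list.

```python
# Thresholds taken from the example alert rules.
HEAP_USAGE_MAX_PERCENT = 85.0
PAGE_CACHE_HIT_RATIO_MIN = 0.90
ACTIVE_QUERIES_MAX = 100
REPLICATION_LAG_MAX_SECONDS = 5.0


def evaluate_alerts(heap_used, heap_max, hit_ratio, active_queries, lag_seconds):
    """Return the list of warnings that would fire for one set of readings."""
    warnings = []
    if heap_max > 0 and heap_used / heap_max * 100 > HEAP_USAGE_MAX_PERCENT:
        warnings.append("jvm_heap_usage_high")
    if hit_ratio < PAGE_CACHE_HIT_RATIO_MIN:
        warnings.append("page_cache_hit_ratio_low")
    if active_queries > ACTIVE_QUERIES_MAX:
        warnings.append("active_queries_high")
    if lag_seconds > REPLICATION_LAG_MAX_SECONDS:
        warnings.append("replication_lag_high")
    return warnings


# Hypothetical readings: heap at 90% of an 8 GB maximum, everything else healthy.
print(evaluate_alerts(heap_used=7.2e9, heap_max=8e9, hit_ratio=0.97,
                      active_queries=12, lag_seconds=0.4))
```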

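Best Practices

Scrape Interval

For production environments, a scrape interval of 15 to 30 seconds is recommended
For development environments, a longer interval reduces resource consumption

Metric Storage Strategy

Set a metric retention period that matches your monitoring needs
Use Prometheus federation to handle large-scale monitoring data
Consider Thanos or Cortex for long-term metric storage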
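
Retention and scrape interval together determine how much disk Prometheus needs. The Prometheus storage documentation gives a rough sizing rule: retention time in seconds, multiplied by ingested samples per second, multiplied by bytes per sample (around 1 to 2 bytes on average). The sketch below applies that rule; the series counts are invented, and 2 bytes per sample is used as a conservative assumption.

```python
def estimate_disk_bytes(series_count, scrape_interval_s, retention_days,
                        bytes_per_sample=2.0):
    """Rough disk estimate: retention_seconds * samples_per_second * bytes_per_sample."""
    samples_per_second = series_count / scrape_interval_s
    retention_seconds = retention_days * 86_400
    return retention_seconds * samples_per_second * bytes_per_sample


# Hypothetical setup: 3 Neo4j instances exposing about 500 series each,
# scraped every 15 s and retained for 30 days.
estimate = estimate_disk_bytes(series_count=1_500, scrape_interval_s=15,
                               retention_days=30)
print(f"{estimate / 1e9:.2f} GB")
```

Doubling the scrape interval halves the estimate, which is one reason a longer interval is reasonable outside production.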

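Metric Query Optimization

Avoid high-cardinality labels to keep the number of time series low
Use PromQL aggregation functions to reduce the amount of data returned
Create Prometheus recording rules for frequently used queries

Alerting Strategy

Define clear alert severity levels (warning, error, critical)
Set reasonable alert thresholds to avoid false positives
Configure alert notification channels (email, Slack, PagerDuty, and so on)
Review and adjust alert rules regularly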

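Frequently Asked Questions (FAQ)

Q1: How do I export only specific Neo4j metrics?

A1: Use the metrics filter setting (server.metrics.filter in Neo4j 5.x, metrics.filter in 4.x) to select the metrics to export. The filter is a comma-separated list of globbing patterns. For example, to export only storage-related metrics:

txt

server.metrics.filter=*store*,*page_cache*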

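Q2: How does the Prometheus metrics mapping differ between Neo4j 4.x and 5.x?

A2: Neo4j 5.x revised its Prometheus metrics. The main differences are:

Metric names follow Prometheus naming conventions more closely
More fine-grained labels were added
The way some metrics are calculated was adjusted
Some metrics specific to 5.x were added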

Q3: How do I monitor multiple Neo4j instances?

A3: Add multiple targets to the Prometheus configuration file and use labels to tell the instances apart:

yaml

scrape_configs:
  - job_name: 'neo4j'
    static_configs:
      - targets: ['neo4j-server-1:2004']
        labels:
          instance: 'neo4j-1'
      - targets: ['neo4j-server-2:2004']
        labels:
          instance: 'neo4j-2'
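
When the list of instances changes often, one alternative to editing static_configs by hand is Prometheus file-based service discovery (file_sd_configs), which reads target groups from JSON or YAML files. The sketch below generates such target groups in JSON; the helper name and the host names are hypothetical, and the port is the one used throughout this document.

```python
import json


def build_file_sd(instances, port=2004):
    """Build target groups for Prometheus file-based service discovery.

    `instances` maps a display name (used as the `instance` label)
    to a host name.
    """
    return [
        {"targets": [f"{host}:{port}"], "labels": {"instance": name}}
        for name, host in sorted(instances.items())
    ]


groups = build_file_sd({
    "neo4j-1": "neo4j-server-1",
    "neo4j-2": "neo4j-server-2",
})
print(json.dumps(groups, indent=2))
```

The printed JSON would be saved to a file that a `file_sd_configs` entry in the scrape job points at; Prometheus then picks up changes to that file without a restart.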

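Q4: Does Prometheus metrics export affect Neo4j performance?

A4: The performance impact of metrics export is small, but under heavy load it is advisable to:

Lengthen the scrape interval so that metrics are collected less often
Export only the metrics you need
Monitor the overhead of metrics export itself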

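Q5: How do I debug problems with Prometheus metrics export?

A5: Debugging steps include:

Check the Neo4j logs for metrics-related messages
Query the metrics endpoint directly with curl
Check the Prometheus logs for scrape errors
Make sure the firewall allows Prometheus to reach the Neo4j metrics port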

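Q6: How do I customize Prometheus metric names?

A6: Neo4j does not support custom metric names directly, but a Prometheus recording rule can store the result of an expression as a new series under a name of your choice: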

yaml

groups:
- name: neo4j.rules
  rules:
  - record: custom_neo4j_transaction_count
    expr: sum(neo4j_transaction_active_total) by (instance)

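Q7: How do I monitor metrics specific to Neo4j Enterprise Edition?

A7: Make sure you are running Neo4j Enterprise Edition, and include the Enterprise-specific metric prefixes, such as causal_clustering.* and enterprise.*, in the metrics filter.

Q8: How do I export Neo4j slow-query metrics to Prometheus?

A8: Enable the slow query log and analyze it with a log pipeline such as Promtail + Loki, or expose slow-query metrics through a custom exporter.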

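Q9: How do I monitor Neo4j backup and restore operations?

A9: Backup and restore operations can be monitored in the following ways:

Monitor CPU, memory, and disk usage during the backup
Check the backup logs for key indicators
Use a custom script to export the backup status to Prometheus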
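
For the custom-script option, one common approach is to write the backup status in the Prometheus text exposition format and let the node_exporter textfile collector (or a Pushgateway) pick it up. The sketch below only renders the text; the metric names are hypothetical and do not come from Neo4j, and writing the result to a `.prom` file in the collector directory is left to the calling script.

```python
def render_backup_metrics(success, finished_at, duration_seconds):
    """Render backup status as Prometheus text exposition format.

    The metric names are illustrative, not Neo4j built-ins.
    """
    lines = [
        "# HELP neo4j_backup_last_success 1 if the last backup succeeded, else 0.",
        "# TYPE neo4j_backup_last_success gauge",
        f"neo4j_backup_last_success {1 if success else 0}",
        "# HELP neo4j_backup_last_finished_timestamp_seconds Unix time the last backup finished.",
        "# TYPE neo4j_backup_last_finished_timestamp_seconds gauge",
        f"neo4j_backup_last_finished_timestamp_seconds {int(finished_at)}",
        "# HELP neo4j_backup_last_duration_seconds Duration of the last backup.",
        "# TYPE neo4j_backup_last_duration_seconds gauge",
        f"neo4j_backup_last_duration_seconds {duration_seconds:.3f}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical result of one backup run.
text = render_backup_metrics(success=True, finished_at=1_700_000_000,
                             duration_seconds=42.5)
print(text, end="")
```

An alert on `time() - neo4j_backup_last_finished_timestamp_seconds` would then flag backups that have not run for too long.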

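Q10: How do I use Prometheus to monitor the health of a Neo4j cluster?

A10: The key metrics for cluster health include:

Healthy cluster members (neo4j_causal_clustering_members_healthy_total)
Replication lag (neo4j_causal_clustering_replication_lag_seconds)
Leader elections (neo4j_causal_clustering_leader_elections_total)
Changes in cluster member status

Use these metrics to build Grafana dashboards and alert rules that track cluster health.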
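
As an illustration of how these metrics could be combined into a single status, the sketch below classifies a cluster from the healthy-member count, the total member count, and the replication lag. The majority rule is an assumption based on how consensus-based clusters generally behave, and the 5-second lag limit is borrowed from the example alert rules in this document; neither is a Neo4j-defined health check.

```python
def cluster_health(healthy, total, lag_seconds, max_lag_seconds=5.0):
    """Classify cluster health as 'ok', 'degraded', or 'critical'.

    Assumed policy: losing the majority of members is critical; any
    unhealthy member or excessive replication lag is degraded.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    has_majority = healthy > total / 2
    if not has_majority:
        return "critical"
    if healthy < total or lag_seconds > max_lag_seconds:
        return "degraded"
    return "ok"


# Hypothetical readings for a three-member cluster.
print(cluster_health(healthy=3, total=3, lag_seconds=0.2))
print(cluster_health(healthy=2, total=3, lag_seconds=0.2))
print(cluster_health(healthy=1, total=3, lag_seconds=0.2))
```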
