Skip to content

PostgreSQL 内置诊断命令

常用诊断命令分类

1. 系统状态命令

用于查看数据库系统的整体状态和配置信息,帮助 DBA 了解系统运行情况。

pg_stat_activity

查看当前数据库连接和活动会话信息,是最常用的诊断命令之一:

sql
-- 查看所有活动会话
SELECT * FROM pg_stat_activity;

-- 查看活跃的查询会话,过滤掉空闲连接
SELECT pid, datname, usename, application_name, client_addr, backend_start, query_start, state, query
FROM pg_stat_activity
WHERE state = 'active';

-- 查看长时间运行的查询(超过5分钟),及时发现慢查询
SELECT pid, datname, usename, query_start, now() - query_start AS 持续时间, query
FROM pg_stat_activity
WHERE state = 'active' AND now() - query_start > interval '5 minutes';

pg_settings

查看和修改数据库配置参数,方便调整系统性能:

sql
-- 查看所有配置参数
SELECT * FROM pg_settings;

-- 查看内存相关参数,如 shared_buffers 和 work_mem
SELECT name, setting, unit, category, short_desc
FROM pg_settings
WHERE name LIKE '%shared_buffers%' OR name LIKE '%work_mem%';

-- 查看参数的生效范围和类型,判断是否需要重启
SELECT name, setting, vartype, context, source
FROM pg_settings
WHERE name = 'max_connections';

pg_stat_database

查看数据库级别的统计信息,了解各个数据库的使用情况:

sql
-- 查看所有数据库的统计信息
SELECT * FROM pg_stat_database;

-- 查看特定数据库(如 postgres)的详细统计
SELECT datname, numbackends, xact_commit, xact_rollback, blks_read, blks_hit, tup_returned, tup_fetched, tup_inserted, tup_updated, tup_deleted
FROM pg_stat_database
WHERE datname = 'postgres';

2. 性能诊断命令

用于分析数据库性能问题,找出性能瓶颈。

EXPLAIN

查看查询执行计划,分析 SQL 语句的执行效率:

sql
-- 查看简单查询的执行计划
EXPLAIN SELECT * FROM pg_stat_user_tables WHERE n_dead_tup > 1000;

-- 查看执行计划的详细信息,包括实际执行统计
EXPLAIN ANALYZE SELECT * FROM pg_stat_user_tables WHERE n_dead_tup > 1000;

-- 查看执行计划的成本估算,了解查询优化方向
EXPLAIN VERBOSE SELECT * FROM pg_stat_user_tables WHERE n_dead_tup > 1000;

pg_stat_statements

查看 SQL 语句的执行统计信息(需要安装 pg_stat_statements 扩展),找出最耗时的查询:

sql
-- 查看最耗时的前10个查询
SELECT queryid, query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- 查看执行次数最多的前10个查询
SELECT queryid, query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;

pg_stat_bgwriter

查看后台写进程的统计信息,了解缓冲区管理和检查点情况:

sql
-- 查看后台写进程的详细统计
SELECT * FROM pg_stat_bgwriter;

-- 计算缓冲区命中率和定时检查点比例
SELECT 
  (1 - (blks_hit::numeric / NULLIF(blks_read + blks_hit, 0))) * 100 AS 缓冲区未命中率,
  (checkpoints_timed::numeric / NULLIF(checkpoints_timed + checkpoints_req, 0)) * 100 AS 定时检查点比例
FROM pg_stat_bgwriter;

3. 锁和等待事件命令

用于分析锁和等待事件,定位阻塞问题,解决死锁和长时间等待。

pg_locks

查看当前锁持有和等待情况,帮助定位阻塞源:

sql
-- 查看所有锁信息
SELECT * FROM pg_locks;

-- 查看阻塞关系,找出谁阻塞了谁
SELECT 
  blocked_locks.pid AS 被阻塞进程ID,
  blocked_activity.usename AS 被阻塞用户,
  blocking_locks.pid AS 阻塞进程ID,
  blocking_activity.usename AS 阻塞用户,
  blocked_activity.query AS 被阻塞查询,
  blocking_activity.query AS 阻塞查询
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
  AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
  AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
  AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
  AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
  AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
  AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
  AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
  AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
  AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
  AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;

等待事件查询

通过 pg_stat_activity 查看会话的等待事件,了解会话正在等待什么资源:

sql
-- 查看当前所有等待事件
SELECT pid, datname, usename, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE wait_event IS NOT NULL;

-- 按等待事件类型统计,找出最常见的等待类型
SELECT wait_event_type, wait_event, count(*) AS 等待数量
FROM pg_stat_activity
WHERE wait_event IS NOT NULL
GROUP BY wait_event_type, wait_event
ORDER BY count(*) DESC;

4. 空间使用命令

用于查看数据库对象的空间使用情况,监控表和索引的大小变化。

表和索引空间查询

查看表和索引的空间使用情况,找出占用空间较大的对象:

sql
-- 查看表的总空间使用(包括索引和TOAST表),按大小降序排列前10名
SELECT schemaname, relname, pg_total_relation_size(relid) AS 总大小, pg_relation_size(relid) AS 表大小
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;

-- 查看特定表的空间使用明细,使用 pg_size_pretty 格式化显示
SELECT 
  pg_size_pretty(pg_relation_size('table_name')) AS 表大小,
  pg_size_pretty(pg_indexes_size('table_name')) AS 索引大小,
  pg_size_pretty(pg_total_relation_size('table_name')) AS 总大小;

表统计信息查询

查看用户表的统计信息,了解表的行数、死元组数量等:

sql
-- 查看表的行数和死元组数,按死元组数量降序排列
SELECT schemaname, relname, n_live_tup AS 活元组, n_dead_tup AS 死元组, last_vacuum AS 最后手动清理时间, last_autovacuum AS 最后自动清理时间
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- 查看需要 vacuum 的表(死元组超过1000且比例超过10%)
SELECT schemaname, relname, n_dead_tup, n_live_tup, n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100 AS 死元组比例
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000 AND n_dead_tup::numeric / NULLIF(n_live_tup, 0) > 0.1;

5. WAL 相关命令

用于监控 WAL 写入和归档情况,确保数据安全和复制正常。

WAL 统计查询

查看 WAL 写入统计信息,了解 WAL 生成速率:

sql
-- 查看 WAL 统计信息
SELECT * FROM pg_stat_wal;

-- 查看 WAL 归档情况,确认归档是否正常
SELECT * FROM pg_stat_archiver;

WAL 位置查询

查看当前 WAL 位置,用于监控复制延迟:

sql
-- 查看当前 WAL LSN(日志序列号)
SELECT pg_current_wal_lsn();

-- 查看 WAL 写入速率
SELECT 
  pg_wal_lsn_diff(pg_current_wal_lsn(), pg_stat_get_wal_stats(true).wal_lsn) AS 已写入WAL量,
  pg_stat_get_wal_stats(true).wal_write_time / 1000000 AS 写入时间(秒)
FROM pg_stat_get_wal_stats(true);

诊断命令使用场景

1. 性能问题诊断

  • 使用 pg_stat_activity 查看长时间运行的查询,定位慢查询
  • 使用 EXPLAIN ANALYZE 分析查询执行计划,找出性能瓶颈
  • 使用 pg_stat_statements 找出最耗时的查询,优化高频慢查询
  • 使用 pg_stat_bgwriter 查看缓冲区命中率和检查点情况,调整内存配置

2. 阻塞问题诊断

  • 使用 pg_locks 查看锁持有和等待情况,找出阻塞关系
  • 使用 pg_stat_activity 查看会话的等待事件,了解等待类型
  • 结合两者定位阻塞源和被阻塞会话,及时解除阻塞

3. 空间问题诊断

  • 使用 pg_total_relation_size 查看大表,进行表分区或清理
  • 使用 pg_stat_user_tables 查看表膨胀情况,及时执行 VACUUM 或 VACUUM FULL
  • 使用 pg_stat_database 查看数据库增长趋势,规划存储空间

4. 复制问题诊断

  • 使用 pg_stat_replication 查看复制状态,确认复制是否正常
  • 使用 pg_current_wal_lsnpg_last_wal_replay_lsn 查看复制延迟
  • 使用 pg_stat_wal 查看 WAL 写入情况,监控 WAL 生成速率

最佳实践

1. 定期执行诊断命令

  • 每日查看 pg_stat_activity,监控长时间运行的查询
  • 每周查看 pg_stat_user_tables,及时清理死元组
  • 每月查看 pg_stat_wal,监控 WAL 写入情况和归档状态

2. 结合监控工具

将内置诊断命令的输出集成到 Prometheus、Zabbix 等监控系统中,设置告警阈值,及时发现问题:

  • 长时间运行查询告警
  • 高阻塞率告警
  • 表膨胀率告警
  • WAL 归档失败告警

3. 保存历史数据

定期保存诊断命令的输出到日志或数据库中,用于趋势分析和问题回溯:

  • 每周保存一次大表排名
  • 每月保存一次性能统计
  • 出现问题时保存详细的诊断信息

4. 注意权限控制

限制普通用户对某些敏感诊断命令的访问权限,保护数据库安全:

  • 只允许 DBA 角色访问 pg_stat_activity 中的查询内容
  • 限制普通用户查看系统级统计信息
  • 使用视图封装常用诊断查询,只暴露必要信息

5. 合理使用视图

创建自定义视图简化常用的诊断查询,提高运维效率:

sql
-- 创建活跃会话视图,方便查看非空闲会话
CREATE OR REPLACE VIEW v_active_sessions AS
SELECT pid, datname, usename, application_name, client_addr, backend_start, query_start, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE state != 'idle' AND pid != pg_backend_pid();

-- 创建大表视图,格式化显示表大小信息
CREATE OR REPLACE VIEW v_large_tables AS
SELECT schemaname, relname, 
  pg_size_pretty(pg_relation_size(relid)) AS 表大小,
  pg_size_pretty(pg_indexes_size(relid)) AS 索引大小,
  pg_size_pretty(pg_total_relation_size(relid)) AS 总大小,
  n_live_tup AS 活元组,
  n_dead_tup AS 死元组
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC;

常见问题(FAQ)

Q1:如何找出最耗时的查询?

A1:使用 pg_stat_statements 扩展:

sql
SELECT queryid, query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

Q2:如何定位数据库阻塞问题?

A2:结合 pg_locks 和 pg_stat_activity:

sql
SELECT 
  blocked_locks.pid AS blocked_pid,
  blocked_activity.usename AS blocked_user,
  blocking_locks.pid AS blocking_pid,
  blocking_activity.usename AS blocking_user,
  blocked_activity.query AS blocked_query,
  blocking_activity.query AS blocking_query
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
  AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
  AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
  AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;

Q3:如何查看表的空间使用情况?

A3:使用 pg_total_relation_size 和 pg_relation_size:

sql
SELECT schemaname, relname, 
  pg_size_pretty(pg_relation_size(relid)) AS table_size,
  pg_size_pretty(pg_indexes_size(relid)) AS indexes_size,
  pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;

Q4:如何查看 PostgreSQL 缓冲区命中率?

A4:使用 pg_stat_bgwriter:

sql
SELECT 
  (1 - (blks_hit::numeric / NULLIF(blks_read + blks_hit, 0))) * 100 AS buffer_miss_rate
FROM pg_stat_bgwriter;

Q5:如何查看当前连接数?

A5:使用 pg_stat_activity:

sql
SELECT count(*) AS total_connections, 
  count(*) FILTER (WHERE state = 'active') AS active_connections
FROM pg_stat_activity;

Q6:如何查看 PostgreSQL 版本信息?

A6:使用 version() 函数:

sql
SELECT version();

Q7:如何查看数据库的检查点情况?

A7:使用 pg_stat_bgwriter:

sql
SELECT 
  checkpoints_timed,
  checkpoints_req,
  checkpoints_timed + checkpoints_req AS total_checkpoints,
  (checkpoints_timed::numeric / NULLIF(checkpoints_timed + checkpoints_req, 0)) * 100 AS checkpoint_timed_rate
FROM pg_stat_bgwriter;