外观
PostgreSQL 内置诊断命令
常用诊断命令分类
1. 系统状态命令
用于查看数据库系统的整体状态和配置信息,帮助 DBA 了解系统运行情况。
pg_stat_activity
查看当前数据库连接和活动会话信息,是最常用的诊断命令之一:
sql
-- 查看所有活动会话
SELECT * FROM pg_stat_activity;
-- 查看活跃的查询会话,过滤掉空闲连接
SELECT pid, datname, usename, application_name, client_addr, backend_start, query_start, state, query
FROM pg_stat_activity
WHERE state = 'active';
-- 查看长时间运行的查询(超过5分钟),及时发现慢查询
SELECT pid, datname, usename, query_start, now() - query_start AS 持续时间, query
FROM pg_stat_activity
WHERE state = 'active' AND now() - query_start > interval '5 minutes';pg_settings
查看和修改数据库配置参数,方便调整系统性能:
sql
-- 查看所有配置参数
SELECT * FROM pg_settings;
-- 查看内存相关参数,如 shared_buffers 和 work_mem
SELECT name, setting, unit, category, short_desc
FROM pg_settings
WHERE name LIKE '%shared_buffers%' OR name LIKE '%work_mem%';
-- 查看参数的生效范围和类型,判断是否需要重启
SELECT name, setting, vartype, context, source
FROM pg_settings
WHERE name = 'max_connections';pg_stat_database
查看数据库级别的统计信息,了解各个数据库的使用情况:
sql
-- 查看所有数据库的统计信息
SELECT * FROM pg_stat_database;
-- 查看特定数据库(如 postgres)的详细统计
SELECT datname, numbackends, xact_commit, xact_rollback, blks_read, blks_hit, tup_returned, tup_fetched, tup_inserted, tup_updated, tup_deleted
FROM pg_stat_database
WHERE datname = 'postgres';2. 性能诊断命令
用于分析数据库性能问题,找出性能瓶颈。
EXPLAIN
查看查询执行计划,分析 SQL 语句的执行效率:
sql
-- 查看简单查询的执行计划
EXPLAIN SELECT * FROM pg_stat_user_tables WHERE n_dead_tup > 1000;
-- 查看执行计划的详细信息,包括实际执行统计
EXPLAIN ANALYZE SELECT * FROM pg_stat_user_tables WHERE n_dead_tup > 1000;
-- 查看执行计划的成本估算,了解查询优化方向
EXPLAIN VERBOSE SELECT * FROM pg_stat_user_tables WHERE n_dead_tup > 1000;pg_stat_statements
查看 SQL 语句的执行统计信息(需要安装 pg_stat_statements 扩展),找出最耗时的查询:
sql
-- 查看最耗时的前10个查询
SELECT queryid, query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
-- 查看执行次数最多的前10个查询
SELECT queryid, query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY calls DESC
LIMIT 10;pg_stat_bgwriter
查看后台写进程的统计信息,了解缓冲区管理和检查点情况:
sql
-- 查看后台写进程的详细统计
SELECT * FROM pg_stat_bgwriter;
-- 计算缓冲区命中率和定时检查点比例
SELECT
(1 - (blks_hit::numeric / NULLIF(blks_read + blks_hit, 0))) * 100 AS 缓冲区未命中率,
(checkpoints_timed::numeric / NULLIF(checkpoints_timed + checkpoints_req, 0)) * 100 AS 定时检查点比例
FROM pg_stat_bgwriter;3. 锁和等待事件命令
用于分析锁和等待事件,定位阻塞问题,解决死锁和长时间等待。
pg_locks
查看当前锁持有和等待情况,帮助定位阻塞源:
sql
-- 查看所有锁信息
SELECT * FROM pg_locks;
-- 查看阻塞关系,找出谁阻塞了谁
SELECT
blocked_locks.pid AS 被阻塞进程ID,
blocked_activity.usename AS 被阻塞用户,
blocking_locks.pid AS 阻塞进程ID,
blocking_activity.usename AS 阻塞用户,
blocked_activity.query AS 被阻塞查询,
blocking_activity.query AS 阻塞查询
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;等待事件查询
通过 pg_stat_activity 查看会话的等待事件,了解会话正在等待什么资源:
sql
-- 查看当前所有等待事件
SELECT pid, datname, usename, wait_event_type, wait_event, state, query
FROM pg_stat_activity
WHERE wait_event IS NOT NULL;
-- 按等待事件类型统计,找出最常见的等待类型
SELECT wait_event_type, wait_event, count(*) AS 等待数量
FROM pg_stat_activity
WHERE wait_event IS NOT NULL
GROUP BY wait_event_type, wait_event
ORDER BY count(*) DESC;4. 空间使用命令
用于查看数据库对象的空间使用情况,监控表和索引的大小变化。
表和索引空间查询
查看表和索引的空间使用情况,找出占用空间较大的对象:
sql
-- 查看表的总空间使用(包括索引和TOAST表),按大小降序排列前10名
SELECT schemaname, relname, pg_total_relation_size(relid) AS 总大小, pg_relation_size(relid) AS 表大小
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;
-- 查看特定表的空间使用明细,使用 pg_size_pretty 格式化显示
SELECT
pg_size_pretty(pg_relation_size('table_name')) AS 表大小,
pg_size_pretty(pg_indexes_size('table_name')) AS 索引大小,
pg_size_pretty(pg_total_relation_size('table_name')) AS 总大小;表统计信息查询
查看用户表的统计信息,了解表的行数、死元组数量等:
sql
-- 查看表的行数和死元组数,按死元组数量降序排列
SELECT schemaname, relname, n_live_tup AS 活元组, n_dead_tup AS 死元组, last_vacuum AS 最后手动清理时间, last_autovacuum AS 最后自动清理时间
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
-- 查看需要 vacuum 的表(死元组超过1000且比例超过10%)
SELECT schemaname, relname, n_dead_tup, n_live_tup, n_dead_tup::numeric / NULLIF(n_live_tup, 0) * 100 AS 死元组比例
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000 AND n_dead_tup::numeric / NULLIF(n_live_tup, 0) > 0.1;5. WAL 相关命令
用于监控 WAL 写入和归档情况,确保数据安全和复制正常。
WAL 统计查询
查看 WAL 写入统计信息,了解 WAL 生成速率:
sql
-- 查看 WAL 统计信息
SELECT * FROM pg_stat_wal;
-- 查看 WAL 归档情况,确认归档是否正常
SELECT * FROM pg_stat_archiver;WAL 位置查询
查看当前 WAL 位置,用于监控复制延迟:
sql
-- 查看当前 WAL LSN(日志序列号)
SELECT pg_current_wal_lsn();
-- 查看 WAL 写入速率
SELECT
pg_wal_lsn_diff(pg_current_wal_lsn(), pg_stat_get_wal_stats(true).wal_lsn) AS 已写入WAL量,
pg_stat_get_wal_stats(true).wal_write_time / 1000000 AS 写入时间(秒)
FROM pg_stat_get_wal_stats(true);诊断命令使用场景
1. 性能问题诊断
- 使用
pg_stat_activity查看长时间运行的查询,定位慢查询 - 使用
EXPLAIN ANALYZE分析查询执行计划,找出性能瓶颈 - 使用
pg_stat_statements找出最耗时的查询,优化高频慢查询 - 使用
pg_stat_bgwriter查看缓冲区命中率和检查点情况,调整内存配置
2. 阻塞问题诊断
- 使用
pg_locks查看锁持有和等待情况,找出阻塞关系 - 使用
pg_stat_activity查看会话的等待事件,了解等待类型 - 结合两者定位阻塞源和被阻塞会话,及时解除阻塞
3. 空间问题诊断
- 使用
pg_total_relation_size查看大表,进行表分区或清理 - 使用
pg_stat_user_tables查看表膨胀情况,及时执行 VACUUM 或 VACUUM FULL - 使用
pg_stat_database查看数据库增长趋势,规划存储空间
4. 复制问题诊断
- 使用
pg_stat_replication查看复制状态,确认复制是否正常 - 使用
pg_current_wal_lsn和pg_last_wal_replay_lsn查看复制延迟 - 使用
pg_stat_wal查看 WAL 写入情况,监控 WAL 生成速率
最佳实践
1. 定期执行诊断命令
- 每日查看
pg_stat_activity,监控长时间运行的查询 - 每周查看
pg_stat_user_tables,及时清理死元组 - 每月查看
pg_stat_wal,监控 WAL 写入情况和归档状态
2. 结合监控工具
将内置诊断命令的输出集成到 Prometheus、Zabbix 等监控系统中,设置告警阈值,及时发现问题:
- 长时间运行查询告警
- 高阻塞率告警
- 表膨胀率告警
- WAL 归档失败告警
3. 保存历史数据
定期保存诊断命令的输出到日志或数据库中,用于趋势分析和问题回溯:
- 每周保存一次大表排名
- 每月保存一次性能统计
- 出现问题时保存详细的诊断信息
4. 注意权限控制
限制普通用户对某些敏感诊断命令的访问权限,保护数据库安全:
- 只允许 DBA 角色访问 pg_stat_activity 中的查询内容
- 限制普通用户查看系统级统计信息
- 使用视图封装常用诊断查询,只暴露必要信息
5. 合理使用视图
创建自定义视图简化常用的诊断查询,提高运维效率:
sql
-- 创建活跃会话视图,方便查看非空闲会话
CREATE OR REPLACE VIEW v_active_sessions AS
SELECT pid, datname, usename, application_name, client_addr, backend_start, query_start, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE state != 'idle' AND pid != pg_backend_pid();
-- 创建大表视图,格式化显示表大小信息
CREATE OR REPLACE VIEW v_large_tables AS
SELECT schemaname, relname,
pg_size_pretty(pg_relation_size(relid)) AS 表大小,
pg_size_pretty(pg_indexes_size(relid)) AS 索引大小,
pg_size_pretty(pg_total_relation_size(relid)) AS 总大小,
n_live_tup AS 活元组,
n_dead_tup AS 死元组
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC;常见问题(FAQ)
Q1:如何找出最耗时的查询?
A1:使用 pg_stat_statements 扩展:
sql
SELECT queryid, query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;Q2:如何定位数据库阻塞问题?
A2:结合 pg_locks 和 pg_stat_activity:
sql
SELECT
blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_query,
blocking_activity.query AS blocking_query
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;Q3:如何查看表的空间使用情况?
A3:使用 pg_total_relation_size 和 pg_relation_size:
sql
SELECT schemaname, relname,
pg_size_pretty(pg_relation_size(relid)) AS table_size,
pg_size_pretty(pg_indexes_size(relid)) AS indexes_size,
pg_size_pretty(pg_total_relation_size(relid)) AS total_size
FROM pg_stat_user_tables
ORDER BY pg_total_relation_size(relid) DESC
LIMIT 10;Q4:如何查看 PostgreSQL 缓冲区命中率?
A4:使用 pg_stat_bgwriter:
sql
SELECT
(1 - (blks_hit::numeric / NULLIF(blks_read + blks_hit, 0))) * 100 AS buffer_miss_rate
FROM pg_stat_bgwriter;Q5:如何查看当前连接数?
A5:使用 pg_stat_activity:
sql
SELECT count(*) AS total_connections,
count(*) FILTER (WHERE state = 'active') AS active_connections
FROM pg_stat_activity;Q6:如何查看 PostgreSQL 版本信息?
A6:使用 version() 函数:
sql
SELECT version();Q7:如何查看数据库的检查点情况?
A7:使用 pg_stat_bgwriter:
sql
SELECT
checkpoints_timed,
checkpoints_req,
checkpoints_timed + checkpoints_req AS total_checkpoints,
(checkpoints_timed::numeric / NULLIF(checkpoints_timed + checkpoints_req, 0)) * 100 AS checkpoint_timed_rate
FROM pg_stat_bgwriter;