外观
锁等待分析
锁是数据库并发控制的核心机制,它确保了多个事务可以安全地访问共享资源。然而,不恰当的锁使用会导致锁等待和死锁,影响数据库性能。通过分析锁等待,可以识别锁竞争热点,优化事务设计和SQL语句,从而提高数据库的并发性能。本文将详细介绍锁的基本概念、分析方法和优化策略。
锁基本概念
1. 锁的作用
- 并发控制:确保多个事务可以安全地访问共享资源
- 数据一致性:防止脏读、不可重复读和幻读
- 事务隔离:实现不同的事务隔离级别
- 资源保护:保护数据库对象(表、行、索引等)不被并发修改破坏
2. 锁粒度
PostgreSQL支持多种粒度的锁,从行级锁到数据库级锁:
| 锁粒度 | 描述 | 并发度 | 开销 |
|---|---|---|---|
| 行级锁 | 锁定表中的单行数据 | 最高 | 最高 |
| 页级锁 | 锁定数据页(默认8KB) | 高 | 中 |
| 表级锁 | 锁定整个表 | 低 | 低 |
| 数据库级锁 | 锁定整个数据库 | 最低 | 最低 |
3. 锁类型
PostgreSQL提供了多种类型的锁,用于不同的操作场景:
表级锁
| 锁类型 | 描述 | 冲突的锁类型 |
|---|---|---|
| ACCESS SHARE | SELECT查询 | ACCESS EXCLUSIVE |
| ROW SHARE | SELECT FOR UPDATE/SHARE | EXCLUSIVE, ACCESS EXCLUSIVE |
| ROW EXCLUSIVE | INSERT, UPDATE, DELETE | SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE |
| SHARE UPDATE EXCLUSIVE | VACUUM, ANALYZE | SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE |
| SHARE | CREATE INDEX | ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE |
| SHARE ROW EXCLUSIVE | CREATE INDEX CONCURRENTLY | ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE |
| EXCLUSIVE | ALTER TABLE, DROP TABLE | ROW SHARE, ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE |
| ACCESS EXCLUSIVE | DROP DATABASE, TRUNCATE | 所有锁类型 |
行级锁
| 锁类型 | 描述 |
|---|---|
| FOR UPDATE | 排它锁,防止其他事务修改或删除该行 |
| FOR NO KEY UPDATE | 弱排它锁,允许其他事务添加非主键索引 |
| FOR SHARE | 共享锁,允许其他事务读取但不能修改该行 |
| FOR KEY SHARE | 弱共享锁,允许其他事务修改但不能删除该行 |
锁等待监控
1. 实时监控锁等待
sql
-- 查看当前锁等待情况
SELECT
bl.pid AS blocked_pid,
a.usename AS blocked_user,
kl.pid AS blocking_pid,
ka.usename AS blocking_user,
a.query AS blocked_query,
ka.query AS blocking_query,
a.query_start AS blocked_query_start,
now() - a.query_start AS wait_duration,
kl.mode AS blocking_mode,
bl.locktype AS lock_type,
bl.relation::regclass AS relation,
bl.page AS page,
bl.tuple AS tuple
FROM pg_locks bl
JOIN pg_stat_activity a ON bl.pid = a.pid
JOIN pg_locks kl ON bl.transactionid = kl.transactionid AND bl.pid != kl.pid
JOIN pg_stat_activity ka ON kl.pid = ka.pid
WHERE NOT bl.granted;
-- 查看所有锁信息
SELECT
pid,
usename,
locktype,
database::regclass,
relation::regclass,
mode,
granted,
page,
tuple,
transactionid,
virtualxid,
virtualtransaction,
pid::int8 AS process_id
FROM pg_locks
ORDER BY pid;
-- 查看当前活动事务和锁信息
SELECT
pid,
usename,
datname,
state,
query,
query_start,
xact_start,
now() - xact_start AS xact_duration,
wait_event_type,
wait_event
FROM pg_stat_activity
WHERE state = 'active'
ORDER BY xact_duration DESC;2. 锁等待日志
ini
# 记录锁等待信息
log_lock_waits = on # 记录等待时间超过deadlock_timeout的锁等待
# 设置锁等待超时时间
deadlock_timeout = 1s # 默认1秒
# 日志行前缀,包含必要信息
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '3. 锁等待视图
pg_stat_activity
包含当前活动连接和事务的信息,包括等待事件:
sql
SELECT
pid,
usename,
datname,
state,
wait_event_type,
wait_event,
query,
query_start,
xact_start,
now() - xact_start AS xact_duration
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';pg_locks
包含当前所有锁的信息:
sql
SELECT
locktype,
database::regclass,
relation::regclass,
mode,
granted,
count(*) AS lock_count
FROM pg_locks
GROUP BY locktype, database, relation, mode, granted
ORDER BY lock_count DESC;pg_stat_user_tables
包含表的锁等待统计信息:
sql
SELECT
schemaname,
relname,
n_live_tup,
n_dead_tup,
n_mod_since_analyze,
last_vacuum,
last_analyze,
n_locks * 100.0 / NULLIF(n_live_tup, 0) AS lock_ratio
FROM pg_stat_user_tables
ORDER BY lock_ratio DESC
LIMIT 10;死锁分析
1. 死锁的定义
死锁是指两个或多个事务互相等待对方释放锁,导致所有事务都无法继续执行的状态。
2. 死锁检测
PostgreSQL自动检测死锁,并回滚其中一个事务以打破死锁:
ini
# 死锁检测开关
# deadlock_timeout = 1s # 默认1秒,超过该时间检测死锁3. 死锁日志
当检测到死锁时,PostgreSQL会在日志中记录死锁信息:
2023-10-15 14:30:25 UTC [12345]: [1-1] user=postgres,db=mydb,app=psql,client=127.0.0.1 ERROR: deadlock detected
2023-10-15 14:30:25 UTC [12345]: [2-1] user=postgres,db=mydb,app=psql,client=127.0.0.1 DETAIL: Process 12345 waits for ShareLock on transaction 65536; blocked by process 12346.
Process 12346 waits for ShareLock on transaction 65537; blocked by process 12345.
Process 12345: UPDATE users SET name = 'test1' WHERE id = 1;
Process 12346: UPDATE users SET name = 'test2' WHERE id = 2;
2023-10-15 14:30:25 UTC [12345]: [3-1] user=postgres,db=mydb,app=psql,client=127.0.0.1 HINT: See server log for query details.
2023-10-15 14:30:25 UTC [12345]: [4-1] user=postgres,db=mydb,app=psql,client=127.0.0.1 STATEMENT: UPDATE users SET name = 'test1' WHERE id = 1;4. 死锁示例
示例1:简单死锁
事务1:
sql
BEGIN;
UPDATE users SET name = 'test1' WHERE id = 1;
-- 等待一段时间
UPDATE users SET name = 'test2' WHERE id = 2;
COMMIT;事务2:
sql
BEGIN;
UPDATE users SET name = 'test2' WHERE id = 2;
-- 等待一段时间
UPDATE users SET name = 'test1' WHERE id = 1;
COMMIT;示例2:死锁检测与回滚
当发生死锁时,PostgreSQL会回滚其中一个事务:
ERROR: deadlock detected
DETAIL: Process 12345 waits for ShareLock on transaction 65536; blocked by process 12346.
Process 12346 waits for ShareLock on transaction 65537; blocked by process 12345.
HINT: See server log for query details.锁等待分析方法
1. 识别锁等待热点
sql
-- 查看锁等待频繁的表
SELECT
relation::regclass AS table_name,
count(*) AS lock_wait_count
FROM pg_locks
WHERE NOT granted
GROUP BY relation
ORDER BY lock_wait_count DESC;
-- 查看锁等待时间最长的事务
SELECT
pid,
usename,
datname,
query,
query_start,
now() - query_start AS wait_duration,
wait_event_type,
wait_event
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
ORDER BY wait_duration DESC;
-- 查看锁定最多行的事务
SELECT
pid,
usename,
count(*) AS locked_rows
FROM pg_locks
WHERE locktype = 'tuple' AND granted
GROUP BY pid, usename
ORDER BY locked_rows DESC;2. 分析锁等待原因
原因1:长事务
长事务持有锁的时间过长,导致其他事务等待:
sql
-- 查看长时间运行的事务
SELECT
pid,
usename,
datname,
query,
query_start,
xact_start,
now() - xact_start AS xact_duration
FROM pg_stat_activity
WHERE state = 'active' AND now() - xact_start > interval '5 minutes'
ORDER BY xact_duration DESC;原因2:锁粒度不当
使用了过粗粒度的锁,导致并发度下降:
sql
-- 查看表级锁
SELECT
pid,
usename,
relation::regclass AS table_name,
mode,
granted,
query
FROM pg_locks
WHERE locktype = 'relation' AND mode IN ('ACCESS EXCLUSIVE', 'EXCLUSIVE', 'SHARE')
ORDER BY pid;原因3:索引缺失
缺少合适的索引,导致全表扫描,锁定过多行:
sql
-- 查看全表扫描的查询
SELECT
pid,
usename,
query,
query_start,
now() - query_start AS duration
FROM pg_stat_activity
WHERE state = 'active'
AND query LIKE '%Seq Scan%' -- 可以通过EXPLAIN ANALYZE确认
ORDER BY duration DESC;原因4:不正确的事务隔离级别
使用了过高的事务隔离级别,导致锁定更多资源:
sql
-- 查看当前事务隔离级别
SHOW default_transaction_isolation;
-- 查看会话的事务隔离级别
SELECT
pid,
usename,
current_setting('transaction_isolation') AS isolation_level
FROM pg_stat_activity
WHERE state = 'active';锁等待优化策略
1. 事务优化
- 缩短事务持有时间:尽快提交或回滚事务
- 避免在事务中执行非数据库操作:如网络请求、文件IO等
- 分批处理大量数据:将大事务拆分为多个小事务
- 使用合适的事务隔离级别:优先使用READ COMMITTED隔离级别
- 避免长时间锁持有:如SELECT FOR UPDATE应该尽快处理
2. SQL语句优化
- 创建合适的索引:避免全表扫描,减少锁定的行数
- 优化查询条件:使用选择性高的条件,减少锁定的行数
- 避免锁定不必要的资源:如使用SELECT FOR SHARE替代SELECT FOR UPDATE
- 优化UPDATE和DELETE语句:确保WHERE条件有索引
- 使用LIMIT:限制锁定的行数
3. 锁粒度优化
- 使用行级锁替代表级锁:提高并发度
- 避免表级锁操作:如避免在高并发时执行ALTER TABLE、CREATE INDEX等
- 使用CREATE INDEX CONCURRENTLY:创建索引时不阻塞读写操作
- 使用VACUUM ANALYZE替代VACUUM FULL:VACUUM FULL会持有表级锁
4. 并发控制优化
- 使用乐观锁:通过版本号或时间戳实现并发控制
- 使用排队机制:在应用层实现对热点资源的排队访问
- 使用分区表:将大表拆分为多个小表,减少锁竞争
- 使用读写分离:将读操作分流到从库,减少主库的锁竞争
5. 配置优化
- 调整deadlock_timeout:根据实际情况调整死锁检测时间
- 调整lock_timeout:设置事务等待锁的超时时间
- 调整max_locks_per_transaction:增加每个事务可以持有的锁数量
- 调整autovacuum相关参数:确保Autovacuum及时运行,减少锁竞争
锁等待最佳实践
1. 生产环境监控
sql
-- 创建锁等待监控视图
CREATE OR REPLACE VIEW v_lock_waits AS
SELECT
bl.pid AS blocked_pid,
a.usename AS blocked_user,
kl.pid AS blocking_pid,
ka.usename AS blocking_user,
a.query AS blocked_query,
ka.query AS blocking_query,
a.query_start AS blocked_query_start,
now() - a.query_start AS wait_duration,
kl.mode AS blocking_mode,
bl.locktype AS lock_type,
bl.relation::regclass AS relation,
bl.page AS page,
bl.tuple AS tuple
FROM pg_locks bl
JOIN pg_stat_activity a ON bl.pid = a.pid
JOIN pg_locks kl ON bl.transactionid = kl.transactionid AND bl.pid != kl.pid
JOIN pg_stat_activity ka ON kl.pid = ka.pid
WHERE NOT bl.granted;
-- 配置cron定期检查锁等待
-- 0 * * * * psql -c "SELECT * FROM v_lock_waits" > /tmp/lock_waits.log2. 锁等待告警
Prometheus + Alertmanager
yaml
# 锁等待告警规则
groups:
- name: postgresql-alerts
rules:
- alert: PostgreSQLHighLockWaits
expr: increase(postgresql_locks_count{granted="false"}[5m]) > 10
for: 1m
labels:
severity: warning
annotations:
summary: "PostgreSQL锁等待数量过高"
description: "过去5分钟内,锁等待数量超过10个"
- alert: PostgreSQLLongLockWait
expr: postgresql_lock_wait_time_seconds > 30
for: 1m
labels:
severity: critical
annotations:
summary: "PostgreSQL长时间锁等待"
description: "锁等待时间超过30秒"Zabbix
在Zabbix中配置锁等待监控:
- 创建监控项:
pg.lock.waits - 配置触发器:当锁等待数量超过阈值时触发告警
- 配置通知渠道:Email、SMS等
3. 常见锁问题与解决方案
| 问题 | 表现 | 解决方案 |
|---|---|---|
| 长事务锁 | 事务持有锁时间过长 | 缩短事务持有时间,分批处理数据 |
| 表级锁竞争 | ALTER TABLE等操作阻塞读写 | 使用CREATE INDEX CONCURRENTLY,选择低峰期执行DDL |
| 行级锁竞争 | 热点行被频繁更新 | 使用乐观锁,拆分热点数据,读写分离 |
| 死锁 | 事务互相等待对方释放锁 | 优化事务顺序,使用deadlock_timeout检测,避免循环等待 |
| 索引缺失导致全表锁 | UPDATE/DELETE锁定全表 | 创建合适的索引,优化WHERE条件 |
锁等待案例分析
案例1:热点行锁竞争
问题描述
电商系统中,商品库存更新导致锁竞争,大量事务等待锁。
分析
sql
-- 查看锁等待情况
SELECT
bl.pid AS blocked_pid,
a.usename AS blocked_user,
kl.pid AS blocking_pid,
ka.usename AS blocking_user,
a.query AS blocked_query,
ka.query AS blocking_query,
now() - a.query_start AS wait_duration
FROM pg_locks bl
JOIN pg_stat_activity a ON bl.pid = a.pid
JOIN pg_locks kl ON bl.transactionid = kl.transactionid AND bl.pid != kl.pid
JOIN pg_stat_activity ka ON kl.pid = ka.pid
WHERE NOT bl.granted;解决方案
使用乐观锁:通过版本号实现并发控制
sqlUPDATE products SET stock = stock - 1, version = version + 1 WHERE id = 1 AND version = $1;拆分热点数据:将热点商品拆分为多个记录,分散锁竞争
使用队列:在应用层实现库存更新队列
读写分离:将读操作分流到从库
案例2:长事务导致锁等待
问题描述
报表查询事务持有锁时间过长,导致OLTP事务等待。
分析
sql
-- 查看长时间运行的事务
SELECT
pid,
usename,
datname,
query,
xact_start,
now() - xact_start AS xact_duration
FROM pg_stat_activity
WHERE state = 'active' AND now() - xact_start > interval '10 minutes'
ORDER BY xact_duration DESC;解决方案
缩短事务持有时间:将报表查询拆分为多个小事务
使用READ COMMITTED隔离级别:减少锁持有时间
使用从库查询:将报表查询分流到从库
优化查询:创建合适的索引,减少查询时间
版本差异注意事项
PostgreSQL版本差异
| 版本 | 锁特性差异 |
|---|---|
| PostgreSQL 16 | 增强了锁监控功能,增加了更多锁等待统计信息 |
| PostgreSQL 15 | 改进了锁管理器性能,减少了锁操作的开销 |
| PostgreSQL 14 | 引入了更多锁等待事件类型,便于监控 |
| PostgreSQL 13 | 改进了死锁检测算法,提高了检测效率 |
| PostgreSQL 12 | 引入了并行VACUUM,减少了VACUUM的锁持有时间 |
| PostgreSQL 10 | 引入了乐观锁支持,通过SKIP LOCKED实现 |
锁等待事件差异
| 版本 | 锁等待事件 |
|---|---|
| PostgreSQL 14+ | lock, transactionidlock, relationlock, pagelock, tuplelock等 |
| PostgreSQL 10-13 | lock, transactionidlock等 |
总结
锁等待是影响PostgreSQL并发性能的常见问题,通过分析锁等待,可以识别锁竞争热点,优化事务设计和SQL语句,从而提高数据库的并发性能。在实际运维中,DBA应该:
- 定期监控锁等待情况,识别锁竞争热点
- 优化事务设计,缩短事务持有时间
- 创建合适的索引,减少锁定的行数
- 使用合适的锁粒度,提高并发度
- 配置合理的锁等待超时和死锁检测时间
- 实现锁等待告警,及时发现和解决锁问题
通过有效的锁等待分析和优化,可以显著提高PostgreSQL数据库的并发性能,确保数据库在高负载下稳定运行。
