Skip to content

锁等待分析

锁是数据库并发控制的核心机制,它确保了多个事务可以安全地访问共享资源。然而,不恰当的锁使用会导致锁等待和死锁,影响数据库性能。通过分析锁等待,可以识别锁竞争热点,优化事务设计和SQL语句,从而提高数据库的并发性能。本文将详细介绍锁的基本概念、分析方法和优化策略。

锁基本概念

1. 锁的作用

  • 并发控制:确保多个事务可以安全地访问共享资源
  • 数据一致性:防止脏读、不可重复读和幻读
  • 事务隔离:实现不同的事务隔离级别
  • 资源保护:保护数据库对象(表、行、索引等)不被并发修改破坏

2. 锁粒度

PostgreSQL支持多种粒度的锁,从行级锁到数据库级锁:

锁粒度描述并发度开销
行级锁锁定表中的单行数据最高最高
页级锁锁定数据页(默认8KB)
表级锁锁定整个表
数据库级锁锁定整个数据库最低最低

3. 锁类型

PostgreSQL提供了多种类型的锁,用于不同的操作场景:

表级锁

锁类型描述冲突的锁类型
ACCESS SHARESELECT查询ACCESS EXCLUSIVE
ROW SHARESELECT FOR UPDATE/SHAREEXCLUSIVE, ACCESS EXCLUSIVE
ROW EXCLUSIVEINSERT, UPDATE, DELETESHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE
SHARE UPDATE EXCLUSIVEVACUUM, ANALYZESHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE
SHARECREATE INDEXROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE
SHARE ROW EXCLUSIVECREATE INDEX CONCURRENTLYROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE
EXCLUSIVEALTER TABLE, DROP TABLEROW SHARE, ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, ACCESS EXCLUSIVE
ACCESS EXCLUSIVEDROP DATABASE, TRUNCATE所有锁类型

行级锁

锁类型描述
FOR UPDATE排它锁,防止其他事务修改或删除该行
FOR NO KEY UPDATE弱排它锁,允许其他事务添加非主键索引
FOR SHARE共享锁,允许其他事务读取但不能修改该行
FOR KEY SHARE弱共享锁,允许其他事务修改但不能删除该行

锁等待监控

1. 实时监控锁等待

sql
-- 查看当前锁等待情况
SELECT 
  bl.pid AS blocked_pid,
  a.usename AS blocked_user,
  kl.pid AS blocking_pid,
  ka.usename AS blocking_user,
  a.query AS blocked_query,
  ka.query AS blocking_query,
  a.query_start AS blocked_query_start,
  now() - a.query_start AS wait_duration,
  kl.mode AS blocking_mode,
  bl.locktype AS lock_type,
  bl.relation::regclass AS relation,
  bl.page AS page,
  bl.tuple AS tuple
FROM pg_locks bl
JOIN pg_stat_activity a ON bl.pid = a.pid
JOIN pg_locks kl ON bl.transactionid = kl.transactionid AND bl.pid != kl.pid
JOIN pg_stat_activity ka ON kl.pid = ka.pid
WHERE NOT bl.granted;

-- 查看所有锁信息
SELECT 
  pid,
  usename,
  locktype,
  database::regclass,
  relation::regclass,
  mode,
  granted,
  page,
  tuple,
  transactionid,
  virtualxid,
  virtualtransaction,
  pid::int8 AS process_id
FROM pg_locks
ORDER BY pid;

-- 查看当前活动事务和锁信息
SELECT 
  pid,
  usename,
  datname,
  state,
  query,
  query_start,
  xact_start,
  now() - xact_start AS xact_duration,
  wait_event_type,
  wait_event
FROM pg_stat_activity
WHERE state = 'active' 
ORDER BY xact_duration DESC;

2. 锁等待日志

ini
# 记录锁等待信息
log_lock_waits = on  # 记录等待时间超过deadlock_timeout的锁等待

# 设置锁等待超时时间
deadlock_timeout = 1s  # 默认1秒

# 日志行前缀,包含必要信息
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '

3. 锁等待视图

pg_stat_activity

包含当前活动连接和事务的信息,包括等待事件:

sql
SELECT 
  pid,
  usename,
  datname,
  state,
  wait_event_type,
  wait_event,
  query,
  query_start,
  xact_start,
  now() - xact_start AS xact_duration
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';

pg_locks

包含当前所有锁的信息:

sql
SELECT 
  locktype,
  database::regclass,
  relation::regclass,
  mode,
  granted,
  count(*) AS lock_count
FROM pg_locks
GROUP BY locktype, database, relation, mode, granted
ORDER BY lock_count DESC;

pg_stat_user_tables

包含表的锁等待统计信息:

sql
SELECT 
  schemaname,
  relname,
  n_live_tup,
  n_dead_tup,
  n_mod_since_analyze,
  last_vacuum,
  last_analyze,
  n_locks * 100.0 / NULLIF(n_live_tup, 0) AS lock_ratio
FROM pg_stat_user_tables
ORDER BY lock_ratio DESC
LIMIT 10;

死锁分析

1. 死锁的定义

死锁是指两个或多个事务互相等待对方释放锁,导致所有事务都无法继续执行的状态。

2. 死锁检测

PostgreSQL自动检测死锁,并回滚其中一个事务以打破死锁:

ini
# 死锁检测开关
# deadlock_timeout = 1s  # 默认1秒,超过该时间检测死锁

3. 死锁日志

当检测到死锁时,PostgreSQL会在日志中记录死锁信息:

2023-10-15 14:30:25 UTC [12345]: [1-1] user=postgres,db=mydb,app=psql,client=127.0.0.1 ERROR:  deadlock detected
2023-10-15 14:30:25 UTC [12345]: [2-1] user=postgres,db=mydb,app=psql,client=127.0.0.1 DETAIL:  Process 12345 waits for ShareLock on transaction 65536; blocked by process 12346.
        Process 12346 waits for ShareLock on transaction 65537; blocked by process 12345.
        Process 12345: UPDATE users SET name = 'test1' WHERE id = 1;
        Process 12346: UPDATE users SET name = 'test2' WHERE id = 2;
2023-10-15 14:30:25 UTC [12345]: [3-1] user=postgres,db=mydb,app=psql,client=127.0.0.1 HINT:  See server log for query details.
2023-10-15 14:30:25 UTC [12345]: [4-1] user=postgres,db=mydb,app=psql,client=127.0.0.1 STATEMENT:  UPDATE users SET name = 'test1' WHERE id = 1;

4. 死锁示例

示例1:简单死锁

事务1:

sql
BEGIN;
UPDATE users SET name = 'test1' WHERE id = 1;
-- 等待一段时间
UPDATE users SET name = 'test2' WHERE id = 2;
COMMIT;

事务2:

sql
BEGIN;
UPDATE users SET name = 'test2' WHERE id = 2;
-- 等待一段时间
UPDATE users SET name = 'test1' WHERE id = 1;
COMMIT;

示例2:死锁检测与回滚

当发生死锁时,PostgreSQL会回滚其中一个事务:

ERROR:  deadlock detected
DETAIL:  Process 12345 waits for ShareLock on transaction 65536; blocked by process 12346.
        Process 12346 waits for ShareLock on transaction 65537; blocked by process 12345.
HINT:  See server log for query details.

锁等待分析方法

1. 识别锁等待热点

sql
-- 查看锁等待频繁的表
SELECT 
  relation::regclass AS table_name,
  count(*) AS lock_wait_count
FROM pg_locks
WHERE NOT granted
GROUP BY relation
ORDER BY lock_wait_count DESC;

-- 查看锁等待时间最长的事务
SELECT 
  pid,
  usename,
  datname,
  query,
  query_start,
  now() - query_start AS wait_duration,
  wait_event_type,
  wait_event
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
ORDER BY wait_duration DESC;

-- 查看锁定最多行的事务
SELECT 
  pid,
  usename,
  count(*) AS locked_rows
FROM pg_locks
WHERE locktype = 'tuple' AND granted
GROUP BY pid, usename
ORDER BY locked_rows DESC;

2. 分析锁等待原因

原因1:长事务

长事务持有锁的时间过长,导致其他事务等待:

sql
-- 查看长时间运行的事务
SELECT 
  pid,
  usename,
  datname,
  query,
  query_start,
  xact_start,
  now() - xact_start AS xact_duration
FROM pg_stat_activity
WHERE state = 'active' AND now() - xact_start > interval '5 minutes'
ORDER BY xact_duration DESC;

原因2:锁粒度不当

使用了过粗粒度的锁,导致并发度下降:

sql
-- 查看表级锁
SELECT 
  pid,
  usename,
  relation::regclass AS table_name,
  mode,
  granted,
  query
FROM pg_locks
WHERE locktype = 'relation' AND mode IN ('ACCESS EXCLUSIVE', 'EXCLUSIVE', 'SHARE')
ORDER BY pid;

原因3:索引缺失

缺少合适的索引,导致全表扫描,锁定过多行:

sql
-- 查看全表扫描的查询
SELECT 
  pid,
  usename,
  query,
  query_start,
  now() - query_start AS duration
FROM pg_stat_activity
WHERE state = 'active'
AND query LIKE '%Seq Scan%' -- 可以通过EXPLAIN ANALYZE确认
ORDER BY duration DESC;

原因4:不正确的事务隔离级别

使用了过高的事务隔离级别,导致锁定更多资源:

sql
-- 查看当前事务隔离级别
SHOW default_transaction_isolation;

-- 查看会话的事务隔离级别
SELECT 
  pid,
  usename,
  current_setting('transaction_isolation') AS isolation_level
FROM pg_stat_activity
WHERE state = 'active';

锁等待优化策略

1. 事务优化

  • 缩短事务持有时间:尽快提交或回滚事务
  • 避免在事务中执行非数据库操作:如网络请求、文件IO等
  • 分批处理大量数据:将大事务拆分为多个小事务
  • 使用合适的事务隔离级别:优先使用READ COMMITTED隔离级别
  • 避免长时间锁持有:如SELECT FOR UPDATE应该尽快处理

2. SQL语句优化

  • 创建合适的索引:避免全表扫描,减少锁定的行数
  • 优化查询条件:使用选择性高的条件,减少锁定的行数
  • 避免锁定不必要的资源:如使用SELECT FOR SHARE替代SELECT FOR UPDATE
  • 优化UPDATE和DELETE语句:确保WHERE条件有索引
  • 使用LIMIT:限制锁定的行数

3. 锁粒度优化

  • 使用行级锁替代表级锁:提高并发度
  • 避免表级锁操作:如避免在高并发时执行ALTER TABLE、CREATE INDEX等
  • 使用CREATE INDEX CONCURRENTLY:创建索引时不阻塞读写操作
  • 使用VACUUM ANALYZE替代VACUUM FULL:VACUUM FULL会持有表级锁

4. 并发控制优化

  • 使用乐观锁:通过版本号或时间戳实现并发控制
  • 使用排队机制:在应用层实现对热点资源的排队访问
  • 使用分区表:将大表拆分为多个小表,减少锁竞争
  • 使用读写分离:将读操作分流到从库,减少主库的锁竞争

5. 配置优化

  • 调整deadlock_timeout:根据实际情况调整死锁检测时间
  • 调整lock_timeout:设置事务等待锁的超时时间
  • 调整max_locks_per_transaction:增加每个事务可以持有的锁数量
  • 调整autovacuum相关参数:确保Autovacuum及时运行,减少锁竞争

锁等待最佳实践

1. 生产环境监控

sql
-- 创建锁等待监控视图
CREATE OR REPLACE VIEW v_lock_waits AS
SELECT 
  bl.pid AS blocked_pid,
  a.usename AS blocked_user,
  kl.pid AS blocking_pid,
  ka.usename AS blocking_user,
  a.query AS blocked_query,
  ka.query AS blocking_query,
  a.query_start AS blocked_query_start,
  now() - a.query_start AS wait_duration,
  kl.mode AS blocking_mode,
  bl.locktype AS lock_type,
  bl.relation::regclass AS relation,
  bl.page AS page,
  bl.tuple AS tuple
FROM pg_locks bl
JOIN pg_stat_activity a ON bl.pid = a.pid
JOIN pg_locks kl ON bl.transactionid = kl.transactionid AND bl.pid != kl.pid
JOIN pg_stat_activity ka ON kl.pid = ka.pid
WHERE NOT bl.granted;

-- 配置cron定期检查锁等待
-- 0 * * * * psql -c "SELECT * FROM v_lock_waits" > /tmp/lock_waits.log

2. 锁等待告警

Prometheus + Alertmanager

yaml
# 锁等待告警规则
groups:
- name: postgresql-alerts
  rules:
  - alert: PostgreSQLHighLockWaits
    expr: increase(postgresql_locks_count{granted="false"}[5m]) > 10
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "PostgreSQL锁等待数量过高"
      description: "过去5分钟内,锁等待数量超过10个"

  - alert: PostgreSQLLongLockWait
    expr: postgresql_lock_wait_time_seconds > 30
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: "PostgreSQL长时间锁等待"
      description: "锁等待时间超过30秒"

Zabbix

在Zabbix中配置锁等待监控:

  1. 创建监控项:pg.lock.waits
  2. 配置触发器:当锁等待数量超过阈值时触发告警
  3. 配置通知渠道:Email、SMS等

3. 常见锁问题与解决方案

问题表现解决方案
长事务锁事务持有锁时间过长缩短事务持有时间,分批处理数据
表级锁竞争ALTER TABLE等操作阻塞读写使用CREATE INDEX CONCURRENTLY,选择低峰期执行DDL
行级锁竞争热点行被频繁更新使用乐观锁,拆分热点数据,读写分离
死锁事务互相等待对方释放锁优化事务顺序,使用deadlock_timeout检测,避免循环等待
索引缺失导致全表锁UPDATE/DELETE锁定全表创建合适的索引,优化WHERE条件

锁等待案例分析

案例1:热点行锁竞争

问题描述

电商系统中,商品库存更新导致锁竞争,大量事务等待锁。

分析

sql
-- 查看锁等待情况
SELECT 
  bl.pid AS blocked_pid,
  a.usename AS blocked_user,
  kl.pid AS blocking_pid,
  ka.usename AS blocking_user,
  a.query AS blocked_query,
  ka.query AS blocking_query,
  now() - a.query_start AS wait_duration
FROM pg_locks bl
JOIN pg_stat_activity a ON bl.pid = a.pid
JOIN pg_locks kl ON bl.transactionid = kl.transactionid AND bl.pid != kl.pid
JOIN pg_stat_activity ka ON kl.pid = ka.pid
WHERE NOT bl.granted;

解决方案

  1. 使用乐观锁:通过版本号实现并发控制

    sql
    UPDATE products SET stock = stock - 1, version = version + 1 WHERE id = 1 AND version = $1;
  2. 拆分热点数据:将热点商品拆分为多个记录,分散锁竞争

  3. 使用队列:在应用层实现库存更新队列

  4. 读写分离:将读操作分流到从库

案例2:长事务导致锁等待

问题描述

报表查询事务持有锁时间过长,导致OLTP事务等待。

分析

sql
-- 查看长时间运行的事务
SELECT 
  pid,
  usename,
  datname,
  query,
  xact_start,
  now() - xact_start AS xact_duration
FROM pg_stat_activity
WHERE state = 'active' AND now() - xact_start > interval '10 minutes'
ORDER BY xact_duration DESC;

解决方案

  1. 缩短事务持有时间:将报表查询拆分为多个小事务

  2. 使用READ COMMITTED隔离级别:减少锁持有时间

  3. 使用从库查询:将报表查询分流到从库

  4. 优化查询:创建合适的索引,减少查询时间

版本差异注意事项

PostgreSQL版本差异

版本锁特性差异
PostgreSQL 16增强了锁监控功能,增加了更多锁等待统计信息
PostgreSQL 15改进了锁管理器性能,减少了锁操作的开销
PostgreSQL 14引入了更多锁等待事件类型,便于监控
PostgreSQL 13改进了死锁检测算法,提高了检测效率
PostgreSQL 12引入了并行VACUUM,减少了VACUUM的锁持有时间
PostgreSQL 10引入了乐观锁支持,通过SKIP LOCKED实现

锁等待事件差异

版本锁等待事件
PostgreSQL 14+lock, transactionidlock, relationlock, pagelock, tuplelock等
PostgreSQL 10-13lock, transactionidlock等

总结

锁等待是影响PostgreSQL并发性能的常见问题,通过分析锁等待,可以识别锁竞争热点,优化事务设计和SQL语句,从而提高数据库的并发性能。在实际运维中,DBA应该:

  1. 定期监控锁等待情况,识别锁竞争热点
  2. 优化事务设计,缩短事务持有时间
  3. 创建合适的索引,减少锁定的行数
  4. 使用合适的锁粒度,提高并发度
  5. 配置合理的锁等待超时和死锁检测时间
  6. 实现锁等待告警,及时发现和解决锁问题

通过有效的锁等待分析和优化,可以显著提高PostgreSQL数据库的并发性能,确保数据库在高负载下稳定运行。