PostgreSQL 常见SQL优化案例

案例 1：索引优化 - 避免全表扫描

问题描述

查询用户表中特定年龄段的用户，执行时间过长。

原始 SQL

sql

SELECT * FROM users WHERE age BETWEEN 20 AND 30;

执行计划分析

Seq Scan on users  (cost=0.00..1000.00 rows=10000 width=100)
  Filter: ((age >= 20) AND (age <= 30))

问题：使用了全表扫描（Seq Scan），当表数据量大时性能很差。

优化方案

为 age 字段创建索引。

优化步骤

sql

-- 创建索引
CREATE INDEX idx_users_age ON users(age);

优化后执行计划

Bitmap Heap Scan on users  (cost=10.00..500.00 rows=10000 width=100)
  Recheck Cond: ((age >= 20) AND (age <= 30))
  ->  Bitmap Index Scan on idx_users_age  (cost=0.00..5.00 rows=10000 width=0)
        Index Cond: ((age >= 20) AND (age <= 30))

优化效果

查询执行时间从 1000ms 降低到 50ms
扫描行数从全表 100 万行减少到 1 万行

案例 2：查询结构优化 - 避免 SELECT *

问题描述

查询订单表时使用 SELECT *，返回所有字段，导致网络传输开销大。

原始 SQL

sql

SELECT * FROM orders WHERE order_date >= '2023-01-01';

执行计划分析

Seq Scan on orders  (cost=0.00..2000.00 rows=50000 width=200)
  Filter: (order_date >= '2023-01-01'::date)

问题：

返回所有字段，包括不需要的大字段（如备注、JSON 数据）
无法使用覆盖索引
网络传输数据量大

优化方案

只查询需要的字段，避免 SELECT *。

优化后 SQL

sql

SELECT order_id, customer_id, order_date, total_amount FROM orders WHERE order_date >= '2023-01-01';

进一步优化：创建覆盖索引

sql

CREATE INDEX idx_orders_date_id_customer_amount ON orders(order_date, order_id, customer_id, total_amount);

优化后执行计划

Index Only Scan using idx_orders_date_id_customer_amount on orders  (cost=0.00..1000.00 rows=50000 width=36)
  Index Cond: (order_date >= '2023-01-01'::date)

优化效果

查询执行时间从 1500ms 降低到 200ms
网络传输数据量减少 80%
使用了 Index Only Scan，避免了回表操作

案例 3：子查询优化 - 改用 JOIN

问题描述

使用子查询查询订单数量大于 10 的客户信息，性能较差。

原始 SQL

sql

SELECT * FROM customers 
WHERE customer_id IN (SELECT customer_id FROM orders GROUP BY customer_id HAVING COUNT(*) > 10);

执行计划分析

Hash Join  (cost=2000.00..3000.00 rows=1000 width=150)
  Hash Cond: (customers.customer_id = orders.customer_id)
  ->  Seq Scan on customers  (cost=0.00..500.00 rows=10000 width=150)
  ->  Hash  (cost=1950.00..1950.00 rows=4000 width=4)
        ->  HashAggregate  (cost=1550.00..1950.00 rows=4000 width=4)
              Group Key: orders.customer_id
              Filter: (count(*) > 10)
              ->  Seq Scan on orders  (cost=0.00..1000.00 rows=100000 width=4)

问题：子查询效率低，尤其是当订单表数据量大时。

优化方案

改用 JOIN 连接查询，提高性能。

优化后 SQL

sql

SELECT c.* 
FROM customers c
JOIN (
    SELECT customer_id 
    FROM orders 
    GROUP BY customer_id 
    HAVING COUNT(*) > 10
) o ON c.customer_id = o.customer_id;

进一步优化：使用 INNER JOIN 替代子查询

sql

SELECT c.*, COUNT(o.order_id) as order_count
FROM customers c
INNER JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id
HAVING COUNT(o.order_id) > 10;

优化后执行计划

HashAggregate  (cost=1800.00..2300.00 rows=1000 width=154)
  Group Key: c.customer_id
  Filter: (count(o.order_id) > 10)
  ->  Hash Join  (cost=500.00..1300.00 rows=100000 width=154)
        Hash Cond: (o.customer_id = c.customer_id)
        ->  Seq Scan on orders o  (cost=0.00..800.00 rows=100000 width=4)
        ->  Hash  (cost=300.00..300.00 rows=10000 width=150)
              ->  Seq Scan on customers c  (cost=0.00..300.00 rows=10000 width=150)

优化效果

查询执行时间从 2000ms 降低到 800ms
减少了子查询的开销

案例 4：连接优化 - 合理选择连接顺序

问题描述

多表连接查询时，连接顺序不合理，导致性能下降。

原始 SQL

sql

SELECT *
FROM large_table l
JOIN medium_table m ON l.id = m.large_id
JOIN small_table s ON m.id = s.medium_id
WHERE l.status = 'active';

执行计划分析

Hash Join  (cost=3000.00..5000.00 rows=50000 width=300)
  Hash Cond: (m.id = s.medium_id)
  ->  Hash Join  (cost=2000.00..4000.00 rows=100000 width=200)
        Hash Cond: (l.id = m.large_id)
        ->  Seq Scan on large_table l  (cost=0.00..1000.00 rows=200000 width=100)
              Filter: (status = 'active'::text)
        ->  Hash  (cost=1000.00..1000.00 rows=50000 width=100)
              ->  Seq Scan on medium_table m  (cost=0.00..1000.00 rows=50000 width=100)
  ->  Hash  (cost=500.00..500.00 rows=10000 width=100)
        ->  Seq Scan on small_table s  (cost=0.00..500.00 rows=10000 width=100)

问题：连接顺序不合理，先连接大表再连接小表，导致中间结果集过大。

优化方案

调整连接顺序，先过滤后连接，先连接小表再连接大表。

优化后 SQL

sql

SELECT *
FROM large_table l
WHERE l.status = 'active'
JOIN medium_table m ON l.id = m.large_id
JOIN small_table s ON m.id = s.medium_id;

进一步优化：为连接字段创建索引

sql

CREATE INDEX idx_large_table_status ON large_table(status);
CREATE INDEX idx_medium_table_large_id ON medium_table(large_id);
CREATE INDEX idx_small_table_medium_id ON small_table(medium_id);

优化后执行计划

Nested Loop  (cost=0.00..2500.00 rows=50000 width=300)
  ->  Nested Loop  (cost=0.00..1500.00 rows=100000 width=200)
        ->  Index Scan using idx_large_table_status on large_table l  (cost=0.00..500.00 rows=200000 width=100)
              Index Cond: (status = 'active'::text)
        ->  Index Scan using idx_medium_table_large_id on medium_table m  (cost=0.00..5.00 rows=1 width=100)
              Index Cond: (large_id = l.id)
  ->  Index Scan using idx_small_table_medium_id on small_table s  (cost=0.00..5.00 rows=1 width=100)
        Index Cond: (medium_id = m.id)

优化效果

查询执行时间从 4500ms 降低到 1500ms
连接顺序更合理，中间结果集更小
使用了索引扫描，避免了全表扫描

案例 5：聚合函数优化 - 使用过滤索引

问题描述

使用 COUNT(*) 查询满足条件的记录数，性能较差。

原始 SQL

sql

SELECT COUNT(*) FROM products WHERE status = 'active' AND category_id = 10;

执行计划分析

Aggregate  (cost=1000.00..1000.01 rows=1 width=8)
  ->  Seq Scan on products  (cost=0.00..900.00 rows=40000 width=0)
        Filter: ((status = 'active'::text) AND (category_id = 10))

问题：使用全表扫描，当表数据量大时性能很差。

优化方案

创建过滤索引，只包含满足条件的记录。

优化步骤

sql

-- 创建过滤索引
CREATE INDEX idx_products_active_category ON products(category_id) WHERE status = 'active';

优化后执行计划

Aggregate  (cost=500.00..500.01 rows=1 width=8)
  ->  Index Only Scan using idx_products_active_category on products  (cost=0.00..400.00 rows=40000 width=0)
        Index Cond: (category_id = 10)

优化效果

查询执行时间从 800ms 降低到 100ms
使用了 Index Only Scan，避免了回表操作
索引体积更小，查询速度更快

案例 6：排序优化 - 避免不必要的排序

问题描述

查询结果中包含不必要的排序，导致性能下降。

原始 SQL

sql

SELECT * FROM orders WHERE customer_id = 123 ORDER BY order_date DESC, order_id DESC;

执行计划分析

Sort  (cost=1000.00..1050.00 rows=2000 width=200)
  Sort Key: order_date DESC, order_id DESC
  ->  Seq Scan on orders  (cost=0.00..500.00 rows=2000 width=200)
        Filter: (customer_id = 123)

问题：使用了额外的 Sort 操作，增加了查询开销。

优化方案

创建包含排序字段的复合索引，避免额外的排序操作。

优化步骤

sql

-- 创建复合索引，包含过滤字段和排序字段
CREATE INDEX idx_orders_customer_date_id ON orders(customer_id, order_date DESC, order_id DESC);

优化后执行计划

Index Scan using idx_orders_customer_date_id on orders  (cost=0.00..300.00 rows=2000 width=200)
  Index Cond: (customer_id = 123)

优化效果

查询执行时间从 600ms 降低到 150ms
避免了额外的 Sort 操作
使用了索引扫描，性能大幅提升

案例 7：分区表优化 - 利用分区裁剪

问题描述

在大表上查询特定时间范围的数据，性能较差。

原始 SQL

sql

SELECT * FROM sales WHERE sale_date BETWEEN '2023-01-01' AND '2023-03-31';

执行计划分析

Seq Scan on sales  (cost=0.00..5000.00 rows=100000 width=150)
  Filter: ((sale_date >= '2023-01-01'::date) AND (sale_date <= '2023-03-31'::date))

问题：全表扫描，扫描了所有分区的数据。

优化方案

使用分区表，并利用分区裁剪功能。

优化步骤

sql

-- 创建分区表
CREATE TABLE sales (
    sale_id serial, 
    sale_date date, 
    amount numeric
)
PARTITION BY RANGE (sale_date);

-- 创建分区
CREATE TABLE sales_2023_q1 PARTITION OF sales FOR VALUES FROM ('2023-01-01') TO ('2023-04-01');
CREATE TABLE sales_2023_q2 PARTITION OF sales FOR VALUES FROM ('2023-04-01') TO ('2023-07-01');
CREATE TABLE sales_2023_q3 PARTITION OF sales FOR VALUES FROM ('2023-07-01') TO ('2023-10-01');
CREATE TABLE sales_2023_q4 PARTITION OF sales FOR VALUES FROM ('2023-10-01') TO ('2024-01-01');

-- 为每个分区创建索引
CREATE INDEX idx_sales_2023_q1_date ON sales_2023_q1(sale_date);
CREATE INDEX idx_sales_2023_q2_date ON sales_2023_q2(sale_date);
CREATE INDEX idx_sales_2023_q3_date ON sales_2023_q3(sale_date);
CREATE INDEX idx_sales_2023_q4_date ON sales_2023_q4(sale_date);

优化后执行计划

Seq Scan on sales_2023_q1  (cost=0.00..1000.00 rows=100000 width=150)
  Filter: ((sale_date >= '2023-01-01'::date) AND (sale_date <= '2023-03-31'::date))

优化效果

查询执行时间从 4000ms 降低到 800ms
只扫描了一个分区，而不是全表
利用了分区裁剪功能，提高了查询效率

案例 8：避免使用函数在索引字段上

问题描述

在索引字段上使用函数，导致索引失效。

原始 SQL

sql

SELECT * FROM users WHERE DATE(created_at) = '2023-01-01';

执行计划分析

Seq Scan on users  (cost=0.00..1000.00 rows=1000 width=150)
  Filter: (date(created_at) = '2023-01-01'::date)

问题：在索引字段 created_at 上使用了 DATE() 函数，导致索引失效，使用了全表扫描。

优化方案

避免在索引字段上使用函数，改用范围查询。

优化后 SQL

sql

SELECT * FROM users WHERE created_at >= '2023-01-01' AND created_at < '2023-01-02';

优化前索引

sql

CREATE INDEX idx_users_created_at ON users(created_at);

优化后执行计划

Index Scan using idx_users_created_at on users  (cost=0.00..500.00 rows=1000 width=150)
  Index Cond: ((created_at >= '2023-01-01'::timestamp without time zone) AND (created_at < '2023-01-02'::timestamp without time zone))

优化效果

查询执行时间从 900ms 降低到 100ms
索引正常使用，避免了全表扫描

案例 9：合理使用临时表

问题描述

复杂查询中多次使用相同的子查询结果，导致重复计算。

原始 SQL

sql

SELECT 
    (SELECT COUNT(*) FROM orders WHERE customer_id = c.customer_id AND status = 'completed') as completed_orders,
    (SELECT SUM(amount) FROM orders WHERE customer_id = c.customer_id AND status = 'completed') as total_amount,
    (SELECT MAX(order_date) FROM orders WHERE customer_id = c.customer_id AND status = 'completed') as last_order_date
FROM customers c WHERE c.status = 'active';

执行计划分析

Seq Scan on customers c  (cost=0.00..15000.00 rows=10000 width=162)
  Filter: (status = 'active'::text)
  SubPlan 1
    ->  Aggregate  (cost=1.00..1.01 rows=1 width=8)
          ->  Seq Scan on orders  (cost=0.00..1.00 rows=1 width=0)
                Filter: ((customer_id = c.customer_id) AND (status = 'completed'::text))
  SubPlan 2
    ->  Aggregate  (cost=1.00..1.01 rows=1 width=32)
          ->  Seq Scan on orders orders_1  (cost=0.00..1.00 rows=1 width=16)
                Filter: ((customer_id = c.customer_id) AND (status = 'completed'::text))
  SubPlan 3
    ->  Result  (cost=1.00..1.01 rows=1 width=4)
          InitPlan 3 (returns $3)
            ->  Limit  (cost=0.00..1.00 rows=1 width=8)
                  ->  Sort  (cost=0.00..1.00 rows=1 width=8)
                        Sort Key: orders_2.order_date DESC
                        ->  Seq Scan on orders orders_2  (cost=0.00..1.00 rows=1 width=8)
                              Filter: ((customer_id = c.customer_id) AND (status = 'completed'::text))

问题：多次执行相同的子查询，导致重复计算，性能较差。

优化方案

使用临时表存储子查询结果，避免重复计算。

优化后 SQL

sql

-- 创建临时表
CREATE TEMP TABLE temp_order_stats AS
SELECT 
    customer_id,
    COUNT(*) as completed_orders,
    SUM(amount) as total_amount,
    MAX(order_date) as last_order_date
FROM orders 
WHERE status = 'completed'
GROUP BY customer_id;

-- 查询结果
SELECT 
    t.completed_orders,
    t.total_amount,
    t.last_order_date
FROM customers c
LEFT JOIN temp_order_stats t ON c.customer_id = t.customer_id
WHERE c.status = 'active';

-- 删除临时表
DROP TABLE temp_order_stats;

优化后执行计划

Hash Left Join  (cost=2000.00..3000.00 rows=10000 width=162)
  Hash Cond: (c.customer_id = t.customer_id)
  ->  Seq Scan on customers c  (cost=0.00..500.00 rows=10000 width=4)
        Filter: (status = 'active'::text)
  ->  Hash  (cost=1500.00..1500.00 rows=40000 width=158)
        ->  Seq Scan on temp_order_stats t  (cost=0.00..1500.00 rows=40000 width=158)

优化效果

查询执行时间从 12000ms 降低到 2500ms
避免了重复计算，只执行了一次子查询
使用了临时表，提高了查询效率

案例 10：优化 UPDATE/DELETE 语句

问题描述

批量更新或删除大量数据时，性能较差，甚至导致锁等待。

原始 SQL

sql

-- 删除一年前的订单
DELETE FROM orders WHERE order_date < NOW() - INTERVAL '1 year';

执行计划分析

Delete on orders  (cost=0.00..2000.00 rows=100000 width=6)
  ->  Seq Scan on orders  (cost=0.00..2000.00 rows=100000 width=6)
        Filter: (order_date < (now() - '1 year'::interval))

问题：一次性删除大量数据，导致长时间锁表，影响其他操作。

优化方案

分批删除，减少锁持有时间。

优化后 SQL

sql

-- 分批删除，每次删除 1000 条
DO $$
DECLARE
    deleted_rows INT := 1;
BEGIN
    WHILE deleted_rows > 0 LOOP
        DELETE FROM orders 
        WHERE order_date < NOW() - INTERVAL '1 year'
        LIMIT 1000;
        
        GET DIAGNOSTICS deleted_rows = ROW_COUNT;
        
        -- 可选：每次删除后暂停一段时间，减少系统负载
        -- PERFORM pg_sleep(0.1);
    END LOOP;
END $$;

优化效果

避免了长时间锁表
减少了系统负载
允许其他操作并行执行

常见问题（FAQ）

Q1: 如何判断 SQL 是否需要优化？

A1: 可以通过以下方式判断：

执行时间过长的查询
全表扫描的查询（特别是大表）
包含复杂子查询或多表连接的查询
频繁执行的查询
占用大量资源的查询

Q2: 如何查看 SQL 的执行计划？

A2: 可以使用 EXPLAIN 或 EXPLAIN ANALYZE 命令查看执行计划：

sql

-- 查看预估执行计划
EXPLAIN SELECT * FROM users WHERE age BETWEEN 20 AND 30;

-- 查看实际执行计划（会执行查询）
EXPLAIN ANALYZE SELECT * FROM users WHERE age BETWEEN 20 AND 30;

Q3: 为什么在索引字段上使用函数会导致索引失效？

A3: 因为数据库无法直接使用索引来查找经过函数处理后的值，需要先计算所有行的函数结果，然后再过滤，这会导致全表扫描。

Q4: 如何选择合适的索引类型？

A4: 索引类型选择建议：

B-tree：默认索引类型，适合大多数场景，包括等值查询、范围查询、排序
Hash：适合等值查询，不支持范围查询
GiST：适合空间数据、全文搜索、数组等复杂类型
GIN：适合数组、JSONB 等复合类型
SP-GiST：适合非平衡数据结构，如四叉树
BRIN：适合非常大的表，尤其是按顺序存储的数据

Q5: 如何优化慢查询？

A5: 慢查询优化步骤：

查看执行计划，识别瓶颈
优化查询结构，避免不必要的复杂操作
为查询条件和连接字段创建合适的索引
避免在索引字段上使用函数
合理使用临时表和CTE
考虑使用分区表
优化数据库参数配置

Q6: 如何监控慢查询？

A6: 可以通过以下方式监控慢查询：

启用慢查询日志：

sql

ALTER SYSTEM SET log_min_duration_statement = 5000; -- 记录执行时间超过 5 秒的查询

使用 pg_stat_statements 扩展：

sql

CREATE EXTENSION pg_stat_statements;
SELECT * FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10;

使用第三方工具，如 pgBadger、Prometheus + Grafana

Q7: 如何优化多表连接查询？

A7: 多表连接查询优化建议：

合理选择连接顺序，先连接小表
为连接字段创建索引
避免不必要的连接
考虑使用物化视图
优化连接类型（INNER JOIN、LEFT JOIN 等）

Q8: 如何优化聚合查询？

A8: 聚合查询优化建议：

为聚合字段创建索引
使用过滤索引，只包含需要的数据
避免在聚合函数中使用复杂表达式
考虑使用物化视图
合理使用 GROUP BY 和 HAVING 子句

Q9: 如何优化 ORDER BY 查询？

A9: ORDER BY 查询优化建议：

为排序字段创建索引
避免不必要的排序
合理使用索引的排序顺序
考虑使用 LIMIT 限制结果集大小
避免在排序字段上使用函数

Q10: 如何优化 UPDATE/DELETE 语句？

A10: UPDATE/DELETE 语句优化建议：

分批处理大量数据
为 WHERE 条件字段创建索引
避免全表更新或删除
考虑使用 DELETE ... USING 或 UPDATE ... FROM 语法
合理设置锁级别

PostgreSQL 常见SQL优化案例 ​

案例 1：索引优化 - 避免全表扫描 ​

问题描述 ​

原始 SQL ​

执行计划分析 ​

优化方案 ​

优化步骤 ​

优化后执行计划 ​

优化效果 ​

案例 2：查询结构优化 - 避免 SELECT * ​

问题描述 ​

原始 SQL ​

执行计划分析 ​

优化方案 ​

优化后 SQL ​

进一步优化：创建覆盖索引 ​

优化后执行计划 ​

优化效果 ​

案例 3：子查询优化 - 改用 JOIN ​

问题描述 ​

原始 SQL ​

执行计划分析 ​

优化方案 ​

优化后 SQL ​

进一步优化：使用 INNER JOIN 替代子查询 ​

优化后执行计划 ​

优化效果 ​

案例 4：连接优化 - 合理选择连接顺序 ​

问题描述 ​

原始 SQL ​

执行计划分析 ​

优化方案 ​

优化后 SQL ​

进一步优化：为连接字段创建索引 ​

优化后执行计划 ​

优化效果 ​

案例 5：聚合函数优化 - 使用过滤索引 ​

问题描述 ​

原始 SQL ​

执行计划分析 ​

优化方案 ​

优化步骤 ​

优化后执行计划 ​

优化效果 ​

案例 6：排序优化 - 避免不必要的排序 ​

问题描述 ​

原始 SQL ​

执行计划分析 ​

优化方案 ​

优化步骤 ​

优化后执行计划 ​

优化效果 ​

PostgreSQL 常见SQL优化案例

案例 1：索引优化 - 避免全表扫描

问题描述

原始 SQL

执行计划分析

优化方案

优化步骤

优化后执行计划

优化效果

案例 2：查询结构优化 - 避免 SELECT *

问题描述

原始 SQL

执行计划分析

优化方案

优化后 SQL

进一步优化：创建覆盖索引

优化后执行计划

优化效果

案例 3：子查询优化 - 改用 JOIN

问题描述

原始 SQL

执行计划分析

优化方案

优化后 SQL

进一步优化：使用 INNER JOIN 替代子查询

优化后执行计划

优化效果

案例 4：连接优化 - 合理选择连接顺序

问题描述

原始 SQL

执行计划分析

优化方案

优化后 SQL

进一步优化：为连接字段创建索引

优化后执行计划

优化效果

案例 5：聚合函数优化 - 使用过滤索引

问题描述

原始 SQL

执行计划分析

优化方案

优化步骤

优化后执行计划

优化效果

案例 6：排序优化 - 避免不必要的排序

问题描述

原始 SQL

执行计划分析

优化方案

优化步骤

优化后执行计划

优化效果