Skip to content

PostgreSQL高级SQL特性

PostgreSQL提供了许多高级SQL特性,这些特性可以帮助你编写更复杂、更高效的查询。本文将介绍PostgreSQL的一些高级SQL特性。

窗口函数

窗口函数是一种特殊的函数,它可以在一组相关行上执行计算,而不影响最终结果的行数。窗口函数在数据分析和报表生成中非常有用。

基本语法

sql
FUNCTION_NAME([argument_list]) OVER (
    [PARTITION BY partition_expression, ...]
    [ORDER BY sort_expression [ASC | DESC], ...]
    [ROWS frame_clause]
)

常用窗口函数

聚合函数作为窗口函数

sql
-- 计算每个产品的销售额以及该产品所属类别的平均销售额
SELECT 
    product_id, 
    category_id, 
    sales_amount, 
    AVG(sales_amount) OVER (PARTITION BY category_id) AS avg_category_sales
FROM sales;

-- 计算累计销售额
SELECT 
    order_date, 
    sales_amount, 
    SUM(sales_amount) OVER (ORDER BY order_date) AS cumulative_sales
FROM daily_sales;

排名函数

sql
-- ROW_NUMBER:为每行分配唯一的序号
SELECT 
    product_id, 
    category_id, 
    price, 
    ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY price DESC) AS row_num
FROM products;

-- RANK:为每行分配排名,相同值排名相同,跳过下一个排名
SELECT 
    product_id, 
    category_id, 
    price, 
    RANK() OVER (PARTITION BY category_id ORDER BY price DESC) AS rank
FROM products;

-- DENSE_RANK:为每行分配排名,相同值排名相同,不跳过下一个排名
SELECT 
    product_id, 
    category_id, 
    price, 
    DENSE_RANK() OVER (PARTITION BY category_id ORDER BY price DESC) AS dense_rank
FROM products;

-- NTILE:将结果集分成指定数量的组
SELECT 
    product_id, 
    category_id, 
    price, 
    NTILE(4) OVER (PARTITION BY category_id ORDER BY price DESC) AS quartile
FROM products;

偏移函数

sql
-- LAG:获取前n行的值
SELECT 
    order_date, 
    sales_amount, 
    LAG(sales_amount) OVER (ORDER BY order_date) AS previous_day_sales,
    LAG(sales_amount, 7) OVER (ORDER BY order_date) AS last_week_sales
FROM daily_sales;

-- LEAD:获取后n行的值
SELECT 
    order_date, 
    sales_amount, 
    LEAD(sales_amount) OVER (ORDER BY order_date) AS next_day_sales
FROM daily_sales;

-- FIRST_VALUE:获取窗口中的第一个值
SELECT 
    product_id, 
    category_id, 
    price, 
    FIRST_VALUE(price) OVER (PARTITION BY category_id ORDER BY price DESC) AS highest_price
FROM products;

-- LAST_VALUE:获取窗口中的最后一个值
SELECT 
    product_id, 
    category_id, 
    price, 
    LAST_VALUE(price) OVER (PARTITION BY category_id ORDER BY price DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lowest_price
FROM products;

窗口定义

可以使用WINDOW子句预先定义窗口,以便在查询中多次使用:

sql
SELECT 
    product_id, 
    category_id, 
    price, 
    ROW_NUMBER() OVER w AS row_num, 
    RANK() OVER w AS rank, 
    DENSE_RANK() OVER w AS dense_rank
FROM products
WINDOW w AS (PARTITION BY category_id ORDER BY price DESC);

通用表表达式(CTE)

通用表表达式(Common Table Expressions,CTE)是一种临时结果集,只在查询执行期间存在。CTE可以使复杂查询更加清晰和易于维护。

基本语法

sql
WITH cte_name (column_list) AS (
    SELECT_statement
)
SELECT * FROM cte_name;

简单CTE

sql
-- 使用CTE查询销售额前10的产品
WITH top_products AS (
    SELECT 
        product_id, 
        SUM(sales_amount) AS total_sales
    FROM sales
    GROUP BY product_id
    ORDER BY total_sales DESC
    LIMIT 10
)
SELECT p.name, tp.total_sales
FROM top_products tp
JOIN products p ON tp.product_id = p.id;

递归CTE

递归CTE用于处理具有层次结构或树状结构的数据,如组织结构、分类目录等。

sql
-- 查询组织结构
WITH RECURSIVE org_structure AS (
    -- 基础查询:获取顶级员工
    SELECT 
        id, 
        name, 
        manager_id, 
        1 AS level
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- 递归查询:获取下级员工
    SELECT 
        e.id, 
        e.name, 
        e.manager_id, 
        os.level + 1 AS level
    FROM employees e
    JOIN org_structure os ON e.manager_id = os.id
)
SELECT * FROM org_structure ORDER BY level, name;

-- 查询分类树
WITH RECURSIVE category_tree AS (
    SELECT 
        id, 
        name, 
        parent_id, 
        name AS path
    FROM categories
    WHERE parent_id IS NULL
    
    UNION ALL
    
    SELECT 
        c.id, 
        c.name, 
        c.parent_id, 
        ct.path || ' > ' || c.name AS path
    FROM categories c
    JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT * FROM category_tree ORDER BY path;

多个CTE

可以在一个查询中定义多个CTE:

sql
WITH 
    monthly_sales AS (
        SELECT 
            DATE_TRUNC('month', order_date) AS month, 
            SUM(total_amount) AS sales
        FROM orders
        GROUP BY DATE_TRUNC('month', order_date)
    ),
    monthly_avg AS (
        SELECT AVG(sales) AS avg_sales FROM monthly_sales
    )
SELECT 
    month, 
    sales, 
    (sales - avg_sales) AS difference
FROM monthly_sales, monthly_avg
ORDER BY month;

递归查询

除了递归CTE外,PostgreSQL还支持使用RECURSIVE关键字进行递归查询。

斐波那契数列

sql
WITH RECURSIVE fibonacci(n, a, b) AS (
    VALUES (1, 0, 1)
    UNION ALL
    SELECT n+1, b, a+b FROM fibonacci WHERE n < 20
)
SELECT n, b AS fib FROM fibonacci;

日期序列生成

sql
WITH RECURSIVE date_series AS (
    SELECT '2023-01-01'::DATE AS date
    UNION ALL
    SELECT date + INTERVAL '1 day' AS date FROM date_series WHERE date < '2023-01-10'
)
SELECT date FROM date_series;

JSON查询

PostgreSQL提供了强大的JSON支持,包括JSON和JSONB两种数据类型。JSONB是二进制格式,支持索引,查询效率更高。

JSONB基本操作

sql
-- 查询JSONB字段
SELECT data->'name' AS name, data->>'age' AS age FROM users;

-- JSONB条件查询
SELECT * FROM users WHERE data->>'age' > '30';
SELECT * FROM users WHERE data @> '{"city": "Beijing"}'::JSONB;
SELECT * FROM users WHERE data -> 'tags' ? 'developer';
SELECT * FROM users WHERE data -> 'tags' ?| array['developer', 'designer'];
SELECT * FROM users WHERE data -> 'tags' ?& array['developer', 'postgres'];

-- JSONB路径查询(PostgreSQL 12+)
SELECT data#>'{address, city}' AS city FROM users;
SELECT data#>>'{address, zipcode}' AS zipcode FROM users;

-- 使用JSON路径表达式
SELECT jsonb_path_query(data, '$.tags[*]') AS tag FROM users;
SELECT jsonb_path_query(data, '$.address.city') AS city FROM users;

JSONB修改

sql
-- 添加键值对
UPDATE users SET data = data || '{"new_field": "value"}'::JSONB WHERE id = 1;

-- 修改键值对
UPDATE users SET data = jsonb_set(data, '{age}', '31'::JSONB) WHERE id = 1;

-- 删除键
UPDATE users SET data = data - 'new_field' WHERE id = 1;

-- 删除数组元素
UPDATE users SET data = data #- '{tags, 0}' WHERE id = 1;

JSONB索引

sql
-- 创建GIN索引
CREATE INDEX users_data_idx ON users USING GIN (data);

-- 创建特定键的索引
CREATE INDEX users_city_idx ON users ((data->>'city'));

-- 创建数组元素的索引
CREATE INDEX users_tags_idx ON users USING GIN ((data->'tags'));

-- 创建JSON路径索引(PostgreSQL 14+)
CREATE INDEX users_age_idx ON users USING GIN (jsonb_path_ops(data));

数组操作

PostgreSQL支持数组类型,可以存储多个值在一个列中。

数组基本操作

sql
-- 数组声明
CREATE TABLE products (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    tags TEXT[]
);

-- 插入数组数据
INSERT INTO products (name, tags) VALUES ('Laptop', '{electronics, computer, portable}');
INSERT INTO products (name, tags) VALUES ('Smartphone', ARRAY['electronics', 'mobile', 'communication']);

-- 访问数组元素
SELECT name, tags[1] AS first_tag, tags[2] AS second_tag FROM products;

-- 数组长度
SELECT name, array_length(tags, 1) AS tag_count FROM products;

-- 数组包含
SELECT * FROM products WHERE 'electronics' = ANY(tags);
SELECT * FROM products WHERE tags @> ARRAY['electronics', 'mobile'];
SELECT * FROM products WHERE tags <@ ARRAY['electronics', 'mobile', 'communication', 'gadget'];

-- 数组连接
SELECT name, tags || ARRAY['new_tag'] AS updated_tags FROM products;

-- 数组追加
SELECT name, array_append(tags, 'new_tag') AS updated_tags FROM products;

-- 数组删除
SELECT name, array_remove(tags, 'mobile') AS updated_tags FROM products;

-- 数组排序
SELECT name, array_sort(tags) AS sorted_tags FROM products;

数组函数

sql
-- 数组聚合
SELECT category_id, array_agg(name) AS product_names FROM products GROUP BY category_id;

-- 数组去重
SELECT array_remove(array_agg(tag), NULL) AS unique_tags FROM (
    SELECT unnest(tags) AS tag FROM products
) AS t;

-- 数组转换为字符串
SELECT name, array_to_string(tags, ', ') AS tag_string FROM products;

-- 字符串转换为数组
SELECT string_to_array('electronics,computer,portable', ',') AS tags;

范围类型

PostgreSQL支持范围类型,可以存储连续的值范围,如日期范围、数值范围等。

范围类型基本操作

sql
-- 范围类型声明
CREATE TABLE events (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    event_period DATERANGE
);

-- 插入范围数据
INSERT INTO events (name, event_period) VALUES ('Conference', '[2023-01-01, 2023-01-03]');
INSERT INTO events (name, event_period) VALUES ('Workshop', '(2023-01-05, 2023-01-07)');

-- 访问范围边界
SELECT name, lower(event_period) AS start_date, upper(event_period) AS end_date FROM events;

-- 范围包含
SELECT * FROM events WHERE event_period @> '2023-01-02'::DATE;
SELECT * FROM events WHERE event_period @> '[2023-01-02, 2023-01-02]'::DATERANGE;

-- 范围重叠
SELECT * FROM events WHERE event_period && '[2023-01-04, 2023-01-06]'::DATERANGE;

-- 范围包含于
SELECT * FROM events WHERE event_period <@ '[2023-01-01, 2023-01-10]'::DATERANGE;

-- 范围相邻
SELECT * FROM events WHERE event_period -|- '[2023-01-03, 2023-01-05]'::DATERANGE;

范围函数

sql
-- 创建范围
SELECT daterange('2023-01-01', '2023-01-03');
SELECT daterange('2023-01-01', '2023-01-03', '[]');

-- 范围长度
SELECT name, upper(event_period) - lower(event_period) AS duration_days FROM events;

-- 范围合并
SELECT daterange_merge(array_agg(event_period)) AS merged_range FROM events;

-- 范围差异
SELECT daterange('2023-01-01', '2023-01-10') - daterange('2023-01-03', '2023-01-07');

表达式索引

表达式索引是基于表达式结果的索引,可以加速包含表达式的查询。

sql
-- 创建表达式索引
CREATE INDEX users_lower_email_idx ON users (LOWER(email));

-- 使用表达式索引的查询
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

-- 创建JSONB表达式索引
CREATE INDEX products_price_idx ON products ((data->>'price')::numeric);

-- 使用JSONB表达式索引的查询
SELECT * FROM products WHERE (data->>'price')::numeric > 100;

部分索引

部分索引是基于表中部分行的索引,可以减少索引大小并提高查询性能。

sql
-- 创建部分索引
CREATE INDEX users_active_email_idx ON users (email) WHERE active = true;

-- 使用部分索引的查询
SELECT * FROM users WHERE email = 'user@example.com' AND active = true;

-- 创建基于条件的部分索引
CREATE INDEX orders_recent_idx ON orders (order_date) WHERE order_date > CURRENT_DATE - INTERVAL '3 months';

全文搜索

PostgreSQL提供了内置的全文搜索功能,可以高效地搜索文本数据。

基本全文搜索

sql
-- 创建全文搜索索引
CREATE INDEX articles_content_idx ON articles USING GIN (to_tsvector('english', content));

-- 全文搜索查询
SELECT title, ts_rank(to_tsvector('english', content), plainto_tsquery('english', 'PostgreSQL tutorial')) AS rank
FROM articles
WHERE to_tsvector('english', content) @@ plainto_tsquery('english', 'PostgreSQL tutorial')
ORDER BY rank DESC;

-- 使用配置的全文搜索
SELECT title, ts_rank_cd(to_tsvector('english', content), to_tsquery('english', 'PostgreSQL & tutorial')) AS rank
FROM articles
WHERE to_tsvector('english', content) @@ to_tsquery('english', 'PostgreSQL & tutorial')
ORDER BY rank DESC;

全文搜索配置

sql
-- 创建自定义全文搜索配置
CREATE TEXT SEARCH CONFIGURATION chinese (COPY = pg_catalog.simple);

-- 添加中文词典
ALTER TEXT SEARCH CONFIGURATION chinese ADD MAPPING FOR hword, hword_part, word WITH pg_catalog.simple;

-- 使用中文全文搜索
SELECT title, ts_rank(to_tsvector('chinese', content), plainto_tsquery('chinese', '数据库教程')) AS rank
FROM articles
WHERE to_tsvector('chinese', content) @@ plainto_tsquery('chinese', '数据库教程')
ORDER BY rank DESC;

事务控制高级特性

保存点

保存点允许在事务中创建标记,以便在需要时回滚到特定点,而不是回滚整个事务。

sql
BEGIN;

-- 执行一些操作
INSERT INTO users (username, email) VALUES ('user1', 'user1@example.com');

-- 创建保存点
SAVEPOINT my_savepoint;

-- 执行更多操作
INSERT INTO users (username, email) VALUES ('user2', 'user2@example.com');

-- 回滚到保存点
ROLLBACK TO my_savepoint;

-- 提交事务,只保存第一个INSERT
COMMIT;

事务隔离级别

PostgreSQL支持四种事务隔离级别:

sql
-- 设置事务隔离级别
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- 执行查询和修改
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

COMMIT;

常见表表达式(CTE)与递归查询组合

sql
-- 使用CTE和递归查询生成日期维度表
WITH RECURSIVE date_dimension AS (
    SELECT 
        '2020-01-01'::DATE AS date
    UNION ALL
    SELECT 
        date + INTERVAL '1 day' AS date
    FROM date_dimension
    WHERE date < '2023-12-31'
)
SELECT 
    date,
    EXTRACT(YEAR FROM date) AS year,
    EXTRACT(MONTH FROM date) AS month,
    EXTRACT(DAY FROM date) AS day,
    EXTRACT(DOW FROM date) AS day_of_week,
    TO_CHAR(date, 'Day') AS day_name,
    EXTRACT(WEEK FROM date) AS week_of_year,
    CASE WHEN EXTRACT(DOW FROM date) IN (0, 6) THEN TRUE ELSE FALSE END AS is_weekend
FROM date_dimension
ORDER BY date;

最佳实践

  1. 使用窗口函数替代子查询:窗口函数通常比子查询更高效、更清晰
  2. 合理使用CTE:CTE可以使复杂查询更易于理解和维护
  3. 为JSONB字段创建适当的索引:提高JSON查询性能
  4. 使用部分索引优化特定查询:减少索引大小并提高查询性能
  5. 考虑使用全文搜索替代LIKE查询:对于复杂文本搜索,全文搜索更高效
  6. 使用EXPLAIN分析查询计划:确保查询使用了适当的索引和优化
  7. 避免在事务中执行长时间运行的操作:减少锁持有时间,提高并发性能

常见问题(FAQ)

窗口函数和聚合函数有什么区别?

  • 聚合函数会将多行数据合并为一行
  • 窗口函数不会减少结果集的行数,而是为每行添加一个计算列

什么时候应该使用CTE?

  • 当查询包含多个子查询时
  • 当查询需要递归时
  • 当需要提高查询的可读性和可维护性时

JSON和JSONB有什么区别?

  • JSON存储原始JSON文本,保留空格和顺序
  • JSONB存储二进制格式,更高效的存储和查询,不保留空格和顺序
  • 推荐使用JSONB

什么时候应该使用数组类型?

  • 当需要存储多个相关值时
  • 当这些值的顺序不重要时
  • 当这些值的数量不固定时

全文搜索和LIKE查询有什么区别?

  • LIKE查询适合简单的字符串匹配
  • 全文搜索适合复杂的文本搜索,支持关键词、短语、权重等
  • 全文搜索通常比LIKE查询更高效

总结

PostgreSQL提供了丰富的高级SQL特性,这些特性可以帮助你编写更复杂、更高效的查询。本文介绍了窗口函数、递归查询、通用表表达式、JSON查询、数组操作、范围类型、表达式索引、部分索引和全文搜索等高级功能。

掌握这些高级SQL特性可以提高你的查询编写能力,使你能够处理更复杂的数据查询和分析任务。建议在实际项目中根据具体需求选择合适的特性,并使用EXPLAIN分析查询计划,确保查询性能最优。