外观
PostgreSQL高级SQL特性
PostgreSQL提供了许多高级SQL特性,这些特性可以帮助你编写更复杂、更高效的查询。本文将介绍PostgreSQL的一些高级SQL特性。
窗口函数
窗口函数是一种特殊的函数,它可以在一组相关行上执行计算,而不影响最终结果的行数。窗口函数在数据分析和报表生成中非常有用。
基本语法
sql
FUNCTION_NAME([argument_list]) OVER (
[PARTITION BY partition_expression, ...]
[ORDER BY sort_expression [ASC | DESC], ...]
[ROWS frame_clause]
)常用窗口函数
聚合函数作为窗口函数
sql
-- 计算每个产品的销售额以及该产品所属类别的平均销售额
SELECT
product_id,
category_id,
sales_amount,
AVG(sales_amount) OVER (PARTITION BY category_id) AS avg_category_sales
FROM sales;
-- 计算累计销售额
SELECT
order_date,
sales_amount,
SUM(sales_amount) OVER (ORDER BY order_date) AS cumulative_sales
FROM daily_sales;排名函数
sql
-- ROW_NUMBER:为每行分配唯一的序号
SELECT
product_id,
category_id,
price,
ROW_NUMBER() OVER (PARTITION BY category_id ORDER BY price DESC) AS row_num
FROM products;
-- RANK:为每行分配排名,相同值排名相同,跳过下一个排名
SELECT
product_id,
category_id,
price,
RANK() OVER (PARTITION BY category_id ORDER BY price DESC) AS rank
FROM products;
-- DENSE_RANK:为每行分配排名,相同值排名相同,不跳过下一个排名
SELECT
product_id,
category_id,
price,
DENSE_RANK() OVER (PARTITION BY category_id ORDER BY price DESC) AS dense_rank
FROM products;
-- NTILE:将结果集分成指定数量的组
SELECT
product_id,
category_id,
price,
NTILE(4) OVER (PARTITION BY category_id ORDER BY price DESC) AS quartile
FROM products;偏移函数
sql
-- LAG:获取前n行的值
SELECT
order_date,
sales_amount,
LAG(sales_amount) OVER (ORDER BY order_date) AS previous_day_sales,
LAG(sales_amount, 7) OVER (ORDER BY order_date) AS last_week_sales
FROM daily_sales;
-- LEAD:获取后n行的值
SELECT
order_date,
sales_amount,
LEAD(sales_amount) OVER (ORDER BY order_date) AS next_day_sales
FROM daily_sales;
-- FIRST_VALUE:获取窗口中的第一个值
SELECT
product_id,
category_id,
price,
FIRST_VALUE(price) OVER (PARTITION BY category_id ORDER BY price DESC) AS highest_price
FROM products;
-- LAST_VALUE:获取窗口中的最后一个值
SELECT
product_id,
category_id,
price,
LAST_VALUE(price) OVER (PARTITION BY category_id ORDER BY price DESC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lowest_price
FROM products;窗口定义
可以使用WINDOW子句预先定义窗口,以便在查询中多次使用:
sql
SELECT
product_id,
category_id,
price,
ROW_NUMBER() OVER w AS row_num,
RANK() OVER w AS rank,
DENSE_RANK() OVER w AS dense_rank
FROM products
WINDOW w AS (PARTITION BY category_id ORDER BY price DESC);通用表表达式(CTE)
通用表表达式(Common Table Expressions,CTE)是一种临时结果集,只在查询执行期间存在。CTE可以使复杂查询更加清晰和易于维护。
基本语法
sql
WITH cte_name (column_list) AS (
SELECT_statement
)
SELECT * FROM cte_name;简单CTE
sql
-- 使用CTE查询销售额前10的产品
WITH top_products AS (
SELECT
product_id,
SUM(sales_amount) AS total_sales
FROM sales
GROUP BY product_id
ORDER BY total_sales DESC
LIMIT 10
)
SELECT p.name, tp.total_sales
FROM top_products tp
JOIN products p ON tp.product_id = p.id;递归CTE
递归CTE用于处理具有层次结构或树状结构的数据,如组织结构、分类目录等。
sql
-- 查询组织结构
WITH RECURSIVE org_structure AS (
-- 基础查询:获取顶级员工
SELECT
id,
name,
manager_id,
1 AS level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- 递归查询:获取下级员工
SELECT
e.id,
e.name,
e.manager_id,
os.level + 1 AS level
FROM employees e
JOIN org_structure os ON e.manager_id = os.id
)
SELECT * FROM org_structure ORDER BY level, name;
-- 查询分类树
WITH RECURSIVE category_tree AS (
SELECT
id,
name,
parent_id,
name AS path
FROM categories
WHERE parent_id IS NULL
UNION ALL
SELECT
c.id,
c.name,
c.parent_id,
ct.path || ' > ' || c.name AS path
FROM categories c
JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT * FROM category_tree ORDER BY path;多个CTE
可以在一个查询中定义多个CTE:
sql
WITH
monthly_sales AS (
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(total_amount) AS sales
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
),
monthly_avg AS (
SELECT AVG(sales) AS avg_sales FROM monthly_sales
)
SELECT
month,
sales,
(sales - avg_sales) AS difference
FROM monthly_sales, monthly_avg
ORDER BY month;递归查询
除了递归CTE外,PostgreSQL还支持使用RECURSIVE关键字进行递归查询。
斐波那契数列
sql
WITH RECURSIVE fibonacci(n, a, b) AS (
VALUES (1, 0, 1)
UNION ALL
SELECT n+1, b, a+b FROM fibonacci WHERE n < 20
)
SELECT n, b AS fib FROM fibonacci;日期序列生成
sql
WITH RECURSIVE date_series AS (
SELECT '2023-01-01'::DATE AS date
UNION ALL
SELECT date + INTERVAL '1 day' AS date FROM date_series WHERE date < '2023-01-10'
)
SELECT date FROM date_series;JSON查询
PostgreSQL提供了强大的JSON支持,包括JSON和JSONB两种数据类型。JSONB是二进制格式,支持索引,查询效率更高。
JSONB基本操作
sql
-- 查询JSONB字段
SELECT data->'name' AS name, data->>'age' AS age FROM users;
-- JSONB条件查询
SELECT * FROM users WHERE data->>'age' > '30';
SELECT * FROM users WHERE data @> '{"city": "Beijing"}'::JSONB;
SELECT * FROM users WHERE data -> 'tags' ? 'developer';
SELECT * FROM users WHERE data -> 'tags' ?| array['developer', 'designer'];
SELECT * FROM users WHERE data -> 'tags' ?& array['developer', 'postgres'];
-- JSONB路径查询(PostgreSQL 12+)
SELECT data#>'{address, city}' AS city FROM users;
SELECT data#>>'{address, zipcode}' AS zipcode FROM users;
-- 使用JSON路径表达式
SELECT jsonb_path_query(data, '$.tags[*]') AS tag FROM users;
SELECT jsonb_path_query(data, '$.address.city') AS city FROM users;JSONB修改
sql
-- 添加键值对
UPDATE users SET data = data || '{"new_field": "value"}'::JSONB WHERE id = 1;
-- 修改键值对
UPDATE users SET data = jsonb_set(data, '{age}', '31'::JSONB) WHERE id = 1;
-- 删除键
UPDATE users SET data = data - 'new_field' WHERE id = 1;
-- 删除数组元素
UPDATE users SET data = data #- '{tags, 0}' WHERE id = 1;JSONB索引
sql
-- 创建GIN索引
CREATE INDEX users_data_idx ON users USING GIN (data);
-- 创建特定键的索引
CREATE INDEX users_city_idx ON users ((data->>'city'));
-- 创建数组元素的索引
CREATE INDEX users_tags_idx ON users USING GIN ((data->'tags'));
-- 创建JSON路径索引(PostgreSQL 14+)
CREATE INDEX users_age_idx ON users USING GIN (jsonb_path_ops(data));数组操作
PostgreSQL支持数组类型,可以存储多个值在一个列中。
数组基本操作
sql
-- 数组声明
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
tags TEXT[]
);
-- 插入数组数据
INSERT INTO products (name, tags) VALUES ('Laptop', '{electronics, computer, portable}');
INSERT INTO products (name, tags) VALUES ('Smartphone', ARRAY['electronics', 'mobile', 'communication']);
-- 访问数组元素
SELECT name, tags[1] AS first_tag, tags[2] AS second_tag FROM products;
-- 数组长度
SELECT name, array_length(tags, 1) AS tag_count FROM products;
-- 数组包含
SELECT * FROM products WHERE 'electronics' = ANY(tags);
SELECT * FROM products WHERE tags @> ARRAY['electronics', 'mobile'];
SELECT * FROM products WHERE tags <@ ARRAY['electronics', 'mobile', 'communication', 'gadget'];
-- 数组连接
SELECT name, tags || ARRAY['new_tag'] AS updated_tags FROM products;
-- 数组追加
SELECT name, array_append(tags, 'new_tag') AS updated_tags FROM products;
-- 数组删除
SELECT name, array_remove(tags, 'mobile') AS updated_tags FROM products;
-- 数组排序
SELECT name, array_sort(tags) AS sorted_tags FROM products;数组函数
sql
-- 数组聚合
SELECT category_id, array_agg(name) AS product_names FROM products GROUP BY category_id;
-- 数组去重
SELECT array_remove(array_agg(tag), NULL) AS unique_tags FROM (
SELECT unnest(tags) AS tag FROM products
) AS t;
-- 数组转换为字符串
SELECT name, array_to_string(tags, ', ') AS tag_string FROM products;
-- 字符串转换为数组
SELECT string_to_array('electronics,computer,portable', ',') AS tags;范围类型
PostgreSQL支持范围类型,可以存储连续的值范围,如日期范围、数值范围等。
范围类型基本操作
sql
-- 范围类型声明
CREATE TABLE events (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
event_period DATERANGE
);
-- 插入范围数据
INSERT INTO events (name, event_period) VALUES ('Conference', '[2023-01-01, 2023-01-03]');
INSERT INTO events (name, event_period) VALUES ('Workshop', '(2023-01-05, 2023-01-07)');
-- 访问范围边界
SELECT name, lower(event_period) AS start_date, upper(event_period) AS end_date FROM events;
-- 范围包含
SELECT * FROM events WHERE event_period @> '2023-01-02'::DATE;
SELECT * FROM events WHERE event_period @> '[2023-01-02, 2023-01-02]'::DATERANGE;
-- 范围重叠
SELECT * FROM events WHERE event_period && '[2023-01-04, 2023-01-06]'::DATERANGE;
-- 范围包含于
SELECT * FROM events WHERE event_period <@ '[2023-01-01, 2023-01-10]'::DATERANGE;
-- 范围相邻
SELECT * FROM events WHERE event_period -|- '[2023-01-03, 2023-01-05]'::DATERANGE;范围函数
sql
-- 创建范围
SELECT daterange('2023-01-01', '2023-01-03');
SELECT daterange('2023-01-01', '2023-01-03', '[]');
-- 范围长度
SELECT name, upper(event_period) - lower(event_period) AS duration_days FROM events;
-- 范围合并
SELECT daterange_merge(array_agg(event_period)) AS merged_range FROM events;
-- 范围差异
SELECT daterange('2023-01-01', '2023-01-10') - daterange('2023-01-03', '2023-01-07');表达式索引
表达式索引是基于表达式结果的索引,可以加速包含表达式的查询。
sql
-- 创建表达式索引
CREATE INDEX users_lower_email_idx ON users (LOWER(email));
-- 使用表达式索引的查询
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';
-- 创建JSONB表达式索引
CREATE INDEX products_price_idx ON products ((data->>'price')::numeric);
-- 使用JSONB表达式索引的查询
SELECT * FROM products WHERE (data->>'price')::numeric > 100;部分索引
部分索引是基于表中部分行的索引,可以减少索引大小并提高查询性能。
sql
-- 创建部分索引
CREATE INDEX users_active_email_idx ON users (email) WHERE active = true;
-- 使用部分索引的查询
SELECT * FROM users WHERE email = 'user@example.com' AND active = true;
-- 创建基于条件的部分索引
CREATE INDEX orders_recent_idx ON orders (order_date) WHERE order_date > CURRENT_DATE - INTERVAL '3 months';全文搜索
PostgreSQL提供了内置的全文搜索功能,可以高效地搜索文本数据。
基本全文搜索
sql
-- 创建全文搜索索引
CREATE INDEX articles_content_idx ON articles USING GIN (to_tsvector('english', content));
-- 全文搜索查询
SELECT title, ts_rank(to_tsvector('english', content), plainto_tsquery('english', 'PostgreSQL tutorial')) AS rank
FROM articles
WHERE to_tsvector('english', content) @@ plainto_tsquery('english', 'PostgreSQL tutorial')
ORDER BY rank DESC;
-- 使用配置的全文搜索
SELECT title, ts_rank_cd(to_tsvector('english', content), to_tsquery('english', 'PostgreSQL & tutorial')) AS rank
FROM articles
WHERE to_tsvector('english', content) @@ to_tsquery('english', 'PostgreSQL & tutorial')
ORDER BY rank DESC;全文搜索配置
sql
-- 创建自定义全文搜索配置
CREATE TEXT SEARCH CONFIGURATION chinese (COPY = pg_catalog.simple);
-- 添加中文词典
ALTER TEXT SEARCH CONFIGURATION chinese ADD MAPPING FOR hword, hword_part, word WITH pg_catalog.simple;
-- 使用中文全文搜索
SELECT title, ts_rank(to_tsvector('chinese', content), plainto_tsquery('chinese', '数据库教程')) AS rank
FROM articles
WHERE to_tsvector('chinese', content) @@ plainto_tsquery('chinese', '数据库教程')
ORDER BY rank DESC;事务控制高级特性
保存点
保存点允许在事务中创建标记,以便在需要时回滚到特定点,而不是回滚整个事务。
sql
BEGIN;
-- 执行一些操作
INSERT INTO users (username, email) VALUES ('user1', 'user1@example.com');
-- 创建保存点
SAVEPOINT my_savepoint;
-- 执行更多操作
INSERT INTO users (username, email) VALUES ('user2', 'user2@example.com');
-- 回滚到保存点
ROLLBACK TO my_savepoint;
-- 提交事务,只保存第一个INSERT
COMMIT;事务隔离级别
PostgreSQL支持四种事务隔离级别:
sql
-- 设置事务隔离级别
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- 执行查询和修改
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;常见表表达式(CTE)与递归查询组合
sql
-- 使用CTE和递归查询生成日期维度表
WITH RECURSIVE date_dimension AS (
SELECT
'2020-01-01'::DATE AS date
UNION ALL
SELECT
date + INTERVAL '1 day' AS date
FROM date_dimension
WHERE date < '2023-12-31'
)
SELECT
date,
EXTRACT(YEAR FROM date) AS year,
EXTRACT(MONTH FROM date) AS month,
EXTRACT(DAY FROM date) AS day,
EXTRACT(DOW FROM date) AS day_of_week,
TO_CHAR(date, 'Day') AS day_name,
EXTRACT(WEEK FROM date) AS week_of_year,
CASE WHEN EXTRACT(DOW FROM date) IN (0, 6) THEN TRUE ELSE FALSE END AS is_weekend
FROM date_dimension
ORDER BY date;最佳实践
- 使用窗口函数替代子查询:窗口函数通常比子查询更高效、更清晰
- 合理使用CTE:CTE可以使复杂查询更易于理解和维护
- 为JSONB字段创建适当的索引:提高JSON查询性能
- 使用部分索引优化特定查询:减少索引大小并提高查询性能
- 考虑使用全文搜索替代LIKE查询:对于复杂文本搜索,全文搜索更高效
- 使用EXPLAIN分析查询计划:确保查询使用了适当的索引和优化
- 避免在事务中执行长时间运行的操作:减少锁持有时间,提高并发性能
常见问题(FAQ)
窗口函数和聚合函数有什么区别?
- 聚合函数会将多行数据合并为一行
- 窗口函数不会减少结果集的行数,而是为每行添加一个计算列
什么时候应该使用CTE?
- 当查询包含多个子查询时
- 当查询需要递归时
- 当需要提高查询的可读性和可维护性时
JSON和JSONB有什么区别?
- JSON存储原始JSON文本,保留空格和顺序
- JSONB存储二进制格式,更高效的存储和查询,不保留空格和顺序
- 推荐使用JSONB
什么时候应该使用数组类型?
- 当需要存储多个相关值时
- 当这些值的顺序不重要时
- 当这些值的数量不固定时
全文搜索和LIKE查询有什么区别?
- LIKE查询适合简单的字符串匹配
- 全文搜索适合复杂的文本搜索,支持关键词、短语、权重等
- 全文搜索通常比LIKE查询更高效
总结
PostgreSQL提供了丰富的高级SQL特性,这些特性可以帮助你编写更复杂、更高效的查询。本文介绍了窗口函数、递归查询、通用表表达式、JSON查询、数组操作、范围类型、表达式索引、部分索引和全文搜索等高级功能。
掌握这些高级SQL特性可以提高你的查询编写能力,使你能够处理更复杂的数据查询和分析任务。建议在实际项目中根据具体需求选择合适的特性,并使用EXPLAIN分析查询计划,确保查询性能最优。
