数据分析工作中常用的3类SQL开窗函数详解
当前位置:点晴教程→知识管理交流
→『 技术文档交流 』
在日常工作中,我们经常要处理各种数据排名、累计求和、环比等问题。SQL窗口函数为我们提供了强大的工具来处理这些任务。 接下来,我们一起看看SQL常见窗口函数及其应用案例吧。 窗口函数中必须有一个over子句。 01 1.1 函数说明
1.2 案例说明 这里通过表data_learning.product_order(商品销量表)、data_learning.product(商品信息表)、data_learning.product_category(商品二级分类信息表)进行举例,两张表建表语句和示例数据插入语句可以查看我之前的文章。 数据表示例数据分别如下: data_learning.product_order(商品销量表): data_learning.product(商品信息表): data_learning.product_category(商品二级分类信息表):
问题:请将数据集按照商品二级类别(category_id)字段进行分组后,根据上面列举的窗口函数并按照sales_volume字段降序返回排名。 SQL: SELECT a.product_id ,b.product_name ,a.category_id ,c.category_name ,a.sales_volume ,ROW_NUMBER() OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS ro_result ,RANK() OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS rank_result ,DENSE_RANK() OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS dense_rank_result ,NTILE(10) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS ntile_resultFROM(SELECT *FROM data_learning.product_order)aLEFT JOIN(SELECT *FROM data_learning.product )bon a.product_id = b.product_idLEFT JOIN(SELECT *FROM data_learning.product_category)con a.category_id = c.category_id; 结果如下(部分截图):
上面的案例显示,童装类别分组下,商品毛衣针织衫和儿童羽绒服销量都是1617,开窗函数row_number的排序结果是顺延,rank的排序结果是相同值并列、后续排序序号不连续,dense_rank的排序结果是相同值并列、后续排序序号连续。 开窗函数NTILE(n) OVER(PARTITION BY 分组列 ORDER BY 排序列)用于将一个结果集分割成指定数量的桶(buckets),并分配一个桶号给每个值。它通常用于在分组数据中将数据等分为多个组。 在上面的案例中,我们对商品分类分组数据集分割成10个桶,桶号从1开始。以童装类别为例,童装类别共17行数据,17不能被10整除,数据集无法均分成10个桶,所以结果集中前面7个桶包含2行数据,后面3个桶包含1行数据。
02 2.1 函数说明
2.2 案例说明 问题:请将数据集按照商品二级类别(category_id)字段进行分组、sales_volume字段进行降序后,根据上面列举的窗口函数对sales_volume进行聚合。 SQL: SELECT a.product_id ,b.product_name ,a.category_id ,c.category_name ,a.sales_volume ,COUNT(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS cnt_result ,SUM(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS sum_result ,AVG(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS avg_result ,MAX(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS max_result ,MIN(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS min_result FROM(SELECT *FROM data_learning.product_order)aLEFT JOIN(SELECT *FROM data_learning.product)bon a.product_id = b.product_idLEFT JOIN(SELECT *FROM data_learning.product_category )con a.category_id = c.category_id; 结果如下(部分截图): 从案例查询结果看,数据集按照category_id进行分组后,按照sales_volume降序排序,将每组中的sales_volume依次聚合;
这涉及到窗口函数的窗口帧。窗口帧(Window Frame)是在SQL中用于定义开窗函数计算时要考虑的行的范围,也就是开窗函数作用的范围。窗口帧使用在OVER子句中,与PARTITION BY和ORDER BY联合使用来分组,对组内的数据进行排序和聚合。语句紧接着写在ORDER BY之后。 窗口帧包含以下类型:
窗口帧表达式的语法如下:
具体x、y可取值如下:
若ORDER BY 后未指定框架,那么默认框架将采用 range unbounded preceding and current row,表示从开窗后的第一行到当前行。 若分组后不加ORDER BY 或者在ORDER BY 之后加上语句rows between unbounded preceding and unbounded following,也就是对分组中所有的数据进行聚合运算。具体示例和结果如下: SELECT a.product_id ,b.product_name ,a.category_id ,c.category_name ,a.sales_volume ,COUNT(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS cnt_result ,COUNT(a.sales_volume) OVER(PARTITION BY a.category_id) AS cnt_result1 ,COUNT(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC rows between unbounded preceding and unbounded following) AS cnt_result2 ,SUM(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS sum_result ,SUM(a.sales_volume) OVER(PARTITION BY a.category_id ) AS sum_result1 ,SUM(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC rows between unbounded preceding and unbounded following) AS sum_result2 ,AVG(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS avg_result ,AVG(a.sales_volume) OVER(PARTITION BY a.category_id) AS avg_result1 ,AVG(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC rows between unbounded preceding and unbounded following) AS avg_result2 ,MAX(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS max_result ,MAX(a.sales_volume) OVER(PARTITION BY a.category_id) AS max_result1 ,MAX(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC rows between unbounded preceding and unbounded following) AS max_result2 ,MIN(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS min_result ,MIN(a.sales_volume) OVER(PARTITION BY a.category_id) AS min_result1 ,MIN(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC rows between unbounded preceding and unbounded following) AS min_result2FROM(SELECT *FROM data_learning.product_order)aLEFT JOIN (SELECT *FROM data_learning.product)bon a.product_id = b.product_idLEFT JOIN(SELECT *FROM data_learning.product_category)con a.category_id = c.category_id; 运行结果如下(部分截图): 可以看到,不加ORDER BY的字段如sum_result1和 ORDER BY 后+rows between unbounded preceding and unbounded following的字段sum_result2结果一样。 03 3.1 函数说明
3.2 案例说明 问题:请将数据集按照商品二级类别(category_id)字段进行分组、并利用其他常见窗口函数对sales_volume字段进行处理(主要是为了理解上述函数的作用效果)。 SQL: SELECT a.product_id ,b.product_name ,a.category_id ,c.category_name ,a.sales_volume ,LAG(a.sales_volume,1) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS lag_result ,LEAD(a.sales_volume,1) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS lead_result ,FIRST_VALUE(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS firstvalue_result ,LAST_VALUE(a.sales_volume) OVER(PARTITION BY a.category_id ORDER BY a.sales_volume DESC) AS lastvalue_result FROM(SELECT *FROM data_learning.product_order)aLEFT JOIN(SELECT *FROM data_learning.product)bon a.product_id = b.product_idLEFT JOIN(SELECT *FROM data_learning.product_category)con a.category_id = c.category_id; 结果如下(部分截图): 其他常见窗口函数同聚合类型的窗口函数一样,也可以结合窗口帧实现不同的数据统计效果。 04 简单总结下今天分享的内容。 我们详细介绍了SQL中的三类窗口函数,并且在每个函数下使用了相关的示例演示函数的使用效果。同时,介绍了窗口帧的使用,让我们可以在SQL查询中进行更灵活和精确的数据分析和聚合。 该文章在 2024/1/13 17:33:11 编辑过 |
关键字查询
相关文章
正在查询... |