2个场景实例讲解GaussDB(DWS)基表统计信息估算不准的处理方案 每日观点

2023-06-02 20:01:59 来源:博客园
摘要:通过2个实例场景讲解GaussDB(DWS)运维解决方案。

本文分享自华为云社区《GaussDB(DWS)运维 -- 基表统计信息估算不准的常见场景及处理方案》,作者:譡里个檔。

场景1:基表过滤字段存在的隐式类型时,基表行数估算偏小

这种场景绝大部分场景DWS能够处理,但是如果隐式类型转后的结果与统计信息中的字段枚举值的表达式不一样,就会导致估算的严重偏差

原始SQL如下


(资料图片)

SELECT * FROM dmgrpdi.dwl_inv_res_rpt_ci_grp_f WHERE period_id=202212 AND source_flag=1;

对应的执行计划

QUERY PLAN-------------------------------------------------------------------------------------------------------------------  id |                            operation                             | E-rows | E-memory | E-width |  E-costs ----+------------------------------------------------------------------+--------+----------+---------+----------- 1 | -> Row Adapter                                                  | 14160 | | 717 | 680025.43 2 | ->  Vector Streaming (type: GATHER) | 14160 | | 717 | 680025.43 3 | ->  Vector Partition Iterator                              | 14160 | 1MB      | 717 | 678241.33 4 | ->  Partitioned CStore Scan on dwl_inv_res_rpt_ci_grp_f | 14160 | 1MB      | 717 | 678241.33                  Predicate Information (identified by plan id) ------------------------------------------------------------------------------- 3 --Vector Partition Iterator         Iterations: 1 4 --Partitioned CStore Scan on dwl_inv_res_rpt_ci_grp_f         Filter: ((period_id = 202212::numeric) AND ((source_flag)::bigint = 1))         Pushdown Predicate Filter: (period_id = 202212::numeric)         Partitions Selected by Static Prune: 36

发现source_flag字段上存在隐式类型转换,查询字段source_flag的统计信息

postgres=# SELECT most_common_vals,most_common_freqs, histogram_bounds  FROM pg_stats WHERE tablename = "dwl_inv_res_rpt_ci_grp_f" AND attname = "source_flag"; most_common_vals | most_common_freqs | histogram_bounds------------------+-----------------------------------+------------------ {01,02,04,03}    | {.440034,.241349,.217413,.101089} | {05,06}(1 row)

发现隐式类型转后的结果(1)与统计信息中的字段枚举值("01")的表达式不一样

处理方案:修改过滤条件,禁止类型转换,并且使用正确的常量值书写过滤条件

如上SQL语句中的source_flag=1修改为source_flag="01",修改后SQL语句如下

SELECT * FROM dmgrpdi.dwl_inv_res_rpt_ci_grp_f WHERE period_id=202212 AND source_flag="01";

查询新语句的执行计划

QUERY PLAN----------------------------------------------------------------------------------------------------------------------  id |                            operation                             |  E-rows | E-memory | E-width |  E-costs ----+------------------------------------------------------------------+-----------+----------+---------+----------- 1 | -> Row Adapter                                                  | 108359075 | | 717 | 480542.98 2 | ->  Vector Streaming (type: GATHER) | 108359075 | | 717 | 480542.98 3 | ->  Vector Partition Iterator                              | 108359075 | 1MB      | 717 | 478758.88 4 | ->  Partitioned CStore Scan on dwl_inv_res_rpt_ci_grp_f | 108359075 | 1MB      | 717 | 478758.88                           Predicate Information (identified by plan id) ------------------------------------------------------------------------------------------------- 3 --Vector Partition Iterator         Iterations: 1 4 --Partitioned CStore Scan on dwl_inv_res_rpt_ci_grp_f         Filter: ((period_id = 202212::numeric) AND (source_flag = "01"::text))         Pushdown Predicate Filter: ((period_id = 202212::numeric) AND (source_flag = "01"::text))         Partitions Selected by Static Prune: 36
场景2:基表在多列组合主键上过滤时,基表行数估算偏大

这种场景是因为DWS对基表上多个过滤条件之间采取弱相关性处理,当多个过滤条件是主键时,可能导致结果集估算偏大。

原始SQL如下

SELECT * FROM mca.mca_period_rate_t mca_rate2WHERE period_number = "202208" AND from_currency_code = "RMB" AND to_currency_code = "USD"

执行信息如下

id |                      operation                       |       A-time | A-rows | E-rows | Peak Memory | E-memory | A-width | E-width | E-costs  ----+------------------------------------------------------+--------------------+--------+--------+-------------+----------+---------+---------+---------- 1 | -> Row Adapter                                      | 444.735 | 1 | 2033 | 227KB       | | | 321 | 22601.41  2 | ->  Vector Streaming (type: GATHER) | 444.720 | 1 | 2033 | 873KB       | | | 321 | 22601.41  3 | -> CStore Scan on mca_period_rate_t mca_rate2 | [435.167, 435.167] | 1 | 2033 | [5MB, 5MB] | 1MB      | | 321 | 22427.41                                                               Predicate Information (identified by plan id) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 3 --CStore Scan on mca_period_rate_t mca_rate2        Filter: (((period_number)::text = "202208"::text) AND ((from_currency_code)::text = "RMB"::text) AND ((to_currency_code)::text = "USD"::text)) Rows Removed by Filter: 425812        Pushdown Predicate Filter: (((period_number)::text = "202208"::text) AND ((from_currency_code)::text = "RMB"::text) AND ((to_currency_code)::text = "USD"::text))

可以发现基表mca.mca_period_rate_t的行数估算严重偏大。

使用如下SQL语句查看表mca.mca_period_rate_t的定义

SELECT pg_get_tabledef("mca.mca_period_rate_t"::regclass);

查询表mca.mca_period_rate_t定义

SELECT pg_get_tabledef("mca.mca_period_rate_t");SET search_path = mca;CREATE TABLE mca_period_rate_t (seq numeric NOT NULL,period_number character varying(10) NOT NULL,from_currency_code character varying(20) NOT NULL,to_currency_code character varying(20) NOT NULL,begin_rate numeric(35,18),end_rate numeric(35,18),avg_rate numeric(35,18),creation_date timestamp(0) without time zone NOT NULL,created_by numeric NOT NULL,last_update_date timestamp(0) without time zone,last_updated_by numeric,rmb_begin_rate numeric(35,18),usd_begin_rate numeric(35,18),rmb_end_rate numeric(35,18),usd_end_rate numeric(35,18),rmb_avg_rate numeric(35,18),usd_avg_rate numeric(35,18),crt_cycle_id numeric,crt_job_instance_id numeric,last_upd_cycle_id numeric,upd_job_instance_id numeric,cdc_key_id character varying(128) DEFAULT sys_guid(),end_rate2 numeric(35,18),avg_rate2 numeric(35,18),last_period_end_rate numeric(35,18))WITH (orientation=column, compression=low, colversion=2.0, enable_delta=false)DISTRIBUTE BY REPLICATIONTO GROUP group_version1;CREATE UNIQUE INDEX mca_period_rate_u1 ON mca.mca_period_rate_t USING cbtree (period_number, from_currency_code, to_currency_code) TABLESPACE pg_default;

发现 (period_number, from_currency_code, to_currency_code) 为组合的唯一索引。

处理方案:对组合索引列收多列统计信息

注意此种方案只适用在基表比较小的情况下。因为多列统计信息需要使用百分比采样的方式计算统计信息,当表比较大时,统计信息计算耗时回很长。

针对如上查询语句执行如下语句收集(period_number, from_currency_code, to_currency_code) 多列统计信息

ANALYZE mca.mca_period_rate_t((period_number, from_currency_code, to_currency_code));

收集多列统计信息之后,基表的行数估算恢复正产

id |                                      operation                                      |       A-time | A-rows | E-rows | Peak Memory | A-width | E-width | E-costs ----+-------------------------------------------------------------------------------------+--------------------+--------+--------+-------------+---------+---------+--------- 1 | -> Row Adapter                                                                     | 195.504 | 1 | 1 | 227KB       | | 321 | 675.14  2 | ->  Vector Streaming (type: GATHER) | 195.491 | 1 | 1 | 873KB       | | 321 | 675.14  3 | -> CStore Index Scan using mca_period_rate_u1 on mca_period_rate_t mca_rate2 | [164.344, 164.344] | 1 | 1 | [5MB, 5MB] | | 321 | 501.14                                                       Predicate Information (identified by plan id) ---------------------------------------------------------------------------------------------------------------------------------------------------------- 3 --CStore Index Scan using mca_period_rate_u1 on mca_period_rate_t mca_rate2 Index Cond: (((period_number)::text = "202208"::text) AND ((from_currency_code)::text = "RMB"::text) AND ((to_currency_code)::text = "USD"::text))

点击关注,第一时间了解华为云新鲜技术~

标签:

2个场景实例讲解GaussDB(DWS)基表统计信息估算不准的处理方案 每日观点

2023-06-02

初中英语语法点大全_初中英语语法点_聚焦

2023-06-02

丹阳社区:“门前三包”你我参与 文明卫生你我共享

2023-06-02

家园协同!武南幼儿园童心·童乐园萌娃趣享“六一”

2023-06-02

【世界报资讯】宛渠传说之倾世情缘

2023-06-02

烂场雨致麦收撞期_安徽村民拦下赴豫收割机就地作业|天天快资讯

2023-06-02

全球首次海上风电无淡化海水原位直接电解制氢海试成功

2023-06-02

生机无限_关于生机无限介绍

2023-06-02

每日热讯!混合现实概念股继续走强

2023-06-02

世界今日报丨港股异动|中国黄金国际(02099)涨超7%领涨黄金股 隔夜黄金期货微幅收涨 黄金板块处于降息前较好布局阶段

2023-06-02

视焦点讯!AI生态画廊·山东篇丨眼中的“只此青绿”

2023-06-02

抖音在内江成立巨量引擎公司 抖音接连成立巨量引擎新公司|观速讯

2023-06-02

新动态:期待更多中国游客来马耳他

2023-06-02

抽屉滑轨拆卸方法图_抽屉滑轨拆卸方法

2023-06-02

世界微速讯:手持式扫码验票系统推荐_手持式扫描系统

2023-06-02

手机建设银行网址

2023-06-02

解除保护模式qq安全中心入口_解除保护模式-天天通讯

2023-06-02

全球报道:秋分的由来(秋分的由来和风俗教案)

2023-06-02

微速讯:最新标准国际音标_对于最新标准国际音标简单介绍

2023-06-02

张利明会见杭州银行党委书记、董事长宋剑斌一行|环球热推荐

2023-06-02

食品安全板块6月1日涨0.96%,天瑞仪器领涨,主力资金净流入1.36亿元 世界热点

2023-06-02

小鹏汽车-W(09868.HK):6月1日南向资金增持47.01万股_每日消息

2023-06-02

谷雨节气表示什么_谷雨节气的含义|天天快报

2023-06-02

世界滚动:俄国防部:挫败“入侵”企图

2023-06-02

揖怎么读组词_揖怎么读

2023-06-02

月均增长“百亿级”!今年我国快递业务量已突破500亿件

2023-06-02

世界快消息!越野车型号(上汽越野车有哪几种型号?)

2023-06-01

永和大王 运河西路店_关于永和大王 运河西路店简述

2023-06-01

北镇市气象局发布大风蓝色预警【Ⅳ级/一般】【2023-06-01】

2023-06-01

天天视讯!深圳国际高性能医疗器械展即将启幕 剧透来了!

2023-06-01

Copyright ©  2015-2022 起点频道网版权所有  备案号:皖ICP备2022009963号-12   联系邮箱: 39 60 29 14 2@qq.com