前言

在搬砖道路上,写了多年的sql,一直认为的优化原来大跌所望,直到最近上线遇到问题,感谢同事的讲解,让我重新认识。来司时候,服务端曾告知我服务端拒绝使用关联查

一、in和inner join之间的优化

关于他们两使用这里就不讲了,直接sql贴上来
这里需求是:求今天的玩家进房间并且下注和昨日进房间并且下注的玩家 交集 并去重
源sql

SELECT count(*) INTO fourteenStay from
(select DISTINCT account_id from data_tslog.guess_user_room_flow a INNER JOIN data_tslog.guess_user_bet_flow b USING(account_id)
where a.type = 1 and a.event_time BETWEEN eventTime and endTime
and b.event_time BETWEEN eventTime and endTime
) A INNER JOIN
(select DISTINCT account_id from data_tslog.guess_user_room_flow a INNER JOIN data_tslog.guess_user_bet_flow b USING(account_id)
where a.type = 1 and a.event_time BETWEEN fourteenStayStartTime and fourteenStayEndTime
and b.event_time BETWEEN fourteenStayStartTime and fourteenStayEndTime
) B USING(account_id);

这个sql在生产环境千万数据中 执行一次 耗费了将近一个小时

测试

1、数据准备

SELECT * from guess_user_room_flow;
SELECT * from guess_user_bet_flow;

我这里数据不多,下面两个查询的时间和条数,看出差别就行
image.png
image.png

2、我准备了使用in和inner join来查询最近一个月数据(因为数据比较少)

select DISTINCT account_id from data_tslog.guess_user_room_flow a INNER JOIN data_tslog.guess_user_bet_flow b USING(account_id)
where a.type = 1 and a.event_time BETWEEN UNIX_TIMESTAMP('2020-05-01') and UNIX_TIMESTAMP('2020-05-30 23:59:59')
and b.event_time BETWEEN UNIX_TIMESTAMP('2020-05-01') and UNIX_TIMESTAMP('2020-05-30 23:59:59');

select DISTINCT account_id from data_tslog.guess_user_room_flow a WHERE a.account_id in (
	SELECT DISTINCT account_id FROM data_tslog.guess_user_bet_flow b WHERE b.event_time BETWEEN UNIX_TIMESTAMP('2020-05-01') and UNIX_TIMESTAMP('2020-05-30 23:59:59')
) and a.type = 1 and a.event_time BETWEEN UNIX_TIMESTAMP('2020-05-01') and UNIX_TIMESTAMP('2020-05-30 23:59:59');

image.png
image.png
image.png
从上面可以看出来这速度不只是快整整一倍,几乎三倍,数据更多时候差距更明显
使用EXISTS速度更差,实时证明EXISTS和inner join差不多。

3、最上面sql解析以及修改后时间对比

image.png
image.png
如上面所见,sql 经过in处理过后,由于他们是并列关系所以使用并列and来关联。速度明显快多了,在线上今天第一种执行了一个小时,后面那条最终几秒就跑完了。

原因解析

1、in是子查询的结果作为父查询的条件,inner join是联合两张表去对比所有数据。
2、子表数据少,用in的效率高于关联,如果子表比较大的情况下则用join的效率高于用in
3、EXISTS也多用于子表大情况下

自己的话:其实嘛,我以前一直认为in超级慢,我也在考虑子表数据很大情况下,因为出于in 和 = 的比较,但是和inner join比较,在子表小的情况下还是挺受益的。事实证明双表数据都大的情况下请使用in,因为可以去出一部分。小的时候使用关联查询吧

二、嵌套查询优化

背景

最近写一个存储过程,执行速度大于几个小时,导致其他同事的存储过程都停止了
,今天用了嵌套查询优化好了

第一版:初始脚本

SELECT startTime,b.room_id,b.vip,b.user_type,b.factor,b.account_id,2,
       IFNULL(SUM(IF(t.mfr_id = 268,t.quantity,0)),0) socks,IFNULL(SUM(IF(t.mfr_id = 283,t.quantity,0)),0) hat
FROM data_bi.`bi_user_catch_spec_fish_statistics` b left JOIN data_tslog.`inc_prop_flow` t on b.account_id = t.account_id
WHERE t.event_time BETWEEN startTime and endTime
GROUP BY b.room_id,b.vip,b.user_type,b.factor,b.account_id

第二版:临时表拆分

CREATE TEMPORARY TABLE `tempTableCatsh`
    (
		 `room_id` INT(11) NOT NULL DEFAULT 0,
		 `vip` INT(11) NOT NULL DEFAULT 0,
		 `user_type` INT(11) NOT NULL DEFAULT 0,
		 `factor` VARCHAR(10) NOT NULL DEFAULT 0,
     `account_id` BIGINT(20) NOT NULL DEFAULT 0,
		 `current_factor` INT(11) NOT NULL DEFAULT 0,
        PRIMARY KEY (room_id, vip, user_type, `factor`, account_id)
    );
CREATE TEMPORARY TABLE `tempTableIncProp`
    (
		 `account_id` BIGINT(20) NOT NULL DEFAULT 0,
		 `mfr_id` INT(10) NOT NULL DEFAULT 0,
		 `quantity` BIGINT(20) NOT NULL DEFAULT 0,
        PRIMARY KEY (`account_id`,`mfr_id`)
    );

-- 临时表新增数据
-- 查询今天数据并分组
insert into tempTableCatsh(current_factor,room_id,vip,user_type,factor,account_id)
SELECT SUM(current_factor),room_id,vip,user_type,factor,account_id
from data_bi.`bi_user_catch_spec_fish_statistics`  where event_time BETWEEN startTime and endTime AND fish_id = 141
GROUP BY room_id,vip,user_type,factor,account_id
    ON DUPLICATE KEY UPDATE current_factor=(current_factor);
-- 查询今天圣诞帽袜道具信息
insert into tempTableIncProp(account_id,mfr_id,quantity)
SELECT account_id,mfr_id,SUM(quantity)
FROM data_tslog.`inc_prop_flow` WHERE event_time BETWEEN startTime and endTime AND mfr_id in (268, 283)
GROUP BY account_id
    ON DUPLICATE KEY UPDATE quantity=(quantity);

-- -- 单玩家统计
insert into data_bi.`bi_chris_prop_statistics`(event_time,room_id,vip,user_type,factor,account_id,flag_type,socks_num,hat_num)
SELECT startTime, a.room_id,a.vip,a.user_type,a.factor,a.account_id,2,
       IFNULL(SUM(IF(b.mfr_id = 268,b.quantity,0)),0) socks,IFNULL(SUM(IF(b.mfr_id = 283,b.quantity,0)),0) hat
FROM tempTableCatsh a LEFT JOIN tempTableIncProp b ON a.account_id = b.account_id
GROUP BY a.room_id,a.vip,a.user_type,a.factor,a.account_id
HAVING a.room_id is NOT null and a.vip is NOT null and a.user_type is NOT null and a.factor is NOT null
  ON DUPLICATE KEY UPDATE socks_num=(socks_num), hat_num=(hat_num)

第三版:嵌套查询

建标就参考第二版的建表

-- 查询今天数据并分组
insert into tempTableCatsh(current_factor,room_id,vip,user_type,factor,account_id)
SELECT SUM(current_factor),room_id,vip,user_type,factor,account_id
FROM (
SELECT current_factor,room_id,vip,user_type,factor,account_id
from data_bi.`bi_user_catch_spec_fish_statistics` where event_time BETWEEN startTime and endTime AND fish_id = 141
) t
GROUP BY room_id,vip,user_type,factor,account_id
ON DUPLICATE KEY UPDATE current_factor=(current_factor);
-- 查询今天圣诞帽袜道具信息
insert into tempTableIncProp(account_id,mfr_id,quantity)
SELECT account_id,mfr_id,sum(quantity)
FROM (
SELECT account_id,mfr_id,quantity
FROM data_tslog.`inc_prop_flow`
WHERE event_time BETWEEN startTime and endTime AND mfr_id in (268, 283)
) t
GROUP BY account_id,mfr_id
ON DUPLICATE KEY UPDATE quantity=(quantity);

-- 捕获统计
insert into data_bi.`bi_chris_prop_statistics`(event_time,room_id,vip,user_type,factor,account_id,flag_type,socks_num,hat_num)
SELECT startTime, room_id,vip,user_type,factor,0,1,
  IFNULL(SUM(IF(mfr_id = 268,quantity,0)),0) socks,IFNULL(SUM(IF(mfr_id = 283,quantity,0)),0) hat
FROM ( 
	SELECT a.room_id,a.vip,a.user_type,a.factor,b.mfr_id,b.quantity,a.account_id
	FROM tempTableCatsh a LEFT JOIN tempTableIncProp b ON a.account_id = b.account_id
	WHERE a.room_id is NOT null and a.vip is NOT null and a.user_type is NOT null and a.factor is NOT null
	) t
GROUP BY room_id,vip,user_type,factor
ON DUPLICATE KEY UPDATE socks_num=(socks_num), hat_num=(hat_num);

-- 单个玩家
insert into data_bi.`bi_chris_prop_statistics`(event_time,room_id,vip,user_type,factor,account_id,flag_type,socks_num,hat_num)
SELECT startTime, room_id,vip,user_type,factor,account_id,1,
  IFNULL(SUM(IF(mfr_id = 268,quantity,0)),0) socks,IFNULL(SUM(IF(mfr_id = 283,quantity,0)),0) hat
FROM ( 
	SELECT a.room_id,a.vip,a.user_type,a.factor,b.mfr_id,b.quantity,a.account_id
	FROM tempTableCatsh a LEFT JOIN tempTableIncProp b ON a.account_id = b.account_id
	WHERE a.room_id is NOT null and a.vip is NOT null and a.user_type is NOT null and a.factor is NOT null
	) t
GROUP BY room_id,vip,user_type,factor,account_id
ON DUPLICATE KEY UPDATE socks_num=(socks_num), hat_num=(hat_num);

后语
如果本文对你哪怕有一丁点帮助,请帮忙点好看。你的好看是我坚持写作的动力。

打赏
支付宝 微信
上一篇 下一篇