网上可以查到很多这样的说法:
如果查询的两个表大小相当,那么用in和exists差别不大。
如果两个表中一个较小,一个是大表,则子查询表大的用exists,子查询表小的用in:
例如:表A(小表),表B(大表)
1:
select * from A where cc in (select cc from B) 效率低,用到了A表上cc列的索引;
select * from A where exists(select cc from B where cc=A.cc) 效率高,用到了B表上cc列的索引。
相反的
2:
select * from B where cc in (select cc from A) 效率高,用到了B表上cc列的索引;
select * from B where exists(select cc from A where cc=B.cc) 效率低,用到了A表上cc列的索引。
将下面的语句执行优化:
select count(uid) from user where uid in (SELECT did FROM demo);
select count(uid) from user where exists (SELECT 1 FROM demowhere demo.did = user.uid);
1.注意慢的原因就是内部每次与外部比较时,都需要遍历一次表操作,可以采用另外一个方法,在嵌套一层子查询,避免多次遍历操作
SELECT count(did) FROM demo where exists (SELECT uid FROM (SELECT uid from user) as b where b.uid = demo.did);
2.第二种优化就是先将子查询里的语句执行,使用GROUP_CONCAT将字段连接起来,
如果字符串长度不够可以使用:SET SESSION group_concat_max_len = 102400;
原sql:
SELECT
c.id
FROM
c 此表有712995条数据
LEFT JOIN u ON c.user_id = u.id
LEFT JOIN doc ON c.doctor_id = doc.id
LEFT JOIN s ON c.meal_id = s.id
WHERE
s.renew = 1
AND c.orderstatus = 1
AND c.endtime < UNIX_TIMESTAMP()
AND c.org_type = 'c'
AND u.is_doctor = 0
AND u.active = 1
AND doc.is_doctor IN (4, 5)
AND doc.is_family_doctor = 1
AND doc.active = 1
AND c.user_id NOT IN (
SELECT
user_id
FROM
d 此表有934455条数据
WHERE
d.log LIKE '%结束'
);
-- 执行时间为2.265s
优化后:
SET SESSION group_concat_max_len = 102400;
SELECT GROUP_CONCAT(user_id) FROM d WHERE d.log LIKE '%结束'; -- 执行了0.521s
SELECT
c.id
FROM
c
LEFT JOIN u ON c.user_id = u.id
LEFT JOIN doc ON c.doctor_id = doc.id
LEFT JOIN s ON c.meal_id = s.id
WHERE
s.renew = 1
AND c.orderstatus = 1
AND c.endtime < UNIX_TIMESTAMP()
AND c.org_type = 'c'
AND u.is_d = 0
AND u.active = 1
AND doc.is_d IN (4, 5)
AND doc.is_f_d = 1
AND doc.active = 1
AND c.user_id NOT IN (24986,24986,24986,24986,24986,24986,..............................................大概5千个id);
-- 执行时间1.579s
执行时间少了0.686s,但是GROUP_CONCAT(user_id)还执行了0.521s,所以总体时间没有什么差别(当前数量级),
而且后一个需要考虑字符串的大小问题。
目前就了解这些,以后有时间再细细琢磨。