有 时候我们需要找出在一组连续的号码中,有哪些是缺失的部分。具体的需求又分为两种情况
- 查找 连续部分的最大和最小值
- 查找缺少的部分
经过构造的数据如下
-- preparation drop table test1 purge; create table test1(id int,dt date); insert into test1 (id, dt) select rownum as id, trunc(add_months(sysdate, rownum), 'mm') as dt from dual connect by rownum <= 20; delete test1 where id in (4,7,13,19); commit; SQL> select * from test1; ID DT ---- ----------- 1 2009-4-1 2 2009-5-1 3 2009-6-1 5 2009-8-1 6 2009-9-1 8 2009-11-1 9 2009-12-1 10 2010-1-1 11 2010-2-1 12 2010-3-1 14 2010-5-1 15 2010-6-1 16 2010-7-1 17 2010-8-1 18 2010-9-1 20 2010-11-1 16 rows selected
查找 连续部分的最大和最小值
下面的场景是连续部分的起始和终点,使用NUMBER类型
-- case 1 number type + rownum select min(v2.id) as n1, max(v2.id) as n2 from (select v1.id, (v1.id - rownum) as delta from (select id from test1 order by id) v1) v2 group by v2.delta order by 1; 首先需要对取出的id值进行排序 用这个id值和rownum相减,如果差值是相同的,说明这些id是连续的 N1 N2 ---------- ---------- 1 3 5 6 8 12 14 18 20 20
然后使用分析函数实现 同样的功能。
-- case 2 number type + analyse func with v as (select id, lag(id) over(order by id) as lag, lead(id) over(order by id) as lead from test1 order by id) select v1.id, v2.id from (select v.id, rownum as rn from v where (v.id - v.lag) != 1 or v.lag is null) v1, (select v.id, rownum as rn from v where (v.lead - v.id) != 1 or v.lead is null) v2 where v1.rn = v2.rn; 这里使用WITH构造一个视图,避免了对表的多次扫描 如果一个ID值和它的上一个值之间的绝对值差不是1,表示发生了跳跃。 这个ID是某个连续区间的最小或者最大
相同的需求,不同的是这里使用日期类型。注意,因为构造的数据是以MONTH为单位的,所以这里需要使用months_between()来计算差值。
-- case 3 date type + rownum select min(v2.dt) as n1, max(v2.dt) as n2 from (select v1.dt, months_between(v1.dt, trunc(add_months(sysdate, rownum), 'mm')) as delta from (select dt from test1 order by id) v1) v2 group by v2.delta order by 1; N1 N2 ----------- ----------- 2009-4-1 2009-6-1 2009-8-1 2009-9-1 2009-11-1 2010-3-1 2010-5-1 2010-9-1 2010-11-1 2010-11-1
查找缺少的部分查找缺少的部分,基本思路是在找到连续部分的基础上,构造缺少的值
-- case 4 find lost value with v as (select min(v2.id) as n1, max(v2.id) as n2, lead(min(v2.id)) over(order by min(v2.id)) as n3 from (select v1.id, (v1.id - rownum) as delta from (select id from test1 order by id) v1) v2 group by v2.delta order by 1) select t1.sid + t2.rn as lost from (select v.n2 as sid, v.n3 as eid from v) t1, (select rownum rn from dual connect by rownum <= 50) t2 where t1.sid + t2.rn < t1.eid; 使用case1的算法得到连续部分的最大n2,最小值n1 使用lead函数把某个最大值n2和下一个最小值n3 放到同一行 使用rownum构造连续的缺少部分 LOST ---------- 4 7 13 19
另一种查找缺少的方 法,使用外连接
-- case 5 find lost value with v as (select min(id) as n1, max(id) as n2 from test1) select t2.rn as lost from v, test1 t1, (select rownum rn from dual connect by rownum <= 50) t2 where t2.rn between v.n1 and v.n2 and t1.id(+) = t2.rn and t1.id is null order by t2.rn; 取得表的最大n2和最小n1,这里是全部数据的集合 构造大表,并且只取得最大最小之间的rownum 使用外连接取得缺少的值 注意:和case4相比,这个方法要扫描两次表
Note: 更复杂的情况,可以参考Partition Outter Join