维护的流程又出问题了,查看日志调整后重新执行,等待了若个钟头后发现还是不行。尼玛,再来一次又是几个小时没了,于是乎手痒痒了,开始改造。
一通改造之后,没有具体对比,保守估计至少有50%的提升吧。
其间发现自己对这些逻辑的东西不太擅长,觉得挺绕的,记下来,做个备忘。
感谢多次指点迷津的俊杰。
补充:今天在扩容后的集群上实测,运行时间为20min,相比扩容前需要10+h,效果显著。
其中各个模块的作用如下表:
模块 | 作用 |
---|---|
delete duplicate date | 删除重复的数据 |
truncate tmp table | 清空临时表 |
gen tmp table | 生成临时表 |
TW_os_nrr[1-4] | 计算4个维度的新增留存 |
merge NRR table | 合并结果 |
delete duplicate date
删除重复数据
DELETE
FROM DM_OS_NRR2_WEEKLY t
WHERE t.statdate = to_date('${v_day}','yyyymmdd')
truncate tmp table
清空临时表
truncate table DM_OS_NRR2_WEEKLY_TMP;
gen tmp table
生成临时表
drop table if exists default.os_nrr_n_tmp;
drop table if exists default.os_nrr_tmp;
drop table if exists default.os_nrr1_tmp;
drop table if exists default.os_nrr2_tmp;
drop table if exists default.os_nrr3_tmp;
CREATE TABLE default.os_nrr_n_tmp AS
SELECT (CASE
WHEN osversion LIKE '%i%' THEN osversion
WHEN romversion LIKE '%EROM%' THEN romversion
ELSE osversion
END) AS romversion,
model,
IMEI,
(CASE
WHEN osversion LIKE '%i%' THEN '265672'
WHEN romversion LIKE '%EROM%' THEN '265672'
ELSE '265671'
END) AS system_id
FROM stat_rom_ods.fxt_imei
WHERE dayno >= ${v_week}
AND dayno <= ${v_day}
AND model <> 'R821T';
CREATE TABLE default.os_nrr_tmp AS
SELECT (CASE
WHEN osversion LIKE '%i%' THEN osversion
WHEN romversion LIKE '%EROM%' THEN romversion
ELSE osversion
END) AS romversion,
model,
IMEI,
(CASE
WHEN osversion LIKE '%i%' THEN '265672'
WHEN romversion LIKE '%EROM%' THEN '265672'
ELSE '265671'
END) AS system_id
FROM stat_rom_ods.ds_apps_launched_stat
WHERE dayno >= ${v_week}
AND dayno <= ${v_day}
AND model <> 'R821T';
联合启动数据表,获取维度信息(romversion,model)
CREATE TABLE default.os_nrr1_tmp AS
SELECT DISTINCT T1.romversion,
T1.model,
T1.imei,
T1.system_id
FROM
(SELECT (CASE WHEN osversion LIKE '%i%' THEN osversion WHEN romversion LIKE '%EROM%' THEN romversion ELSE osversion END) AS romversion,
model,
IMEI,
(CASE WHEN osversion LIKE '%i%' THEN '265672' WHEN romversion LIKE '%EROM%' THEN '265672' ELSE '265671' END) AS system_id
FROM stat_rom_ods.ds_apps_launched_stat
WHERE dayno >= ${v_last_week}
AND dayno < ${v_week}
AND model <> 'R821T') T1 LEFT semi
JOIN
(SELECT DISTINCT IMEI
FROM stat_rom_ods.fxt_imei
WHERE dayno >= ${v_last_week}
AND dayno < ${v_week}) T2 ON T1.IMEI = T2.IMEI;
CREATE TABLE default.os_nrr2_tmp AS
SELECT DISTINCT T1.romversion,
T1.model,
T1.imei,
T1.system_id
FROM
(SELECT (CASE WHEN osversion LIKE '%i%' THEN osversion WHEN romversion LIKE '%EROM%' THEN romversion ELSE osversion END) AS romversion,
model,
IMEI,
(CASE WHEN osversion LIKE '%i%' THEN '265672' WHEN romversion LIKE '%EROM%' THEN '265672' ELSE '265671' END) AS system_id
FROM stat_rom_ods.ds_apps_launched_stat
WHERE dayno >= ${v_last_2_week}
AND dayno < ${v_last_week}
AND model <> 'R821T') T1 LEFT semi
JOIN
(SELECT DISTINCT IMEI
FROM stat_rom_ods.fxt_imei
WHERE dayno >= ${v_last_2_week}
AND dayno < ${v_last_week}) T2 ON T1.IMEI = T2.IMEI;
CREATE TABLE default.os_nrr3_tmp AS
SELECT DISTINCT T1.romversion,
T1.model,
T1.imei,
T1.system_id
FROM
(SELECT (CASE WHEN osversion LIKE '%i%' THEN osversion WHEN romversion LIKE '%EROM%' THEN romversion ELSE osversion END) AS romversion,
model,
IMEI,
(CASE WHEN osversion LIKE '%i%' THEN '265672' WHEN romversion LIKE '%EROM%' THEN '265672' ELSE '265671' END) AS system_id
FROM stat_rom_ods.ds_apps_launched_stat
WHERE dayno >= ${v_last_3_week}
AND dayno < ${v_last_2_week}
AND model <> 'R821T') T1 LEFT semi
JOIN
(SELECT DISTINCT IMEI
FROM stat_rom_ods.fxt_imei
WHERE dayno >= ${v_last_3_week}
AND dayno < ${v_last_2_week}) T2 ON T1.IMEI = T2.IMEI;
1. 这五个临时表可由顺序计算改成并行计算; 2. 部分数据可以重复使用——这一周计算的前1周和前2周数据,就是下一周的前2周和前3周数据。
TW_os_nrr[1-4]
计算4个维度的新增留存率——新增imeis、前1周留存率、前2周留存率和前3周留存率
TW_os_nrr1
分两层来看,内层先是按照维度romversion
和model
计算,然后union all
,没有的指标给零,外层继续按照维度group by
,sum
做聚合。
SELECT tt.statdate AS statdate,
tt.system_id AS system_id,
tt.romversion AS romversion,
tt.model AS model,
sum(tt.count_imei) AS count_imei,
sum(tt.nrr1_imei) AS nrr1_imei,
sum(tt.nrr2_imei) AS nrr2_imei,
sum(tt.nrr3_imei) AS nrr3_imei
FROM
(
--计算当周的新增
SELECT '${v_week1}' AS statdate,
T.system_id,
T.romversion AS romversion,
T.model AS model ,
count(DISTINCT T.imei) AS count_imei,
0 AS nrr1_imei,
0 AS nrr2_imei,
0 AS nrr3_imei
FROM default.os_nrr_n_tmp T
WHERE length(T.romversion) > 0
AND length(T.model) > 0
AND length(T.model) < 33
GROUP BY T.system_id,
T.romversion,
T.model
UNION ALL
--计算前1周的新增留存比
SELECT '${v_last_week1}' AS statdate,
T.system_id,
T.romversion AS romversion,
T.model AS model ,
0 AS count_imei,
COUNT(DISTINCT T3.IMEI)/COUNT(DISTINCT T.IMEI) AS nrr1_imei,
0 AS nrr2_imei,
0 AS nrr3_imei
FROM default.os_nrr1_tmp T
LEFT OUTER JOIN default.os_nrr_tmp T3 ON T.romversion = T3.romversion
AND T.model = T3.model
AND T.IMEI = T3.IMEI
WHERE length(T.romversion) > 0
AND length(T.model) > 0
AND length(T.model) < 33
GROUP BY T.system_id,
T.romversion,
T.model
UNION ALL
--计算前2周的新增留存比
SELECT '${v_last_2_week1}' AS statdate,
T.system_id,
T.romversion AS romversion,
T.model AS model ,
0 AS count_imei,
0 AS nrr1_imei,
COUNT(DISTINCT T3.IMEI)/COUNT(DISTINCT T.IMEI) AS nrr2_imei,
0 AS nrr3_imei
FROM default.os_nrr2_tmp T
LEFT OUTER JOIN default.os_nrr_tmp T3 ON T.romversion = T3.romversion
AND T.model = T3.model
AND T.IMEI = T3.IMEI
WHERE length(T.romversion) > 0
AND length(T.model) > 0
AND length(T.model) < 33
GROUP BY T.system_id,
T.romversion,
T.model
UNION ALL
--计算前3周新增留存比
SELECT '${v_last_3_week1}' AS statdate,
T.system_id,
T.romversion AS romversion,
T.model AS model ,
0 AS count_imei,
0 AS nrr1_imei,
0 AS nrr2_imei,
COUNT(DISTINCT T3.IMEI)/COUNT(DISTINCT T.IMEI) AS nrr3_imei
FROM default.os_nrr3_tmp T
LEFT OUTER JOIN default.os_nrr_tmp T3 ON T.romversion = T3.romversion
AND T.model = T3.model
AND T.IMEI = T3.IMEI
WHERE length(T.romversion) > 0
AND length(T.model) > 0
AND length(T.model) < 33
GROUP BY T.system_id,
T.romversion,
T.model)tt
GROUP BY tt.statdate,
tt.system_id,
tt.romversion,
tt.model
TW_os_nrr2
类似的两层,区别在于维度不同——内层只有romversion这个维度
SELECT tt.statdate AS statdate,
tt.system_id AS system_id,
tt.romversion AS romversion,
'' model,
sum(tt.count_imei) AS count_imei,
sum(tt.nrr1_imei) AS nrr1_imei,
sum(tt.nrr2_imei) AS nrr2_imei,
sum(tt.nrr3_imei) AS nrr3_imei
FROM
(
--计算当周的新增
SELECT '${v_week1}' AS statdate,
T.system_id,
T.romversion AS romversion,
'' AS model ,
count(DISTINCT T.imei) AS count_imei,
0 AS nrr1_imei,
0 AS nrr2_imei,
0 AS nrr3_imei
FROM default.os_nrr_n_tmp T
WHERE length(T.romversion) > 0
GROUP BY T.system_id,
T.romversion
UNION ALL
--计算前1周新增留存比
SELECT '${v_last_week1}' AS statdate,
T.system_id,
T.romversion AS romversion,
'' AS model ,
0 AS count_imei,
COUNT(DISTINCT T3.IMEI)/COUNT(DISTINCT T.IMEI) AS nrr1_imei,
0 AS nrr2_imei,
0 AS nrr3_imei
FROM default.os_nrr1_tmp T
LEFT OUTER JOIN default.os_nrr_n_tmp T3 ON T.romversion = T3.romversion
AND T.model = T3.model
AND T.IMEI = T3.IMEI
WHERE length(T.romversion) > 0
GROUP BY T.system_id,
T.romversion
UNION ALL
--计算前2周新增留存比
SELECT '${v_last_2_week1}' AS statdate,
T.system_id,
T.romversion AS romversion,
'' AS model ,
0 AS count_imei,
0 AS nrr1_imei,
COUNT(DISTINCT T3.IMEI)/COUNT(DISTINCT T.IMEI) AS nrr2_imei,
0 AS nrr3_imei
FROM default.os_nrr2_tmp T
LEFT OUTER JOIN default.os_nrr_tmp T3 ON T.romversion = T3.romversion
AND T.model = T3.model
AND T.IMEI = T3.IMEI
WHERE length(T.romversion) > 0
GROUP BY T.system_id,
T.romversion
UNION ALL
--计算前3周新增留存比
SELECT '${v_last_3_week1}' AS statdate,
T.system_id,
T.romversion AS romversion,
'' AS model ,
0 AS count_imei,
0 AS nrr1_imei,
0 AS nrr2_imei,
COUNT(DISTINCT T3.IMEI)/COUNT(DISTINCT T.IMEI) AS nrr3_imei
FROM default.os_nrr3_tmp T
LEFT OUTER JOIN default.os_nrr_tmp T3 ON T.romversion = T3.romversion
AND T.model = T3.model
AND T.IMEI = T3.IMEI
WHERE length(T.romversion) > 0
GROUP BY T.system_id,
T.romversion)tt
GROUP BY tt.statdate,
tt.system_id,
tt.romversion
TW_os_nrr[3-4]
类似的代码不贴了
合并
TW_os_nrr[1-4]
内层数据,而不是每计算一个维度的时候都重新union all
,要知道每一个维度这样计算,初略观察大概耗时3h,4个维度算下来半天没了!
其中各个模块的作用如下表:
模块 | 作用 |
---|---|
Job_get_start_user | 封装了原来的 delete duplicate date、truncate tmp table和gen tmp table |
gen mid table | 生成中间表,合并TW_os_nrr[1-4] 内层数据 |
TW_os_nrr | 计算4个维度的新增留存 |
merge NRR table | 合并结果 |
粗看没啥变化,且往下看。
Job_get_start_user
delete duplicate date
这里将删除重复数据和清空临时表写到一个SQL模块中
DELETE FROM DM_OS_NRR2_WEEKLY t
WHERE t.statdate = to_date('${v_day}','yyyymmdd');
truncate table DM_OS_NRR2_WEEKLY_TMP;
然后5个临时表由之前的顺序计算改成并行,同时取消对前2周和前3周的计算,所要求的改动就是再临时表的表名中带有日期信息。例如,计算前一周新增数据时,表名从default.os_nrr1_tmp
变成default.os_nrr_tmp_${v_last_week}
--两周前开始的一周内新增启动数据
drop table if exists default.os_nrr_tmp_${v_last_week};
CREATE TABLE default.os_nrr_tmp_${v_last_week} AS
SELECT DISTINCT T1.romversion,
T1.model,
T1.imei,
T1.system_id
FROM
(SELECT (CASE WHEN osversion LIKE '%i%' THEN osversion WHEN romversion LIKE '%EROM%' THEN romversion ELSE osversion END) AS romversion,
model,
IMEI,
(CASE WHEN osversion LIKE '%i%' THEN '265672' WHEN romversion LIKE '%EROM%' THEN '265672' ELSE '265671' END) AS system_id
FROM stat_rom_ods.ds_apps_launched_stat
WHERE dayno >= ${v_last_week}
AND dayno < ${v_week}
AND model <> 'R821T') T1 LEFT semi
JOIN
(SELECT DISTINCT IMEI
FROM stat_rom_ods.fxt_imei
WHERE dayno >= ${v_last_week}
AND dayno < ${v_week}) T2 ON T1.IMEI = T2.IMEI;
还有一处改动,就是计算当周启动数据(1.2.1),在外层做了一次按照romversion
、model
、imei
和system_id
的group by。这是为后面合并TW_os_nrr[1-4]
内层数据做的准备,这是为了填坑,后面会说。
-- 一周启动数据
drop table if exists default.os_nrr_tmp;
CREATE TABLE default.os_nrr_tmp AS
SELECT romversion,
model,
imei,
system_id
FROM
( SELECT (CASE WHEN osversion LIKE '%i%' THEN osversion WHEN romversion LIKE '%EROM%' THEN romversion ELSE osversion END) AS romversion,
model,
IMEI,
(CASE WHEN osversion LIKE '%i%' THEN '265672' WHEN romversion LIKE '%EROM%' THEN '265672' ELSE '265671' END) AS system_id
FROM stat_rom_ods.ds_apps_launched_stat
WHERE dayno >= ${v_week}
AND dayno <= ${v_day}
AND model <> 'R821T') t
GROUP BY romversion,
model,
imei,
system_id;
gen mid table
生成中间表,合并TW_os_nrr[1-4]
内层数据。
内层通过union all
合并5个临时表数据,值得注意的是,这里引入了flag
,用来标识某条记录属于第几周以及类型。例如flag=1
表示这一周的新增数据,flag=10
表示上一周的新增数据。
中间按照最小粒度group by
,根据sum(flag)
聚合的值得到对应的日期。例如,sum(flag) in (10, 12)
中10
表示前一周的新增数据,12=10+2
表示留存数据,即前一周的新增数据在当周有过启动。
看这种情况,
sum(flag) in (1, 3)
,其中1
表示当周的新增数据,3=1+2
表示当周的新增+启动数据。这里有个问题,为什么会出现1
这种情形,难道新增数据只是新增没有启动,要知道新增数据是从启动数据中来的,理解不能。不过这种数据的占比不大,低于.2%。
外层将不符合日期要求的数据过滤到,减少数据量。
drop table if exists default.os_nrr_week_tmp;
CREATE TABLE default.os_nrr_week_tmp AS
SELECT statdate,
romversion,
model,
imei,
system_id,
flag
FROM
( SELECT CASE
WHEN sum(t.flag) IN (1, 3) THEN '${v_week1}'
WHEN sum(t.flag) IN (10, 12) THEN '${v_last_week1}'
WHEN sum(t.flag) IN (100, 102) THEN '${v_last_2_week1}'
WHEN sum(t.flag) IN (1000, 1002) THEN '${v_last_3_week1}'
ELSE '1988-05-12'
END AS statdate ,
t.romversion AS romversion,
t.model AS model,
t.imei AS imei,
t.system_id AS system_id,
sum(t.flag) AS flag
FROM
(SELECT x.*, 1 AS flag
FROM default.os_nrr_n_tmp x
WHERE length(x.romversion) > 0
AND length(x.model) > 0
AND length(x.model) < 17
UNION ALL
SELECT x.*, 2 as flag
FROM default.os_nrr_tmp x
WHERE length(x.romversion) > 0
AND length(x.model) > 0
AND length(x.model) < 17
UNION ALL
SELECT x.*, 10 AS flag
FROM default.os_nrr_tmp_${v_last_week} x
WHERE length(x.romversion) > 0
AND length(x.model) > 0
AND length(x.model) < 17
UNION ALL
SELECT x.*, 100 AS flag
FROM default.os_nrr_tmp_${v_last_2_week} x
WHERE length(x.romversion) > 0
AND length(x.model) > 0
AND length(x.model) < 17
UNION ALL
SELECT x.*, 1000 AS flag
FROM default.os_nrr_tmp_${v_last_3_week} x
WHERE length(x.romversion) > 0
AND length(x.model) > 0
AND length(x.model) < 17
) t
GROUP BY t.romversion,
t.model,
t.imei,
t.system_id) t1
WHERE t1.statdate != '1988-05-12';
在计算的时候,发现当周的新增数据变得异常低。经过一番漫长的跟踪——拿一条已知的新增数据,跟踪它在每一步是否出现,发现有很多数据,由于sum(t.flag)
后出现了各种各样的值,例如512这种,在内层的case when
中日期被判断成默认日期,导致在外层select的时候被过滤掉。
由于新增数据都是唯一一条记录,所以可疑的地方就是启动数据,也就是表default.os_nrr_tmp
。拿已知的新增数据在表中检索了一番,发现存在多条记录,问题确认,之前在计算这个表的数据的时候,没有做去重,见1.2.1。修改后的结果见2.0.2。
TW_os_nrr
计算4个维度的新增留存,这里也是整合到了一个模块中,好处就是可以并行计算。
按照维度model
和romversion
计算
SELECT statdate,
system_id,
romversion,
model,
count(DISTINCT CASE WHEN flag in (1, 3) THEN imei END) AS count_imei,
count(DISTINCT CASE WHEN flag = 12 THEN imei END) / count(DISTINCT CASE WHEN flag in (10, 12) THEN imei END) AS nrr1_imei,
count(DISTINCT CASE WHEN flag = 102 THEN imei END) / count(DISTINCT CASE WHEN flag in (100, 102) THEN imei END) AS nrr2_imei,
count(DISTINCT CASE WHEN flag = 1002 THEN imei END) / count(DISTINCT CASE WHEN flag in (1000, 1002) THEN imei END) AS nrr3_imei
FROM
default.os_nrr_week_tmp
WHERE length(model) < 17
GROUP BY statdate,
system_id,
romversion,
model
按照维度romversion
计算
SELECT statdate,
system_id,
romversion,
'' as model,
count(DISTINCT CASE WHEN flag in (1, 3) THEN imei END) AS count_imei,
count(DISTINCT CASE WHEN flag = 12 THEN imei END) / count(DISTINCT CASE WHEN flag in (10, 12) THEN imei END) AS nrr1_imei,
count(DISTINCT CASE WHEN flag = 102 THEN imei END) / count(DISTINCT CASE WHEN flag in (100, 102) THEN imei END) AS nrr2_imei,
count(DISTINCT CASE WHEN flag = 1002 THEN imei END) / count(DISTINCT CASE WHEN flag in (1000, 1002) THEN imei END) AS nrr3_imei
FROM
default.os_nrr_week_tmp
WHERE length(model) < 17
GROUP BY statdate,
system_id,
romversion
第一个坑是在计算留存率分母的时候,指定了flag=100
,心想100对应的就是前2周的新增数据。对不对?呃。。。。。。问题就出在这里,对于那些留存数据,它的flag应该是102,因为上一步合并数据的时候,sum了一次(102=100+2)。
第二个坑稍显复杂(对于我这个左脑不太好使的人来说是这样的)。考虑这样一种场景,如果一个用户在当周更新了版本,那么就会出现两条记录,在按照维度model
计算的时候,count
就重复计算了一次,所以这里在count
的时候是带有distinct
的。
merge NRR table
这是当初觉得没问题的一个模块,可是最终还是掉了进去——当在merge数据到结果表后,发现已有数据没了。呃,要“打地鼠”了。
保险起见,把结果表先是做了备份。各种测试后发现,是有由于合并的时候出现了NULL
和数值相加的情形,导致原有的记录被覆盖。解决方式就是使用NVL
做一次判断。
MERGE INTO DM_OS_NRR2_WEEKLY T
USING DM_OS_NRR2_WEEKLY_TMP T1
ON (T.STATDATE = T1.STATDATE AND
NVL(T.romversion, '@#@#@#') = NVL(T1.romversion, '@#@#@#') AND
NVL(t.model, '@#@#@#') = NVL(T1.model, '@#@#@#') AND
NVL(t.SYSTEM_ID, '@#@#@#') = NVL(T1.SYSTEM_ID, '@#@#@#'))
WHEN MATCHED THEN
UPDATE
SET T.NRR1_IMEI = NVL(T.NRR1_IMEI, 0) + nvl(T1.NRR1_IMEI,0),
T.NRR2_IMEI = NVL(T.NRR2_IMEI, 0) + nvl(T1.NRR2_IMEI,0),
T.NRR3_IMEI = NVL(T.NRR3_IMEI, 0) + nvl(T1.NRR3_IMEI,0),
T.count_imei = NVL(T.count_imei, 0) + nvl(T1.count_imei,0),
T.last_update_time = SYSDATE
WHEN NOT MATCHED THEN
INSERT
VALUES
(T1.STATDATE,
T1.System_id,
T1.romversion,
T1.model,
T1.NRR1_IMEI,
T1.NRR2_IMEI,
t1.NRR3_IMEI,
SYSDATE,
T1.count_imei);
至于为什么之前是可以的,是因为merge的数据是数值类型,见1.3.0。
2014-09-13@前海花园
评论