BQ SQL solution for comparing rows based on variance
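I'm trying to compare retail item price data in BigQuery (roughly 2-3B rows, depending on the time period and retailer) in order to identify meaningful price differences. For example, $1.99 vs $2.00 is not meaningful, but $1.99 vs $2.50 is. I'm defining meaningful as a 2% difference between prices.
A sample data set for one item looks like this: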
ITEM   Price($)  Meaningful  (this is the column I'm trying to flag)
Apple  $1.99     Y           (lowest price would always be flagged)
Apple  $2.00     N           ($1.99 v $2.00)
Apple  $2.01     N           ($1.99 v $2.01) Still using $1.99 for comparison
Apple  $2.50     Y           ($1.99 v $2.50) Still using $1.99 for comparison
Apple  $2.56     Y           ($2.50 v $2.56) Now using $2.50 as new comp. price
Apple  $2.62     Y           ($2.56 v $2.62) Now using $2.56 as new comp. price
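I was hoping to solve this with SQL window functions alone (LEAD, LAG, PARTITION BY ... OVER, etc.), comparing the current row's price with the next row. However, that stops working as soon as I hit a non-meaningful price, because I always want to compare the next value against the most recent meaningful price (see the $2.50 row in the example above, which is compared against $1.99 rather than the $2.00 and $2.01 in the preceding rows).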
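To make the difference concrete, here is a minimal JavaScript sketch (plain JavaScript, not BigQuery code) of the two comparison strategies. The function names and the `creeping` price list are made up for illustration, and it assumes "meaningful" means an increase of more than 2% over the reference price. With prices that creep up by about 1.5% per row, a previous-row comparison never fires, while a comparison against the last flagged price does.

```javascript
// Assumed threshold: an increase of more than 2% counts as meaningful.
const THRESHOLD = 0.02;

// Naive approach: compare each price with the row directly before it,
// which is what a LAG-style window function would give you.
function flagAgainstPreviousRow(prices) {
  return prices.map((price, i) => {
    if (i === 0) return 'Y'; // lowest price is always flagged
    const prev = prices[i - 1];
    return (price - prev) / prev > THRESHOLD ? 'Y' : 'N';
  });
}

// Intended approach: compare with the most recent price that was flagged 'Y'.
function flagAgainstLastMeaningful(prices) {
  let baseline = null;
  return prices.map((price) => {
    if (baseline === null || (price - baseline) / baseline > THRESHOLD) {
      baseline = price; // this price becomes the new comparison price
      return 'Y';
    }
    return 'N';
  });
}

// Hypothetical prices, sorted ascending, rising by roughly 1.5% per row.
const creeping = [1.00, 1.015, 1.03, 1.045];
console.log(flagAgainstPreviousRow(creeping).join(' '));    // Y N N N
console.log(flagAgainstLastMeaningful(creeping).join(' ')); // Y N Y N
```

The second function needs state that depends on its own earlier output (the last flagged price), which is why a fixed-offset window function cannot express it directly.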
My question:
The following is for BigQuery Standard SQL:
#standardSQL
CREATE TEMPORARY FUNCTION x(prices ARRAY<FLOAT64>)
RETURNS ARRAY<STRUCT<price FLOAT64, flag STRING>>
LANGUAGE js AS """
  // prices arrives sorted ascending; `last` holds the latest price flagged 'Y'.
  var result = [];
  var last = 0;
  var flag = '';
  for (var i = 0; i < prices.length; i++) {
    if (i == 0) {
      // The lowest price is always flagged and becomes the first reference.
      last = prices[i];
      flag = 'Y';
    } else if ((prices[i] - last) / last > 0.02) {
      // More than 2% above the reference: flag it and make it the new reference.
      last = prices[i];
      flag = 'Y';
    } else {
      flag = 'N';
    }
    result.push({price: prices[i], flag: flag});
  }
  return result;
""";
SELECT item, rec.*
FROM (
  SELECT item, ARRAY_AGG(price ORDER BY price) AS prices
  FROM `yourTable`
  GROUP BY item
), UNNEST(x(prices)) AS rec
-- ORDER BY item, price
You can test / play with it using the dummy data from your question:
#standardSQL
CREATE TEMPORARY FUNCTION x(prices ARRAY<FLOAT64>)
RETURNS ARRAY<STRUCT<price FLOAT64, flag STRING>>
LANGUAGE js AS """
  // prices arrives sorted ascending; `last` holds the latest price flagged 'Y'.
  var result = [];
  var last = 0;
  var flag = '';
  for (var i = 0; i < prices.length; i++) {
    if (i == 0) {
      // The lowest price is always flagged and becomes the first reference.
      last = prices[i];
      flag = 'Y';
    } else if ((prices[i] - last) / last > 0.02) {
      // More than 2% above the reference: flag it and make it the new reference.
      last = prices[i];
      flag = 'Y';
    } else {
      flag = 'N';
    }
    result.push({price: prices[i], flag: flag});
  }
  return result;
""";
WITH `yourTable` AS (
  SELECT 'Apple' AS item, 1.99 AS price UNION ALL
  SELECT 'Apple', 2.00 UNION ALL
  SELECT 'Apple', 2.01 UNION ALL
  SELECT 'Apple', 2.50 UNION ALL
  SELECT 'Apple', 2.56 UNION ALL
  SELECT 'Apple', 2.62
)
SELECT item, rec.*
FROM (
  SELECT item, ARRAY_AGG(price ORDER BY price) AS prices
  FROM `yourTable`
  GROUP BY item
), UNNEST(x(prices)) AS rec
ORDER BY item, price
The result is as follows:
item price flag
---- ----- ----
Apple 1.99 Y
Apple 2.0 N
Apple 2.01 N
Apple 2.5 Y
Apple 2.56 Y
Apple 2.62 Y
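If you want to sanity-check the logic without running a query, the same steps can be approximated in plain JavaScript (for example under Node). This is only a rough sketch: `flagPrices`, `groupPricesByItem`, and `flagRows` are illustrative names, not BigQuery functions, and the grouping code merely stands in for `GROUP BY` + `ARRAY_AGG(price ORDER BY price)` followed by `UNNEST`.

```javascript
// Rows copied from the dummy data in the question.
const rows = [
  { item: 'Apple', price: 1.99 },
  { item: 'Apple', price: 2.00 },
  { item: 'Apple', price: 2.01 },
  { item: 'Apple', price: 2.50 },
  { item: 'Apple', price: 2.56 },
  { item: 'Apple', price: 2.62 },
];

// Same loop as the UDF body: flag a price when it is more than `threshold`
// above the last flagged price; the first (lowest) price is always flagged.
function flagPrices(prices, threshold = 0.02) {
  const result = [];
  let last = 0;
  prices.forEach((price, i) => {
    let flag = 'N';
    if (i === 0 || (price - last) / last > threshold) {
      last = price;
      flag = 'Y';
    }
    result.push({ price, flag });
  });
  return result;
}

// Stand-in for GROUP BY item + ARRAY_AGG(price ORDER BY price).
function groupPricesByItem(inputRows) {
  const groups = new Map();
  for (const { item, price } of inputRows) {
    if (!groups.has(item)) groups.set(item, []);
    groups.get(item).push(price);
  }
  for (const prices of groups.values()) prices.sort((a, b) => a - b);
  return groups;
}

// Stand-in for UNNEST(x(prices)): one output row per input price.
function flagRows(inputRows) {
  const out = [];
  for (const [item, prices] of groupPricesByItem(inputRows)) {
    for (const rec of flagPrices(prices)) out.push({ item, ...rec });
  }
  return out;
}

for (const r of flagRows(rows)) console.log(r.item, r.price, r.flag);
```

For the Apple rows the flags come out as Y, N, N, Y, Y, Y, in line with the table above.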