Prestoのdate_diffでunitをmonthにしたときの挙動がいまいちよく分からなかったので確認することにした バージョンはPresto 0.223を利用している。
Prestoのdate_diffでunitをmonthにしたとき、date_diffの値はどのような計算になるのだろうか。
- 30日区切りで1ヶ月
- 31日区切りで1ヶ月
- 開始日が1月であれば31日、2月であれば28日というような区切り
- 終了日が1月であれば31日、2月であれば28日というような区切り
結論から言うと3でなのだけれど、理由はソースコードを見てもよく分からなかった。jodaのgetDifferenceAsLongを使っているのはわかっているんだが… このようなクエリを書いて確かめてみる。
SELECT date_diff('day',CAST('2001-01-01 00:00:00.000' AS TIMESTAMP),CAST('2001-01-31 00:00:00.000' AS TIMESTAMP)) AS diff_days, date_diff('month',CAST('2001-01-01 00:00:00.000' AS TIMESTAMP),CAST('2001-01-31 00:00:00.000' AS TIMESTAMP)) AS diff_months, date_diff('day',CAST('2001-01-01 00:00:00.000' AS TIMESTAMP),CAST('2001-02-01 00:00:00.000' AS TIMESTAMP)) AS diff_days, date_diff('month',CAST('2001-01-01 00:00:00.000' AS TIMESTAMP),CAST('2001-02-01 00:00:00.000' AS TIMESTAMP)) AS diff_months, date_diff('day',CAST('2001-02-01 00:00:00.000' AS TIMESTAMP),CAST('2001-02-28 00:00:00.000' AS TIMESTAMP)) AS diff_days, date_diff('month',CAST('2001-02-01 00:00:00.000' AS TIMESTAMP),CAST('2001-02-28 00:00:00.000' AS TIMESTAMP)) AS diff_months, date_diff('day',CAST('2001-02-01 00:00:00.000' AS TIMESTAMP),CAST('2001-03-01 00:00:00.000' AS TIMESTAMP)) AS diff_days, date_diff('month',CAST('2001-02-01 00:00:00.000' AS TIMESTAMP),CAST('2001-03-01 00:00:00.000' AS TIMESTAMP)) AS diff_months, date_diff('day',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_days, date_diff('month',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_months, date_diff('day',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_days, date_diff('month',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_months
ローカルで実行してみる。開始日が1月だったらdate_diffが31日以上で1ヶ月の差となるし、開始日が2月だったらdate_diffが28日以上で1ヶ月の差となっている。 つまり、開始日の月の日数を基準にdiffを取っていることがなんとなく分かる。
presto> SELECT date_diff('day',CAST('2001-01-01 00:00:00.000' AS TIMESTAMP),CAST('2001-01-31 00:00:00.000' AS TIMESTAMP)) AS diff_days, -> date_diff('month',CAST('2001-01-01 00:00:00.000' AS TIMESTAMP),CAST('2001-01-31 00:00:00.000' AS TIMESTAMP)) AS diff_months, -> date_diff('day',CAST('2001-01-01 00:00:00.000' AS TIMESTAMP),CAST('2001-02-01 00:00:00.000' AS TIMESTAMP)) AS diff_days, -> date_diff('month',CAST('2001-01-01 00:00:00.000' AS TIMESTAMP),CAST('2001-02-01 00:00:00.000' AS TIMESTAMP)) AS diff_months, -> date_diff('day',CAST('2001-02-01 00:00:00.000' AS TIMESTAMP),CAST('2001-02-28 00:00:00.000' AS TIMESTAMP)) AS diff_days, -> date_diff('month',CAST('2001-02-01 00:00:00.000' AS TIMESTAMP),CAST('2001-02-28 00:00:00.000' AS TIMESTAMP)) AS diff_months, -> date_diff('day',CAST('2001-02-01 00:00:00.000' AS TIMESTAMP),CAST('2001-03-01 00:00:00.000' AS TIMESTAMP)) AS diff_days, -> date_diff('month',CAST('2001-02-01 00:00:00.000' AS TIMESTAMP),CAST('2001-03-01 00:00:00.000' AS TIMESTAMP)) AS diff_months, -> date_diff('day',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_days, -> date_diff('month',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_months, -> date_diff('day',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_days, -> date_diff('month',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_months -> ; diff_days | diff_months | diff_days | diff_months | diff_days | diff_months | diff_days | diff_months | diff_days | diff_months | diff_days | diff_months -----------+-------------+-----------+-------------+-----------+-------------+-----------+-------------+-----------+-------------+-----------+------------- 30 | 0 | 31 | 1 | 27 | 0 | 28 | 1 | 27 | 0 | 28 | 1 (1 row)
うるう年とで比較してみたけど、うるう年を考慮した作りになっているな。
SELECT date_diff('day',CAST('2000-02-15 00:00:00.000' AS TIMESTAMP),CAST('2000-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_days, date_diff('month',CAST('2000-02-15 00:00:00.000' AS TIMESTAMP),CAST('2000-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_months, date_diff('day',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_days, date_diff('month',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_months, date_diff('day',CAST('2000-02-15 00:00:00.000' AS TIMESTAMP),CAST('2000-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_days, date_diff('month',CAST('2000-02-15 00:00:00.000' AS TIMESTAMP),CAST('2000-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_months, date_diff('day',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_days, date_diff('month',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_months
presto> SELECT date_diff('day',CAST('2000-02-15 00:00:00.000' AS TIMESTAMP),CAST('2000-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_days, -> date_diff('month',CAST('2000-02-15 00:00:00.000' AS TIMESTAMP),CAST('2000-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_months, -> date_diff('day',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_days, -> date_diff('month',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_months, -> date_diff('day',CAST('2000-02-15 00:00:00.000' AS TIMESTAMP),CAST('2000-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_days, -> date_diff('month',CAST('2000-02-15 00:00:00.000' AS TIMESTAMP),CAST('2000-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_months, -> date_diff('day',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_days, -> date_diff('month',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_months -> ; diff_days | diff_months | diff_days | diff_months | diff_days | diff_months | diff_days | diff_months -----------+-------------+-----------+-------------+-----------+-------------+-----------+------------- 28 | 0 | 27 | 0 | 29 | 1 | 28 | 1 (1 row)