by shigemk2

当面は技術的なことしか書かない

Prestoのdate_diffでunitをmonthにしたときの挙動

Prestoのdate_diffでunitをmonthにしたときの挙動がいまいちよく分からなかったので確認することにした バージョンはPresto 0.223を利用している。

Prestoのdate_diffでunitをmonthにしたとき、date_diffの値はどのような計算になるのだろうか。

  1. 30日区切りで1ヶ月
  2. 31日区切りで1ヶ月
  3. 開始日が1月であれば31日、2月であれば28日というような区切り
  4. 終了日が1月であれば31日、2月であれば28日というような区切り

結論から言うと3でなのだけれど、理由はソースコードを見てもよく分からなかった。jodaのgetDifferenceAsLongを使っているのはわかっているんだが… このようなクエリを書いて確かめてみる。

SELECT date_diff('day',CAST('2001-01-01 00:00:00.000' AS TIMESTAMP),CAST('2001-01-31 00:00:00.000' AS TIMESTAMP)) AS diff_days,
       date_diff('month',CAST('2001-01-01 00:00:00.000' AS TIMESTAMP),CAST('2001-01-31 00:00:00.000' AS TIMESTAMP)) AS diff_months,
       date_diff('day',CAST('2001-01-01 00:00:00.000' AS TIMESTAMP),CAST('2001-02-01 00:00:00.000' AS TIMESTAMP)) AS diff_days,
       date_diff('month',CAST('2001-01-01 00:00:00.000' AS TIMESTAMP),CAST('2001-02-01 00:00:00.000' AS TIMESTAMP)) AS diff_months,
       date_diff('day',CAST('2001-02-01 00:00:00.000' AS TIMESTAMP),CAST('2001-02-28 00:00:00.000' AS TIMESTAMP)) AS diff_days,
       date_diff('month',CAST('2001-02-01 00:00:00.000' AS TIMESTAMP),CAST('2001-02-28 00:00:00.000' AS TIMESTAMP)) AS diff_months,
       date_diff('day',CAST('2001-02-01 00:00:00.000' AS TIMESTAMP),CAST('2001-03-01 00:00:00.000' AS TIMESTAMP)) AS diff_days,
       date_diff('month',CAST('2001-02-01 00:00:00.000' AS TIMESTAMP),CAST('2001-03-01 00:00:00.000' AS TIMESTAMP)) AS diff_months,
       date_diff('day',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_days,
       date_diff('month',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_months,
       date_diff('day',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_days,
       date_diff('month',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_months

ローカルで実行してみる。開始日が1月だったらdate_diffが31日以上で1ヶ月の差となるし、開始日が2月だったらdate_diffが28日以上で1ヶ月の差となっている。 つまり、開始日の月の日数を基準にdiffを取っていることがなんとなく分かる。

presto> SELECT date_diff('day',CAST('2001-01-01 00:00:00.000' AS TIMESTAMP),CAST('2001-01-31 00:00:00.000' AS TIMESTAMP)) AS diff_days,
     ->        date_diff('month',CAST('2001-01-01 00:00:00.000' AS TIMESTAMP),CAST('2001-01-31 00:00:00.000' AS TIMESTAMP)) AS diff_months,
     ->        date_diff('day',CAST('2001-01-01 00:00:00.000' AS TIMESTAMP),CAST('2001-02-01 00:00:00.000' AS TIMESTAMP)) AS diff_days,
     ->        date_diff('month',CAST('2001-01-01 00:00:00.000' AS TIMESTAMP),CAST('2001-02-01 00:00:00.000' AS TIMESTAMP)) AS diff_months,
     ->        date_diff('day',CAST('2001-02-01 00:00:00.000' AS TIMESTAMP),CAST('2001-02-28 00:00:00.000' AS TIMESTAMP)) AS diff_days,
     ->        date_diff('month',CAST('2001-02-01 00:00:00.000' AS TIMESTAMP),CAST('2001-02-28 00:00:00.000' AS TIMESTAMP)) AS diff_months,
     ->        date_diff('day',CAST('2001-02-01 00:00:00.000' AS TIMESTAMP),CAST('2001-03-01 00:00:00.000' AS TIMESTAMP)) AS diff_days,
     ->        date_diff('month',CAST('2001-02-01 00:00:00.000' AS TIMESTAMP),CAST('2001-03-01 00:00:00.000' AS TIMESTAMP)) AS diff_months,
     ->        date_diff('day',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_days,
     ->        date_diff('month',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_months,
     ->        date_diff('day',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_days,
     ->        date_diff('month',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_months
     -> ;
 diff_days | diff_months | diff_days | diff_months | diff_days | diff_months | diff_days | diff_months | diff_days | diff_months | diff_days | diff_months
-----------+-------------+-----------+-------------+-----------+-------------+-----------+-------------+-----------+-------------+-----------+-------------
        30 |           0 |        31 |           1 |        27 |           0 |        28 |           1 |        27 |           0 |        28 |           1
(1 row)

うるう年とで比較してみたけど、うるう年を考慮した作りになっているな。

SELECT date_diff('day',CAST('2000-02-15 00:00:00.000' AS TIMESTAMP),CAST('2000-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_days,
       date_diff('month',CAST('2000-02-15 00:00:00.000' AS TIMESTAMP),CAST('2000-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_months,
       date_diff('day',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_days,
       date_diff('month',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_months,
       date_diff('day',CAST('2000-02-15 00:00:00.000' AS TIMESTAMP),CAST('2000-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_days,
       date_diff('month',CAST('2000-02-15 00:00:00.000' AS TIMESTAMP),CAST('2000-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_months,
       date_diff('day',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_days,
       date_diff('month',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_months
presto> SELECT date_diff('day',CAST('2000-02-15 00:00:00.000' AS TIMESTAMP),CAST('2000-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_days,
     ->        date_diff('month',CAST('2000-02-15 00:00:00.000' AS TIMESTAMP),CAST('2000-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_months,
     ->        date_diff('day',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_days,
     ->        date_diff('month',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-14 00:00:00.000' AS TIMESTAMP)) AS diff_months,
     ->        date_diff('day',CAST('2000-02-15 00:00:00.000' AS TIMESTAMP),CAST('2000-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_days,
     ->        date_diff('month',CAST('2000-02-15 00:00:00.000' AS TIMESTAMP),CAST('2000-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_months,
     ->        date_diff('day',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_days,
     ->        date_diff('month',CAST('2001-02-15 00:00:00.000' AS TIMESTAMP),CAST('2001-03-15 00:00:00.000' AS TIMESTAMP)) AS diff_months
     -> ;
 diff_days | diff_months | diff_days | diff_months | diff_days | diff_months | diff_days | diff_months
-----------+-------------+-----------+-------------+-----------+-------------+-----------+-------------
        28 |           0 |        27 |           0 |        29 |           1 |        28 |           1
(1 row)