Learning the Treasure Client Command-Line Reference with Real Data: 1. Data Import - doryokujin's blog
This post basically follows the article linked above as-is.
# Create a table
$ td table:create test shigemk2_bulk
Table 'test.shigemk2_bulk' is created.
# Create a bulk import session
$ td import:create session_shigemk2 test shigemk2_bulk
Bulk import session 'session_shigemk2' is created.
# Generate the prepared data, treating the first row as the header. The prepared files can be reused to run the import as many times as needed.
$ td import:prepare 101-2014-02.csv --format csv --column-header --time-column 'time' -o ./parts/
Preparing sources
Output dir : ./parts/
Source : 101-2014-02.csv (13842646 bytes)
Converting '101-2014-02.csv'...
sample row: {"time":0,"device":"1366x768","browser":"Mozilla\/5.0 (Windows NT 6.3; WOW64) AppleWebKit\/537.36 (KHTML, like Gecko) Chrome\/32.0.1700.102 Safari\/537.36","unknown":24,"language":"ja,en-US;q=0.8,en;q=0.6","referer":"http:\/\/zenback.itmedia.co.jp\/contents","ip":"xxx.xx.xxx.xxx"}
Prepare status:
Source : 101-2014-02.csv
Status : SUCCESS
Read lines : 37881
Valid rows : 37880
Invalid rows : 0
Converted Files : ./parts/101-2014-02_csv_0.msgpack.gz (2084235 bytes)
Next steps:
=> execute following 'td import:upload' command. if the bulk import session is not created yet, please create it with 'td import:create <session> <database> <table>' command.
$ td import:upload <session> './parts/101-2014-02_csv_0.msgpack.gz'
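In the prepare status above, "Read lines" is 37881 while "Valid rows" is 37880. The difference of one is consistent with the header line being consumed by --column-header. A rough way to check the expected row count before preparing is sketched below. `count_data_rows` is a hypothetical helper, not part of `td`, and the sample file is made up for illustration. It counts physical lines, so it assumes no quoted field contains a newline.

```shell
# Count the data rows of a CSV whose first line is a header.
# Assumption: one record per physical line (no embedded newlines in fields).
count_data_rows() {
  awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}

# Tiny made-up sample: one header line plus two data rows.
sample=$(mktemp)
printf 'time,device\n0,1366x768\n60,1920x1080\n' > "$sample"
count_data_rows "$sample"   # prints 2
rm -f "$sample"
```

Running `count_data_rows 101-2014-02.csv` on the real file should then give the number to compare against "Valid rows".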
# Upload the data. At this stage the data is only uploaded; it is not yet stored in the table.
$ td import:upload session_shigemk2 './parts/101-2014-02_csv_0.msgpack.gz'
Uploading prepared sources
Session name : session_shigemk2
Source : ./parts/101-2014-02_csv_0.msgpack.gz (2084235 bytes)
Uploading ./parts/101-2014-02_csv_0.msgpack.gz (2084235 bytes)...
Upload status:
Source : ./parts/101-2014-02_csv_0.msgpack.gz
Status : SUCCESS
Part name : 101-2014-02_csv_0_msgpack_gz
Size : 2084235
Retry count : 0
Next Steps:
=> execute 'td import:perform session_shigemk2'.
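Judging only from the upload status above, the part name looks like the uploaded file name with every dot replaced by an underscore. The sketch below merely reproduces that observation. `guess_part_name` is a hypothetical helper, not part of `td`, and the real naming rule may differ.

```shell
# Guess the part name from a prepared file path.
# Assumption (from this one log only): basename with '.' replaced by '_'.
guess_part_name() {
  basename "$1" | tr '.' '_'
}

guess_part_name './parts/101-2014-02_csv_0.msgpack.gz'
# prints 101-2014-02_csv_0_msgpack_gz
```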
# Store the data. This took quite a while (about 51 minutes between the start and finish timestamps in the job log).
$ td import:perform session_shigemk2
Job 9279134 is queued.
Use 'td job:show [-w] 9279134' to show the status.
$ td job:show -w 9279134
JobID : 9279134
Status : running
Type : bulk_import_perform
Database : test
queued...
started at 2014-04-03T22:41:06Z
14/04/03 22:41:11 INFO log.MLog: MLog clients using log4j logging.
14/04/03 22:41:11 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
14/04/03 22:41:12 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/04/03 22:41:17 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
finished at 2014-04-03T23:32:15Z
Use '-v' option to show detailed messages.
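To put a number on how long the perform step took, the start and finish timestamps in the job log can be subtracted. This is a minimal sketch that only handles two times on the same day, which is the case here. `to_seconds` is a helper introduced for illustration.

```shell
# Convert HH:MM:SS to seconds since midnight.
# awk is used so that values like "08" are read as decimal numbers.
to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

start=$(to_seconds "22:41:06")    # started at 2014-04-03T22:41:06Z
finish=$(to_seconds "23:32:15")   # finished at 2014-04-03T23:32:15Z
elapsed=$((finish - start))

echo "$((elapsed / 60)) min $((elapsed % 60)) s"   # prints 51 min 9 s
```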
# Commit the data
$ td import:commit session_shigemk2
Bulk import session 'session_shigemk2' started to commit.
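The whole sequence can be summarized as a small dry-run helper that prints the commands in the order used in this post without executing anything. `print_bulk_import_steps` is a hypothetical function written for this summary. It uses only the `td` subcommands and options that appear in the transcript, and it takes the prepared file path as an argument rather than guessing the name that `td import:prepare` will produce.

```shell
# Print (do not run) the td commands for one bulk import.
# Arguments: database, table, session, source CSV, prepared part file.
print_bulk_import_steps() {
  db=$1; table=$2; session=$3; csv=$4; part=$5
  outdir=$(dirname "$part")   # e.g. ./parts

  printf '%s\n' "td table:create $db $table"
  printf '%s\n' "td import:create $session $db $table"
  printf '%s\n' "td import:prepare $csv --format csv --column-header --time-column 'time' -o $outdir/"
  printf '%s\n' "td import:upload $session $part"
  printf '%s\n' "td import:perform $session"
  printf '%s\n' "td import:commit $session"
}

print_bulk_import_steps test shigemk2_bulk session_shigemk2 \
  101-2014-02.csv ./parts/101-2014-02_csv_0.msgpack.gz
```

In the run recorded here, the perform job was watched with `td job:show -w` until it finished before the commit was issued, so the last two printed commands are not meant to be fired back to back.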