元ネタが見つからなかったんだけど、Athena/Hiveのカラム名にハイフンが使えないらしいので、SERDEPROPERTIESのところでmapping.xxx_xxx = xxx-xxx
ってな書き方でマッピングしないといけない。↓の例だと、ses:configuration-set
がオリジナルのカラム名で、mappingでses_configurationset
に書き換えている(ちなみに:
もだめ)
CREATE EXTERNAL TABLE sesblog2 ( eventType string, mail struct<`timestamp`:string, source:string, sourceArn:string, sendingAccountId:string, messageId:string, destination:string, headersTruncated:boolean, headers:array<struct<name:string,value:string>>, commonHeaders:struct<`from`:array<string>,to:array<string>,messageId:string,subject:string>, tags:struct<ses_configurationset:string,ses_source_ip:string,ses_from_domain:string,ses_caller_identity:string> > ) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' WITH SERDEPROPERTIES ( "mapping.ses_configurationset"="ses:configuration-set", "mapping.ses_source_ip"="ses:source-ip", "mapping.ses_from_domain"="ses:from-domain", "mapping.ses_caller_identity"="ses:caller-identity" ) LOCATION 's3://<YOUR BUCKET HERE>/FH2017/'
CloudFormationだとこんな感じ(一部省略) ちなみにstructの中はスペースなしで詰めないとテーブルが正しく作られない
GlueTable: Type: AWS::Glue::Table Properties: CatalogId: !Ref 'AWS::AccountId' DatabaseName: "データベース名" TableInput: Name: "テーブル名" PartitionKeys: - Name: カラム Type: string StorageDescriptor: Columns: - Name: tags Type: struct<ses_configurationset:string,ses_source_ip:string> Compressed: False Location: 's3://<YOUR BUCKET HERE>/FH2017/' InputFormat: org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat SerdeInfo: SerializationLibrary: org.openx.data.jsonserde.JsonSerDe TableType: EXTERNAL_TABLE Parameters: mapping.ses_configurationset: ses:configuration-set mapping.ses_source_ip: ses:source-ip