by shigemk2

当面は技術的なことしか書かない

Athena カラム名 mapping

元ネタが見つからなかったんだけど、Athena/Hiveのカラム名にハイフンが使えないらしいので、SERDEPROPERTIESのところでmapping.xxx_xxx = xxx-xxxってな書き方でマッピングしないといけない。↓の例だと、ses:configuration-setがオリジナルのカラム名で、mappingでses_configurationsetに書き換えている(ちなみに:もだめ)

CREATE EXTERNAL TABLE sesblog2 (
  eventType string,
  mail struct<`timestamp`:string,
              source:string,
              sourceArn:string,
              sendingAccountId:string,
              messageId:string,
              destination:string,
              headersTruncated:boolean,
              headers:array<struct<name:string,value:string>>,
              commonHeaders:struct<`from`:array<string>,to:array<string>,messageId:string,subject:string>,
              tags:struct<ses_configurationset:string,ses_source_ip:string,ses_from_domain:string,ses_caller_identity:string>
              > 
  )           
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  "mapping.ses_configurationset"="ses:configuration-set",
  "mapping.ses_source_ip"="ses:source-ip", 
  "mapping.ses_from_domain"="ses:from-domain", 
  "mapping.ses_caller_identity"="ses:caller-identity"
  )
LOCATION 's3://<YOUR BUCKET HERE>/FH2017/' 

docs.oracle.com

aws.amazon.com

aws.amazon.com

CloudFormationだとこんな感じ(一部省略) ちなみにstructの中はスペースなしで詰めないとテーブルが正しく作られない

  GlueTable:
    Type: AWS::Glue::Table
    Properties:
      CatalogId: !Ref 'AWS::AccountId'
      DatabaseName: "データベース名"
      TableInput:
        Name: "テーブル名"
        PartitionKeys:
          - Name: カラム
            Type: string
        StorageDescriptor:
          Columns:
            - Name: tags
              Type: struct<ses_configurationset:string,ses_source_ip:string>
          Compressed: False
          Location: 's3://<YOUR BUCKET HERE>/FH2017/' 
          InputFormat: org.apache.hadoop.mapred.TextInputFormat
          OutputFormat: org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat
          SerdeInfo:
            SerializationLibrary: org.openx.data.jsonserde.JsonSerDe
        TableType: EXTERNAL_TABLE
        Parameters:
          mapping.ses_configurationset: ses:configuration-set
          mapping.ses_source_ip: ses:source-ip

docs.aws.amazon.com