by shigemk2

For the time being, I will only write about technical topics

Digdag EMR

emr>: Amazon Elastic Map Reduce — Digdag 0.9.5 documentation

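A partial excerpt from the `type` section of the documentation. As far as I can tell, the available step types are hive, spark, spark-sql, script, and command.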

- type: hive
  script: queries/hive-query.q
  vars:
    INPUT: s3://my-bucket/data/
    OUTPUT: s3://my-bucket/output/
  hiveconf:
    hive.support.sql11.reserved.keywords: false

- type: spark
  application: spark/pi.scala

- type: spark
  application: s3://my-bucket/spark/hello.py
  args: [foo, bar]

- type: spark
  application: spark/hello.jar
  class: com.example.Hello
  jars:
    - libhello.jar
    - s3://td-spark/td-spark-assembly-0.1.jar
  conf:
    spark.locality.wait: 5s
    spark.memory.fraction: 0.5
  args: [foo, bar]

- type: spark-sql
  query: spark/query.sql
  result: s3://my-bucket/results/${session_uuid}/

- type: script
  script: s3://my-bucket/scripts/hello.sh
  args: [hello, world]

- type: script
  script: scripts/hello.sh
  args: [world]

- type: command
  command: echo
  args: [hello, world]
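
For context, step entries like the ones above are list items under the `steps` key of an `emr>` task. The following is a minimal sketch of how two of them might sit in a `.dig` workflow file, not a tested configuration. The task name `+emr_job`, the cluster ID, and the staging path are placeholders, and AWS credentials are assumed to be configured separately. Check the key names against the Digdag documentation for the version in use.

```yaml
# Sketch only: a .dig workflow that submits two steps to an existing cluster.
+emr_job:                              # hypothetical task name
  emr>:
  cluster: j-XXXXXXXXXXXXX             # placeholder: ID of an existing EMR cluster
  staging: s3://my-bucket/staging/     # placeholder: S3 folder for staging local files
  steps:
    # Run a plain command on the cluster.
    - type: command
      command: echo
      args: [hello, world]
    # Run a local script; it is assumed to be uploaded via the staging folder.
    - type: script
      script: scripts/hello.sh
      args: [world]
```

Local paths such as `scripts/hello.sh` are relative to the workflow project, while `s3://` paths refer to files already in S3.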