I have an Oozie job with a single shell action. The shell action first finds certain SQL script files stored on HDFS programmatically, then tries to run each of those scripts on Impala.

Since the list of SQL scripts I want to run is not known in advance, and thus cannot be passed to the Oozie action as <file> parameters, is there a way to run impala-shell and give it an HDFS path instead of a Linux path?

2 Answers

1

impala-shell can accept query text from standard input. The impala-shell documentation describes the -f option as follows:

-f query_file or --query_file=query_file

query_file=path_to_query_file

Passes a SQL query from a file. Multiple statements must be semicolon (;) delimited. In Impala 2.3 and higher, you can specify a filename of - to represent standard input. This feature makes it convenient to use impala-shell as part of a Unix pipeline where SQL statements are generated dynamically by other tools.

So in your case, your shell script can simply do something like

$ hdfs dfs -cat <hdfs_file_name> | impala-shell -i <impala_daemon> -f -
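Because the question involves a list of scripts discovered at run time, the one-liner above can be wrapped in a loop. The following is only a minimal sketch: the function name run_hdfs_sql_scripts and its two arguments are hypothetical, and it assumes Impala 2.3 or higher (so that "-f -" reads standard input) and HDFS paths that contain no spaces.

```shell
#!/bin/bash
# Sketch only: run every *.sql file found in one HDFS directory through impala-shell.
# Hypothetical names: run_hdfs_sql_scripts and its two arguments.
# Assumes Impala 2.3+ (so "-f -" reads stdin) and HDFS paths without spaces.

run_hdfs_sql_scripts() {
  local sql_dir="$1"   # HDFS directory that holds the scripts
  local impalad="$2"   # Impala daemon to connect to
  local scripts script

  # The last column of "hdfs dfs -ls" output is the path; keep only *.sql entries.
  scripts=$(hdfs dfs -ls "$sql_dir" | awk '$NF ~ /\.sql$/ {print $NF}')

  for script in $scripts; do
    echo "Running $script" >&2
    # Stream the script from HDFS into impala-shell;
    # return as soon as impala-shell reports a failure.
    hdfs dfs -cat "$script" | impala-shell -i "$impalad" -f - || return 1
  done
}
```

It could be called from the shell action as run_hdfs_sql_scripts /user/impala/sql <impala_daemon>. Note that the pipeline's exit status is that of impala-shell, so a failed hdfs dfs -cat is not detected by this sketch.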
0

If you have a fixed number of queries, or you can collect (cat) them into one file, you can pass the name of that file out of the first <action> as a parameter by using the <capture-output/> tag:

$ hdfs dfs -cat /user/impala/sql/custom_script_name.sql

CREATE TABLE default.t1(n INT);
INSERT INTO default.t1 VALUES(1);

$ hdfs dfs -cat /oozie/shell/prepare-implala-sql.sh

#!/bin/bash
echo HDFS_IMPALA_SCRIPT:/user/impala/sql/custom_script_name.sql

$ hdfs dfs -cat /user/oozie/workflow/wf_impala_env/wf_impala_env.xml

<workflow-app name="wf_impala_env" xmlns="uri:oozie:workflow:0.5">
  <start to="a1"/>
  <kill name="a0">
    <message>Error: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <action name="a1">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${resourceManager}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>bash</exec>
      <argument>prepare-implala-sql.sh</argument>
      <file>/oozie/shell/prepare-implala-sql.sh#prepare-implala-sql.sh</file>
      <capture-output/>
    </shell>
    <ok to="a2"/>
    <error to="a0"/>
  </action>
  ...

Then use it in the Impala step as a <file> parameter:

  ...
  <action name="a2">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${resourceManager}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>impala-shell</exec>
      <argument>-i</argument>
      <argument>${impalad}</argument>
      <argument>-f</argument>
      <argument>query.sql</argument>
      <env-var>PYTHON_EGG_CACHE=./myeggs</env-var>
      <file>${wf:actionData("a1")["HDFS_IMPALA_SCRIPT"]}#query.sql</file>
      <capture-output/>
    </shell>
    <ok to="a99"/>
    <error to="a0"/>
  </action>

  <end name="a99"/>
</workflow-app>

Don't forget to set PYTHON_EGG_CACHE for impala-shell, whether the action runs impala-shell directly or runs a bash script that calls it.
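For the "collect (cat) them into one file" case, the prepare script could merge the discovered scripts into a single HDFS file before printing the key. This is only a sketch under assumptions: the function name combine_sql_scripts and both of its arguments are hypothetical, HDFS paths are assumed to contain no spaces, and each script is assumed to end with a semicolon and a newline so that statements do not run together. The key is printed as key=value, which Oozie reads as a Java properties entry.

```shell
#!/bin/bash
# Sketch only: a hypothetical variant of prepare-implala-sql.sh that merges
# all *.sql files of one HDFS directory into a single HDFS file.
# Hypothetical names: combine_sql_scripts and its two arguments.

combine_sql_scripts() {
  local src_dir="$1"    # HDFS directory to scan for *.sql files
  local combined="$2"   # HDFS path of the merged script (overwritten if present)
  local scripts

  # The last column of "hdfs dfs -ls" output is the path; keep only *.sql entries.
  scripts=$(hdfs dfs -ls "$src_dir" | awk '$NF ~ /\.sql$/ {print $NF}')
  [ -n "$scripts" ] || { echo "no .sql files in $src_dir" >&2; return 1; }

  # -cat accepts several paths; "-put -f -" writes stdin to HDFS, overwriting.
  # $scripts is left unquoted on purpose so each path becomes its own argument.
  hdfs dfs -cat $scripts | hdfs dfs -put -f - "$combined" || return 1

  # The only line on stdout: the property that <capture-output/> picks up.
  echo "HDFS_IMPALA_SCRIPT=$combined"
}
```

The merged file should live outside the scanned directory, otherwise a later run would pick it up as one of the inputs.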
