Passing script files on hdfs to impala-shell

Question

I have an Oozie job with a single shell action. The shell action first programmatically finds certain SQL script files stored on HDFS, then tries to run each of those scripts on Impala.

Since the list of SQL scripts I want to run is not known in advance, and thus cannot be passed to the Oozie action as <file> parameters, is there a way to run impala-shell and give it an HDFS path instead of a Linux path?

mazaneicha · Accepted Answer · 2019-08-06 18:50:40Z


impala-shell can accept query text from STDIN. The impala-shell documentation describes the -f option as follows:

-f query_file or --query_file=query_file

query_file=path_to_query_file

Passes a SQL query from a file. Multiple statements must be semicolon (;) delimited. In Impala 2.3 and higher, you can specify a filename of - to represent standard input. This feature makes it convenient to use impala-shell as part of a Unix pipeline where SQL statements are generated dynamically by other tools.

So in your case, your shell script can simply do something like

$ hdfs dfs -cat <hdfs_file_name> | impala-shell -i <impala_daemon> -f -
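
Since the list of scripts is only known at run time, the same pipe can go inside a loop. Here is a minimal sketch, assuming the scripts all sit under one HDFS directory, end in .sql, and have no whitespace or glob characters in their paths. The function name run_hdfs_sql_scripts and its two arguments are made up for illustration and are not part of impala-shell or Oozie:

```shell
#!/bin/sh
# Sketch: run every *.sql file found under an HDFS directory through impala-shell.
# Usage (hypothetical): run_hdfs_sql_scripts <hdfs_dir> <impala_daemon>
run_hdfs_sql_scripts() {
    sql_dir=$1
    impalad=$2

    # List the directory; stop if HDFS reports an error.
    listing=$(hdfs dfs -ls "$sql_dir") || return 1

    # The path is the last field of each listing line; keep only *.sql entries.
    # This also skips the "Found N items" header line.
    scripts=$(printf '%s\n' "$listing" | awk '$NF ~ /\.sql$/ { print $NF }')

    # Unquoted on purpose: relies on word splitting, hence the
    # "no whitespace or glob characters in paths" assumption.
    for script in $scripts; do
        echo "Running $script" >&2
        # Read the script first so a failed read is not silently piped as empty input.
        sql=$(hdfs dfs -cat "$script") || return 1
        # "-f -" makes impala-shell read the statements from standard input.
        printf '%s\n' "$sql" | impala-shell -i "$impalad" -f - || return 1
    done
}
```

Called as run_hdfs_sql_scripts <hdfs_dir> <impala_daemon>, it stops at the first script that cannot be read or that impala-shell rejects, so the Oozie shell action fails instead of silently skipping a script.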


egor7 · Accepted Answer · 2019-10-01 09:56:56Z

If you have a fixed number of queries, or you can collect (cat) them into one file, then you can pass the name of that file out of the <action> as a parameter using the <capture-output/> tag:

$ hdfs dfs -cat /user/impala/sql/custom_script_name.sql

CREATE TABLE default.t1(n INT);
INSERT INTO default.t1 VALUES(1);

$ hdfs dfs -cat /oozie/shell/prepare-implala-sql.sh

#!/bin/bash
echo HDFS_IMPALA_SCRIPT:/user/impala/sql/custom_script_name.sql

$ hdfs dfs -cat /user/oozie/workflow/wf_impala_env/wf_impala_env.xml

<workflow-app name="wf_impala_env" xmlns="uri:oozie:workflow:0.5">
  <start to="a1"/>
  <kill name="a0">
    <message>Error: [${wf:errorMessage(wf:lastErrorNode())}]</message>
  </kill>
  <action name="a1">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${resourceManager}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>bash</exec>
      <argument>prepare-implala-sql.sh</argument>
      <file>/oozie/shell/prepare-implala-sql.sh#prepare-implala-sql.sh</file>
      <capture-output/>
    </shell>
    <ok to="a2"/>
    <error to="a0"/>
  </action>
  ...

And then use it in Impala step as a <file> parameter:

  ...
  <action name="a2">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${resourceManager}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>impala-shell</exec>
      <argument>-i</argument>
      <argument>${impalad}</argument>
      <argument>-f</argument>
      <argument>query.sql</argument>
      <env-var>PYTHON_EGG_CACHE=./myeggs</env-var>
      <file>${wf:actionData("a1")["HDFS_IMPALA_SCRIPT"]}#query.sql</file>
      <capture-output/>
    </shell>
    <ok to="a99"/>
    <error to="a0"/>
  </action>

  <end name="a99"/>
</workflow-app>

Just don't forget to set PYTHON_EGG_CACHE for impala-shell, whether Oozie executes it directly or through a bash wrapper (bash -> impala-shell).
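
For the bash -> impala-shell case, the variable can be exported inside the wrapper script itself. A minimal sketch: the function name run_impala_file is hypothetical, and ./myeggs is the same relative directory as in the <env-var> above. I'm assuming it needs to be a directory the container user can write to, which the action's working directory normally is:

```shell
#!/bin/sh
# Sketch: wrapper that sets PYTHON_EGG_CACHE before calling impala-shell.
# Usage (hypothetical): run_impala_file <impala_daemon> <local_query_file>
run_impala_file() {
    impalad=$1
    query_file=$2

    # Keep a value passed in through <env-var>, otherwise fall back to ./myeggs.
    PYTHON_EGG_CACHE=${PYTHON_EGG_CACHE:-./myeggs}
    export PYTHON_EGG_CACHE

    # Create the cache directory up front so it exists and is writable.
    mkdir -p "$PYTHON_EGG_CACHE" || return 1

    impala-shell -i "$impalad" -f "$query_file"
}
```

With this in the wrapper, the <env-var> element becomes optional for that action, but keeping both does no harm because the wrapper only fills in a default.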
