
I'm trying to figure out a way to backfill the partitions of a Hive table partitioned by ds.

I know how to run a Hive command from the CLI, e.g.:

    $HIVE_HOME/bin/hive -e 'select a.col from tab1 a'

What I would like to do is provide a .txt file of different ds values and have a separate job run for each of them, e.g.:

    $HIVE_HOME/bin/hive -e 'INSERT OVERWRITE PARTITION ds = $DS_VARIABLE_HERE 
                            select a.col from tab1 a where ds = $DS_VARIABLE_HERE'

But I'm not sure how to do this.

I'm thinking of trying out

    cat date_file.txt | hive -e 'query here'

But I'm not sure how to substitute each value from date_file.txt into the Hive query string.

1 Answer


My suggestion is to use a shell loop to iterate through the values and run one hive command per value:

Option 1:

If you have a fixed set of values you want to iterate through:

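    DS_VARIABLE_HERE=('val1' 'val2' 'val3')

    # target_tab is a placeholder for the partitioned table you are backfilling.
    # Double quotes (not single quotes) let the shell substitute ${ds} into the query.
    for ds in "${DS_VARIABLE_HERE[@]}"
    do
        "$HIVE_HOME/bin/hive" -e "INSERT OVERWRITE TABLE target_tab PARTITION (ds='${ds}')
            SELECT a.col FROM tab1 a WHERE a.ds = '${ds}'"
    done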

Option 2:

If you want to iterate through, let's say, 1 to 10:

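    # target_tab is a placeholder for the partitioned table you are backfilling.
    for ((i=1; i<=10; i++))
    do
        "$HIVE_HOME/bin/hive" -e "INSERT OVERWRITE TABLE target_tab PARTITION (ds='${i}')
            SELECT a.col FROM tab1 a WHERE a.ds = '${i}'"
    done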
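Since the values being backfilled come from a date file, a counter loop like Option 2 can instead step through a date range and produce zero-padded ds values. This is a minimal sketch, assuming GNU date (the -d option with a relative "+ 1 day" offset is a GNU coreutils feature; BSD/macOS date behaves differently) and ds values formatted as YYYY-MM-DD; date_range is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every date from $1 to $2 (inclusive) as
# zero-padded YYYY-MM-DD, one per line.
# Assumes GNU date; BSD/macOS date does not accept these -d arguments.
date_range() {
    local d end
    # Normalize both endpoints; -u avoids daylight-saving surprises.
    d="$(date -u -d "$1" +%F)" || return 1
    end="$(date -u -d "$2" +%F)" || return 1
    # Compare as integers (2020-01-09 -> 20200109) so the loop stops after
    # the end date and prints nothing when start is after end.
    while (( ${d//-/} <= ${end//-/} )); do
        printf '%s\n' "$d"
        d="$(date -u -d "$d + 1 day" +%F)"
    done
}
```

For example, date_range 2020-01-01 2020-01-31 > date_file.txt writes one date per line, and for ds in $(date_range 2020-01-01 2020-01-10) can stand in for the counter in the loop above.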

2 Comments

Since you want the values in a .txt file, just use option 1: put the line DS_VARIABLE_HERE=('val1' 'val2' 'val3') in the .txt file, and in the shell script run "source file.txt" before the loop.
The for ((i = ...)) option would be hard to use if the numbers aren't zero-padded.
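If you would rather keep date_file.txt as plain data, one ds value per line, instead of a bash array that gets sourced, a while read loop can feed each line into the query. This is a minimal sketch, assuming bash; target_tab is a hypothetical name for the table being backfilled, and build_query, backfill_from_file and HIVE_CMD are hypothetical names introduced here (HIVE_CMD lets you point at a different Hive client):

```shell
#!/usr/bin/env bash
# Sketch: run one Hive job per line of a plain-text file of ds values.
# Assumptions: the file holds one ds value per line (e.g. 2020-01-01), and
# target_tab is a hypothetical name for the partitioned table being backfilled.

# Hypothetical override: set HIVE_CMD to use a different Hive client.
HIVE_CMD="${HIVE_CMD:-$HIVE_HOME/bin/hive}"

# Print the HiveQL statement that rebuilds the partition for one ds value.
build_query() {
    local ds="$1"
    printf "INSERT OVERWRITE TABLE target_tab PARTITION (ds='%s') SELECT a.col FROM tab1 a WHERE a.ds = '%s'" "$ds" "$ds"
}

# Read the file line by line and launch a separate Hive job for each value.
backfill_from_file() {
    local file="$1" ds
    # "|| [[ -n $ds ]]" also processes a last line with no trailing newline.
    while IFS= read -r ds || [[ -n "$ds" ]]; do
        [[ -z "$ds" ]] && continue   # skip blank lines
        # </dev/null stops the Hive client from consuming the rest of the
        # file through the loop's stdin; return at the first failed job.
        "$HIVE_CMD" -e "$(build_query "$ds")" </dev/null || return 1
    done < "$file"
}
```

Usage: backfill_from_file date_file.txt. Reading the file as data rather than sourcing it also means nothing in date_file.txt is ever executed as shell code.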
