Suppose I have a shell script stored in a GCS bucket. Is it possible to execute it using Apache Beam? If yes, then how? I haven't tried anything for it yet, as I couldn't find anything of this sort in the documentation for Apache Beam or Dataflow. So I just wanted to know what approach I should take. Thanks.
1 Answer
It's unusual, but not unheard of to want to execute a whole shell script from something like a DoFn. Is this what you want to do? Do you want to run it once for each element in a PCollection?
If so, you'd want to use the GCS API or the FileSystems API to read the whole contents of the shell script into a string or byte array, and then pass it as a side input to your ParDo.
Then you can execute it using a tool like subprocess in Python, or ProcessBuilder in Java.
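As a rough illustration, here is a minimal sketch of that approach in the Python SDK. The bucket path gs://my-bucket/script.sh, the DoFn name, and the sample elements are placeholders, not something from your setup:

```python
import subprocess

import apache_beam as beam
from apache_beam.io.filesystems import FileSystems


def read_script(path):
    # Read the whole shell script from GCS into a string.
    with FileSystems.open(path) as f:
        return f.read().decode('utf-8')


class RunScriptFn(beam.DoFn):
    def process(self, element, script):
        # Run the script contents in a shell, once per element.
        result = subprocess.run(['sh', '-c', script],
                                capture_output=True, text=True, check=True)
        yield (element, result.stdout)


with beam.Pipeline() as p:
    # Read the script once and broadcast it as a singleton side input.
    script = (p
              | 'ScriptPath' >> beam.Create(['gs://my-bucket/script.sh'])
              | 'ReadScript' >> beam.Map(read_script))

    (p
     | 'Elements' >> beam.Create(['a', 'b', 'c'])
     | 'RunScript' >> beam.ParDo(RunScriptFn(),
                                 script=beam.pvalue.AsSingleton(script))
     | 'Print' >> beam.Map(print))
```

Reading the script once and broadcasting it as a side input avoids re-fetching it from GCS for every element; the subprocess call itself runs on whatever workers the runner schedules, so the script must only rely on tools present in the worker environment.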
Let me know if you need something more specific, and we can iterate a solution.
2 Comments
eilalan
Hi Pablo, I know it is an old question, but I would like to run a similar scenario. I have two shell commands from a library that was uploaded to the worker with the setup.py script. The first command is a configuration command and the second is an execution command that depends on the configuration. I tried separating them with ';' or '&&' in the subprocess call with no success. I also tried os.system with both commands and it didn't work. I cannot run them as separate subprocesses because the second subprocess will not have the configuration set by the first. Please let me know your thoughts. Thanks.
Pablo
You may be able to do something like this:
subprocess.call(['sh', '-c', 'echo "hi" && echo "why"']), except you replace the two echo commands with your own. Is that possible?
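For instance (a hedged illustration; the mytool commands below are placeholders for your library's configuration and execution commands, not anything from the thread):

```python
import subprocess

# Both commands run in the same shell process, so the configuration set
# by the first command is still in effect when the second one runs.
subprocess.check_call(
    ['sh', '-c', 'mytool configure --profile default && mytool run --input data.csv'])
```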