
How do I copy a file that a Hadoop program needs to all compute nodes? I know that the -file option for Hadoop streaming does this. How do I do the same for a Java Hadoop job?

Exactly the same way.

Assuming you use the ToolRunner / Configured / Tool pattern, the files you list after the -files option (comma-separated) are shipped to every node and placed in the working directory of your mapper, reducer, and combiner tasks when they run:

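import java.io.File;
import java.io.IOException;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Driver extends Configured implements Tool {
    public static void main(String[] args) throws Exception {
        // ToolRunner applies GenericOptionsParser, so -files is handled here
        System.exit(ToolRunner.run(new Driver(), args));
    }

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already carries the settings produced by -files
        Job job = Job.getInstance(getConf());
        job.setJarByClass(Driver.class);
        job.setMapperClass(MyMapper.class);
        // ...
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // file.csv is available in the task's working directory under its own name
            File myFile = new File("file.csv");
            // do something with file
        }

        // ...
    }
}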
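The "do something with file" step often means loading side data into memory once per task. Here is a minimal sketch using only the JDK. It assumes, hypothetically, that file.csv holds one key,value pair per line; the class name SideDataLoader is illustrative, not part of Hadoop. In setup() you would call something like SideDataLoader.load(Paths.get("file.csv")).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: reads a "key,value" side file into a lookup table.
// Assumes the file format described above; adapt parsing to your real data.
final class SideDataLoader {
    private SideDataLoader() {}

    static Map<String, String> load(Path path) throws IOException {
        Map<String, String> table = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip empty lines
                }
                // Split on the first comma only, so values may themselves contain commas
                int comma = line.indexOf(',');
                if (comma < 0) {
                    throw new IOException("Malformed line (no comma): " + line);
                }
                table.put(line.substring(0, comma).trim(), line.substring(comma + 1).trim());
            }
        }
        return table;
    }
}
```

Loading in setup() rather than map() means the file is read once per task instead of once per input record.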

You can then execute with:

#> hadoop jar myJar.jar Driver -files file.csv ......

See the Javadoc for GenericOptionsParser for more information.
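To picture why run() never sees the -files argument: ToolRunner lets GenericOptionsParser consume the leading generic options and passes only the remaining arguments to Tool.run(). The sketch below is a simplified, hypothetical illustration of that split (the class GenericArgsSplit is invented for this example and is not Hadoop's implementation; the real parser also handles -D, -libjars, -archives, -conf and more).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the generic-option split.
// NOT Hadoop's code: it only recognizes -files, to show the idea.
final class GenericArgsSplit {
    final List<String> files = new ArrayList<>();
    final List<String> remaining = new ArrayList<>();

    static GenericArgsSplit parse(String[] args) {
        GenericArgsSplit result = new GenericArgsSplit();
        int i = 0;
        // Generic options are only recognized before the application's own arguments
        while (i + 1 < args.length && args[i].equals("-files")) {
            // -files takes a comma-separated list of paths
            result.files.addAll(Arrays.asList(args[i + 1].split(",")));
            i += 2;
        }
        // Everything from the first non-generic argument on is passed through unchanged
        result.remaining.addAll(Arrays.asList(args).subList(i, args.length));
        return result;
    }
}
```

For example, parsing -files file.csv,lookup.txt in out yields the files [file.csv, lookup.txt] and leaves [in, out] for run(). This is also why -files goes right after the main class on the command line.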
