
I have been playing around with Hadoop, and right now I am trying to figure out a way to read multiple files from a directory. The code below works fine when I read a single file. What would be the best way to read multiple files from HDFS and read each line of every file?

try {
    Path pt = new Path("hdfs://profile/generate/work/output/errors.txt");
    FileSystem fs = FileSystem.get(job.getConfiguration());
    BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(pt)));
    String line;
    // Advance readLine() inside the loop condition, otherwise the loop never terminates.
    while ((line = br.readLine()) != null) {
        //sendemail
    }
    br.close();
} catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}
  • You could get a big performance win by having a thread do the reading. That way you could spawn a thread per file and read multiple files at once (see the sketch below). Commented Aug 10, 2015 at 15:06
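A minimal sketch of that suggestion, assuming a fixed-size thread pool and a placeholder directory path (neither appears in the original post):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParallelHdfsReader {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // List every file in the directory (placeholder path), then hand each one to the pool.
        FileStatus[] status = fs.listStatus(new Path("/generate/work/output"));
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (FileStatus st : status) {
            pool.submit(() -> {
                // Each task reads one file line by line; try-with-resources closes the reader.
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(fs.open(st.getPath()), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        System.out.println(st.getPath() + ": " + line);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}

Whether this actually wins anything depends on the cluster: HDFS reads are often network-bound, so a handful of threads is usually enough.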

1 Answer


Just add a FileStatus[] status = fs.listStatus(new Path(path)) call and loop over the resulting status array, opening each HDFS file in turn:

FileStatus[] status = fs.listStatus(new Path("path"));

for (FileStatus file : status) {
    // Print the contents of one HDFS file.
    BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(file.getPath())));
    String line;
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
    br.close();
}
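If the directory contains subdirectories, FileSystem.listFiles can recurse for you; this is a sketch continuing from the fs above, with the recursion flag set to true and the same placeholder path:

import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.RemoteIterator;

// Iterate over every file under the directory; the true flag enables
// recursion into subdirectories.
RemoteIterator<LocatedFileStatus> it = fs.listFiles(new Path("path"), true);
while (it.hasNext()) {
    LocatedFileStatus file = it.next();
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(fs.open(file.getPath())))) {
        String line;
        while ((line = br.readLine()) != null) {
            System.out.println(line);
        }
    }
}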
