java - How to get filepath in MySQL and get the subsequent file from directory?

Question

I have a Java method that needs to scan a MySQL table for file paths.

Here is a sample table filequeue:

 UniqueID   FilePath                 Status     
 1          C:\Folder1\abc.pdf       Active
 2          C:\Folder1\def.pdf       Active
 3          C:\Folder1\efg.pdf       Error

I would like to scan the table for rows with Status = Active, take the file path from each row, locate the actual file at that location, and then process it (extract its text).

I am new to Java, and this is what I have so far:

    public void doScan_DB() throws Exception {

        Properties props = new Properties();

        InputStream in = getClass().getResourceAsStream("/db.properties");
        props.load(in);
        in.close();

        String driver = props.getProperty("jdbc.driver");
        if (driver != null) {
            Class.forName(driver);
        }

        String url = props.getProperty("jdbc.url");
        String username = props.getProperty("jdbc.username");
        String password = props.getProperty("jdbc.password");

        Connection con = DriverManager.getConnection(url, username, password);
        Statement statement = con.createStatement();
        ResultSet rs = statement.executeQuery("select * from filequeue where Status='Active'");

        while (rs.next()) {
            // grab those files and call index()
        }
    }

From here, how do I pick up each file and then call an index function to extract text from it?

Also, please let me know if my approach is wrong.

EDIT: Here is my other function, which extracts text from PDFs:

    public void doScan() throws Exception {

        File folder = new File("D:\\PDF1");
        File[] listOfFiles = folder.listFiles();

        for (File file : listOfFiles) {
            if (file.isFile()) {
                // HashSet<String> uniqueWords = new HashSet<>();
                ArrayList<String> list = new ArrayList<String>();
                String path = "D:\\PDF1\\" + file.getName();
                try (PDDocument document = PDDocument.load(new File(path))) {

                    if (!document.isEncrypted()) {

                        PDFTextStripper tStripper = new PDFTextStripper();
                        String pdfFileInText = tStripper.getText(document);
                        String lines[] = pdfFileInText.split("\\r?\\n");
                        for (String line : lines) {
                            String[] words = line.split(" ");

                            for (String word : words) {
                                // remove one or more special characters at the end
                                // or at the beginning of the word
                                list.add(word.replaceAll("([\\W]+$)|(^[\\W]+)", ""));
                                // uniqueWords.add(word.replaceAll("([\\W]+$)|(^[\\W]+)", ""));
                            }
                        }
                    }
                } catch (IOException e) {
                    System.err.println("Exception while trying to read pdf document - " + e);
                }

                String[] words1 = list.toArray(new String[list.size()]);
                // String[] words2 = uniqueWords.toArray(new String[uniqueWords.size()]);

                // MysqlAccessIndex connection = new MysqlAccessIndex();

                index(words1, path);

                System.out.println("Completed");
            }
        }
    }
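
The word-splitting step inside that loop can be tried on its own, without PDFBox or a database. The following is only a sketch: the class name `WordSplitter` and the method `words` are made up for illustration. It reuses the same trimming regex, but as an assumption it splits on any run of whitespace (`\\s+`) and drops empty results, whereas the code above splits on a single space and keeps empty strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original code.
final class WordSplitter {

    // Splits text into lines, then words, and strips non-word characters
    // from the start and end of each word.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            // Assumption: split on any whitespace run instead of a single space
            for (String raw : line.split("\\s+")) {
                String word = raw.replaceAll("(\\W+$)|(^\\W+)", "");
                // Assumption: skip tokens that are empty after trimming
                if (!word.isEmpty()) {
                    result.add(word);
                }
            }
        }
        return result;
    }
}
```

For example, `WordSplitter.words("Hello, world!\n(foo)  bar.")` returns the list `[Hello, world, foo, bar]`.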

Ayush · Accepted Answer · 2018-11-23 08:15:12Z

1

You can read the path from the result set and load the file like this:

    while (rs.next()) {
        // Column 2 of the filequeue table is FilePath
        String path = rs.getString(2);
        // Create a PdfDocument instance
        PdfDocument doc = new PdfDocument();
        try {
            // Load an existing document
            doc.load(path);
            // Get page count and display it on console output
            System.out.println(
                "Number of pages in " + path + " is " +
                doc.getPageCount());
            // Close document
            doc.close();
        } catch (IOException | PdfException e) {
            e.printStackTrace();
        }
    }

You will need additional JARs that provide ready-made methods for working with PDFs.

Visit this link for more information

https://www.gnostice.com/nl_article.asp?id=101&t=How_to_Read_and_Write_PDF_Files_in_Java
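
If you would rather keep your existing PDF code, the general flow (query the Active rows, turn each path into a file, hand it to your own processing) could be sketched roughly as below. This is an illustration under assumptions, not a definitive implementation: `FileQueueScanner`, `FileHandler`, `activePaths` and `processAll` are made-up names, and the column names `FilePath` and `Status` are taken from the sample table in the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for illustration only.
final class FileQueueScanner {

    // Hypothetical callback: stands in for your text extraction plus index(...).
    interface FileHandler {
        void handle(Path file) throws IOException;
    }

    // Collects the FilePath value of every row whose Status is 'Active'.
    static List<String> activePaths(Connection con) throws SQLException {
        List<String> paths = new ArrayList<>();
        String sql = "SELECT FilePath FROM filequeue WHERE Status = ?";
        // try-with-resources closes the statement and result set automatically
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, "Active");
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    paths.add(rs.getString("FilePath"));
                }
            }
        }
        return paths;
    }

    // Passes each existing file to the handler and returns the paths that
    // were skipped (file missing) or failed with an IOException.
    static List<String> processAll(List<String> paths, FileHandler handler) {
        List<String> skipped = new ArrayList<>();
        for (String p : paths) {
            Path file = Paths.get(p);
            if (!Files.isRegularFile(file)) {
                skipped.add(p);
                continue;
            }
            try {
                handler.handle(file);
            } catch (IOException e) {
                skipped.add(p);
            }
        }
        return skipped;
    }
}
```

In your case the handler would wrap the extraction logic you already have in doScan() and the call to index(...), using the path from the table instead of the hard-coded D:\PDF1 folder. The returned list of skipped paths could then be used to set those rows to Error, if that is what the Status column is meant for.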


8 Comments

Daredevil Over a year ago

But how does this get the file from that directory and do something to it?

Ayush Over a year ago

There are many JARs available; please visit the link and give it a read :)

Ayush Over a year ago

PdfDocument doc = new PdfDocument(); creates an instance. doc.load(path) loads the file into that instance, which makes "doc" ready for further actions. You can then work with the variable "doc" and call your methods on it.

Daredevil Over a year ago

So it locates and loads the PDF file from the directory, is that correct? Then I can just call whatever method or function I need to extract text from that PDF, right?

Daredevil Over a year ago

Actually, I already wrote the PDF handling using PDFBox, so I don't think I want to switch to PdfDocument. You can take a look at my edited code.
