I have a Java method that needs to scan a MySQL table for file paths.
Here is a sample of the filequeue table:
UniqueID  FilePath             Status
1         C:\Folder1\abc.pdf   Active
2         C:\Folder1\def.pdf   Active
3         C:\Folder1\efg.pdf   Error
I would like to scan the table for rows with Status = Active, take the FilePath from each row, locate the actual file at that location, and then process it (extract its text).
I am new to Java, and this is what I have so far:
import java.io.InputStream;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public void doScan_DB() throws Exception {
    // Load the connection settings from db.properties on the classpath
    Properties props = new Properties();
    try (InputStream in = getClass().getResourceAsStream("/db.properties")) {
        props.load(in);
    }
    String driver = props.getProperty("jdbc.driver");
    if (driver != null) {
        Class.forName(driver);
    }
    String url = props.getProperty("jdbc.url");
    String username = props.getProperty("jdbc.username");
    String password = props.getProperty("jdbc.password");
    // try-with-resources closes the connection, statement and result set
    try (Connection con = DriverManager.getConnection(url, username, password);
         Statement statement = con.createStatement();
         ResultSet rs = statement.executeQuery("select * from filequeue where Status='Active'")) {
        while (rs.next()) {
            // grab those files and call index()
        }
    }
}
From here, how do I get hold of each file and then call an index function to extract its text?
Also, please let me know if my approach is wrong.
EDIT: Here is my other function, which extracts the text from PDFs:
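For illustration, here is a rough sketch of one way the Active rows could be turned into File objects. The class name FileQueueScanner and the methods readPaths and existingFiles are hypothetical names made up for this example; the table and column names are taken from the sample table above. It is only a sketch under those assumptions, not a finished implementation.

```java
import java.io.File;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the names are illustrative only.
class FileQueueScanner {

    /** Reads the FilePath column of every filequeue row with the given Status. */
    static List<String> readPaths(Connection con, String status) throws SQLException {
        List<String> paths = new ArrayList<>();
        String sql = "SELECT FilePath FROM filequeue WHERE Status = ?";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, status);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    paths.add(rs.getString("FilePath"));
                }
            }
        }
        return paths;
    }

    /** Keeps only the paths that point at a regular file on disk. */
    static List<File> existingFiles(List<String> paths) {
        List<File> files = new ArrayList<>();
        for (String path : paths) {
            if (path == null) {
                continue; // a NULL FilePath in the table is skipped
            }
            File file = new File(path);
            if (file.isFile()) {
                files.add(file);
            } else {
                System.err.println("Skipping missing file: " + path);
            }
        }
        return files;
    }
}
```

Inside doScan_DB, the loop could then become something like: for (File f : FileQueueScanner.existingFiles(FileQueueScanner.readPaths(con, "Active"))) { ... }, with the text extraction and the call to index() going in the loop body.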
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper; // PDFBox 2.x package

public void doScan() throws Exception {
    File folder = new File("D:\\PDF1");
    File[] listOfFiles = folder.listFiles();
    if (listOfFiles == null) {
        // listFiles() returns null when the folder is missing or unreadable
        return;
    }
    for (File file : listOfFiles) {
        if (file.isFile()) {
            List<String> list = new ArrayList<>();
            String path = file.getPath();
            try (PDDocument document = PDDocument.load(file)) {
                if (!document.isEncrypted()) {
                    PDFTextStripper tStripper = new PDFTextStripper();
                    String pdfFileInText = tStripper.getText(document);
                    String[] lines = pdfFileInText.split("\\r?\\n");
                    for (String line : lines) {
                        String[] words = line.split(" ");
                        for (String word : words) {
                            // strip special characters from the start and end of each word
                            list.add(word.replaceAll("([\\W]+$)|(^[\\W]+)", ""));
                        }
                    }
                }
            } catch (IOException e) {
                System.err.println("Exception while trying to read pdf document - " + e);
            }
            String[] words1 = list.toArray(new String[0]);
            index(words1, path);
            System.out.println("Completed");
        }
    }
}
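The word-cleaning step does not depend on where the file came from, so it could perhaps be pulled out into its own method and shared by both the folder scan and the database scan. The sketch below is one way to do that. WordCleaner and cleanWords are hypothetical names, and unlike the loop above it skips words that end up empty after stripping, which is an assumption about what is wanted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the name is illustrative only.
class WordCleaner {

    /**
     * Splits text into lines and then into space-separated words, and strips
     * non-word characters from the start and end of each word.
     * Words that are empty after stripping are dropped (an assumption).
     */
    static List<String> cleanWords(String text) {
        List<String> cleaned = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            for (String word : line.split(" ")) {
                String stripped = word.replaceAll("([\\W]+$)|(^[\\W]+)", "");
                if (!stripped.isEmpty()) {
                    cleaned.add(stripped);
                }
            }
        }
        return cleaned;
    }
}
```

With this in place, the array passed to index() could be built as WordCleaner.cleanWords(pdfFileInText).toArray(new String[0]).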