
I need to parse an HTML file for a homework project, so I can't use Jsoup.

I have tried crawling through the file, but I don't know how to save what I'm looking for.

This is what I have:

    FileInputStream fis = new FileInputStream(filename);
    InputStreamReader inStream = new InputStreamReader(fis);
    BufferedReader reader = new BufferedReader(inStream);

    String fileLine;
    while((fileLine = reader.readLine()) != null){

        String tag = fileLine.substring(fileLine.indexOf("<") + 1, fileLine.indexOf(">"));
    }

I need to find the information inside the <title> tags, but I can't figure out how to get it without also picking up tags I don't need, or how to handle lines that contain no tags at all.

I want to take the information in the title tag and turn it into a string that I can use.

  • What does the actual HTML file look like, and how is it formatted? Do you need to read it line by line? Posting the actual HTML file might help. Commented Apr 8, 2019 at 18:11

1 Answer

    // Uses java.nio.file.Files, java.nio.file.Paths, java.nio.charset.Charset,
    // java.util.stream.Collectors, and StringUtils from Apache Commons Lang.
    String fileDataString = Files.readAllLines(Paths.get(fileName), Charset.forName("UTF-8")).stream().collect(Collectors.joining("\n"));

    String title = StringUtils.substringBetween(fileDataString, "<title>", "</title>");

This should get the text between <title> and </title>.

EDIT: Thank you BlackPearl for the Stream<String>.collect(Collectors.joining("\n")); suggestion
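
If Apache Commons is not allowed either, the same idea can be sketched with only the standard library. This is a minimal sketch, not a general HTML parser: the class name `TitleExtractor` and the methods `extractTitle` and `extractTitleFromFile` are hypothetical names chosen for illustration, and it assumes the file is UTF-8 and contains at most one title element worth extracting. The regular expression is case-insensitive and lets the title span several lines.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleExtractor {
    // Matches <title ...>text</title>, ignoring case; DOTALL lets "." match
    // newlines so the title text may span more than one line.
    private static final Pattern TITLE = Pattern.compile(
            "<title\\b[^>]*>(.*?)</title\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the trimmed text of the first title element, or null if none is found.
    public static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    // Reads the whole file as UTF-8 (an assumption) and extracts the title from it.
    public static String extractTitleFromFile(String fileName) throws IOException {
        String html = new String(Files.readAllBytes(Paths.get(fileName)), StandardCharsets.UTF_8);
        return extractTitle(html);
    }
}
```

Calling `TitleExtractor.extractTitleFromFile(filename)` gives a plain `String`. Returning `null` when no title is present covers the no-tags case without relying on `indexOf` returning `-1`.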


4 Comments

This approach will work only if the opening and closing title tags are on the same line.
Changed it so that it first reads the whole file, then looks for the title tags and gets the string in between.
Or better, stream().collect(Collectors.joining("\n"))
Still not pure Java: StringUtils belongs to Apache Commons.
