I have devices that serve an HTML page when you connect to their IP address. For example, if I go to "192.168.1.104" on my computer, I see the HTML page the device publishes. I am trying to scrape this HTML, but I am getting errors, specifically a MalformedURLException at the first line of my method. I have posted my method below; I found some code for getting HTML and tweaked it for my needs. Thanks

public String getSbuHtml(String ipToPoll) throws IOException, SocketTimeoutException {
    URL url = new URL("http", ipToPoll, -1, "/");
    URLConnection con = url.openConnection();
    con.setConnectTimeout(1000);
    con.setReadTimeout(1000);
    Pattern p = Pattern.compile("text/html;\\s+charset=([^\\s]+)\\s*");
    Matcher m = p.matcher(con.getContentType());
    String charset = m.matches() ? m.group(1) : "ISO-8859-1";
    BufferedReader r = new BufferedReader(
            new InputStreamReader(con.getInputStream(), charset));
    String line = null;
    StringBuilder buf = new StringBuilder();
    while ((line = r.readLine()) != null) {
        buf.append(line).append(System.getProperty("line.separator"));
    }
    return buf.toString();
}

EDIT: The code above has been changed to construct the URL in a way that works with a bare IP. However, when I try to get the contentType from the connection, it is null.

3 Answers

2

A URL (Uniform Resource Locator) needs a scheme that names the means of network communication (http://) in addition to the host, and usually a path to the resource to locate (/index.html). An example of a valid URL is

http://192.168.1.104:8080/app/index.html

A bare 192.168.1.104 is not a URL, because it has no scheme.
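
To illustrate the point, here is a minimal sketch that checks whether a string is accepted by the URL(String) constructor. The class name UrlSchemeDemo and the helper parses are made up for this example; only java.net.URL and MalformedURLException come from the JDK.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class UrlSchemeDemo {
    // Hypothetical helper: true if new URL(spec) accepts the string,
    // false if it throws MalformedURLException.
    static boolean parses(String spec) {
        try {
            new URL(spec);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // No scheme, so the constructor rejects it.
        System.out.println(parses("192.168.1.104"));
        // Scheme plus host plus path is accepted.
        System.out.println(parses("http://192.168.1.104/"));
    }
}
```

No network access happens here; the constructor only parses the string, so this can be run without the device attached.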

1

You need to add http:// to the front of your String that you pass into the method.

0

Create your URL as follows:

URL url = new URL("http", ipToPoll, -1, "/");

And since you're reading a potentially long HTML page, I suppose buffering would help here:

// con and charset are the variables from the method in the question
BufferedReader r = new BufferedReader(
                   new InputStreamReader(con.getInputStream(), charset));
String line = null;
StringBuilder buf = new StringBuilder();
// read line by line until the end of the stream
while ((line = r.readLine()) != null) {
    buf.append(line).append(System.getProperty("line.separator"));
}
return buf.toString();


EDIT: In response to your problem with contentType coming back null.

Before you inspect any headers with calls like getContentType(), or retrieve content with getInputStream(), you need to actually establish a connection to the URL resource by calling connect():

URL url = new URL("http", ipToPoll, "/"); // -1 removed; assuming port = 80 always
// check your device html page address; change "/" to "/index.html" if required

URLConnection con = url.openConnection();

// set connection properties
con.setConnectTimeout(1000);
con.setReadTimeout(1000);

// establish connection
con.connect();

// get "content-type" header
Pattern p = Pattern.compile("text/html;\\s+charset=([^\\s]+)\\s*");
Matcher m = p.matcher(con.getContentType());

Despite what its name suggests, calling openConnection() does not establish a connection. It only gives you an instance of URLConnection on which you can set connection properties, such as the connect timeout with setConnectTimeout().

If you're finding this hard to understand, it may help to know that it's analogous to calling new File(), which simply represents a file but doesn't create one (assuming it doesn't exist already) unless you go ahead and call File.createNewFile() (or pass it to a FileWriter).
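
On the null value itself: URLConnection.getContentType() is documented to return null when the content type is not known, for example when the device sends no Content-Type header, and the header getters connect implicitly if needed, so the null can persist even after connect(). Passing that null to Pattern.matcher() throws a NullPointerException. A minimal sketch of a null-safe fallback follows; the class name CharsetFromHeader, the helper charsetOf, and the looser pattern are assumptions made for this example, and ISO-8859-1 is the same default the question already uses.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetFromHeader {
    // Looser than the pattern in the question: finds "charset=..." anywhere
    // in the header value, with or without quotes, ignoring case.
    private static final Pattern CHARSET =
            Pattern.compile("charset=\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    private static final String DEFAULT_CHARSET = "ISO-8859-1";

    // Hypothetical helper: returns the charset named in a Content-Type
    // value, or the default when the value is null or names no charset.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return DEFAULT_CHARSET;
        }
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : DEFAULT_CHARSET;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf(null));
        System.out.println(charsetOf("text/html; charset=UTF-8"));
        System.out.println(charsetOf("text/html"));
    }
}
```

In the method from the question, this would be used as String charset = CharsetFromHeader.charsetOf(con.getContentType()); before the InputStreamReader is created.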

1 Comment

I am trying to implement this, but when I try to get the contentType from the URLConnection, it is null. I have edited my post above to reflect these changes.
