I am trying to use Poco to grab news from the front page of reddit. I'm looking at this PDF ( http://pocoproject.org/slides/200-Network.pdf ) for the answer, but it's a bit over my head at this point and I'm not sure how to accomplish my goal. As I said, I'm simply trying to grab the news articles (specifically, the article titles) from www.reddit.com.

The code I have so far grabs all of the HTML from reddit's front page and prints it to the screen with cout:

#include <cstdlib>   // system
#include <iostream>
#include "Poco/Net/SocketAddress.h"
#include "Poco/Net/StreamSocket.h"
#include "Poco/Net/SocketStream.h"
#include "Poco/StreamCopier.h"

using namespace std;
using namespace Poco::Net;
using namespace Poco;

int main(int argc, char *argv[])
{
    // Open a plain TCP connection to port 80 and wrap it in an iostream
    SocketAddress sa("www.reddit.com", 80);
    StreamSocket socket(sa);
    SocketStream str(socket);

    // Send a minimal HTTP/1.1 request by hand
    str << "GET / HTTP/1.1\r\n"
           "Host: www.reddit.com\r\n"
           "\r\n";
    str.flush();

    // Copy the whole response (headers and HTML) to the console
    StreamCopier::copyStream(str, cout);

    system("PAUSE");
    return 0;
}

Looking at the above-mentioned PDF, it seems my answer may be in there somewhere, but I am still learning about computer networks and Internet protocols, so most of it is beyond me at this point.

Main Question: Can someone help me figure out how to get the article titles from www.reddit.com into a string or array of strings?

  • OK, reading/getting the HTML content is one thing; parsing it is much harder. POCO provides an HTML document model AFAIR, I'm just afraid it's one of the commercial parts of the library :( ... Commented Feb 22, 2014 at 23:46
  • Can you use something other than Poco, then? Commented Feb 23, 2014 at 0:12
  • I think it would really depend on what you need this for. Why don't you just use some basic string parsing? The string <a class="title seems like a starting point for determining title text. Something less brittle might be using HTML Tidy + some SAX/DOM parser. Commented Feb 23, 2014 at 0:15
  • Well that is part of the problem I am having. Since my only chance to see the output is in the line StreamCopier::copyStream(str, cout); when it outputs the data to the console, I need to find a way to get that data into a string (which I don't know how to do, but believe me I am trying). I'm honestly not worried about the string parsing, I've done that a bunch of times. It's just that I need to get the data into a string. I just figured that I might have to do that through the POCO libraries, hence my question. Commented Feb 23, 2014 at 0:23
  • Googling leads me to StreamCopier::copyToString: appinf.com/docs/poco/Poco.StreamCopier.html#14286 Have you tried that? Commented Feb 23, 2014 at 0:32
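
The two suggestions in the comments (get the response into a string, then do basic string parsing) can be sketched as follows. This is only a rough illustration using the standard library, not a robust HTML parser. The helper names readAll, httpBody and extractTitles are made up for this example, and the `<a class="title` marker is taken from the comment above, so it is an assumption about reddit's markup that may not hold. With Poco, `StreamCopier::copyToString(str, html)` should fill a std::string in the same way readAll does here. Note also that a raw HTTP/1.1 response may use chunked transfer encoding, which this sketch does not decode; Poco's HTTPClientSession takes care of that.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read everything that is left in a stream into one string.
std::string readAll(std::istream& in)
{
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return buffer.str();
}

// Drop the status line and headers: the body starts after the first blank line.
// Returns an empty string if no blank line is found.
std::string httpBody(const std::string& response)
{
    const std::string::size_type pos = response.find("\r\n\r\n");
    return pos == std::string::npos ? std::string() : response.substr(pos + 4);
}

// Collect the text between each opening tag that starts with `marker`
// and the next "</a>". Naive: nested tags and HTML entities are left as-is.
std::vector<std::string> extractTitles(const std::string& html,
                                       const std::string& marker)
{
    assert(!marker.empty());
    std::vector<std::string> titles;
    std::string::size_type pos = 0;
    while ((pos = html.find(marker, pos)) != std::string::npos) {
        // End of the opening tag
        const std::string::size_type tagEnd = html.find('>', pos + marker.size());
        if (tagEnd == std::string::npos) break;
        // Start of the closing tag
        const std::string::size_type textEnd = html.find("</a>", tagEnd + 1);
        if (textEnd == std::string::npos) break;
        titles.push_back(html.substr(tagEnd + 1, textEnd - tagEnd - 1));
        pos = textEnd + 4;  // continue after "</a>"
    }
    return titles;
}
```

In the question's program, this would replace the copyStream call: read `str` into a string, pass it through httpBody, then call extractTitles with the marker `<a class="title`.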

1 Answer

Why not grab http://www.reddit.com/.rss, which is much simpler to parse than HTML? For example, to get the news titles using the Qt framework:

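#include <vector>
#include <QObject>
#include <QNetworkAccessManager>
#include <QNetworkReply>
#include <QNetworkRequest>
#include <QUrl>
#include <QDomDocument>

class Foo : public QObject {
  Q_OBJECT
public:
  Foo();
private slots:
  void got_it(QNetworkReply* reply);
private:
  QNetworkAccessManager* news_grabber;
};

Foo::Foo() {
  news_grabber = new QNetworkAccessManager(this);
  // finished() is emitted once a reply has been fully received
  QObject::connect(news_grabber, SIGNAL(finished(QNetworkReply*)),
                   this, SLOT(got_it(QNetworkReply*)));
  news_grabber->get(QNetworkRequest(QUrl("http://www.reddit.com/.rss")));
}

void Foo::got_it(QNetworkReply* reply) {
  QDomDocument document;
  std::vector<QString> items_storage;
  // Parse the feed straight from the reply, which is a QIODevice
  document.setContent(static_cast<QIODevice*>(reply));
  reply->deleteLater();  // the receiver is responsible for freeing the reply
  // Collect the <title> text of every <item> in the feed
  QDomNodeList items = document.elementsByTagName("item");
  for (int i = 0; i < items.length(); i++)
    items_storage.push_back(items.at(i).firstChildElement("title").text());
}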
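
If pulling in Qt is not an option, the same idea (take each item, read its title) can be roughly approximated with plain string searches once the feed is in a std::string. This is a sketch under several assumptions: the function name childTexts is invented for this example, the tags are assumed to carry no attributes, and CDATA sections and XML entities are not handled. A real XML parser, such as the one in Poco's XML library, would be the more robust route.

```cpp
#include <cassert>
#include <string>
#include <vector>

// For every <parentTag>...</parentTag> block, return the text of the first
// <childTag>...</childTag> found inside it. Tags with attributes are not matched.
std::vector<std::string> childTexts(const std::string& xml,
                                    const std::string& parentTag,
                                    const std::string& childTag)
{
    assert(!parentTag.empty() && !childTag.empty());
    const std::string openParent  = "<" + parentTag + ">";
    const std::string closeParent = "</" + parentTag + ">";
    const std::string openChild   = "<" + childTag + ">";
    const std::string closeChild  = "</" + childTag + ">";
    const std::string::size_type npos = std::string::npos;

    std::vector<std::string> texts;
    std::string::size_type pos = 0;
    while ((pos = xml.find(openParent, pos)) != npos) {
        const std::string::size_type parentEnd = xml.find(closeParent, pos);
        if (parentEnd == npos) break;  // unterminated block: stop
        const std::string::size_type open = xml.find(openChild, pos);
        if (open != npos && open < parentEnd) {
            const std::string::size_type start = open + openChild.size();
            const std::string::size_type close = xml.find(closeChild, start);
            if (close != npos && close < parentEnd)
                texts.push_back(xml.substr(start, close - start));
        }
        pos = parentEnd + closeParent.size();  // move on to the next block
    }
    return texts;
}
```

Calling `childTexts(feed, "item", "title")` mirrors the elementsByTagName("item") / firstChildElement("title") pair in the Qt code above; the tag names are parameters so a feed with a different layout can be tried without changing the function.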