
I'm using Py-StackExchange to get a list of questions from CrossValidated. I need to keep only the questions whose titles include the word "keras".

This is my code. It takes a very long time to run and in the end prints nothing.

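import stackexchange

# user_api_key must hold your Stack Exchange API key
cv = stackexchange.Site(stackexchange.CrossValidated, app_key=user_api_key, impose_throttling=True)
cv.be_inclusive()

# Filter client-side: keep only questions whose title contains "keras"
for q in cv.questions(pagesize=100):
    if "keras" in q.title:
        print('--- %s ---' % q.title)
        print(q.creation_date)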

I checked the same query manually with a search and obtained the list of questions very quickly.

How can I do the same using Py-StackExchange?

  • You have two options: use the API or SEDE. The API has real-time data, but you'd have to make a few calls to get the questions. SEDE (Stack Exchange Data Explorer) is updated weekly (every Sunday), but you can fetch all the questions at once. Which one would you like? Commented Sep 26, 2020 at 7:10

1 Answer

You have two options:

  1. Use this SEDE query. This will give you all questions which contain keras in their title on Cross Validated. However, note that SEDE is updated weekly.

  2. Use the Stack Exchange API's /search/advanced method. This method has a title parameter which accepts:

    text which must appear in returned questions' titles.

    I haven't used Py-StackExchange before, so I don't know how it works. Therefore, in this example I'm going to use the StackAPI library (docs):

    from stackapi import StackAPI
    
    q_filter = '!4(L6lo9D9ItRz4WBh'
    word_to_search = 'keras'
    SITE = StackAPI('stats')
    keras_qs = SITE.fetch('search/advanced',
                          filter = q_filter,
                          title = word_to_search)
    print(keras_qs['items'])
    print(f"Found {len(keras_qs['items'])} questions.")
    

    The filter I'm using here is the one assigned to q_filter above; you can change it or not provide it at all. You don't need to provide an API key (the lib uses one) unless you have a specific reason to do so.
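
To retrieve more than the first pages of results, the same call can be run once per calendar year with explicit paging limits. The following is only a sketch under a few assumptions: that StackAPI's page_size and max_pages attributes control how many pages fetch walks through, and that /search/advanced accepts fromdate and todate as Unix timestamps. The helper names year_bounds, fetch_titles_for_year and make_site are made up for this example.

```python
import calendar
from datetime import datetime


def year_bounds(year):
    """Return (fromdate, todate) as Unix timestamps covering one UTC calendar year."""
    start = calendar.timegm(datetime(year, 1, 1).timetuple())
    end = calendar.timegm(datetime(year + 1, 1, 1).timetuple()) - 1
    return start, end


def fetch_titles_for_year(site, word, year):
    """Return the questions created in `year` whose title contains `word`.

    `site` is assumed to be a stackapi.StackAPI instance; anything with a
    compatible fetch() method works as well.
    """
    fromdate, todate = year_bounds(year)
    response = site.fetch('search/advanced',
                          title=word,
                          fromdate=fromdate,
                          todate=todate,
                          sort='creation',
                          order='asc')
    return response['items']


def make_site(name='stats', page_size=100, max_pages=50):
    """Build a StackAPI client that may walk up to page_size * max_pages items."""
    from stackapi import StackAPI  # third-party: pip install stackapi
    site = StackAPI(name)
    site.page_size = page_size  # items requested per page
    site.max_pages = max_pages  # upper bound on pages fetched per call
    return site


# Example usage (performs network requests, so it is left commented out):
# site = make_site()
# for year in range(2015, 2021):
#     items = fetch_titles_for_year(site, 'keras', year)
#     print(year, len(items))
```

Splitting by year keeps each call small and gives you the per-year slices directly. Every page counts against the API quota, so a large max_pages can use it up quickly.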


2 Comments

Thanks. Does it retrieve all questions with the keyword "keras"? Even without filters I get a very small number of questions (fewer than 10), which is unrealistic. I need to retrieve all questions with the keyword in the title.
Thanks for the suggestions. Yes, I specified SITE. I've been struggling with this tuning for a while, and I always get an unrealistically small subset. Could you please give a complete example with pagination that retrieves the complete data set (at least for 1 year, so that I can slice over years)?
