I am reading hundreds of tweets and inspecting the URLs they contain. I am using multithreading for this task because reading each URL takes more than a second. However, I am not sure how many threads I can run in parallel in this situation.

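import Queue  # Python 2 module name; in Python 3 it is called `queue`
import threading

q = Queue.Queue()
threadlimit = 100

# Start one daemon thread per tweet; process_tweet presumably puts its result on q.
for tweet in tweets[0:threadlimit]:
    t = threading.Thread(target=process_tweet, args=(q, tweet))
    t.daemon = True
    t.start()

# Collect one result per started thread; q.get() blocks until a result arrives.
for tweet in tweets[0:threadlimit]:
    tweet = q.get()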

The reason I am asking is that a thread limit of 100 works fine, but with a thread limit of 200 the program gets stuck.

Platform: Linux

1 Answer

The operating system always imposes some limit on the number of threads, and each thread uses some resources (notably some space, perhaps a megabyte, for the thread's call stack). So it is not reasonable to have lots of threads. The details are specific to the operating system and the computer. On Linux, see getrlimit(2) for RLIMIT_STACK (the default stack size) and RLIMIT_NPROC (the number of processes, actually tasks, including threads, that you are permitted to have), and also pthread_attr_setstacksize(3) and pthread_create(3).
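
To see the limits mentioned above on your own machine, here is a minimal sketch using Python's standard `resource` module, which wraps getrlimit(2). It assumes Python 3 on Linux; the helper name `describe` is made up for this illustration, and the values printed depend on your system.

```python
import resource
import threading

# Soft and hard limits for the stack size (in bytes) and for the number of
# tasks (processes and threads) the current user is permitted to have.
stack_soft, stack_hard = resource.getrlimit(resource.RLIMIT_STACK)
nproc_soft, nproc_hard = resource.getrlimit(resource.RLIMIT_NPROC)


def describe(limit):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


print("RLIMIT_STACK soft/hard:", describe(stack_soft), describe(stack_hard))
print("RLIMIT_NPROC soft/hard:", describe(nproc_soft), describe(nproc_hard))

# Stack size Python uses for threads it creates; 0 means the platform default.
print("threading.stack_size():", threading.stack_size())
```

The soft RLIMIT_NPROC value is the one that matters in practice: creating threads beyond it fails rather than slowing down.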

Threads are often heavy on resources (so read about green threads). You don't want many of them (thousands, or even a hundred) on a laptop or desktop. Some supercomputers or costly servers have hundreds of cores with NUMA; on those you could try having more threads.

Read also about the C10K problem.

Common implementations of Python use a single Global Interpreter Lock, so only one thread executes Python bytecode at a time and having lots of threads is not effective. I would recommend using a thread pool of a reasonable size (perhaps configurable, and probably a few dozen threads at most).
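
A thread pool along those lines could be sketched as follows. This assumes Python 3 and the standard `concurrent.futures` module rather than the question's hand-rolled threads. The pool size of 20 is only an example value, and `process_tweet` here is a hypothetical stand-in that sleeps instead of reading a URL.

```python
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 20  # assumption: an example of "a few dozen at most"; tune as needed


def process_tweet(tweet):
    """Hypothetical stand-in for the real URL inspection."""
    time.sleep(0.01)  # simulates waiting on the network
    return tweet.upper()


def process_all(tweets, pool_size=POOL_SIZE):
    """Process every tweet using at most pool_size worker threads."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map() queues all tweets and returns results in input order.
        return list(pool.map(process_tweet, tweets))


if __name__ == "__main__":
    sample = ["tweet %d" % i for i in range(200)]
    results = process_all(sample)
    print(len(results))
```

With this shape the number of tweets no longer determines the number of threads: 200 tweets are simply queued and handled by the same 20 workers.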

Consider using PycURL and probably its MULTI interface (see the documentation of the relevant C API in libcurl). Think in terms of an event loop (and perhaps continuation-passing style).
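
To illustrate the event-loop idea without depending on PycURL, here is a rough sketch that uses Python 3's standard `asyncio` instead, so it is not the libcurl MULTI interface itself. The `fetch` coroutine is a hypothetical placeholder that sleeps rather than performing a real HTTP request, and the limit of 50 requests in flight is an arbitrary example value.

```python
import asyncio

MAX_IN_FLIGHT = 50  # assumption: arbitrary cap on concurrent requests


async def fetch(url, limiter):
    """Hypothetical placeholder for a non-blocking HTTP request."""
    async with limiter:  # wait here if too many requests are in flight
        await asyncio.sleep(0.01)  # stands in for network latency
        return (url, "done")


async def fetch_all(urls, max_in_flight=MAX_IN_FLIGHT):
    """Run all fetches on one thread, interleaved by the event loop."""
    limiter = asyncio.Semaphore(max_in_flight)
    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(fetch(u, limiter) for u in urls))


if __name__ == "__main__":
    urls = ["http://example.invalid/%d" % i for i in range(200)]
    results = asyncio.run(fetch_all(urls))
    print(len(results))
```

All 200 placeholder requests run on a single thread here, so no per-thread stack is consumed while each one waits.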
