(For context, I’m basically referring to Python 3.12 “multiprocessing.Pool Vs. concurrent.futures.ThreadPoolExecutor”…)
Today I read that multiple cores (parallelism) help with CPU-bound operations, while multiple threads (concurrency) are the better fit when the tasks are I/O-bound.
Is this correct? Would anyone care to elaborate?
At least from a theoretical standpoint. Of course, a lot of real-world work is a mix of both, and I’d better start by profiling to see where the bottlenecks really are.
In case a concrete “algorithm” helps: let’s say I have a function that applies a map-reduce strategy, reading chunks of data from a file on disk, computing some averages from that data, and saving the result to a new file.
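To make the question concrete, here is a minimal sketch of that kind of map-reduce average. It assumes a text file with one number per line, and the names `read_chunks`, `partial_sum`, `mean_of_file` and `save_mean` are made up for illustration. The executor is passed in, so the same code can be tried with threads or with processes:

```python
from concurrent.futures import ThreadPoolExecutor


def read_chunks(path, chunk_lines):
    """Yield lists of floats, chunk_lines values at a time (one number per line)."""
    chunk = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                chunk.append(float(line))
            if len(chunk) == chunk_lines:
                yield chunk
                chunk = []
    if chunk:  # last, possibly shorter, chunk
        yield chunk


def partial_sum(chunk):
    """Map step: reduce one chunk to a (sum, count) pair."""
    return sum(chunk), len(chunk)


def mean_of_file(path, executor, chunk_lines=1000):
    """Reduce step: combine the per-chunk (sum, count) pairs into one mean."""
    total, count = 0.0, 0
    for s, n in executor.map(partial_sum, read_chunks(path, chunk_lines)):
        total += s
        count += n
    return total / count if count else float("nan")


def save_mean(in_path, out_path, max_workers=4):
    """Compute the mean of in_path with a thread pool and write it to out_path."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        mean = mean_of_file(in_path, pool)
    with open(out_path, "w") as f:
        f.write(f"{mean}\n")
    return mean
```

Swapping in `concurrent.futures.ProcessPoolExecutor` should work too, since `partial_sum` is a top-level function, but then every chunk gets pickled and sent to a worker process, and the calling code should sit under an `if __name__ == "__main__":` guard. Whether that beats threads here depends on how much of the time is spent reading versus summing, which is what profiling would tell you.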
Speed-wise, multiple processes and multiple threads should be identical, if you are using the same primitives (shared memory, system-wide semaphore).
Threads are easier to use and use less RAM, because all your memory is shared automatically, whereas system-wide semaphores have a complicated API.
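In Python, because of the GIL, multiprocessing should be preferred for CPU-bound work whenever possible. Threads are still fine for I/O-bound work, since CPython releases the GIL while a thread waits on blocking I/O.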
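For the I/O-bound side of the question, a small sketch of why threads can still pay off under the GIL: blocking waits overlap. Here `time.sleep` is only a stand-in for a real blocking call such as a disk or network read, and `fake_io` and `timed_run` are hypothetical helper names:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    """Stand-in for a blocking call; time.sleep releases the GIL while waiting."""
    time.sleep(delay)
    return delay


def timed_run(n_tasks, delay, max_workers):
    """Run n_tasks blocking tasks on a thread pool; return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fake_io, [delay] * n_tasks))
    return results, time.perf_counter() - start
```

With four workers, four 0.2-second waits should finish in roughly the time of one, not four. Replace `fake_io` with a pure-Python number-crunching loop and that speedup should mostly disappear on a GIL build of CPython, which is where processes come in.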
Also, logging is not isolated: it bleeds all over the place, which is a deal breaker.
It's not worth the endless time spent doing forensics.
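One thing that might reduce the forensics, as a rough sketch only: put the process and thread name into every log record, so interleaved lines can at least be attributed to a worker. The logger name `workers_demo` and the helpers below are invented for this example. `%(processName)s` and `%(threadName)s` are standard `logging` format attributes:

```python
import logging
from concurrent.futures import ThreadPoolExecutor


def make_logger(stream):
    """Build a logger whose records carry the process and thread names."""
    logger = logging.getLogger("workers_demo")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(processName)s/%(threadName)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


def work(logger, item):
    """Toy task that logs what it is doing and returns a result."""
    logger.info("processing item %d", item)
    return item * 2


def run(stream, items):
    """Run the toy tasks on a named thread pool, logging to stream."""
    logger = make_logger(stream)
    with ThreadPoolExecutor(max_workers=2, thread_name_prefix="worker") as pool:
        return list(pool.map(lambda i: work(logger, i), items))
```

This does not isolate anything, it only labels the output. For multiple processes, the standard library also has `logging.handlers.QueueHandler` and `QueueListener` for funnelling records through a single writer, which may be worth a look.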
Agreed! Let's stick with multiprocessing.
One thread sounds nice. Let's do much more of that.