It seems many got confused by this, at least when relying on the documentation. There are quite a lot of GitHub issues about it, including #1638. With a deep understanding of Python it might be trivial. For me, it wasn't.
There are three input arguments related to this issue (see the documentation):
max_queue_size=10, workers=1, use_multiprocessing=False
max_queue_size: It specifies how many batches the background queue prepares ahead of time. It doesn't mean you'll have multiple generator instances. See the example below.
Contrary to a comment I saw in some keras issue, it also doesn't mean that training begins only after the queue is filled. Making the yield super-slow demonstrates this, as in the sketch below.
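A minimal sketch of that experiment (the toy model, data shapes, and timings here are my own, not from the original post): the generator takes far longer to produce a batch than training takes to consume one. If training waited for the queue to fill, the first step would start roughly max_queue_size * 2 seconds in; the interleaved logs show it starts as soon as the first batch is ready.

```python
import time

import numpy as np
from keras.models import Sequential
from keras.layers import Dense


def slow_batches(batch_size=32):
    """Toy loader with a deliberately slow yield."""
    i = 0
    while True:
        time.sleep(2)  # producing a batch is much slower than consuming one
        print('yielding batch', i, flush=True)
        yield np.random.rand(batch_size, 10), np.random.rand(batch_size, 1)
        i += 1


model = Sequential([Dense(1, input_shape=(10,))])
model.compile(optimizer='sgd', loss='mse')

# One queue holding up to 10 prepared batches and one generator instance --
# not ten generator instances.
model.fit_generator(slow_batches(),
                    steps_per_epoch=20,
                    max_queue_size=10,
                    workers=1,
                    use_multiprocessing=False)
```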
use_multiprocessing: With the naive input loader and workers > 1, it fails. With the naive input loader plus use_multiprocessing=True, it works, but with many generator instances. You should make sure this is what you want: if you look at the result, the printed indices will help you understand what's going on.
Here, I knew generators are non-picklable, so at first I didn't understand why it still works with multiprocessing. Presumably, on fork-based platforms each worker process simply inherits its own copy of the generator, so nothing needs to be pickled. In any case, what's happening is obvious from the printed logs. People report related threading problems in the keras issues.
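A hedged illustration of the naive-loader case, reusing the toy model and imports from the sketch above (the loader name is mine):

```python
def naive_loader(batch_size=32):
    # A plain generator with no locking. Two threads calling next() on the
    # same generator object raise "ValueError: generator already executing".
    i = 0
    while True:
        print('yielding batch', i, flush=True)
        yield np.random.rand(batch_size, 10), np.random.rand(batch_size, 1)
        i += 1


# workers=4, use_multiprocessing=False: the four worker threads share this
# single generator, and it eventually crashes with the ValueError above.
#
# workers=4, use_multiprocessing=True: on fork-based platforms each worker
# process inherits its own copy of the generator (no pickling needed), and
# the log prints 'yielding batch 0' once per worker -- duplicated data.
model.fit_generator(naive_loader(),
                    steps_per_epoch=20,
                    max_queue_size=10,
                    workers=4,
                    use_multiprocessing=True)
```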
- With a thread-safe generator?
With a thread-safe implementation, multiple workers have no problem with both use_multiprocessing=True and use_multiprocessing=False. Multiple generators are instantiated only when use_multiprocessing=True. A minimal lock-based wrapper is sketched below.
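For reference, here is the classic lock-based wrapper pattern for making a generator thread-safe (the class and decorator names are mine, not from any Keras API):

```python
import threading

import numpy as np


class ThreadSafeIterator:
    """Wraps a generator so that next() calls are serialized by a lock,
    letting multiple worker threads pull batches from one instance."""

    def __init__(self, it):
        self.it = it
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self.lock:  # only one thread advances the generator at a time
            return next(self.it)


def threadsafe_generator(gen_fn):
    """Decorator: every generator produced by gen_fn becomes thread-safe."""
    def wrapped(*args, **kwargs):
        return ThreadSafeIterator(gen_fn(*args, **kwargs))
    return wrapped


@threadsafe_generator
def safe_loader(batch_size=32):
    i = 0
    while True:
        yield np.random.rand(batch_size, 10), np.random.rand(batch_size, 1)
        i += 1
```

With this wrapper, workers=4 and use_multiprocessing=False runs without the "generator already executing" error, and since there is still only one generator instance, no batch is produced twice.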
My conclusion: make your generator thread-safe, and perhaps turn on the multiprocessing option if you think you need it for speed.