In the previous post I explained that, after a certain point, improving the efficiency of the data loaders or increasing their number has only a marginal impact on the training speed of deep learning models.
Today I will explain a method that can further speed up your training, provided that you have already achieved sufficient data loading efficiency.
Data Prefetcher
In a typical deep learning pipeline, one must load the batch data from CPU to GPU before the model can be trained on that batch.
Have you ever wondered why some of your training scripts halt every n batches, where n is the number of loader processes? This likely means your pipeline is bottlenecked by data loading time, as shown in the following animation:
Training is bottlenecked by data loading time. In the animation above, the mean loading time for each batch is 2 seconds and there are 7 loader processes, but the forward and backward pass for each batch takes only 100 ms.
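To put numbers on the stall, here is a minimal sketch that replays the scenario from the animation. It is an idealized model, not a measurement. The function name `gpu_utilization` is hypothetical, and it assumes every worker needs exactly the same time per batch, all workers start together, and each worker begins its next batch as soon as it finishes the previous one.

```python
def gpu_utilization(num_workers, load_time, step_time, num_batches):
    """Fraction of wall-clock time the GPU spends computing.

    Assumption (idealized): each worker needs exactly `load_time` seconds
    per batch, all workers start at t=0, and each starts its next batch
    immediately after finishing the previous one.
    """
    t = 0.0  # time at which the GPU finishes its current step
    for i in range(num_batches):
        # Batch i is the (i // num_workers + 1)-th batch produced by
        # worker i % num_workers, so it is ready at this time:
        ready = (i // num_workers + 1) * load_time
        # The GPU starts once it is free AND the batch is ready.
        t = max(t, ready) + step_time
    return num_batches * step_time / t


if __name__ == "__main__":
    util = gpu_utilization(num_workers=7, load_time=2.0,
                           step_time=0.1, num_batches=70)
    print(f"GPU utilization: {util:.1%}")
```

Under these assumptions the 7 batches arrive together every 2 seconds and are consumed in 0.7 seconds, so the GPU sits idle for the remaining 1.3 seconds of each cycle. That is the periodic halt every n batches.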
Can reinforcement learning agents generate and benefit from conventions when cooperating with each other in an imperfect-information game?
This is the question that led to my course project for the “Advancing AI through cognitive science” course at NYU Center for Data Science. In summary, I applied theory-of-mind modeling to the Hanabi challenge and observed an improvement.
Applying Theory of Mind to the Hanabi Challenge
Hanabi is a card game created in 2010 that can be understood as cooperative solitaire.
‘Processing Megapixel Images with Deep Attention-Sampling Models’ (referred to as ‘ATS’ below) proposes a new model that avoids the unnecessary computation of Deep MIL.
They first compute an attention map over all possible patch locations in an image. They do so by feeding a downsampled image to a shallow CNN with few pooling operations. They then sample a small number of patches from the attention distribution and show that the prediction made by feeding these sampled patches to the MIL classifier is an unbiased, minimum-variance estimator of the prediction made with all patches.
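The unbiasedness claim can be illustrated with a small sketch. It assumes the full prediction is an attention-weighted average of per-patch features and uses made-up scalar features for readability. In the paper the features come from a feature network and are vectors, and the names `full_prediction` and `sampled_prediction` are hypothetical.

```python
import random


def full_prediction(attention, features):
    """Attention-weighted average over ALL patches (the expensive path)."""
    return sum(a * f for a, f in zip(attention, features))


def sampled_prediction(attention, features, num_samples, rng):
    """Monte Carlo estimate: draw patch indices from the attention
    distribution (with replacement) and average their features.

    Its expectation equals full_prediction, because
    E_{i ~ attention}[features[i]] = sum_i attention[i] * features[i].
    """
    indices = rng.choices(range(len(features)), weights=attention, k=num_samples)
    return sum(features[i] for i in indices) / num_samples


if __name__ == "__main__":
    rng = random.Random(0)
    attention = [0.1, 0.2, 0.3, 0.4]   # toy attention map, sums to 1
    features = [1.0, 2.0, 3.0, 4.0]    # toy per-patch features
    print("all patches:   ", full_prediction(attention, features))
    print("sampled (n=10):", sampled_prediction(attention, features, 10, rng))
```

Only the sampled patches need to go through the expensive feature extractor, which is where the savings come from. The sketch shows unbiasedness only and says nothing about the minimum-variance part of the claim.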
Are natural language inference (NLI) deep learning models capable of numerical reasoning? This is the question that led to my course project for the “Natural Language Understanding and Computational Semantics” course at NYU Center for Data Science. In summary, my teammates and I used adversarial data augmentation and modified numerical word embeddings to show that some of the SoTA NLI architectures at the time could not correctly perform numerical reasoning that involves adding multiple number words.
The main claims of the paper
Some amount of localization labels is unavoidable for weakly supervised object localization (WSOL). In fact, prior works that claim to be weakly supervised use strong supervision implicitly. Therefore, let’s standardize a protocol in which models are allowed to use pixel-level masks or bounding boxes to a limited degree. Under this protocol and their proposed evaluation method, they observe no improvement in WSOL performance since CAM (2016).