Interpretability

Paper Review: 'Processing Megapixel Images with Deep Attention-Sampling Models'

‘Processing Megapixel Images with Deep Attention-Sampling Models’ (referred to as ‘ATS’ below) [1] proposes a new model that avoids the unnecessary computation incurred by Deep MIL [2]. The authors first compute an attention map over all possible patch locations in an image. They do so by feeding a downsampled image to a shallow CNN with few pooling operations. They then sample a small number of patches from the attention distribution and show that feeding these sampled patches to the MIL classifier yields an unbiased, minimum-variance estimator of the prediction that would be made using all patches.
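
To make the pipeline concrete, below is a minimal PyTorch sketch of the three steps described above: an attention map computed from a downsampled image, patch locations sampled from that distribution, and a prediction made from the sampled patches' features. The module layouts, patch size, and sample count are illustrative assumptions for this review, not the authors' reference implementation.

```python
# Sketch of attention sampling (assumed shapes and module names, not the ATS code).
import torch
import torch.nn as nn
import torch.nn.functional as F

PATCH = 32       # assumed high-resolution patch size
N_PATCHES = 10   # assumed number of patches sampled per image


class AttentionSamplingClassifier(nn.Module):
    def __init__(self, n_classes: int, feat_dim: int = 128):
        super().__init__()
        # Shallow CNN on the downsampled image; no pooling, so the output keeps
        # one attention logit per candidate patch location.
        self.attention = nn.Sequential(
            nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
            nn.Conv2d(8, 1, 3, padding=1),
        )
        # Feature extractor applied only to the few sampled high-res patches.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, x_full, x_low):
        B = x_full.shape[0]
        # 1. Attention distribution over patch locations on the downsampled grid.
        logits = self.attention(x_low)                  # (B, 1, h, w)
        h, w = logits.shape[-2:]
        attn = F.softmax(logits.flatten(1), dim=1)      # (B, h*w)

        # 2. Sample patch indices from the attention distribution.
        idx = torch.multinomial(attn, N_PATCHES, replacement=True)  # (B, K)

        # 3. Crop the corresponding high-resolution patches and extract features.
        scale_y = x_full.shape[-2] // h
        scale_x = x_full.shape[-1] // w
        feats = []
        for b in range(B):
            for k in range(N_PATCHES):
                r, c = divmod(idx[b, k].item(), w)
                y0, x0 = r * scale_y, c * scale_x
                patch = x_full[b:b + 1, :, y0:y0 + PATCH, x0:x0 + PATCH]
                # Pad edge patches so every crop has the same size.
                patch = F.pad(patch, (0, PATCH - patch.shape[-1],
                                      0, PATCH - patch.shape[-2]))
                feats.append(self.features(patch))
        feats = torch.stack(feats).view(B, N_PATCHES, -1)

        # 4. Average the sampled-patch features (a Monte Carlo estimate of the
        #    attention-weighted average over all patches) and classify.
        return self.classifier(feats.mean(dim=1))
```

In this sketch the expensive feature network only ever sees `N_PATCHES` crops per image instead of every possible patch, which is where the computational savings over processing the full megapixel image come from.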