Towards Robust, Interpretable and Scalable Visual Representations