Visual question answering and beyond