Acquiring Linguistic Knowledge From Multimodal Input