Retrieving Images through Bi-modal Visual and Language Queries