Audio-Visual Speech Processing for Multimedia Localisation