Space-aware subword tokenisation and complex word processing in language models