Taming Data and Transformers for Audio Generation