Cramming: Training a Language Model on a Single GPU in One Day

arXiv v1: Cramming: Training a Language Model on a Single GPU in One Day