On Layer Normalization in the Transformer Architecture

arXiv V2: On Layer Normalization in the Transformer Architecture