Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Shirui Pan, Chengqi Zhang

Recurrent neural nets (RNN) and convolutional neural nets (CNN) are widely used in NLP tasks to capture the long-term and local dependencies respectively. Attention mechanisms have recently attracted enormous interest due to their highly parallelizable computation, significantly less training time, and flexibility in modeling dependencies. We propose a novel attention mechanism in which the attention between elements from input sequence(s) is directional and multi-dimensional, i.e., feature-wise. A light-weight neural net, “Directional Self-Attention Network (DiSAN)”, is then proposed to learn sentence embedding, based solely on the proposed attention without any RNN/CNN structure. DiSAN is only composed of a directional self-attention block with temporal order encoded, followed by a multi-dimensional attention that compresses the sequence into a vector representation. Despite this simple form, DiSAN outperforms complicated RNN/CNN models on both prediction quality and efficiency. It achieves the best test accuracy among all sentence encoding methods and improves the most recent best result by about 1.0% on the Stanford Natural Language Inference (SNLI) dataset, and shows the state-of-the-art test accuracy on the Stanford Sentiment Treebank (SST), Sentences Involving Compositional Knowledge (SICK), TREC Question-type Classification and Multi-Genre Natural Language Inference (MultiNLI) datasets.
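To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It illustrates (a) feature-wise attention, where each pair of tokens gets a score vector with one entry per feature rather than a single scalar, (b) a directional mask that restricts each token to attend only to one side of the sequence, and (c) a feature-wise pooling step that compresses the sequence into one vector. The additive `tanh` scoring function, the parameter names `W1`, `W2`, `W`, `b`, and the choice to let a token attend to itself are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def softmax(z, axis):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def directional_multidim_attention(x, W1, W2, b, forward=True):
    """Feature-wise self-attention with a directional mask (illustrative).

    x: (n, d) token embeddings. Returns an (n, d) array of context vectors.
    Assumption: additive tanh scoring; a token may attend to itself.
    """
    n, _ = x.shape
    # scores[i, j, k]: how much feature k of token j matters to token i.
    scores = np.tanh((x @ W1)[:, None, :] + (x @ W2)[None, :, :] + b)
    idx = np.arange(n)
    if forward:
        allowed = idx[None, :] <= idx[:, None]  # token i sees j <= i
    else:
        allowed = idx[None, :] >= idx[:, None]  # token i sees j >= i
    # Disallowed pairs get -inf so their softmax weight is exactly zero.
    scores = np.where(allowed[:, :, None], scores, -np.inf)
    # Normalize over source positions j, separately for every feature k.
    p = softmax(scores, axis=1)
    return (p * x[None, :, :]).sum(axis=1)


def multidim_pool(h, W, b):
    """Compress an (n, d) sequence into one (d,) vector, feature by feature."""
    p = softmax(h @ W + b, axis=0)  # one distribution over tokens per feature
    return (p * h).sum(axis=0)


# Toy usage with random (untrained) parameters.
rng = np.random.default_rng(0)
n_tokens, dim = 5, 4
x = rng.normal(size=(n_tokens, dim))
W1, W2, W = (rng.normal(size=(dim, dim)) for _ in range(3))
b = np.zeros(dim)

fw = directional_multidim_attention(x, W1, W2, b, forward=True)
bw = directional_multidim_attention(x, W1, W2, b, forward=False)
sentence_vec = multidim_pool(np.concatenate([fw, bw], axis=1),
                             rng.normal(size=(2 * dim, 2 * dim)),
                             np.zeros(2 * dim))
```

In this sketch the forward and backward outputs are simply concatenated before pooling, which is one plausible way to combine the two directions; because the mask encodes which side of the sequence each token can see, the result depends on token order even though no recurrence or convolution is used.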