
In a previous blog post, we talked about Recurrent Neural Networks (RNNs) and how they are the go-to neural network architecture for Natural Language Processing (NLP) tasks. Here, we’ll follow up on that statement by focusing on Convolutional Neural Networks (CNNs), which have been successful on several text processing tasks in recent years.

Before listing the resources I aggregated for you on CNNs, I’ll take a minute to explain the main principles behind CNNs, and present some of their applications.

CNN applied to text. Credits to Ye Zhang et al., 2015

Pros and cons of convolutional neural networks

The main advantage of Convolutional Neural Networks over Recurrent Neural Networks is speed. The reason is that CNNs are parallelizable, whereas RNNs are sequential: at each timestep T, an RNN computes a state conditioned on the previous state at timestep T – 1, while a CNN only uses the local context (within the convolution window) to build its state. Since there is no dependency between states, we can simply parallelize those computations and make our network several times faster. Another advantage of CNNs is their capacity to extract the most important features (read: key information) from the word embedding matrix they’re fed (read: from a sentence), usually with max-pooling layers. The benefit of using a max-pooling layer after a convolution layer is that it reduces the output’s dimensionality by keeping only the most salient n-gram features across the whole sentence.
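To make the windowed computation concrete, here is a minimal NumPy sketch of a convolution over word embeddings followed by max-pooling. The embeddings and filters are random toy values, not a trained model:

```python
import numpy as np

# Toy word-embedding matrix for a 6-word sentence: one 4-dimensional
# vector per word (illustrative values, not trained embeddings).
rng = np.random.default_rng(0)
sentence = rng.normal(size=(6, 4))          # (words, embedding_dim)

# Two convolution filters, each spanning a window of 3 consecutive words.
window, n_filters = 3, 2
filters = rng.normal(size=(n_filters, window, 4))

# Slide the window over the sentence: every position can be computed
# independently, which is why CNNs parallelize so well.
feature_map = np.array([
    [np.sum(sentence[i:i + window] * f) for i in range(6 - window + 1)]
    for f in filters
])                                           # (n_filters, positions)

# Max-pooling keeps only the most salient n-gram feature per filter,
# collapsing the position axis to a fixed-size vector.
pooled = feature_map.max(axis=1)             # shape (2,)
```

Note how the pooled vector has a fixed size regardless of sentence length, which is exactly what a downstream classifier needs.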

Now, I’ve been describing the bright side of convolutions, but there are also a few drawbacks to take into account. The first is the difficulty CNNs have modelling long-distance relationships between words: since a convolution only sees a fixed window of word embeddings, capturing such relationships would require windows covering the whole sentence, inevitably leading to a data sparsity issue. The second is related to word order: since CNNs lack the “recurrence” of RNNs, they do not keep track of the positions of words within the sentence, leading to a loss of information.
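The word-order loss can be seen in a toy NumPy sketch (random embeddings, window size 1): a sentence and a shuffled version of it produce identical max-pooled features, even though their meanings differ.

```python
import numpy as np

# Toy embeddings for three words (random placeholders, not trained).
rng = np.random.default_rng(1)
embed = {w: rng.normal(size=3) for w in ["movie", "not", "good"]}

def conv_maxpool(words, weights):
    # Window-1 "convolution" (a per-word projection) + max over positions.
    feats = np.stack([weights @ embed[w] for w in words])
    return feats.max(axis=0)

weights = rng.normal(size=(2, 3))
original = conv_maxpool(["movie", "not", "good"], weights)
shuffled = conv_maxpool(["good", "movie", "not"], weights)
print(np.allclose(original, shuffled))  # True: word order was lost
```

With wider windows some local order is preserved inside each window, but positions across the sentence are still discarded by the pooling step.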

Schematic visualization of the features triggering a positive/negative review. Credits to Asma Daoud, 2018

Given those pros and cons, we can define a clear set of requirements for CNNs to shine: the input consists of short sentences, or specific local features need to be detected. Here are a few examples: Emotion Detection [2], Sentiment Analysis [3], Named Entity Recognition [4], Machine Translation [5]. If your task is highly sequential or needs long-distance relationships handled (Dialog Management, Semantic Role Labelling, POS Tagging), you’re still better off with RNNs.
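For sentence classification specifically, the standard recipe (as in Yoon Kim’s 2014 paper listed below) runs several convolutions with different window widths in parallel, max-pools each, and concatenates the results. Here is a NumPy sketch of that architecture, with random placeholder weights instead of a trained model:

```python
import numpy as np

# Kim (2014)-style sentence classifier: parallel convolutions with
# several window widths, each max-pooled, then concatenated.
rng = np.random.default_rng(2)
emb_dim, n_filters = 8, 4
sentence = rng.normal(size=(10, emb_dim))    # 10 words, 8-dim embeddings

def conv_maxpool(x, width, w):
    # One feature per filter and window position, then max over positions.
    feats = [np.sum(x[i:i + width] * w, axis=(1, 2))
             for i in range(len(x) - width + 1)]
    return np.max(feats, axis=0)             # (n_filters,)

sentence_vector = np.concatenate([
    conv_maxpool(sentence, width, rng.normal(size=(n_filters, width, emb_dim)))
    for width in (2, 3, 4)                   # bigram/trigram/4-gram detectors
])                                           # (3 * n_filters,) == (12,)

# A final linear layer maps this fixed-size vector to class scores.
logits = rng.normal(size=(2, 12)) @ sentence_vector
```

Each window width acts as an n-gram detector, which is why this setup works well when the decisive evidence is a short local phrase.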

CNN applied to word embeddings via a lookup table. Credits to Ronan Collobert et al., 2011

Whether you choose an RNN or a CNN to tackle your NLP task, chances are you will rely on word embeddings [6]. In the past few months, we’ve seen several new techniques being proposed, the most recent using language modeling as a way of building strong word embeddings. Moreover, using this task for transfer learning has improved downstream models by up to 20 percent [7]!
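The lookup-table mechanism itself is simple: a word’s index selects a row of the embedding matrix, and those rows form the input matrix the network convolves over. A toy NumPy sketch, with a made-up vocabulary and random vectors standing in for pretrained embeddings:

```python
import numpy as np

# Toy vocabulary and embedding matrix (random stand-ins for
# pretrained vectors such as word2vec or GloVe).
vocab = {"the": 0, "cat": 1, "sat": 2}
embeddings = np.random.default_rng(3).normal(size=(len(vocab), 5))

def embed(sentence):
    # Turn a token list into a (words, embedding_dim) matrix
    # by row-indexing the embedding table.
    return embeddings[[vocab[w] for w in sentence]]

x = embed(["the", "cat", "sat"])
print(x.shape)  # (3, 5)
```

In practice this table is either initialized from pretrained vectors and fine-tuned, or frozen, depending on how much task data you have.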

As Sebastian Ruder stated in his recent article [8]: “it only seems to be a question of time until pretrained word embeddings will be dethroned and replaced by pretrained language models in the toolbox of every NLP practitioner.”

Resources about convolutional neural networks

If you want to learn more about CNNs, here is a list of blog posts, studies and research papers you should read. Enjoy!



Convolutional Methods for Text, Tal Perry, 2017

Understanding how Convolutional Neural Network (CNN) perform text classification with word embeddings, Joshua Kim, 2017

Convolutions in NLP, Asma Daoud, 2018

Convolutional Neural Networks for Text Classification, David Batista, 2018


Conv Nets: A Modular Perspective, Christopher Olah, 2014

Understanding Convolutional Neural Networks for NLP, Denny Britz, 2015

Convolutional Neural Networks (CNNs): An Illustrated Explanation, Abhineet Saxena, 2016

Detecting Sarcasm with Deep Convolutional Neural Networks, Omar Sar, 2018


Implementing a CNN for Text Classification in TensorFlow, Denny Britz, 2015

Text classification using CNN : Example, Nitin Agarwal, 2016

A Comprehensive Guide to Understand and Implement Text Classification in Python, Shivam Bansal, 2018

Text Classification using CNN, LSTM and visualize word embeddings: Part-2, Sabber Ahamed, 2018

How to Develop an N-gram Multichannel Convolutional Neural Network for Sentiment Analysis, Jason Brownlee, 2018

Sentence Classification using CNN with Deep Learning Studio, Rajat Gupta, 2018

Text Classification Using a Convolutional Neural Network on MXNet, Apache Incubator



Convolutional Neural Network for Computer Vision and Natural Language Processing, Mingbo Ma, 2015

Comparative Study of CNN and RNN for Natural Language Processing, Wenpeng Yin, 2017


Sequential Short-Text Classification with Neural Networks, Franck Dernoncourt, 2017


Phoneme recognition using time-delay neural networks, Alexander Waibel et al., 1989

Object Recognition with Gradient-Based Learning, Yann LeCun et al., 1999

Natural Language Processing (almost) from Scratch, Ronan Collobert et al., 2011

Recurrent Continuous Translation Models, Nal Kalchbrenner et al., 2013

Convolutional Neural Networks for Sentence Classification, Yoon Kim, 2014

Effective Use of Word Order for Text Categorization with Convolutional Neural Networks, Rie Johnson et al., 2014

A Convolutional Neural Network for Modelling Sentences, Nal Kalchbrenner et al., 2014

Learning Character-level Representations for Part-of-Speech Tagging, Cícero Nogueira dos Santos et al., 2014

Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts, Cicero Nogueira dos Santos et al., 2014

Text Understanding from Scratch, Xiang Zhang et al., 2015

Character-Aware Neural Language Models, Yoon Kim et al., 2015

A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification, Ye Zhang et al., 2015

Recurrent Convolutional Neural Networks for Text Classification, Siwei Lai et al., 2015

Semantic Clustering and Convolutional Neural Network for Short Text Categorization, Peng Wang et al., 2015

Efficient Likelihood Learning of a Generic CNN-CRF Model for Semantic Segmentation, Alexander Kirillov et al., 2015

Semi-supervised Convolutional Neural Networks for Text Categorization via Region Embedding, Rie Johnson et al., 2015

Character-level Convolutional Networks for Text Classification, Xiang Zhang et al., 2015

Aspect extraction for opinion mining with a deep convolutional neural network, Soujanya Poria et al., 2015

Dependency-based Convolutional Neural Networks for Sentence Embedding, Mingbo Ma et al., 2015

Discriminative Neural Sentence Modeling by Tree-Based Convolution, Lili Mou et al., 2015

Tree-based Convolution for Sentence Modeling, Mingbo Ma et al., 2015

Context-Dependent Translation Selection Using Convolutional Neural Network, Zhaopeng Tu et al., 2015

Neural Machine Translation in Linear Time, Nal Kalchbrenner et al., 2016

A Deeper Look into Sarcastic Tweets Using Deep Convolutional Neural Networks, Soujanya Poria et al., 2016

Fast and Accurate Entity Recognition with Iterated Dilated Convolutions, Emma Strubell et al., 2017

Convolutional Sequence to Sequence Learning, Jonas Gehring et al., 2017

Deep Pyramid Convolutional Neural Networks for Text Categorization, Rie Johnson et al., 2017

Combining Knowledge with Deep Convolutional Neural Networks for Short Text Classification, Jin Wang et al., 2017

A Practitioners’ Guide to Transfer Learning for Text Classification using Convolutional Neural Networks, Tushar Semwal et al., 2018

A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning, Ronan Collobert et al., 2008


  1. ML Spotlight I: Investigating Recurrent Neural Networks, Paul Renvoisé, 2017
  2. Emotion Detection on TV Show Transcripts with Sequence-based Convolutional Neural Networks, Sayyed Zahiri et al., 2017
  3. A Convolutional Neural Network for Modelling Sentences, Nal Kalchbrenner, 2014
  4. Fast and Accurate Entity Recognition with Iterated Dilated Convolutions, Emma Strubell, 2017
  5. Convolutional Sequence to Sequence Learning, Jonas Gehring et al., 2017
  6. Deep Learning, NLP, and Representations, Christopher Olah, 2014
  7. Deep Contextualized word representations, Matthew Peters et al., 2018
  8. NLP’s ImageNet moment has arrived, Sebastian Ruder, 2018

Ask your questions on SAP Answers or get started with SAP Conversational AI!
