In this tutorial, we will learn about and implement two of the most popular techniques in Natural Language Processing (NLP), stemming and lemmatization, using Python. Python is a popular language for NLP because of its simple syntax and easy-to-run code.
NLP – Natural Language Processing
Natural Language Processing focuses on making computers understand and process human languages. Computers are great at processing and learning from large amounts of structured data, such as spreadsheets. Human language, however, is unstructured, and computers need data in a structured or organized form to work with it.
There exist multiple techniques for NLP, such as Sentiment Analysis, Named Entity Recognition, Stemming, Lemmatization, Bag of words, Term Frequency-Inverse Document Frequency, and Wordcloud.
Stemming in Natural Language Processing(NLP) Python
Stemming is a technique for eliminating affixes from words to obtain their basic form. It’s the same as pruning branches down to the trunk. The stem of the terms eating, eats, and eaten, for example, is eat. Search engines index words using stemming.
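The idea can be sketched in a few lines of plain Python. This is only a toy illustration, not how NLTK's stemmers work: the function name naive_stem, the suffix list, and the minimum stem length of three characters are all assumptions made for this example.

```python
# Toy suffix stripper: an illustrative assumption, not NLTK's algorithm.
SUFFIXES = ("ing", "en", "s")  # hypothetical, hand-picked suffix list


def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix if enough of the word remains."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word  # no rule applied, return the word unchanged


for word in ["eating", "eats", "eaten"]:
    print(word, "->", naive_stem(word))
# eating -> eat
# eats -> eat
# eaten -> eat
```

A rule list this crude breaks quickly (it would cut "glass" down to "glas"), which is why real stemmers use larger, carefully ordered rule sets.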
Let’s look at another example: with a stemmer such as PorterStemmer, the words “friendship” and “friendships” are both reduced to “friendship”.
NLTK provides two well-known stemmers for English: PorterStemmer and LancasterStemmer. Both are available in the nltk.stem module and are easy to use from Python.
PorterStemmer
Let’s start with PorterStemmer and see how it works.
Step 1: Importing the library from NLTK
#importing the PorterStemmer Library
from nltk.stem import PorterStemmer
Step 2: Create a stemmer instance and stem a word
# Create a PorterStemmer instance and stem a single word
porter = PorterStemmer()
print(porter.stem("friendship"))
Output:
friendship
The output is friendship, although one might expect friend. The PorterStemmer algorithm is not based on linguistic analysis; it applies a fixed set of suffix rules in five sequential phases to generate stems, and none of those rules removes the suffix -ship.
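To see this rule-based behaviour on a few more inputs, the following sketch runs PorterStemmer over a small word list. The word list is chosen for this example only; note that a stem such as poni is not a dictionary word.

```python
from nltk.stem import PorterStemmer

porter = PorterStemmer()

# Words chosen for illustration; stems need not be dictionary words.
words = ["friends", "friendships", "running", "ponies"]
stems = {word: porter.stem(word) for word in words}

for word, stem in stems.items():
    print(f"{word} -> {stem}")
# friends -> friend
# friendships -> friendship
# running -> run
# ponies -> poni
```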
Lancaster Stemmer in NLP
Now let’s run the same word through the Lancaster stemmer and see what it returns.
The steps are the same as for PorterStemmer: import the class, create an instance, and call its stem method.
# Import the LancasterStemmer class
from nltk.stem import LancasterStemmer
# Create a LancasterStemmer instance and stem the same word
lancaster = LancasterStemmer()
print(lancaster.stem("friendship"))
Output:
friend
Given the input friendship, LancasterStemmer returns friend, which is the result we expected here. However, LancasterStemmer is a heavy (aggressive) stemmer: it applies its rules iteratively, so over-stemming can occur and the resulting stem may not be a meaningful word.
There are several other stemmers you can explore yourself; try SnowballStemmer from nltk.stem.
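As a starting point, here is a minimal sketch of the Snowball stemmer in use. Unlike the two stemmers above, it takes a language name when it is created; the sample words are chosen for this example only.

```python
from nltk.stem import SnowballStemmer

# Class attribute listing the language names the stemmer accepts.
print(SnowballStemmer.languages)

# The language name is the first argument.
snowball = SnowballStemmer("english")

for word in ["running", "friendship", "studies"]:
    print(word, "->", snowball.stem(word))
```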
Lemmatization in Natural Language Processing(NLP) Python
Unlike stemming, lemmatization does not truncate the word. Instead, it looks the word up in a dictionary and returns its base form (the lemma). It therefore requires a dictionary for the particular language in order to generate output.
This lookup makes lemmatization slower than stemming, but the results are more accurate. If accuracy matters more than speed, lemmatization is the better option.
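The difference between truncating and looking up can be sketched with a toy dictionary. The lookup table below is a hypothetical, hand-written stand-in for a real lexical database such as WordNet, and toy_lemmatize is a name invented for this example.

```python
# Hypothetical mini-dictionary mapping inflected forms to their lemmas.
# A real lemmatizer consults a full lexical database such as WordNet.
TOY_LEMMAS = {
    "studies": "study",
    "mice": "mouse",
    "was": "be",
}


def toy_lemmatize(word):
    """Return the dictionary form if known, otherwise the word unchanged."""
    return TOY_LEMMAS.get(word, word)


for word in ["studies", "mice", "was", "phone"]:
    print(word, "->", toy_lemmatize(word))
# studies -> study
# mice -> mouse
# was -> be
# phone -> phone
```

No suffix rule could turn mice into mouse; only a lookup can, and that is the extra work a lemmatizer does.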
Let’s look at the implementation of Lemmatization.
Step 1: Import libraries
# Import the lemmatizer; it needs the WordNet corpus, downloaded once
import nltk
from nltk.stem import WordNetLemmatizer
nltk.download('wordnet')
Step 2: Create the lemmatizer and apply it to each word
lemmatizer = WordNetLemmatizer()
words = ['articles', 'friendship', 'studies', 'phones']
for word in words:
    print(lemmatizer.lemmatize(word))
Output:
article
friendship
study
phone
The output for the words articles, friendship, studies, and phones is article, friendship, study, and phone. Each result is a valid dictionary word, which shows why lemmatization is generally more accurate than stemming.
Lemmatization can produce different output for different parts of speech (POS), such as verb (v), noun (n), adjective (a), and adverb (r). The default POS value is noun, so the example above treats every word as a noun.
Let’s try a different POS value: verb (v).
from nltk import WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
words = ['be', 'is', 'are', 'were', 'was']
# Change the pos value to verb ('v')
for word in words:
    print(lemmatizer.lemmatize(word, pos='v'))
Output:
be
be
be
be
be
With the POS value set to v, every one of these verb forms is reduced to “be”.
Comparison between Stemming and Lemmatization
The table below will help you understand the difference between PorterStemmer, LancasterStemmer, and Lemmatization. It sheds light on the different results generated by each of the three algorithms.

In this table, we can see how the results differ for each algorithm.
Summary
To conclude, in this tutorial we learned about two widely used NLP techniques, stemming and lemmatization. Stemming is fast, but it is not reliable when accuracy is the priority. Lemmatization is slower than stemming, but it generally produces more accurate results.
In upcoming tutorials, we will explore other NLP techniques that may look complicated but are easy to learn and understand.
We hope this article helps you understand these NLP techniques.