Pipelining the architecture:
The article discusses several text-cleaning techniques for pre-processing text and highlights commonly used libraries and standard approaches. Text cleaning involves multiple steps, such as tokenization, stemming, lemmatization, and punctuation filtering. I have tried to cover some of the basic techniques from a broader perspective by including alternative choices for cleaning text in text-mining and NLP applications. I hope the content will be useful, especially if you are a beginner in NLP or a related domain.
1. Removing Punctuation:
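As a minimal sketch of this step, assuming plain ASCII punctuation and using only Python's standard library (the function name `remove_punctuation` is an illustrative choice, not taken from the article):

```python
import string


def remove_punctuation(text: str) -> str:
    """Return `text` with ASCII punctuation characters removed."""
    # The third argument to str.maketrans lists characters to delete.
    table = str.maketrans("", "", string.punctuation)
    return text.translate(table)


if __name__ == "__main__":
    print(remove_punctuation("Hello, world! Isn't NLP fun?"))
```

Note that `string.punctuation` covers only ASCII symbols, so apostrophes inside contractions are removed too, and non-ASCII punctuation such as curly quotes is left in place. Whether that is acceptable depends on the application.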
Original authors of the paper: Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, Luke Zettlemoyer
The article discusses denoising pretraining of a sequence-to-sequence model for natural language generation. I have tried to explain everything from my study clearly, in the hope that every reader will understand the writing and benefit from it. …