Natural language processing (NLP) is a subfield of artificial intelligence concerned with analysing and processing the interactions between computers and human (natural) languages. Speech recognition, natural language understanding, and natural language generation are some of its applications. Tokenization, stemming, lemmatization, and part-of-speech tagging are some of the steps involved in a natural language processing pipeline.
Part-of-speech tagging, also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text as corresponding to a particular part of speech, i.e. identifying words as nouns, verbs, adjectives, adverbs, and so on.
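To illustrate the idea, here is a minimal rule-based tagger in plain Python. The tiny lexicon and the suffix heuristics are invented for this example; real taggers (such as NLTK's `pos_tag`) use trained statistical models over far larger tag sets.

```python
# Toy part-of-speech tagger: look the word up in a small lexicon,
# and fall back on common English suffix patterns.
LEXICON = {
    "the": "DET", "a": "DET", "dog": "NOUN",
    "runs": "VERB", "happy": "ADJ",
}

def tag(tokens):
    tagged = []
    for tok in tokens:
        word = tok.lower()
        if word in LEXICON:
            tagged.append((tok, LEXICON[word]))
        elif word.endswith("ly"):
            tagged.append((tok, "ADV"))   # adverbs often end in -ly
        elif word.endswith("ing") or word.endswith("ed"):
            tagged.append((tok, "VERB"))  # common verb inflections
        else:
            tagged.append((tok, "NOUN"))  # default guess
    return tagged

print(tag(["The", "dog", "runs", "quickly"]))
# → [('The', 'DET'), ('dog', 'NOUN'), ('runs', 'VERB'), ('quickly', 'ADV')]
```

The heuristics are deliberately crude; the point is only that a tagger maps each token to a category label.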
Tokenization is the task of chopping a text up into pieces, called tokens, and in the process we can throw away certain characters, such as punctuation. Tokens are often referred to as terms or words. A token is an instance of a sequence of characters in a particular document that are grouped together as a useful semantic unit for processing.
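A minimal sketch of this, using only the standard library: split on word characters with a regular expression, which discards punctuation as a side effect. Production tokenizers (NLTK's `word_tokenize`, spaCy's tokenizer) handle contractions, hyphens, and other edge cases that this one-liner ignores.

```python
import re

def tokenize(text):
    # \w+ keeps runs of letters/digits and drops punctuation between them
    return re.findall(r"\w+", text.lower())

print(tokenize("Hello, world! NLP is fun."))
# → ['hello', 'world', 'nlp', 'is', 'fun']
```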
Stemming is the process of reducing inflected (and sometimes derived) words to their word stem, base, or root form, generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if that stem is not itself a valid root. In practice, stemming is often implemented as a suffix-stripping algorithm.
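A suffix-stripping stemmer can be sketched in a few lines. The suffix list below is invented for illustration; real stemmers such as NLTK's `PorterStemmer` apply a much more careful sequence of rewrite rules.

```python
# Toy suffix-stripping stemmer: remove the first matching suffix,
# but only if a reasonably long stem remains.
SUFFIXES = ["ation", "ness", "ing", "es", "ed", "s"]

def stem(word):
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([stem(w) for w in ["running", "jumps", "happiness", "cats"]])
# → ['runn', 'jump', 'happi', 'cat']
```

Note that "runn" and "happi" are not valid English roots, which illustrates the point above: the stem only needs to be a consistent key that related inflections share.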
Lemmatisation is the process of grouping together the inflected forms of a word so they can be analysed as a single item, identified by the word’s lemma, or dictionary form. It is closely related to stemming, but unlike a stemmer, which operates on a single word without knowledge of context, lemmatisation depends on correctly identifying the intended part of speech and meaning of a word in a sentence, and can therefore discriminate between words that have different meanings depending on part of speech.
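The part-of-speech dependence can be shown with a toy lemma lookup keyed on (word, tag). The dictionary here is a hand-made stand-in; a real lemmatizer such as NLTK's `WordNetLemmatizer` consults a full lexical database.

```python
# Toy lemmatizer: the same surface form maps to different lemmas
# depending on its part of speech.
LEMMAS = {
    ("meeting", "NOUN"): "meeting",  # "the meeting starts at noon"
    ("meeting", "VERB"): "meet",     # "we are meeting tomorrow"
    ("better", "ADJ"): "good",
    ("ran", "VERB"): "run",
}

def lemmatize(word, pos):
    # Fall back to the word itself when it is not in the dictionary
    return LEMMAS.get((word, pos), word)

print(lemmatize("meeting", "NOUN"))  # → meeting
print(lemmatize("meeting", "VERB"))  # → meet
```

This is exactly the distinction a stemmer cannot make: it would reduce "meeting" the same way in both sentences.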
Sentiment analysis is a widely used application of natural language processing. It extracts subjective information, usually from a set of documents such as online reviews, to determine the “polarity” of opinions about specific objects. It is especially useful for identifying trends of public opinion on social media, for example for marketing purposes. Sentiment analysis draws on text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information.
A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level: whether the opinion expressed in a document, a sentence, or an entity feature/aspect is positive, negative, or neutral. TextBlob is a popular library for sentiment analysis; packages like NLTK, Rasa NLU, spaCy, and the Google Cloud Natural Language API are also widely used.
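As a dependency-free sketch of polarity classification, the snippet below scores a sentence against a tiny hand-made lexicon of positive and negative words. The word lists are invented for this example; libraries like TextBlob ship large weighted lexicons or trained models and return graded polarity scores rather than three coarse labels.

```python
import re

# Hand-made polarity lexicon for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "sad"}

def polarity(text):
    tokens = re.findall(r"\w+", text.lower())
    # +1 per positive word, -1 per negative word
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("I love this great phone"))  # → positive
print(polarity("The battery is terrible"))  # → negative
```

Simple lexicon counting like this fails on negation ("not good") and sarcasm, which is one reason the statistical and neural approaches in the libraries above exist.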