NLP for all: MeaningCloud

December 15, 2019

Imagine that you work at a large company with a huge amount of feedback stored on its customer service platform, or at a telephone company that regularly sends satisfaction surveys to its customers. In both cases, the customer service, marketing, communication, or sales departments will receive tens, hundreds, or thousands of responses per wave, time and again.

Analyzing each response and extracting relevant information to improve the business is important, but it is also time-consuming and tedious, both because of the high volume of text and because new data are constantly being collected. Overall, the analysis requires a lot of company resources.

You may also need to detect plagiarized text within a collection of documents to find out if there is any kind of fraudulent activity in tenders or exams. Or you may need to know what is being said about a person or an entity in Twitter’s huge feed when an issue becomes a trending topic. How long would it take us to compare hundreds of documents one by one? Would it be realistic to do so within the deadline of a tender? Would we be able to read all tweets in detail?

The common factor in these situations is that we must deal with unstructured content that needs to be classified and analyzed to extract relevant information. This task can be automated thanks to MeaningCloud, Sngular's great text analytics tool.

What are MeaningCloud’s functionalities?

As we have seen, the application areas of MeaningCloud are many and varied. The platform is an ideal solution for the following situations.

Document Structure Analysis

We are not always lucky enough to find structured documents that feature an index, which means that we must scroll down to the end of each document to get an idea of its structure.

This MeaningCloud API can extract titles, section and subsection headers, as well as recipients, senders, or email subjects automatically, helping us to understand the content’s structure.

This can be extremely useful for managing an organization’s knowledge base, which may consist of hundreds of documents; for adding a description of a publication’s structure, making it more exploitable and valuable; or for detecting suspicious patterns in compliance applications by analyzing the structure of a collection of emails.

Sentiment Analysis

Perhaps you’re more familiar with the concept of opinion mining; be that as it may, Sentiment Analysis is a powerful API for identifying and extracting subjective information from social media content, satisfaction surveys, product reviews in forums, or any other medium. Every source of customer insight and every interaction at a customer touch point can be examined to improve the user experience.

Thanks to natural language processing, text analytics, and computational linguistics, we can deal with large volumes of text more efficiently. In addition, we can design monitoring tools that raise alerts in real time.

Deep Categorization

This API is the best option when you need to categorize large amounts of different types of text. Thanks to Deep Categorization, you will be able to analyze the meaning of textual content in detail or extract contexts that reflect the relations among topics and subtopics, based on predefined rules or on rules adapted to specific domains.

These rules cover not only lexical and grammatical levels but also the semantic aspects based on morphosyntactic and deep semantic analysis. In this way, thanks to the patterns we define and the possibility of integrating custom dictionaries, the semantic categories of an ontology are matched and displayed.

In sum, this API enables us to create complex rules that extract the meaning of any expression, going well beyond literal expressions. Deep Categorization rules can be applied to analyze the voice of the customer and the voice of the employee, or to process documents in a deeper way and “understand” their contents in detail and with high accuracy.

Text Classification

This functionality helps you automatically classify any content into categories to facilitate its management, grouping, and filtering. It is a potent tool for search engines, media, or online stores since their content or products will be perfectly ordered according to a hierarchical classification or taxonomy, making search and browsing easier.

How does it classify? Text Classification employs several standard classification models, but custom classification models can also be added. One of the standard ones is the IPTC (International Press Telecommunications Council) model, which is used by media outlets to organize pieces of news by labeling them with one or more of the 1300 available categories. Another model follows the IAB (Interactive Advertising Bureau) standard and can be employed for positioning ads according to their relevance (this model can be customized too).

This API can also be used for content search and recommendation systems or for classifying medical, legal, or financial records.

Text Clustering

If you have a collection of documents, this API enables you to discover their most frequent topics, distributing them in different groups or clusters. The system finds the similarities and differences between the items and organizes them according to their contents, i.e., without the need to predefine any list of categories.

Text Clustering processes unstructured content and groups it according to its relevance to the most frequent subjects that appear in a set. The main difference with respect to the Text Classification API is that the latter needs a previously defined taxonomy, whereas Text Clustering analyzes a set of documents and distributes them in groups according to their similarities, without depending on a previous model. Both tools are complementary.

We use clustering to retrieve information and create recommendation systems, analyze feedback, and perform opinion mining, document classification, or media monitoring tasks.
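
To make the contrast with Text Classification more concrete, here is a minimal sketch of grouping documents without any predefined list of categories. It is not MeaningCloud's algorithm: the function names, the Jaccard word-overlap measure, and the 0.2 threshold are all assumptions chosen only for illustration.

```python
import re


def tokens(text):
    """Lowercase a text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two token sets (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(docs, threshold=0.2):
    """Greedy clustering: a document joins the first cluster whose seed
    (first member) is similar enough; otherwise it starts a new cluster.
    The threshold is an illustrative assumption, not a recommended value."""
    clusters = []  # each cluster is a list of document indices
    for index, doc in enumerate(docs):
        doc_tokens = tokens(doc)
        for members in clusters:
            seed_tokens = tokens(docs[members[0]])
            if jaccard(doc_tokens, seed_tokens) >= threshold:
                members.append(index)
                break
        else:
            clusters.append([index])
    return clusters


feedback = [
    "battery life is great",
    "great battery life",
    "delivery was late",
    "late delivery again",
]
print(cluster(feedback))
```

Note that no category names appear anywhere in the code: the groups emerge from the similarity between the texts themselves, which is the key difference from a taxonomy-based classifier.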

Topics Extraction

Thanks to an ontology with more than 200 classes, Topics Extraction extracts relevant information from any text and recognizes entity types such as people, places, organizations, and products. The information is then labeled and displayed, showing its semantic fingerprint.

It also detects relevant concepts and data such as dates, telephone numbers, monetary amounts, or e-mail addresses. It can be customized by adding personal dictionaries of entities and concepts.

The annotation, classification, and disambiguation of entities improve information search, search engine positioning, and related content recommendation, to name a few uses.

Summarization

Finally, with MeaningCloud you can automatically summarize documents, extracting their most relevant sections. This way, you can decide in seconds whether a document is worth reading or not.

The API identifies the most relevant sentences in a document and generates a synopsis. In other words, it extracts the main ideas and distributes them in a number of sentences or paragraphs, as we can customize the length of the summary. It is ideal for media monitoring, knowledge management, and content publishing.
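
The idea of picking out the most relevant sentences can be illustrated with a classic word-frequency heuristic. The sketch below is only an assumption-laden toy, not the method MeaningCloud uses: the stopword list, the `summarize` function, and its `num_sentences` parameter are all hypothetical.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
             "it", "for", "on", "with", "that", "this", "was"}


def content_words(text):
    """Lowercased words of a text, with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOPWORDS]


def summarize(text, num_sentences=2):
    """Keep the num_sentences highest-scoring sentences, in original order.
    A sentence's score is the average document-wide frequency of its words."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)
                 if s.strip()]
    frequency = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        return sum(frequency[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore reading order
    return " ".join(sentences[i] for i in chosen)


review = ("The battery is the best part of this phone. "
          "Battery life easily covers two days. "
          "The box was slightly damaged.")
print(summarize(review, num_sentences=1))
```

Changing `num_sentences` plays the role of the customizable summary length mentioned above.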

Corporate Reputation

The Corporate Reputation API analyzes opinions about a company or organization. It combines three tasks: Topics Extraction, Sentiment Analysis, and Text Classification.

Based on text analysis, this API identifies which organizations are mentioned, and which reputational topics are discussed (for example, their innovation initiatives or their position with respect to a given social cause) and assigns polarity (positive, negative, or neutral) to the detected entities. The aim is to discover a company’s reputation based on what is being said about it.

Among the APIs we have seen, Topics Extraction, Sentiment Analysis, Deep Categorization, and Corporate Reputation seem to be very powerful tools able to provide valuable business insights. What do they have in common? First and foremost, the fact that they can be 100% customized and improved by integrating user dictionaries. We'll tell you more below.

Predefined Dictionaries and User Dictionaries

MeaningCloud's technology supports the integration of generic or domain-specific dictionaries to add morphological and semantic features to the platform's generic lexical resources. Custom dictionaries may contain two types of entries: entities, which are real-world objects with a proper name (Entity = Barcelona), and concepts, which are terms or keywords that represent classes of entities (Concept = city).

Let's imagine that we need to extract terms from a domain-specific text such as a letter to the shareholders of an English-speaking company. If we use the Topics Extraction API with a generic dictionary, the results will probably be quite poor. Specific dictionaries such as FIBO (Financial Industry Business Ontology) enable us to cover this more specialized terminology. With this dictionary, we can recognize the terms and keywords defined in the FIBO Vocabulary and link them to the corresponding node of the FIBO ontology, adding information such as their definition and origin.

But there is more. We are not working with open-access text, but with a knowledge base related to our business. In addition to generic financial terminology, we need to integrate into the model the names or references of the clients or the companies we do business with. To improve coverage and accuracy, the general resources can be complemented with user dictionaries tailored to our needs. Indeed, we can create our own ontology!

Another example: a company may want to detect the names of the authors of the documents stored in its database. Unlike the case of well-known writers, we could not rely on a general resource here. How can we approach this project? It would be as simple as building a user dictionary with the names provided by the company, to which we would add an AUTHOR feature or tag. If we integrate this dictionary into the API we intend to use, it will detect any instance of a dictionary entry and show it in the output. It's fantastic!
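
Conceptually, the AUTHOR example boils down to looking up dictionary entries in a text and returning them with their tag. The following sketch shows that idea in plain Python. It does not use MeaningCloud's dictionary format or API: the `USER_DICTIONARY` structure, the `tag_entities` function, and the sample names are all made up for illustration.

```python
import re

# Hypothetical user dictionary: surface form -> tag. The names are invented.
USER_DICTIONARY = {
    "Jane Doe": "AUTHOR",
    "John Smith": "AUTHOR",
}


def tag_entities(text, dictionary):
    """Return (matched text, tag, start offset) for every dictionary entry
    found in the text as a whole word or phrase, ignoring case.
    Overlapping entries are not resolved in this simple sketch."""
    hits = []
    for form, tag in dictionary.items():
        pattern = r"\b" + re.escape(form) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.group(0), tag, match.start()))
    return sorted(hits, key=lambda hit: hit[2])  # order of appearance


sample = "Report drafted by Jane Doe and reviewed by John Smith."
for matched_text, tag, offset in tag_entities(sample, USER_DICTIONARY):
    print(f"{offset:>3}  {tag}  {matched_text}")
```

A production system would add what a plain lookup lacks, such as morphological variants and disambiguation, which is precisely where the platform's linguistic resources come in.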

In short, the generic lexical resources used by MeaningCloud APIs are designed to work with general-purpose texts, so they have wide coverage. Sometimes, however, our business needs require us to customize the output of certain APIs for a specific task. To do this, we can use the dictionaries predefined by MeaningCloud (when they fit our case) or dictionaries created by the user.

If you are interested in MeaningCloud and want to talk to our team, contact us and we will schedule a meeting to show you all the details.
