Text Analytics on the HANA platform

8 March 2016

George Whitby

BI & Analytics Consultant

The HANA platform offers massive benefits to businesses, but one of the lesser-known offerings in the HANA arsenal is its Text Analytics capability.

Text Analytics is a native capability of your HANA box, meaning that if you’re already up and running with your enterprise licence, it’s right there waiting for you. Maybe you have begun to scratch the surface and are using the search capability, or you may have yet to discover this functionality at all. Detailed below are its three key capabilities, which may offer value to your business where you might not have expected it.

Search

Okay, so it may not seem the most glamorous of features, but a speedy and robust search feature can work wonders for user experience. It works by generating a ‘shadow’ index behind the text columns you choose to index, meaning that users essentially have a lightning-fast system glossary, coupled with a ‘Google-esque’ fuzzy search capability.
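
To give a feel for what fuzzy search means in practice, here is a minimal Python sketch. It is not HANA's implementation: it simply scores each glossary term against the query with the standard library's `difflib.SequenceMatcher` and keeps anything above a threshold. The function name `fuzzy_search`, the glossary and the 0.7 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def fuzzy_search(query, terms, threshold=0.7):
    """Return (term, score) pairs that are similar enough to the query, best first.

    Illustrative only: HANA's own fuzzy scoring is not reproduced here.
    """
    q = query.lower()
    scored = [(t, SequenceMatcher(None, q, t.lower()).ratio()) for t in terms]
    hits = [(t, s) for t, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical glossary of indexed terms.
glossary = ["Brentford", "Bradford", "Brighton", "Watford"]

# A misspelt query still finds the intended term at the top of the list.
for term, score in fuzzy_search("Brentfrod", glossary):
    print(f"{term}: {score:.2f}")
```

Raising the threshold makes the search stricter; lowering it returns more (and looser) suggestions.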

These quick search suggestions can provide significant new capabilities when combined with a technique called ‘stemming’. Stemming reduces inflected or derived words to their root form, e.g. ‘searching’, ‘searched’ and ‘searches’ all reduce to ‘search’. Paired with a list of known name variants, the same idea means ‘Brentford F.C.’, ‘Brentford Football Club’, ‘Brentford FC’ and ‘The Bees’ can all resolve to ‘Brentford Football Club’. Users no longer have to be quite as explicit in their requests and can find what they are looking for even with ambiguous or poorly worded queries.
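
A toy Python sketch of the two ideas in play here: stripping suffixes to reach a root form, and mapping name variants onto one canonical name. The suffix list and the `CANONICAL` lookup table are hypothetical and far cruder than a real linguistic stemmer; they are only meant to show the shape of the technique, not how HANA does it.

```python
# Hypothetical suffix list, checked in order; a real stemmer uses many more rules.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters of root."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


# Hypothetical table of name variants and the canonical form they resolve to.
CANONICAL = {
    "brentford f.c.": "Brentford Football Club",
    "brentford fc": "Brentford Football Club",
    "brentford football club": "Brentford Football Club",
    "the bees": "Brentford Football Club",
}


def normalise(name):
    """Map a known variant to its canonical name; leave unknown names untouched."""
    return CANONICAL.get(name.strip().lower(), name)


print([stem(w) for w in ["searching", "searched", "searches"]])
print(normalise("The Bees"))
```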

Text Analysis

Text Analysis (TA) is the ‘fact-extraction’ capability in HANA. This is a powerful engine that helps break down and categorise your documents. It starts with basic grammar (nouns, adjectives) before building up to concepts, and from there to fact extraction, where it classifies the relationships it has found between your words. This leads to the marketer's dream: sentiment analysis.

Last year I worked on an internal Bluefin demo where we used R, an open-source programming language, to ‘analyse’ the sentiment of tweets in the build-up to the election. We attempted to associate these with the parties involved and predict the outcome of particular constituencies. I’ll admit that when I started playing with the fact extraction capability in HANA, I was expecting something only slightly more sophisticated (if a bit more performant). In truth, HANA puts our old demo to shame, in particular because of its ability to distinguish the context in which a word is used.

‘Part of Speech’ (PoS) tagging looks at sentences as a whole and provides grammatical context for words, enabling the system to handle some of the more confusing parts of the English language. For example, in the sentence “The grumpy teenager moped around all day on his moped”, HANA can tell when ‘moped’ is being used as a verb and when it is being used as a noun.
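
To make the idea concrete, here is a deliberately naive Python sketch of context-based tagging. It uses a single rule of thumb: an ambiguous word that directly follows a determiner or possessive is treated as a noun, otherwise as a verb. The `DETERMINERS` set and the `tag_ambiguous` function are assumptions made for illustration, and this is not how HANA's tagger works.

```python
# Hypothetical list of words that usually introduce a noun.
DETERMINERS = {"a", "an", "the", "his", "her", "my", "your", "their", "our", "its"}


def tag_ambiguous(sentence, word):
    """Tag each occurrence of `word` as NOUN or VERB based on the preceding token."""
    tokens = [t.strip(".,!?\"'").lower() for t in sentence.split()]
    tags = []
    for i, tok in enumerate(tokens):
        if tok == word:
            prev = tokens[i - 1] if i > 0 else ""
            tags.append("NOUN" if prev in DETERMINERS else "VERB")
    return tags


sentence = "The grumpy teenager moped around all day on his moped"
print(tag_ambiguous(sentence, "moped"))  # ['VERB', 'NOUN']
```

One rule is enough for this sentence, but it falls over quickly on real text, which is exactly why a proper linguistic engine is worth having.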

In addition to this, you can customise the dictionaries used for the analyses, meaning that you are able to tailor the TA engine to your specific scenarios. For example, if you were to describe a product as cheap, it would be an example of positive sentiment because you are speaking about its good value. However, if you were to describe a person as cheap, all of a sudden the connotations are not so positive.
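
The ‘cheap’ example can be sketched as a lookup keyed on both the word and the kind of thing it describes. The Python below is a hypothetical illustration only: the `SENTIMENT` table, its scores and the entity types are invented for the example and do not reflect the format of HANA's dictionaries.

```python
# Hypothetical custom dictionary: (adjective, entity type) -> sentiment score.
SENTIMENT = {
    ("cheap", "product"): 1,    # good value
    ("cheap", "person"): -1,    # stingy
    ("reliable", "product"): 1,
    ("reliable", "person"): 1,
}


def score(adjective, entity_type, default=0):
    """Look up the sentiment of an adjective in the context of an entity type."""
    return SENTIMENT.get((adjective.lower(), entity_type.lower()), default)


print(score("cheap", "product"))  # 1
print(score("cheap", "person"))   # -1
```

The point of the design is that the same word can carry a different score per context, and unknown combinations fall back to neutral rather than guessing.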

Text Mining

Text mining is a way of understanding documents as a whole. It uses something called ‘vector-space-determination’ to compare documents and categorise them. That means scoring each document along multiple dimensions (typically one per term), which enables not only categorisation but also identification of key words. As a result, it can provide a more contextual (and potentially more relevant) method of searching on subjects and highlighting links between documents with common themes.
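
A minimal Python sketch of the vector-space idea, assuming TF-IDF weighting and cosine similarity. That choice is an assumption for illustration, not a description of HANA's internals: each document becomes a vector with one dimension per term, similar documents point in similar directions, and the heaviest terms in a vector act as its key words. The sample documents are made up.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Turn each document into a {term: weight} vector using TF-IDF."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            # Rare terms get a higher weight; the +1 keeps common terms non-zero.
            term: count * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def key_terms(vector, top_n=3):
    """Return the highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(vector.items(), key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _ in ranked[:top_n]]


# Made-up documents: two complaints about a parcel and one unrelated note.
docs = [
    "late delivery and damaged parcel",
    "parcel arrived late and damaged",
    "great football match last night",
]
vecs = tf_idf_vectors(docs)
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
print(key_terms(vecs[0], top_n=1))  # ['delivery']
```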

Summary

Whilst for a lot of companies it's unlikely that Text Analytics is the reason to purchase that HANA licence, it definitely adds to the business case. Being able to augment your existing analytical reporting with insights from your unstructured data can create links at a more granular and statistically significant level than before. All of a sudden, you can see the ‘sentiment uplift’ from your initiatives and provide some real-world context to your decision making. When you consider what the current generation of social metrics like Net Promoter Scores mean to businesses, Text Analytics doesn't seem so far-fetched.

About the author

George Whitby

BI & Analytics Consultant

George is passionate about enabling data-driven decision making. A ‘hybrid’ consultant who works somewhere between the business, the data and the technology, George is able to bring true Business Intelligence into being.

With knowledge gained from working across multiple industries and covering a range of SAP technologies including BI/BW, HANA, TPx and Predictive Analytics, George specialises in helping global businesses influence sales where they don't necessarily have direct contact with the customer.