In December 2013 SAP released SAP HANA SP07 and SAP River, which is designed to be a rapid development platform for apps. I thought I'd challenge myself to build an app in a day. Thanks to Matthias Steiner, SAP have opened up the data within their SAP Community Network website to a beta community.
So I thought - can I extract data out of SCN into SAP River and then use Lumira to answer the question: who are the movers and shakers in the SAP HANA community?
What is SAP River?
I wrote a post covering this in more detail, but in essence River is a development language for SAP HANA that allows fast, descriptive development. I described the SCN entities (Spaces, People and Content) and how they relate to each other in around one page of code and 30 minutes' work.
River then automatically generates the HANA objects for you - with no fuss. So I've now loaded information about the 210 SCN Spaces, the 7000 people who have written blogs or documents, and over 40,000 pieces of content. (SCN has a lot more users than this, but for the purposes of this exercise I've assumed that discussion forums don't influence people.) Each blog or document is stored in full in a HANA object by River.
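To give a feel for the shape of the model, here is a rough Python stand-in for the three entities and how they link together. This is not River syntax and not the code I actually wrote; the class names, field names and the `content_by_author` helper are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Space:
    # Hypothetical fields: an SCN Space such as "ABAP Development"
    space_id: int
    name: str


@dataclass
class Person:
    # Hypothetical fields: an author of blogs or documents
    person_id: int
    display_name: str


@dataclass
class Content:
    # One blog or document, stored in full, linked to its author and Space
    content_id: int
    kind: str  # "blog" or "document"
    title: str
    body: str
    published: date
    author: Person
    space: Space
    views: int = 0
    likes: int = 0
    replies: int = 0


def content_by_author(items: List[Content], person_id: int) -> List[Content]:
    """Return every piece of content written by the given person."""
    return [c for c in items if c.author.person_id == person_id]
```

The point is only that Content sits in the middle and points at one Person and one Space, which is what lets us later slice views, likes and replies by either.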
Why use SAP HANA for this?
Well, as George Mallory said, "because it's there". But also because HANA has a text analysis engine. We issue one command, and it builds a Google-style text index including Voice of the Customer sentiment analysis. There is no need for a separate text analysis engine alongside an analytics engine. In fact, in my opinion, SCN should run on HANA because then SAP could do all this analysis in one place, but that's a whole other conversation.
In addition, we can build HANA Information Views on top of this model, to allow us to do rapid ad-hoc analysis using Lumira. Yes, we can integrate information about Spaces, People, Content (blogs + documents) and the HANA Sentiment Analysis. We can include a Time dimension so that analysis over time is very efficient. This whole model was built in one day. Let's get started!
In this first graphic we pull total views by Year by Quarter. SCN had a new platform at the end of 2011, and so they probably measure views differently. That's probably the cause of the very large spike in content. Despite this, we see that Q4 2013 was the most popular quarter of all time for SCN.
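If you want to see what that rollup amounts to, here is a minimal Python sketch of the same idea. In my model the work is done by the HANA views and the Time dimension, not by Python; the function names are made up, and I'm assuming each row is simply a publish date plus a view count.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Tuple


def quarter_of(d: date) -> int:
    """Map a date to its calendar quarter, 1 to 4."""
    return (d.month - 1) // 3 + 1


def views_by_quarter(
    rows: Iterable[Tuple[date, int]]
) -> Dict[Tuple[int, int], int]:
    """Sum view counts per (year, quarter).

    Assumption: each row is (publish date, total views) for one
    piece of content.
    """
    totals: Dict[Tuple[int, int], int] = defaultdict(int)
    for published, views in rows:
        totals[(published.year, quarter_of(published))] += views
    return dict(totals)
```

Feed it one row per blog or document and you get back one total per year and quarter, which is all the chart is plotting.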
In 2013, ABAP Development was still the most popular area of SCN - that's a testament to the huge install base of SAP ERP customers. There are a bunch of BI areas. We can see that HANA is pretty popular - 430k views this year.
Let's drill into this a bit. Now it's getting interesting. Dick Hirsch gets more reads than anyone else on SCN, followed by yours truly. Vivek Singh Bhoj has fewer views but a ton of replies and likes.
If we drill into this, we can see that Vivek wrote one very popular developer blog which got a lot of likes. Interesting!
Now, let's filter by this interesting group of people and see how popular they are around the rest of SCN. It paints a different picture: Thomas Jung is now by far the most read author - this is because he mostly blogs in a different space, not the HANA space. It also brings in a fourth influencer that I didn't expect:
If we look at Andy, we see again that he had one very popular piece of content for Basis consultants. It's clear that the SCN audience loves reference guides!
Now let's go and take a look at the text analysis data - containing over 1m keywords and sentiment, which we have linked into the same model. This gets updated in real-time, as data arrives. Let's take a look at the sentiment for our top 10 authors!
It's no surprise that our top influencers are mostly Weak Positive and Strong Positive. Strongly positive content seems to get more views. Let's drill into the positive content by influencer and see what we find...
Interestingly, Dick Hirsch is the most positive (followed by myself). More interestingly, SAP employee Thomas Jung is quite neutral, but knowing Tom, this makes sense as he writes very factual content. Let's see how the positivity of influencers changes over time:
This is fascinating: the most views happen in Q2, when SAP's SAPPHIRE conference takes place. But much more interestingly, the top 10 influencers have been getting more positive through 2013 - peaking in Q4. Let's compare this to the total sentiment:
And here's the most fascinating thing of all: overall, there are more page views during Q2 (and more in 2013 than 2012 or 2011), but for the overall audience, sentiment has remained neutral.
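For the curious, the sentiment comparison boils down to counting sentiment labels per author and looking at the positive share. Here is a small Python sketch of that calculation. It is an illustration only: the label strings and the `positive_share` function are made up, and in my model HANA's text analysis produces the labels and the views do the counting.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical label names, standing in for the sentiment classes
POSITIVE = {"StrongPositive", "WeakPositive"}


def positive_share(rows: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return, per author, the fraction of sentiment hits that are positive.

    Assumption: each row is (author name, sentiment label).
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for author, label in rows:
        counts[author][label] += 1

    shares: Dict[str, float] = {}
    for author, labels in counts.items():
        total = sum(labels.values())
        positive = sum(n for label, n in labels.items() if label in POSITIVE)
        shares[author] = positive / total
    return shares
```

Run the same function over the top 10 authors and then over everyone, per quarter, and you have the two charts being compared.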
I was thoroughly impressed that it's possible to build and load an SAP River model in one day, including numeric and text-based analysis. Some thanks must go to Matthias Steiner, for helping me with access to the SCN data.
For me, the main lesson was that the real value of this platform is how easy it is to build complex apps using SAP HANA and SAP River. It would be easy to extend this app into something much more, and the SAP HANA platform allows you to easily build HTML5 apps on top of this data.
No doubt this will cause some conversation around the meaning of influence. This analysis only takes into account SCN data, and not this website (if someone will give me access to the API, I'll include it). Amusingly, it doesn't include any of the major influencers within SAP like Hasso Plattner, Vishal Sikka and Franz Faerber. That, of course, is the nature of this sort of analysis.
*This post was first published on saphana.com on 31/12/13*