John Appleby

Head of Business Analytics & Technology
Bluefin Solutions

Innovation Weekend and the irony of "fixing SCN from within"

19 Oct 2010

I'm not really sure how I ended up at Innovation Weekend 2010 in Vegas.

Background and history

For me it goes back to Hackers night 2009 in Vienna - when Craig Cmehil (our very own Simon Cowell) and I were talking late at night, not long before Craig, Tom Jung and Duane Chaos and I got kicked out the MesseCentrum building. We were discussing the RIA hackers night, which I felt had turned into a late night learning session based on our own laptops. And whilst it was still a cool thing to do, I for one felt that it was missing something. Mostly, real hacking and a deliverable. And real purpose. And what I didn't realise at that time was that Marilyn Pratt had already taken a load of hackers into the BPX slam and those guys were really hacking.

Craig and I kept in touch and whilst I don't know how much I shaped the end product - he probably already had a plan, he usually does - the end result was the Innovation Weekend. An all night hackers night running from Sunday to Monday. I wasn't able to attend in Berlin, and as luck had it the cheap tickets to Vegas were on Saturday, which meant that I was just about recovered from jetlag by Sunday at 1pm.

Now I have to take you back again 6 months to the SAP Inside Track in London that Darren Hague kindly organised. I met a PhD student called Sarah Otner, who was doing a PhD on the recognition system in the SAP Community Network. I loved her passion and interest in the system and she was really frustrated, because she needed data in order to do the mining she needed to do to write her thesis. SAP were blocking her desire to get the data out, either for technical or legal reasons. I don't think that it was an orchestrated attack - but rather that it was the typical problems that you see in a large corporation.

I saw her in Berlin last week and she looked slightly downtrodden - no progress on data in the 6 months since I saw her at SIT London. I felt that for SAP there was no downside - free research and exposure for one of the most exciting community networks in the world.

Fast forward to Vegas

... and I found myself in the amazing Innovation Weekend masterminded by Marilyn Pratt and Craig Cmehil. Without those guys it would be nothing.

They had prepared 8 BPX focussed business cases and one of these was as follows:

8. "Physician: Heal Thyself": Improving the SCN from within!

Posted by: Sarah Otner

GOAL: Improve the recognition systems of SCN by examining the historical data

  • Does the SCN recognition system reward the right kinds of behaviors and contributions?
  • What's the real value of being a Top Contributor?

PROBLEM: Initial attempts to pull the source data already available on SCN into Excel failed as they only returned 10 lines and the same 10 lines upon each request (a problem when one Top Contributor table has 17,000 individuals).

CHALLENGE:

  • A database of community members and their activity year-on-year for as many years as is available.
  • Scrape the Contributor Recognition Program, the Top Contributors' lists, the Topic Leaders' lists, and the Mentors' rosters into a format easily manipulable (by me! :-) ) for analysis.

What next?

Fellow SAP Mentor Thorsten Franz turned up at the table along with a number of other great individuals. And it became clear that this was a pretty easy technology challenge, provided we could get the data. So I set about getting the data whilst Thorsten, Arun, Laurant and others worked on analytics and presentation.

Mounting a DOS on SCN (aka making friends and influencing people)

So it turns out that the only way to get points data out of SCN is to read the RSS feeds on the contributor pages. Only the contributor page version is broken. The company version does however work, and it is possible to see points - by Company by Person by Year by Development Area. Can you see where I am headed?

So if you want to find out the contributors for Bluefinsolutions.com - for 2010 and for Mobile, you can go here.

So all I needed to do was to write a script to get this for all companies, all years and all points areas. Simple, right? Here's the bash script to do it:

for year in $(cat ../year); do
  for devel in $(cat ../devel); do
    for comp in $(cat ../companynames); do
      # Name each output file year,area,company so the name can be appended to the CSV rows later
      wget -O "$year,$devel,$comp" "http://www.sdn.sap.com/irj/sdn/topcontributorsrss?periodId=$year&minimumPointsCount=20&areaIds=$devel&organization=$comp"
    done
  done
done

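Note that I downloaded the lists of years, company names and development areas using the same technique and put them in files - and note that each output filename encodes the year, area and company, so it can later become part of each CSV row. But... the first time round I forgot to escape the & by surrounding it in inverted commas. The shell treats an unquoted & as "run this in the background", so rather than waiting for each wget to finish, the loop opened up 2500 threads (I used the top 2500 companies). And SDN died for 3 hours.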
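For anyone who hasn't been bitten by this before, here is a minimal sketch of the quoting problem. Nothing here touches the network: show_url is a made-up stand-in for wget and example.invalid is a placeholder host, so treat it as an illustration rather than the script I actually ran.

```shell
#!/bin/sh
# Hypothetical stand-in for wget: just reports the argument it was given.
show_url() { printf 'wget would receive: %s\n' "$1"; }

# Unquoted: the shell cuts the line at every '&'. The first piece runs in the
# background with a truncated URL, and the remaining pieces are treated as
# plain variable assignments (areaIds=P, organization=example.com).
unquoted=$(show_url http://example.invalid/rss?periodId=y08&areaIds=P&organization=example.com; wait)

# Quoted: the whole URL arrives as a single argument and the command runs in
# the foreground, so a loop around it waits before starting the next request.
quoted=$(show_url 'http://example.invalid/rss?periodId=y08&areaIds=P&organization=example.com')

echo "$unquoted"
echo "$quoted"
```

Inside a loop, the unquoted form never waits for anything, which is how a few thousand requests can end up in flight at once.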

After SDN came back up I fixed my script and parallelised it by year - so just 8 threads running. It took 10 hours to download all the data into some 180,000 XML files. Thankfully, we have lots of CPU power these days. So I wrote some scripts around that too.
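Parallelising by year can be sketched roughly as below. This is not my original script: fetch_year is a hypothetical placeholder for the inner development-area and company loop, and the eight period labels are assumed for illustration (only y08 appears in the real data shown later).

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Hypothetical placeholder for the nested devel/company wget loop for one year.
fetch_year() {
  echo "done $1" > "$workdir/$1.log"
}

# One background job per year: 8 jobs in total, however many companies there are.
for year in y03 y04 y05 y06 y07 y08 y09 y10; do
  fetch_year "$year" &
done

# Block until every background job has finished.
wait

completed=$(ls "$workdir" | wc -l | tr -d ' ')
echo "years completed: $completed"
rm -rf "$workdir"
```

The point of the design is that the deliberate & sits on the outer loop only, so the number of simultaneous connections is capped at the number of years.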

First, files that are 409 bytes long don't actually have any data in them. So we strip them out the list of files to process as follows:

for a in `find -not -size 409c -print | sed '1d'| cut -c3-100`; do echo $a; done > ../filled
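If you want to convince yourself that the size filter does what it says, here is a small self-contained sketch. The two files are dummies created in a temporary directory, and the find invocation is a slightly tidier variant of the one above (explicit start directory, regular files only), not the exact command I ran.

```shell
#!/bin/sh
demo=$(mktemp -d)

# Dummy files: one exactly 409 bytes (an "empty" feed), one larger.
head -c 409 /dev/zero > "$demo/empty_feed"
head -c 500 /dev/zero > "$demo/full_feed"

# List regular files that are NOT exactly 409 bytes, stripping the leading "./".
filled=$(cd "$demo" && find . -type f ! -size 409c | cut -c3-)

echo "$filled"
rm -rf "$demo"
```

Restricting to -type f means the "." directory entry never shows up, so there is no need to delete the first line of the output.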

And then we strip the XML out, turn it into a flat file and append the filename that relates to it, to each line.

for a in `cat ../filled`; do cat $a | sed '1,9d' | more | sed ':a;N;$!ba;s/<\/title>\n/,/g' | sed 's/<title>//' | sed ':a;N;$!ba;s/<\/link>\n/,/g' | sed 's/<link>//' | sed ':a;N;$!ba;s/<\/description>\n/,/g' | sed 's/<description>//' | sed ':a;N;$!ba;s/<\/pubDate>\n/,/g' | sed 's/<pubDate>//'| sed ':a;N;$!ba;s/<\/scn:rank>\n/,COMPANY/g' | sed 's/<scn:rank>//'| sed ':a;N;$!ba;s/<\/item>\n//g' | sed 's/<item>//'| sed ':a;N;$!ba;s/<\/rss>//g'| sed ':a;N;$!ba;s/<\/channel>//g' | sed '1d' | sed '$d' | sed s/COMPANY/$a/; done >> ../fillout.csv
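The idea behind that monster pipeline is easier to see on a single record. The sketch below is a simplified alternative, not the command above: it assumes each element of an item sits on its own line, the item itself is a mock-up using the values from the sample row below, and the link is a placeholder.

```shell
#!/bin/sh
# Mock-up of one <item>; the element layout is an assumption.
item='<item>
<title>Jon Reed</title>
<link>https://example.invalid/card</link>
<description>80</description>
<pubDate>Tue, 10 Feb 2009 2:43:19</pubDate>
<scn:rank>1</scn:rank>
</item>'

# The filename carries year, development area and company.
file='y08,P,jonerp.com'

# Keep only lines of the form <tag>value</tag>, print the value,
# then join the values with commas and append the filename.
row=$(printf '%s\n' "$item" \
  | sed -n 's/^ *<[^/>]*>\(.*\)<\/[^>]*>$/\1/p' \
  | paste -s -d, -)
row="$row,$file"

echo "$row"
```

One thing to watch: the pubDate value contains a comma of its own, so it takes up two columns in the resulting CSV.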

This gives us a bunch of data that looks like this:

Jon Reed,https://www.sdn.sap.com/irj/servlet/prt/portal/prtroot/com.sap.sdn.businesscard.SDNBusinessCard?u=gLyawsX5bMI%3D,80,Tue, 10 Feb 2009 2:43:19,1,y08,P,jonerp.com

All we do then is convert the data and replace some years and development areas, and we've got a nice big CSV file with people by year, development area and company.

The rest is easy

The rest of our demo was easy - we uploaded the big CSV file into SAP's cloud BI Service - http://bi.ondemand.com and used SAP BusinessObjects Explorer to look at the data. We also used the new beta BUPA dashboarding service which worked pretty well.

Conclusions

Well, we have done what we set out to achieve. We have 7 years of SCN data explorable by most of the metrics that Sarah was looking for. There are some things that were hard to do - especially scraping the master data from SCN business cards and that is a work in progress.

But what we're hoping, and there are a number of us who share this vision, is that as the SCN team start to realise the value of students analysing the data, we are able to break down the walls and make more detailed information available to people like Sarah who want to write PhD theses on the community.

Huge thanks to Marilyn and Craig for making it possible. To Kai and Mark and Chip and everyone from the SCN team who I inconvenienced. Sorry about that.



Comments

John Appleby 27 Oct 2010

A few people have mentioned legal restrictions but no one has been able to specify what. Here are some extracts from the SCN terms of use:

"A. Except for Web sites within SCN which are clearly identified as non-public forums (each a Non-Public Forum), the SCN is intended to be a public forum and You agree not to provide SAP or other Users with any confidential or proprietary information"

"You are permitted to Use the Services only in strict compliance with the terms of this TOU to obtain information, so long as that information is not being gathered for a use in any manner which is or could be detrimental to SAP (unless such use is otherwise protected by law), and/or to provide feedback or other constructive comments to SAP (both positive and negative) and the SCN Community."

Bear in mind that at the point in time that we did this, we had a table full of people from SAP and SCN working with us and we asked permission - it wasn't done behind their backs.

I think though in the end the main thing is this has increased Sarah's profile to the point where she can hopefully engage with SAP to get the data she really needs.

Christian Braukmüller (@cbasis) 27 Oct 2010

Hi John,
great report of your activities!
I fear that you broke some legal restrictions, but from the technical and adventure point of view it is magnificent. Helping Sarah to get further was another great decision.

Commandline rulez!

thx for sharing.
Christian

John Appleby 26 Oct 2010

Sorry Darren of course it was. Kudos to Nigel as well. Of course since it was an unconference it was... unorganised? Whatever it was, it worked great.

My sed skillz aren't so l33t any more, I used to make a living out of them 10 years ago mind you...

Would have loved to have had a non-production version of SDN! There were *cough* enough SCN employees kicking around that might have done the same ;=)

Darren Hague 26 Oct 2010

To give credit where it's due, SAP Inside Track London was co-organised by myself and Nigel James.

Credit to you too for a nice hack, showing off your l33t sed skillz. Another time, give me a call first and maybe I can point you at a non-productive version of SDN? :-)

Cheers,
Darren
