Google Scholar, Scopus and the h-index – a social science micro-study

May 22, 2015

Like it or not, the h-index and citation counting season is upon us again. Recently I had an interesting case brought to my attention: a Massey researcher in the social sciences who had a pretty reasonable h-index in Scopus and a much better one in Google Scholar. At one time it would have been easy to dismiss this as sloppiness on Google Scholar's part, but in recent years that problem seems largely to have been fixed, so differences in coverage seemed the likelier explanation. To test this I looked at one article that had 21 citations in Scopus and 56 in Google Scholar. This is what I found:

Citations found by Scopus but not by Google Scholar – 1
Citations found by both Scopus and Google Scholar – 20
Citations in journals not covered by Scopus – 2
Citing articles missing from Scopus (i.e. the journals that the citing articles appeared in are covered by Scopus but the articles themselves were missing) – 3
Citing articles present in Scopus but with no lists of references (i.e. the Scopus record does not generate a citation count) – 5
Citations in Masters theses – 3
Citations in PhD theses – 6
Citations in books – 7
Citations in working papers – 2
Citations in conference papers – 1
Citation in a PowerPoint presentation – 1
Citations in foreign languages I was unable to interpret – 4
False citations (when I checked the document the article had not in fact been cited) – 2

If we discount the PowerPoint presentation and the two false hits, and give the benefit of the doubt to the four foreign-language publications, this still comes to over 50 bona fide citations (53), roughly a 150% increase on the Scopus figure. Interestingly, only a small part of the difference (2 articles) was due to the limits of Scopus’s journal coverage, while a greater effect (5 articles) came from Scopus’s practice of not including reference lists (from which citation counts are generated) with in-press articles. Articles that should have been in Scopus but were simply missing (3) also reduced the Scopus count; had all the potential Scopus citations been generated correctly, the figure would have been 29 rather than 21.
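To make the arithmetic easy to check, here is a minimal Python sketch that encodes the tallies listed above. It assumes the categories are mutually exclusive, which the totals bear out (20 + 1 = 21 for Scopus, and everything except the Scopus-only item sums to 56 for Google Scholar); the category names are my own labels, not anything used by either service.

```python
# Tally of the citing documents found for one article (counts from the list above).
# "both" = found by Scopus and Google Scholar; "scopus_only" = found by Scopus alone;
# every other category was found by Google Scholar only.
tally = {
    "scopus_only": 1,
    "both": 20,
    "journal_not_in_scopus": 2,
    "missing_from_scopus": 3,
    "no_reference_list_in_scopus": 5,
    "masters_thesis": 3,
    "phd_thesis": 6,
    "book": 7,
    "working_paper": 2,
    "conference_paper": 1,
    "powerpoint": 1,
    "foreign_language": 4,
    "false_citation": 2,
}

scopus_count = tally["scopus_only"] + tally["both"]         # Scopus total
scholar_count = sum(tally.values()) - tally["scopus_only"]  # Google Scholar total

# Drop the PowerPoint and the two false hits; keep the foreign-language items.
bona_fide = scholar_count - tally["powerpoint"] - tally["false_citation"]
increase_pct = 100 * (bona_fide - scopus_count) / scopus_count

# What Scopus could have counted: add the missing articles and the
# records that lack reference lists.
potential_scopus = (scopus_count
                    + tally["missing_from_scopus"]
                    + tally["no_reference_list_in_scopus"])

print(scopus_count, scholar_count, bona_fide, round(increase_pct), potential_scopus)
# prints: 21 56 53 152 29
```

The exact increase is about 152%, which rounds comfortably to the 150% quoted.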

In the past Google Scholar has been strongly criticised for the inaccuracy of its citation counts, for example by including duplicate versions of the same article, each adding to the citation counts of every document it cites, or by counting citations from undergraduate student essays harvested by the Google Scholar spider. No such examples were found in this case. The two “phantom” citations are a concern but are not hugely significant here.

So where does this leave the Google Scholar h-index? I’m going to be really hard and eliminate the two working papers and the conference paper, as there is no evidence of their peer-review status, and I’ll also drop one of the foreign-language citations just in case. Together with the PowerPoint presentation and the two false hits, that puts the inflation of the Google Scholar count at around 12% for this article (7 of 56). If I apply this to the Google Scholar h-index for this researcher, discounting the citation counts for all of their articles by this amount, it drops by only two points, from 26 to 24.
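For anyone wanting to repeat the discounting exercise on their own record, here is a minimal Python sketch. The h-index function follows the standard definition (the largest h such that h papers each have at least h citations). How the discounted counts were rounded isn't stated, so rounding down to whole citations is an assumption, and the citation list is a made-up example rather than the researcher's actual record, so it will not reproduce the 26 to 24 figure.

```python
import math

def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

def discounted(citations, rate):
    """Scale every count down by `rate`, rounding down to whole citations (an assumption)."""
    return [math.floor(c * (1 - rate)) for c in citations]

# Share of this article's Google Scholar citations dropped in the strict pass:
# PowerPoint (1) + false hits (2) + working papers (2) + conference paper (1)
# + one foreign-language item (1), out of 56.
rate = (1 + 2 + 2 + 1 + 1) / 56  # 0.125

# Hypothetical citation counts, for illustration only.
sample = [40, 25, 12, 9, 8, 6, 3, 1]
print(h_index(sample), h_index(discounted(sample, rate)))
# prints: 6 5
```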

It would be tempting to do the same in reverse to the Scopus figures and inflate the citation counts by around 40%, but I’m not sure this would be admissible, as many of the missing Scopus citations will show up in the fullness of time once in-press articles are linked to their full reference lists. However, the fact that three citing articles were simply missing from Scopus points to a more disturbing coverage problem, one that the comparison with Google Scholar highlights. It is well known that Scopus does not include all the articles it is supposed to, and this has an obvious if unquantifiable effect on the reliability of its metrics.
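The “around 40%” figure appears to come from the corrected Scopus count for this article (21 found, plus 3 missing articles and 5 records without reference lists). Assuming that is its basis, the arithmetic is:

```latex
\frac{(21 + 3 + 5) - 21}{21} = \frac{8}{21} \approx 0.38 \approx 40\%
```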

So this is an unrepresentative, unscientific, back-of-the-beer-mat, rule-of-thumb calculation, but it highlights a couple of points about the h-index as an indicator of long-term impact:

1) There is no such thing as the h-index, there are a number of them and they are all wrong.
2) For the social sciences (and also the humanities, and probably business as well), Google Scholar’s superior coverage makes its h-index a more realistic measure.

Bruce White
