Citations and Circulation Counts – Not Really Holiday Reading

December 22, 2016

If you think that an article called Citations and Circulation Counts – Data Sources for Monograph Deselection in Research Library Collections doesn’t sound like terribly exciting reading for the holiday break you may well be right. But one thing that does sometimes get people excited (well, worked up) about libraries is when they THROW OUT OLD BOOKS!!! Especially when those old books are not just any old books but seminal scholarly works that have influenced a generation of researchers in their field and changed the course of research and someone who doesn’t know anything about the subject in question has decided to retire them because they just look like, erm, old books.

University libraries buy many thousands of books every year (increasingly in electronic format and as part of packages rather than as individual titles) and, although they are all intended to support teaching or research, not all of them are timeless contributions to scholarship. And neither should they be. Many are aimed at the needs of undergraduate students or deal with the “topic of the month” and as time passes they become dated and even misleading and cease to merit their place on library shelves. In order to create space for newer titles, and to give them the prominence they deserve, libraries periodically remove older titles from their shelves and either place them in storage or dispose of them altogether. Personally I hate the term “weeding” because it suggests that some books are noxious when really they’re just old, but that’s what it’s called. Sorry.

The question then arises as to how the decision is made to remove a book from library shelves or retain it. In the distant past librarians looked at issue slips and date stamps to see how often a book had been borrowed and when it was last returned, and as library systems automated this information became available as DATA in the form of circulation counts and last-used dates. Since at least the 1970s there has been a strong consensus among librarians that weeding, which I euphemistically call deselection, should be based primarily on these system-derived data, along with the publication date, as being the best indicator of a book’s value to library users – i.e. a book that is used less often than another similar one is a better candidate for removal from the shelves. There is a strong inherent logic to this, and libraries need to ensure that the books occupying valuable shelf space are doing their job of providing information to researchers and students, but books are also unique individual objects and there is a real problem with comparing apples (a scholarly work on the philosophy of Plato) with oranges (a text aimed at first year business students). The long tail, to swap metaphors, is a very real phenomenon with books and the function of a research library is to serve the highly specialised niche just as well as it does the airport market.

Of course librarians have recognised this and have tried to avoid applying circulation data in a totally formulaic way. Every subject from nursing to robotics has its classics that can’t simply be removed from the shelves (although they might sometimes be consigned to “stack” collections) and a host of other older titles that are still of current relevance and importance, and deciding which of these to retain and which to dispose of becomes a matter of “professional judgment.” Unfortunately librarians are generally not in a position to know which books on a given subject are of ongoing importance and which are not. (A relatively easy experiment will prove this but it can try the patience of one’s colleagues so exercise caution if trying to replicate this finding.) This is not because we as a group are spectacularly ignorant, however, but because knowledge is so specialised and granulated that in any given area only a few (well, maybe a few hundred) experts know the literature well enough to say with any confidence which book is a classic, which one is a significant work and what other titles are past their use-by-dates. Circulation data are an unreliable guide to this, as highly specialised works may be intellectually accessible to only a small number of readers yet be no less important, or even more important, than popular titles on the same topic.

For the past few years some of us at Massey Library have been using the Scopus database to find citations of books within the scholarly literature as a means of identifying these titles of long-term significance. By including full references in addition to abstracts Scopus makes this relatively easy, and by scripting a Scopus search into an Excel spreadsheet of titles it takes less than a minute per title to find out how many journal articles and books have cited each book. While this is still “only a metric” (and we are still largely ignorant of the books’ contents) this turns out to be a really useful metric in determining significance and, dare I say it, value, particularly when it is also possible to find the number of recent citations and those by Massey authors. Who knew, for example, that Aristotle’s History of Animals has been cited 195 times since 2010 (once by a Massey researcher) while Milton Fingerman’s 1981 tome Animal Diversity has only ever been cited three times, the most recent in 2003? It seems obvious when you see it like this with the numbers, but without them I would have had no way of guessing that the Fingerman book had made no significant contribution to scholarship while good old Aristotle is still receiving a lot of attention.

I became interested in testing how using citation data as well as circulation data for deselection decisions compared with using circulation data only, and whether the additional effort was worth it in terms of the additional information received. If the citation data merely confirmed what we already knew from circulation data (i.e. if Aristotle had been borrowed many more times than Fingerman) then the additional effort was not worthwhile, but if the citation data failed to correlate significantly with the circulation numbers then it was giving us new and important information. (That is if you think keeping a copy of Aristotle’s work on zoology in a university library is important.) So I sat down for several weekends in the winter months of 2015 and clicked on links for over 1200 titles in zoology and political sciences and the answer was YES, the citation data did not correlate strongly with the circulation data and was supplying new and important information on which to guide our decisions. You can read the full article from the journal College & Research Libraries here.
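For anyone who wants to try the same comparison on their own collection, here is a minimal sketch of how the check might look in Python. It assumes Spearman's rank correlation as the measure (this post does not say which statistic the article used), and the circulation and citation figures are invented purely for illustration; they are not data from the study.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; assumes neither list is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank lists."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical figures for six titles (NOT from the study).
circulation = [40, 2, 15, 7, 1, 22]   # times borrowed
citations = [3, 195, 0, 12, 60, 5]    # times cited in the literature

rho = spearman(circulation, citations)
print(f"Spearman rho = {rho:.2f}")
```

A value of rho near +1 would mean the citation counts are telling you little that the circulation counts have not already said; a value near zero, or a negative one, suggests the two measures are picking up different things.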

This study relates to the deselection of print books and may seem rather out-of-date in a world where scholarly books are increasingly becoming electronic and virtual. However, physically removing books from shelves is only one means by which libraries filter and shape their collections and we have already faced the prospect of having to remove dated ebooks from our virtual shelves. In fact if libraries and collections are to have a real meaning in future, as opposed to just an “everything that exists” approach, then some means of identifying and measuring significance as opposed to usage or popularity will become even more critical and the citation approach would seem to be a promising one. We are currently looking at the prospect of automating the citation gathering process so that in future we don’t have to click on hundreds or thousands of links and that has the potential of creating a very useful tool for both librarians and researchers.

One of the nervous-making things about doing research of this kind is the possibility of finding that what you have found is not original, that someone got there before you and your work is confirmatory but not original. Fortunately I didn’t find that to be the case and the literature showed a strong consensus towards using circulation data only – in the rare cases where citation data was mentioned as a possible means of supporting deselection decisions it was more or less dismissed out of hand. However no good idea is entirely original and I did find one precursor paper hidden in a footnote, a 1989 article by Amrita Burdick – Science Citation Index Data as a Safety Net for Basic Science Books Considered for Weeding. At the time Burdick carried out her study Scopus didn’t exist and Science Citation Index was expensive to use so her sample was only 79 titles, but she made the prescient observation that “a correlation of citation frequency with circulation statistics would provide evidence of the value of citations in predicting use.” Actually what was found was that citation and circulation figures do not correlate closely but it was exciting to test her hypothesis.
Bruce White
Science Librarian
