Friday, December 21, 2007

Evaluating Researchers

With the recent bickering about the ranking of universities and how or whether the emergence of a host of new universities in Norway has lowered the overall standard of higher education and research, I thought it would be kewl to shift the focus to the individual researcher. As an academic and a scientist, I'm being evaluated every time I submit a grant application, negotiate a raise, teach a class or submit a research article. Even more so when applying for a job within academia - when I applied for my present position, my credentials and academic potential were scrutinized by an international committee which compared the merits of the applicants with the international standard for the same type of position. Since then, I've been evaluating researchers and graduate candidates myself in lower-level committees (i.e. not for hiring of faculty members, but for PhD and post doc positions at two universities). From my own experience, it's sometimes been very hard to distinguish between candidates (conversely, sometimes it's hardly been a competition at all), and with committee members coming from different backgrounds, it can be quite an ordeal to agree on a common set of rules for how to evaluate researchers. Granted, this is a lot easier when evaluating candidates for a PhD position, as grades then become the primary measure. More specifically, a candidate might be evaluated by:
  • Relevance and level of education: If you've got a BSc in art history and you're applying for a PhD scholarship in plasmonics, then you're 0 for 2.
  • Grades: Nag and complain all you want about the relevance of grades, but unless you've got a better system of objectively evaluating the student's ability to learn new material in a given time frame and demonstrate the acquired knowledge at the end of this period, have a big, tall glass of stfu. The better the average grade, the better the student is at absorbing new information and demonstrating new knowledge. For one particular course there might be cases of "Man I got the worst luck on this exam", but averaged out over the fifty or so final exams you take during your undergraduate years...I don't think so.
  • Relevant grades: If two students have equal/similar average grades but one has done poorly in introductory philosophy while consistently doing well in the topics most pertinent to the job description, while the other candidate has so-so grades in the relevant topics but excellent grades in all the perspective courses, then the selection process is a no-brainer.
  • Relevant experience, other: If you've got relevant experience from industry or something which is pertinent to the job description, it definitely counts in your favor. However, being President of the Partying Down Chapter of the Student Union five years running probably only gives you a high risk of kidney failure or liver problems at an early age.

Post docs are often much trickier to evaluate. The number of publications counts, of course, but it has to be weighed against the number of years since graduating. Teaching experience may or may not be relevant depending on the position. For hiring new faculty members, I'm thinking the equation is even more complex, even though there are more criteria to check off, like:

  • Does the candidate fulfill the formal criteria? (i.e. does the applicant have a PhD from an accredited academic institution, and not just a diploma from some fictional university like University of South Ucklahoma - U. SUck)
  • Teaching experience: Some countries and institutions only look at the teaching experience a candidate can put on paper, while others require applicants to either have completed or to complete a pedagogic course within a specified time frame. Since the job description calls for 50% teaching (at least in Norway), it's a great idea on papyrus to introduce a pedagogic course in order to bring everybody up to a minimum required level. It's a great idea on paper...
  • Management experience: ...'cause you're gonna have to do a lot of paperwork if you get a faculty position.
  • Prospects of candidate: How valuable to the institution is the applicant going to be? The "Young and Promising" factor.
  • Quality and productivity of research: The meat and potatoes of the competency.

The quality of the research and the productivity of the researcher can, as luck would have it, be evaluated semi-objectively through the publishing record. Or rather, by asking:

  1. How many publications does the candidate have?
  2. What is the publication frequency?
  3. Where does the candidate publish, i.e. what is the median impact factor of those journals?
  4. Where on the author list does the candidate consistently appear?

All four questions are important, because the first two are related to the productivity of the researcher, the third is related to the quality of the research (assuming that a higher impact factor - roughly estimated as the number of citations a paper in a given journal is expected to pick up within three years - is correlated to quality and not just readership), and the fourth is indicative of the candidate's relative importance to the emergence of the research findings and subsequent publication (assuming the research group follows standard protocol for positioning authors). Most academics I've talked to agree that all these factors matter, but agreeing on the weighting of these factors is more troublesome. Luckily, some physicist fella from UCSD in La Jolla named Jorge E. Hirsch published an article in Proceedings of the National Academy of Sciences of the United States of America titled "An index to quantify an individual's scientific research output" (PNAS, 2005, 102(46), 16569-16572).
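Before getting to Hirsch's index: the four questions above are easy enough to tally up by hand, but just to make the bookkeeping concrete, here's a toy Python sketch (my own illustration - the records, numbers and field names are invented, not pulled from any real database):

```python
from statistics import median

# Hypothetical publication records for one candidate: publication year,
# journal impact factor, and the candidate's position on the author list.
publications = [
    {"year": 2003, "impact_factor": 2.1, "author_position": 1},
    {"year": 2004, "impact_factor": 4.7, "author_position": 2},
    {"year": 2005, "impact_factor": 3.3, "author_position": 1},
    {"year": 2006, "impact_factor": 7.9, "author_position": 3},
]

career_years = 2007 - min(p["year"] for p in publications) + 1

num_papers = len(publications)                                    # 1. how many papers?
frequency = num_papers / career_years                             # 2. papers per year
median_if = median(p["impact_factor"] for p in publications)      # 3. where does s/he publish?
median_pos = median(p["author_position"] for p in publications)   # 4. where on the author list?

print(num_papers, round(frequency, 2), median_if, median_pos)
```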

The index Hirsch proposes, typically referred to as the h-index or Hirsch factor, provides an objective bibliometric measure of the distribution of citations relative to the total number of peer-reviewed publications for a given researcher. Hirsch defines this index as follows: "A scientist has index h if h of his Np papers have at least h citations each, and the other (Np - h) papers have at most h citations each". What's so cool about that? It looks at the overall publication trend of a researcher, which can then be used to see whether that one "Science" paper was a fluke, or whether this researcher really is that good. Moreover, the h-index is available as a tool on Web of Science, making it easily accessible to other researchers (a toy sketch of the computation follows the list of limitations below). Awesome; I gotta get me some of that. Are there limitations? Sure - including but not limited to:

  • Limited by the number of total publications. If a brilliant scientist churns out only five publications, but these five are sufficient to yield a Nobel prize and spawn new technologies or scientific paradigms, said scientist is still limited to h = 5.
  • The h-index is sensitive to self-citations.
  • Is, in its present form, limited by what the citation databases cover; publications such as books do not count.
  • Suffers from a time lag, meaning that the h-index for "new" academics is misleading.
  • Does not account for the number of authors.
  • Does not account for gratuitous authorship, in that it doesn't distinguish free-loaders from hard-working researchers.
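
As promised above, here's a minimal sketch of how the h-index falls out of a list of citation counts (my own toy numbers, not real Web of Science data):

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    # Rank citation counts in decreasing order; h grows as long as the
    # i-th ranked paper still has at least i citations.
    h = 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h

# Five heavily cited, Nobel-worthy papers still cap the index at h = 5 ...
print(h_index([900, 450, 300, 120, 80]))        # -> 5
# ... while a longer record only counts the papers that pull their weight.
print(h_index([25, 18, 12, 9, 7, 6, 2, 1]))     # -> 6
```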

A number of people have made modifications to the h-index in order to overcome some of these difficulties, like the h-b index, wherein Michael Banks of the Max Planck Institute shifted the focus from author to topic: "For the case of a topic it is useful to define the h-b index in terms of the number of years, n, as h = nm. If the h-b index is linear with the number of years, then m is given as the gradient. In this respect, a compound or topic with a large m and h-b index can be defined as a hot topic". How much of an improvement is this? Not much, as far as I'm concerned, especially when the task at hand is ranking scientists. Besides, it doesn't take a genius or a spreadsheet to figure out that anything "nano" is going to have a higher h-b index than, say, phrenology. A more mathematical approach was taken by Leo Egghe in defining the g-index: "Given a set of articles ranked in decreasing order of the number of citations that they received, the g-index is the (unique) largest number such that the top g articles received (together) at least g^2 citations". Note that the g-index suffers from approximately the same shortcomings as the h-index.
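
For comparison, here's a similarly hedged sketch of Egghe's g-index on the same kind of citation list (again my own toy illustration, not Egghe's code):

```python
def g_index(citations):
    """Largest g such that the top g papers together have at least g^2 citations."""
    running_total, g = 0, 0
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        running_total += c          # citations gathered by the top i papers
        if running_total >= i * i:  # do they muster at least i^2 between them?
            g = i
    return g

# The same record that gave h = 6 above yields g = 8, since the g-index
# lets a few heavily cited papers pull the whole set upwards.
print(g_index([25, 18, 12, 9, 7, 6, 2, 1]))     # -> 8
```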

I am all for use of the h-index as a guideline, in case it wasn't obvious, as it provides a documented yardstick which is easily accessible to scientists. The fact that it only takes peer-reviewed work into consideration doesn't really detract from its usefulness in my opinion.

2 comments:

Anders said...

What's wrong with ranking alphabetically by first name? That would rate me just behind Albert Einstein, and that sounds pretty fair to me... ;-)

Let's not consider the teaching part, even though you've said that could be ~50% of an academic's work. Then I think we all agree that the quality is documented by the researcher's publication record. Of course the quantity of publications is easy to measure; the problem is to (objectively) measure the quality of the publication and how large a part of the work the researcher has done. I.e. when I have to read up on a subject outside my main field of interest, I find it hard to evaluate how good a publication is. However, I do get a really good sense of what is good and what is not after a (relatively) short while. So I think most academics know a strong publication from a weak one, but would have a hard time putting up objective parameters that could easily be measured.

In short, like grades, the h-index does have its flaws, but it's the best we've got. I see no problem using it (as a guideline, like you said).

Wilhelm said...

You're right; since teaching is 50% of an academic position at your standard Norwegian university, it does suck that being a good teacher gets you nowhere. This is actually one of the things that is being changed, starting with the introduction of the mandatory pedagogic course. However, they haven't figured out how to measure "good teaching", and consequently, reward for teaching efforts remains a pipe dream.

The only thing even resembling such a reward system is the "teaching awards" handed out by the students.