Nobel Laureates - A Closer Look At The Data
Salvino A. Salvaggio  
1 . Background
The ritual is now well established: for more than a century, every year in early October, the small world of STEM, but also the media and many science-and-technology enthusiasts, have waited for the Nobel Prize announcements. As the 2016 round of Nobel laureates was announced only a few days ago, perhaps this is a good time to take a closer look at the Nobel Prize data, now covering 115 years (from 1901 to 2016).
The data are open and easily available as .csv from http://api.nobelprize.org/v1/laureate.csv. A JSON version is also available from http://api.nobelprize.org/v1/prize.json. For more details on the data, check this page on the official Nobel Prize website.
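As an illustration of how such a file can be read, here is a minimal Python sketch (not the author's R code). The column names and the three hand-typed sample rows are assumptions made for this example; the header of the real laureate.csv should be checked before reusing it.

```python
import csv
import io

# Hand-typed sample in an ASSUMED layout of laureate.csv.
# Column names and ids are illustrative; check the real file's header.
SAMPLE_CSV = """id,firstname,surname,born,gender,year,category,share
1,Wilhelm Conrad,Röntgen,1845-03-27,male,1901,physics,1
6,Marie,Curie,1867-11-07,female,1903,physics,4
6,Marie,Curie,1867-11-07,female,1911,chemistry,1
"""


def read_laureates(text):
    """Parse CSV text into a list of dicts, converting numeric columns to int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["year"] = int(row["year"])
        row["share"] = int(row["share"])
        rows.append(row)
    return rows


laureates = read_laureates(SAMPLE_CSV)
print(len(laureates), "rows; first:", laureates[0]["surname"], laureates[0]["year"])
```

For the real data, the text passed to `read_laureates` would be the downloaded content of the .csv file instead of the inline sample.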
2 . Data Overview
2.1 . Categories
Since the Nobel Prize was established in 1901, 911 laureates or organisations have been granted the prestigious award. They are distributed across 6 categories: chemistry, economics, literature, medicine, peace and physics. The economics category has the lowest number of laureates (78) because it was first awarded only in 1969 as The Sveriges Riksbank Prize in Economic Sciences, which is not strictly a Nobel Prize like the other five.
As shown in the next section, medicine and physics have the highest numbers of laureates because they include the most Nobel Prizes shared between 2 or 3 laureates.
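The count by category can be sketched in a few lines of Python (an illustration only, not the author's R code). The records below are a tiny hand-typed sample, so the resulting numbers are not those of the full dataset.

```python
from collections import Counter

# Tiny hand-typed sample of (category, year) records, one per laureate.
# NOT the full dataset: it only illustrates the counting step.
records = [
    ("physics", 1901),
    ("chemistry", 1901),
    ("medicine", 1901),
    ("physics", 1903),
    ("physics", 1903),
    ("physics", 1903),
    ("economics", 1969),
    ("economics", 1969),
]

# Number of laureates per category.
counts = Counter(category for category, _ in records)

# Categories sorted from most to fewest laureates.
ranking = counts.most_common()

for category, n in ranking:
    print(f"{category:10s} {n}")
```

A shared prize contributes one record per laureate, which is why categories with many shared prizes end up with more laureates.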
2.2 . Sharing
A single Nobel Prize can be awarded concurrently to up to 3 people (or organizations) who collaborate on the same topic or objective. Although awards to a single laureate are overall more common than Nobel Prizes shared by 2 or 3 laureates, the Nobel committees award shared Prizes more frequently in scientific domains than in literature or peace (where Prizes are still mostly granted to individuals). The data, however, also indicate a change in the global historical trend: although most Nobel Prizes have been awarded to single individuals, this trend is declining in favour of 3-member teams, which have been awarded increasingly often since the 1950s.
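One possible way to measure sharing is to count how many laureates appear for each (category, year) pair and then tabulate the prize sizes by decade. The Python sketch below is an illustration under that assumption, using a small hand-typed sample rather than the full dataset, and is not the author's R code.

```python
from collections import Counter, defaultdict

# One (category, year) record per laureate; small hand-typed sample only.
rows = [
    ("physics", 1901),
    ("literature", 1901),
    ("physics", 1903), ("physics", 1903), ("physics", 1903),
    ("medicine", 1962), ("medicine", 1962), ("medicine", 1962),
    ("physics", 1965), ("physics", 1965), ("physics", 1965),
]

# Size of each prize: (category, year) -> number of laureates sharing it.
prize_size = Counter(rows)

# For each decade, how many prizes had 1, 2 or 3 laureates.
by_decade = defaultdict(Counter)
for (category, year), n_laureates in prize_size.items():
    decade = year // 10 * 10
    by_decade[decade][n_laureates] += 1

for decade in sorted(by_decade):
    print(decade, dict(by_decade[decade]))
```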
It is worth noting that the vast majority of laureates won the Nobel Prize once, but 6 laureates demonstrated that such an achievement can be something more than a once-in-a-lifetime experience, as they received the prize twice or even more.
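Repeat laureates can be found by grouping awards per laureate and keeping those with more than one entry. The Python sketch below illustrates the idea on a hand-typed sample (not the author's R code). It groups by name for readability; with the real file, a unique laureate identifier would be the safer key, which is an assumption about the dataset.

```python
from collections import defaultdict

# Hand-typed sample of (laureate, category, year); not the full dataset.
awards = [
    ("Marie Curie", "physics", 1903),
    ("Marie Curie", "chemistry", 1911),
    ("Linus Pauling", "chemistry", 1954),
    ("Linus Pauling", "peace", 1962),
    ("Wilhelm Conrad Röntgen", "physics", 1901),
]

# Collect every award under the laureate who received it.
by_laureate = defaultdict(list)
for name, category, year in awards:
    by_laureate[name].append((year, category))

# Keep only laureates with two or more awards, in chronological order.
repeat_laureates = {
    name: sorted(prizes)
    for name, prizes in by_laureate.items()
    if len(prizes) > 1
}

for name, prizes in repeat_laureates.items():
    print(name, prizes)
```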
2.3 . Gender
The count of Nobel laureates by gender shows a deep gap in the attribution of the Prize. There are approximately 17 times more men than women (836 to 49), with only slight differences between categories.
Over time, a (very) moderate improvement in the representation of women has slowly been emerging, especially since the early 1980s. The progressive rise of women amongst the Nobel laureates is not balanced, though: the growing number of female Nobel laureates is mostly found in medicine, literature, and peace. Since 1960, only 2 women have won the Nobel in chemistry, 1 in economics and 1 in physics (back in 1963), whereas 11 were awarded for medicine, 10 for peace, and 9 for literature.
Based on this gender analysis, one could genuinely think that the Nobel committees that take the final decisions on the awards are heavily biased by a strong sexist prejudice. Things are not that straightforward, however, and appearances can be (somewhat) deceiving. As a matter of fact, Nobel committees can only decide whether or not to award nominees who have been brought to their attention by nominators. So, the funnel to the supreme award can be summarised as follows:
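The ratio can be checked directly from the counts quoted in the text. The short Python computation below uses only those figures; reading the awards without a recorded gender as organisations is an assumption, not something stated by the dataset description.

```python
# Counts quoted in the text.
males = 836
females = 49
total_awards = 911

# Men per woman among laureates with a recorded gender.
ratio = males / females

# Share of women among laureates with a recorded gender.
female_share = females / (males + females)

# Awards with no male/female label (ASSUMED here to be organisations).
no_gender = total_awards - (males + females)

print(f"ratio: {ratio:.1f}, female share: {female_share:.1%}, other awards: {no_gender}")
```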
- nominators –> nominees –> laureates
As the official website of the Nobel Prize also provides data on the nominators and the nominees (at least for the period from 1901 to 1965), it is possible to rebuild the funnel for a period of almost 65 years, covering 21586 nominations by 18344 nominators (a single person can be nominated more than once, and a single nominator can nominate several people over his/her life).
This global view changes the picture: data from the funnel show that the gender bias amongst the Nobel laureates is not entirely generated by the Nobel committees. The Nobel committees are actually fed biased annual clusters of nominees by a highly biased, overwhelmingly male set of nominators. Overall, though, the Nobel Prize clearly appears as a male playground in all its selection phases and from its inception.
2.4 . Countries
The laureates dataset includes the country of birth and country of death of each Nobel laureate but, more importantly, it provides the country of affiliation, i.e. the country in which the laureate’s institution is located. A very limited number of laureates are affiliated with several institutions, which could introduce some confusion. However, these cases are few, and they often (but not always) involve multiple affiliations within the same country. It is also important to highlight that the vast majority of Nobel laureates in literature or peace are not affiliated with a specific institution and, therefore, the dataset does not include any information on their country of affiliation. That is why analyzing the country of birth of the humanities Nobel laureates (literature or peace) seems more sensible.
The data show the predominance of the USA in both cases (scientific and humanities laureates). However, the geographic distribution of the laureates appears quite different. Scientific laureates are mostly concentrated in “developed” countries (USA, continental Europe, UK, Scandinavia, Russia, plus a few laureates in Australia, India, China and Argentina), whereas the humanities laureates are more representative of the diversity of the world, with more laureates born in Africa, Asia, and Latin and Central America, in addition to a substantial proportion from the same “developed” countries as the scientific laureates.
International circulation or migration of high-caliber researchers, scientists, talented authors and peace activists is a well-known and documented fact, and Nobel laureates are no exception. On the contrary! But how does this happen? What is the pattern, if any, of their migration? The following graph does not make it possible to trace the laureates (mostly of scientific Nobel Prizes, but not only) from one lab to another or from one country to the next, but it highlights the two geographical extremes of the route at a specific time, i.e. the country of birth and the country of institutional affiliation at the time of the Nobel. And what do the data show? To cut a long story short, the data show two main “migration patterns”:
- The diagonal tells us that a number of Nobel laureates were affiliated with an institution located in their country of birth. So, these laureates, regardless of any mobility between birth and the Nobel award, were working in their country of origin when they were awarded. This could be described as a non-migration pattern. Along that same diagonal, it is also possible to spot which countries best retained their talents (France, Germany, Russia, Sweden, Switzerland, UK and USA).
- The high concentration of grey or grey-orange spots on the first three rows indicates that the USA, UK, and Switzerland are strong talent attractors. A substantial proportion of the Nobel laureates who were awarded while affiliated with a US, UK or Swiss institution were not born in these 3 countries. Germany and France also attract talent to some extent, although in a lower proportion.
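The two patterns above correspond to the diagonal and off-diagonal cells of a birth-country by affiliation-country table. The Python sketch below shows one way such a table could be built; it is an illustration only (not the author's R code), and the country pairs are hypothetical rather than taken from the dataset.

```python
from collections import Counter

# HYPOTHETICAL (country of birth, country of affiliation) pairs, one per laureate.
pairs = [
    ("Germany", "USA"),
    ("USA", "USA"),
    ("France", "France"),
    ("Hungary", "USA"),
    ("Germany", "Germany"),
    ("India", "United Kingdom"),
]

# Cell counts of the birth x affiliation grid.
matrix = Counter(pairs)

# Diagonal: laureates affiliated in their country of birth ("non-migration").
retained = Counter(born for born, affiliation in pairs if born == affiliation)

# Off-diagonal, grouped by destination: laureates attracted from abroad.
attracted = Counter(affiliation for born, affiliation in pairs if born != affiliation)

for country, n in attracted.most_common():
    print(country, "attracted", n, "and retained", retained[country])
```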
2.5 . Age
Although the dataset does not provide the exact age of the laureates at the time they were given the Prize, the date of birth and the year the Prize was awarded are both available. Therefore, the calculated age might not be fully precise, but it gives an accurate picture (with a maximum difference of 1 year from reality). And the least one can say is that it takes patience to get the Nobel Prize. Both the median and the mean age of the laureates overall are close to 60 (mean: 59.4, median: 60), and the median in each Nobel category is always greater than or equal to 54 years. The proportion of laureates aged 40 or younger is very low: only 6.7% of the total (59 out of 881). Although things do not look as bad for the few women awarded (16.3% of them were 40 or younger when awarded), this cannot be considered a hard fact supported by the data analysis.
The trend of age by Nobel category over time also shows an increase in the age at which the Nobel Prize is awarded. The only decreasing trend is in the Nobel for Peace, because a number of outliers in the last decade bent the regression line.
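The age approximation described here is simply the award year minus the birth year. The Python sketch below applies it to a handful of well-known laureates, chosen for illustration and not representative of the dataset, and then recomputes the 6.7% figure from the counts quoted in the text. It is not the author's R code.

```python
from statistics import mean, median

# (name, birth year, award year): a tiny hand-picked sample, not the full dataset.
sample = [
    ("Wilhelm Conrad Röntgen", 1845, 1901),
    ("Marie Curie", 1867, 1903),
    ("William Lawrence Bragg", 1890, 1915),
    ("Leonid Hurwicz", 1917, 2007),
    ("Malala Yousafzai", 1997, 2014),
]

# Year difference only: the result can exceed the true age by one year
# when the birthday falls after the award date.
ages = [award - born for _, born, award in sample]

# Share of the sample aged 40 or younger at the time of the award.
young_share = sum(age <= 40 for age in ages) / len(ages)
print("mean:", mean(ages), "median:", median(ages), "share <= 40:", young_share)

# Figure quoted in the text: 59 laureates out of 881 were 40 or younger.
quoted_share = 59 / 881
print(f"quoted share: {quoted_share:.1%}")
```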
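The effect of a few young recent laureates on a regression line can be illustrated with an ordinary least-squares slope. The Python sketch below uses made-up toy points, not the Nobel data, and a plain OLS fit, which is an assumption about the kind of trend line used in the original analysis.

```python
def ols_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# TOY data: award years and ages following a steadily rising trend.
years = [1900, 1950, 2000]
ages = [50, 55, 60]
rising = ols_slope(years, ages)

# Same toy data plus two very young recent laureates (outliers).
years_with_outliers = years + [2010, 2014]
ages_with_outliers = ages + [32, 17]
bent = ols_slope(years_with_outliers, ages_with_outliers)

print("slope without outliers:", rising)
print("slope with outliers:", bent)
```

With only a few points per series, two low values at the end of the period are enough to turn a rising slope into a falling one.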
 This document is the result of an analysis carried out by the author and reflects only the opinions of the author. Therefore, this document does not involve in any way, directly or indirectly, any of the author’s employers, past or present. The author confirms that he has no conflict of interest in this matter.
 eMail: salvino [dot] salvaggio [at] gmail [dot] com
 It is, however, worth noting that the difference in gender proportions between nominees and laureates cannot be attributed to chance, as the p-value of the t.test is close to 0.
 The hypothesis that male and female laureates have the same mean age cannot be rejected: the p-value of the t.test is 0.52, so the observed difference may well be due to chance.
While I was circulating a draft of the present document, a friend of mine told me about the paper by Neil Saunders, Analysis of Nobel Prize Data, published in early October on R-pubs, which is similar to my work. Neil Saunders published his contribution a few days before mine, so all the credit goes to him.
In this blog I publish data analysis cases based on the R statistical language. No statistical or mathematical theory here, no discussions of the R language, no software tutorials, but only concrete case studies using existing R tools.
To download the R code and dataset, click here (1 MB).