“Olympic Legacies” Conference, Oxford, March 2008.

 

 

Searching for the Greatest Olympic Performances, Using a Complete Summer Olympics Database

 

By Charles Davis

 

A complete digital database has been developed for all performances in the Olympic summer games from 1896 to 2004, using historical sources and some online material. The database covers all athletes in all events in all sports, and includes ranks and performances at all stages including heats, qualification rounds and preliminary rounds. There are more than 105,000 athletes listed, and more than 275,000 lines of data.

 

Such complete digital data allows new forms of analysis. The performances and the competitiveness of entire fields, not just medallists or finalists, can be analysed, and changes tracked over time. Relative standard deviation (rsd) measures how closely the results in an event cluster together: the lower the rsd, the more competitive the event. In the 100m sprint in athletics, the rsd for men is 2-3%, while for women it is 3-4%. The rsd for the men’s long jump has declined from about 6% in 1912-1924 to about 3% in recent Games. The negative impact of the boycotts in 1980 and 1984 can also be seen. Performances of individual champions can now be measured and compared statistically as z-scores, showing how far above their contemporaries they stood. Extraordinary z-scores were recorded by Bob Beamon in the 1968 long jump, Bob Hayes in the 1964 100m, and Wilma Rudolph in the 1960 100m. The search for exceptional performances is ongoing.

 

 

The Database

A complete digital database has been developed for all performances in the Olympic summer games from 1896 to 2004. The data has been drawn from printed sources and some online material. The Maritchev Volumes (Who’s Who in the Summer Olympics 1896-1992) and the official reports of the Olympic Games have been instrumental in this endeavour. There are more than 105,000 athletes listed, and more than 275,000 lines of data.

 

The database is in a spreadsheet format, with one line per performance in most sports. Each line names an athlete and gives his or her performance in a specific race or contest. For a complete event, there are separate entries for each stage: heats, semi-finals, finals etc.

 

Each line names an athlete along with some limited information about that athlete. Date of birth is given where known, and alternative spellings of names (including name changes), where they have been recorded, are offered. Names are limited to the English alphabet. Where athletes have appeared in more than one sport, or for more than one country, this is recorded. A numbering system has been developed to simplify the identification of athletes, especially those who change names or countries.

 

Tracking down changes of names and movement of athletes has been a major task in preparing the database.

 

The performances of the athletes in individual events are recorded in fields covering the year, sport, event, and stage, with results recorded in terms of ranking and measured performance (times etc). Additional information is also available, particularly for recent games, including intermediate times in races. Results for the decathlon, for example, include a breakdown of results in each event.

 

For one-on-one contest-based sports, such as boxing, judo and wrestling, each bout gets two entries, one from the perspective of each contestant. For “doubles” events, such as those in tennis and badminton, the athlete’s team-mate and the identities of the opponents are given.
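The one-line-per-performance layout, and the two-entries-per-bout convention, can be illustrated with a minimal sketch. The paper does not list the actual column names, so every field name below (athlete_id, stage, opponent_id and so on) is a hypothetical stand-in, as are the placeholder athletes in the usage note.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PerformanceLine:
    """One line of the database: one athlete in one race or contest.

    All field names are hypothetical; the source does not give its columns.
    """
    athlete_id: int                     # stable number, survives name/country changes
    name: str
    country: str
    year: int
    sport: str
    event: str
    stage: str                          # e.g. "Heat 3", "Semi-final", "Final"
    rank: Optional[int] = None
    performance: Optional[str] = None   # time, distance, score, bout result, etc.
    opponent_id: Optional[int] = None   # only used for one-on-one contests


Contestant = Tuple[int, str, str]       # (athlete_id, name, country)


def bout_lines(year: int, sport: str, event: str, stage: str,
               winner: Contestant, loser: Contestant) -> List[PerformanceLine]:
    """Return the two lines recorded for a single bout, one per contestant."""
    (w_id, w_name, w_country), (l_id, l_name, l_country) = winner, loser
    return [
        PerformanceLine(w_id, w_name, w_country, year, sport, event, stage,
                        rank=1, performance="won", opponent_id=l_id),
        PerformanceLine(l_id, l_name, l_country, year, sport, event, stage,
                        rank=2, performance="lost", opponent_id=w_id),
    ]
```

For example, bout_lines(2000, "Boxing", "Heavyweight", "Final", (1, "Athlete A", "AAA"), (2, "Athlete B", "BBB")) returns two lines that name each other as opponent, so a search on either athlete finds the bout.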

 

While the individual entry style is maintained for team events such as relays and rowing, team sports are treated a little differently. Each player in a team gets an individual entry; this does not contain every team result, although the final placing of the team is given. Where available, the number of goals, points, etc. contributed by the player in the tournament is given. There are separate records in the database for the team as a whole, giving all results.

 

The results include all “Demonstration” sports. Where a Demonstration sport has gone on to become an official sport, the status of the sport in specific years is identified.

 

Single Games Studies

The availability of comprehensive data in a single database greatly facilitates study of statistics of whole fields in events. This opens up many potential areas for study; one of these is comparison of male and female events.

 

As an example, at the Sydney 2000 Games, average performances for men’s and women’s fields were calculated for a range of events in athletics and swimming. The differences in performances for track events are summarised in Table 1.

 

Table 1. Sydney 2000 Track Events: Male and Female Average Performances

Event       Number of Athletes    Avge (seconds)        Difference
            M        F            M          F          Gold      Field
100m        108      65           10.43      11.54      8.2%      9.6%
200m        55       48           20.75      23.20      8.0%      10.6%
400m        62       52           45.97      52.52      10.7%     12.5%
800m        55       34           107.43     122.35     10.3%     12.2%
1500m       39       37           219.73     250.49     13.5%     12.3%
5000m       33       48           819        931        8.9%      12.1%
10,000m     33       27           1684       1912       9.9%      11.9%
Marathon    75       40           8480       9180       9.1%      7.6%

Note: In most events, some non-competitive athletes, who did not meet qualifying standards, have been excluded.
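The paper does not state how the Difference columns are calculated. The Field figures are, however, consistent with expressing the gap as a percentage of the women’s average, and the short check below makes that assumption explicit. The function name relative_gap is illustrative only; the averages are copied from Table 1.

```python
def relative_gap(male_seconds: float, female_seconds: float) -> float:
    """Gap between two times as a percentage of the women's time.

    Assumed convention: (F - M) / F. The source does not state its formula.
    """
    return (female_seconds - male_seconds) / female_seconds * 100.0


# Field averages in seconds (men, women), copied from Table 1.
FIELD_AVERAGES = {
    "100m": (10.43, 11.54),
    "200m": (20.75, 23.20),
    "400m": (45.97, 52.52),
    "Marathon": (8480, 9180),
}

if __name__ == "__main__":
    for event, (men, women) in FIELD_AVERAGES.items():
        print(f"{event}: {relative_gap(men, women):.1f}%")
```

Under this assumption the four events come out at 9.6%, 10.6%, 12.5% and 7.6%, matching the Field column of Table 1.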

 

In most events, the difference between the male and female gold medal winners is less than the difference between the field averages, suggesting that the male events are, on the whole, more competitive. This can be measured more directly by calculating the relative standard deviation (rsd, a measure of how tightly the performances of all athletes cluster together) for each event. A “Competitiveness Index” (CI), which takes into account both the rsd and the number of competitors, can also be used (Table 2):

 

    CI = (No. of competitors) / (rsd x 100)

 

An event with a low rsd (the performances of most athletes fall in a narrow range) and a high number of competitors will have a very high Competitiveness Index.
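As a minimal sketch of the two measures, the functions below compute rsd and CI from a list of performances. Two assumptions are made that the paper does not confirm: the rsd is held as a fraction (so that rsd x 100 is the percentage), and the sample standard deviation is used. The function names are illustrative.

```python
import statistics
from typing import Sequence


def rsd(performances: Sequence[float]) -> float:
    """Relative standard deviation as a fraction: standard deviation / mean.

    Assumes the sample standard deviation; the source does not say which form it uses.
    """
    return statistics.stdev(performances) / statistics.mean(performances)


def ci_from_summary(n_competitors: int, rsd_fraction: float) -> float:
    """Competitiveness Index: number of competitors / (rsd x 100)."""
    return n_competitors / (rsd_fraction * 100.0)


def competitiveness_index(performances: Sequence[float]) -> float:
    """CI computed directly from the performances of a whole field."""
    return ci_from_summary(len(performances), rsd(performances))
```

On this reading, a field of 108 athletes with an rsd of about 1.98% gives a CI of about 54.5, which matches the men’s 100m entry in Table 2. The 1.98% figure is back-calculated here and is not quoted in the paper.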

 

Table 2. Competitiveness Indices for selected male and female track events.

Event       M        F
100m        54.5     21.9
200m        31.9     20.1
400m        36.5     15.3
Marathon    17.2     10.2

 

It is interesting that the difference between the men’s and women’s field averages is relatively low in the marathon (7.6%). In fact, there is considerable overlap between the weaker men’s times and the strongest women’s: the women’s winner, Naoko Takahashi, would have beaten 31 male competitors home. Such overlap is not seen in most events; in some, such as the pole vault, there are significant gaps between the weakest men and the strongest women. The reduced difference between men and women in endurance events is paralleled in swimming, where the differences between the fields in the 400m freestyle and the medleys are 8.0-8.5%, while the 50m and 100m freestyle had differences of 12.2% and 10.6% respectively.

 

Analysis of whole fields can also allow comparison between events, and a quantitative measure of the greatest performances in these sports. By measuring how many standard deviations a gold medallist’s performance lies from the average performance of the field, we produce a score (a type of z-score) that shows how far above his or her peers a particular athlete stood. The highest z-scores recorded in athletics and swimming at Sydney are given in Table 3.

 

Table 3. Highest “Z-Scores” in Athletics and Swimming, Sydney 2000

Event                     z-score    Athlete                        Country
Men: 400m                 2.72       Johnson, Michael               United States
Men: 400m Freestyle       2.47       Thorpe, Ian                    Australia
Women: 200m               2.47       Jones, Marion                  United States
Men: 100m                 2.40       Greene, Maurice                United States
Men: Long Jump            2.33       Pedroso, Ivan                  Cuba
Women: Triple Jump        2.30       Marinova, Tereza               Bulgaria
Women: 100m               2.30       Jones, Marion                  United States
Men: 100m Freestyle       2.23       Van Den Hoogenband, Pieter     Netherlands
Women: 100m Butterfly     2.20       De Bruijn, Inge                Netherlands
Women: Discus             2.20       Zvereva, Ellina                Belarus
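The z-score described above can be sketched as follows. The sketch assumes that the winner is included in the field when the mean and standard deviation are calculated, and that the sample standard deviation is used; the paper specifies neither. The sign is flipped for timed events so that a positive score always means better than the field.

```python
import statistics
from typing import Sequence


def winner_z_score(field: Sequence[float], winner: float,
                   lower_is_better: bool = True) -> float:
    """Standard deviations by which the winner beats the field average.

    `field` holds the performances of all competitors (winner included, by
    assumption). Use lower_is_better=True for times, False for distances.
    """
    mean = statistics.mean(field)
    sd = statistics.stdev(field)
    z = (winner - mean) / sd
    return -z if lower_is_better else z
```

With a hypothetical sprint field of 10.0, 10.2, 10.4, 10.6 and 10.8 seconds, the winner’s z-score is about 1.26; for a jumping event the same function is called with lower_is_better=False.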

 

 

Multiple Games Studies

Where results for all competitors are available, we can track changes in performance patterns over time. Note that there are many earlier Games (especially prior to 1952) for which performance data is not comprehensive, because times or distances were recorded for winners or placegetters only.

 

The long jump is an interesting example because complete data for the whole field can be found back to 1912, earlier than for many other events. (However, data is not complete for 1932, 1936, 1948, and 1956.) A comparison of the winning performance, the average of the Top 10, and the median performance of the entire field through this history (Figure 1) highlights the improvements in performance that have occurred. It is perhaps surprising that there is little or no upward trend from 1968 to 2004 for either the winners or the Top 10. In 1972, the Top 10 averaged 8.06 metres; it was exactly the same 28 years later. The median for the field rose somewhat more, from 7.68 metres to 7.86 metres.

 

Figure 1. Men’s Long Jump History: Best, Top 10, and Median

 

This means there is some “compression” in the performances, with the average competitor getting closer to the elite over successive Games. While not absolutely clear-cut, this supports Stephen Jay Gould’s theory of sports performance, which argues that as sports develop, the performances of the very best competitors change little as they approach physiological limits, but the number of elite competitors does increase.

 

By incorporating the data from all competitors in a given event, the competitiveness of the event can be tracked over time. A fall in rsd, or an increase in CI, over succeeding Games is evidence of increasing competitiveness.

 

The trend can be seen a little more clearly in the rsd. Changes in rsd for the men’s long jump are tracked in Table 4.

 

Table 4. Men’s Olympic Long Jump. Relative Standard Deviations

Year    rsd
1912    6.57%
1920    5.79%
1924    7.12%
1928    4.92%
1952    3.85%
1960    5.33%
1964    4.20%
1968    5.88%
1972    4.13%
1976    4.57%
1980    5.76%
1984    6.69%
1988    4.82%
1992    3.96%
1996    2.96%
2000    3.06%
2004    3.21%

 

 

The rsd for the men’s long jump has declined from about 6% in 1912-1924 to about 3% in recent Games. The negative impact of the boycotts in 1980 and 1984 can be seen. The three most recent Games appear to be clearly the most competitive in this event.
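The figures quoted in this paragraph can be checked directly against Table 4. The short calculation below averages the rsd over the periods mentioned; the grouping of years into periods, and the comparison of the boycott Games with their immediate neighbours (1976 and 1988), are illustrative choices rather than the author’s stated method.

```python
from statistics import mean

# Men's long jump rsd in percent, copied from Table 4.
RSD_PERCENT = {
    1912: 6.57, 1920: 5.79, 1924: 7.12, 1928: 4.92, 1952: 3.85,
    1960: 5.33, 1964: 4.20, 1968: 5.88, 1972: 4.13, 1976: 4.57,
    1980: 5.76, 1984: 6.69, 1988: 4.82, 1992: 3.96, 1996: 2.96,
    2000: 3.06, 2004: 3.21,
}


def mean_rsd(years):
    """Average rsd (in percent) over the given Games."""
    return mean(RSD_PERCENT[year] for year in years)


early = mean_rsd([1912, 1920, 1924])     # about 6.49
recent = mean_rsd([1996, 2000, 2004])    # about 3.08
boycott = mean_rsd([1980, 1984])         # about 6.23
neighbours = mean_rsd([1976, 1988])      # about 4.70
```

The early and recent averages come out at roughly 6.5% and 3.1%, in line with the 6% and 3% quoted, and the two boycott Games average about 1.5 percentage points higher than the Games on either side of them.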

 

The 100m Sprint

 

Complete data for all competitors is available from 1952 onwards in both the men’s and women’s events. The numbers of competitors in these events have changed over time, so the Competitiveness Index gives the clearest picture (Figure 2).

 

Figure 2. 100m Sprint - Men and Women Competitiveness

 

Both men’s and women’s events have increased in CI over the years. The men’s event has maintained a clear margin, with a CI almost double that of the women’s event. There is some scatter in the results, and it is notable that, in spite of the positive trend, the men’s 100m in 1952 appears by this measure to have been almost as competitive as in 2004. The effect of the boycott is seen clearly in the 1980 Games, but less clearly in 1984.

 

100m Freestyle

Data since 1924 (Figure 3) shows competitiveness increasing at a faster rate than for the 100m sprint. This suggests that swimming has undergone a steeper development curve than sprinting. However, there is scatter in the data; it may well be that development in the sport has been slower since 1952 than it was before. Performances have also improved faster than in the 100m sprint or long jump: in the last 50 years, men’s performances have improved by about 5% in the 100m sprint (similar for women) and 10% in the long jump, but by 16% in the 100m freestyle (19% in the women’s 100m freestyle).

 

A striking feature is that competitiveness in the women’s event is much closer to the men’s than in the case of the track race. In fact, at a few Games the women’s event has been more competitive than the men’s. Surprisingly, these include the early Games of 1924 and 1932.

 

The boycott of 1980 appears to have severely damaged the competitiveness of both men’s and women’s events at that Games, more so than in the athletic events studied.

 

Figure 3. 100m Freestyle Swimming - Men and Women Competitiveness

 

 

The Measure of Greatness

The search for the most extreme individual z-scores can be extended across all Games for which complete performance data is available. The highest z-scores tend to be recorded by men, perhaps because they have historically outnumbered women competitors, often by a large margin. The search for extreme performances will be an ongoing effort; for now, this analysis will conclude with the highest z-scores recorded in the small number of events studied. As it happens, each of these performances has already earned lasting fame in the annals of the Olympic Games.

 

 

Event                                   z-score    Athlete, Games
Men’s long jump (1912-2004)             3.05       Bob Beamon, Mexico City 1968
Men’s 100m Sprint (1952-2004)           2.65       Bob Hayes, Tokyo 1964
Women’s 100m Sprint (1952-2004)         2.42       Wilma Rudolph, Rome 1960
Men’s 100m freestyle (1924-2004)        2.48       Jim Montgomery, Montreal 1976
Women’s 100m freestyle (1924-2004)      2.52       Dawn Fraser, Tokyo 1964