Joseph Greenberg’s comparative notebooks

Judith Kaplan
University of Pennsylvania

In John Webster Spargo’s 1931 translation of Holger Pedersen’s contribution to the genre of Disziplingeschichte, readers are introduced to a legion of mostly well-bearded men, marching toward the ‘discovery’ of the Comparative Method. Summing up his approach to nineteenth-century developments, Pedersen writes: “Evolution of method and expansion of material went on side by side, with constant reciprocal influence. But in the following treatment, the two sides of the growth of our science will be considered separately.”[1] This way of writing the history of linguistics—differentiating substrate from substance—has had enduring appeal, with emphasis given primarily to the latter. In what follows, I attempt to reunite the material and intellectual ‘sides’ of linguistic research, paying attention to the ‘constant reciprocal influence’ between Joseph Greenberg’s (1915-2001) comparative notebooks, on the one hand, and his inclusive genetic hypotheses, on the other.

Greenberg’s training, like his subsequent linguistic research, was staggeringly broad by contemporary standards. Not only was he an accomplished musician (see the previous hiphilangsci post for more on the productive exchange between music and linguistics), his work bore the imprint of early exposure to American structuralism, European structuralism, comparative-historical linguistics, logical positivism, and cultural anthropology as well.[2] Furthermore, Greenberg was committed to exploring intersections between (what came to be) linguistics and various allied disciplines, which provided crucial reinforcements to his own theories in return. As he wrote in the preface to Essays in Linguistics in 1957:

In the nature of things, problems as diverse as those dealt with here often have solutions which do not depend on one another. If there is any single point of view that runs through the whole, it is that further substantial progress in linguistics requires the abandonment of its traditional isolationism, one for which there was formerly much justification, in favor of a willingness to explore connections in other directions. The borderline areas most prominent in the present essays are those with logic, mathematics, anthropology, and psychology, but of course, others exist.[3]

Autonomy was won from philology and anthropology only to beg the question of their newly interdisciplinary significance. Despite such breadth of research outlook, Greenberg is best remembered for two contributions—to linguistic typology and genetic classification. Nor were these disconnected. As others have noted, Greenberg regarded the work of genetic classification as a necessary preliminary to typological analysis, in which he sought to identify universal phenomena through constraints on cross-linguistic variation (ranging over distinct family and areal groupings, once established), rather than cross-linguistic uniformity.[4]

My remarks in what follows will focus on three of Greenberg’s genetic studies—concerning the Indo-Pacific, Amerind, and Eurasian hypotheses—setting to one side his most celebrated work on the languages of Africa.[5] Each one of these cases represents Greenberg’s irrepressible research style. As he told Peter Thomas in a 1994 interview looking back on the African classification, “I’m attracted…to areas of the world in which classification has not yet been accomplished to people’s satisfaction. There are always new etymologies to be discovered…it’s very much like detective work.”[6] Just how was this detective work to be carried out? He continued,

In Africa…it seemed to me that the sensible thing was to actually look at all of the languages. I usually had preliminary notebooks in which I took those elements of a language, which, on the whole, we know are the most stable over time…I would look at a very large number of languages in regard to these matters, and I did find that they fell into quite obvious groupings.

Two aspects of this recollection stand out to me: Greenberg’s allusion to the controversial method of multilateral comparison, and his introduction of the comparative notebooks.

The method, of course, would become a lightning rod after the 1987 publication of Language in the Americas.[7] But it was anticipated already in a 1953 essay on the study of “Historical Linguistics and Unwritten Languages,” published in Alfred Kroeber’s encyclopedic project, Anthropology Today.[8] Revised and expanded for Essays in Linguistics, this relied on a two-fold justification of the “technique of group comparison of languages.”[9] The first point had to do with statistical power:

The likelihood of finding a resemblance in sound and meaning in three languages is the square of its probability in two languages. In general, the probability for a single language must be raised to the (n—1th) power for n languages…Hence the presence of a fair number of recurrent sound-meaning resemblances in three, four, or more languages is a certain indication of historical connection.

The second concerned the historical validity of a stepwise grouping procedure: “…it is not mere percentage of resemblances between pairs of languages which is decisive,” he wrote, “but rather the setting-up of restricted groups of related languages which then enter integrally into more distant comparisons.” The “effectiveness” of the procedure was asserted visually with the following table.


Fig. 1: From Greenberg, “Genetic Relationship Among Languages,” in Essays in Linguistics (New York: Viking Fund Publications in Anthropology No. 24, 1957), 43.
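
Greenberg’s first justification, quoted above, can be made concrete with a little arithmetic. The following is a minimal Python sketch, not anything Greenberg himself computed: it assumes that chance resemblances in different languages are independent of one another (the condition under which his exponent rule holds), and the pairwise probability of 0.05 is a purely hypothetical figure chosen for illustration.

```python
def chance_of_shared_resemblance(p_pair: float, n_languages: int) -> float:
    """Probability that a chance sound-meaning resemblance recurs across
    n languages, using the rule quoted from Greenberg: p ** (n - 1).

    Assumption (not stated in the source): chance matches in different
    languages are independent of one another.
    """
    if not 0.0 <= p_pair <= 1.0:
        raise ValueError("p_pair must be a probability between 0 and 1")
    if n_languages < 2:
        raise ValueError("a comparison needs at least two languages")
    return p_pair ** (n_languages - 1)


# Hypothetical chance of an accidental resemblance between two languages.
P_PAIR = 0.05

for n in (2, 3, 4, 5):
    print(n, chance_of_shared_resemblance(P_PAIR, n))
```

With three languages the exponent is two, which is the “square of its probability in two languages” in Greenberg’s wording; each further language multiplies the chance of an accidental match by the same small factor.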

This table reflects certain “affordances” and limits of using “paper tools” to work with language data—it is easy to see that rows and columns in the table above are duplicated to fit words of more than six and nine characters, respectively.[10] The material possibilities and constraints of managing data on paper played a part in Greenberg’s research practice as well. His comparative tables were inscribed within standard-issue narrow-ruled (e.g. Hammermill “eye-ease”) notebooks, forty-one lines from top to bottom. Though he would often defend the method of multilateral comparison in terms of the statistical power to be gained from detecting patterns across as many languages as possible, in actual practice, one could only ‘eyeball’ data from as many languages as would fit on a given notebook opening with forty rows.


Fig. 2: Cover of Eurasiatic Comparative Notebook. Joseph H. Greenberg Papers, Department of Special Collections, Stanford University Libraries: SC 0615, Accn. # 2001-126, Box 2.

Greenberg was evidently and justifiably proud of his unique solution to the problem of linguistic data management. He credited his notebooks directly in a seldom-recognized 1971 publication on the “Indo-Pacific” hypothesis, which argued for a new macro-family of the same name.[11] In it, he wrote:

Since 1960, I have assembled virtually exhaustive vocabulary information from published sources and a few unpublished ones…These were entered in 12 notebooks each of which contains places for upwards of 350 entries, many of which however, are obviously not available in the existing sources. Each notebook contains entries from 60 up to 80 language sources and is organized roughly by groupings of languages whose closer relationship to each other are evident on inspection. In addition detailed notes on grammar were copied in three additional notebooks and reclassified by historically significant rubrics in still another notebook.[12]

Greenberg was able to make data from sixty-to-eighty languages fathomable at a single glance by taping wings to the cover boards, front and back, on which they were listed; and by duplicating semantic field columns, left then right, for each opening. His manipulation of paper thus promoted what Lorraine Daston has referred to as the “all-at-once-ness of virtuoso perception” by making broad comparisons ready for a “mental picture recorded through the eye.”[13]

Elaborating on his procedure for television viewers in 1994, Greenberg recalled the path to Amerind:

I began to take the common words, write them down, so on, and look at them. And eventually, I put them into notebooks, and the notebooks are like the ones I have here, in which you have the names of languages down one side, and down the other. One can get eighty languages in a notebook like this. And across, I have various words in English for which we find translations in the American Indian languages. So, for example, on this page, after having finished putting the numerals in, I have the pronouns, so I have “I” and “thou,” the second person singular pronoun. But, the notebook is actually fairly extensive and contains hundreds of words…[14]

Though Greenberg’s approach was decidedly empirical, it is worth mentioning that the vast majority of his notebook entries were copied from the publications and fieldwork of other linguists. In this way, they reflect the “logic of distributed effort” that contemporary data curators have attributed to the sub-discipline of genetic linguistics.[15] The idea, simply put, is that linguistic fieldwork is too time- and resource-intensive for any one linguist to survey all the languages that would need to be compared. This has meant that there is more trust, greater intertextuality, and less verification in comparative-historical linguistics than in other data-intensive research traditions. Greenberg frankly acknowledged this state of affairs in prefacing Language in the Americas: “The present work is in many respects a pioneering one…” he wrote, “[a]lthough I have exercised great care, it would be miraculous if, in handling such a vast mass of material, there were no errors of fact or interpretation.”[16] Readers were invited to track citations through manuscript copies of the notebooks, which Greenberg deposited, anticipating open source ethics today, at Stanford University’s Green Library.[17] Even so, potential errors could be absorbed by the method of multilateral comparison itself. In one of the most controversial claims he made on behalf of the method, anticipating contemporary “data-driven” science, Greenberg asserted it “is so powerful that it will give reliable results even with the poorest of materials. Incorrect material should have merely a randomizing effect. If a clear pattern emerges, the hypothesis is all the more likely to be correct.”[18]

Greenberg traced this work back to “a paper given in 1956,” which kicked off the collection of “a truly massive amount of data.”[19] The numbers here matched those given for the Indo-Pacific hypothesis mentioned above, lending support to the notion that material constraints, in addition to family-specific considerations, gave structure to the language groupings under consideration. Greenberg continued:

The materials on Amerind and Na-Dene alone…occupy 23 notebooks…each containing information on approximately 80 languages, and each including, so far as the sources permit, a maximum of 400 items for every language. On a rough estimate, these notebooks encompass well over 2,000 sources and contain perhaps a quarter of a million separate entries.

The ‘items’ he collected roughly conformed to the Swadesh lists of “basic vocabulary,” standardized versions of which were also under development during the early years of Greenberg’s extensive note-taking.[20]
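
The figures Greenberg gives for the Amerind and Na-Dene notebooks imply an upper bound on what the paper grid could hold. The sketch below simply multiplies them out in Python; the variable names are my own, and the quarter-million figure is Greenberg’s “rough estimate,” so the resulting fill ratio should be read as an order-of-magnitude indication only.

```python
# Figures taken from Greenberg's own description of the notebooks.
NOTEBOOKS = 23
LANGUAGES_PER_NOTEBOOK = 80     # "approximately 80 languages"
MAX_ITEMS_PER_LANGUAGE = 400    # "a maximum of 400 items"
ESTIMATED_ENTRIES = 250_000     # "perhaps a quarter of a million"


def notebook_capacity(notebooks: int, languages: int, items: int) -> int:
    """Upper bound on the number of cells the notebooks could contain."""
    return notebooks * languages * items


capacity = notebook_capacity(
    NOTEBOOKS, LANGUAGES_PER_NOTEBOOK, MAX_ITEMS_PER_LANGUAGE
)
fill_ratio = ESTIMATED_ENTRIES / capacity

print(f"{capacity:,} possible cells")
print(f"about {fill_ratio:.0%} of them filled")
```

On these numbers the grid offers 736,000 cells, of which roughly a third were filled, which fits Greenberg’s qualification that items were entered only “so far as the sources permit.”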

As sweeping as it was, the classification presented in Language in the Americas was less ambitious than originally intended. Etymological treatment of the Eurasiatic family, which Greenberg believed was linked to the Eskimo-Aleut languages of North America, was reserved for a later volume, and occupied him until his death. The foundation of this project, a smooth extension of his earlier work on African, Indo-Pacific, and Amerind comparisons, was laid—again—in a series of manuscript notebooks, adapted for the purposes of surveying basic vocabulary for as many as eighty languages at once.


Fig. 3: Opening of Eurasiatic Comparative Notebook. Joseph H. Greenberg Papers, Department of Special Collections, Stanford University Libraries: SC 0615, Accn. # 2001-126, Box 2.

This opening shows a selection of Eurasiatic languages grouped by proximity (belonging, in order of appearance (top to bottom and left to right), to Uralic, Turkic, Tungusic, Mongolic, Chukotko-Kamchatkan, Eskimo-Aleut, Korean, Ainu, and Japanese). By working in pencil, Greenberg was able to erase and revise entries (as seen under the second Uralic group in the left-hand column). Across the top are the vocabulary items “morning,” “mosquito,” “mother,” “mountain,” “mouse,” and “mouth.” Significantly, these words exceed both the 100- and 200-item Swadesh lists (“morning,” “mosquito,” and “mouse” not appearing on either one), nor do they entirely overlap with entries in the Amerind etymological dictionary (“morning,” “mother,” and “mountain” absent here), suggesting that genetic and/or areal specificity superseded global comparability in Greenberg’s collection priorities to at least some degree. By manipulating the material affordances of standard narrow-ruled notebooks, Greenberg realized a key concept of multilateral comparison, to look at “many languages across a few words,” rather than “looking at few languages across many words.”[21]
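
For readers who think in terms of data structures, the layout of a notebook opening can be modeled as a small grid: languages down the side, glosses across the top, and one pencilled form per cell. What follows is a hypothetical Python sketch of that layout, not a transcription of any actual notebook. The class and method names are my own, the forty-row limit is taken from the description of the ruled pages above, and the language names and forms are placeholders rather than real data.

```python
from typing import Dict, List, Optional


class NotebookOpening:
    """Hypothetical model of one opening: languages as rows, glosses as columns."""

    def __init__(self, glosses: List[str], max_rows: int = 40) -> None:
        self.glosses = list(glosses)
        self.max_rows = max_rows  # ruled lines available on the page
        self.rows: Dict[str, Dict[str, Optional[str]]] = {}

    def add_language(self, language: str) -> None:
        """Claim a ruled line for a language; every cell starts out blank."""
        if language in self.rows:
            raise ValueError(f"{language} already has a row")
        if len(self.rows) >= self.max_rows:
            raise ValueError("no ruled lines left on this opening")
        self.rows[language] = {gloss: None for gloss in self.glosses}

    def enter(self, language: str, gloss: str, form: str) -> None:
        """Write (or, as with pencil, overwrite) the form in one cell."""
        if gloss not in self.glosses:
            raise KeyError(gloss)
        self.rows[language][gloss] = form

    def column(self, gloss: str) -> Dict[str, Optional[str]]:
        """Read one word down the page, across all languages at once."""
        return {language: cells[gloss] for language, cells in self.rows.items()}


# Glosses are those visible in Fig. 3; languages and forms are placeholders.
opening = NotebookOpening(
    ["morning", "mosquito", "mother", "mountain", "mouse", "mouth"]
)
for name in ("Language A", "Language B", "Language C"):
    opening.add_language(name)
opening.enter("Language A", "mouth", "<form 1>")
opening.enter("Language C", "mouth", "<form 2>")

print(opening.column("mouth"))
```

Reading a column rather than a row is the programmatic counterpart of scanning “many languages across a few words,” and the blank cells stand in for entries that the available sources did not supply.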

Greenberg’s notebooks were intimately bound up with his controversial method of multilateral comparison, and they imposed certain physical limits on its design and development (“storage” in this case having to do with the size and spacing of a sheet of ruled paper). Though decidedly manual and papery, they anticipate several issues currently facing the digital curation and archiving of descriptive language data—how to commensurate sources through a transparent data structure, how to facilitate broad and exploratory comparisons, and how to make sources open and available to peer review.


[1] Holger Pedersen, The Discovery of Language: Linguistic Science in the Nineteenth Century, trans. John Spargo (Bloomington: Indiana University Press, 1931), 13.

[2] William Croft, “Joseph Harold Greenberg,” Language 77 (2001): 816.

[3] Joseph Greenberg, Essays in Linguistics (New York: Viking Fund Publications in Anthropology, Vol. 24, 1957), v.

[4] Martin Haspelmath, “Preface to the reprinted edition,” in Language Universals: With Special Reference to Feature Hierarchies (Berlin & New York: Mouton de Gruyter, 2005 [1966]), vii.

[5] The latter initially appeared serially in seven papers published in the Southwestern Journal of Anthropology during 1949 and 1950.

[6] This and subsequent comments were aired on the PBS program Nova, part of a show titled “In Search of the First Language.” A full transcript is available at (accessed 10 April 2017).

[7] Joseph Greenberg, Language in the Americas (Stanford: Stanford University Press, 1987).

[8] Joseph Greenberg, “Historical Linguistics and Unwritten Languages,” in Anthropology Today, ed. A. L. Kroeber (Chicago: University of Chicago Press, 1953), 282.

[9] Greenberg, Essays, 40.

[10] Boris Jardine has written an excellent essay on these concepts, “State of the Field: Paper Tools,” forthcoming in Studies in the History and Philosophy of Science, C. In it, he explores the concept of material affordances, derived from the work of James Gibson (1979) and Tim Ingold (2007). The term and literature on “paper tools” stems from Ursula Klein’s work on Berzelian chemical classification (1999, 2001), informed by Bruno Latour’s approach to the study of scientific inscriptions (1986).

[11] Joseph Greenberg, “The Indo-Pacific Hypothesis,” Current Trends in Linguistics 8 (The Hague: Mouton, 1971): 808-71.

[12] Ibid., 808.

[13] Lorraine Daston, “On Scientific Observation,” Isis 99 (2008): 101.

[14] Transcript at (accessed 10 April 2017).

[15] Lewis, Farrar, & Langendoen, “Linguistics in the Internet Age: Tools and Fair Use,” in Proceedings of the EMELD ’06 Workshop on Digital Language Documentation: Tools and Standards: The State of the Art (2006). Available at E-MELD (accessed 12 April 2015).

[16] Greenberg, Language in the Americas, ix.

[17] On open source imperatives in the sciences, see e.g. Sabina Leonelli et al., “Sticks and Carrots: Encouraging Open Science at its Source,” Geo: Geography and Environment 2 (2015): 12-16. (accessed 10 April 2017).

[18] Greenberg, Language in the Americas, 29. On “data-driven science,” see e.g. Bruno Strasser, “Collecting Nature: Practices, Styles, and Narratives,” Osiris 27 (2012): 303-340.

[19] This was published as “The General Classification of Central and South American Languages,” Selected Papers of the Fifth International Congress of Anthropological and Ethnological Sciences, A. Wallace, ed. (Philadelphia: University of Pennsylvania Press, 1960), 791-794.

[20] Morris Swadesh, “Lexicostatistic Dating of Prehistoric Ethnic Contacts: With Special Reference to North American Indians and Eskimos,” Proceedings of the American Philosophical Society 96 (1952): 452-463; Idem, “Towards Greater Accuracy in Lexicostatistic Dating,” International Journal of American Linguistics 21 (1955): 121-137.

[21] Greenberg, Language in the Americas, 23.

How to cite this post

Kaplan, Judith. 2017. Joseph Greenberg’s comparative notebooks. History and Philosophy of the Language Sciences.

One comment on “Joseph Greenberg’s comparative notebooks”
  1. Pierre Bancel says:

    Dear Dr Kaplan,

    Thank you very much for your post. I am Greenberg’s translator into French (of his Eurasiatic book, vol. 1, a fantastic deed on many accounts) and am happy to see his name wherever I find it. I had heard of course of his notebooks but had never seen one until tonight.

    Best regards,
    Pierre Bancel
