Function and form of reference works:

Setting a record straight for J. McH. Sinclair and the COBUILD

 

Robert de Beaugrande

 

[COMMENTARY: This was originally part of a piece I wrote in defence of corpus linguistics which, like so many of its fellows, was shot down by defensive reviewers. Sinclair had written a riposte to a ‘book review’ of the COBUILD Dictionary by one Broder Carstensen, only to have it also ‘reviewed’ and ‘rejected’ by the same journal. Here for the first time his reactions are made public - RdB]

 

The field of lexicography has, rather understandably, not attracted much interest in modern linguistics. Among other reasons, the lexicon itself has been largely neglected (Bolinger 1970), no doubt because it resists idealisation much more than does grammar, and looks to Saussurians like another ‘heterogeneous mass’ (section 1). Saussure (1966 [1916]: 133) himself proposed to ‘say that languages in which there is least motivation are more lexicological, and those in which it is greatest are more grammatical’. Chomsky (1965: 86f) in turn saw an ‘advantage’ in making ‘the lexical entries’ absorb all the ‘idiosyncrasies’ and ‘irregularities of the language’. Such defensive moves are predictable if the order of the lexicon, as I have described it, could not be grasped by mainstream ‘theories’ portraying language as a static, determinate system (sections 3 and 4).

Such linguists would have even less respect for the slogging work of lexicographers. There you must produce and publish a reasonably complete and final description, whereas the linguists can busy themselves indefinitely with elaborate theoretical ratiocinations which prescribe how a description should be done, but which provide just a few episodic demonstrations (e.g. Chomsky) or none at all (e.g. Hjelmslev). Lexicographers must plunge undaunted into the ‘mass’ of ‘speech facts’ that those same linguists feel entitled to disregard; and must commit themselves in print to formulations of multifarious grainy details about the size and contents of the ‘vocabulary’ of the language and of the ‘meanings’, including numerous ‘idiosyncrasies’.

In return, lexicographers have, just as understandably, not been much devoted to the exposition of ambitious theories. Most dictionaries contain brief prefaces dealing chiefly with practical matters: how to look up items, how to interpret the symbols and abbreviations, or how the dictionary was compiled. The publishers doubtless expect that the general public would have little interest in the theoretical issues and problems of lexicography, and would not bother to read extensive prefaces which explore them.

Still, such issues can bear on the users in practical ways, such as the sequential order of the definitions, which tends to be understood as a ranking of importance, perhaps incorrectly. The preface of Webster’s Seventh of 1963 tells us that ‘the earliest ascertainable meaning is placed first’, on the grounds that ‘the historical order is of especial value to those interested in the development of meaning, and offers no difficulty to the user who is merely looking for a particular meaning’ (p. 5a). Surely most users look up an item when they don’t know its meaning, particular or otherwise; and are far less ‘interested in the development of meaning’ than lexicographers would be. Besides, the earliest meaning is also the most likely one to be out of date, as when the same dictionary gives the first meaning for flamboyant as ‘characterised by waving curves suggesting flames’ (p. 316), a use I have never encountered, though I can see its etymological source in French. In contrast, a corpus-driven dictionary like the Collins COBUILD can ‘take the point of view of user who encounters’ the item and ‘does not know much about English etymology’ (Sinclair 1988: 13), and can give as the first meaning ‘someone who is flamboyant behaves in a very noticeable, confident, and exaggerated way’ (p. 546), a use so up-to-date I heard it applied to myself by a university commission only last week.

And yet lexicography, this unprestigious borderline field of language research, gave the decisive impetus in propelling corpus research up into orders of magnitude where broad and vague ideas began to take on clear shape and fall into place. Perhaps that heritage will, at least secretly, be held against the field of corpus linguistics by the idealisers of ‘language’. But the field does manifest how a ‘theory’ can ‘arise from data’, no matter who might behold there a ‘dreadful heresy’ (section 1).

Moreover, some applied linguists have indeed felt their ‘intellectual identity’ and ‘security’ threatened, to borrow Widdowson’s terms (section 4). A stringently unfavourable review published in one professional journal and declaring the 1987 Collins COBUILD English Language Dictionary unfit to be a reference work for learners of English as a foreign language (Carstensen 1988) prompted Sinclair to write a riposte, which the journal editor cannily treated as a new submission and sent out to be rejected by obliging reviewers. I now find it appropriate to publicise Sinclair’s side of the dispute because he lucidly expounded some salient points about how the use of dictionaries can relate to learning English as a non-native language.

These points bear partly on the role of the compilers, and partly on the role of users. For compilers, Sinclair (1988: 3) pointed out that at least since the 19th century, the traditional method for accessing language in lexicography has been to collect citations, each being ‘a short quotation, usually only a few words long, that has caught the attention of a reader’. A corpus-based dictionary, in contrast, can access and compare large numbers of uses of those commonplace words that seldom attract attention, such as about, for which nothing in the Webster’s entry indicates the meaning of ‘phrasal verbs such as lie about, sit about, or mess about to show that someone is not achieving very much’ (COBUILD, p. 4). Some of these uses can be culled from citations, but others are likely to get overlooked because they are intuitively far from obvious, e.g., about in saying what something ‘involves and what its aims are’ as in education is really about a search for meaning (COBUILD) — in effect, when your concept is about something it may appear not to be about, e.g., when education seems largely meaningless to some children.

Also, the corpus supplies current expressions that traditional compilers with a bookish outlook would be unlikely to attend to. The corpus enables informed judgements about usages that might count as ‘informal’, ‘humorous’, ‘rude’, ‘offensive’, and so on, such as boffin, nutter, bumph, chinwag, phut, booze-up, snog, hink, and kerfuffle (COBUILD). A learner who encounters such refreshing and useful expressions and turns to a traditional dictionary without finding them would conclude that they don’t properly belong to English. At the other end of the scale of usage, the traditional methods attach excessive importance to obscure or specialised items. Lexicographers might justly argue that such words are quite likely to send ordinary speakers to a dictionary. Yet a problematic side-effect can be an antiquarian and acquisitive acceptance of rarities like the Verb disembosom reported by Clive Holes (1994: 174) in an English-Arabic dictionary published in 1987, ‘not marked by the compiler as in any way unusual or rare’ and, to judge from the Arabic gloss, meaning ‘get it off your chest’, an expression ‘the same dictionary significantly lists as “slang” (meaning, for the average Arab user, merely “unacceptable”); at best, the inclusion of such archaisms is a waste of space; at worst, coupled with the prejudice against non-literary usage, it gives the user a false picture of the language as it is actually used.’

Sinclair (1988) also pointed out that the traditional method of collecting citations can only register positive evidence of occurrence, and not negative evidence for non-occurrence when an item has passed out of use. Webster’s (pp. 295, 435) of 1963 still lists, again with no warnings, exsert (‘thrust out’) and inhume (‘bury’), doubtless obsolete morphological counterparts for insert and exhume. Nor does the same dictionary give any warnings for detritus whose morphological basis could be downright misleading, such as forgetive (‘inventive, imaginative’, from ‘forge’, not ‘forget’) (p. 328) and disannul (‘annul’, which should be its opposite) (p. 236). A corpus-based dictionary is in no danger of considering such relics in the first place.

Again on the user’s side, I would heartily commend the user-friendly innovation of the COBUILD to define items in ordinary discourse, e.g.:

[1] If you go spare or if something drives you spare, you become frantic with anger, irritation, or worry; used in very informal English. (p. 1396)

Learners of English as a non-native language would naturally feel more comfortable with this than with the staid sobriety of definitions in what we might call ‘dictionarese’ with stilted expressions like ‘of or pertaining to…’, ‘belonging to or relating to…’, ‘any of several…’ (from Webster’s Seventh).

These factors concerning compilers and users evidently cut no ice with the disgruntled reviewer of the COBUILD, who was staunchly ‘committed to traditional lexicography’ (Sinclair 1988: 1). His complaints about omissions (the obvious way to niggle a corpus-driven dictionary) contradicted the orientation of a learner dictionary rather than a reference dictionary. Objections were lodged against the entry stipulating that the adjective glad is not used in attributive position, whereas other dictionaries (e.g. the Longman Dictionary of Contemporary English) list it with glad tidings and glad news. Sinclair (1988: 5) replied that such uses are extremely rare in the corpus, and he asked us to consider

which is the better description for learners: a dictionary that warns users against a wrong usage such as a glad winner, a glad occasion, and a glad idea, or a dictionary that fails to warn against these errors so that it can accommodate two old-fashioned, rare expressions?

Or perhaps the reviewer believed that foreign learners have an urgent need for rare expressions, like the bookish academics at whom older dictionaries were apparently aimed.

The same conclusion might be drawn from the irate reviewer’s complaint about the omission of flip-flop in the technical sense, of which the Bank of English had ‘one slightly doubtful instance in 20 million words’ (Sinclair 1988: 6) because people rarely discuss the micro-architecture or Boolean logic of electronic circuits any more. He also complained about the absence of roger as a conventional affirmative on short-wave radio. The Bank of English had no instances at all, though it had two examples of roger as a Verb in a sense which I do not find even in the tolerant and trendy Random House Webster’s and which few English teachers would be keen to present. The Time Out Film Guide would keep you current, though: he’s rogering her to help her cheat on her husband (1993 edition, p. 130).

One more of Sinclair’s points deserves to be raised here. Today at least, the most indispensable function of dictionaries is to explain to the general public the specialised technical terms copiously generated by science and technology. Traditional dictionaries consigned the writing of definitions to insiders in the various specialisations, on the assumption that their assessments should be the most authoritative and representative. But the results reveal how inconsiderate specialists may be toward non-specialists by writing definitions no less obscure than the expression itself, as when cosecant got defined as ‘the secant of the complement, or the reciprocal of the sine, of a given angle or arc’ (Random House Webster’s, p. 307). Here, an even more peculiar and academic user is implied who has specialised knowledge of a field but odd gaps in the vocabulary, e.g., who knows what secant and reciprocal of the sine mean but has missed cosecant.

A corpus-driven dictionary can accommodate non-specialists far more readily:

many words of technical origin in current use have highly specific meanings which are not really accessible to anyone who does not know the subject. They are explained, so to speak, within a scientific or humanistic discipline. If we just wrote out the ‘official explanation’, our users would hardly be helped at all. (Sinclair et al. 1988: xix)

The corpus can be consulted to ensure that ‘the meanings given are the meanings that are actually used in ordinary texts and not necessarily what a specialist would say’ (ibid.). The contrast with conventional dictionaries is quite instructive for definitions like these:

[2] gyroscope: a wheel or disc mounted to spin rapidly about an axis and also free to rotate about one or both of two axes perpendicular to each other and to the axis of spin so that a rotation of one of the two mutually perpendicular axes results from application of torque to the other when the wheel is spinning and so that the entire apparatus offers considerable opposition depending on the angular momentum to any torque that would change the direction of the axis of spin (Webster’s Seventh, p. 372)

[3] A gyroscope is a device that contains a disc rotating on an axis that can turn freely in any direction, so that the disc maintains the same position, whatever the position or movement of the surrounding structure. (COBUILD, p. 699)

[2] was manifestly written by a specialist who was so solicitous to be technically exact about all of the mechanical details of construction and operation that he or she neglected to be readable. If you don’t know the technical term torque, and look it up, you find:

[4] torque: something that produces or tends to produce rotation and torsion and whose effectiveness is measured by the product of the force and the perpendicular distance from the line of action of the force to the axis of rotation (Webster’s Seventh, p. 934)

Quite apart from the odd implication that ‘tending to produce rotation’ may somehow differ from ‘producing’ it, I cannot imagine an ordinary user plugging [4] into the already overloaded [2] and gaining a practical understanding of the meaning of gyroscope. [3], in contrast, highlights the salient point: the ability to ‘turn freely in any direction’ on an ‘axis’ paradoxically enables ‘the disc to maintain the same position’ — which [2] had obscurely portrayed as ‘offering considerable opposition’ ‘to any torque that would change the direction’.

In sum, corpus-driven dictionaries present sound motives for optimism about the revisions of language pedagogy made possible only by very large corpora. We can finally determine on rational grounds which entries deserve to be included at all, and which meanings are the most common or important ones, even if they would not be intuitively judged the more ‘basic’ (let alone the ‘earliest’!). Reliability will steadily increase as the corpus grows.
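The selection and ranking just summarised can be pictured with a minimal sketch. It is not the COBUILD team's procedure, which the sources cited here do not spell out; it merely assumes that each sense of an item has been counted in a corpus of known size, that senses falling below some cut-off are left out, and that the rest are listed most frequent first. The function names, the cut-off of 0.5 per million words, and the counts for the ‘behaviour’ and ‘curves’ senses are hypothetical; only the figure of one instance in 20 million words comes from Sinclair (1988: 6).

```python
def per_million(count, corpus_size):
    """Occurrences per million running words of the corpus."""
    return count * 1_000_000 / corpus_size


def rank_senses(sense_counts, corpus_size, min_per_million):
    """Drop senses rarer than the cut-off; list the rest, most frequent first.

    sense_counts maps a sense label to its raw count in the corpus.
    The cut-off is an editorial assumption, not a published COBUILD value.
    """
    rated = [(sense, per_million(n, corpus_size))
             for sense, n in sense_counts.items()]
    kept = [pair for pair in rated if pair[1] >= min_per_million]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


CORPUS_SIZE = 20_000_000  # size Sinclair reports for the corpus he consulted

# The one sourced figure: the technical sense of flip-flop, one instance.
flip_flop_rate = per_million(1, CORPUS_SIZE)

# Hypothetical counts for two senses of flamboyant, invented for illustration.
flamboyant = {"behaviour": 40, "curves": 2}
ranked = rank_senses(flamboyant, CORPUS_SIZE, min_per_million=0.5)

print(flip_flop_rate)
print(ranked)
```

On these assumptions the technical flip-flop, at one twentieth of an occurrence per million words, falls well below the cut-off, and the historically earliest sense of flamboyant yields first place to the current one. Where the cut-off should lie remains an editorial decision that the corpus informs but does not make.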

 

References

Bolinger, D.L. 1970. ‘Getting the words in.’ American Speech 45: 78-84.

Carstensen, B. 1988. Review of Collins Cobuild English Language Dictionary. Neusprachliche Mitteilungen aus Wissenschaft und Praxis 1.

Chomsky, N. 1965. Aspects of the Theory of Syntax. Cambridge: MIT Press.

Holes, C. 1994. ‘Designing English-Arabic dictionaries’ in R. de Beaugrande, A. Shunnaq, and M.H. Heliel (eds.), Translation, Education, and Culture: An Arabic Perspective. Amsterdam: Benjamins, 157-190.

Saussure, F. de. 1966 [orig. 1916]. Course in General Linguistics (transl. Wade Baskin). New York: McGraw-Hill.

Sinclair, J.McH. 1984. ‘Naturalness in language use’ in J.McH. Sinclair et al. Lexis and Lexicography. Singapore: National University Press, 96-104.

Sinclair, J.McH. 1988. New Directions in English Dictionaries. Unpublished manuscript.

Sinclair, J.McH. 1991. ‘Shared knowledge’ in J. Alatis (ed.) Georgetown University Round Table on Languages and Linguistics 1991. Washington, D.C.: Georgetown University Press, 489-500.

Sinclair, J.McH. et al. 1988. Collins COBUILD English Language Dictionary. London: Harper Collins.

Sinclair, J.McH. et al. 1990. Collins COBUILD English Grammar. London: Harper Collins.