Function and form of reference works:
Setting a record straight for J. McH. Sinclair and the COBUILD
Robert de Beaugrande
[COMMENTARY: This was originally part
of a piece I wrote in defence of corpus linguistics which, like so many of its
fellows, was shot down by defensive reviewers. Sinclair had written a riposte
to a ‘book review’ of the COBUILD Dictionary by one Broder Carstensen,
only to have it also ‘reviewed’ and ‘rejected’ by the same journal. Here for
the first time his reactions are made public - RdB]
The field of lexicography has, rather understandably, not attracted
much interest in modern
linguistics. Among other reasons, the lexicon itself has been largely neglected
(Bolinger 1970), no doubt because it resists
idealisation much more than does grammar, and looks to Saussurians like another
‘heterogeneous mass’ (section 1). Saussure (1966 [1916]: 133) himself proposed
to ‘say that languages in which there is
least motivation are more lexicological, and those in which it is greatest are
more grammatical’. Chomsky (1965: 86f) in turn saw an ‘advantage’ in making ‘the lexical entries’ absorb
all the ‘idiosyncrasies’ and ‘irregularities of the language’. Such defensive
moves are predictable if the order of the lexicon, as I have described it,
could not be grasped by mainstream ‘theories’ portraying language as a static,
determinate system (sections 3 and 4).
Such linguists would have even less respect for the
slogging work of lexicographers. There you must produce and
publish a reasonably complete and final description, whereas the linguists can
busy themselves indefinitely with elaborate theoretical ratiocinations which
prescribe how a description should be done, but which provide just a few
episodic demonstrations (e.g. Chomsky) or none at all (e.g. Hjelmslev).
Lexicographers must plunge undaunted into the ‘mass’ of ‘speech facts’ that
those same linguists feel entitled to disregard; and must commit themselves in
print to formulations of multifarious grainy details about the size and
contents of the ‘vocabulary’ of the language and of the ‘meanings’, including
numerous ‘idiosyncrasies’.
In return, lexicographers have, just as
understandably, not been much devoted to the exposition of ambitious theories.
Most dictionaries contain brief prefaces dealing chiefly with practical
matters: how to look up items, how to interpret the symbols and abbreviations,
or how the dictionary was compiled. The publishers doubtless expect that the
general public would have little interest in the theoretical issues and
problems of lexicography, and would not bother to read extensive prefaces which
explore them.
Still, such issues can bear on the users in practical
ways, such as the sequential order of the definitions, which tends to be
understood as a ranking of importance, perhaps incorrectly. The preface of Webster’s Seventh of 1963 tells us that
‘the earliest ascertainable meaning is placed first’, on the grounds that ‘the
historical order is of especial value to those interested in the development of
meaning, and offers no difficulty to the user who is merely looking for a
particular meaning’ (p. 5a). Surely most users look up an item when they don’t
know its meaning, particular or otherwise; and are far less ‘interested in the
development of meaning’ than lexicographers would be. Besides, the earliest
meaning is also the most likely one to be out of date, as when the same dictionary
gives the first meaning for flamboyant
as ‘characterised by waving curves suggesting flames’ (p. 316), a use I have
never encountered, though I can see its etymological source in French. In
contrast, a corpus-driven dictionary like the Collins COBUILD can ‘take
the point of view of [a] user who encounters’ the item and ‘does not know much
about English etymology’ (Sinclair 1988: 13), and can give as the first meaning
‘someone who is flamboyant behaves in a very noticeable, confident, and
exaggerated way’ (p. 546), a use so up-to-date I heard it applied to myself by
a university commission only last week.
And yet lexicography, this unprestigious borderline field of language
research, gave the decisive impetus in propelling corpus research up into
orders of magnitude where broad and vague ideas began to take on clear shape
and fall into place. Perhaps that heritage will, at least secretly, be held
against the field of corpus linguistics by the idealisers of ‘language’. But
the field does manifest how a ‘theory’ can ‘arise from data’, no matter who
might behold there a ‘dreadful heresy’ (section 1).
Moreover, some applied linguists have indeed felt their ‘intellectual
identity’ and ‘security’ threatened, to borrow Widdowson’s terms (section 4). A
stringently unfavourable review published in one professional journal and
declaring the 1987 Collins COBUILD
English Language Dictionary unfit to be a reference work for learners of
English as a foreign language (Carstensen 1988) prompted Sinclair to write a
riposte, which the journal editor cannily treated as a new submission and sent
out to be rejected by obliging reviewers. I now find it appropriate to
publicise Sinclair’s side of the dispute because he lucidly expounded some
salient points about how the use of dictionaries can relate to learning English
as a non-native language.
These points bear partly on the role of the compilers, and partly on the
role of users. For compilers, Sinclair (1988: 3) pointed out that at least since the 19th
century, the traditional method for accessing language in lexicography has been
to collect citations, each being ‘a short
quotation, usually only a few words long, that has caught the attention of a
reader’. A corpus-based dictionary, in contrast, can access and compare large
numbers of uses of those commonplace words that seldom attract attention, such
as about, for which nothing in the Webster’s entry indicates the meaning of
‘phrasal verbs such as lie about, sit about, or mess about to show that someone is not achieving very much’ (COBUILD, p. 4). Some of these uses can
be culled from citations, but others are likely to get overlooked because they
are intuitively far from obvious, e.g., about
in saying what something ‘involves and what its aims are’ as in education is really about a search for
meaning (COBUILD) — in effect,
when your concept is about something
it may appear not to be about, e.g.,
when education seems largely meaningless to some children.
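The contrast with hand-picked citations can be made concrete with a keyword-in-context (KWIC) listing, the kind of display corpus workers typically rely on. What follows is only a minimal sketch, not the COBUILD team's software: the function name `kwic`, the `width` parameter, and the three-sentence sample (built around the uses of about quoted above, with one invented sentence) are my own assumptions for illustration.

```python
import re


def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of `keyword`."""
    lines = []
    # Whole-word, case-insensitive match so 'About' and 'about' both count.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


# Toy sample, not corpus data: two uses echo the COBUILD examples above,
# the last sentence is invented.
sample = (
    "Education is really about a search for meaning. "
    "They just sit about all day. "
    "Stop messing about and tell me about the plan."
)

concordance = kwic(sample, "about", width=20)
for line in concordance:
    print(line)
```

Even on so small a sample, the aligned column shows the commonplace word in several distinct patterns at once, which no single citation slip could do.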
Also, the corpus supplies current expressions that traditional compilers
with a bookish outlook would be unlikely to attend to. The corpus enables
informed judgements about usages that might count as ‘informal’, ‘humorous’,
‘rude’, ‘offensive’, and so on, such as boffin,
nutter, bumph, chinwag, phut, booze-up, snog, hink, and kerfuffle (COBUILD). A learner who encounters such refreshing and useful
expressions and turns to a traditional dictionary would conclude that they
don’t properly belong to English. At
the other end of the scale of usage, the traditional methods attach excessive
importance to obscure or specialised items. Lexicographers might justly argue
that such words are quite likely to send ordinary speakers to a dictionary. Yet
a problematic side-effect can be an antiquarian and acquisitive acceptance of
rarities like the Verb disembosom
reported by Clive Holes (1994: 174) in an English-Arabic dictionary published in
1987, ‘not marked by the compiler as in any way unusual or rare’ and, to judge
from the Arabic gloss, meaning ‘get it off your chest’, an expression ‘the same
dictionary significantly lists as “slang” (meaning, for the average Arab user,
merely “unacceptable”); at best, the inclusion of such archaisms is a waste of
space; at worst, coupled with the prejudice against non-literary usage, it
gives the user a false picture of the language as it is actually used.’
Sinclair (1988) also pointed out that the traditional
method of collecting citations can only register positive evidence of
occurrence, and not negative evidence for non-occurrence when an item has
passed out of use. Webster’s (pp.
295, 435) of 1963 still lists, again with no warnings, exsert (‘thrust out’) and inhume
(‘bury’), doubtless obsolete morphological counterparts for insert and exhume. Nor does the same dictionary give any warnings for detritus
whose morphological basis could be downright misleading, such as forgetive (‘inventive, imaginative’,
from ‘forge’, not ‘forget’) (p. 328) and disannul
(‘annul’, which should be its opposite) (p. 236). A corpus-based dictionary is in
no danger of considering such relics in the first place.
Again on the user’s side, I would heartily commend the user-friendly
innovation of the COBUILD to define
items in ordinary discourse, e.g.:
[1]
If you go spare or if something drives you spare, you become frantic with anger, irritation, or worry; used in
very informal English. (p. 1396)
Learners of English as a non-native language would naturally feel more
comfortable with this style than
with the staid sobriety of definitions in what we might call ‘dictionarese’
with stilted expressions like ‘of or pertaining to…’, ‘belonging to or relating
to…’, ‘any of several…’ (from Webster’s Seventh).
These factors concerning compilers and users evidently
cut no ice with the disgruntled reviewer of the COBUILD, who was staunchly
‘committed to traditional lexicography’ (Sinclair 1988: 1). His complaints about omissions — the obvious way to niggle
a corpus-driven dictionary — contradicted the orientation of a learner
dictionary rather than a reference dictionary. Objections were lodged against
the entry stipulating that the adjective glad
is not used in attributive position, whereas other dictionaries (e.g. the Longman Dictionary of Contemporary
English) list it with glad tidings
and glad news. Sinclair (1988: 5)
replied that such uses are extremely rare in the corpus, and he asked us to
consider
which is
the better description for learners: a dictionary that warns users against a
wrong usage such as a glad winner, a glad occasion, and a glad idea, or a dictionary that fails
to warn against these errors so that it can accommodate two old-fashioned, rare
expressions?
Or, perhaps the reviewer believed that foreign learners have an urgent
need for rare expressions, like the bookish academics at whom older
dictionaries were apparently aimed.
The same conclusion might be drawn from the irate
reviewer’s complaint about the omission of flip-flop
in the technical sense, of which the Bank of English had ‘one slightly doubtful
instance in 20 million words’ (Sinclair 1988: 6) because people rarely
discuss the micro-architecture or Boolean logic of electronic circuits any
more. He also complained about the absence of roger as a conventional affirmative on short-wave radio. The Bank
of English had no instances at all, though it had two examples of roger as a Verb in a sense which I do
not find even in the tolerant and trendy Random
House Webster’s and which few
English teachers would be keen to present. The Time Out Film Guide would keep you current, though: he’s rogering her to help her cheat on her husband (1993
edition, p. 130).
One more of Sinclair’s points deserves to be raised
here. Today at least, the most indispensable function of dictionaries is to
explain to the general public the specialised technical terms copiously
generated by science and technology. Traditional
dictionaries consigned the writing of definitions to insiders in the various
specialisations, on the assumption that their assessments should be the most
authoritative and representative. But the results reveal how inconsiderate
specialists may be toward non-specialists by writing definitions no less
obscure than the expression itself, as when cosecant
got defined as ‘the secant of the complement, or the reciprocal of the sine, of
a given angle or arc’ (Random House
Webster’s, p. 307). Here, an
even more peculiar and academic user is implied who has specialised knowledge
of a field but odd gaps in the vocabulary, e.g., who knows what secant and reciprocal of the sine
mean but has missed cosecant.
A corpus-driven dictionary can accommodate non-specialists far more
readily:
many words of technical
origin in current use have highly specific meanings which are not really accessible
to anyone who does not know the subject. They are explained, so to speak,
within a scientific or humanistic discipline. If we just wrote out the
‘official explanation’, our users would hardly be helped at all. (Sinclair et
al. 1988: xix)
The
corpus can be consulted to ensure that ‘the meanings given are the meanings
that are actually used in ordinary texts and not necessarily what a specialist
would say’ (ibid.). The contrast with conventional dictionaries is quite
instructive for definitions like these:
[2] gyroscope: a wheel or disc mounted to spin rapidly about an axis and
also free to rotate about one or both of two axes perpendicular to each other
and to the axis of spin so that a rotation of one of the two mutually
perpendicular axes results from application of torque to the other when the
wheel is spinning and so that the entire apparatus offers considerable
opposition depending on the angular momentum to any torque that would change
the direction of the axis of spin (Webster’s
Seventh, p. 372)
[3]
A gyroscope is a device that contains a disc rotating on an axis that can turn
freely in any direction, so that the disc maintains the same position, whatever
the position or movement of the surrounding structure. (COBUILD, p. 699)
[2]
was manifestly written by a specialist who was so solicitous to be technically
exact about all of the mechanical details of construction and operation that he
or she neglected to be readable. If you don’t know the technical term torque, and look it up, you find:
[4]
torque: something that produces or tends to produce rotation and torsion and
whose effectiveness is measured by the product of the force and the
perpendicular distance from the line of action of the force to the axis of
rotation (Webster’s Seventh, p. 934)
Quite
apart from the odd implication that ‘tending to produce rotation’ may somehow
differ from ‘producing’ it, I cannot imagine an ordinary user plugging [4] into
the already overloaded [2] and gaining a practical understanding of the meaning
of gyroscope. [3], in contrast,
highlights the salient point: the ability to ‘turn freely in any direction’ on
an ‘axis’ paradoxically enables ‘the disc to maintain the same position’ —
which [2] had obscurely portrayed as ‘offering considerable opposition’ ‘to any
torque that would change the direction’.
In sum, corpus-driven dictionaries present sound motives for optimism
about the revisions of language pedagogy made possible only by very large
corpora. We can finally determine on rational grounds which entries deserve to
be included at all, and which meanings are the most common or important ones,
even if they would not be intuitively judged the more ‘basic’ (let alone the
‘earliest’!). Reliability will steadily increase as the corpus grows.
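The two decisions just named, whether an item deserves an entry and which meaning comes first, can be sketched as simple arithmetic on corpus counts. This is an illustration under stated assumptions rather than the procedure the COBUILD compilers followed: the function names, the cut-off of 0.1 occurrences per million words, and the sense counts for flamboyant are all hypothetical. Only the figure of one instance of technical flip-flop in 20 million words comes from Sinclair.

```python
def per_million(count, corpus_size):
    """Normalise a raw count to occurrences per million words."""
    return count * 1_000_000 / corpus_size


def deserves_entry(count, corpus_size, min_per_million=0.1):
    """Inclusion test; the 0.1 default is a hypothetical cut-off."""
    return per_million(count, corpus_size) >= min_per_million


def order_senses(sense_counts):
    """List senses from most to least frequent in the corpus."""
    return sorted(sense_counts, key=sense_counts.get, reverse=True)


CORPUS_SIZE = 20_000_000  # the 20 million words cited from Sinclair

# Technical 'flip-flop': one doubtful instance in the whole corpus.
flip_flop_rate = per_million(1, CORPUS_SIZE)
flip_flop_included = deserves_entry(1, CORPUS_SIZE)

# Invented counts, for illustration only (not Bank of English data).
flamboyant_counts = {
    "waving curves suggesting flames": 1,
    "noticeable, confident, exaggerated behaviour": 40,
}
flamboyant_order = order_senses(flamboyant_counts)

print(flip_flop_rate, flip_flop_included)
print(flamboyant_order)
```

Ordering by frequency rather than by date is the design choice at issue: under these invented counts the behavioural sense of flamboyant would be listed first, whatever its etymological standing.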
References
Bolinger, D.L. 1970. ‘Getting the words in.’ American Speech 45: 78-84.
Carstensen, B. 1988. Review of Collins COBUILD English Language Dictionary. Neusprachliche Mitteilungen aus Wissenschaft und Praxis 1.
Chomsky, N. 1965. Aspects of the Theory of Syntax. Cambridge: MIT Press.
Holes, C. 1994. ‘Designing English-Arabic dictionaries’ in R. de Beaugrande, A. Shunnaq, and M.H. Heliel (eds.), Translation, Education, and Culture: An Arabic Perspective. Amsterdam: Benjamins, 157-190.
Saussure, F. de. 1966 [orig. 1916]. Course in General Linguistics (transl. Wade Baskin). New York: McGraw-Hill.
Sinclair, J.McH. 1984. ‘Naturalness in language use’ in J.McH. Sinclair et al. Lexis and Lexicography. Singapore: National University Press, 96-104.
Sinclair, J.McH. 1988. New Directions in English Dictionaries. Unpublished manuscript.
Sinclair, J.McH. 1991. ‘Shared knowledge’ in J. Alatis (ed.) Georgetown University Round Table on Languages and Linguistics 1991. Washington, D.C.: Georgetown University Press, 489-500.
Sinclair, J.McH. et al. 1988. Collins COBUILD English Language Dictionary. London: Harper Collins.
Sinclair, J.McH. et al. 1990. Collins COBUILD English Grammar. London: Harper Collins.