John McHardy Sinclair (ed.). How to Use Corpora in Language Teaching. Amsterdam: Benjamins, 2004. 299 + vii pp.
The last time I heard John Sinclair lecture, in 1996, his mood was not altogether buoyant. If memory serves, he said that the deployment of corpora would improve the teaching and learning of English worldwide, but that he did not expect it to happen. In the present volume, he recalls a ‘decade’ of people ‘trying to barricade the profession against the influence of corpora’ (p. 2), which was a predictable detritus of the academic politics of the 1960s and 1970s, when corpus-based descriptive linguistics was waspishly skewered and pushed aside by generative linguistics, a travesty of a science rejecting observed data in principle (Beaugrande 1998). As the barricade grew porous, the roster of ‘critical arguments’ grew steadily more tortuous and obfuscatory, seemingly not well understood even by their proponents (Beaugrande 2000). For example, the one that ‘no corpus can be a totally accurate sample of the language’ should apply far more severely to the handfuls of invented sentences in conventional textbooks on both linguistics and language-learning; the one that ‘occurrence in a corpus is no guarantee of correctness’ blots out the intertwined facts that most speakers of English are not all that concerned about being ‘correct’ and that would-be guardians of usage can’t agree about what that means anyway; and so on.
Eight years later, Sinclair can confirm that ‘the flurry of resistance is largely behind us’, and this book is a first-rate show of evidence, and not just for English. Whereas those of my generation redirected our careers when corpora became available because we had wanted them all along, even if it meant having our work rejected on windy pretexts by journal editors and their, erm, peer reviewers, a new generation has arisen whose careers have been firmly centred on corpus work from the start.
In the present volume, they represent a nice geographical spread: Italy, Hong Kong, England, Finland, Portugal, Switzerland, Hungary, Germany, the United States, and that space enveloping the itinerant Sinclair that is forever Scotland.
This new generation not merely points to concrete successes in curriculum and classroom, but initiates interested readers in ‘how to’ set up their own programmes and projects. For me, the range of outlooks and applications is highly impressive and laudable, not least because a number of them represent or resemble projects that have occupied my own cogitation and energies on various continents: student-oriented discovery learning (Bernardini); a variegated corpus of Hong Kong English, learner English, academic English, and spoken and written mass media as a base for on-line access by teachers with questions about usage (Tsui); an informative mismatch between spoken lectures and textbook lessons (Conrad); the authenticity and classroom usefulness of spoken corpora (Mauranen); a sophisticated text-to-sound corpus of international spoken Portuguese on 4 CD-ROMs (Santos Pereira); a sovereign overview of learner corpora, complete with an appendix of such projects with their website addresses (Nesselhauf); an in-depth study of the use of adverbial connectors in argumentative texts in English by advanced Hungarian students (Tankó); a mismatch between the uses of modal auxiliaries in corpus data and in EFL textbooks (Römer); a demonstration of the value of multiple and concurrent transformations of texts according to different views of a corpus (Barlow); a lesson on how to author your own concordance software (Danielsson); and a proposal for relating learner oral corpora to network-based language teaching for the purposes of data-driven learning (Pérez-Paredes).
The volume closes with a most welcome disquisition on some ‘radical questions’, which the seasoned Sinclair-reader may find untypical in two ways: it has fairly little illustrative data; and it comes as close as anything of his I’ve seen to becoming ‘philosophical’ (not a term he relishes). He probes, on a deeper level than the other authors, the ‘unsettling failure’ of corpora to ‘confirm the consensus view of language’ in most classrooms (p. 271). Although my own experience in classrooms in many countries leads me to surmise that a major obstacle is rather the lack of a consensus view, if not indeed a hesitation to formulate and sustain any explicit and consistent ‘view of language’, perhaps we could infer some implicit and hence unquestioned views, which have rendered me with my critical counter-views persona non grata in three successive jobs:
The ‘English language’ exists in just one ‘standard’ version worthy of being ‘taught’ and ‘learned’; English teachers, by virtue of their office, are qualified masters of that version and empowered to invent data; ‘teaching’ should rely on simplified, colourless examples, which students parrot back verbatim to avoid saying ‘anything wrong’; teachers have a reverent mandate to find and ‘correct’ every ‘error’ and to base evaluation on a simple count of ‘errors’; ‘grammar’ is neatly separated from lexicon (or ‘vocabulary’) and is far more essential; on more advanced levels, syntax, semantics, and pragmatics should be ‘taught’ — separately of course — just as they are handed over from ‘mainstream linguistics’.
I would also surmise that such views, being unable to withstand rational analysis, contribute more to the teachers’ defensive resistance against the use of corpora than does the officious ‘barricading’ pontificated by muckety-mucks in language education and applied linguistics. The sensation of acknowledging years or even decades of misapprehension and misguidance is understandably repugnant; and tangible proof of the advantages and successes, as well as user-friendly initiations, such as are offered in the present volume, must be prominently circulated on the ground.
However, Sinclair’s assessment reaches more deeply than this bundle of shallow views, into the roots of what forms a consensus among those whose business it is to contemplate and account for language. They largely originated, I believe, when the gap between theory and practice (and thus between ‘language’ and ‘actual speech’) was officially imposed early in the 20th century. The study of language was henceforth hamstrung by a conflict between the need for data to support even the airiest generalisations and the limbo-status of data created by disembodied theoretical bootstrapping (Beaugrande 1991).
Sinclair pinpoints several factors that have proven recalcitrant just because they have not been critically held up to the yardstick of authentic data. Semantics has been approaching ambiguity like a fire brigade that starts its own fires. Construing ‘meaning’ as a static object of analysis rather than a dynamic process of synthesis has created an infinite regress in search of the most basic ‘units’ or ‘features’ of ‘meaning’, such as ‘±Human’; yet ‘human’ is among the most complex and fluid concepts you might single out. Clocking 19,272 occurrences in the British National Corpus, it should be encased in a veritable beehive of ambiguities. But what the data reveal is on-the-spot differentiation, as Sinclair points out (p. 274). We can at least agree that each of these samples entails its own meaning of ‘human’ that is not confused with the others.
[1] The only human sexual rhythm is the female menstrual cycle (Supersense)
[2] Students are lazy, according to a survey. Almost a third admitted doing no work in an average day; […] ‘Students are only human, they're not a lot different from other people of their age’, said a Durham University spokesman (Today)
[3] Ruthless as he was, Khrushchev gave the Soviet Union a more human face. (The Fifties)
[4] The Smiths are the only human band in England. (The Smiths)
Surely no one would raise doubts about whether the devoted fan in [4] might mean the Smiths are the only menstruating band in England. Publicity stunts do have some, erm, human limits.
Even when a potential ambiguity is quite obtrusive or whimsical, it will be resolved by synthesis, not analysis, as in:
[5] Dad wants baby left on airliner (Daily Camera)
[6] Split Rears in Farmer Movement (Denver Post)
[7] Police chase winds through four counties (Courier News)
[8] Doctor Testifies in Horse Suit (Waterbury Republican)
Perhaps variation is not so much the ‘opposite of ambiguity’ (p. 272) as a process which controls ambiguity. When ‘Steffi’s No 1 fan’, one Donna Davidson of Northern Ireland, informs Tennis World that ‘Steffi Graf is human’ and ‘challenges any reader to prove otherwise’ (a safe move, that), I recognise a new variation of the ‘only human’ theme lamely trotted out in [2], as if not working most days were a trait of youthful humanity, and not the pang of unemployment in the world of ‘globalisation’. Steffi is among the hardest workers I can think of in the world of sports, and some might call her superhuman; but this fan, on the contrary, wanted to protest that she has her ‘problems’ like the rest of us but never uses them as ‘excuses’ for a poor show.
When pop singer Philip Oakey of the Human League exculpated his infidelity on the grounds that he was ‘only human’, ‘made of flesh and blood’ and ‘born to make mistakes’ (besides, he ‘needed someone to hold me, to fill the void while you were gone’), he was comforted by Joanne Catherall (or was it Suzanne Sulley?) that she ‘forgave’ him because ‘while we were apart I was human too’. Another variation on the same theme, and no ambiguity at all.
Variation is also a motor of creativity, a prominent factor in corpus work but rarely one in EFL. The captious mandate to preclude ‘errors’ encourages drastic limits upon what students are expected to say; they in turn conclude that being creative means entering a grey zone of uncharted risks. A solution implicated by Sinclair is to exploit the distinction between reception and production (p. 275). Students who command only one way to say something can be familiarised with understanding creative variations.
Sinclair’s example of ‘save + (possessive) + skin’ (p. 275) has a whole family of cousins, all using a signally unprestigious body-part as a metonymy for the whole person who gets through a tight spot. (The figures in parentheses report the occurrences on the Internet using the AltaVista Search engine.)
[9] 3 members have been hurt or have lost someone from riding w/o a helmet. Wearing one has saved my ass twice. (V4 Honda Message Board) (84,600)
[10] I have a complete set of these Husky Liners. They are marvelous and saved my butt a number of times on messy cleanups. (Carspace) (69,700)
[11] I'm using a belkin power authority just as surge protector. Has saved my bacon twice this year (Techspot) (53,000)
[12] Bujitsu has saved my arse on many a bad night (Martial Arts Community) (2,150)
[13] I have recently taken up inline skating. When attempting stairs, I found that using the handrail for security saved my rear end repeatedly. (Skating Stairs) (402)
[14] I'd suggest Sprite Backup. The Pocket PC version has saved my backside countless times! (SmartphoneThoughts) (281)
[15] a great way to defend against Trojans and Viruses is to use a software 'rollback' utility, [which] has saved my rump a few times now (spywareinfo) (63)
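For readers who would like to try this kind of pattern search themselves, here is a minimal sketch in Python, offered as an illustration and not as the method used for the figures above or the approach of any chapter in the volume. The sample lines are shortened from [9] to [15]; the names `SAMPLES`, `PATTERN`, `slot_fillers`, and `kwic` are hypothetical, and the regular expression is a deliberately crude approximation of ‘save + (possessive) + noun’ that hard-codes the two-word filler ‘rear end’.

```python
import re
from collections import Counter

# Sample lines shortened from examples [9]-[15] in this review.
# A real study would read lines from a corpus file instead.
SAMPLES = [
    "Wearing one has saved my ass twice.",
    "They are marvelous and saved my butt a number of times on messy cleanups.",
    "Has saved my bacon twice this year",
    "Bujitsu has saved my arse on many a bad night",
    "using the handrail for security saved my rear end repeatedly.",
    "The Pocket PC version has saved my backside countless times!",
    "[which] has saved my rump a few times now",
]

# Any inflection of "save" + a possessive + the noun slot.
# Assumption: the noun slot is one word, except the hard-coded "rear end".
PATTERN = re.compile(
    r"\b(sav(?:e|es|ed|ing))\s+(my|your|his|her|its|our|their)\s+(rear end|\w+)",
    re.IGNORECASE,
)


def slot_fillers(lines):
    """Count which nouns fill the slot after 'save + possessive'."""
    counts = Counter()
    for line in lines:
        for match in PATTERN.finditer(line):
            counts[match.group(3).lower()] += 1
    return counts


def kwic(lines, width=30):
    """Return simple keyword-in-context rows, with the match in brackets."""
    rows = []
    for line in lines:
        for match in PATTERN.finditer(line):
            left = line[max(0, match.start() - width):match.start()]
            right = line[match.end():match.end() + width]
            rows.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return rows


if __name__ == "__main__":
    for row in kwic(SAMPLES):
        print(row)
    print(slot_fillers(SAMPLES).most_common())
```

Run on a sizeable corpus instead of seven lines, the same counter would give a rough ranking of the variants, though a one-word noun slot will miss longer fillers and will also pick up literal uses such as ‘saved my file’.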
Far less prolific is the facetious threat to punish by making casual use of some part of the victim’s anatomy, conventionally as shown in [16]. The variation in [17] plays off a second collocation: people who ‘use their own head for a hat rack’ instead of for thinking [18]; I do not find any other attestations as a threat.
[16] By the third day I expect third-years to work alone, and if you slip up, gal, I’ll have your guts for garters! (Hospital Circlers) (5,620)
[17] A secretary whips away the remote. ‘Keep that handy’, I warn him, ‘or I’ll have your head for a hat-rack.’ (The Dyke & the Dybbuk)
[18] You can use your brain to make work simpler, or you can ignore it and use your head for a hatrack. (Build-in Windows power tools)
Terminology is a Pandora’s box if ever there was one. The ‘consensus view’ seems to be ‘you use your terms, I’ll use mine’. No doubt this attitude reflects the (above-cited) implicit views of English teachers being authorised sources of information and invented data; and of ‘teaching’ relying on simplified, colourless examples. The ensuing morass of defiant tinkerings with terms has favoured at least three pernicious brands of usage. One is uselessly vague [19]; one is hopelessly jargonised [20]; and one is flat out wrong [21].
[19] The noun is a word used for naming an object of thought; […] the verb is a word that makes a statement about something else. (Business English)
[20] As for the bare infinitive, it can be shown that in all of its uses it implies that the extra-verbal support’s place in time cannot be conceived as a before-position with respect to the infinitive’s event. (Internet)
[21] The definite article and the indefinite article are of course adjectives. (Business English)
If absurdities like [21] can pass into print unchallenged, publishers also seem to regard terminology as a matter of personal preference.
A deeper paradox lurks in the manipulation of data. Making English seem simpler than it is involves complicating the description with terms and devices to tidy up the data. Since people often do not speak in ‘complete’ or ‘well-formed clauses and sentences’, a zoo of ‘rules’ must be invented to construct ‘understood’ or ‘underlying’ patterns or ‘deep structures’.
As Sinclair points out, we have inherited an orthodox separation of ‘form’ and ‘meaning’, which implicates a further separation of ‘grammar’ and ‘lexis’ (pp. 276-77). The result is a prolixity of ‘grammatical’ terms that are supposedly aimed at form alone: ‘unable to reflect the usage yet not precise enough to avoid some reliance on the implied meaning’ (276). A survey by Greenbaum and Taylor (1981) found English teachers marking as ‘incorrect’ student data like ‘I am leaving tomorrow’ because it ought to have ‘future tense’.
Lexical terminology, in contrast, is in short supply, and is mostly ‘woven’ or ‘spiked’ into ‘grammar’ (277). My own impression is that a host of new terms would be burdensome and disputatious. Most of the constraints and regularities can be more accessibly described in quite ordinary language, such as Sinclair’s ‘nice’ and ‘nasty’ (285). For example, a learner of English should know that ‘things that set in’ ‘are usually not very nice’; in the BNC, I find ‘decline’, ‘decay’, ‘decadence’, ‘infection’, ‘meningitis’, ‘rigor mortis’, and so on.
English offers a number of face-saving phrases that mollify what one says when it might not be well-received, but I am not aware of any user-friendly term for them. Teachers and learners are better served with well-chosen examples, e.g., putting in question the ‘worth’ of an ‘opinion’:
[22] Harris tapped the report once again. ‘For what it's worth, I think this is wildly over-optimistic. I can't really go along with these performance figures’ (Foxbat)
[23] ‘My own opinion, for what it's worth, is that he's a closet gay.’ I said I didn't think her opinion was worth very much. (Darcy's Utopia)
The risk of being told it’s ‘not worth much’ is not great, according to the BNC.
Since incompleteness is ‘an inherent feature of descriptions’ (279), I cannot see it as a problem or a defect like the other three addressed by Sinclair. Ambiguity can be (and usually is) resolved, but the misdirected analysis common in semantics makes it seem ubiquitous and debilitating. Variation can be clearly registered without determining its range, e.g., the variety of ‘not nice’ things that ‘set in’. Terminology is a self-made obstacle bred of the free-wheeling, data-poor theorising that sometimes passes for ‘linguistics’, especially ‘syntax’. But I would argue that incompleteness is also ‘an inherent feature of languages’ in so far as individual users are concerned. ‘Fluency’ cannot ‘entail mastery of all the detailed and sensitive patterning that the corpus reveals’ (cf. p. 282); I rarely open the BNC without encountering surprises which lead me off in search of some previously unrecognised pattern.
Fluency is perhaps better defined as a critical mass of patternings whence the speaker or learner can reliably invest them in ‘discursive work’. And that mass can be attained only by accumulating plentiful and thoughtful observation of them doing their work, viz.:
corpus work […] can provide enough evidence and stimuli for the learner to arrive at developmentally appropriate generalisations, i.e., accounts that […] agree with the learner’s current language system (Bernardini, p. 17)
On another level, a corpus itself reaches critical mass when its size and diversity suffice to render it representative of the language, though of course such an assumption is precarious to justify in a specific case. This will necessarily exceed the critical mass of any fluent user, but the two masses must be intimately related, so that the user could prospectively appropriate insights from unfamiliar patterns in a corpus.
Here is where the distinction between native versus non-native learners signally obtrudes. The language learner needs to assemble a personal (mini‑)corpus by participating in discourse, and for a time intuitively absorbing patterns by imitating what is usually said to carry forward the discursive work at hand. The native has a massive advantage which, given some training and encouragement, a suitably compiled corpus can partially offset for the non-native.
This may not seem ‘revolutionary’ (p. 297) (another word Sinclair does not relish), but it could hardly be more different from business as usual in EFL, which feeds learners on dabs of inane English like ‘I hear with ears’ and ‘my brother is a boy’ (actual samples). That textbooks misinform and misrepresent, as some authors in the volume have shown, is not at all surprising when neither the compilers nor their editors check them against real data.
Now, the textbook industry is famously autocratic and profit-minded, and has a captive consumership that can be strung along with snazzy “new editions” of the same old bumph. But then such can be said of the dictionary industry too; yet as soon as the Collins COBUILD came out, publishers awoke from their drowse and rushed after corpora. Here too is a question of critical mass: as books like the present volume accumulate and their insights are translated into educational practice permeated with computers and yielding impressive results, the market will undergo a similar reorientation. Why? Because
corpus and corpus analysis provide one of the most powerful tools […] for classroom discovery learning activities […] designed to engage the learner’s interest, to be motivating, […] to provide opportunities for (in)formal interaction, to encourage the setup of relaxed atmosphere… (Bernardini 32)
Robert de Beaugrande
Università del Litorale
References
Beaugrande, Robert de. 1998. Performative speech acts in linguistic theory: The rationality of Noam Chomsky. Journal of Pragmatics 29, 765-803.
Beaugrande, Robert de. 2000. Corporate Bridges 'Twixt Text and Language: Twenty Arguments against Corpus Research And Why They're a Right Load of Old Codswallop. Posted at www.beaugrande.com.
Greenbaum, Sidney, and Taylor, John. 1981. The recognition of usage errors by instructors of freshman composition. College Composition and Communication 32, 169-174.