Large Corpora and Applied
Linguistics
H.G. Widdowson versus
J.McH. Sinclair1
[COMMENTARY: H.G. Widdowson
has repeatedly voiced unfounded criticisms of corpus research and of its
potential application to language teaching. And attempts to publish refutations,
including my own, have been blocked by editors and reviewers who support his
position or at least are anxious not to admit major changes into the landscape
of applied linguistics. So I am obliged to present the other side of the issues
here on my website.]
1. the large corpus and the language teacher
In 1991, a controversy arose at the Georgetown University Round Table on Languages and Linguistics during an interchange between Henry Widdowson and John Sinclair. After carefully analysing the two published papers and separately discussing the issues with each of the two linguists, I have concluded that their respective positions are closer together than the controversy might suggest. Widdowson seems to have argued from some positions which are not actually his, and attributed to his opponent some positions which are definitely not Sinclair’s.
A predictable crux of the
controversy was how corpus evidence might relate to the ‘competence’ of native
speakers on the one hand and to the needs of learners of English as a Foreign
Language (hereafter EFL) on the other. As a noted spokesperson for applied
linguistics in EFL, Widdowson (1991: 14) felt provoked by Sinclair’s typical
criticisms, and cited this one: ‘we are teaching English in ignorance of a vast
amount of basic fact’ (Sinclair 1985: 282). To be sure, Sinclair has not blamed
the teachers, but the sources they are offered, such as dictionaries, viz:
Teachers
and learners have become used to a diet of manufactured, doctored, lop-sided,
unnatural, peculiar, and even bizarre examples through which, in the absence of
anything better, traditional dictionaries present the language. It is perhaps
the main barrier to real fluency. (1988: 6f)
Nonetheless, Widdowson seemed indignant that
‘linguists’ who have debarred ‘discrimination against languages’ should practice
‘discrimination against ideas about language’; and that ‘linguists have no
hesitation in saying that certain ideas held by the uninformed commoner or
language teacher are ill-conceived, inadequate, or hopelessly wrong’, and in
‘rubbishing the theories of colleagues with relish in prescribing their own’
(1991: 11). By these tactics, each linguist’s ‘point of view is sustained by
eliminating all others, so that the diversity of experience is reduced in the
interests of intellectual security’ (1991: 11).
My own detailed studies of the
discourse of theoretical linguists (e.g. Beaugrande
1991) confirm Widdowson’s remarks. But we should make due allowance for the
fact that theoretical linguistics has been largely an enterprise for replacing
real language with ideal language existing nowhere except in some
‘linguistic theory’ (cf. Beaugrande 1997a, 1997b, 1998a, 1998b). In
consequence, the major resources for rationally adjudicating theories or models
become unavailable, and debaters merely contend that
‘my idealisation is better than yours!’ At that stage, ‘rubbishing the theories of colleagues’ and ‘eliminating’
other ‘points of view’ become prominent tactics.
The same mode of linguistics would
naturally shower ‘haughty disapproval, not to say disdain’ upon the attempts of ‘applied linguists’ to ‘appropriate’ its ‘ideas’, as Widdowson (1997: 146) has more recently
complained (see Beaugrande 1998b for a riposte posted on this website). This
posture is not just the ordinary casual ‘disdain’ of authentic experts for
ordinary people. It is the calculated defence of a sham expertise that could be severely imperilled by applications,
e.g., ones that would quickly debunk Chomsky’s (1965: 33) straight-faced denial
that ‘information regarding situational context’ ‘plays any role in how
language is acquired, once the mechanism’ — the ‘language acquisition device’ —
‘is put to work’ ‘by the child’.
So those earlier polemic tactics
ensued from replacing real language with ideal language, whereas the arguments
Widdowson was castigating here were being marshalled against this very replacement by Sinclair, as they have also been
by Pike, Chafe, Firth, Halliday, Hasan, Schegloff, Roy Harris, and many others.
Unfortunately, the reinstatement of real language at the rightful centre of
modern linguistics cannot be achieved without strenuous ‘discrimination against
ideas about language’ which really are
‘ill-conceived, inadequate, or hopelessly wrong’ but which have been enthroned
by linguists whose ‘theories’ must be sustained by ‘rubbishing’ the others.
And our own objective is just the
opposite of ‘reducing’ the ‘diversity of experience’ ‘in the interests of
intellectual security’; we are resolved to disrupt
the unearned ‘intellectual security’
of linguists, theoretical or applied, who have indeed ‘reduced the diversity of
experience’ of language and discourse and left us with a ‘trivial picture’ (Halliday 1997: 25).
Widdowson’s paper proposed a
contrast between the two positions. Whereas the one claims ‘objectivity’ and
‘correctness’ in ‘descriptions of language’, the other adopts ‘the relativist
or pluralist position on the nature of knowledge’:
The
principles of equality and objectivity are comfortable illusions. Descriptions
of language are not more or less correct but more or less influential, and
therefore prescriptive in effect. They tell us less about truth than about
power, about the privilege and prestige accorded to acknowledged authority. […]
We cannot any longer be sure of our facts. It is not a very comfortable
position to be in. (1991: 11f)
Despite the first person pronouns (‘us’, ‘we’),
Widdowson avoided committing himself to this ‘pluralist position’,4 but
he did imply that Sinclair opposes it by invoking ‘basic fact’ ‘about which
teachers were previously ignorant’ (Widdowson 1991: 12).
Widdowson then posed the rhetorical
question ‘what kind of fact is it that comes out of computer analysis of a
corpus of text?’ (1991: 12). Characteristically, he did not answer it here or
anywhere else in the paper by quoting a single ‘corpus fact’; at one point, he
speculated on the ‘relative frequency’ of specific words without ‘having any
evidence immediately to hand’ (1991: 17). Instead, he evoked the ‘distinction’
drawn between ‘externalised language’ versus ‘internalised language’ (1991: 12)
by none other than Chomsky, the linguist who has memorably taken the most
‘relish’ in ‘rubbishing the theories of colleagues’ whilst ‘prescribing his
own’. Moreover, Chomsky (1991: 89) has ‘doubted very much that linguistics has
anything to contribute’ to ‘teaching’, as Widdowson (1990:
9f) has elsewhere acknowledged even whilst rating ‘Chomsky’s position as
consistent with the position I expressed’ (but see below). The genuine
opposition is still between real language versus ideal language, which, I have
asserted, can seriously mislead the language teaching profession.
Widdowson (1991: 12-15) also invoked
a further series of oppositions or dichotomies we might do well to deconstruct.
These included ‘competence’ versus ‘performance’ (of course); ‘the possible’
versus ‘the performed’ (after Hymes 1972); ‘knowledge’ in ‘the mind’ versus
‘behaviour’ (Chomsky again); and ‘first person’ versus ‘third person
perspective’ (Widdowson’s own theme, e.g. 1997: 158f), which should not be
misconstrued as referring to the morphology of English Verbs. Sinclair was
reproached for conveying the ‘clear implication’ that the corpus is identical
with the language, and for excluding the first pole of each opposition whilst
allowing only for the second:
You
do not represent language beyond the corpus: the language is represented by the
corpus. What is not attested in the data is not English; not real English at
any rate. […] what is not part of the corpus is not part of competence. […] What
is not performed is just not possible. (Widdowson 1991: 14)
Against this supposed position of ‘the work of
Sinclair and his colleagues’, Widdowson quoted Greenbaum (1988: 83) that ‘the
major function of the corpus is’ ‘to supply examples that represent language
beyond the corpus’. But this position is just as much Sinclair’s, e.g.:
‘language users treat the regular patterns as jumping off points, and create
endless variations to suit particular purposes’ (Sinclair 1991: 492). His real
position should concur with the notion that the collocability
and colligability of the lexicogrammar of English are partly realised by the
collocations and grammatical colligations of discourse and partly innovated
against (Beaugrande 2000).
Sinclair was astounded to be stuck
in the straw-man realist position of ‘what is not attested in the data is not
real English’ and ‘what is not performed is just not possible’. If he held
those positions, he would stop expanding the corpus straightaway because
nothing more is ‘possible’ and because any differing data would be ‘not real
English’, whereas he has in fact insisted, at times to the dismay of agitated
project sponsors, that the corpus must be hugely expanded. He would also have
to assume that the sources of his corpus are the linguistic equivalent of the
sum total of all ‘possible’ sources, whereas he candidly asserts that a much wider
selection of spoken data would have already been included but for severe
problems of labour and expense.
The evolution of modern linguistics
proffers an ironic context for another one of Widdowson’s (1991: 13)
polarities: ‘Chomsky’s view is that you go for the possible, Sinclair’s view is
that you go for the performed’. By any realistic measure, Chomsky’s programme
has always gone for the impossible, advocating,
with tireless self-confidence, one project after another that never materialises
and never could — a ‘grammar’ that is ‘autonomous and independent of meaning’;
a solution to ‘the general problem of analysing the process of “understanding”’
by ‘explaining how kernel sentences are understood’; an account of how human
‘children’ ‘acquire language’ by ‘inventing a generative grammar that defines
well-formedness and assigns interpretations to sentences even though linguistic
data’ are ‘deficient’ (1957: 17, 92; 1965: 201); and more than I have
room to list here (for a thorough analysis of Chomskyan discourse, see now
Beaugrande 1998b).
Here we can look to Hjelmslev (1969 [1943]: 17) for the most striking formulation, this one
concerning the ‘possible’: ‘the
linguistic theoretician must’ ‘foresee all conceivable possibilities’,
including ‘texts and languages that have not appeared in practice’ and ‘some of
which will probably never be realised’. Easy enough to say once you decide (as
we saw Hjelmslev do) that ‘linguistic theory cannot be verified (confirmed or
invalidated) by reference to any existing texts and languages’.
Chomsky (1965: 25, 27) fulfilled
Hjelmslev’s vision in the most facile manner when he simply installed, by fiat,
just such a ‘theory’ in the ‘language acquisition device’ of the human child:
‘as a precondition for language learning’ the child ‘must possess a linguistic
theory that specifies the form of the grammar of a possible human language’
plus ‘a strategy for selecting a grammar’ by ‘determining which of the humanly
possible languages is that of the community’. This is definitely not the
position of Widdowson, who has firmly rejected the concept of ‘internalisation’
by means of a ‘universal Chomskyan language acquisition device’ (1990: 19).
The conception of the ‘possible’ is
too abstract to be very useful for language pedagogy anyway. Learners of
English as a non-native language produce many utterances which may not seem
possible to the teacher’s intuition, but, as I have noted, we are currently
finding new motives for doubting the reliability of intuition. Far more
relevant is what is or is not both ‘possible’ and ‘performed’ at the learners’ current stage of skills and knowledge,
since that is all we can realistically hope to build upon. There, we can
productively orient our approach toward large corpora of learners’ English, such as have been collected by Sylviane Granger at the University of Louvain (cf. Granger 1996) and by John Milton at the Hong Kong University of
Science and Technology (cf. Milton and Freeman 1996). Such data can also
systematically alert teachers and learners to typical problems such as language
interference.
Another of Widdowson’s polarities we
might deconstruct is the one between ‘knowledge’ in ‘the mind’ versus
‘behaviour’, the latter term perhaps reminding language teachers of
behaviourist pedagogy and Skinnerian behaviourism.5 But linking a
large corpus with behaviour and behaviourist methods would be flawed for at
least two reasons. The more obvious reason is that the behaviourist
‘audio-lingual’ method with its pattern drills and prefabricated dialogues was
based on mechanical language patterns more than on authentic data; it equated
language with behaviour in order to reduce language, whose relative complexity
it could not grasp, to behaviour, whose relative simplicity seemed ideal for
‘conditioning’, ‘reinforcement’ and so on; and the method was backed up by
heavy behaviourist commitments within general pedagogy and by the prestige and
authority of American military language institutes, where ‘drills’ are
literally the ‘order of the day’. Nor does Sinclair advocate a teaching method
whereby learners parrot back corpus data; on the contrary, he has expressly
counselled against ‘heaping raw texts into the classroom,
which is becoming quite fashionable’, and in favour of having ‘the patterns of
language to be taught undergo pedagogic processing’ (1996).
The more subtle reason is that
corpus data are not equivalent to ‘behaviour’ in the ‘externalised’ sense which
Widdowson’s polarities imply and which is often encountered in discussions of
pedagogy, e.g., when a ‘syllabus’ ‘identifies’ ‘behavioural skills’ (Sinclair
1988: 175). Instead, they are discourse,
and the distinction is crucial. External behaviour consists of observable
corporeal enactments, of which the classic examples in behaviourist research
were running mazes, pulling levers, and pressing keys. Discourse is behaviour
in that externalised sense only as an array of articulatory and acoustic operations,
or, for written language, of inscriptions and visual recognitions; and no one
has for a long time — certainly not Sinclair — proposed to describe language in
those terms, nor does a corpus represent language that way. When discourse
realises lexical collocability and grammatical colligability by
means of collocations and colligations, the ‘performed’ continually
re-specifies and adjusts the contours of the ‘possible’. In parallel, ‘knowledge’ in ‘the mind’ decides the significance of the
‘behaviour’. Sinclair’s true position is
that these operations are far more delicate and specific than we can determine
without extensive corpus data. Moreover, analysing corpus data is less
equivalent to observing behaviour
than to participating in discourse.
We thus move on to deconstruct
Widdowson’s polarity between ‘first person’ versus ‘third person’:
The
description of internalised language requires a first person perspective. You
really have no choice if you are seeking to prise knowledge out from the recesses
of the mind: knowledge which is not realised as behavioural evidence available
to the observer […] Corpus linguistics […] adopts the third person perspective
and only describes what can be observed, [and so cannot] reveal […] ‘member
categories’ […] of the speech community itself which account for their
intuitions about the language. (1991: 15)
On the contrary: corpus linguists can reveal the ‘member categories’ they themselves hold and apply as ‘members of the speech community’
sharing what Sinclair
(1991: 498) would call the ‘general acculturation’ of the intended ‘target
reader’. They too are deeply
concerned with ‘the pragmatic use of the language in the transaction of social
business’ and ‘the interaction of social relations’, which Widdowson would reserve
for ‘discourse analysis’6 while boxing ‘corpus linguistics of the
COBUILD kind’ into a ‘text analysis’ concerned only with ‘performance
frequencies’ (1991: 13).
Especially for data from public
sources, corpus linguists can readily adopt a first person perspective as
potential speaker (e.g., how I might use language
to stress national prosperity, [14]);
a second person perspective as a potential addressee (e.g., how I might react
to a discourse about national prosperity
when all I can see is isolated pockets of prosperity among the rich); and a third person perspective
(e.g., how the general populace might be persuaded by
such discourse to vote in the interests of the rich). All this resembles what
ordinary speakers and hearers do, and discourse analysts as well. Having plenty
of data can trigger intuitions that might otherwise lie untapped if you were
just trying to ‘prise knowledge out from
the recesses of the mind’, which sounds like shelling a stubborn walnut.
The ‘difficulty’ with ‘generative
linguists’ ‘acting as their own informants’ and ‘drawing introspectively on
their own competence’ is not just, as Widdowson (1991: 15) commented, that
‘they are also members’ of ‘the community of linguists with all its
disciplinary sub-culture of different and incompatible attitudes and values’. A
more severe difficulty is that these linguists have in effect disowned that
membership in order to arrogate to themselves the authority of the ‘ideal
speaker-hearer’. Thus, Chomsky has denied that the
(presumably real) ‘speaker of a language’ ‘is aware of the rules of the grammar
or even’ ‘can become aware of them’; so ‘a generative grammar attempts to
specify what a speaker actually knows, not what he may report about his
knowledge’ (1965: 8). By implication, linguists who assume the role of the
speaker are claiming, simply by virtue of holding an academic degree in
‘linguistic theory’, to command superhuman powers for ‘becoming aware of and
reporting’ what other speakers cannot. Presumably, the ‘kernel sentences’ they
invent, like “the man hit the ball”,
would in turn be perfect data; and these — or at least their ‘underlying’ order
or ‘deep structure’ — would be far more suited to represent ideal language than
real data would be.
So it would not be at all
‘disturbing for the claims of corpus linguistics if there were disparities
between’ ‘what people indicate they would
say in a given context’ and ‘what they actually do say in such contexts’ (Widdowson 1991: 17). Quite the contrary: compare Widdowson’s
(1991: 17) view that ‘the correspondence between what people claim they would say and what they actually do say cannot be taken on trust’ with
Sinclair et al.’s (1990: xi) view that ‘any such points emerging from a set of
constructed examples could not, of course, be trusted’. Sinclair does not
attribute this lack of ‘trust’ to people being ‘ignorant’ and ‘hopelessly
wrong’, which Widdowson (1991: 11f) suggests he does; the obstacle is simply
that many constraints upon what people say, as I have pointed out, only emerge
during the actual discourse — what people do
say and not just what they would say.
The final Widdowsonian polarity (one
I already cited) we might deconstruct is between ‘internal’ or ‘I-language’
versus ‘external’ or ‘E-language’ appropriated from Chomsky’s more recent work.
‘I-language’ in Chomsky’s own sense is quite irrelevant to Widdowson’s
argument, being a universal code which is common to all languages and which is
not accessible to the interventions of language teachers because it is
genetically and biologically installed and implemented in fine detail: ‘there is a highly determinate, very definite structure of concepts and
of meaning that is intrinsic to our nature, and as we acquire language or other
cognitive systems these things just kind of grow in our minds, the same way we
grow arms and legs’ (Chomsky 1991: 66). Moreover, when Chomsky now ‘speculates’
‘that there may be only one computational system and in that sense only one
language’, his ‘radically different’ ‘post-1980s theories’ have ‘no constructions;
there are no rules’, ‘that is, language-specific rules’ (1991: 81, 92). What
Sinclair wants to describe and Widdowson surely wants to teach would still be
‘E-languages’, which Chomsky (1986: 25) has shrugged aside as ‘epiphenomena at
best’. Similarly, if ‘the distinction
between I-language and E-language description refers to what aspects of
language are to be described’ (Widdowson 1991: 15), the description of a
Chomskyan ‘I-language’ would be utterly useless for
language teaching, which has to deal extensively with ‘language-specific rules’
and ‘constructions’; and, as noted, ‘I-language’ is not teachable at all.
Or again, Widdowson means something
quite different than Chomsky does, and their ‘positions’ are not so
‘consistent’ after all (see above). Besides, even if ‘I-language’ versus
‘E-language’ are informally taken to designate what speakers know of their
language versus what they say in the language, the distinction between the two
could not be the same for native language learning (or ‘acquisition’), where
extensive knowledge is indeed acquired without ordinary learning, versus
non-native language learning, where that same acquired knowledge needs to be
revised, often consciously, to accommodate knowledge of the non-native language
(Beaugrande 1997b). And the same distinction might be unstable and inconsistent
for the same speaker in different contexts and for different speakers in the
‘same’ context. The special qualities of corpus data indicate that this
instability and inconsistency are a natural reflex of the huge range and
variety of constraints emerging on the plane of the actual discourse
(Beaugrande 2000).
So the evolving dialectic between ‘possible’ versus ‘performed’ in Hymes’
terms, or between ‘I-language’ versus ‘E-language’ in Widdowson’s (but not
Chomsky’s) terms, or between Chomsky’s ‘competence’ versus ‘performance’ would
best account both for the ‘fluency’ language teachers seek to instil and for
the regularities in large corpus data. I can see no sound justification for cordoning
off the two sides of any of these polarities in order to insulate the
activities of teachers from those of corpus analysts, as Widdowson’s
reservations seem to suggest even whilst, a bit inconsistently, he is accusing
Sinclair of trying to discard the first term of each polarity.
In another source, Widdowson (1990:
18) has proposed yet another polarity between what language learners know
versus how they perform: ‘acquisition having to do with knowledge’ versus
‘accuracy having to do with behaviour’. The first term is problematic: for Halliday (1973: 24),
‘acquisition’ is a ‘misleading metaphor, suggesting that language’ is ‘property
to be owned’. The term was mainly promoted when generative linguists decided to
invent an account which was pointedly distinct from ‘learning’ — a distinction
later exploited by Krashen to discredit established methods of language
learning by reciting his airy claim that ‘learning cannot become
acquisition’ (e.g. Krashen 1985: 22, 24, 41, 48, 55) (see now Beaugrande 1997b). The second term is problematic too insofar as corpus
data indicate that many of the detailed decisions on the plane of the actual
discourse are properly determined not by criteria of ‘accuracy’ but by those of
‘appropriateness’ as defined by Hymes and cited by Widdowson (1990: 13) among
the criteria belonging to ‘E-language’, whereas Widdowson apparently consigns
‘knowledge of language’ to ‘I-language’; besides, criteria of ‘accuracy’ can
have the practical effect (noted below) of ranking conformity high above creativity. Perhaps we
might agree to distinguish instead between a person’s ‘language capability’ and
‘language achievement’; or between ‘known options’ versus ‘selected options’;
or between ‘available regularities’ versus ‘on-line decisions’.
A further polarity in that same
other source cited Bialystok and Sharwood-Smith’s (1985) ‘difference between
knowledge of language’ versus ‘the ability to access that knowledge
effectively’, with the implication that the ‘variation may either be because
these forms are tied in some way to a particular kind of context and so are not
freely transferable or because the second context imposes inhibiting conditions
which prevent learners from accessing and applying what they know’ (Widdowson
1990: 18). This position sounds reasonably compatible with Sinclair’s, since
corpus data are quite helpful for telling in fine detail which ‘forms are tied
to a particular kind of context’, and indeed suggest that such ‘tyings’ are the
rule rather than the exception, at least in English.
And precisely this fine detail may
be a submerged crux of the language teaching controversy, hinging upon an inclination of foreign language teaching,
and one Widdowson himself opposes, to ‘set a high premium on correctness’: ‘the
imposition of correctness’ ‘has the effect of inhibiting the learners’
engagement of relevant procedures for mediation acquired through an experience
of their own language’ (Widdowson 1991: 121, 124). Learners may arrive at the
intimidating misconception that there must be a ‘correct’ answer or ‘rule’ for
every detail, and may besiege the teacher to tell them what it is, as reported
by Kovačič (1998) for teaching English in Slovenia. This practice concurs only too well with a ‘linguistic theory’ wherein
‘language consists of a set of rules for the
combination of words into well-formed and meaningful sentences’ (Sinclair and
Renouf 1984: 76; cf. Beaugrande 1998b).
The crux would now revolve around the danger of
corpus research getting misinterpreted (to stay with Widdowson’s terms) as demonstrations
of the accurate things language
learners must say rather than the appropriate
things the learners should take as their framework of orientation for what
they say. Only then would the teaching and learning of EFL be saddled with the
doomed precept that ‘what is not
attested in the data is not real English’. If this be Widdowson’s
real anxiety, it would be heartily shared by Sinclair and his team, witness the
Collins COBUILD English Grammar ‘that
contains a lot of productive rules; these rules are not restrictive, they are not “do not” rules; they are “try this one” rules where
you can hardly go wrong’ (Sinclair 1991: 493; cf.
Sinclair et al. 1990: 493).
Moreover, the same anxiety might profoundly disturb
language teachers about large-corpus data if they viewed these as a colossal
compilation of fine-grained ‘prescriptions’ that must be ‘drilled’ into the
learners on top of the usual ‘grammar’ and ‘vocabulary’. Sinclair has on
numerous occasions espoused the opposite view, viz.:
More
adequate description will so organise the detail that it largely falls in line
with the meaning, and becomes easy, rather than difficult, to learn. If the
grammatical choices turn out in the main to be also lexical choices, then a
massive simplification can be expected. If on
top of that, grammar is seen as a springboard for creativity rather than as an
instrument of social discipline, the pleasure to teaching and learning can
increase enormously. (Sinclair
1991: 497)
These prospects reinforce the advocacy repeatedly
lodged in my own paper against the separating of ‘grammar’ from ‘vocabulary’, which
pulls the unity of the language apart. Francis and Sinclair (1994: 200) in turn
warn against ‘presenting learners with syntactic structures’ and ‘then
presenting lexis separately and haphazardly as a resource for slotting into
these structures’; ‘we should not burden learners with vast amounts of
syntactic information on the one hand, and lexical (“vocabulary”) information
on the other, which they then have to match according to principles which are
not naturally available to them as non-native speakers’.
Nor again should the relative frequency statistics in corpus data be misinterpreted as
the degrees of obligation for
teachers to prescribe and enforce the various usages. Such could be one
implication of Widdowson’s (1991: 20) reservation that ‘language prescriptions for the inducement of learning’ ‘cannot be
modelled’ on ‘the frequency profiles of text analysis’. He notes that language
teaching may have sound reasons for presenting data ‘because they are useful,
not because they are frequently used’ (1991: 20), and that artificially
simplified data would be fully admissible under this provision. Sinclair, in
contrast, would recommend simplifying language teaching by restricting the presentation
of artificial data, so that learners are not led to overgeneralise through not knowing
the authentic constraints. This recommendation is reasonably compatible with
some positions adopted by Widdowson elsewhere, e.g.:
there
is a great deal that the native speaker knows of his language which takes the
form less of unanalysed grammatical rules than adaptable lexical chunks. [they] are, of course, subject to
differing degrees of sentence modification. At one end of the spectrum, we have
fixed phrases that cannot be dismantled; at the other end, we have
collocational clusters which can be freely adjusted as sentence constituents.
[…] native speakers do not exercise the creative potential of syntactic rules
to anything like their full extent [and] indeed if they did so they would not
be accepted as exhibiting native-like control of the language [cf. Pawley and
Syder 1983]; anybody producing these syntactic variants of fixed idiomatic
phrases would nevertheless be adjudged incompetent in the language. (1989: 132f)
Here again, corpus data could offer language teachers
handy ways for estimating the status of their grammatical and lexical materials
along the parameter between ‘fixed phrases’ versus ‘collocational clusters’.
‘Widdowson’s point about unpredictable gaps in
corpora’ (Sinclair 1991: 493) needs further clarification too. Just as an array of choices in a corpus can, as a whole, be
highly improbable or even unique in a statistical sense,
there will be many arrays which do not
happen to show up in a corpus but which
could be readily produced and comprehended by native speakers of the language.
Yet insofar as these arrays are related to productive regularities that are implemented in the corpus data, they
do not properly constitute ‘gaps’. Sinclair (and I) would predict that in a
corpus of the size of the Bank of English, all of the really significant
productive regularities of English will be represented, but also that we will
always find ‘patterns for which there is some evidence, but insufficient to
make a conclusive case for significance’ (1991: 491). The gravity of this
problem should steadily recede as the corpus arrives at higher orders of
magnitude. At that stage, I would be surprised if we discovered regularities (not
specific wordings like “flip-flop” or “roger”) which are not represented in corpus data and yet are judged essential by teachers of EFL.
Clarification might be helpful once more when
Widdowson (1991: 18) asserted the ‘intuitive significance’ and ‘psychological
reality’ of ‘kernel sentences’, which ‘may not be authentic as units of
behaviour’, but which ‘are the stock in trade of language teaching’. As with ‘I-language’, Widdowson must be using the term
in a looser sense than Chomsky (1957: 106f), for
whom the ‘kernel of basic sentences’
must be ‘simple, declarative active with no complex verb or noun phrases’. By
this definition, the ‘stock in trade of language
teaching’ would be to feed learners on invented data like “the man hit the ball” and “the
cat sat on the mat”, but not “the next man at bat was hit by a knuckle ball”,
or “the striped cat continued to sleep on
the mat”, let alone innocuous authentic data like “the lion and the unicorn were fighting for the crown”, “black sheep, have you any wool?”, or “Polly, put the kettle on!” Surely
Widdowson meant simple sentences, and
only they have genuine ‘intuitive significance’ and ‘psychological reality’. He
may be concerned lest corpus data not be appropriately simple for the earlier
stages of EFL; but the regularities
most simply implemented in such sentences are of course represented in corpus
data as well.
Perhaps Widdowson would be content if a specially
selected corpus of appropriate data could be compiled to fit the levels of
simplicity he would recommend. At least, in a recent discussion (January 1997)
he approved of my proposal (elaborated in Beaugrande 1998a) to offer both
teachers and learners access to browse
through strategically selected and sorted ‘model corpora’, guided by
user-friendly walk-throughs. They could work together in exploring for themselves
not just contemporary English and other languages, but specific social,
regional, and registerial varieties of a language, including ones being spoken
as non-native languages in relevant pedagogical, academic, or professional
settings. Learners could also receive user-friendly rough-and-ready training
for working together in describing the
regularities they can find in the data. Here, I would advocate replacing
the traditional term and concept of rules,
which has accumulated far too much prescriptive and authoritarian baggage, with
the term and concept of reasons. The
replacement would be sound both on grounds of theory, because speakers
certainly do not follow ‘rules’ in the sense of either traditional or formalist
‘grammar’ for every choice they make but nearly always have ‘reasons’; and on
grounds of practice, because ‘rules’ carries disempowering connotations of
authorities, compulsions, violations, and punishment. Learners should be
reassured that they are basically ‘reasonable’ and deserve to know the
‘reasons’ why they should do or say things, and to have their own ‘reasons’
respected. Moreover, we would help to rebalance creativity with conformity,
since appropriate contexts supply good reasons to choose creatively on the
basis of a steadily more ‘delicate’ sensitivity toward the typical collocations
and colligations.
Browsing through a learner-oriented corpus on one’s
own pacing and initiative might finally eliminate much of the stress, anxiety,
and indifference fostered by conventional language teaching with its focus on ‘accuracy’ or ‘correctness’.
The learners who actively invest their creativity in discovering other people’s
‘reasons’ could thus gain substantial initiative and authority during the
overall process of learning, with a matching rise in interest and motivation as
compared to the passive, alienating, and mechanical application of ‘rules’ laid
down by teachers or textbooks.
A fascinating prospect would be to make the enterprise
cumulative. Advanced learners could guide the newcomers through the browsing
procedures and share their own results. Also, the total results could be
accumulated in a data base which could eventually serve to formulate the first
learner-generated grammar and lexicon in the history of language education. Such
a work would be an impressive implementation of the principle of learners
taking charge of their own learning processes, long advocated by democratic
educators like Paulo Freire (1985 [1970]).
Co-operative browsing might be an excellent activity
for dispelling the misunderstandings and anxieties language teachers may
harbour about large-corpus data. The misunderstandings I wish to dispel here
concern the positions attributed to John Sinclair. He by no means asserts that
any corpus, however large, equals the total or ‘real English’; or that the
‘performed’ equals the ‘possible’. What
he does assert is that the difference between those data and regularities which
are found in a very large corpus
versus those which are not should be
significant for people who purport to make authoritative statements in
textbooks or reference works about ‘real English’, especially when addressing
learners of English who will try to put the statements into practice. Sinclair
also asserts that the same difference is significant for the competence of
adult native speakers, who are likely to say combinations that are frequent in
the corpus and are unlikely to say combinations that are infrequent or do not
occur, although they certainly can
say the latter in appropriate contexts. Such speakers have an intuitive sense
of which combinations are common, sensible, useful, and so on, without at all
implying that others are ‘just not possible’ or ‘not real English’. Their ‘immediate intuitive response this is part of competence and of a well
ordered view of language’ (Sinclair 1996).
Furthermore, Sinclair asserts that
the data and regularities which do appear frequently in a large corpus should
be relevant and interesting for teachers and learners of English as a native
language and even more as a foreign language. And finally, he asserts that
taking corpus data into account could improve the quality of English world-wide
because non-native learners would have much more detailed models and targets to
aim for (Sinclair 1996).
2. conclusion and outlook
I have tried to explain why some major ‘revisions’ are
on the cards for both theoretical linguistics and applied linguistics, and why,
rather than ‘fearing for our future work’,
we may justly sustain some refreshing optimism. I
have suggested that many important problems facing our work in both theory and
practice have been artificially fostered by ill-advised moves to replace real
language with ideal language. A natural
and unfortunate consequence has been the symptomatic ‘antipathy to data’, which Sinclair (1997: 8) invokes, and which may now
mislead language teachers about the vital opportunities offered by finally
having access to vast amounts of authentic language data. We might ponder Sinclair’s (1994) allegory of the church authorities who refused
to look through Galileo’s telescope lest they see that the earth is not the
eternal centre of the universe; so also might language authorities refuse to
work with corpora lest they see that their ideal ‘language’ (or ‘I-language’) is
not the eternal centre of human ‘competence’ or the true sphere of ‘linguistic
universals’.
Most importantly, perhaps, large-corpus data can
provide a really effective counter-weight for the deeply ingrained insecurity
many speakers have about the real language they themselves produce, whether
native or foreign. Corpus data reveal how skilled ordinary speakers actually
are; and how the real language they produce is, as Sinclair (1991: 492) writes, ‘exhilarating creative, marvellously unpredictable,
wayward, unruly, quite incredibly productive’.
notes
1 I am deeply indebted to John Sinclair for discussing
a number of the issues raised here and for providing access to his Bank of
English terminal and to his unpublished materials. I also profited from
discussions with Henry Widdowson, Michael Halliday, Sid Greenbaum, Clive Holes,
Elena Tognini-Bonelli, Jeremy Clear, John Milton, and Nigel Turton.
2 Ironically, ‘langage’ was precisely Saussure’s term
for ‘speech’, as compared to ‘parole’ (translated as ‘speaking’)!
3 On these terms, compare already Firth (1968);
Greenbaum (1974). The term ‘preferences’ is elaborated in Louw
(1993); Sinclair (1994). Sinclair’s term ‘prosodies’ for ‘the
attitudinal meanings that emerge once you extend the phrase sufficiently far —
the point where the surface patterns of language give way to meaningful
choices’ (1996) could be
misunderstood as referring to intonation.
4 Yet Widdowson seemed a bit inconsistent later:
‘discourse analysts tend more and more towards the relativism’, and ‘to the
extent that they favour direct confrontation with actual data, they make common
cause with the text analysis of corpus linguistics’ (1991: 16). In my view, a
restrictive separation between ‘discourse analysis’ versus ‘text analysis’
hardly seems justified nowadays; but Widdowson might well think so (compare
Note 6).
5 Such could be one reading of Sinclair’s remark about
‘dealing with uncomfortable material’ ‘by tying it to a discredited
methodology’ (1991: 490).
6 Widdowson certainly has his own special views on what
‘discourse analysis’ should be (it was the topic of his unpublished thesis at
university) and has recently signed a contract to write a book about it.
references
Beaugrande, R. de. 1991. Linguistic Theory: The Discourse of Fundamental
Works. London: Longman.
Beaugrande, R. de. 1997a. New Foundations for a Science of Text and
Discourse. Greenwich, CT: Ablex.
Beaugrande, R. de. 1997b.
‘Theory and practice in applied linguistics: Disconnection, conflict, or
dialectic?’ Applied Linguistics 18/3:
279-313.
Beaugrande, R. de. 1997c.
‘On history and historicity in modern linguistics: Formalism versus
functionalism revisited.’ Functions of
Language 4/2: 169-213.
Beaugrande, R. de. 1998a.
‘Society, education, linguistics, and language: Inclusion and exclusion in
theory and practice.’ Linguistics and
Education.
Beaugrande, R. de. 1998b. ‘Performative speech acts in linguistic theory: The rationality
of Noam Chomsky.’ Journal of Pragmatics 29: 1-39.
Beaugrande, R. de. 1998c. ‘On ‘usefulness’ and ‘validity’ in the theory and practice of linguistics: A riposte to H.G. Widdowson.’ Functions of Language 5/1: 87-98.
Beaugrande, R. de.
2000. ‘Text linguistics at the millennium: Corpus data and missing links’. Text 20.
Bialystok, E. and M. Sharwood-Smith. 1985.
‘Interlanguage is not a state of mind: An evaluation of the construct for
second language acquisition.’ Applied
Linguistics 6/2: 101-117.
Chomsky,
N. 1957. Syntactic Structures. The Hague: Mouton.
Chomsky, N. 1965. Aspects of the Theory of Syntax.
Cambridge: MIT Press.
Chomsky, N. 1986. Knowledge of Language. New York:
Praeger.
Chomsky, N. 1991.
‘Language, politics, and composition’ in G. Olson and I. Gale (eds.) Interviews: Cross-disciplinary perspectives
on rhetoric and literacy. Carbondale: Southern
Illinois University Press, 61-95.
Francis, G. and J.McH. Sinclair. 1994. ‘I bet he drinks
Carling Black Label: A riposte to Owen on corpus grammar.’ Applied Linguistics 15: 190-200.
Freire, P. 1985
[orig. 1970]. Pedagogia do oprimido.
Rio de Janeiro: Editora Paz e Terra.
Granger,
S. 1996. ‘Learner English around the world’ in S. Greenbaum (ed.) Comparing English World-Wide: The
International Corpus of English. Oxford: Clarendon, 13-24.
Gougenheim, G. et al. 1956. Le
français fondamental. In L’Élaboration du français élémentaire. Paris: Didier.
Greenbaum, S. 1988. Good English and the Grammarian. London:
Longman.
Halliday, M.A.K. 1973. Explorations in the Functions of Language.
London: Arnold.
Hjelmslev, L. 1969 [orig. 1943]. Prolegomena to
a Theory of Language. Madison: University of Wisconsin Press.
Hymes, D. 1972. ‘On communicative
competence’ in J.B. Pride and J. Holmes (eds.) Sociolinguistics. Harmondsworth: Penguin, 264-293.
Kovačič, I. 1998. ‘Relating grammar to discourse, or: Can grammar
classes be like poetry classes?’ in R. de Beaugrande, M. Grosman, and B.
Seidlhofer (eds.) Language Policy and
Language Education in Emerging Nations: Focus on Slovenia and Croatia.
Greenwich, CT: Ablex.
Krashen, S. 1985. The Input Hypothesis:
Issues and Implications. London: Longman.
Milton,
J. and R. Freeman.
1996. ‘Lexical variation in the writing of Chinese learners of English’ in C.E.
Percy, C.F. Meyer, and I. Lancashire (eds.) Synchronic
Corpus Linguistics: Papers from the Sixteenth International Conference on
English Language Research on Computerized Corpora. Amsterdam: Rodopi,
121-131.
Pawley, A. and F. H. Syder. 1983. ‘Two
puzzles for linguistic theory: Native-like selection and native-like fluency’
in J. C. Richards and R. Schmidt (eds.) Language
and Communication. London: Longman.
Saussure, F. de. 1966 [orig. 1916]. Course in
General Linguistics (transl. Wade Baskin). New York: McGraw-Hill.
Sinclair, J.McH. 1985.
‘Selected issues’ in R. Quirk and H.G. Widdowson (eds.) English in the World. Cambridge: Cambridge University Press, 11-24.
Sinclair, J.McH. 1988. New
Directions in English Dictionaries. Unpublished manuscript.
Sinclair, J.McH. 1991.
‘Shared knowledge’ in J. Alatis (ed.) Georgetown
University Round Table on Languages and Linguistics 1991. Washington, D.C.:
Georgetown University Press, 489-500.
Sinclair, J.McH. 1994.
‘Large Corpora Are Here to Stay.’ Lecture at the University of Vienna, June
1994 (on video).
Sinclair, J.McH. 1996.
‘What Do We Know about Language, How Do We Get to Know It, and What Has All
That Got to Do with Language Teaching?’ Paper at the International Conference
on Analysis and Description: Applications to Language Teaching, at Lingnan
College and at the Hong Kong University of Science and Technology, June 1996
(on video).
Sinclair, J.McH. and A. Renouf. 1984. ‘A lexical syllabus for language learning’ in
J.McH. Sinclair et al. Lexis and
Lexicography. Singapore: National University Press, 75-95.
Widdowson, H.G. 1989.
‘Knowledge of language and ability for use.’ Applied Linguistics 10/2: 128-137.
Widdowson, H.G. 1990. Aspects of Language Teaching. Oxford:
Oxford University Press.
Widdowson, H.G. 1991.
‘The description and prescription of language’ in J. Alatis (ed.) Georgetown University Round Table on
Languages and Linguistics 1991. Washington, D.C.: Georgetown University
Press, 11-24.
Widdowson,
H.G. 1997. ‘The use of grammar, the grammar of use.’ Functions of Language 4/2: 145-168.