Part I. Managing data for a “grammar”

I.1    Writing a book in my own “breezy” style -- a designation once bestowed on it by Michael Halliday and Ruqaiya Hasan, whose works unknowingly started me on my road way back in the 1970s --

CHINA, 1996

[We were photographed in 1995 near Badaling by the "Great Wall", which Michael first visited, via steam train, back in 1949.]

to “publish” in my own manner on the Internet, is a welcome boon for a lively and engaging presentation, free from the general custom of “grammar books” to plod or preach. I shall earn no modest royalties for myself, nor immodest profits for publishers, yet this time around -- to borrow a phrase made unforgettable by Steve Biko, a martyr of “South African justice”, whose works lent me potent momentum in the 1990s -- “I write what I like”.[Note 1]

I.2    In fair return, you can accept or change what you like -- paraphrase, rewrite, and translate, or even delete whatever you disapprove of. Neither real English data nor the images it invokes are necessarily genteel, nor adapted to the more sensitive cultural mores of diverse societies where English is not the home or only language. And if you regard “grammar” as a solemn convocation best conducted with a patronising or magisterial scowl, you can delete my impish jokes, provided you recognise them.

I.3    My own social, economic, and political views are frankly evident here, now that I am free from pressures of editors and publishing houses controlled by octopussy “conglomerates” and mutated (“merged”) from tinpot manufacturers of biscuits, car radios, or cigarettes to producers of all commodities whatsoever, including paid-on-delivery “disinformation” in the discourse of mass media. In the 21st century, ostrich-headed silences or “Pilatous” hand-washings must count as complicity in the global disorder. And since grammar feeds into a bundle of social, economic, and political factors, to treat it otherwise overcoats it with a patina of ritual and shamanism queasily camouflaged as “discipline” and “formality”.

I.4    Since I am freed from argy-bargies with editors about typefaces, I have followed my own inclinations:

Italic for emphasis in my own texts;

Bold for special terms;

Underlined for items I wanted to highlight in data samples;

SMALL CAPITALS for stressed words or syllables; and

LARGE CAPITALS where the data themselves have them.

I.5    No doubt the highest priority for a “friendly grammar” is to be resolutely simple and clear -- a tall order for a subject-matter which is often neither of those, and which has been breathily befogged by centuries of murky, opinionated “explication” and “instruction”. Above all, its terms should be conscientiously explained and consistently applied; I decided to make them visually distinct as well by displaying them in CAPITALS and in “TIMES NEW ROMAN”, whilst the main text is in the usual “Arial”. This tactic may preclude confusions with ordinary uses, e.g., PRESENT and PAST as VERB TENSES versus “present” and “past” as expanses of time; or a SUBJECT in a CLAUSE versus a “subject” in a language curriculum. The terms also appear in BOLD TYPE in key passages where they are being expressly introduced, defined, or expounded. To locate those for review or reference, I have provided an Index according to the numbering of “Parts” and paragraphs.

I.6     Throughout it all, I have done my best to retain conventional terms as far as I judged them serviceable or saw how to render them so. Some, like “perfect”, “voice”, and “mood”, seemed uninformative and potentially misleading, and were replaced. Others, like NOMINAL and VERBAL, had to be retained but with careful specification of their uses. Still others, like DETERMINER and ENUMERATOR, had to be adapted from the recent reference grammars reviewed below in I.13 and II.26. And finally, some innovations had to be coined to enhance consistency and symmetry, such as STATEMENT TAG, COMMAND TAG, and EXCLAMATION TAG corresponding to the well-known QUESTION TAG (Part VI).

I.7    I fully appreciate the difficulty and resistance in changing terms and notions regarding “grammar”, whether among ordinary citizens or practicing English teachers or grammarians. Please believe that my own terms and notions have been deliberated and documented by data with all the thoroughness in my capacity. I present what I judged to be soundly reasoned, justified and certified by extensive evidence, of which I surveyed a vastly greater quantity than I have space to present here, often scanning hundreds of occurrences to single out a few among the most appropriate.

I.8    In this spirit, unless otherwise indicated, my data are authentic; I lifted no invented data from “grammar books”, “grammar websites”, “style guides”, and such like. “Authentic data” are attested in actual usage among people who were not acting the role of “grammarians”. I drew principally upon two corpora (or “corpuses”, which sounds too much like “corpuscles”). The British National Corpus (hereafter BNC, where appropriate with superscript BNC), installed in software called SARA (it’s an acronym within an acronym, so better not ask), offers 100 million words of mainly contemporary “British” usage in the broadest sense, ranging across the entire “United Kingdom” along with some outlying islands, and combining not just written English in such domains as novels, biographies, journals, newspapers, guidebooks, business reports, school essays, and disquisitions on topics like philosophy, religion, and health, but also an invaluable subset of 10 million words of transcribed spoken English in television news scripts, managerial meetings, council meetings, legal or medical consultations, university lectures, sermons, and ordinary conversations.[Note 2] I was delighted to find oral data from regional sources as well, like the Nottingham Oral History Project, the Suffolk Sound Archive, and the Orkney Sound Archive. Though I did not cover them in any detail, large data sets are becoming available on local Englishes in former “colonies” like India, Malaysia, Hong Kong, Nigeria, Kenya, and so on; and on such nation languages as those of Jamaica, Barbados, and Belize.
Some of these are gathered in corpora by the International Corpus of English (coordinated by Gerald Nelson at University College, London), which as of March 2002 (latest homepage update) lists teams in Great Britain, Hong Kong, East Africa, India, Singapore, the Philippines, and New Zealand.[Note 3] So far, I am not aware of any projects for a comparative synthesis of their corpus-driven “grammars”, a daunting prospect in any case.

I.9   As if aspiring not to be outdone, the Yanks (also known as “Americans”) [Note 4] have tardily announced an American National Corpus (ANC) of 100 million words, based on an update of the same software used by the BNC, hosted by the University of Pennsylvania. As of October 2006, two small “releases” have been posted on the website, while the project itself is being announced for 2007. A small obstacle impends: to use the corpus for “reference works, textbooks, and software”, you must join their “Consortium”, which requires an annual fee of US$40,000! No comment is necessary (or printable) as to why I haven’t been using it.

I.10   My own English Prose Corpus, installed in software called WordPilot® and profoundly indebted to Project Gutenberg, offers roughly another 100 million words,[Note 5] including most of the so-called “classic” or “popular” published works in mainly “British” and “American” literature, philosophy, (auto)biography, history, politics, science, economy, geography, exploration, and folklore of “legends” and “fairy tales”, over the past three centuries up to the profit-minded cut-off by “International Copyright Law” -- plus all of Shakespeare's plays and the entire King James Bible. During my more recent periodic upgrades in size, I tried my hand at accessing the swelling resources of Australian, Canadian, Irish, South African, Native American, African American, Jewish, and Feminist data. I have also utilized my Drama Corpus, which I hope is reasonably representative; and only occasionally my Poetry Corpus, which is more symbolic than representative.

I.11    Even at a total of around 200 million words, my two corpora leave all too many blind spots, so I have often turned to the Internet, which proffers the advantages of diversity and openness, and where contributors need not be -- and, in many I observe, plainly are not -- self-conscious or apologetic about their usage; contact is their objective. Large corpora can already impose tedious travails, yet the Internet far more, at least with current software: leaky searches let target data slip through; noisy searches present the target data in a wash of irrelevant but superficially similar data; and bulky searches toss up hopelessly copious quantities (thousands or millions) which may or may not be target data. Any of these may trap us in the dizzying work I have heard called in computer circles brute force -- trudging through dodgy wodges of data and hoping for lucky finds. So, stating which data “are frequent in English” is a risky move; stating which “do not occur at all” is worse. I can only report what I did or did not find, without any aspirations toward completeness or finality.

I.12   Under such circumstances, I must proceed within some boundary assumptions. I notice myself saying “plausibly” more often than “good style” might prefer, simply because “grammar” is not a field of certainties, validations, and proofs. For my part, I can at least police my own usages as author: if I have felt unsure about whether people (still) say what I would, I have queried the BNC or the Internet, and sought to tap the opinions of the language community. If I do treat myself to occasional echoes from Shakespeare, Milton or such like for brisk variety and entertainment value, then always in contexts where, if not recognised, the meanings should still be evident. As for usages I present as instances or samples of English, if one is nowhere attested in my corpora nor on the Internet, I feel justified in disregarding it unless it serves as a counter-example, signalled with ? or ??? for (very) “doubtful”, or with * for “nowhere found” (by me, you see); if I find it “rare” or “uncommon”, I say so. If an attested usage from some older or regional usage might seem obscure to less fluent or non-native users of English, I provide a rough-and-ready “translation” in [square brackets].

I.13    Regarding which issues might merit description, I have thoroughly familiarised myself with A Grammar of Contemporary English (CGE) and its thoughtful revision, A Comprehensive Grammar of the English Language (hereafter CGEL); plus the Collins COBUILD English Grammar (hereafter just COBUILD). I’ll say a bit more about them later (cf. II.26ff); here I would merely justify my confidence by their basis upon authentic data and by their laudable intent to be “comprehensive”. If an issue is not treated in any of these, I feel safe in assuming it has not been treated in the turgid Sargasso Sea of past grammar-books since Elizabethan times, or else has been firmly discredited in the interim.  In exchange, I shall be reporting some issues and usages which I have found to be authentic, yet which have apparently been missed by my sagacious predecessors.

I.14  For convenient reference, my examples are numbered in square brackets; that way, I can spare obtuse phrasings like “the second of the three examples given two paragraphs back”. Alternative versions of the same example are identified with lower-case letters after the numbers, e.g., [1] being shown in several variations as [1a], [1b], [1c] and so on. To highlight items inside the samples, I use underlining, which is the easiest to spot, especially for short bits like PRO-NOUNS.

I.15   Again for convenient reference, each paragraph bears its own number printed off in the left margin. These provide a precise means for cross-referring from one section to another, as is frequently needed in describing “grammar”, with its myriad interconnections.

I.16  Footnotes appear far more sparingly than in my recent New Introduction (2004), and mostly where I thought there might actually be occasion to follow up a reference. They appear as “[Note 1]” and so on, because tiny “superscript” numbers can get uncharitably mangled on the Internet. To keep the presentation uncluttered, I put the Notes at the end of each uploaded section.

I.17  To formulate my beginning, I stay clear of the convention of jumping right into the “grammar” itself with a staid opener like “The parts of speech in English grammar are these:”; or “The English sentence is composed of subject and predicate”. I feel allowed, indeed obligated, to first explore some weighty issues and problems pervading the term “grammar” in its various applications, as well as the “teaching” and “learning” of a subject-matter. Insofar as conventional approaches have been ineffectual in the past, we evidently still need to modify and deepen our understanding of what “English grammar” is about and how it might be made more “friendly” for all concerned.

I.18  By following and reporting the flows and currents in my authentic data, I am released from the practices of inventing data out of my own “intuition”; or of posing as a “great expert” who knows what’s what without even consulting data. Here, I must in some manner re-invent the role of the author (many editors and publishers spurn that, too), since my own voice is but one in a many-headed multitude. In the BNC, I “talk back” to omnivorous periodicals like Today, the Independent, and the Guardian; trendy ones like Sky, The Face, and New Musical Express; sober ones like New Scientist, Practical Fishkeeping, and the British Journal of Medicine; stilted, verbose rhetoric from the House of Commons via Hansard; well-meaning guidebooks and pamphlets on every aspect of life, as if the dear Brits can’t manage anything for themselves, not even how to be fuddy-duddies; and modern literature, some high enough to imbue the wide world with intellectual sense and soul, but some low enough to appease the pubescent sensationalism or voyeurism with formulaic fables fantastical, apish, shallow, inconstant, full of tears and smiles, such as Mills & Boon. In the EPC, I engaged with astute writers who made over the English language to suit themselves, like Shakespeare, Dr Johnson, Addison and Steele, Austen, Dickens, or Joyce; but also pioneers in the frontier life of Australia like Henry Lawson and Rolf Boldrewood, or of Canada like Ralph Connor and Tekahionwake; but also visionary advocates of social reforms as yet unrealised, like Mary Wollstonecraft, William Godwin, and W.E.B. Du Bois; heart-warming humorists like Jerome K. Jerome, Stephen Leacock, and P.G. Wodehouse; intrepid tellers of Native American tales and legends like Ohiyesa, Owindia, and Zitkala-Sa; and writers who were legends themselves, like Sojourner Truth, Harry Houdini, and Sarah Bernhardt. And in their wake parade the myriad characters portrayed or created, like a procession of endearing acquaintances.
So grand a family of voices must have something to reveal about the “grammar of English”, beyond all reach of any lone grammarian’s intuition or invention.

I.19  Which is why all of my presented data are at best a highway of road signs. A lifetime of experiences around the globe has convinced me of the impracticality of “teaching English” and “learning English” as subject matters cordoned off in the schools. Our task cannot realistically be to dispense our own individual “knowledge of English” into the minds of learners like empty containers being filled to the brim; all too often, the results are disappointing. The more successful learners are those who go foraging for the language out of class. Our task should rather be seeking to kick-start this diligent enterprise, turn them loose on all the data they can get, and help them along with guidance and demonstrations of how to dig and explore, what to look for, and how to make workable use of it.

I.20  Which is also why I have not included “Exercises” using data samples of my own choosing, even though I hope this cyber-book may be useful for courses in schools and colleges. Learners can construe my presentations as “demos” of what they can do in gathering and interpreting data in order to make their own presentations, thus assuming the role and authority of “local experts” in the eyes of their peers -- a rare and stimulating experience in education on any level. The “grammar” is best understood by those who can seek out further examples of the usages and patterns I describe.

I.21  To make the Grammar come more to life, I have introduced illustrative cameo visuals (carefully restored as appropriate in PaintShop®) of interesting folks. After all, GRAMMAR is for people, by people, and about people. My choices of people were serendipitously steered toward notables who I thought could also become engaging topics of student discovery projects for class presentation and for inclusion in a “communal workbook”.

I.22  Regarding potential copyright issues, I would cite current legislation, such as Section 107 in Title 17 of the US Copyright Law on “fair use”:

[…] reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright.

My own use of Internet imagery is certainly not commercial, since my “grammar” is a non-profit venture if ever there was one.

I.23  More generally, a narrowly economic and proprietary outlook belies the essential spirit of the Internet and its founders. The Internet is the most public domain we have ever had, and attempts to squeeze and filter people who need information constitute rapacious abuses. For my part, I have repeatedly gone on record in my writings that knowledge, the foremost commodity you can share with gain rather than loss, carries the obligation to do so everywhere and for everyone, expressly working against social or economic restrictions upon so-called “intellectual property”. What’s mine must be yours as well, or it will be nothing.

I.24  Some years ago, textbooks were the targets of a stinging jeremiad:

Most textbooks [are] impersonally written [and] give the impression that the subject is boring; they have no “voice”, reveal no human personality. […] And still worse, there is usually no clue given as to who claimed these are the facts of the case, or how “it” discovered these facts. […] There is no sense of the frailty or ambiguity of human judgment, no hint of the possibilities of error. Knowledge is presented as a commodity to be acquired, never as a human struggle to understand, to overcome falsity, to stumble toward the truth. (Neil Postman, The End of Education: Redefining the Value of School)

Though I would have to largely concur -- stressing however the compelling pressures of publishers upon authors -- I believe I can state with some confidence that this “cyber-textbook” is the dialectical antithesis of the type here invoked. Judge for yourselves.

Notes to Part I

1. Actually, this winged phrase, before being used for a whole book, began as the title of a paper published in a student journal under the pseudonym “Frank Talk” (a phrase I expropriated on my website for topics that our jolly old profession rarely speaks about “frankly”, if at all), and cited as evidence at his trial ostensibly for planning the (later banned) rally at Currie’s Fountain in Durban, to celebrate the Frelimo victory in Mozambique in 1974, but in reality for his role as the voice of the Black Consciousness Movement. As the world knows, he was brutishly assassinated in jail in September 1977, which sparked international outrage. Yet his writings are very much alive.

2.        The BNC material is largely under copyright, though not owned by Oxford University Press, which markets the corpus. The texts are in any case so handled that converting them to commercial uses is hardly feasible. The books are represented only in parts; the periodicals are out of date; and all text files are crammed with complex markers (e.g. for “Parts Of Speech”) before each and every word, so that reformatting them back to their original state is horrendously toilsome and tedious. My own uses of BNC material are assuredly not commercial, and the license was duly purchased. The corpus was at first only online, which made large searches grindingly slow for me in the remote outreaches of Africa and Arabia; the fast CD version, which can be ordered from the OUP website at www.hcu.ox.ac.uk, can be installed on your PC in Windows 98 and later, as well as UNIX; my Windows version occupies 5.05 gigabytes of hard disk space. (Important: Install “Texts” before “SARA”; and after clicking “install”, you must touch nothing until “commanded” to do so -- and click once, not twice -- even though the computer periodically appears to have stopped. Watch the green light on your CD drive. And keep an hour or two of reading matter at hand for the wait.)

3.   The website is www.ucl.ac.uk/english-usage/ice. Several of the corpora can be downloaded.

4.    No offence, but after living five years in South America, I balk at reserving “Americans” for natives of the “USA”. I know that “Yanks” is used over there in the south for northerners and northeasterners, mostly seen as tiresome tourists or suspicious “investors”. Yet over two centuries of ruthless exploitation, the “USA” have indeed been “yanking” away everything they could from their defenceless hemispheric neighbours, so there we have it. Yanks a lot.

5.   The computed total of my EPC as of July 2005 was 101,942,700 words, but this must reflect some bloating, due, for example, to the systematic (but indispensable) repetition of characters’ names before each of their speeches in the “drama” sub-corpus. Where I found it feasible and fitting, I removed extraneous material, such as the Project Gutenberg “small print” that totals over 2000 words in every downloaded item; let me express here my deepest gratitude, and my hope that I am using their materials as they would welcome. The EPC is installed in WordPilot, which can be downloaded for a modest $49.95 from www.compulang.com. You can make your own library out of any files saved as “text only”. But if, like me, you are interested in word combinations, you need to remove all extra blank spaces and all line-end carriage returns (abounding in Internet files) that do not mark the end of paragraphs; I literally extracted millions of them. If you want some tips on how to do this without ravaging your sanity, contact me at the e-mail address on my website.
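
For readers who would rather not extract such characters by hand, the clean-up described in Note 5 can be roughly sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used for the EPC: the function name unwrap_text is hypothetical, and the sketch assumes that a paragraph ends wherever at least one blank line follows, which holds for many "text only" files from the Internet but by no means all.

```python
import re


def unwrap_text(raw: str) -> str:
    """Join hard-wrapped lines and squeeze extra spaces, keeping paragraph breaks.

    Assumption (not from the original note): paragraphs are separated by
    one or more blank lines; every other line break is treated as a mere
    line-end return inside a paragraph.
    """
    # Normalise Windows (CR+LF) and bare CR line endings to a plain LF.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Split into paragraphs at runs of blank (or whitespace-only) lines.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = []
    for para in paragraphs:
        # str.split() with no argument splits on any run of whitespace,
        # so rejoining with single spaces removes both the line-end
        # returns and the extra blank spaces in one pass.
        words = para.split()
        if words:
            cleaned.append(" ".join(words))
    # Put exactly one blank line between the surviving paragraphs.
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = (
        "It was a  dark and\r\nstormy   night.\r\n"
        "\r\n"
        "The rain fell\r\nin torrents.\r\n"
    )
    print(unwrap_text(sample))
```

Two cautions apply. Words hyphenated across a line end are not rejoined by this sketch, and in drama or poetry files the line breaks are part of the data, so a blanket unwrapping of this kind would be unsuitable there.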
