II
Sequential Connectivity
1.
Transformational Sentence Grammars
1.1
From the standpoint of operation, conventional sentence grammars are MODULAR
(cf. I.2.7). A sentence is first generated as a syntactic pattern; subsequently,
a “semantic interpretation” is performed; and finally, in some versions at
least, a “pragmatic interpretation” follows (I.2.6). This ordering reflects
the sequence and scale of priorities in modern grammatical theories. If language
users processed a real sentence in this fashion, they would be re-enacting in
miniature the history of the linguistic discipline since 1950. However, they
might consider themselves fortunate if they managed to finish a complete
sentence in so modest a time as only thirty years.1 [1. The
fact that linguists analyse sentences in a few moments is due to their prior
knowledge of what contexts the sentences could occur in (cf. I.1.16).]
1.2
The issue at stake is COMBINATORIAL EXPLOSION: a drastic over-computation of
possible structures and readings that soon runs into astronomical operation
times (cf. Woods 1970; Winograd 1972: 31). To process a sentence, an autonomous
syntax cannot consult the decisive cues of meaning and purpose that real
utterances provide; it can only test one structural description after another by
trying various ways it could generate the sentence until the right one is found
(a form of “analysis by synthesis”). Even for a computer with an extremely
high speed, this procedure rapidly gets out of hand. One autonomous
phrase-structure grammar was calculated by Stanley Petrick (1965; reported in W.
Klein 1974: 179) to require, for the analysis of just one sentence at the rate
of one millionth of a second per cycle, merely
300,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000,000 years, about six times the life
expectancy of the Sun.2 [2. Recent claims for transformational
parsers being as fast as the augmented transition networks we discuss later (cf.
Damerau 1977 and references there) are spurious. Petrick and co-workers are
using a much smaller vocabulary and knowledge-world, and a much faster computer
than Woods’ original test runs. Moreover, what has been programmed is not
Chomsky’s “standard” theory. All transformational parsers — among which
the one developed by Mitch Marcus (1977, 1978) stands out as especially
attractive — have been substantially amended to make them viable on the
computer.]
1.3
The picture is not much brighter for a semantics of independent word-meanings.
Every word with n possible meanings might multiply the number of
alternative ways to understand the whole sequence by n. Thus, for a
seven-word sequence in which each element has only three potential meanings, a
processor might have to contend with 2,187 readings. A sequence twice that long,
but still with only three meanings per word, would have a total of 4,782,969
readings. Consider what would happen with a word like ‘take’, for which
Steven Small (1978: 2) lists 57 meanings.3 [3. In III.3.5, I
introduce the term ‘senses’ for these alternative word meanings, following
the trend in computational semantics (e.g. P. Hayes 1977; Rieger 1977b; Small
1978).]
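The figures in 1.3 follow from a simple product: if the senses of each word combine freely with those of every other word, the number of readings is the product of the per-word sense counts (3^7 = 2,187 and 3^14 = 4,782,969). A minimal Python sketch, assuming complete independence between words and a hypothetical helper named `count_readings`:

```python
from math import prod

def count_readings(senses_per_word):
    """Readings of a sequence if every word's senses combine freely (no context pruning)."""
    return prod(senses_per_word)

print(count_readings([3] * 7))         # seven words, three senses each: 2187
print(count_readings([3] * 14))        # fourteen words, three senses each: 4782969
print(count_readings([3] * 6 + [57]))  # one of the seven words is 'take' (57 senses): 41553
```

A single occurrence of ‘take’ in the seven-word sequence already raises the count from 2,187 to 41,553.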
1.4
These examples for short sequences give some impression of the truly staggering
numbers that could be involved in the processing of entire texts. The modularity
of logical and quasi-logical grammars means that one cannot take advantage of
the contextual interaction of cues that renders the utilization of texts
feasible under everyday conditions. If text processing indeed depends on the
maintenance of connectivity (I.6.8), then the computation of structural
derivations is a roundabout way to proceed. Human language users would not
insist on such rigor anyway, but would jump to conclusions or work with fuzzy
configurations where an automatic theorem-prover might well run on forever (cf.
Goldman 1975: 328).
1.5
It might be helpful to differentiate again between actual and virtual systems
(cf. I.4.1). The abstract grammar of a language cannot be required to state an
absolute limit on the number or format of all possible sentences: someone could
always add a new sentence or make an old one longer. The notion that a language
allows an INFINITE set of sentences rests upon the potential for RECURSION:
cyclic repetition of a given operation (e.g. adding more and more relative
clauses to a sentence). To make an abstract grammar operational, we need to
impose controls on the length and complexity obtainable via recursion. In other
words, we need to impose actualization constraints upon the virtual syntactic
system of the language.
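The contrast between a virtual recursive rule and an actualization constraint can be sketched in a few lines of Python. The rule below is a hypothetical toy, NP → the N [that V NP], whose optional part can recur without limit; the `max_depth` parameter is an assumed stand-in for the controls on length and complexity mentioned above, not a claim about how those controls work in human processing.

```python
def noun_phrase(depth, max_depth, nouns, verbs):
    """Toy recursive rule NP -> 'the' N ['that' V NP], expanded until max_depth."""
    head = f"the {nouns[depth % len(nouns)]}"
    if depth >= max_depth:
        # Actualization constraint: the virtual rule could recur forever,
        # but the actual expansion stops at an externally imposed depth.
        return head
    verb = verbs[depth % len(verbs)]
    return f"{head} that {verb} {noun_phrase(depth + 1, max_depth, nouns, verbs)}"

NOUNS = ["dog", "cat", "rat"]
VERBS = ["chased", "ate"]
for limit in range(3):
    print(noun_phrase(0, limit, NOUNS, VERBS))
# the dog
# the dog that chased the cat
# the dog that chased the cat that ate the rat
```

The rule itself sets no upper bound; only the separate limit turns the infinite virtual set into a finite, usable output.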
1.6
The two tenets of sentence grammarians, (1) to uphold the autonomy of syntax and
(2) to reduce all complex sentences to a fixed set of simple formats, have created a
grave obstacle for theories of language processing. The tenets lead to a model
of language in which operations consist of converting structures to other
structures within the same system.4
[4. János Petöfi (personal communication) agrees that for a theory of
texts, transformation among structures in diverse systems must be allowed. His
own model shows how this might be done (see note 18 to Chapter I).] To keep
syntax isolated, the standard model foresaw a purely syntactic “deep
structure” as the immediate goal of sentence processing. When meaning was
included (Katz & Fodor 1963), it followed suit: nothing could be provided
except “yet another algorithmic operation defining structures” (Seuren 1972:
245) — namely the conversion of concepts into minimal semantic units (cf.
III.2). Syntax and meaning were thus unable to interact during their respective
operations. This difficulty has led to a decline in the acceptance of the
original notion of “deep structure” (cf. McCawley 1968a, 1968b; Lakoff
1968a, 1968b, 1971; Maclay 1971; van Dijk 1972a; Liefrink 1973; Kintsch 1974;
Osgood & Bock 1977; Stockwell 1977). The approach of “generative
semantics,” which tried to obtain a more concerted interaction of syntax and
meaning, was not a mere “notational variant” of the standard model (Chomsky
1970, 1971; Katz 1970, 1971), at least not in its intention. Both theories can
describe grammatical sentences; but the one with autonomous syntax has severe
operational disadvantages.
1.7
Chomsky (1961) himself took pains to warn against the “prevalent and utterly
mistaken view that generative grammar itself provides or is related in some
obvious way to a model for the speaker.” Still, we do encounter such contrary
assertions as this: “Transformational grammar has the ambition of subsuming all
essential aspects of a language system” (Hundsnurscher et al. 1970: 1)
(emphasis added). Psycholinguistics certainly lavished much time on attempting
to prove the psychological reality of the theory (surveys in Fodor, Bever, &
Garrett 1974; Clark & Clark 1977). Experimenters were plagued with the
bizarre task of eliciting autonomous syntactic behavior by methods like these:
(1) reciting sentences in a flat monotone to preclude the use of intonational
cues; (2) presenting isolated sentences in print displays and asking subjects
later if a certain sentence had been seen before; (3) keeping the content of the
sentences trivial, literal-minded, inane, and irrelevant to people’s interests
and situations. It was stubbornly assumed that behavior elicited under these
unrealistic conditions could somehow reveal the normal procedures of real
language use.
1.8
There might be occasions when people actually perform transformations on
sentence structures. The celebrated examples such as ‘Flying planes can be
dangerous’ or ‘The shooting of the hunters was tragic’ could elicit
syntactic reformulation that eliminates the ambiguities. More likely, however,
people would be careful to signal by intonation or other cues in context what
they intended to convey. Deliberately creating or noticing ambiguities is thus
a signal of non-cooperation or of an attempt to be humorous. Many overused jokes
rely on this principle:
(1)
On a hot day, a fat man in a crowd takes off his hat and pants.
The utilization of such texts is unproblematic because knowledge of the world
militates strongly against one reading. Jerry Hobbs points out the episode from a
Burns and Allen show where Gracie is told by a fire inspector:
(2)
There’s a pile of inflammable trash next to your car. You’ll have to get rid
of it.
and undertakes to dispose of the car. In the motion picture The Wizard of Oz, an
arrogant neighbor lady comes to complain about the family’s dog, and the
following dialogue ensues:5 [5. I number the sample texts
continuously throughout the book. Decimal places are used to single out parts of
samples; letters are used for alternative versions of the same sample.]
(3.1)
UNCLE HENRY: You say Dorothy bit you?
(3.2)
NEIGHBOR LADY: No, her dog.
(3.3)
UNCLE HENRY: Oh! She bit her dog!
Uncle Henry’s utilization of a wrong, though structurally allowable, reading signals
his intention to be uncooperative in the situation. It is clearly not the task
of grammar to provide rules that preclude such occurrences. Potential
ambiguities alone are a less crucial matter than the strategies people use to
resolve ambiguities or even to rule them out in advance.
1.9
Laboratory experiments can be designed to bring out transformational behavior,
if the researcher so desires. If a situation is constructed in which sentences
such as:
(4a)
John bought the book.
(4b)
The book was bought by John.
are substituted for each other, test subjects will perhaps do some mental
transforming analogous to the rules that would derive (4b) from (4a). It has not
been demonstrated that people do any such thing routinely if they want to say or
understand (4b) in communication. On the contrary, transformations merely
substitute one structural pattern for another of the same systemic type with no
great gain in processing (cf. I.6.2; II.1.6).
1.10
The difficulties with standard transformational grammar as a language theory can
be summed up as follows:
1.10.1
Transformations bring no processing advantages except in specially constructed
situations.
1.10.2
The theory does not explain why people would want to use complex sentences at
all, when they would apparently save effort by uttering the “deep structure”
to begin with.
1.10.3
Lacking interaction with other language levels, transformational syntax would
bring a processing overkill of alternative structural descriptions that could
scarcely be computed and disambiguated in reasonable times.
1.10.4
Transformations do not really explain a complex sentence; instead, they get
rid of it in favor of simpler structures whose explanation is a foregone
conclusion, since the latter are basic axioms to begin with.
1.11
We are left with a definition of the “standard” model as a partial theory of
paraphrase (cf. S. Klein 1965; Ungeheuer 1969). Chomsky (1965: 162f.) remarks
that his model can account for the relations between sentences of the type:
(5a)
John bought the book from Bill.
(5b)
The book was bought from Bill by John.
but it cannot deal with the relationships between:
(6a)
John bought the book from Bill.
(6b)
Bill sold the book to John.
If our hapless fire inspector were “competent” in Chomsky’s sense, he could
have averted Gracie’s misunderstanding by embedding one sentence into another
with the required deletions:
(7)
You’ll have to get rid of the pile of inflammable trash next to your car.
But he could not have performed a conceptual paraphrase:
(8)
That pile of inflammable trash next to your car must be disposed of without
delay.
And he certainly could not have chosen a format that leaves Gracie to infer the
appropriate action on her own:
(9)
That pile of inflammable trash next to your car is in violation of city fire
ordinances. Please comply with this warning immediately.
It seems counter-intuitive that a theory explaining “the speaker-hearer’s
knowledge of his language” (Chomsky 1965: 4) should leave the “competent”
language user so helpless in everyday affairs.
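The contrast between (5a)/(5b) and (6a)/(6b) can be illustrated with a toy structural rule. The Python sketch below assumes a hypothetical flat clause representation (`Clause`, `active`, and `passive` are illustrative names, not part of any published grammar). Its passive rule only rearranges the constituents of one structure, so it relates (5a) to (5b), but nothing in it can connect a ‘buy’ clause with a ‘sell’ clause.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Clause:
    """Hypothetical flat clause: who did what to what, plus optional obliques."""
    agent: str
    verb: str                     # past tense; for these verbs also the participle
    patient: str
    source: Optional[str] = None  # rendered as 'from X'
    goal: Optional[str] = None    # rendered as 'to X'

def _obliques(c: Clause) -> list:
    parts = []
    if c.source:
        parts.append(f"from {c.source}")
    if c.goal:
        parts.append(f"to {c.goal}")
    return parts

def active(c: Clause) -> str:
    return " ".join([c.agent, c.verb, c.patient, *_obliques(c)]) + "."

def passive(c: Clause) -> str:
    # Purely structural rule: patient to the front, 'was' + participle, agent into a by-phrase.
    subject = c.patient[0].upper() + c.patient[1:]
    return " ".join([subject, "was", c.verb, *_obliques(c), f"by {c.agent}"]) + "."

bought = Clause("John", "bought", "the book", source="Bill")
sold = Clause("Bill", "sold", "the book", goal="John")

print(active(bought))   # (5a) John bought the book from Bill.
print(passive(bought))  # (5b) The book was bought from Bill by John.
print(active(sold))     # (6b) Bill sold the book to John. No rule above derives it from `bought`.
```

Relating (6a) to (6b) would require knowing that ‘buy’ and ‘sell’ describe the same transfer from opposite sides, which is conceptual rather than structural information.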
1.12
The immense importance attributed to the SENTENCE BOUNDARY also seems
incongruous. As Horst Isenberg (1971: 155) remarks, it is odd that such nearly
synonymous samples as:6 [6. Whenever required, I provide my
own translations into English.]
(10a)
Peter burned the book. He didn’t like it.
(10b)
Peter burned the book because he didn’t like it.
must be described by entirely different notions in a grammar, simply because one of
them happens to be two sentences. The easily discoverable conceptual
connectivity between the two configurations of content (the action of
‘burning’ has the “reason-of” a negative “volition”; on these terms,
cf. III.4.7) obtains whether an explicit junctive such as ‘because’ is used
or not (cf. V.7.6ff.). To study that aspect, we do not need a structural
description of the sentence or sentences, but a model of how people decide how
much conceptual content to load onto a given sentence format (cf. Quillian 1966;
Simmons & Slocum 1971; VII.2.17ff.).7 [7. The proposal to
treat texts as super-long sentences (cf. Katz & Fodor 1963) clouds the issue
altogether. The interesting question is why texts are usually not
super-long sentences.] The main decision criterion, I suspect, is the degree of
knownness or expectedness in context. The ‘not liking’ is predictable in
view of the ‘burning’, so that the combining into a single sentence is
plausible (cf. van Dijk 1977a: 86). The expression of known or expected content
favors longer, more complex sentences than does the expression of new or unexpected content (cf. Grimes
1975: 274; VII.2.19ff.; IX.4.6f.).
1.13
The observation that syntax is OVERRIDDEN must also be considered. Strohner and
Nelson (1974) found that children treated the following sentences as expressions
of the same content:
(11a)
The cat chased the mouse.
(11b)
The cat was chased by the mouse.
The children were more inclined to rely on world knowledge than on grammar and
syntax (cf. also Turner & Rommetveit 1968).8
[8. Dressler (personal communication) observes here that children tend to
treat the initial noun phrase of utterances as the agent in all cases.] David
Olson (1974) found that children perform better on the identification of active
vs. passive sentences if the agents of the action are given a clear identity.
Carol Chomsky (1969) noticed that children acted out both the statements:
(12a)
Donald tells Bozo to hop across the table.
(12b)
Donald promises Bozo to hop across the table.
by making the Bozo toy hop. Evidently, the close proximity of the expressions for
agent and action overruled the grammatical structure, which could not happen if
a syntactic “deep structure” were the primary goal of understanding.
1.14
I do not deny that surface structure is often misleading with regard to
underlying dependencies. I merely want to justify my preference for a syntactic
model unlike the standard transformational one. The syntactic component in a
theory of processing has two major functions: (1) the LINEARIZATION of elements
in production, or their DELINEARIZATION in comprehension; and (2) building the
GRAMMATICAL DEPENDENCIES among the surface elements as they are presented in
real time. The component is therefore addressed to connectivity rather than to
segmentation; and it is formulated such that syntax, meaning, and actions can be
given an analogous representation.
2.
SEQUENCING OPERATIONS
2.1
By “sequencing” I wish to designate all activities and procedures whose role
is to arrange language elements into a working order, such that speaking,
writing, hearing, or reading can be accomplished in a temporal progression. From
a very detailed standpoint, we see combined sequences of tiny units of sound or
form corresponding to those which have been systemized as phonemes or morphemes,
respectively. Obviously, the main activity of adult language use is not that of
gluing these tiny units together. The acquisition and use of words and phrases
automatically entails the production and identification of their constituent
parts. However, speech errors show that word parts do become displaced upon
occasion, as in the famous “Spoonerisms” (cf. Clark & Clark 1977: 274).
Many of these errors suggest a conceptual ambivalence when the displacement
creates a strikingly contrasting statement to the intended one. Imagine the grim
satisfaction some churchgoer with a troublesome life might derive when hearing
the Reverend Spooner produce this utterance:
(13)
The Lord is a shoving leopard to all his flock.
2.2
For a linguistics of actualization, the organization of phonemes and morphemes
in a useable format is no trivial issue. Terry Winograd (1972) demonstrates how
morpheme systems of English can be managed as a PROGRAM: a procedural statement
of actions to be performed when ENTRY CONDITIONS activate the operations on the
data (cf. also Berry 1977). To utilize inflected forms, the program matches the
input pattern to an ordered set of hypotheses (cf. Woods 1978b: 30ff.). If the
match is satisfactory, a “yes” is returned, and the program advances to the
identification of later elements. If a “no” is returned, the program tries
out the next hypothesis on the same element (see for example the control diagram
for English endings in Winograd 1972: 74).
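The yes/no control regime just described can be sketched as an ordered list of hypotheses tried against one input word until one succeeds. The following is a minimal Python illustration, not a reconstruction of Winograd’s control diagram: the lexicon, the hypothesis labels, and the function names are hypothetical, and only a few regular English endings are covered.

```python
from typing import Callable, Optional

# Hypothetical toy lexicon: stem -> word class.
LEXICON = {"walk": "verb", "try": "verb", "stop": "verb", "box": "noun", "rocket": "noun"}

Proposal = Callable[[str], Optional[str]]

def strip_suffix(suffix: str, replacement: str = "") -> Proposal:
    """Hypothesis: word = stem + suffix (with 'replacement' restored to the stem)."""
    def propose(word: str) -> Optional[str]:
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[: -len(suffix)] + replacement
        return None
    return propose

def strip_doubled(suffix: str) -> Proposal:
    """Hypothesis: the final consonant was doubled before the suffix ('stopped' -> 'stop')."""
    def propose(word: str) -> Optional[str]:
        stem = strip_suffix(suffix)(word)
        if stem and len(stem) >= 2 and stem[-1] == stem[-2]:
            return stem[:-1]
        return None
    return propose

# Ordered hypotheses, tried first to last on the same input word.
HYPOTHESES = [
    ("bare", lambda w: w),
    ("-s", strip_suffix("s")),
    ("-es", strip_suffix("es")),
    ("-ies", strip_suffix("ies", "y")),
    ("-ed", strip_suffix("ed")),
    ("-ied", strip_suffix("ied", "y")),
    ("-ed, doubled", strip_doubled("ed")),
    ("-ing", strip_suffix("ing")),
    ("-ing, doubled", strip_doubled("ing")),
]

def analyse(word: str) -> Optional[tuple]:
    for label, propose in HYPOTHESES:
        stem = propose(word)
        if stem is not None and stem in LEXICON:
            return stem, LEXICON[stem], label  # "yes": accept and move on to later input
        # "no": fall through and test the next hypothesis on the same word
    return None

for w in ["walks", "boxes", "tried", "stopped", "rocket", "xyz"]:
    print(w, analyse(w))
```

Because a wrong hypothesis simply yields a stem that is not in the lexicon, the ordering affects only how much work is done, not which analysis is found.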
2.3
These considerations apply to the syntagmatic aspect of language, but they have
important implications for the paradigmatic aspect as well (cf. I.2.2).
Paradigms such as noun declensions or verb conjugations cannot be simple
listings of forms: there must be some provision for efficient utilization and
application. The better organized those provisions become, the less need there
is for rote storage of exhaustive listings. Grammatical rules should be able to
generate the highest feasible number of inflected forms for the largest range of
lexical items. Here, the rule can be called a program, or a sub-program in a
main program. The rule set for a given domain, such as verb inflections, should
itself be internally ordered in such a way that the most probable, simple, and
generally applicable rules are routinely tried first (cf. the notion of “core
grammar” in Haber 1975). I have designed a program of this kind with fifteen
systemic rules for the complex domain of German stem-changing verbs (Beaugrande
1979c). I undertook to show that some 80% of the extant verbs are rule-governed,
and that the rest are mostly explainable via rule conflation. I believe that a
computational approach to phonology and morphology deserves further attention.
2.4
When psycholinguistics began to emerge as a discipline, its central task was
first construed to be investigating the mental reality of linguistic theories
(surveys in Hörmann 1974, 1976). The natural consequence was that the analysis
linguists perform on sentences was taken as a model of what language users do in
understanding discourse. Emphasis accordingly fell on the extraction of
structural descriptions for the various levels of language. The popular
“syntactic approach” to language understanding has been summed up by Clark
& Clark (1977: 58):
Listeners have at their command a battery of mental strategies by which they segment
sentences into constituents, classify the constituents, and construct semantic
representations from them. [...] As listeners identify constituents, they must
not only locate them, but also implicitly classify them — as noun phrases,
verb phrases, determiners, and the like. They must do this before they
can build underlying propositions. [emphasis added]
For English, this approach is embodied, according to Clark and Clark (1977: 59-68),
in STRATEGIES like these:
Strategy 1. Whenever you find a function word, begin a new constituent larger than one word.
Strategy 2. After identifying the beginning of a constituent, look for content words appropriate to that type of constituent.
Strategy 3. Use inflections to help decide whether a content word is a noun, verb, adjective, or adverb.
Strategy 4. After encountering a verb, look for the number and kind of arguments appropriate to that verb.
Strategy 5. Try to attach each new word to the constituent that came just before.9 [9. I cannot quite grasp the point of this strategy as stated by the Clarks. Surely the direction in which one looks for a constituent varies constantly.]
Strategy 6. Use the first word (or major constituent) of a clause to identify the function of that clause in the current sentence.
Strategy 7. Assume the first clause to be a main clause unless it is marked at or prior to the main verb as something other than a main clause.
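Strategies 1 and 5 can be rendered mechanically to show what meaning-independent segmentation does with real input. The Python sketch below is a deliberately naive reading of those two strategies only; the function-word list is a small hypothetical set, and the rule that a constituent made up only of function words keeps absorbing words is an assumed interpretation of “larger than one word.”

```python
# Hypothetical, deliberately small list of English function words.
FUNCTION_WORDS = {"a", "an", "the", "in", "on", "by", "to", "of", "and", "was"}

def is_function(word: str) -> bool:
    return word.lower() in FUNCTION_WORDS

def segment(words):
    """Naive Strategy 1 (function words open constituents) plus Strategy 5 (attach backward)."""
    constituents = []
    for word in words:
        current_has_content = bool(constituents) and any(
            not is_function(w) for w in constituents[-1])
        if not constituents or (is_function(word) and current_has_content):
            constituents.append([word])      # Strategy 1: open a new constituent
        else:
            constituents[-1].append(word)    # Strategy 5: attach to the constituent just before
    return constituents

print(segment("The book was bought by John".split()))
# [['The', 'book'], ['was', 'bought'], ['by', 'John']]
print(segment("A great black and yellow rocket stood in a desert".split()))
# [['A', 'great', 'black'], ['and', 'yellow', 'rocket', 'stood'], ['in', 'a', 'desert']]
```

The first sentence comes out plausibly, but in the second the junctive ‘and’ separates ‘black’ from its head and ‘stood’ is absorbed into the noun constituent: the strategies can only be repaired by attending to connections, which is the point taken up next.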
2.5
Despite the avowed importance of SEGMENTATION in the first quote, all of these
strategies except numbers 3 and 7 are instead oriented toward CONNECTION. As
such, they would be unobjectionable except for the stipulation that they must be
run before meaning (“underlying propositions”) can be recovered. That
requirement entails the following practical difficulties:
2.5.1
As the computer simulations described in II.1.2 suggest, there would be a
monstrous over-computation of structural alternatives. In actual practice, even
linguists who assert the autonomy of syntax are implicitly consulting meaning in
order to decide what structures are present.
2.5.2
The function words (i.e. determiners, prepositions, and conjunctions) and
inflections that people are asserted to utilize so decisively in Strategies 1
and 3 are often so slurred in actual speaking that they could scarcely be
identified out of context (cf. Pollack & Pickett 1964; Woods & Makhoul
1973). For example, Dressler, Leodolter, and Chromec (1976) collected samples of
the speech of Viennese students in which these elements are reduced to the
merest outlines.
2.5.3
As Clark and Clark comment (1977: 72), “actual speech is so full of incomplete
words, repeats, stutters, and outright errors” that the strategies “should
often be stymied from the very start.” Striking demonstrations that structural
inconsistencies do not impede communication are discussed by Schegloff,
Jefferson, and Sacks (1977) on the basis of video-taped conversations of
California residents — evidence at least as hard as laboratory experiments.
2.5.4
The strategies seem to presuppose complete, grammatical sentences as the
substance of every text. The high number of actually occurring incomplete
sentences in everyday communication should make understanding hover constantly
on the verge of a breakdown. The fact that sentence boundaries are also hard to
identify in heard speech (Broen 1971) is another obstacle.
2.5.5
Like much work on syntax, the strategies are Anglocentric, for example, in
regard to the role of “function words.” In inflected languages with highly
variable word order (e.g. Czech), the notion of autonomous syntax would have
appeared far more counter-intuitive from the very beginning than it did for English.
2.5.6
Due to emphasis upon discovery and analysis, these strategies do not seem to be
applicable to speech production. Studies of speech production are in fact very
rare (cf. VII.2.1). Fodor, Bever and Garrett (1974: 434) remark that
“practically anything one can say about speech production must be considered
speculative, even by the standards current in psycholinguistics”; perhaps they
should have said “because of” rather than “even by.”
2.5.7
The heavy utilization of syntax does not accord with the findings on storage and
recall of language. Harry Kay (1955) used whole text passages in tests and found
that semantic recall ran about 70% and syntactic recall only about 30%. It
appears that syntactic formatting is not a prominent object of cognitive
resources.10 [10. The notion that surface syntax is stored in “short-term” memory and meaning
in “long-term” memory (cf. discussion in Loftus & Loftus 1976) is too
simple (Kintsch, personal communication). There is probably only a gradation in
the storage times and quantities along these lines.]
2.6
In recent discussions, support has accrued for the outlook of RELATIONAL GRAMMAR
(cf. Cole & Sadock [eds.] 1977; Johnson & Postal 1980). Perlmutter and
Postal (1978: 1) stress the theoretical opposition of DERIVATIONAL versus
RELATIONAL conceptions of grammar. The derivational approach deals with
structures in terms of constituency and linear precedence, but it places little
emphasis on the connectivity of grammatical occurrences in surface structure.
Yet because text perception must evolve in real time, people could not afford to
wait for sentence completion and build a derivational tree; instead, they want
to start connecting perceived elements together as soon as possible. This tactic
could be represented by a syntax that constructs links between pairs of related
elements (cf. the “arc pair grammar” of Johnson & Postal 1980). This
outlook frees us from reliance on complete sentences: nothing more than a
GRAMMATICAL DEPENDENCY between two elements is needed for operation. Indistinct
or missing elements would cause at most local discontinuities that could be
overcome by the general PROBLEM-SOLVING techniques outlined in I.6.7f.
(14)
A great black and yellow rocket stood in a desert.
Anyone hearing or reading the sentence notices at once that only some of the elements
which are directly adjacent in the surface structure are also grammatically
dependent on each other. In ‘yellow rocket’, the modifier is adjacent to its
head, but the other modifiers ‘great’ and ‘black’ are at some distance.
The determiner ‘a’ is also remote. These obvious facts have an important
consequence for processing: the linear sequence is a poor basis for the
production and comprehension of texts. The crucial structure is instead one in
which the dependencies are signaled with explicit links. Figure 1 shows how
direct linkage could be imposed on the sample.
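The difference between surface adjacency and grammatical dependency in sample (14) can be made concrete by storing the dependencies as explicit links. The Python sketch below records one plausible set of links; the labels, and the decision to leave ‘and’ unlinked, are assumptions for illustration rather than a transcription of Figure 1.

```python
from collections import defaultdict

TOKENS = ["A", "great", "black", "and", "yellow", "rocket", "stood", "in", "a", "desert"]

# Hypothetical links for sample (14): (dependent index, head index, label).
# The junctive 'and' (index 3) is left unlinked to keep the sketch small.
LINKS = [
    (0, 5, "determiner"), (1, 5, "modifier"), (2, 5, "modifier"), (4, 5, "modifier"),
    (5, 6, "subject"), (7, 6, "location"), (8, 9, "determiner"), (9, 7, "object"),
]

def network(links):
    """Undirected adjacency lists: each dependency is one link, whatever its surface distance."""
    adjacency = defaultdict(list)
    for dep, head, label in links:
        adjacency[head].append((label, dep))
        adjacency[dep].append((label, head))
    return adjacency

for dep, head, label in LINKS:
    print(f"{TOKENS[dep]:>7} -> {TOKENS[head]:<7} {label:<10} surface distance {abs(dep - head)}")

# Every node linked to 'rocket' is exactly one link away in the network.
print([TOKENS[i] for _, i in network(LINKS)[5]])  # ['A', 'great', 'black', 'yellow', 'stood']
```

‘great’ lies four positions from ‘rocket’ and ‘A’ five, yet in the adjacency lists both are direct neighbors of the head, just like the adjacent ‘yellow’.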
2.8
The proportions of Figure 1 are somewhat misleading. The modifiers placed at a
greater distance are not inherently more remote from their head than the
adjacent ‘yellow.’ If we shorten the links to uniform length, the
grammatical dependencies yield a NETWORK, as shown in Figure 2 (cf. Perlmutter
and Postal 1978).
We can designate this configuration as an ACTUALIZED SYSTEMIC NETWORK of GRAMMAR
STATES. The processor traverses the LINKS to access the NODES, making the data
at the nodes ACTIVE and CURRENT. The action of traversing the link corresponds
to PROBLEM-SOLVING: testing a hypothesis about the dependency between the nodes
(a simple kind of means-end analysis in the sense of I.6.7.1). The word-class of
a current state should be treated as an INSTRUCTION about the PREFERENTIAL or
PROBABLE links that should be tested next (cf. I.3.4.6; Winston 1977: 343).
2.9
The structure in Figure 2 differs from the surface structure only in regard to its DELINEARIZATION. It would thus not qualify as a “deep structure” in the standard sentence models, not being a basic format incapable of further reduction. We might term it a “shallow structure” operationally sufficient to represent the connectivity of grammatical occurrences during actualization. Figure 3 suggests the idealised sequence of operations when the systemic processor advances
from state to state. As soon as the first MICRO-
2.10
To understand the procedural ordering of operations, we can view processing in
terms of STACKING. Each element is picked up and placed on the top of a HOLD
STACK (see Rumelhart 1977a: 131): the active list of working elements to be
integrated into a connected structure. If we have a PUSHDOWN STACK, each entry
goes on top and pushes the others down a notch. Thus, the determiner
and the modifiers in our sample would be entered in the order they occur, but
removed in the reverse order. Figure 4 illustrates the stacking of the sample
noun phrase. When the head turns up at the top of the stack, the stack is cleared
by building a NETWORK of the grammatical dependencies of the macro-state noun
phrase. Again, the numbers on the arrows suggest the sequence of building
operations as derived from the arrangement of the stack.
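The hold-stack procedure can be sketched in a few lines of Python. This is a minimal illustration, assuming hypothetical word-class assignments for the sample noun phrase and omitting ‘and’ for simplicity; the printed step numbers follow from the reverse-order removal described above and are not claimed to reproduce Figure 4.

```python
# Hypothetical word classes for the sample noun phrase; 'and' is omitted for simplicity.
WORD_CLASS = {"A": "determiner", "great": "modifier", "black": "modifier",
              "yellow": "modifier", "rocket": "head"}

def build_noun_phrase(words, word_class):
    """Push dependents onto a hold stack; when the head arrives, pop them and link each to it."""
    hold = []   # pushdown hold stack: the last entry pushed is the first popped
    links = []
    for word in words:
        if word_class[word] != "head":
            hold.append(word)
            continue
        while hold:  # clear the stack by building links to the head
            dependent = hold.pop()
            links.append((dependent, word_class[dependent], word))
        return links
    raise ValueError(f"no head found; still on the hold stack: {hold}")

phrase = ["A", "great", "black", "yellow", "rocket"]
for step, link in enumerate(build_noun_phrase(phrase, WORD_CLASS), start=1):
    print(step, link)
```

The determiner, entered first, is linked last; if the input ends before a head turns up, the elements remain on the stack awaiting attachment.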
2.11
The foregoing demonstration should suggest how the procedural approach to syntax
might function. The processor needs an ordered list of preference hypotheses to
match against current input, so that the operational sequence is efficiently
controlled. Nothing more than a grammatical dependency, e.g. a noun phrase, is
required as input; incomplete sentences present no such difficulties as they
would for a tree-derivational approach in a phrase-structure grammar. The time
sequence I have shown in Figures 3 and 4 is probably too strict. I surmise that
there might be more than one control center active at a time (in the sample, both the
noun-phrase head ‘rocket’ and the verb-phrase head ‘stood’). And there might
be some variations in the order in which the dependent states (e.g. modifiers)
are attached. Such matters as the ordering of operations in real time and of
hypotheses on a preference list will have to be explored by empirical study. The
procedural approach promises to capture the EXPECTATIONS language users would
have about what occurrences are PROBABLE at a given time (Rumelhart 1977a: 122).
The most important factor is that the rules of the grammar are simultaneously
procedures for utilizing the grammar in real time — a stipulation I cited as
crucial for text linguistics in I.3.5.9 (see also Rumelhart 1977a: 122). At the
moment of processing, the relations are ACTUAL, not VIRTUAL, and there is no gap
between competence and performance to be overcome. The very notion of “word
class” is removed from the domain of abstract taxonomies and made operational
for utilizing elements in real input (cf. Rumelhart & Norman 1975a: 64).11
[11. This factor is especially
decisive (cf. II.2.16), as it makes possible a flexible processing of word class
shifts (e.g. Shakespeare’s phrase ‘Her art sisters the natural
roses’, Pericles, V, chorus, 7).]
2.12
The formalism I have shown is the AUGMENTED TRANSITION NETWORK, a technique of
data formatting developed as an alternative to transformational grammar for
computer processing of English (Thorne, Bratley, & Dewar 1968; D. Bobrow
& Fraser 1969; Woods 1970; Simmons & Slocum 1971). The network is built
up in real time by making “transitions” from one node to the next; this
operation requires specifying or discovering the relation between the current
node and its successor. The transitions can be “augmented” with any search
or recognition procedures considered relevant at the time (Winston 1977: 344).
Instead of using a highly detailed set of node types, we could rely on a very
general set (with members like “determiner,” “modifier,” etc.) and
attain any desired degree of specificity (e.g. “definite article,”
“indefinite article,” “adjective,” “participle,” etc.) by augmenting
link labels (Winston 1977: 172). Such a design might be the most human-like as
well as the most economical: processing routinely picks up only essentials, but
can become more thorough if there is any need (cf. III.4.15). Note that the
augmented transition network does not have to be built in a single direction
(though our demonstration is kept purely linear for the sake of simplicity).
There could easily be a CONTROL CENTER such as a “head” with transitions
being tested to several dependent elements (“determiners,” “modifiers”)
at once. In III.4.14ff., I suggest that grammatical networks should be set up in
parallel with conceptual ones.
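A small transition network for the noun phrase illustrates two ideas from this paragraph: transitions tested in order from the current node, and a general set of node types whose labels become more specific through augmentation. The Python sketch below is a hypothetical toy (the lexicon, state names such as NP/0, and the single agreement test are all invented), far smaller than the systems cited above, and it is kept linear like the demonstration in the text.

```python
# Hypothetical lexicon: general node type plus features that augment the link label.
LEXICON = {
    "a": ("determiner", {"article": "indefinite"}),
    "the": ("determiner", {"article": "definite"}),
    "great": ("modifier", {"kind": "adjective"}),
    "black": ("modifier", {"kind": "adjective"}),
    "yellow": ("modifier", {"kind": "adjective"}),
    "burning": ("modifier", {"kind": "participle"}),
    "rocket": ("head", {"number": "singular"}),
    "rockets": ("head", {"number": "plural"}),
}

# Arcs leaving each state, in order of preference: (category tested, next state).
ARCS = {
    "NP/0": [("determiner", "NP/1"), ("modifier", "NP/1"), ("head", "NP/pop")],
    "NP/1": [("head", "NP/pop"), ("modifier", "NP/1")],
}

def agrees(pending, head_features):
    """Augmented test on the head arc: an indefinite article needs a singular head."""
    return all(f.get("article") != "indefinite" or head_features["number"] == "singular"
               for _, _, f in pending)

def parse_np(words):
    """Traverse the network; return (head, augmented links) or None if no path fits."""
    state, pending = "NP/0", []
    for word in words:
        if word not in LEXICON:
            return None
        category, features = LEXICON[word]
        for tested, next_state in ARCS.get(state, []):
            if category != tested:
                continue                      # hypothesis fails: try the next arc
            if category == "head":
                if not agrees(pending, features):
                    continue                  # augmentation vetoes this transition
                # Arc to NP/pop: the noun phrase is complete.
                return word, [(w, f"{c}:{'/'.join(f.values())}") for w, c, f in pending]
            pending.append((word, category, features))
            state = next_state
            break
        else:
            return None                       # no arc could be traversed
    return None                               # input ran out before a head appeared

print(parse_np(["a", "great", "black", "yellow", "rocket"]))
print(parse_np(["a", "rockets"]))  # None: the agreement augmentation blocks the head arc
```

The network itself knows only “determiner,” “modifier,” and “head”; distinctions such as definite versus indefinite article or adjective versus participle are carried by the augmented labels and consulted only where a test needs them.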
2.13
The formal potential of augmented transition networks is undeniably attractive.
They are able to duplicate the behavior of virtually all kinds of grammars (cf.
Woods 1970; Winograd 1972; Kintsch 1974: 70; Hendrix 1978): context-free
grammars, phrase-structure grammars, transformational grammars, and Turing
machines. Still greater advantages can be obtained by such generalizations as
those proposed by William Woods (1978c). He has built a parsing system from a
“cascade” of augmented transition networks such that computations common to
various language levels can be merged in operation (cf. III.4.14). He also lifts
the restriction that input must be a sequence of symbols, so that he can also
analyze apperceptual “fields,” e.g. scenes, acoustic substance of speech,
medical diagnosis, and data-base monitoring, from various perspectives. He
concludes (Woods 1978b: 24):
Generalized
transition networks thus lift the notion of “grammar” away from the limited
conception of a set of rules characterizing well-formed sequences of words in
sentences. Rather, they are capable of characterizing arbitrary classes of
structured entities.
2.14
I find the psychological plausibility of network grammars
appealing as well. I suggested in I.4.3f. that if a virtual system is to be
actualized, the stability of the system depends crucially on a regulative
continuity of occurrences. That continuity emerges clearly in the network
format. Psychological testing offers further support. When subjects were
interrupted during the perception of sequences and asked to predict the next
syntactic occurrence, they were in 75% agreement with each other (Stevens &
Rumelhart 1975). Moreover, 80% of the reading errors recorded in the same tests
were in accordance with the most probable paths as determined from the
prediction experiment. Ronald Kaplan (1974) has shown that the notion of the
hold stack accounts for the comparative difficulties in processing relative
clauses just as well as does transformational grammar.
2.15
A repertory of grammatical states and dependencies can be defined according to
the requirements established by investigations such as these. A HEAD would be a
grammatical state of noun or verb capable of either appearing alone as a phrase,
or acting as the control center of a phrase. Because nouns and verbs might be
created on the spot from other word classes (e.g. in usage like “The yellow
rocketed skyward”), we might want to use terms like “noun-entity” and
“verb-entity” for whatever elements are used as nouns or verbs in current
input (as opposed to the virtual lexicon of the language). The MODIFIERS will be
the adjectives, adverbs, and prepositional phrases that depend on the heads. The
DETERMINERS would be articles, deictics, and numericals. Thus we have modifiers
as QUALITATIVE signals about the head, and determiners as QUANTITATIVE signals
(number, definiteness, etc.; see V.3). The VERBS differ from the nouns in
regard to their complements: SUBJECT, DIRECT OBJECT, INDIRECT OBJECT, AUXILIARY,
and DUMMY can all appear as well as MODIFIER. In compounded expressions, where
two or more elements of the same class appear as a unit, we would have
COMPONENTS. In sequences of phrases and clauses, we could have JUNCTION. The
following list of link types seems to be usable for labeling the state
transitions in actualized networks of grammatical dependencies. The
abbreviations in square brackets will be used in diagramming to save space. In
each link type, the control center is named first (“verb” or “head”). In
the diagrams, however, I reverse the abbreviated labels where needed so the node
label is next to the arrow pointing to the appropriate state.
2.15.1
VERB-TO-SUBJECT [v-s] is the minimal requirement for a clause or sentence,
though not for a discourse action (cf. sample (26) in II.2.36).
2.15.2
VERB-TO-DIRECT OBJECT [v-o] obtains between a transitive verb (or verb-entity)
and a noun (or noun-entity) capable of being affected directly by the event or
action expressed via the verb.
2.15.3
VERB-TO-INDIRECT OBJECT [v-i] obtains between a verb and a noun capable of
receiving indirect effects of the event or action, e.g. the entity to or for
which some action is done or some object is given.
2.15.4
VERB-TO-MODIFIER [v-m] applies when a non-transitive verb (e.g. ‘be’) links
a subject to an expression of a state, attribute, time, location, etc.
2.15.5
VERB-TO-AUXILIARY [v-a] is the link between a member of the open set of verbs
(open because of potential word-class shifts in actualization) and a member of
the closed set of verbal auxiliaries used to signal tense (e.g. ‘have,’
‘had,’ ‘will’) or modality (e.g. ‘must’, ‘might’, ‘should’).
2.15.6
VERB-TO-DUMMY [v-d] is the link between a verb and a place-holder that merely
fills a structural slot (e.g. ‘it’ in ‘it’s a good thing’, or
‘there’ in ‘there’s a unicorn in my garden’).
2.15.7
HEAD-TO-MODIFIER [h-m] covers the dependency between
one element and an expression which modifies it: adjective-to-noun entity, and
adverbial-to-verb entity. This dependency is distinct from
“verb-to-modifier” in not having the intermediary linking verb present.
2.15.8
MODIFIER-TO-MODIFIER [m-m] obtains when modifiers depend on each other (e.g.
adverbial-to-adjective).
2.15.9
HEAD-TO-DETERMINER [h-d] is the link between an article, deictic, or numerical,
and its head.
2.15.10
COMPONENT-TO-COMPONENT [c-c] covers the dependencies between elements of the
same class, e.g. two nouns (‘computer science’) or two verbs (‘trick or
treat’).
2.15.11
JUNCTION subsumes the dependencies of (1) CONJUNCTION [cj] between at least two
elements whose relationship in regard to their environment is additively the
same or similar (tagged with ‘and’, ‘also’, ‘too’, ‘moreover’,
‘in addition’, etc.); (2) DISJUNCTION [dj] between at least two elements
whose relationship in regard to their environment is alternatively the same or
similar (tagged with ‘or’, ‘or else’, ‘either-or’); (3) CONTRAJUNCTION [oj]
between elements
whose relationship in regard to their environment is antagonistically the same
or similar (tagged with ‘but’, ‘however’, ‘yet’, ‘nonetheless’,
etc.); and (4) SUBORDINATION [sb] between elements where one is hierarchically
dependent on the other and cannot constitute a sentence by itself (tagged with
‘if’, ‘because’, ‘since’, ‘that’, ‘which’, etc.). Because these
dependencies entail coherence and informativity as well as sequencing, I reserve
their treatment until section V.7. I note here that conjunction, disjunction,
and contrajunction more often link configurations of comparable surface
structure than does subordination (cf. V.7.1.4).
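The inventory in 2.15.1 through 2.15.11 maps naturally onto an enumeration keyed by the bracketed abbreviations. The sketch below is only one possible encoding: the names `LinkType`, `Link`, and `is_clause` are hypothetical, and the clause test implements nothing beyond the minimal requirement stated in 2.15.1.

```python
from enum import Enum
from typing import NamedTuple, Optional


class LinkType(Enum):
    """Link types of 2.15, valued by their diagram abbreviations."""
    VERB_TO_SUBJECT = "v-s"
    VERB_TO_DIRECT_OBJECT = "v-o"
    VERB_TO_INDIRECT_OBJECT = "v-i"
    VERB_TO_MODIFIER = "v-m"
    VERB_TO_AUXILIARY = "v-a"
    VERB_TO_DUMMY = "v-d"
    HEAD_TO_MODIFIER = "h-m"
    MODIFIER_TO_MODIFIER = "m-m"
    HEAD_TO_DETERMINER = "h-d"
    COMPONENT_TO_COMPONENT = "c-c"
    CONJUNCTION = "cj"
    DISJUNCTION = "dj"
    CONTRAJUNCTION = "oj"
    SUBORDINATION = "sb"


class Link(NamedTuple):
    control: str                # control center, named first
    dependent: str
    kind: LinkType
    tag: Optional[str] = None   # e.g. a junctive signal such as 'but'


def is_clause(links):
    """A verb-to-subject link is the minimal requirement for a clause (2.15.1)."""
    return any(link.kind is LinkType.VERB_TO_SUBJECT for link in links)
```

Because the enumeration is valued by abbreviation, `LinkType("oj")` recovers `LinkType.CONTRAJUNCTION` when reading a diagram label back in.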
2.16
Although my list is not intended to be definitive, it does offer the means for
identifying the transitions within grammatical networks. One might argue for a
more detailed list, depending on the thoroughness of syntactic processing one
wishes to postulate (e.g. subdividing “modifiers” into “adjectives,
adverbs, prepositional phrases,” and the like, cf. II.2.17). Pending detailed
empirical tests, we cannot decide on any one degree of thoroughness. I surmise
that people may not be too thorough under ordinary conditions (cf. III.4.15). In
any case, the list should serve to label the current use of elements rather than
their status in the lexicon. The use of hypotheses would reduce the enormous
searching and combining that would be required if each element were looked up in
the lexicon as it came along; the importance of this factor was demonstrated by
the failure of early machine translation.
2.17
The rest of our sample sentence (14) is processed in the same manner as the opening noun phrase, as suggested in Figure 5.

Having
parsed the noun-phrase macro-state, the processor postulates the macro-state
VERB PHRASE. Note that for a language other than English, the procedures might
well be different. In French, for instance, the modifiers often come after the
head, so that the end of the noun-phrase would be harder to predict exactly.
Even in English we could easily have something else here besides a verb-phrase
(e.g. a modifying prepositional phrase). The processor of course needs a backlog
of alternative hypotheses to try out. In our sample, however, the occurrence of
‘stood’ allows immediate entry into the verb-phrase macro-state. The
identification of the successor state would require AUGMENTING the transition
(in the sense of II.2.12) by a specialized modifier search to distinguish
between the sub-classes “adverb” and “prepositional phrase.” Presumably,
the successor state would preferentially be an “adverb” modifier, with the
actually occurring “prepositional phrase” tried after that. The
prepositional phrase would be a macro-state inside the verb-phrase macro-state,
and would have as its top priority the discovery of the head. I show the
transition network for this part of the sentence in Figure 5, again using dotted
lines to indicate failed hypotheses.
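The preferential ordering described here (try the more probable “adverb” hypothesis first, then the “prepositional phrase”) can be sketched as a search over hypotheses sorted by probability. The mini-lexicons, the probability values, and the function names are hypothetical and purely illustrative; the list of failed hypotheses stands in for the dotted lines of the figure.

```python
# Hypothetical mini-lexicons; a real processor would consult far richer knowledge.
ADVERBS = {"upward", "skyward", "quietly"}
PREPOSITIONS = {"in", "on", "behind", "near"}


def is_adverb(words):
    """Succeeds if the next word is a known adverb."""
    return bool(words) and words[0] in ADVERBS


def is_prepositional_phrase(words):
    """Succeeds if a preposition is followed by at least one further word."""
    return len(words) > 1 and words[0] in PREPOSITIONS


# (hypothesis, assumed probability, test); the numbers are illustrative only.
HYPOTHESES = [
    ("prepositional phrase", 0.4, is_prepositional_phrase),
    ("adverb", 0.6, is_adverb),
]


def identify_successor(words, hypotheses=HYPOTHESES):
    """Try hypotheses from most to least probable; collect the ones that fail."""
    failed = []  # the "dotted lines" of rejected hypotheses
    for name, _prob, test in sorted(hypotheses, key=lambda h: h[1], reverse=True):
        if test(words):
            return name, failed
        failed.append(name)
    return None, failed
```

For the words after ‘stood’, `identify_successor(["in", "a", "desert"])` rejects “adverb” and settles on “prepositional phrase”; if neither test succeeds, the empty result signals that the processor must fall back on its backlog of other hypotheses.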
2.18
The entire sequence yields a fully labeled grammar network as shown in Figure 6.

Because
the “function words” ‘and’ and ‘in’ are purely relational signals, I
show them as TAGS on links rather than as independent grammar states. Although
useful, these signals need not be distinctly apperceivable. In tests with
students at the University of Florida, the sentence was uttered with the
elements ‘and’ and ‘in’ both replaced by a nasalized version of the
schwa sound [ə]. The students had no difficulty filling in the
indistinguishable words. In terms of problem-solving,
they were able to connect the available points together with probable pathways
via means-end analysis (cf. I.6.7.1; II.2.6).
2.19
I return to the ‘rocket’ sample in detail in Chapter III, where I am more
concerned with conceptual than sequential processing. The important aspect here
is to notice how the grammatical sequence interacts as bottom-up input with the
top-down predictions of a language processor. Efficiency results from
preferential ordering of the hypotheses to be tested first. The attention to
probabilities enables the orientation of hypotheses toward the most probable
occurrences at a given stage of operation. In effect, the procedures of the
language user are adapted to fit the exact structure of the real objects being
encountered: a technique I shall call PROCEDURAL ATTACHMENT (cf. D. Bobrow &
Winograd 1977). If the objects are highly unexpected and idiosyncratic after
all, language users will presumably not spend time running through a lot of
syntactic predictions; at the first sign of difficulties, attention will be
focused on other cues besides syntax. For example, our test subjects who could
not hear ‘in’ could easily infer the relation “location-of” between
‘rocket stood’ and ‘a desert’ by consulting world-knowledge.
2.20
A syntactic model for a theory of actual texts might reasonably be asked to deal
with issues like these:
2.20.1
recognizing major structures as familiar patterns;
2.20.2
distinguishing between main and subsidiary classes of elements, such as between
“function words” and “content words”;
2.20.3
conjunction, disjunction, and contrajunction;
2.20.4
subordination;
2.20.5
recursion and embedding;
2.20.6
dispensable elements;
2.20.7
discontinuous elements;
2.20.8
ambiguous structures;
2.20.9
incomplete, elliptical, or damaged structures;
2.20.10
mapping between surface expression and deeper levels in processing;
2.20.11
decision-making and selection procedures;
2.20.12
applicability to both the production and the reception of texts.
2.21
The first eight issues listed above have been extensively explored in
sentence-based linguistics. But headway on the last four has been much more
modest, due to a narrow interpretation of “competence.” Incomplete or
damaged structures would have counted as a quintessential “performance” issue. The
mapping from surface to depth never progressed beyond algorithms in which
abstract structures were substituted for each other (whereas the decision-making
and selection procedures of language users extend far beyond considerations
internal to the sentence: contexts, motivations, goals, and situations). I
conclude this chapter with an outline of how a network system handles some of
these issues; others will be treated later on.
2.22
The recognition of major structures is a task for PATTERN-MATCHING (cf. I.6.6;
Winograd 1972; Rumelhart 1977a). The BASIC CLAUSES and PHRASES (see Perlmutter
& Postal 1978: 1ff.) are treated as macro-state patterns for building or
recognizing actual structures in the utilization of a text. These patterns
become active when their INITIAL STATES are actualized, such as the determiner
beginning a noun phrase in our sample. When a FINAL STATE appears, a phrase or
clause boundary is predicted. If pattern-based predictions are overturned, the
use of other cues, especially conceptual relations, helps keep processing under
reasonable control.
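The activation of macro-state patterns by their initial states, and the prediction of boundaries by their final states, can be sketched with a stack of active patterns. The two patterns and their state sets are a hypothetical simplification (for instance, a noun phrase without a determiner is not detected here), and the input is assumed to be already tagged.

```python
# Hypothetical macro-state patterns: (initial states, final states).
PATTERNS = {
    "noun phrase": ({"determiner"}, {"noun"}),
    "prepositional phrase": ({"preposition"}, {"noun"}),
}


def predict_boundaries(tagged):
    """Activate patterns on initial states; close them on final states.

    Returns the closed patterns as (name, start, end) and any still active.
    """
    active = []  # stack of (pattern name, start index)
    closed = []
    for i, (_word, category) in enumerate(tagged):
        for name, (initial, _final) in PATTERNS.items():
            if category in initial:
                active.append((name, i))
        # A final state predicts a boundary for every enclosing pattern it ends.
        while active and category in PATTERNS[active[-1][0]][1]:
            name, start = active.pop()
            closed.append((name, start, i))
    return closed, active


SAMPLE = [("a", "determiner"), ("rocket", "noun"), ("stood", "verb"),
          ("in", "preposition"), ("a", "determiner"), ("desert", "noun")]
```

On `SAMPLE`, the noun ‘desert’ closes both the inner noun phrase and the enclosing prepositional phrase. A compound such as ‘computer science’ would close a noun phrase too early under these rules, which is exactly the kind of overturned prediction the text expects other cues to repair.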
2.23
The distinction between main and subsidiary categories of elements is required
for the organization of the grammar network. My convention is to place main
categories in network nodes — the “content words” being nouns, verbs,
adjectives, and adverbials — and the subsidiary ones, such as the “function
words” of prepositions and conjunctions, as tags on links (on “content”
vs. “function” words, cf. Bolinger 1975). The further function-word classes
of articles and pronouns appear as nodes only in the grammatical networks, while
their functions are taken over by positioning, linkage, and superpositioning
operations in text-world models. Numericals (except for articles used as
numerical signals) appear as nodes throughout. The psychological distinction
between these main and subsidiary categories should correspond to the
comparative indistinctness of the latter in speech as pointed up by my test with
slurred sounds. Clark and Clark (1977: 275) suggest that in speaking, the
content words are selected first and the function words are subsequently filled
in. The order might be the same during comprehension. This is in agreement with
the treatment of content words as control centers for problem-solving as
outlined in this chapter.12 [12. Dressler (personal
communication) remarks that aphasics with telegram-like speech often retain
content words and omit the function words.]
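The convention of 2.23 (content words and articles as nodes in the grammatical network, prepositions and conjunctions as tags on links) can be sketched as follows. The rule that a pending tag joins the previous noun or verb head to the next one is a crude simplifying assumption of this sketch, not the author's procedure, and the input is assumed to be already tagged.

```python
FUNCTION_TAGS = {"preposition", "conjunction"}  # become tags on links
HEADS = {"noun", "verb"}                        # points a tagged link can join


def nodes_and_tags(tagged):
    """Split tagged words into network nodes and tagged links between heads.

    Articles stay as nodes, as in the grammatical network; prepositions and
    conjunctions are held until the next head arrives, then attached as a
    tag on the link from the previous head to that new head.
    """
    nodes, links = [], []
    pending_tag = None
    last_head = None
    for word, category in tagged:
        if category in FUNCTION_TAGS:
            pending_tag = word
            continue
        nodes.append(word)
        if category in HEADS:
            if pending_tag is not None and last_head is not None:
                links.append((last_head, word, pending_tag))
            pending_tag = None
            last_head = word
    return nodes, links
```

For ‘A rocket stood in a desert’ the sketch yields a single tagged link `("stood", "desert", "in")`, the link on which a listener who missed the slurred ‘in’ could still infer “location-of.”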
2.24
Junction, including conjunction, disjunction, contrajunction, and subordination
can occur between components of very different sizes. The DEFAULT junction would
be conjunction, since the relationship among elements in a text is usually
additive. In a sample such as Kipling’s famous phrase:
(15)
The great, gray-green, greasy Limpopo River12a [12a. Actually,
the part of it I saw in South Africa is neither great — sometimes just a
trickle — nor greasy. But it runs a long way from there and empties into the
Indian Ocean in Mozambique, and along the way much sewage is no doubt poured into it.]
the
modifiers are taken as added to each other even though no ‘and’ is present;
mere juxtaposition is sufficient. If the junction were disjunction, i.e., if the
river had only one of these attributes, an explicit signal like ‘or’ would
have to be present. Contrajunction, such as with ‘but’, ‘however’, etc., is also
likely to be signaled on the surface, though not obligatorily. I adopt the
convention of suppressing the signals of conjunction in diagramming text-worlds,
but preserving the signals of disjunction, contrajunction, and subordination as
link tags. I reserve the further treatment of these relations for V.7.
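The default described here, plus the diagramming convention, can be sketched as a small classifier: no signal means conjunction, and only non-conjunctive signals are kept as link tags. The signal sets are copied from the lists in 2.15.11 (which end in “etc.”, so they are incomplete), and the function name is hypothetical.

```python
# Signals listed in 2.15.11; these lists are not exhaustive.
JUNCTIVE_SIGNALS = {
    "conjunction": {"and", "also", "too", "moreover", "in addition"},
    "disjunction": {"or", "or else", "either-or"},
    "contrajunction": {"but", "however", "yet", "nonetheless"},
    "subordination": {"if", "because", "since", "that", "which"},
}


def classify_junction(signal=None):
    """Return (junction type, tag to show in a text-world diagram)."""
    if signal is None:
        return "conjunction", None  # mere juxtaposition: the default
    for kind, signals in JUNCTIVE_SIGNALS.items():
        if signal.lower() in signals:
            # Conjunction signals are suppressed; all others are kept as tags.
            tag = None if kind == "conjunction" else signal
            return kind, tag
    raise ValueError(f"unknown junctive signal: {signal!r}")
```

Kipling's comma-separated modifiers correspond to `classify_junction()` with no signal, while `classify_junction("or")` returns `("disjunction", "or")` and keeps the tag.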
2.25
Junction of subjects or predicates can be represented by multiple sharing of
links among nodes. For example, another fragment from the ‘rocket’ text runs
like this:
(16)
Scientists and generals withdrew to some distance and crouched behind earth
mounds.
Figure 7 shows how this fragment appears as a grammatical dependency network. The operation of RECURSION upon encountering ‘and’ was already illustrated in Figures 3 and 4. The processor simply assumes that the next micro-state or macro-state is of the same class as the current one. In diagramming, I adopt the convention of placing earlier occurrences above later ones as far as spatial organization permits.
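The multiple sharing of links for joined subjects and predicates amounts to linking every subject to every verb. A minimal sketch, assuming links are written as `(control center, link type, dependent)` triples; the function name is hypothetical.

```python
from itertools import product


def shared_links(subjects, verbs):
    """Link every joined subject to every joined verb via verb-to-subject.

    Each link is written control center first: (verb, "v-s", subject).
    """
    return [(verb, "v-s", subject) for subject, verb in product(subjects, verbs)]


LINKS_16 = shared_links(["Scientists", "generals"], ["withdrew", "crouched"])
```

For sample (16), two subjects and two verbs produce four verb-to-subject links, so neither subject needs to be repeated as a separate node.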

2.26
Subordination of clauses can be treated largely as subordination of the verbal
elements; for example, another ‘rocket’ fragment like:
(17)
Radar tracked it as it sped upward.
has
the signal ‘as’ to indicate a temporal proximity between the events
expressed by the verbs. In accordance with II.2.23, I show a link between the
verb nodes with the junctive element as a tag, giving us Figure 8. Subordination
is discussed further in section V.7.6ff.
2.27
Recursion is an essential property of context-free grammars (Kasher 1973: 63),
and is the mainstay of the infinite generative power of sentence systems.
Actualization always imposes a THRESHOLD OF TERMINATION upon recursion, e.g.
upon the length of strings of modifiers or upon embeddings inside embeddings.
These constraints arise from processing resources like span of active memory and
scope of attention. The popularity of multiple embeddings as test objects in
psychological experiments (e.g. Miller & Isard 1963; Blumenthal 1966; Fodor
& Garrett 1967; Stolz 1967; Freedle & Craun 1970; Hakes & Foss 1970)
suggests a confusion of virtual and actual systems. Whatever people may do with
sentences like:
(18)
The pen the author the editor liked used was new.
cannot
tell us very much about normal processing strategies, because such sentences are
drastically improbable occurrences, and there is no need for routines to handle
them (a model for automatic processing of multiple embeddings is offered in J.
Anderson 1976: 470ff.). When Osgood (1971) designed an experimental situation in
hopes of eliciting self-embedded sentences, he reported: “despite my speakers all
being involved in psycholinguistics, and reasonably familiar with
transformational linguistics [a significant choice of test subjects!], only
a single subject produced center embeddings, and this happened to be my own
research assistant” (Osgood & Bock 1977: 517, emphasis added). The
obliging assistant eloquently demonstrated how strong PRAGMATIC motivations can
be in the selection of syntactic options.
2.28
The reliance upon contrasting grammatical sentences with ungrammatical ones (the
latter marked with *) in linguistic discussions points up a potential
difficulty. While a grammar must take special account of the central aspects of
a language (the “core” in Haber 1975), these discussions work with
peripheral occurrences. There is no good reason to suppose that the latter must
necessarily reveal the nature of the former. The discrepancy emerges strongly
when intricate elicitation techniques are designed to obtain empirical samples
of rare syntactic constructions required by abstract grammar. A more realistic
grammar would not need to defend its validity with such contortions.
2.29
Augmented transition networks of the kind I have described are easily able to
deal with recursion. The processor simply notices the corresponding signals and
repeats the structural operations it has just performed. To be humanly
plausible, the probabilities assigned to each recursion in a series should
steadily sink, so that language users would be increasingly surprised.13
[13. A psychological correlate of this factor might be “gambler’s
fallacy”; cf. note 2 to Chapter IV.] A theory of text utilization should
foresee operational difficulties for cases where humans clearly have trouble.
Transformational grammar was in this regard decidedly too powerful to be
realistic.
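The steadily sinking probability of each further recursion, together with the threshold of termination mentioned in 2.27, can be sketched numerically. The geometric decay, the default values, and the use of surprisal in bits as a measure of “surprise” are all assumptions of this sketch, not figures from the text.

```python
import math


def recursion_expectation(k, p0=0.5, decay=0.5):
    """Assumed probability of the k-th recursion (k = 0 for the first)."""
    return p0 * decay ** k


def surprise(p):
    """Surprisal in bits: the less probable the occurrence, the larger the value."""
    return -math.log2(p)


def termination_depth(p0=0.5, decay=0.5, threshold=0.05):
    """First recursion depth whose expectation falls below the threshold."""
    if not (0 < decay < 1) or threshold <= 0:
        raise ValueError("need 0 < decay < 1 and threshold > 0")
    k = 0
    while recursion_expectation(k, p0, decay) >= threshold:
        k += 1
    return k
```

With the default values the expectations run 0.5, 0.25, 0.125, 0.0625, and the fifth recursion (k = 4) is the first to fall below the threshold; each step adds one bit of surprise.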
2.30
Dispensable elements are much less difficult for a systemic grammar of actual
occurrences than for an abstract derivational grammar. The latter is obliged to
rearrange whole tree structures just to get an element in or out of a sequence.
In actualization, the element’s appearance is a matter of stronger or weaker
expectations being fulfilled or not fulfilled, and whatever is apperceived as a
gap can be filled in via problem-solving (cf. II.2.8). In pairs like:
(19a)
The pilot saw that the rocket descended.
(19b)
The pilot saw the rocket descended.
(20a)
A rocket stood in a desert in New Mexico.
(20b)
A rocket stood in a New Mexico desert.
the
dispensable elements ‘that’ and ‘in’ are relational link tags whose
conceptual labeling can be done without the tags. Increased processing effort
may be needed to handle the structures where the elements are absent (cf. Fodor
& Garrett 1967; Hakes & Foss 1970; Hakes 1972), but context could easily
influence the effort (Clark & Clark 1977: 64f.). Rudolf Flesch (1972) even
suggests that these elements should be deleted to make prose more readable by
his (admittedly disputable) standards (cf. IX.3.2ff.).
2.31
Discontinuous elements, according to our model, would be difficult to manage if
they were placed at some distance from each other. The span of active storage
(or the hold stack demonstrated in Figure 4) would become very crowded before
the final part of the element appeared. This gradation of difficulty seems
appropriate, as (21a) is probably easier for English language users than (21b):
(21a)
They took the rocket down.
(21b)
They took the rocket at the launching site that was constructed out in the
bleak New Mexico desert near White Sands down.
Probably,
an understander would not immediately know where to attach the ‘down’ of
(21b), but could do so by a backward search that favored the verb node over
other possible points: another illustration of problem-solving. Some languages,
especially German, have a strong potential for positioning the particles of
verbal elements at the very end of a clause. This usage does not make German
harder to use, however. The native speaker merely stores the corresponding
probabilities and expectations so that these final particles are immediately
connected to the appropriate prior occurrence. The concern for discontinuous
elements is intense only for models of “immediate constituent analysis,”
which proceed by cutting surface segments into halves, quarters, eighths, and so
on (hence, elements are hard to treat if they are scattered throughout a
sentence).
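The backward search that favors the verb node can be sketched as follows. The table of verbs that accept particles is a hypothetical stand-in for the stored probabilities and expectations; without some such compatibility check, a naive search would stop at the nearer verbs ‘was’ or ‘constructed’ in (21b). The returned distance is a rough proxy for how much had to be held in active storage.

```python
# Hypothetical table of verbs and the particles they can take.
PARTICLE_VERBS = {"took": {"down", "up", "off"}}


def attach_particle(prior, particle):
    """Search backward from the particle for the nearest compatible verb node.

    Returns the verb and the number of words held between it and the particle,
    or (None, None) if no compatible verb is found.
    """
    for i in range(len(prior) - 1, -1, -1):
        word, category = prior[i]
        if category == "verb" and particle in PARTICLE_VERBS.get(word, set()):
            return word, len(prior) - 1 - i
    return None, None


SAMPLE_21A = [("They", "pronoun"), ("took", "verb"), ("the", "determiner"),
              ("rocket", "noun")]
SAMPLE_21B = [("They", "pronoun"), ("took", "verb"), ("the", "determiner"),
              ("rocket", "noun"), ("at", "preposition"), ("the", "determiner"),
              ("launching", "modifier"), ("site", "noun"), ("that", "pronoun"),
              ("was", "verb"), ("constructed", "verb"), ("out", "adverb"),
              ("in", "preposition"), ("the", "determiner"), ("bleak", "modifier"),
              ("New Mexico", "modifier"), ("desert", "noun"),
              ("near", "preposition"), ("White Sands", "noun")]
```

Both samples attach ‘down’ to ‘took’, but (21a) holds 2 intervening words while (21b) holds 17, which matches the gradation of difficulty the text predicts.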
2.32
Ambiguous structures have been widely discussed in linguistics. As Peter
Hartmann (personal communication) remarks, the intense structural analysis done
by linguists tends to proliferate ambiguities that people in everyday
communication might well not notice. Transformational grammar used ambiguities
as a favorite means of justifying the notion of “syntactic deep structure”
(cf. II.1.6). For a procedural model, we should inquire whether the ambiguity is
or is not resolved later on in the sequence. For many years, Robert Simmons
(personal communication) has used this example:
(22)
The old man the boats.
Uttered
in a flat monotone, this sentence is extraordinarily hard to comprehend. Either
no meaning at all is recovered (as Simmons believes), or people must back up and
do a completely new processing in which ‘man’ is identified as a verb rather
than a noun (as Rumelhart 1977a: 123 argues for the same example). The dispute
cannot be settled, because everyday contexts in which there could be a genuine
and lasting ambiguity would hardly arise spontaneously (the state of affairs
being of course different for a computer).
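The backing up and reprocessing that Rumelhart describes for (22) can be approximated by trying category assignments in order of preference until one fits a known clause pattern. The lexicon, the preference order, and the two clause patterns are hypothetical, and exhaustive enumeration is a crude stand-in for genuinely incremental backtracking.

```python
from itertools import product

# Hypothetical lexicon: categories listed in order of preference.
LEXICON = {
    "the": ["det"],
    "old": ["adj", "noun"],
    "man": ["noun", "verb"],
    "boats": ["noun"],
}
# Hypothetical clause patterns the processor accepts.
CLAUSE_PATTERNS = {
    ("det", "noun", "verb", "det", "noun"),
    ("det", "adj", "noun", "verb", "det", "noun"),
}


def parse_with_backtracking(words):
    """Return the first acceptable reading and how many readings were rejected."""
    rejected = 0
    for reading in product(*(LEXICON[w] for w in words)):
        if reading in CLAUSE_PATTERNS:
            return list(reading), rejected
        rejected += 1
    return None, rejected
```

For `"the old man the boats".split()`, the preferred reading (‘old’ as adjective, ‘man’ as noun) is tried first and fails, and only the fourth reading, with ‘man’ as a verb, is accepted.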
2.33
One class of resolvable ambiguities is called “garden path sentences,”
because they lead an understander down one track and then present a block (Clark
& Clark 1977: 80). The understander is believed to notice only one
alternative reading and to pursue that hypothesis until trouble arises. Yet
experiments show that if people are asked to make continuations for clauses
containing structural ambiguities, they show more hesitations and false starts
than for non-ambiguous clauses (MacKay 1966). This finding suggests that more
than one reading is being recovered. I suspect that the experimental set-up
encouraged a non-typical expenditure of processing resources in an attempt to
avoid what might be errors. The test subjects had more motivation to expect and
be wary of traps than would be the case in everyday discourse.
2.34
At present, it is computationally