A constituent of a sentence is a (mono-morphemic) word or phrase.


1   Etymology

From Latin constitutus "arranged, settled," past participle adjective from constituere "to cause to stand, set up, fix, place, establish, set in order; form something new; resolve," from com-, intensive prefix, + statuere "to set".

The use of the term in linguistics goes back to at least Chomsky 1957.

2   Recognition

A sufficient procedure for recognizing a string as a constituent of a sentence is called a constituency test. Many kinds of constituency tests exist.

No constituency test has perfect validity; a test may generate a false negative (i.e. incorrectly reject a valid string).

Note that a string being a constituent in one sentence does not imply that it is a constituent in every sentence. [3]

We have the intuition that in some sentences words belong together even when they are not adjacent. For instance, see and who in (30a) belong together in much the same way as see and Bill do in (30b). [3]

(30a) Who will they see?

(30b) They will see Bill.

Finally, we can observe that there are various sorts of ways that words can belong together. For instance, in a phrase like the big dog, big belongs with dog, and we have the intuition that big modifies dog. On the other hand, the relation between see and Bill in (30b) isn't one of modification. Rather, we have the intuition that Bill is a participant in a seeing event.

2.1   Substitution (Replacement)

Given a substring, sub, of some grammatical sentence, S, if there exists some constituent, c, such that the string that results from substituting c for sub in S is grammatical and semantically equivalent, then sub is a constituent of S.

c should be chosen to be as similar to sub as possible to minimize interference caused by factors not relevant for establishing constituency.
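The substitution test can be sketched in Python. Grammaticality and semantic equivalence cannot be computed from the string alone, so this sketch takes the judgment as a caller-supplied function. The names `substitute`, `passes_substitution`, and `toy_oracle` are hypothetical, and the oracle is a hand-written table of judgments rather than a real grammar.

```python
from typing import Callable, List, Optional


def find_span(sentence: List[str], sub: List[str]) -> Optional[int]:
    """Return the start index of the first occurrence of sub, or None."""
    n = len(sub)
    for i in range(len(sentence) - n + 1):
        if sentence[i:i + n] == sub:
            return i
    return None


def substitute(sentence: List[str], sub: List[str], c: List[str]) -> Optional[List[str]]:
    """Replace the first occurrence of sub with c; None if sub does not occur."""
    start = find_span(sentence, sub)
    if start is None:
        return None
    return sentence[:start] + c + sentence[start + len(sub):]


def passes_substitution(
    sentence: List[str],
    sub: List[str],
    c: List[str],
    is_acceptable: Callable[[List[str]], bool],
) -> bool:
    """True if substituting c for sub yields a string the oracle accepts."""
    result = substitute(sentence, sub, c)
    return result is not None and is_acceptable(result)


# Assumption: a toy oracle standing in for a speaker's judgments.
JUDGMENTS = {
    "Alice will see him tomorrow": True,
    "Alice will him boy tomorrow": False,
}


def toy_oracle(words: List[str]) -> bool:
    return JUDGMENTS.get(" ".join(words), False)


s = "Alice will see the boy tomorrow".split()
print(passes_substitution(s, ["the", "boy"], ["him"], toy_oracle))  # True
print(passes_substitution(s, ["see", "the"], ["him"], toy_oracle))  # False
```

A result of True counts as evidence that the substring is a constituent. A result of False licenses no conclusion, since the test can produce false negatives.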

Mono-morphemic words are a good choice for c since they are trivially constituents. Other constituents must be carefully substituted. Consider:

Alice will see the boy tomorrow.

Alice will see the boy Monday because it is important.

A successful substitution indicates that the substituted string and its replacement share a distributional property, which is evidence that they belong to the same class. However, it is not proof. Consider the following:

She will see the boy.

* The boy will see she.

Alice and she do not have the same distribution, but they may still have enough in common to be in the same category. (She and her are in complementary distribution, the same category, and tokens of the same morpheme.)

Substitution is transitive.

Single words that can stand in for a constituent (pronouns, adverbs) are called pro-forms.

Not every type of constituent has a corresponding pro-form. For instance, although some prepositional phrases can be replaced by the pro-forms here or there, other types of prepositional phrases - for instance, ones referring to purposes or reasons - can't. [4]

2.1.1   Ellipsis substitution (Deletion)

Ellipsis is substitution by the null string, that is, omission of understood material.

Under certain discourse conditions, typically contrasts, substitution of some string by the null string is appropriate. For example:

Alice will go to the theater and the boy (will go) to the movies.

Ellipsis is constrained to discourse contexts in which an antecedent sentence is present and the intended meaning is kept the same: the omitted (parenthesized) parts must first be understood, and second be understood in the same way as in the antecedent sentence.

Ellipsis seems to be able to affect only constituents.

There is another kind of ellipsis called gapping.

We suppose that the ellipsis process only applies to verb phrases.

  1. Damien said they will [meet at the mall] ... and they will (meet at the mall).

  2. Alice liked the boy's [outfit], but not Charlie's (outfit).

  3. They will find something, but I don't know what (they will find).

    sluicing: wh-

2.2   Movement

If it is possible to move a particular string from its ordinary position to another position - typically, the beginning of the sentence - that, too, is evidence that the string is a constituent.

In order to make the result of movement completely acceptable, it's sometimes necessary to use a special intonation or to invoke a special discourse context, especially in the case of noun phrases.

Adjective phrase:

He returned from his travels wiser than before. -> Wiser than before, he returned from his travels.

Adverb phrase:

They arrived at the concert hall more quickly than they had expected. -> More quickly than they had expected, they arrived at the concert hall.

2.2.1   Topicalization (Fronting)

Topicalization involves moving the target (called the topic of exchange) to the front of the sentence.

Alice sees [the boy]. -> The boy, Alice sees.

Alice will [see the boy]. -> See the boy, Alice will. [VP-preposing]

Alice sees the boy [at night]. -> At night, Alice sees the boy.

Topicalization fails in some cases, even for strings that other tests treat as constituents:

This [girl] sees the boy. -> * Girl, this sees the boy.

Alice [will see the boy]. -> * Will see the boy, Alice.

2.2.2   Clefting

Given some grammatical string ABC, if "It BE B that AC" is grammatical, where BE stands for any form of the verb be (e.g. is or was), then B is a constituent of ABC.

Alice sees [the boy] now -> It is the boy that Alice sees now

Alice saw the boy [yesterday] -> It was yesterday that Alice saw the boy
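The cleft template can be sketched as a string transformation. The function names `cleft` and `candidate_clefts` are hypothetical, and the sketch only builds candidate strings. Judging whether each one is acceptable still requires a speaker.

```python
from typing import Iterator, Tuple


def cleft(a: str, b: str, c: str, be: str = "is") -> str:
    """Form 'It BE B that A C' from the three-way split A B C."""
    parts = ["It", be, b, "that", a, c]
    return " ".join(p for p in parts if p)


def candidate_clefts(sentence: str, be: str = "is") -> Iterator[Tuple[str, str]]:
    """Yield (B, cleft) for every contiguous proper substring B of the sentence."""
    words = sentence.split()
    n = len(words)
    for i in range(n):
        for j in range(i + 1, n + 1):
            if (i, j) == (0, n):
                continue  # skip the whole sentence: A and C would both be empty
            a, b, c = words[:i], words[i:j], words[j:]
            yield " ".join(b), cleft(" ".join(a), " ".join(b), " ".join(c), be)


print(cleft("Alice sees", "the boy", "now"))
# It is the boy that Alice sees now
print(cleft("Alice saw the boy", "yesterday", "", be="was"))
# It was yesterday that Alice saw the boy
```

Because `candidate_clefts` enumerates only contiguous substrings, it reflects the observation that clefting can only affect continuous strings.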

If a string of words can be the focus of a cleft sentence, then the string of words is a constituent.

In the "it"-cleft construction, the word it appears as the subject, be as the verb, and more material follows: the focus, then that or who, then the rest of the sentence.

The target of clefting becomes the focus. The words that appear after that are called the presupposition (because the speaker of this sentence presupposes that the discourse participants know about it).

There are many constructions involving a notion of focus. This is one of them. It is called a cleft construction and the focus is also called the clefted string.

The fact that clefting can only affect continuous strings is an indication that the focus must be a constituent.

The focus of a cleft construction is a constituent.

Adjective phrase:

He returned from his travels wiser than before. -> It was wiser than before that he returned from his travels.

Adverb phrase:

They arrived at the concert hall more quickly than they had expected. -> It was more quickly than they had expected that they arrived at the concert hall.

2.2.3   Pseudo-clefting (Preposing)

Cleft and pseudocleft constructions fulfill similar functions of 'focusing' a constituent of the correspondingly simpler sentence, though the two constructions differ considerably with regard to the class of cases in which they can be employed.

  1. It's Alice that Bob was talking to -> Who Bob was talking to was Alice
  2. It's Alice that John was talking to -> Who John was talking to was Alice
  3. Mary gave [a book to John] -> * What Mary gave was a book to John

Cleft constructions and pseudocleft constructions do not test for the same types of constituents. The cleft construction only works well for DPs and PPs. The pseudocleft construction works well for a variety of other constituents as well: APs, VPs, and CPs.

Alice became [deadly afraid of flying]. -> What Alice became was deadly afraid of flying. (AP)

Alice told us [that she wants to quit school]. -> What Alice told us was that she wants to quit school. (CP)

Alice promised us to be gentle. -> What Alice promised is to be gentle. (CP)

Alice will arrive tomorrow. -> What Alice will do is arrive tomorrow. (VP)

The focus of a pseudocleft is a constituent.

A pseudocleft experiment takes the following form.

Given some acceptable string A B C, we form the new string What A C BE B. If the result is acceptable, this is evidence that B is a constituent of A B C, else we conclude nothing.

A variant of the pseudocleft experiment can be used to isolate VPs.

Given some acceptable string A B C, we form the new string: What A DO C BE B, where DO and BE are inflections of do and be. If the result is acceptable ...

For example: "Alice will love Bob -> What Alice will do is love Bob."
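The pseudocleft templates can be sketched in the same way. All function names here are hypothetical. The VP variant is written as What A DO C BE B, an assumption chosen so that the result matches the example "What Alice will do is love Bob". As with any constituency test, the code only builds the candidate string and does not judge its acceptability.

```python
def _join(*parts: str) -> str:
    """Join the non-empty parts with single spaces."""
    return " ".join(p for p in parts if p)


def pseudocleft(a: str, b: str, c: str, be: str = "is", wh: str = "What") -> str:
    """Form 'What A C BE B' from the split A B C."""
    return _join(wh, a, c, be, b)


def vp_pseudocleft(a: str, b: str, c: str, do: str = "do", be: str = "is") -> str:
    """Form 'What A DO C BE B', used to isolate a verb phrase B."""
    return _join("What", a, do, c, be, b)


def inverted_pseudocleft(a: str, b: str, c: str, be: str = "is", wh: str = "what") -> str:
    """Form 'B BE what A C', capitalizing the first letter of the result."""
    s = _join(b, be, wh, a, c)
    return s[:1].upper() + s[1:]


print(pseudocleft("Alice became", "deadly afraid of flying", "", be="was"))
# What Alice became was deadly afraid of flying
print(vp_pseudocleft("Alice will", "love Bob", ""))
# What Alice will do is love Bob
print(inverted_pseudocleft("John was talking to", "Alice", "", be="was", wh="who"))
# Alice was who John was talking to
```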

There is another construction, called an inverted pseudocleft, which is identical to pseudoclefting except that the two strings around the verb be are inverted.

Given ... A B C we form the new string B BE what AC ...

For example, "It's Alice that John was talking to -> Alice was who John was talking to", "It is to Cleveland that John drove the truck -> To Cleveland is where John drove the truck."

2.2.4   Wh-movement

Wh-questioned strings are constituents. For example:

Alice wants to buy [these] books about cooking.

Alice wants to buy which books about cooking?

Which books about cooking does Alice want to buy?

2.3   Modification by only, even


Alice will only put books on these tables

Only places one element of the sentence, which we also call the focus, in contrast with implicit alternatives. In the first sentence below, focus may be on put, books, or tables depending on the stress. For example:

Alice will only *put books on these tables*
Alice will only *put* books on these tables
Alice will only put *books* on these tables
Alice will only put books *on* these tables
Alice will only put books on *these* tables
Alice will only put books on *these tables*
Alice will only put books on these *tables*

For example, the last case means: "It is only these tables that Alice will put books on, not anything else."

The process of associating an element with only is not free. Focus may not be put on will or Alice (i.e. the sentence cannot mean 'only Alice will ...'). Only seems to have to precede its focus, but precedence is not sufficient. Consider:

Alice will only put books *on these tables*

The reason for this difference is of course constituent structure.

Alice will [VP only [VP put books on these tables]]

This suggests the following rule:

The focus associated with only must be contained in a constituent that is a sibling of only.

2.4   Stand-alone

The answer ellipsis test refers to the ability of a sequence of words to stand alone as a reply to a question.

What did you do yesterday? -> Worked on my new project.

Linguists do not agree whether passing the answer ellipsis test is sufficient, though at a minimum they agree that it can help confirm the results of another constituency test.

2.5   Coordination

The coordination test assumes that only constituents can be coordinated, i.e., joined by means of a coordinator such as "and".

If we have two acceptable sentences of the form A B D and A C D and the string A B and C D is acceptable with the same meaning as A B D and A C D, this is evidence that B and C are both constituents, and constituents of the same kind.

For example, let A = "", B = "this girl in the red coat", C = "you", D = "will put a picture of Bill on your desk before tomorrow".
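The construction of the coordination test string can be sketched as follows. The helper names `split_shared` and `coordinate` are hypothetical. The sketch takes A to be the longest shared prefix and D the longest shared suffix of the two sentences, which is one possible choice of split among several. It builds the candidate string only and leaves the acceptability judgment to a speaker.

```python
from typing import List, Tuple


def split_shared(s1: str, s2: str) -> Tuple[List[str], List[str], List[str], List[str]]:
    """Split two sentences A B D and A C D into (A, B, C, D) word lists."""
    w1, w2 = s1.split(), s2.split()
    shortest = min(len(w1), len(w2))
    # Longest common prefix (A).
    p = 0
    while p < shortest and w1[p] == w2[p]:
        p += 1
    # Longest common suffix (D) that does not overlap the prefix.
    s = 0
    while s < shortest - p and w1[-1 - s] == w2[-1 - s]:
        s += 1
    a = w1[:p]
    b = w1[p:len(w1) - s]
    c = w2[p:len(w2) - s]
    d = w1[len(w1) - s:]
    return a, b, c, d


def coordinate(s1: str, s2: str, conj: str = "and") -> str:
    """Form the string 'A B conj C D' from sentences A B D and A C D."""
    a, b, c, d = split_shared(s1, s2)
    return " ".join(a + b + [conj] + c + d)


first = "this girl in the red coat will put a picture of Bill on your desk before tomorrow"
second = "you will put a picture of Bill on your desk before tomorrow"
print(coordinate(first, second))
# this girl in the red coat and you will put a picture of Bill on your desk before tomorrow
```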

Coordination is more general than substitution, since we may be able to coordinate two strings, neither of which is replaceable by a single word. For example:

The girl in the red coat [will eat her breakfast] and [will put a picture of Bill on your desk] before tomorrow.

Coordination tests rarely seem to fail (and it is conceivable that they never really fail). [2]

Coordination is appropriate when:

  • We can say each of the two sentences independently
  • These two sentences have identical parts and dissimilar parts
  • We can substitute one dissimilar part for the other, preserving acceptability

In many cases, coordination will fail because of an interference with agreement: coordination of two singular DPs yields a plural DP. For example, "[Alice] and [the boy] are sick."

Interpreting coordination test failures:

If we have two acceptable sentences of the form A B D and A C D, where none of A, B, C, D is a bound morpheme, and the string A B and C D is not acceptable (even after we have fixed agreement), then it is not the case that B and C are both constituents of the same kind. That is, one or more of the following is true: B is not a constituent, C is not a constituent, or B and C are not of the same kind.

Generally, the coordination of two constituents is described by X Conj X -> X.

3   Classification

Types of constituents are called syntactic categories (= parts of speech).

Syntactic categories may bear a grammatical category. For instance, Tense bears grammatical tense and Voice bears grammatical voice.

The class of many constituents is ambiguous, and the class must always be determined by context.

Some word classes are complex or unspoken.

Syntacticians use distributional criteria to classify constituents. Distributional criteria are sufficient but not necessary for establishing category.

Constituents cannot be categorized by semantic or functional criteria since many lexical words can be used ambiguously (e.g. nominalization). Further, one can know the part of speech of a word without even knowing what it means.

4   Properties

The terminal yield (= fringe = surface structure) of a tree is the sequence of leaves encountered in an ordered walk of the tree. The resulting sequence of terminals is a string of the language generated by the grammar.
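The terminal yield can be sketched as a left-to-right depth-first walk. The tree encoding is an assumption made for this example: an interior node is a tuple `(label, child, ...)` and a leaf is a plain string. The category labels in the sample tree are illustrative only.

```python
from typing import List, Tuple, Union

Tree = Union[str, Tuple]  # a leaf is a str; an interior node is (label, *children)


def terminal_yield(tree: Tree) -> List[str]:
    """Return the leaves of the tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    _label, *children = tree
    leaves: List[str] = []
    for child in children:
        leaves.extend(terminal_yield(child))
    return leaves


tree = (
    "S",
    ("NP", "Alice"),
    ("VP", ("V", "sees"), ("NP", ("D", "the"), ("N", "boy"))),
)
print(terminal_yield(tree))
# ['Alice', 'sees', 'the', 'boy']
```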

4.1   Precedence

Node A precedes node B if and only if A is to the left of B and neither A dominates B nor B dominates A and every node dominating A either precedes B or dominates B.

A immediately precedes B if A precedes B and there is no node G that follows A but precedes B.

4.2   C-command

Node A c-commands node B if every branching node dominating A also dominates B, and neither A nor B dominates the other. Informally, a node c-commands its sisters and all the descendants of its sisters.

Node A symmetrically c-commands node B if A c-commands B and B c-commands A.

Node A asymmetrically c-commands node B if A c-commands B but B does not c-command A.
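The c-command definitions can be checked mechanically on a small tree. This is a minimal sketch: the `Node` class and function names are hypothetical, domination is taken to be proper (a node does not dominate itself), and the sketch assumes that a node does not c-command itself, a case the definition leaves open.

```python
class Node:
    """A tree node with a label, ordered children, and a parent pointer."""

    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def ancestors(self):
        """Yield the nodes properly dominating this one, nearest first."""
        node = self.parent
        while node is not None:
            yield node
            node = node.parent


def dominates(a, b):
    """True if a is a proper ancestor of b."""
    return any(anc is a for anc in b.ancestors())


def c_commands(a, b):
    """True if every branching node dominating a also dominates b,
    and neither a nor b dominates the other."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False  # assumption: a node does not c-command itself
    branching = [n for n in a.ancestors() if len(n.children) > 1]
    return all(dominates(n, b) for n in branching)


def symmetrically_c_commands(a, b):
    return c_commands(a, b) and c_commands(b, a)


def asymmetrically_c_commands(a, b):
    return c_commands(a, b) and not c_commands(b, a)


# [S [NP Alice] [VP [V sees] [NP [D the] [N boy]]]]
subj = Node("NP", Node("Alice"))
v = Node("V", Node("sees"))
obj = Node("NP", Node("D", Node("the")), Node("N", Node("boy")))
vp = Node("VP", v, obj)
s = Node("S", subj, vp)

print(symmetrically_c_commands(subj, vp))    # True: sisters c-command each other
print(asymmetrically_c_commands(subj, obj))  # True: obj does not c-command subj
print(c_commands(s, subj))                   # False: S dominates subj
```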

The etymology of c-command is unclear. Some claim it means "constituent command". Others claim that "c-command" came about to distinguish between "command" and "kommand".

4.3   Government

Informally, government can be thought of as "immediate c-command".

Formally, node A governs node B if A c-commands B and there is no node G such that G is c-commanded by A and G asymmetrically c-commands B.

Governors are heads of the lexical categories.

4.4   Binding

Node A binds B iff A and B corefer and A c-commands B. A is called the antecedent and B is called an anaphoric pronoun.

5   Representation

Constituents are represented as trees, called concrete syntax trees (= parse trees = deep structure).

A tree drawn only one level deep is called a flat structure.

Given a context-free grammar, a parse tree according to the grammar is a tree with the following properties:

  1. The root is labeled by the start symbol.
  2. Each leaf is labeled by a terminal.
  3. Each interior node is labeled by a nonterminal.
  4. If A is the nonterminal labeling some interior node and X0, X1, ..., Xi are the labels of the children of that node from left to right, then there must be a production A -> X0 X1 ... Xi. Here, X0, X1, ..., Xi each stand for a symbol that is either a terminal or a nonterminal.
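The four properties above can be checked with a short recursive function. This is a sketch under stated assumptions: the grammar is a dictionary from each nonterminal to a set of right-hand sides, trees are tuples `(label, child, ...)` with plain strings as leaves, empty productions are not handled, and `GRAMMAR` is a toy grammar invented for the example.

```python
GRAMMAR = {
    "S": {("NP", "VP")},
    "NP": {("Alice",), ("D", "N")},
    "VP": {("V", "NP")},
    "V": {("sees",)},
    "D": {("the",)},
    "N": {("boy",)},
}


def _label(node):
    """The label of a node: the string itself for a leaf, else the first element."""
    return node if isinstance(node, str) else node[0]


def _check(node, grammar):
    if isinstance(node, str):
        # Property 2: a leaf must be a terminal, i.e. not a nonterminal.
        return node not in grammar
    label, *children = node
    # Property 3: an interior node must be labeled by a nonterminal.
    if label not in grammar or not children:
        return False
    # Property 4: the children's labels must form a production of the grammar.
    rhs = tuple(_label(child) for child in children)
    if rhs not in grammar[label]:
        return False
    return all(_check(child, grammar) for child in children)


def is_parse_tree(tree, grammar, start="S"):
    """True if tree satisfies the four parse-tree properties for the grammar."""
    # Property 1: the root is labeled by the start symbol.
    if isinstance(tree, str) or tree[0] != start:
        return False
    return _check(tree, grammar)


good = ("S", ("NP", "Alice"), ("VP", ("V", "sees"), ("NP", ("D", "the"), ("N", "boy"))))
unexpanded = ("S", "NP", ("VP", ("V", "sees"), ("NP", "Alice")))  # leaf is a nonterminal
print(is_parse_tree(good, GRAMMAR))        # True
print(is_parse_tree(unexpanded, GRAMMAR))  # False
```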

6   Parsing (Constituent analysis)

Parsing is the task of determining how a terminal string can be derived from the start symbol of a grammar (or raising a syntax error if it cannot).

Parse trees retain all of the information of the input. A parse tree is an abbreviated derivation, but contains less information since it does not tell us in what order the rules were applied.

6.1   Ambiguity

A grammar can have more than one parse tree generating a given string of terminals. Such a grammar is said to be ambiguous. Since a string with more than one parse tree usually has more than one meaning, we need to design unambiguous grammars for compiling applications, or use additional rules to resolve the ambiguities.

The ambiguity of a sentence results from multiple possible arrangements into constituents.

They killed [the man] [with a gun]. They killed [the man with a gun].
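The two bracketings can be reproduced by counting parse trees. This is a sketch using a CYK-style chart, which is one common technique for this, not one the text names. The toy grammar in `BINARY` and `LEXICON` is an assumption made for this example, written with only binary and lexical rules, and the category labels are illustrative.

```python
from collections import defaultdict

# Assumption: a toy grammar in which a PP may attach to a VP or to an NP.
BINARY = {
    ("NP", "VP"): ["S"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],  # [killed the man] [with a gun]
    ("D", "N"): ["NP"],
    ("NP", "PP"): ["NP"],  # [the man with a gun]
    ("P", "NP"): ["PP"],
}
LEXICON = {
    "they": ["NP"], "killed": ["V"], "the": ["D"], "a": ["D"],
    "man": ["N"], "gun": ["N"], "with": ["P"],
}


def count_parses(words, start="S"):
    """Count the distinct parse trees rooted in start that yield words."""
    n = len(words)
    # chart[(i, j)][X] = number of trees of category X spanning words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, []):
            chart[(i, i + 1)][cat] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for left, left_count in list(chart[(i, k)].items()):
                    for right, right_count in list(chart[(k, j)].items()):
                        for parent in BINARY.get((left, right), []):
                            chart[(i, j)][parent] += left_count * right_count
    return chart[(0, n)][start]


print(count_parses("they killed the man with a gun".split()))  # 2
print(count_parses("they killed the man".split()))             # 1
```

The count of 2 corresponds to the two constituent structures shown above: the prepositional phrase attaches either to the verb phrase or to the noun phrase.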

8   Further reading

9   References

[1] Carnie 2001
[2] Sportiche, Koopman, Stabler 2013
[3](1, 2)

When constituents are in complementary distribution, they are instances of the same thing. [1]