Syntax: A Generative Introduction

Author: Andrew Carnie
Date: 2001


This book is intended as an introduction to syntactic theory. It takes the student through most of the major issues in Principles and Parameters, from tree drawing to constraints on movement.


1   Preliminaries

1.1   Generative Grammar

1.1.1   Preliminaries

Definitions are given for the chapter's key terms.

1.1.2   Syntax as Cognitive Science

Defines cognitive science and states that language is a human property.

1.1.3   Modeling Syntax

We will mostly study the Principles and Parameters approach to generative grammar, although we will occasionally stray from it into the Minimalist Program.

1.1.4   Syntax as Science: The Scientific Method

Defines linguistics.

In syntax, we apply the scientific method to sentence structure. Syntacticians start by observing data about the language they are studying, then make generalizations about patterns in the data. They then formulate testable hypotheses and test them against more syntactic data.

The hypotheses are called rules.

A grammar is a theory of language's syntax.

We focus on descriptive rules (instead of prescriptive rules) since they give us insight into how the mind uses language.

An example is given with anaphors, demonstrating how they must agree in person, gender, and number with their antecedents, and showing how this rule evolves from simpler hypotheses.

Defines grammatical person.

Sources of Data

The two main sources of data for experiments with syntax are corpora and grammaticality judgment tasks.

Corpora (singular: corpus) are collections of spoken or written text (which sometimes need to be compiled by the researcher if, for instance, no literary tradition exists).

The value of corpora is limited, since they only contain instances of grammatical (or, more precisely, well-formed) sentences. Due to the infinite and productive nature of language, a corpus could never contain all the grammatical forms of a language, nor could it even contain a representative sample.

To really get at what we know about our language, we have to know what sentences are not well-formed. This kind of information is not available in corpora.

Grammaticality judgment tasks are used.

In this text, we will be concerned primarily with syntactic well-formedness.

1.1.5   Where do Rules Come From?

This is sort of a side issue, but affects the shape of our theory.

If we know how children acquire their rules, then we are in a better position to formalize them properly. The theory of generative grammar makes some very specific claims about this.

Learning vs. Acquisition

Some rules of grammar seem to be innate knowledge rather than explicit knowledge or tacit knowledge. (See psychological nativism, instinct.)

Innateness: Language as an Instinct

Noam Chomsky controversially claims that Language is also an instinct. Many parts of Language are innate.

There are good reasons to believe that a human facility for Language (perhaps in the form of a "Language organ" in the brain) is innate. We call this facility Universal Grammar.

The Logical Problem of Language Acquisition

Here we show that from a logical perspective, an infinite productive system like the rules of language cannot have been learned or acquired.

Infinite systems are neither learnable nor acquirable. Since we have such an infinite system in our heads, and we can't have learned it, it must be built in. (The argument presented here is based on an unpublished paper by Alec Marantz.)

Language is an infinitely productive system; a speaker can produce and understand sentences he has never heard before. This is because language is recursive; it is always possible to embed a sentence inside a larger one.

It turns out that rule-generated infinite systems like language are not learnable, as a simple fact of logic.

Infinite systems are unlearnable because one never has enough input to be sure one has all the relevant facts. This is called the logical problem of language acquisition.

Generative grammar gets around this logical puzzle by claiming that Universal Grammar helps children construct a knowledge of language by restricting the number of possible functions that map between situations and utterances.

Other Arguments for UG

The evidence for UG doesn't rely on the logical problem alone.

TODO: Return

Explaining Language Variation

1.2   Fundamentals: Rules, Trees, Parts of Speech


Defines constituents.

1.2.1   Parts of speech

Before we can look at these phrases, we have to briefly look at the kinds of words that compose them.

If we have categories for words that can appear in certain positions and categories for those that cannot, we can make (scientific) generalizations about the behavior of different word types. This is why we need parts of speech.

Distributional criteria are used instead of semantic definitions for syntactic categories.

1.2.2   Structure

We have an intuition that certain words are more closely connected than others. The notions we use to capture these intuitions are constituency and hierarchical structure.

Defines constituent.

Constituency is the most important and basic notion in syntactic theory.

Constituents are embedded in one another to form larger and larger constituents. This is hierarchical structure. Hierarchical constituent structure can also be represented with pairs of brackets, though bracketed representations are much harder to read than trees.

1.2.3   Rules and trees

In generative grammar, generalizations about structure are represented by rules, which are said to generate the tree in the mind. So if we draw a tree a particular way, we need a rule to generate that tree. We consider phrase structure rules in this chapter.

Definition: The Golden Rule of Tree Structures: Modifiers are always attached within the phrase they modify.

Definitions:

Adjective phrases and adverb phrases

An adjective phrase (AP) might be something like "a [very yellow] book":

AP -> (AP) A

very yellow

In much work on syntactic theory, there is no significant distinction between adjectives and adverbs. This is because it isn't clear that they are really distinct categories.

Summary

In this section we looked at the phrase structure rules needed to generate trees that account for English sentences.

  1. S' -> (C) S
  2. S -> {NP / S'} T VP
  3. NP -> (D) (AP+) N (PP+)
  4. VP -> (AP+) V ({NP / S'}) (PP+) (AP+)
  5. PP -> P (NP)
  6. AP -> (AP) A

1.2.4   How to draw a tree

There are two ways to go about drawing a tree: starting at the bottom and working one's way up to the S, or starting with the S and working down. The choice depends on one's style.

Bottom-up trees

This method often works best for beginners.

  1. Write out the sentence and identify the parts of speech
  2. Identify what modifies what.
  3. Link together items that modify one another.
  4. Keep applying rules, attaching modifiers to the constituents they modify, until you reach the S.
  5. Go back and check that the tree is valid.

The top-down method of drawing trees

  1. Write out the sentence and identify the parts of speech
  2. Draw the S with the NP and VP.

Bracketed diagrams

Sometimes it is preferable to use bracketed notation instead of tree notation. This is especially true when large parts of the sentence are irrelevant to the discussion at hand. Drawing bracketed diagrams essentially follows the same rules as tree drawing.

1.2.5   Modification and ambiguity

Consider the following two sentences:

  1. The man killed the king with a knife
  2. The man killed the king with the red hair

Each of these sentences is ambiguous, but for the moment consider the least difficult reading for each.

  1. "The man used a knife to kill the king."
  2. "The king with the red hair was killed by the man"

Note that the two original sentences have very similar surface forms.

A paraphrase is another way of saying the same thing.

Syntax trees allow us to capture the differences between ambiguous readings of the same surface sentence.
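The point can be made concrete by encoding the two readings of sentence 1 as nested lists, one constituent per list (a hypothetical sketch; the labels follow the phrase structure rules from earlier in the chapter):

```python
# Reading 1: the PP "with a knife" attaches to the VP (instrument reading).
vp_attach = ["S", ["NP", "the man"],
                  ["VP", ["V", "killed"],
                         ["NP", "the king"],
                         ["PP", "with a knife"]]]
# Reading 2: the PP attaches inside the object NP (the king has the knife).
np_attach = ["S", ["NP", "the man"],
                  ["VP", ["V", "killed"],
                         ["NP", ["NP", "the king"],
                                ["PP", "with a knife"]]]]

def leaves(tree):
    """The word strings dominated by `tree`, in order."""
    if isinstance(tree, str):
        return [tree]
    out = []
    for child in tree[1:]:        # tree[0] is the category label
        out.extend(leaves(child))
    return out

def constituents(tree):
    """Yield the string spanned by every constituent in the tree."""
    if isinstance(tree, str):
        return
    yield " ".join(leaves(tree))
    for child in tree[1:]:
        yield from constituents(child)
```

Listing the constituents shows the structural difference directly: "the king with a knife" is a constituent of `np_attach` but not of `vp_attach`.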

See: constituency tests

1.2.7   Appendix

Open vs closed classes of speech

Linguistic theory distinguishes two kinds of lexical items: parts of speech are divided into open and closed classes. Membership in open-class categories (nouns, verbs, adjectives) is unlimited; new open-class words may be coined at any time. Membership in closed classes, by contrast, is limited, and coinages are rare.

1.3   Structural Relations


The focus of this chapter is to study and describe syntactic trees as geometric objects. This chapter is about the purely formal properties of trees.

There are two reasons to study trees as mathematical objects. First, we can assign names to the various parts and describe how the parts relate to one another. Second, it turns out that many syntactic phenomena make explicit reference to the geometry of trees.

1.3.1   The parts of a tree

See: parse tree

We now have all the terms we need to describe the various parts of a tree. Next we turn to a set of terms that will allow us to describe the relations that hold between these parts. These relations are often called structural relations.

1.4   Binding Theory




Looked at a complex set of data concerning the distribution of different kinds of NPs.

There are three context-sensitive semantic types of noun phrases, R-expressions, anaphors, and pronouns, whose distribution is governed by a set of Binding Principles (A, B, and C).

1.4.1   The notions co-index and antecedent

See noun phrase.

Antecedent: a noun phrase that gives its meaning to a pronoun or anaphor.

1.4.2   Binding & Locality conditions on the binding of anaphors

A binds B iff A and B corefer and A c-commands B. A is called the antecedent and B is called an anaphoric pronoun.

Corollary: Binding is not symmetric.
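The binding definition above can be sketched in Python (a hypothetical illustration, not from the text). Trees are nested tuples `(label, *children)`; coindexed NPs share the index after the underscore in their label (e.g. `"NP_i"`); and c-command is assumed to have its standard definition: A c-commands B when neither dominates the other and the first branching node dominating A also dominates B.

```python
def dominates(tree, node):
    """Does `tree` (reflexively) dominate `node`? Identity-based."""
    if tree is node:
        return True
    if isinstance(tree, str):
        return False
    return any(dominates(child, node) for child in tree[1:])

def parent(root, node):
    """Return the node immediately dominating `node`, or None."""
    if isinstance(root, str):
        return None
    for child in root[1:]:
        if child is node:
            return root
        found = parent(child, node)
        if found is not None:
            return found
    return None

def c_commands(root, a, b):
    if dominates(a, b) or dominates(b, a):
        return False
    p = parent(root, a)
    while p is not None and len(p) <= 2:   # skip non-branching nodes
        p = parent(root, p)
    return p is not None and dominates(p, b)

def coindexed(a, b):
    return a[0].split("_")[-1] == b[0].split("_")[-1]

def binds(root, a, b):
    """A binds B iff A and B are coindexed and A c-commands B."""
    return coindexed(a, b) and c_commands(root, a, b)

# "Alice_i helped herself_i": the subject binds the anaphor.
alice = ("NP_i", "Alice")
herself = ("NP_i", "herself")
s = ("S", alice, ("VP", ("V", "helped"), herself))
```

Here `binds(s, alice, herself)` holds but `binds(s, herself, alice)` does not, reflecting the corollary that binding is not symmetric; embedding `alice` inside a larger subject NP breaks c-command and hence binding.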

Locality constraint
A constraint on the grammar, such that two syntactic entities must be "local" or near to one another.
Binding domain
The syntactic space in which an anaphor must find its antecedent. For now, we assume this is the clause containing the anaphor. (This is oversimplified; some complicating cases are given.)
Binding principle A

An anaphor must be bound in its binding domain.

  • "* Herself helped Alice."
  • "* [Alice_i's mother]_j helped herself_i."
  • "* Alice said that herself helped Bob."

1.4.3   The distribution of pronouns

Principle B:

Pronouns must be free (not bound) in the binding domain.

  • Alice_i helped her_j.
  • * Alice_i helped her_i.
  • Alice_i said that she_i helped Bob.
  • Alice_i said that she_j helped Bob.

1.4.4   The distribution of R-expressions

Principle C:

R-expressions must be free.

  • * Alice_i helped Bob_i.

2   The Base

2.1   X-bar Theory



  • There seems to be more structure in our trees than that given by the basic phrase structure rules developed in chapter 2.

    In particular, we introduce intermediate levels of structure: [N', V', A', P']. The evidence for these comes from standard constituency tests like conjunction, and from processes like one-replacement and do-so-replacement.

  • Material at different levels of structure behaves differently. Complements exhibit one set of behaviors and adjuncts a different set.

  • Our rules fail to capture several generalizations about the data. The first was the endocentricity generalization.

  • All phrases have three structural positions: specifiers, adjuncts, and complements.

  • We propose that options within the X-bar rules are parameterized: speakers of a language select the appropriate options for their language.

2.1.1   Bar-level projections

Intermediate NP structure is given by:

NP -> (D) N'
N' -> {(AP) N', N' (PP)}
N' -> N (PP)
Replace an N' node with one.

V-bar

A process identical to one-replacement is found in the syntax of VPs: do-so-replacement.

I [eat beans with a fork]

Intermediate VP given by:

VP -> V'
V' -> V' (PP)
V' -> V (NP)

Replace a V' with do so.

A-bars

The argument for intermediate structure in APs is a little trickier.

The [very [[bright blue] and [dull green]]] gown


AP -> A'
A' -> (AP) A'
A' -> A (PP)

P-bars


PP -> P'
P' -> P' (PP)
P' -> P (NP)

2.1.2   Generalizing the rules: the x-bar schema

For each of the major phrase types, {NP, VP, AP, PP}, we have come up with three rules, where the second and third rules serve to introduce structure.

The strength of evidence for these rules varies.

We seem to be missing several generalizations:

  1. In all the rules above, the category of the rule is the same as the category of the only element that is not optional. This is a very general notion in phrase structure, called headedness. All phrases appear to have heads. Heads are the most prominent element in a phrasal category and give their part-of-speech category to the whole phrase. The property of every phrase having a head is called endocentricity.
  2. With the exception of the determiner in the NP rule, all non-head material in the rules is both phrasal and optional.
  3. For each major category, there are the same three kinds of rules: one that introduces the phrase, one that takes a bar level and repeats it, and one that takes a bar level and spells out the head.

We can condense the rules we've proposed into a simple set using variables.

  • Let X be a variable that can stand for any category {N, V, A, P}.
  • Let X' be a variable that can stand for any category {N', V', A', P'}.
  • Let XP be a variable that can stand for any category {NP, VP, AP, PP}.


XP -> (YP) X'

{X' -> X' (ZP), X' -> (ZP) X'}

X' -> X (WP)

A phrase consists of an optional phrasal element followed by a single-bar node of the same category.

2.1.3   Complements, adjuncts, and specifiers

Complements and adjuncts in NPs

There are two different kinds of PP within an NP, with different kinds of behavior.

The Notion Specifier

A specifier is an XP that is a sibling to an X' level and a daughter of an XP.

Since the specifier rule is not recursive, you can only have one specifier. (One exception to this is all, as in all the books.)

* the these red books

The specifier rule always applies at the top of the structure. In English, this means it will always be the left-most element.

A third rule introduces a structural position, the specifier:

Specifier rule: XP -> (YP) X'

For example in "[the] [book] [of poems] [with a red cover]", "the" is a specifier, "of poems" is a complement, and "with a red cover" is an adjunct.
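As an illustration (not from the text), the X-bar schema can be written as a category-neutral tree builder; the example below reproduces the structure of "the book of poems with a red cover", with the complement attached under the lowest N' and the adjunct adding its own N' layer:

```python
# Hypothetical sketch: X ranges over {N, V, A, P}; the specifier (YP),
# adjuncts (ZP), and complement (WP) are all optional, as in the schema.
def xbar(cat, head, specifier=None, adjuncts=(), complement=None):
    """Project a full XP (as a nested list) from a head of category `cat`."""
    bar = cat + "'"
    node = [bar, [cat, head]]            # X' -> X (WP)
    if complement is not None:
        node.append(complement)
    for adjunct in adjuncts:             # X' -> X' (ZP): one layer per adjunct
        node = [bar, node, adjunct]
    xp = [cat + "P"]                     # XP -> (YP) X'
    if specifier is not None:
        xp.append(specifier)
    xp.append(node)
    return xp

# "the book of poems with a red cover"
book = xbar("N", "book",
            specifier=["D", "the"],
            complement=["PP", "of poems"],
            adjuncts=[["PP", "with a red cover"]])
```

Because the complement is added before any adjuncts, it always ends up as the sister of the head, while each adjunct is the sister of an N' — exactly the structural asymmetry the replacement tests diagnose.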

2.3   Constraining X-bar Theory: Theta Roles and the Lexicon

Abstract: X-bar theory overgenerates in certain cases. One way to constrain X-bar theory is by invoking lexical restrictions on sentences, such that particular predicates have specific argument structures in the form of theta grids, which are part of the lexicon. The theta criterion rules out any sentence where the number and type of arguments don't match up one-to-one with the number and type of theta roles in the theta grid.

2.3.1   Some basic terminology

The predicate defines the relation between the individuals being talked about and reality. The entities participating in the relation are called arguments.

In the following, "hit" is the predicate and "Gwen" and "the baseball" are arguments:

[Gwen] hit [the baseball]

Predicates have argument structure: the number of arguments that a particular predicate requires.

In considering how many arguments a predicate has, we only consider complements and specifiers; adjuncts are never counted in the list of arguments. Only obligatory elements are considered arguments.
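This bookkeeping can be sketched in Python (a hypothetical illustration; the particular grids below are illustrative guesses, not taken from the text). The theta criterion is modeled as a one-to-one pairing of theta roles with arguments:

```python
# Each predicate's theta grid lists the roles it assigns.
THETA_GRIDS = {
    "hit":  ["agent", "theme"],            # Gwen hit the baseball
    "give": ["agent", "theme", "goal"],
}

def assign_roles(predicate, arguments):
    """Pair each theta role with exactly one argument; None signals a
    theta criterion violation (wrong number of arguments)."""
    grid = THETA_GRIDS[predicate]
    if len(arguments) != len(grid):
        return None
    return dict(zip(grid, arguments))
```

So `assign_roles("hit", ["Gwen", "the baseball"])` pairs Gwen with the agent role and the baseball with the theme, while `assign_roles("hit", ["Gwen"])` returns `None`, flagging the missing theme.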

Predicates can also impose restrictions on the categories of their complements.

For example, ask can take an NP or CP:

I asked the question

I asked if you know the answer

Restrictions on the categories that a verb can take as a complement are called subcategorization restrictions.

We also find semantic restrictions on what can appear in particular positions, called selectional restrictions.

# A bolt of lightning killed the rock.

2.3.3   The lexicon

Chomsky claims that the part of the mind devoted to language is essentially divided into two parts:

  1. The computational component, which contains all the rules and constraints; it builds sentences and filters out ill-formed ones.
  2. The lexicon
The Projection Principle
Lexical information (e.g. theta roles) is syntactically represented at all levels.

2.3.4   Expletives and the extended projection principle

See: Expletive

The Extended Projection Principle (EPP)
All clauses must have a subject. Lexical information is syntactically represented.

The EPP works like the theta criterion; it is a constraint on the output of the X-bar rules.

The model we've drawn here is very preliminary. In the next chapter, we will introduce a new kind of rule that will cause us to significantly revise this diagram.

3   Transformation Rules

3.1   Head-to-Head Movement




A phrase structure grammar, such as X-bar theory, must undergenerate. Chomsky (1957) proposed transformation rules as a solution to these issues.

In this chapter, we observe some phenomena where X-bar theory undergenerates and introduce two movement rules, V -> T and T -> C, and an insertion rule, do-support, to account for them.

3.1.1   Verb movement   French

French has verb raising (V -> T). It moves its verbs out of the VP and into the slot associated with T.

V -> T raising: Move the head V to the head T

Verb raising parameter
Verbs raise to T OR T lowers to V. This provides a simple account of the difference between English and French adverbial placement.

One way to represent a transformation is to draw a tree with an arrow.

Irish

Irises has verb-subject-object (VSO) order. For example:

Phóg Máire an lucharachán

kissed Mary the leprechaun

'Mary kissed the leprechaun.'

There is no way X-bar theory can generate a sentence of this type.

Page:201: VP Internal Subject Hypothesis

3.1.2   T-Movement (T -> C)

We briefly return to T -> C movement (= subject/aux inversion).

In yes/no questions in English, auxiliary verbs invert with their subject:

You have squeezed the orange

Have you squeezed the orange?

3.1.3   Do-support


todo: copy this

See: Do-support

3.2   NP Movement




Certain NPs appear in positions not predicted by theta theory.

The movement rules described here take acceptable word orders and transform them into other acceptable word orders.

  • We argue that these sentences involve movement of NPs to various specifier positions. The motivation for this comes from Case. The Case filter requires all NPs to check Case in a specific structural position.
  • Looked at two situations where NPs don't get Case in their D-structure position. In raising structures, an NP is in the specifier of an embedded clause with non-finite T. In this position, it can't receive Case, so it raises to the specifier of the finite T in the higher clause.
  • Looked at passive structures. The passive consists of two operations: one morphological and the other syntactic. The morphological operation adds the suffix -en, deletes the external argument and absorbs the verb's ability to assign accusative Case. This results in a structure where there is no subject NP, and the object cannot receive Case in its base position. The NP must move to the specifier of T to get Case.

3.2.1   A Puzzle for the Theory of Theta Roles

"Leave" requires one obligatory argument, an agent, which is an external (subject argument). Other arguments are optional.

The Locality Condition on Theta Role Assignment
Theta roles must be assigned within the same clause as the predicate that assigns them.

However, some sentences seem to violate the locality condition. For example, in "John is likely to leave", "John" is the agent of "leaving" but "John" appears in the main clause, away from its predicate.

The solution to this is simple: there is a transformation that moves agents from the lower clause to the higher clause.

To see this, we first note that the theta grid for "is likely" includes only one argument: the embedded clause. This is seen clearly when the sentence is written as "It is likely that John will leave".

[CP [C' [C 0]
        [TP [T' [T ]
                [VP [V' [V *proposition]
                        [CP [C' [C 0]
                                [TP [NP John]
                                    [T' [T to]
                                        [VP [V' [V leave]]]]]]]]]]]]]

We need a transformation that moves this NP to the specifier of the main clause TP. This transformation is called NP movement.

NP Movement
Move an NP to a specifier position

Notice that the specifier of the main clause TP is unoccupied. We can thus move the NP "John" into that position. This particular NP movement is frequently called raising.

We might speculate, then, that the absence of a subject in the D-structure of the main clause is the trigger for NP movement: an NP moves to the specifier of the main clause TP to satisfy the EPP. There are a number of important problems with this proposal.

  • It does not explain why we don't use expletives in these environments, e.g. "* It is likely John to leave".
  • It does not explain why only the subject NP of an embedded clause can satisfy the EPP, but not a moved object NP, e.g. "* Bill likely John to hit".

3.2.3   Case

If the embedded clause is non-finite, then the subject must move to get Case.

3.3   Raising, Control, and Empty Categories

  • Certain sentences that look alike on the surface can actually have different syntactic trees.
  • We compare subject-to-subject raising constructions to subject control constructions, and subject-to-object raising constructions to object control constructions. These can be tested by working out their argument structure and using the idiom test.
  • We claim that PRO only shows up in Caseless positions. PRO does not meet any of the binding conditions. We suggest PRO is subject to control theory.
  • Compare two different kinds of null subject categories: PRO and pro. PRO is Caseless and is subject to the theory of control. pro takes Case and is often 'licensed' by rich agreement morphology on the verb.
  • Ends with a short discussion of the various kinds of elements we've looked at so far (null heads, PRO, traces, etc.), and introduces a new one, which is found in languages like Spanish and Italian.

3.3.1   Raising vs. Control

Two kinds of theta grids for main predicates

Two kinds of construction:

[That Jean left] is likely (clausal subject)

It is likely [that Jean left] (extraposition)

The D-structure of these is identical. In the clausal subject construction, the embedded CP moves to the specifier of TP presumably to satisfy the EPP requirement. In the extraposition construction, an expletive "it" is inserted into the specifier of TP.

The logical conclusion is that there is actually a third NP here. This NP argument is called PRO. PRO only appears in the subject positions of non-finite clauses. PRO appears in a position where no Case can be assigned.

Raising construction

For example:

[_ is likely [Jean to leave]]

That Jean will dance is likely

It is likely that Jean will leave

  • The main predicate does not assign an external theta role.
  • The subject of the embedded clause is Caseless and raises to the empty specifier position for Case checking (and to satisfy the EPP).
  • Preserves idiomatic meaning, e.g. "The cat is likely to get his tongue".

Control construction

For example:

[Jean is reluctant [(PRO) to leave]]

* It is reluctant that Jean will leave

  • The main predicate assigns an external theta role.
  • There is no raising; the external theta role of the embedded predicate is assigned to a null, Caseless PRO.
  • Does not preserve idiomatic meaning, e.g. "The cat is eager to get his tongue".

Distinguishing Raising from Control

Whether you have a raising or control construction depends entirely on the main clause predicate: some predicates require raising, others require control. The tests for raising and control therefore mostly have to do with the thematic properties of the main clause's predicate.

  • Work out the theta grids associated with the matrix predicates. If the matrix predicate assigns an external theta role, then it is not a raising construction. This is the most reliable diagnostic.
  • Use idioms. If D-structure is the level at which we interpret idiomatic meaning, then we should get idiomatic meanings with raising constructions. Compare "The cat is likely to get his tongue" (idiomatic) with "The cat is eager to get his tongue" (non-idiomatic).
  • See if they allow the extraposition construction. Expletives are only allowed in non-thematic positions, which are the hallmark of raising.

What is PRO?

If NPs always need Case, how can PRO appear in the specifier of non-finite TP, which is not a Case position? Chomsky (1981) claims that the reason PRO is null and silent is precisely because it appears in a Caseless position.

We need PRO because without it we would have a violation of the theta criterion. We are proposing a null element to account for an apparent hole in our theory.

3.3.2   Two kinds of raising, two kinds of control

Two kinds of raising

Two kinds of control

Summary of predicate types

4   Alternatives

5   Where to Go From Here


The no-crossing-branches constraint says that if one constituent X precedes another constituent Y, then X and all constituents dominated by X must precede Y and all constituents dominated by Y.
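The precedence relation behind this constraint can be sketched in Python (a hypothetical illustration, assuming each word occurs only once in the sentence so positions are unambiguous). With trees as nested lists, a constituent's yield is automatically contiguous, so precedence reduces to comparing word positions:

```python
def leaves(tree):
    """The words dominated by `tree`, in order."""
    if isinstance(tree, str):
        return [tree]
    words = []
    for child in tree[1:]:        # tree[0] is the category label
        words.extend(leaves(child))
    return words

def precedes(x, y, sentence):
    """X precedes Y iff every word X dominates comes before every
    word Y dominates in the terminal string `sentence`."""
    return (max(sentence.index(w) for w in leaves(x))
            < min(sentence.index(w) for w in leaves(y)))

# In "Alice helped Bob", the subject NP precedes the VP and everything
# the VP dominates.
np = ["NP", "Alice"]
vp = ["VP", "helped", "Bob"]
sentence = ["Alice", "helped", "Bob"]
```

Here `precedes(np, vp, sentence)` holds while `precedes(vp, np, sentence)` does not; crossing branches would require a constituent whose words were interleaved with another's, which this representation cannot express.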