2.1. Premise and Paraphrase.
2.2. Uniform vs Non-Uniform Hierarchies.
2.3. New Evidence in Support of the Musical Surface.
[2.1.1] “[Construction] grammar does not involve any transformational or derivational component. Semantics is associated directly with the surface form (Goldberg 2002; Culicover and Jackendoff 2005)” (Goldberg 2013, 15).
[2.1.2] Mainstream generative grammar, especially in its “transformational” versions, viewed the literal or “surface” forms of clauses and sentences as the incidental appearances of deeper structures. For example, the passive voice and the active voice were viewed as surface transformations of a single, deeper semantic structure. Supporting this view was the hypothesis of Universal Grammar, an innate and highly abstract mental faculty (Chomsky 1965, 1966). Most linguists who study construction grammar reject these notions, arguing that we have both passive- and active-voice constructions because they serve different communicative functions. Schema theory in music places a similar emphasis on the perceivable musical surface.
[2.2.1] One of the hallmarks of Heinrich Schenker’s Der freie Satz (Schenker 1935) or of Lerdahl and Jackendoff’s A Generative Theory of Tonal Music (Lerdahl and Jackendoff 1983) is the pervasive gauging of long-term dependencies. Global pitch patterns strongly constrain the parsing of local patterns. This sense of “the movement as a whole” has been a strong attraction of these analytical systems. Schema theory in music, by contrast, has focused on relatively small patterns that can be accommodated by the normal capacities of working memory, a stance consonant with arguments set out by the philosopher Jerrold Levinson in his Music in the Moment (Levinson 1997). One might ask of schema theory, “How can listeners learn or appreciate a whole movement if they only process it in small chunks?” In the spirit of this article, and to augment Levinson’s points, we will offer what a linguist might term a “functionalist” response.
[2.2.2] Language can be conceptualized as a three-level, non-uniform hierarchy. That is, there is a lower level where words code semantic content, a middle level where clauses code propositional information, and an upper level where multi-propositional structures code discourse coherence (Givón 2001, 7–13). This hierarchy is non-uniform because different types of entities and relationships are involved at each level. A word may be formed from one or more phonemes, but a word is not a higher-level phoneme. Similarly, a clause is not a higher-level word, and a discourse is not a higher-level clause. Each level is distinct and can involve functionally different types of memory: (1) a sensory store, (2) short-term or working memory, and (3) long-term memory. Were someone to attempt a reduction of Jane Austen’s famous opening line from Pride and Prejudice—“It is a truth universally acknowledged, that a single man in possession of a good fortune must be in want of a wife.”—to an unadorned essence or deep structure, neither a single word nor a single clause would suffice. “Man desires wife” might serve as a want ad in the personals section of a newspaper but would fail utterly to capture the gist of Austen’s discourse (see also Allanbrook 2002).
[2.2.3] We believe that long-span, uniform hierarchical structures of pitches inherit some of the terminological slippage seen in the ordinary speech of musicians. When a noun like “C” becomes distributed across different levels (“that tone is C,” “that’s a C chord,” “the theme returns in C,” “the second movement, the one in C”), it is thereby covertly transformed into an array of different concepts. Similarly, when Milton Babbitt observed that Schenkerian analysis exhibits “nested transformations . . . strikingly similar to transformational grammars in linguistics” (Babbitt 1965, 59–60), he was comparing the linguistics of brief utterances like clauses or sentences with the analysis of sometimes vast musical canvases. It is worth noting that the famous linguistic theories of Noam Chomsky, which have inspired a great deal of music theory pro and con (Narmour 1977, Keiler 1978, Sloboda 1985, Chanan 1994, Sparshott 1994, Swain 1995, Blasius 1996, Larson 1998, Boykan 2004, Brown 2005, Jackendoff 2009, Lerdahl 2009, and Tan et al. 2010), were not directed toward whole artworks, be they novels, plays, or epic poetry, and that is true of linguistics generally.
[2.2.4] For there to be a hierarchy, there must be real differentiations between levels and concepts, and levels of differing time spans must posit human faculties of memory appropriate to those time spans (Narmour 1984). Schema theory, applied to European art music, closely follows Givón’s hierarchy. Tones combine into motives or brief melodies, which as emergent Gestalts cannot then be reduced to single tones. Individual voices join to make counterpoint and musical clauses like cadences, sequences, and thematic phrases. None of these can be reduced to single intervals or tones. Clause-like musical entities combine into a musical discourse, which again is different in kind from any of its components. At the level of discourse, one might say, for example, that “the opening theme returns.” Such an assertion depends on a recognition of similarity at the level of discourse, not on an imagined movement-wide web of counterpoint and/or harmony. One could, after all, easily identify the return of a coherent opening theme following an extended presentation of random tones. There are, of course, many areas where schema theory overlaps with Schenkerian analysis. Both approaches are sensitive to the artisanal practices of thoroughbass and counterpoint, and both posit a repertory of “middleground” prototypes (Pearsall 1996; Rabinovitch forthcoming). But in schema theory, patterns at one level of the non-uniform hierarchy are relatively unaffected by either subsidiary or superordinate patterns. In schema theory as in construction grammar, transformation is not a central process.
[2.2.5] Historically, instructional materials from the eighteenth and nineteenth centuries adopt a consistently local view of tonal structures. For example, E. A. Förster, a personal friend of Haydn, Mozart, and Beethoven, placed Arabic numbers under the basses of his realized examples of thoroughbass to represent scale degrees (Förster 1818). In every case he numbered the bass locally, adapting the numbers to each successive modulation. Similarly, Italian regole or “rules” attached to collections of partimenti (figured and unfigured basses intended as “lead sheets” for student improvisations) frequently mention scale degrees (e.g., prima del tono, seconda del tono, etc.), but always locally.
[2.2.6] Leonard B. Meyer, an early functionalist in music and a founder of schema theory, echoed this emphasis on the local and the perceivable in a keynote address to the 1988 meeting of the Society for Music Theory. Referring to the field’s “almost obsessive concern with the nature of unity in music,” he said,
All this yearning for the womb-like warmth of Oneness leaves me cold. I am an antediluvian empiricist who delights in discrimination, distinction, and diversity. For me, the order that entices and excites is that which is revealed when disparity and contrast, regularity and caprice are related to one another through functional differentiation. I am not a denizen of obscure, abstract depths—a diver after cosmic conceptions and unconfirmable hypotheses. I am content to snorkel along the surface, peering down just a bit to be bewitched by the pleasing patterns of luminous fish and the quiet swaying of colorful coral. (1991, 241)
[2.3.1] In mainstream Chomskian grammar, the “Argument from the Poverty of the Stimulus” holds that a child cannot develop a grammar from mere exposure to the utterances of adults and thereby must rely on an innate Universal Grammar to provide the needed deep structures. This is a contentious issue in linguistics, and many construction grammarians reject its validity (Tomasello 2003). In a number of innovative experiments, Jenny Saffran has demonstrated that babies can use statistical learning to develop a simple grammar (Saffran et al. 1996, Saffran 2003, Baldwin et al. 2008). Statistical learning is precisely the type of learning from mere exposure that was not supposed to be possible. An analogous experiment has more recently been conducted with challenging musical stimuli. Psyche Loui, the late David Wessel, and Carla Hudson Kam (2010) created two sets of melodies based on the Bohlen-Pierce scale, which spans an equal-tempered twelfth (a stretched “octave”) and thus sounds quite strange and unlike any traditional scale. A “grammar” in this odd music was defined as the melody always presenting tones that belong to the same chord progression (a “chord” being a subset of the scale). There were only three chords. If we label them A, B, and C, then one set of melodies followed the progression A, B, C, A, playing two tones per chord, while the other set followed the progression A, C, B, A. Only the order of the middle two chords differed. Hundreds of such eight-tone melodies were composed and then played to participants in the experiment. The audio example provides a sample melody from each grammar. Each listener heard only one type of grammar but was later able to tell whether a new, previously unheard melody was like or unlike the ones he or she had previously heard. In other words, participants, following mere exposure to a grammatically coherent corpus of melodies, could generally tell whether a novel melody was or was not grammatical.
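For readers who find it helpful to see the design of such stimuli spelled out, the two “grammars” can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reconstruction of the published stimuli. The contents of chords A, B, and C are hypothetical (and deliberately disjoint, so that no melody fits both progressions), as are the 220 Hz reference pitch, the random choice of tones within a chord, and all function names. The one detail taken from the standard definition of the equal-tempered Bohlen-Pierce scale is its division of the 3:1 twelfth into thirteen equal steps.

```python
import random

STEPS_PER_TWELFTH = 13   # Bohlen-Pierce: 13 equal steps span the 3:1 twelfth
BASE_HZ = 220.0          # assumed reference pitch, for illustration only

# Hypothetical chords, given as scale-step indices. They are NOT the chords
# of the published study; they are disjoint so the grammars never coincide.
CHORDS = {
    "A": (0, 6, 10),
    "B": (4, 7, 13),
    "C": (3, 9, 12),
}

# The two progressions described in the text: only the middle chords swap.
GRAMMARS = {
    "grammar_1": ("A", "B", "C", "A"),
    "grammar_2": ("A", "C", "B", "A"),
}


def step_to_hz(step):
    """Frequency of a Bohlen-Pierce scale step above the reference pitch."""
    return BASE_HZ * 3 ** (step / STEPS_PER_TWELFTH)


def compose_melody(progression, rng, tones_per_chord=2):
    """Pick tones at random from each chord in turn (eight tones by default)."""
    melody = []
    for chord_name in progression:
        for _ in range(tones_per_chord):
            melody.append(rng.choice(CHORDS[chord_name]))
    return melody


def fits_grammar(melody, progression, tones_per_chord=2):
    """True if every tone belongs to the chord governing its position."""
    if len(melody) != len(progression) * tones_per_chord:
        return False
    return all(
        tone in CHORDS[progression[i // tones_per_chord]]
        for i, tone in enumerate(melody)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    melody = compose_melody(GRAMMARS["grammar_1"], rng)
    print(melody, [round(step_to_hz(s), 1) for s in melody])
    print(fits_grammar(melody, GRAMMARS["grammar_1"]))
    print(fits_grammar(melody, GRAMMARS["grammar_2"]))
```

Note that `fits_grammar` merely restates the definition of grammaticality used to build the corpus. It is not a model of how listeners arrive at their judgments, which is the question that statistical learning addresses.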
[2.3.2] The time-scales of the stimuli used in the experiments by Saffran and by Loui et al. fit comfortably within the normal capacity of human working memory. This is the span of melodic motives, of the principal components of sonata themes, of fugue subjects, and of the “hooks” in popular music. We know that items in working memory can be transferred to storage in long-term memory, and so these experiments provide empirical validation that “mere exposure” to certain statistical regularities in sound patterns allows listeners to abstract commonalities that we might reasonably call a grammar. In terms of empirical validation, the ability of listeners to perform similar abstractions at time-scales of, say, ten minutes remains an open question.