Diversity in Music Corpus Studies
Nicholas Shea, Lindsey Reymore, Christopher Wm. White, Ben Duinker, Leigh VanHandel, Matthew Zeller, and Nicole Biamonte
DOI: 10.30535/mto.30.1.8
Copyright © 2024 Society for Music Theory
0. Introduction
[0.1] Corpus studies are powerful tools for analysis due to their ability to examine trends across a relevant body of work. Traditional approaches to music corpus development typically involve compiling features from a repertoire defined by geographic region, historical period, and/or a socially determined category of musical style. Often these parameters intersect and are an explicit component of their constituency; e.g., harmonic annotations of 18th-century Western European art-music works (Devaney et al. 2015) or contemporary North American popular-music songs (Burgoyne 2012). There are also numerous strategies for building a corpus, depending on the aims of the researcher. Corpora of convenience are perhaps the most common, drawing upon an existing resource created by someone other than the researcher to capture a repertoire. This is especially true of popular-music corpus studies, which typically rely on measurements of popularity, levels of cultural salience, or approximations of a musical “mainstream” within some genre (London 2013). These operationalizations can include commercial success (Burgoyne 2012), historical fame (Devaney et al. 2015, Ohriner 2016), or the emergent consensus of a user-driven forum (White and Quinn 2016, Shea 2020).
[0.2] In this study we critically examine the role of social forces as an often-implicit but significant feature of corpus development. We argue that social forces should be carefully considered and, quite often, actively combatted when sampling for corpus studies, in order to avoid perpetuating real-world biases against already-marginalized populations. Previous research has considered the effects of discriminatory forces inherent in commercial music practices (e.g., Epps-Darling, Bouyer, and Cramer 2020; Neal 2016; Rose 1994; Smith et al. 2020; Watson 2019a) and has shown that creating corpora from only convenience samples may replicate the racism, sexism, xenophobia, and other forms of exclusion inherent in their source materials. For example, if the people who determine what is popular or famous in a given genre employ sexist preferences when creating a corpus of that genre, their corpus will reflect that sexism, whether or not such biases are conscious. Once a musical corpus is developed, it tends to be used consistently in other research due to its convenience; see, for example, the McGill Billboard and Rolling Stone 200 corpora.(1) We argue that researchers should consider social forces when constructing new corpora. By not considering such forces, researchers run the risk of perpetuating harm to marginalized populations through ongoing studies grounded in inequitable corpora.
Example 1. A schematic of the Anti-Discriminatory Alignment System
[0.3] To our knowledge, discriminatory forces in corpus development have largely gone unaddressed.(2) In response, we propose an Anti-Discriminatory Alignment System (ADAS). The ADAS acts on a parent corpus (an initial corpus, list, or dataset) and applies an intentional sampling method to counterbalance discriminatory forces that may have affected the parent corpus’s constituency. The resulting child corpus includes more diverse artist identities and also may better represent the musical diversity within some style or genre.(3) This process is outlined in Example 1. We will demonstrate the application of these sampling methods to a novel corpus of popular music, Timbre in Popular Song (TiPS). This new corpus, which we initially constructed in order to study timbre and texture, comprises four contrasting popular genres: country, pop, metal, and hip hop. We share our process of constructing the TiPS corpus in order to show how the ADAS can inform sampling: for each genre, we apply the ADAS to a parent corpus to produce four child corpora. While we focus primarily on popular music in this essay, the technologies and criticisms we develop can generalize to any corpus-oriented research program. Most relevant to our discussion of social forces, the ADAS takes advantage of the natural ambiguity and flexibility of genre to steer corpus creation towards greater representation. We posit that if musical genres emerge as fluid categories based on cultural preferences, these preferences may be influenced by racist, misogynist, xenophobic, and other discriminatory stances. An anti-discriminatory corpus can work to minimize the effects of such stances, providing a better musical understanding of the internal characteristics of a genre by including the musical practices of a wider variety of humans who have contributed to it.
[0.4] We believe our approach is justified musically and theoretically, but we also acknowledge an ethical component to this work. Philip Ewell (2020) compellingly demonstrates connections between white supremacy and the repertoires and technologies promoted by North American music theory. Clifton Boyd (2020) explicitly challenges scholars to leverage their privilege to make the field of music theory more accessible to students and researchers with historically marginalized identities. Following the anti-racist theories of Kendi (2019) and Ewell (2020), we advocate for methods that actively remove discriminatory forces and promote the voices of those who have been discriminated against. In other words, our framework for corpus construction is designed not to be passively non-discriminatory but proactively anti-discriminatory.
[0.5] We also must acknowledge our own positionality in this project. All co-authors are white or white-presenting. All are cisgender; four identify as male and three as female. None exhibits a visible disability. These identities place us in relatively privileged positions in contemporary society and in the field of music theory, and we acknowledge that we are writing this essay from that position of privilege.
[0.6] Furthermore, we recognize that many of the authors’ previous corpus studies could have done more to promote artist diversity in a meaningful way; that is, we are certainly not immune from the biases we investigate. For instance, the rock songs in Biamonte (2010) and the French and German art songs in VanHandel & Song (2010) do reflect the authors’ efforts to foreground contributions by female artists but do not address racial or ethnic diversity. The same could be said of the Yale Classical Archives Corpus as featured in White (2013) and subsequent related publications. Duinker (2020) likewise acknowledges the gender imbalance that results from passively using external sources for corpus development but stops short of modifying the corpus to rectify this. A similar but slightly different issue arises in work by Shea (2020; 2023), which utilizes a preliminary model of our proposed demographic sampling procedure but does so imprecisely by treating BIPOC and Black as analogous identities. By recognizing these methodological shortcomings and our relative position of privilege within our field, we hope to encourage others to avoid “white complicity” (Applebaum 2008) by adopting a proactive attitude toward anti-discriminatory practices in the nascent subfield of corpus studies.
[0.7] This article suggests a method for both counteracting discriminatory social forces and privileging artist diversity. Section 1 provides a broad overview of the role that social forces play (and the issues that they introduce) in corpus construction. After a brief discussion of terminology in Section 2, Section 3 analyzes Rolling Stone Magazine’s list of “The 500 Greatest Songs of All Time,” using its underlying demographics to demonstrate the connection between discriminatory social forces and those songs acclaimed as paragons of a musical genre. Section 4 then focuses on the McGill Billboard corpus, a frequently used dataset derived from that publication’s “Hot 100” song charts. Here, we capitalize on the relatively greater diversity of artists in this list to examine the relationship among demographics, musical consistency, and genre. We find that demographics are deeply intertwined with musical consistency, to the extent that demographic exclusivity will ignore the musical variations present in a given historical tradition. Section 5 then operationalizes the ADAS, with Section 6 offering an example of results of this process in corpora from four different genres. Section 7 undertakes an analysis of these various corpora, showing how the actions of the ADAS allow for particular identity-connected stylistic rivulets to be identifiable, here showing the connection between the use of a prechorus and gender throughout various genres. Finally, Section 8 concludes the article by returning to the notion of genre, arguing that the boundaries and constituency of this concept are flexible enough to accommodate the ADAS approach. This flexibility empowers corpus analysts to sample widely from a generic repertoire; such a selection enhances artist diversity and can better capture the musical practices within that genre.
We offer the ADAS as a potential framework for any music researcher who is interested in employing the anti-discriminatory sampling strategies outlined herein.
1. The Ubiquity of Historical Discriminatory Forces in Music
[1.1] Corpus studies that aim to adopt anti-discriminatory practices should be intentional and sensitive when choosing sources to represent some musical tradition. If the Rolling Stone lists reflect sexist forces, for instance, then some other non-sexist compendium of popular music would be better suited for compiling a corpus. The problem, however, runs deeper than this solution’s capacity: the ways that musicians, producers, distributors, and consumers have grouped and divided music, especially 20th-century American popular music, are inextricably tied to personal identities and the history of their discrimination and exclusion. In what follows, we outline some issues surrounding categorizing songs and artists into musical genres. We also touch upon historical discriminatory perspectives that have shaped modern music scholarship and education, especially in music theory. We argue that such discriminatory forces run sufficiently deep within these areas that a more radical reorientation of corpus methods is required for an anti-discriminatory approach.
Commercial genre categories and discrimination
[1.2] In Categorizing Sound: Genre and Twentieth-Century Popular Music, David Brackett observes that “genres are not static groupings of empirically verifiable musical characteristics, but rather associations of texts whose criteria of similarity may vary according to the uses to which the genre labels are put. ‘Similar’ elements include more than musical features, and groupings often hinge on elements of nation, class, race, gender, sexuality, and so on” (2016, 3–4). As we began to describe earlier, the history of 20th-century popular genres in the United States is a study in discrimination, segregation, and exclusion. Beginning in the 1920s, the genre categories of “race music” and “hillbilly” were predicated on the racial identities of recording artists and their intended public. Despite musical similarities and common influences, rural musics by white artists and by artists of color were marketed separately, as “hillbilly” music for white audiences and “race” music for Black audiences. Such a division was supported by segregationist laws passed by many states in the southern US in the late nineteenth century (Miller 2010, 2–3). In many contexts, race has underscored a barrier between specific genres and the so-called musical mainstream. The list of Afro-diasporic genres that have been subsumed into mainstream popular music is long, including blues, jazz, R&B, rock ’n’ roll, soul, funk, disco, and hip-hop. Johnson (2018) notes the exclusionary effect that the development of rock ’n’ roll had on Black musicians whose music gave rise to the genre in the first place. Rock ’n’ roll musicians were mainly white and marketed to a mainstream white audience, while Black musicians, whose output was more typically categorized as R&B, were largely denied the level of market access enjoyed by their white peers in the 1950s.
[1.3] The very names of Billboard charts underscore this point. Since the inception of these charts in 1942, the magazine has had thirteen (!) different names for charts devoted to music by Black artists. What began as “The Harlem Hit Parade” became “Race Records” in 1945, “Rhythm & Blues” in 1958, “Soul” in 1965, “Black” in 1982, and “R&B” in 1990 before settling in 1999 on its current name of “Hot R&B/Hot Hip-Hop Songs,” itself an ambiguous category because a separate chart for “Hot Rap Songs” has also existed since 1989.(4) This shifting nomenclature is emblematic of the record industry’s perennial sidelining of Black artists, first as a means of overt segregation under the Race and R&B demonyms, and more recently as a catchall bin grouping artists and songs that have very little in common stylistically under the “Hot Black Singles” banner.(5) Using a “mainstream” Billboard chart from an era with such segregated lists will therefore perpetuate the racial biases and exclusionary tactics associated with these genre constructions.
[1.4] Historically, genre categories have also marginalized female artists. For example, in general, women are more likely to be associated with the pop genre and less likely to be understood as rap artists; a corpus of rap using similar categorization methods would disproportionately exclude women’s voices, undermining or marginalizing their presence in public discourse on hip-hop music. With respect to intersectionality, Tricia Rose writes that “the marginalization, deletion, and mischaracterization of women’s role in black cultural production is routine practice” (1994, 152). Johnson (2018) presents a compelling example: Spotify categorizes Cardi B’s “Bodak Yellow” (2017) not as hip-hop, but as pop. This assessment is likely due to Cardi B’s female gender.(6)
Discrimination in music theory and education
[1.5] These factors in popular music parallel the troubling links between pitch supremacy and racial supremacy throughout the history of music theory and music education more broadly. A comprehensive summary is well beyond this study, but writings by historical music theorists offer a straightforward example of this problematic trend. Archival work conducted by Anna Gawboy (2016) reveals how François-Joseph Fétis, a primary contributor to modern conceptions of tonality, upheld white Western European art music as superior to that of other cultures. Fétis specifically refers to harmony as a distinguishing musical feature when he writes “
[1.6] In “Music Theory and the White Racial Frame,” Philip Ewell challenges the field of music theory to reframe its scholarship in a way that “deal[s] forthrightly with issues of race, whiteness, and
[1.7] We embrace the recent push in music theory to decentralize pitch and harmony as a primary lens for music analysis, due in part to the historical problems described above. Pitch expertise has been leveraged to promote discriminatory musical attitudes and practices across history. Likewise, pitch-based musical parameters such as harmony have historically been regarded as the most legitimate music-theoretical pursuits. Our approach engages these issues in two ways: directly, through intentional and systematic artist sampling, and indirectly, by foregrounding texture and timbre as musical features that are potentially more inclusive of broader musical experiences and genres. By raising these issues, we are not accusing any individual corpus analysts of discrimination, racism, or misogyny. Our project is simply to raise awareness around particular latent issues in music-data analysis, and to suggest some ways to address these issues.
2. A note on demographic terminology and encoding methods
[2.1] The terminology we use in this paper attempts to balance efficiency and clarity with a commitment to equity and inclusion. Considering race, ethnicity, and gender in tandem invokes the concept of intersectionality. Originating with Kimberlé Crenshaw (1989), intersectionality describes “the complex, cumulative manner in which the effects of different forms of discrimination combine, overlap, or intersect, especially in the experiences of marginalized individuals or groups” (Wingfield 2019). Glenn (2002, 14) specifies that intersectional characteristics are inherently relative, but traits such as gender and race can be treated as “anchor points” even though they are not static. There will always be deficiencies when using singular terms to describe the ultimately fluid and intersectional identities of social demographics, but the following section offers our imperfect solutions.
[2.2] We use the adjective marginalized as our most generic term to refer to artists whose identities are subject to discrimination along the lines of race, ethnicity, and/or gender. Race and ethnicity are distinct demographic parameters that reflect societal and cultural nuances (Cornell and Hartmann 2006); however, racial and ethnic traits frequently overlap. When race and ethnicity are considered together in this paper, we use the acronym BIHAP (Black, Indigenous, Hispanic, Asian, and other People of Color).(8) Following Ewell (2020, n 1), we treat POC and BIHAP as similar terms, and as essentially indicating non-white. We prefer BIHAP as this acronym avoids defining race and ethnicity in deference to the hegemonic whiteness reflected in the term non-white.
[2.3] Regarding gender, we endeavor to use categories that embrace a fluid and non-binary understanding of these identities. Nonmale is our broadest term and applies to any artist who identifies as a gender other than male, including female and non-binary identities.(9) We also use the category of cisgender as indicating a gender identity that corresponds to an artist’s sex as assigned at birth. As with non-white, the terms nonmale and non-cis rely on an oppositional relationship to a male or cisgender identity for their definitions that is admittedly problematic; however, such categories do reflect the historical marginalization of non-cis-male gender identities. When we use the nouns women and men and the adjectives female and male, we mean to indicate artists who identify as that gender. Additionally, when discussing the work of other scholars, we do not modify their terms but take care to cite research that is similarly inclusive of a wide range of gender identities.
[2.4] Ideally, for demographic encoding we would have popular-music artists self-report their identities, like the Composer Diversity Database compiled by the Institute for Composer Diversity.(10) Unfortunately, such an approach is not possible given the artist population in our corpora. Instead, we follow the procedure outlined in Shea (2022), where demographic information about popular-music artists is obtained from online resources. Encoders were not permitted to encode variables based on visual evidence. Instead, variables were only included if they were made explicit by found sources, including artist webpages and biographical information on Wikipedia and the Notable Names Database. The database contains source citations for all artists. Updates or corrections to our artist demographic information can be submitted through an online form.(11)
3. Corpus Analysis I: Rolling Stone and the Social Biases of Popular Music
[3.1] Corpora that rely on critical acclaim reproduce the power structures driving that acclaim, a dynamic that is apparent in Rolling Stone Magazine’s “500 Greatest Songs of All Time.” Originally compiled in 2004, the list was substantially revised in 2021 (hereafter these lists are referred to as RS2004 and RS2021).(12) To create RS2004, “172 rock stars and leading authorities,” mostly unnamed, nominated and ranked songs via an unexplained point system. In an effort to invoke “a more expansive, inclusive vision of pop,” as well as to increase the transparency of this process, Rolling Stone published both its methodology and the names of the singers and industry insiders involved in the RS2021 rankings (2021).
[3.2] These results resonate with historical and cultural accounts of rock music’s development in the mid-20th century. Prototypical “rock” music, particularly that from the 1960s and ’70s, has long been purported to be the most “authentic” of all popular-music genres (Negus and Astor 2022). Sanneh (2004) describes this overt bias as rockism, or the widely held conception in scholarship and music criticism that music in the rock genre is more “authentic” and aesthetically superior to music in other popular genres. This construction of authenticity is often tied up in biases in favor of male gender identities (Auslander 2004, Reddington 2012). For example, a masculine “authentic” notion of rock is often contrasted with a more feminine “pop” (Coates 1997, Davies 2001). Furthermore, as Walser (1993) and Waksman (1999) note, ideals of virtuosic performance, especially on the guitar, were tied to expressions of hypermasculinity in mid-century popular music.(13) It is not surprising, then, that a list framed as the “Greatest Songs of All Time” would show a bias toward male representation in the mid-20th century. Considering the cultural capital associated with mid-century rock’s constructs of authenticity, it is also unsurprising that these lists overrepresent songs from the 1960s, ’70s, and ’80s. These forces combine to magnify sexism (with definitions of rock tied to masculinity) and prioritize rock’s ostensible authenticity in these lists. Lending new urgency to these concerns is the sexist and racist rhetoric of Rolling Stone Magazine co-founder Jann Wenner, who recently defended his belief that only white men are rock’s “masters” (Marchese 2023).(14)
[3.3] The historical legacy of rockism is also associated with racial discrimination. As various commentators have noted (Redd 1985, Russell 1970, Stoia 2013), the mainstream of rock was initially distinguished from genres like rhythm and blues on racial grounds, with the former made by and for white people and the latter for and by Black people, regardless of the largely shared musical characteristics of the two genres. Notions of an “authentic” and “classic” rock mainstream are thus tied to the racial segregation of the genre’s origins. Importantly, Smith et al. (2020) have noted that the sexism and racism apparent in mid-century American music seems to have decreased in recent decades, as Black and female musicians have garnered increasing levels of mainstream commercial success. From this perspective, the difference in ethnic and racial distributions between pre-2000 and post-2000 songs can be traced to the particular exclusionary tactics inherent in rock’s founding and more broadly in the 20th-century music industry.
Example 2. Sample demographic analyses for top-ranking artists in Rolling Stone Magazine’s “500 Greatest Songs of All Time” lists (2004 and 2021 versions) marginalized by race, ethnicity, and/or gender
[3.4] To quantify the demographic constituency within each dataset, we adopted a methodology drawn from a mixture of health science and humanities research that records whether members of an ensemble express a marginalized racial, ethnic, or gender identity. Each song and artist is associated with a series of binary variables corresponding to each category: if an ensemble includes at least one nonmale member, for instance, the nonmale variable is set to 1, while in an all-male band that variable would be 0.(15) A similar procedure occurs for BIHAP artists.(16) The model also implements a primary status variable that considers artists’ agency within their ensemble: if either the title/lead member of the group or half or more of the group’s members are of historically marginalized backgrounds by race/ethnicity and/or gender, the song receives this status. Example 2 offers a sample case of artist encoding for top-ranking songs in 2004 and 2021. Importantly, our approach to marginalization is based on larger demographics and histories of the United States via census data,(17) rather than the specific dynamics within some musical community or genre: non-white singers are considered minoritized because they make up a minority of the US population and because of the historical racism associated with that identity. Furthermore, our approach is simply one way of capturing demographic data, and we do not argue that it is the only way one might undertake such representation, nor that this method is without flaws and shortcomings.(18)
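The encoding rules described above can be sketched in code. The following is a minimal Python illustration, not the authors’ implementation: the `Member` class, its field names, and the `encode_ensemble` function are hypothetical, and the sketch assumes that each member’s BIHAP and nonmale status has already been established from documented sources.

```python
from dataclasses import dataclass


@dataclass
class Member:
    """One ensemble member (hypothetical record layout)."""
    name: str
    bihap: bool          # documented BIHAP identity
    nonmale: bool        # documented nonmale identity
    lead: bool = False   # title/lead member of the group


def encode_ensemble(members):
    """Return the three binary variables for one ensemble."""
    if not members:
        raise ValueError("an ensemble needs at least one member")
    # 1 if at least one member holds the identity, otherwise 0.
    bihap = int(any(m.bihap for m in members))
    nonmale = int(any(m.nonmale for m in members))
    # Primary status: the lead member is marginalized, or half or
    # more of the members are marginalized by race/ethnicity and/or gender.
    marginalized = [m for m in members if m.bihap or m.nonmale]
    lead_marginalized = any(m.lead for m in marginalized)
    half_or_more = 2 * len(marginalized) >= len(members)
    primary = int(lead_marginalized or half_or_more)
    return {"bihap": bihap, "nonmale": nonmale, "primary": primary}


# Hypothetical four-piece band: one nonmale member who is not the lead.
band = [
    Member("lead vocalist", bihap=False, nonmale=False, lead=True),
    Member("bassist", bihap=False, nonmale=True),
    Member("guitarist", bihap=False, nonmale=False),
    Member("drummer", bihap=False, nonmale=False),
]
print(encode_ensemble(band))
```

In this invented case the nonmale variable is 1 but primary status is 0, because the lead member is not marginalized and only one of four members is. A solo artist who is marginalized would receive primary status under either condition.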
Example 3. Distribution of groups with at least one BIHAP member, with at least one nonmale member, and who yielded a primary status member
[3.5] Example 3(19) shows the distribution of songs whose groups contain at least one BIHAP member, a nonmale member, and a primary member designation, with bars representing the ratio of artist identity within each list’s 500 songs.(20) In all categories, RS2021 contains more songs by marginalized artists than RS2004 (an increase of 3.4, 12.6, and 10.9 percentage points in each respective category).(21) Reflecting the purported commitment of the 2021 list towards artist diversity, the more recent list represents marginalized artists more often than the former.(22)
[3.6] How should we interpret these distributions? Are these the ratios we would expect from these datasets? Do they fall short of, or even exceed, some expected threshold of representation for historically marginalized identities within popular music? And how might these statistics relate to historical racism and sexism?(23) Example 3’s lines and error bars show one way of framing these statistics by reflecting an expected level of representation based on population statistics drawn from US census data. (Highlighted lines show population averages for each demographic category between 1950 and 2010; error bars show a 95% confidence interval using a binomial distribution around these ratios, which serve to provide a general idea of the precision of these measures.) If roughly half (50%) of the population is female, then the expected proportion of female artists should be about 50%. The fact that only 15.6% and 28.2% of the RS2004 and RS2021 lists are nonmale dramatically deviates from this expectation. However, compared to these baselines, both Rolling Stone lists have more BIHAP artists than one might expect, with 48% and 51% of the lists’ constituent groups including at least one BIHAP member, but only 24% of the population identifying as BIHAP on the census during this time period.
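As a rough illustration of this comparison, the sketch below computes a 95% interval around a population ratio for a list of 500 songs and checks where the observed ratios fall. It assumes a normal approximation to the binomial distribution, which may differ from the exact procedure behind the error bars in Example 3. The function names are hypothetical; the ratios are those quoted in the paragraph above.

```python
import math


def binomial_interval(p, n, z=1.96):
    """Approximate 95% interval for a sample proportion when the
    population ratio is p and the sample holds n songs (normal
    approximation to the binomial; an assumption of this sketch)."""
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def compare(observed, p, n):
    """Say whether an observed ratio is below, within, or above the interval."""
    low, high = binomial_interval(p, n)
    if observed < low:
        return "below"
    if observed > high:
        return "above"
    return "within"


N_SONGS = 500  # each Rolling Stone list contains 500 songs

# Nonmale representation against a baseline of roughly 50%.
print(compare(0.156, 0.50, N_SONGS))  # RS2004 -> below
print(compare(0.282, 0.50, N_SONGS))  # RS2021 -> below

# BIHAP representation against a baseline of 24%.
print(compare(0.48, 0.24, N_SONGS))   # RS2004 -> above
print(compare(0.51, 0.24, N_SONGS))   # RS2021 -> above
```

Under this approximation the interval around 50% spans roughly 45.6% to 54.4%, and the interval around 24% spans roughly 20.3% to 27.7%, so all four observed ratios fall outside their expected ranges.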
Example 4. Song counts per half decade a) for BIHAP groups in the 2004 and 2021 lists, and b) for gender groupings in the 2004 and 2021 lists
[3.7] Examples 4a & 4b approach this question from a more granular perspective, illustrating the distribution of songs within five-year periods covered by these lists.(24) Using these divisions, the differences between the lists lose their statistical significance: a chi-square test shows that most five-year portions shared by these lists do not have significantly different numbers of marginalized vs. non-marginalized groups between the two lists (the exception is 1995–1999: there are significantly more women in the later list than the earlier, χ²(1) = 6.36, p = .012). In other words, even though our earlier analysis shows that RS2021 contains more artists of non-white and nonmale identities overall, the latter list does not contain significantly more marginalized artists in the years shared by the two lists, 1945–2004. The increase in marginalized identities in RS2021 that we observed in Example 3 is therefore simply due to the incorporation of newer (post-2004) songs.
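For readers who want to see how such a test operates, the following sketch computes a Pearson chi-square statistic for a 2×2 table (list by marginalized status) and the corresponding p-value for one degree of freedom. It is an illustration under stated assumptions rather than a reproduction of the analysis: the function names are hypothetical, the counts in the example table are invented, and no continuity correction is applied, which may or may not match the procedure behind the reported figures.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]],
    e.g. rows = list (earlier, later), columns = (nonmale, all-male).
    No continuity correction is applied (an assumption of this sketch)."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column needs a nonzero total")
    return n * (a * d - b * c) ** 2 / denominator


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.
    With 1 df the statistic is a squared standard normal variable,
    so the tail probability equals erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


# Invented counts for a single five-year period (not data from the lists).
statistic = chi_square_2x2(10, 20, 20, 10)
print(round(statistic, 2), round(p_value_df1(statistic), 4))

# The statistic reported above, chi-square(1) = 6.36, yields a p-value
# close to the reported p = .012.
print(round(p_value_df1(6.36), 3))
```

Running the same test separately for each half-decade, as the paragraph describes, amounts to building one such table per period from the two lists.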
[3.8] The horizontal lines and error bars in Example 4 show the expected distribution of marginalized identities with ratios drawn from the US population by decade. As in Example 3, all half-decades of both lists represent BIHAP artists at higher ratios than in the population as a whole, while nearly all underrepresent nonmale artists. The exceptions occur in the post-1995 portions of RS2021: here, songs by nonmale artists and groups make up roughly half of the entries in these half-decades, more closely approximating population statistics. Notably, however, these half-decades are among the most sparsely populated with songs in the corpus. Far from being uniformly distributed, the lists most heavily represent the period between 1965 and 1980. Additionally, in terms of artist demographics, the highest proportional representation of all-white groups is in the 1970s, and the highest representation of all-male groups is between 1965 and 1980. These higher representations couple with the spikes in song counts to magnify the proportions found within this time span. We will return to the effects of this magnification below.
Example 5. The ratio of groups with at least one BIHAP and nonmale member; n=500 for each full corpus, n=47 for post-2004 RS data
[3.9] Example 5 shows an additional and complementary way to frame representation in these lists, again demonstrating some interactions between time periods and identity.(25) The graphs show the marginalized artist ratios associated with songs and groups between 2004 and 2021, the intervening years between the two lists. Taking these postmillennial ratios as baseline expectations in the same way we considered census data in Examples 3 and 4, mid-century pop in the United States is significantly less representative than expected (p < .0001, again using a binomial distribution around the postmillennial ratios). In other words, if we calibrate our expectations based on the postmillennial portion of RS2021, the premillennial songs on both lists are vastly less representative of historically marginalized identities.
[3.10] These statistics show that these lists underrepresent nonmale artists and groups in all but the most recent decades. Race and ethnicity (combined in the RE variable) are represented at levels greater than the overall United States population, but there is much less midcentury music by artists and groups with marginalized identities than premillennial and postmillennial music. Furthermore, more recent decades—i.e., decades with significantly more representation of marginalized artists—are represented by relatively fewer songs compared to earlier decades.
[3.11] We began this section by claiming that corpora relying on critical acclaim will reproduce the power structures underpinning that acclaim, and that this connection between corpus statistics and social forces is clearly observable in Rolling Stone’s 2004 and 2021 lists of the “Greatest Songs of All Time.” The distributions of identity representation reflect historical racism and sexism, and as these discriminatory forces influence critical acclaim and historical fame, a corpus based on such criteria will replicate these forces. For instance, the pervasive sexism against nonmales in early rock is evident in the distributions of Examples 2, 3, and 4: these statistics reflect the social forces surrounding the music in that corpus.
[3.12] Counteracting these forces by simply adding more music from more diverse communities may introduce another confounding factor. For instance, RS2021 increased its representation of marginalized artists primarily by including music from recent decades. But, as several researchers have noted (Peres 2016; Duinker 2019; Barna 2019, 2020; White, Pater, and Breen 2022), postmillennial popular music has significantly different properties than earlier popular music. From this perspective, RS2021 could be adding songs with markedly different internal characteristics: music of a different style than premillennial popular music. In what follows, we discuss the thorny issue of creating corpora using internal consistency versus historical, biographical, and cultural groupings, and how this consistency might connect to issues of identity and marginalization.
4. Corpus Analysis II: A case study illustrating the productive tension between artist identity and musical construction of a corpus’s constituency
[4.1] Broadly, we can describe corpus construction around a genre as negotiating two forces, one human-facing and the other music-facing. A corpus based on the Rolling Stone “Greatest Songs Of All Time” lists is defined by the critical acclaim of a select group of specialists, an explicitly human-facing design: a song is in that corpus because some group of humans put it on a list. Similarly, a corpus using the Billboard charts is based on commercial success, another human-facing determination. In each instance the musical grouping purportedly being represented by the corpus is defined by some human activity. Here, the genre “popular music” contains songs that a certain group of consumers think of as “popular music,” or that a group of experts call “popular music,” or that a group of music producers label as “popular music.” In this instance, popular music is a human-facing category.
[4.2] However, musical corpora can also be oriented towards music-facing features, striving to express qualities such as internal musical consistency (White 2017). For tendencies and trends to be identifiable, the examples in a corpus must feature some kind of internal stability and be similar to one another in how they deploy their musical materials. The difference, for instance, between the 1970s bands Black Sabbath and The Carpenters is not merely one of authorship, audience, and sociological roles, but also one of instrumentation, timbre, harmony, melody, rhythm, and affect in their musical materials. Here, these features constitute music-facing categories.
[4.3] Some corpus analyses seek to maximize both musical and cultural dimensions in their creation and interpretation of their datasets. When Ohriner (2019) identifies divergences in East Coast versus West Coast rap, for instance, his analyses triangulate the music-facing and the human-facing: he shows coherences within the East Coast and West Coast corpora as well as a distinction between them, and couches these differentiations in the human and geographical forces associated with those datasets.
[4.4] These issues also relate to the problem of overfitting. In corpus studies, stylistic trends generated by canonical artists become the benchmark against which other non-canonical works are evaluated: some small group of homogeneous pieces stands in for what is in reality a heterogeneous and diverse practice. London’s (2022) study of overfitting within music corpora, for example, demonstrates the tendency of theorists to use a small handful of composers and pieces to represent the wide range of practices within the sonata tradition. The same criticism could be levied at popular genres: using a handful of the “most famous” or “most popular” songs to represent a wider tradition risks eliminating an exploration of that tradition’s underlying diversity of practice.
[4.5] However, attempting to compensate for overfitting in an analytical model can also run the risk of underfitting: loosening the definition of a genre or musical style risks clouding the trends and norms within a dataset. A corpus with too much musical heterogeneity forfeits its ability to express statistical consistencies. These concerns can be directly connected back to notions of musical genre and identity: if musical expression is connected with identity—if artists from different backgrounds and social situations make music differently and rely on different musical norms—then expanding the social contexts that define a corpus might compromise an analyst’s ability to identify consistencies and norms within that dataset. Following this devil’s advocacy to its logical conclusion: if “rock” as a genre was mostly the property of white men in mid-century Anglo-America, then a “rock corpus” that expands its song selection outside of these demographics may be introducing music that sounds and behaves differently than music by mid-century white male rockers. In other words, it is possible that culturally diverse corpora run the risk of stylistic incoherence—of overcorrecting for London’s overfitting to the point of “underfitting” some musical tradition. In the following section, we undertake a corpus analysis of the stylistic variation present within the McGill-Billboard corpus to observe the extent to which harmonic practices correlate with the identities of the performing artists.
Billboard analysis
[4.6] The McGill-Billboard Corpus (Burgoyne 2012) consists of songs from the Billboard Hot 100 list of the most-played and most-purchased singles from 1958–1991. The following analysis is based on 734 songs that we demographically encoded using the method described in our earlier Rolling Stone analyses. We divided the corpus into two subcorpus groupings based on our encoding’s primary demographic variable, which indicates that the founding or most forward-facing musician, or a majority of the band’s members, were of a marginalized identity (race/ethnicity and/or gender). Of these songs, 299 had this primary designation and 435 did not. Each song was assigned to the major or minor mode based on whether a major triad rooted on the tonic (scale degree 1) occurred more frequently than a minor tonic triad, or vice versa. By this metric, 627 songs were in the major mode.
[4.7] We measured the internal consistency of each subcorpus to assess whether groups with different identities tend to conform to or diverge from corpus norms in terms of features that have been prevalent in analyses of popular music: pitch structures and form. We analyzed three parameters for each song: 1) the scale degree of each chord root, 2) the unordered scale-degree sets, and 3) the formal zones used in the song. Unigram probability distributions for each parameter were calculated for each song (i.e., we calculated how often each chord root, scale-degree set, and formal zone were used).
[4.8] To test musical consistency, we used a measurement called cross entropy (Temperley 2007), a value that shows how well the events within an individual series are predicted by some broader set of expectations: in this instance, how well the characteristics of an individual song are predicted by the larger corpus. If the events of that song do not align with the expectations of the corpus statistics, the cross entropy will be high. Conversely, if a song uses the same chords, chord roots, and formal zones as the overall corpus, and at similar rates, the song will have a low cross entropy along these domains and can be thought of as conforming to the mainstream practice of the corpus. Cross entropy takes the probability that some event occurs in a corpus (for instance, how often a IV chord occurs in a corpus compared to all other chords that occur in that corpus) and uses that probability to assess the events in some individual song (for instance, the corpus’s IV-chord probability would be applied to all the IV chords in the song being analyzed). The values associated with the larger corpus and the individual piece are combined as follows: $H(p, q) = -\sum_{x} p(x) \log_2 q(x)$. Following the norms of information theory (White 2022), the broader corpus’s probability q of some event x is converted to a logarithm, log(q(x)), and is weighted by how often that event x occurs in the individual song, p(x); the weighted logarithms are then summed over all events and the sign is reversed. Following convention, we report these values using a base-2 logarithm. Because major-mode statistics would generally not predict the events of minor-mode songs and vice versa, we only used major-mode pieces in calculating chord and chord-root distributions.
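The calculation described above can be sketched in a few lines of Python, assuming the standard definition of cross entropy, in which a song's unigram distribution p is scored against the corpus distribution q as -Σ p(x) log2 q(x). The function names, the toy chord-root labels, and the decision to build q from pooled event counts are illustrative assumptions rather than the authors' implementation. The text does not say how events absent from the corpus are handled, so the sketch simply assumes that every song event also occurs in the corpus.

```python
import math
from collections import Counter

def unigram(events):
    """Return the relative frequency of each event in a sequence."""
    counts = Counter(events)
    total = sum(counts.values())
    return {event: n / total for event, n in counts.items()}

def cross_entropy(song_events, corpus_events):
    """H(p, q) = -sum_x p(x) * log2 q(x), in bits per event."""
    p = unigram(song_events)     # distribution within the individual song
    q = unigram(corpus_events)   # distribution within the larger corpus
    # Assumption: every song event also occurs in the corpus, so q[x] > 0.
    return -sum(p_x * math.log2(q[x]) for x, p_x in p.items())

# Toy data: hypothetical chord roots as scale degrees, not drawn from the corpus.
corpus = ["1"] * 8 + ["4"] * 4 + ["5"] * 2 + ["b7"] * 2
typical_song = ["1", "1", "4", "5"]
atypical_song = ["b7", "b7", "5", "1"]

h_typical = cross_entropy(typical_song, corpus)    # 1.75 bits
h_atypical = cross_entropy(atypical_song, corpus)  # 2.5 bits
```

In this toy case the song that leans on the corpus's most common roots scores 1.75 bits, while the song that favors the rare flat-seventh root scores 2.5 bits, which is the sense in which a higher value marks divergence from corpus norms.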
Example 6. The cross-entropy distribution between individual songs and the larger corpus in the all-white-male/not-all-white-male divisions of the McGill Billboard corpus
[4.9] Results are shown in Example 6. The distribution of cross entropies of chord roots and formal zones differs very little between subcorpora.(26) These comparisons suggest that neither subcorpus represents the norms of the larger corpus more strongly or weakly: the types of chord roots and formal zones used are similar in groups with and without primary-member status.(27)
Example 7. The results of a k-means clustering on the chord frequencies in each song within the McGill-Billboard corpus, such that k = 3
[4.10] Chord choice, on the other hand, does feature a significant difference between the two subcorpora. To investigate this difference, Example 7 shows the results of grouping each song in the major-mode portion of the corpus by its chord usage. These groups were produced by a k-means clustering algorithm, which divides a dataset into k number of clusters, each containing songs that use the same chords with similar rates of occurrence.(28) Using measurements of silhouette width (White and Quinn 2018), we identified three groupings that offer an optimal clustering of songs by the chords they use, suggesting three types of harmonic vocabularies. Example 7 shows the aggregate chord usage of each of these clusters, representing which chords occur proportionately more and less often in these harmonically derived groupings. The example also shows the relative proportions of songs with each usage type by the size of the pie charts. Progressing from left to right around Example 7, these are vocabularies that favor: 1) primarily diatonic triads, with secondary instances of modal inflection, 2) diatonic triads with a strong secondary representation of extended chords like sevenths and ninths, and 3) harmonies with increased modal elaborations and chord extensions. Counts for each cluster were 322, 292, and 13 songs, respectively.(29)
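The clustering step can be illustrated with a small, self-contained sketch. It assumes each song is represented as a vector of chord-type proportions, runs a plain Lloyd's-algorithm k-means, and compares candidate values of k by mean silhouette width. The nine toy vectors, the function names, and the deterministic farthest-first seeding are assumptions made for this illustration; the text does not specify the software, initialization, or distance measure used in the study.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def farthest_first(rows, k):
    """Deterministic seeding: start at rows[0], then repeatedly take the row
    farthest from its nearest already-chosen centroid."""
    centroids = [rows[0]]
    while len(centroids) < k:
        centroids.append(max(rows, key=lambda r: min(dist(r, c) for c in centroids)))
    return centroids

def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster label per row."""
    centroids = farthest_first(rows, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: dist(r, centroids[c])) for r in rows]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def mean_silhouette(rows, labels):
    """Average silhouette width; singleton clusters contribute 0."""
    clusters = sorted(set(labels))
    scores = []
    for i, row in enumerate(rows):
        own = [dist(row, o) for j, o in enumerate(rows) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the song's own cluster
        b = min(                 # mean distance to the nearest other cluster
            sum(dist(row, o) for o, lab in zip(rows, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical proportions of [diatonic triads, seventh/ninth chords, modal chords].
songs = [
    [0.90, 0.05, 0.05], [0.85, 0.10, 0.05], [0.88, 0.06, 0.06],
    [0.50, 0.45, 0.05], [0.55, 0.40, 0.05], [0.52, 0.42, 0.06],
    [0.30, 0.30, 0.40], [0.25, 0.35, 0.40], [0.28, 0.32, 0.40],
]
scores = {k: mean_silhouette(songs, kmeans(songs, k)) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
best_labels = kmeans(songs, best_k)
```

With these toy vectors the silhouette comparison favors three clusters, mirroring the logic (though not the data) behind the choice of k = 3 reported above.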
[4.11] Example 7 also shows that cluster 1 has far fewer songs by primary-status groups, while clusters 2 and 3 have far more. Overall, 41% of songs in the major-mode portion of the corpus had a primary member demographic designation; therefore, if these groups were evenly distributed between the different clusters, we would expect their representation to hover in the vicinity of 41%. Example 7 shows this is not the case: only 21% of cluster 1’s songs are by primary-status groups, while 61% and 85% of songs in clusters 2 and 3 are by these marginalized artists. A binomial test quantifies the difference between an observed proportion and an expected proportion. The results of such a test indicate that the proportions in all three clusters differ significantly from the expected 41%. In other words, primary-status groups are significantly less likely to use the harmonic vocabulary represented by the cluster associated with the most songs—mainly diatonic and triadic vocabulary—and are instead more likely to use harmonies featured in the less-used—and more harmonically adventurous—clusters.
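A minimal sketch of such a test follows, written as an exact two-sided binomial test using only the Python standard library. The per-cluster counts are approximations reconstructed here by rounding the reported percentages against the reported cluster sizes (322, 292, and 13 songs); they are not figures given in the text. The choice of a two-sided test is likewise an assumption, since the text does not state which variant was used.

```python
import math

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials, computed in log space."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def binom_test_two_sided(k, n, p):
    """Exact two-sided p-value: the total probability of all outcomes that
    are no more likely than the observed count k."""
    threshold = binom_pmf(k, n, p) * (1 + 1e-7)  # small tolerance for float ties
    total = sum(pmf for pmf in (binom_pmf(i, n, p) for i in range(n + 1))
                if pmf <= threshold)
    return min(1.0, total)

EXPECTED = 0.41  # share of primary-status songs in the major-mode portion

# (approximate primary-status count, cluster size); counts are reconstructed
# from the reported 21%, 61%, and 85%, so they are assumptions.
clusters = {1: (68, 322), 2: (178, 292), 3: (11, 13)}

p_values = {name: binom_test_two_sided(k, n, EXPECTED)
            for name, (k, n) in clusters.items()}
```

Under these reconstructed counts, each cluster's p-value falls well below conventional significance thresholds, which is consistent with the result reported above.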
[4.12] Provocatively, these identity-based differences reflect differences in genre and accompanying cultural and historical associations with these materials. For instance, artists associated with Motown and R&B (e.g., the Shirelles, Gladys Knight and the Pips, Robert Cray) are strongly represented in clusters 2 and 3, and the seventh- and ninth-based harmonies that distinguish these clusters are hallmarks of African diasporic music in America, especially 20th-century jazz (Geyer 2014) and gospel music (Shelley 2021). In contrast, the diatonic/triadic cluster 1 features mostly white rock/pop artists like John Denver, Bob Seger, and Laura Branigan. The relationship between these clusters could explain why groups with primary member identities were predicted relatively worse by the corpus at large than songs by groups with non-primary identities. Songs by primary-status artists use a harmonic vocabulary both distinct from and less frequently used than the “mainstream” triadic/diatonic vocabulary. Again, identity, musical consistency, and even genre seem to correlate with one another in these songs’ harmonic languages. We return to the thorny interactions between musical consistency, identity, and genre below.
[4.13] While these brief analyses are not meant to prove such sociological speculations one way or the other, they do illustrate the tension between the external and internal components of corpus creation, and—more importantly—they show the utility of embracing that tension. Our cross-entropy analyses of chord roots and formal zones show that neither of our identity-based subcorpora diverged from the larger corpus norms more than the other. But our cluster analysis showed some musical variations that corresponded to musicians’ identities: one harmonic cluster was used more often by groups without the primary designation, while two clusters were used more by groups with it. Furthermore, these differences may show influences from African diasporic music, and potentially illustrate how Black performers were positioned in popular music during the era covered by these Billboard charts. A corpus featuring only the most mainstream artists might miss these interwoven narratives of experimentation, stylistic borrowing, and musical development.
5. The Anti-Discriminatory Alignment System
[5.1] The following section outlines the Anti-Discriminatory Alignment System (ADAS), a method for better representing diverse voices who have contributed to some socially defined musical tradition. We illustrate our method by applying the ADAS to the Timbre in Popular Song (TiPS) corpus, which includes sub-corpora of songs from four popular genres in the decade 1990–1999: country, metal, hip-hop, and pop.(30) While we introduce the ADAS as a general approach to corpus construction that could be applied in any number of ways to any number of musical traditions, our specific implementation described here focuses on these four popular genres. Below, we review the sociocultural issues surrounding each genre, analyze the demographic statistics, and outline how the ADAS’s process acts on each genre-specific corpus.
[5.2] As shown in Example 1, our process has four steps. First, 1) build a parent corpus from pre-existing sources, and 2) encode and assess the demographics of each constituent song’s performers. Next, 3) determine a target demographic distribution for that corpus to use as a guide in 4) deriving the smaller child corpus. During this last step, songs by marginalized artists may be intentionally retained or added as needed to improve alignment with the target distribution.
[5.3] 1) Build a Parent Corpus. In our model, a parent corpus is a dataset or list of songs derived from some pre-existing, ecologically valid and/or socially constructed definition of a given musical style, genre, or tradition. Parent corpora could, for instance, be based on selections made by radio stations or trade magazines that self-identify as specializing in a particular genre, on sales figures, web-based genre designations, style-specific compilations, or any other explicit grouping of songs that share some stable designation. Our child corpora are based on parent corpora of 150–225 songs drawn primarily from Billboard lists, which rank songs from particular genres within particular chronological windows by commercial success. The available lists were compiled in different ways, tailored to best suit each genre, as described below.
Example 8. Sample demographic annotations from each genre
[5.4] 2) Encode Demographics. In this phase, demographic variables associated with the anti-discriminatory goals of the analyst, here including gender, race, and ethnicity, are encoded for the artists associated with each song in the parent list.(31) In our implementation, we researched the identities and backgrounds of the members of each band represented in the parent corpus using publicly available information. Echoing our earlier analyses, songs were encoded with five binary variables: nonmale, non-cis, race, ethnicity, and primary. Each variable was encoded as zero if no members of the band were from the corresponding population. Again, as we described above, we combined race and ethnicity into a sixth race/ethnicity variable.(32) The “primary” variable was once again used to indicate whether the founding, forward-facing member(s), or majority of the band was of a marginalized identity (our definition of marginalized in this category comes from Smith et al. 2018).
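One way to picture this encoding is as a small function that turns member-level annotations into song-level binary variables. The sketch below is a hypothetical data model, not the authors' encoding protocol: the `Member` fields, the `lead` flag standing in for "founding or forward-facing," and the rule that any of the four identity flags counts toward the primary variable are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Member:
    """One performer; every field is a hypothetical annotation flag."""
    nonmale: bool = False
    non_cis: bool = False
    race: bool = False        # marginalized by race
    ethnicity: bool = False   # marginalized by ethnicity
    lead: bool = False        # founding or most forward-facing member

    def marginalized(self):
        # Assumption: any of the four flags counts as a marginalized identity.
        return self.nonmale or self.non_cis or self.race or self.ethnicity

def encode(members):
    """Return the binary song-level variables for a list of Member records."""
    def any_member(flag):
        # 0 if no member belongs to the population, 1 otherwise.
        return int(any(getattr(m, flag) for m in members))

    codes = {flag: any_member(flag)
             for flag in ("nonmale", "non_cis", "race", "ethnicity")}
    codes["race_ethnicity"] = int(codes["race"] or codes["ethnicity"])
    lead_marginalized = any(m.lead and m.marginalized() for m in members)
    majority_marginalized = 2 * sum(m.marginalized() for m in members) > len(members)
    codes["primary"] = int(lead_marginalized or majority_marginalized)
    return codes

# Two invented acts: a quartet with one nonmale non-lead member, and a soloist.
quartet = [Member(lead=True), Member(), Member(), Member(nonmale=True)]
soloist = [Member(nonmale=True, race=True, lead=True)]
quartet_codes = encode(quartet)
soloist_codes = encode(soloist)
```

The quartet is coded 1 for nonmale but 0 for primary, since its lead member and its majority are not marginalized under these assumptions, whereas the soloist is coded 1 for primary.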
[5.5] 3) Establish Benchmarks. The demographic distribution of the parent corpus can then be measured against some benchmark for representation. Benchmarks should reflect the priorities and goals of a particular corpus study. Our current study aimed to make our corpora more representative of the overall population of the United States during the chronological window represented by the corpus. We also opted to make this a relative rather than absolute goal: the minimum acceptable proportion for representation in each genre’s child corpus was established based on the diversity within its parent corpus, with more initial diversity resulting in higher targets than corpora with less initial diversity. We therefore set target proportions for gender and for race/ethnicity by calculating the geometric mean—an averaging function designed for proportions—between the proportion in the parent corpus and the relevant US Census data for that time period.(33)
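The benchmark calculation reduces to a geometric mean of two proportions, which can be sketched as follows. The parent-corpus shares of 6.7% and 20.5% are the nonmale proportions reported for the metal and country parent corpora later in the article; the census share of 0.5 is a placeholder assumed purely for illustration and is not the census figure used in the study.

```python
import math

def target_proportion(parent_share, census_share):
    """Geometric mean of the parent-corpus share and the population share."""
    return math.sqrt(parent_share * census_share)

CENSUS_SHARE = 0.5  # hypothetical placeholder, not the study's census value

metal_target = target_proportion(0.067, CENSUS_SHARE)
country_target = target_proportion(0.205, CENSUS_SHARE)
```

Because the geometric mean always lies between its two inputs, each target sits above the parent corpus's share and below the population share, and a parent corpus with more initial diversity (here, country) receives a higher target than one with less (metal).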
[5.6] 4) Derive Child Corpora. We then trim the parent corpora to create child corpora that conform to (or at least approach) the target proportions. Given a sufficiently large parent corpus, a child corpus may be able to simply retain songs by marginalized artists so that the resulting dataset meets or exceeds the demographic benchmarks. If this is not the case, additional songs can be drawn from datasets adjacent to the parent corpus in order to adjust the child corpus’s population demographics. The origin of such adjacent datasets necessarily varies depending on the types of pre-existing lists available that are relevant to any given child corpus, but strategies might include selecting songs from alternatively curated lists, exploring genre-based lists designed to highlight marginalized identities, or drawing from lists that are closely related to the parent corpus. Examples of such strategies are described in further detail below with respect to specific genres in the TiPS corpus, but these do not represent the only options for identifying additional songs. Ultimately, the choice must be unique to each parent corpus; the researcher should be intentional in this matter, remaining mindful of the integrity of the corpus, and ensuring that the resulting research is transparent about the source(s) of the additional songs. Once songs from marginalized artists reach the targeted proportions, songs from non-marginalized artists are chosen from the parent corpus to fill in the remainder of the child corpus. When a demographic benchmark is met or exceeded in the parent corpus (e.g., if more than half of the groups in a parent corpus include nonmale artists), no adjustments are needed.
[5.7] In our current study, the child corpora each comprise 100 songs. We used the nonmale and race/ethnicity variables for each song in conjunction with gender and race/ethnicity pooled data from the US census to create our benchmarks. We selected songs evenly across the decade (10 songs per year, 1990–1999) to ensure uniform chronological distribution, and we limited each artist to five songs within a corpus to ensure that no single artist or group was overrepresented in any list. In our current implementation, because one of our goals is greater representation of marginalized identities, we exceeded benchmarks in some contexts (more detail is provided in the following sections).
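The constraints named in this paragraph (a fixed number of songs per year and a cap of five songs per artist) can be combined with the retention of songs by marginalized artists in a simple greedy selection. The sketch below is one possible simplification, not the procedure the authors followed: it retains every eligible song by a marginalized artist first and then fills each year's quota from the remaining songs in list order, and it omits both the supplementation from adjacent lists and the target-proportion check. The song dictionaries and their keys are hypothetical.

```python
from collections import Counter

def derive_child(parent, per_year=10, artist_cap=5):
    """Greedy sketch: songs by marginalized artists are considered first,
    then all others in their original list order."""
    child = []
    songs_per_artist = Counter()
    songs_per_year = Counter()
    # sorted() is stable, so the original ranking is preserved within each group.
    for song in sorted(parent, key=lambda s: not s["marginalized"]):
        if songs_per_year[song["year"]] >= per_year:
            continue  # this year's quota is already full
        if songs_per_artist[song["artist"]] >= artist_cap:
            continue  # this artist already has the maximum number of songs
        child.append(song)
        songs_per_artist[song["artist"]] += 1
        songs_per_year[song["year"]] += 1
    return child

# Invented parent list: 15 ranked songs per year, one prolific artist per year,
# and three songs per year flagged as being by marginalized artists.
parent = [
    {
        "year": year,
        "rank": rank,
        "artist": f"Prolific {year}" if rank < 7 else f"Artist {year}-{rank}",
        "marginalized": rank in (9, 12, 14),
    }
    for year in (1990, 1991)
    for rank in range(15)
]

child = derive_child(parent)
year_counts = Counter(song["year"] for song in child)
artist_counts = Counter(song["artist"] for song in child)
```

In this toy run each year contributes exactly 10 songs, the prolific artist in each year is held to five songs, and all six flagged songs are retained even though several rank below unflagged songs that were left out.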
6. Parent and Child Corpora in Four Genres
Example 9. An outline of the ADAS process as applied to four popular music genres
[6.1] The following sections describe our use of the ADAS procedure to create datasets of four genre-specific corpora of popular music from 1990–1999. We review issues of diversity surrounding each genre and explain how the child corpora were derived from the parent corpora. The demographic statistics of the parent and child corpora, along with the target proportions, are shown in Example 9. The demographic data were encoded primarily by a team of graduate research assistants, who participated in a 90-minute training session and were provided with a detailed guide to our encoding protocol. The authors cross-checked the data.
Metal
[6.2] Metal has the lowest gender diversity of the four genres—metal artists are overwhelmingly male—and metal has long been noted for its problematic relationship with issues of gender and sexuality. The electric guitar as a phallic symbol (Walser 1993, Waksman 1999) is indicative of the hypermasculinity and gender bias that exist within the genre of metal and the role sexuality plays in this music. Walser observes that the “generic cohesion of heavy metal until the mid-1980s depended upon the desire of young white male performers and fans to hear and believe in certain stories about the nature of masculinity” (1993, 109–10), something that representations in popular media have historically reinforced (Jones 2018).(34) Compounding metal’s gender bias with issues of race, the genre has long been associated with whiteness. In fact, metal is often set in opposition to rap music as two forms of racially marked and socially deviant music (Binder 1993, Lynxwiler and Gay 2000).
[6.3] Corpus studies of metal are relatively rare compared with the other genres considered here. Most recently, Hudson (2021) analyzes the form of 195 metal tracks from twenty well-known albums. He draws on Elflein (2010, 357–59), who includes a list of the 114 most-cited albums in his study of the musical language of heavy metal, spanning 1969–1999. In terms of mainstream lists and other candidates for a parent corpus, no Billboard (or similar) charts dedicated to metal are readily available for the genre during the 90s. Consequently, we derived the metal parent corpus from four lists based on critical acclaim: 1) Metal Hammer’s list of the “100 Best Metal songs of the 90s” (Pasbani 2017), 2) Metal Insider’s list of the “Best 100 songs of the 90s” (Podoshen 2017, released in response to the Metal Hammer list), 3) Loudwire’s “10 greatest metal songs of the 1990s” (Hartmann 2018), and 4) Loudwire’s “Top 90 Hard Rock and Metal Albums of the 1990s” (Loudwire Staff 2020). Although this last source lists albums rather than songs, the list includes 2–4 notable songs per album, which were used in generating our parent corpus.
[6.4] These lists often include songs with considerable genre crossover or ambiguity. To address this, we developed a genre filtering method using both iTunes genre classifications and insider knowledge from two researchers on our team with extensive listening experience in metal and related genres, each of whom independently reviewed and classified each song by genre. We eliminated the songs that had been classified in a genre other than metal by at least one of the three sources. Prior to filtering for genre, the list included 289 unique songs; after genre filtering, the resulting parent corpus included 225 songs.
[6.5] Out of the 225 songs in the parent metal corpus, 15 (6.7%) include at least one female artist, and 64 (28.4%) include artists/groups with at least one BIHAP member. We prioritized songs by female artists in order to reach target proportions for gender. More generally, priority was also given to songs that appeared on multiple lists.(35) Example 9 outlines the sources used to supplement the parent corpus; for metal, the alternative lists that provided the additional songs were identified through internet searches and chosen based on careful consideration and discussion among research team members with relevant expertise. The remaining songs in the corpus were chosen to achieve a distribution of 10 songs per year. As prescribed by our methodology, the same artist/group is not represented more than five times. The child metal corpus includes 18 songs by female artists/groups with one or more female members, and 37 songs by artists/groups featuring at least one BIHAP member.
Hip-hop
[6.6] Hip-hop music presents a unique challenge among the four genres considered in this research. Hip-hop is a historically Black genre that has expanded to include artists of Latinx, white, and Indigenous descent. With respect to gender, however, hip-hop music—at least that which garners the most critical acclaim—skews heavily toward cis-gendered, straight males. While female, non-binary, and LGBTQ+ artists have become increasingly visible in hip-hop, they are nonetheless largely left out of historical discussions. For example, in Rolling Stone’s list of “100 Greatest Hip-Hop Songs of All Time” (2017), only 8 songs feature women rapping.
[6.7] This gender imbalance is also evident in sales and streaming figures. While commercial statistics are somewhat more objective than critical “best-of” lists, a variety of underlying factors appear to privilege male artists. As Johnson (2018) has suggested, female MCs might be more likely to be coded as “pop” instead of “hip-hop” on streaming platforms such as Spotify. The trope of hypermasculinization, especially in 1990s hip-hop, created commercial incentive for MCs to play up their masculinity, which further marginalized the contributions of women MCs (Rose 1994). Finally, the rap-sung collaborations that have come to dominate mainstream pop music typically involve male rappers and female singers, further entrenching these roles in mainstream hip-hop music (2021, 180; Duguay 2022; Duinker 2021).
[6.8] The past few years have given rise to several corpus studies of hip-hop music. Ohriner (2016, 2019), Condit-Schultz (2016), Duinker and Martin (2017), and Duinker (2020) all feature different corpora, while Komaniecki (2019) and Connor (2018) use large repertoires to underpin their research (but do not frame these explicitly as corpora). In each study, however, the gender balance is heavily skewed towards male performers.(36) While this can be partly attributed to the sources used for these corpora, many of the authors mentioned here acknowledge the demographic diversity problems inherent in these sources.
[6.9] Our hip-hop parent corpus was generated from Billboard’s “Hot Rap Songs” chart. (We used lists of rap rather than hip-hop because Billboard did not distinguish hip-hop as a named genre in its Year-End lists until 2013.) All songs that reached #1 between 1990–1999 (inclusive) were compiled, totaling 163 unique songs.(37) Songs were encoded chronologically according to the first date they reached #1. To create the child corpus, all songs featuring nonmale artists were retained, and the remaining songs required to reach 10 per year were randomly selected from the male-artist songs. Three years contained fewer than 10 songs total; for these years, the highest-ranking song(s) from that year by a nonmale artist were added to reach the quota of 10 songs per year.(38) That is, rather than supplementing songs from a separate list in order to reach target demographics for gender, we were able to find relevant songs in the initial “Hot Rap Songs” list. Because the parent corpus consisted of all the #1 songs from this list, the entire “Hot Rap Songs” list thus acts as a kind of “grandparent” corpus in this case. The resulting child corpus contains 39 songs with nonmale artists.
Country
[6.10] Among the four genres considered in this study, country music has the fewest marginalized artists by a significant margin. As popular music historian Olivia Carter Mather observes, “Since its commercial beginnings in the 1920s, the country music industry has presented the genre as primarily white by marketing to whites, promoting white artists, and linking traditional instruments to white rustic stereotypes” (2017, 327). The racial qualifier “white” frequently appears in descriptions of country music in industry publications from its formative years (Hammond 2011, 15–16). Mather also documents the exclusion of non-white contributions to country music—“Most [historical] surveys define country as a commercialized entertainment whose primary characteristics have been inherited from the British Isles” (2017, 331)—despite its incorporation of elements drawn from black musical traditions, such as the use of the banjo (Dubois 2016) and 12-bar blues forms, as well as aspects of other musics such as Hawaiian slide guitar, yodeling, and the use of Latin American influences, especially from Mexico, on western swing (Cusic 1999).
[6.11] More recently, the problem of racial identity in country music was foregrounded by the treatment of Lil Nas X’s debut single “Old Town Road” (2019), which was removed from Billboard’s Hot Country Songs list in April 2019 on stylistic grounds for not being “country” enough (Burns 2020, 113). Only when a remix featuring vocals by Billy Ray Cyrus was released (a remix that retained the original track and many of Lil Nas X’s original vocals) did “Old Town Road” again earn a place on the Hot Country Songs list. The irony of these events can be seen in the incorporation of rap and other hip-hop influences in the music of successful white country groups like Florida Georgia Line, who are generally understood as representative of the country genre today. Furthering this irony is the fact that the original version of “Old Town Road”—the one featuring only Lil Nas X—includes no rapping at all.
[6.12] Despite their musical similarities and common influences, rural musics by white artists and by artists of color have historically been segregated for marketing purposes, as mentioned in the introduction — “hillbilly” music for white audiences and “race” music for black audiences, a division supported by the segregationist laws passed by many states in the southern US in the late nineteenth century (Miller 2010, 2–3). By the 1940s, these genres had developed into country & western and rhythm & blues respectively. Billboard established a Hillbilly Hits chart in 1939, changed the name to Country and Western in 1949, and then to Country in 1962.
[6.13] With regard to gender, country music was heavily male-dominated until the rise of country pop, a genre which featured a larger proportion of women artists, in the 1970s. Representation of women in country music reached unprecedented levels in the 1990s but began to decline again in the 21st century. As researcher Jada Watson (2019a, 2019b) has demonstrated, music by artists who identify as women comprises only 10% of country radio airplay and top chart spots. According to Jocelyn Neal (2016), this proportion has varied historically from a low of 0% to a high of about 25%.(39)
[6.14] Our country parent corpus was compiled by assembling the top 20 “Hot Country Songs” from the Billboard Year-End charts for each year from 1990 to 1999, totaling 200 songs. Of these, 41 songs (20.5%) are by artists/groups featuring at least one nonmale member, whereas only 3 songs (1.5%) are by artists/groups featuring at least one member marginalized by race/ethnicity. We retained these songs in the child corpus. Given the paucity of non-white artists in the parent corpus, we also added 11 more songs by non-white artists. To identify supplemental songs for the country child corpus, we began by demographically surveying the top 25 Billboard country songs for each relevant year, plus several “best of” country lists found online. However, these methods did not suggest any new songs by nonwhite artists. Consequently, we then searched “country artists” combined with keywords including “Black,” “Indigenous,” “of color,” and “diverse” to identify 11 songs to add to the corpus. Of the 14 songs by artists of color in the country child corpus, seven songs charted in the top 20 on weekly charts; the other seven were lower-ranked, but mostly within the top 50 and all within the top 100. While one of these songs was a duet between a man and a woman, all of the other artists were men, so we added one more song by a woman artist, totaling 42 songs featuring women artists. The remaining songs for the child corpus were systematically selected for even distribution, totaling 10 songs for each year.
Pop
[6.15] As a genre, pop is associated with the musical mainstream but is not synonymous with it. For example, pop is often found in Billboard’s Hot 100 lists, but the Hot 100 does not consist of pop exclusively. Using iTunes genre classifications, Smith et al. (2020) found that 41.2% of 800 songs on Hot 100 Year-End Billboard charts (2012–2019) were labeled as pop. With the same method, we determined that 36.4% of the 500 songs from Billboard’s “Top Songs of the 90s” were classified as pop.(40)
[6.16] Though still often underrepresented, women artists may be better represented in pop compared to other genres. Smith et al. (2020) examined gender representation in the Hot 100 corpus described above, finding that 32.6% of the subgroup of songs classified as pop were by women artists, a notable increase from the average representation of 21.7% across all genres. Epps-Darling, Bouyer, & Cramer (2020) similarly calculated gender representation from Spotify streaming data. They observed that songs by nonmale artists accounted for only 21–24% of streams. Among these data, the genre of pop came closest to achieving equitable representation of women, with around 40% of artists including at least one woman group member. While electronica and R&B showed similar proportions, the majority of other genres displayed much greater gender inequalities.
[6.17] Although some published literature emphasizes the interaction between race and genre, particularly in relation to Black-coded genres such as R&B and hip-hop, there is relatively little information about how artists marginalized by race and/or ethnicity fare in pop music specifically. Lafrance, Scheibling, Burns, and Durr (2018) tracked race in Billboard Top 100 Single Sales charts and Airplay charts from 1997–2007 and found that Black artists contributed 45% of the songs; however, they note that representation is tightly linked with the genres of R&B and hip-hop. Smith et al. (2020) found that among the pop songs in their corpus, 35% of songs were by artists from underrepresented racial/ethnic groups.
[6.18] While more opportunities may be present for women artists in pop, the genre itself has been framed by the sexism inherent in the binary between pop and rock, which complicates the picture. Music scholars have noted the problematic “feminine” associations of pop as a genre, particularly in relation to the “masculine” associations of rock (Coates 1997, Davies 2001, Schulze et al. 2019). In the context of this binary opposition, rock is privileged through its associations with authenticity and “art,” while pop is disparagingly associated with commercialism and mass culture. For example, songs by Black women may be more likely to be categorized as pop or R&B, even when they are stylistically aligned with hip-hop; see, for example, the earlier discussion of Cardi B’s “Bodak Yellow” (Johnson 2018). Representation in pop is also problematized by biased practices that limit opportunities for artists based on gender and race (see Donze 2011 for evidence that women and minority artists have access to a more limited range of commercially viable artist persona types). Furthermore, many available lists for potential popular-music corpora do not accurately represent the success of marginalized artists. For example, pointing specifically to the enthusiasm for women artists manifested in popular culture in the late 90s, scholars have argued that this apparent upsurge of successful women in music is not reflected in the Billboard Top 40 charts (Lafrance, Worcester, and Burns 2011) or in the Billboard Top 200 Album charts (Wells 2001).
[6.19] Because of the issues surrounding pop as a distinct genre, our pop parent corpus derivation process began with Billboard’s “Top Songs of the 90s,” which includes 500 songs from a variety of genres. Song titles were filtered by genre using iTunes, resulting in a parent corpus of 182 songs. Within the parent corpus, 60.4% of the songs were by women artists or groups that included at least one woman, while 44.0% of the songs in the parent pop corpus were by artists/groups that included members marginalized by race/ethnicity. Because these proportions exceeded our demographic targets, we sampled the child corpus based on song ranking, selecting the top 10 ranking songs from each year of the decade. The final child corpus included 62 songs with nonmale artists and 45 songs with BIHAP artists.(41)
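The rank-based sampling step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure as actually implemented for these corpora: the field names (`year`, `rank`, `nonmale`, `bihap`), the toy records, and the helper functions are all hypothetical.

```python
from collections import defaultdict


def sample_top_per_year(songs, per_year=10):
    """Keep the `per_year` best-ranked songs from each year (lower rank = better)."""
    by_year = defaultdict(list)
    for song in songs:
        by_year[song["year"]].append(song)
    child = []
    for year in sorted(by_year):
        ranked = sorted(by_year[year], key=lambda s: s["rank"])
        child.extend(ranked[:per_year])
    return child


def proportion(songs, flag):
    """Share of songs whose boolean field `flag` is True."""
    if not songs:
        return 0.0
    return sum(1 for s in songs if s[flag]) / len(songs)


# Hypothetical toy parent corpus; every value here is invented for illustration.
parent = [
    {"title": "A", "year": 1990, "rank": 3, "nonmale": True, "bihap": False},
    {"title": "B", "year": 1990, "rank": 1, "nonmale": False, "bihap": True},
    {"title": "C", "year": 1990, "rank": 2, "nonmale": True, "bihap": True},
    {"title": "D", "year": 1991, "rank": 2, "nonmale": False, "bihap": False},
    {"title": "E", "year": 1991, "rank": 1, "nonmale": True, "bihap": False},
    {"title": "F", "year": 1991, "rank": 5, "nonmale": False, "bihap": True},
]

# Two songs per year keeps the toy example small; the text uses ten per year.
child = sample_top_per_year(parent, per_year=2)
print([s["title"] for s in child])
print(proportion(child, "nonmale"), proportion(child, "bihap"))
```

Checking the demographic proportions of the child corpus after sampling, as in the last line, shows whether rank-based selection alone still meets a given target or whether supplemental songs are needed.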
7. Corpus Analysis III: The Gendered Prechorus
[7.1] To show the potential utility of the demographic encodings in these corpora, Examples 10 and 11 analyze some aspects of form within these genres. Student encoders divided each song into formal zones, relying primarily on definitions from Open Music Theory Version 1’s “Form in pop/rock music” section (Shaffer et al. 2018). Information about the instrumentation, texture, and timbre used in each zone was also annotated; this information will be used in the forthcoming TiPS corpus study.(42)
Example 10. Distribution of prechorus sections in each genre overall
[7.2] Example 10 shows the relative proportions of prechoruses within each genre. The example also shows the expected proportions of prechoruses if they were distributed uniformly among genres (i.e., 5% of the formal annotations in the four genres as a whole used prechoruses, and a binomial distribution was calculated around that percentage). The pop genre uses significantly more prechoruses than expected, while metal and hip-hop use fewer. This observation aligns with other research on this section: Summach (2011), for instance, identifies the prechorus as arising from phrase types used in ’70s and ’80s rock and pop, and flourishing in the top hits of the 1990s. Conversely, it makes sense that genres eschewing mainstream musical norms, like metal, or with distinct stylistic lineages, like hip-hop, feature comparatively fewer prechoruses. On the other hand, we expect a genre like country, which explicitly incorporated elements of pop during the 1990s (Neal 2021, Ross 2015), to use prechoruses fairly often.
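The binomial comparison described above can be illustrated with a short Python sketch. It assumes a pooled prechorus rate of 5% (from the text) and computes a central window of expected counts for a genre with a given number of formal annotations. The 95% coverage level, the annotation totals, and the observed counts are assumptions made for illustration; the text does not specify them.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expected_window(n, p, coverage=0.95):
    """Central interval (low, high) of counts under Binomial(n, p).

    low is the smallest count whose cumulative probability reaches the
    lower tail; high is the smallest count whose cumulative probability
    reaches one minus the upper tail. Adequate for modest n; a statistics
    library is preferable for very large counts.
    """
    tail = (1 - coverage) / 2
    cumulative = 0.0
    low, high = None, n
    for k in range(n + 1):
        cumulative += binomial_pmf(k, n, p)
        if low is None and cumulative >= tail:
            low = k
        if cumulative >= 1 - tail:
            high = k
            break
    return low, high


def classify(observed, n, p, coverage=0.95):
    """Label an observed count relative to the expected window."""
    low, high = expected_window(n, p, coverage)
    if observed < low:
        return "fewer than expected"
    if observed > high:
        return "more than expected"
    return "within expected window"


# Hypothetical annotation totals and prechorus counts (not the study's data).
pooled_rate = 0.05
annotations = {"pop": 400, "metal": 500}
prechoruses = {"pop": 35, "metal": 10}
for genre, n in annotations.items():
    print(genre, expected_window(n, pooled_rate), classify(prechoruses[genre], n, pooled_rate))
```

A genre whose observed count falls outside the window would be read as using prechoruses more or less often than a uniform distribution across genres predicts.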
Example 11. Distribution of prechorus sections divided by gender
[7.3] However, prechoruses are not used equally often by all-male versus not-all-male groups. Example 11 shows the average number of prechoruses per song in each genre, divided into groups consisting only of men and those with at least one nonmale member. (The ratio was calculated as the number of times a prechorus was used in a genre, including repetitions, divided by the number of songs in that genre, both delineated by gender categorization.) The black lines show the windows of expected prechorus distributions if gender did not matter. In pop, all-male groups use more prechoruses, but this variation is within the window of expected variation around an average. However, in both metal and country, groups with nonmale artists use significantly more prechoruses than all-male groups.(43)
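The ratio defined in the parenthetical above (prechorus occurrences, including repetitions, divided by the number of songs, within each genre and gender category) can be sketched as follows. The record layout, field names (`genre`, `all_male`, `sections`), and toy songs are hypothetical and stand in for the corpus encodings.

```python
from collections import defaultdict


def prechoruses_per_song(songs):
    """Map (genre, group) to the mean number of prechorus sections per song."""
    section_counts = defaultdict(int)
    song_counts = defaultdict(int)
    for song in songs:
        group = "all-male" if song["all_male"] else "not-all-male"
        key = (song["genre"], group)
        song_counts[key] += 1
        # Count every occurrence, so repeated prechoruses are included.
        section_counts[key] += sum(1 for label in song["sections"] if label == "prechorus")
    return {key: section_counts[key] / song_counts[key] for key in song_counts}


# Hypothetical toy annotations; not drawn from the corpora discussed here.
toy_songs = [
    {"genre": "pop", "all_male": True,
     "sections": ["verse", "prechorus", "chorus", "verse", "prechorus", "chorus"]},
    {"genre": "pop", "all_male": True, "sections": ["verse", "chorus"]},
    {"genre": "pop", "all_male": False, "sections": ["verse", "prechorus", "chorus"]},
    {"genre": "country", "all_male": False,
     "sections": ["verse", "prechorus", "chorus", "prechorus", "chorus",
                  "bridge", "prechorus", "chorus"]},
    {"genre": "country", "all_male": True, "sections": ["verse", "chorus"]},
]

ratios = prechoruses_per_song(toy_songs)
for key in sorted(ratios):
    print(key, ratios[key])
```

Because songs without any prechorus still count in the denominator, the ratio reflects how common the section is across a whole category rather than only among songs that use it.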
[7.4] Again, this quantitative analysis potentially resonates with qualitative musicological analysis. As we noted above, pop music has been heavily gendered as female, so country artists who are women could have felt more pressure to adopt practices from the pop genre, especially as the country charts saw an increase in mainstream consumption in the 1990s (Neal 2021). Because metal has been so heavily dominated by male groups, women artists may have been less drawn to follow the formal norms of that corpus.(44)
[7.5] But while these connections are provocative, our central point is less about their substance and more about their existence: these are the types of trends that may be invisible in a corpus that does not promote marginalized voices. In other words, a more demographically complete representation of a musical tradition provides more opportunities for a corpus analysis to detect trends, influences, and sub-styles, especially those that intersect with the identities of the artists operating within that genre. In what follows, we return to the notion of genre itself, and argue that our approach leverages important aspects of that concept to support its anti-discriminatory priorities.
8. Reconsidering Genre
[8.1] Earlier in this paper, we described two forces behind corpus creation, with one human-facing and the other music-facing. While the latter strategy groups pieces by their musical similarities, the former groups pieces that a group of humans treat as similar. The latter approach might define a corpus of “country music” as a group of pieces with similar instrumentation, melodic construction, and harmonic usage, while the former approach could define that genre as music people listen to when they think they are consuming “country music.” We argue that if the social forces motivating this human definition include racist, sexist, or other exclusionary pressures, then the resulting corpus will replicate that discriminatory practice. Counteracting these practices in corpus creation is more accurately representative: by expanding the identities of the artists included in a corpus through the ADAS, an analysis will be better situated to identify the variation and diversity within a given musical practice, as we illustrated in our analysis of the interactions among genre, gender, and the prevalence of prechoruses.
[8.2] We end this essay by returning to the thorny topic of musical genre and the role it can play in an anti-discriminatory approach to corpus construction. We assert that notions of musical genre are mutable enough that our ADAS produces child corpora well within the boundaries of each dataset’s intended genre. Indeed, we believe that we are leveraging the flexibility of musical genres to support anti-discriminatory priorities while reflecting the wider diversity of practices within these genres.
The Utility and Futility of Genre
[8.3] As a social construct, a musical genre is not a fixed category.(45) Fundamentally, this mutability is because individual consumers contribute to the definition and constituency of a genre, and these consumers may all have slightly different understandings, preferences, and approaches to a single generic category (Gjerdingen and Perrott 2008, Brackett 2016). While songs within a genre must share certain stable musical characteristics in order for the genre to be recognizable to consumers—listeners need some way of distinguishing songs that are included and excluded from some genre (Dahlhaus 1982, Kallberg 1988)—different listeners will understand and interact with these songs and their structural characteristics differently. In this sense, genre is not only “in the music” but also in the minds and bodies of groups of people who share certain social conventions (Holt 2007, 2). These conventions are themselves created in relation to songs and artists and the contexts in which they are marketed, performed, and experienced.(46)
[8.4] Yet consumers have only limited control over those contexts. It is, after all, the record labels, trade associations, and the groups themselves that package and market the music, and they do so in ways designed to attract the attention and capital of particular groups of consumers. Genre, of course, plays a role in this scheme. For instance, through Billboard’s chart infrastructure, genre labels were and are largely a function of commercial maneuvering. As Simon Frith writes, genre is a means to “link a music to its market” (1996, 75). However, even a force as hegemonic as global capitalism can have variations and disagreements around the constituency of genres (Ritchey 2019), and nowhere is this more starkly apparent than in how genre definitions change over time (Brackett 2016). Genres usually assume their nascent forms in relation to a catalyzing context from which they are differentiated, gain their identities as outgrowths of other larger genres, and may even at some point drift back into the mainstream. For example, “country music” meant something different in 1953 than it did in 1975, and something different again in 2024.(47) Furthermore, how we retrospectively apply the term to those years differs from how the term was applied then.(48)
[8.5] These observations paint a picture of “genre” as an amorphous and moving target—more a vapor than a solid. But even if the boundaries of a genre are shifting and permeable, the concept is still extremely useful for corpus analysis. We can still identify groups of songs that are treated in a consistent way by consumers and the music industry as subsets of some genre. Any observations concerning consistent musical schemata in that subset can inform a discussion of generic “norms” while stopping short of enshrining these observations as generic rules or criteria.(49) Corpora can thus be investigated for their shared musical characteristics, which might be read as identifiable markers for that genre by a particular group of people at a particular place and time.
[8.6] In our estimation, however, the many challenges surrounding genre are not a bug but a feature, which our ADAS exploits. On one hand, such a nebulous understanding of genre shifts emphasis away from generic criteria and rules (things music in a genre must do) and instead focuses on tendencies and norms (things music in a genre may do). On the other hand, such an approach can expose the soft borders around genre categories as they are currently imposed by mainstream labels, magazines, and streaming platforms. Given that many genre labels are inextricable from implicit racist undertones, using musical data to undermine the utility of these labels can help to subvert their ubiquity in commercial apparatuses.
[8.7] But most importantly, acknowledging the mutability of genre affords the freedom to embrace our anti-discriminatory priorities. If a genre can be instantiated in different ways, then that leeway allows us to focus on those ways that better express more equitable representation. Our analyses show that any definition of genre, and the related selection of a corpus’s constituency, is a political act insomuch as generic delineations imply the social forces supporting that boundary. We therefore imagine the ADAS as leveraging the flexibility inherent in musical genres in order to produce more socially responsible corpora.
[8.8] Finally, selecting a generic subset using anti-discriminatory tactics feeds back into the internal musical characteristics of a genre, and mitigates the “overfitting” problem in corpus creation around musical genres (London 2022). By invoking the ADAS, we intentionally move a corpus away from some singular, nuclear understanding of what it represents, and instead sample a wider variety of musical practices present within the genre. However, by not abandoning the concept of genre, we also guard against “underfitting” our data, as we tether our corpus selections to some shared musical tradition or practice. The trends and norms present within a corpus reflect this larger practice, rather than circumscribed, and potentially overfitted, statistical tendencies.
[8.9] In sum, the ADAS does not redefine musical genres or reimagine musical categories, but rather takes advantage of the natural ambiguity and flexibility of genre to steer corpus creation towards greater representation. The logic is: musical genres triangulate musical and cultural similarities, but these cultural similarities can change depending on individual consumer preferences, particular marketing strategies, historical context, and other variables. Additionally, as we have shown, this cloud of similar-yet-variable cultural preferences may be influenced by racist, misogynist, xenophobic, and other discriminatory stances, and therefore an anti-discriminatory corpus can select music from the generic nebula that still represents a subset of the corpus while minimally replicating the discriminatory posture of the cultural forces associated with the wider genre. This selection provides a better understanding of the internal musical characteristics of the genre by including the musical practices of a wider variety of humans who contributed to that tradition.
What the ADAS does not do
[8.10] Finally, it is important to note the limits of our approach. The ADAS, especially in our implementation, is an imperfect solution to a complex issue. Although confronting historical sexism and racism can be an important way of contending with the past, we do not pretend that the very attention of our analysis is not complicated in and of itself. In a similar vein, our method of constructing parent corpora still relies on notions of popularity and renown, and while we stand by the mitigating solutions we outline in this paper, we are nonetheless relying on the discriminatory past of these traditions in aspects of our corpus derivation. We are certainly not arguing that the ADAS is the only way to engage in socially responsible corpus construction; we only mean to outline a possible way to do so.(50)
[8.11] Crucially, there are research questions for which the ADAS would not be appropriate. As White (2022) argues, a musical corpus analysis is often imagined as the nexus of a feedback loop between communities of listeners and composers/producers, with listeners learning what musical norms to expect by being exposed to a musical corpus and composers exploiting the expectations of their audience for expressive purposes. In our conception of the ADAS, however, we are constructing a corpus as an array of music that could have been heard by an audience being exposed to a particular genre in the 1990s, and music that was written by songwriters attached to that genre in that decade. Our corpora therefore cannot claim to capture the most likely listening experience of a casual mainstream listener in the 1990s. Such an experience would have been defined by radio play, marketing, media coverage, and similar forces, all of which include discriminatory elements. Instead, this experience would be best represented by a corpus of the most-popular, most-purchased, and most-respected music in some genre. In other words, our method is not appropriate for analysts studying the lived experience of listeners ensconced in a culture’s discriminatory forces. Such an analysis, though, would have to grapple with the inherent racism, misogyny, and other discriminatory forces that curated that experience.
[8.12] We intend our approach to be a tool to advocate for marginalized identities that is generalizable across musical repertoires. While our current focus is on popular music genres and their representations of artists with marginalized gender, racial, and ethnic identities, the ADAS can be modified to address other identities in other musical traditions. Indeed, we hope this essay will challenge the field of music theory to be more intentional and accountable when selecting musical datasets for research purposes. We view the ADAS as an important contribution to empirical music research. To our minds, even if an act of music analysis does not wear its politics on its metaphorical sleeve, the choice of objects for analysis and the selection of methods are implicitly political (Ewell 2020; Palfy and Gilson 2018). Our approach is one way to harness these implicit forces for anti-discriminatory purposes and socially responsible research.
Acknowledgements
This research has been supported by funding from the Analysis, Creation, and Teaching of Orchestration (ACTOR) Project, the Social Sciences and Humanities Research Council of Canada (SSHRC), and the Fonds de recherche du Québec - Société et culture (FRQSC).
Nicholas Shea
Arizona State University
50 E Gammage Pkwy
Tempe, AZ 85257
njshea@asu.edu
Lindsey Reymore
Arizona State University
McGill University
50 E Gammage Pkwy
Tempe, AZ 85257
LREYMORE@ASU.EDU
Christopher Wm. White
UMass Amherst
151 Presidents Dr
Amherst, MA 01003
cwmwhite@umass.edu
Ben Duinker
Schulich School of Music
McGill University
555 Sherbrooke St. West
Montreal QC, Canada
H3A 1E3
benjamin.duinker@mail.mcgill.ca
Leigh VanHandel
University of British Columbia
6361 Memorial Rd
Vancouver, BC V6T 1Z2
leigh.vanhandel@ubc.ca
Matthew Zeller
Musical Instrument Museum
4725 E. Mayo Blvd.
Phoenix, Arizona 85050
Matthew.Zeller@mim.org
Nicole Biamonte
McGill University
Schulich School of Music
555 rue Sherbrooke O.
Montreal, QC H3A 1E3
nicole.biamonte@mcgill.ca
Works Cited
Agawu, Kofi. 2022. “Music Studies in Crisis? Notes and Queries on Reframing Music Theory.” Society for Music Analysis Colloquium, online, March 31, 2022. https://www.sma.ac.uk/2022/01/music-studies-in-crisis-notes-and-queries-on-reframing-music-theory/.
Applebaum, Barbara. 2008. “White Privilege/White Complicity: Connecting ‘Benefiting From’ To ‘Contributing To?’” Philosophy of Education 64: 292–300. https://doi.org/10.47925/2008.292.
Arthur, Claire. 2016. “A Corpus Approach to the Classification of Non-chord Tones Across Genres.” Proceedings of the 14th International Society for Music Perception and Cognition, 74–76.
Arthur, Claire, and Nathaniel Condit-Schultz. 2023. “The Coordinated Corpus of Popular Musics (CoCoPops): A Meta-Corpus of Melodic and Harmonic Transcriptions.” In Proceedings of the 24th International Society for Music Information Retrieval Conference, 239–246.
Auslander, Philip. 2004. “I Wanna Be Your Man: Suzi Quatro’s Musical Androgyny.” Popular Music 23 (1): 1–16. https://doi.org/10.1017/S0261143004000030.
Barna, Alyssa. 2019. “Examining Contrast in Rock and Popular Music.” PhD diss., Eastman School of Music.
—————. 2020. “The Dance Chorus in Recent Top-40 Music.” SMT-V 6.4. http://doi.org/10.30535/smtv.6.4.
Biamonte, Nicole. 2010. “Triadic Modal and Pentatonic Patterns in Rock Music.” Music Theory Spectrum 32 (2): 95–110. https://doi.org/10.1525/mts.2010.32.2.95.
—————. 2014. “Formal Functions of Metric Dissonance in Rock Music.” Music Theory Online 20 (2).
Binder, Amy. 1993. “Constructing Racial Rhetoric: Media Depictions of Harm in Heavy Metal and Rap Music.” American Sociological Review 58 (6): 753–67. https://doi.org/10.2307/2095949.
Boyd, Clifton. 2020. “Being a Black Ph.D. Student Following George Floyd’s Murder.” Inside Higher Ed, June 11, 2020. insidehighered.com/advice/2020/06/11/black-phd-student-describes-having-balance-his-career-prospects-responding-racial.
Brackett, David. 2016. Categorizing Sound. University of California Press. https://doi.org/10.1525/california/9780520248717.001.0001.
Bradley, Adam, and Andrew Dubois, eds. 2010. The Anthology of Rap. Yale University Press.
Burgoyne, John Ashley. 2012. “Stochastic Processes and Data-Driven Musicology.” PhD diss., McGill University.
Burgoyne, John Ashley, Jonathan Wild, and Ichiro Fujinaga. 2011. “An Expert Ground Truth Set for Audio Chord Recognition and Music Analysis.” In Proceedings of the 12th International Society for Music Information Retrieval Conference, ed. Anssi Klapuri and Colby Leider, 633–38.
Burns, Chelsea. 2020. “The Racial Limitations of Country-Soul Crossover in Bobby Womack’s BW Goes C&W, 1976.” Journal of Popular Music Studies 32 (2): 112–27. https://doi.org/10.1525/jpms.2020.32.2.112.
Chander, Aditya, and Ian Quinn. 2023. “The Decline of Harmonic Schemata in Popular Music Chord Loops.” Proceedings of the 17th International Society for Music Perception and Cognition. https://easychair.org/publications/preprint/vRwsT.
Clifford-Napoleone, Amber R. 2015. Queerness in Heavy Metal Music: Metal Bent. Routledge. https://doi.org/10.4324/9781315851723.
Coates, Norma. 1997. “(R)evolution Now? Rock and the Political Potential of Gender.” In Sexing the Groove: Popular Music and Gender, ed. Sheila Whiteley, 50–64. Routledge.
Condit-Schultz, Nathaniel. 2016. “MCFlow: A Digital Corpus of Rap Transcriptions.” Empirical Musicology Review 11 (2): 127–47. https://doi.org/10.18061/emr.v11i2.4961.
Connor, Martin. 2018. The Musical Artistry of Rap. McFarland.
Cornell, Stephen, and Douglas Hartmann. 2006. Ethnicity and Race: Making Identities in a Changing World. Pine Forge Press.
Crenshaw, Kimberlé. 1989. “Demarginalizing the Intersection of Race and Sex: A Black Feminist Critique of Antidiscrimination Doctrine, Feminist Theory and Antiracist Politics.” University of Chicago Legal Forum 1989 (1): 139.
Cusic, Don. 1999. “Latin America and Country Music.” Journal of Popular Culture 33 (3): 39–47. https://doi.org/10.1111/j.0022-3840.1999.3303_39.x.
Dahlhaus, Carl. 1982. Esthetics of Music. Translated by William Austin. Cambridge University Press.
Davies, Helen. 2001. “All Rock and Roll is Homosocial: The Representation of Women in the British Rock Music Press.” Popular Music 20 (3): 301–19. https://doi.org/10.1017/S0261143001001519.
De Clercq, Trevor. 2017. “Interactions between Harmony and Form in a Corpus of Rock Music.” Journal of Music Theory 61 (2): 143–70.
—————. 2020. “The Musicians Behind the Monsters.” Popular Music Interest Group Panel Discussion, Society for Music Theory annual meeting (online). http://www.midside.com/presentations/declercq_2020_pmig_smt_text.pdf.
De Clercq, Trevor, and David Temperley. 2011. “A Corpus Analysis of Rock Harmony.” Popular Music 30 (1): 47–70.
Devaney, Johanna. 2019. “Eugenics and Musical Talent: Exploring Carl Seashore’s Work on Talent Testing and Performance.” American Music Review 48 (2): 6.
Devaney, Johanna, Claire Arthur, Nathaniel Condit-Schultz, and Kirsten Nisula. 2015. “Theme And Variation Encodings with Roman Numerals (TAVERN): A New Data Set for Symbolic Music Analysis.” In Proceedings of the International Society of Music Information Retrieval (ISMIR), Málaga, Spain: 728–34.
Donze, Patti Lynne. 2011. “Popular Music, Identity, and Sexualization: A Latent Class Analysis of Artist Types.” Poetics 39 (1): 44–63. https://doi.org/10.1016/j.poetic.2010.11.002.
Dubois, Laurent. 2016. The Banjo: America’s African Instrument. Harvard University Press. https://doi.org/10.4159/9780674968813.
Duguay, Michèle. 2021. “Gendering the Virtual Space: Sonic Femininities and Masculinities in Contemporary Top 40 Music.” PhD diss., City University of New York.
Duguay, Michèle. 2022. “Analyzing Vocal Placement in Recorded Virtual Space.” Music Theory Online 28.4. https://mtosmt.org/issues/mto.22.28.4/mto.22.28.4.duguay.html.
Duinker, Ben. 2019. “Plateau Loops and Hybrid Tonics In Recent Pop Music.” Music Theory Online 25 (4). https://doi.org/10.30535/mto.25.4.3.
—————. 2020. “Diversification and Post-Regionalism in North American Hip-Hop Flow.” PhD diss., McGill University.
—————. 2021. “Song Form and Mainstreaming in Hip-Hop Music.” Current Musicology 107: 93–135. https://doi.org/10.52214/cm.v107i.7177.
Duinker, Ben, and Denis Martin. 2017. “In Search of the Golden-Age Hip-Hop Sound (1986–1996).” Empirical Musicology Review 12 (1–2): 80–100. https://doi.org/10.18061/emr.v12i1-2.5410.
Elflein, Dietmar. 2010. Schwermetallanalysen: Die Musikalische Sprache des Heavy Metal. Transcript Verlag. https://doi.org/10.1515/transcript.9783839415764.
Epps-Darling, Avriel, Romain Takeo Bouyer, and Henriette Cramer. 2020. “Artist Gender Representation in Music Streaming.” In Proceedings of the 21st International Society for Music Information Retrieval (ISMIR) Conference, Montreal QC: 248–54.
Ewell, Philip. 2020. “Music Theory and the White Racial Frame.” Music Theory Online 26 (2). https://doi.org/10.30535/mto.26.2.4.
Fabbri, Franco. 2004 (1982). “A Theory of Musical Genres: Two Applications.” In Popular Music: Critical Concepts in Media & Cultural Studies, vol. 3, ed. Simon Frith, 7–35. Routledge.
Fétis, François-Joseph. 1867. “Sur un nouveau mode de classification des races humaines d’après leurs systèmes musicaux.” Bulletin de la Société d’Anthropologie de Paris, February 21, 1867, 134–46.
Frith, Simon. 1996. Performing Rites: On the Value of Popular Music. Harvard University Press. https://doi.org/10.1093/oso/9780198163329.001.0001.
Gawboy, Anna. 2016. “The Non-Neutrality of Part Writing.” Presentation at Michigan State University, East Lansing, MI.
George, Nelson. 1982. “Black Music Charts: What’s in a Name?” Billboard 94 (25): 10, 43.
Geyer, Benjamin. 2014. “All Things Being Equal: The Problem of Reduction in Second Practice Jazz.” Jazz Interest Group, Society for Music Theory annual meeting, Milwaukee, WI.
Gjerdingen, Robert O. and David Perrott. 2008. “Scanning the Dial: The Rapid Recognition of Music Genres.” Journal of New Music Research 37 (2): 93–100. https://doi.org/10.1080/09298210802479268.
Glenn, Evelyn Nakano. 2002. Unequal Freedom: How Race and Gender Shaped American Freedom and Labor. Harvard University Press.
Hammond, Angela Denise. 2011. “Color Me Country: Commercial Country Music and Whiteness.” PhD diss., University of Kentucky.
Hartmann, Graham. 2018. “10 Greatest Metal Songs of the 90s (Year by Year).” Loudwire. Accessed May 5, 2022. https://loudwire.com/10-greatest-metal-songs-1990s-year-by-year/?utm_source=tsmclip&utm_medium=referral.
Hauck, Fern R., Kawai O. Tanabe, and Rachel Y. Moon. 2011. “Racial and Ethnic Disparities in Infant Mortality.” Seminars in Perinatology 35 (4): 209–20. https://doi.org/10.1053/j.semperi.2011.02.018.
Heetderks, David. 2023. “The ‘Rebuff Chorus’ in 1960–2000 Pop Music.” Gamut 11 (1).
Holt, Fabian. 2007. Genre in Popular Music. The University of Chicago Press. https://doi.org/10.7208/chicago/9780226350400.001.0001.
Hudson, Stephen S. 2021. “Compound AABA Form and Style Distinction in Heavy Metal.” Music Theory Online 27 (1). https://doi.org/10.30535/mto.27.1.5.
Huron, David. 2013. “On the Virtuous and the Vexatious in an Age of Big Data.” Music Perception 31 (1): 4–9. https://doi.org/10.1525/mp.2013.31.1.4.
Johnson, Thomas. 2018. “Analyzing Genre in Post-Millennial Popular Music.” PhD diss., City University of New York.
Jones, Simon. 2018. “Kerrang! Magazine and the Representation of Heavy Metal Masculinities (1981–95).” Metal Music Studies 4 (3): 459–80. https://doi.org/10.1386/mms.4.3.459_1.
Kallberg, Jeffrey. 1988. “The Rhetoric of Genre: Chopin’s Nocturne in G Minor.” 19th-Century Music 11 (3): 238–61. https://doi.org/10.2307/746322.
Kendi, Ibram X. 2019. How to Be an Antiracist. One World.
Komaniecki, Robert. 2019. “Analyzing the Parameters of Flow in Rap Music.” PhD diss., Indiana University.
Korzeniowski, Filip, and Gerhard Widmer. 2018. “Genre-Agnostic Key Classification with Convolutional Neural Networks.” In Proceedings of the 19th International Society for Music Information Retrieval Conference, 264–70.
Lafrance, Marc, Casey Scheibling, Lori Burns, and Jean Durr. 2018. “Race, Gender, and the Billboard Top 40 Charts Between 1997 and 2007.” Popular Music and Society 41 (5): 522–38. https://doi.org/10.1080/03007766.2017.1377588.
Lafrance, Marc, Lara Worcester, and Lori Burns. 2011. “Gender and the Billboard Top 40 Charts Between 1997 and 2007.” Popular Music and Society 34 (5): 557–70. https://doi.org/10.1080/03007766.2010.522827.
Lena, Jennifer C., and Richard Peterson. 2008. “Classification as Culture: Types and Trajectories of Music Genres.” American Sociological Review 73 (5): 697–718. https://doi.org/10.1177/000312240807300501.
Léveillé Gauvin, Hubert. 2015. “‘The Times They Were A-Changin’: A Database-Driven Approach to the Evolution of Harmonic Syntax in Popular Music from the 1960s.” Empirical Musicology Review 10 (3).
LoadedRadio. 2021. “The Top 13 Female Fronted Heavy Metal Bands of All Time As Voted By You.” LoadedRadio.com: The Hard Rock and Metal Station, October 19, 2021. https://www.loadedradio.com/the-top-13-female-fronted-heavy-metal-bands-of-all-time-as-voted-by-you/.
London, Justin. 2013. “Building a Representative Corpus of Classical Music.” Music Perception 31 (1): 68–90. https://doi.org/10.1525/mp.2013.31.1.68.
—————. 2022. “A Bevy of Biases: How Music Theory’s Methodological Problems Hinder Diversity, Equity, and Inclusion.” Music Theory Online 28 (1). https://doi.org/10.30535/mto.28.1.4.
Loudwire Staff. 2020. “Top 90 Hard Rock + Metal Albums of the ’90s.” Loudwire. Accessed May 5, 2022. https://loudwire.com/top-hard-rock-metal-albums-1990s.
Lynxwiler, John, and David Gay. 2000. “Moral Boundaries and Deviant Music: Public Attitudes Toward Heavy Metal and Rap.” Deviant Behavior 21 (1): 63–85. https://doi.org/10.1080/016396200266388.
Marchese, David. 2023. “Jann Wenner Defends His Legacy, and His Generation’s.” The New York Times, September 15, 2023, sec. Arts. https://www.nytimes.com/2023/09/15/arts/jann-wenner-the-masters-interview.html.
Marsden, Alan. 2022. “Reliability and Validity of Research with Corpora of Music.” In The Oxford Handbook of Music and Corpus Studies, ed. Daniel Shanahan, John Ashley Burgoyne, and Ian Quinn. Oxford University Press. https://doi.org/10.1093/oxfordhb/9780190945442.013.7.
Mather, Olivia Carter. 2017. “Race in Country Music Scholarship.” In The Oxford Handbook of Country Music, ed. Travis Stimeling, 327–54. Oxford University Press. https://doi.org/10.1093/oxfordhb/9780190248178.013.8.
Miles, Scott A., David S. Rosen, and Norberto M. Grzywacz. 2017. “A statistical analysis of the relationship between harmonic surprise and preference in popular music.” Frontiers in Human Neuroscience 11: 263.
Miles, Scott A., David S. Rosen, Shaun Barry, David Grunberg, and Norberto Grzywacz. 2021. “What to expect when the unexpected becomes expected: harmonic surprise and preference over time in popular music.” Frontiers in Human Neuroscience 15: 201.
Miller, Karl Hagstrom. 2010. Segregating Sound: Inventing Folk and Pop Music in the Age of Jim Crow. Duke University Press. https://doi.org/10.2307/j.ctv125jq7b.
Moore, Allan F. 2001. “Categorical Conventions in Music Discourse: Style and Genre.” Music & Letters 82 (3): 432–42. https://doi.org/10.1093/ml/82.3.432.
Neal, Jocelyn. 2016. “Why ‘Ladies Love Country Boys’: Gender, Class, and Economics in Contemporary Country Music.” In Country Boys and Redneck Women: New Essays in Gender and Country Music, ed. Diane Pecknold and Kristine M. McCusker, 21–43. University Press of Mississippi.
—————. 2021. “Country Music.” Grove Music Online. Oxford University Press. https://doi.org/10.1093/gmo/9781561592630.article.A2224075.
Negus, Keith, and Pete Astor. 2022. “Authenticity, Empathy, and the Creative Imagination.” Rock Music Studies 9 (2): 157–173. https://doi.org/10.1080/19401159.2021.1989272.
Nobles, Melissa. 2000. Shades of Citizenship. Stanford University Press. https://doi.org/10.1515/9780804780131.
Ohriner, Mitchell. 2016. “Metric Ambiguity and Flow in Rap Music: A Corpus-Assisted Study of Outkast’s ‘Mainstream’ (1996).” Empirical Musicology Review 11 (2): 153–79. https://doi.org/10.18061/emr.v11i2.4896.
—————. 2019. Flow: The Rhythmic Voice in Rap Music. Oxford University Press. https://doi.org/10.1093/oso/9780190670412.001.0001.
—————. 2022. “Corpus Approaches to Hip-Hop.” In The Oxford Handbook of Music and Corpus Studies, edited by Daniel Shanahan, John Ashley Burgoyne, and Ian Quinn, 1st ed. Oxford University Press. https://doi.org/10.1093/oxfordhb/9780190945442.013.23.
Palfy, Cora S., and Eric Gilson. 2018. “The Hidden Curriculum in the Music Theory Classroom.” Journal of Music Theory Pedagogy 32. https://jmtp.appstate.edu/hidden-curriculum-music-theory-classroom.
Pasbani, Robert. 2017. “These Are The 100 Best Metal Songs Of The ’90s According To Metal Hammer.” Metal Injection. Accessed May 5, 2022. https://metalinjection.net/lists/these-are-the-100-best-metal-songs-of-the-90s-according-to-metal-hammer.
Peres, Asaf. 2016. “The Sonic Dimension as Dramatic Driver in 21st-Century Pop Music.” PhD diss., University of Michigan.
Podoshen, Jeff. 2017. “Here’s an Alternate ‘Best Metal Songs of the ’90s’ List.” Metal Insider. Accessed May 5, 2022. https://metalinsider.net/lists/heres-an-alternate-best-metal-songs-of-the-90s-list.
Raykoff, Ivan. 2013. Dreams of Love: Playing the Romantic Pianist. Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199892679.001.0001.
Redd, Lawrence N. 1985. “Rock! It’s Still Rhythm and Blues.” The Black Perspective in Music 13 (1): 31–47. https://doi.org/10.2307/1214792.
Reddington, Helen. 2012. The Lost Women of Rock Music: Female Musicians of the Punk Era. Ashgate.
Ritchey, Marianna. 2019. Composing Capital: Classical Music in the Neoliberal Era. The University of Chicago Press. https://doi.org/10.7208/chicago/9780226640372.001.0001.
Rolling Stone Magazine. 2004. “The 500 Greatest Songs of All Time.” Rolling Stone 963, December 9, 2004.
Rolling Stone Magazine. 2017. “100 Greatest Hip-Hop Songs of All Time.” Rolling Stone, June 2, 2017. https://www.rollingstone.com/music/music-lists/100-greatest-hip-hop-songs-of-all-time-105784/.
Rolling Stone Magazine. 2021. “The 500 Greatest Songs of All Time.” Rolling Stone, September 15, 2021. https://www.rollingstone.com/music/music-lists/best-songs-of-all-time-1224767/.
Rose, Tricia. 1994. Black Noise: Rap Music and Black Culture in Contemporary America. Wesleyan University Press.
Ross, Marissa R. 2015. “Inside Country Music’s Polarizing ‘Urban Cowboy’ Movement.” Rolling Stone, June 12, 2015.
Russell, Tony. 1970. Blacks, Whites, and Blues. Stein & Day Publishing. Reprinted 2001 as Yonder Come the Blues: The Evolution of a Genre. Cambridge University Press.
Sanneh, Kelefa. 2004. “The Rap Against Rockism.” New York Times, October 31, 2004.
Schulze, Laurie, Anne Barton White, and Jane D. Brown. 2019. “‘A Sacred Monster in Her Prime’: Audience Construction of Madonna as Low-Other.” In The Madonna Connection, ed. Cathy Schwichtenberg, 15–37. Routledge. https://doi.org/10.4324/9780429312403-3.
Sears, David R. W., and David Forrest. 2021. “Triadic patterns across classical and popular music corpora: stylistic conventions, or characteristic idioms?” Journal of Mathematics and Music 15 (2): 140–53.
Shaffer, Kris, Bryn Hughes, and Brian Moseley. 2018. Open Music Theory, Version 1. Edited by Kris Shaffer and Robin Wharton. Accessed January 2, 2024. https://openmusictheory.github.io/popRockForm.html.
Shaffer, Kris, Esther Vasiete, Brandon Jacquez, Aaron Davis, Diego Escalante, Calvin Hicks, Joshua McCann, Camille Noufi, and Paul Salminen. 2019. “A cluster analysis of harmony in the McGill Billboard dataset.” Empirical Musicology Review 14 (3–4): 146–62.
Shea, Nicholas. 2019. “Descending Bass Schemata and Negative Emotion in Western Song.” Empirical Musicology Review 14 (3–4): 167–81.
—————. 2020. “Ecological Modes of Musical Structure in Pop-rock, 1950–2019.” PhD diss., The Ohio State University.
—————. 2022. “A Demographic Sampling Model and Database for Addressing Racial, Ethnic, and Gender Bias in Popular-music Empirical Research.” Empirical Musicology Review 17 (2): 49–58. https://doi.org/10.18061/emr.v17i1.8531.
—————. 2023. “‘Guitar Thinking’ and ‘Genre Thinking’ among an Online Community of Guitarists.” Practice-Based Research, IASPM Journal 13 (1). https://iaspmjournal.net/index.php/IASPM_Journal/article/view/1291.
Shelley, Braxton. 2021. Healing for the Soul: Richard Smallwood, the Vamp, and the Gospel Imagination. Oxford University Press. https://doi.org/10.1093/oso/9780197566466.001.0001.
Smith, Stacy L., Marc Choueiti, and Katherine Pieper. 2018. “Inclusion in the Recording Studio? Gender and Race/Ethnicity of Artists, Songwriters, and Producers across 600 Popular Songs from 2012–2017.” Featuring Ariana Case, Sylvia Villanueva, Ozodi Onyeabor, and Dorga Kim. Annenberg Inclusion Initiative. https://assets.uscannenberg.org/docs/inclusion-in-the-recording-studio.pdf.
Smith, Stacy L., Katherine Pieper, Hannah Clark, Ariana Case, and Marc Choueiti. 2020. “Inclusion in the Recording Studio?” Annenberg Inclusion Initiative. https://assets.uscannenberg.org/docs/aii-inclusion-recording-studio-20200117.pdf.
Stoia, Nicholas. 2013. “The Common Stock of Schemes in Early Blues and Country Music.” Music Theory Spectrum 35 (2): 194–234. https://doi.org/10.1525/mts.2013.35.2.194.
Summach, Jay. 2011. “The Structure, Function, and Genesis of the Prechorus.” Music Theory Online 17 (3). https://doi.org/10.30535/mto.17.3.2.
Tan, Ivan, Ethan Lustig, and David Temperley. 2019. “Anticipatory syncopation in rock: A corpus study.” Music Perception 36 (4): 353–70.
Temperley, David. 2007. Music and Probability. The MIT Press. https://doi.org/10.7551/mitpress/4807.001.0001.
—————. 2011. “The cadential IV in rock.” Music Theory Online 17 (1).
—————. 2018. The Musical Language of Rock. Oxford University Press.
Temperley, David, and Trevor de Clercq. 2013. “Statistical analysis of harmony and melody in rock music.” Journal of New Music Research 42 (3): 187–204.
VanHandel, Leigh, and Tian Song. 2010. “The Role of Meter in Compositional Style in 19th Century French and German Art Song.” Journal of New Music Research 39 (1): 1–11. https://doi.org/10.1080/09298211003642498.
Waksman, Steve. 1999. Instruments of Desire. Harvard University Press.
Walser, Robert. 1993. Running with the Devil. Wesleyan University Press.
Watson, Jada. 2019a. “Gender on the Billboard Hot Country Songs Chart, 1996–2016.” Popular Music & Society 42 (5): 538–60. https://doi.org/10.1080/03007766.2018.1512070.
—————. 2019b. “Gender Representation on Country Format Radio: A Study of Published Reports from 2000–2018.” SongData Reports. https://songdata.ca/wp-content/uploads/2019/04/SongData-Watson-Country-Airplay-Study-FullReport-April2019.pdf.
Weinstein, Deena. 2016. “Playing with Gender in the Key of Metal.” In Heavy Metal, Gender and Sexuality: Interdisciplinary Approaches, ed. Florian Heesch and Niall Scott, 11–25. Routledge.
Wells, Alan. 2001. “Nationality, Race, and Gender on the American Pop Charts: What Happened in the ’90s?” Popular Music & Society 25 (1–2): 221–31. https://doi.org/10.1080/03007760108591794.
White, Christopher Wm. 2013. “Some Statistical Properties of Tonality, 1650–1900.” PhD diss., Yale University.
—————. 2017. “Locating Emergent Creativity with Similarity Metrics.” Journal of Creative Music Systems 2 (1). https://doi.org/10.5920/JCMS.2017.13.
—————. 2021. “Deployments of Change and Novelty in a Corpus of Popular Music.” Proceedings of the Future Directions of Music Cognition International Conference, 83–87.
—————. 2022. The Music in the Data: Corpus Analysis, Music Analysis, and Western Tonal Practice. Routledge. https://doi.org/10.4324/9781003285663.
White, Christopher Wm., Jeffrey Fulmer, Brian Cordova, Alexandria Black, Chloe Danitz, William Evans, Aidan Fischer, Rashaad Greene, Jinhan He, Emily Kenyon, Joan Miller, Madeline Moylan, Abigail Ring, Emily Schwitzgebel, and Yatong Wang. 2021. “A new corpus of texture, timbre, and change in 20th-century American popular music.” Proceedings of the Future Directions of Music Cognition International Conference, 88–90.
White, Christopher William, Joe Pater, and Mara Breen. 2022. “A Comparative Analysis of Melodic Rhythm in Two Corpora of American Popular Music.” Journal of Mathematics and Music 16 (2): 160–82. https://doi.org/10.1080/17459737.2022.2075946.
White, Christopher Wm., and Ian Quinn. 2016. “The Yale-Classical Archives Corpus.” Empirical Musicology Review 11 (1): 50–58. https://doi.org/10.18061/emr.v11i1.4958.
—————. 2018. “Chord Context and Harmonic Function in Tonal Music.” Music Theory Spectrum 40 (2): 314–37. https://doi.org/10.1093/mts/mty021.
Wingfield, Adia Harvey. 2019. “Definition of INTERSECTIONALITY.” In Merriam-Webster. Accessed June 9, 2022. https://www.merriam-webster.com/dictionary/intersectionality.
Footnotes
1. Studies using the McGill Billboard corpus include Arthur 2016, Arthur and Condit-Schultz 2023, Burgoyne 2012, Burgoyne et al. 2011, Chander and Quinn 2023, Heetderks 2023, Korzeniowski and Widmer 2018, Léveillé Gauvin 2015, Miles et al. 2017 and 2021, Sears and Forrest 2021, Shaffer et al. 2019, White 2021, and White et al. 2021. Studies using the Rolling Stone corpora include Arthur and Condit-Schultz 2023, Biamonte 2014, De Clercq 2017, De Clercq and Temperley 2011, Sears and Forrest 2021, Tan et al. 2019, Temperley 2011 and 2018, Temperley and de Clercq 2013, White and Quinn 2018, and White, Pater, and Breen 2022.
2. The problem of sampling bias in music corpus studies is discussed in several articles on corpus methodologies: Huron (2013), London (2013), and Marsden (2022). Among music corpus studies, Duinker (2020, 62–63) and Ohriner (2019, 34–39, and ) consider gender representation, but for the most part such issues are treated only tangentially.
3. See London (2022) for an in-depth discussion on musical diversity and overfitting in corpus studies. Examples throughout this paper adopt a statistical approach to studying a corpus, but musical analysis of corpora is not necessarily constrained to rigorous empirical techniques.
4. Confusion surrounding the terms “hip-hop” and “rap” as genre labels has persisted ever since music featuring rapping became commercially available. It is not clear why certain songs appear on Billboard’s “Hot Rap Songs” chart while others appear on “Hot Hip-Hop/R&B,” but the labels themselves are problematic. For example, “Hotline Bling” (Drake, 2015), a song that includes no rapping whatsoever, spent 18 weeks at number one on the “Hot Rap Songs” chart and received the 2017 Grammy for Best Rap Song. The first (and to date only) song to outlast “Hotline Bling” at number one on this chart, “Old Town Road” (Lil Nas X ft. Billy Ray Cyrus, 2019), also has no rapping. This issue will be explored more fully in [6.11].
5. Not long after the “Hot Black Singles” chart’s inception, music critic Nelson George defended the inclusiveness and versatility of the name, writing that “semantics, particularly in relation to music, is a complicated maze, where one word connotes racism, another a musical genre, another an ethnic group depending on who you are, your background and your politics. Considering the music covered in the chart, ‘black’ says it all” (1982, 43).
6. See Johnson 2018, 154.
7. Our thanks to Anna Gawboy for sharing her materials.
8. We utilize BIHAP over BIPOC (Black, Indigenous, and People of Color) due to its greater inclusivity of Asian and Pacific Islander identities. The BIHAP acronym was derived from the Oregon Health & Science University website (https://www.ohsu.edu/inclusive-language-guide).
9. Non-cis identities can include but are not limited to intersex, non-binary, third gender, transgender, two spirit, transgender man, and transgender woman.
10. The Composer Diversity Database can be accessed at https://www.composerdiversity.com/composer-diversity-database.
11. Requests to update artist demographic information can be made at: https://forms.gle/7JUREswFJ5UjxuR1A.
12. RS2004 was slightly updated in 2010 to include several post-millennial hits.
13. Raykoff 2013 observes that the connection between virtuosity, performance, and masculinity is not limited to guitar-based rock.
14. According to Wenner, women artists and Black artists are not “articulate” enough to be included in his pantheon. See “Jann Wenner Defends His Legacy, and His Generation’s” in The New York Times (https://www.nytimes.com/2023/09/15/arts/jann-wenner-the-masters-interview.html).
15. See Shea 2022 for a full outline of the process; our dataset is posted here: https://tinyurl.com/mtocd.
16. Categories for racial and ethnic marginalization are derived primarily from those used by The Music Coalition of the Annenberg Inclusion Initiative at the University of Southern California (Smith et al. 2018) as well as researchers in the health sciences (Hauck et al. 2011). The model similarly considers any gender identity other than male as subject to marginalization, including artists who identify as women, transgender, and/or non-binary. Artists’ demographic data for this analysis were encoded by a team of undergraduate research assistants. The authors of this study then reviewed the research assistants’ work to ensure accuracy. As a final step, we cross-validated our demographic data with that available in Shea (2022).
17. In using census categories, we follow Smith et al. (2018, 2020). Although these categories provide a convenient operationalization and serve our goal to adjust proportions toward population data, we also recognize that racial and ethnic traits often overlap; for example, Nobles (2000) outlines their arbitrary and often harmful distinction in the US Census. For this reason, we have chosen to combine race and ethnicity into a single race/ethnicity variable for analysis (also referred to using the term BIHAP; see also section 7.4).
18. In a 2022 address to the Society for Music Analysis, Kofi Agawu considered the geocultural limitations of Ewell’s call to dismantle music theory’s white racial frame. Agawu argued that the lack of racial and ethnic diversity in the discipline of music theory is largely a North American problem, asserting that “‘Music theory is white’ will not survive as a global claim.” Our paper has similar geocultural limitations with its focus on Anglo-American popular-music artists with marginalized identities.
19. Here and elsewhere in the paper US census data is used for comparison purposes only. Note that not all the artists in our corpora are from the United States.
20. Double asterisks denote a significant difference between RS2004 and RS2021 at p < .01. Lines and error bars indicate the ratio of each demographic constituency according to the US census, 1950–2020; error bars are drawn from an expected binomial distribution using the count of each list (n = 500).
21. Artists with more than 10 songs on either version of the list include The Beatles (2004, n = 23; 2021, n = 12), The Rolling Stones (2004, n = 14), Bob Dylan (2004, n = 12), and Elvis Presley (2004, n = 11). By song count, Chuck Berry is the most-represented BIHAP artist (2004, n = 6) and Aretha Franklin is the most-represented nonmale artist (2004 & 2021, n = 4).
22. The differences between the two lists are statistically significant in terms of gender and primary representation: using the proportions of songs within one list to predict the distributions in the other list using a binomial test resulted in a significant difference (in each, p < .001). The difference between groups with at least one BIHAP member between the two lists was close to significance (p = .06).
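The binomial comparison described in this note can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the function names are invented here, and the song counts are hypothetical placeholders, not values from RS2004 or RS2021. The sketch takes the proportion observed in one list as the expected rate and asks how probable an outcome at least as unlikely as the other list's count would be.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_test_two_sided(observed, n, expected_rate):
    """Exact two-sided p-value: total probability of every outcome that is
    no more likely than the observed count under the expected rate."""
    # Small tolerance so that ties with the observed probability are included.
    threshold = binomial_pmf(observed, n, expected_rate) * (1 + 1e-7)
    p_value = sum(
        prob
        for prob in (binomial_pmf(k, n, expected_rate) for k in range(n + 1))
        if prob <= threshold
    )
    return min(1.0, p_value)


# Hypothetical counts, for illustration only (not taken from the article).
songs_per_list = 500
count_in_first_list = 100   # sets the expected rate: 100 / 500 = 0.2
count_in_second_list = 150  # the count being tested against that rate

expected_rate = count_in_first_list / songs_per_list
p_value = binomial_test_two_sided(count_in_second_list, songs_per_list, expected_rate)
print(f"p = {p_value:.2g}")
```

Summing the outcomes that are no more likely than the observed one is only one common convention for a two-sided exact test; other conventions, such as doubling the smaller tail, can give slightly different p-values.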
23. If you flipped a coin 500 times, you would not be surprised if your results produced 251 heads, or even 260 heads. This outcome does not deviate from your expected baseline. But you would wonder about the coin’s reliability if you flipped 280 heads. Binomial distributions (which describe counts of successes across repeated Bernoulli trials) act on this principle of establishing a baseline window of expectation around some ratio and observing how some dataset performs in relation to these expectations.
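The coin-flip illustration in this note can be reproduced with a short sketch. The function names and the choice of a central 95% window are assumptions made for illustration; the note itself does not specify how wide the window of expectation should be.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expectation_window(n, p, mass=0.95):
    """Return (low, high) counts bounding the central `mass` of the
    binomial distribution for n trials with success rate p."""
    tail = (1 - mass) / 2
    cumulative = 0.0
    low = high = None
    for k in range(n + 1):
        cumulative += binomial_pmf(k, n, p)
        if low is None and cumulative >= tail:
            low = k  # first count at which the lower tail is used up
        if cumulative >= 1 - tail:
            high = k  # first count at which only the upper tail remains
            break
    return low, high


# 500 flips of a fair coin, with the three head counts mentioned in the note.
low, high = expectation_window(500, 0.5)
for heads in (251, 260, 280):
    verdict = "within" if low <= heads <= high else "outside"
    print(f"{heads} heads is {verdict} the expected window")
```

Under these assumptions, 251 and 260 heads fall inside the window and 280 falls outside it, which matches the intuition the note describes.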
24. Horizontal lines show the demographic average using decadal data from the US census; error bars show binomial distributions around that average using the counts of all songs in each half decade.
25. Error bars use 95% of the probability mass of a binomial distribution using ratios within the post-2004 data.
26. Boxes indicate positive and negative quartiles around the average, which is indicated by the horizontal line; the median is shown by the x, and the extended “whiskers” from each box show further quartiles; dots are outliers. Asterisk indicates a difference in the average of p < .01 according to a two-sided t-test. t(625) = 6.00, p < .01.
27. This analysis shows one way of dividing the corpus by identity, which was chosen because it cordons the corpus into two roughly equal groups. However, divisions with greater identity differences, such as comparing all-white groups to all-Black groups, could present greater musical differences. The demographic model is currently not equipped to make such fine-grained distinctions based on subcategories of race or ethnicity, but such comparisons are a potential avenue for future work.
28. Clusters are grouped by their songs’ groups’ demographic constituencies, boxed in red, with a) showing songs by groups that are not designated as having primary members with marginalized identities and b) by groups whose primary members have marginalized identities. Double and single asterisks show significant differences in the expected distribution of primary-status groups from the observed distribution at p < .01 and p < .05, respectively, according to a binomial distribution test.
29. The model currently takes a comprehensive approach to harmonic diversity across the corpus. However, we could further divide harmonic practice by decade in order to more fully understand changes in harmonic practice over time, as we do not believe music by non-white artists is homogenous over time.
30. Although the current essay focuses on the constituency of these corpora, they were compiled for the purpose of studying timbre and texture in relation to form in popular music at the end of the 20th century.
31. A swath of other characteristics and identities are also associated with historical marginalization, including sexuality, disability, religious affiliation, level of education, and/or age. We do not consider these during the sampling procedure; however, they were frequently documented during the demographic encoding procedure and thus are accessible through the OSF repository.
32. Our BIHAP variable indicates whether one or more group members was of a marginalized racial or ethnic identity and the group/artist satisfied the primary-status condition. Our definition of marginalized in this context comes from Smith et al. (2018), including the categories of Black/African American, Hispanic/Latino, Asian, Native American/Alaska Native, Native Hawaiian/Pacific Islander, Middle Eastern, or Other/Mixed Race.
33. For instance, women usually comprise roughly 50% of the United States’ population. If 10% of a particular parent corpus’ songs include nonmale artists, then our goal would be to create a child corpus in which at least 22% of the songs were by nonmale artists or groups including nonmale artists, because the geometric mean of 10% and 50% is 22%. It should be noted that we chose to use US census data as a benchmark because it is readily available, although it does not reflect international (i.e., non-US) artists.
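The arithmetic behind this target can be checked with a few lines of code. The function name below is hypothetical, introduced only for illustration; the two input proportions are the ones given in this note.

```python
from math import sqrt


def target_share(parent_corpus_share, population_share):
    """Geometric mean of the parent-corpus share and the population share."""
    return sqrt(parent_corpus_share * population_share)


# Values from the note: 10% of parent-corpus songs, 50% of the US population.
target = target_share(0.10, 0.50)
print(f"Target share for the child corpus: {target:.0%}")
```

For comparison, the arithmetic mean of the same two values would be 30%, so the geometric mean sets a target that stays closer to the parent corpus while still moving toward the population benchmark.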
34. However, as Clifford-Napoleone (2015, 3) contends, “Metal is not, and never has been, all about the straight boys.” Indeed, Walser (1993, 124–36) situates androgyny in metal performance practice within the male hegemony, while Weinstein (2016, 14–15) considers whether “deconstructed masculinity” in metal involves straight-gay or masculine-feminine binaries.
35. For example, we added two songs by Lita Ford, a ciswoman guitarist and vocalist (LoadedRadio 2021) and retained songs by the band Melvins, featuring female bassist Lori Black. We found no single chart that listed more than one nonmale metal musician from the 1990s.
36. The corpora developed by Ohriner, Condit-Schultz, Duinker, and Martin derive from commercial and critical lists, each with their own biases. Ohriner (2019) and Duinker (2020) both acknowledge the gender imbalance that results from passively using external sources for corpus development, but neither author modified their corpus to rectify this. Komaniecki acknowledges the gender imbalance with the unsubstantiated claim that “men far outnumber women in hip-hop” (2019, 18). Connor (2018) does not mention gender.
37. The song “Satisfy You” was excluded from the list due to its featuring R. Kelly, currently incarcerated on numerous sex-related charges. This methodological decision runs counter to de Clercq (2020), who argues that contributions by collaborators are lost when title artists are excised on moral grounds. Agawu (2022) similarly advocates for making known “more fully” the problematic aspects of researchers, composers, and performers rather than “canceling” them. However, we view this artist as a special case, given that much of his music has been removed from publicly available music services (e.g., Spotify and YouTube) and he has been prosecuted for sexual crimes involving minors.
38. These songs are “U.N.I.T.Y.” (Queen Latifah, 1994), “Give it 2 You” (Da Brat, 1995), “I’ll Be” (Foxy Brown feat. Jay-Z, 1997), and “Not Tonight” (Lil’ Kim feat. Da Brat, Left Eye, Missy Elliott, and Angie Martinez, 1997). “Give it 2 You” reached #3 on the “Hot Rap Songs” chart; the rest of the songs reached #2.
39. Neal notes that “the gender disparity is most striking not among the few voices of the superstars, but rather in the ranks of moderately successful singers with a few records and a handful of hits, where the men outpace the women in staggering numbers. In other words, the dominant voice of commercial country music as a whole is unarguably male” (Neal 2016, 6).
40. Billboard also offers ranking lists specifically dedicated to pop, including Pop Airplay and Adult Pop Airplay; these were not used in the current study because they did not span our target decade, having been initiated in 1992 and 1996, respectively.
41. Of the 62 songs with female artists, 26 (42%) were by women who were also BIHAP. Similarly, among the 38 songs by male artists, 19 (50%) were by male BIHAP artists. Thus, the song collection appears to be relatively well-balanced in terms of broad intersections between gender and race/ethnicity, although we recognize that not all BIHAP categories are necessarily well represented.
42. All initial timbral and formal annotations have been completed. Approximately 50% of all songs have been cross-checked by another analyst at the time of this paper’s submission.
43. No significant difference in prechorus usage was found for musicians of BIHAP identity.
44. Alternatively, these differences could represent biases on the part of our annotators: women-led groups might prompt an annotator to assign a formal zone more associated with pop music. We took steps to mitigate such biases by training annotators via a standardized style guide and cross-validating their annotations.
45. Throughout this paper, we have acknowledged the ambiguity between the concepts of “style” and “genre.” Allan Moore (2001) is one of several scholars who propose a distinction between style and genre that aims to clarify divergent uses of the latter term. For Moore, style refers to the manner of articulation of musical gestures, while genre refers to the identity and context of those gestures. Moore’s distinction is predicated on orientation: style is hierarchical, but genre is not. To illustrate, we might compare two genres discussed in this paper: country and metal. At a very general level, some common stylistic attributes of these genres emerge. Both involve the regular use of guitar, bass, and drums, for example. Even the musical roles of some of these instruments remain constant: drummer as primary timekeeper, bass as foundation to guitar-based harmonic material. But on a more specific level, divergences appear. In metal, the guitar sound is often distorted, which is rare in country, and the harmonic languages are different. Reflecting the hierarchical nature of these aspects of style, while the drummer-as-timekeeper aspect holds constant, the meters in which the drummer commonly plays might not, nor the types of drums and cymbals normally used. Because genre involves more than just musical characteristics, treating them hierarchically makes little sense. Instead, the generic commonalities between metal and country, for example, could be shown in a Venn diagram, where overlapping generic conventions exist side-by-side with characteristics unique to each genre (but that might overlap with yet another genre). In an empirical study of popular music focusing on strictly musical parameters, as long as these are understood as only one dimension of genre, the terminology used to encompass them—style, structure, “formal rules” (after Fabbri 2004 [1982])—is less important.
46. Gjerdingen and Perrott’s (2008, 95) axiom that “the customer is always right” with respect to genre rightly foregrounds its reification through individual musical tastes. And MIR (music information retrieval) algorithms that use listener preference data have become a powerful vehicle not only for tastemaking on streaming platforms, but also for systematically using individual data to inform the meaning of genre labels.
47. The genre of country traced a complicated path through the 20th century. What began as hillbilly music was supplanted by the glossy veneer of the 1950s “Nashville Sound” country, then by the rough-hewn rebelliousness of outlaw country in the 1960s and 1970s, and eventually by a return to mainstream smoothness spearheaded by the “urban cowboy” movement. The various musics described as “country” over the past century have, perhaps more than any other genre, occupied remarkably different aesthetic positions. See Neal 2021 for a more in-depth discussion.
48. Regarding the lack of consistency within genres over time, Lena and Peterson (2008) propose a system (AgSIT) situating them along a chronological trajectory with four discrete phases: Avant-garde (Ag), Scene-based (S), Industry-based (I), and Traditionalist (T). The Avant-garde phase describes the beginnings of a genre, before its codes of behavior are fully cemented or any true, positive exemplar of the genre emerges. Scene-based genres exist on a localized level, with increasingly solid codes of production and behavior as well as a concentrated locus of orientation. Industry-based genres simplify these codes as the genre expands to a greater area, consistent with the reach of the commercial apparatus supporting it. And finally, Traditionalist genres are more concerned with preservation and canonization.
49. Determining genre-wide norms is Condit-Schultz’s stated aim in his corpus study of hip-hop music (2016).
50. For example, lyrics by emcees featured in The Anthology of Rap (Bradley and Dubois 2010) were selected by an advisory board on the basis of historical impact and lyrical-artistic merit. The board made an effort during the selection process to foreground the contributions of female emcees on the basis of the imperative to provide a “conscious expansion of the coverage and commentary on the role of women in the rise of rap lyricism” (xlv).
Copyright Statement
Copyright © 2024 by the Society for Music Theory. All rights reserved.
[1] Copyrights for individual items published in Music Theory Online (MTO) are held by their authors. Items appearing in MTO may be saved and stored in electronic or paper form, and may be shared among individuals for purposes of scholarly research or discussion, but may not be republished in any form, electronic or print, without prior, written permission from the author(s), and advance notification of the editors of MTO.
[2] Any redistributed form of items published in MTO must include the following information in a form appropriate to the medium in which the items are to appear:
This item appeared in Music Theory Online in [VOLUME #, ISSUE #] on [DAY/MONTH/YEAR]. It was authored by [FULL NAME, EMAIL ADDRESS], with whose written permission it is reprinted here.
[3] Libraries may archive issues of MTO in electronic or paper form for public access so long as each issue is stored in its entirety, and no access fee is charged. Exceptions to these requirements must be approved in writing by the editors of MTO, who will act in accordance with the decisions of the Society for Music Theory.
This document and all portions thereof are protected by U.S. and international copyright laws. Material contained herein may be copied and/or distributed for research purposes only.
Prepared by Amy King, Editorial Assistant