MTO 10.3: Aarden and Hippel, Overfitting

Volume 10, Number 3, September 2004
Copyright © 2004 Society for Music Theory

Paul T. von Hippel and Bret Aarden
Rules for Chord Doubling (and Spacing): A Reply To Wibberley

Weights and Accuracy

[1] Our study used features of doubling and spacing to discriminate between "composed" and "random" chords. The results were summarized in two ways:

Weights. Which features discriminated best? That is, which features should receive the most weight in discrimination?
Accuracy. When features have been assigned appropriate weights, how accurately do they discriminate between composed and random chords?

[2] In our paper, we used the same set of chords to set weights and to measure discriminatory accuracy. Wibberley objects to this. His concern is legitimate, since using the same data for both purposes can yield inflated estimates of accuracy, a problem known as overfitting. However, overfitting is a substantial problem only when the number of features is comparable to the number of chords. In our study we had thousands of chords and only a few dozen features, so we did not consider the issue worth mentioning.
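
The dependence of overfitting on the ratio of features to chords can be illustrated with a small simulation. The sketch below is not the analysis from our paper: it assumes purely random features that carry no information about the labels, and a simple hypothetical discriminator whose weights are the differences between class means. Because the features are noise, any accuracy above 50% on the data used to set the weights is inflation due to overfitting.

```python
import random

def fit_weights(X, y):
    """Set one weight per feature as the difference between class means."""
    p = len(X[0])
    mean = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        mean[label] = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    w = [mean[1][j] - mean[0][j] for j in range(p)]
    # The bias places the decision boundary midway between the class means.
    b = -sum(w[j] * (mean[1][j] + mean[0][j]) / 2 for j in range(p))
    return w, b

def accuracy(w, b, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    hits = 0
    for x, t in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, x)) + b
        hits += int((score > 0) == (t == 1))
    return hits / len(y)

def same_data_accuracy(n_chords, n_features, rng):
    """Fit and score on the SAME pure-noise data (assumed, synthetic)."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_chords)]
    y = [i % 2 for i in range(n_chords)]  # labels unrelated to the features
    w, b = fit_weights(X, y)
    return accuracy(w, b, X, y)

rng = random.Random(0)
n_features = 30

# Few chords relative to features; averaged over 20 repeats to smooth noise.
few = sum(same_data_accuracy(40, n_features, rng) for _ in range(20)) / 20

# Thousands of chords with the same number of features.
many = same_data_accuracy(4000, n_features, rng)

print(f"  40 chords, {n_features} features: apparent accuracy {few:.2f}")
print(f"4000 chords, {n_features} features: apparent accuracy {many:.2f}")
```

With only 40 chords the apparent accuracy should come out well above chance even though the features are noise, while with 4000 chords it should stay close to 50%. The chord counts, the feature count, and the discriminator are all illustrative choices.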

[3] We did address overfitting in unpublished analyses. Our approach, known as cross-validation, was to split the data into two subsets: a training set that was used to set the weights, and a testing set on which those weights were used to measure discriminatory accuracy. We tried several ways of splitting the data. First, we split it at random. Then we split it by repertoire, so that the chorales were used for training and the quartets were used for testing, or vice versa.
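
The splitting procedure described above can be sketched as follows. This is a minimal illustration, not the code used in our analyses: the chords are synthetic stand-ins, the repertoire tags are attached arbitrarily, and the mean-difference discriminator and every function name are assumptions made for the example. The accuracies it prints reflect only the synthetic data.

```python
import random

def fit_weights(X, y):
    """Set one weight per feature as the difference between class means."""
    p = len(X[0])
    mean = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        mean[label] = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    w = [mean[1][j] - mean[0][j] for j in range(p)]
    # The bias places the decision boundary midway between the class means.
    b = -sum(w[j] * (mean[1][j] + mean[0][j]) / 2 for j in range(p))
    return w, b

def accuracy(w, b, X, y):
    """Fraction of rows whose predicted label matches the true label."""
    hits = 0
    for x, t in zip(X, y):
        score = sum(wj * xj for wj, xj in zip(w, x)) + b
        hits += int((score > 0) == (t == 1))
    return hits / len(y)

def make_chords(n, repertoire, rng, n_features=20, shift=0.2):
    """Synthetic rows (features, label, repertoire); label 1 = 'composed'."""
    rows = []
    for i in range(n):
        label = i % 2
        features = [rng.gauss(shift * label, 1.0) for _ in range(n_features)]
        rows.append((features, label, repertoire))
    return rows

def train_then_test(train_rows, test_rows):
    """Set weights on the training rows only, then score the test rows."""
    w, b = fit_weights([r[0] for r in train_rows], [r[1] for r in train_rows])
    return accuracy(w, b, [r[0] for r in test_rows], [r[1] for r in test_rows])

rng = random.Random(0)
rows = make_chords(1000, "chorale", rng) + make_chords(1000, "quartet", rng)

# Split 1: at random, half for training and half for testing.
shuffled = rows[:]
rng.shuffle(shuffled)
half = len(shuffled) // 2
random_acc = train_then_test(shuffled[:half], shuffled[half:])

# Split 2: by repertoire, in both directions.
chorales = [r for r in rows if r[2] == "chorale"]
quartets = [r for r in rows if r[2] == "quartet"]
chorale_to_quartet = train_then_test(chorales, quartets)
quartet_to_chorale = train_then_test(quartets, chorales)

print(f"random split:             {random_acc:.3f}")
print(f"train chorales, test qts: {chorale_to_quartet:.3f}")
print(f"train qts, test chorales: {quartet_to_chorale:.3f}")
```

In each case the weights never see the testing rows, so the reported accuracy is not inflated by overfitting. Similar accuracies across the three splits would suggest that the weights generalize across repertoires.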

[4] No matter how we split the data, discriminatory accuracy was about 70%, just as we reported in the published paper.


Prepared by
Brent Yorgason, Managing Editor
Updated 17 September 2004