Project Ideas -- CS 583 -- Fall 2021

General

Projects may be done alone, or in a group of up to 3 people. Your grade will be determined by the overall contribution and demonstrated amount of work, divided by the number of people. Therefore, a group project should be more comprehensive than a single-person project.

Due dates:

You will have to specify your project and your team when you hand in HW 04 (the Tuesday after Thanksgiving break), and you will get feedback on it as soon as possible after that; then you must schedule one Zoom meeting (approximately 15 minutes) with me each week to report on progress in the two weeks before the project is due:

  1. Week of December 6 - 10
  2. Week of December 13 - 17
  3. December 18th: End of final exams, final project report due.
  4. Of course, I can meet with you as much as you want!

Submission:

You should submit your project into the Project directory on Gradescope. You must provide a 2-5 page writeup outlining what you did and what results you got. Provide references to any papers or code you used. It is OK to use code from other places, as long as it is referenced, and as long as you understand that your grade will depend on what YOU add to it!

Some general remarks:

When reading papers, please don't get discouraged if there are equations you can't understand, terminology we didn't discuss, etc.; this field is highly dependent on digital signal processing and previous work in vision and voice recognition, and it is sometimes hard to get your bearings. Please skim those parts that don't seem to have much to do with your goals, and get what you can from the paper. Read the introduction and conclusion first, and keep your eyes open for other papers in the bibliography that might actually be a better choice for you.

I do not expect you to build all the tools you need for these projects, e.g., if you need an onset detection algorithm. My first recommendation is that you look at the Aubio Python library for sound processing: aubio.org. I am just learning this library myself, but it has a good reputation and should suit your needs. Feel free to explore other options, such as the Scipy libraries:

Scipy Signal Processing: http://docs.scipy.org/doc/scipy/reference/signal.html

Scipy Fourier Transforms: http://docs.scipy.org/doc/scipy/reference/tutorial/fftpack.html

Scipy Statistics: http://docs.scipy.org/doc/scipy/reference/tutorial/stats.html

Scipy Interpolation: http://docs.scipy.org/doc/scipy/reference/interpolate.html

For references to more general kinds of software, see these lists of software useful for MIR tasks: [1], [2]
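As a concrete starting point for the aubio suggestion above, here is a minimal onset-detection sketch using aubio's Python bindings; the file name is a placeholder, and the window/hop sizes are just typical values, not recommendations.

```python
# Minimal onset detection with aubio; "example.wav" is a placeholder file.
import aubio

hop_size = 512
src = aubio.source("example.wav", samplerate=0, hop_size=hop_size)
onset_o = aubio.onset("default", 1024, hop_size, src.samplerate)

onset_times = []
while True:
    samples, read = src()                         # one hop of mono float32 samples
    if onset_o(samples):                          # nonzero when an onset is detected
        onset_times.append(onset_o.get_last_s())  # onset time in seconds
    if read < hop_size:
        break

print("detected onsets (s):", onset_times)
```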

Project Descriptions

[1] Lecture Animations/Educational Software. We definitely could have used, especially in the early parts of this class, some Python programs that illustrate, interactively, the various (complicated) notions that we learned about signals, sinusoids, phasors, spectra, etc. We also did not use any real-time audio (such as pyaudio). For this project, you could propose a suite of animations and/or interactive applications that would illustrate various ideas from the lectures in the first half of the class. Or you could explore pyaudio or another audio library and show how to do various demonstrations with real-time audio. These could be on the web, or simply Python programs that future students could download and experiment with. We could even design some labs or lecture software for the class to use as part of a lecture.
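If you want to experiment with real-time audio, here is a minimal sketch using pyaudio, assuming it is installed and a default output device is available; it simply synthesizes a 440 Hz sine wave and plays it.

```python
# Play a 2-second 440 Hz sine wave through the default output device.
import numpy as np
import pyaudio

RATE = 44100
t = np.arange(int(RATE * 2.0)) / RATE
tone = (0.3 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paFloat32, channels=1, rate=RATE, output=True)
stream.write(tone.tobytes())        # blocks until the buffer has been played
stream.stop_stream()
stream.close()
p.terminate()
```

From here, an interactive demo might regenerate the buffer as the user changes frequency, amplitude, or phase, and play the result immediately.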

Two projects I am particularly interested in are:

[2] Library of Instrument Sounds and Spectra. I didn't present very many spectra of real instruments (e.g., the clarinet, whose parameters I got here). It would be interesting (and useful!) to have a library of sound clips of various instruments and a quantification of their spectra, following the model of the clarinet spectrum. Of course, the spectrum is only the very first step in simulating various instruments, so you might have to think about how to add characteristic vibrato and amplitude shaping (e.g., percussion instruments such as the piano die out following an exponential curve, but wind instruments do not). Then there is the whole subject of "attack," which is very different for different instruments. What I would want in this project is a collection of clips, an analysis of their basic spectra, perhaps some amplitude shaping, and a basic description and/or simulation of the attack phase. Then a collection of simulated instrument sounds based on your analysis.
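As a rough sketch of the kind of analysis/resynthesis pipeline this project involves (the file name, the fundamental frequency, and the decay constant below are placeholder assumptions, not the clarinet model from lecture):

```python
# Estimate harmonic amplitudes from a recorded note and resynthesize it
# with an exponential amplitude envelope (a crude percussion-style decay).
import numpy as np
from scipy.io import wavfile

rate, x = wavfile.read("instrument_note.wav")   # placeholder file
x = x.astype(float)
if x.ndim > 1:
    x = x.mean(axis=1)                          # mix to mono

f0 = 220.0                                      # assumed fundamental (Hz)
spectrum = np.abs(np.fft.rfft(x * np.hanning(len(x))))
freqs = np.fft.rfftfreq(len(x), 1.0 / rate)

# Amplitude of the first 10 harmonics: largest bin within 5% of each k*f0.
harmonics = []
for k in range(1, 11):
    band = (freqs > 0.95 * k * f0) & (freqs < 1.05 * k * f0)
    harmonics.append(spectrum[band].max() if band.any() else 0.0)
harmonics = np.array(harmonics) / max(harmonics)

# Resynthesize 2 seconds with an exponential decay envelope.
t = np.arange(int(2.0 * rate)) / rate
env = np.exp(-3.0 * t)
y = env * sum(a * np.sin(2 * np.pi * k * f0 * t)
              for k, a in enumerate(harmonics, start=1))
wavfile.write("resynth.wav", rate, (0.3 * y / np.abs(y).max() * 32767).astype(np.int16))
```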

[3] Digital Implementation of Guitar Pedals. Guitar pedals provide a bewildering variety of effects (as filters) for modifying the basic timbre of the electric guitar.  For this project I would like someone to pick an interesting set of effects (e.g., frequency modulation, flanging, distortion, reverb, etc.) and investigate how to implement these with a Python program. 
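To give a sense of scale, here are minimal sketches of two such effects in NumPy, assuming a mono float signal x in [-1, 1] at sample rate rate; the parameter values are illustrative, not taken from any particular pedal.

```python
import numpy as np

def distortion(x, drive=10.0):
    """Soft clipping: boost the signal, then squash it with tanh."""
    return np.tanh(drive * x)

def flanger(x, rate, max_delay_ms=3.0, lfo_hz=0.5, depth=0.7):
    """Mix the signal with a copy whose delay oscillates slowly (an LFO)."""
    n = np.arange(len(x))
    delay = 0.5 * (max_delay_ms / 1000.0) * rate * (1 + np.sin(2 * np.pi * lfo_hz * n / rate))
    idx = np.clip(n - delay.astype(int), 0, len(x) - 1)
    return (x + depth * x[idx]) / (1 + depth)
```

A real project would add reverb, chorus, tremolo, etc., and compare the results against recordings of actual pedals.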

[4] Comparative Analysis of Pitch Tracking Algorithms. I gave several references to pitch-tracking algorithms. You would implement a handful of the most important of these, benchmark them using a suite of test files, and come up with a comparative analysis of the strengths and weaknesses of each. This has been done at various times, but it is always useful to think these things through for yourself.

Readings:

The Wikipedia article is a nice place to start: HTML

"Pitch Extraction and Fundamental Frequency: History and Current Techniques" by Gerhard: http://www.cs.uregina.ca/Research/Techreports/2003-06.pdf

Oldie but goodie: http://www.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/107_comparative%20pitch%20detectors.pdf

Google "Comparison pitch detection" to see other references.

How to proceed:

I would expect that you would either implement or access already-written algorithms in Python for this project; choose the zero-crossing method, the auto-correlation method we did in HW 4, the Yin method, and several others, including at least one spectral-based method.
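As a rough illustration of the autocorrelation approach (a generic sketch, not necessarily identical to the HW 4 method), where the frame should contain at least rate/fmin samples:

```python
import numpy as np

def autocorr_pitch(frame, rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental (Hz) of one analysis frame by autocorrelation."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]   # lags 0..N-1
    lo, hi = int(rate / fmax), int(rate / fmin)
    lag = lo + np.argmax(ac[lo:hi])          # best lag in the allowed pitch range
    return rate / lag
```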

Run these algorithms on a variety of files, from the simplest sine waves, to monophonic (speech, single instruments of various kinds), to polyphonic (piano, pop, classical, including some orchestral works), and for each, find a location in the signal where you have a sensation of definite pitch which you can confirm by playing the note on the piano. You will have to design your "test suite" from the easiest to the hardest based on your own perception of pitch at that point. Then test the algorithms on this test suite and evaluate how each did, producing a chart showing how accurate each method was. Did they get confused by octaves? How accurately was the pitch determined?

Then draw any conclusions you can from your comparison. Did one algorithm do better in all contexts or did various algorithms have various strengths?

[5] Comparative Analysis of Onset Detection Algorithms. We will do something along these lines for HW 03, but you could extend it, with more "ground truth" examples, and provide a more comprehensive analysis of the effect of various hyper-parameters.

[6] Vocoder. The Vocoder was an early program to analyze and synthesize sounds, either voice or music. The basic idea is as follows: take a musical signal and analyze the spectrum using various "frequency bins" (basically, the results returned from the FFT, perhaps grouped together in some way; see "Search for Frequencies" below); then analyze the bins as a function of time to get a 2D array of amplitudes indexed by time (the columns) and frequency (the rows). This could be displayed as a spectrogram; however, you will use it to re-synthesize the signal as follows. Take each frequency and calculate the amplitude envelope for that frequency over the whole signal, and then create a new signal which uses these amplitude envelopes to recreate the original signal. Why do this? Once having done the analysis, you could manipulate the signal in many ways, for example, changing the pitches or the time (see the next project). This is the basis for the Grammy-Award-winning Melodyne editor, which allows you to change all kinds of things in the music signal and then resynthesize it with the changes in place.
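Here is a toy sketch of the analysis/resynthesis idea, using scipy's STFT: the magnitude of each frequency bin over time serves as that bin's amplitude envelope, and one sinusoid per bin is driven by it (phase is simply discarded, so this is far cruder than a real phase vocoder).

```python
import numpy as np
from scipy.signal import stft

def analyze_resynthesize(x, rate, nperseg=1024, hop=256):
    freqs, times, Z = stft(x, fs=rate, nperseg=nperseg, noverlap=nperseg - hop)
    mags = np.abs(Z)                          # shape: (n_bins, n_frames)
    n = np.arange(len(x))
    y = np.zeros(len(x))
    for k, f in enumerate(freqs):
        if f == 0.0:
            continue                          # skip the DC bin
        env = np.interp(n / rate, times, mags[k])   # envelope at sample rate
        y += env * np.sin(2 * np.pi * f * n / rate)
    return y / np.abs(y).max()
```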

Reading:

The Wiki article on the Phase Vocoder is good: HTML, and there are good references for further reading; you can also check out the Dolson tutorial referenced there: http://www.eumus.edu.uy/eme/ensenanza/electivas/dsp/presentaciones/PhaseVocoderTutorial.pdf.

For this project, I would expect you to take the toy vocoder from class and extend it with more complicated techniques, as discussed in the papers.

[7] Pitch or Tempo Modification. For this project, I would like you to explore more advanced algorithms, implementing them and testing them on various kinds of signals, both spoken and musical. The Wiki article (here) has a nice summary, and you could implement the two methods (other than the vocoder) described there and evaluate them. A simple time-stretch sketch appears after the readings below.

A very complete survey of techniques for Pitch and Time Scale Modification: PDF

One of the most important improvements to Phase Vocoder for Pitch/Time Scale Mod is here: PDF
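To get a feel for the time-stretching project, here is a naive overlap-add (OLA) sketch with no phase handling, so audible artifacts are expected; the frame and hop sizes are just typical values.

```python
import numpy as np

def ola_stretch(x, stretch=1.5, frame=2048, hop=512):
    """Stretch a mono signal in time by `stretch` (>1 slows it down)."""
    win = np.hanning(frame)
    out_hop = int(hop * stretch)
    n_frames = (len(x) - frame) // hop
    y = np.zeros(n_frames * out_hop + frame)
    for i in range(n_frames):
        y[i * out_hop: i * out_hop + frame] += x[i * hop: i * hop + frame] * win
    return y / np.abs(y).max()
```

The papers above improve on exactly the artifacts this naive version produces.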

[8] Pitch Tracking. If you graph the pitch over time, you get a "pitch track," which is useful in many situations where the melody or pitch contours (say, of a spoken utterance or an animal sound) need to be analyzed. You would have to develop a reasonably good algorithm for pitch tracking (harder than it sounds), and then it might be interesting to take a corpus of sounds, say, bird songs, and analyze them in terms of contour or similarity. This will be discussed in lecture.
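A minimal pitch-track sketch using aubio's yin-based pitch object (the file name is a placeholder, and the 0.8 confidence threshold is an arbitrary choice to drop unvoiced frames):

```python
import aubio

hop_size = 512
src = aubio.source("birdsong.wav", samplerate=0, hop_size=hop_size)
pitch_o = aubio.pitch("yin", 2048, hop_size, src.samplerate)
pitch_o.set_unit("Hz")

track = []          # list of (time in seconds, estimated frequency in Hz)
frame_idx = 0
while True:
    samples, read = src()
    f0 = pitch_o(samples)[0]
    if f0 > 0 and pitch_o.get_confidence() > 0.8:
        track.append((frame_idx * hop_size / src.samplerate, f0))
    frame_idx += 1
    if read < hop_size:
        break
```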

[8b] Analysis of Vibrato. Vibrato is the application of pitch (and amplitude) modulation for musical effect. For classical singers, it is stated in the literature that an "ideal" vibrato has a pitch variation of about half a semitone and a frequency of about 6 Hz, i.e., 6 oscillations per second. How accurate is this received wisdom in classical vocal performance? Can you make such generalizations about pop singers or about rock guitarists? I have noticed that among blues harmonica players, the vibrato frequency matches the tempo of the music, but I have not observed this in other kinds of music. Is this correct?

This would involve pitch tracking and then applying the tempo analysis techniques from lecture to find the oscillation of the pitch, and then looking at a lot of different music files.
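Given an evenly sampled pitch track (one pitch value per frame, frame_rate frames per second), a very rough way to estimate the vibrato rate is to take the strongest low-frequency component of the pitch contour; this is a sketch of the idea, not a robust detector.

```python
import numpy as np

def vibrato_rate(pitch_track, frame_rate):
    p = np.asarray(pitch_track, dtype=float)
    p = p - p.mean()                                  # remove the average pitch
    spec = np.abs(np.fft.rfft(p * np.hanning(len(p))))
    freqs = np.fft.rfftfreq(len(p), 1.0 / frame_rate)
    band = (freqs > 2) & (freqs < 12)                 # plausible vibrato range (Hz)
    return freqs[band][np.argmax(spec[band])]
```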

[9] Music Synthesis. The first part of the course was about synthesis. For this project, I would like you to take the "mini-midi" code from HW 04 and build a more complex music synthesis framework. As part of this, I would like you to explore Frequency Modulation in more detail, finding out about how to simulate various instruments (see Chowning's Article, or the chapter from Musimathics linked in the topics schedule on 2/2); I would also like you to explore spectral analysis of various real instruments; finally, put these all together and demonstrate them with some real music that you simulate.
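As a reminder of the basic FM idea, here is a two-operator sketch in the spirit of Chowning's method; the carrier/modulator ratio, modulation index, and envelopes are placeholder values you would tune per instrument.

```python
import numpy as np

def fm_tone(fc=440.0, ratio=1.0, duration=2.0, rate=44100,
            index_peak=5.0, index_decay=3.0):
    """Simple FM: carrier fc, modulator fc*ratio, time-varying modulation index."""
    t = np.arange(int(duration * rate)) / rate
    fm = fc * ratio
    index = index_peak * np.exp(-index_decay * t)   # bright attack, mellow decay
    env = np.exp(-2.0 * t)                          # overall amplitude envelope
    return env * np.sin(2 * np.pi * fc * t + index * np.sin(2 * np.pi * fm * t))
```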

[10] Tempo Analysis. Using the techniques presented in the lecture on onset detection and rhythm analysis, write a program that will track the tempo of a musical recording. Note that this does not depend on solving the "hierarchical rhythm problem" (i.e., which is the basic beat?) but only on tracking whichever beat is strongest. Apply this program to a collection of (live) music recordings and see how much the tempo changes during the pieces. Can you draw any conclusions about classical vs pop vs jazz vs whatever kinds of musicians? A common idea is that musicians (especially amateurs) speed up when the music gets louder; can you find any relationship between amplitude and tempo? Do different interpretations of the same piece show the same tempo changes (e.g., do all orchestras speed up at the end of Beethoven's Fifth Symphony)?
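One simple way to get a global tempo estimate (a sketch only; the lecture's onset-detection method may differ in detail) is to build an onset-strength envelope from frame-to-frame spectral differences and autocorrelate it:

```python
import numpy as np

def estimate_tempo(x, rate, frame=1024, hop=512):
    n_frames = (len(x) - frame) // hop
    mags = np.array([np.abs(np.fft.rfft(x[i*hop:i*hop+frame] * np.hanning(frame)))
                     for i in range(n_frames)])
    flux = np.maximum(np.diff(mags, axis=0), 0).sum(axis=1)   # onset strength
    flux = flux - flux.mean()
    ac = np.correlate(flux, flux, mode="full")[len(flux) - 1:]
    frame_rate = rate / hop
    lo = int(frame_rate * 60 / 200)          # lag at 200 BPM
    hi = int(frame_rate * 60 / 40)           # lag at 40 BPM
    lag = lo + np.argmax(ac[lo:hi])
    return 60.0 * frame_rate / lag           # estimated beats per minute
```

Tracking tempo changes over time would mean repeating this on successive chunks of the recording.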

[11] Chord Recognition. Based on the lecture on this topic, write a program that takes a musical file and a timestamp, analyzes a window of samples starting at that timestamp, calculates the chroma features, and matches them against the kernels/templates for the various chords. The program should find the best fit among the kernels for the chroma calculated from the window. It is best to do this with piano recordings.
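A sketch of the template-matching step, assuming a 12-bin chroma computed directly from one FFT window and simple binary major/minor triad templates:

```python
import numpy as np

def chroma(window, rate):
    """Fold FFT magnitudes into a 12-bin pitch-class (chroma) vector."""
    spec = np.abs(np.fft.rfft(window * np.hanning(len(window))))
    freqs = np.fft.rfftfreq(len(window), 1.0 / rate)
    c = np.zeros(12)
    for f, a in zip(freqs[1:], spec[1:]):            # skip the DC bin
        midi = 69 + 12 * np.log2(f / 440.0)
        c[int(np.round(midi)) % 12] += a
    return c / c.max()

def best_chord(window, rate):
    names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
    templates = {}
    for root in range(12):
        major = np.zeros(12); major[[root, (root + 4) % 12, (root + 7) % 12]] = 1
        minor = np.zeros(12); minor[[root, (root + 3) % 12, (root + 7) % 12]] = 1
        templates[names[root] + " major"] = major
        templates[names[root] + " minor"] = minor
    c = chroma(window, rate)
    return max(templates, key=lambda name: np.dot(c, templates[name]))
```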

Readings:

Resources:

The audio samples directory contains two files for this project: PianoChords.wav and PianoChordsDescription.txt.

 

[12]* Audio Fingerprinting for Search and Retrieval. Systems such as SoundHound use algorithms to summarize a music signal as a "fingerprint" which can be searched efficiently. In this project, I would expect you to learn the basic algorithms (surveyed in lecture) and produce a simplified system which can store and search fingerprints for a small corpus of music files.

Readings:

How to proceed:

Look at least at my lecture notes and a couple of the references listed above; but keep in mind that a great deal of the research has concerned efficiency, and I am not so interested in the most efficient solution for a project.

Write a program that produces a spectrogram for an entire music file, using the realFFT(...) function from HW 05, and applying it to windows of size 4096, with a slide interval of, say, 2048 (i.e., the first window of size 4096 starts at 0, the next at 2048, the next at 4096, and so on). Following one of the paradigms in the readings, summarize the spectrogram using a threshold to identify peaks in the amplitudes in the spectrogram. Devise a representation for such reduced spectrograms so that they can be stored as files (e.g., text files) in a directory. This is your database of music to be searched.
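A sketch of this step, using np.fft.rfft as a stand-in for the realFFT(...) function from HW 05, and an arbitrary per-frame relative threshold:

```python
import numpy as np

def reduced_spectrogram(x, frame=4096, hop=2048, rel_threshold=0.5):
    """Return a list of (frame_index, bin_index) peaks above the threshold."""
    peaks = []
    for i in range((len(x) - frame) // hop):
        mags = np.abs(np.fft.rfft(x[i*hop:i*hop+frame] * np.hanning(frame)))
        for k in np.flatnonzero(mags > rel_threshold * mags.max()):
            peaks.append((i, int(k)))
    return peaks

def save_fingerprint(peaks, path):
    """Store the reduced spectrogram as a plain text file, one peak per line."""
    with open(path, "w") as f:
        for frame_idx, bin_idx in peaks:
            f.write(f"{frame_idx} {bin_idx}\n")
```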

Write code that will accept a 2-3 second excerpt from a music signal, process it in the same way as your database of music files, and, following the fingerprinting algorithms you read about in the readings, slide the reduced spectrogram over the various signals in the database and find the best match for the given excerpt.
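A naive matcher (no hashing, so nothing like the efficiency of a real fingerprinting system) might just count coinciding (frame, bin) peaks at every offset:

```python
def match_score(excerpt_peaks, db_peaks):
    """Best number of coinciding peaks over all time offsets."""
    db = set(db_peaks)
    max_frame = max(f for f, _ in db_peaks)
    return max(sum((f + offset, b) in db for f, b in excerpt_peaks)
               for offset in range(max_frame + 1))

def best_match(excerpt_peaks, database):      # database: {name: list of peaks}
    return max(database, key=lambda name: match_score(excerpt_peaks, database[name]))
```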

Evaluate your algorithm by developing a "test suite" of examples from the database, which should be able to be identified precisely. Then, generate additional, more difficult, test signals, in which noise has been introduced: perhaps you could play the music file and record it on another device (so that it is not exactly the same file), or introduce noise by rerecording the music file with background noise, either a person talking, or another piece of music. Try to be realistic about what might happen when someone hears a piece of music he/she might like to submit to your system. Test your algorithm on all these examples.

Once you have the basic method working, try to refine it by playing with the parameters: window size, threshold for summarizing spectrograms, etc. Can you increase your success rate by these basic parameter tuning techniques? Go back to the literature and see if there are any other ideas that you could experiment with in the time you have. Again, do not worry about efficiency, only think about how to improve the recognition rate of your algorithm.

 

[13]* Alignment. Systems which compare two music signals or allow score-following work by creating a 2D "similarity matrix" comparing two signals (or a signal and a score) at every pair of window locations; the similarity is computed using a distance function on the spectra calculated in each window. Such a matrix can be used to line up two signals which differ in their tempo or general timing. In this project I would want you to learn about the alignment techniques (I have studied this, and there are some good survey papers as well) and then pick an application (e.g., score-to-music alignment) and demonstrate your system on several examples. (I will discuss time-warping and alignment in the lecture on 4/8.)
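For a sense of the machinery involved, here is a generic sketch (not any particular paper's algorithm): a distance-based cost matrix over two sequences of feature vectors, plus a basic dynamic-time-warping alignment.

```python
import numpy as np

def cost_matrix(A, B):
    """A, B: arrays of shape (n_frames, n_features), e.g., spectra or chroma."""
    return np.array([[np.linalg.norm(a - b) for b in B] for a in A])

def dtw(C):
    """Return (total cost, alignment path) for cost matrix C."""
    n, m = C.shape
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = C[i-1, j-1] + min(D[i-1, j-1], D[i-1, j], D[i, j-1])
    path, i, j = [], n, m                    # trace back the optimal path
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i-1, j-1], D[i-1, j], D[i, j-1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]
```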

[14]* Structure Analysis. Using the technique of similarity matrices described in the previous project, one can analyse a piece of music to find similarities among the various sections of the piece. The basic idea is to analyze the self-similarity by computing the similarity matrix of a music signal with itself. The resulting matrix (which can be displayed graphically) will have features corresponding to similar and dissimilar sections of the piece. I will talk about this at the last class, but there is at least one good, short research paper which explains the ideas fairly simply, and which you could use to implement your system.

[15]* Automatic Transcription. A very important problem in audio programming is attempting to transcribe a musical signal into either MIDI form, tablature, or standard musical notation. This has been studied for some time, and partial success has been achieved, but the problem is far from solved. For a semester project, I would expect that you would delimit the problem drastically, and attempt to transcribe monophonic melodies (one melodic line at a time) into some kind of text file representing the notes. A simple place to start would be with a guitar melody, and the transcription would be in the form of tablature, giving the notes in first position (within the first four frets) with their timing. This is a combination of pitch detection (as in HW05) with beat detection.

[16] Machine Learning of Musical Signals. This would be particularly good for anyone who has taken Machine Learning. Machine learning techniques are natural for musical signals. The basic idea here is to define a problem, for example, how could we automatically identify human speakers on the basis of a short recording? Then you would identify a number of "features" (= statistical properties of the signals, such as mean frequency), train a classifier, and then try to solve the problem. This was done last year in a very successful project, using bird songs. But you could really use just about any collection of sounds, for example, sounds collected around the city of Boston (classroom, subway, restaurant, street, church, etc., etc., etc.). Imagine a recording from the cell phone of a murder victim who was carried away from the scene of the crime: could you identify the location from the statistical properties of the background noise?
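A sketch of the feature-plus-classifier pipeline, assuming scikit-learn is available; the two features here (zero-crossing rate and spectral centroid) are only examples of the per-clip statistics you might compute.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def features(x, rate):
    zcr = np.mean(np.abs(np.diff(np.sign(x)))) / 2        # zero-crossing rate
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), 1.0 / rate)
    centroid = np.sum(freqs * spec) / np.sum(spec)        # spectral centroid
    return [zcr, centroid]

def train_classifier(clips, labels):
    """clips: list of (mono signal, rate); labels: class names (e.g., speakers)."""
    X = [features(x, rate) for x, rate in clips]
    return RandomForestClassifier(n_estimators=100).fit(X, labels)
```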

[17]* Analysis of Swing. You would need to use the techniques presented in the lecture on onset detection and rhythm analysis to analyze the tempo of a piece of jazz music, adjust the timing of the note onsets relative to this tempo, and then draw conclusions about the use of swing rhythm in the piece. Is it true that swing is "just" the conversion of straight eighths to a triplet rhythm? You should evaluate the statement, otherwise unsubstantiated (to my knowledge) in the literature, that swing becomes more like straight time as the tempo increases. Is there a different concept of swing among different eras or genres of jazz and blues?

[18]* Analysis of "Good Tone". Why do we think that certain instrumentalists (usually acoustic) have "great tone"? Wind players and vocalists have at least some substantive reason to think this is the case, because there are lots of reasons (having to do with the configuration of the vocal tract, for example) why experienced players sound better than beginners. It is (I think) a combination of resonance and appropriate vibrato and phrasing, produced after long study by skilled, relaxed, efficient tone production. For singing, this has been fairly well established as the "singer's formant," a region of the spectrum that can be measured and has to do with the dropped larynx of classical singers. For harmonica, it has a lot to do with resonance and vibrato. For piano, a lot of nonsense has been written about various pianists having "great tone," with physicists weighing in on the other side, explaining that the piano hammer, at the moment it hits the string, has NO connection with the key, and hence the player controls only one parameter, the velocity. I am not expecting you to solve this issue, but perhaps some statistical analysis of amateur vs expert players on some instrument could shed light on it.

[19] Analysis of Phonemes. Anyone with an interest in linguistics might want to pursue this---again, you know who you are! Write a program that avoids the whole problem of word separation and simply tries to identify phonemes in a voice signal. This would probably involve a sliding window that moves through the signal and tries to match the features of the window with the features of various consonants and vowels; for example, the "s" sound has a lot of high-frequency components, the plosives have a characteristic amplitude envelope, and each of the vowels has a particular "spectral signature" based on the configuration of the vocal tract. Try to characterize each of the phonemes and convert a voice signal into a list of phonemes. Further explorations of this topic could involve "time-warping" to remove the problem of tempo and duration from the comparison of a phoneme signal and the pattern.

Readings:

http://homepages.wmich.edu/~hillenbr/Papers/HillenbrandGayvertVowelClassificationBasedOnF0AndFormants.pdf

http://homepages.wmich.edu/~hillenbr/Papers/HillenbrandGettyClarkWheeler.pdf

http://www.ai.rug.nl/~tjeerd/publications/valkenier09.pdf

How to proceed:

First, learn the standard English vowel sounds circled in this diagram of the IPA chart:

This is from the following site, which includes short recordings of each of the vowels: http://en.wikipedia.org/wiki/IPA_vowel_chart_with_audio

Obtain recordings of each of these vowels (e.g., from the site above), or record them yourself (sometimes it helps to say a word that contains the vowel; you can find such words in a dictionary that gives IPA transcriptions).

Note that we are ignoring consonants and diphthongs for now.

Next, using a fairly large window size (e.g., .25 or .5 sec) generate a spectrum for each of these, and see if you can come up with a pattern that characterizes these vowels. If necessary, you might need to seek additional recordings of vowels for comparison.

Write a classifier which applies a distance function to the spectrum of an input vowel and a pattern, and reports the closest match. You should be able to identify each of these vowels, because the patterns were derived from the signals being recognized!
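A minimal version of that classifier might look like this (the smoothing width and FFT size are arbitrary choices):

```python
import numpy as np

def vowel_spectrum(x, n_fft=8192):
    """Normalized, lightly smoothed magnitude spectrum of a vowel recording."""
    x = x[:n_fft] * np.hanning(min(len(x), n_fft))
    spec = np.abs(np.fft.rfft(x, n=n_fft))
    spec = np.convolve(spec, np.ones(20) / 20, mode="same")
    return spec / spec.max()

def classify_vowel(x, patterns):
    """patterns: {vowel name: spectrum from vowel_spectrum}; nearest pattern wins."""
    s = vowel_spectrum(x)
    return min(patterns, key=lambda v: np.linalg.norm(s - patterns[v]))
```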

Next, generalize your classifier by recording your own voice saying these 9 vowels, plus several of your friends or family. Or, find one of the databases of voice recordings on the web, and develop a collection of samples for each of the nine vowels. Now you'll have to try to modify the classifier to recognize these; you probably can't do it perfectly, but try for the highest success rate you can.

Finally, if there is time, you might try to do the same for the consonants; however, in this case, it is not so much the spectrum as the combination of the spectrum and the envelope: the difference between "p" and "b" seems to be in the spectrum and in the sharpness of the onset. Some consonants, such as "s", could perhaps be recognized by the spectrum alone. Try to differentiate the various classes of consonants based on the spectrum, and then incorporate other features.

If you have time, you might try to isolate phonemes in spoken words by applying a sliding-window algorithm to search for the regions of the signal where various phonemes occur.

 

[21] Your Suggestion! If you have a special interest, and want to pursue something I have not mentioned here, please talk to me! I'm flexible and interested in everything having to do with computing and sound! But be sure to think realistically about what can be done with the resources you have. Most students are WAY too ambitious; try to find a smallish project and say something definite, rather than a big project which you cannot really finish and from which you cannot draw any definite conclusions.