The Blogg

April 6, 2010

MusicMap & Recommendations

Filed under: Computing,Music,Personal — chadhogg @ 1:28 pm

MusicMap is an example of a style of research that I have been interested in for some time and hope to branch into at some point in my career. The idea is to provide a two-dimensional model in which similar things are close to each other and dissimilar things are far apart. I could not find it stated explicitly anywhere, but my educated guess is that these relationships are based on data from last.fm, with the similarity between two musicians based on how frequently they appear together in the list of artists a user likes compared to how frequently only one of them appears.
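To make that guess concrete, here is a minimal sketch of one measure that fits the description: the Jaccard index, which divides the number of users who like both artists by the number who like at least one. This is only an illustration of the guess, not something MusicMap or last.fm documents; the user names and liked-artist lists are made up.

```python
from itertools import combinations

# Hypothetical data: each user mapped to the set of artists they like.
likes = {
    "user_a": {"Duke Ellington", "Lee Morgan", "Louis Armstrong"},
    "user_b": {"Duke Ellington", "Lee Morgan"},
    "user_c": {"Duke Ellington", "Eminem"},
    "user_d": {"Eminem", "Snoop Dogg"},
}

def fans_by_artist(likes):
    """Invert user -> artists into artist -> set of users who like them."""
    fans = {}
    for user, artists in likes.items():
        for artist in artists:
            fans.setdefault(artist, set()).add(user)
    return fans

def jaccard(fans, a, b):
    """Users who like both artists, divided by users who like at least one."""
    both = fans[a] & fans[b]
    either = fans[a] | fans[b]
    return len(both) / len(either) if either else 0.0

fans = fans_by_artist(likes)
for a, b in combinations(sorted(fans), 2):
    print(f"{a} / {b}: {jaccard(fans, a, b):.2f}")
```

A score of 1 means the two artists have exactly the same fans, and 0 means they share none. A map would then try to place high-scoring pairs close together.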

There are legitimate questions as to whether or not the same people liking two artists really makes them “similar”, and the process of trying to embed very high-dimensional data into the plane inevitably results in artifacts that appear to affirm relationships that do not actually exist. Looking at this map, is the music of Duke Ellington really that similar to the music of Eminem and that distinct from Lee Morgan? Are Morgan and Ellington really more similar to Snoop Dogg than to each other and very far from Louis Armstrong? What does Garth Brooks have to do with rhythm & blues?
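A toy case shows why the plane can force such artifacts. Suppose four artists are all equally similar to one another, so every pair should sit at distance 1. Three of them fit as an equilateral triangle, but the fourth has nowhere to go. The sketch below is purely illustrative, with made-up coordinates rather than anything taken from the map.

```python
import math

# Three mutually "equally similar" items fit in the plane as an
# equilateral triangle with side length 1.
a = (0.0, 0.0)
b = (1.0, 0.0)
c = (0.5, math.sqrt(3) / 2)

def dist(p, q):
    """Euclidean distance between two points in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# A fourth item at distance 1 from both a and b can only sit where the two
# unit circles around a and b intersect: at c itself, or at c's mirror image.
mirror = (0.5, -math.sqrt(3) / 2)

for label, d in (("on top of c", c), ("mirror image of c", mirror)):
    print(label, [round(dist(d, p), 3) for p in (a, b, c)])
```

Either the fourth artist lands exactly on top of the third (distance 0) or it ends up about 1.73 away from it. In both cases the map asserts a relationship that the data does not contain.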

Not surprisingly, my own interests tend toward the extremes of the map, away from the vast desert of mainstream pop in the center of the continent and the northwestern electronica steppes. Start with the jazz musicians at the southwestern coast; move eastward through soul, funk, and blues to classic rock; drift northward through hard rock and into heavy metal (but avoiding the peninsula of extremism); then tiptoe to the northwest, sampling a taste of modern rock but never quite comfortable until you reach the punk coast; continue through to the peninsula of ska; and from there take a boat east to the isle of reggae. Where do you draw your own citizenship?

I created a last.fm account for myself a few months ago. If you too have an account, please be my “friend”. As a way to keep track of what I have listened to and look for trends, I find the service very useful. I am not so sure about its utility as a recommendation system, however. Last.fm uses what appears to be a binary model of interest: either you have listened to a musician or you have not. Perhaps they use weights based on how often you listen to a band, but the fact that some artists are much more prolific than others would complicate that. There is no easy way to differentiate between that which you love, like, or merely tolerate. (It is possible to “love” individual tracks, but I do not think that this is used for recommendations.) More importantly, there is no way to distinguish between musicians that you have not listened to because you are unaware of them and musicians that you have not listened to because you hate their music. Any system that attempts to learn without any negative examples is going to have serious difficulties.
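The missing-negatives problem is easy to see in a small sketch. The listeners and their opinions below are hypothetical, and the code makes the simplifying assumption that people only play artists they like. It is not a description of how last.fm actually stores its data.

```python
ARTISTS = ["Metal Church", "Garth Brooks", "Lee Morgan"]

# Hypothetical true opinions: 1 = likes, -1 = dislikes, None = never heard of.
opinions = {
    "listener_1": {"Metal Church": 1, "Garth Brooks": -1, "Lee Morgan": None},
    "listener_2": {"Metal Church": 1, "Garth Brooks": None, "Lee Morgan": -1},
}

def binary_history(opinion):
    """What a listened / not-listened model sees.

    Assumption: a listener only plays artists they like, so disliked and
    unknown artists both show up as 0.
    """
    return [1 if opinion[artist] == 1 else 0 for artist in ARTISTS]

for name, opinion in opinions.items():
    print(name, binary_history(opinion))
```

Both listeners produce the same history even though one hates Garth Brooks and has never heard Lee Morgan, while the other is the reverse. A recommender that sees only these histories has no basis for treating them differently.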

Long ago I set out to create my own music recommender system for several class projects and my own interests, but found the task far too large for a single person. My system was to be album-based, so that it could work in spite of artists who have evolved significantly over their careers. Instead of the ambiguities in the last.fm data, users would be able to rate albums on a numeric scale, and would be encouraged to rate some music that they are familiar enough with to know that they do not like it. It would attempt to collect other data about albums (the year they were released, producers who worked on them, whether they are studio / live / compilation releases, etc.) and about users (age, gender, geographical location, etc.) to explore how those features might explain some users’ ratings. Users would be able to generate custom recommendations by choosing an algorithm (k-means clustering, singular value decomposition, …) and a data source (user ratings, album data, user demographics) instead of the default. Unfortunately, even if I had been able to find the time to implement all of this, collecting all of this data from a large sample of the population would be impossible. That is the genius of last.fm; while the informative content of the data may be weak, it is collected automatically from people who opt in.
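As a rough illustration of what explicit ratings with negative examples make possible, here is a minimal sketch of a neighborhood-based predictor. It is a simpler technique than the k-means or singular value decomposition options mentioned above, chosen only because it fits in a few lines. The users, album names, and ratings are all hypothetical.

```python
import math

# Hypothetical album ratings on a 1-5 scale; low scores are explicit
# negative examples.
ratings = {
    "ann": {"jazz_album": 5, "thrash_album": 1, "reggae_album": 4},
    "bob": {"jazz_album": 4, "thrash_album": 2, "reggae_album": 5,
            "country_album": 1},
    "cat": {"jazz_album": 1, "thrash_album": 5, "country_album": 4},
}

def mean(user):
    """Average rating the user has given."""
    scores = ratings[user].values()
    return sum(scores) / len(scores)

def centered(user):
    """Ratings relative to the user's own average."""
    m = mean(user)
    return {album: score - m for album, score in ratings[user].items()}

def similarity(u, v):
    """Cosine similarity of mean-centered ratings over albums both rated."""
    cu, cv = centered(u), centered(v)
    shared = cu.keys() & cv.keys()
    dot = sum(cu[a] * cv[a] for a in shared)
    norm = (math.sqrt(sum(cu[a] ** 2 for a in shared))
            * math.sqrt(sum(cv[a] ** 2 for a in shared)))
    return dot / norm if norm else 0.0

def predict(user, album):
    """User's average plus a similarity-weighted average of how far other
    raters of the album deviated from their own averages."""
    num = den = 0.0
    for other in ratings:
        if other == user or album not in ratings[other]:
            continue
        s = similarity(user, other)
        num += s * centered(other)[album]
        den += abs(s)
    return mean(user) + (num / den if den else 0.0)

print(round(predict("ann", "country_album"), 2))
```

Because bob rates like ann and disliked the country album, while cat rates opposite to ann and liked it, both neighbors push the prediction below ann's average. Neither signal would exist in listened / not-listened data.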

I became a member of Pandora back when you actually had to pay for an account. I love the idea of their Music Genome Project attempting to find similarity based on actual musical characteristics, but they often seem to find the most superficial relationships while ignoring the factors that are important to me. Their system has a tremendous knowledge engineering requirement to determine the “genetic code” of each song, and it is amazing that they have been able to accomplish this feat. But does it actually make good recommendations? Only partially, in my experience. Based on a playlist of thrash and mainstream metal, it has selected the song “Hitman” by Metal Church for me. This is good; I like the song. But Pandora has played that song for me dozens of times and never any other track by Metal Church. It is possible that this is the one and only song they ever wrote in the style of music that I enjoy (I’ve not yet actively sought to hear the rest of their catalog), but this seems unlikely. If one of the objectives is to help me discover new music that I would like, then a little variety would be nice.

2 Comments »

  1. I agree that Pandora could use a checkbox for “occasionally throw in songs by artists that I like, even though they don’t necessarily have the same Genome tags that you think I like”.

    I’ve found that you can help it along by adding in some of these additional artists, instead of particular songs.

    Comment by Keith — April 6, 2010 @ 1:43 pm

  2. I didn’t know you were hip to embeddings. A lot of what my group does is on that stuff. Your one comment, about inevitable artifacts, is not always true. My friend Ben is working on 3-dimensional embeddings of 4-dimensional data, and to avoid these artifacts (which you get in a simple projection), he uses an algorithm that takes a bunch of local projections and puts them together. Of course, this only works if the data can be faithfully represented in a lower dimension … if it can’t then this doesn’t work.

    Comment by Michaluk — April 7, 2010 @ 11:05 am
