1
An Analysis of Rate Your Music Ratings Is Today's Music Really Worse?
Aaron Levine
October 2016
2
Introduction
● Rate Your Music (RYM) is one of the largest online databases of music
– It's also the largest database that contains user-submitted reviews and ratings [1]
● Top albums from the most recent decade are rated lower than top albums from prior decades
– Is music just getting worse?
● Hypothesis: we already know which albums from the 1960s and 1970s are classics and rate them appropriately. Albums released after RYM went online are fiercely debated, and their average ratings decline accordingly
– This will be supported if the average rating declines after RYM launched
● Let's use data from RYM to quantify this effect!
[1] https://en.wikipedia.org/wiki/List_of_online_music_databases
3
Data Gathering
● RYM has no API
– Data gathered using HTML scraping and parsing
– Time-consuming, so only a limited amount of data was gathered for this small project
● Focus on top albums
– “Worst” albums from prior decades are forgotten, but more recent “worst” albums are ridiculed on the site and receive many ratings and reviews
● Entirely different effect, for a future project!
– Initial goal was to analyze all albums released between 1990 and 2014 with a rating over 3.5/5 and over 500 reviews
● RYM's ratings are strict: 3.5/5 is actually quite good for the site
● Scraped data from the top-of-year charts until all albums with a rating over 3.5/5 and over 500 reviews were collected
4
Data Selection
● RYM ranks albums by a proprietary algorithm that takes into account both the average rating and the number of ratings
– An album with 3.4/5 but 1000 ratings could be ranked above an album with 3.5/5 and 600 ratings
● After scraping enough HTML from the RYM charts to get every album with a rating over 3.5/5 and over 500 ratings for a given year, disregarding the rating requirement produced a Gaussian distribution
– Reason: the scrape also captured a tail at the lower end of the rating scale, made up of albums with more votes
● Ultimately, it doesn't make sense to apply a universal rating cutoff when the whole point of this project is to determine how much ratings vary from year to year!
● Final selection: scraped HTML for the “top” 680 albums per year from 1990 to 2014 and disregarded all albums with fewer than 500 votes
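The final selection rule can be sketched in a few lines of Python. This is a minimal sketch, not the scraper actually used. It assumes the chart pages have already been parsed into records. The `Album` fields and the `select_albums` helper are hypothetical names introduced for illustration, and the example records are made up.

```python
from dataclasses import dataclass


@dataclass
class Album:
    # Hypothetical record for one parsed chart entry; field names are illustrative.
    year: int
    rank: int      # position on RYM's year chart (1 = top)
    rating: float  # average rating out of 5
    votes: int     # number of ratings


def select_albums(albums, per_year=680, min_votes=500):
    """Keep the top `per_year` chart entries for each year, then drop low-vote albums.

    No rating cutoff is applied. Albums with fewer than `min_votes` votes are removed.
    """
    by_year = {}
    for album in albums:
        by_year.setdefault(album.year, []).append(album)

    selected = []
    for year in sorted(by_year):
        # Truncate by chart position, not by rating: RYM's chart order mixes rating and vote count.
        top = sorted(by_year[year], key=lambda a: a.rank)[:per_year]
        selected.extend(a for a in top if a.votes >= min_votes)
    return selected


# Made-up records, not real RYM data.
charts = [
    Album(year=1990, rank=1, rating=4.0, votes=900),
    Album(year=1990, rank=2, rating=3.9, votes=400),   # dropped: too few votes
    Album(year=1990, rank=3, rating=3.8, votes=1000),  # dropped: outside the top 2
]
print([a.rank for a in select_albums(charts, per_year=2)])  # [1]
```

Sorting by chart rank before truncating mirrors the point above that chart order is not rating order. The 680 cutoff therefore applies to chart position, and no universal rating threshold is involved.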
5
RYM Ratings by Year
6
1990
7
1991
8
1992
9
1993
10
1994
11
1995
12
1996
13
1997
14
1998
15
1999
16
2000
17
2001
RYM Launched
First year of RYM: mean lower than every earlier year except 1992 and 1999
18
2002
Second year of RYM: Lowest mean so far
19
2003
20
2004
Mean keeps declining...
21
2005
22
2006
23
2007
24
2008
User reviews added [1]
Immediately a 0.05 decline in mean from 2007!
Largest previous year-to-year change was 0.03
[1] https://en.wikipedia.org/wiki/Rate_Your_Music
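The comparison on this slide amounts to differencing consecutive yearly means. This is a minimal sketch that assumes the per-year means have already been computed. `mean_changes` and `largest_prior_change` are hypothetical helpers, and the numbers in the example are made up rather than taken from RYM.

```python
def mean_changes(yearly_means):
    """Return {year: mean(year) - mean(previous year)} for consecutive years."""
    years = sorted(yearly_means)
    return {
        later: yearly_means[later] - yearly_means[earlier]
        for earlier, later in zip(years, years[1:])
    }


def largest_prior_change(changes, before_year):
    """Largest absolute year-to-year change among years strictly before `before_year`."""
    return max(abs(delta) for year, delta in changes.items() if year < before_year)


# Made-up means for illustration only.
toy_means = {2005: 3.60, 2006: 3.58, 2007: 3.57, 2008: 3.52}
changes = mean_changes(toy_means)
print(round(changes[2008], 2))                       # -0.05
print(round(largest_prior_change(changes, 2008), 2))  # 0.02
```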
25
2009
26
2010
27
2011
28
2012
29
2013
30
2014
31
1990 vs 2014
32
Three Distinct RYM Ratings Eras
● Pre RYM (up to 2000)
– Top albums discussed in pop culture; users have solidified opinions before rating
● Transitional period (2001-2007)
– RYM online, but still in its early period; internet less universally prevalent, and user reviews not yet available
● Modern RYM (2008-present)
– Reviews added; average rating of top albums drops an unprecedented 0.05 in the first year of the modern era and continues declining throughout the era
– Increased awareness of good but not classic albums shrinks the width of the top-680 distribution
33
Modern RYM Era: A Closer Look
● Mean declines linearly throughout era
– Apply a linear regression to quantify the effect
● Result for year > 2007:
– f(x) = 33.15 – 0.15x
● Strong linear relationship
– R² = 0.9788
– Residual standard error: 0.0051
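The regression and its two fit statistics can be reproduced with ordinary least squares. This is a minimal standard-library sketch that assumes one (year, mean rating) point per year of the modern era. `fit_line` is a hypothetical helper, and the points in the example are synthetic, not the RYM means.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared, residual_standard_error).
    """
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points for a residual standard error")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    # Residual standard error uses n - 2 degrees of freedom (two fitted parameters).
    rse = math.sqrt(ss_res / (n - 2))
    return intercept, slope, r_squared, rse


years = list(range(2008, 2015))
toy_means = [3.60 - 0.01 * (y - 2008) for y in years]  # synthetic, exactly linear
intercept, slope, r2, rse = fit_line(years, toy_means)
print(round(slope, 4), round(r2, 4))  # -0.01 1.0
```

Regressing on the calendar year makes the intercept large. Centering x, for example as years since 2008, gives the same slope with an intercept that reads directly as the 2008 mean.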
34
Corrected RYM Ratings
● Correct Modern RYM ratings to account for drop in mean
● For each rating, use:
– $\text{corrected rating} = \left(\text{initial rating} - \text{mean(year)}\right) \times \dfrac{\sigma(\text{PreRYM})}{\sigma(\text{year})} + \text{mean(PreRYM)}$
● mean(year) is from linear regression from 2008-2014
● mean(PreRYM) = mean(1990-2000) = 3.66
● σ(PreRYM) = σ(1990-2000) = 0.16
● σ(year) = σ for each year of the modern RYM era
● This simple method produces a distribution from 2008-2014 with the same mean and σ as the 1990-2000 distribution
– Simplest way of correcting for this effect; many more sophisticated corrections could be implemented in the future
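The correction formula is a standardize-then-rescale transform. This is a minimal sketch that uses the slide's PreRYM constants. `corrected_rating` and `correct_year` are hypothetical names. Treating σ(year) as the population standard deviation of that year's ratings is an assumption, because the slides do not say which estimator was used.

```python
import statistics

PRE_RYM_MEAN = 3.66  # mean(1990-2000), from the slide
PRE_RYM_SD = 0.16    # σ(1990-2000), from the slide


def corrected_rating(rating, year_mean, year_sd,
                     ref_mean=PRE_RYM_MEAN, ref_sd=PRE_RYM_SD):
    """Shift and rescale one rating so its year's distribution matches the PreRYM mean and σ."""
    return (rating - year_mean) * ref_sd / year_sd + ref_mean


def correct_year(ratings, year_mean):
    """Correct every rating from one Modern RYM year.

    `year_mean` would come from the 2008-2014 regression line. σ(year) is taken
    here (an assumption) as the population standard deviation of the ratings.
    """
    year_sd = statistics.pstdev(ratings)
    return [corrected_rating(r, year_mean, year_sd) for r in ratings]


print(round(corrected_rating(3.60, 3.50, 0.20), 2))  # 3.74
```

If `year_mean` is that year's actual sample mean, the corrected ratings reproduce the PreRYM mean and σ exactly. If the regression value is used instead, as the slide describes, σ still matches exactly and the mean matches to within the regression's residuals.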
35
Corrected ModernRYM vs. PreRYM
36
Top 10: PreRYM and ModernRYM (Before Corrections)
Rating | Year | Artist | Album
4.22 | 1997 | Radiohead | Ok Computer
4.21 | 1991 | My Bloody Valentine | Loveless
4.18 | 1993 | Wu Tang Clan | Enter the Wu Tang
4.17 | 1994 | Nas | Illmatic
4.16 | 2000 | Radiohead | Kid A
4.15 | 1995 | GZA | Liquid Swords
4.14 | 2000 | Godspeed You Black Emperor | Lift Your Skinny Fists Like Antennas to Heaven
4.12 | 1994 | Portishead | Dummy
4.11 | 1996 | DJ Shadow | Endtroducing
4.09 | 1997 | Godspeed You Black Emperor | F♯ A♯ ∞
Nothing from ModernRYM is in top 10
37
Top 10: PreRYM and ModernRYM (After Corrections)
Rating | Year | Artist | Album
4.41 | 2012 | Kendrick Lamar | Good Kid Maad City
4.22 | 1997 | Radiohead | Ok Computer
4.21 | 1991 | My Bloody Valentine | Loveless
4.19 | 2010 | Kanye West | My Beautiful Dark Twisted Fantasy
4.19 | 2014 | D'Angelo | Black Messiah
4.18 | 1993 | Wu Tang Clan | Enter the Wu Tang
4.18 | 2012 | Swans | The Seer
4.17 | 1994 | Nas | Illmatic
4.16 | 2000 | Radiohead | Kid A
4.15 | 2009 | Vektor | Black Future
Mixture of ModernRYM and PreRYM in top 10
38
Results and Conclusions
● Data supports the hypothesis: ratings decline after the creation of RYM
● Simple corrections to ModernRYM to account for the ratings decline produce a top album list with an equal mixture of albums from both eras
– More sophisticated corrections taking into account number of votes and genre could be added as part of a larger future project
● Future work could also examine the volatile transition period from 2001-2007
– More difficult to examine because of a rapidly changing and growing user base
● Apparently I really need to listen to Kendrick Lamar's 2012 album!