Sharla Gelfand

February 2019: Visualizing my discogs collection

In parts one and two of this series I did a whole lot of API pulling and data cleaning to get my discogs collection into a tidy state. Now I'm finally ready to do something with it!

I want to be able to explore my collection on a map (😱) and also see what styles of music I like, from where, and how that has changed over time.

collection_data
## # A tibble: 169 x 11
##    release_id title format artist_id artist_name  year style city  country
##         <int> <chr> <chr>      <int> <chr>       <dbl> <chr> <chr> <chr>  
##  1    7496378 Demo  Tape     4619796 Mollot       2015 Hard… Toro… Canada 
##  2    4490852 Obse… "12\""   3192745 Una Bèstia…  2013 Hard… Barc… Spain  
##  3    5556486 Fuck… "12\""   2876549 Good Throb   2014 Post… Lond… UK     
##  4    9827276 I     "7\""    2769828 S.H.I.T.     2017 Hard… Toro… Canada 
##  5    9769203 Oído… "12\""   4282571 Rata Negra   2017 Punk  Madr… Spain  
##  6    7237138 A Ca… "7\""    3596548 Ivy          2015 Hard… New … USA    
##  7   13117042 Tash… "7\""    5211980 Tashme       2019 Hard… Toro… Canada 
##  8    7113575 Demo  Tape     4450861 Desgraciad…  2014 Hard… Calg… Canada 
##  9   10540713 Let … Tape     4273896 Phantom He…  2015 Post… Kans… USA    
## 10   11260950 Sub … Tape     5694086 Sub Space    2017 Hard… New … USA    
## # … with 159 more rows, and 2 more variables: lat <dbl>, long <dbl>

So, yes, I want to map my discogs collection all over the world.

Pretty much everything I know about spatial data is from Jesse Sadler's amazing blog post, Introduction to GIS with R, so I'm pulling this code heavily from there.

First, so that we don't have legend fatigue, I'm going to lump the least common music styles together. My collection is fairly dominated by a few things:

collection_data %>%
  count(style, sort = TRUE)
## # A tibble: 17 x 2
##    style                n
##    <chr>            <int>
##  1 Hardcore            78
##  2 Punk                37
##  3 Post-Punk           14
##  4 Indie Rock          13
##  5 Black Metal          4
##  6 New Wave             4
##  7 Shoegaze             4
##  8 Experimental         3
##  9 Hip Hop              2
## 10 Indie Pop            2
## 11 Pop Rock             2
## 12 Alternative Rock     1
## 13 Avantgarde           1
## 14 Grunge               1
## 15 Ska                  1
## 16 Stoner Rock          1
## 17 Synth-pop            1

and while I'd love to specifically look at New Wave releases across the world, it just doesn't make sense for that grand total of 4.

library(forcats)

collection_data <- collection_data %>%
  mutate(style = fct_lump(as_factor(style), 4))
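If it's not obvious what fct_lump() does, a toy example helps. This is only a sketch with a made-up `styles` vector (not my real collection): it keeps the 2 most common levels and folds everything else into a single "Other" level, the same way the call above keeps my top 4.

```r
library(forcats)

# Made-up vector of styles, purely for illustration
styles <- c(rep("Hardcore", 5), rep("Punk", 3), "Ska", "Grunge")

# Keep the 2 most common levels; everything else becomes "Other"
lumped <- fct_lump(as_factor(styles), n = 2)

# Count how many values ended up in each level
table(lumped)
```

Here Ska and Grunge both land in "Other", which is what keeps the legend short.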

Next, I'm converting my data frame into an sf object using the long and lat fields.

library(sf)
library(dplyr)

points_sf <- collection_data %>%
  filter(!is.na(lat)) %>%
  st_as_sf(coords = c("long", "lat"), crs = 4326)

points_sf
## Simple feature collection with 167 features and 9 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -123.13 ymin: -33.46 xmax: 139.77 ymax: 63.83
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 167 x 10
##    release_id title format artist_id artist_name  year style city  country
##         <int> <chr> <chr>      <int> <chr>       <dbl> <fct> <chr> <chr>  
##  1    7496378 Demo  Tape     4619796 Mollot       2015 Hard… Toro… Canada 
##  2    4490852 Obse… "12\""   3192745 Una Bèstia…  2013 Hard… Barc… Spain  
##  3    5556486 Fuck… "12\""   2876549 Good Throb   2014 Post… Lond… UK     
##  4    9827276 I     "7\""    2769828 S.H.I.T.     2017 Hard… Toro… Canada 
##  5    9769203 Oído… "12\""   4282571 Rata Negra   2017 Punk  Madr… Spain  
##  6    7237138 A Ca… "7\""    3596548 Ivy          2015 Hard… New … USA    
##  7   13117042 Tash… "7\""    5211980 Tashme       2019 Hard… Toro… Canada 
##  8    7113575 Demo  Tape     4450861 Desgraciad…  2014 Hard… Calg… Canada 
##  9   10540713 Let … Tape     4273896 Phantom He…  2015 Post… Kans… USA    
## 10   11260950 Sub … Tape     5694086 Sub Space    2017 Hard… New … USA    
## # … with 157 more rows, and 1 more variable: geometry <POINT [°]>
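One thing that's easy to flip in st_as_sf() is the order of coords: it wants x first and y second, which means longitude, then latitude. Here's a small sketch with two hand-typed, approximate city coordinates (made up for illustration, not pulled from my data). The filter(!is.na(lat)) step matters too, since as far as I know st_as_sf() won't build points from missing coordinates by default.

```r
library(sf)

# Approximate, hand-typed coordinates, for illustration only
cities <- data.frame(
  city = c("Toronto", "Vancouver"),
  lat  = c(43.65, 49.28),
  long = c(-79.38, -123.12)
)

# coords is c(x, y), i.e. longitude first, then latitude;
# crs = 4326 says the numbers are WGS84 degrees
cities_sf <- st_as_sf(cities, coords = c("long", "lat"), crs = 4326)

# X should hold the longitudes and Y the latitudes
st_coordinates(cities_sf)
```

If the X column came back holding the 43s and 49s, the coords would be the wrong way around and the points would end up nowhere near Canada.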

In order to visualize those, I need a map of the world so I have something to plot on top of (I mean, I guess I don't need to use the actual earth as a reference point, but I think we'd all appreciate it if I did).

library(rnaturalearth)

countries_sf <- ne_countries(scale = "medium", returnclass = "sf")

And then I can plot my collection! I'm using different colours for different music styles, and shapes for different formats.

To no surprise, the vast majority of my collection is from North America, with a real focus on the Pacific Northwest (I used to live in Vancouver ☂️) and Toronto/East Coast USA (there's just a lot of punk there, in general 🎸).

library(ggplot2)
library(paletteer)
library(plotly)

collection_plot <- ggplot() +
  geom_sf(data = countries_sf, fill = "white", size = 0.25, alpha = 0.5) +
  geom_sf(
    data = st_jitter(points_sf,
                     amount = 0.75),
    aes(color = style, shape = format,
        text = glue::glue('"{title}" by {artist_name}<br>{city}, {country}<br>{style} {format}')),
    alpha = 0.75,
    show.legend = FALSE,
    size = 2
  ) + 
  theme_bw() + 
  theme(legend.position = "none", 
        legend.title = element_blank(),
        axis.text = element_blank(), 
        axis.ticks = element_blank()) + 
  # paletteer 1.0.0 and later take the palette as one "package::palette" string
  scale_color_paletteer_d("rcartocolor::Pastel")

ggplotly(collection_plot, 
         tooltip = "text")
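The st_jitter() call is doing quiet but important work in that plot: lots of my records come from the same city, so without it the points would sit exactly on top of each other. A rough sketch of the idea, using three made-up points that share the same (approximate) Toronto coordinates:

```r
library(sf)

# Three made-up records that all share one (approximate) location
same_spot <- st_as_sf(
  data.frame(id = 1:3, long = rep(-79.38, 3), lat = rep(43.65, 3)),
  coords = c("long", "lat"),
  crs = 4326
)

set.seed(2019) # jitter is random, so fix the seed to make it repeatable

# Nudge each point by a random offset of up to 0.75 (in coordinate units,
# so degrees here) along each axis
jittered <- st_jitter(same_spot, amount = 0.75)

# The three points no longer share identical coordinates
st_coordinates(jittered)
```

The trade-off is that the dots are no longer exactly where the city is, which is fine for a world map but would be too sloppy for anything zoomed in.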

I'm also interested in the different eras of my music taste – do I like different kinds of music from different times? You know how to add the time dimension to a plot?

Animation 😎

Similar to spatial data, everything I know about animation is from one source: Thomas Lin Pedersen's talk about the gganimate package from RStudio conf.

I'm going to focus on North America, since that's where most of my collection is from. In a maybe blasphemous move, I'm laying the American states and Canadian provinces and territories over the map of the world's countries 😬

states_sf <- ne_states(country = c("Canada", "United States of America"), returnclass = "sf")

north_america_collection_plot <- ggplot() +
  geom_sf(data = countries_sf, fill = "white", size = 0.25) +
  geom_sf(data = states_sf, fill = NA, size = 0.25) + 
  geom_sf(
    data = st_jitter(points_sf %>% filter(year > 0),
                     amount = 0.75),
    aes(color = style, shape = format),
    alpha = 0.75,
    show.legend = "point",
    size = 3
  ) +
  theme_bw() + 
  theme(legend.title = element_blank(),
        legend.position = "bottom") + 
  guides(colour = guide_legend(override.aes = list(size=5, alpha = 1)),
         shape = guide_legend(override.aes = list(size=5, alpha = 1))) + 
  # paletteer 1.0.0 and later take the palette as one "package::palette" string
  scale_color_paletteer_d("rcartocolor::Pastel") + 
  coord_sf(xlim = c(-130, -65), ylim = c(23, 55), datum = NA)

Without animation, it's not bad.

north_america_collection_plot

With animation it's way cooler.

library(gganimate)

north_america_collection_plot + 
  transition_states(as.factor(year),
                    state_length = 3) + 
  ggtitle("{closest_state}") + 
  shadow_mark() 
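To pull apart what those three gganimate pieces are doing, here's the same recipe on a tiny made-up data frame (the `toy` data and the file name are placeholders, not anything from my collection). As I understand it, transition_states() makes one state per year, "{closest_state}" in the title gets filled in with the current year, and shadow_mark() leaves earlier years' points on screen so the collection builds up over time.

```r
library(ggplot2)
library(gganimate)

# Made-up data: two points per year, for illustration only
toy <- data.frame(
  year = rep(c(2013, 2015, 2017), each = 2),
  x    = c(1, 2, 2, 3, 3, 4),
  y    = c(1, 3, 2, 4, 1, 2)
)

anim <- ggplot(toy, aes(x, y)) +
  geom_point(size = 3) +
  # One animation state per year
  transition_states(as.factor(year), state_length = 3) +
  # {closest_state} is replaced with the current year as the animation plays
  ggtitle("{closest_state}") +
  # Keep earlier years' points on screen instead of erasing them
  shadow_mark()

# Rendering needs a renderer (for example the gifski package), and the file
# name below is just a placeholder:
# anim_save("toy-years.gif", animate(anim))
```

Dropping shadow_mark() from the chain would show each year on its own, which answers a slightly different question (what did I buy from each year, rather than how did the collection grow).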

I lived in the PNW from 2013 to 2017, and you can see a huuuge increase in music from there during that time. Pretty cool!

I think that's all I have. Bye!