Histograms
Visualizing summarized data
When visualizing raw data doesn't work
(music_top200
>> ggplot(aes("position", "streams", color = "country"))
+ geom_point()
) 
Calculating min and max streams
by_position = (
music_top200
>> group_by(_.position)
>> summarize(max_streams = _.streams.max(),
min_streams = _.streams.min())
)
by_position | position | max_streams | min_streams | |
|---|---|---|---|
| 0 | 1 | 12987027 | 13604 |
| 1 | 2 | 9163134 | 10801 |
| 2 | 3 | 8043475 | 9510 |
| ... | ... | ... | ... |
| 197 | 198 | 1606234 | 1472 |
| 198 | 199 | 1606153 | 1470 |
| 199 | 200 | 1597824 | 1470 |
200 rows × 3 columns
Plotting
(by_position
>> ggplot(aes("position", "max_streams"))
+ geom_point()
+ labs(title = "Top 200 hits - max streams overall")
) Plotting (result)
(by_position
>> ggplot(aes("position", "max_streams"))
+ geom_point()
+ labs(title = "Top 200 hits - max streams overall")
) 
Starting y-axis at 0
(by_position
>> ggplot(aes("position", "max_streams"))
+ geom_point()
+ expand_limits(y = 0)
+ labs(title = "Top 200 hits - max streams overall")) 
Calculating min and max streams
by_continent_position = (
music_top200
>> group_by(_.continent, _.position)
>> summarize(max_streams = _.streams.max(),
min_streams = _.streams.min())
)
by_continent_position | continent | position | max_streams | min_streams | |
|---|---|---|---|---|
| 0 | Africa | 1 | 94422 | 94422 |
| 1 | Africa | 2 | 74689 | 74689 |
| 2 | Africa | 3 | 67552 | 67552 |
| ... | ... | ... | ... | ... |
| 997 | Oceania | 198 | 225951 | 44570 |
| 998 | Oceania | 199 | 225492 | 44364 |
| 999 | Oceania | 200 | 225179 | 44291 |
1000 rows × 4 columns
Visualize
(by_continent_position
>> ggplot(aes("position", "max_streams", color = "continent"))
+ geom_point()
+ expand_limits(y = 0)
+ labs(title = "Top 200 hits - max streams overall")) 