I downloaded the 2018 Political Advertisement data from Snapchat’s website as a .zip file and converted it to a .csv file, which can be read here. Using the following two graphs, I will look at how demographics relating to age, gender, and country influence general use and advertisement targeting.
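library(dplyr)
# Count rows per country and age bracket, then convert to a percentage.
# summarize() drops the last grouping level (AgeBracket), so sum(freq)
# in mutate() is the total within each country.
snap_sum1 <- snap %>%
  group_by(CountryCode,AgeBracket) %>%
  summarize(freq=n()) %>%
  mutate(perc=freq/sum(freq)*100) %>%
  ungroup()
snap_sum1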
#R> # A tibble: 72 x 4
#R> CountryCode AgeBracket freq perc
#R> <fct> <fct> <int> <dbl>
#R> 1 australia "" 2 25
#R> 2 australia "24-" 1 12.5
#R> 3 australia "30+" 4 50
#R> 4 australia "35++" 1 12.5
#R> 5 belgium "17-24" 1 100
#R> 6 canada "" 15 50
#R> 7 canada "16-22" 1 3.33
#R> 8 canada "16-25" 6 20
#R> 9 canada "18-26" 1 3.33
#R> 10 canada "24-" 1 3.33
#R> # … with 62 more rows
Here I am summarizing my data for my second plot. The code counts the rows for each combination of country and age bracket (freq) and then converts each count to a percentage of its country's total (perc), so that I can see what share of each country's entries falls in each age bracket.
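As a sanity check on the percentage step, the same within-country shares can be computed in base R. This is a minimal sketch on hypothetical toy data (the `toy` data frame is made up for illustration and is not the Snapchat file); it only shows that dividing each count by its country's total makes every country sum to 100.

```r
# Hypothetical toy data: one row per record, not the real Snapchat file
toy <- data.frame(
  CountryCode = c("australia", "australia", "australia", "australia", "belgium"),
  AgeBracket  = c("30+", "30+", "24-", "35++", "17-24")
)

# Count each country / age-bracket combination
counts <- table(toy$CountryCode, toy$AgeBracket)

# margin = 1 divides each row (country) by its own row total
perc <- prop.table(counts, margin = 1) * 100

print(round(perc, 1))
print(rowSums(perc))  # every country sums to 100
```

Because the percentages are relative to each country, a country with a single record shows 100 in its only bracket, which is why belgium appears as 100 in the summary table above.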
library(ggplot2)
# Spend vs. impressions, coloured by country, one panel per Gender value
p <- ggplot(data=snap,mapping=aes(x=Spend,y=Impressions,colour=CountryCode,fill=CountryCode)) +
  geom_point(pch=21,alpha=0.5,size=2) +
  # Axis labels are rescaled to thousands of dollars and millions of views
  scale_x_continuous(name="Thousands($) Spent On Campaign",breaks=seq(0,20000,10000),labels=scales::unit_format(unit="",scale=1/1000),expand=expansion(mult=c(0,0.02))) +
  scale_y_continuous(name="Millions of Views by Snapchatters",labels=scales::unit_format(unit="",scale=1/1000000),expand=expansion(mult=c(0,0.02))) +
  # One straight-line fit per country within each panel
  geom_smooth(method="lm",se=FALSE) +
  theme_classic() +
  theme(panel.grid.minor=element_blank()) +
  facet_wrap(vars(Gender)) +
  labs(title="$ Spent by Advertiser vs Impressions",subtitle="by Gender and Country") +
  theme(plot.background=element_rect(fill="linen",color="blue"),legend.background=element_rect(fill="linen"))
p
#R> `geom_smooth()` using formula 'y ~ x'
This graph attempts to show how gender affects the money spent on advertising and how well the advertisements are received. However, since a large portion of Snapchat’s users do not specify their gender, the majority of the data falls in the unlabeled “other” gender panel. The scatterplot shows every data point, with a line of best fit drawn over the top for each country. The lines, added with geom_smooth using method="lm", particularly help outline the higher number of views in France compared to other countries.
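Since colour is mapped to CountryCode, geom_smooth(method="lm") fits a separate straight line for each country in each panel. A rough way to put numbers on those lines is to fit the same model per country with lm(). The sketch below uses hypothetical toy data (countries "A" and "B" and their values are invented for illustration), not the real Spend and Impressions columns.

```r
# Hypothetical toy data standing in for Spend / Impressions by country
toy <- data.frame(
  CountryCode = rep(c("A", "B"), each = 3),
  Spend       = c(1, 2, 3, 1, 2, 3),
  Impressions = c(10, 20, 30, 5, 7, 9)
)

# Fit Impressions ~ Spend separately for each country and keep the slope,
# i.e. the extra impressions associated with one more unit of spend
slopes <- sapply(split(toy, toy$CountryCode), function(d) {
  coef(lm(Impressions ~ Spend, data = d))[["Spend"]]
})

print(slopes)
```

A steeper slope for one country would correspond to a steeper line in the plot, which is the pattern described for France.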
# Heatmap: fill is each age bracket's percentage within its country
p <- ggplot(data=snap_sum1,mapping=aes(x=AgeBracket,y=CountryCode,fill=perc)) +
  geom_tile() +
  # Discrete axis: tick labels come from the AgeBracket levels
  scale_x_discrete(name="Age Bracket") +
  scale_y_discrete(name="Country") +
  labs(title="Density of Snapchat users",subtitle="by Age and Country") +
  theme_grey() +
  theme(plot.background=element_rect(fill="linen",color="blue"),legend.background=element_rect(fill="linen"))
p
Using the summarized data I compiled earlier, I have used geom_tile to display the distribution of Snapchat users by age bracket and country. The fill colour is perc, each age bracket's share within its own country. You can see that the United States has by far the greatest density and by far the largest number of people accessing Snapchat. Most countries' average user falls in the middle of the graphic, which represents people in their 20s.
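Because perc is a within-country share, comparing countries by absolute size is easier with the freq column. The following is a minimal sketch of that comparison, assuming a summary table shaped like snap_sum1; the `toy_sum` values are hypothetical and chosen only for illustration.

```r
# Hypothetical summary rows in the same shape as snap_sum1 (CountryCode, freq)
toy_sum <- data.frame(
  CountryCode = c("australia", "australia", "canada", "canada", "belgium"),
  freq        = c(2, 4, 15, 6, 1)
)

# Total count per country, largest first
totals <- aggregate(freq ~ CountryCode, data = toy_sum, FUN = sum)
totals <- totals[order(-totals$freq), ]

print(totals)
```

Running the same two lines on snap_sum1 would give the per-country totals that back up a claim about which country has the most entries.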