Critique and Create Project 1: Amounts
Introduction
These projects ask you to critique a series of visualizations and then to apply best practices as you create your own. Follow along below, but ultimately do the work using the corresponding .qmd file on Posit Cloud.
This week’s data
The spills2019 data set is sourced from the U.S. Coast Guard’s National Response Center. Take a look at it to understand the kind of data included. Only 100 rows are shown here, but the full set we’re using has over 30,000.
Critique
As an example, I’ll show two visualizations of amounts that can be drawn from this data set. Pay attention to the techniques (the code) used to create the visualizations, but also be willing to judge the merits of both. In other words, consider how each visualization was made, and why each visualization is well made or not well made.
For this first project, the polished visualization does show you the code used to create it. In addition to this code, consult the visualizing amounts methods page.
Rough figure
Code
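library(tidyverse)  # provides drop_na(), count(), slice_head(), and ggplot()

# tabulate the 20 most common remedial actions
rough_draft <-
  spills2019 |>
  drop_na(DESC_REMEDIAL_ACTION) |>
  count(DESC_REMEDIAL_ACTION, sort = TRUE) |>
  slice_head(n = 20)

# draw a horizontal bar chart, with bars sorted by count
rough_draft |>
  ggplot(aes(
    y = reorder(DESC_REMEDIAL_ACTION, n),
    x = n
  )) +
  geom_col()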
For this visualization, I used a few techniques to make a working visualization.
1. Dropped any rows that had NA in the DESC_REMEDIAL_ACTION column.
2. Used count() to tabulate the number of times each value was used in that column, and sorted the result.
3. Used slice_head() to limit the resulting table to 20 rows.
4. Used ggplot() and geom_col() to create a bar chart.
5. Set the X-axis to the column called n, made in step 2.
6. Sorted the Y-axis by the values of n using reorder().
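To see what the data-preparation steps produce before any plotting, here is a minimal sketch on a made-up data set. The `toy` tibble and its `action` column are hypothetical stand-ins for spills2019 and DESC_REMEDIAL_ACTION; only the functions are the ones used above.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical toy data; not drawn from spills2019
toy <- tibble(
  action = c("NONE", "BOOMS APPLIED", "NONE", NA,
             "NONE", "BOOMS APPLIED", "REPAIRS MADE")
)

toy_counts <-
  toy |>
  drop_na(action) |>              # drop rows where action is NA
  count(action, sort = TRUE) |>   # tabulate each value into a column called n
  slice_head(n = 2)               # keep only the two most common values

toy_counts

# reorder() turns action into a factor whose levels follow n (smallest first),
# which is why the largest bar ends up at the top of the Y-axis
toy_levels <- levels(reorder(toy_counts$action, toy_counts$n))
toy_levels
```

Printing `toy_counts` at each stage of the pipe is a quick way to check that a step did what you expected before handing the table to ggplot().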
Compare this chart to the improved versions in the next two sections.
Half-improved figure
Code
rough_draft |>
  ggplot(aes(y = reorder(DESC_REMEDIAL_ACTION, n),
             x = n)) +
  geom_col(aes(fill = DESC_REMEDIAL_ACTION == "NONE"),
           show.legend = FALSE) +
  theme_minimal() +
  labs(x = "Incidents",
       y = "Remedial action",
       title = "Many environmental spills recorded in 2019 had no remedial action.") +
  scale_x_continuous(expand = c(0, 0),
                     labels = scales::label_comma()) +
  theme(panel.grid.major.y = element_blank(),
        plot.title.position = "plot")
This chart starts from the same data, which is a little messy. What does it improve? What more is it doing, and what less does it show? Where does it still fall short? (No need to answer these!)
Polished figure
Code
polished_draft <-
  spills2019 |>
  drop_na(DESC_REMEDIAL_ACTION) |>
  # count the total number of incidents
  mutate(total_num = n()) |>
  # clean up some messy records
  mutate(DESC_REMEDIAL_ACTION = DESC_REMEDIAL_ACTION |>
           str_remove_all("[.]$") |>
           str_replace_all("NOTIFICATIONS", "NOTIFICATION")) |>
  # split up remedial actions when multiple actions are recorded
  mutate(DESC_REMEDIAL_ACTION = strsplit(DESC_REMEDIAL_ACTION, ", ")) |>
  unnest_longer(DESC_REMEDIAL_ACTION) |>
  # change remedial actions from uppercase to mixed case
  mutate(DESC_REMEDIAL_ACTION = DESC_REMEDIAL_ACTION |>
           str_to_sentence()) |>
  # instead of count(), use group_by() and summarize() to keep total_num
  group_by(total_num, DESC_REMEDIAL_ACTION) |>
  summarize(number = n()) |>
  ungroup() |>
  # put rows in order by number
  arrange(desc(number)) |>
  # take the top 20
  slice_head(n = 20)

polished_draft |>
  ggplot(aes(y = reorder(DESC_REMEDIAL_ACTION, number),
             x = number / total_num)) +
  geom_col(aes(fill = DESC_REMEDIAL_ACTION == "None"),
           show.legend = FALSE) +
  theme_minimal() +
  labs(x = "Incidents",
       y = "Remedial action",
       title = "Many environmental spills recorded in 2019 had no remedial action.") +
  scale_x_continuous(expand = c(0, 0),
                     labels = scales::label_percent()) +
  scale_fill_manual(values = c("black", "red")) +
  theme(panel.grid.major.y = element_blank(),
        plot.title.position = "plot")
Critique the visualization
This last version of the figure uses a different technique to prepare the data. Hopefully it’s a stronger visualization in the end.
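One step the polished code adds is splitting records that list several remedial actions, so that each action is counted on its own. The sketch below tries that step on a made-up data set; the `toy` tibble and its `action` column are hypothetical, and count() is used here only to keep the example short.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)

# Hypothetical toy data: the first record lists two actions and ends in a period
toy <- tibble(
  action = c("BOOMS APPLIED, ABSORBENTS APPLIED.", "NONE", "ABSORBENTS APPLIED")
)

toy_split <-
  toy |>
  mutate(action = str_remove_all(action, "[.]$")) |>  # remove a trailing period
  mutate(action = strsplit(action, ", ")) |>          # one list element per record
  unnest_longer(action) |>                            # one row per action
  mutate(action = str_to_sentence(action)) |>         # "NONE" becomes "None"
  count(action, sort = TRUE)

toy_split
```

Three records become four rows after the split. That is why the polished code stores total_num before splitting: the shares on the X-axis stay relative to the number of incidents, not the number of actions.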
Wilke chapters 22, 23, and 29 lay out some important practices for visualizations. Consider them as you answer these three questions:
- In what noticeable ways has the polished version improved upon the first rough draft?
- Which of these changes do you think made the biggest difference?
- Is there anything you see that could still be improved?
Rough table
In addition to an attractive figure, it is sometimes helpful to show the numbers in a table. Both the rough table and the polished table will start from the polished data set prepared above, since that data is already a little cleaner.
Code
library(gt)  # provides gt() and the table-formatting functions

polished_draft |>
  gt()

| total_num | DESC_REMEDIAL_ACTION | number |
|---|---|---|
| 25265 | None | 2081 |
| 25265 | Investigation underway | 1457 |
| 25265 | Absorbents applied | 1231 |
| 25265 | Clean up underway | 1211 |
| 25265 | Booms applied | 1033 |
| 25265 | Contractor has been hired | 942 |
| 25265 | Made notification | 638 |
| 25265 | Clean up crew on-site | 546 |
| 25265 | Notification | 505 |
| 25265 | Making notification | 489 |
| 25265 | Cleanup completed | 482 |
| 25265 | Dissipate naturally | 387 |
| 25265 | Investigation is underway | 304 |
| 25265 | Material contained | 289 |
| 25265 | Shutdown system | 251 |
| 25265 | Clean up crew enroute | 244 |
| 25265 | Vac truck used | 227 |
| 25265 | Isolated area | 196 |
| 25265 | Secured operations | 132 |
| 25265 | Repairs made | 128 |
This rough table doesn’t provide much more detail than the visualization, and it unhelpfully includes one column of repeating values.
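The repeated total_num value is more useful as a denominator than as a column of its own. As a rough check, the sketch below uses base R and the first three counts from the table above to turn each count into a share of all incidents; `share` and `share_label` are names chosen here for illustration.

```r
# Values copied from the first three rows of the rough table above
total_num <- 25265
number <- c(2081, 1457, 1231)

# Each count as a proportion of all incidents
share <- number / total_num

# Format the proportions as percentages with two decimal places
share_label <- sprintf("%.2f%%", 100 * share)
share_label
```

Shown this way, a single percent column carries the same information as the repeated total, and the reader no longer has to do the division.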
Polished table
Code
polished_draft |>
  slice_head(n = 10) |>
  mutate(percent = number / total_num) |>
  select(-total_num) |>
  gt() |>
  fmt_percent(columns = percent) |>
  fmt_number(columns = number, decimals = 0) |>
  tab_spanner(label = "Incidents", columns = c("number", "percent")) |>
  cols_label(matches("percent") ~ "%",
             matches("number") ~ "n",
             contains("DESC_R") ~ "Remedial action") |>
  opt_stylize(style = 6, color = "cyan") |>
  tab_header(title = "Responding to environmental hazards",
             subtitle = "USCG National Response Center, 2019")

Responding to environmental hazards
USCG National Response Center, 2019

| Remedial action | Incidents (n) | Incidents (%) |
|---|---|---|
| None | 2,081 | 8.24% |
| Investigation underway | 1,457 | 5.77% |
| Absorbents applied | 1,231 | 4.87% |
| Clean up underway | 1,211 | 4.79% |
| Booms applied | 1,033 | 4.09% |
| Contractor has been hired | 942 | 3.73% |
| Made notification | 638 | 2.53% |
| Clean up crew on-site | 546 | 2.16% |
| Notification | 505 | 2.00% |
| Making notification | 489 | 1.94% |
Critique the table
The second table took more effort to prepare. At least some of that effort was worth it.
Wilke chapter 22 discusses some principles of table design. Consider it as you answer the first of these two questions:
- Compare the end results. Which parts of the second table show an improvement from the first?
- Consider the code used to generate the second table. Which steps are unclear or need explanation? (You’re welcome to tinker with the code to see how slight changes make a difference, but please start by considering the example provided for you.)
Create
Recreating a visualization
Taking inspiration from the code for the Rough figure section above, modify spills2019 until it has 20 rows and the first few rows look like this:

Then use that table to recreate the following visualization:

Write code to recreate this table and figure.
Improving a visualization
Do something to the above visualization to improve it, or create a new visualization of some amounts from this data set. Your final product doesn’t have to be as fully polished as the Polished figure shown above, but do consider the relative strengths and weaknesses of the visualization you’ve just recreated. Write your code here:
Write code to improve upon this figure.
Creating a table
Finally, take inspiration from the Polished table section above to create a table of numbers here. The final result can be quite simple.
Write code to create a table.