Critique and Create Project 1: Amounts

Introduction

These projects ask you to critique a series of visualizations and then to apply best practices as you create your own. Follow along below, but ultimately do the work using the corresponding .qmd file on Posit Cloud.

This week’s data

The spills2019 data set is sourced from the U.S. Coast Guard’s National Response Center. Take a look at it to understand the kind of data included. Only 100 rows are shown here, but the full set we’re using has over 30,000.

Critique

As an example, I’ll show three versions of a visualization of amounts that can be drawn from this data set. Pay attention to the techniques (the code) used to create the visualizations, but also be willing to judge the merits of each. In other words, consider how each visualization was made, and why it is or isn’t well made.

For this first project, the polished visualization does show you the code used to create it. In addition to this code, consult the visualizing amounts methods page.

Rough figure

library(tidyverse)

rough_draft <- 
  spills2019 |> 
  drop_na(DESC_REMEDIAL_ACTION) |> 
  count(DESC_REMEDIAL_ACTION, sort = TRUE) |> 
  slice_head(n = 20)

rough_draft |>
  ggplot(aes(
    y = reorder(DESC_REMEDIAL_ACTION, n),
    x = n
  )) +
  geom_col()

To make this working visualization, I used a few techniques:

  1. Dropped any rows that had NA in the DESC_REMEDIAL_ACTION column.
  2. Used count() to tabulate the number of times each value was used in that column, and sorted the result.
  3. Used slice_head() to limit the resulting table to 20 rows.
  4. Used ggplot() and geom_col() to create a bar chart.
  5. Set the X-axis to the column called n, made in step 2.
  6. Sorted the Y-axis by the values of n using reorder().

Compare this chart to the improved versions in the next two sections.
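To see what steps 1 to 3 and step 6 do to the data, here is a minimal sketch on a tiny made-up tibble. The column name matches spills2019, but the values and the object names `toy` and `toy_counts` are hypothetical, chosen only for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical values; only the column name comes from spills2019
toy <- tibble(
  DESC_REMEDIAL_ACTION = c("NONE", "BOOMS APPLIED", NA, "NONE",
                           "ABSORBENTS APPLIED", "NONE", "BOOMS APPLIED")
)

toy_counts <-
  toy |>
  drop_na(DESC_REMEDIAL_ACTION) |>            # step 1: drop the NA row
  count(DESC_REMEDIAL_ACTION, sort = TRUE) |> # step 2: one row per value, counts in n, largest first
  slice_head(n = 2)                           # step 3: keep only the top rows

toy_counts

# Step 6: reorder() turns the labels into a factor whose levels are ordered by n
# (smallest first). ggplot2 draws the first level at the bottom of a discrete
# y-axis, so the largest count ends up at the top.
levels(reorder(toy_counts$DESC_REMEDIAL_ACTION, toy_counts$n))
```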

Half-improved figure

rough_draft |> 
  ggplot(aes(y = reorder(DESC_REMEDIAL_ACTION, n),
             x = n)) +
  geom_col(aes(fill = DESC_REMEDIAL_ACTION=="NONE"),
           show.legend = FALSE) +
  theme_minimal() +
  labs(x = "Incidents",
       y = "Remedial action",
       title = "Many environmental spills recorded in 2019 had no remedial action.") +
  scale_x_continuous(expand = c(0,0), 
                     labels = scales::label_comma()) +
  theme(panel.grid.major.y = element_blank(),
        plot.title.position = "plot")

This chart starts from the same data, which is a little messy. What does it improve? What does it add, and what does it leave out? Where does it still fall short? (No need to answer these!)
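The highlighted bar in this chart comes from mapping `fill` to a comparison rather than to a column. The sketch below uses hypothetical counts (the tibble and the names `p` and `p_manual` are made up for illustration) to show that `DESC_REMEDIAL_ACTION == "NONE"` yields one TRUE or FALSE per bar, which ggplot2 treats as a two-level discrete scale.

```r
library(dplyr)
library(ggplot2)

# Hypothetical counts shaped like rough_draft
toy_counts <- tibble(
  DESC_REMEDIAL_ACTION = c("NONE", "BOOMS APPLIED", "ABSORBENTS APPLIED"),
  n = c(3L, 2L, 1L)
)

# The comparison inside aes() produces one logical value per bar
toy_counts$DESC_REMEDIAL_ACTION == "NONE"

p <- toy_counts |>
  ggplot(aes(y = reorder(DESC_REMEDIAL_ACTION, n), x = n)) +
  geom_col(aes(fill = DESC_REMEDIAL_ACTION == "NONE"), show.legend = FALSE)

# Unnamed manual colors are matched to the levels in order: FALSE, then TRUE
p_manual <- p + scale_fill_manual(values = c("black", "red"))
```

Because FALSE sorts before TRUE, the unnamed colors passed to `scale_fill_manual()` go to the non-matching bars first and the matching bar second; that is how a version with `values = c("black", "red")` draws only the highlighted bar in red.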

Polished figure

polished_draft <- 
  spills2019 |> 
  drop_na(DESC_REMEDIAL_ACTION) |> 
  
  # count the total number of incidents
  mutate(total_num = n()) |> 
  
  # clean up some messy records
  mutate(DESC_REMEDIAL_ACTION = DESC_REMEDIAL_ACTION |> 
           str_remove_all("[.]$") |> 
           str_replace_all("NOTIFICATIONS", "NOTIFICATION")) |> 
  
  # split up remedial actions when multiple actions are recorded
  mutate(DESC_REMEDIAL_ACTION = strsplit(DESC_REMEDIAL_ACTION, ", ")) |> 
  unnest_longer(DESC_REMEDIAL_ACTION) |> 
  
  # change remedial actions from uppercase to mixed case
  mutate(DESC_REMEDIAL_ACTION = DESC_REMEDIAL_ACTION |> 
           str_to_sentence()) |> 
  
  # instead of count(), use group_by() and summarize() to keep total_num
  group_by(total_num, DESC_REMEDIAL_ACTION) |> 
  summarize(number = n()) |> 
  ungroup() |> 
  
  # put rows in order by number
  arrange(desc(number)) |> 
  
  # take the top 20
  slice_head(n = 20)

polished_draft |> 
  ggplot(aes(y = reorder(DESC_REMEDIAL_ACTION, number),
             x = number/total_num)) +
  geom_col(aes(fill = DESC_REMEDIAL_ACTION=="None"),
           show.legend = FALSE) +
  theme_minimal() +
  labs(x = "Incidents",
       y = "Remedial action",
       title = "Many environmental spills recorded in 2019 had no remedial action.") +
  scale_x_continuous(expand = c(0,0), 
                     labels = scales::label_percent()) +
  scale_fill_manual(values = c("black", "red")) +
  theme(panel.grid.major.y = element_blank(),
        plot.title.position = "plot")

Critique the visualization

This last version of the figure takes a different approach to preparing the data. Hopefully it’s a stronger visualization in the end.
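Most of the new work happens before plotting: the text is cleaned, and records that list several actions are split so each action is counted on its own. Here is a sketch of those steps on three hypothetical raw entries (invented for illustration, though they mimic the trailing periods, plural "NOTIFICATIONS", and comma-separated lists that the polished code targets).

```r
library(dplyr)
library(tidyr)
library(stringr)

# Hypothetical raw entries, not taken from spills2019
raw <- tibble(
  DESC_REMEDIAL_ACTION = c("NONE.", "MADE NOTIFICATIONS, BOOMS APPLIED", "NONE")
)

cleaned <-
  raw |>
  mutate(DESC_REMEDIAL_ACTION = DESC_REMEDIAL_ACTION |>
           str_remove_all("[.]$") |>                            # drop a trailing period
           str_replace_all("NOTIFICATIONS", "NOTIFICATION")) |> # merge plural into singular
  mutate(DESC_REMEDIAL_ACTION = strsplit(DESC_REMEDIAL_ACTION, ", ")) |> # list column
  unnest_longer(DESC_REMEDIAL_ACTION) |>                       # one row per action
  mutate(DESC_REMEDIAL_ACTION = str_to_sentence(DESC_REMEDIAL_ACTION))  # "NONE" -> "None"

cleaned$DESC_REMEDIAL_ACTION
```

Because one incident can now contribute more than one row, the total number of incidents has to be computed before the split, which is why the polished pipeline creates `total_num` first.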

You respond

Wilke chapters 22, 23, and 29 lay out some important practices for visualizations. Consider them as you answer these three questions:

  1. In what noticeable ways has the polished version improved upon the first rough draft?
  2. Which of these changes do you think made the biggest difference?
  3. Is there anything you see that could still be improved?

Rough table

In addition to an attractive figure, it is sometimes helpful to show the numbers in a table. Both the rough table and the polished table will start from the polished data set prepared above, since that data is already a little cleaner.

library(gt)

polished_draft |> 
  gt()
total_num  DESC_REMEDIAL_ACTION       number
    25265  None                         2081
    25265  Investigation underway       1457
    25265  Absorbents applied           1231
    25265  Clean up underway            1211
    25265  Booms applied                1033
    25265  Contractor has been hired     942
    25265  Made notification             638
    25265  Clean up crew on-site         546
    25265  Notification                  505
    25265  Making notification           489
    25265  Cleanup completed             482
    25265  Dissipate naturally           387
    25265  Investigation is underway     304
    25265  Material contained            289
    25265  Shutdown system               251
    25265  Clean up crew enroute         244
    25265  Vac truck used                227
    25265  Isolated area                 196
    25265  Secured operations            132
    25265  Repairs made                  128

This rough table doesn’t provide much more detail than the visualization, and it unhelpfully includes one column of repeating values.
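The repeating column is `total_num`, which the polished pipeline carries through `group_by()` so that it survives the summary. This sketch, on a few hypothetical rows (the values and object names are made up), contrasts that with `count()` on the action column alone, then turns `total_num` into a percent and drops it, as the polished table does.

```r
library(dplyr)

# Hypothetical cleaned rows; total_num was computed before any splitting
toy <- tibble(
  total_num = 5L,
  DESC_REMEDIAL_ACTION = c("None", "None", "Booms applied", "None", "Absorbents applied")
)

# count() on the action column keeps only that column plus n
counted <- toy |> count(DESC_REMEDIAL_ACTION)

# Grouping by total_num as well keeps it, but it repeats on every row
summarized <-
  toy |>
  group_by(total_num, DESC_REMEDIAL_ACTION) |>
  summarize(number = n()) |>
  ungroup()

# Use the repeated value once, as a denominator, then remove it
with_percent <-
  summarized |>
  mutate(percent = number / total_num) |>
  select(-total_num)

with_percent
```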

Polished table

polished_draft |> 
  slice_head(n = 10) |> 
  mutate(percent = number / total_num) |> 
  select(-total_num) |> 
  gt() |> 
  fmt_percent(columns = percent) |> 
  fmt_number(columns = number, decimals = 0) |> 
  tab_spanner(label = "Incidents", columns = c("number", "percent")) |> 
  cols_label(matches("percent") ~ "%",
             matches("number") ~ "n",
             contains("DESC_R") ~ "Remedial action") |> 
  opt_stylize(style = 6, color = "cyan") |> 
  tab_header(title = "Responding to environmental hazards",
             subtitle = "USCG National Response Center, 2019")
Responding to environmental hazards
USCG National Response Center, 2019

                             Incidents
Remedial action                n       %
None                       2,081   8.24%
Investigation underway     1,457   5.77%
Absorbents applied         1,231   4.87%
Clean up underway          1,211   4.79%
Booms applied              1,033   4.09%
Contractor has been hired    942   3.73%
Made notification            638   2.53%
Clean up crew on-site        546   2.16%
Notification                 505   2.00%
Making notification          489   1.94%
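The % column is just `number / total_num`. As a quick arithmetic check, this base R sketch reproduces a few rows of the polished table from the counts in the rough table, showing two decimals (gt's `fmt_percent()` default) and adding thousands separators as `fmt_number(decimals = 0)` does here. Only the numbers come from the tables above; the variable names are illustrative.

```r
# Counts copied from the rough table above
total_num <- 25265
number <- c(None = 2081, `Investigation underway` = 1457,
            Notification = 505, `Making notification` = 489)

# percent = number / total_num, printed with two decimals
pct <- sprintf("%.2f%%", 100 * number / total_num)

# counts with a comma as the thousands separator
n_fmt <- formatC(number, format = "d", big.mark = ",")

data.frame(action = names(number), n = unname(n_fmt), percent = pct)
```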

Critique the table

The second table took more effort to prepare. At least some of that effort was worth it.

You respond

Wilke chapter 22 discusses some principles of table design. Consider it as you answer the first of these two questions:

  1. Compare the end results. Which parts of the second table show an improvement from the first?
  2. Consider the code used to generate the second table. Which steps are unclear or need explanation? (You’re welcome to tinker with the code to see how slight changes make a difference, but please begin by considering the example provided for you.)

Create

Recreating a visualization

Taking inspiration from the code for the Rough figure section above, modify spills2019 until it has 20 rows and the first few rows look like this:

Then use that table to recreate the following visualization:

You code

Write code to recreate this table and figure.

Improving a visualization

Improve the visualization above, or create a new visualization of some amounts from this data set. Your final product doesn’t have to be as fully polished as the Polished figure shown above, but do consider the relative strengths and weaknesses of the visualization you’ve just recreated. Write your code here:

You code

Write code to improve upon this figure.

Creating a table

Finally, take inspiration from the Polished table section above to create a table of numbers here. The final result can be quite simple.

You code

Write code to create a table.