“Wrangling Data”

Riley Boice
2 min read · Feb 3, 2021

After perusing Journalism Handbook and Source: Cleaner, Smarter Spreadsheets, I was left with more questions than insights — which surprised me because they seem to be written for data beginners.

Both articles presume a certain amount of familiarity with lingo that I do not yet have. They also presume a familiarity with the process of gathering, cleaning and using data that, if I had it, would make these articles quite helpful (I think).

Some take-aways:

  • A narrow, specific scope will help you to obtain the right data and not leave you with far too much to sort through.
  • However, not narrowing that scope too far leaves you room to maneuver if you later realize you need information you initially thought was irrelevant to your question.
  • Asking the right questions of the data supplier will help you to head off confusion as you sort through and clean your data.
  • Structure. is. important. (i.e. readability to other humans is important) → in other words, keep a running diary and an updated data dictionary so that you and others can read your work.
  • Decide on a consistent approach to labeling (names, dates and the like)
  • Use dashes instead of spaces when naming things to avoid confusion later on
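To make the last two take-aways concrete for myself, here is a minimal Python sketch of what "consistent labeling" could look like. The helper names `dash_name` and `iso_date` are my own inventions, and lowercasing names and writing dates as year-month-day are assumptions on my part, not rules from either article.

```python
import re
from datetime import datetime


def dash_name(label):
    """Turn a label into lowercase words joined by dashes (no spaces)."""
    # Replace every run of characters that is not a letter or digit with one dash.
    cleaned = re.sub(r"[^A-Za-z0-9]+", "-", label.strip())
    # Drop any dash left at the start or end, then lowercase (my assumption).
    return cleaned.strip("-").lower()


def iso_date(text, formats=("%m/%d/%Y", "%Y-%m-%d")):
    """Rewrite a date written in one of the known formats as YYYY-MM-DD."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"Unrecognized date format: {text!r}")


print(dash_name("County Index Crime Rates"))
print(iso_date("2/3/2021"))
```

The point is only that one rule gets applied everywhere, so a column or file name never shows up in two spellings.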

Questions:

  • What is CSV?
  • What does it mean to filter data?
  • What are the mistakes in either of the examples shown? They are not well explained.
  • What sorts of queries or notes might one put down in their data diary if they’ve never done this before?

I chose County Index crime rates and counts by county in New York state.

I opened the data in Excel, and decided to narrow down what I would chart — I chose the six highest rates and six lowest rates and then organized those 12 from highest population to lowest. I could not figure out how to graph these two things on the same chart in a way that conveyed information visually.
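For comparison, the same narrowing-down step could probably be done in a few lines of Python. This is only a sketch: the rows below are made-up placeholders, not the real New York figures, and the column names `county`, `population` and `index_rate` are hypothetical. The sample has only six rows, so it picks the two highest and two lowest instead of six.

```python
import csv
import io

# Made-up placeholder rows, NOT the real dataset.
SAMPLE_CSV = """
county,population,index_rate
Alpha,50000,1200.5
Bravo,900000,2500.0
Charlie,12000,800.2
Delta,300000,3100.7
Echo,75000,1500.0
Foxtrot,20000,950.9
"""


def load_rows(text):
    """Read CSV text into a list of dicts with numeric columns converted."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text.strip())):
        rows.append({
            "county": rec["county"],
            "population": int(rec["population"]),
            "index_rate": float(rec["index_rate"]),
        })
    return rows


def pick_extremes(rows, n=6):
    """Keep the n lowest and n highest rates, ordered by population (largest first)."""
    if len(rows) < 2 * n:
        raise ValueError("Need at least 2 * n rows to pick from")
    by_rate = sorted(rows, key=lambda r: r["index_rate"])
    chosen = by_rate[:n] + by_rate[-n:]
    return sorted(chosen, key=lambda r: r["population"], reverse=True)


for row in pick_extremes(load_rows(SAMPLE_CSV), n=2):
    print(row["county"], row["population"], row["index_rate"])
```

With the real file, the text would come from opening the downloaded CSV, and `n` would stay at its default of 6.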

I think the problem was the difference in scale between population and crime count: population values are so much larger that the crime numbers appeared to be zero, even though there was a slight change in the line.

So I graphed them separately and overlaid them in my hand drawing. I hope to figure out how to do this on the computer as well.
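One possible way to do this on the computer is to rescale each series to run from 0 to 1 before charting, so that both lines share the same axis and neither gets flattened. I have not tried this on the real data; the numbers below are made up, and `rescale` is a hypothetical helper of my own. A chart with a second vertical axis may be another option.

```python
def rescale(values):
    """Min-max rescale: smallest value becomes 0.0, largest becomes 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to show.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Made-up placeholder values, NOT the real county figures.
population = [900000, 300000, 20000]
crime_count = [2500, 900, 40]

# After rescaling, both series fall between 0 and 1 and can share one chart.
print(rescale(population))
print(rescale(crime_count))
```

The trade-off is that the rescaled lines show shape rather than actual amounts, so the original numbers would still need to appear in labels or a table.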

*I realize the red vs. pink doesn’t offer the same contrast on a screen as it does in person. I will fix this next time.