Tuesday, July 26, 2016

2000 - 2009: Prison Sentences & Time Served

BACKGROUND

This summer has been chock full of crazy, interesting, dismaying, and uplifting national and world events. Brexit, political races and conventions, police-civilian shootings, civilian-police shootings, endless soccer (a good thing, in my opinion), and Olympic buildup, just to name a few. So much to viz; so little time.

As it turns out, my latest viz has nothing to do with any of those. In the past year, like many others, I became heavily engaged and invested in the Adnan Syed case, a story first popularized by the Serial podcast. I won't go much into the details of his efforts towards post-conviction relief, but you can learn more here or here, or a zillion other places on the internet. In short, the media about his case exposed me to a great deal more of the legal and criminal justice system than I had known before. I was (and am) fascinated by its procedures, history, successes, and flaws, and wanted to contribute data to a discourse I had only encountered anecdotally. Namely, are there significant differences in prison sentences across race? Is the system biased?


DATA SOURCE AND ALTERYX PREPARATION

After a brief search of the interwebz, I came across the Bureau of Justice Statistics, which publishes lots of data snippets and goodies. I found the data I was interested in, and got really excited... until I opened up one of the zipped CSV files.


Could there possibly be a less friendly format for data analysis? (Yes, I suppose things could be infinitely worse, actually, but whatever.) This meant significantly more work. Did I mention that the file for each year was in a slightly different format?

Enter Alteryx, the self-service data blending and preparation tool that any data analyst should work to become fluent in. In a previous life, I would have fired up Excel, written a bunch of custom macros in VBA and cleaned/aggregated this data into a usable format. This mini-project gave me an opportunity to finally step beyond the very basics in Alteryx into the world of Control Parameters and Macro workflows.

Here is the core of one of the workflows to cleanse and aggregate an input file. The Control Parameter was used to change the name of the input file prior to numerous steps of skipping unused rows, deselecting unused columns, and adding new columns for labeling and classification.
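For readers who don't have Alteryx handy, here is a rough sketch of the same three steps (skip unused rows, keep only the wanted columns, add label columns) in plain Python. It is not the workflow itself, just an approximation of the idea: the function name `cleanse`, the column names, and the sample rows are all made up for illustration and do not come from the BJS files.

```python
import csv
import io


def cleanse(lines, skip_rows, keep_columns, labels):
    """Skip leading junk rows, keep selected columns, add constant label columns."""
    reader = csv.reader(lines)
    # Skip title/notes rows that sit above the real header.
    for _ in range(skip_rows):
        next(reader, None)
    header = next(reader)
    idx = [header.index(name) for name in keep_columns]
    cleaned = []
    for row in reader:
        # Drop empty spacer rows and rows too short to hold the wanted columns.
        if not any(cell.strip() for cell in row) or len(row) <= max(idx):
            continue
        record = {name: row[i].strip() for name, i in zip(keep_columns, idx)}
        record.update(labels)  # new columns for labeling and classification
        cleaned.append(record)
    return cleaned


# Hypothetical input: every value below is invented for the example.
sample = io.StringIO(
    "Hypothetical table title\n"
    "Source: made-up example\n"
    "offense,race,mean_sentence,footnote\n"
    "Robbery,White,100,a\n"
    ",,,\n"
    "Robbery,Black,110,b\n"
)
rows = cleanse(
    sample,
    skip_rows=2,
    keep_columns=["offense", "race", "mean_sentence"],
    labels={"measure": "sentence_months"},
)
```

Because each year's file had a slightly different layout, `skip_rows` and `keep_columns` would likely need to vary per file, which is roughly the job the Control Parameter does in the macro.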


After the cleansing, the data from each file was dumped into a TDE of ever-increasing size. Here, again, a Control Parameter was used to insert the proper year into the dataset (each year's data was contained in a separate file).

At the end of the day, the data was significantly cleaner and much easier to work with. Thank you, Alteryx!
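The "stamp the year, then append" step can be sketched in plain Python as well. This is only an analogy for the batch macro, under a few assumptions: the helper `stack_years` is hypothetical, the sample values are invented, and the result is an in-memory list rather than a TDE.

```python
import csv
import io


def stack_years(sources):
    """Append every year's cleansed rows to one list, tagging each row with its year.

    `sources` maps a year to a file-like object holding that year's cleansed CSV.
    """
    combined = []
    for year in sorted(sources):
        for record in csv.DictReader(sources[year]):
            record["year"] = year  # the value the Control Parameter would supply
            combined.append(record)
    return combined


# Hypothetical stand-ins for the per-year files; the numbers are invented.
sources = {
    2001: io.StringIO("race,mean_sentence\nWhite,100\n"),
    2000: io.StringIO("race,mean_sentence\nWhite,90\nBlack,95\n"),
}
combined = stack_years(sources)
```

Stamping the year at append time matters because the year lived only in each file's name or position, not inside the file, so it would be lost once the rows were stacked.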



DATA VIZ

For the viz itself, there were a few things I wanted to try out.

  1. Explore the power of barbell charts. Or perhaps, more specifically, the power of connecting two data points using a line to allow the user to create a strong mental association that can help sift through visual clutter. If my butchered explanation doesn't do it for you, go to the source. Hardly a text that flies under the radar, but one I finally consumed.
  2. Explore the power of small multiples & area charts. I've typically had difficulty finding scenarios in which area charts displayed information in a manner that I found functionally and visually appealing. I thought that, by overlapping two sets of data and using the darker shade to show the "gap/difference" between them, the reader's eye would be able to rapidly pick out these "gaps" against datapoints where no gap existed. Hope it worked; let me know what you think!

Enjoy!