Friday, August 12, 2016

Quick Viz - Amazon Purchases!

Recently stumbled upon instructions on how to download my entire Amazon purchase history and decided to build a really quick viz that compared purchase totals and habits of a few friends...

Screenshot below (dash is too wide for the blog), link to dashboard on Tableau Public here.


Tuesday, July 26, 2016

2000 - 2009: Prison Sentences & Time Served

BACKGROUND

This summer has been chock full of crazy, interesting, dismaying, and uplifting national and world events. Brexit, political races and conventions, police-civilian shootings, civilian-police shootings, endless soccer (a good thing, in my opinion), and Olympic buildup, just to name a few. So much to viz; so little time.

As it turns out, my latest viz has nothing to do with any of those. In the past year, like many others, I became heavily engaged and invested in the Adnan Syed case, a story first popularized by the Serial podcast. I won't go much into the details of his efforts towards post-conviction relief, but you can learn more here or here, or a zillion other places on the internet. In short, the media about his case exposed me to a great deal more of the legal and criminal justice system than I had known before. I was (and am) fascinated by its procedures, history, successes, and flaws, and wanted to contribute data to a discourse I had only encountered anecdotally. Namely, are there significant differences in prison sentences across race? Is the system biased?


DATA SOURCE AND ALTERYX PREPARATION

After a brief search of the interwebz, I came across the Bureau of Justice Statistics, which contains lots of data snippets and goodies. I found the data I was interested in, and got really excited... until I opened up one of the zipped CSV files.


Could there possibly be a less friendly format for data analysis? (Yes, I suppose things could be infinitely worse, actually, but whatever. This meant significantly more work. Did I mention that the file for each year was in a slightly different format?)

Enter Alteryx, the self-service data blending and preparation tool that any data analyst should work to become fluent in. In a previous life, I would have fired up Excel, written a bunch of custom macros in VBA and cleaned/aggregated this data into a usable format. This mini-project gave me an opportunity to finally step beyond the very basics in Alteryx into the world of Control Parameters and Macro workflows.

Here is the core of one of the workflows to cleanse and aggregate an input file. The Control Parameter was used to change the name of the input file prior to numerous steps of skipping unused rows, deselecting unused columns, and adding new columns for labeling and classification.


After the cleansing, the data from each file is dumped into a TDE of ever-increasing size. Here, again, a Control Parameter is used to insert the proper year into the dataset (each year's data was contained in a separate file).

Finally, at the end of the day, the data was significantly cleaner and far easier to work with. Thank you, Alteryx!
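
For anyone without Alteryx, the same skip-rows, drop-columns, tag-with-year steps can be roughly sketched in plain Python. This is not the workflow described above, just an approximation of the idea. The column names, the sample rows, and the per-file skip counts are all made up for illustration and do not come from the Bureau of Justice Statistics files.

```python
import csv

# Hypothetical column names; the real files use different headers.
KEEP_COLUMNS = ["offense", "mean_sentence_months", "mean_time_served_months"]

# Made-up sample files. Each "year" has a different number of junk
# rows above the header and a different column order.
SAMPLE_2000 = """Table title line (hypothetical)
Note: figures are illustrative only
offense,notes,mean_sentence_months,mean_time_served_months
Example offense A,x,60,30
"""

SAMPLE_2001 = """Hypothetical header line
mean_time_served_months,offense,mean_sentence_months,notes
28,Example offense A,58,y
"""


def clean_year_file(raw_text, year, skip_rows, keep_columns):
    """Skip leading junk rows, keep selected columns, tag rows with the year."""
    lines = raw_text.splitlines()[skip_rows:]
    reader = csv.DictReader(lines)  # first remaining line is the header
    cleaned = []
    for row in reader:
        record = {col: row[col].strip() for col in keep_columns}
        record["year"] = year  # plays the role of the year parameter
        cleaned.append(record)
    return cleaned


def combine(files):
    """Append every cleaned file into one long list, ordered by year."""
    combined = []
    for year, (raw_text, skip_rows) in sorted(files.items()):
        combined.extend(clean_year_file(raw_text, year, skip_rows, KEEP_COLUMNS))
    return combined


rows = combine({2000: (SAMPLE_2000, 2), 2001: (SAMPLE_2001, 1)})
for row in rows:
    print(row)
```

Passing the skip count per file is what absorbs the "slightly different format each year" problem; reading by header name rather than by position handles the shuffled columns.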



DATA VIZ

For the viz itself, there were a few things I wanted to try out.

  1. Explore the power of barbell charts. Or perhaps, more specifically, the power of connecting two data points using a line to allow the user to create a strong mental association that can help sift through visual clutter. If my butchered explanation doesn't do it for you, go to the source. Hardly a text that flies under the radar, but one I finally consumed.
  2. Explore the power of small-multiples & area charts. I've typically had difficulty finding scenarios in which area charts effectively displayed information in a manner that I found functionally and visually appealing; I thought that, by overlapping two sets of data, with the intention to show a "gap/difference" using the darker shade, the reader's eye would be able to rapidly pick up these "gaps" against datapoints where no gap existed. Hope it worked-- let me know what you think!
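
To make the barbell idea concrete: each row needs only a label and two values, and the chart draws a marker at each value with a line between them. Here is a minimal text-only sketch of that shape in Python. It is not how the viz was built, the function name and parameters are invented for illustration, and it assumes non-negative values.

```python
def barbell_row(label, start, end, scale=1, width=12):
    """Render one barbell as text: two 'o' endpoints joined by dashes.

    Assumes start and end are non-negative; scale is units per character.
    """
    low, high = sorted((start, end))
    low_pos, high_pos = round(low / scale), round(high / scale)
    cells = [" "] * (high_pos + 1)
    for i in range(low_pos, high_pos + 1):
        cells[i] = "-"  # the connecting line
    cells[low_pos] = "o"  # first endpoint
    cells[high_pos] = "o"  # second endpoint
    return f"{label:<{width}}" + "".join(cells)


# Hypothetical values, purely to show the layout.
print(barbell_row("Group A", 2, 9))
print(barbell_row("Group B", 4, 6))
```

The connecting line is what makes the length of each gap readable at a glance, even when several rows are stacked.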
Enjoy!

Sunday, June 26, 2016

Updated: 2016 Presidential Primaries - Super PACs & Independent Expenditures

It has been a few months since I've been able to viz for fun, but I wanted to make sure I submitted an entry for the latest Tableau Iron Viz feeder! Lucky for me, this round is politics themed, which gave me an opportunity to refresh and enhance my Independent Expenditures viz.

Now that the primary season is over, I was able to pull data for the entire 2016 cycle, and also include a comparison to 2012. Not surprisingly, 2016 has seen a significant jump in independent expenditures from Super PACs. More surprising:

  1. How flat-footed the Democratic Party was in 2012, compared to its Republican counterpart, in taking advantage of the new campaign financing infrastructure provided by Citizens United and SpeechNow.
  2. How much spending is unleashed multiple months before the Iowa caucuses.
Original post and viz here!

Thursday, March 31, 2016

2016 Presidential Election - Super PACs & Independent Expenditures

You may be aware that the United States is currently in a presidential election year. I know there are many television and news outlets helping you navigate the resulting sea of information (and misinformation). I'm not sure how helpful you will find this campaign finance viz in the grand scheme of things, but I hope there are a few "Wait, wut?" or "Srsly?" moments. I know there were for me...

In the previous presidential election cycle, I had heard of Super PACs and had a high-level understanding of how they functioned. This time around, I was super interested in finding out which candidates they support and how much of a financial impact they make. I always found it curious that an individual could only donate a maximum of $2,700 directly to a candidate's campaign, while a Super PAC that is "separate" from and "unaffiliated" with said candidate's campaign could receive unrestricted millions from wealthy donors and corporations.

The viz below, armed with Federal Election Commission data as of 3/31/2016, takes a look at the Independent Expenditure landscape, which is overwhelmingly dominated by Super PACs. Do you spot any interesting trends? I've listed a few of my favorites below...

"Wait, wut?" Jeb Bush had a third party spend more than $87M on his behalf, and he managed to win zero states and four delegates? Impressive!

Republican backers are outspending Democratic counterparts (in terms of Independent Expenditures) by more than 13-to-1? "Srsly?"

This obviously doesn't cover the entire campaign financing landscape, but I'm hoping to tackle this again in the future. We do have until November after all...

Sunday, February 28, 2016

NFL Injuries - 2015 Regular Season

A few weeks ago, as I was trying to find an interesting dataset to analyze, I promised myself I wouldn't be drawn into creating another sports-related viz. As you might have guessed from the title, or from a peek at the viz below, I wasn't very successful. This will be the last post on sports-related content for a while (if I can help it).

I happened to be scrolling through the NFL's website, and came across a well-structured set of official injury reports from last season. Immediately, I knew this was an opportunity to answer a few questions that had been on my mind for a while now:
  1. What NFL position is the most dangerous to play? QB and WR concussions gain a lot of media attention and protection from the referees, but I suspected that defensive players were likely subject to many unsung perils (a hunch that proved to be true, as you can see that linebacker was the most injured position last year).
  2. In Tableau, how do you plot data points onto a custom image/background/map? I had read the nice and straightforward knowledge base article on it, but was itching to give it a go in practice.
  3. How do you use basic (free) web scraping tools? I had seen references to both import.io and datascraping.co and decided to give the latter one a shot. All in all, it turned out to be a fairly easy tool to use, although it still left me with a fair amount of manual cleanup to do. I plan on giving import.io a shot the next time I have data to pull from a website.
That should be enough background for now-- enjoy the viz below! I was startled/disturbed by numerous takeaways, including:
  1. The general degradation of players' bodies over the season. Even with the dip in injuries due to heavy bye weeks, the trend is unmistakable. As the season continues, more players show up on the injury report.
  2. QBs do not get injured all that often (relative to other positions, anyway, as they are ranked 11th in terms of injury volume).
  3. Lower body injuries are the most common ailments for almost every single position.
  4. Defensive players are injured far more often than offensive players (and yet, the most prominent anti-head-hunting and anti-targeting rules are in place to protect offensive players).
  5. I almost forgot-- don't let your kid play linebacker!
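
The takeaways above boil down to a handful of counts over the injury report rows: by position, by week, and by body region within each position. Here is a rough Python sketch of those tallies. The rows are invented placeholders, not real NFL data, and the tuple layout is an assumption about how the cleaned-up scrape might look.

```python
from collections import Counter

# Hypothetical rows: (week, position, body_region). Not real NFL data.
REPORTS = [
    (1, "LB", "lower"),
    (1, "QB", "upper"),
    (2, "LB", "lower"),
    (2, "WR", "head"),
    (2, "LB", "upper"),
]


def injuries_by_position(reports):
    """Rank positions by number of injury report entries, highest first."""
    return Counter(position for _week, position, _region in reports).most_common()


def injuries_by_week(reports):
    """Count injury report entries per week, in week order."""
    counts = Counter(week for week, _position, _region in reports)
    return sorted(counts.items())


def top_region_by_position(reports):
    """Find the most frequently listed body region for each position."""
    per_position = {}
    for _week, position, region in reports:
        per_position.setdefault(position, Counter())[region] += 1
    return {pos: c.most_common(1)[0][0] for pos, c in per_position.items()}


print(injuries_by_position(REPORTS))
print(injuries_by_week(REPORTS))
print(top_region_by_position(REPORTS))
```

One caveat with counting report entries: a player who is listed for several weeks in a row gets counted each week, so this measures injury report volume rather than distinct injuries.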

Tuesday, February 9, 2016

High School Football Recruiting (2010 to 2016)

Hello world! I have recently had the pleasure of joining an amazing group of data consultants at Slalom Silicon Valley's Information Management & Analytics practice and will be using this space to share my data visualization and Tableau work and to keep learning along the way. Check out my first viz below!

Following college football's National Signing Day last week and the Super Bowl this past weekend, I wanted to take a look at where the best and most highly touted (high school) recruits chose to play their college football. Not surprisingly, I found that the best (four and five star) recruits tend to sign at the same schools year after year, with SEC schools snapping up the lion's share. Nonetheless, it was interesting to see how the rest of the field compared, and which schools were able to draw kids from around the country.


Enjoy!