Thursday, September 18, 2014

Recap: Iron Viz - Reviewing the Reviewers

I recently had the opportunity to compete in the 2014 Tableau Conference Iron Viz. For those who are unfamiliar, the Iron Viz is a competition that pits three data visualization enthusiasts against each other to create the best possible visualization in 20 minutes - live onstage in front of over a thousand people. It was the fastest 20 minutes of my life. After the dust had settled, my dashboard, which analyzed the Yelp reviewers (as opposed to the businesses), won the championship! Without further ado, here is my final product:


Below is the story of my road to the Iron Viz championship…

I earned my spot in the finals by winning the feeder competition Elite 8 Sports Viz Contest with my dashboard It’s 4th Down. My two other competitors, Jeffrey Shaffer and Jonathan Trajkovic, earned their spots by winning the quantified self and storytelling competitions, respectively. I got to know them over the course of the week in Seattle, and they are both really great guys who create fantastic content for the Tableau community.

The inspiration for my viz was a seafood restaurant I found on Yelp years ago when we were vacationing in a new city and looking for dinner. I remember it only had a few reviews and, at 3 overall stars, wasn’t looking that promising, but I clicked it anyway. There were two 4-star reviews and a single 1-star review. I don’t remember the exact wording, but the 1-star review basically said 'I hate seafood and I hate this restaurant'. I remember thinking ‘Why should I trust someone’s review of a seafood restaurant when they don’t like seafood?’ When I was looking at the Yelp data for Iron Viz, this all came back to me and my idea was born: 'Who should you trust on Yelp?'

Tactically, my Iron Viz strategy involved taking advantage of the strengths of Tableau:
  • Easy exploration to see what data was available and how it was structured
  • ‘Speed-of-thought’ analysis to see what insights I could pull out of the data
  • Rapid creation and iteration to arrive at a final design
  • Rapid creation and iteration to start from scratch after I decided I didn’t like my ‘final’ design at the last minute



Ease of Exploration

Although we knew months in advance that we would be competing, to make it more challenging we were only given the data several days ahead of time - which left only a few days to explore and practice. Pulling the data apart quickly revealed just over 1 million Yelp reviews going back to 2004 for three cities (Las Vegas, Phoenix, and Edinburgh, UK). The structure of the data had some implications: it is one large denormalized table, with one row per review. Because a business's overall score is repeated on every one of its reviews, taking the SUM of that score is meaningless. Understanding the data structure was critical to using the right aggregations for our metrics.
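The denormalization pitfall is easy to demonstrate outside Tableau. Here is a minimal Python sketch with a made-up miniature of the extract (the field names and values are hypothetical, not the actual Yelp schema):

```python
# Hypothetical miniature of the denormalized extract: one row per review,
# so a business's overall score repeats on every one of its rows.
rows = [
    {"business_id": "b1", "business_score": 4.0, "review_score": 5},
    {"business_id": "b1", "business_score": 4.0, "review_score": 3},
    {"business_id": "b1", "business_score": 4.0, "review_score": 4},
    {"business_id": "b2", "business_score": 2.5, "review_score": 2},
]

# A naive SUM counts b1's score three times - meaningless.
naive_sum = sum(r["business_score"] for r in rows)  # 14.5

# Deduplicate to one score per business before aggregating.
score_by_business = {r["business_id"]: r["business_score"] for r in rows}
true_avg = sum(score_by_business.values()) / len(score_by_business)  # 3.25
```

In Tableau the same effect is achieved by choosing the right aggregation (e.g. AVG or MIN of the repeated score at the business level) rather than SUM.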



Speed of Thought

I knew I wanted to evaluate the reviewers themselves, but how best to do that? I quickly created a few different calculated fields before settling on my key metric, Reviewer Error. Reviewer Error measures how far a particular reviewer’s ratings vary from the overall Yelp consensus. For an individual review this isn’t meaningful (people can have a bad experience at a good business), but in aggregate it gives an idea of how close or far someone is from the overall consensus. Technically, this is the Root Mean Square Deviation. It was easy to create this metric in Tableau:

Reviewer Error = SQRT(SUM(([Review Score]-[Business Score])^2)/SUM([Number of Records]))
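For readers outside Tableau, the same metric is straightforward in plain Python. This is a minimal sketch, assuming hypothetical (review score, business score) pairs for a single reviewer:

```python
import math

def reviewer_error(reviews):
    """Root Mean Square Deviation between a reviewer's scores and the
    businesses' overall Yelp scores - the Tableau calc above, in Python."""
    squared = [(review - business) ** 2 for review, business in reviews]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical (review score, business score) pairs for one reviewer
pairs = [(5, 4.0), (3, 3.5), (4, 4.0)]
error = reviewer_error(pairs)  # sqrt((1 + 0.25 + 0) / 3) ≈ 0.65 stars
```

An error near zero means the reviewer tracks the consensus closely; large errors flag reviewers whose ratings routinely diverge from everyone else's.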

My first exploration was segmenting reviewers by a few key dimensions including that user’s overall review average, how many votes that user had accrued, how many fans they had, and how many years they were Yelp elite. There were some very clear trends in the data:


Key Takeaways:
  • Reviewers whose average rating was less than 3 - or worse, less than 2 - had a very large error. This is likely because some people go on Yelp, write a single bad review, and never write another.
  • Both the number of votes a person had received and the number of fans they had were correlated with a reduction in error.
  • Reviewers with at least one year of Yelp elite status had a lower error, but additional elite years didn’t lead to any significant further reduction in error.
In layman’s terms, a trusted reviewer typically has an average review greater than three stars, has many votes and fans, and has at least one year as Yelp elite. I practiced building these charts and a couple of detail charts in the two days leading up to the competition. After multiple rounds of practice I got it down to under 20 minutes. Of course, as you can see in my final dashboard, I did not include these charts.
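The segmentation itself boils down to bucketing reviewers and averaging their error per bucket. A minimal Python sketch of the average-rating segment, using made-up reviewer summaries rather than the actual Yelp numbers:

```python
from collections import defaultdict

# Hypothetical per-reviewer summaries: (average rating, reviewer error)
reviewers = [
    (1.8, 2.3), (2.5, 1.6), (3.4, 0.9), (4.1, 0.8), (4.6, 1.1),
]

def rating_band(avg):
    """Bucket reviewers the way the dashboard segments them."""
    if avg < 2:
        return "< 2 stars"
    if avg < 3:
        return "2-3 stars"
    return ">= 3 stars"

by_band = defaultdict(list)
for avg, err in reviewers:
    by_band[rating_band(avg)].append(err)

# Mean error per segment - the trend to look for is error shrinking
# as the average rating band rises.
mean_error_by_band = {band: sum(e) / len(e) for band, e in by_band.items()}
```

The same idea applies to the other dimensions (votes, fans, elite years): bin the dimension, average the error, and look for the trend.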



Rapid Design and Iteration

About 18 hours before the competition, after looking at my dashboard for the 100th time, I decided the analysis was too static and didn’t tell the story of any individual reviewers. I decided to go back to the drawing board. I ended up rebuilding almost the entire dashboard the night before.

I kept my Reviewer Error metric and just started poking around and slicing different ways until I settled on looking at individual reviewers in a scatterplot. Plotting Reviewer Error against Number of Reviews produces a great scatterplot: it shows the regression to the mean you would expect, but also a skew indicating that frequent Yelp reviewers really are more accurate than the overall average. More interesting, the scatterplot quickly illuminates the outliers - reviewers who are either very good or very bad at reviewing. I’ll start with the good: Norm.



Norm has over 1,000 reviews, an error of 0.72 stars, and minimal bias in either direction. We can see he has been very consistent over time, both in how many reviews he has left and in his average review. I feel that Norm has a good sense of what to expect from a business. If I pick one of his recent reviews (linked from the tooltip in the bottom-right chart), a 4-star review of ‘Delhi Indian Cuisine’, I can see he wrote a detailed review with pictures. Clicking on his profile reveals he has thousands of votes and many years of Yelp elite status. Given my prior segmentation, this was not a surprise.



Now let's go to the other extreme, Diana. At nearly three times Norm's error, her error is 2.01 stars with a negative 0.76-star bias over 120 reviews. When we select her, we can see she is an 'all or nothing' type: all of her reviews are either 5 stars or 1 star. From her average rating and bias we can tell she is, on aggregate, a harsh critic who hands out 1-star reviews like it's going out of style. Selecting her recent review of Zoyo Neighborhood Yogurt, we can see she gives it 1 star because of ‘flies and bugs’. Clicking her profile, we can see her four most recent reviews are all 1-star reviews and every single one of them complains about the flies or insects. It makes you wonder if you can really trust her reviews at all.


Closing Thoughts

In closing, I wanted to discuss a couple of comments I received from the judges:
  • I used a red/blue gradient instead of an amber/blue gradient
  • I did not include a color legend

Although we were pressed for time, I did consciously think about both of these aspects before creating the viz and wanted to share my thoughts here. Please disagree with me in the comments!
  • As for the red/blue gradient: I wanted my dashboard to have a ‘Yelp feel’, and Yelp prominently uses red throughout its site, so I used red. I used blue for the other end so there was clear contrast between high and low reviews, even though on the low end Yelp uses grey/yellow hues.
  • I didn’t use a legend and instead opted for semantically resonant colors. High/low reviews are red/blue respectively, where red is 1) hot (popular), 2) aligned to Yelp, and 3) reinforced subtly through all three charts on the right of the dashboard. There could be some argument that red is ‘bad’, as in a status report, but when it comes to reviews and stars specifically, I've often found them colored red.

I have a list of about a dozen more things I would do to improve this dashboard, but let me tell you, 20 minutes goes by fast! In the spirit of the Iron Viz, I have made no further updates to this workbook since I put the mouse down in the competition. The 20 minutes onstage belies the effort all three of us put into preparing for the showdown. My competitors put together great vizzes, and I thought Jeffrey's roulette wheel was a creative way to blend the data with the story of Vegas (though losing to a pie chart would have been rough). Most importantly, the Iron Viz was a tremendous amount of fun, I learned even more about Tableau, and I got to meet many fantastic people along the way.

John

Thursday, March 20, 2014

Fourth Down - what play do you call?

My viz has been selected as a finalist in Tableau's Elite 8 Iron Viz competition! Please vote for me via hashtag:    

Fourth down is the most critical down in football - it is the team’s final play, and they are left with three basic options: go for a first down, kick a field goal, or punt the ball away. The most important variables in this decision are how many yards are needed for the first down and where the team is positioned on the field. Of course, this all goes out the window when it comes down to the wire in the fourth quarter.

This first half of my Iron Viz submission illustrates what NFL coaches typically do when faced with a fourth down decision. Users can see clear trends in how the most common play changes based on these two variables. Note how certain scenarios are impossible (e.g. fourth and seven on the five yard line). Increase the pressure and switch to view only fourth quarter stats and see how much riskier the play calling becomes when the entire game is on the line.

The second half of my viz provides additional detail around expected outcomes of each of the three options to help guide the decision for your team. Maybe your field goal kicker has some extra distance in their leg compared to the NFL average or perhaps your punter has control issues resulting in many short punts. These factors all influence the calculus of which play is the right one to call.

Blue 42, Blue 42, Hut Hut Hike!


Tuesday, March 4, 2014

Divvy Bikes - Data Challenge

Divvy, a bike sharing service in Chicago, recently announced a data challenge where they published approximately 750k rows worth of bike trip data in Chicago. They challenged people to build data visualizations that would reveal and showcase patterns in usage of the bike sharing service.

My submission is what I have titled the ‘Divvy Station Cockpit’. It is designed to provide an in-depth analysis of a selected station and can be used to help Divvy evaluate operations at existing stations and better plan for new ones. 

The dashboard features the following capabilities:

Major KPIs:
Average Daily Trips – How many trips depart from this station?
Relative Station Popularity – How does this station rank compared to others based on average daily departures?
Duration – When people take trips from this station, how long does it take them to reach their destination?
Duration Histogram – How are these trips distributed? Is there a cluster around a specific time, suggesting a common destination (e.g. Wrigleyville to the Loop)?
Day of First Trip – How long has this station been operating?
Weekly Trips Trend – Is traffic increasing or decreasing? Keep in mind, the data set starts in the summer and runs through December – brrrr!

Aggregated Daily Supply/Demand Curves
Throughout the day, bikes are coming and going from each Divvy station. In order to plan capacity and estimate traffic, this tool looks at the aggregated demand over the course of each day of the week and can quickly identify how the station is commonly used. Is this a station where many commuters depart in the morning (increasing demand for bikes) and then return in the evening (increasing supply)? Or is it a reverse commute, where supply builds up in the morning and is depleted in the evening? Alternatively, is the station more tourist-focused, where demand is more consistent over the course of the day but much higher on weekends than weekdays? This tool can help Divvy operations plan the optimal allocation of bikes.
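Outside the dashboard, the supply/demand curve amounts to a signed count of arrivals and departures per hour. A minimal Python sketch, with hypothetical trip records standing in for the Divvy data:

```python
from collections import Counter

# Hypothetical trip records: (start_station, end_station, start_hour, end_hour)
trips = [
    ("Wrigleyville", "Loop", 8, 9),
    ("Wrigleyville", "Loop", 8, 9),
    ("Loop", "Wrigleyville", 17, 18),
]

def net_flow_by_hour(trips, station):
    """Hourly net bike flow at a station: negative when departures consume
    bikes (demand for bikes), positive when arrivals return them (supply)."""
    flow = Counter()
    for start, end, start_hr, end_hr in trips:
        if start == station:
            flow[start_hr] -= 1  # a departure takes a bike from the dock
        if end == station:
            flow[end_hr] += 1    # an arrival returns a bike to the dock
    return dict(flow)

net_flow_by_hour(trips, "Wrigleyville")  # {8: -2, 18: 1}
```

A morning-negative, evening-positive profile is the commuter pattern described above; the reverse-commute and tourist patterns show up as the mirrored and flattened versions of the same curve.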

Frequent Destination Heat Map
Although nearby stations are unsurprisingly common destinations, occasionally there are clear trends where riders flow to a common location further away. This heat map indicates which destinations are the most common for riders departing the selected station.

Ridership by Day and Member
For a quick look at the type of riders frequenting the station, we look at average daily ridership by member type and day of the week to get a better sense of whether it is used by subscription members or guests, and when. This view can be used in conjunction with the Daily Demand Curve views to find stations that have a commuter usage pattern but a larger share of Guest users - good targets for marketing resources suggesting riders purchase a subscription membership.


I had a lot of fun with this viz and thanks to Divvy for making this data set available!


Tuesday, February 11, 2014

Fighting Ignorance - A Tribute to Hans Rosling

The Gapminder Foundation has a stated mission ‘to fight devastating ignorance with a fact-based worldview’. By making data available, and more importantly accessible through interactive visualizations, Hans Rosling brings to life statistics that might otherwise languish in the appendix of an almanac. We pay tribute to the work Hans has done by visualizing an educational dataset (sourced from Gapminder) showing the tremendous educational gains made around the world in the past 40 years. View regional performance or choose a region to drill into individual countries. Understand educational inequality trends by comparing average male education vs. female education over time, plotted against a 45 degree line. In quintessential Hans style, check out the histogram view that changes over time and can be adjusted via the year selector.


Saturday, January 18, 2014

Creating a 45 Degree Reference Line in a Tableau Scatter Plot (without SQL!)

A scatter plot chart is useful for comparing two measures and quickly identifying clusters, outliers, and trends. One can quickly spot relationships between the two measures, for example Sales vs. Profit or Age vs. Income. While we would expect to see correlations and patterns in these examples, it does not make sense to ask how close the measures are to parity (e.g. 42 years old vs. $51,000 in income). However, when the two measures are similar in kind and scale, it may provide additional insight to see how closely they equal each other, or where they diverge. Examples include goals scored at home vs. away, or average male education vs. female education.

In situations where it is logical to make this comparison, a simple 45 degree line on a scatter plot can show how close or far the points are to parity. While working on my Hans Rosling - Fighting Ignorance viz, I wanted to include a reference line to help users understand educational inequality between genders. Below I've outlined the steps used to create this reference line in Tableau without using any SQL.

We begin with a scatter plot showing male education on the X axis and female education on the Y axis. At a glance, it is hard to tell how close the average male education compares to the average female education:
Our starting point - while this chart provides some insight, the gender education inequality is not obvious.
The first step is to create a calculated field that will be plotted as the reference line. I will title it 'Reference Line'. We are going to make it equal to Male Education so that at any point X the value for this field will be X: 
By creating a field equal to male education, we can plot it on the male education axis, resulting in a simple 45 degree line.
Next we add the newly created calculated measure to Rows and right click to set it as 'Dual Axis' so both the educational data and reference line are on the same chart. Note that due to Tableau defaults, it colors and formats the reference line data just like the educational data. We'll be correcting that shortly!
Tableau defaults the reference line formatting to match that of education data so it is segmented by size (year) and color (geography).

Tableau defaulted the Reference Line field to Sum so we need to adjust it back to Average:
This ensures the line is at the same scale as the educational data.

Because we want an apples-to-apples comparison between our data points and the reference line, it is critical to synchronize the axes; otherwise we'll mislead the user:
It is essential the axes are synchronized, otherwise users will be misled.

Now we need to format the reference line, which takes several actions. First, remove the color and size variation for geography and year. Second, to increase the range of the line, increase the granularity of the data - I added the country dimension, ensuring that no matter how the user filters the chart, the line reaches from the lowest point to the highest. Lastly, change the color from the default light grey to black so the line is more visible on the chart:
        

We must remember to clean up the default tooltip generated by Tableau. I have removed all of the variables and given it the description "Reference Line - Avg. Male Education = Avg. Female Education". I also removed the command buttons as they do not serve a purpose on the reference line:
Remove the default text and provide a description to help your user understand the significance of the line.


Finally we hide the second axis and our new and improved scatter plot chart with a 45 degree reference line is complete! It is now easy to see which geographies and countries have near parity between male and female education and which ones have a large gender gap.
With the reference line in place, it is now apparent that most geographies are nearing gender educational equality while Africa has diverged further from educational equality.
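The Tableau steps above draw the y = x line visually; the underlying comparison is simply the signed gap between the two measures. A minimal Python sketch with made-up regional averages (the names and numbers are hypothetical):

```python
# Made-up regional averages of (male education years, female education years)
points = {"Region A": (8.0, 8.1), "Region B": (6.0, 4.5)}

def parity_gap(male, female):
    """Signed vertical distance from the y = x reference line; positive
    means female education exceeds male education."""
    return female - male

for name, (male, female) in points.items():
    side = "above" if parity_gap(male, female) > 0 else "below"
    print(f"{name}: {side} the parity line ({parity_gap(male, female):+.1f} years)")
```

Points above the line sit on the female-advantage side, points below on the male-advantage side - exactly what the reference line lets the eye judge at a glance.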


Thursday, January 9, 2014

3 Minute Win - San Francisco Food Trucks

One of the features Tableau is known for is the speed at which users can build insightful visualizations. The "3 Minute Win" competition hosted by Tableau challenges users to create a viz in under three minutes to show the power of the standard functionality.

For my entry I decided to combine my love of food with data publicly available from the city of San Francisco. I found an interesting data set of 'mobile food facility permits' (a.k.a. food trucks). Applicants must declare what items they intend to sell as part of the application process, as well as the locations at which they plan to operate. By leveraging the parameter functionality within Tableau and some basic filtering capabilities, I show how easily one can pull up a map of the locations serving your favorite food.

Take a look below:

Wednesday, November 20, 2013

Dreamforce Social Listening Post

My Dreamforce social listening dashboard gives users an at-a-glance view of the major trends and key conversation influencers at Dreamforce. To address the outsized popularity of the official hashtags, I used a logarithmic view of the top hashtags and a bubble chart showing tweet count (size) and retweet percentage (color). The influencers section identifies the top tweeters and indicates how popular their tweets were (measured via retweets). Check out the TwitPic tab for a fun view of the many photos captured at the event!


Tuesday, September 3, 2013

The Increasing Cost of Healthcare

Visualizing Medicare data revealed several interesting financial aspects of our healthcare system. By viewing the state and classification heat map, it is apparent that California, Nevada, and New Jersey have the highest average costs. It is especially odd for New Jersey given its proximity to Maryland, which has the lowest state costs. Also interesting is the scatter plot of Diagnosis Related Group (DRG) codes, which reveals several outliers in terms of cost and patient volume. Public health administrators looking to bring down costs could use this visualization to identify procedures with the highest costs and frequency that would most benefit from efficiency improvements.