Now that we’re hot off of 2014’s North Texas Beer Week… Have you ever wondered what Dallas’s favorite local craft beer is? You’re probably thinking “Yeah, it’s clearly Lone Star because it’s the ‘National Beer of Texas’”, or “duh – it’s the one in my hand right now, bro!”.
While valid guesses, they are clearly not correct (and you should feel bad about those guesses). The correct answer is: Lakewood Brewing Company’s “Temptress” – a milk stout. Now, Dallas – you’re probably thinking “Well, Lone Star and the beer in my hand are clearly the second and third favorite local craft beers.”
Well… this is the point where I ask you to stop thinking such terrible thoughts – those answers are also not correct (and you should continue to feel bad). The correct answers are: Peticolas Brewing Company’s “Velvet Hammer” — an imperial red — and Community Beer Company’s “Mosaic IPA” — an American-style IPA.
How do I know that Temptress, Velvet Hammer, and Mosaic IPA—in that order—are Dallas’s three favorite beers? These beers are on tap, or (for Temptress and Mosaic IPA) on shelves all across town. But just being available doesn’t make a beer Dallas’s favorite – or else those truly wretched thoughts you were having about Lone Star would have been true.
Well, as a beer nerd and a stats nerd, I decided I just had to know: of all the local craft beers that are now produced and available throughout DFW – which are Dallas’s favorites? Let’s get nerdy.
I created a relatively simple survey on Google Docs. This survey listed 35 beers produced in (the broader) DFW area. For a beer to get on the list it had to meet the following criteria:
- The brewery itself must have been in operation for at least 1 year
- The beer itself must have been available for at least the past six months
- It has to be a year-round beer (no seasonals, specials, or one-offs)
That qualified 35 beers from the following breweries [1]: Franconia, Peticolas, Revolver, Martin House, Four Corners, Lakewood, Rahr, Deep Ellum, Community, and Cedar Creek.
When I had my list, I randomized the order in which these beers were listed and sent the survey out. Here’s a quick breakdown of some demographics:
- 202 respondents. One was excluded [2].
- Gender: 36 Females, 160 Males, 1 Meat Popsicle, 1 Unicorn, 1 Manatar, and 3 non-responses.
- 33 people work with beer professionally (brewer, bartender, waitstaff, etc.).
- 58 people consider themselves homebrewers.
The survey asked people to respond to each beer with one of the following 6 options [3]:
- It is one of my favorite beers.
- I like this beer.
- This beer is OK.
- I don’t like this beer.
- I’ve never had this beer.
- I have no opinion.
At this point, we can just count how many people, out of 201, had the answers above for each of the beers in the survey. So let’s get down to it:
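For the code-inclined, that counting step might look something like the sketch below. It's in Python rather than the R I actually used, and the `responses` list, the `tally` helper, and the three made-up answer sheets are all hypothetical stand-ins, not the real survey data.

```python
from collections import Counter

# The six survey options, in the order they were listed.
OPTIONS = [
    "It is one of my favorite beers.",
    "I like this beer.",
    "This beer is OK.",
    "I don't like this beer.",
    "I've never had this beer.",
    "I have no opinion.",
]

# Hypothetical answer sheets: one dict per respondent, beer -> chosen option.
responses = [
    {"Temptress": OPTIONS[0], "Block Party": OPTIONS[1]},
    {"Temptress": OPTIONS[0], "Block Party": OPTIONS[4]},
    {"Temptress": OPTIONS[1], "Block Party": OPTIONS[1]},
]

def tally(answer_sheets, options):
    """Count, for each beer, how many respondents chose each option."""
    counts = {}
    for sheet in answer_sheets:
        for beer, choice in sheet.items():
            counts.setdefault(beer, Counter())[choice] += 1
    # One row per beer, with the columns in the same order as `options`.
    return {beer: [c[o] for o in options] for beer, c in counts.items()}

table = tally(responses, OPTIONS)
for beer, row in table.items():
    print(beer, row)
```

Each row of `table` is one beer and each column is one response option, which is the shape of table the rest of the analysis works from.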
There are some clear favorites: Temptress, Velvet Hammer, Revolver’s Blood & Honey, and Mosaic IPA all have a lot of “Favorite” responses. You might be thinking, “Yo, Derek, you didn’t say a thing about Blood & Honey before – that’s my go-to crushable – so maybe you’re lying about Lone Star too?”. If I were inclined to respond to such accusations, I’d say that 1) I’m building suspense (or boring you to tears) and 2) I’ve grown really tired of you talking about Lone Star – but I’m above that so I won’t say it.
As a stats nerd, though, this picture feels a bit… rudimentary. There are better ways to figure out and visualize Dallas’s favorite beer. So let’s turn to one of my favorite statistical methods: Correspondence Analysis (CA). CA is a technique that takes a large table made up of a bunch of variables (here: the responses) and turns them into new variables that better represent what’s happening [4].
The data from above looks something like this:
| Beer | FAVORITE | LIKE | OK | DO NOT LIKE | NEVER HAD | NO OPINION |
|---|---|---|---|---|---|---|
| Four Corners’ Block Party | 15 | 83 | 29 | 6 | 66 | 2 |
So what will CA do for us with a table of data like this? It tells us which beers are most similar to one another – based on all the different categories. It can also tell us if any of the categories are similar to one another, too. Most importantly, it tells us which beers are more related to responses than other beers. Let’s take a look at what a CA would produce:
CA gives us new variables, called “components”, shown as the axes (the horizontal and vertical lines) in these pictures. There are 3 other components besides these two, but they aren’t very important: the first two alone explain 87% of the variance in the data.
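If you want to see where a number like that comes from, here is a rough Python sketch of the core CA arithmetic (my real analysis used R and ExPosition, so treat this as an illustration, not the actual code). Only the Block Party row is real; the other three rows, and the helper names `standardized_residuals` and `top_eigenvalues`, are made up for the example, so the percentage it prints is not the 87% from the survey.

```python
import math

# Rows are beers; columns are FAVORITE, LIKE, OK, DO NOT LIKE, NEVER HAD, NO OPINION.
counts = [
    [15, 83, 29, 6, 66, 2],    # Four Corners' Block Party (from the post)
    [70, 90, 15, 5, 18, 3],    # hypothetical beer A
    [5, 30, 25, 20, 115, 6],   # hypothetical beer B
    [40, 85, 30, 12, 30, 4],   # hypothetical beer C
]

def standardized_residuals(rows):
    """(observed - expected) / sqrt(expected), on proportions instead of counts."""
    n = sum(map(sum, rows))
    row_mass = [sum(r) / n for r in rows]
    col_mass = [sum(col) / n for col in zip(*rows)]
    return [
        [(x / n - rm * cm) / math.sqrt(rm * cm) for x, cm in zip(r, col_mass)]
        for r, rm in zip(rows, row_mass)
    ]

def top_eigenvalues(matrix, k, iterations=500):
    """Largest k eigenvalues of a symmetric matrix: power iteration plus deflation."""
    size = len(matrix)
    m = [row[:] for row in matrix]
    found = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(size)]  # arbitrary starting vector
        for _ in range(iterations):
            w = [sum(m[i][j] * v[j] for j in range(size)) for i in range(size)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-15:
                break
            v = [x / norm for x in w]
        mv = [sum(m[i][j] * v[j] for j in range(size)) for i in range(size)]
        vv = sum(x * x for x in v)
        lam = sum(a * b for a, b in zip(v, mv)) / vv  # Rayleigh quotient
        found.append(lam)
        # Remove the component just found so the next pass finds the next one.
        m = [[m[i][j] - lam * v[i] * v[j] / vv for j in range(size)]
             for i in range(size)]
    return found

s = standardized_residuals(counts)
n_cols = len(s[0])
# S^T S: its eigenvalues are the amounts of variance (inertia) per component.
gram = [[sum(row[a] * row[b] for row in s) for b in range(n_cols)]
        for a in range(n_cols)]
total_inertia = sum(x * x for row in s for x in row)
first_two = top_eigenvalues(gram, 2)
explained = sum(first_two) / total_inertia
print(f"first two components explain {explained:.1%} of the variance")
```

A real CA library would do this with a singular value decomposition and also hand back the coordinates for plotting; the power iteration here is just a dependency-free way to get the same eigenvalues for a small table.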
With what we know about CA we can say some of the following:
- Temptress, Velvet Hammer, Blood & Honey, and Mosaic are more associated with “A FAVORITE” than other beers (both figures)
- The responses of “OK” and “DO NOT LIKE” are essentially the same – which probably means people are being nice when they say “OK” or they’re being mean when they say “DO NOT LIKE”.
- The lower left of the left figure shows Cedar Creek Scruffy’s, Cedar Creek Elliot’s Phoned Home, and Martin House XPA – which means they are nearly identical based on their responses; the responses being that most people haven’t had these beers. Sad times.
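What “nearly identical based on their responses” means can be sketched in a few lines of Python. CA compares beers using the chi-square distance between their response profiles (each row divided by its total), so beers that land on top of each other in the picture have a small distance. Only the Block Party row below is real; the other rows and the function name `chi_square_distance` are hypothetical.

```python
import math

# Columns: FAVORITE, LIKE, OK, DO NOT LIKE, NEVER HAD, NO OPINION.
counts = {
    "Block Party": [15, 83, 29, 6, 66, 2],          # from the post
    "Hypothetical A": [70, 90, 15, 5, 18, 3],       # made up: widely loved
    "Hypothetical B": [5, 30, 25, 20, 115, 6],      # made up: mostly never had
    "Hypothetical B2": [4, 28, 27, 19, 117, 6],     # made up: mostly never had
}

grand_total = sum(sum(row) for row in counts.values())
# Column masses: the share of all responses that fall in each category.
col_mass = [sum(col) / grand_total for col in zip(*counts.values())]

def profile(row):
    """Turn raw counts into proportions of that beer's responses."""
    total = sum(row)
    return [x / total for x in row]

def chi_square_distance(beer_a, beer_b):
    """Distance between two row profiles, weighting rare categories more heavily."""
    pa, pb = profile(counts[beer_a]), profile(counts[beer_b])
    return math.sqrt(sum((a - b) ** 2 / c for a, b, c in zip(pa, pb, col_mass)))

print(chi_square_distance("Hypothetical B", "Hypothetical B2"))  # small: near twins
print(chi_square_distance("Hypothetical B", "Hypothetical A"))   # large: very different
```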
Let’s go a bit further. We now know enough about this data to make it easier to understand. Let’s combine “OK” with “DO NOT LIKE” – because they are basically one and the same here. We’ll also combine “NO OPINION” with “NEVER HAD” – so that we can group together the responses that are basically non-responses. Let’s do another CA and this time color each beer by the responses they are most similar to.
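Combining the categories is just adding columns together. A small Python sketch using the Block Party row from the table earlier (the `MERGE` mapping and the `merge_columns` helper are names I made up for illustration):

```python
COLUMNS = ["FAVORITE", "LIKE", "OK", "DO NOT LIKE", "NEVER HAD", "NO OPINION"]

# Which original categories get folded into which combined category.
MERGE = {
    "OK": "OK/DO NOT LIKE",
    "DO NOT LIKE": "OK/DO NOT LIKE",
    "NEVER HAD": "NEVER HAD/NO OPINION",
    "NO OPINION": "NEVER HAD/NO OPINION",
}

def merge_columns(row):
    """Add up the counts of categories that map to the same combined label."""
    merged = {}
    for name, count in zip(COLUMNS, row):
        key = MERGE.get(name, name)  # FAVORITE and LIKE stay as they are
        merged[key] = merged.get(key, 0) + count
    return merged

block_party = [15, 83, 29, 6, 66, 2]
merged = merge_columns(block_party)
print(merged)
```

The row still adds up to 201 respondents; it just has four columns instead of six.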
With the combined responses – we can see the general configuration is essentially the same. Except this time we can explain 92.5% of the variance in the data instead of just 87% (take that, Lone Star!). It’s also a little clearer that there is a gradient, from right to left, of liking (or ever having) a beer. Now let’s take a look at the beers, colored by which response they are most similar to:
Now we have a much clearer idea of which beers people have never really had (in gray), which ones are not particularly cared for (in red), which ones are liked (in yellow), and which are Dallas’s favorites (in green).
The favorites are still Temptress, Velvet Hammer, Blood & Honey, and Mosaic. So why did I exclude poor ol’ Blood & Honey from the top 3? Let’s take a look at the responses in these 4 categories like we did initially. Beers are sorted by those with the most “A FAVORITE” responses:
Let’s also look at beers sorted by fewest responses of “OK/DO NOT LIKE”:
Now we have a slightly different perspective – one that we can also get directly out of the CA results. Some beers are strongly related to “A FAVORITE” while rarely ever getting a “DO NOT LIKE”. Unfortunately for Blood & Honey – the responses for “A FAVORITE”, “LIKE”, and “DO NOT LIKE” are equally likely.
But for Temptress, Velvet Hammer, and Mosaic IPA – very few people would say they “DO NOT LIKE” these beers. Thus, these three beers—in that order—are Dallas’s favorite beers. And that’s just science.
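The two sortings behind that argument are easy to sketch in Python. The beers and counts below are entirely made up (they are not the survey results); the point is only that the ranking by most “A FAVORITE” and the ranking by fewest “OK/DO NOT LIKE” can disagree, which is exactly what happened to Blood & Honey.

```python
# Hypothetical counts per beer, using the four combined categories.
beers = {
    "Hypothetical A": {"FAVORITE": 70, "LIKE": 90, "OK/DO NOT LIKE": 20, "NEVER HAD/NO OPINION": 21},
    "Hypothetical B": {"FAVORITE": 68, "LIKE": 60, "OK/DO NOT LIKE": 55, "NEVER HAD/NO OPINION": 18},
    "Hypothetical C": {"FAVORITE": 40, "LIKE": 85, "OK/DO NOT LIKE": 42, "NEVER HAD/NO OPINION": 34},
}

# Most "A FAVORITE" responses first.
by_favorite = sorted(beers, key=lambda b: beers[b]["FAVORITE"], reverse=True)

# Fewest "OK/DO NOT LIKE" responses first.
by_fewest_dislikes = sorted(beers, key=lambda b: beers[b]["OK/DO NOT LIKE"])

print("Most favorites: ", by_favorite)
print("Fewest dislikes:", by_fewest_dislikes)
```

Here the hypothetical beer B is second by favorites but last by dislikes: plenty of people love it, and plenty don't.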
So what’s next? In about a year I’ll try to re-do this survey. That’s because by then approximately 30,786 breweries are, apparently, going to be open in Dallas (thanks, urban sprawl!), and many of the breweries that are currently open—but didn’t qualify this time—will qualify in a year.
All analyses were performed in R. Correspondence Analysis was performed with the ExPosition package – a package created by particularly attractive and smart people. Data available here [5]. Code to recreate these analyses here [6].
I’m tired just writing this and I’m sure you’re tired just reading it. So let’s go get some Lone Stars.
[1] I only realized after I sent out the survey that I had made 2 glaring errors: I mistakenly excluded Firewheel and Armadillo Ale Works. Whoops – sorry!
[2] They responded with “I’ve never had this beer” to all beers.
[3] For the stats nerds: these are survey options not usually seen. Oftentimes when you get a survey, you’re asked to respond with a 1, 2, 3, 4, or 5 (or some similar numeric scale). Well, what if people have no opinion? What if they don’t want to answer the question? They need a way to opt out. Also, categories aren’t numbers, you dummy! For your (statistical) health!
[4] For the stats nerds: technically, both the beers and the responses are variables. The observations (people) are kind of hiding. Each person simply helps increase the number of responses within a particular cell of this table. CA is analogous to a principal components analysis, but for data more suited to χ² analyses.
[5] Some responses are decimals. This is because some people left their responses blank (instead of choosing the very comprehensive categories I outlined – jerks). When a response was blank, I just replaced it with the average response.
[6] It’s in a text file, but change the extension to .R to use it more easily with R.