DFW’s Favorite Breweries

Oh hello again.

Now that science has determined Dallas’s favorite year-round/flagship DFW-made beer, science must answer the following question: what is DFW’s favorite brewery?

You might be thinking: “Derek. That’s a stupid question that doesn’t require science. Given the abundance of overpriced Miller Lite at Jerry’s Dome of Eminent Domain, the answer is MillerCoors (which is produced by smashing frosty-cold bullet trains into mountains).” While that might be technically correct (based on sales), it’s just gross and you should feel gross for having such gross thoughts.

Unlike before — I’m not going to tell you upfront which breweries are the best. You’re going to have to get nerdy with me (or just scroll to the end). So let’s continue the tradition1!

So how can we determine which of DFW’s breweries are the best? Well, you might be thinking, “don’t we have the (much reviled) Yelp average ratings?” or “I gave it 5 stars on Facebook so it’s clearly the best.” Yeah, sure. If you go to Facebook, you can see how many people rate Lakewood Brewing Company with 5, 4, 3, 2, or 1 star. You can do the same with Yelp, but you need to make sure to go find the hidden ratings, too. So, for this venture into stats and beer nerdery, I aggregated all the ratings from Yelp and from Facebook for all the DFW area craft breweries2. This gives me a count of, for example, how many 5 star ratings a brewery has (per platform: Facebook or Yelp).
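For the R nerds, here’s a minimal sketch of that bookkeeping step. The data layout below is hypothetical (one row per individual rating; the brewery names are real, the stars are made up), not my actual scrape:

```r
# Hypothetical layout: one row per individual rating, then cross-tabulate
# into counts of each star level per brewery per platform.
ratings <- data.frame(
  brewery  = c("Lakewood", "Lakewood", "Peticolas", "Peticolas", "Peticolas"),
  platform = c("Facebook", "Yelp", "Facebook", "Yelp", "Yelp"),
  stars    = c(5, 4, 5, 5, 3)
)

# A brewery-by-stars-by-platform table of counts.
xtabs(~ brewery + stars + platform, data = ratings)
```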

Before we go on, let’s get something quite obvious out of the way. The 5 star all-purpose rating system is… flawed. In fact, these types of systems are usually despised. It’s pretty well documented, especially here in DFW, that rating systems need to be more elaborate — rating different aspects of something, instead of an all-purpose feel-goodery star system (as if it were kindergarten and you didn’t knock the blocks down today — 5 stars for not being a clumsy 4-year-old).

So the average rating might be quite unfair for these breweries. Are people giving stars because they are architecture nerds and love the actual building? Was it the tour? General opinion on all the beers? Who knows. What we do know is that the 5 star all-purpose feel-goodery system is flawed. And some businesses are very anti-Yelp because of this all-purpose feel-goodery star system.

Sometimes, when averaged together, the stars tell you just enough. But when it comes to these breweries, as we’ll see, the average tells you very little. However, when we take a closer look — the distribution of stars speaks volumes. Let’s begin with just looking at the frequency of ratings for all the DFW breweries. We’ll also sort them (top to bottom) by the total number of ratings per brewery, with “average stars” on the right:
[Figure: counts of 5, 4, 3, 2, and 1 star ratings for each DFW brewery, sorted by total number of ratings, with average stars on the right]

Here, we can see that Rahr & Sons and Deep Ellum Brewing have the most overall ratings in DFW. So, let’s sort this by average rating (average of Facebook & Yelp):

[Figure: the same rating counts, sorted by average rating (Facebook & Yelp combined)]

From the looks of both of these pictures, it really seems as though 3, 2, and 1 star ratings are rarely, if ever, used. This suggests that, for the most part, when people rate these breweries, 5 means “Great”, 4 means “Good”, and anything else means “Relatively Unsatisfactory”. So from here on out, I’m going to combine 3, 2, and 1 star ratings into a single category of “Not Good”.

[Figure: rating counts per brewery with 3, 2, and 1 stars combined into “Not Good”]

But that still feels weird, so let’s look at things proportionally: that is, the percentage of ratings for each brewery:

[Figure: percentage of 5 star, 4 star, and “Not Good” ratings for each brewery]
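For the R nerds, those percentages are just row-wise proportions of the collapsed count table. A minimal sketch, using the two example breweries tabled a bit further down:

```r
# Collapsed counts for two breweries (from the table further down):
# "Not Good" is the 3, 2, and 1 star ratings combined.
counts <- matrix(
  c(289,  40,  28,
    2690, 726, 277),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("903", "Rahr & Sons"),
                  c("5 Stars", "4 Stars", "Not Good"))
)

# prop.table(x, margin = 1) divides each row by its own total.
round(100 * prop.table(counts, margin = 1), 1)
```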

From the looks of this, you’d probably think Peticolas is DFW’s favorite brewery. And then I would kindly interject and say “Your thought lacks science and is thus far incorrect!”.

When we look at these ratings, we probably notice right away that all of the averages fall between 4.45 and 4.8. In fact, 7 different breweries have ratings between 4.63 and 4.67. So if we go just by average ratings on a (fictitious) 5 point all-purpose feel-goodery kindergarten star scale — we’d conclude “they’re all pretty good so let’s go party.”

So, how can we figure out which brewery really is the best? And how can we do that when the overall number of ratings is so different between breweries? By now you’re thinking the answer to that is “Science, duh”. So let’s science.

The data here look something like this:

 

Brewery        5 Stars   4 Stars   3, 2, or 1 star
903            289       40        28
Rahr & Sons    2690      726       277

 

where each row is a brewery, and the rating columns hold the total counts of each rating from both Facebook and Yelp3. One of the best ways to analyze this type of data is with Correspondence Analysis (CA). If you’re not into stats, avert your eyes for a moment…

For the stats nerds: CA is a technique that takes a large table made up of counts, and finds the best overall representations of these counts. Like PCA, CA produces components. These components explain the maximum possible variance in descending order, but they are derived under χ² assumptions. Importantly, CA — unlike many other techniques — takes into account the total number of ratings (which is different for each brewery). That means we can more fairly analyze the ratings, even when the overall number of ratings is very different for each brewery. In this application of CA, we’re going to use the asymmetric version — where the columns are privileged. The privilege here is that we want the columns to define the maximum possible boundary of where the breweries can go. This boundary is called a simplex.
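For the R nerds among the stats nerds, here’s a minimal sketch of the kind of call involved (not the exact analysis code; the real run uses the full brewery-by-rating table, not just the two example breweries), using epCA() from the ExPosition package:

```r
# A minimal sketch: asymmetric CA on the toy two-brewery count table,
# with ExPosition's epCA().
# install.packages("ExPosition")  # if needed
library(ExPosition)

counts <- matrix(
  c(289,  40,  28,
    2690, 726, 277),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("903", "Rahr & Sons"),
                  c("5 stars", "4 stars", "3, 2, or 1 star"))
)

# symmetric = FALSE requests the asymmetric (column-privileged) version,
# so the rating columns define the simplex the breweries live inside.
ca.res <- epCA(counts, symmetric = FALSE, graphs = FALSE)

# Factor scores for the breweries (rows) and the ratings (columns).
ca.res$ExPosition.Data$fi
ca.res$ExPosition.Data$fj
```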

Back to beer business. So, with some statistical magic, let’s start to find out which breweries can lay claim to being the best. First, let’s look at the ratings:

[Figure: CA map of the star-rating categories (5, 4, and “3, 2, or 1”)]

The configuration of ratings here defines a boundary that can be broken into regions:

[Figures: the rating configuration split into purple, orange, and red regions]

Those regions reflect 3 different traits of how a brewery receives its “average” rating. The purple region is due to breweries that pretty much get 5s and 4s. The orange region is due to breweries that get 5s and {3, 2, 1} ratings. And finally, the red region is due to breweries that are more associated with 4s and {3, 2, 1} ratings than the other breweries. So let’s put the breweries in:

[Figure: the breweries plotted inside the rating regions]

All those purple dots are the breweries. Note how close they are to “5 stars”. Let’s pause a moment. We can already assume that the average ratings-type system is flawed — people love to love their favorite things. Because the 5s are being used a little too much, we can’t figure out which breweries are really the best just by average. We need to use the other ratings to find this out. Let’s pretty that last picture up a bit.

[Figures: the same map, cleaned up and with brewery logos in place of dots]

A little better. Now we can see the breweries’ logos and where they fall in these boundaries. If you’re here for beer… avert your eyes again.

For the R nerds: I searched high and low for a way to plot raster graphics onto a plot device. I found no obvious and simple way to do this (but plenty of advice on how to put a plot device on a raster image — painfully unhelpful). My current solution (pictured above and below) exists somewhere between “Neat trick” and “Disgusting hack”. See the attached code in the footnotes. 
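Here’s a minimal sketch of the trick/hack (not the exact attached code): read a logo as a PNG and paint it onto an open base-R plot with rasterImage(). The file path is hypothetical.

```r
# Read a PNG and draw it at a given position on the current plot device.
library(png)   # for readPNG()

plot(0:1, 0:1, type = "n", xlab = "Component 1", ylab = "Component 2")

logo <- readPNG("some_brewery_logo.png")   # hypothetical file path

# rasterImage() paints the bitmap into user coordinates on the open device.
x <- 0.5; y <- 0.5; half.w <- 0.06; half.h <- 0.06
rasterImage(logo, x - half.w, y - half.h, x + half.w, y + half.h)
```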

Back to beer business. Let’s zoom in on this area, which has all the breweries:

[Figure: zoomed-in view of the area containing all the breweries]

And bring back our magical boundaries:

[Figure: the zoomed-in view with the region boundaries overlaid]

Oh man we are about to get scienced. Remember: all these breweries have a ridiculous number of 5 star ratings. What’s important for figuring out which breweries are the best is the not-5 stars and how the stars are distributed. Instead of asking “which breweries get loved on the most?”, we’re really asking: “which breweries get hated on the least?”. Also remember that the red area means that these breweries get their average ratings from a higher number of 4 and {3, 2, 1} ratings than the other breweries. While beloved, Deep Ellum, Firewheel, Cobra, and Community get hated on the most. But 903, Cedar Creek, and Grapevine live in the “love-hate” zone — they have their lovers giving them 5s and their haters giving them {3, 2, 1} ratings. Here in the orange “love-hate” zone there is no middle ground: these breweries are less likely to get a 4 star rating than the other breweries. That purple zone, though… that’s what we care about.

So now we know that the purple zone is, generally, the “zone of favored breweries” in DFW. But exactly which breweries are the best?… We’re so close to the big reveal. So close. Before the big reveal, let’s look at the breweries, but marked with their average ratings:

[Figure: the breweries, marked with their average ratings]

Now that’s fancy. Science just told us that not all 4.6whatevers are created equal! 903 and Grapevine’s “4.64” happens because they have lots of 5s, but those 5s get dragged down by the {3, 2, 1}s, whereas Martin House’s 4.64 has its 5s dragged down by 4s! That makes Martin House the best damn 4.64 in DFW! Likewise, Cedar Creek’s and Rahr & Sons’ 4.63s are different: Rahr’s 4.63 is the best damn 4.63 in DFW!

Now that we can see a lot more of what’s going on — let’s take a look at just those top ratings: Peticolas (4.80), Revolver (4.79), Rabbit Hole (4.76), and Franconia (4.72). With Correspondence Analysis (CA) — we can think of the dots for the star ratings (5, 4, {3, 2, 1}) as pulling the breweries towards their “star position” (in CA the terminology is “inertia” because we can think of this as a gravitational pull)4. So which star ratings are pulling which breweries towards them?
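For the stats nerds, that “pull” is quite literal in the asymmetric map: each brewery sits at the weighted average (barycenter) of the rating-category vertices, weighted by the brewery’s own rating profile. A toy sketch (the vertex coordinates below are made up purely for illustration; the profile uses 903’s counts from the table above):

```r
# Hypothetical vertex coordinates for the three rating categories (for
# illustration only), and 903's rating profile from the table above.
vertices <- rbind(
  "5 stars"         = c( 0.20,  0.10),
  "4 stars"         = c(-0.60,  0.40),
  "3, 2, or 1 star" = c(-0.50, -0.90)
)
profile <- c(289, 40, 28) / sum(c(289, 40, 28))

# The brewery's position is the profile-weighted average of the vertices:
# categories that hold more of the profile pull the brewery harder.
colSums(vertices * profile)
```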

While Peticolas and Rabbit Hole are being pulled by 5 star ratings — they’re also getting pulled back towards the {3, 2, 1}s. While there’s no doubt that these are some of DFW’s favorite breweries — they are not, according to (my analysis of) Facebook and Yelp (ratings), #1 nor #2. Rabbit Hole is #4 and Peticolas is #3.

And then there were two. To find out the #2 and #1 breweries in DFW, we need to get extra nerdy: Facebook ratings vs. Yelp ratings.

[Figures: Facebook ratings (left) and Yelp ratings (right) in relation to the overall analysis]

First off — most of the ratings in this analysis come from Facebook; there are disproportionately more of them than there are Yelp ratings. However, there is something quite insightful about how these ratings relate to the overall analysis:

Facebook ratings are generally very positive and include even more 5 star ratings. Note how, in the figure on the left, the blue Facebook dots are being pulled towards the 5 star ratings. Then look at the figure on the right, and notice how far away all the Yelp ratings are. This would suggest an anecdote most of us are probably well aware of: Yelpers are mean-spirited jerks (or, rather, they just tend to rate things more negatively).

This is actually really important to note: Facebook ratings are overly positive while Yelp ratings are overly negative. Now, there’s a bit of additional unfairness here… Franconia has no (business) Facebook page. That means it has no ratings from Facebook to help it out. Let’s look at one more picture: how Franconia and Revolver stack up on Yelp (with respect to their aggregated results):

[Figure: Franconia and Revolver on Yelp, relative to their aggregated results]

From Yelp’s perspective, Franconia is closer to the 5 stars than Revolver. Revolver is getting pulled closer to the 4 star ratings. And given that we now know Yelp ratings are generally more negative than Facebook’s, we have but one conclusion:

Revolver is #2, and Franconia is DFW’s #1 brewery (based on two of the ubiquitous 5 star rating systems available).

But it’s quite important to remember: we have no idea why people are rating these breweries as they do5, simply that—when it comes down to ratings—Franconia gets lots of 5s and 4s, and very, very, very few {3, 2, 1} star ratings.

All analyses performed in R. Correspondence Analysis was performed with the ExPosition package – a package created by particularly attractive and smart people. Code and data for the nerds who are so inclined.

Footnotes

1 I don’t think 2 blog posts counts as “tradition” yet.
2 Some breweries don’t have any ratings, and some have just a few, so they’ve been unfortunately excluded.
3 Some breweries only have ratings on Facebook and some only on Yelp.
4 I just rewatched Guardians of the Galaxy and Star Wars (in Machete Order) and am really emphasizing “star systems” and “star positions”. Space operas are the best.
5 For the stats nerds: there is actually another problem hiding here. Not all ratings are necessarily independent. In fact, it’s not unlikely that the same person provides a rating on both Facebook and Yelp. So, yes, there are some statistical assumptions that have been violated. But this is what happens sometimes — just do the best you can.

MVPA & Society for Neuroscience

I didn’t present at this year’s Society for Neuroscience (but was an author on a talk). But I did go to SfN for two reasons: (1) Networking and (2) an informal survey of “MVPA”.

In the context of neuroimaging, what is “MVPA”?

Well, MVPA stands for Multi-voxel pattern analysis. Or Multivariate pattern analysis. So what do those mean? Let’s break the terms down a bit. Each has “pattern analysis” in it. Pattern analysis typically involves some sort of statistical analysis of patterns — where patterns are defined as a set of traits, features, or variables that describe a whole bunch of observations.

Sometimes, these patterns are used as the basis for separating different (often known a priori) groups of observations. Other times it is for finding ways to group observations together based on common patterns.

Pattern analysis (PA) is implicitly multivariate. Thereby making one of the MVPAs—Multivariate Pattern Analysis—redundant in title.

Multivariate means that multiple dependent variables are modeled or analyzed in one go, as opposed to conducting many, many univariate tests. I’m stealing a quote from Haxby (2011, link) that succinctly gets to the advantages of multivariate:

MVP analysis can detect the features that underlie these representational distinctions at both the coarse and fine spatial scales, whereas conventional univariate analyses are only sensitive to the coarse spatial scale topographies.

or…

With multivariate analysis, you’ll get a similar (or the same) perspective as univariate approaches — but now with the added bonus of a unique perspective only multivariate approaches can give you.
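To make that concrete with a toy (simulated, decidedly non-neuroimaging) example: many univariate tests versus one multivariate test on the same two dependent variables.

```r
# Simulated toy data: two dependent variables, one grouping factor.
set.seed(1)
group <- factor(rep(c("A", "B"), each = 20))
y1 <- rnorm(40) + (group == "B") * 0.5
y2 <- rnorm(40) + (group == "B") * 0.5

# Mass-univariate: one t-test per dependent variable.
t.test(y1 ~ group)$p.value
t.test(y2 ~ group)$p.value

# Multivariate: both dependent variables modeled in one go (here, via MANOVA).
summary(manova(cbind(y1, y2) ~ group))
```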

Multi-voxel pattern analysis, though, is where things can get confusing. Multi-voxel does not imply multivariate — rather, it is explicit in its title: a whole bunch of voxels used in some way. One example of such a method is ridge regression. But… ridge regression is a univariate method. Thereby making the other MVPA—multi-voxel pattern analysis—sometimes contradictory in title.

There are some fantastic reviews on “MVPA” and multivariate analyses and pattern analyses for fMRI1, 2, 3, 4, so I won’t go into detail yet on what MVPA should be, but in general it is understood as (1) classification methods, (2) multivariate methods, or (3) a combination of the two.

So I “took to the streets”, if you will5, and conducted a small-scale survey of what “MVPA” means to neuroimagers. I went to as many posters and talks as I could that explicitly used the term “MVPA”, or just stumbled across them in the vast oceans of the poster section. I would then take note of exactly which technique was used. However, in most cases (for posters) exactly which technique was used was never explicitly written; rather, only the 4 letters: MVPA. I would often have to ask “Which MVPA are you using?” (which in most cases was my sole question — and for that I probably seemed like a crazy person). Here’s what I found, broken down into 3 categories. Quotes are used to paraphrase responses.

Category 1: The Definitely Multivariate:

  • Support Vector Machines
  • “Multivariate pattern similarity analysis” (MPSA)
  • Representational Similarity Analysis (RSA)

I’d like to note that MPSA and RSA are the same technique, which now falls under two (unnecessarily different) names; and because RSA is just multidimensional scaling (MDS), that makes three unnecessarily different names. This SfN was the first time I’d ever seen “MPSA” used in the imaging context.
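A quick toy sketch (simulated patterns, nothing to do with any particular poster) of why I say that: RSA boils down to building a condition-by-condition dissimilarity matrix from voxel patterns, which you can then hand to classical MDS.

```r
# Simulated condition-by-voxel patterns (4 conditions x 100 voxels).
set.seed(1)
patterns <- matrix(rnorm(4 * 100), nrow = 4,
                   dimnames = list(c("faces", "houses", "cats", "chairs"), NULL))

# Representational dissimilarity matrix: 1 minus the correlation between
# each pair of condition patterns.
rdm <- 1 - cor(t(patterns))

# Classical multidimensional scaling on that dissimilarity matrix.
cmdscale(as.dist(rdm), k = 2)
```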

Before we move on, let’s break this down. Google would suggest (on December 9, 2014) there are only 8 unique results using “MPSA” (most of which are related to one another). However, the exact phrase “Multivariate Pattern Similarity Analysis” can be traced to Ritchey et al. in 2012, then again in 2013 by Onat, and again by Kalm in 2013, until it was finally acronymized6 by Copara this year. And now (at least) twice at SfN. Hooray for confusingly renaming methods (nearly) as old as modern statistics themselves.

Category 2: The Definitely Multi-voxel (and ambiguously not multivariate)

  • “Haxby style correlations”
  • Searchlight
  • Ridge regression
  • Logistic regression
  • Gaussian Naive Bayesian Classifier

I do find the phrase “Haxby style correlations” quite delightful. Why am I separating these techniques from the above? Well, these techniques usually rely on aggregating results from a series of univariate analyses. The aggregation usually happens across voxels.
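As a concrete (and entirely simulated) toy example of the “Haxby style correlations” idea: average the voxel pattern for each condition in each half of the data, correlate across halves, and check whether within-condition correlations beat between-condition correlations.

```r
# Simulated mean voxel patterns per condition, in each half of the data.
set.seed(1)
n.voxels <- 200
conditions <- c("faces", "houses")
half1 <- sapply(conditions, function(cond) rnorm(n.voxels))
half2 <- half1 + matrix(rnorm(n.voxels * length(conditions), sd = 0.5),
                        ncol = length(conditions))

# Correlate every half-1 pattern with every half-2 pattern (a 2 x 2 matrix).
cor.matrix <- cor(half1, half2)
cor.matrix

# "Classification" succeeds when the within-condition correlations (diagonal)
# beat the between-condition correlations (off-diagonal).
diag(cor.matrix) > cor.matrix[cbind(1:2, 2:1)]
```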

Before we move on to the third and most hilarious (or upsetting) category, I have a small aside: I couldn’t find any case of regularization performed correctly. Regularization is a nifty technique. Usually, regularization is a helpful method when your sample is too small to properly estimate all of your variables. So, the nifty-ness comes in by artificially inflating particular values to, essentially, pretend you have a bigger sample size. To quote Takane: the inflation of these values “works almost magically to provide estimates that are more stable than the ordinary [Least Squares] estimates.”

But there is a danger to inflating: overfitting. Which is why, with regularization methods, you have to search for the regularization parameter that strikes a compromise between a more stable solution and not overfitting. Often, this is done through a train-test paradigm like k-fold cross-validation.
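Here’s a minimal sketch of doing that search properly (illustrative only, and assuming the glmnet package; alpha = 0 requests ridge regression):

```r
library(glmnet)

# Simulated data with more predictors than observations.
set.seed(1)
n <- 100; p <- 500
X <- matrix(rnorm(n * p), nrow = n)
y <- as.numeric(X[, 1:5] %*% rnorm(5) + rnorm(n))   # signal in 5 predictors

# 10-fold cross-validation over a grid of regularization parameters (lambda).
cv.fit <- cv.glmnet(X, y, alpha = 0, nfolds = 10)

cv.fit$lambda.min                       # lambda minimizing cross-validated error
coef(cv.fit, s = "lambda.min")[1:6, ]   # intercept + first 5 ridge coefficients
```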

At SfN, I found only the following case: a single arbitrarily chosen regularization parameter. Tikhonov would be furious.

Category 3: The Definitely Concerning (and ambiguously ambiguous)

  • “Regularized regression”
  • “MVPA Regression”
  • “The MVPA toolbox”

I would follow up with something along the lines of “Do you happen to know which type of analysis?”, to which the response was usually just “The MVPA Toolbox”. I didn’t bother asking which MVPA toolbox.

At this point, you’re probably thinking: “Derek,

what you’ve just said is one of the most insanely idiotic things I’ve ever heard. At no point in your rambling, incoherent response was there anything that could even be considered a rational thought. Everyone [on this internet] is now dumber for having [read] it.

And you’re right. This post is merely a spewing of complaints with no apparent direction nor solution. However, it will be the first in a series of posts over the coming months. There will be two types of posts: (1) examples of multivariate and similarly exotic neuroimaging analyses, in the hopes that (2) some sort of taxonomic structure can be derived from them: essentially a family tree of “MVPA”, with the hope that, some day, we can stop using those 4 letters in that particular sequence. So let’s hope I turn this complaint into something more useful!

1 Haxby, Connolly, & Guntupalli, 2014; 2 Pereira, Mitchell, & Botvinick, 2009; 3 McIntosh & Mišić, 2013; 4 Shinkareva, Wang, & Wedell, 2013.
5 You probably won’t, and shouldn’t, because sometimes I don’t make sense.
6 Not a real word.
7 ate nine.
10 I love footnotes.