Unfair gems when you double or triple on heroes

Some things are so blindingly obvious that we don’t need to waste our time on tests - like that @Kahree’s data very clearly showed no difference between colour drop rates for strong and weak colours. But hey - anyone utterly unfamiliar with how numbers work might doubt that.

Here’s the thread in which @Kahree provided that nice little chunk of data:

(The mechanics of coding starting boards (mostly) without matches were discussed reasonably thoroughly in that thread. There are multiple obvious ways to do it, but I think the consensus was that @Revelate’s suggestion from January 31 of this year was the best - you’d complete the usual random assignment of tiles, but where matches occurred you’d process those matches (and any follow-ons) until they were all cleared, before presenting the starting board.)

(Edit: @NPNKY points out above that the devs have confirmed we got the basics of this right, though we might have got the details wrong - thanks for filling in that gap.)
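Purely for illustration, here’s a minimal sketch of what that might look like. Everything in it - the 7×5 board, plain three-in-a-row matching, how tiles fall and refill - is my own assumption; the devs confirmed the broad approach, not these details.

```python
import random

COLS, ROWS = 7, 5  # board size assumed here, not confirmed
COLORS = ["red", "green", "blue", "purple", "yellow"]

def find_matches(board):
    """Return the set of (col, row) cells sitting in a run of 3+ of one colour."""
    matched = set()
    for c in range(COLS):  # vertical runs within each column
        for r in range(ROWS - 2):
            if board[c][r] == board[c][r + 1] == board[c][r + 2]:
                matched.update({(c, r), (c, r + 1), (c, r + 2)})
    for r in range(ROWS):  # horizontal runs within each row
        for c in range(COLS - 2):
            if board[c][r] == board[c + 1][r] == board[c + 2][r]:
                matched.update({(c, r), (c + 1, r), (c + 2, r)})
    return matched

def starting_board():
    """Random fill, then process matches (and follow-ons) until none remain."""
    board = [[random.choice(COLORS) for _ in range(ROWS)] for _ in range(COLS)]
    while True:
        matched = find_matches(board)
        if not matched:
            return board  # no matches left - this is the board the player sees
        for c in range(COLS):
            # unmatched tiles "fall" (index 0 is the top of the column),
            # and fresh random tiles refill the vacated cells from above
            survivors = [board[c][r] for r in range(ROWS) if (c, r) not in matched]
            refill = [random.choice(COLORS) for _ in range(ROWS - len(survivors))]
            board[c] = refill + survivors
```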

So here are the data:

"Average number of tiles for 31 boards where I attacked with 5-colors:
7.6 Red
6.7 Green
7.2 Blue
7.0 Purple
6.5 Yellow

Average number of tiles for 19 boards where I attacked with 3 Purple and 2 Blue:
7.5 Red
6.8 Green
6.7 Blue
6.8 Purple
7.2 Yellow"

(It is kinda amazing to me that anyone can’t tell by looking that there’s no indication of different drop rates there. But what the heck: let’s do this.)

There are two obvious ways we could run this - test for a difference in drop probability between the samples, or test for a drop probability in the overweight sample that differs from expectations. The first way won’t show any difference, of course, so let’s waste our time on the second way, which is much more likely to show a difference. (Why did I just have a vision of some genius not understanding why the first way is an even bigger waste of time than the second, and insisting we go through the fiasco of running the numbers? Perhaps I’m psychic.)
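For the benefit of my hypothetical genius, here’s roughly what that first test looks like: a chi-square test on a 2×2 contingency table of blue vs non-blue drops across the two samples. The blue counts are my reconstruction from the reported averages - average times boards, rounded - assuming 35 tiles per board, so treat them as approximate.

```python
from scipy.stats import chi2_contingency

TILES_PER_BOARD = 35  # 665 drops / 19 boards, per the data above

# Blue counts reconstructed from the reported averages (average x boards,
# rounded) - approximate, since the averages themselves were rounded
blue_5col = round(7.2 * 31)      # ~223 of 31 * 35 = 1085 drops
blue_stacked = round(6.7 * 19)   # ~127 of 19 * 35 = 665 drops

table = [
    [blue_5col, 31 * TILES_PER_BOARD - blue_5col],
    [blue_stacked, 19 * TILES_PER_BOARD - blue_stacked],
]
chi2, p, dof, expected = chi2_contingency(table)
print(f"p = {p:.2f}")  # comfortably above 0.05 - no detectable difference
```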

So our null hypothesis is that the probability of a particular colour gem dropping is 0.2. Of our two heavy colours, blue departs furthest from that expectation (6.7 tiles per board against an expected 7.0), so let’s use blue. (We can only conduct one test, because we do not have independent samples. I feel silly even having to say that.)

What’s our confidence interval for the probability of a blue gem dropping? 16.6%-22.6%, at the 95% confidence level (Wilson method), so we have no evidence that the true probability differs from 0.2.

How about if we relax our confidence level? At 90% confidence, our interval is 17.0%-22.0%, so again, no evidence that the true probability differs from 0.2.
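For anyone who wants to check the arithmetic, here’s the Wilson interval spelled out. I’m inferring 129 blue drops from the 19.4% point estimate mentioned below; the quoted 6.7 average was presumably rounded.

```python
from math import sqrt

def wilson_ci(successes, n, z):
    """Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    centre = (p_hat + z * z / (2 * n)) / (1 + z * z / n)
    half = (z / (1 + z * z / n)) * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return centre - half, centre + half

# 129 blue drops in 665 tiles, inferred from the 19.4% point estimate
print(wilson_ci(129, 665, z=1.96))   # 95%: ~(0.166, 0.226)
print(wilson_ci(129, 665, z=1.645))  # 90%: ~(0.170, 0.220)
```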

I could keep dropping the confidence level to make a point, but now I’m losing the will to live. When your smallish sample produces a point estimate of 19.4%, it is very obvious that you’re not going to be able to show a statistically significant difference from an expectation of 20%.

(FYI - I treated each gem drop as a separate trial here, for a sample size of 665: 19 boards at 35 tiles each. It would be reasonable to argue that there are not 665 independent trials, but then we’d have to reduce our sample to 19, we’d get an even wider CI, and we’d have even less chance of seeing any difference from the expected drop rate. Sorry to be Captain Obvious, for those who are rolling their eyes at me pointing this out.)
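To put a number on that, here’s the same ballpark point estimate squeezed into 19 trials - 4 successes out of 19 is the closest whole number - using statsmodels as an independent check on the hand-rolled formula above. Purely illustrative, since treating boards as trials would really require the per-board data.

```python
from statsmodels.stats.proportion import proportion_confint

# Same ballpark point estimate, but only 19 "trials": 4/19 ~ 21%
low, high = proportion_confint(4, 19, alpha=0.05, method="wilson")
print(f"{low:.1%} - {high:.1%}")  # roughly 8.5% - 43%: hopelessly wide
```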
