Color Stacking Fairness Project

Garanwyn · January 27, 2019, 5:18am

Just wanted to follow up with you on this hypothesis. I’m about 150 raids (about 5300 tiles) in now on my counts for enemies who color stack on defense. The results suggest that it’s pretty unlikely that there’s any bias to protect the enemy’s stacked color(s) either.

95% confidence bounds
Upper	21.67%
Mid	20.59%
Lower	19.51%

So I’m actually seeing about 7.2 tiles of the color that is strong against the enemy’s stacked color, on average. I don’t believe that it’s representative of a genuine bias in favor of hurting the enemy, though. Probably just a bit of statistical noise.
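The 7.2 figure is the mid estimate times the 35 tiles on each starting board (20.59% × 35 ≈ 7.2). The minimal Python sketch below uses only this post's numbers and applies the same conversion to both bounds. The fair-draw expectation of 7.0 tiles lands inside the resulting range, which is why 7.2 reads as noise rather than bias. The `tiles_per_board` name is illustrative.

```python
TILES_PER_BOARD = 35  # every starting board in the data set has 35 tiles

def tiles_per_board(p, tiles=TILES_PER_BOARD):
    """Convert a per-tile probability into an expected tile count per board."""
    return p * tiles

# 95% bounds from the ~150-raid update (about 5,300 tiles)
bounds = {"lower": 0.1951, "mid": 0.2059, "upper": 0.2167}
fair = tiles_per_board(1 / 5)  # fair draw: each of the 5 colors equally likely

for name, p in bounds.items():
    print(f"{name:>5}: {p:.2%} -> {tiles_per_board(p):.2f} tiles per board")
print(f" fair: 20.00% -> {fair:.2f} tiles per board")
```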

At this point, I’ve stopped rerolling with intent to find color stacking enemies, but I do still record data when they pop up.

Duaneski · January 27, 2019, 7:12pm

Thank you for taking that extra step. I appreciate it.

Feel like this little guy can altogether be put to sleep now lol

anon39993326 · January 28, 2019, 3:17am

Wow. You have so much patience. Well done.

Garanwyn · January 28, 2019, 3:22am

Thanks! It adds about 30 seconds a raid, so it’s not too bad. I’m just about to post my 500-raid update. Still looking perfectly fair/unbiased.

Garanwyn · January 29, 2019, 2:07pm

Up to 17,500 tiles seen. That’s 500 raids.

Updated probability of seeing a tile of the stacked-into color (the color strong against the enemy’s stacked color): 20.02%

95% confidence interval: 19.4% to 20.6%

Conclusion: looking perfectly fair

Data for raids #301-500 below

Team	Red	Green	Blue	Yellow	Purple	Tile Count	Tiles of 1st Color
3 r 2 g	8	9	3	6	9	35	8
3 r 2 g	6	5	10	6	8	35	6
3 r 2 g	5	6	8	7	9	35	5
3 r 2 b	8	7	11	4	5	35	8
3 r 2 g	7	6	5	9	8	35	7
3 b 2 r	10	4	6	9	6	35	6
3 b 2 g	6	10	10	6	3	35	10
3 y 2 b	9	7	6	5	8	35	5
3 y 2 b	6	11	1	4	13	35	4
3 y 2 b	3	14	7	5	6	35	5
3 y 2 b	4	7	9	8	7	35	8
3 y 2 b	8	5	11	4	7	35	4
3 b 2 y	6	6	9	5	9	35	9
3 y 2 r	8	7	5	8	7	35	8
3 b 2 y	8	10	6	7	4	35	6
3 r 2 b	5	8	10	7	5	35	5
3 r 2 b	8	10	9	4	4	35	8
3 y 2 b	8	8	8	7	4	35	7
3 y 2 r	6	6	9	6	8	35	6
3 b 2 p	9	8	5	8	5	35	5
3 b 2 p	7	7	7	9	5	35	7
3 b 2 p	8	3	6	9	9	35	6
3 y 2 b	9	11	2	8	5	35	8
3 p 2 b	4	7	8	10	6	35	6
3 y 2 b	10	7	8	3	7	35	3
3 y 2 b	9	6	7	6	7	35	6
3 y 2 r	8	3	9	11	4	35	11
3 b 2 r	7	3	9	6	10	35	9
3 y 2 b	8	4	2	10	11	35	10
3 b 2 r	7	4	9	5	10	35	9
3 b 2 r	9	6	3	8	9	35	3
3 b 2 r	5	8	6	9	7	35	6
3 g 2 b	3	8	7	7	10	35	8
3 r 2 b	7	5	8	6	9	35	7
3 r 2 p	9	6	7	7	6	35	9
3 g 2 r	6	4	8	6	11	35	4
3 b 2 g	7	9	7	7	5	35	7
3 b 2 p	10	7	9	5	4	35	9
3 r 2 y	8	9	4	7	7	35	8
3 r 2 y	12	5	5	5	8	35	12
3 b 2 p	8	6	5	8	8	35	5
3 b 2 r	4	2	7	12	10	35	7
3 p 2 b	7	7	11	6	4	35	4
3 p 2 b	3	8	7	8	9	35	9
3 g 2 b	11	5	8	8	3	35	5
3 b 2 r	8	7	10	5	5	35	10
3 b 2 g	3	12	5	6	9	35	5
3 r 2 y	6	9	8	7	5	35	6
3 r 2 y	6	8	9	8	4	35	6
3 r 2 y	8	10	5	9	3	35	8
3 b 2 y	12	5	11	5	2	35	11
3 y 2 b	7	12	3	6	7	35	3
3 p 2 b	10	9	5	5	6	35	6
3 y 2 r	10	11	5	6	3	35	6
3 y 2 r	4	10	6	7	8	35	7
3 y 2 r	7	6	5	10	7	35	10
3 g 2 r	8	7	11	6	3	35	7
3 g 2 r	5	7	6	9	8	35	7
3 b 2 y	5	4	10	8	8	35	10
3 b 2 r	8	10	4	5	8	35	4
3 r 2 g	3	9	8	8	7	35	3
3 b 2 r	7	3	6	9	10	35	6
3 r 2 y	8	7	4	6	10	35	8
3 g 2 r	8	8	9	7	3	35	8
3 g 2 r	9	6	6	8	6	35	9
3 b 2 r	9	9	5	6	6	35	5
3 g 2 b	4	8	7	9	7	35	8
3 g 2 b	6	7	6	9	7	35	7
3 b 2 r	8	7	8	6	6	35	8
3 g 2 r	6	10	7	5	7	35	10
3 r 2 g	9	10	5	7	4	35	10
3 b 2 r	7	9	6	8	5	35	6
3 b 2 r	3	10	5	11	6	35	5
3 b 2 r	10	5	7	7	6	35	7
3 b 2 r	10	7	7	6	5	35	7
3 r 2 g	11	7	3	7	7	35	11
3 b 2 p	5	7	10	4	9	35	10
3 b 2 p	4	8	4	11	8	35	4
3 b 2 p	7	11	7	7	3	35	7
3 r 2 y	8	8	8	6	5	35	8
3 g 2 b	3	9	8	7	8	35	9
3 b 2 r	8	7	6	8	6	35	6
3 b 2 r	9	6	2	6	12	35	2
3 b 2 r	5	8	7	9	6	35	7
3 p 2 r	9	7	10	6	3	35	3
3 b 2 y	8	5	7	6	9	35	7
3 r 2 p	9	5	8	5	8	35	9
3 r 2 b	5	10	9	4	7	35	5
3 b 2 p	6	7	10	7	5	35	10
3 r 2 g	9	10	4	4	8	35	9
3 r 2 g	3	9	10	8	5	35	3
3 r 2 g	8	6	5	10	6	35	8
3 g 2 b	4	6	8	6	11	35	6
3 g 2 b	10	4	7	8	6	35	4
3 b 2 r	6	6	8	8	7	35	8
3 b 2 r	8	8	9	3	7	35	9
3 b 2 r	8	6	7	4	10	35	7
3 y 2 r	8	9	3	8	7	35	8
3 b 2 r	7	10	7	7	4	35	7
3 r 2 b	11	8	4	4	8	35	11
3 r 2 b	6	8	4	8	9	35	6
3 b 2 r	8	7	8	6	6	35	8
3 y 2 r	7	10	5	6	7	35	6
3 y 2 r	6	9	6	6	8	35	6
3 y 2 r	7	6	12	7	3	35	7
3 r 2 g	10	4	4	10	7	35	10
3 p 2 r	3	5	10	9	8	35	9
3 y 2 b	6	10	6	4	9	35	4
3 b 2 r	9	5	9	8	4	35	9
3 r 2 g	8	5	4	7	11	35	8
3 r 2 g	11	7	3	7	7	35	11
3 r 2 b	9	9	6	7	4	35	9
3 g 2 p	6	7	7	7	8	35	7
3 b 2 r	7	6	7	5	10	35	7
3 r 2 g	10	9	4	8	4	35	10
3 b 2 g	6	11	2	8	8	35	2
3 y 2 g	6	7	7	7	8	35	7
3 y 2 b	8	5	7	11	4	35	11
3 g 2 p	8	6	8	6	7	35	6
3 b 2 y	11	6	6	6	6	35	6
3 b 2 y	4	8	10	6	7	35	10
3 g 2 b	7	6	8	8	6	35	6
3 g 2 b	5	8	8	6	8	35	8
3 y 2 r	7	8	6	10	4	35	10
3 r 2 g	2	6	8	9	10	35	2
3 g 2 r	7	9	6	8	5	35	9
3 r 2 y	7	4	6	11	7	35	7
3 r 2 y	6	7	8	6	8	35	6
3 p 2 r	6	8	11	6	4	35	4
3 b 2 r	7	4	12	6	6	35	12
3 r 2 g	7	4	11	8	5	35	7
3 r 2 b	9	8	10	2	6	35	9
3 b 2 g	8	7	8	5	7	35	8
3 g 2 b	5	7	8	9	6	35	7
3 b 2 y	8	6	7	5	9	35	7
3 r 2 g	4	10	9	6	6	35	4
3 r 2 g	5	7	7	9	7	35	5
3 r 2 g	8	7	10	4	6	35	8
3 b 2 r	6	8	4	8	9	35	4
3 g 2 b	6	6	9	5	9	35	6
3 g 2 b	10	4	8	4	9	35	4
3 b 2 y	8	9	7	8	3	35	7
3 y 2 g	8	7	3	10	7	35	10
3 b 2 p	5	5	6	8	11	35	6
3 r 2 y	5	7	10	7	6	35	5
3 r 2 y	8	6	8	5	8	35	8
3 r 2 y	6	13	9	1	6	35	6
3 g 2 y	6	8	7	9	5	35	8
3 r 2 p	8	7	6	7	7	35	8
3 b 2 y	7	6	6	4	12	35	6
3 b 2 y	8	6	8	7	6	35	8
3 r 2 g	11	3	7	9	5	35	11
3 r 2 g	9	8	5	9	4	35	9
3 b 2 r	4	7	9	10	5	35	9
3 b 2 y	3	12	4	6	10	35	4
3 b 2 y	10	6	7	5	7	35	7
3 g 2 y	5	6	10	7	7	35	6
3 g 2 b	4	6	10	9	6	35	6
3 y 2 b	11	8	4	6	6	35	6
3 y 2 b	3	5	13	8	6	35	8
3 b 2 g	11	7	5	5	7	35	5
3 b 2 g	16	5	4	6	4	35	4
3 b 2 g	4	9	8	5	9	35	8
3 b 2 r	10	7	6	7	5	35	6
3 r 2 g	2	6	8	11	8	35	2
3 r 2 g	3	11	5	10	6	35	3
3 r 2 g	7	10	4	5	9	35	7
3 b 2 p	3	7	11	5	9	35	11
3 b 2 r	6	5	8	4	12	35	8
3 b 2 g	6	10	5	11	3	35	5
3 b 2 g	7	7	11	4	6	35	11
3 b 2 p	6	8	9	7	5	35	9
3 b 2 p	4	8	8	9	6	35	8
3 b 2 r	9	4	4	9	9	35	4
3 y 2 r	4	10	8	4	9	35	4
3 y 2 r	6	7	10	9	3	35	9
3 b 2 g	8	4	4	10	9	35	4
3 b 2 g	7	6	7	9	6	35	7
3 y 2 r	9	8	7	7	4	35	9
3 p 2 b	7	7	5	8	8	35	8
3 p 2 b	5	10	4	6	10	35	10
3 r 2 y	4	6	7	11	7	35	11
3 y 2 r	7	8	6	6	8	35	6
3 y 2 r	9	7	3	7	9	35	7
3 y 2 r	7	5	5	8	10	35	8
3 r 2 b	8	6	6	11	4	35	8
3 r 2 b	6	12	6	5	6	35	6
3 r 2 b	7	6	10	4	8	35	7
3 r 2 y	6	7	7	9	6	35	6
3 p 2 g	11	7	6	6	5	35	5
3 b 2 r	7	7	6	7	8	35	6
3 b 2 r	6	3	10	7	9	35	10
3 b 2 r	7	6	6	7	9	35	6
3 b 2 p	9	8	4	7	7	35	4
3 b 2 r	7	4	9	7	8	35	9
3 r 2 p	4	11	9	9	2	35	4
3 r 2 p	8	5	6	7	9	35	8
3 r 2 p	2	13	8	7	5	35	2
3 b 2 r	7	6	7	7	8	35	7
3 y 2 b	6	3	12	7	7	35	7
Total (raids 1-500)	3521	3458	3550	3529	3442	17500	3503
Est %	20.12%	19.76%	20.29%	20.17%	19.67%		20.02%
Est Tiles	7.04	6.92	7.10	7.06	6.88		7.01
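The two summary rows can be rebuilt from the totals row alone. The post doesn't say which interval method produced the 19.4% to 20.6% range. A plain normal-approximation (Wald) interval, assumed here, reproduces it. Here is a minimal Python sketch; the `wald_interval` name is illustrative.

```python
import math

TILES_PER_BOARD = 35

# Totals row (all 500 raids) from the table above
totals = {"Red": 3521, "Green": 3458, "Blue": 3550, "Yellow": 3529, "Purple": 3442}
stacked_into = 3503             # total of the "Tiles of 1st Color" column
n_tiles = sum(totals.values())  # 17,500 = 500 raids x 35 tiles

def wald_interval(successes, trials, z=1.96):
    """Normal-approximation (Wald) 95% interval for a binomial proportion."""
    p = successes / trials
    half = z * math.sqrt(p * (1 - p) / trials)
    return p - half, p, p + half

# Rebuild the "Est %" and "Est Tiles" rows
for color, count in totals.items():
    share = count / n_tiles
    print(f"{color:<7} {share:7.2%} {share * TILES_PER_BOARD:5.2f}")

low, mid, high = wald_interval(stacked_into, n_tiles)
print(f"stacked-into color: {mid:.2%} (95% CI {low:.1%} to {high:.1%})")
```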

Zarxahl · February 4, 2019, 1:32am

Hey @Garanwyn

First of all thanks for putting in the effort and the hours for doing the analysis. As an IT professional I love looking at numbers and stats.

One quick question… wouldn’t it be better to use one stack colour, say 3 dark 2 yellow, and do a lot of raids with that to see how well the tiles generate? Right now we are analyzing stacking, but the colour being stacked keeps varying. If the tile generation algorithm is truly random, then you could just be getting lucky with the boards being given.

I am not a statistician by any means, but I would love to hear your thoughts on this.

Garanwyn · February 4, 2019, 2:48am

You’re welcome! I’m glad you find it interesting.

With respect to your question about color stacking:

The hypothesis we’re trying to test here is whether the tile engine has a bias against color stacking, not whether it produces equal numbers of tiles of each color (I’ve never heard any theories that there might be a literal color preference in the tile engine, and I certainly don’t see one in my data).

To test this hypothesis, we can think of every tile draw on a board as falling into one of two categories:

Tiles of the strong stacked color (whatever it might be)
Tiles of a non-stacked color

Then each board gives us a binomially-distributed sample.
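Under the fairness hypothesis, a board's count of strong tiles is modeled as Binomial(35, 0.2): 35 tiles, each with a 1-in-5 chance. This treats the tiles as independent draws, which is an assumption. The game's rejection of boards with initial 3-matches adds some dependence between neighboring tiles. Below is a short Python sketch of that per-board distribution. `binom_pmf` is a helper name chosen here, not anything from the game.

```python
from math import comb, sqrt

N, P = 35, 0.2  # tiles per starting board; 1 in 5 under a fair, uniform draw

def binom_pmf(k, n=N, p=P):
    """Probability that exactly k of the n tiles are the strong color."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

mean = N * P                # 7 strong tiles per board on average
sd = sqrt(N * P * (1 - P))  # about 2.37 tiles
print(f"mean {mean:.1f}, sd {sd:.2f}")
for k in range(16):
    print(f"{k:2d} tiles: {binom_pmf(k):6.2%}")
```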

Unless you’re hypothesizing some sort of roving color bias or roving color-stacking bias, our chance of getting lucky across the samples is the same whether we change colors or stick with a single color.

If you’re interested in the per-color estimates, here they are. The confidence interval bounds are pretty loose on some of them, though, since I don’t stack them often.

Color	Games	Total Strong Tiles	95% Low	Mid	95% High
Red	172	1224	19.33%	20.35%	21.37%
Blue	221	1549	19.15%	20.04%	20.93%
Green	36	239	16.90%	19.06%	21.23%
Yellow	88	608	18.37%	19.78%	21.18%
Purple	37	248	17.10%	19.24%	21.39%
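The post doesn't name the interval method behind this table. However, the Agresti-Coull interval, which Garanwyn cites later in the thread, reproduces every row to within rounding when each game is counted as 35 tiles. Here is a sketch in Python; the helper names are illustrative.

```python
import math

TILES_PER_BOARD = 35

def agresti_coull(successes, trials, z=1.96):
    """Agresti-Coull interval: add z^2/2 successes and z^2 trials, then apply the Wald formula."""
    n_adj = trials + z**2
    p_adj = (successes + z**2 / 2) / n_adj
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half, p_adj, p_adj + half

# (games, total strong tiles) per stacked-into color, from the table above
per_color = {
    "Red": (172, 1224),
    "Blue": (221, 1549),
    "Green": (36, 239),
    "Yellow": (88, 608),
    "Purple": (37, 248),
}

for color, (games, strong) in per_color.items():
    low, mid, high = agresti_coull(strong, games * TILES_PER_BOARD)
    print(f"{color:<7}{low:8.2%}{mid:8.2%}{high:8.2%}")
```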

Zarxahl · February 4, 2019, 5:49pm

Thanks man. I kinda reached the same conclusion after thinking about it last night. You’re a legend. Keep up the good work, and thanks for an awesome explanation.

Maelic · February 4, 2019, 6:13pm

In the theme of peer review:

Have you/others looked at the distribution of color across the board (i.e., color stacking may produce the same quantity of color X, but spread evenly across the grid, reducing possible early matches)?

Have you/others looked at the generated tiles beyond the initial board? No game is won on the opening board, does color stacking impact the tiles beyond the initial tiles spawned?

You should also run a rainbow control to confirm that you are receiving the expected 20% / color generation.

Edit: Very interesting and well executed. This is definitely a benefit to the community and something that I have yelled at my tablet numerous times for. I may owe it an apology.

Garanwyn · February 4, 2019, 6:36pm

Regional color density is an interesting question. The fact that SG throws out boards with initial 3-matches is going to alter (increase) the expected mixing somewhat. I haven’t looked at or tracked the available initial color matches on any given board, though.

The challenge is that I don’t have a good model for what the distribution of color densities across trials ought to look like, so I wouldn’t know what the data was really telling me even if I had it. I’m very open to suggestions on figuring this out.
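One way to get a baseline for regional color density is to simulate boards. The sketch below rests on loudly labeled assumptions:

* a 7-column by 5-row grid;
* independent uniform colors;
* any board with an initial 3-match is redrawn whole. The post only says such boards are thrown out, and the game may instead re-roll individual tiles.

It compares one color's share and a crude clumping measure (adjacent same-color pairs) with and without the rejection step. By symmetry, the share should stay near 20% either way, while the clumping measure should drop once 3-matches are rejected. Real screenshots could then be scored with the same measure. All function names here are hypothetical.

```python
import random

ROWS, COLS = 5, 7    # assumption: 7 columns x 5 rows = 35 tiles
COLORS = "RGBYP"     # red, green, blue, yellow, purple

def has_initial_match(board):
    """True if any row or column holds three identical colors in a row."""
    for r in range(ROWS):
        for c in range(COLS - 2):
            if board[r][c] == board[r][c + 1] == board[r][c + 2]:
                return True
    for r in range(ROWS - 2):
        for c in range(COLS):
            if board[r][c] == board[r + 1][c] == board[r + 2][c]:
                return True
    return False

def random_board(rng, reject_matches):
    """Uniform random board; optionally redraw until there is no initial 3-match."""
    while True:
        board = [[rng.choice(COLORS) for _ in range(COLS)] for _ in range(ROWS)]
        if not (reject_matches and has_initial_match(board)):
            return board

def adjacent_pairs(board, color):
    """Orthogonally adjacent pairs of `color` tiles: a crude clumping measure."""
    pairs = 0
    for r in range(ROWS):
        for c in range(COLS):
            if board[r][c] != color:
                continue
            if c + 1 < COLS and board[r][c + 1] == color:
                pairs += 1
            if r + 1 < ROWS and board[r + 1][c] == color:
                pairs += 1
    return pairs

def summarize(trials, reject_matches, color="R", seed=0):
    """Average share of `color` tiles and average adjacent pairs per board."""
    rng = random.Random(seed)
    share = pairs = 0.0
    for _ in range(trials):
        board = random_board(rng, reject_matches)
        share += sum(row.count(color) for row in board) / (ROWS * COLS)
        pairs += adjacent_pairs(board, color)
    return share / trials, pairs / trials

if __name__ == "__main__":
    for reject in (False, True):
        share, pairs = summarize(5_000, reject)
        print(f"reject 3-matches={reject}: red share {share:.2%}, red adjacent pairs {pairs:.2f}")
```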

Not yet. This is an open question. We’re trying to get someone to pick this project up, but so far no takers.

I can infer that they aren’t too bad based on my win rate being about 70%, even though I average 350 TP lower than my opponents. But that’s a long, long way from a rigorous analysis.

I’m a bit less concerned about this, given how good the distribution between colors is even when stacking. My current heroes also aren’t really compatible with going rainbow and staying in Platinum, so this would be painful for me to do.

DoctorStrange is a dedicated rainbow player, though, as is Brobb. I believe General_Confusion and KingArchur also raid with rainbow teams. We might be able to talk one of them into screenshotting boards and uploading them. I’m not really sure the game is worth the candle, though.

Thanks! Yeah, I have the data, and I still yell at my phone. Gotta love when you go up against someone 700 TP higher and get 2 tiles of your strong color.

Eev · February 11, 2019, 12:09pm

Much respect for the data; I will also start taking screenshots, attacking only with mono black and mono blue. It will take some time to gather statistics, but I will post them.
One problem I see is how to collect the data after the initial board… it’s a lot of work, so I’m too lazy.

What I really want to test is King on titans. When I see how often it’s only miss, miss, miss, I wonder whether it’s really only a 30% reduction in accuracy… By the way, in raids or CWs I think King is quite fair.

Eev · February 19, 2019, 12:32pm

So here is some info for 100 raids. Only mono color teams are used. Only enemies whose center color is weak against mine are attacked. Only the starting board is evaluated.
Number of strong-color tiles - number of boards:
Zero - 0
One - 0
Two - 1
Three - 6
Four - 10
Five - 10
Six - 19
Seven - 26
Eight - 8
Nine - 8
Ten - 6
Eleven - 3
Twelve - 1
Thirteen - 2

Garanwyn · February 19, 2019, 2:42pm

The prediction would be for 43 worse-than-average starting boards, and you saw 46. So you had a little bad luck in there.

You also ended up with some pretty nice positive luck. Two 13-strong-tile boards in 100 samples is quite high. I’ve only seen 2 in my 600 boards recorded.
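Both comparisons come from the Binomial(35, 0.2) fairness model, assuming independent tiles. This Python sketch checks them against the histogram Eev posted. "Worse than average" is taken to mean fewer than 7 strong tiles, which matches the 46 observed boards cited above.

```python
from math import comb

N, P, BOARDS = 35, 0.2, 100

def binom_pmf(k, n=N, p=P):
    """Probability that exactly k of the n tiles are the strong color."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Eev's first 100 boards: strong tiles -> number of boards
observed = {2: 1, 3: 6, 4: 10, 5: 10, 6: 19, 7: 26, 8: 8,
            9: 8, 10: 6, 11: 3, 12: 1, 13: 2}

# "Worse than average" = fewer than the mean of N * P = 7 strong tiles
expected_below = BOARDS * sum(binom_pmf(k) for k in range(7))
observed_below = sum(v for k, v in observed.items() if k < 7)

expected_13 = BOARDS * binom_pmf(13)
expected_13_plus = BOARDS * sum(binom_pmf(k) for k in range(13, N + 1))

print(f"below average: expected {expected_below:.1f}, observed {observed_below}")
print(f"13 strong tiles: expected {expected_13:.2f} "
      f"(13 or more: {expected_13_plus:.2f}), observed {observed[13]}")
```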

The stats on your data overall are:

Probability of strong tile: 19.6%

Agresti-Coull 95% confidence bound:

Low: 18.3%
Mid: 19.6%
High: 20.9%

So those numbers are very much in family with what I’ve collected, and consistent with a random draw process. Thank you for doing a collection!

Eev · February 19, 2019, 7:20pm

Thanks,
the population of 100 is still too small for five colors and many different states, so I will continue to gather data. At the moment I do not like the difference between 5-6 and 8-9. The quite lucky 13s smooth out the %. But I will keep taking screenshots and counting.

Garanwyn · February 20, 2019, 5:48am

The distribution is always a bit asymmetric because it caps out at zero at the bottom. I’m not worried about the differences between 5-6 and 8-9 just yet, but it is something to keep an eye on.
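The asymmetry Garanwyn mentions is built into the fairness model. A Binomial(35, 0.2) count has a mean of 7, but it can only fall 7 below that mean versus 28 above it. That gives it a slight right skew, so the 5-6 band is expected to hold a bit more probability than the 8-9 band. Here is a quick Python check, again assuming independent tiles.

```python
from math import comb, sqrt

N, P = 35, 0.2

def binom_pmf(k, n=N, p=P):
    """Probability that exactly k of the n tiles are the strong color."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

skew = (1 - 2 * P) / sqrt(N * P * (1 - P))  # positive means a longer right tail
low_band = binom_pmf(5) + binom_pmf(6)
high_band = binom_pmf(8) + binom_pmf(9)

print(f"skewness {skew:.3f}")
print(f"P(5 or 6) = {low_band:.1%}, P(8 or 9) = {high_band:.1%}")
print(f"per 100 boards: {100 * low_band:.1f} vs {100 * high_band:.1f}")
```

That works out to roughly 29 vs 25 boards per 100, so some gap between the bands is expected. Eev's first 100 boards (29 vs 16) show a wider gap, and that difference is the part worth watching.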

I look forward to seeing more data!

TheLastDragon · February 20, 2019, 7:50am

From what I have noticed from tile play, with a rainbow team each tile has a 20% chance of each color, which gives you an “even-ish” start of each color. Now, for every color you choose NOT to have in your lineup, it drops the one color you don’t choose by 4% and puts THAT 4% into the OTHER colors. So a 2-1-1-1 team with 2 red and 1 each of green, yellow, and blue will be a 16% chance for red and 21% for the other colors. 2 red, 2 blue, 1 yellow would be 12% red, 12% blue, 20% yellow, 28% green, 28% purple. 4 red, 1 blue is 8% red, 20% blue, 24% each for green, purple, and yellow. 5-hero mono is the lowest chance % of course: 4% purple, 24% chance each for red, green, yellow, and blue. Weirdly, after I lose and REMATCH, it seems the mono color chance sometimes goes up and I MURDER the opponent lol. So CHANCE is still the most important thing. No cheating as far as I can see, but the % does get lower the more heroes of the same color you choose. Hope this helps. Good luck

Garanwyn · February 20, 2019, 2:11pm

When you say “noticed”, do you mean that you have actually collected a reasonably large amount of data?

I ask because the effect you’re describing certainly isn’t present in the data I’ve collected and published above. So I’d be very interested to see data representing what you’re describing.

TheLastDragon · February 22, 2019, 5:14am

Oh, not a large amount by myself. Some group members and I did a 20-battle test after I first started playing, because I was confused about why, when I didn’t have a hero of a color, the tiles for that color were lower than the rest. So yes, we did collect some data ourselves for the alliance, but nothing like your full data set. So maybe the numbers would change at a LARGER scale, and I recommend your data. I will state clearly that this is just my own observation, so I don’t confuse people. Thanks for bringing that to my attention. But I will do a large-scale test on this and see if it still adds up. Do you think 100 matches is good? I will start there.

Eev · February 28, 2019, 10:17am

So, results for 201 attacks:
Zero - 0
One - 0
Two - 2
Three - 8
Four - 19
Five - 25
Six - 32
Seven - 44
Eight - 26
Nine - 19
Ten - 12
Eleven - 8
Twelve - 4
Thirteen - 2

It now looks more or less good, so I will stop collecting the data.
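One way to quantify "more or less good" is a chi-square goodness-of-fit test of the 201-board histogram against Binomial(35, 0.2). This technique is swapped in for illustration; nobody ran it in the thread. The sketch treats the 201 attacks as independent boards and pools the sparse tails so that no expected count is tiny. To keep it dependency-free, it hard-codes the 5% critical value for 9 degrees of freedom (about 16.92) from standard tables.

```python
from math import comb

N, P = 35, 0.2

def binom_pmf(k, n=N, p=P):
    """Probability that exactly k of the n tiles are the strong color."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Eev's 201 starting boards: strong tiles -> number of boards
observed = {0: 0, 1: 0, 2: 2, 3: 8, 4: 19, 5: 25, 6: 32, 7: 44,
            8: 26, 9: 19, 10: 12, 11: 8, 12: 4, 13: 2}
boards = sum(observed.values())  # 201

# Pool the sparse tails: 0-3, then single values 4..11, then 12 and up
bins = [range(0, 4)] + [range(k, k + 1) for k in range(4, 12)] + [range(12, N + 1)]

chi2 = 0.0
for b in bins:
    obs = sum(observed.get(k, 0) for k in b)
    exp = boards * sum(binom_pmf(k) for k in b)
    chi2 += (obs - exp) ** 2 / exp

dof = len(bins) - 1   # p = 0.2 is fixed in advance, not estimated from the data
CRITICAL_95 = 16.92   # chi-square 95th percentile for 9 degrees of freedom
print(f"chi-square = {chi2:.2f} on {dof} dof (5% critical value {CRITICAL_95})")
```

With these bins, the statistic comes out at around 5.3, well under the critical value. The largest single contribution is the 7-tile bin (44 observed vs about 33 expected).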

HOWEVER, I will start collecting data for mono colors in wars! I recently started using mono color, and in the last 3 wars I had one 7, one 6, and the rest 2-4. Looks very suspicious, even for a small population.

EVA01 · March 9, 2019, 12:10pm

First things first: funniest and most detailed information post combo ever. In all of thread history, I do not believe that many statistics have ever been combined with a cat and dog proposal. With that out of the way, I think most complaints come from the full-board stackers. Of course, if you play one color only, it is going to appear unfairly set. But you have to remember that the tiles are going to be less available to you in appearance, because the game is providing the proper amount based on a rainbow setup. So then let’s add in the second factor, which is the power of exponential mana generation off fewer tiles because of your solid board, combined with hit power too, and you really have to chalk it up to risk/reward. This is a financial analogy, of course, but hey, numbers are numbers. And Einstein himself is often quoted as calling compound interest the 8th wonder of the world. That’s basically the reward you get when solid stacking is your thing, but if the defense is able to get their own thing going quickly, you then fall victim to the risk part. Anyway, I had to chime in. I was just planning on reading until the photo and math combo hit me, so I figured that this was definitely getting a remark. I swear that volley between @Brobb and @Garanwyn should be archived and saved. Animal marriage proposals, statistics and a polygamy reference all together… definitely a first.