Disclaimer: This post is extremely long, and if you don't like data analysis and geeky maths talk, I'd suggest you run for the hills and don't look back, because this post has quite a bit of it!

Hi guys. There are a lot of theme parks in Europe, as well as a lot of roller coasters. So naturally, people (myself included) tend to ask questions like "which park has Europe's best roller coaster lineup?" or "which parks are quality-over-quantity and which parks are quantity-over-quality?", amongst others. While it's not really a discussion thread as such, I thought it might be fun to take a quantitative look at some of these questions and try to answer them using some data science techniques. So join me as I attempt a quantitative, multi-part analysis of Europe's major coaster selections! I'll split my investigations into a couple of posts, one for each question, to make it a little more digestible. Before we start, let me set out a few prerequisites and explain some of the facts regarding the investigation…

Prerequisites of the Investigation

I am using the coaster ratings on Captain Coaster (https://captaincoaster.com/en/) as of March 2022 to perform this investigation. Each ride's page on CC has a % score out of 100; I converted this into a rating out of 10 by dividing by 10 (so for instance, a ride rated 87% would have a rating of 8.7/10).
All ratings are rounded to the nearest 0.1 (so to 1dp).
As a rule of thumb for what's considered major: to be considered, a park must have at least 5 scoreable roller coasters. If you're wondering why I get so specific in saying "scoreable roller coasters", it's because Captain Coaster does not score what it considers to be "kiddie coasters", so not every ride in a park's lineup is scored.
As such, this means that parks with 5 kiddie coasters wouldn't be eligible for this investigation; my rule ensures that a park in the study has 5 family/family thrill coasters, at the very least. Captain Coaster also doesn't score rides where the ridership is too low, but that doesn't really affect this investigation; even the newest major coasters in Europe, like Ride to Happiness and Kondaa, were ridden enough to be scoreable.
One caveat is that Captain Coaster's definition of a kiddie coaster is somewhat inconsistent. For instance, things like the Steeplechases at Blackpool are considered kiddie coasters, but Blue Flyer in the same park, which I personally would consider a kiddie coaster, isn't. The site also lists rides that some people wouldn't count as roller coasters but others would, such as SuperSplash at Plopsaland and Fuga da Atlantide at Gardaland. I decided to go with the site's scores and the rides that the site scored. I could calculate the mean rating of some unscored rides myself, but I don't think CC's scoring system only uses mean rating; I seem to remember it being mentioned that members' rankings are also factored in. Attempting to meddle with CC's system would risk introducing bias and skewing the data the wrong way, which you definitely don't want in a data investigation. However, I did think this was something I should raise before we begin.
The most important prerequisite of all is that the results of this investigation are not necessarily the final answers to the questions I raised in my introductory paragraph by any stretch. All of this still comes entirely down to personal opinion, of course.

Right then; I think that's everything, so let's dive into the dataset...
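The two mechanical prerequisites above (dividing the % score by 10 and requiring at least 5 scoreable coasters) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the park names and scores are made up, and representing an unscored ride as `None` is a hypothetical choice, not how Captain Coaster exposes its data.

```python
MIN_SCOREABLE = 5  # eligibility threshold from the prerequisites


def percent_to_rating(percent_score):
    """Convert a % score out of 100 into a rating out of 10, rounded to 1dp."""
    return round(percent_score / 10, 1)


def scoreable_ratings(percent_scores):
    """Keep only scored rides (None marks an unscored ride in this sketch)."""
    return [percent_to_rating(s) for s in percent_scores if s is not None]


def eligible_parks(scores_by_park):
    """Return ratings for parks with at least MIN_SCOREABLE scored coasters."""
    ratings_by_park = {}
    for park, scores in scores_by_park.items():
        ratings = scoreable_ratings(scores)
        if len(ratings) >= MIN_SCOREABLE:
            ratings_by_park[park] = ratings
    return ratings_by_park


# Hypothetical input: park names and scores are invented for illustration.
example_scores = {
    "Park A": [87, 72, 65, 41, 30, None],    # 5 scored rides, 1 unscored
    "Park B": [90, 55, None, None, None, 20],  # only 3 scored rides
}

ratings = eligible_parks(example_scores)
print(ratings)
```

Counting only the scored rides is what keeps a park with several unscored kiddie coasters from qualifying on ride count alone.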
The Dataset

When applying my criteria and thinking of parks in Europe that might qualify for this, as well as searching through RCDB just to check that I hadn't missed any obvious ones (as it turned out, I had missed a few on the first check...), I came out with 36 theme parks to analyse in total, with 253 scoreable roller coasters between them. The theme parks being studied are as follows, with the number of scoreable roller coasters each park has listed in brackets:

Alton Towers, UK (9)
Bellewaerde, Belgium (6)
Blackpool Pleasure Beach, UK (10)
Bobbejaanland, Belgium (8)
Djurs Sommerland, Denmark (6)
Efteling, Netherlands (8)
Energylandia, Poland (11)
Europa Park, Germany (12)
Farup Sommerland, Denmark (6)
Flamingo Land, UK (5)
Freizeit-Land Geiselwind, Germany (5)
Gardaland, Italy (8)
Grona Lund, Sweden (6)
Hansa Park, Germany (6)
Heide Park, Germany (8)
Linnanmaki, Finland (8)
Liseberg, Sweden (5)
Mirabilandia, Italy (8)
Movie Park Germany, Germany (8)
Nigloland, France (6)
Parc Asterix, France (5)
Parque de Atracciones de Madrid, Spain (5)
Parque Warner Madrid, Spain (6)
Phantasialand, Germany (8)
Plopsaland de Panne, Belgium (7)
PortAventura Park, Spain (8)
PowerPark, Finland (6)
Skyline Park, Germany (5)
Thorpe Park, UK (7)
Toverland, Netherlands (6)
Tripsdrill, Germany (6)
TusenFryd, Norway (5)
Walibi Belgium, Belgium (9)
Walibi Holland, Netherlands (6)
Walibi Rhone-Alpes, France (5)
Wiener Prater, Austria (10)

I think that just about covers everything, but if you feel I've missed an obvious one, then don't be afraid to tell me. Let's move on to some fun stuff now... I'll start analysing some common questions and see what answers I come out with. I'll use this first post to do...

Which European theme park has the strongest coaster lineup?

Let's start with the big one; which European theme park has the strongest coaster lineup?
There are many different ways you could measure this, but I'll start with the simplest one; the mean coaster rating of each park...

Mean Coaster Rating of each Park

If I look at the Explore function of this spreadsheet, the top 10 highest mean ratings come out as follows:

| Ranking | Park | Mean Rating out of 10 (to 1dp) | Number of Scoreable Coasters |
|---|---|---|---|
| 1 | Liseberg | 7.6 | 5 |
| 2 | Phantasialand | 7.5 | 8 |
| 3 | Alton Towers | 7.3 | 9 |
| 4 | Grona Lund | 6.9 | 6 |
| 5 | Efteling | 6.5 | 8 |
| 6 | Toverland | 6.3 | 6 |
| 7 | Walibi Holland | 6.2 | 6 |
| 8 | Tripsdrill | 6.1 | 6 |
| 9 | Europa Park | 6.1 | 12 |
| 10 | Djurs Sommerland | 6.1 | 6 |

Those certainly aren't the answers I'd have expected, I'll admit, but that's what the data says for that particular method. However, it should be said that the mean is far more easily swayed by outliers in either direction than some other methods (for instance, it's very easily swayed by one coaster being rated much higher or lower than the others). Let's explore a different method...

Median Coaster Rating of each Park

Instead of using the calculated average (mean), I'm going to use the median, the middle-ranking value for each park, this time. Using Google Sheets to explore the median values instead of the mean, the top 10 median values are as follows:

| Ranking | Park | Median Rating out of 10 | Number of Scoreable Coasters |
|---|---|---|---|
| 1 | Liseberg | 8.9 | 5 |
| 2 | Alton Towers | 7.7 | 9 |
| 3 | Phantasialand | 7.7 | 8 |
| 4 | Walibi Holland | 7.2 | 6 |
| 5 | Thorpe Park | 6.9 | 7 |
| 6 | Grona Lund | 6.9 | 6 |
| 7 | Parque Warner Madrid | 6.3 | 6 |
| 8 | Heide Park | 6.3 | 8 |
| 9 | Tripsdrill | 6.3 | 6 |
| 10 | Toverland | 6.2 | 6 |

Interesting to see that we have quite a few differing results when we change to the median; although the same three parks make up the top 3, 4-10 have actually changed a fair amount! I guess the median is possibly a better gauge of a consistently well-rated coaster selection than the mean, because it isn't as easily swayed by one particularly highly rated or lowly rated attraction.
But at the same time, it also doesn't really take into account those more highly rated or lowly rated coasters either; if a park's highest rated coaster is rated more highly than a median of 7/10, for instance, it makes no difference whether it's an 8/10 or 10/10. With that in mind, I have concocted my own formula (of sorts) that I think offers the best of both worlds...

My formula for coaster selection quality

The formula that I propose seems to me like a good way to take into account both a park's highly rated coasters and the consistent quality of their selection. It is as follows:

Matt N's Formula for Coaster Selection Quality: Score = (Highest rating + Upper quartile) * (Lowest rating + Lower quartile)

Now I don't know if I've got my assumptions 100% correct here, but my assumption was that the use of the highest rating and lowest rating would ensure that any standouts at either end are adequately accounted for, but the use of the quartiles would ensure that the consistency of a park's coaster selection is also accounted for, and that the two metrics cancel each other out and make the playing field level. The higher the score, the higher the rank. Using the Matt N Formula, the top 10 was as follows:

| Ranking | Park | Matt N Formula Score | Upper quartile | Lower quartile | Highest rating | Lowest rating | Number of Scoreable Coasters |
|---|---|---|---|---|---|---|---|
| 1 | Alton Towers | 196.9 | 8.4 | 7.1 | 9.5 | 3.9 | 9 |
| 2 | Phantasialand | 191.7 | 9.2 | 7.1 | 9.8 | 3 | 8 |
| 3 | Liseberg | 188.2 | 9.4 | 8 | 9.8 | 1.8 | 5 |
| 4 | Grona Lund | 183.2 | 7.7 | 6 | 9 | 5 | 6 |
| 5 | Efteling | 164.5 | 8.2 | 5.3 | 8.5 | 4.6 | 8 |
| 6 | Europa Park | 141.4 | 7.3 | 4.9 | 9 | 3.8 | 12 |
| 7 | Toverland | 128.8 | 8.5 | 4.9 | 9.2 | 2.4 | 6 |
| 8 | Tripsdrill | 120.2 | 8.1 | 5.3 | 8.8 | 1.8 | 6 |
| 9 | Djurs Sommerland | 112.8 | 7.9 | 5.3 | 9.3 | 1.3 | 6 |
| 10 | Parque de Atracciones de Madrid | 110.3 | 7.1 | 4.9 | 7.6 | 2.6 | 5 |

I'll admit those aren't the results I expected, and I know they probably look a bit weird to some of you, but that is what the data came out with.

So, in conclusion...

Well, that produced some interesting data!
I'll admit that the results weren't quite what I was expecting, but I do think they make sense when you look at the data. In terms of the answer to the initial question of "what is Europe's highest rated coaster selection?"; even though the parks in the top 10 for each method varied, the same three parks made up the top 3 every time, and that top 3 was Liseberg, Phantasialand and Alton Towers. In terms of an order for those top 3; I'd probably go with something like this based on the data:

1. Liseberg (won 2/3)
2. Alton Towers (beat Phantasialand in 2/3, while Phantasialand only beat Towers in 1/3)
3. Phantasialand

However, I should stress that just because my data analysis put these parks on top, that is not "the correct answer" to the question by any stretch. As with most things, it all boils down to your own personal opinion and personal preference. You might think these results are hogwash, and that's fine; your personal answer to this question is entirely down to your opinion.

Before we end, here's the Google Sheet with my calculations, for your viewing pleasure: https://docs.google.com/spreadsheets/d/1_GaPx5r61Qlv7Irka92VrjFbQy-nhiJYRjeAOkNyDE8/edit?usp=sharing

And here is the dataset shown in visual form using a boxplot, coded in Python using Matplotlib, Seaborn and Pandas (Python libraries). This shows the median, upper quartile, lower quartile, highest value, lowest value and any outliers (values more than 1.5 times the interquartile range from the upper or lower quartile) for each park:

I know that the x-axis is a bit of a jumbled mess, so let me clear up the order in which the parks appear so that you can more clearly see which park's boxplot is which.
The boxplots appear in the following order, from left to right:

Alton Towers
Thorpe Park
Blackpool Pleasure Beach
Phantasialand
Liseberg
Walibi Holland
Energylandia
Plopsaland de Panne
Walibi Belgium
Europa Park
PortAventura
Parque Warner Madrid
Parque de Atracciones de Madrid
Efteling
Bobbejaanland
Toverland
Movie Park Germany
Heide Park
Hansa Park
Flamingo Land
Tripsdrill
Parc Asterix
Gardaland
Mirabilandia
Djurs Sommerland
Farup Sommerland
TusenFryd
Linnanmaki
Bellewaerde
Nigloland
Skyline Park
PowerPark
Grona Lund
Wiener Prater
Walibi Rhone-Alpes
Freizeit-Land Geiselwind

So, I hope you found my first dive into European coaster selection data interesting! I'll certainly be answering more questions about this dataset at some point in the near future; I've got some ideas of my own, but I'm also happy to accept suggestions from any of you of questions you'd like answering. I apologise for the ridiculously long post, I hope you find this interesting, and if you have any questions or feedback, or if anything isn't clear, then don't be afraid to ask me!
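For anyone wanting to reproduce the three measures from this first post (mean, median and the Matt N Formula), here is a minimal sketch using only Python's standard library rather than the Google Sheet. The `steady` and `one_dud` rating lists are made up for illustration. The quartile method (`method="inclusive"`, i.e. linear interpolation between data points) is an assumption about how the spreadsheet computed its quartiles, so scores may differ slightly from the tables if it used another method.

```python
from statistics import mean, median, quantiles


def matt_n_score(highest, upper_q, lowest, lower_q):
    """Original formula: (highest + upper quartile) * (lowest + lower quartile)."""
    return (highest + upper_q) * (lowest + lower_q)


def summarise(ratings):
    """Mean, median, quartiles and formula score for one park's ratings."""
    # Assumption: linear-interpolation quartiles; the source does not say
    # which quartile method the spreadsheet used.
    lower_q, _, upper_q = quantiles(ratings, n=4, method="inclusive")
    return {
        "mean": round(mean(ratings), 1),
        "median": round(median(ratings), 1),
        "lower_quartile": lower_q,
        "upper_quartile": upper_q,
        "score": matt_n_score(max(ratings), upper_q, min(ratings), lower_q),
    }


# Hypothetical ratings, not taken from the dataset.
steady = [6.5, 6.8, 7.0, 7.2, 7.5]
one_dud = [1.0, 6.8, 7.0, 7.2, 7.5]  # same lineup with one very low rating

print(summarise(steady))
print(summarise(one_dud))

# Inputs quoted in the Alton Towers row of the formula table.
alton_towers = matt_n_score(highest=9.5, upper_q=8.4, lowest=3.9, lower_q=7.1)
print(round(alton_towers, 1))
```

Comparing the two hypothetical lineups shows the point made about outliers: the single low rating pulls the mean down while the median stays put, and the formula score falls because the lowest rating sits inside the second bracket.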

Sorry to double post, but I had a thought while in the shower this morning about these results and why they might have been so weird when I applied my own formula. As much as I tried to make high ratings and low ratings carry equal weight in terms of how a coaster selection is rated, I failed to take into account some real-life bias that exists when evaluating coaster selections by doing that. That real-life bias is that enthusiasts naturally gravitate more towards highly rated rides when evaluating a park's coaster selection, whereas my formula assumed that highly rated and lowly rated would be equally weighted in the minds of enthusiasts, which isn't really how it works. For instance, this formula assumes that removing Viking Roller Coaster from Energylandia and removing Zadra from Energylandia would have exactly the same level of impact on the rating of its coaster selection. However, I'd wager that most enthusiasts would see Energylandia's coaster selection quality as being far more impacted by the removal of Zadra than by the removal of Viking Roller Coaster. As such, I'll play around with an altered version of the Matt N Formula when I get some time later today, one that weights the score more towards the higher rated rides, and see what I come out with.

Martin Doyle, posted March 18, 2022:

If we are talking quality, I feel Energylandia has a stronger coaster selection than, say, Liseberg. Both parks have a main trio of thrill coasters and then, in Liseberg's case, a supporting family coaster in Banen. Energylandia has Formula and RMF Dragon as support to its trio. Then let's look at each trio.

"Third wheel of the trio" - Not much in this one, but thrill-wise I would say Abyssus is better quality than Valkyria.
"Secondary coaster" - This is an easy one. Balder is a good fun woody, but Hyperion is on a different level. Brilliant hyper.
"Star coaster" - Do I REALLY need to answer this one?? Zadra is ten times better than Helix, and I'm hard pressed to find anyone who would disagree from an unbiased standpoint. That takes nothing away from Helix. It's just that Zadra is one of, if not the, best coasters on the entire planet.

So yeah, all in all I'd say that for coaster selection, Energylandia is Europe's best park, followed by Phantasialand, if we are talking strength of coasters.

I played around with altering the Matt N formula. I tried three alterations.

Altered Matt N Formula 1

The first altered Matt N formula I tried was as follows:

Altered Matt N Formula 1: Score = (Highest rating + Upper quartile)^2 * (Lowest rating + Lower quartile)

I squared the bracket containing highest rating + upper quartile in an attempt to give the higher ranked coasters slightly more weight. And the results were...

| Ranking | Park | Altered Matt N Formula Score | Original Matt N Formula Score | Rank with Original Formula | Change |
|---|---|---|---|---|---|
| 1 | Phantasialand | 3646.7 | 191.7 | 2 | +1 |
| 2 | Liseberg | 3612.7 | 188.2 | 3 | +1 |
| 3 | Alton Towers | 3524.5 | 196.9 | 1 | -2 |
| 4 | Grona Lund | 3049.4 | 183.2 | 4 | 0 |
| 5 | Efteling | 2747.1 | 164.5 | 5 | 0 |
| 6 | Europa Park | 2297.3 | 141.4 | 6 | 0 |
| 7 | Toverland | 2279.2 | 128.8 | 7 | 0 |
| 8 | Tripsdrill | 2029 | 120.2 | 8 | 0 |
| 9 | Djurs Sommerland | 1933.9 | 112.8 | 9 | 0 |
| 10 | Parque de Atracciones | 1620.7 | 110.3 | 10 | 0 |

Altered Matt N Formula 2

And the second formula I tried was:

Altered Matt N Formula 2: Score = (Highest rating^2 + Upper quartile) * (Lowest rating + Lower quartile)

I squared the highest rating to try and make that have more of an impact, and the result was as follows:

| Ranking | Park | Altered Matt N Formula Score | Original Matt N Formula Score | Rank with Original Formula | Change |
|---|---|---|---|---|---|
| 1 | Alton Towers | 1085.2 | 196.9 | 1 | 0 |
| 2 | Phantasialand | 1060.5 | 191.7 | 2 | 0 |
| 3 | Liseberg | 1033.3 | 188.2 | 3 | 0 |
| 4 | Grona Lund | 975.2 | 183.2 | 4 | 0 |
| 5 | Efteling | 792.4 | 164.5 | 5 | 0 |
| 6 | Europa Park | 767.8 | 141.4 | 6 | 0 |
| 7 | Toverland | 677.6 | 128.8 | 7 | 0 |
| 8 | Djurs Sommerland | 620.3 | 112.8 | 9 | +1 |
| 9 | Tripsdrill | 609.3 | 120.2 | 8 | -1 |
| 10 | PortAventura | 577.4 | 94.5 | 11 | +1 |

As you can see, those first two formulas changed... very little. I then decided to try a final alteration...

Altered Matt N Formula 3

Altered Matt N Formula 3: Score = (Highest rating + Upper quartile)/2

For the final formula, I eliminated the lower ends of the coaster selection entirely, focusing only on the highest rating and the upper quartile. I calculated the mean of these two values so as to gauge an average quality of a park's "top" coasters.
The results were as follows:

| Ranking | Park | Altered Matt N Formula Score | Original Matt N Formula Score | Rank with Original Formula | Change |
|---|---|---|---|---|---|
| 1 | Liseberg | 9.6 | 188.2 | 3 | +2 |
| 2 | Walibi Holland | 9.5 | 82.3 | 15 | +13 |
| 3 | Phantasialand | 9.5 | 191.7 | 2 | -1 |
| 4 | Energylandia | 9.3 | 61.1 | 19 | +15 |
| 5 | Plopsaland de Panne | 9 | 57.3 | 20 | +15 |
| 6 | Alton Towers | 9 | 196.9 | 1 | -5 |
| 7 | Toverland | 8.9 | 128.8 | 7 | 0 |
| 8 | Hansa Park | 8.9 | 87.6 | 13 | +5 |
| 9 | Parque Warner | 8.8 | 49.6 | 26 | +17 |
| 10 | Djurs Sommerland | 8.6 | 112.8 | 9 | -1 |

Interesting to see how things change quite a bit when the lower coasters are removed from the equation... Phantasialand and Liseberg remain in the top 3, but for the first time, Alton Towers has been ousted from the top 3, landing at #6 when only their top coasters are concerned.
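The three alterations can be written as small Python functions and checked against the tables. One thing worth flagging: the tabulated scores for the first two alterations match a product of the two brackets (as in the original formula), so this sketch multiplies them. The quartile, highest and lowest figures below are the 1dp values quoted in the original formula table, so results are only compared to the tables within a small tolerance.

```python
def altered_1(highest, upper_q, lowest, lower_q):
    """Square the whole 'top end' bracket, then multiply by the 'low end' bracket."""
    return (highest + upper_q) ** 2 * (lowest + lower_q)


def altered_2(highest, upper_q, lowest, lower_q):
    """Square only the highest rating inside the 'top end' bracket."""
    return (highest ** 2 + upper_q) * (lowest + lower_q)


def altered_3(highest, upper_q):
    """Ignore the low end entirely: mean of highest rating and upper quartile."""
    return (highest + upper_q) / 2


# 1dp inputs quoted in the original formula table.
parks = {
    "Alton Towers": {"highest": 9.5, "upper_q": 8.4, "lowest": 3.9, "lower_q": 7.1},
    "Liseberg": {"highest": 9.8, "upper_q": 9.4, "lowest": 1.8, "lower_q": 8.0},
}

for name, p in parks.items():
    print(
        name,
        round(altered_1(**p), 1),
        round(altered_2(**p), 1),
        round(altered_3(p["highest"], p["upper_q"]), 2),
    )
```

Because the low-end bracket still multiplies the whole score in the first two versions, a park with a very low lowest rating is penalised just as before, which may be why those two alterations barely move the rankings while the third one reshuffles them.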

Right; apologies for the double post, but I decided to have another go at Part 1. But this time, I did what some people suggested and calculated the mean and median using only each park's 3 top-rated coasters. When I did this, the results were as follows (to 1dp):

Mean

| Park | Mean Rating of Top 3 (1dp) |
|---|---|
| Energylandia | 9.6 |
| Phantasialand | 9.5 |
| Liseberg | 9.4 |
| Walibi Holland | 9.3 |
| Alton Towers | 8.8 |
| Europa Park | 8.8 |
| Plopsaland | 8.6 |
| Parque Warner | 8.6 |
| Toverland | 8.5 |
| Heide Park | 8.4 |

Median

| Park | Median Rating of Top 3 (1dp) |
|---|---|
| Energylandia | 9.8 |
| Phantasialand | 9.6 |
| Mirabilandia | 9.6 |
| Liseberg | 9.4 |
| Walibi Holland | 9.3 |
| Toverland | 8.9 |
| Hansa Park | 8.7 |
| Europa Park | 8.7 |
| Tripsdrill | 8.6 |
| Parque Warner | 8.6 |

I hope you find that interesting! I promise that is the last time I will faff around with part 1... part 2 will be coming soon! Do you guys have any questions you'd like me to try and answer using this dataset? I've got a couple in mind of my own, but I'm happy to take suggestions!
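The top-3 variant described in this post boils down to sorting each park's ratings, keeping the three highest, and taking the mean and median of just those. A minimal sketch, with a made-up ratings list and a hypothetical helper name `top_n_summary`:

```python
from statistics import mean, median


def top_n_summary(ratings, n=3):
    """Return (mean, median) of the n highest ratings, each rounded to 1dp."""
    top = sorted(ratings, reverse=True)[:n]
    return round(mean(top), 1), round(median(top), 1)


# Hypothetical lineup, not taken from the dataset.
example_ratings = [9.8, 2.0, 8.8, 5.0, 9.0]

top3_mean, top3_median = top_n_summary(example_ratings)
print(top3_mean, top3_median)
```

With only three values, the median is simply the park's second-best coaster, so it ignores how strong the best and third-best rides are; the mean of the top 3 reflects all three.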

Right; sorry to triple post, but I think it's about time I did Part 2 of this! And for Part 2, I'll be exploring...

What coaster selections in Europe are the most and least consistent?

Now I should clarify that this is not wishing to determine consistent strength, but merely consistency on its own, which can work both ways. So, let's dive straight in!

To work this out, I used two different types of range. The first measure I used was the range between the highest and lowest ratings, which is a very simple measure where you merely subtract the lowest value from the highest value (Range = Highest Rating - Lowest Rating). The top 5 most and least consistent using that method were as follows:

Top 5 Most Consistent (Using Range)

| Ranking | Park | Range | Mean Rating (out of 10) (to 1dp) | Number of Scoreable Coasters |
|---|---|---|---|---|
| 1 | Freizeit-Land Geiselwind | 2.5 | 1.4 | 5 |
| 2 | Efteling | 3.9 | 6.5 | 8 |
| 3 | Grona Lund | 4 | 6.9 | 6 |
| 4 | Flamingo Land | 4.4 | 3.1 | 5 |
| 5 | Skyline Park | 4.7 | 4 | 5 |

Top 5 Least Consistent (Using Range)

| Ranking | Park | Range | Mean Rating (out of 10) | Number of Scoreable Coasters |
|---|---|---|---|---|
| 1 | Energylandia | 10 | 5.7 | 11 |
| 2 | Walibi Holland | 9.8 | 6.2 | 6 |
| 3 | Walibi Belgium | 9.4 | 4.9 | 9 |
| 4 | Mirabilandia | 9 | 4.2 | 8 |
| 5 | Plopsaland | 8.9 | 5 | 7 |

The other measure I used was the interquartile range between the quartiles (IQR = Upper Quartile - Lower Quartile), which should provide a better gauge of the selection's general consistency and not be too swayed by one particularly highly or lowly rated ride.
The top 5 most and least consistent using IQR were as follows:

Top 5 Most Consistent (Using IQR)

| Ranking | Park | Interquartile Range | Mean Rating (out of 10) | Number of Scoreable Coasters |
|---|---|---|---|---|
| 1 | Blackpool | 1 | 4.8 | 10 |
| 2 | Freizeit-Land Geiselwind | 1.1 | 1.4 | 5 |
| 3 | Alton Towers | 1.3 | 7.3 | 9 |
| 4 | Liseberg | 1.4 | 7.6 | 5 |
| 5 | Grona Lund | 1.7 | 6.9 | 6 |

Top 5 Least Consistent (Using IQR)

| Ranking | Park | Interquartile Range | Mean Rating (out of 10) | Number of Scoreable Coasters |
|---|---|---|---|---|
| 1 | Walibi Rhone-Alpes | 6.3 | 4.5 | 5 |
| 2 | Parque Warner | 5.9 | 5.5 | 6 |
| 3 | Plopsaland | 5.8 | 5 | 7 |
| 4 | Movie Park Germany | 5.8 | 3.8 | 8 |
| 5 | Parc Asterix | 5.7 | 5 | 5 |

Finally, let me once again reference the boxplot from Part 1, for a visual aid to show this off:

Let me once again remind you of the order the parks are in, from left to right:

Alton Towers
Thorpe Park
Blackpool Pleasure Beach
Phantasialand
Liseberg
Walibi Holland
Energylandia
Plopsaland de Panne
Walibi Belgium
Europa Park
PortAventura
Parque Warner Madrid
Parque de Atracciones de Madrid
Efteling
Bobbejaanland
Toverland
Movie Park Germany
Heide Park
Hansa Park
Flamingo Land
Tripsdrill
Parc Asterix
Gardaland
Mirabilandia
Djurs Sommerland
Farup Sommerland
TusenFryd
Linnanmaki
Bellewaerde
Nigloland
Skyline Park
PowerPark
Grona Lund
Wiener Prater
Walibi Rhone-Alpes
Freizeit-Land Geiselwind

In terms of how you can visualise the ranges; you can see the range as the difference between the extreme ends of the plot, and the IQR can be visualised as the difference between the ends of the coloured rectangle in the middle.

So, what have we learned from this part of the investigation? Firstly, I think I can declare Freizeit-Land Geiselwind the winner for consistency in Europe; it scored very highly on consistency using both measures! Even if the selection isn't the most highly rated, it's certainly consistent if nothing else! Secondly, I found it odd how, besides Geiselwind, the results varied drastically depending on the measure applied.
Some parks did appear again besides Geiselwind (for instance, Grona Lund was quite consistently strong by both measures), but many others only appeared in the top 5 for one or the other. But overall, I think my data has concluded that Freizeit-Land Geiselwind is the winner for most consistent in Europe. And for least consistent, I think I can conclude that Plopsaland de Panne actually wins that one, as it is the only park to appear in the top 5 least consistent for both measures. I hope you enjoyed discovering which coaster selection is Europe's most consistent (according to the data) in part 2! Part 3 (which I'm thinking may be the final part) will be coming soon...
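The two consistency measures from Part 2 can be sketched with Python's standard library as follows. The three park names and their ratings are invented for illustration, and the quartile method (`method="inclusive"`, linear interpolation) is an assumption about how the spreadsheet worked out its quartiles.

```python
from statistics import quantiles


def spread(ratings):
    """Range and interquartile range of one park's ratings, to 1dp."""
    # Assumption: linear-interpolation quartiles, as in many spreadsheets.
    lower_q, _, upper_q = quantiles(ratings, n=4, method="inclusive")
    return {
        "range": round(max(ratings) - min(ratings), 1),
        "iqr": round(upper_q - lower_q, 1),
    }


# Hypothetical parks, not taken from the dataset.
parks = {
    "Tight Park": [4.0, 4.5, 5.0, 5.5, 6.0],
    "One Outlier Park": [1.0, 7.0, 7.5, 8.0, 8.5],
    "Spread Park": [1.0, 3.0, 5.0, 7.0, 9.0],
}

spreads = {park: spread(ratings) for park, ratings in parks.items()}

# Most consistent first under each measure.
by_range = sorted(spreads, key=lambda park: spreads[park]["range"])
by_iqr = sorted(spreads, key=lambda park: spreads[park]["iqr"])

for park in parks:
    print(park, spreads[park])
print("By range:", by_range)
print("By IQR:", by_iqr)
```

The "One Outlier Park" case illustrates why the two measures can disagree so much: a single very low rating gives it a large range, yet its IQR is as small as that of the tightly grouped park.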

Sorry to double post, but I think it's time I did the 3rd and final part of this... today, I'll be investigating:

Which coaster selections emphasise quantity over quality and which coaster selections emphasise quality over quantity?

Now I'll admit that this one is possibly harder to measure statistically, but it was one I was interested to find out, so I still decided to give it a go! I used 3 different measures to try and work this out.

The first measure I used was the median:mean ratio, as it always appeared to me as though a higher median denoted a more consistently strong selection (thus more of a quality focus), while a higher mean denoted a less consistently strong selection (thus more of a quantity focus). To work this out, I simply did median/mean, and the results were as follows (to 2sf)...

Top 5 "Quantity over Quality" (Median/Mean)

| Ranking | Park | Number of Scoreable Coasters | Median/Mean (2sf) |
|---|---|---|---|
| 1 | Movie Park Germany | 8 | 0.73 |
| 2 | Freizeit-Land Geiselwind | 5 | 0.74 |
| 3 | Walibi Rhone-Alpes | 5 | 0.74 |
| 4 | Mirabilandia | 8 | 0.76 |
| 5 | Plopsaland | 7 | 0.76 |

Top 5 "Quality over Quantity" (Median/Mean)

| Ranking | Park | Number of Scoreable Coasters | Median/Mean (2sf) |
|---|---|---|---|
| 1 | Flamingo Land | 5 | 1.4 |
| 2 | Thorpe Park | 7 | 1.2 |
| 3 | Parque Warner | 6 | 1.2 |
| 4 | Liseberg | 5 | 1.2 |
| 5 | Heide Park | 8 | 1.2 |

The second measure I used was the mean:count ratio, because a park having a high or low mean relative to its coaster count would surely denote whether its coaster selection is quantity over quality or quality over quantity, no? One slight flaw with this method is that any theme park with more than 10 scoreable roller coasters automatically gravitates towards "quantity over quality" by default, because you cannot have a mean above 10, although one could argue that having a coaster count of more than 10 makes you quantity-focused to a certain extent anyway... To work this out, I did mean/count, and the results were as follows (to 2sf)...

Top 5 "Quantity over Quality" (Mean/Count)

