Ranking Feature
Revisiting the US Open Forecast
by Dallas Oliver, 16 September 2015
The last slam of 2015 is now over, and the eyes of the tennis world are finally moving away from New York. Prior to the tournament, Tennis Recruiting put out a set of heat maps to forecast the boys and girls singles draws - with heat maps for the men and women thrown in for good measure. Once again, those heat maps showed probabilities for all players to reach various rounds of the tournament - and also provided estimates of the number of upsets.
So... how did we do? What did we get right? Or get wrong? This article evaluates how we did with our predictions. The numbers junkies can take a look at detailed analysis of all matches in the US Open - and other tournaments - in our post-tournament analysis, which you can access by clicking here. But for now, let's take a broad look at the tournament...
True to Form
As usual, our heat maps provided predictions for the number of expected upsets, where an upset is defined as a lower-rated player defeating a higher-rated player. For example, there were 63 matches in each of the US Open junior events, and our system predicted 18 upset wins for the boys and 19 for the girls. As you can see from the heat maps, there ended up being 15 and 20 upsets, respectively. We also offer predictions for the number of seeds that would be upset by unseeded or lower-seeded opponents.
Looking across all events at the US Open - juniors and open - we see that our predictions held up very well:
Of the 358 overall matches, our system overestimated the number of upsets by 9 (2.5%). We did even better for the junior matches, estimating 37 upsets when there were actually 35 - within 1.6%. Our estimates for seeded upsets were not quite as accurate, but they were still within 7.2% for the junior matches and 5.6% for all US Open matches.
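For readers who want to check the arithmetic, the percentages quoted above appear to express the miss in the upset count as a share of matches played. Here is a minimal sketch under that assumption; the function name is ours, chosen for illustration only.

```python
def miss_as_pct_of_matches(miss, matches):
    """Size of the miss in upset count, as a percentage of matches played."""
    return abs(miss) / matches * 100

# Junior events: 63 boys matches + 63 girls matches,
# 37 upsets predicted versus 35 actual.
junior_error = miss_as_pct_of_matches(37 - 35, 63 + 63)
print(f"{junior_error:.1f}%")   # 1.6%

# All events: the prediction was 9 upsets too high across 358 matches.
overall_error = miss_as_pct_of_matches(9, 358)
print(f"{overall_error:.1f}%")  # 2.5%
```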
Inside the Numbers
The broad numbers look good, but how convincing are they? Were the upsets in the matches we predicted?
To answer that question, we group the upset bids into five sets: 0-10% win probability, 10-20%, 20-30%, 30-40%, and 40-50%. If our forecast is accurate, we would expect underdogs in the 0-10% group to win about 5% of the time, underdogs in the 10-20% group to win about 15% of the time, and so on.
The numbers work out fairly well. The percentage of actual upsets generally increases with our predicted percentage groups. All of our estimates were within 10% of the actual number of upsets. Overall, our system did a respectable job of knowing which match-ups were more evenly-matched and which ones had clear favorites.
It is interesting to note one discrepancy that we would not have expected: the 30-40% underdogs actually won a higher percentage of matches (40.7%) than the 40-50% underdogs (38.1%).
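The grouping described in this section can be sketched in a few lines. This is only an illustration of the idea, not the code behind our system: the function name and the sample data are made up, and each upset bid is assumed to be a pair of the underdog's predicted win probability and whether the underdog actually won.

```python
def calibration_table(upset_bids):
    """Bucket upset bids into five 10-point bins and report actual upset rates.

    upset_bids: iterable of (underdog_win_probability, underdog_won) pairs,
    with probabilities between 0 and 0.5.
    Returns one (label, matches, upsets, actual_rate) row per bin.
    """
    counts = [[0, 0] for _ in range(5)]  # per bin: [matches, upsets]
    for prob, won in upset_bids:
        # Work in tenths of a percent to avoid floating-point edge effects,
        # and clamp so that an exactly 50% underdog lands in the top bin.
        idx = min(round(prob * 1000) // 100, 4)
        counts[idx][0] += 1
        counts[idx][1] += int(won)
    rows = []
    for idx, (matches, upsets) in enumerate(counts):
        rate = upsets / matches if matches else None  # None for an empty bin
        rows.append((f"{idx * 10}-{idx * 10 + 10}%", matches, upsets, rate))
    return rows

# Made-up sample data, for illustration only.
sample = [(0.05, False), (0.15, False), (0.15, True),
          (0.35, True), (0.35, False), (0.45, False), (0.50, True)]
for label, matches, upsets, rate in calibration_table(sample):
    print(label, matches, upsets, rate)
```

If a forecast is well calibrated, the actual rate in each row should land near the middle of its bin, given enough matches. With only a few dozen matches per bin, swings of several points are to be expected.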
Prediction Redux
In addition to upsets, we used our model to make four predictions for the US Open. Let's review those predictions...
i) Players with the best chance of advancing to the semifinals are Taylor Fritz, Tommy Paul, Michael Mmoh, Reilly Opelka, and Claire Liu.
The American boys had a phenomenal run at the US Open. Fritz and Paul both met expectations - not only reaching the semifinals, but squaring off in an All-American Final this past Sunday. Mmoh and Opelka reached the quarterfinals and the round of 16, respectively.
The American girls had an unexpectedly strong showing as well (more on that later), but Liu was upset by Russia's Elena Rybakina in the first round.
ii) Top-four seeds with the toughest roads to the semifinals are Anna Blinkova, Seongchan Hong, and Tereza Mihalikova.
This prediction held true to form. No. 3 Blinkova and No. 4 Mihalikova went out in the first and second rounds, respectively, while No. 4 Hong reached the quarterfinals before falling to American Tommy Paul.
iii) American boys collectively had an 82% chance of getting one boy into the final, a 33% chance of an all-American final, and a 65% chance of winning the title. The American girls together had a 62% chance of reaching the final and a 39% chance of winning.
Again, the American boys delivered in New York. Half of the quarterfinalists were from the U.S., and Fritz and Paul - our pre-tournament favorites - played a great three-set match in the final.
The US girls had an impressive showing as well, with many of them exceeding our expectations. Our system gave unseeded Kylie McKenzie a 15% chance of advancing to the quarterfinals before the tournament began, but she defeated two higher-rated and higher-seeded players to do so. Unseeded Francesca Di Lorenzo advanced all the way to the US Open semifinals - knocking off four seeds in the process - even as our system believed she only had a 3% chance of reaching that round. And, of course, No. 9 seed Sofia Kenin reached the championship final.
iv) Our forecast called for a very interesting first round of the tournament - with 17 upsets overall and nine upsets of seeded players in the 64 first-round matches.
The first round played out pretty close to how we expected. There were 20 upsets in the first round - and seven upsets of seeds.
Predicting the Pros
We ran a fun experiment this time with the Men's and Women's Open divisions just to see how things played out. Overall things looked similar to what we have seen in the junior divisions for all the tournaments that we analyzed in the past. We may continue predictions for open events of future Grand Slams because our users seemed to like it - even though professional tennis is not the focus of our website.
Speaking of the open divisions, we would be remiss if we did not give kudos to the ATP points ranking system. In the 15 tournaments that we have analyzed, the US Open Men's Championships is the only time a points system has outperformed both our own head-to-head system and the one used by Universal Tennis. Of the 113 matches in the Men's division (discounting defaults, walkovers, and retirements), the points system was correct 92 times, compared to the 88 times that the Tennis Recruiting predictions were correct and the 84 times that the higher-rated UTR player won. We plan to do more work over the coming year to see how the professional points systems perform in the 2016 Grand Slam events.
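Expressed as hit rates over the 113 matches, the counts above work out as follows. This is a quick sketch of the arithmetic only; the dictionary labels are ours.

```python
matches = 113  # Men's division matches, excluding defaults, walkovers, retirements

# Matches in which each system's favorite won, as reported above.
correct = {"ATP points": 92, "Tennis Recruiting": 88, "UTR": 84}

accuracy = {name: wins / matches for name, wins in correct.items()}
for name, acc in accuracy.items():
    print(f"{name}: {acc:.1%}")
# ATP points: 81.4%
# Tennis Recruiting: 77.9%
# UTR: 74.3%
```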
Updating the Numbers
We have added the 2015 US Open as the fifteenth tournament in our analysis of international, national, and sectional events. You can click here to see our analysis for fifteen different tournaments. For each of these tournaments, we used player ratings immediately prior to the tournament start as the basis of our predictions - and you can access our US Open pre-tournament data directly by clicking here.
Frequently Asked Questions
We got even more questions and comments on our rating and ranking system than usual by email and on Twitter during this heat map activity. Here are a couple of questions that might have general interest...
(1) Does a 60% expected win percentage mean the match will be close? Does a 90% expected win percentage mean that it will be a blowout?
Not at all. Although a favorite with a 60% win percentage may be more likely to have a competitive match than a 90% favorite, our system actually makes no assessment about anticipated scores for the matches. Our system merely estimates how often it expects each player to win if the two play several times - coming up with an expected win percentage (EWP).
Using the numbers from above, a 60% EWP for a favorite in a particular matchup means that we think the favorite would win six times if the two played ten matches - or three times if they play five matches. A 90% EWP means that we expect the favorite to win nine matches if the two players meet ten times.
In practice, players do not often meet ten times head-to-head, and our system does not know which one match out of ten a player with a 90% EWP might lose. But our system fully expects the player with a 90% EWP to lose, on average, about one out of every ten of those matches.
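The EWP arithmetic is easy to check. The sketch below is illustrative only: the function names are ours, and the second function assumes that every match in a series is independent with the same EWP, which is a simplification.

```python
def expected_wins(ewp, matches):
    """Average number of wins for a player with the given EWP over a series."""
    return ewp * matches

def chance_of_at_least_one_loss(ewp, matches):
    """Chance the favorite drops at least one match in the series.

    Assumes independent matches that all share the same EWP.
    """
    return 1 - ewp ** matches

print(expected_wins(0.60, 10))  # about 6 wins in ten matches
print(expected_wins(0.60, 5))   # about 3 wins in five matches
print(expected_wins(0.90, 10))  # about 9 wins in ten matches
print(f"{chance_of_at_least_one_loss(0.90, 10):.0%}")  # 65%
```

Under that independence assumption, a 90% favorite is more likely than not to drop at least one match in a ten-match series.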
(2) Before the US Open women's quarterfinals, we posted on Twitter that our system gave Serena a 40% chance to win the calendar slam - a post that was much-maligned by the community. What happened with that comment?
Perhaps it is just because we are numbers geeks, but we were impressed with how high that 40% number actually was. Our system pegged Serena with a 40% chance to win it all, while the other seven women combined had only a 60% chance to win the title.
One thing many people do not understand is that the 40% number was a cumulative probability of winning all three matches from the quarterfinals on. The system gave Serena a 73% chance of beating Venus (which perhaps you could argue was too high, but certainly Venus would win some matches if they played a series). We then gave Serena an 87% chance to win her semifinal match and a 64% chance to defeat her opponent in the final. We had her as a significant favorite to win all of her matches, but the math says that winning all three in a row is tough to do - yielding a 40% chance overall. I would also note that the next-highest chance of winning the title our system gave to anyone not named Serena prior to the quarterfinals was 16%.
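To see where the 40% comes from, multiply the three per-match chances quoted above. This is a minimal sketch that treats the three results as independent and takes the semifinal and final numbers at face value; the variable names are ours.

```python
from math import prod

# Serena's per-match win chances from the quarterfinals on, as quoted above.
round_win_probs = [0.73, 0.87, 0.64]  # quarterfinal, semifinal, final

# Winning the title from here means winning all three in a row.
title_chance = prod(round_win_probs)
print(f"{title_chance:.1%}")  # 40.6%
```

Even a heavy favorite in every round ends up below 50% once three matches are strung together.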