the video game rating system
TrueSkill is a rating system for game players. It was developed by Microsoft Research and has been used on Xbox LIVE for ranking and matchmaking. The system quantifies players’ TRUE skill points using a Bayesian inference algorithm. It also works with any type of match rule, including N:N team games and free-for-all.
This project is a Python package which implements the TrueSkill rating system:
from trueskill import Rating, quality_1vs1, rate_1vs1
alice, bob = Rating(25), Rating(30) # assign Alice and Bob's ratings
if quality_1vs1(alice, bob) < 0.50:
    print('This match seems to be not so fair')
alice, bob = rate_1vs1(alice, bob) # update the ratings after the match
In TrueSkill, a rating is a Gaussian distribution which starts from \(\mathcal{N}(25, (\frac{25}{3})^2)\). \(\mu\) is the average skill of a player, and \(\sigma\) is the uncertainty of the estimated rating. A player’s real skill lies within \(\mu \pm 2\sigma\) with about 95% confidence.
>>> from trueskill import Rating
>>> Rating() # use the default mu and sigma
trueskill.Rating(mu=25.000, sigma=8.333)
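As a quick check of the 95% figure, the following sketch uses Python's standard `statistics.NormalDist` (Python 3.8 or later, not part of this package) to measure how much of the default rating's probability mass lies within \(\mu \pm 2\sigma\). The variable names are illustrative only.

```python
from statistics import NormalDist

# Default prior as stated above: mu = 25, sigma = 25/3.
mu, sigma = 25.0, 25.0 / 3.0
prior = NormalDist(mu, sigma)

# Bounds of the mu +/- 2 sigma interval.
low, high = mu - 2 * sigma, mu + 2 * sigma

# Probability mass of the Gaussian between the two bounds.
coverage = prior.cdf(high) - prior.cdf(low)

print('skill lies in [{:.3f}, {:.3f}] with probability {:.1%}'.format(low, high, coverage))
```

The interval is slightly wider than a strict 95% interval, which would use about 1.96 standard deviations.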
If one player’s rating is \(\beta\) higher than another player’s, that player has about a 76% chance (specifically \(\Phi(\frac{1}{\sqrt{2}})\)) of beating the other player. The default value of \(\beta\) is \(\frac{25}{6}\).
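The 76% figure can be reproduced with the standard library. This sketch assumes that each player's performance is their skill plus Gaussian noise with standard deviation \(\beta\), and that both skills are known exactly (sigma is ignored). The helper name `win_probability` is hypothetical and not part of this package.

```python
from math import sqrt
from statistics import NormalDist

BETA = 25.0 / 6.0  # default beta stated above


def win_probability(mu_diff, beta=BETA):
    """Chance that the higher-rated player wins.

    Assumption: rating uncertainty (sigma) is ignored, so the difference of
    the two performances has standard deviation sqrt(2) * beta.
    """
    return NormalDist().cdf(mu_diff / (sqrt(2) * beta))


# A gap of exactly beta gives Phi(1 / sqrt(2)).
p = win_probability(BETA)
print('{:.1%} chance to win'.format(p))
```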
Ratings converge to real skills after a few rounds of TrueSkill’s Bayesian inference. How many matches does TrueSkill need to estimate real skills? It depends on the game rule. See the table below:
Rule               Matches
16P free-for-all   3
8P free-for-all    3
4P free-for-all    5
2P free-for-all    12
2:2:2:2            10
4:4:4:4            20
4:4                46
8:8                91
Most competitive games follow the 1:1 match rule. If yours does, just use the _1vs1 shortcuts, rate_1vs1() and quality_1vs1(). They are very easy to use.
First of all, we need 2 Rating objects:
>>> r1 = Rating() # 1P's skill
>>> r2 = Rating() # 2P's skill
Then we can estimate the match quality, which is equivalent to the draw probability of the match, using quality_1vs1():
>>> print('{:.1%} chance to draw'.format(quality_1vs1(r1, r2)))
44.7% chance to draw
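For two players, the match quality from the TrueSkill literature reduces to a short closed form. The sketch below is an illustration under that assumption, not this package's implementation (which handles arbitrary teams). The helper name `quality_1vs1_sketch` is hypothetical.

```python
from math import exp, sqrt


def quality_1vs1_sketch(mu1, sigma1, mu2, sigma2, beta=25.0 / 6.0):
    """Closed-form two-player match quality (hypothetical helper)."""
    # Total variance of the performance difference between the two players.
    denom = 2 * beta ** 2 + sigma1 ** 2 + sigma2 ** 2
    # The first factor shrinks with uncertainty, the second with the skill gap.
    return sqrt(2 * beta ** 2 / denom) * exp(-((mu1 - mu2) ** 2) / (2 * denom))


# Two players with the default rating, as in r1 and r2 above.
q = quality_1vs1_sketch(25.0, 25.0 / 3.0, 25.0, 25.0 / 3.0)
print('{:.1%} chance to draw'.format(q))
```

For two default ratings this evaluates to \(\sqrt{0.2}\), which agrees with the 44.7% shown above.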
After the game, TrueSkill recalculates their ratings according to the game result. For example, if 1P beat 2P:
>>> new_r1, new_r2 = rate_1vs1(r1, r2)
>>> print(new_r1)
trueskill.Rating(mu=29.396, sigma=7.171)
>>> print(new_r2)
trueskill.Rating(mu=20.604, sigma=7.171)
The mu value follows a player’s win/draw/loss record: a higher value means higher skill. The sigma value follows the number of games played: a lower value means more games played and higher confidence in the rating.
So the skill of 1P, the winner, grew from 25 to 29.396, while the skill of 2P, the loser, shrank to 20.604. Both sigma values narrowed by the same amount.
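To show where such numbers can come from, here is a minimal sketch of the closed-form two-player win/loss update described in the TrueSkill literature. It is not this package's code. It assumes a dynamic factor of sigma/100 and a draw probability of 0.10 as defaults, and the helper name `rate_1vs1_sketch` is hypothetical. Ratings are plain (mu, sigma) tuples.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def rate_1vs1_sketch(winner, loser, beta=25.0 / 6.0, tau=25.0 / 300.0,
                     draw_probability=0.10):
    """Closed-form win/loss update; tau and draw_probability are assumed defaults."""
    (mu_w, sigma_w), (mu_l, sigma_l) = winner, loser
    # Inflate each variance by the dynamic factor before updating.
    var_w, var_l = sigma_w ** 2 + tau ** 2, sigma_l ** 2 + tau ** 2
    c = sqrt(2 * beta ** 2 + var_w + var_l)
    # Draw margin implied by the draw probability, in units of c.
    eps = STD_NORMAL.inv_cdf((draw_probability + 1) / 2) * sqrt(2) * beta / c
    t = (mu_w - mu_l) / c
    v = STD_NORMAL.pdf(t - eps) / STD_NORMAL.cdf(t - eps)  # shift of the means
    w = v * (v + t - eps)                                  # shrinkage of the variances
    new_winner = (mu_w + var_w / c * v, sqrt(var_w * (1 - var_w / c ** 2 * w)))
    new_loser = (mu_l - var_l / c * v, sqrt(var_l * (1 - var_l / c ** 2 * w)))
    return new_winner, new_loser


new_r1, new_r2 = rate_1vs1_sketch((25.0, 25.0 / 3.0), (25.0, 25.0 / 3.0))
print('mu={:.3f}, sigma={:.3f}'.format(*new_r1))
print('mu={:.3f}, sigma={:.3f}'.format(*new_r2))
```

Under these assumptions the results should land close to the ratings printed above: the winner gains exactly what the loser drops, and both sigmas shrink by the same factor.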
Of course, you can also handle a tie game with drawn=True:
>>> new_r1, new_r2 = rate_1vs1(r1, r2, drawn=True)
>>> print(new_r1)
trueskill.Rating(mu=25.000, sigma=6.458)
>>> print(new_r2)
trueskill.Rating(mu=25.000, sigma=6.458)
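A draw can be sketched with the closed-form two-player equations too. As before, this is an illustration and not this package's code: it assumes a dynamic factor of sigma/100 and a draw probability of 0.10, and `rate_draw_sketch` is a hypothetical helper that takes (mu, sigma) tuples.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def rate_draw_sketch(r1, r2, beta=25.0 / 6.0, tau=25.0 / 300.0,
                     draw_probability=0.10):
    """Closed-form update for a drawn match; tau and draw_probability are assumed."""
    (mu1, sigma1), (mu2, sigma2) = r1, r2
    var1, var2 = sigma1 ** 2 + tau ** 2, sigma2 ** 2 + tau ** 2
    c = sqrt(2 * beta ** 2 + var1 + var2)
    eps = STD_NORMAL.inv_cdf((draw_probability + 1) / 2) * sqrt(2) * beta / c
    t = (mu1 - mu2) / c
    # Probability mass of the "draw" band around zero performance difference.
    denom = STD_NORMAL.cdf(eps - t) - STD_NORMAL.cdf(-eps - t)
    v = (STD_NORMAL.pdf(-eps - t) - STD_NORMAL.pdf(eps - t)) / denom
    w = v ** 2 + ((eps - t) * STD_NORMAL.pdf(eps - t)
                  + (eps + t) * STD_NORMAL.pdf(eps + t)) / denom
    new1 = (mu1 + var1 / c * v, sqrt(var1 * (1 - var1 / c ** 2 * w)))
    new2 = (mu2 - var2 / c * v, sqrt(var2 * (1 - var2 / c ** 2 * w)))
    return new1, new2


new_r1, new_r2 = rate_draw_sketch((25.0, 25.0 / 3.0), (25.0, 25.0 / 3.0))
print('mu={:.3f}, sigma={:.3f}'.format(*new_r1))
print('mu={:.3f}, sigma={:.3f}'.format(*new_r2))
```

With equal ratings the means do not move and only the sigmas shrink. When the ratings differ, a draw pulls the two means toward each other.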
There are many other match rules, such as N:N team matches, N:N:N multiple-team matches, N:M unbalanced matches, free-for-all (Player vs. All), and so on. Most other rating systems cannot handle them, but TrueSkill can: it accepts any type of match.
First, arrange the ratings into groups by team:
>>> r1 = Rating() # 1P's skill
>>> r2 = Rating() # 2P's skill
>>> r3 = Rating() # 3P's skill
>>> t1 = [r1] # Team A contains just 1P
>>> t2 = [r2, r3] # Team B contains 2P and 3P
Then we can calculate the match quality and rate them:
>>> print('{:.1%} chance to draw'.format(quality([t1, t2])))
13.5% chance to draw
>>> (new_r1,), (new_r2, new_r3) = rate([t1, t2], ranks=[0, 1])
>>> print(new_r1)
trueskill.Rating(mu=33.731, sigma=7.317)
>>> print(new_r2)
trueskill.Rating(mu=16.269, sigma=7.317)
>>> print(new_r3)
trueskill.Rating(mu=16.269, sigma=7.317)
To describe other game results, set the ranks argument. A lower rank means a better result, and teams with the same rank drew.
Rating groups can take various shapes as well. In the examples here, all variables which start with r are Rating objects.
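The sketch below illustrates how rating groups and ranks fit together, without calling this package. `Rating` here is a stand-in namedtuple and `describe` is a hypothetical helper. It assumes that a lower rank means a better result, that equal ranks mean a draw, and that teams are listed from winner to loser when ranks are omitted.

```python
from collections import namedtuple

# Stand-in for trueskill.Rating so that the sketch runs without the package.
Rating = namedtuple('Rating', ['mu', 'sigma'])
r1, r2, r3, r4 = (Rating(25.0, 25.0 / 3.0) for _ in range(4))


def describe(rating_groups, ranks=None):
    """Return the match shape (e.g. '1:2') and the team indices by finishing place."""
    if ranks is None:
        # Assumption: without ranks, teams are listed from winner to loser.
        ranks = list(range(len(rating_groups)))
    shape = ':'.join(str(len(team)) for team in rating_groups)
    # Group team indices by rank; teams sharing a rank drew with each other.
    by_rank = {}
    for team_index, rank in enumerate(ranks):
        by_rank.setdefault(rank, []).append(team_index)
    standings = [by_rank[rank] for rank in sorted(by_rank)]
    return shape, standings


print(describe([(r1,), (r2, r3)], ranks=[0, 1]))                   # unbalanced, team 0 won
print(describe([(r1,), (r2, r3)], ranks=[1, 0]))                   # unbalanced, team 1 won
print(describe([(r1, r2), (r3, r4)], ranks=[0, 0]))                # 2:2, drawn
print(describe([(r1,), (r2,), (r3,), (r4,)], ranks=[0, 1, 1, 2]))  # free-for-all
```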
Let’s assume that there are 2 teams, each with 2 players. The game lasted an hour, but one of the players on the first team joined 30 minutes late.
If a player wasn’t present for the entire duration of the game, use the concept of “partial play” through the weights parameter. The situation above can be described by weights of 1 and 0.5 for the two players on the first team, and 1 and 1 for the two players on the second team.
As code, with a 2-dimensional list:
# set the weights to 1, 0.5, 1, 1.
rate([(r1, r2), (r3, r4)], weights=[(1, 0.5), (1, 1)])
quality([(r1, r2), (r3, r4)], weights=[(1, 0.5), (1, 1)])
Or with a dictionary. Each key is a tuple of (team_index, index_or_key_of_rating):
# set a weight of 2nd player in 1st team to 0.5, otherwise leave as 1.
rate([(r1, r2), (r3, r4)], weights={(0, 1): 0.5})
# set a weight of Carol in 2nd team to 0.5, otherwise leave as 1.
rate([{'alice': r1, 'bob': r2}, {'carol': r3}], weights={(1, 'carol'): 0.5})
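One simple way to derive such weights is to divide each player's time in the game by the match length. The helper below is a hypothetical sketch of that idea. Proportional weighting is an assumption here, not something this package prescribes.

```python
def play_time_weights(minutes_played, match_minutes):
    """Turn per-team tuples of minutes played into partial-play weights."""
    return [tuple(min(minutes, match_minutes) / match_minutes for minutes in team)
            for team in minutes_played]


# A 60-minute game; the 2nd player of the 1st team joined 30 minutes late.
weights = play_time_weights([(60, 30), (60, 60)], match_minutes=60)
print(weights)  # [(1.0, 0.5), (1.0, 1.0)]
```

The result has the same 2-dimensional shape as the weights list shown above, so it could be passed as the weights argument.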
The TrueSkill algorithm uses \(\Phi\), the cumulative distribution function; \(\phi\), the probability density function; and \(\Phi^{-1}\), the inverse cumulative distribution function. But Python’s standard math library doesn’t provide these functions, so this package implements them.
There are also third-party libraries which implement these functions. You may want to use one of them because it is more specialized. In that case, set the backend option of TrueSkill to the backend you chose:
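For orientation, the three functions can be sketched for the standard normal distribution as follows. This is an illustration only, not this package's internal implementation. It relies on `math.erfc` and on `statistics.NormalDist`, the latter of which exists only in Python 3.8 and later.

```python
from math import erfc, exp, pi, sqrt
from statistics import NormalDist


def cdf(x):
    """Standard normal cumulative distribution function, Phi(x)."""
    return 0.5 * erfc(-x / sqrt(2))


def pdf(x):
    """Standard normal probability density function, phi(x)."""
    return exp(-x * x / 2) / sqrt(2 * pi)


def ppf(p):
    """Inverse of cdf, delegated to the standard library in this sketch."""
    return NormalDist().inv_cdf(p)


print(cdf(0.0), pdf(0.0), ppf(cdf(1.0)))
```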
>>> TrueSkill().cdf # internal implementation
<function cdf at ...>
>>> TrueSkill(backend='mpmath').cdf # mpmath.ncdf
<bound method MPContext.f_wrapped of <mpmath.ctx_mp.MPContext object at ...>>
The available backends are None (the internal implementation), “mpmath”, and “scipy”.
Note
When winners have much lower ratings than losers, TrueSkill.rate() will raise FloatingPointError. In this case, you need higher floating-point precision. The mpmath library offers flexible floating-point precision, so you can solve the problem by using mpmath as the backend with a higher precision setting.
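The following sketch shows one plausible way such a precision problem can arise. It assumes the update needs the ratio pdf(t) / cdf(t), as the closed-form two-player equations do, and it is not this package's code. For a very unexpected result, t becomes strongly negative and the double-precision cdf underflows to zero.

```python
from math import erfc, exp, pi, sqrt


def cdf(x):
    """Standard normal cumulative distribution function in double precision."""
    return 0.5 * erfc(-x / sqrt(2))


def pdf(x):
    """Standard normal probability density function in double precision."""
    return exp(-x * x / 2) / sqrt(2 * pi)


for t in (-10.0, -30.0, -40.0):
    denom = cdf(t)
    if denom == 0.0:
        # Both tails are smaller than the smallest representable double.
        print('t={}: cdf underflowed to 0.0, pdf/cdf is undefined'.format(t))
    else:
        print('t={}: pdf/cdf = {:.3f}'.format(t, pdf(t) / denom))
```

An arbitrary-precision library can represent these tiny tail values, which is why a backend such as mpmath can help.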
Represents a player’s skill as a Gaussian distribution.
The default mu and sigma values follow the global environment’s settings. If you don’t want to use the global environment, use TrueSkill.create_rating() to create the rating object.
A property which returns the mean.
A property which returns the square root of the variance.
Implements a TrueSkill environment. An environment can have customized constants. Not every game has the same design, so you may need to customize the TrueSkill constants.
For example, if 60% of the matches in your game end in a draw, you should set draw_probability to 0.60:
env = TrueSkill(draw_probability=0.60)
For more details of the constants, see The Math Behind TrueSkill by Jeff Moser.
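To give a feel for what draw_probability changes, the sketch below computes the draw margin: the performance gap within which a match counts as a draw. It assumes the margin formula from the TrueSkill literature, \(\Phi^{-1}(\frac{p + 1}{2}) \sqrt{n} \beta\) for n players in total, and the helper name `draw_margin` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def draw_margin(draw_probability, players=2, beta=25.0 / 6.0):
    """Performance gap treated as a draw (assumed formula, hypothetical helper)."""
    return NormalDist().inv_cdf((draw_probability + 1) / 2) * sqrt(players) * beta


for p in (0.10, 0.60):
    print('draw_probability={:.2f} -> draw margin {:.3f}'.format(p, draw_margin(p)))
```

A higher draw probability widens the margin, so a win then implies a larger performance gap between the players.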
Initializes a new Rating object, using the environment’s mu and sigma as the defaults.
>>> env = TrueSkill(mu=0, sigma=1)
>>> env.create_rating()
trueskill.Rating(mu=0.000, sigma=1.000)
Returns the value of the rating exposure. It starts from 0 and converges to the mean. Use this as a sort key in a leaderboard:
leaderboard = sorted(ratings, key=env.expose, reverse=True)
New in version 0.4.
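A rating exposure that starts from 0 for a new player is consistent with the conservative estimate mu - k * sigma, where k is the environment's mu divided by its sigma (3 with the defaults). The sketch below assumes that formula. `Rating` is a stand-in namedtuple and `expose_sketch` is a hypothetical helper, not this package's code.

```python
from collections import namedtuple

# Stand-in for trueskill.Rating so that the sketch runs without the package.
Rating = namedtuple('Rating', ['mu', 'sigma'])

ENV_MU, ENV_SIGMA = 25.0, 25.0 / 3.0  # default environment values


def expose_sketch(rating, env_mu=ENV_MU, env_sigma=ENV_SIGMA):
    """Conservative skill estimate: mu - k * sigma, with k = env_mu / env_sigma."""
    k = env_mu / env_sigma
    return rating.mu - k * rating.sigma


ratings = {
    'newcomer': Rating(25.0, 25.0 / 3.0),
    'winner': Rating(29.396, 7.171),
    'loser': Rating(20.604, 7.171),
}
# Sort player names by exposure, best first.
leaderboard = sorted(ratings, key=lambda name: expose_sketch(ratings[name]), reverse=True)
print(leaderboard)
```

Sorting by exposure rather than by mu keeps players with few games and a lucky start from jumping straight to the top.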
Registers the environment as the global environment.
>>> env = TrueSkill(mu=50)
>>> Rating()
trueskill.Rating(mu=25.000, sigma=8.333)
>>> env.make_as_global()
trueskill.TrueSkill(mu=50.000, ...)
>>> Rating()
trueskill.Rating(mu=50.000, sigma=8.333)
But if you need just one environment, setup() is the better choice.
Calculates the match quality of the given rating groups. The result is the draw probability of the matchup:
env = TrueSkill()
if env.quality([team1, team2, team3]) < 0.50:
    print('This match seems to be not so fair')
New in version 0.2.
Recalculates ratings by the ranking table:
env = TrueSkill() # uses default settings
# create ratings
r1 = env.create_rating(42.222)
r2 = env.create_rating(89.999)
# calculate new ratings
rating_groups = [(r1,), (r2,)]
rated_rating_groups = env.rate(rating_groups, ranks=[0, 1])
# save new ratings
(r1,), (r2,) = rated_rating_groups
rating_groups is a list of rating tuples or dictionaries, each representing one team in the match. The result has the same structure as this argument. Rating dictionaries can be useful for picking out a specific player’s new rating:
# load players from the database
p1 = load_player_from_database('Arpad Emrick Elo')
p2 = load_player_from_database('Mark Glickman')
p3 = load_player_from_database('Heungsub Lee')
# calculate new ratings
rating_groups = [{p1: p1.rating, p2: p2.rating}, {p3: p3.rating}]
rated_rating_groups = env.rate(rating_groups, ranks=[0, 1])
# save new ratings
for player in [p1, p2, p3]:
    player.rating = rated_rating_groups[player.team][player]
Returns:  recalculated ratings in the same structure as rating_groups. 
Raises:  FloatingPointError when winners have much lower ratings than losers. Higher floating-point precision could solve this error; set the backend to “mpmath”. 
New in version 0.2.
Default initial mean of ratings.
Default initial standard deviation of ratings.
Default distance that guarantees about a 76% chance of winning.
Default dynamic factor.
Default draw probability of the game.
A shortcut to rate just 2 players in a head-to-head match:
alice, bob = Rating(25), Rating(30)
alice, bob = rate_1vs1(alice, bob)
alice, bob = rate_1vs1(alice, bob, drawn=True)
Returns:  a tuple containing the 2 recalculated ratings. 
New in version 0.2.
A shortcut to calculate the match quality between just 2 players in a head-to-head match:
if quality_1vs1(alice, bob) < 0.50:
    print('This match seems to be not so fair')
New in version 0.2.
Sets up the global environment.
Parameters:  env – the specific TrueSkill object to be the global environment. It is optional. 

>>> Rating()
trueskill.Rating(mu=25.000, sigma=8.333)
>>> setup(mu=50)
trueskill.TrueSkill(mu=50.000, ...)
>>> Rating()
trueskill.Rating(mu=50.000, sigma=8.333)
A proxy function for TrueSkill.rate() of the global environment.
New in version 0.2.
A proxy function for TrueSkill.quality() of the global environment.
New in version 0.2.
A proxy function for TrueSkill.expose() of the global environment.
New in version 0.4.
Returns a tuple containing cdf, pdf, ppf from the chosen backend.
>>> cdf, pdf, ppf = choose_backend(None)
>>> cdf(-10)
7.619853263532764e-24
>>> cdf, pdf, ppf = choose_backend('mpmath')
>>> cdf(-10)
mpf('7.6198530241605255e-24')
New in version 0.3.
Detects the list of available backends. The defined backends are None (the internal implementation), “mpmath”, and “scipy”.
You can check if the backend is available in the current environment with this function:
if 'mpmath' in available_backends():
    # mpmath can be used in the current environment
    setup(backend='mpmath')
New in version 0.3.
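As an illustration of how such detection could work, the sketch below checks whether each optional library can be imported. It is a guess at the general technique, not this package's code, and `available_backends_sketch` is a hypothetical helper.

```python
from importlib.util import find_spec


def available_backends_sketch(candidates=('mpmath', 'scipy')):
    """List the backends whose module can be found in the current environment."""
    backends = [None]  # None stands for the internal implementation
    for name in candidates:
        # find_spec returns None when a top-level module is not installed.
        if find_spec(name) is not None:
            backends.append(name)
    return backends


print(available_backends_sketch())
```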
Released on Dec 31 2015.
Fixed documentation error. See issue #11. Thanks to Russel Simmons.
Released on Sep 4 2014.
Fixed an ordering bug in the weights argument when given as a dict. This was reported in issue #9.
Released on Mar 25 2013.
Released on Mar 6 2013.
Changed to raise FloatingPointError instead of ValueError (math domain error) for a problem similar to issue #5 but with more extreme input.
Released on Mar 5 2013.
TrueSkill got a new backend option to choose the cdf, pdf, and ppf implementation.
When winners have much lower ratings than losers, TrueSkill.rate() will raise FloatingPointError if the backend is None or “scipy”. From this version, you can avoid the problem with the “mpmath” backend. This was reported in issue #5.
Released on Nov 30 2012.
Released on Oct 5 2012.
Fixed a ZeroDivisionError issue. For more details, see issue #3. Thanks to Yunwon Jeong and Nikos Kokolakis.
Released on Jan 12 2012.
Fixed an error in “A” matrix of the match quality algorithm.
First public preview release.
There’s a mailing list for users. To subscribe to the list, just send a mail to trueskill@librelist.com.
If you want more details of the TrueSkill algorithm, see also:
This TrueSkill package is released under the BSD license, but the TrueSkill™ brand is not. Microsoft permits only Xbox Live games or non-commercial projects to use TrueSkill™. If your project is commercial, you should find another rating system. See LICENSE for the details.
I’m Heungsub Lee, a game developer. Any questions or patches are welcome.