Perhaps in the last election you watched Polymarket.com as a source of information about who was likely to win the US Presidency.
This election, prediction markets proved to be an accurate source of up-to-date information about who was likely to win. Much more accurate than the polls.
Since these markets are so important, perhaps you wondered how they work. How do they decide how many shares to give for every dollar someone bets? How do they decide on probabilities? After all, if they got this even slightly wrong it could entirely screw up the trustworthiness of the market.
On these bigger markets it's quite easy for Polymarket to figure out the right price to sell each share: they agree to pay out $1 to all the holders of Yes Donald if Donald gets elected. They then allow people to place bids for however many shares they wish to buy, at any price they want to buy them at. The holders of No Donald can also indicate the prices at which they would be willing to sell. All told, this makes up what's called an Order Book, a long list of all the buy and sell offers that people have made, and the balance of that capital is what gives us the probability. Whenever there's an overlap, because a buyer wants what a seller is offering, they can make the deal.
However, there's a problem with an Order Book: Order Books require a lot of buy and sell offers. Imagine there are only two people in the market: one person makes an offer to buy for $0.50 and another person makes an offer to sell for $0.90. Neither of them gets to fill their order: the seller doesn't get any cash and the buyer doesn't get any shares. Also, what probability should the market show? Is it almost 90% likely since that's where the seller is? Or is it only as much as 50% likely since that's where the buyer is? If in the middle, where?
These are the problems of what are called "thinly traded markets", markets that don't have a large enough volume of orders, and there are a lot of these markets out there.
For some of these thinly traded markets, where you can't build out a whole Order Book, there's an alternative structure you can use — these are called Automated Market Makers, AMMs for short. Polymarket occasionally uses (or at least in 2022 they used to) AMMs for some of their thinner markets. AMMs offer an alternative to the traditional order book: rather than prices coming directly from trades, prices are set automatically according to an algorithm. It's sort of like instead of trading against one another, you trade against the algorithm — and that's exactly our goal since there aren't enough people to trade against.
Let's use an example: let's say there's a market that asks, "What will the weather be tomorrow?"
There are two options you can bet on: Cloudy and Sunny.
What we'd like to do is to allow you to provide your report as to what you expect the weather will be, and then we'll pay you based on your accuracy.
The very first design you might try is the simplest setup possible: let's just pay people $1 if they guess correctly, and $0 otherwise.
The problem that you'll find here is that we miss out on a lot of information. If you think it will be Sunny tomorrow, you'll just report "Sunny", but what if you only barely think it will be Sunny (51%), or what if you're utterly confident it will be Sunny (99%)? By paying you only for being accurate we miss out on that information (and that might be important when deciding whether we should bother carrying our umbrella or our parasol). So, how can we get that information about the probability of Sunny out of you?
Well, there are two things we care about. First, we know that you believe there's some probability of the event happening, let's call this p, a number between 0 and 1. And second, there's the probability that you are going to include in your report, let's call this q, also a number between 0 and 1. We have no guarantee that these two numbers will be the same, though we would like them to be.
To begin with, suppose we as the market designers decide to pay you based on the probability you submit in your report. So, if you submit 90% and it's Sunny, you get $0.90. If you submit 10% and it's Sunny you get $0.10.
Let's say that you believe it's 60% likely that it will be Sunny tomorrow. What should you report?
Well, you could report 60%, and in that case you expect to get a $0.60 payment about 60% of the time. A typical way to think about this quantity is with the expected value, which is the value you expect to get times the likelihood you think you'll get it. It's sort of a summary of the value of taking a particular action. In this case, 60% × $0.60 = 0.36 or 36 cents. But of course, you could be wrong. In the case that it's actually Cloudy you'll get the payout of 1 - 60% = 0.4 or 40 cents. Since you expect it to be Cloudy about 40% of the time, you should get $0.40 about 40% of the time. If you calculate the expected value of that you'll get 40% × $0.40 = $0.16. So, there are two ways you could win if you report the 60% that you believe, one with the value of $0.36 and the other with the value of $0.16, so in total, on average, you should expect to win about $0.52.
But, is this the best you can do?
Well, let's try something else. First, let's rewrite that calculation as a single equation:
S(p, q) = p × q + (1 - p) × (1 - q)
You can see the two ways you can win: p × q if you're right, plus (1 - p) × (1 - q) if you're wrong.
If you plug in our probability (p) and our reported probability (q) which were both 60%, you get out your expected value 0.52. But you don't have to report accurately. What if instead you put in 70%?
S(p = 60%, q = 70%) = 60% × 70% + (1 - 60%) × (1 - 70%) = 0.54 Better!
In fact, you can keep doing better: if you were to report 80% you'd get 0.56 or 56 cents, with 90% you'd get 0.58, and with 100% a full 0.60!
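If you want to check these numbers yourself, here's a minimal Python sketch of the calculation (the function name expected_score is just an illustrative choice):

```python
# Expected payout under the "pay q if Sunny, pay 1 - q if Cloudy" rule.
def expected_score(p, q):
    return p * q + (1 - p) * (1 - q)

p = 0.60  # what you actually believe
for q in [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]:
    print(f"report {q:.0%} -> expected payout ${expected_score(p, q):.2f}")

# The payout keeps climbing as the report gets more extreme:
# $0.50, $0.52, $0.54, $0.56, $0.58, $0.60
```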
Well that's not what we want. By lying about your beliefs you can get better scores. We definitely don't want to give you a reason to lie about your beliefs, that would be improper. A proper scoring rule would pay you the most only when you're being honest. It seems we do not yet have a proper scoring rule. Here's how bad it is:
On this graph, we're plotting the different expected scores you could get on the vertical axis, as we imagine changing the reports that you might give on the horizontal axis. The green line shows the payoffs for each report, the red line shows the height of the maximum score, and the blue line shows the actual probability p you believe the Sunny day will occur, which is the thing you're changing that causes the graph to move around.
In our perfect world, the red line, the blue line and the green line will all intersect in one place — that would be the sign that we're giving the highest score / payout (red line) to the report (green line) that lines up with your actual believed probability (blue line). But we don't have that yet. Is there a way to get it?
I'm so glad you asked!
To be honest, just by looking at this you probably have an idea for how we could fix our problem. After all, all that we really want is for the highest payout S(p, q) to happen where p = q. The problem seems to be that we keep on giving more payouts as we get to bigger numbers.
Perhaps this gives you an idea for how to fix this. How could we ensure that red, blue, and green all intersect at the same place?
The problem seems to be that when the blue line is less than 50%, the green line goes too high on the left. How can we push down the green line on the left so that the blue line is the place where the highest expected score is paid out?
Perhaps this is enough for you to guess, "What if we just bent the green line a little bit so that we stopped paying more out?"
That's sort of exactly what most proper scoring rules do. One easy way to bend a curve is with a quadratic:
pay 1 - (1 - q)^2 if it's Sunny, and 1 - q^2 if it's Cloudy
Up to a constant that doesn't depend on your report q, the expected value of this score works out to:
S(p, q) = 1 - (q - p)^2
How's it look?
This is a quadratic scoring rule, so named because of the power of 2. You can see the blue line, red line, and green line all intersect in the same place just like we hoped. Yay! With this simple rule we can reward you the most when you're honest about what you expect to occur.
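If you'd rather verify this numerically than take the graph's word for it, here's a minimal Python sketch (it assumes the two payouts above: 1 - (1 - q)^2 if Sunny and 1 - q^2 if Cloudy):

```python
# Expected quadratic (Brier-style) score for belief p and report q.
def expected_quadratic_score(p, q):
    return p * (1 - (1 - q) ** 2) + (1 - p) * (1 - q ** 2)

p = 0.60
reports = [q / 100 for q in range(101)]   # 0.00, 0.01, ..., 1.00
best = max(reports, key=lambda q: expected_quadratic_score(p, q))
print(best)  # 0.6, so the best report is your honest belief
```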
The quadratic scoring rule is only one possible scoring rule. The most popular scoring rule is called the log scoring rule because it uses the logarithm of the report q:
1 + log(q)
(For nerds: throughout, I'll use log to mean the natural log ln, i.e. log base e. Trying to minimize the concept count.)
The expected score for 1 + log(q) is this:
S(p, q) = 1 + p × log(q) + (1 - p) × log(1 - q)
You can see how the structure of the expectation for our initial scoring rule is still visible; it just wraps each q in a log(). Here's the original S(p, q) for easy comparison:
S(p, q) = p × q + (1 - p) × (1 - q)
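And here's the same kind of numerical check for the log rule (again a minimal sketch; remember that log means the natural log here):

```python
import math

# Expected log score: pays 1 + log(q) if Sunny, 1 + log(1 - q) if Cloudy.
def expected_log_score(p, q):
    return 1 + p * math.log(q) + (1 - p) * math.log(1 - q)

p = 0.60
reports = [q / 100 for q in range(1, 100)]   # skip 0 and 1 to avoid log(0)
best = max(reports, key=lambda q: expected_log_score(p, q))
print(best)  # 0.6, honesty wins again
```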
So, now we've found several rules that allow us to provide the biggest rewards to players when they tell the truth about their beliefs. How do we turn this into a prediction market?
From Proper Scoring Rules to Market Scoring Rules
So is that it? The way you give people the most money only when they accurately report their private beliefs is simply by bending a little curve?
Unfortunately, we don't yet have the ability to make a market. Yes, it's true that we now have a score that goes highest when you're honest, but now we need to use our proper scoring rule to construct what's called a market scoring rule.
A market scoring rule does two things:
it incorporates all the buy and sell trades into a single number that represents the probability of an event occurring.
it calculates how much it will cost for a player to buy more shares (or, how much they earn for selling them).
Think of it this way: in a proper scoring rule, we score a single prediction against reality. But in a market, each new trader is effectively saying "I disagree with the current prediction. Here's an update." For the market to make sense, several things should be true:
When a player disagrees with the market's current conclusion, they should be able to nudge the price, either by buying more shares or by selling their existing shares.
The amount they should expect to be paid out should obey the scoring rule we found before, because that's what ensures they're honest about their beliefs.
The total amount of money that the market needs to pay out in the end should be accumulated into the pot by the traders as they make trades, otherwise it could get very expensive for us to run this market.
The price that the market shows should be between 0 and 1 (since no event has more than a 100% probability of occurring, nor less than a 0% probability), and the price of NO plus the price of YES should equal 100% (since together they are everything that can happen — YES or NO).
If you think about it, this is a pretty long list of constraints — it would be surprising if this were even possible at all. Can we really find a market scoring rule that can do all these things?
I'll give you some bad news up front: the quadratic scoring rule I showed you before (also called the Brier score) does not come with a limit on the losses it could experience in the worst case. If the market price swings drastically, the potential payments can exceed what any finite set of traders has contributed. So the fact that the log rule can do all of this, as I'm about to show you, is something special.
Deriving the Log Market Scoring Rule
Typically, markets make the deal with traders that if the market resolves to YES they will pay out $1 per YES share, and $0 otherwise.
To make this simple, let's just think about what happens when the market resolves to YES. Remember that the trick to make this a proper scoring rule is that we gave a payout of log(q) for whatever the report q was.
Now, let's say you're going to buy a single share of YES, what price should the market charge you for it?
Well, clearly the market already believes something, there have been other traders there, or the person who set up the market started it with their best guess. Let's assume that the market started with q = 0.5. And let's say that you believe the real likelihood is more like 60%, or 0.6. Now, you shouldn't get the full reward for updating the market from 0 to 0.6. Instead, you should get the difference in reward between what the rest of the market has already contributed log(0.5) and the reward due to you thanks to how you've updated the market: log(0.6). In other words, once we've charged you for the new shares, the total earnings you stand to gain should equal log(0.6) - log(0.5).
"Why?" You might ask. Fair question.
If we gave you log(0.6) directly, then it would reward you for taking the market from 0% all the way up to 60%. But you're not doing that, you're taking the market from 50% up to 60%, and that means we need to reward the people that already took us to 50% with what's due to them. How much reward do they get? With a log scoring rule they get log(50%), leaving you with whatever's left, namely log(60%) - log(50%). Make sense?
This means that the share (worth $1 if YES) should have cost us just enough that what we're left with is exactly equal to the amount the proper scoring rule owes us for moving the market. I.e.:
$1 if YES Share - Purchase Fee = log(0.6) - log(0.5)
Another way to say the same thing is, "Once I've paid the purchase fee, assuming it resolves YES, the total reward I get should be the difference between the score everyone else got for their report, and the score I would get for my report."
You can see that leaves us with only one thing to figure out: the purchase fee.
As the market offering these deals, the key thing to keep track of is the total amount that we might lose if the worst were to happen (e.g. if we sell 100 YES shares and it resolves YES then we owe $100). This is the cost that could be charged to us, and therefore we call it the Cost (pretty good name, huh?). The Cost cares only about two things: how many YES shares and how many NO shares have been sold so far, so that we can figure out how many claims are outstanding. If we know those numbers we know how much we could possibly need to pay out.
But do you notice something about this? If we've defined the Cost correctly, not only would it tell us the total cost of the shares to us, but it would also imply the purchase fee we should charge you for making the change. If we want to make sure we always have enough money to pay you back after you buy these shares, then we know we need to charge you exactly the difference between the total cost we were risking before you made the trade and the cost after you made the trade. In other words:
Purchase Fee = Cost(YES + shares, NO) - Cost(YES, NO)
and, therefore:
$1 if YES Share - [Cost(YES + shares, NO) - Cost(YES, NO)] = log(0.6) - log(0.5)
This tells us something special: if we want to know how much a YES share is worth at this exact moment, we can find it with these Cost functions. If we did Cost(YES + 1, NO) - Cost(YES, NO) we'd overshoot it, because the price changes as we move from YES to YES + 1; that difference is really the average price over a whole extra share. If we did [Cost(YES + 0.5, NO) - Cost(YES, NO)] / 0.5 we'd be a bit closer: it would tell us the average price over a half-share increment. And if we did [Cost(YES + 0.1, NO) - Cost(YES, NO)] / 0.1 we'd be even closer. If we keep making the increment smaller, then we can get our Price(YES, NO) with [Cost(YES + 0.00...001, NO) - Cost(YES, NO)] / 0.00...001 where 0.00...001 is some very small number.
This is the plain way of saying that the derivative of Cost with respect to YES is the Price of YES. That's a very important fact because it's going to unlock the Cost function for us. See, the Price function has an additional constraint that the Cost function doesn't have: we know that Price should always remain between 0% and 100%.
Another way to think about this (thank you to my mother for this explanation) is that the amount you would have to pay to push the price of YES any higher eventually skyrockets, becoming prohibitively expensive as the price approaches 100%. This is what keeps the price bounded (in this graph, we've scaled it to be bounded between 0 and 100 instead of 0 and 1 so it's easier to read):
Now that we know this, we can use our two constraints from earlier to pin down the Price function:
Price is always within the range 0% to 100%
Price if YES + Price if NO = 100%
What function can we use for Price that will fit these constraints? Ideally it would also have a chance of conforming to our log scoring rule.
If you're familiar with mathematics there's a function that might immediately come to mind: the logistic function.
A common way to write the logistic function is as:
logistic(x, w) = e^x / (e^x + e^w)
Where x is the input variable, and w shifts the curve left and right. Here's a desmos you can play with.
This logistic function has all the properties we hope for:
It's always in the range 0% to 100%
logistic(a, b) + logistic(b, a) always equals 1 (imagine flipping the curve backward and adding the two curves' heights at each point: they always total 1!).
So, now let's try defining Price using the equation for our logistic function:
Price(YES, NO) = e^YES / (e^YES + e^NO)
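Before going further, here's a quick numerical sanity check (a minimal sketch; the share counts are arbitrary examples) that this Price stays between 0 and 1 and that the YES and NO prices sum to 1:

```python
import math

# Price of YES given how many YES and NO shares have been sold.
def price_yes(yes, no):
    return math.exp(yes) / (math.exp(yes) + math.exp(no))

for yes, no in [(0, 0), (3, 1), (1, 3), (6, 2)]:
    p_yes = price_yes(yes, no)
    p_no = price_yes(no, yes)   # the NO price is the same formula with the roles swapped
    print(f"YES={yes} NO={no}: price={p_yes:.3f}, sum={p_yes + p_no:.1f}")

# Every price lands strictly between 0 and 1, and each pair sums to 1.0.
```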
As we showed in the graph above, we know the relationship that Cost should have with Price, namely that the derivative of Cost is the Price. We can use that information to go the other way: if we sum up all the Prices from 0 shares to all the YES shares, we should recover the cost function. Mathematics gives us nearly magical machinery for doing this, called the integral, which tells us what our cost function should be. It comes out to this:
Cost(YES, NO) = log(e^YES + e^NO)
One crucial property of this function is that it preserves the log score at every step. If you push the market from a probability q_old up to q_new, the purchase fee is the change in Cost, and the $1-per-share payout you collect if YES happens, minus that fee, comes out to exactly "log(q_new) - log(q_old)." This is by design: once we decide the derivative of Cost has to be a valid probability (that logistic shape), it turns out the payout minus the change in Cost automatically becomes "log(q_new) - log(q_old)." That's precisely how the log scoring idea gets preserved at each update in a live market! Amazingly, our logistic curve just works.
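Here's a quick numerical check of both claims (a minimal sketch; the tiny step h plays the role of the 0.00...001 from before, and the share counts and trade size are arbitrary examples):

```python
import math

def cost(yes, no):
    return math.log(math.exp(yes) + math.exp(no))

def price_yes(yes, no):
    return math.exp(yes) / (math.exp(yes) + math.exp(no))

yes, no = 2.0, 1.5              # arbitrary starting share counts
h = 1e-6                        # a very small nudge

# 1) The derivative of Cost with respect to YES is the Price of YES.
finite_difference = (cost(yes + h, no) - cost(yes, no)) / h
print(round(finite_difference, 4), round(price_yes(yes, no), 4))   # both ~0.6225

# 2) Buy d YES shares: the $d payout (if YES) minus the purchase fee
#    equals log(q_new) - log(q_old).
d = 0.7
fee = cost(yes + d, no) - cost(yes, no)
q_old, q_new = price_yes(yes, no), price_yes(yes + d, no)
print(round(d - fee, 4), round(math.log(q_new) - math.log(q_old), 4))  # identical
```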
Let's see this work with a concrete example. Imagine you want to move the market from 50% to 60%.
When the price is 50%, the YES and NO shares must be equal (call both x), since:
price = e^x / (e^x + e^x) = 0.5
To move the price to 60%, you need to buy some amount d of YES shares. In other words, we're going from:
Before: (YES = x, NO = x)
After: (YES = x + d, NO = x)
For this new position to give us a 60% price:
0.6 = e^(x + d) / (e^(x + d) + e^x)
Solving this equation tells us d = log(1.5): divide the top and bottom by e^x to get 0.6 = e^d / (e^d + 1), so e^d = 0.6 / 0.4 = 1.5, which means d = log(1.5). In other words, to push the price from 50% to 60%, you need to buy log(1.5) ≈ 0.405 YES shares.
What's the cost? Using our cost function:
Purchase Fee = Cost(x + log(1.5), x) - Cost(x, x) = log(e^(x + log(1.5)) + e^x) - log(e^x + e^x) = log(1.5e^x + e^x) - log(2e^x) = log(2.5 / 2) = log(1.25) ≈ 0.223
If YES happens, you'll get $1 per share, or $0.405 total. Your net profit will be:
0.405 - 0.223 = 0.182
And look at that — 0.182 is exactly log(0.6) - log(0.5)! Wooooo! 🎉🎊
(Remember, this is log base e, so if you're double-checking the math type ln in place of log)
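Here's the same worked example in code (a minimal sketch; any equal starting count x gives a 50% price, I've arbitrarily picked 10):

```python
import math

def cost(yes, no):
    return math.log(math.exp(yes) + math.exp(no))

def price_yes(yes, no):
    return math.exp(yes) / (math.exp(yes) + math.exp(no))

x = 10.0                   # equal YES and NO counts, so the price starts at 50%
d = math.log(1.5)          # shares to buy to push the price to 60%

print(price_yes(x, x))         # 0.5
print(price_yes(x + d, x))     # ~0.6
fee = cost(x + d, x) - cost(x, x)
print(fee)                     # ~0.223 = log(1.25)
print(d - fee)                 # ~0.182 = log(0.6) - log(0.5)
```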
Almost magically the market scoring rule has preserved the incentive structure of our original log scoring rule, while letting traders continuously update the market's probability estimate.
In fact, this approach extends beautifully beyond just YES/NO markets. The same ideas — using exponentials for prices and log-sum-exp for costs — work just as well for markets with multiple outcomes like, "Who will be elected president?" or "Which day will it rain next?". This is another part of why these market scoring rules have become so widely used in practice.
We've now implemented our entire market, we've finished converting our proper scoring rule into a market scoring rule, and it all boiled down to just two simple equations:
price = e^YES / (e^YES + e^NO)
cost = log(e^YES + e^NO)
Now, in practice you might see these with a b parameter added which changes how fast the market prices change. This is called a liquidity parameter. You can play with it in the desmos graph above, but the key ideas remain the same.
price = e^(YES / b) / (e^(YES / b) + e^(NO / b))
cost = b × log(e^(YES / b) + e^(NO / b))
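To tie everything together, here's a minimal two-outcome market maker built from these two equations (an illustrative sketch, not production code; the class and method names are just for this example):

```python
import math

class LMSRMarket:
    """Minimal two-outcome log market scoring rule market maker (illustrative sketch)."""

    def __init__(self, b=100.0):
        self.b = b        # liquidity parameter: larger b means prices move more slowly
        self.yes = 0.0    # YES shares sold so far
        self.no = 0.0     # NO shares sold so far

    def cost(self, yes, no):
        return self.b * math.log(math.exp(yes / self.b) + math.exp(no / self.b))

    def price_yes(self):
        ey = math.exp(self.yes / self.b)
        en = math.exp(self.no / self.b)
        return ey / (ey + en)

    def buy_yes(self, shares):
        """Sell `shares` YES shares to a trader and return the fee they pay.
        (A negative `shares` amount acts as a sale back to the market.)"""
        fee = self.cost(self.yes + shares, self.no) - self.cost(self.yes, self.no)
        self.yes += shares
        return fee

market = LMSRMarket(b=100)
print(market.price_yes())      # 0.5 to start
fee = market.buy_yes(50)
print(market.price_yes())      # ~0.62 after 50 YES shares are bought
print(fee)                     # ~28.1 dollars paid for those 50 shares
```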
Conclusion
And with that, we've solved the puzzle of how to make continuous prediction markets work. Starting with a simple desire to reward honest probability reports, we discovered proper scoring rules. Then by carefully considering what a market needs — prices that behave like probabilities and costs that preserve scoring incentives — we found our way to the elegant log market scoring rule. What started as a theoretical question about scoring predictions has given us practical tools for running real-world prediction markets, from presidential elections to tomorrow's weather.
Want to implement it yourself? Here's a prediction market in a spreadsheet. I knight you a bookie.
Pedagogy liberally borrowed from Aaron Roth's excellent lecture which you can find here.
Thank you to Sylvain Chevalier, Joel Miller, Volky, and Leona & Gil McCormick for review, corrections, and suggestions.
To visit the holy site, see Robin Hanson's 2002 paper here.