Saturday, March 05, 2016

Forecasting Elections

This wild and crazy election cycle is generating an enormous amount of data that social scientists will be pondering for years to come. We are learning about the beliefs, preferences, and loyalties of the American electorate, and possibly witnessing a political realignment of historic proportions. Several prominent Republicans have vowed not to support their nominee if it happens to be Trump, while a recent candidate for the Democratic nomination has declared a preference for Trump over his own party's likely nominee. Crossover voting will be rampant come November, but the flows will be in both directions and the outcome remains quite uncertain.

Among the issues that the emerging data will be called upon to address is the accuracy of prediction markets relative to more conventional poll- and model-based forecasts. Historically such markets have performed well, but they have also been subject to attempted manipulation, and this particular election cycle hasn't really followed historical norms in any case.

On Super Tuesday, the markets predicted that Trump would prevail in ten of the eleven states in play, with the only exception being a Cruz victory in his home state of Texas. This turned out to be quite poorly calibrated, in the sense that all errors were in a single direction: the misses were Oklahoma and Alaska (which went to Cruz) and Minnesota (where Rubio secured his first victory). But the forecasters at FiveThirtyEight also missed Oklahoma and were silent on the other two, so no easy comparison is possible.

Today we have primaries in a few more states, and another opportunity for a comparison. I'll focus on the Republican side, where voting will occur in Kansas, Kentucky, Louisiana and Maine. Markets are currently predicting a Cruz victory in Kansas (though the odds are not overwhelming):


In contrast, FiveThirtyEight gives the edge to Trump, though again it's a close call:


The only other state for which we have predictions from both sources is Louisiana, but here there is negligible disagreement, with Trump heavily favored to win. Trump is also favored by markets to take Kentucky and Maine, for which we have limited polling data and no predictions from FiveThirtyEight. 

So one thing to keep an eye out for is whether Trump wins fewer than three of the four states. If so, the pattern of inflated odds on Super Tuesday will have repeated itself, and one might be witnessing a systematically biased market that has not yet been corrected by new entrants attracted by the profit opportunity. 

But if the market turns out to be well-calibrated, then it's hard to see how Rubio could possibly secure the nomination. Here's the Florida forecast as of now:


The odds of a Trump victory in Michigan are even higher, while Kasich is slightly favored in Ohio. Plenty of things can change over the next couple of weeks, but based on the current snapshot I suspect there is a non-negligible probability that Rubio exits the race before Florida to avoid humiliation there, while Cruz and Kasich survive to the convention. This is obviously not the conventional wisdom in the media, where Rubio continues to be perceived as the establishment favorite. But unless things change in a hurry, I just don't see how this narrative can be sustained.

---

Update (March 5). The results are in, with Cruz taking Kansas and Maine and Trump holding on to Kentucky and Louisiana. The only missed call by the prediction markets was therefore Maine. Still, the significant margins of victory for Cruz in Kansas and Maine suggest to me that traders in the aggregate continue to have somewhat inflated expectations regarding Trump's prospects. And I'm even more confident than I was early this morning that Rubio faces a humbling and humiliating loss in his home state of Florida, though he may have no option now but to soldier on.

Tuesday, March 01, 2016

Super Tuesday

It's Super Tuesday, and if the polls and prediction markets aren't completely off base, Donald J. Trump is heading for a significant and perhaps insurmountable delegate lead in the contest for the Republican nomination. According to PredictIt, he is heavily favored to win all states except Texas, in which Cruz continues to have an edge. His likelihood of winning exceeds 90% in Virginia, Georgia, Oklahoma, Massachusetts, Vermont, Alabama, Tennessee and Alaska. He is also favored to win Arkansas and Minnesota, though there is somewhat less certainty about these.

The forecasts at FiveThirtyEight, based on polls and fundamentals, are a bit less skewed but tell a similar story.

The conventional wisdom seems to be that this is good news for the Democrats. For example:

This seems very premature, and is quite inconsistent with prediction market prices. Currently Trump is given an 83% chance of securing the nomination and a 38% chance of winning it all:




His probability of winning conditional on being nominated is accordingly not far below one-half.
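The conditional probability here is just the ratio of the two quoted prices, read as probabilities. A quick check of the arithmetic:

```python
# Prediction market prices quoted above, read as probabilities.
p_nomination = 0.83   # chance of securing the nomination
p_presidency = 0.38   # unconditional chance of winning the presidency

# Winning the presidency requires winning the nomination first, so
# P(win | nominated) = P(win and nominated) / P(nominated) = P(win) / P(nominated).
p_win_given_nomination = p_presidency / p_nomination
print(round(p_win_given_nomination, 3))  # 0.458, not far below one-half
```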

Are the markets completely wrong or are pundits and prognosticators missing something important?

It seems to me that a major political realignment is underway in America. The press has focused on prominent Republicans who could not support Trump under any circumstances, such as Senator Sasse of Nebraska. Some of these will sit out the election or look for a third option; some may consider crossing over. But there will also be crossover votes in the other direction:
Nearly 20,000 Bay State Democrats have fled the party this winter, with thousands doing so to join the Republican ranks, according to the state’s top elections official. Secretary of State William Galvin said more than 16,300 Democrats have shed their party affiliation and become independent voters since Jan. 1, while nearly 3,500 more shifted to the MassGOP ahead of tomorrow’s “Super Tuesday” presidential primary...  The primary reason? Galvin said his “guess” is simple: “The Trump phenomenon,” a reference to GOP frontrunner Donald Trump, who polls show enjoying a massive lead over rivals Marco Rubio, Ted Cruz and others among Massachusetts Republican voters.
This phenomenon is unlikely to change the outcome in Massachusetts come November, but it could be enough to affect New Jersey, Pennsylvania or Ohio. In any case, the traditional lines between red and blue states are going to become increasingly blurred, with highly unpredictable net effects.

Perhaps the prediction markets are wrong on this point, skewed and shifted by Trump enthusiasts. I would certainly prefer it if that were the case. But I suspect that the prediction market crowd is on to something, and there is peril in ignoring it.

---

Update (March 2). The results are in the books, with Trump winning seven of the predicted ten, losing Oklahoma and Alaska to Cruz and Minnesota to Rubio. Cruz won his home state as predicted. Since the markets systematically overestimated Trump's performance, the results should have lowered his odds in both the nominee and the presidential winner markets. And indeed this is what happened:


But here's the thing. Trump's odds of winning the presidency conditional on being nominated did not decline, consistent with the argument I made above. And since he remains the overwhelming favorite for the nomination, it's worth keeping this in mind.

Sunday, January 10, 2016

College Sports and Deadweight Loss

The amount of money generated by college sports is staggering: broadcast rights alone are worth over a billion dollars annually, and this doesn't include ticket sales for live events, revenue from merchandise, or fees from licensing. But the athletes on whose talent and effort the entire enterprise is built get very little in return. As Donald Yee points out in a recent article, these athletes are "making enormous sums of money for everyone but themselves." Even the educational benefits are limited, with "contrived majors" built around athletic schedules and terribly low graduation rates.

Since colleges cannot compete for athletes by bidding up salaries, they compete in absurd and enormously wasteful ways:
Clemson’s new football facility will have a miniature-golf course, a sand volleyball pit and laser tag, as well as a barber shop, a movie theater and bowling lanes. The University of Oregon had so much money to spend on its football facility that it resorted to sourcing exotic building materials from all over the world.
The benefit that athletes (or anyone else, for that matter) derive from exotic building materials used for this purpose is negligible in relation to the cost. Only slightly less wasteful are the bowling lanes and other frills at the Clemson facility. The intended beneficiaries would be much better off if they were to receive the amounts spent on these excesses in the form of direct cash payments. This squandering of resources is what economists refer to as deadweight loss.

But are competitive salaries really the best alternative to the current system? I think it's worth thinking creatively about compensation schemes that could provide greater monetary benefits to athletes while also improving academic preparation more broadly. Here's an idea. Suppose that athletes are paid competitive salaries but (with the exception of an allowance to cover living expenses) these are held in escrow until successful graduation. Upon graduation the funds are divided, with one-half going to the athlete as taxable income, and the rest distributed on a pro-rata basis to each primary and secondary school attended by the athlete prior to college. A failure to graduate would result in no payments to schools, and a reduced payment to the athlete.

This would provide both resources and incentives to improve academic preparation as well as athletic development at schools. Those talented few who make it to the highest competitive levels in college sports would clearly benefit, since their compensation would be in cash rather than exotic building materials. But the benefits would extend to entire communities, and link academic and athletic performance in a manner both healthy and enduring. It's admittedly a more paternalistic approach than pure cash payments, but surely less paternalistic than the status quo.

Monday, January 04, 2016

The Order Protection Rule

The following is a lightly edited version of my comment letter to the SEC in reference to the application by IEX to register as a national securities exchange. Related issues were discussed in a couple of earlier posts on intermediation in fragmented markets and in this piece on spoofing in an algorithmic ecosystem. 

---

In 1975, Congress directed the SEC, “through enactment of Section 11A of the Exchange Act, to facilitate the establishment of a national market system to link together the multiple individual markets that trade securities.” A primary goal was to assure investors “that they are participants in a system which maximizes the opportunities for the most willing seller to meet the most willing buyer.”

To implement this directive the SEC instituted Regulation NMS, a centerpiece of which is the Order Protection Rule:
The Order Protection Rule (Rule 611 under Regulation NMS) establishes intermarket protection against trade-throughs for all NMS stocks. A trade-through occurs when one trading center executes an order at a price that is inferior to the price of a protected quotation, often representing an investor limit order, displayed by another trading center…. strong intermarket price protection offers greater assurance, on an order-by-order basis, that investors who submit market orders will receive the best readily available prices for their trades. 
To a layperson, the common-sense meaning of a National Market System and the Order Protection Rule is that an arriving marketable order (say Order A) should be matched with the best readily available price in the market as a whole before any order placed after Order A has made first contact with the market begins to be processed.

Given the large number of trading venues now in operation and the speeds at which communication occurs, it is important to be very clear about what these terms mean. If a marketable order arrives at an exchange, is partially filled, and then routed to another exchange, there will be a small gap in time before the second exchange receives what is left of the order. It is technologically possible for a third party to observe the first trade (either because they are a counterparty to it or have access to the data generated by it) and to act upon this information by sending orders to other exchanges. These may be orders to trade or to cancel, and may arrive at other exchanges before the first order has been fully processed.

Should these new orders, placed after Order A has made first contact with the market, be given priority over Order A in interacting with resting orders at other exchanges? It seems to me that the plain meaning of Congress’ directive and the order protection rule says that they should not.

IEX’s proposed design prevents this kind of event from taking place by delaying the dissemination of information generated by Order A’s first contact with the market until enough time has elapsed for the order to be fully processed. This brings the market closer to the national system envisaged by Congress, and indeed by the SEC itself.

It appears that the following example in a comment letter by Hudson River Associates, while submitted as an objection to the IEX application, actually supports this interpretation:
Example 3: IEX BD Router – IEX bypasses the POP allowing it beat a member to another exchange
  • Member C has an order to buy at 10.00 resting on IEX. 
  • IEX has a routable sell order that fully executes Member C’s buy interest on IEX. 
  • When executed, Member C decides to update its buy order prices on another exchange from 10.00 to 9.99. 
  • The POP would delay Member C’s execution information by 350 microseconds. As a result, although Member C’s buy order on IEX has been executed, it does not know this for at least 350 microseconds. 
  • Before Member C is informed of its buy order execution, the IEX BD Router sends an order to the other exchange to execute against Member C’s buy order at 10.00 on the other exchange. 
  • Since Member C was not informed of its execution on IEX, its order at 10.00 on the other exchange is executed by the IEX BD Router before Member C can update the price to 9.99. 
This example refers to cancellation, but there is nothing to prevent Member C from placing marketable sell orders at 10.00 that trade ahead of the routable order. In either case, liquidity that was “readily available” when the routable sell order made first contact with the market is removed before this order has been fully processed.

What the author of this letter appears to want is that Member C should be able to place an order (to cancel or trade) after the routable sell order has made first contact with the market, and to have these orders interact with the market before the routable sell order has been fully processed. This kind of activity is currently permitted by the SEC, but to me seems to clearly violate the spirit if not the letter of Congress’ directive.

The design proposed by IEX, by preventing orders from trading out of sequence (measured with respect to first contact with the market) would bring the system closer to that envisaged by Congress. In a true national market system with multiple exchanges, each order would receive a timestamp marking its first contact with the market, and no order would begin to be executed until all orders with earlier timestamps had been fully processed. In making a determination on the IEX application, I would urge the commission to consider whether approval would bring the system closer to this ideal. And indeed, to think further about what other changes to the rules governing market microstructure would also achieve the same goal.
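The sequencing ideal described above can be sketched as a toy priority queue; all names and details below are hypothetical illustration, not a description of any exchange's actual matching engine. Each order carries a timestamp marking its first contact with the market, and no order begins executing until every earlier-timestamped order has been fully processed:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Order:
    first_contact: float                 # timestamp of first contact with the market
    order_id: str = field(compare=False)
    remaining: int = field(compare=False)

def process_in_sequence(orders):
    """Execute orders strictly by first-contact timestamp: no order begins
    executing until all orders with earlier timestamps are fully processed."""
    queue = list(orders)
    heapq.heapify(queue)                 # min-heap keyed on first_contact
    completed = []
    while queue:
        order = heapq.heappop(queue)     # earliest first contact goes first
        order.remaining = 0              # stands in for routing across all venues
        completed.append(order.order_id)
    return completed

# Orders presented out of sequence still complete in first-contact order:
orders = [Order(2.0, "B", 100), Order(1.0, "A", 500), Order(3.0, "C", 200)]
print(process_in_sequence(orders))  # ['A', 'B', 'C']
```

The point of the sketch is only the invariant: in this regime, an order to cancel or trade placed after Order A's first contact cannot interact with resting liquidity until Order A is done.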

Friday, October 16, 2015

Threats Perceived When There Are None

Sendhil Mullainathan is one of the most thoughtful people in the economics profession, but he has a recent piece in the New York Times with which I really must take issue.

Citing data on the racial breakdown of arrests and deaths at the hands of law enforcement officers, he argues that "eliminating the biases of all police officers would do little to materially reduce the total number of African-American killings." Here's his reasoning:
According to the F.B.I.’s Supplementary Homicide Report, 31.8 percent of people shot by the police were African-American, a proportion more than two and a half times the 13.2 percent of African-Americans in the general population... But this data does not prove that biased police officers are more likely to shoot blacks in any given encounter...

Every police encounter contains a risk: The officer might be poorly trained, might act with malice or simply make a mistake, and civilians might do something that is perceived as a threat. The omnipresence of guns exaggerates all these risks.

Such risks exist for people of any race — after all, many people killed by police officers were not black. But having more encounters with police officers, even with officers entirely free of racial bias, can create a greater risk of a fatal shooting.

Arrest data lets us measure this possibility. For the entire country, 28.9 percent of arrestees were African-American. This number is not very different from the 31.8 percent of police-shooting victims who were African-Americans. If police discrimination were a big factor in the actual killings, we would have expected a larger gap between the arrest rate and the police-killing rate.

This in turn suggests that removing police racial bias will have little effect on the killing rate. 
A key assumption underlying this argument is that encounters involving genuine (as opposed to perceived) threats to officer safety arise with equal frequency across groups. To see why this is a questionable assumption, consider two types of encounters, which I will call safe and risky. A risky encounter is one in which the confronted individual poses a real threat to the officer; a safe encounter is one in which no such threat is present. But a safe encounter might well be perceived as risky, as the following example of a traffic stop for a seat belt violation in South Carolina vividly illustrates:




Sendhil is implicitly assuming that a white motorist who behaved in exactly the same manner as Levar Jones did in the above video would have been treated in precisely the same manner by the officer in question, or that the incident shown here is too rare to have an impact on the aggregate data. Neither hypothesis seems plausible to me.

How, then, can one account for the rough parity between arrest rates and the rate of shooting deaths at the hands of law enforcement? If officers frequently behave differently in encounters with black civilians, shouldn't one see a higher rate of killing per encounter? 

Not necessarily. To see why, think of the encounter involving Henry Louis Gates and Officer James Crowley back in 2009. This was a safe encounter as defined above, but may not have happened in the first place had Gates been white. If the very high incidence of encounters between police and black men is due, in part, to encounters that ought not to have occurred at all, then a disproportionate share of these will be safe, and one ought to expect fewer killings per encounter in the absence of bias. Observing parity would then be suggestive of bias, and eliminating bias would surely result in fewer killings.

In justifying the termination of the officer in the video above, the director of the South Carolina Department of Public Safety stated that he "reacted to a perceived threat where there was none."  Fear is a powerful motivator, and even when there are strong incentives not to shoot, shooting remains preferable to being shot. This is why stand-your-ground laws have resulted in an increased incidence of homicide, despite narrowing the very definition of homicide to exclude certain killings. It is also why homicide is so volatile across time and space, and why staggering racial disparities in both victimization and offending persist.

None of this should detract from the other points made in Sendhil's piece. There are indeed deep structural problems underlying the high rate of encounters, and these need urgent policy attention. But a careful reading of the data does not support the claim that "removing police racial bias will have little effect on the killing rate." On the contrary, I expect that improved screening and better training, coupled with body and dashboard cameras, will result in fewer officers reacting to a perceived threat when there is none.

---

Update (10/18). I had a useful exchange of emails with Sendhil yesterday. I think that we both care deeply about the issue and are interested in getting to the truth, not in scoring points. But there's no convergence in positions yet. Here's an extract of my last to him (I'm posting it because it might help clarify the argument above):
Definitely you can easily make sense of the data without bias. The question is whether this is the right inference, given what we know about the processes generating encounters.

Suppose (for the sake of argument) that whites have encounters with police only if they are engaging in some criminal activity, while blacks sometimes have encounters with police when they are completely innocent. This need not be due to police bias: it could be because bystanders are more likely to think blacks are up to no good for instance (Gates and Rice come to mind).

Suppose further that those engaging in criminal activity are threats to the police with some probability, and this is independent of offender race. The innocents are never threats to the police. But cops can't tell black innocents from black criminals, so end up killing blacks and whites at the same rate per encounter. If they could tell them apart, blacks would be killed at a lower rate per encounter. What I mean by bias is really this inability to distinguish; to see threats when none are present. 
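The logic of this hypothetical can be made concrete with a back-of-the-envelope calculation; every number below is an illustrative assumption, not an estimate from any data:

```python
# Hypothetical parameters, for illustration only: offenders pose a genuine
# threat with probability p regardless of race; innocents pose no threat.
p_threat_given_offender = 0.05   # assumed threat probability among offenders
share_innocent_black = 0.30      # assumed share of black encounters involving innocents
share_innocent_white = 0.00      # assumption: white encounters involve offenders only

def true_threat_rate(share_innocent):
    # Probability that a randomly drawn encounter involves a genuine threat.
    return (1 - share_innocent) * p_threat_given_offender

rate_white = true_threat_rate(share_innocent_white)
rate_black = true_threat_rate(share_innocent_black)

# If officers could distinguish innocents from offenders, the group with
# more innocents per encounter would face a *lower* per-encounter risk.
print(rate_black < rate_white)  # True
```

Under these assumptions, observing roughly equal killing rates per encounter across groups, despite a lower true threat rate for one of them, is exactly what one would expect if innocents are sometimes perceived as threats.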

I believe that black cops are less likely than white cops to perceive an encounter with an innocent as threatening. If a suspect looks like your cousin, or a guy you sit beside to watch football on Sundays, you are less likely to see him as a threat when he is not. That's why I asked you in Cambridge whether you had data on officer race in killings - when the victim is innocent the officer seems invariably to be white. So a first very rough test of bias would be whether innocents are killed at the same rate by black and white officers...

I've found the Twitter reaction to your post a bit depressing, because better selection, training and video monitoring are really urgent needs in my opinion, and the absence-of-bias narrative can feed complacency about these. I know that was far from your intention, and you are extremely sympathetic to victims of police (and other) violence. You also have a responsibility to speak out on the issue, given your close scrutiny of the data. But I do believe that the inference you've made about the likely negligible effects of eliminating police bias is not really supported by the evidence presented. That, and the personal importance of the issue to me, compelled me to write the response.
---

Update (10/19).  This post by Jacob Dink is worth reading. Jacob shows that the likelihood of being shot by police conditional on being unarmed is twice as high for blacks relative to whites. The likelihood is also higher conditional on being armed, but the difference is smaller:


This, together with the fact that rates of arrest and killing are roughly equal across groups, implies that blacks are less likely to be armed than whites, conditional on an encounter. In the absence of bias, therefore, the rate of killing per encounter should be lower for blacks, not equal across groups. So we can't conclude that "removing police racial bias will have little effect on the killing rate." That was the point I was trying to make in this post. 

---

Update (10/21). Andrew Gelman follows up. The link above to Jacob Dink's post seems to be broken and I can't find a cached version. But there's a post by Howard Frant from earlier this year that makes a similar point.

Tuesday, September 29, 2015

The Price Impact of Margin-Linked Shorts

The real money peer-to-peer prediction market PredictIt just made a major announcement: they plan to margin-link short positions. This will lead to an across-the-board decline in the prices of many contracts, especially in the two nominee markets. Given that the prices in these markets are already being referenced by the campaigns, this change could well have an impact on the race.

Margin-linking short positions makes it substantially cheaper to bet simultaneously against multiple candidates. Instead of a trader's worst-case loss being computed separately for each position, it is computed based on the recognition that only one candidate can eventually win. So a bet against both Bush and Rubio ought to require less cash than a bet against just one of the two, since we know that a loss on one bet implies a win on the other.
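A minimal sketch of the two margin rules, under the simplifying assumptions that each contract pays $1 and at most one candidate in the market can win; the prices below are hypothetical:

```python
def margin_unlinked(yes_prices):
    """Worst-case loss computed separately for each short position: every
    shorted contract is margined as if it could pay out the full dollar."""
    return sum(1 - p for p in yes_prices)

def margin_linked(yes_prices):
    """At most one candidate can win, so at most one shorted contract pays
    $1; the premiums collected on the rest offset that single payout."""
    return max(0.0, 1 - sum(yes_prices))

# Hypothetical "Yes" prices for two candidates being shorted:
prices = [0.20, 0.30]
print(round(margin_unlinked(prices), 2))  # 1.5: each short margined in isolation
print(round(margin_linked(prices), 2))    # 0.5: only one payout is possible
```

Note that when the "Yes" prices in a market sum to more than a dollar, the linked margin on shorting every contract falls to zero, which is precisely the arbitrage discussed below.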

In an earlier post I argued that a failure to margin-link short positions was a design flaw that results in artificially inflated prices for all contracts in a given market, making the interpretation of these prices as probabilities untenable. The problem can be seen by looking at some of the current prices in the GOP nominee market:


The "Buy No" column tells us the price per contract of betting against a candidate for the nomination, with each contract paying out a dollar if the named individual fails to secure the nomination. One could buy five of these contracts (Rubio, Bush, Trump, Fiorina, and Carson) for a total of $3.91, and even if one of the five were to win, the payoff from the bet would be $4. If, on the other hand, Cruz or Kasich were to be nominated, the bet would pay $5. There is no risk of loss involved.
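A quick arithmetic check of this arbitrage; the individual "Buy No" quotes below are hypothetical, chosen only to sum to the $3.91 total:

```python
# Hypothetical "Buy No" quotes for the five named candidates, summing to
# the $3.91 total quoted above (individual quotes not reproduced here):
no_prices = [0.70, 0.72, 0.78, 0.85, 0.86]
cost = sum(no_prices)

# At most one candidate can be nominated, so at least four of the five
# "No" contracts must pay $1 each; all five pay if someone else wins.
worst_case_payoff = len(no_prices) - 1        # $4 if one of the five is nominated
print(round(worst_case_payoff - cost, 2))     # 0.09: riskless minimum profit
```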

Margin-linking shorts recognizes this fact, and would make this basket of five bets collectively cost nothing at all. This would be about as pure an arbitrage opportunity as one is likely to find in real money markets. Aggressive bets would be placed on all contracts simultaneously, with consequent price declines.

A useful effect of this change in design is that manipulating the market becomes much harder. Buying contracts to push up a price would be met by a wall of resistance as long as the sum of all contract prices yields an opportunity for arbitrage. To sustain manipulation would require a trader not only to put a floor on the favored contract, but a ceiling on all others. This has been done before, but would be considerably more costly than under the current market design.

I'd be interested to see which prices are affected most as the transition occurs, and how much prices move in anticipation of the change. But no matter how the aggregate decline is distributed across contracts, this example illustrates one important fact about financial markets in general: prices depend not just on beliefs about the likelihood of future events, but also on detailed features of market design. Too uncritical an acceptance of the efficient markets hypothesis can lead us to overlook this somewhat obvious but quite important point.

Wednesday, April 22, 2015

Spoofing in an Algorithmic Ecosystem

A London trader recently charged with price manipulation appears to have been using a strategy designed to trigger high-frequency trading algorithms. Whether he used an algorithm himself is beside the point: he made money because the market is dominated by computer programs responding rapidly to incoming market data, and he understood the basic logic of their structure.

Specifically, Navinder Singh Sarao is accused of having posted large sell orders that created the impression of substantial fundamental supply in the S&P E-mini futures contract:
The authorities said he used a variety of trading techniques designed to push prices sharply in one direction and then profit from other investors following the pattern or exiting the market.

The DoJ said by allegedly placing multiple, simultaneous, large-volume sell orders at different price points — a technique known as “layering”— Mr Sarao created the appearance of substantial supply in the market.
Layering is a type of spoofing, a strategy of entering bids or offers with the intent to cancel them before completion.
Who are these "other investors" that followed the pattern or exited the market? Surely not the fundamental buyers and sellers placing orders based on an analysis of information about the companies of which the index is composed. Such investors would not generally be sensitive to the kind of order book details that Sarao was trying to manipulate (though they may buy or sell using algorithms sensitive to trading volume in order to limit market impact). Furthermore, as Andrei Kirilenko and his co-authors found in a transaction level analysis, fundamental buyers and sellers account for a very small portion of daily volume in this contract.

As far as I can tell, the strategies that Sarao was trying to trigger were high-frequency trading programs that combine passive market making with aggressive order anticipation based on privileged access and rapid responses to incoming market data. Such strategies correspond to just one percent of accounts on this exchange, but are responsible for almost half of all trading volume and appear on one or both sides of almost three-quarters of traded contracts.

The most sophisticated algorithms would have detected Sarao's spoofing and may even have tried to profit from it, but less nimble ones would have fallen prey. In this manner he was able to siphon off a modest portion of HFT profits, amounting to about forty million dollars over four years.

What is strange about this case is the fact that spoofing of this kind is, to quote one market observer, as common as oxygen. It is frequently used and defended against within the high frequency trading community. So why was Sarao singled out for prosecution? I suspect that it was because his was a relatively small account, using a simple and fairly transparent strategy. Larger firms that combine multiple strategies with continually evolving algorithms will not display so clear a signature. 

It's important to distinguish Sarao's strategy from the ecology within which it was able to thrive. A key feature of this ecology is the widespread use of information extracting strategies, the proliferation of which makes direct investments in the acquisition and analysis of fundamental information less profitable, and makes extreme events such as the flash crash practically inevitable.