
Economics
Revealing Choice Bracketing
A. Ellis and D. J. Freeman
Research by Andrew Ellis and David J. Freeman shows that a majority of experimental subjects narrowly bracket their decisions, even though most prior evidence only rules out broad bracketing. Their experiments test bracketing directly in three domains: portfolio allocation under risk, social allocation, and shopping with induced values.
Introduction
Individuals face many interconnected decisions. How an individual takes these interdependencies into account when choosing, or how they bracket their choices, significantly influences their decision-making. Bracketing determines which outcomes are evaluated as gains or losses and as fair or unfair, and it also plays a role in measuring parameters like risk aversion. Nearly every behavioral model, and most "rational" ones, requires some assumption about how people bracket choices.

There are many ways to bracket. Optimal decision-making requires that people bracket broadly, considering every feasible combination of choices and selecting the best. The most common alternative is that people bracket narrowly, making each decision without considering any interdependencies. However, these two extremes are far from exhaustive. For instance, Barberis et al. (2006) and Rabin & Weizsäcker (2009) propose a hybrid of the two called partial-narrow bracketing.

Most experimental evidence interpreted as favoring narrow bracketing is actually evidence against broad bracketing. This evidence, surveyed in Section 2, comes mainly from studies that follow a design similar to Tversky & Kahneman (1981, Problem 3) or Kahneman & Tversky (1979, Problems 11-12). In the former, each subject makes two concurrent choices, and one pair of choices generates a distribution over outcomes dominated by another pair; many subjects choose the dominated pair. In the latter, two groups of subjects face choices between lotteries that are economically identical but differ in how payments are divided between an endowed income and an active choice; the two groups make different choices. Both designs provide evidence against broad bracketing. However, narrow bracketing makes no testable prediction in either design: any choices are consistent with it. This leaves open the question of whether narrow bracketing is a good description of behavior. We propose a theoretical framework and experimental design to test how an individual brackets.
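To fix ideas, the three bracketing models can be written as different objectives over a decision with parts k = 1, ..., K and per-part choices x_k. The convex-combination form of partial-narrow bracketing below is an illustrative formalization consistent with the α convention used later in this summary (α = 0 fully broad, α = 1 fully narrow); the paper's own definitions are more general.

```latex
% Illustrative bracketing objectives for one decision with parts 1..K and
% utility u over final bundles (an assumed form, not the paper's exact model).
\begin{align*}
\text{Broad:}\qquad
  & \max_{x_1,\dots,x_K}\; u\Big(\sum_{k=1}^{K} x_k\Big) \\
\text{Narrow:}\qquad
  & \max_{x_k}\; u(x_k) \quad \text{separately for each part } k \\
\text{Partial-narrow } (\alpha):\qquad
  & \max_{x_1,\dots,x_K}\; (1-\alpha)\, u\Big(\sum_{k=1}^{K} x_k\Big)
    + \alpha \sum_{k=1}^{K} u(x_k)
\end{align*}
```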
Literature Review
Most existing direct evidence of non-broad choice bracketing derives from Tversky & Kahneman (1981), in which subjects face the decision problem described in Table 1. This design fits into our theoretical setting as a single-decision dataset with B_{1,1} = {A, B} and B_{1,2} = {C, D}. The choice combination (A, D), made by 73% of their subjects, generates a first-order stochastically dominated distribution over outcomes compared to the pair of choices (B, C), and so violates BB-Mon and BB-SARP. Their design cannot falsify narrow bracketing, since every combination of choices satisfies NB-SARP. Without further restrictions, their results are uninformative about the choice bracketing of subjects who do not choose A and D. Yet about 70% of subjects made non-(A, D) choices in incentivized follow-up experiments (Rabin & Weizsäcker, 2009; Koch & Nafziger, 2019).

A related set of experiments considers how an exogenously fixed quantity, like a monetary endowment, an asset, or a background risk, affects a person's choice when separated from the description of available alternatives. For instance, studies suggest subjects behave as if they do not incorporate their endowment in risk-taking choices (Kahneman & Tversky, 1979, Problems 11-12), in social allocation tasks (Exley & Kessler, 2018), and in labor supply choices (Fallucchi & Kaufmann, 2021). These designs fit into our theoretical framework as decisions in which one part is a singleton set (the endowment) and the other part a non-singleton set (the active choice), and these papers provide evidence against broad bracketing.

The final body of evidence shows how failures of fungibility affect choice (Thaler, 1999). Studies of the "flypaper effect" suggest that transfers earmarked for a particular type of spending tend to actually be spent there (Hines & Thaler, 1995; Abeler & Marklein, 2017). Empirical evidence from consumption choices after unexpected price changes supports the lack of fungibility (Hastings & Shapiro, 2013, 2018). This evidence has a more complicated relationship with our setting. Thaler (1985; 1999) and others (Galperti, 2019; Kőszegi & Matějka, 2020) explain the evidence through mental accounting (or budgeting). At its most general, this refers to a person breaking up the overall decision into categories. In contrast, narrow choice bracketing is a failure to aggregate smaller parts into a larger decision. These are not mutually exclusive. If categories and parts coincide, our model of narrow choice bracketing provides an extreme model of mental accounting. But in general, mental accounting is consistent either with broadly bracketing across parts (Thaler, 1985, p. 207) or with narrowly bracketing each part (e.g., as described by Corollary 1 of Kőszegi & Matějka, 2020).

Choice bracketing is also relevant in dynamic settings. Gneezy & Potters (1997) show that subjects make different choices period-by-period than when they must choose in advance, evidence of non-broad bracketing. In contrast, recent evidence from Heimer et al. (2020) (building on Barberis (2012); Ebert & Strack (2015)) suggests that subjects behave as if they do not narrowly bracket future risk-taking opportunities. In financial decision-making, Shefrin & Statman (1985) and Odean (1998) document the disposition effect, and Thaler & Johnson (1990) and Imas (2016) document the house money effect. Either effect is inconsistent with both fully narrow and fully broad bracketing.
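The dominance argument can be checked mechanically. Below is a minimal sketch using the classic Tversky & Kahneman (1981, Problem 3) payoffs; the lottery encoding and helper functions are our own illustration.

```python
# Sketch of the Tversky & Kahneman (1981, Problem 3) dominance argument,
# using the original payoff parameters. A lottery is a dict {outcome: prob}.
from itertools import product

B1_1 = {"A": {240: 1.0},                 # sure gain of $240
        "B": {1000: 0.25, 0: 0.75}}      # 25% chance of $1000
B1_2 = {"C": {-750: 1.0},                # sure loss of $750
        "D": {-1000: 0.75, 0: 0.25}}     # 75% chance of losing $1000

def convolve(p, q):
    """Distribution of the sum of two independent lotteries."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, 0.0) + px * qy
    return out

def fosd(p, q):
    """True if lottery p first-order stochastically dominates q."""
    support = sorted(set(p) | set(q))
    cdf = lambda r, z: sum(pr for x, pr in r.items() if x <= z)
    return all(cdf(p, z) <= cdf(q, z) for z in support) and p != q

combined = {(a, c): convolve(B1_1[a], B1_2[c])
            for a, c in product(B1_1, B1_2)}
# (B, C) yields +$250 w.p. .25 and -$750 w.p. .75, which dominates
# (A, D)'s +$240 w.p. .25 and -$760 w.p. .75.
print(fosd(combined[("B", "C")], combined[("A", "D")]))  # True
```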
Methodology
We design and conduct three experiments to test the models of bracketing in different domains of choice. In each experiment, a participant faces five decision rounds, each consisting of one or two parts. Each part consists of all feasible integer-valued bundles of two goods obtained from a linear (or, in one case, piecewise-linear) budget set. At the end of the experiment, exactly one round is randomly selected for payment, which we call the "round that counts". We sum all goods purchased in all parts of the decision in the round that counts to obtain the final bundle that determines payments. By design, there are no complementarities across decisions.

We implement this design to study choice bracketing in three domains of interest: portfolio choice under risk (Risk), a social allocation task (Social), and a consumer choice experiment in which we induce subjects' values (Shopping).

In the Risk Experiment, each part of every decision asks the subject to choose an integer allocation of tokens between two assets. Each asset pays off in only one of two equally likely states: Asset A (or C) pays out only on a die roll of 1-3, whereas Asset B (or D) pays out only on a die roll of 4-6. The payoff of each asset varies across decision problems and across parts. Because each decision problem uses assets with two equally likely states, preferences over portfolios of monetary payoffs for each state should be symmetric across states.

In the Social Experiment, each part of every decision asks the subject to choose an integer allocation of tokens between two anonymous other subjects, Person A and Person B. The value of each token to A and B varies across decision problems and across parts. Because the two recipients are anonymous, we expect preferences to be symmetric across money allocated to A versus B.

In the Shopping Experiment, each part of every decision asks the subject to choose a bundle of integer quantities of fictitious "apples" and "oranges" subject to a budget constraint. The monetary payment for the experiment is calculated from the final bundle in the round that counts according to the function pay = (2/5) × (√#apples + √#oranges). This function induces payoffs that are symmetric in apples and oranges and strictly variety-seeking. Any subject who prefers more money to less will wish to maximize this payoff function regardless of their underlying utility function.

Our experimental budget sets, summarized in Table 2, allow us to conduct our revealed preference tests in the Risk and Social Experiments, and to conduct analogous tests that exploit the induced value function in the Shopping Experiment.

We implemented our experiments on paper and followed up with robustness treatments online (discussed further in Section 5 and Online Supplement H). In each session, paper instructions were provided and read aloud, subjects were given the opportunity to ask questions privately, and then participants completed a brief comprehension quiz whose answers the experimenter individually checked and corrected. The quiz had each subject calculate how payment would be determined from their allocations when a decision involves two parts. This ensured that each subject knew how to calculate payments in these cases, without instructing them to consider all possible combinations of allocations across parts. Each decision had a cover page indicating the number of parts in the decision, with each part stapled beneath as a separate page.
Thus, a subject was always informed when a decision contained multiple parts, but could choose whether or not to look at both parts before making allocations. A subject indicated each allocation by highlighting the line corresponding to their choice in each part using a provided highlighter. Subjects were allowed no other aids at their desks when making choices. Only one decision was handed out to a subject at a time, and that decision was collected before the next round was handed out. The order of decisions, and of parts within each decision, was varied across sessions to allow us to test and control for order effects.

Sessions took place at the Toronto Experimental Economics Lab and the SFU Experimental Economics Lab from June 2019 to February 2020, each in a one-hour block. Subjects were recruited from the labs' student participant pools to participate in one of the three experiments. Instructions, experimental materials, and details of the experimental procedure are provided in Online Supplement G.
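To illustrate how broad and narrow bracketing come apart in the Shopping Experiment, here is a sketch using the induced payoff function with a hypothetical two-part budget (the experiment's actual budget sets are given in Table 2 and differ from this illustration).

```python
# Sketch: induced Shopping payoff and the broad- vs narrow-bracketing optima
# for a hypothetical two-part decision. Hypothetical budgets:
#   part 1: apples and oranges both cost 1 token, budget of 10;
#   part 2: apples cost 2 tokens, oranges cost 1 token, budget of 10.
from itertools import product
from math import sqrt

def pay(apples, oranges):
    # Induced payoff from the Shopping Experiment: (2/5)(sqrt(a) + sqrt(o)).
    return 0.4 * (sqrt(apples) + sqrt(oranges))

part1 = [(a, o) for a in range(11) for o in range(11) if a + o <= 10]
part2 = [(a, o) for a in range(6) for o in range(11) if 2 * a + o <= 10]

# Narrow bracketing: maximize the payoff of each part's bundle in isolation.
narrow = (max(part1, key=lambda b: pay(*b)), max(part2, key=lambda b: pay(*b)))

# Broad bracketing: maximize the payoff of the summed final bundle.
broad = max(product(part1, part2),
            key=lambda bb: pay(bb[0][0] + bb[1][0], bb[0][1] + bb[1][1]))

print(narrow)  # ((5, 5), (2, 6)): each part split for variety
print(broad)   # ((10, 0), (0, 10)): a corner in each part, buying each
               # good where it is cheap; the two models make distinct
               # predictions under these hypothetical budgets
```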
Key Findings
This section reports the results of our experimental tests of the models of bracketing. For each test, we also compute results allowing for one or two "errors" relative to its requirements. We define an error as how far we would need to move a subject's allocations for them to pass that test, measured in lines on the decision sheet(s). For instance, in Risk and Social, a subject is within one error of passing a test if shifting one token from one asset/person to the other in a single part would lead to choices that pass. They are within two errors if shifting two tokens, either in the same part or in different parts, would lead to choices that pass. For predictions that require Walrasian budget sets, we modify the tests to account for discreteness in our experiment.

In Section 5, we examine the robustness of the following analysis to two concerns: changes in the presentation of the decision, and the fact that broad bracketing requires allocating the entire budget to one good in two-part decisions, which earlier work suggests subjects may be loath to do. We argue there that neither affects our conclusions much.

We begin with the direct revealed preference tests of bracketing developed in Section 1: NB-WARP, BB-WARP, and BB-Mon (Table 3). Very few subjects are consistent with rationality and broad bracketing. There is only a single pair of decisions (D1 and D2) where choices could directly violate BB-WARP. For that pair, we test BB-WARP by comparing the final bundle for D1 to the final bundle for D2: for any choice of the Asset A allocation x^A_{2.1} in D2 with x^A_{2.1} ≤ 8, the same final bundle can be achieved in D1. In each of Risk and Social, only 20% of subjects are within one error of passing BB-WARP. Even fewer subjects are consistent with BB-Mon than with BB-WARP. In Risk and Social respectively, 8% and 12% of subjects are within one error of passing BB-Mon in both decisions. Looking separately at D1 and D3, 13%-17% of subjects are within one error of passing BB-Mon. All told, the BB-WARP and BB-Mon tests show that 80%-92% of subjects are not broad bracketers.

These rates of violations of broad bracketing are qualitatively similar to, but higher than, those found by Tversky & Kahneman (1981) (73%) and Rabin & Weizsäcker (2009) (28%-66%), and very close to the structural estimates of the latter (89%). In these prior experiments, each part consisted of a pairwise choice, so failures to broadly bracket are detected only for a particular range of risk preferences. In contrast, there are many ways a subject could reveal a failure to bracket broadly in our experiments, which gives us more power to detect failures.

While previous work can only falsify broad bracketing, our design allows us to test narrow bracketing as well. We test NB-WARP by comparing the allocations in each of the two parts that appear in multiple decisions. Specifically, NB-WARP requires that a subject makes the same choice in D1.1, D3.2, and D5, and the same choice in D1.2 and D4. Far more subjects pass each NB-WARP test than either BB-WARP or BB-Mon. Between 75% and 77% of subjects in Risk, and between 69% and 81% in Social, are within one error of passing each of the pairwise NB-WARP tests. Allowing for one error, 44% and 53% of subjects pass all possible NB-WARP tests in Risk and Social, respectively.
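As one concrete illustration of the error metric, the following sketch checks a pairwise NB-WARP requirement within k errors. The equal-budget setup and the exact counting rule here are our illustrative assumptions, not the paper's code.

```python
# Sketch of a within-k-errors NB-WARP check for parts that repeat across
# decisions with identical budgets (e.g., D1.1, D3.2, and D5). A subject's
# choice in each part is summarized by the tokens allocated to Asset A
# (or Person A). One error = moving one token in one part, so the minimal
# number of shifts that equalizes the allocations is the sum of absolute
# deviations from their median.

def nb_warp_errors(allocations):
    s = sorted(allocations)
    median = s[len(s) // 2]
    return sum(abs(x - median) for x in s)

def passes_nb_warp(allocations, max_errors=1):
    return nb_warp_errors(allocations) <= max_errors

print(passes_nb_warp([12, 12, 13]))  # True: one token shift equalizes them
print(passes_nb_warp([12, 14, 15]))  # False: needs three token shifts
```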
We next conduct our revealed preference tests of the three models, using the entire set of decisions for each subject. Notice that in both experiments, the alternative (x, y) leads to an identical outcome as the alternative (y, x). As discussed in Section 1, we extend the tests so that whenever (a, b) ≻_D (x, y), i.e., (a, b) is directly revealed preferred to (x, y) in the dataset D, we also have (a, b) ≻_D (y, x), (b, a) ≻_D (x, y), and (b, a) ≻_D (y, x). This reduces the need to compare across decisions and makes our tests more demanding. For example, all three tests make point predictions in D2 and D4. Random behavior has a 0.001% chance or less of passing either BB- or NB-SARP with one error, and a 0.3% chance of passing the PNB Algorithm with one error (see Appendix C).

Table 4 shows how many subjects pass each test, allowing for up to one error. We find that no subjects pass BB-SARP in Risk, and only 10% pass it in Social. However, 34%-35% of subjects in each experiment pass NB-SARP. While fewer subjects pass BB-SARP than either BB-WARP or BB-Mon, a similar number of subjects pass all NB-WARP restrictions with one error and the more demanding NB-SARP with two errors. These results show that a plurality of our subjects are well described as narrow bracketers.

Our test of partial-narrow bracketing diagnoses how many of those who fail BB-SARP and NB-SARP behave consistently with intermediate degrees of bracketing. We find that 15% of subjects in Risk and 12% in Social pass the PNB test but neither BB-SARP nor NB-SARP when allowing for one error. Only 4% of subjects in Risk and 9% in Social pass the PNB-PE test but not the PNB test.

The tests thus far assume that utility is not observed. When utility is known, as in our Shopping Experiment, narrow and broad bracketing each make unique predictions in each decision. To test the models, we compare how far each subject's choices are from each model's predictions (Table 5). Testing the point predictions of narrow bracketing in Decisions 1 and 3, 23% and 64% of subjects, respectively, are within one error of the predictions, while 21% are consistent in both. Allowing for two errors raises the pass rates to 59% for Decision 1 and 49% for both. Looking at the full set of implications of narrow bracketing on all choices made in the experiment, 40% of subjects are within two errors of passing. In contrast, 21%, 24%, and 14% of subjects are within one error of being consistent with broadly-bracketed maximization in Decision 1, Decision 3, and both, respectively. When allowing for two errors, those numbers remain similar. Using all decisions in the experiment, only 16% of subjects are within two errors of being consistent with all implications of broadly-bracketed maximization. PNB describes less than 2% of the remaining subjects.

Since we induce the payoff function, we can compute the value of α that best fits a subject's behavior (up to the limits imposed by discretization of the budget sets), under the assumption that the induced value function acts as their utility function. To that end, we compute the point predictions of the partial-narrow bracketing model for each α ∈ {0, 0.01, 0.02, ..., 0.99, 1} for Decisions 1 and 3, and obtain distinct predictions for nine intervals of α. We assign each subject to the range of α for which their choices exhibit the fewest errors relative to that range's predictions, as in the sketch below.
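The following sketch illustrates this α grid search, assuming the convex-combination partial-narrow bracketing objective from the Introduction and the hypothetical budgets from the Methodology sketch; the experiment's actual budgets and the paper's exact PNB specification may differ.

```python
# Sketch of the best-fitting alpha classification in Shopping, assuming a
# convex-combination PNB objective (alpha = 1 fully narrow, alpha = 0 fully
# broad) and hypothetical two-part budgets.
from itertools import product
from math import sqrt

def u(a, o):
    return 0.4 * (sqrt(a) + sqrt(o))

part1 = [(a, o) for a in range(11) for o in range(11) if a + o <= 10]
part2 = [(a, o) for a in range(6) for o in range(11) if 2 * a + o <= 10]

def pnb_prediction(alpha):
    def objective(bb):
        (a1, o1), (a2, o2) = bb
        return ((1 - alpha) * u(a1 + a2, o1 + o2)
                + alpha * (u(a1, o1) + u(a2, o2)))
    return max(product(part1, part2), key=objective)

def errors(choice, prediction):
    # Token distance between a subject's bundles and the model's prediction.
    return sum(abs(c - p) for b, q in zip(choice, prediction)
               for c, p in zip(b, q))

subject = ((5, 5), (2, 6))  # hypothetical narrow-bracketing subject
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    pred = pnb_prediction(alpha)
    print(alpha, pred, errors(subject, pred))
# The subject is assigned to the alpha range with the fewest errors.
```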
We find that 64% of subjects are classified to a range that includes full narrow bracketing (α = 1), and 25% are classified to a range that includes full broad bracketing (α = 0). Of the remaining subjects, none are best described by α ∈ [0.25, 0.71]. This suggests that even those subjects who are not exactly described by either broad or narrow bracketing are close.

The tests thus far make no adjustment for the fact that partial-narrow bracketing nests narrow and broad bracketing as polar cases, and can thus accommodate more behavior. To compare the predictive success of each model at the subject level, we use a subject-level implementation of the Selten score (Selten, 1991; Beatty & Crawford, 2011). For each subject and each model (symmetric versions for Risk and Social, using the induced value function for Shopping), we calculate the number of errors the subject exhibits relative to that model. Then, we calculate the number of possible choice combinations in the experiment that are consistent with that model and that number of errors: the model-error pair's "predictive area". Dividing the predictive area by the total number of possible combinations of choices in the experiment gives the measure for each subject i and model m ∈ {broad, narrow, PNB, PNB-PE}:

predictive_success_{i,m} = 1 − (predictive area for i, m) / (# of all possible choice combinations).

We use all choices made in the experiment to assign each subject to the model with the highest predictive success; in cases where every rationalizing model-error pair for a subject would rationalize more than one million possible combinations of choices in our experiment, we categorize the subject as "Unclassified".
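Schematically, the predictive-success calculation looks as follows; the tiny two-part choice space and the point prediction are toy assumptions for illustration only.

```python
# Schematic of the subject-level predictive-success (Selten-style) measure.
# Toy choice space: two parts, each an integer allocation 0..10, so there
# are 11 * 11 possible choice combinations in the "experiment".
from itertools import product

ALL = list(product(range(11), range(11)))

def errors(choice, prediction):
    return sum(abs(c - p) for c, p in zip(choice, prediction))

def predictive_success(choice, prediction):
    # Number of errors this subject exhibits relative to the model...
    e = errors(choice, prediction)
    # ...and the model's "predictive area": every combination the model
    # would rationalize at that error tolerance.
    area = sum(1 for c in ALL if errors(c, prediction) <= e)
    return 1 - area / len(ALL)

model_prediction = (5, 5)  # hypothetical point prediction of some model
print(predictive_success((5, 5), model_prediction))  # 1 - 1/121, near 1
print(predictive_success((2, 9), model_prediction))  # far lower: large area
```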
Discussion
Across the three experiments, we classify 67%-78% of subjects as narrow bracketers (Table 6). In contrast, 2%, 10%, and 27% of subjects are classified as broad bracketers in the Risk, Social, and Shopping Experiments, respectively, and 7%, 6%, and 5% are classified to one of the two partial-narrow bracketing models. After adjusting for predictive power, partial-narrow bracketing does not help explain many subjects' behavior.

To study how choice architecture mediates bracketing, we conducted an online version of our Risk Experiment that varied the presentation in a two-by-two design. First, the "Examine" treatment instructed the subject to "First, examine both accounts, then purchase investments", tested this in a quiz question, and included that text at the top of each two-part decision screen; the "Basic" treatment did not. Second, the "Tabs" treatment presented each part of a decision as a separate HTML tab (analogous to the separate pages in our paper experiments), whereas the "Side-by-Side" treatment presented the parts of a decision side-by-side on the same screen (analogous to Tversky & Kahneman, 1981). We recruited 200 US-based subjects from Prolific Academic and randomly assigned each to one of the four treatment combinations. We also replicated our Shopping Experiment online with 46 subjects from Prolific Academic; for all 46, we implemented the Side-by-Side and Examine interventions, eliminated the quantity-restricted sale in D2, and provided a calculator instead of a payoff table. We provide detailed screenshots and results in the Online Supplement.

We find almost no effect on choices from either of the two interventions in the Online Risk Experiment. Both subjects who pass BB-SARP are in the cell with both Examine and Tabs, contrary to our expectation that Side-by-Side would be more conducive to broad bracketing. Neither treatment has a large or statistically significant effect on the rate of broad bracketing (p = 0.22 and p = 0.24, Fisher's exact tests for Examine and Tabs, respectively), with the caveat that the sample is small. Within the Tabs group, Examine does not have a statistically significant effect (p = 0.20, Fisher's exact test). We similarly find no effect of the treatments on the rates of narrow bracketing (p = 1.00 for both, Fisher's exact tests).

In the Online Shopping Experiment, we classified 5 of 46 subjects (10.87%) as broad bracketers and 33 (71.74%) as narrow bracketers. The rate of narrow bracketing did not change much, while the rate of broad bracketing decreased somewhat relative to the pen-and-paper Shopping Experiment (p = 0.03, Fisher's exact test).

All in all, this suggests that the low rates of broad bracketing we find are not overly sensitive to varying the choice architecture to encourage broad bracketing. We caution against over-interpreting this result, since broad bracketing was rare in all treatments. The shift to an online interface and the Prolific subject pool may have had more effect than any of the nudges, and we suspect that more extreme nudges and decision aids might be more effective.

Our online experiments also shed light on how the choice process, and in particular consideration, differed across subjects. We have two tools for measuring consideration. In the Tabs arm of Online Risk, we observe which tabs subjects clicked and when they made their decision.
In Online Shopping, we record how subjects used the calculator. These non-choice data show that only some narrow bracketers completely ignore other parts of the decision. In both experiments, more than a quarter of narrow bracketers gave some consideration to both parts of the decision. These subjects had enough information to bracket more broadly, yet did not.

As we describe in Section 1, broad bracketing requires a corner solution in at least one part of any multi-part decision. Evidence from other contexts suggests that some subjects may be extremeness averse and avoid corner choices. For example, subjects in linear public good games tend not to play the Nash strategy of contributing nothing, but in non-linear public good games where the Nash strategy requires a positive but incomplete contribution, subjects play the equilibrium strategy more frequently (see, e.g., Section 2 of Vesterlund, 2016, for a discussion). Extremeness aversion could affect our conclusions about broad bracketing and its prevalence relative to narrow bracketing, since non-extreme allocations in two-part decisions lead to violations of BB-Mon. However, few additional subjects are consistent with a relaxation of BB-Mon that allows subjects to be close to, but not necessarily at, a corner when one is required, which suggests that the effect of extremeness aversion on our tests is probably not large. In addition, a similar number of subjects are consistent with broad bracketing in decisions that require two corner choices as in decisions that require only one. The evidence still suggests heterogeneity in bracketing, with a plurality best described as narrow bracketers, even after adjusting for extremeness aversion.
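A minimal sketch of the relaxed corner check described above; the tolerance c (in lines) is our illustrative parameter, not the paper's.

```python
# Sketch of a relaxed BB-Mon corner check used to gauge extremeness
# aversion: broad bracketing requires a corner allocation in at least one
# part of a two-part decision, so the relaxation asks whether at least one
# part is within c lines of a corner.

def near_corner(tokens_to_a, budget, c):
    return tokens_to_a <= c or budget - tokens_to_a <= c

def passes_relaxed_bb_mon(parts, c=2):
    # parts: list of (tokens allocated to A, total tokens) for each part.
    return any(near_corner(x, total, c) for x, total in parts)

print(passes_relaxed_bb_mon([(1, 20), (9, 20)]))   # True: part 1 near corner
print(passes_relaxed_bb_mon([(8, 20), (11, 20)]))  # False: both interior
```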
Conclusion
We propose revealed preference tests for how a person brackets their choices that rely only on monotonicity of underlying preferences. We deploy these tests in an experiment where both narrow and broad bracketing make falsifiable predictions, unlike in past work. Across our experiments, at least twice as many subjects were classified as narrow bracketers as broad bracketers: a majority of people narrowly bracket, while a noticeable minority broadly bracket. While many of our subjects are not well described by either broad or narrow bracketing, our novel tests of partial-narrow bracketing suggest that it does not do much better after adjusting for predictive power. This suggests that applications should calibrate a population mix of broad and narrow bracketers rather than a representative-agent model with a calibrated partial-narrow bracketing parameter (as in Barberis & Huang, 2007).

Bracketing rates differ across tasks, domains, and subject pools. While our framework is well suited to detect and measure these differences, it is less well suited to determine why they persist. Non-choice data, like that collected in our online follow-up experiments, can help; it seems to rule out uniform explanations for why so many bracket narrowly, such as lack of awareness of complementarities across parts. We think understanding why people bracket the way they do is an interesting direction for future work.
Limitations
The fraction of subjects classified as broad bracketers varies across experiments, from 1% in Online Risk to 27% in Pen-and-paper Shopping. There is a significant difference in broad bracketing rates between Pen-and-paper Shopping and each of Risk and Social (p < 0.01, Fisher's exact tests), as well as between the Online and Pen-and-paper Shopping Experiments (p < 0.01). Narrow bracketing rates varied much less, from 67% to 78% across the pen-and-paper experiments, with no significant pairwise differences (p > 0.10 for all Fisher's exact tests); nor was there a significant difference in narrow bracketing rates between the Online and Pen-and-paper versions of Risk and Shopping (p > 0.10 for both). Explanations for the higher rate of broad bracketing in Shopping and the lower rate in Risk include the more naturalistic setting, the presence of an objectively correct payoff function, and cognitive difficulties specific to choice under risk (as suggested by Martínez-Marquina et al., 2019). We cannot distinguish between these explanations.
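For reference, each pairwise comparison above is a Fisher's exact test on a 2x2 table of classification counts. A sketch with placeholder counts follows; the experiments' actual sample sizes are not reported in this summary.

```python
# Sketch of a pairwise Fisher's exact test comparing broad-bracketing rates
# across two experiments. The counts below are illustrative placeholders,
# not the experiments' actual sample sizes.
from scipy.stats import fisher_exact

shopping = (16, 44)  # hypothetical (broad bracketers, others): ~27% broad
risk = (1, 49)       # hypothetical (broad bracketers, others): ~2% broad

table = [[shopping[0], shopping[1]],
         [risk[0], risk[1]]]
oddsratio, p_value = fisher_exact(table)
print(f"p = {p_value:.4f}")  # a small p indicates the rates differ
```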