Search | arXiv e-print repository

Bounding-Box Inference for Error-Aware Model-Based Reinforcement Learning

Authors: Erin J. Talvitie, Zilei Shao, Huiying Li, Jinghan Hu, Jacob Boerma, Rory Zhao, Xintong Wang

Abstract: In model-based reinforcement learning, simulated experiences from the learned model are often treated as equivalent to experience from the real environment. However, when the model is inaccurate, it can catastrophically interfere with policy learning. Alternatively, the agent might learn about the model's accuracy and selectively use it only when it can provide reliable predictions. We empirically explore model uncertainty measures for selective planning and show that the best results require distribution-insensitive inference to estimate the uncertainty over model-based updates. To that end, we propose and evaluate bounding-box inference, which operates on bounding boxes around sets of possible states and other quantities. We find that bounding-box inference can reliably support effective selective planning.
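The abstract describes bounding-box inference only at a high level. As a rough illustration of the general idea of interval propagation (not the authors' implementation), the sketch below assumes a hypothetical model that predicts a box [s_lo, s_hi] around the next state's features and an interval [r_lo, r_hi] for the reward, a linear action-value function Q(s', a) = W[a] @ s', and a discount gamma in [0, 1]. It bounds the one-step target r + gamma * max_a Q(s', a) and keeps the update only when the box is narrower than a hypothetical threshold. The helper names (linear_interval, td_target_box, selective_update) and the midpoint target are illustrative assumptions.

```python
import numpy as np

def linear_interval(w, lo, hi):
    """Tight bounds on w @ x over the box lo <= x <= hi (elementwise)."""
    w, lo, hi = (np.asarray(a, dtype=float) for a in (w, lo, hi))
    pos = np.maximum(w, 0.0)  # coefficients that grow with x
    neg = np.minimum(w, 0.0)  # coefficients that shrink with x
    return pos @ lo + neg @ hi, pos @ hi + neg @ lo

def td_target_box(W, s_lo, s_hi, r_lo, r_hi, gamma):
    """Bounds on r + gamma * max_a Q(s', a), assuming linear Q(s', a) = W[a] @ s'.

    The model is assumed to place s' in [s_lo, s_hi] and r in [r_lo, r_hi];
    gamma is assumed to be non-negative.
    """
    q_boxes = [linear_interval(w_a, s_lo, s_hi) for w_a in W]
    # The max over actions lies between the largest lower and largest upper bound.
    q_lo = max(b[0] for b in q_boxes)
    q_hi = max(b[1] for b in q_boxes)
    return r_lo + gamma * q_lo, r_hi + gamma * q_hi

def selective_update(target_box, threshold):
    """Return a usable target, or None when the box is too wide to trust."""
    lo, hi = target_box
    if hi - lo > threshold:
        return None  # skip this simulated update: the model is too uncertain here
    return 0.5 * (lo + hi)  # midpoint used as the target (an arbitrary choice)
```

Because each action's bound is computed separately, the resulting box is sound but can be looser than the true range of the target.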

Submitted 23 June, 2024; originally announced June 2024.

Comments: To appear: Reinforcement Learning Conference (RLC), 2024

arXiv:2303.06701

Composite Sorting

Authors: Job Boerma, Aleh Tsyvinski, Ruodu Wang, Zhenyuan Zhang

Abstract: We propose a new sorting framework: composite sorting. Composite sorting comprises (1) distinct worker types assigned to the same occupation, and (2) a given worker type simultaneously being part of both positive and negative sorting. Composite sorting arises when fixed investments mitigate variable costs of mismatch. We completely characterize optimal sorting and additionally show that it is more positive when mismatch costs are less concave. We then characterize equilibrium wages. Wages have a regional hierarchical structure: relative wages depend solely on sorting within skill groups. Quantitatively, composite sorting can generate a sizable portion of within-occupation wage dispersion in the US.
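The claim that sorting is more positive when mismatch costs are less concave can be glimpsed in a toy discrete assignment problem. This sketch is not the paper's model: it assumes two workers and two jobs on a one-dimensional skill line, hypothetical cost functions of the skill gap (convex_cost and concave_cost), and brute-force search over one-to-one assignments. Under the convex cost the optimum is positive assortative; under the concave cost it matches one worker exactly and lets the other absorb a large mismatch, a crossing (negative) pattern.

```python
import math
from itertools import permutations

def optimal_assignment(workers, jobs, cost):
    """Brute-force the one-to-one assignment that minimizes total mismatch cost."""
    def total_cost(perm):
        return sum(cost(abs(w - jobs[j])) for w, j in zip(workers, perm))
    best = min(permutations(range(len(jobs))), key=total_cost)
    return [(w, jobs[j]) for w, j in zip(workers, best)]

def convex_cost(d):
    # Hypothetical convex mismatch cost: large gaps are disproportionately costly.
    return d ** 2

def concave_cost(d):
    # Hypothetical concave mismatch cost: the first unit of mismatch hurts most.
    return math.sqrt(d)

workers = [0.0, 1.0]
jobs = [1.0, 2.0]
print(optimal_assignment(workers, jobs, convex_cost))   # [(0.0, 1.0), (1.0, 2.0)]
print(optimal_assignment(workers, jobs, concave_cost))  # [(0.0, 2.0), (1.0, 1.0)]
```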

Submitted 29 August, 2023; v1 submitted 12 March, 2023; originally announced March 2023.

Comments: 81 pages, 26 figures

arXiv:2204.13481

Bunching and Taxing Multidimensional Skills

Authors: Job Boerma, Aleh Tsyvinski, Alexander P. Zimin

Abstract: We characterize optimal policies in a multidimensional nonlinear taxation model with bunching. We develop an empirically relevant model with cognitive and manual skills, firm heterogeneity, and labor market sorting. The analysis of optimal policy is based on two main results. We first derive an optimality condition, a general ABC formula, which states that the entire schedule of benefits of taxes second-order stochastically dominates the entire schedule of tax distortions. Second, we use Legendre transforms to represent our problem as a linear program. This linearization allows us to solve the model quantitatively and to precisely characterize the regions and patterns of bunching. At an optimum, 9.8 percent of workers are bunched, both locally and nonlocally. We introduce two notions of bunching: blunt bunching and targeted bunching. Blunt bunching constitutes 30 percent of all bunching, occurs at the lowest regions of cognitive and manual skills, and lumps the allocations of these workers together, resulting in a significant distortion. Targeted bunching constitutes 70 percent of all bunching and recognizes the workers' comparative advantage. The planner separates workers on their dominant skill and bunches them on their weaker skill, thus mitigating distortions along the dominant skill dimension. Tax wedges are particularly high for low-skilled workers who are bluntly bunched, and they are also high along the dimension of comparative disadvantage for somewhat more skilled workers who are subject to targeted bunching.
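The Legendre (Legendre-Fenchel) transform is f*(p) = sup_x (p x - f(x)). Its defining inequality, f(x) + f*(p) >= p x, is linear in the pair (f, f*), which is the general reason conjugacy lets convexity-type conditions be written as linear constraints. A minimal sketch of a discrete transform on a grid follows, assuming a one-dimensional function sampled on a finite grid; the paper's multidimensional problem and its exact linear-program formulation are not reproduced here, and legendre_transform is a hypothetical helper name.

```python
import numpy as np

def legendre_transform(x_grid, f_vals, p_grid):
    """Discrete Legendre-Fenchel transform: f*(p) = max over grid x of p * x - f(x)."""
    x_grid = np.asarray(x_grid, dtype=float)
    f_vals = np.asarray(f_vals, dtype=float)
    p_grid = np.asarray(p_grid, dtype=float)
    # Row i holds the affine function x -> p_i * x - f(x) evaluated on the grid.
    table = np.outer(p_grid, x_grid) - f_vals[np.newaxis, :]
    return table.max(axis=1)

x = np.linspace(-5.0, 5.0, 1001)
f = 0.5 * x ** 2               # convex test function; it is its own conjugate
p = np.linspace(-2.0, 2.0, 5)
f_star = legendre_transform(x, f, p)  # approximately 0.5 * p ** 2
```

The grid must be wide enough to contain the maximizing x for every p of interest; otherwise the discrete transform underestimates f*.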

Submitted 28 April, 2022; originally announced April 2022.

arXiv:2109.02730

Sorting with Teams

Authors: Job Boerma, Aleh Tsyvinski, Alexander P. Zimin

Abstract: We fully solve a sorting problem with heterogeneous firms and multiple heterogeneous workers whose skills are imperfect substitutes. We show that optimal sorting, which we call mixed and countermonotonic, consists of two regions. In the first region, mediocre firms sort with mediocre workers and coworkers such that output losses are equal across all these teams (mixing). In the second region, a high-skill worker sorts with low-skill coworkers and a high-productivity firm (countermonotonicity). We characterize equilibrium wages and firm values. Quantitatively, our model can generate the dispersion of earnings within and across US firms.
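Countermonotonic sorting pairs the high end of one skill distribution with the low end of another. As a toy illustration only, assuming a hypothetical two-person team output that is concave in total skill (hence submodular: an extra unit of skill adds less output in an already skilled team), brute-force search over pairings of three workers with three coworkers selects the countermonotonic pairing. The paper's setting also includes firms and imperfectly substitutable skills, which this sketch does not model; team_output and best_pairing are hypothetical names.

```python
import math
from itertools import permutations

def team_output(x, y):
    # Hypothetical production: concave in total skill, so the cross-partial
    # derivative is negative (submodular).
    return math.sqrt(x + y)

def best_pairing(workers, coworkers):
    """Brute-force the pairing of workers with coworkers that maximizes total output."""
    best = max(permutations(coworkers),
               key=lambda perm: sum(team_output(w, c) for w, c in zip(workers, perm)))
    return list(zip(workers, best))

workers = [1.0, 2.0, 3.0]
coworkers = [1.0, 2.0, 3.0]
print(best_pairing(workers, coworkers))  # [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
```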

Submitted 27 November, 2023; v1 submitted 6 September, 2021; originally announced September 2021.

Showing 1–4 of 4 results for author: Boerma, J