Enhancing Preference-based Linear Bandits via Human Response Time

Li, Shen; Zhang, Yuyang; Ren, Zhaolin; Liang, Claire; Li, Na; Shah, Julie A.

arXiv:2409.05798 (cs)

[Submitted on 9 Sep 2024]

Abstract: Binary human choice feedback is widely used in interactive preference learning for its simplicity, but it provides limited information about preference strength. To overcome this limitation, we leverage human response times, which inversely correlate with preference strength, as complementary information. Our work integrates the EZ-diffusion model, which jointly models human choices and response times, into preference-based linear bandits. We introduce a computationally efficient utility estimator that reformulates the utility estimation problem using both choices and response times as a linear regression problem. Theoretical and empirical comparisons with traditional choice-only estimators reveal that for queries with strong preferences ("easy" queries), choices alone provide limited information, while response times offer valuable complementary information about preference strength. As a result, incorporating response times makes easy queries more useful. We demonstrate this advantage in the fixed-budget best-arm identification problem, with simulations based on three real-world datasets, consistently showing accelerated learning when response times are incorporated.
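
The abstract gives no formulas, so the following is only a minimal sketch of how choices and response times could reduce utility estimation to a linear regression. It assumes a symmetric drift-diffusion process with unit noise, barriers at +a and -a, no non-decision time, and a drift equal to the utility difference u = x·theta. Under those assumptions the standard identities E[choice] = tanh(a·u) and E[time] = (a/u)·tanh(a·u) give E[choice]/E[time] = u/a, which is linear in theta. The function names, the barrier `a`, and the example features are illustrative and are not taken from the paper.

```python
import numpy as np


def ddm_moments(u, a):
    """Closed-form mean choice (+1/-1) and mean response time for a symmetric
    drift-diffusion process: drift u, barriers at +a and -a, unit noise,
    start at 0, no non-decision time (all assumptions of this sketch)."""
    u = np.asarray(u, dtype=float)
    mean_choice = np.tanh(a * u)
    # E[time] = (a / u) * tanh(a * u); its limit as u -> 0 is a**2.
    safe_u = np.where(u == 0.0, 1.0, u)
    mean_rt = np.where(u == 0.0, a**2, a * np.tanh(a * safe_u) / safe_u)
    return mean_choice, mean_rt


def estimate_scaled_theta(X, mean_choice, mean_rt):
    """Least-squares fit of X @ w = mean_choice / mean_rt, so w estimates theta / a."""
    y = np.asarray(mean_choice) / np.asarray(mean_rt)
    w, *_ = np.linalg.lstsq(np.asarray(X, dtype=float), y, rcond=None)
    return w


# Hypothetical example: three queries described by 2-d feature differences.
X = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]])
theta = np.array([1.0, -0.5])  # illustrative "true" preference vector
a = 1.5                        # illustrative barrier height
mean_choice, mean_rt = ddm_moments(X @ theta, a)
w_hat = estimate_scaled_theta(X, mean_choice, mean_rt)
```

With real data, the exact moments would be replaced by per-query sample averages of observed choices and response times. For easy queries (large |u|), tanh(a·u) saturates near +1 or -1, so the mean choice barely changes with u, while the mean response time still does; this is one way to read the abstract's claim that response times make easy queries more useful.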

Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Econometrics (econ.EM); Machine Learning (stat.ML)
Cite as:	arXiv:2409.05798 [cs.LG]
	(or arXiv:2409.05798v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2409.05798

Submission history

From: Shen Li
[v1] Mon, 9 Sep 2024 17:02:47 UTC (8,166 KB)
