-
Automated Evaluation of Retrieval-Augmented Language Models with Task-Specific Exam Generation
Authors:
Gauthier Guinet,
Behrooz Omidvar-Tehrani,
Anoop Deoras,
Laurent Callot
Abstract:
We propose a new method to measure the task-specific accuracy of Retrieval-Augmented Large Language Models (RAG). Evaluation is performed by scoring the RAG on an automatically generated synthetic exam composed of multiple-choice questions based on the corpus of documents associated with the task. Our method is an automated, cost-efficient, interpretable, and robust strategy to select the optimal components for a RAG system. We leverage Item Response Theory (IRT) to estimate the quality of an exam and its informativeness on task-specific accuracy. IRT also provides a natural way to iteratively improve the exam by eliminating the exam questions that are not sufficiently informative about a model's ability. We demonstrate our approach on four new open-ended Question-Answering tasks based on arXiv abstracts, StackExchange questions, AWS DevOps troubleshooting guides, and SEC filings. In addition, our experiments reveal more general insights into factors impacting RAG performance, such as model size, retrieval mechanism, prompting, and fine-tuning. Most notably, our findings show that choosing the right retrieval algorithms often leads to bigger performance gains than simply using a larger language model.
Submitted 22 May, 2024;
originally announced May 2024.
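The RAG exam abstract says IRT is used both to estimate exam quality and to drop questions that carry little information about a model's ability. The sketch below illustrates that idea under stated assumptions: it uses the common three-parameter logistic (3PL) item model (discrimination a, difficulty b, guessing floor c) and its standard Fisher information, assumes item parameters have already been estimated (fitting is not shown), and uses a hypothetical ability grid and threshold. The function names are illustrative only, and the paper's actual parameterization and pruning rule may differ.

```python
import math

def p_correct(theta, a, b, c):
    """3PL probability that a model with ability theta answers an item correctly.
    a: discrimination, b: difficulty, c: guessing floor (e.g. 0.25 for 4 options)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b, c):
    """Standard Fisher information of a 3PL item at ability theta."""
    p = p_correct(theta, a, b, c)
    q = 1.0 - p
    return a ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2

def prune_exam(items, thetas, threshold):
    """Hypothetical pruning rule: keep items (a, b, c) whose mean information
    over the given ability grid is at least `threshold`."""
    kept = []
    for item in items:
        info = sum(item_information(t, *item) for t in thetas) / len(thetas)
        if info >= threshold:
            kept.append(item)
    return kept
```

For example, with `items = [(1.5, 0.0, 0.25), (0.1, 0.0, 0.25)]` and `thetas = [-2, -1, 0, 1, 2]`, a threshold of 0.1 keeps only the first item: the second has such low discrimination that it barely separates weak from strong models. Averaging over a grid is one simple choice; weighting abilities by where candidate RAG systems actually lie would be another.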
-
Interactive Region-of-Interest Discovery using Exploratory Feedback
Authors:
Behrooz Omidvar-Tehrani
Abstract:
In this paper, we propose a geospatial data management framework called IRIDEF, which captures and analyzes users' exploratory feedback to provide an enriched guidance mechanism for interactive analysis. We argue that exploratory feedback can serve as a proxy for decision-making feedback when the latter is scarce or unavailable. IRIDEF identifies regions of interest (ROIs) via exploratory feedback and highlights a few interesting, out-of-sight points of interest (POIs) in each ROI. These highlights help users shape their future interactions with the system. We detail the components of the proposed framework as a data analysis pipeline and discuss the efficiency and effectiveness of each component. We also outline evaluation plans and future directions for IRIDEF.
Submitted 29 July, 2021;
originally announced July 2021.
-
Interactive and Explainable Point-of-Interest Recommendation using Look-alike Groups
Authors:
Behrooz Omidvar-Tehrani,
Sruthi Viswanathan,
Jean-Michel Renders
Abstract:
Recommending Points-of-Interest (POIs) is becoming common in many location-based applications. The literature contains personalized and socialized POI recommendation approaches that employ historical check-ins and social links to make recommendations. However, these systems still lack customizability (incorporating session-based user interactions with the system) and contextuality (incorporating the situational context of the user), particularly in cold-start situations, where almost no user information is available. In this paper, we propose LikeMind, a POI recommendation system that tackles the challenges of cold start, customizability, contextuality, and explainability by exploiting look-alike groups mined from public POI datasets. LikeMind reformulates POI recommendation as recommending explainable look-alike groups (and their POIs) that are in line with the user's interests. LikeMind frames the task of POI recommendation as an exploratory process in which users interact with the system by expressing their favorite POIs, and these interactions affect how look-alike groups are selected. Moreover, LikeMind employs "mindsets", which capture the actual situation and intent of the user and enforce the semantics of POI interestingness. In an extensive set of experiments, we show the quality of our approach in recommending relevant look-alike groups and their POIs, in terms of both efficiency and effectiveness.
Submitted 31 August, 2020;
originally announced September 2020.
-
Exploration of Interesting Dense Regions in Spatial Data
Authors:
Placido A. Souza Neto,
Francisco B. Silva Junior,
Felipe F. Pontes,
Behrooz Omidvar-Tehrani
Abstract:
Nowadays, spatial data are ubiquitous in various fields of science, such as transportation and the social Web. A recent research direction in analyzing spatial data is to provide means for "exploratory analysis" of such data, where analysts are guided towards interesting options in consecutive analysis iterations. Typically, the guidance component learns an analyst's preferences from her explicit feedback, e.g., picking a spatial point or selecting a region of interest. However, analysts often forget, or do not consider it necessary, to explicitly express feedback on what they find interesting. Our approach captures implicit feedback on spatial data. It observes mouse movements (as a means of analyst interaction), as well as the analyst's explicit interactions with data points, in order to discover interesting spatial regions with dense mouse hovers. In this paper, we define, formalize, and explore Interesting Dense Regions (IDRs), which capture analysts' preferences, in order to automatically find interesting spatial highlights. Our approach involves a polygon-based abstraction layer for capturing preferences. Using these IDRs, we highlight points to guide analysts in the analysis process. We discuss the efficiency and effectiveness of our approach through realistic examples and experiments on Airbnb and Yelp datasets.
Submitted 10 March, 2019;
originally announced March 2019.
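The IDR abstract describes discovering regions with dense mouse hovers and then highlighting the data points inside them. As a rough illustration only, the sketch below swaps the paper's polygon-based abstraction for simple square grid cells: it counts hover positions per cell, treats cells reaching a hypothetical `min_hovers` count as dense, and returns the points that fall in those cells. The function names, `cell_size`, and the threshold are assumptions, and the sketch does not model explicit interactions or preference learning.

```python
from collections import Counter

def dense_cells(hovers, cell_size, min_hovers):
    """Bin (x, y) hover positions into square grid cells (a stand-in for the
    paper's polygons) and return, sorted, the cells with >= min_hovers hovers."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in hovers
    )
    return sorted(cell for cell, n in counts.items() if n >= min_hovers)

def highlight_points(points, cells, cell_size):
    """Return the data points that lie inside any dense cell."""
    cell_set = set(cells)
    return [
        p for p in points
        if (int(p[0] // cell_size), int(p[1] // cell_size)) in cell_set
    ]
```

With three hovers clustered near the origin and one isolated hover far away, a unit cell size and `min_hovers=3` mark only cell `(0, 0)` as dense, so only points inside that cell are highlighted. A grid is cheap to update as the mouse moves, but unlike polygons it cannot follow the irregular shape of a region.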
-
Exploration of User Groups in VEXUS
Authors:
Sihem Amer-Yahia,
Behrooz Omidvar-Tehrani,
Joao Comba,
Viviane Moreira,
Fabian Colque Zegarra
Abstract:
We introduce VEXUS, an interactive visualization framework for exploring user data to fulfill tasks such as finding a set of experts, forming discussion groups and analyzing collective behaviors. User data is characterized by a combination of demographics like age and occupation, and actions such as rating a movie, writing a paper, following a medical treatment or buying groceries. The ubiquity of user data requires tools that help explorers, be they specialists or novice users, acquire new insights. VEXUS lets explorers interact with user data via visual primitives and builds an exploration profile to recommend the next exploration steps. VEXUS combines state-of-the-art visualization techniques with appropriate indexing of user data to provide fast and relevant exploration.
Submitted 10 December, 2017;
originally announced December 2017.
-
Characterizing Driving Context from Driver Behavior
Authors:
Sobhan Moosavi,
Behrooz Omidvar-Tehrani,
R. Bruce Craig,
Arnab Nandi,
Rajiv Ramnath
Abstract:
Because of the increasing availability of spatiotemporal data, a variety of data-analytic applications have become possible. Characterizing driving context, where context may be thought of as a combination of location and time, is a new and challenging application. An example of such a characterization is finding the correlation between driving behavior and traffic conditions. This contextual information enables analysts to validate observation-based hypotheses about an individual's driving. In this paper, we present DriveContext, a novel framework that finds the characteristics of a context by extracting significant driving patterns (e.g., a slow-down) and then identifying the set of potential causes behind these patterns (e.g., traffic congestion). Our experimental results confirm the feasibility of the framework in identifying meaningful driving patterns, with improvements over the state of the art. We also demonstrate, through real-world examples, how the framework derives interesting characteristics for different contexts.
Submitted 17 November, 2017; v1 submitted 13 October, 2017;
originally announced October 2017.
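The DriveContext abstract gives a slow-down as an example of a significant driving pattern. Purely as an illustration of what extracting such a pattern from a speed series could look like, the sketch below returns maximal strictly decreasing runs of speed samples whose total drop reaches a hypothetical `min_drop`. The function name and threshold are assumptions; the framework's actual pattern extraction and its cause-identification step are not reproduced here.

```python
def find_slowdowns(speeds, min_drop):
    """Return (start, end) index pairs of maximal strictly decreasing runs in a
    speed series whose total speed drop (start minus end) is >= min_drop."""
    segments = []
    start = 0
    for i in range(1, len(speeds) + 1):
        # A run ends at the end of the series or when speed stops decreasing.
        if i == len(speeds) or speeds[i] >= speeds[i - 1]:
            end = i - 1
            if end > start and speeds[start] - speeds[end] >= min_drop:
                segments.append((start, end))
            start = i  # the next candidate run begins at the current sample
    return segments
```

For speeds `[60, 60, 50, 40, 30, 35, 34]` and `min_drop=20`, the function reports the single run `(1, 4)`, where speed falls from 60 to 30; the later dip from 35 to 34 is too small to count. Real sensor data is noisy, so a practical version would likely smooth the series first rather than require a strict decrease at every sample.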
-
Annotation of Car Trajectories based on Driving Patterns
Authors:
Sobhan Moosavi,
Behrooz Omidvar-Tehrani,
R. Bruce Craig,
Rajiv Ramnath
Abstract:
Nowadays, the ubiquity of various sensors enables the collection of voluminous datasets of car trajectories. Such datasets enable analysts to make sense of driving patterns and behaviors: to understand the behavior of drivers, one approach is to break a trajectory into its underlying patterns and then analyze that trajectory in terms of the derived patterns. Trajectory segmentation depends on various resources, including a set of ground-truth trajectories with their driving patterns. To the best of our knowledge, no such ground-truth dataset exists in the literature. In this paper, we describe a trajectory annotation framework and report our results in annotating a dataset of personal car trajectories. Our annotation methodology consists of a crowd-sourcing task followed by a precise aggregation process. The annotation works at two granularity levels: one specifies the annotation (segment border) and the other describes the type of the segment (e.g., speed-up, turn, merge). The output of our project, the Dataset of Annotated Car Trajectories (DACT), is available online at https://figshare.com/articles/dact_dataset_of_annotated_car_trajectories/5005289 .
Submitted 16 May, 2017; v1 submitted 15 May, 2017;
originally announced May 2017.
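The DACT abstract mentions a crowd-sourcing task followed by an aggregation step but does not describe that step. As a hypothetical illustration only, the sketch below merges segment borders (trajectory sample indices) from several annotators: borders within a `tolerance` of the previous border are grouped, a group is kept if at least `min_votes` distinct annotators contributed to it, and each kept group is reported by its median position. The function name, parameters, and voting rule are assumptions, not the authors' process, and segment-type labels are ignored.

```python
from statistics import median

def aggregate_borders(annotations, tolerance, min_votes):
    """Merge border indices from several annotators (one list per annotator).
    Sorted borders closer than `tolerance` to their predecessor share a group;
    groups backed by >= min_votes distinct annotators yield their median."""
    tagged = sorted(
        (border, worker)
        for worker, borders in enumerate(annotations)
        for border in borders
    )
    groups, current = [], []
    for border, worker in tagged:
        if current and border - current[-1][0] > tolerance:
            groups.append(current)
            current = []
        current.append((border, worker))
    if current:
        groups.append(current)
    return [
        median(b for b, _ in group)
        for group in groups
        if len({w for _, w in group}) >= min_votes
    ]
```

Given three annotators who mark borders `[10, 50]`, `[11, 49, 80]`, and `[12, 51]`, a tolerance of 2 and `min_votes=2` yield `[11, 50]`: the border at 80 is dropped because only one annotator proposed it. Using the median rather than the mean keeps a single careless click from shifting the aggregated border.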