Can Generative LLMs Create Query Variants for Test Collections? An Exploratory Study

Published in SIGIR, 2023

Download: [ Preprint | Poster | GitHub ]

🏆 Received the Industry Impact Award at the RMIT/UniMelb 25+ Years of IR event.

Abstract

This paper explores the utility of a Large Language Model (LLM) to automatically generate queries and query variants from a description of an information need. Given a set of information needs described as backstories, we explore how similar the queries generated by the LLM are to those generated by humans. We quantify the similarity using different metrics and examine how the use of each set would contribute to document pooling when building test collections. Our results show potential in using LLMs to generate query variants. While they may not fully capture the wide variety of human-generated variants, they generate similar sets of relevant documents, reaching up to 71.1% overlap at a pool depth of 100.

Citation

If you find this paper useful, please cite it using the following BibTeX:

@INPROCEEDINGS{Alaofi23GptVariants,
    TITLE = {Can Generative {LLMs} Create Query Variants for Test Collections? An Exploratory Study},
    AUTHOR = {Alaofi, Marwah and Gallagher, Luke and Sanderson, Mark and Scholer, Falk and Thomas, Paul},
    BOOKTITLE = {Proceedings of the 46th International {ACM} {SIGIR} Conference on Research and Development in Information Retrieval},
    YEAR = {2023},
    URL = {https://doi.org/10.1145/3539618.3591960},
    DOI = {10.1145/3539618.3591960},
}