There is increasing interest in spoken conversational search—multi-turn interactions with a search engine, spoken in natural language—but until recently there was little public data to support research. We describe our experiences building two data sets for spoken conversational search: the Microsoft Information-Seeking Conversation set (“MISC”) and the Spoken Conversational Search set (“SCSdata”). Each data set contains recordings of spoken interactions between two people collaborating on web search tasks, but relatively small differences in protocol have led to observably different data. We discuss some consequences of these differences, and describe attempts to reproduce analyses from one set to the other.
Volume
2337
Start page
1
End page
5
Total pages
5
Outlet
Proceedings of the Workshop on Barriers to Interactive IR Resources Re-use at the ACM SIGIR Conference on Human Information Interaction and Retrieval (BIIRRR 2019)
Editors
Toine Bogers, Samuel Dodson, Maria Gäde, Luanne Freund, Mark M. Hall, Marijn Koolen, Vivien Petras, Nils Pharo, Mette Skov
Name of conference
BIIRRR 2019: Volume 2337
Publisher
Rheinisch-Westfaelische Technische Hochschule Aachen * Lehrstuhl Informatik V