Time Masking: Leveraging Temporal Information in Spoken Dialogue Systems

Rylan Conway, Lambert Mathias

2019-07-25 | WS 2019 9 | Video Salient Object Detection | Spoken Dialogue Systems

Abstract

In a spoken dialogue system, the dialogue state tracker (DST) tracks the state of the conversation: at each user turn it updates a distribution over the values of every tracked slot, using the interactions up to that point. Much previous work has modeled the natural order of the conversation, using distance-based offsets as an approximation of time. In this work, we hypothesize that leveraging the wall-clock time difference between turns is crucial for finer-grained control of dialogue scenarios. We develop a novel approach that applies a time mask, based on the wall-clock time difference, to the associated slot embeddings, and we empirically demonstrate that it outperforms existing approaches that rely on distance offsets, on both an internal benchmark dataset and DSTC2.
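The abstract does not give the functional form of the time mask, so the following is only a minimal sketch of the general idea: a gate computed from the wall-clock gap and multiplied elementwise into a slot embedding. The sigmoid gate, the log scaling of the gap, and the names `time_mask`, `apply_time_mask`, `weights`, and `biases` are all assumptions made for illustration, not the authors' formulation. In a trained model the gate parameters would be learned; here they are fixed by hand.

```python
import math


def time_mask(delta_seconds, weights, biases):
    """Hypothetical gate: one value in (0, 1) per embedding dimension.

    Assumption: the gate is a sigmoid of a linear function of the
    log-scaled wall-clock gap. The source does not specify this form.
    """
    log_dt = math.log1p(delta_seconds)
    return [
        1.0 / (1.0 + math.exp(-(w * log_dt + b)))
        for w, b in zip(weights, biases)
    ]


def apply_time_mask(slot_embedding, mask):
    """Scale each embedding dimension by its gate value."""
    return [e * m for e, m in zip(slot_embedding, mask)]


# Illustrative, hand-picked parameters (not learned, not from the paper).
embedding = [0.5, -1.0, 2.0]
weights = [-1.0, -1.0, -1.0]  # negative weight: larger gaps close the gate
biases = [3.0, 3.0, 3.0]

# Same slot, same turn distance, but different wall-clock gaps.
recent = apply_time_mask(embedding, time_mask(5.0, weights, biases))
stale = apply_time_mask(embedding, time_mask(3600.0, weights, biases))
```

With these parameters, a slot mentioned five seconds ago keeps most of its embedding magnitude, while the same slot mentioned an hour ago is attenuated to nearly zero. A distance offset alone cannot make that distinction, because both cases may sit exactly one turn back.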

Results

Task	Dataset	Metric	Value	Model
Video	SegTrack v2	Average MAE	0.116	TIMP
Video	SegTrack v2	S-Measure	0.644	TIMP
Video	SegTrack v2	Max E-Measure	0.768	TIMP
Video	MCL	Average MAE	0.113	TIMP
Video	MCL	Max E-Measure	0.76	TIMP
Video	MCL	S-Measure	0.642	TIMP
Object Detection	SegTrack v2	Average MAE	0.116	TIMP
Object Detection	SegTrack v2	S-Measure	0.644	TIMP
Object Detection	SegTrack v2	Max E-Measure	0.768	TIMP
Object Detection	MCL	Average MAE	0.113	TIMP
Object Detection	MCL	Max E-Measure	0.76	TIMP
Object Detection	MCL	S-Measure	0.642	TIMP
3D	SegTrack v2	Average MAE	0.116	TIMP
3D	SegTrack v2	S-Measure	0.644	TIMP
3D	SegTrack v2	Max E-Measure	0.768	TIMP
3D	MCL	Average MAE	0.113	TIMP
3D	MCL	Max E-Measure	0.76	TIMP
3D	MCL	S-Measure	0.642	TIMP
Video Object Segmentation	SegTrack v2	Average MAE	0.116	TIMP
Video Object Segmentation	SegTrack v2	S-Measure	0.644	TIMP
Video Object Segmentation	SegTrack v2	Max E-Measure	0.768	TIMP
Video Object Segmentation	MCL	Average MAE	0.113	TIMP
Video Object Segmentation	MCL	Max E-Measure	0.76	TIMP
Video Object Segmentation	MCL	S-Measure	0.642	TIMP
RGB Salient Object Detection	SegTrack v2	Average MAE	0.116	TIMP
RGB Salient Object Detection	SegTrack v2	S-Measure	0.644	TIMP
RGB Salient Object Detection	SegTrack v2	Max E-Measure	0.768	TIMP
RGB Salient Object Detection	MCL	Average MAE	0.113	TIMP
RGB Salient Object Detection	MCL	Max E-Measure	0.76	TIMP
RGB Salient Object Detection	MCL	S-Measure	0.642	TIMP
2D Classification	SegTrack v2	Average MAE	0.116	TIMP
2D Classification	SegTrack v2	S-Measure	0.644	TIMP
2D Classification	SegTrack v2	Max E-Measure	0.768	TIMP
2D Classification	MCL	Average MAE	0.113	TIMP
2D Classification	MCL	Max E-Measure	0.76	TIMP
2D Classification	MCL	S-Measure	0.642	TIMP
2D Object Detection	SegTrack v2	Average MAE	0.116	TIMP
2D Object Detection	SegTrack v2	S-Measure	0.644	TIMP
2D Object Detection	SegTrack v2	Max E-Measure	0.768	TIMP
2D Object Detection	MCL	Average MAE	0.113	TIMP
2D Object Detection	MCL	Max E-Measure	0.76	TIMP
2D Object Detection	MCL	S-Measure	0.642	TIMP
16k	SegTrack v2	Average MAE	0.116	TIMP
16k	SegTrack v2	S-Measure	0.644	TIMP
16k	SegTrack v2	Max E-Measure	0.768	TIMP
16k	MCL	Average MAE	0.113	TIMP
16k	MCL	Max E-Measure	0.76	TIMP
16k	MCL	S-Measure	0.642	TIMP
