Neural Dialogue Context Online End-of-Turn Detection
This paper proposes a fully neural, dialogue-context-aware method for online end-of-turn detection that exploits long-range interactive information extracted from both the speaker's and the collocutor's utterances. The proposed method combines multiple time-asynchronous long short-term memory (LSTM) recurrent neural networks, which capture the speaker's and the collocutor's multiple sequential features as well as their interactions. Assuming application to spoken dialogue systems, we introduce the speaker's acoustic sequential features and the collocutor's linguistic sequential features, each of which can be extracted in an online manner. Our evaluation confirms the effectiveness of taking into account the dialogue context formed by the speaker's and the collocutor's utterances.
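The core idea of combining time-asynchronous streams can be illustrated with a minimal sketch. The code below is not the authors' implementation: it is a hypothetical NumPy toy in which one LSTM consumes the speaker's acoustic frames at a fast rate, a second LSTM consumes the collocutor's linguistic features only when a new word arrives, and an end-of-turn probability is read out from the concatenation of the two latest hidden states. All dimensions, rates, and the random placeholder weights are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class LSTMCell:
    """Minimal NumPy LSTM cell; weights are random placeholders."""
    def __init__(self, input_dim, hidden_dim, rng):
        # One stacked weight matrix for the input, forget, cell, output gates.
        self.W = rng.standard_normal((4 * hidden_dim, input_dim + hidden_dim)) * 0.1
        self.b = np.zeros(4 * hidden_dim)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, g, o = np.split(z, 4)
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        return h_new, c_new

rng = np.random.default_rng(0)
acoustic = LSTMCell(input_dim=13, hidden_dim=8, rng=rng)    # e.g. MFCC-like frames
linguistic = LSTMCell(input_dim=16, hidden_dim=8, rng=rng)  # e.g. word embeddings
w_out = rng.standard_normal(16) * 0.1  # joint read-out over both hidden states

h_a = c_a = np.zeros(8)
h_l = c_l = np.zeros(8)

# The two streams are time-asynchronous: acoustic frames arrive at a fixed
# frame rate, while a linguistic feature arrives only when a word is
# recognized. Each stream updates at its own rate; the detector always reads
# the most recent hidden state of both.
for t in range(50):                      # 50 acoustic frames
    h_a, c_a = acoustic.step(rng.standard_normal(13), h_a, c_a)
    if t % 10 == 0:                      # a new collocutor word every ~10 frames
        h_l, c_l = linguistic.step(rng.standard_normal(16), h_l, c_l)
    # Online end-of-turn probability after every acoustic frame.
    p_eot = sigmoid(w_out @ np.concatenate([h_a, h_l]))
```

Because the read-out is recomputed after every acoustic frame, the detector emits a decision online rather than waiting for the utterance to end, which matches the online setting described above.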