Deep Reinforcement Learning in Natural Language Scenarios
Reinforcement learning refers to a class of algorithms that aim to learn a good policy in a dynamic environment. Recently, by combining deep learning with reinforcement learning, researchers have made significant breakthroughs in many artificial intelligence applications. The most notable applications are Atari games and the game of Go. However, natural language applications involving deep reinforcement learning are still rare. This thesis studies deep reinforcement learning in natural language scenarios and makes three contributions. First, we introduce a novel architecture for reinforcement learning with deep neural networks, designed to handle state and action spaces characterized by natural language. The architecture represents state and action spaces with separate embedding vectors, which are combined with an interaction function to approximate the Q-function in reinforcement learning. Second, we investigate reinforcement learning with a combinatorial, natural language action space. Novel deep reinforcement learning architectures are studied for effective modeling of the value function associated with actions composed of interdependent sub-actions, accounting for redundancy among sub-actions. In addition, a two-stage Q-learning framework is introduced as a strategy for reducing the cost of searching the combinatorial action space. Third, we augment the state representation to incorporate global context using an external unstructured knowledge source with temporal information. This approach is inspired by the observation that in a real-world decision-making process, it is usually beneficial to consider background knowledge and popular current events relevant to the current local context. We experiment on two types of tasks: text-based games and the prediction of popular Reddit discussion threads. We show that all three contributions help reinforcement learning in natural language scenarios.
Specifically, experiments with paraphrased action descriptions on text games show that modeling the state and action spaces separately extracts meaning rather than simply memorizing strings of text. For a combinatorial action space, our proposed model, which represents dependence between sub-actions through a bi-directional LSTM, gives the best performance for predicting popular Reddit threads across different domains. The two-stage Q-learning framework achieves a significant performance gain compared to randomly sampling a subspace of the combinatorial action space. For tracking the most popular thread, incorporating external knowledge in the form of discussions about world news also leads to significant improvements, with a 34% gain for discussions on a topic (politics) for which world news is particularly relevant.
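The first contribution, separate state and action embeddings combined by an interaction function, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the hashed bag-of-words input, the single tanh layer, the inner product as the interaction function, and all names (`bag_of_words`, `embed`, `q_value`, `greedy_action`). The weights are random and untrained, so the chosen action carries no meaning; the sketch only shows how the pieces fit together.

```python
import math
import random

DIM = 8      # embedding size (arbitrary for this sketch)
VOCAB = 64   # hashed bag-of-words size (arbitrary for this sketch)


def bag_of_words(text):
    """Map text to a fixed-length count vector with a simple deterministic hash."""
    vec = [0.0] * VOCAB
    for token in text.lower().split():
        idx = sum(ord(ch) for ch in token) % VOCAB
        vec[idx] += 1.0
    return vec


def make_layer(rng):
    """Random (untrained) weights for one tanh layer: DIM rows of VOCAB weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(VOCAB)] for _ in range(DIM)]


def embed(text, weights):
    """Embed text as a DIM-dimensional vector using one tanh layer."""
    x = bag_of_words(text)
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]


def q_value(state, action, w_state, w_action):
    """Approximate Q(s, a) as the inner product of separate state and action embeddings."""
    s = embed(state, w_state)
    a = embed(action, w_action)
    return sum(si * ai for si, ai in zip(s, a))


def greedy_action(state, actions, w_state, w_action):
    """Pick the candidate action text with the highest approximate Q-value."""
    return max(actions, key=lambda a: q_value(state, a, w_state, w_action))


rng = random.Random(0)
w_state, w_action = make_layer(rng), make_layer(rng)  # separate networks per side

state = "you are standing in a dark room"
actions = ["open the door", "light the lamp", "go back to sleep"]
best = greedy_action(state, actions, w_state, w_action)
print(best)
```

Because the action side has its own embedding network, any action text can be scored without a fixed output layer over a predefined action set. In a trained system the two sets of weights would be learned from Q-learning targets, a step this sketch omits.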
- Electrical engineering