Learning to Interpret and Generate Instructional Recipes
Enabling computers to interpret and generate instructional language has become increasingly important to our everyday lives: we ask our smartphones to set reminders and send messages, and we rely on navigation systems to direct us to our destinations. We define instructional recipes as a special case of instructional language in which completing the instructions yields a goal object. Examples include cooking recipes, craft projects, and assembly instructions.

Developing systems that automatically analyze and generate instructional recipes requires solving many semantic challenges, such as identifying implicit arguments (e.g., given the sentence "Bake for 15 min," identifying what is being baked and where the baking occurs) and learning physical attributes of entities (e.g., which ingredients are considered "dry"). Amassing this information has previously relied on high-cost annotation efforts.

We present a pair of models that interpret and generate instructional recipes, respectively, and are trained on large corpora with minimal supervision: only identification of the goal (e.g., the dish to make), the list of materials (e.g., the ingredients to use), and the recipe text.

Our interpretation model is a probabilistic model that (1) identifies the sequence of actions described by the text of an instructional recipe and (2) determines how the provided materials (e.g., ingredients) and the entities generated by actions (e.g., the mixture created by "Combine flour and sugar") are used.

Our generation model is a novel neural language model that (1) generates an instructional recipe for a specified goal (e.g., a dish to make) while (2) using all of the required materials provided (e.g., the list of ingredients). We also present an adaptation of our generation model that jointly generates recipe text and its underlying structure.
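To make the interpretation task concrete, the following is a minimal toy sketch of tracking entities through recipe actions, where an implicit argument (as in "Bake for 15 min") is resolved to the most recently created entity. This heuristic stand-in, and all names in it (`interpret`, the step encoding), are illustrative assumptions, not the probabilistic model described above.

```python
def interpret(steps, ingredients):
    """Toy entity tracker: each step is (verb, explicit_args).
    An empty argument list marks an implicit argument, resolved here
    by a simple most-recent-entity heuristic (illustrative only)."""
    entities = list(ingredients)   # entities currently available
    actions = []
    for verb, args in steps:
        if not args:               # implicit argument, e.g. "Bake for 15 min"
            args = [entities[-1]]  # assume the most recently created entity
        product = f"{verb}({', '.join(args)})"
        # consumed entities disappear; the action's output becomes a new entity
        entities = [e for e in entities if e not in args]
        entities.append(product)
        actions.append((verb, args, product))
    return actions

steps = [("combine", ["flour", "sugar"]), ("bake", [])]
actions = interpret(steps, ["flour", "sugar"])
# "bake" has no explicit argument; the heuristic resolves it to the
# mixture produced by the "combine" step.
```

Even this crude heuristic shows why recipe interpretation amounts to recovering a dependency structure over actions and entities rather than tagging sentences in isolation.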
Experiments show that our models can be trained successfully to interpret and generate instructional recipes from unannotated text, while at the same time learning interpretable domain knowledge.
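The generation model's requirement that all listed materials be used can be approximated, in a toy setting, by a checklist-style decoder that biases next-word probabilities toward ingredients not yet mentioned. This sketch is a simplification under assumed names (`checklist_decode`, the `BONUS` value); the thesis model is a neural language model, not this re-scoring rule.

```python
import math

INGREDIENTS = {"flour", "sugar"}  # hypothetical ingredient list

def checklist_decode(vocab_scores, used):
    """Re-score candidate next words: boost ingredients not yet mentioned,
    then renormalize with a softmax. `vocab_scores` maps word -> base score;
    `used` is the set of ingredients already generated."""
    BONUS = 2.0  # illustrative bias toward unused ingredients
    scores = {w: s + (BONUS if w in INGREDIENTS and w not in used else 0.0)
              for w, s in vocab_scores.items()}
    z = sum(math.exp(s) for s in scores.values())
    return {w: math.exp(s) / z for w, s in scores.items()}

probs = checklist_decode({"flour": 1.0, "sugar": 1.0, "bake": 1.0},
                         used={"flour"})
# "sugar" is the only unused ingredient, so it now outranks "flour" and "bake"
```

The design choice here mirrors the coverage intuition: generation is steered by a shrinking checklist of materials, so a recipe cannot terminate with required ingredients left unmentioned.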