Step or Not: Discriminator for The Real Instructions in User-generated Recipes

Shintaro Inuzuka, Takahiko Ito, Jun Harashima
2018 Proceedings of the 2018 EMNLP Workshop W-NUT: The 4th Workshop on Noisy User-generated Text  
In a recipe sharing service, users publish recipe instructions as a series of steps. However, some of these "steps" are not actually part of the cooking process. Specifically, advertisements for the recipe itself (e.g., "introduced on TV") and comments (e.g., "Thanks for many messages") are often included in the step section, which the recipe author uses as a communication tool. Such fake steps can cause problems when recipes are indexed for search or read aloud by devices such as smart speakers. As presented in this talk, we have constructed a discriminator that distinguishes such fake steps from the steps actually used for cooking. This project includes, but is not limited to, the creation of annotated data by classifying and analyzing recipe steps and the construction of identification models. Our models use only text information to identify a step. In our tests, machine learning models achieved higher accuracy than rule-based methods that use manually chosen clue words.
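As a rough illustration of the rule-based baseline mentioned in the abstract, the following is a minimal sketch of a clue-word filter. The clue words, function names, and sample steps are assumptions made for this example (loosely based on the two phrases quoted in the abstract); they are not the authors' actual clue-word list or implementation.

```python
import re

# Hypothetical clue words, chosen by hand for illustration only.
# The record does not give the clue words actually used by the authors.
CLUE_WORDS = ("introduced on tv", "thanks for", "thank you")


def is_fake_step(step, clue_words=CLUE_WORDS):
    """Return True if the step text contains any clue word (case-insensitive)."""
    # Lowercase and collapse whitespace so matching is not layout-sensitive.
    normalized = re.sub(r"\s+", " ", step.lower()).strip()
    return any(clue in normalized for clue in clue_words)


def filter_real_steps(steps):
    """Keep only the steps that the clue-word rule does not flag as fake."""
    return [step for step in steps if not is_fake_step(step)]


if __name__ == "__main__":
    sample_steps = [
        "Chop the onions and fry until golden.",
        "This recipe was introduced on TV!",
        "Thanks for many messages.",
        "Simmer for 20 minutes.",
    ]
    for step in filter_real_steps(sample_steps):
        print(step)
```

A rule like this relies entirely on the hand-picked word list, which is the kind of limitation the abstract's comparison with machine learning models points at.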
doi:10.18653/v1/w18-6128 dblp:conf/aclnut/InuzukaIH18 fatcat:acp3p2npjfdgpapcgefbcbnmzi