Folktale Classification using Learning to Rank. Dong Nguyen, Dolf Trieschnigg, and Mariët Theune, University of Twente


Folktales. Fairy tales, riddles, legends, urban legends, jokes, etc.

Folktale researchers. Folktales are a resource to research: variation in tales; shifting moral values, beliefs, identities, etc.; intertextuality.

Classification systems. Folktale researchers have developed classification systems to compare and analyze stories and to organize stories. These systems group stories into story types.

The Vanishing Hitchhiker (BRUN 01000). A ghostly or heavenly hitchhiker that vanishes from a vehicle, sometimes after giving a warning or prophecy.

Stories of this type:

A car driver picks up a hitchhiker. They talk about spiritual topics in life. Suddenly the hitchhiker vanishes. The police tell him they have heard the story earlier that day.

A guy bikes through the park at night. He encounters a girl covered in blood. On their way to the police, she suddenly disappears. She resembles a murdered girl.

A car driver picks up a hitchhiker and lends her his sweater. When he stops by to pick up the sweater, he discovers she passed away due to a car accident a while ago. He finds his sweater on her grave.

A car driver picks up a girl wearing a white dress. He accidentally spills red wine on her dress. The next day he finds out she died a year ago. When the police open her grave, they find the white dress with the red wine spot.

Story Type Indexes: ATU. Aarne-Thompson-Uther classification system (ATU): Red Riding Hood (ATU 0333), The Race between Hare and Tortoise (ATU 0275A), etc.

Story Type Indexes: Brunvand. Urban legends: The Microwaved Pet (BRUN 02000), The Kidney Heist (BRUN 06305), The Killer in the Backseat (BRUN 01305), The Vanishing Hitchhiker (BRUN 01000), etc.

Automatic Identification of Story Types. Increasing digitalization. Discover relationships between stories.

Outline. Problem description, corpora. Experimental setup: baselines, Learning to Rank. Results: baselines, feature analysis, error analysis. Discussion/Conclusion.

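Goal and Evaluation. Given an input story, return a ranking of story types.

Semi-automatic setting: evaluated with Mean Reciprocal Rank (MRR), the average over input stories of $1/\mathrm{rank}_i$, where $\mathrm{rank}_i$ is the rank of the correct story type for story $i$.

Classification: simulating a classification setting. The highest ranked label is then taken as the predicted class. Evaluated with accuracy.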
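The two evaluation measures can be sketched in a few lines. This is a minimal illustration, not the authors' evaluation code: the function names and the toy rankings are assumptions made for the example, and a story type missing from a ranking is assumed to contribute a reciprocal rank of 0.

```python
from typing import List


def reciprocal_rank(ranking: List[str], gold: str) -> float:
    """Return 1/rank of the gold story type, or 0.0 if it is not in the ranking."""
    for position, label in enumerate(ranking, start=1):
        if label == gold:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings: List[List[str]], golds: List[str]) -> float:
    """Average the reciprocal rank over all input stories."""
    return sum(reciprocal_rank(r, g) for r, g in zip(rankings, golds)) / len(golds)


def accuracy(rankings: List[List[str]], golds: List[str]) -> float:
    """Take the highest ranked label as the predicted class."""
    hits = sum(1 for r, g in zip(rankings, golds) if r and r[0] == g)
    return hits / len(golds)


# Toy data (hypothetical): two input stories with their ranked story types.
rankings = [
    ["BRUN 01000", "BRUN 02000", "BRUN 01305"],
    ["BRUN 02000", "BRUN 01305", "BRUN 01000"],
]
golds = ["BRUN 01000", "BRUN 01305"]

mrr = mean_reciprocal_rank(rankings, golds)  # (1/1 + 1/2) / 2
acc = accuracy(rankings, golds)              # only the first story is correct at rank 1
print(mrr, acc)
```

MRR rewards a correct story type that appears near the top even when it is not first, which matches the semi-automatic use case; accuracy only counts the top position.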

Folktale database. Dutch Folktale Database (http://www.verhalenbank.nl). Over 42,000 stories. In our experiments: only stories written in standard Dutch.

Dataset I. [Fig. 1. Story type frequencies for (a) ATU and (b) Brunvand; axes: number of stories and number of story types.]

Dataset II.

ATU
                  Index  Train  Dev  Test
Nr documents        400     75   25    50
Nr story types       98     59   24    43

Brunvand
                  Index  Train  Dev  Test
Nr documents        687    175   50    75
Nr story types      125     92   40    50

Baselines: big doc. [Diagram: the input document (query) is matched against one document per story type (Vanishing Hitchhiker, Microwaved Pet, Killer in the Backseat), which yields a ranking of story types.]

Baselines: small doc. [Diagram: the input document (query) is matched against individual stories, each labelled with its story type (Vanishing Hitchhiker, Microwaved Pet, Killer in the Backseat), which yields a ranking.] When taking the top ranked label as the class, this is the same as a Nearest Neighbour classifier (k=1).
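The difference between the two baselines can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the big document is assumed to be the concatenation of all indexed stories of one story type, the scorer is a plain textbook BM25 with commonly used default parameters (`k1=1.2`, `b=0.75`), and the toy stories and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import List, Tuple

Story = Tuple[List[str], str]  # (tokens, story type label)


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Score every tokenized document against the query with a plain BM25."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if term in tf:
                idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
                norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
                score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rank_labels(scores: List[float], labels: List[str]) -> List[str]:
    """Sort labels by descending score, keeping the first occurrence of each."""
    ranking, seen = [], set()
    for _, label in sorted(zip(scores, labels), key=lambda pair: -pair[0]):
        if label not in seen:
            seen.add(label)
            ranking.append(label)
    return ranking


def small_doc_ranking(query: List[str], stories: List[Story]) -> List[str]:
    """Small doc: every indexed story is its own document."""
    docs = [tokens for tokens, _ in stories]
    labels = [label for _, label in stories]
    return rank_labels(bm25_scores(query, docs), labels)


def big_doc_ranking(query: List[str], stories: List[Story]) -> List[str]:
    """Big doc: all stories of one story type are merged into one document."""
    merged = defaultdict(list)
    for tokens, label in stories:
        merged[label].extend(tokens)
    labels = list(merged)
    return rank_labels(bm25_scores(query, [merged[l] for l in labels]), labels)


# Toy index (hypothetical, English for readability).
stories = [
    ("driver picks up hitchhiker who vanishes".split(), "Vanishing Hitchhiker"),
    ("girl hitchhiker vanishes sweater found on grave".split(), "Vanishing Hitchhiker"),
    ("pet dried in microwave".split(), "Microwaved Pet"),
    ("killer hides in backseat of car".split(), "Killer in the Backseat"),
]
query = "hitchhiker vanishes from car".split()

small = small_doc_ranking(query, stories)
big = big_doc_ranking(query, stories)
print(small)
print(big)
```

In the small doc variant the top label comes from the single most similar story, which is why it behaves like a 1-nearest-neighbour classifier; the big doc variant instead compares the query with everything known about a story type at once.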

Results - Baseline.

ATU
            MRR     Accuracy
Smalldoc    0.7779  0.72
Bigdoc      0.4423  0.36

Brunvand
            MRR     Accuracy
Smalldoc    0.6430  0.56
Bigdoc      0.6411  0.56

Learning to Rank.
1. Retrieve an initial set of candidate stories (small document baseline).
2. Apply learning to rank to rerank the top 50 candidates.
3. Create a final ranked list of story types, by taking the corresponding labels of the ranked stories and removing duplicates.
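The three steps above can be sketched as a small pipeline. The slides do not name the learning-to-rank model, so the weighted sum of features used here is an assumption standing in for whatever model is trained; the feature values, weights, story ids, and function name are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Candidate = Tuple[str, str]  # (story id, story type label)


def rerank_story_types(
    candidates: Sequence[Candidate],
    features: Callable[[str], Sequence[float]],
    weights: Sequence[float],
    top_k: int = 50,
) -> List[str]:
    """Rerank the top_k retrieved stories and collapse them to story types."""
    # Step 1 output: candidates are assumed to be sorted by retrieval score.
    head = list(candidates[:top_k])

    # Step 2: score each candidate with the (assumed linear) ranking model.
    def model_score(candidate: Candidate) -> float:
        story_id, _ = candidate
        return sum(w * x for w, x in zip(weights, features(story_id)))

    reranked = sorted(head, key=model_score, reverse=True)

    # Step 3: map stories to their labels and drop duplicate story types.
    ranking, seen = [], set()
    for _, label in reranked:
        if label not in seen:
            seen.add(label)
            ranking.append(label)
    return ranking


# Toy input (hypothetical): four retrieved stories with two features each.
candidates = [
    ("s1", "Microwaved Pet"),
    ("s2", "Vanishing Hitchhiker"),
    ("s3", "Vanishing Hitchhiker"),
    ("s4", "Killer in the Backseat"),
]
feature_table = {
    "s1": (0.9, 0.1),
    "s2": (0.8, 0.9),
    "s3": (0.5, 0.8),
    "s4": (0.4, 0.2),
}
weights = (0.3, 0.7)

final_ranking = rerank_story_types(candidates, feature_table.__getitem__, weights)
print(final_ranking)
```

Because duplicates are removed only after reranking, a story type is placed at the position of its best-scoring story.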

Features I.

Small Document Scores (IR): indicates the score of the query on the candidate stories. Full text (BM25 - full text), only nouns (BM25 - nouns) and only verbs (BM25 - verbs).

Big Document Scores (Bigdoc): similarity to all stories of the candidate's story type (bigdoc). Full text (Bigdoc - BM25 - full text), only nouns (Bigdoc - BM25 - nouns) and only verbs (Bigdoc - BM25 - verbs).

Lexical Similarity (LS): Jaccard and TFIDF similarity, calculated on the following token types: unigrams, bigrams, character n-grams (2-5), chunks, named entities, time and locations.
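The lexical similarity features can be illustrated with a short sketch. It covers only unigrams and character n-grams; the smoothed IDF formula, the use of cosine similarity for the TFIDF variant, and all names and example sentences are assumptions made for the illustration, since the slides do not give these details.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def char_ngrams(text: str, n_min: int = 2, n_max: int = 5) -> List[str]:
    """All character n-grams of length n_min..n_max."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Size of the intersection divided by the size of the union."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def tfidf_vector(tokens: List[str], doc_freq: Dict[str, int], n_docs: int) -> Dict[str, float]:
    """Term frequency times a smoothed inverse document frequency (assumed form)."""
    tf = Counter(tokens)
    return {t: c * (math.log((1 + n_docs) / (1 + doc_freq.get(t, 0))) + 1)
            for t, c in tf.items()}


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


# Toy example (hypothetical sentences).
query = "the hitchhiker vanishes".split()
candidate = "the hitchhiker disappears".split()
other = "the pet microwave".split()

collection = [query, candidate, other]
doc_freq = Counter(t for d in collection for t in set(d))

unigram_jaccard = jaccard(query, candidate)
ngram_jaccard = jaccard(char_ngrams(" ".join(query)), char_ngrams(" ".join(candidate)))
unigram_tfidf = cosine(tfidf_vector(query, doc_freq, len(collection)),
                       tfidf_vector(candidate, doc_freq, len(collection)))
print(unigram_jaccard, ngram_jaccard, unigram_tfidf)
```

The same two similarity functions can be applied to any of the listed token types once the text has been turned into the corresponding token list.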

Features II. Verb(Subject, Object) triplets, for example Lives(princess, castle) and Disappear(driver, ). Extracted based on dependencies obtained using the Frog parser.

Features III.

Matches: Exact, Subject-Object, Verb-Object, Subject-Verb.

Abstraction: VerbNet. Example: consider-29.9 class in VerbNet: achten (esteem), bevinden (find), inzien (realise), menen (think/believe), veronderstellen (presume), kennen (know), wanen (falsely believe), denken (think).
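The match types and the verb abstraction can be sketched as follows. This is a simplified illustration: triplets are written by hand rather than extracted with a parser, a missing subject or object is stored as `None` and compared as-is, similarity is computed with Jaccard, and the `verb_classes` mapping contains only two verbs from the consider-29.9 example. All names are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Triplet(NamedTuple):
    verb: str
    subject: Optional[str]
    obj: Optional[str]


# Which slots each match type compares.
MATCH_TYPES = {
    "exact": ("verb", "subject", "obj"),
    "subject-object": ("subject", "obj"),
    "verb-object": ("verb", "obj"),
    "subject-verb": ("subject", "verb"),
}


def triplet_features(
    query: Iterable[Triplet],
    candidate: Iterable[Triplet],
    verb_classes: Optional[Dict[str, str]] = None,
) -> Dict[str, float]:
    """Jaccard similarity between two triplet sets for every match type."""
    verb_classes = verb_classes or {}

    def abstract(t: Triplet) -> Triplet:
        # Replace the verb by its class when one is known (abstraction).
        return t._replace(verb=verb_classes.get(t.verb, t.verb))

    query, candidate = list(query), list(candidate)
    features = {}
    for name, slots in MATCH_TYPES.items():
        q = {tuple(getattr(abstract(t), s) for s in slots) for t in query}
        c = {tuple(getattr(abstract(t), s) for s in slots) for t in candidate}
        union = q | c
        features[name] = len(q & c) / len(union) if union else 0.0
    return features


# Toy triplets (hypothetical) for two hitchhiker stories.
query_triplets = [
    Triplet("pick_up", "driver", "hitchhiker"),
    Triplet("disappear", "hitchhiker", None),
]
candidate_triplets = [
    Triplet("pick_up", "driver", "girl"),
    Triplet("disappear", "girl", None),
]
feats = triplet_features(query_triplets, candidate_triplets)
print(feats)

# Abstraction: two Dutch verbs from the same VerbNet class count as a match.
classes = {"denken": "consider-29.9", "menen": "consider-29.9"}
plain = triplet_features([Triplet("denken", "man", None)],
                         [Triplet("menen", "man", None)])
abstracted = triplet_features([Triplet("denken", "man", None)],
                              [Triplet("menen", "man", None)], classes)
print(plain["exact"], abstracted["exact"])
```

Partial matches let two stories score above zero when they share, say, a subject and verb but differ in the object, which an exact comparison would miss.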

Feature Analysis I.

ATU
                       MRR     Accuracy
Baseline (smalldoc)    0.7779  0.72
+ Bigdoc               0.8367  0.78
+ IR                   0.8049  0.76
+ LS                   0.7921  0.72
+ Triplets             0.8016  0.72
All                    0.8569  0.82

Brunvand
                       MRR     Accuracy
Baseline (smalldoc)    0.6430  0.56
+ Bigdoc               0.7933  0.72
+ IR                   0.7247  0.61
+ LS                   0.6810  0.60
+ Triplets             0.6600  0.59
All                    0.8132  0.76

Feature Analysis II.

ATU
Feature                                        Weight
Bigdoc: BM25 - nouns                           0.179
Bigdoc: BM25 - full text                       0.158
LS: unigrams - TFIDF                           0.109
Bigdoc: BM25 - verbs                           0.069
Triplets: SO match, Jaccard, no abstraction    0.063

Brunvand
Feature                                        Weight
Bigdoc: BM25 - full text                       0.209
Bigdoc: BM25 - nouns                           0.204
LS: unigrams - TFIDF                           0.065
IR: BM25 - nouns                               0.062
Bigdoc: BM25 - verbs                           0.051

Error analysis. Matches based on style instead of actual plot: a distinguishing narrator, or very short stories. ATU: also matched on content words that were not core to the plot. This happened in particular with long stories.

Discussion. High MRR: suitable for a semi-automatic setting, where annotators are presented with a ranked list. What's next? Other type indexes, dialects, historical variants.

Summary. Classification of story types. Two story type indexes: ATU and Brunvand. Nearest Neighbor using a Learning to Rank approach. Combining a small document and a big document model was very effective.