Folktale Classification using Learning to Rank Dong Nguyen, Dolf Trieschnigg, and Mariët Theune University of Twente
Folktales Fairy tales, riddles, legends, urban legends, jokes, etc.
Folktale researchers Folktales are a resource to research Variation in tales Shifting moral values, beliefs, identities etc. Intertextuality
Classification systems Folktale researchers have developed classification systems To compare and analyze stories To organize stories Story types
The Vanishing Hitchhiker (BRUN 01000): A ghostly or heavenly hitchhiker that vanishes from a vehicle, sometimes after giving a warning or prophecy.
Example stories of this type:
A car driver picks up a hitchhiker. They talk about spiritual topics in life. Suddenly the hitchhiker vanishes. The police tell him they have heard the story earlier that day.
A guy bikes through the park at night. He encounters a girl covered in blood. On their way to the police, she suddenly disappears. She resembles a murdered girl.
A car driver picks up a hitchhiker and lends her his sweater. When he stops by to pick up the sweater, he discovers she passed away in a car accident a while ago. He finds his sweater on her grave.
A car driver picks up a girl wearing a white dress. He accidentally spills red wine on her dress. The next day he finds out she died a year ago. When the police open her grave, they find the white dress with the red wine spot.
Story Type Indexes: ATU Aarne-Thompson-Uther classification system (ATU) Red Riding Hood (ATU 0333), The Race between Hare and Tortoise (ATU 0275A), etc.
Story Type Indexes: Brunvand Urban legends The Microwaved Pet (BRUN 02000), The Kidney Heist (BRUN 06305), The Killer in the Backseat (BRUN 01305), The Vanishing Hitchhiker (BRUN 01000), etc.
Automatic Identification of Story Types Increasing digitalization Discover relationships between stories
Outline
Problem description, corpora
Experimental setup: baselines, learning to rank
Results: baselines, feature analysis, error analysis
Discussion/Conclusion
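Goal and Evaluation Given an input story, return a ranking of story types. Semi-automatic setting: Mean Reciprocal Rank (MRR), the average over input stories of $\frac{1}{\mathrm{rank}_i}$, where $\mathrm{rank}_i$ is the rank of the correct story type for input story $i$. Classification: simulating a classification setting, the highest ranked label is taken as the predicted class. Measure: Accuracy.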
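The two evaluation measures can be sketched as follows. This is a minimal illustration, not the authors' evaluation code; the function names and the toy rankings are assumptions made for this example.

```python
from typing import Sequence


def reciprocal_rank(ranking: Sequence[str], gold: str) -> float:
    """Return 1/rank of the gold story type, or 0.0 if it is not ranked."""
    for position, label in enumerate(ranking, start=1):
        if label == gold:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(rankings, golds) -> float:
    """Average the reciprocal rank over all input stories."""
    return sum(reciprocal_rank(r, g) for r, g in zip(rankings, golds)) / len(golds)


def accuracy(rankings, golds) -> float:
    """Treat the top-ranked story type as the predicted class."""
    hits = sum(1 for r, g in zip(rankings, golds) if r and r[0] == g)
    return hits / len(golds)


# Toy data (hypothetical): two input stories with their ranked story types.
rankings = [
    ["BRUN 01000", "BRUN 02000", "BRUN 01305"],  # gold at rank 1
    ["BRUN 02000", "BRUN 01305", "BRUN 01000"],  # gold at rank 2
]
golds = ["BRUN 01000", "BRUN 01305"]

print(mean_reciprocal_rank(rankings, golds))  # (1 + 1/2) / 2 = 0.75
print(accuracy(rankings, golds))              # 1 of 2 correct = 0.5
```

MRR rewards a correct story type that appears near the top even when it is not first, which fits the semi-automatic setting; accuracy only counts the first position.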
Folktale database Dutch Folktale Database (http://www.verhalenbank.nl) Over 42,000 stories In our experiments: only stories written in standard Dutch
Dataset I Fig. 1. Story type frequencies (number of story types against number of stories), for (a) ATU and (b) Brunvand.
Dataset II
ATU               Index   Train   Dev   Test
Nr documents        400      75    25     50
Nr story types       98      59    24     43
Brunvand          Index   Train   Dev   Test
Nr documents        687     175    50     75
Nr story types      125      92    40     50
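Baselines: big doc The input document is used as a query against one big document per story type (e.g. Vanishing Hitchhiker, Microwaved Pet, Killer in the Backseat), each combining all stories of that type. The ranking of the big documents is the ranking of story types.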
Baselines: small doc The input document is used as a query against the individual stories, each labeled with its story type (e.g. Vanishing Hitchhiker, Microwaved Pet, Killer in the Backseat). The labels of the ranked stories give the ranking of story types. When taking the top ranked label as the class, this is the same as a Nearest Neighbour classifier (k=1).
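A rough sketch of the two baselines is given below. It assumes BM25 as the scoring function with illustrative parameter values (k1, b), whitespace tokenization, and invented toy stories; none of these details are confirmed by the slides.

```python
import math
from collections import Counter, defaultdict


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every tokenized document against the tokenized query (BM25)."""
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in set(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def ranked_labels(labels, scores):
    """Sort labels by descending score and keep the first occurrence of each."""
    order = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
    seen, ranking = set(), []
    for i in order:
        if labels[i] not in seen:
            seen.add(labels[i])
            ranking.append(labels[i])
    return ranking


def smalldoc_ranking(query, stories):
    """Small doc: every story is its own document."""
    labels = [label for label, _ in stories]
    return ranked_labels(labels, bm25_scores(query, [t for _, t in stories]))


def bigdoc_ranking(query, stories):
    """Big doc: all stories of a story type are merged into one document."""
    merged = defaultdict(list)
    for label, tokens in stories:
        merged[label].extend(tokens)
    labels = list(merged)
    return ranked_labels(labels, bm25_scores(query, [merged[l] for l in labels]))


# Toy collection (hypothetical stories, whitespace tokenization).
stories = [
    ("Vanishing Hitchhiker", "driver picks up hitchhiker who vanishes from the car".split()),
    ("Vanishing Hitchhiker", "girl on the road disappears and the sweater lies on her grave".split()),
    ("Microwaved Pet", "woman dries her wet cat in the microwave".split()),
    ("Killer in the Backseat", "killer hides in the backseat of the car of a woman".split()),
]
query = "hitchhiker vanishes from the car".split()

print(smalldoc_ranking(query, stories))
print(bigdoc_ranking(query, stories))
```

In the small doc variant a single close match decides the label, while the big doc variant pools evidence across all stories of a type.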
Results - Baseline
ATU         MRR      Accuracy
Smalldoc    0.7779   0.72
Bigdoc      0.4423   0.36
Brunvand    MRR      Accuracy
Smalldoc    0.6430   0.56
Bigdoc      0.6411   0.56
Learning to Rank
1. Retrieve an initial set of candidate stories (small document baseline).
2. Apply learning to rank to rerank the top 50 candidates.
3. Create a final ranked list of story types, by taking the corresponding labels of the ranked stories and removing duplicates.
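Steps 2 and 3 can be sketched as below. The slides do not name the learning to rank algorithm, so a linear scoring function with hand-set weights stands in for the learned model here; the feature names, weights and candidates are hypothetical, and candidates beyond the top k are simply dropped in this sketch.

```python
from typing import Dict, List, Tuple

# A candidate story: (story type label, feature name -> feature value).
Candidate = Tuple[str, Dict[str, float]]


def rerank(candidates: List[Candidate], weights: Dict[str, float],
           top_k: int = 50) -> List[Candidate]:
    """Rerank the top_k initially retrieved candidates with a linear model."""
    def score(candidate: Candidate) -> float:
        return sum(weights.get(name, 0.0) * value
                   for name, value in candidate[1].items())
    return sorted(candidates[:top_k], key=score, reverse=True)


def story_type_ranking(ranked: List[Candidate]) -> List[str]:
    """Take the labels of the ranked stories and remove duplicates."""
    seen, ranking = set(), []
    for label, _ in ranked:
        if label not in seen:
            seen.add(label)
            ranking.append(label)
    return ranking


# Candidates in their initial (small document) retrieval order.
candidates = [
    ("Killer in the Backseat", {"smalldoc": 0.9, "bigdoc": 0.2}),
    ("Vanishing Hitchhiker", {"smalldoc": 0.8, "bigdoc": 0.9}),
    ("Vanishing Hitchhiker", {"smalldoc": 0.7, "bigdoc": 0.9}),
    ("Microwaved Pet", {"smalldoc": 0.1, "bigdoc": 0.1}),
]
weights = {"smalldoc": 0.5, "bigdoc": 0.5}  # illustrative, not learned

print(story_type_ranking(rerank(candidates, weights)))
```

In this toy case the big document score moves the Vanishing Hitchhiker stories above the candidate that the initial retrieval ranked first.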
Features I
Small Document Scores (IR): the score of the query on the candidate stories. Full text (BM25 - full text), only nouns (BM25 - nouns) and only verbs (BM25 - verbs).
Big Document Scores (Bigdoc): similarity to all stories of the candidate's story type (bigdoc). Full text (Bigdoc - BM25 - full text), only nouns (Bigdoc - BM25 - nouns) and only verbs (Bigdoc - BM25 - verbs).
Lexical Similarity (LS): Jaccard and TFIDF similarity, calculated on the following token types: unigrams, bigrams, character n-grams (2-5), chunks, named entities, time and locations.
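The lexical similarity features can be illustrated with a short sketch. It covers only unigrams, bigrams and character n-grams; the IDF smoothing, the cosine formulation of TFIDF similarity and all function names are assumptions for this example, since the slides do not specify them.

```python
import math
from collections import Counter


def word_ngrams(tokens, n):
    """Word n-grams as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n_min=2, n_max=5):
    """Character n-grams for every length from n_min to n_max."""
    return [text[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def jaccard(a, b):
    """Size of the intersection over size of the union of two item sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def idf_table(collection):
    """Smoothed IDF (assumed variant): log(N / df) + 1."""
    n = len(collection)
    df = Counter(item for doc in collection for item in set(doc))
    return {item: math.log(n / count) + 1.0 for item, count in df.items()}


def tfidf_cosine(a, b, idf):
    """Cosine similarity between TF-IDF weighted item vectors."""
    va = {t: c * idf.get(t, 1.0) for t, c in Counter(a).items()}
    vb = {t: c * idf.get(t, 1.0) for t, c in Counter(b).items()}
    dot = sum(w * vb.get(t, 0.0) for t, w in va.items())
    na = math.sqrt(sum(w * w for w in va.values()))
    nb = math.sqrt(sum(w * w for w in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


query = "the hitchhiker vanishes".split()
candidate = "the hitchhiker disappears".split()
idf = idf_table([query, candidate])

print(jaccard(query, candidate))                                  # 2 shared of 4 unigrams = 0.5
print(jaccard(word_ngrams(query, 2), word_ngrams(candidate, 2)))  # 1 shared of 3 bigrams
print(tfidf_cosine(query, candidate, idf))
print(jaccard(char_ngrams("vanishes"), char_ngrams("vanished")))
```

Each token type yields one Jaccard and one TFIDF feature value per query and candidate pair.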
Features II Verb(Subject, Object) triplets, e.g. Lives(princess, castle), Disappear(driver, ). Extracted based on dependencies obtained using the Frog parser.
Features III Matches Exact, Subject-Object, Verb-Object, Subject-Verb Abstraction VerbNet Example: consider-29.9 class in VerbNet: achten (esteem), bevinden (find), inzien (realise), menen (think/believe), veronderstellen (presume), kennen (know), wanen (falsely believe), denken (think)
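The match types and the VerbNet abstraction can be sketched as follows, using Jaccard overlap between the triplet sets of two stories. The triplets are written by hand here rather than extracted with Frog, the tiny verb class mapping is only an excerpt of the example class on the slide, and the handling of missing subjects or objects is an assumption.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# (verb, subject, object); subject or object may be missing (None).
Triplet = Tuple[str, Optional[str], Optional[str]]

# Which positions of (verb, subject, object) each match type compares.
PROJECTIONS = {
    "exact": (0, 1, 2),
    "subject-object": (1, 2),
    "verb-object": (0, 2),
    "subject-verb": (1, 0),
}


def project(triplets: Iterable[Triplet], kind: str,
            verb_classes: Optional[Dict[str, str]] = None) -> Set[tuple]:
    """Reduce triplets to the parts compared by one match type."""
    keys = set()
    for verb, subj, obj in triplets:
        if verb_classes:
            # Abstraction: replace the verb by its class when one is known.
            verb = verb_classes.get(verb, verb)
        parts = (verb, subj, obj)
        key = tuple(parts[i] for i in PROJECTIONS[kind])
        # Assumption: partial matches need both compared parts to be present.
        if kind != "exact" and None in key:
            continue
        keys.add(key)
    return keys


def triplet_jaccard(a, b, kind, verb_classes=None) -> float:
    """Jaccard overlap between the projected triplet sets of two stories."""
    ka, kb = project(a, kind, verb_classes), project(b, kind, verb_classes)
    return len(ka & kb) / len(ka | kb) if ka | kb else 0.0


query = [("lives", "princess", "castle"), ("disappear", "driver", None)]
candidate = [("lives", "princess", "tower"), ("disappear", "driver", None)]
for kind in PROJECTIONS:
    print(kind, triplet_jaccard(query, candidate, kind))

# Abstraction with an excerpt of the consider-29.9 class from the slide.
classes = {"denken": "consider-29.9", "menen": "consider-29.9"}
q2, c2 = [("denken", "koning", None)], [("menen", "koning", None)]
print(triplet_jaccard(q2, c2, "subject-verb"))           # different verbs
print(triplet_jaccard(q2, c2, "subject-verb", classes))  # same verb class
```

Without abstraction the two Dutch verbs do not match; mapping both to their shared class lets the subject-verb pair match.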
Feature Analysis I
ATU                    MRR      Accuracy
Baseline (smalldoc)    0.7779   0.72
+ Bigdoc               0.8367   0.78
+ IR                   0.8049   0.76
+ LS                   0.7921   0.72
+ Triplets             0.8016   0.72
All                    0.8569   0.82
Brunvand               MRR      Accuracy
Baseline (smalldoc)    0.6430   0.56
+ Bigdoc               0.7933   0.72
+ IR                   0.7247   0.61
+ LS                   0.6810   0.60
+ Triplets             0.6600   0.59
All                    0.8132   0.76
Feature Analysis II
ATU: Feature                                   Weight
Bigdoc: BM25 - nouns                           0.179
Bigdoc: BM25 - full text                       0.158
LS: unigrams - TFIDF                           0.109
Bigdoc: BM25 - verbs                           0.069
Triplets: SO match, Jaccard, no abstraction    0.063
Brunvand: Feature                              Weight
Bigdoc: BM25 - full text                       0.209
Bigdoc: BM25 - nouns                           0.204
LS: unigrams - TFIDF                           0.065
IR: BM25 - nouns                               0.062
Bigdoc: BM25 - verbs                           0.051
Error analysis Matches based on style instead of actual plot: distinguishing narrator, or very short stories. ATU: also matched on content words that were not core to the plot. This happened in particular with long stories.
Discussion High MRR: suitable for a semi-automatic setting, where annotators are presented with a ranked list. What's next? Other type indexes, dialects, historical variants.
Summary Classification of story types. Two story type indexes: ATU and Brunvand. Nearest Neighbor using Learning to Rank approach. Combining a small document and big document model was very effective.