TrueView Features

Amanda J. Minnich and Dr. Abdullah Mueen
University of New Mexico
{aminnich, mueen}@cs.unm.edu
March 9, 2015

This document describes the features used by our outlier detection algorithms to calculate the TrueView scores. It accompanies the paper "TrueView: Harnessing the Power of Multiple Review Sites", http://dx.doi.org/10.1145/2736277.2741655

In the feature names, the suffixes b, h, and t denote Booking.com, Hotels.com, and TripAdvisor.com, respectively.

id: Hotel ID number
star_b: Hotel rating on Booking.com, normalized to a 1-5 scale
star_h: Hotel rating on Hotels.com
star_t: Hotel rating on TripAdvisor.com
star_int_b: Hotel rating on Booking.com as an integer
star_int_h: Hotel rating on Hotels.com as an integer
star_int_t: Hotel rating on TripAdvisor.com as an integer
num_reviews_b: Number of reviews on Booking.com
num_reviews_h: Number of reviews on Hotels.com
num_reviews_t: Number of reviews on TripAdvisor.com
cleanliness_b: Hotel's cleanliness rating on Booking.com

service_b: Hotel's service rating on Booking.com
comfort_b: Hotel's comfort rating on Booking.com
condition_b: Hotel's condition rating on Booking.com
neighborhood_b: Hotel's neighborhood rating on Booking.com
value_b: Hotel's value rating on Booking.com
cleanliness_h: Hotel's cleanliness rating on Hotels.com
service_h: Hotel's service rating on Hotels.com
comfort_h: Hotel's comfort rating on Hotels.com
condition_h: Hotel's condition rating on Hotels.com
neighborhood_h: Hotel's neighborhood rating on Hotels.com
num5_b: Number of 5-star ratings on Booking.com
num4_b: Number of 4-star ratings on Booking.com
num3_b: Number of 3-star ratings on Booking.com
num2_b: Number of 2-star ratings on Booking.com
num1_b: Number of 1-star ratings on Booking.com
num5_h: Number of 5-star ratings on Hotels.com
num4_h: Number of 4-star ratings on Hotels.com
num3_h: Number of 3-star ratings on Hotels.com
num2_h: Number of 2-star ratings on Hotels.com
num1_h: Number of 1-star ratings on Hotels.com
num5_t: Number of 5-star ratings on TripAdvisor.com
num4_t: Number of 4-star ratings on TripAdvisor.com
num3_t: Number of 3-star ratings on TripAdvisor.com

num2_t: Number of 2-star ratings on TripAdvisor.com
num1_t: Number of 1-star ratings on TripAdvisor.com
review_mean_b: Mean of review ratings on Booking.com
review_std_b: Standard deviation of review ratings on Booking.com
review_mean_h: Mean of review ratings on Hotels.com
review_std_h: Standard deviation of review ratings on Hotels.com
review_mean_t: Mean of review ratings on TripAdvisor.com
review_std_t: Standard deviation of review ratings on TripAdvisor.com
title_length_mean_h: Mean length of review titles on Hotels.com
title_length_std_h: Standard deviation of review title length on Hotels.com
title_length_mean_t: Mean length of review titles on TripAdvisor.com
title_length_std_t: Standard deviation of review title length on TripAdvisor.com
review_length_mean_h: Mean length of review text on Hotels.com
review_length_std_h: Standard deviation of review text length on Hotels.com
review_length_mean_t: Mean length of review text on TripAdvisor.com
review_length_std_t: Standard deviation of review text length on TripAdvisor.com
good_length_mean_b: Mean length of positive review comments on Booking.com
good_length_std_b: Standard deviation of positive review comment length on Booking.com
bad_length_mean_b: Mean length of negative review comments on Booking.com

bad_length_std_b: Standard deviation of negative review comment length on Booking.com
cleanliness_review_mean_h: Mean of review cleanliness ratings on Hotels.com
cleanliness_review_std_h: Standard deviation of review cleanliness ratings on Hotels.com
service_review_mean_h: Mean of review service ratings on Hotels.com
service_review_std_h: Standard deviation of review service ratings on Hotels.com
comfort_review_mean_h: Mean of review comfort ratings on Hotels.com
comfort_review_std_h: Standard deviation of review comfort ratings on Hotels.com
condition_review_mean_h: Mean of review condition ratings on Hotels.com
condition_review_std_h: Standard deviation of review condition ratings on Hotels.com
neighborhood_review_mean_h: Mean of review neighborhood ratings on Hotels.com
neighborhood_review_std_h: Standard deviation of review neighborhood ratings on Hotels.com
num_good_reviews_b: Number of 5-star reviews on Booking.com
num_avg_reviews_b: Number of 2-, 3-, and 4-star reviews on Booking.com
num_bad_reviews_b: Number of 1-star reviews on Booking.com
num_good_reviews_h: Number of 5-star reviews on Hotels.com
num_avg_reviews_h: Number of 2-, 3-, and 4-star reviews on Hotels.com
num_bad_reviews_h: Number of 1-star reviews on Hotels.com

num_good_reviews_t: Number of 5-star reviews on TripAdvisor.com
num_avg_reviews_t: Number of 2-, 3-, and 4-star reviews on TripAdvisor.com
num_bad_reviews_t: Number of 1-star reviews on TripAdvisor.com
num_good_then_bad_b: Number of 5-star reviews followed by a 1-star review on Booking.com
num_bad_then_good_b: Number of 1-star reviews followed by a 5-star review on Booking.com
num_good_then_bad_h: Number of 5-star reviews followed by a 1-star review on Hotels.com
num_bad_then_good_h: Number of 1-star reviews followed by a 5-star review on Hotels.com
num_good_then_bad_t: Number of 5-star reviews followed by a 1-star review on TripAdvisor.com
num_bad_then_good_t: Number of 1-star reviews followed by a 5-star review on TripAdvisor.com
num_empty_b: Number of empty reviews on Booking.com
num_empty_h: Number of empty reviews on Hotels.com
num_empty_t: Number of empty reviews on TripAdvisor.com
num_susp_zip_reviews: Number of reviews written by users who have written more than 5 reviews in one postal code on TripAdvisor.com
num_susp_date_reviews: Number of reviews written by users who have written more than 3 reviews on the same day on TripAdvisor.com
burst_b: Max(number of reviews in a given day) - Avg(number of reviews per day) on Booking.com
burst_h: Max(number of reviews in a given day) - Avg(number of reviews per day) on Hotels.com
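As an illustration of the burst and good-then-bad/bad-then-good definitions, the following is a minimal Python sketch for one hotel on one site. It is not the authors' implementation. Two points are assumptions, because the definitions do not spell them out: the average is taken over days that have at least one review, and "followed by" means the immediately next review in chronological order. The function names and the sample data are hypothetical.

```python
from collections import Counter
from datetime import date


def burst(review_dates):
    """Max reviews on a single day minus the average reviews per day.

    Assumption: the average runs over days with at least one review.
    """
    if not review_dates:
        return 0.0
    counts = Counter(review_dates).values()
    return max(counts) - sum(counts) / len(counts)


def count_transitions(ratings, first, second):
    """Count adjacent pairs (first, second) in a chronologically ordered list.

    Assumption: "followed by" means the very next review.
    """
    return sum(1 for a, b in zip(ratings, ratings[1:]) if a == first and b == second)


# Hypothetical review stream: (review date, star rating)
reviews = [
    (date(2014, 1, 1), 5),
    (date(2014, 1, 1), 1),
    (date(2014, 1, 1), 5),
    (date(2014, 1, 2), 5),
    (date(2014, 1, 3), 1),
    (date(2014, 1, 3), 5),
]
reviews.sort(key=lambda r: r[0])  # stable sort keeps same-day order

ratings = [stars for _, stars in reviews]
burst_value = burst([day for day, _ in reviews])
num_good_then_bad = count_transitions(ratings, 5, 1)
num_bad_then_good = count_transitions(ratings, 1, 5)
```

With this sample, the busiest day has 3 reviews and the three active days average 2 reviews, so the burst value is 1.0; each transition count is 2.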

burst_t: Max(number of reviews in a given day) - Avg(number of reviews per day) on TripAdvisor.com
text_sim_b: Measure of the number of repeated sentences per reviewer, aggregated for a given hotel on Booking.com
text_sim_h: Measure of the number of repeated sentences per reviewer, aggregated for a given hotel on Hotels.com
text_sim_t: Measure of the number of repeated sentences per reviewer, aggregated for a given hotel on TripAdvisor.com
clique_size: Measure of the maximum number of users that all rated the same group of hotels on TripAdvisor.com

1 Multi-site features

All of these features are combinations of the ones described above. In a feature name, div means that the feature is the quotient of the normalized values for the two named sites.

star_int_b_div_h
star_int_b_div_t
star_int_h_div_t
num_reviews_b_div_h
num_reviews_b_div_t
num_reviews_h_div_t
cleanliness_b_div_h
service_b_div_h
comfort_b_div_h
condition_b_div_h

neighborhood_b_div_h
num5_b_div_h
num4_b_div_h
num3_b_div_h
num2_b_div_h
num1_b_div_h
num5_b_div_t
num4_b_div_t
num3_b_div_t
num2_b_div_t
num1_b_div_t
num5_h_div_t
num4_h_div_t
num3_h_div_t
num2_h_div_t
num1_h_div_t
num_avg_reviews_b_div_h
num_avg_reviews_b_div_t
num_avg_reviews_h_div_t
num_good_then_bad_b_div_h
num_good_then_bad_b_div_t
num_good_then_bad_h_div_t
num_bad_then_good_b_div_h

num_bad_then_good_b_div_t
num_bad_then_good_h_div_t
num_empty_b_div_h
num_empty_b_div_t
num_empty_h_div_t
burst_b_div_h
burst_b_div_t
burst_h_div_t
text_sim_b_div_h
text_sim_b_div_t
text_sim_h_div_t
rating_correlation_b_h: Correlation coefficient between a hotel's rating distributions on Booking.com and Hotels.com
rating_correlation_b_t: Correlation coefficient between a hotel's rating distributions on Booking.com and TripAdvisor.com
rating_correlation_h_t: Correlation coefficient between a hotel's rating distributions on Hotels.com and TripAdvisor.com
Mann_Whitney_U_test_b_h_rating_distribution: p-value of the Mann-Whitney rank test comparing a hotel's rating distributions on Booking.com and Hotels.com
Mann_Whitney_U_test_b_t_rating_distribution: p-value of the Mann-Whitney rank test comparing a hotel's rating distributions on Booking.com and TripAdvisor.com
Mann_Whitney_U_test_h_t_rating_distribution: p-value of the Mann-Whitney rank test comparing a hotel's rating distributions on Hotels.com and TripAdvisor.com

Mann_Whitney_U_test_h_t_review_length_distribution: p-value of the Mann-Whitney rank test comparing a hotel's review length distributions on Hotels.com and TripAdvisor.com
Mann_Whitney_U_test_h_t_title_length_distribution: p-value of the Mann-Whitney rank test comparing a hotel's review title length distributions on Hotels.com and TripAdvisor.com
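
To make the three kinds of multi-site feature concrete (quotient, correlation, and rank-test p-value), here is a small Python sketch using only the standard library. It is a sketch under stated assumptions, not the authors' code. The feature descriptions do not say which correlation coefficient or which variant of the Mann-Whitney test was used, so Pearson correlation over the five star counts and a two-sided normal approximation with tie correction (no continuity correction) are assumptions. All function names and the sample counts are hypothetical.

```python
import math


def ratio(a, b):
    """Quotient of two already-normalized values; None when undefined."""
    return a / b if b else None


def pearson(x, y):
    """Pearson correlation (assumed; the coefficient type is not specified)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else float("nan")


def mann_whitney_p(x, y):
    """Two-sided p-value, normal approximation with tie correction (assumed)."""
    pooled = sorted(list(x) + list(y))
    n1, n2, n = len(x), len(y), len(x) + len(y)
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    u1 = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2))


def expand(counts):
    """Turn [num1, ..., num5] star counts into one rating per review."""
    return [stars for stars, c in enumerate(counts, start=1) for _ in range(c)]


# Hypothetical star counts (num1 .. num5) for one hotel on two sites
counts_b = [2, 3, 10, 30, 55]
counts_t = [4, 6, 20, 60, 110]  # same shape, twice as many reviews

rating_correlation = pearson(counts_b, counts_t)
p_same = mann_whitney_p(expand(counts_b), expand(counts_t))
p_diff = mann_whitney_p([1] * 20, [5] * 20)
```

In this example the two sites have proportionally identical rating distributions, so the correlation is 1 (up to rounding) and the rank test finds no difference, while the all-1-star versus all-5-star comparison yields a very small p-value. A low p-value or low correlation between sites is the kind of disagreement these features are meant to expose.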