Genetic Algorithm in Python. Data mining lab 6

When to use genetic algorithms

Introduced by John Holland (1975).
Optimization: minimize (or maximize) some function f(x) over all possible values of the variables x in X.
Brute force means examining every possible combination of x in X to find the element for which f is optimal. For most realistic problems this is infeasible.
Practical optimization techniques are therefore heuristic, and they face the problem of local maxima (minima): a search can settle on a solution that is better than its neighbours but not the best overall.
Mutation introduces randomness into the method, which helps it get out of such traps.
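To make the local-minimum problem concrete, here is a minimal sketch that compares brute force with a greedy descent on a tiny, made-up cost landscape. The function names and the landscape values are assumptions chosen for illustration; they are not part of the lab code.

```python
def brute_force_minimum(values):
    """Examine every candidate and return the index of the smallest value."""
    return min(range(len(values)), key=lambda i: values[i])


def hill_climb(values, start):
    """Greedy descent: keep moving to the lower neighbour until none is lower."""
    current = start
    while True:
        neighbours = [i for i in (current - 1, current + 1)
                      if 0 <= i < len(values)]
        best = min(neighbours, key=lambda i: values[i])
        if values[best] >= values[current]:
            return current  # no neighbour improves: a (possibly local) minimum
        current = best


# Hypothetical landscape: local minimum at index 1, global minimum at index 5
landscape = [5, 3, 4, 6, 2, 0, 1]

print(brute_force_minimum(landscape))  # index of the global minimum
print(hill_climb(landscape, 0))        # index where greedy descent stops
```

Starting from index 0, the greedy search stops at the local minimum and never reaches the global one. Brute force finds it only because the landscape is small enough to enumerate.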

Evolution

There is a population of individuals with randomly chosen values of variables (features).
Environmental conditions demand certain features from an individual.
The individuals whose features are best suited to these conditions have an advantage over the others: they survive to reproductive age and reproduce.

Variation: the pool for evolution

The best-suited individuals (the fittest) survive, reproduce, and mix their features with those of other surviving individuals.
In the simplest model, one parent contributes part of the features to a new individual in the next generation, and the rest comes from a second parent.
The new individual can also undergo mutation: a random change of one of its features. This occurs rarely.

Genetic algorithm: the main steps I

1. Create a population of random individuals.
2. Choose a fitness function that evaluates how good a particular individual is for the purpose defined by the specific problem.
3. Run several iterations (generations).
4. In each generation, rank the individuals by fitness and select the best of them: the elite.

Genetic algorithm: the main steps II

5. The next generation consists of:
   - the unchanged elite (parthenogenesis);
   - individuals that combine the features of 2 elite parents (recombinants);
   - a small number of elite individuals changed by random mutation.
6. Repeat steps 4 and 5 until no further significant improvement in the fitness of the elite is observed.

"Hello World" program for genetic algorithms

A simple example: a random population of strings evolves into a predefined template, "Hello World".
For simplicity, the random strings have the same length as the target string.
The fitness function measures how close a given string is to the target string.
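The slides call init_strings_population but do not show it. The following is a minimal sketch of what such a helper might look like. The constant values and the character range are assumptions (the range matches what the mutation expression in the slides can produce), not the lab's actual implementation.

```python
import random

# Assumed constants: the slides use these names but do not show their values
TARGET_STRING = "Hello World"
target_length = len(TARGET_STRING)


def init_strings_population(size):
    """Return `size` random strings, each as long as the target string."""
    population = []
    for _ in range(size):
        # ASCII codes 32 (space) to 121 ('y'): an assumed alphabet
        chars = [chr(random.randint(32, 121)) for _ in range(target_length)]
        population.append("".join(chars))
    return population


sample_population = init_strings_population(5)
print(sample_population)
```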

Fitness function

    def string_fitness(individual):
        # Sum of absolute differences between character codes; 0 is a perfect match
        fitness = 0
        for ipos in range(0, target_length):
            fitness += abs(ord(individual[ipos]) - ord(TARGET_STRING[ipos]))
        return fitness

The function compares one individual with the target string, character by character. It adds up the differences between the character codes and uses the cumulative sum as the fitness value, so the lower the value, the better. The optimizer applies it to each member of the population in turn.
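A short worked example may help to see how the score behaves. This sketch repeats the fitness function so that it runs on its own, and assumes the target is the string "Hello World".

```python
TARGET_STRING = "Hello World"   # assumed value of the target
target_length = len(TARGET_STRING)


def string_fitness(individual):
    """Sum of absolute character-code differences from the target."""
    fitness = 0
    for ipos in range(target_length):
        fitness += abs(ord(individual[ipos]) - ord(TARGET_STRING[ipos]))
    return fitness


print(string_fitness("Hello World"))  # identical string: score 0
print(string_fitness("Hello Worle"))  # 'e' is one code away from 'd': score 1
print(string_fitness("Jello World"))  # 'J' is two codes away from 'H': score 2
```

Because the score depends on the distance between character codes, a string can improve gradually. This gives the genetic algorithm something to climb.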

For comparison: random optimizer

    from ga_helloworld import *

    string_population = init_strings_population(204800)
    best_rand = randomoptimize(string_population, string_fitness)
    print(best_rand[1])
    print(" score = %d" % best_rand[0])

Random searching isn't a very good optimization method, but it makes it easy to understand exactly what all the algorithms are trying to do. It also serves as a baseline, so you can see whether the other algorithms are doing a good job. The random optimizer in random_optimize.py evaluates 204,800 random guesses, applying the fitness function to each one. It keeps track of the best guess (the one with the lowest cost) and returns it.
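The body of randomoptimize is not shown in the slides. A minimal sketch consistent with how its result is used above (index 0 is the score, index 1 is the individual) could look like this. It is an assumption about the helper, not the lab's actual code.

```python
def randomoptimize(population, fitness_function):
    """Score every individual and return (best_score, best_individual)."""
    best_score = None
    best_individual = None
    for individual in population:
        score = fitness_function(individual)
        # Lower scores are better, so keep the smallest one seen so far
        if best_score is None or score < best_score:
            best_score = score
            best_individual = individual
    return (best_score, best_individual)


# Toy usage: string length stands in for a real fitness function
print(randomoptimize(["abc", "a", "ab"], len))
```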

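Mutation operation for GA

    import random

    def mutate_string(individual):
        # Pick a random position in the string
        ipos = random.randint(0, target_length - 1)
        # Replace the character there with a random ASCII character;
        # n % 90 + 32 gives character codes from 32 (space) to 121 ('y')
        rchar = chr(random.randint(0, 32000) % 90 + 32)
        individual = individual[0:ipos] + rchar + individual[(ipos + 1):]
        return individual

The mutated string is built from three pieces: individual[0:ipos] (everything before the random position), the random character, and individual[ipos+1:] (everything after it).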

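Mate operation (crossover) for GA

    def string_crossover(p1, p2):
        # Cut both parents at the same random position
        ipos = random.randint(1, target_length - 2)
        # The child takes the start of p1 and the rest of p2
        return p1[0:ipos] + p2[ipos:]

The child is p1[0:ipos] + p2[ipos:], where ipos is the random position.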
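Since string_crossover picks its cut point at random, its output differs from run to run. To see the mechanics, this sketch uses a hypothetical variant, crossover_at, that takes the cut position as an argument. The parent strings are made up for illustration.

```python
def crossover_at(p1, p2, ipos):
    """Single-point crossover with an explicit cut position (hypothetical helper)."""
    return p1[0:ipos] + p2[ipos:]


# The first parent has a good beginning, the second a good ending
child = crossover_at("HELLO XXXXX", "YYYYY WORLD", 6)
print(child)
```

When each parent carries a different good fragment, crossover can combine both in a single child. Mutation alone would have to find them one character at a time.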

Genetic algorithm I

    def genetic_optimize(population, fitness_function, mutation_function,
                         mate_function, mutation_probability, elite,
                         maxiterations):
        # How many winners from each generation?
        original_population_size = len(population)
        top_elite = int(elite * original_population_size)

        # Main loop
        for i in range(maxiterations):
            # Score every individual; lower scores sort first
            individual_scores = [(fitness_function(v), v) for v in population]
            individual_scores.sort()
            ranked_individuals = [v for (s, v) in individual_scores]

            # Start with the pure winners
            population = ranked_individuals[0:top_elite]

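Genetic algorithm II

            # Add mutated and bred forms of the winners
            while len(population) < original_population_size:
                # randint includes both ends, so top_elite - 1 keeps
                # the chosen parents inside the elite
                if random.random() < mutation_probability:
                    # Mutation
                    c = random.randint(0, top_elite - 1)
                    population.append(mutation_function(ranked_individuals[c]))
                else:
                    # Crossover
                    c1 = random.randint(0, top_elite - 1)
                    c2 = random.randint(0, top_elite - 1)
                    population.append(mate_function(ranked_individuals[c1],
                                                    ranked_individuals[c2]))

            # Stop early once a perfect individual (score 0) has been found
            if individual_scores[0][0] == 0:
                return individual_scores[0][1]

        # Otherwise return the best individual of the last generation scored
        return individual_scores[0][1]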

Running genetic optimizer

    from ga_helloworld import *

    string_population = init_strings_population(2048)
    genetic_optimize(string_population, string_fitness,
                     mutate_string, string_crossover,
                     0.25,   # mutation rate
                     0.1,    # elite percentage (as a fraction)
                     100)    # max iterations

More useful problem: group travel

    people = [('John', 'BOS'),
              ('Mary', 'DAL'),
              ('Laura', 'CAK'),
              ('Abe', 'MIA'),
              ('Greg', 'ORD'),
              ('Lee', 'OMA')]

    # LaGuardia airport in New York (upper case, to match the codes in schedule.txt)
    destination = 'LGA'

The family members are from all over the country and wish to meet up in New York. They will all arrive on the same day and leave on the same day, and they would like to share transportation to and from the airport. There are about 9 flights per day to New York from any of the family members' locations, all leaving at different times. The flights also vary in price and in duration.

Flight information

The information about flights is in the file schedule.txt. This file contains the origin, destination, departure time, arrival time, and price for a set of flights in a comma-separated format:

    LGA,MIA,20:27,23:42,169
    MIA,LGA,19:53,22:21,173
    LGA,BOS,6:39,8:09,86
    BOS,LGA,6:17,8:26,89
    LGA,BOS,8:23,10:28,149

Adding flight info to the dictionary

    flights = {}
    for line in open('schedule.txt'):
        origin, dest, depart, arrive, price = line.strip().split(',')
        flights.setdefault((origin, dest), [])
        # Add details to the list of possible flights
        flights[(origin, dest)].append((depart, arrive, int(price)))

    # Allowed range of flight indexes for every position in a solution
    flights_index_range = [(0, 9)] * (len(people) * 2)

The dictionary key is the (origin, destination) pair. The dictionary value is the list of flight detail variants for that route (a list of size 10 for each key).

Representing solutions

A very common representation is a list of numbers. In this case, each number represents which flight a person chooses to take, where 0 is the first flight of the day, 1 is the second, and so on. Since each person needs an outbound flight and a return flight, the length of this list is twice the number of people. For example, the list

    [1,4,3,2,7,3,6,3,2,4,5,3]

represents a solution in which John takes the second flight of the day from Boston to New York and the fifth flight back to Boston on the day he returns. Mary takes the fourth flight from Dallas to New York and the third flight back. The numbers are positions in a list of flight details: knowing the index together with the origin and destination of the flight, we can look up the details.
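The lookup described above can be sketched as follows. To keep the example self-contained it uses only two people and a toy flights dictionary whose times and prices are illustrative values, not the contents of schedule.txt. The helper name describe_schedule is also an assumption.

```python
people = [('John', 'BOS'), ('Mary', 'DAL')]
destination = 'LGA'

# Toy data in the same shape as the flights dictionary: (depart, arrive, price).
# All values here are illustrative only.
flights = {
    ('BOS', 'LGA'): [('6:17', '8:26', 89), ('8:04', '10:11', 95)],
    ('LGA', 'BOS'): [('6:39', '8:09', 86), ('8:23', '10:28', 149)],
    ('DAL', 'LGA'): [('6:12', '10:22', 230), ('7:53', '11:37', 433)],
    ('LGA', 'DAL'): [('6:09', '9:49', 414), ('7:57', '11:15', 347)],
}


def describe_schedule(sol):
    """Turn a list of flight indexes into (name, outbound, return) rows."""
    rows = []
    for d in range(len(sol) // 2):
        name, origin = people[d]
        outbound = flights[(origin, destination)][sol[2 * d]]
        returnf = flights[(destination, origin)][sol[2 * d + 1]]
        rows.append((name, outbound, returnf))
    return rows


# John: second flight out, first flight back. Mary: first out, second back.
for row in describe_schedule([1, 0, 0, 1]):
    print(row)
```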

Fitness function design I

The fitness function is the key to solving any problem using optimization, and it's usually the most difficult thing to determine. The goal of any optimization algorithm is to find a set of inputs (flights, in this case) that minimizes the cost function, so the cost function has to return a value that represents how bad a solution is. There is no particular scale for badness; the only requirement is that the function returns larger values for worse solutions.

Fitness function design II

- Price: the total price of all the plane tickets, or possibly a weighted average that takes financial situations into account.
- Travel time: the total time that everyone has to spend on a plane.
- Waiting time: time spent at the airport waiting for the other members of the party to arrive.
- Departure time: flights that leave too early in the morning may impose an additional cost by requiring travelers to miss out on sleep.
- Car rental period: if the party rents a car, they must return it earlier in the day than when they rented it, or be forced to pay for a whole extra day.

Fitness function I

    def schedule_fitness(sol):
        totalprice = 0
        latestarrival = 0
        earliestdep = 24 * 60

        # sol holds an (outbound, return) pair of flight indexes per person,
        # in the order of the people list: John, Mary, Laura, Abe, Greg, Lee
        for d in range(len(sol) // 2):
            origin = people[d][1]
            outbound = flights[(origin, destination)][int(sol[2 * d])]
            returnf = flights[(destination, origin)][int(sol[2 * d + 1])]

            # Total price is the price of all outbound and return flights
            totalprice += outbound[2]
            totalprice += returnf[2]

            # Track the latest arrival and earliest departure
            if latestarrival < getminutes(outbound[1]):
                latestarrival = getminutes(outbound[1])
            if earliestdep > getminutes(returnf[0]):
                earliestdep = getminutes(returnf[0])

Fitness function II

        # Every person must wait at the airport until the latest person arrives.
        # They also must arrive at the same time and wait for their flights
        # on the way back.
        totalwait = 0
        for d in range(len(sol) // 2):
            origin = people[d][1]
            outbound = flights[(origin, destination)][int(sol[2 * d])]
            returnf = flights[(destination, origin)][int(sol[2 * d + 1])]
            totalwait += latestarrival - getminutes(outbound[1])
            totalwait += getminutes(returnf[0]) - earliestdep

        # Does this solution require an extra day of car rental? That'll be $50!
        if latestarrival < earliestdep:
            totalprice += 50

        return totalprice + totalwait
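The fitness function relies on getminutes, which the slides do not show. A minimal sketch, assuming times are strings in the H:MM or HH:MM form used in schedule.txt, could be:

```python
def getminutes(t):
    """Convert a time string such as '6:39' into minutes since midnight."""
    hours, minutes = t.split(':')
    return int(hours) * 60 + int(minutes)


print(getminutes('6:39'))   # 6 * 60 + 39
print(getminutes('20:27'))  # 20 * 60 + 27
```

Working in minutes lets the fitness function compare and subtract times as plain integers.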

Execute GA for schedule optimization

    exec(open("ga_schedule.py").read())

How much better is the solution compared to that of the random optimizer?
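The contents of ga_schedule.py are not shown in the slides. To run genetic_optimize on schedules, the string operators have to be replaced by operators that work on lists of flight indexes. The sketch below shows one plausible set of such operators. All three function names are hypothetical, and the index range assumes six people with two flights each.

```python
import random

# Assumed index range: six people, two flights each, indexes 0..9
flights_index_range = [(0, 9)] * 12


def init_schedule_population(size, index_range):
    """Create `size` random solutions, one random index per position."""
    return [[random.randint(lo, hi) for (lo, hi) in index_range]
            for _ in range(size)]


def mutate_schedule(sol, index_range=flights_index_range):
    """Replace the flight index at one random position with a random valid index."""
    ipos = random.randint(0, len(sol) - 1)
    lo, hi = index_range[ipos]
    return sol[:ipos] + [random.randint(lo, hi)] + sol[ipos + 1:]


def schedule_crossover(p1, p2):
    """Single-point crossover on lists, analogous to string_crossover."""
    ipos = random.randint(1, len(p1) - 2)
    return p1[:ipos] + p2[ipos:]


population = init_schedule_population(4, flights_index_range)
print(mutate_schedule(population[0]))
print(schedule_crossover(population[0], population[1]))
```

With these in place, a call such as genetic_optimize(population, schedule_fitness, mutate_schedule, schedule_crossover, 0.25, 0.1, 100) would mirror the "Hello World" run. The parameter values are carried over from that example, not tuned for this problem.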

Tuning GA

We could choose among several variants of the algorithm, for example breeding the elite with the entire population, or 2-point crossover. To have fine-grained control over the computation, we have to adjust parameters such as population size, percentage of elite, and mutation rate. These must be set empirically to fine-tune the performance of the GA.
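As an illustration of one of the variants mentioned, here is a minimal sketch of 2-point crossover. The function name is an assumption, and the sketch requires parents of equal length with at least three elements. It works on strings and on lists alike.

```python
import random


def two_point_crossover(p1, p2):
    """Copy p1, but take the segment between two random cut points from p2."""
    # Two distinct cut points, sorted so that 1 <= i < j <= len(p1) - 1
    i, j = sorted(random.sample(range(1, len(p1)), 2))
    return p1[:i] + p2[i:j] + p1[j:]


child = two_point_crossover("AAAAAAAAAAA", "BBBBBBBBBBB")
print(child)
```

The child keeps both ends of the first parent and inherits a middle segment from the second. This can preserve good fragments at both ends of an individual, which single-point crossover cannot do.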

Other problems

Suggest optimization problems that can be solved efficiently with a genetic algorithm.