Degraded Modes of Operations in Software Engineering


Prof. Chris Johnson, School of Computing Science, University of Glasgow, Scotland.
http://www.dcs.gla.ac.uk/~johnson

Aging, Complex Critical Infrastructures...

What are Degraded Modes?
[Diagram. States: Normal Operations, Abnormal Operations, Degraded Modes, Emergency Situation. Labels: unexpected high traffic loads, extreme weather conditions etc.; equipment failures, staffing shortages etc.; catalytic triggers, e.g. individual or team error.]
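The progression on this slide can be sketched as a small state machine. This is only one plausible reading of the diagram: which label drives which transition is an assumption, and the event names are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "Normal Operations"
    ABNORMAL = "Abnormal Operations"
    DEGRADED = "Degraded Modes"
    EMERGENCY = "Emergency Situation"


# Assumed mapping of slide labels to transitions (hypothetical event names).
TRANSITIONS = {
    (Mode.NORMAL, "high_load_or_extreme_weather"): Mode.ABNORMAL,
    (Mode.NORMAL, "equipment_failure_or_staff_shortage"): Mode.DEGRADED,
    (Mode.ABNORMAL, "equipment_failure_or_staff_shortage"): Mode.DEGRADED,
    (Mode.DEGRADED, "catalytic_trigger"): Mode.EMERGENCY,
}


def step(mode, event):
    """Return the next mode; events with no listed transition leave the mode unchanged."""
    return TRANSITIONS.get((mode, event), mode)


def run(events, start=Mode.NORMAL):
    """Apply a sequence of events and return the list of modes visited."""
    visited = [start]
    for event in events:
        visited.append(step(visited[-1], event))
    return visited


if __name__ == "__main__":
    history = run(["equipment_failure_or_staff_shortage", "catalytic_trigger"])
    print(" -> ".join(mode.value for mode in history))
```

In this sketch a catalytic trigger only escalates to an emergency once the system is already degraded, which is the point of the slide: the error alone is not the whole cause.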

Introduction to Degraded Modes
Staff struggle to maintain levels of service.
Software failures force ad hoc solutions:
  violate safety requirements;
  not supported by risk assessments.
Lead to major failures if not addressed.

UPS Case Study
Power Supply (PS) Station near ACC:
  transformer and generator.
PS switching boxes in ACC.
Equipment installed 30 years ago:
  procure new kit.
Installation affects comms ACC/PS.

Anatomy of the Incident (1)
14:25 UTC: Alarm on Remote Control Unit in PS Station from UPS in ACC.
Technician goes to ACC, checks UPS:
  1. Warning on UPS display: <Power Supply is out of tolerance>
  2. UPS operates on battery supply
  3. UPS autonomy - 13 minutes

Anatomy of the Incident (2)
14:30: Technician returns to PS Station.
  Informs Technical Supervisor about problem.
  Calls Head of Department, who is not accessible.
14:32: In ACC again, Technician detects UPS autonomy - 6 minutes.
  Makes erroneous decision to switch PS to 2nd UPS;
  switches 1st UPS to bypass configuration.
  Generator voltage direct to users, no stabilization;
  under-voltage but no over-voltage protection.

Anatomy of the Incident (4)
14:35 UTC: Within a few minutes, collapse of:
  three quarters of Radar Data Displays,
  one half of Flight Data Displays,
  all radar inputs in DPS,
  Controller Working Positions for Voice Comms, and
  AFTN connection with ARO & NOTAM.
14:40 UTC: Technical Supervisor tells ATC Supervisor that 30 minutes are needed.
14:45 UTC: ATC Supervisor decides to close FIR; CFMU told traffic is zero.

http://www.iaa.ie/files/2008/news/docs/20080919020223_atm_report_final.pdf

Dublin Airport Overview
Busiest period of the year.
Initial hardware failure:
  poor quality of service from LAN;
  slows flight data processing system.
ATCOs cannot access data on radar targets:
  including aircraft identification and type data.
Capacity restrictions for safety reasons.

Dublin Airport - Contracting
ATM system provided by contractor:
  maintained under annual service contract;
  provides both hardware and software support;
  on-site support for diagnosis and debugging.
General question for SESAR?
ANSPs rely on subcontractors:
  key areas of technical support;
  it will take another 30 minutes.
Is outsourcing a form of de-risking?

Secondary Response
ANSP engineering staff correct symptoms;
  cannot identify root causes of the problem.
Problem stemmed from double failure:
  triggered by a faulty network interface card;
  flooded network with spurious messages.
Symptoms of the fault were masked:
  recovery mechanisms in Local Area Network;
  hard for engineers to identify component failure.
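A card that floods the network with spurious messages can in principle be spotted by counting frames per source over a capture window. The sketch below is a generic illustration, not the diagnosis method used in the Dublin incident; the function name, the host names and the rate threshold are all hypothetical.

```python
from collections import Counter


def flooding_sources(frames, window_s, max_rate_per_s):
    """Return sources whose frame count over the window exceeds the allowed rate.

    frames: iterable of (timestamp_s, source_id) pairs captured during one window.
    """
    counts = Counter(source for _timestamp, source in frames)
    limit = window_s * max_rate_per_s
    return sorted(source for source, n in counts.items() if n > limit)


if __name__ == "__main__":
    # Synthetic capture: two well-behaved hosts and one host flooding the LAN.
    capture = [(t * 0.1, "host-a") for t in range(100)]
    capture += [(t * 0.1, "host-b") for t in range(100)]
    capture += [(t * 0.001, "faulty-nic") for t in range(10_000)]
    print(flooding_sources(capture, window_s=10, max_rate_per_s=50))
```

A per-source count looks at the component rather than the end-to-end symptom, which matters when recovery mechanisms in the LAN hide the symptom from engineers.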

The Real Impact
Michael O'Leary, CEO Ryanair:
"The problem here is that you have an autonomous semi-state monopoly which doesn't care about its customers or the disruption to passengers."
"Send the buggers to Shannon, if it was a commercial company they would have done so."
"They're not on top of the job. We're talking about 25 arrivals and departures per hour. The air traffic controllers should be capable of handling this volume of flights."
http://www.herald.ie/news/oleary-more-disruption-if-iaa-doesnt-clean-up-act-1431408.html

Europe is Not Alone

June 2007
Atlanta FDPS system software bug;
  switch data rate configuration error (again).
Use of fallback system in Salt Lake City:
  cascading failure, cannot cope with demand.
ATCOs enter flight data manually;
  cannot cope with backlog, knock-on delays.
12 hours to diagnose problem;
  6 more to catch up with backlog, e.g. New York.

August 2008 and November 2009
August 2008: Software failure in Atlanta again.
  Processes flight plans for Eastern US.
  566+ flight delays.
Press, media and political outrage.
GAO reports into ATM service provision.

November 2009
Fault stems from Los Angeles:
  route map error on a new router installed to replace an older router version.
  Routing error affects comms with Atlanta.
  Also affects comms with 21 regional radar centers.
  Impacted nationwide network supporting air traffic control automation systems.
4 hours to diagnose, 12+ to restore support.
ATCOs enter flight plans manually (workload).
Effects exacerbated by bad weather, e.g. Chicago.
As a result of this failure, a second routing domain was established for the traffic.

Media and Politicians
Sisters Sharon Walker and Sheila James were taking their elderly mother to see their sister in St. Louis. Their 09:30 flight was delayed until 16:00...
Sen. Charles Schumer said the country's aviation system is in shambles... the FAA needs to upgrade the system, these technical glitches that cause cascading chaos across the country are going to become a very regular occurrence...

April 2010
$2.1 billion upgrade by Dec 2010:
  En Route Automation Modernization.
Faults lead to missing flight plans;
  other aircraft change identity in flight;
  again cannot transfer flight data to Atlanta etc.
Undermines ATCO confidence in system;
  fallback is the original 20-year-old IBM system;
  IBM contract expired; uses Jovial, which is rarely used.
Test deployment to Salt Lake City:
  FAA spend $14 million, still not working.
  Salt Lake City simple compared to Chicago...

Potential Solutions?

The Risk Assessment Blind Spot

MIL-STD-882D
1. Document the approach.
2. Identify potential system hazards.
3. Assess severity and probability.
4. Identify mitigation measures.
5. Implementation of mitigation.
6. Verify intended risk reduction.
7. Communicate residual risks.
8. Risk management after deployment.
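Step 3, assessing severity and probability, is commonly implemented as a lookup in a risk matrix. The following is a minimal sketch of that idea. The category names follow common system-safety usage, but the numeric scores and the high/medium thresholds are illustrative assumptions, not values taken from the standard.

```python
# Illustrative ordinal scores; NOT the values tabulated in MIL-STD-882D.
SEVERITY = {"negligible": 1, "marginal": 2, "critical": 3, "catastrophic": 4}
PROBABILITY = {
    "improbable": 1,
    "remote": 2,
    "occasional": 3,
    "probable": 4,
    "frequent": 5,
}


def risk_score(severity, probability):
    """Combine the two ordinal scores into a single number (simple product)."""
    return SEVERITY[severity] * PROBABILITY[probability]


def risk_level(severity, probability, high=12, medium=6):
    """Bucket a hazard into high/medium/low using assumed thresholds."""
    score = risk_score(severity, probability)
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"


if __name__ == "__main__":
    # Hypothetical hazards, loosely named after the UPS case study.
    hazards = [
        ("loss of stabilised power to ACC", "catastrophic", "remote"),
        ("single display failure", "marginal", "occasional"),
        ("nuisance alarm", "negligible", "probable"),
    ]
    for name, severity, probability in hazards:
        print(f"{name}: {risk_level(severity, probability)}")
```

A product of ordinal scores is a convenience, not a measurement; real matrices usually assign each cell by judgement rather than by arithmetic.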

Limits of Conventional Risk Assessment
Haddon-Cave report:
  If risk assessment had been conducted with proper skill, care and attention, the catastrophic fire risk would have been spotted.
Risk assessment: no substitute for sound judgement.
  Incompetence, complacency, cynicism.
Documentation overwhelming;
  many trivial or irrelevant failure modes;
  few combined failures across functions;
  most help for large-scale procurements.
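One reason combined failures are rarely covered is that their number grows quickly with the number of components. The sketch below enumerates failure pairs for a handful of components and counts them for a larger system; the component names are hypothetical, loosely based on the Dublin case, and the figure of 50 components is an arbitrary illustration.

```python
from itertools import combinations
from math import comb


def combined_failures(components, order=2):
    """List every set of `order` components that could fail together."""
    return list(combinations(components, order))


if __name__ == "__main__":
    # Hypothetical component list.
    components = [
        "network interface card",
        "LAN recovery mechanism",
        "flight data processing system",
    ]
    for pair in combined_failures(components):
        print(" + ".join(pair))

    # Number of double and triple failures to consider for 50 components.
    print(comb(50, 2), comb(50, 3))
```

Enumerating the pairs is cheap; assessing each one is not, which is why an analysis that already lists many trivial single failure modes tends to stop there.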

Rapid Risk Assessment Techniques
Techniques to address operational risk:
  low cost, approximations, rules of thumb;
  where necessary should trigger HAZOPS etc.
When engineering analysis and risk assessments are condensed to fit on a standard form or overhead slide, information is inevitably lost.
On the other hand:
  you cannot capture everything;
  limited time, limited training, present threats.

US Army TC 1-210

Wider Applications: MATS Forms

NTSB Risk Assessment Matrices

Rapid Risk Assessment

Any Questions?