Deutscher Wetterdienst


Scalability and Performance of COSMO-Model 5.1 on Cray XC30

Ulrich Schättler
Source Code Administrator COSMO-Model

Contents
- Old Scalability Results
- Latest Changes
- Scalability Tests with COSMO-DE65 (651 x 716 x 65)
- First Conclusions (for NWP)
- What about the CCLM
- Performance Counters

11/09/14 COSMO General Meeting 2014, Eretria, Greece

Old Scalability Results
- From the HP2C report "Performance Analysis and Prototyping of the COSMO Regional Atmospheric Model" (Matthew Cordery et al.): "We note the poor parallel scaling characteristics of COSMO beyond 1000 cores."
- That report showed the parallel speedup of COSMO for a 1-hour simulation on a 1 km grid.
- But which domain size was used?

Latest Changes
- The old tests were done with the COSMO_RAPS_4.10 benchmark version, based on COSMO-Model version 4.10 (from 11 September 2009).
- Since then, quite a few things have happened:
  - new strong conservative fast-waves solver
  - more stable advection schemes (Strang splitting)
  - new COSMO-ICON microphysics with a new 2D (blocked) data structure and copy to/from the block structure
  - (at DWD) use of RTTOV10 to compute synthetic satellite images
  - modified module mpe_io2.f90 for asynchronous GRIB I/O (now including prefetching of data, but not yet tested)
  - new module netcdf_io.f90 for asynchronous and parallel NetCDF I/O

Latest Changes
- DWD now runs a Cray XC30 and no longer the NEC SX-9. For the second time in history we are running a machine with more than 1000 cores:
  - 2003/2004: IBM Power3 with nearly 2000 processors (but no application really used more than a few hundred of them)
  - since December 2013 (Phase 0): Cray XC30 with 364 nodes, each with 2 Intel Ivy Bridge CPUs of 10 cores: 7280 cores in total
  - from December 2014 (Phase 1): extension to 784 nodes, with mixed Ivy Bridge and Haswell CPUs: 17488 cores in total
- Performance grows by a factor of 3: 15 members of a big COSMO-DE ensemble run a 12-hour forecast in 1200 seconds on the Phase 0 machine; 45 members will run on the Phase 1 machine.
- How many cores do we need for COSMO?

The new COSMO-DE65
- In 2015, DWD will upgrade the COSMO-DE to a larger domain and 65 vertical levels: 651 x 716 x 65 grid points.
- Test characteristics:
  - a 12-hour forecast should run
    - in 1200 s in ensemble mode
    - in 400 s in deterministic mode
  - nudgecast run: nudging and latent heat nudging in the first 3 h
  - SynSat pictures every 15 minutes
  - amount of output data per hour: 1.6 GByte; asynchronous output is used with 4 or 5 output cores

[Figure: Scalability of COSMO-Model 5.1 for COSMO-DE65. Speedup (log scale, up to 32) vs. number of cores (200 to 6400), with curves for Ideal, Total, Dynamics and Physics.]

Timings for COSMO-DE65 (all times in seconds)

  # cores      196+4    396+4    795+5   1596+4   3196+4   6396+4
  Dynamics   1848.19   913.37   469.68   244.10   132.93    77.22
  Dyn. Comm.  259.57   137.55    90.41    49.38    30.79    21.19
  Physics     326.02   156.66    80.08    43.51    23.68    14.82
  Phy. Comm.   17.08     9.92     5.41     3.44     2.52     1.89
  Copying      19.26     9.21     4.71     2.25     1.03     0.47
  Nudging      43.05    25.00    15.22    11.92    14.79    38.66
  Nud. Comm.   27.92    34.48    29.73    35.48    49.62    77.72
  Add. Comp.  726.64   400.47   216.37   117.67    54.84    27.87
  Input        22.49    21.60    29.34    31.91    36.40    47.18
  Output       33.75    25.22    24.06    29.33    47.62    94.40
  Total      3333.62  1744.74   982.72   589.99   422.14   436.68
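The strong-scaling behaviour in the Total row can be condensed into speedup and parallel-efficiency figures with a short script (a sketch added here for illustration, not part of the talk; the smallest run is taken as the baseline and the I/O cores are excluded from the core counts):

```python
# Speedup and parallel efficiency from the "Timings for COSMO-DE65"
# table (compute cores only; the 196-core run is the baseline).
cores = [196, 396, 795, 1596, 3196, 6396]
total = [3333.62, 1744.74, 982.72, 589.99, 422.14, 436.68]

base_cores, base_time = cores[0], total[0]
for p, t in zip(cores, total):
    speedup = base_time / t
    efficiency = speedup * base_cores / p  # 1.0 would be ideal scaling
    print(f"{p:5d} cores: speedup {speedup:5.2f}, efficiency {efficiency:4.0%}")
```

The efficiency falls below half of ideal around 3200 cores, matching the flattening of the Total curve described in the conclusions.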


Timings for COSMO-DE65: nudging breakdown (seconds)

  # cores          196+4    396+4    795+5   1596+4   3196+4   6396+4
  Nudging          26.81    16.22     9.24     6.50     5.83    15.13
  Nud. Comm.        3.37     2.45     2.60     2.36     3.36     9.08
  Nud. Barrier      5.57     3.61     3.36     3.92     6.66    14.92
  Latent Heat N.   16.24     8.78     5.98     5.42     8.96    23.53
  LHN Comm.        18.65    28.42    23.77    29.20    39.60    53.72
  Nud. Total       70.64    59.48    44.95    47.40    64.41   116.38
  Add. Comp.      726.64   400.47   216.37   117.67    54.84    27.87
  Total          3333.62  1744.74   982.72   589.99   422.14   436.68
  ~ #gp / core      2330     1165      582      291      145       72
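The ~ #gp / core row is simply the horizontal grid of the COSMO-DE65 domain divided over the cores. A one-liner reproduces it (my reconstruction, not from the slides: it appears to use the total core count including I/O cores, and to truncate rather than round):

```python
# Horizontal grid points per core for the COSMO-DE65 domain (651 x 716).
# Vertical levels are not divided because the decomposition is horizontal.
nx, ny = 651, 716
for cores in (200, 400, 800, 1600, 3200, 6400):
    print(f"{cores:5d} cores: ~{nx * ny // cores} grid points per core")
```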

Timings for COSMO-DE65: I/O breakdown (seconds)

  # cores            196+4    396+4    795+5   1596+4   3196+4   6396+4
  Input              22.49    21.60    29.34    31.91    36.40    47.18
    read data         8.57     7.59    14.03    18.31    21.74    22.10
    meta data         6.29     5.52     6.24     3.52     2.07     1.02
    compute Input     0.72     2.34     9.07    10.08    12.58    24.06
    distribute data   6.91     6.14     0.00     0.00     0.00     0.00
  Output             33.75    25.22    24.06    29.33    47.62    94.90
    compute Output   16.58     9.67     6.68     9.05    22.77    62.35
    meta data         0.66     0.33     0.16     0.12     0.06     0.03
    write data        0.14     0.08     0.04     0.02     0.01     0.01
    gather data      16.35    15.12    17.17    20.13    24.77    32.50
  Total            3333.62  1744.74   982.72   589.99   422.14   436.68

[Figure: Scalability of COSMO components (incl. communication). Speedup (log scale, 0.25 to 32) vs. number of cores (200 to 6400), with curves for Ideal, Dynamics, Physics, Nudging, LHN, I/O and Total.]

First Conclusions
- Scalability of the COSMO-Model for the COSMO-DE65 domain size is reasonably good up to 1600 cores. Dynamics and physics also scale beyond that, up to 6400 cores.
- Meeting the operational requirements:
  - for ensemble mode, about 650 cores would be necessary to run a 12-hour forecast in less than 1200 seconds. But then 40 members will not fit on the Phase 1 machine.
  - for deterministic mode, it is not possible to run in less than 400 seconds.
- This is not a problem of scalability, but of some expensive components!
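As a rough cross-check of the "about 650 cores" figure (a back-of-the-envelope interpolation added here, not from the slides), one can fit a power law T(p) = a * p**b through the two Total timings that bracket 1200 s:

```python
import math

# Interpolate the core count needed for a 1200 s ensemble forecast
# from the two bracketing table points (396+4 and 795+5 cores).
p1, t1 = 400, 1744.74
p2, t2 = 800, 982.72
b = math.log(t2 / t1) / math.log(p2 / p1)   # scaling exponent, ~ -0.83
p_needed = p1 * (1200.0 / t1) ** (1.0 / b)
print(round(p_needed))  # roughly 630, consistent with "about 650"
```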

First Conclusions (II)
- Expensive components:
  - The new fast-waves solver is more expensive than the old one (40-50% of the dynamics time; not investigated further up to now).
  - Communication in the latent heat nudging.
  - Additional computations, which are almost entirely RTTOV10:
    - a factor of about 10-15 compared to RTTOV7
    - very imbalanced computations, perhaps due to cloud characteristics
    - much effort for some diagnostic pictures
- Tests were done on the usual crowded machine and really reflect the operational setups (no tricks, no cheating, no beautifying).

What about the CCLM
- During climate simulations:
  - you do not compute nudging or latent heat nudging
  - you do not compute the synthetic satellite images: timings for additional computations will drop to about 10% of the timings shown before
  - you will do less output: only about 60% of the NWP output volume
- How long would a simulation of 150 years take? (150 years are about 54790 forecast days)

Estimations for CCLM using the COSMO-DE65 size (seconds)

  # cores       196+4    396+4    795+5   1596+4   3196+4   6396+4
  Dynamics    2107.76  1050.92   560.09   293.48   163.72    98.41
  Physics      362.36   175.79    90.20    49.20    27.23    17.18
  Add. Comp.    72.66    40.05    21.64    11.77     5.48     2.79
  Input         22.49    21.60    29.34    31.91    36.40    47.18
  Output        20.25    15.13    14.44    17.60    28.57    56.64
  Total       2585.52  1303.49   715.71   403.96   261.40   222.20

  Forecast days per day     16.7   33.14   60.36  106.94  165.26  194.42
  Days for 150-year sim.    3281    1654     908     512     332     282
                        (~9 years)                            (~9 months)
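The two throughput rows follow directly from the Total timings: each run covers a 12-hour forecast, i.e. half a simulated day. A short sketch (added here for illustration, not part of the slides):

```python
# Forecast days per wall-clock day, and wall-clock days needed for a
# 150-year simulation (~54790 forecast days), from the CCLM totals.
totals = {200: 2585.52, 400: 1303.49, 800: 715.71,
          1600: 403.96, 3200: 261.40, 6400: 222.20}
SIM_DAYS = 54790  # ~150 years, as on the slide
for cores, t in totals.items():
    fc_days_per_day = 86400.0 / t * 0.5   # runs per day * 0.5 days each
    wall_days = SIM_DAYS / fc_days_per_day
    print(f"{cores:5d} cores: {fc_days_per_day:6.2f} forecast days/day, "
          f"{wall_days:5.0f} days for 150 years")
```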

Conclusions for CCLM
- Convection-permitting climate simulations are still rather expensive, but not out of reach on today's HPC platforms.
- Times for additional computations and output in the table above are only estimated, not measured. All other timings are taken from the NWP tests.

Performance Counters
- We also made some runs on our small test machine (128 cores) with Intel Sandy Bridge processors (which still have working hardware counters).
- Domain and decomposition were chosen such that the subdomains are as big as for COSMO-DE65 running on about 400 cores: 320 x 260 x 65 grid points on 7 x 9 + 1 tasks.
- From pat_report:
  - HW FP Ops / User Time: 1973.205 M/sec, 264939705337 ops, 9.5% of peak (dp)
  - MFLOPS (aggregate): 126285.11 M/sec
  - Peak is 166 GFlop/s per processor; times 8 processors that is 1.32 TFlop/s, of which 9.5% is 126.16 GFlop/s, consistent with the measured aggregate.
- Are these measurements ok? We expected a much smaller percentage of peak performance.
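The peak-performance arithmetic can be spelled out explicitly. Note that the 2.6 GHz clock and 8 double-precision flops per cycle per core are my assumptions for these Sandy Bridge CPUs, chosen because they reproduce the 166 GFlop/s per processor quoted above:

```python
# Back-of-the-envelope check of the pat_report numbers.
ghz = 2.6             # assumed clock frequency
flops_per_cycle = 8   # assumed: AVX, 4 DP adds + 4 DP mults per cycle
cores_per_cpu = 8
n_cpus = 8            # the 7x9+1 = 64-task run spans 8 eight-core CPUs
peak_per_cpu = ghz * flops_per_cycle * cores_per_cpu  # 166.4 GFlop/s
peak_run = peak_per_cpu * n_cpus                      # ~1.33 TFlop/s
measured = 126.285    # GFlop/s aggregate from pat_report
print(f"{measured / peak_run:.1%} of double-precision peak")  # 9.5%
```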

Thank you very much for your attention