Interacting with HDFS

HADOOP Interacting with HDFS For University Program on Apache Hadoop & Apache Apex 1

2 What's the Need?
  - Big data ocean
  - Expensive hardware
  - Frequent failures and difficult recovery
  - Scaling up with more machines

3 Hadoop
  - Open source software: a Java framework (initial release: December 10, 2011)
  - It provides both:
      Storage [HDFS]
      Processing [MapReduce]
  - HDFS: Hadoop Distributed File System

4 How Hadoop addresses the need?
  - Big data ocean: Have multiple machines. Each stores some portion of the data, not the entire data.
  - Expensive hardware: Use commodity hardware. Simple and cheap.
  - Frequent failures and difficult recovery: Keep multiple copies of the data, on different machines.
  - Scaling up with more machines: If more processing is needed, add new machines on the fly.

5 HDFS
  - Runs on commodity hardware: doesn't require expensive machines
  - Large files; write-once, read-many (WORM)
  - Files are split into blocks
  - Actual blocks go to DataNodes
  - The metadata is stored at the NameNode
  - Blocks are replicated to different nodes
  - Default configuration:
      Block size = 128 MB
      Replication factor = 3
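As a rough illustration of the defaults above, the number of blocks a file occupies and the raw storage it consumes can be worked out with shell arithmetic. This is only a sketch: the helper names `block_count` and `raw_storage_bytes` are made up for this example and are not part of Hadoop, and the figures assume the default block size and replication factor listed on the slide.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Hadoop): estimate how a file is laid out
# using the slide's defaults of 128 MB blocks and a replication factor of 3.

BLOCK_SIZE=$((128 * 1024 * 1024))   # 128 MB expressed in bytes
REPLICATION=3

# Number of blocks needed for a file of $1 bytes (ceiling division).
block_count() {
    echo $(( ($1 + BLOCK_SIZE - 1) / BLOCK_SIZE ))
}

# Raw bytes stored across the cluster: every byte is kept REPLICATION times.
# A partially filled last block only occupies as much space as its data.
raw_storage_bytes() {
    echo $(( $1 * REPLICATION ))
}

file_size=$((300 * 1024 * 1024))    # a 300 MB file, chosen for illustration
echo "blocks:    $(block_count "$file_size")"
echo "raw bytes: $(raw_storage_bytes "$file_size")"
```

A 300 MB file therefore spans three blocks (two full blocks and one partial block), and each of those blocks is stored on three different DataNodes.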

9 Where NOT to use HDFS
  - Low-latency data access
      HDFS is optimized for high throughput of data at the expense of latency.
  - Large number of small files
      The NameNode holds the entire file-system metadata in memory, so many small files mean too much metadata compared to the actual data.
  - Multiple writers / arbitrary file modifications
      No support for multiple writers to a file. Writes always append to the end of a file.
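To make the small-files point concrete, the sketch below compares the NameNode memory needed to track the same 1 GB of data stored as a few large files versus many tiny ones. The figure of about 150 bytes of heap per namespace object (file or block) is an assumption: it is a commonly quoted rule of thumb, not a number from these slides, and `namenode_heap_bytes` is a hypothetical helper.

```shell
#!/bin/sh
# ASSUMPTION: roughly 150 bytes of NameNode heap per namespace object
# (one object per file plus one per block). Rule of thumb only.
BYTES_PER_OBJECT=150

# $1 = number of files, $2 = blocks per file
namenode_heap_bytes() {
    echo $(( $1 * (1 + $2) * BYTES_PER_OBJECT ))
}

# The same 1 GB of data stored two ways:
#   8 files of 128 MB (one block each) vs. 1,048,576 files of 1 KB (one block each).
echo "large files: $(namenode_heap_bytes 8 1) bytes of metadata"
echo "small files: $(namenode_heap_bytes 1048576 1) bytes of metadata"
```

Under this assumption the small-file layout needs several orders of magnitude more NameNode memory for the same amount of data, which is why HDFS favors large files.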

10 Some Key Concepts
  - NameNode
  - DataNodes
  - JobTracker
  - TaskTrackers
  - ResourceManager (MRv2)
  - NodeManager (MRv2)
  - ApplicationMaster (MRv2)

11 NameNode & DataNodes
  NameNode:
    - Centerpiece of HDFS: the master
    - Only stores the block metadata: block name, block location, etc.
    - Critical component; when down, the whole cluster is considered down; single point of failure
    - Should be configured with higher RAM
  DataNode:
    - Stores the actual data: the slave
    - In constant communication with the NameNode
    - When down, it does not affect the availability of data/cluster
    - Should be configured with higher disk space
  SecondaryNameNode:
    - Doesn't actually act as a NameNode
    - Stores the image of the primary NameNode at certain checkpoints
    - Used as a backup to restore the NameNode

13 JobTracker & TaskTrackers
  JobTracker:
    - Talks to the NameNode to determine the location of the data
    - Monitors all TaskTrackers and reports the status of the job back to the client
    - When down, HDFS is still functional; no new MR jobs can start and existing jobs are halted
    - Replaced by ResourceManager/ApplicationMaster in MRv2
  TaskTracker:
    - Runs on all DataNodes
    - Communicates with the JobTracker, signaling task progress
    - TaskTracker failure is not considered fatal
    - Replaced by NodeManager in MRv2

14 ResourceManager & NodeManager
  - Present in Hadoop v2.0
  - Equivalent of JobTracker & TaskTracker in v1.0
  ResourceManager (RM):
    - Usually runs on the NameNode host
    - Distributes resources among applications
    - Two main components: Scheduler and ApplicationsManager (AM)
  NodeManager (NM):
    - Per-node framework agent
    - Responsible for containers
    - Monitors their resource usage
    - Reports the stats to the RM
  The central ResourceManager and the per-node NodeManagers together are called YARN.

16 Hadoop 1.0 vs. 2.0
  HDFS 1.0:
    - Single point of failure
    - Horizontal scaling performance issue
  HDFS 2.0:
    - HDFS High Availability
    - HDFS Snapshot
    - Improved performance
    - HDFS Federation

17 HDFS Federation

18 Interacting with HDFS
  Command prompt:
    - Similar to Linux terminal commands
    - Unix is the model, POSIX is the API
  Web interface:
    - Similar to browsing an FTP site on the web

Interacting With HDFS On Command Prompt 19

20 Notes
  File paths on HDFS:
      hdfs://127.0.0.1:8020/user/username/demo/data/file.txt
      hdfs://localhost:8020/user/username/demo/data/file.txt
      /user/username/demo/file.txt
      demo/file.txt
  File system:
    - Local: local file system (Linux)
    - HDFS: Hadoop file system
  In some places, the terms "file" and "directory" have the same meaning.
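The path forms above relate to each other as follows: a relative path is resolved against the user's HDFS home directory (`/user/<username>`), and an absolute path without a scheme is resolved against the configured default file system. The sketch below mimics that resolution in plain shell. `qualify_path` is a hypothetical helper, and the values of `DEFAULT_FS` and `HDFS_USER` are assumptions taken from the slide's examples rather than read from a real cluster configuration.

```shell
#!/bin/sh
# ASSUMPTIONS for illustration: the default file system and user name below
# mirror the slide's examples; a real cluster takes them from its configuration.
DEFAULT_FS="hdfs://localhost:8020"
HDFS_USER="username"

# Hypothetical helper: expand any of the slide's path forms to a full URI.
qualify_path() {
    case "$1" in
        hdfs://*) echo "$1" ;;                                   # already fully qualified
        /*)       echo "${DEFAULT_FS}$1" ;;                      # absolute: prepend the default file system
        *)        echo "${DEFAULT_FS}/user/${HDFS_USER}/$1" ;;   # relative: resolved against the home directory
    esac
}

# Both forms name the same file.
qualify_path demo/file.txt
qualify_path /user/username/demo/file.txt
```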

21 Before we start
  Command:
      hdfs
  Usage:
      hdfs [--config confdir] COMMAND
  Examples:
      hdfs dfs
      hdfs dfsadmin
      hdfs fsck
      hdfs namenode
      hdfs datanode

hdfs `dfs` commands 22

23 In general
  Syntax for `dfs` commands:
      hdfs dfs -<COMMAND> -[OPTIONS] <PARAMETERS>
  e.g.
      hdfs dfs -ls -R /user/username/demo/data/

24 0. Do it yourself
  Syntax:
      hdfs dfs -help [COMMAND]
      hdfs dfs -usage [COMMAND]
  Examples:
      hdfs dfs -help cat
      hdfs dfs -usage cat

25 1. List the file/directory
  Syntax:
      hdfs dfs -ls [-d] [-h] [-R] <hdfs-dir-path>
  Examples:
      hdfs dfs -ls
      hdfs dfs -ls /
      hdfs dfs -ls /user/username/demo/list-dir-example
      hdfs dfs -ls -R /user/username/demo/list-dir-example
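Because `-ls` prints one entry per line in a fixed column order (permissions, replication, owner, group, size, date, time, path), its output is easy to post-process with standard tools. The sketch below sums the sizes of regular files in a listing. `sum_file_sizes` is a hypothetical helper, and the sample listing is typed in by hand for illustration: its owner, group, sizes, dates and file names are invented.

```shell
#!/bin/sh
# Sum the sizes (column 5) of regular files in `hdfs dfs -ls` style output
# read from stdin. Directory lines start with "d" and the "Found N items"
# header does not start with "-", so both are skipped.
sum_file_sizes() {
    awk '$1 ~ /^-/ { total += $5 } END { print total + 0 }'
}

# Hand-written sample listing (all values invented for illustration).
sample_listing='Found 3 items
drwxr-xr-x   - username supergroup          0 2016-03-01 10:15 /user/username/demo/list-dir-example/dir1
-rw-r--r--   3 username supergroup       1024 2016-03-01 10:16 /user/username/demo/list-dir-example/a.txt
-rw-r--r--   3 username supergroup       2048 2016-03-01 10:17 /user/username/demo/list-dir-example/b.txt'

printf '%s\n' "$sample_listing" | sum_file_sizes
```

On a machine with a configured Hadoop client, the same helper could be fed from a real listing, for example `hdfs dfs -ls /user/username/demo/list-dir-example | sum_file_sizes`.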

26 2. Creating a directory
  Syntax:
      hdfs dfs -mkdir [-p] <hdfs-dir-path>
  Examples:
      hdfs dfs -mkdir /user/username/demo/create-dir-example
      hdfs dfs -mkdir -p /user/username/demo/create-dir-example/dir1/dir2/dir3

27 3. Create a file on local & put it on HDFS
  Syntax:
      vi filename.txt
      hdfs dfs -put [options] <local-file-path> <hdfs-dir-path>
  Example:
      vi file-copy-to-hdfs.txt
      hdfs dfs -put file-copy-to-hdfs.txt /user/username/demo/put-example/

28 4. Get a file from HDFS to local
  Syntax:
      hdfs dfs -get <hdfs-file-path> [local-dir-path]
  Example:
      hdfs dfs -get /user/username/demo/get-example/file-copy-from-hdfs.txt ~/demo/

29 5. Copy from local to HDFS
  Syntax:
      hdfs dfs -copyFromLocal <local-file-path> <hdfs-file-path>
  Example:
      hdfs dfs -copyFromLocal file-copy-to-hdfs.txt /user/username/demo/copyfromlocal-example/

30 6. Copy to local from HDFS
  Syntax:
      hdfs dfs -copyToLocal <hdfs-file-path> <local-file-path>
  Example:
      hdfs dfs -copyToLocal /user/username/demo/copytolocal-example/file-copy-from-hdfs.txt ~/demo/

31 7. Move a file from local to HDFS
  Syntax:
      hdfs dfs -moveFromLocal <local-file-path> <hdfs-dir-path>
  Example:
      hdfs dfs -moveFromLocal /path/to/file.txt /user/username/demo/movefromlocal-example/

32 8. Copy a file within HDFS
  Syntax:
      hdfs dfs -cp <hdfs-source-file-path> <hdfs-dest-file-path>
  Example:
      hdfs dfs -cp /user/username/demo/copy-within-hdfs/file-copy.txt /user/username/demo/data/

33 9. Move a file within HDFS
  Syntax:
      hdfs dfs -mv <hdfs-source-file-path> <hdfs-dest-file-path>
  Example:
      hdfs dfs -mv /user/username/demo/move-within-hdfs/file-move.txt /user/username/demo/data/

34 10. Merge files on HDFS
  Concatenates the files in an HDFS directory into a single file on the local file system.
  Syntax:
      hdfs dfs -getmerge [-nl] <hdfs-dir-path> <local-file-path>
  Example:
      hdfs dfs -getmerge -nl /user/username/demo/merge-example/ /path/to/all-files.txt

35 11. View file contents
  Syntax:
      hdfs dfs -cat <hdfs-file-path>
      hdfs dfs -tail <hdfs-file-path>
      hdfs dfs -text <hdfs-file-path>
  Examples:
      hdfs dfs -cat /user/username/demo/data/cat-example.txt
      hdfs dfs -cat /user/username/demo/data/cat-example.txt | head

36 12. Remove files/dirs from HDFS
  Syntax:
      hdfs dfs -rm [options] <hdfs-file-path>
  Examples:
      hdfs dfs -rm /user/username/demo/remove-example/remove-file.txt
      hdfs dfs -rm -R /user/username/demo/remove-example/
      hdfs dfs -rm -R -skipTrash /user/username/demo/remove-example/

37 13. Change file/dir properties
  Syntax:
      hdfs dfs -chgrp [-R] <NewGroupName> <hdfs-file-path>
      hdfs dfs -chmod [-R] <permissions> <hdfs-file-path>
      hdfs dfs -chown [-R] <NewOwnerName> <hdfs-file-path>
  Example:
      hdfs dfs -chmod -R 777 /user/username/demo/data/file-changeproperties.txt

38 14. Check the file size
  Syntax:
      hdfs dfs -du [-s] [-h] <hdfs-file-path>
  Examples:
      hdfs dfs -du /user/username/demo/data/file.txt
      hdfs dfs -du -s -h /user/username/demo/data/

39 15. Create a zero-byte file in HDFS
  Syntax:
      hdfs dfs -touchz <hdfs-file-path>
  Example:
      hdfs dfs -touchz /user/username/demo/data/zero-byte-file.txt

40 16. File test operations
  Syntax:
      hdfs dfs -test -[defsz] <hdfs-file-path>
  Example (the result is reported through the exit status: 0 means the test passed):
      hdfs dfs -test -e /user/username/demo/data/file.txt
      echo $?
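Since `-test` communicates only through its exit status, it fits naturally into shell conditionals. The sketch below wraps the `-d` and `-e` checks from the syntax above in a small function. `hdfs_path_kind` is a hypothetical helper name, and the example call runs only if an `hdfs` client is actually installed.

```shell
#!/bin/sh
# Hypothetical wrapper around `hdfs dfs -test`: classify a path by exit status.
hdfs_path_kind() {
    if hdfs dfs -test -d "$1"; then
        echo "directory"          # -d succeeded: the path is a directory
    elif hdfs dfs -test -e "$1"; then
        echo "file"               # exists but is not a directory
    else
        echo "missing"            # neither test succeeded
    fi
}

# Only attempt a real call when the Hadoop client is available.
if command -v hdfs >/dev/null 2>&1; then
    hdfs_path_kind /user/username/demo/data/file.txt
fi
```

Note that a non-zero status can also mean the cluster was unreachable, so "missing" here really means "could not be confirmed to exist".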

41 17. Get FileSystem statistics
  Syntax:
      hdfs dfs -stat [format] <hdfs-file-path>
  Format options:
      %b - file size in bytes
      %n - filename
      %r - replication
      %y - modification date
      %g - group name of owner
      %o - block size
      %u - user name of owner

42 18. Get file/dir counts
  Syntax:
      hdfs dfs -count [-q] [-h] [-v] <hdfs-file-path>
  Example:
      hdfs dfs -count -v /user/username/demo/

43 19. Set replication factor
  Syntax:
      hdfs dfs -setrep -w -R n <hdfs-file-path>
  Example:
      hdfs dfs -setrep -w -R 2 /user/username/demo/data/file.txt

44 20. Set block size
  Syntax:
      hdfs dfs -D dfs.blocksize=blocksize -copyFromLocal <local-file-path> <hdfs-file-path>
  Example:
      hdfs dfs -D dfs.blocksize=67108864 -copyFromLocal /path/to/file.txt /user/username/demo/block-example/
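The value 67108864 in the example is simply 64 MB written in bytes. A small sketch of deriving such a value, rather than typing it by hand, follows. `mb_to_bytes` is a hypothetical helper, not an HDFS command.

```shell
#!/bin/sh
# Hypothetical helper: convert a size in megabytes to bytes.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Build the -D option used in the example above (64 MB blocks).
blocksize=$(mb_to_bytes 64)
echo "-D dfs.blocksize=${blocksize}"
```

The resulting option can then be placed between `hdfs dfs` and `-copyFromLocal`, as in the slide's example.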

45 21. Empty the HDFS trash
  Syntax:
      hdfs dfs -expunge
  Location:

Other hdfs commands (admin) 46

47 22. HDFS admin commands: fsck
  Syntax:
      hdfs fsck <hdfs-file-path>
  Options:
      [-list-corruptfileblocks |
       [-move | -delete | -openforwrite]
       [-files [-blocks [-locations | -racks]]]]
      [-includeSnapshots]

49 23. HDFS admin commands: dfsadmin
  Syntax:
      hdfs dfsadmin
  Options:
      [-report [-live] [-dead] [-decommissioning]]
      [-safemode enter | leave | get | wait]
      [-refreshNodes]
      [-refresh <host:ipc_port> <key> [arg1..argn]]
      [-shutdownDatanode <datanode:port> [upgrade]]
      [-getDatanodeInfo <datanode_host:ipc_port>]
      [-help [cmd]]
  Example:
      hdfs dfsadmin -report -live

51 24. HDFS admin commands: namenode
  Syntax:
      hdfs namenode
  Options:
      [-checkpoint]
      [-format [-clusterid cid] [-force] [-nonInteractive]]
      [-upgrade [-clusterid cid]]
      [-rollback]
      [-recover [-force]]
      [-metadataVersion]
  Example:
      hdfs namenode -help

52 25. HDFS admin commands: getconf
  Syntax:
      hdfs getconf [-options]
  Options:
      [-namenodes]
      [-secondaryNameNodes]
      [-backupNodes]
      [-includeFile]
      [-excludeFile]
      [-nnRpcAddresses]
      [-confKey [key]]

53 Again, THE most important command!!
  Syntax:
      hdfs dfs -help [options]
      hdfs dfs -usage [options]
  Examples:
      hdfs dfs -help help
      hdfs dfs -usage usage

Interacting With HDFS In Web Browser 54

55 Web HDFS
  URL:
      http://namenode:50070/explorer.html
  Examples:
      http://localhost:50070/explorer.html
      http://ec2-52-23-214-111.compute-1.amazonaws.com:50070/explorer.html

57 Thank You!!
  Please send your questions to:
      pradeep@datatorrent.com
      pradeep.n.kumbhar@gmail.com

58 Resources (2016 DataTorrent)
  - Apache Apex website: http://apex.incubator.apache.org/
  - Subscribe: http://apex.incubator.apache.org/community.html
  - Download: http://apex.incubator.apache.org/downloads.html
  - Twitter: @ApacheApex; follow https://twitter.com/apacheapex
  - Facebook: https://www.facebook.com/apacheapex/
  - Meetup: http://www.meetup.com/topics/apache-apex
  - Startup Program (free enterprise license for startups, educational institutions, non-profits): https://www.datatorrent.com/product/startup-accelerator/
  - Cloud Trial: http://web.datatorrent.com/cloudtrial.html

APPENDIX 61

62 Copy data from one node to another node in HDFS
  Description: Copy data between clusters
  Syntax:
      hadoop distcp hdfs://nn1:8020/foo/bar hdfs://nn2:8020/bar/foo
      hadoop distcp hdfs://nn1:8020/foo/a hdfs://nn1:8020/foo/b hdfs://nn2:8020/bar/foo
      hadoop distcp -f hdfs://nn1:8020/srclist.file hdfs://nn2:8020/bar/foo
  Where srclist.file contains:
      hdfs://nn1:8020/foo/a
      hdfs://nn1:8020/foo/b