Published 9/2024
MP4 | Video: h264, 1920x1080 | Audio: AAC, 44.1 kHz
Language: English | Size: 71.47 MB | Duration: 0h 33m
Hadoop to GCP Migration Using DistCp
What you'll learn
BigQuery
GCS
Hadoop
Cloud Functions
Requirements
Basic Hadoop knowledge
Description
The objective of this course is to migrate data from on-premises Hadoop (a Hadoop installation on Windows is treated as on-premises) to GCS (Google Cloud Storage), and from GCS to BigQuery. Basic knowledge of Hadoop commands is required.

Things you will learn from this course:
- Installing Hadoop on Windows 11
- Loading data from the local file system into HDFS
- Copying files from HDFS to GCS after installing the GCS connector
- Capturing the bucket name and file name in a BigQuery table whenever a file lands in the bucket, using a trigger created with Cloud Functions (2nd gen)
- Running an Apache Beam pipeline that takes the latest file name from that BigQuery table as input and loads the file's contents from the bucket into the target BigQuery table

We also attempted to create a Hive external table pointing to the GCS bucket and file, so that loading the Hive external table would indirectly load the data into GCS. Because of errors, we were unable to demo this approach.

The Apache Beam code that loads data from GCS to BigQuery runs on the Direct Runner; we ran into errors when running it on the Dataflow Runner. Data type conversion is not handled in the migration: all columns are treated as strings.
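A minimal sketch of the HDFS-to-GCS copy step with DistCp. It assumes the GCS connector JAR is already on the Hadoop classpath so that `gs://` URIs resolve. The NameNode address `hdfs://localhost:9000`, the file path, the bucket `my-bucket`, and the helper name `build_distcp_command` are hypothetical placeholders. The helper only builds the command line; it does not run it.

```python
import shlex
from typing import List


def build_distcp_command(src: str, dst: str, update: bool = False) -> List[str]:
    """Build a `hadoop distcp` argument list copying src (HDFS) to dst (GCS)."""
    if not src.startswith("hdfs://"):
        raise ValueError("source is expected to be an hdfs:// URI")
    if not dst.startswith("gs://"):
        raise ValueError("destination is expected to be a gs:// URI")
    cmd = ["hadoop", "distcp"]
    if update:
        # -update copies files that are missing at the target or differ from it,
        # instead of recopying everything.
        cmd.append("-update")
    cmd += [src, dst]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths: adjust the NameNode address and bucket name to your setup.
    cmd = build_distcp_command(
        "hdfs://localhost:9000/user/data/sales.csv", "gs://my-bucket/landing/"
    )
    print(shlex.join(cmd))
```

Print the command and run it from a shell where Hadoop is configured; on Windows the launcher script is `hadoop.cmd`.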
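The trigger step can be pictured as a small mapping from the Cloud Storage event payload to one row of an audit table. This is a sketch, not the course's code: it assumes the event's `data` is the Cloud Storage object resource (which carries `bucket`, `name`, and `timeCreated`), and the column names `bucket_name`, `file_name`, `event_time` and the table ID in the comment are hypothetical. The GCP-specific wiring is shown only as a comment so the mapping itself runs with the standard library.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def event_to_row(data: Dict[str, Any]) -> Dict[str, str]:
    """Map a GCS object-finalize event payload to one audit-table row."""
    created = data.get("timeCreated")
    if created:
        # GCS timestamps end in "Z"; swap in "+00:00" so fromisoformat
        # also accepts them on Python versions before 3.11.
        event_time = datetime.fromisoformat(created.replace("Z", "+00:00"))
    else:
        event_time = datetime.now(timezone.utc)
    return {
        "bucket_name": data["bucket"],
        "file_name": data["name"],
        "event_time": event_time.isoformat(),
    }


# Deployed usage (not executed here; needs functions-framework and
# google-cloud-bigquery; the table ID is a hypothetical placeholder):
#
#   import functions_framework
#   from google.cloud import bigquery
#
#   @functions_framework.cloud_event
#   def on_upload(cloud_event):
#       row = event_to_row(cloud_event.data)
#       errors = bigquery.Client().insert_rows_json("my-project.audit.file_events", [row])
#       if errors:
#           raise RuntimeError(errors)
```

Keeping the mapping separate from the handler makes it testable locally without deploying the function.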
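The Beam step (pick the latest file, then load it with every column as a string) can be sketched with a few pure-Python helpers. Assumptions, all hypothetical: the audit table has `bucket_name`, `file_name`, and an ISO-8601 `event_time` column; the CSV header is known up front; the helper names are illustrative. The Beam wiring appears only as a comment, since it needs `apache-beam[gcp]`.

```python
import csv
from datetime import datetime
from typing import Dict, Iterable, List


def latest_file_uri(audit_rows: Iterable[Dict[str, str]]) -> str:
    """Return gs://bucket/name for the most recent row of the audit table."""
    latest = max(audit_rows, key=lambda r: datetime.fromisoformat(r["event_time"]))
    return f"gs://{latest['bucket_name']}/{latest['file_name']}"


def string_schema(header: List[str]) -> str:
    """Schema string in "name:TYPE,..." form that declares every column STRING."""
    return ",".join(f"{col}:STRING" for col in header)


def parse_line(line: str, header: List[str]) -> Dict[str, str]:
    """Parse one CSV line into a dict, keeping every value as a string."""
    values = next(csv.reader([line]))
    if len(values) != len(header):
        raise ValueError(f"expected {len(header)} fields, got {len(values)}")
    return dict(zip(header, values))


# In a Beam pipeline (not executed here) the helpers would slot in roughly as:
#   uri = latest_file_uri(rows_queried_from_audit_table)
#   (p | beam.io.ReadFromText(uri, skip_header_lines=1)
#      | beam.Map(parse_line, header)
#      | beam.io.WriteToBigQuery(table, schema=string_schema(header)))
```

Declaring every column as `STRING` mirrors the course's choice to skip type conversion; casting can be added later in SQL once the data is in BigQuery.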
Overview
Section 1: Hadoop to GCP Migration - Part 1
Lecture 1 Hadoop to GCP Migration Using DistCp
Lecture 2 Capture GCS Events in BigQuery
Lecture 3 File from Bucket to BigQuery Table
Lecture 4 Overall flow
Lecture 5 Google Cloud Data Engineer Certification Exam Experience
Lecture 6 Thank you
Who this course is for: GCP aspirants