Google Cloud Data Engineer

Live Online (VILT) & Classroom Corporate Training Course

This GCP course covers structured, unstructured, and streaming data.

How can we help you?

  • CloudLabs
  • Projects
  • Assignments
  • 24x7 Support
  • Lifetime Access

Overview

This Data Engineering on Google Cloud Platform training course teaches attendees how to design data processing systems, build end-to-end data pipelines, analyze data, and carry out machine learning.


Objectives

By the end of this Google Cloud Data Engineer training course, participants will be able to:

  • Design and build data processing systems on Google Cloud
  • Process batch and streaming data by implementing autoscaling data pipelines on Cloud Dataflow
  • Derive business insights from extremely large datasets using Google BigQuery
  • Train, evaluate, and make predictions with machine learning models using TensorFlow and Cloud ML
  • Leverage unstructured data using Spark and ML APIs on Cloud Dataproc
  • Enable instant insights from streaming data

Prerequisites

  • Basic proficiency with a common query language such as SQL
  • Experience with data modeling and extract, transform, load (ETL) activities
  • Experience developing applications using a common programming language such as Python
  • Familiarity with machine learning and/or statistics

Course Outline

Leveraging Unstructured Data with Cloud Dataproc on Google Cloud Platform

  • Creating and managing clusters
  • Leveraging custom machine types and preemptible worker nodes
  • Scaling and deleting clusters

  • Running Pig and Hive jobs
  • Separation of storage and compute

  • Customizing clusters with initialization actions
  • BigQuery support

  • Google’s Machine Learning APIs
  • Common ML Use Cases
  • Invoking ML APIs
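Invoking one of these pre-trained APIs comes down to a single authenticated REST call. As a minimal sketch, the snippet below only builds the JSON body for the Cloud Vision `images:annotate` method with label detection; authentication and the HTTP request itself are deliberately left out, and the helper name `build_label_request` is hypothetical.

```python
import base64
import json

# REST endpoint of the Cloud Vision v1 batch annotate method (not called here).
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_label_request(image_bytes: bytes, max_results: int = 5) -> dict:
    """Hypothetical helper: JSON body asking Vision for label detection on one image."""
    return {
        "requests": [
            {
                # Inline image content is sent base64-encoded.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


body = build_label_request(b"raw image bytes would go here", max_results=3)
# In practice this body is POSTed to VISION_ENDPOINT with an OAuth token or API key.
print(json.dumps(body)[:120])
```

The same request shape (a list of `requests`, each with an input and a list of `features`) lets several images or detection types be batched into one call.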

Serverless Data Analysis with Google BigQuery and Cloud Dataflow

  • What is BigQuery
  • Queries and functions
  • Loading data into BigQuery
  • Exporting data from BigQuery
  • Nested and repeated fields
  • Querying multiple tables
  • Performance and pricing
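Nested and repeated fields let one BigQuery row carry an array of structs, which is flattened at query time with `UNNEST`. The plain-Python sketch below assumes a hypothetical `orders` table whose `items` column is a repeated record, and reproduces what the comma join (`CROSS JOIN`) with `UNNEST` returns.

```python
# Hypothetical rows from an `orders` table whose `items` column is a
# REPEATED RECORD (an array of structs); names are illustrative only.
orders = [
    {"order_id": 1, "items": [{"sku": "X1", "qty": 2}, {"sku": "Y7", "qty": 1}]},
    {"order_id": 2, "items": [{"sku": "X1", "qty": 5}]},
    {"order_id": 3, "items": []},
]

# Equivalent BigQuery Standard SQL:
#   SELECT o.order_id, item.sku, item.qty
#   FROM orders AS o, UNNEST(o.items) AS item
# The comma join behaves like a CROSS JOIN, so orders with an empty array vanish.


def unnest_items(rows):
    """Yield one flat row per element of each row's repeated `items` field."""
    for row in rows:
        for item in row["items"]:
            yield {"order_id": row["order_id"], "sku": item["sku"], "qty": item["qty"]}


flat = list(unnest_items(orders))
for r in flat:
    print(r)
```

To keep order 3 in the output with NULL item columns, the SQL would use `LEFT JOIN UNNEST(o.items) AS item` instead of the comma join.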

  • The Beam programming model
  • Data pipelines in Beam Python
  • Data pipelines in Beam Java
  • Scalable Big Data processing using Beam
  • Incorporating additional data
  • Handling stream data
  • GCP reference architecture
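The Beam model treats a pipeline as a chain of transforms, each turning one collection (a PCollection) into another. The toy below imitates a word-count pipeline in plain Python so it runs without an SDK; `flat_map`, `group_by_key`, and `combine_values` are hypothetical stand-ins, not the `apache_beam` API, and the comments only name the Beam transforms they loosely resemble.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Toy imitation of the Beam model: a "PCollection" is a plain list and every
# transform returns a new list. This is NOT the apache_beam API.


def flat_map(pcoll: Iterable, fn: Callable) -> List:
    """Like beam.FlatMap: each input element may emit zero or more outputs."""
    return [out for elem in pcoll for out in fn(elem)]


def group_by_key(pcoll: Iterable[Tuple]) -> List[Tuple]:
    """Like beam.GroupByKey: gather all values that share a key."""
    groups: Dict = defaultdict(list)
    for key, value in pcoll:
        groups[key].append(value)
    return sorted(groups.items())


def combine_values(pcoll: Iterable[Tuple], fn: Callable) -> List[Tuple]:
    """Like beam.CombineValues: reduce each key's grouped values with fn."""
    return [(key, fn(values)) for key, values in pcoll]


lines = ["the quick brown fox", "the lazy dog", "The end"]
words = flat_map(lines, lambda line: line.lower().split())
pairs = [(w, 1) for w in words]                  # roughly beam.Map(lambda w: (w, 1))
counts = combine_values(group_by_key(pairs), sum)
print(counts)
```

A runner such as Cloud Dataflow would execute each step in parallel across autoscaled workers; the toy runs everything sequentially in one process.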

Serverless Machine Learning with TensorFlow on Google Cloud Platform

  • What is machine learning (ML)
  • Effective ML: concepts, types
  • ML datasets: generalization
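Measuring generalization depends on evaluation data the model never trained on. One common technique (not necessarily the exact one used in the course labs) is a repeatable split that hashes a stable key, so every record for a given entity always lands in the same split. The sketch assumes a hypothetical `customer-N` key and an 80/10/10 split; it uses `hashlib` because Python's built-in `hash()` for strings is randomized per process.

```python
import hashlib
from typing import List


def split_bucket(key: str, buckets: int = 10) -> int:
    """Map a record key to a bucket in [0, buckets) that is stable across runs."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


def assign_split(key: str) -> str:
    """Roughly 80% train, 10% validation, 10% test, keyed on the entity ID."""
    b = split_bucket(key)
    if b < 8:
        return "train"
    if b == 8:
        return "validation"
    return "test"


keys: List[str] = [f"customer-{i}" for i in range(1000)]
splits = [assign_split(k) for k in keys]
print({s: splits.count(s) for s in ("train", "validation", "test")})
```

Because the split is a pure function of the key, rerunning the pipeline on refreshed data keeps existing customers in their original split.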

  • Getting started with TensorFlow
  • TensorFlow graphs and loops + lab
  • Monitoring ML training

  • Why Cloud ML?
  • Packaging up a TensorFlow model
  • End-to-end training

  • Creating good features
  • Transforming inputs
  • Synthetic features
  • Preprocessing with Cloud ML
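A synthetic feature is often built by bucketizing raw numeric inputs and crossing the buckets, so a linear model can learn one weight per combination rather than per input. The sketch below assumes hypothetical latitude and longitude boundaries; in older TensorFlow releases, `tf.feature_column.bucketized_column` and `tf.feature_column.crossed_column` filled this role.

```python
import bisect
from typing import List

# Hypothetical bucket boundaries for a pickup location (illustrative values).
LAT_BOUNDS: List[float] = [40.6, 40.7, 40.8]
LON_BOUNDS: List[float] = [-74.0, -73.9]


def bucketize(value: float, boundaries: List[float]) -> int:
    """Index of the bucket `value` falls into; boundaries must be sorted ascending."""
    return bisect.bisect_right(boundaries, value)


def lat_lon_cross(lat: float, lon: float) -> str:
    """Synthetic feature: cross the two bucketized coordinates into one grid cell."""
    return f"lat{bucketize(lat, LAT_BOUNDS)}_lon{bucketize(lon, LON_BOUNDS)}"


print(lat_lon_cross(40.75, -73.95))
```

Neither coordinate alone identifies a neighborhood, but the crossed grid cell does, which is why crosses tend to help linear models on location data.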

Building Resilient Streaming Systems on Google Cloud Platform

  • Stream data processing: Challenges
  • Handling variable data volumes
  • Dealing with unordered/late data

  • What is Cloud Pub/Sub?
  • How it works: Topics and Subscriptions
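Topics and subscriptions can be pictured with a small in-memory model: publishing to a topic fans a copy of the message out to every subscription, and each subscription tracks its own delivery and acknowledgement state. The `Topic` and `Subscription` classes below are hypothetical and are not the `google-cloud-pubsub` client; real Pub/Sub redelivers unacknowledged messages when the ack deadline expires, which `expire_ack_deadline()` only simulates.

```python
import itertools
from collections import deque
from typing import Dict, List, Tuple


class Subscription:
    """One subscription: its own backlog plus delivered-but-unacked messages."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._backlog: deque = deque()          # (msg_id, data) awaiting delivery
        self._outstanding: Dict[int, str] = {}  # delivered, not yet acknowledged

    def enqueue(self, msg_id: int, data: str) -> None:
        self._backlog.append((msg_id, data))

    def pull(self, max_messages: int = 10) -> List[Tuple[int, str]]:
        batch = []
        while self._backlog and len(batch) < max_messages:
            msg_id, data = self._backlog.popleft()
            self._outstanding[msg_id] = data
            batch.append((msg_id, data))
        return batch

    def ack(self, msg_id: int) -> None:
        self._outstanding.pop(msg_id, None)

    def expire_ack_deadline(self) -> None:
        """Simulate the ack deadline passing: unacked messages become deliverable again."""
        for msg_id, data in self._outstanding.items():
            self._backlog.append((msg_id, data))
        self._outstanding.clear()


class Topic:
    def __init__(self, name: str) -> None:
        self.name = name
        self._subscriptions: Dict[str, Subscription] = {}
        self._next_id = itertools.count(1)

    def subscribe(self, sub_name: str) -> Subscription:
        sub = Subscription(sub_name)
        self._subscriptions[sub_name] = sub
        return sub

    def publish(self, data: str) -> int:
        msg_id = next(self._next_id)
        for sub in self._subscriptions.values():  # fan-out: each subscription gets a copy
            sub.enqueue(msg_id, data)
        return msg_id


topic = Topic("sensor-readings")
billing = topic.subscribe("billing")
analytics = topic.subscribe("analytics")
topic.publish("reading-1")
topic.publish("reading-2")

billed = billing.pull()
for msg_id, _ in billed:
    billing.ack(msg_id)                 # billing acknowledges everything it received

first = analytics.pull(max_messages=1)  # analytics takes one message and never acks it
analytics.expire_ack_deadline()
redelivered = analytics.pull()
print(billed, first, redelivered)
```

The two subscriptions consume the same topic at different paces without interfering, which is what decouples publishers from independent downstream consumers.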

  • Challenges in stream processing
  • Handling late data: watermarks, triggers, accumulation
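Watermarks, triggers, and accumulation interact in a way that is easiest to see on a tiny stream. The toy below sums values in 60-second event-time windows. Several simplifying assumptions apply: the watermark is a heuristic trailing the largest event time seen by a fixed lag; the trigger fires an on-time pane when the watermark passes a window's end and then one late pane per late element; and data later than the allowed lateness is dropped. The `accumulate` flag mirrors Beam's accumulating versus discarding modes, but none of this is Dataflow's actual implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

WINDOW = 60            # fixed event-time window size, seconds (assumption)
WATERMARK_LAG = 10     # heuristic: watermark trails the max event time seen (assumption)
ALLOWED_LATENESS = 30  # how long after the window end late data is still accepted


def run(events: List[Tuple[int, int]], accumulate: bool) -> List[Tuple[int, str, int]]:
    """Sum values per fixed window; return panes as (window_start, kind, value)."""
    totals: Dict[int, int] = defaultdict(int)   # everything accepted so far, per window
    pending: Dict[int, int] = defaultdict(int)  # accepted since that window's last pane
    fired: Set[int] = set()                     # windows that have emitted a pane
    panes: List[Tuple[int, str, int]] = []
    watermark = float("-inf")

    for ts, value in events:                    # events arrive in processing-time order
        start = ts - ts % WINDOW
        end = start + WINDOW
        if watermark >= end + ALLOWED_LATENESS:
            continue                            # beyond allowed lateness: dropped
        totals[start] += value
        pending[start] += value
        if watermark >= end:                    # window already closed: late-data trigger
            panes.append((start, "late", totals[start] if accumulate else pending[start]))
            pending[start] = 0
            fired.add(start)
        watermark = max(watermark, ts - WATERMARK_LAG)
        for s in sorted(totals):                # on-time trigger: watermark passed window end
            if s not in fired and watermark >= s + WINDOW:
                panes.append((s, "on_time", totals[s] if accumulate else pending[s]))
                pending[s] = 0
                fired.add(s)
    return panes


stream = [(5, 1), (20, 2), (65, 3), (75, 4), (50, 5), (130, 6), (55, 7)]
print(run(stream, accumulate=True))
print(run(stream, accumulate=False))
```

With this stream, the element at t=50 arrives after the watermark has passed 60 and produces a late pane (8 when accumulating, 5 when discarding), while the element at t=55 arrives after the watermark reaches 120, past the allowed lateness, and is dropped.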

  • Streaming analytics: from data to decisions
  • Querying streaming data with BigQuery
  • What is Google Data Studio?

  • What is Cloud Spanner?
  • Designing Bigtable schema
  • Ingesting into Bigtable
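Bigtable stores rows sorted lexicographically by row key, so schema design is largely row-key design. The sketch below shows one common pattern under stated assumptions: a hypothetical sensor ID first (field promotion), then a zero-padded reversed millisecond timestamp, so the newest reading for each sensor sorts first. The dictionary only mimics sorted storage and prefix scans; it is not the Bigtable client library.

```python
from typing import Dict, List, Tuple

MAX_TS_MILLIS = 10**13               # assumption: millisecond timestamps stay below this
KEY_WIDTH = len(str(MAX_TS_MILLIS))  # zero-pad so string order matches numeric order


def row_key(sensor_id: str, ts_millis: int) -> str:
    """Sensor ID first, then a reversed, zero-padded timestamp.

    Leading with the sensor ID spreads writes across the key space, whereas a
    timestamp-first key would funnel all new writes to one node. Reversing the
    timestamp makes the newest reading for a sensor sort first.
    """
    return f"{sensor_id}#{MAX_TS_MILLIS - ts_millis:0{KEY_WIDTH}d}"


def prefix_scan(table: Dict[str, float], prefix: str) -> List[Tuple[str, float]]:
    """Mimic a Bigtable prefix scan: matching rows in lexicographic key order."""
    return [(k, table[k]) for k in sorted(table) if k.startswith(prefix)]


table: Dict[str, float] = {}
readings = [
    ("s2", 1_700_000_000_000, 20.5),
    ("s1", 1_700_000_060_000, 21.0),
    ("s1", 1_700_000_000_000, 20.8),
    ("s1", 1_700_000_120_000, 21.4),
]
for sensor, ts, temperature in readings:
    table[row_key(sensor, ts)] = temperature

# The "#" delimiter keeps the "s1#" prefix from also matching a sensor named "s10".
s1_rows = prefix_scan(table, "s1#")
print(s1_rows[0])
```

Reading the latest value for a sensor then becomes a short prefix scan that stops after the first row.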
