Chapter 14 of 26
Deploying Data Solutions: Cloud Storage, Cloud SQL, and BigQuery in Practice
Create buckets, databases, and datasets while configuring access, performance, and durability settings that show up in real exam tasks.
Module Overview and Core Scenario
What You Will Build
You will deploy Cloud Storage, Cloud SQL, and BigQuery and connect them in a simple, realistic data pipeline that mirrors common exam scenarios.
Scenario Background
A startup stores raw CSV logs, runs a transactional app on a relational DB, and lets analysts query historical data using SQL at scale.
Your Skills Focus
You will create buckets, databases, and datasets, configure access and durability, and move data between services using standard tools.
Exam Mindset
Think like an Associate Cloud Engineer: deploy and secure services, wire them together, and keep them reliable and cost efficient.
Cloud Storage Buckets: Locations, Classes, and Basics
Buckets and Objects
Cloud Storage stores objects (files) inside buckets. Bucket names are globally unique across Google Cloud.
Location Types
Choose Region for low-latency local access, Dual-region for two-region durability, Multi-region for broad, geo-distributed access.
Storage Classes
Standard for frequent access; Nearline, Coldline, and Archive for progressively colder, cheaper storage with higher access cost.
Creation Flow
In the console: Create bucket, set unique name, choose location and class, and usually select Uniform access for simpler IAM.
Common Exam Trap
Do not confuse location type (Region/Dual/Multi) with storage class (Standard/Nearline/Coldline/Archive); they solve different problems.
Hands-On: Create Buckets and Lifecycle Rules
Lifecycle Goal
You will create a hot logs bucket, an archive bucket, and a lifecycle rule that automatically deletes or transitions objects after 30 days.
Creating Buckets with gcloud
Use `gcloud storage buckets create` with `--location` and `--default-storage-class` to define where and how your data is stored.
Lifecycle JSON
Lifecycle rules are defined in JSON, with an action (Delete or SetStorageClass) and a condition such as object age in days.
Automation Benefit
Once configured, lifecycle management automatically optimizes storage cost without manual scripts or Cloud Functions.
Cloud SQL Fundamentals: Engines, Tiers, and Connectivity
Cloud SQL in One Sentence
Cloud SQL is a managed relational database for MySQL, PostgreSQL, and SQL Server, with backups, patching, and HA handled for you.
Engines and Resources
Pick MySQL, PostgreSQL, or SQL Server; then choose vCPU, RAM, and SSD or HDD storage according to performance needs.
Public vs Private IP
Public IP with authorized networks is easy but exposed; Private IP keeps traffic inside your VPC and is preferred for production.
Exam Security Trap
If an option suggests opening Cloud SQL to 0.0.0.0/0, it is almost always wrong; use private IP or tight authorized networks instead.
Hands-On: Create a Cloud SQL Instance with Backups and HA
Instance Requirements
You will create a PostgreSQL instance in us-central1 with private IP, SSD storage, regional HA, and automated backups plus PITR.
gcloud Creation Command
Use `gcloud sql instances create` with flags for database version, tier, region, network, no public IP, availability type, and backups.
High Availability
Regional availability creates a standby in another zone. If the primary zone fails, Cloud SQL automatically fails over.
Backups vs PITR
Automated backups provide daily snapshots; point-in-time recovery lets you restore to any time within the retention window.
BigQuery Basics: Datasets, Tables, and Querying
BigQuery Overview
BigQuery is a serverless data warehouse. You define datasets and tables and run SQL; Google manages the underlying infrastructure.
Datasets and Tables
A dataset is a regional container; tables inside hold rows and columns with a defined schema. IAM is often set at the dataset level.
Creating a Dataset
In BigQuery Studio, click Create dataset, choose an ID and region, and keep it aligned with your data location for performance and compliance.
Simple Aggregation Query
Use standard SQL to filter, group, and sort. For example, count pageviews per user and order by the count, limiting to the top 10.
Regional Consideration
BigQuery datasets are regional; loading from a bucket in a different continent can cause errors or cross-region data transfer.
Hands-On: Create a BigQuery Dataset and Load from Cloud Storage
This activity walks through creating a dataset and loading data from a Cloud Storage bucket using the `bq` command-line tool. This mirrors tasks you might see in labs or exam-style questions.
```bash
1. Create a BigQuery dataset in us-central1
bq --location=us-central1 mk \
--dataset my-project:analytics_logs
2. Create a table by loading a CSV file from Cloud Storage
Assume you previously exported or uploaded a file to gs://ace-logs-hot/webevents202605.csv
bq --location=us-central1 load \
--source_format=CSV \
--skipleadingrows=1 \
my-project:analyticslogs.webevents \
gs://ace-logs-hot/webevents202605.csv \
userid:STRING,eventtime:TIMESTAMP,page:STRING,event_date:DATE
3. Run a simple query from the command line
bq query --uselegacysql=false '
SELECT
page,
COUNT(*) AS views
FROM
`my-project.analyticslogs.webevents`
GROUP BY
page
ORDER BY
views DESC
LIMIT 5;'
```
Key flags to notice:
- `--location` must match the dataset region.
- `load` requires dataset.table, Cloud Storage URI, and schema.
- `--skipleadingrows=1` is common for CSVs with headers.
- `--uselegacysql=false` forces standard SQL (exam questions assume standard SQL).
If this were in the console, you would pick Create table, choose Cloud Storage as the source, paste the `gs://` URI, set file format to CSV, check Auto detect or define the schema manually, then click Create table.
Moving Data Between Cloud Storage, Cloud SQL, and BigQuery
Cloud SQL and Cloud Storage
Use gcloud sql export/import commands to move SQL dumps or CSVs between Cloud SQL and Cloud Storage for backup or migration.
Cloud Storage to BigQuery
Use load jobs to ingest structured files into native tables, or external tables to query data in Cloud Storage without loading.
Cloud SQL to BigQuery Pattern
Typical flow: export from Cloud SQL to Cloud Storage, then load into BigQuery. There is no direct native pull from Cloud SQL.
Permissions Reminder
The Cloud SQL instance’s service account needs write access to buckets for export and read access for import operations.
Design Exercise: Pick the Right Storage and Movement Pattern
Work through these short scenarios and decide which service and pattern fits best. Think like an exam question writer: what are they really testing?
- Scenario A: Your app in `us-central1` writes image uploads that must be kept for 7 years but are only accessed when a user views their profile. Cost is more important than latency.
- Which Cloud Storage location type and storage class would you choose and why?
- Would you use lifecycle rules? If yes, what would they do?
- Scenario B: A small e-commerce site uses a MySQL database for orders. They want:
- Automatic failover within `europe-west1`.
- No public database endpoint.
- Ability to restore to any point within the last 7 days.
- Which Cloud SQL settings do you enable? List at least three flags or console options.
- Scenario C: Analysts need to run monthly reports over 2 TB of log data. Logs arrive daily as CSV files in a `US` multi-region bucket. Queries can take minutes; that is acceptable.
- Do you load data into BigQuery tables or use an external table? Why?
- Sketch the steps to make the data queryable.
Pause and write down your answers. Then compare against the model answers in your head:
- Are you choosing regional vs multi-region for the right reasons (latency, redundancy, compliance)?
- Are you defaulting to private IP and HA for production databases when budget allows?
- Are you using Cloud Storage as the interchange layer between Cloud SQL and BigQuery?
Quiz 1: Cloud Storage and Cloud SQL Essentials
Answer this question to check your understanding of bucket configuration and Cloud SQL connectivity.
You are deploying a web app on GKE in us-central1 that writes logs to Cloud Storage and uses Cloud SQL for PostgreSQL. Which combination best follows Google Cloud best practices for performance and security?
- Create a multi-region bucket in EU with Archive class; create a Cloud SQL instance with public IP and authorized networks set to 0.0.0.0/0.
- Create a regional bucket in us-central1 with Standard class; create a Cloud SQL instance in us-central1 with private IP in the same VPC and regional availability.
- Create a dual-region bucket in asia-southeast1-asia-east1 with Nearline class; create a Cloud SQL instance in asia-southeast1 with public IP only.
- Create a regional bucket in us-west1 with Coldline class; create a Cloud SQL instance in us-central1 with zonal availability and public IP restricted to the GKE node IPs.
Show Answer
Answer: B) Create a regional bucket in us-central1 with Standard class; create a Cloud SQL instance in us-central1 with private IP in the same VPC and regional availability.
Option 2 aligns the bucket region (us-central1) with the GKE cluster for low-latency access and uses Standard class for active logs. It also deploys Cloud SQL in the same region with private IP and regional availability, improving security and high availability. Option 1 exposes the DB to the internet and uses a distant bucket. Option 3 misaligns regions and uses public IP only. Option 4 misaligns the bucket region and uses public IP instead of private IP.
Quiz 2: BigQuery and Data Movement Patterns
Test your understanding of BigQuery regions and data movement between services.
You need to analyze daily CSV exports from Cloud SQL that are stored in a bucket `gs://sales-exports` in region us-central1. You want analysts to run standard SQL queries over the last year of data. What is the most appropriate approach?
- Create a BigQuery dataset in europe-west1, create an external table pointing to the bucket, and query it directly.
- Create a BigQuery dataset in us-central1, load the CSV files into a native table using BigQuery load jobs, and query the table.
- Create a BigQuery dataset in US multi-region, copy the CSV files to a US multi-region bucket, and then create an external table.
- Connect BigQuery directly to Cloud SQL using a private IP connection and query the database live.
Show Answer
Answer: B) Create a BigQuery dataset in us-central1, load the CSV files into a native table using BigQuery load jobs, and query the table.
Option 2 keeps the dataset and bucket in the same region (us-central1) and uses load jobs to create a native table, which is efficient for repeated analysis. Option 1 misaligns regions. Option 3 adds unnecessary copying and a different location. Option 4 is not supported; BigQuery does not directly query Cloud SQL without an intermediate step like Cloud Storage and a load job or external table.
Key Term Flashcards
Flip through these cards to reinforce core terminology before you move on to mock exams and spaced review.
- Cloud Storage bucket location type
- Defines how broadly data is replicated: Region (single region), Dual-region (two specific regions), or Multi-region (multiple regions in a large geographic area). Independent from storage class.
- Cloud Storage storage class
- Determines cost and performance characteristics for objects: Standard for frequent access, Nearline for monthly, Coldline for quarterly, Archive for annual or less frequent access.
- Lifecycle rule (Cloud Storage)
- A JSON-defined policy attached to a bucket that automatically performs actions like Delete or SetStorageClass on objects when conditions (such as age in days) are met.
- Cloud SQL private IP
- Connectivity option where the Cloud SQL instance receives an internal IP address in a VPC, keeping traffic within Google Cloud’s private network and improving security.
- Cloud SQL availability-type=REGIONAL
- High availability mode that creates a standby instance in another zone within the same region, enabling automatic failover if the primary zone becomes unavailable.
- Point-in-time recovery (PITR)
- Cloud SQL feature that, when enabled with automated backups, lets you restore a database to any point within a retention window, not just to the time of the last full backup.
- BigQuery dataset
- A regional container for tables and views in BigQuery. It has a location (such as us-central1) and is a common boundary for access control and billing.
- BigQuery external table
- A table definition in BigQuery that references data stored outside BigQuery, typically in Cloud Storage, allowing you to query the data without fully loading it.
- Cloud SQL export to Cloud Storage
- An operation that writes a SQL dump or CSV files from a Cloud SQL instance to a Cloud Storage bucket, often used for backup, migration, or staging data for BigQuery.
- Service account (Google Cloud)
- A service account is a special kind of account used by an application or compute workload, not a person, to make authorized API calls and access Google Cloud resources.
- Identity and Access Management (IAM)
- Identity and Access Management (IAM) lets you manage access control by defining who (identity) has what access (role) for which resource.
Key Terms
- BigQuery
- Google Cloud’s serverless, highly scalable data warehouse that lets you store and analyze large datasets using standard SQL without managing infrastructure.
- Cloud SQL
- A fully managed relational database service for MySQL, PostgreSQL, and SQL Server that handles backups, patching, and high availability on Google Cloud.
- BigQuery table
- A structured collection of data in BigQuery with a defined schema of columns and data types, stored within a dataset.
- Service account
- A service account is a special kind of account used by an application or compute workload, not a person, to make authorized API calls and access Google Cloud resources.
- BigQuery dataset
- A logical, regional container within BigQuery that holds tables and views and is often used as a boundary for access control and organization.
- Cloud SQL Auth Proxy
- A tool and associated libraries that securely connect applications to Cloud SQL instances by handling authentication and encryption, without exposing the database directly.
- Cloud Storage bucket
- A globally named container for storing objects (files) in Cloud Storage, configured with a location type, location, storage class, and access controls.
- External table (BigQuery)
- A BigQuery table definition that references data stored outside BigQuery, such as files in Cloud Storage, allowing querying without loading the data.
- Point-in-time recovery (PITR)
- A Cloud SQL capability that lets you restore a database to any specific time within a configured retention window, assuming automated backups and binary log capture are enabled.
- Authorized networks (Cloud SQL)
- A list of IP address ranges that are allowed to connect to a Cloud SQL instance over its public IP address.
- Identity and Access Management (IAM)
- Identity and Access Management (IAM) lets you manage access control by defining who (identity) has what access (role) for which resource.
- Lifecycle management (Cloud Storage)
- A feature that automatically transitions or deletes objects in a bucket based on rules defined on conditions such as age, creation date, or storage class.