Chapter 14 of 27
Deploying Storage and Database Solutions: Cloud Storage, Cloud SQL, and BigQuery
Turn storage plans into real deployments; practice provisioning buckets, databases, and warehouses that support common application and analytics patterns.
Storage and Databases in the ACE Exam Context
Where this fits in ACE
Storage and database deployments are core to Deploying and implementing a cloud solution, and also matter for planning and security sections of the Associate Cloud Engineer exam.
What you must be able to do
You should be able to choose between Cloud SQL, BigQuery, Firestore, Spanner, and Bigtable, and deploy Cloud Storage buckets, Cloud SQL instances, and BigQuery datasets and tables.
Typical exam-style scenarios
Expect scenarios like: a web app needing a managed MySQL DB, batch jobs writing CSVs to Cloud Storage, and analysts querying data in BigQuery or exporting results back to Cloud Storage.
Your role mindset
Think like an Associate Cloud Engineer: turn requirements on performance, durability, and access patterns into secure, correctly configured Google Cloud resources.
Cloud Storage Buckets: Locations, Classes, and Access
What Cloud Storage is for
Cloud Storage holds objects: files, backups, logs, and static web assets. You configure buckets with a name, location, storage class, and access controls.
Locations and residency
Bucket locations are regions or multi-regions. They affect latency and data residency. Choose where your users are or where policies require data to stay.
Storage classes (exam detail)
The Cloud Storage classes are: Standard, Nearline, Coldline, Archive, Regional Persistent Disk. They differ mainly by availability and cost for storage vs. access.
Creating a bucket (console mental steps)
In the console: Storage → Buckets → Create. Pick a globally unique name, choose location and class, then configure access (uniform IAM or fine-grained ACLs).
Common Cloud Storage exam traps
Do not confuse bucket location with Network Service Tiers, remember bucket names are global, and avoid unintentionally making data public.
Example: Creating a Secure Bucket for App Uploads
Scenario requirements
Cloud Run API stores user profile pictures. Data must stay in the EU, is read frequently, only the app writes, support can read, and nothing should be public.
Design choices
Pick region `europe-west1`, Standard storage class, and uniform bucket-level access. This matches EU residency, low latency, and frequent access patterns.
Creating the bucket
In the console: Storage → Buckets → Create. Name `myproj-eu-profile-images`, choose Region `europe-west1`, Standard class, uniform access, and enable public access prevention.
Wiring permissions with IAM
Grant the Cloud Run service account an object write/read role on this bucket, and give support users `Storage Object Viewer`. Avoid any `allUsers` bindings.
Exam takeaways
Remember how service accounts access buckets, what IAM is, and how to prevent public access while still allowing app and support access.
Cloud SQL: Instances, Databases, Users, and Connectivity
What Cloud SQL provides
Cloud SQL is a managed MySQL, PostgreSQL, and SQL Server service. You manage instances, not OS or database binaries, and Google handles backups and patching.
Instance, database, user
Instance = server; database = logical container (like `appdb`); user = account used by apps or DBAs to connect and run SQL statements.
Creating an instance
In the console: SQL → Create instance → choose engine → configure CPU, RAM, storage, region, and HA; then set root password or IAM DB auth.
Secure connectivity
Prefer private IP when apps run in the same VPC. Use public IP only with authorized networks and secure methods like the Cloud SQL Auth Proxy.
Common Cloud SQL exam traps
Do not mix Cloud SQL with Spanner/Bigtable, and remember serverless services often need Serverless VPC Access to reach private IPs.
Example: Deploying Cloud SQL for a Cloud Run App
Scenario overview
Cloud Run in `us-central1` needs a MySQL database. Security policy forbids public database access, so you must design a private connection.
High-level design
Create a Cloud SQL MySQL instance in `us-central1` with private IP only, then let Cloud Run reach it through a Serverless VPC Access connector.
Creating networking pieces
First, create a Serverless VPC Access connector in the same region. Then create the Cloud SQL instance with private IP and no public IP.
Databases, users, and IAM
Inside the instance, create database `userdb` and user `appuser`. Grant the Cloud Run service account the `Cloud SQL Client` role.
Connecting from Cloud Run
In the Cloud Run service config, attach the VPC connector and specify the Cloud SQL instance connection. Pass DB credentials via env vars or secrets.
BigQuery Basics: Datasets, Tables, Partitioning, and Clustering
What BigQuery is
BigQuery is a serverless data warehouse for analytics. You think in terms of datasets and tables, not servers or disks, and you pay mainly for storage and query data scanned.
Datasets and tables
Datasets group tables and views and have a location. Tables hold rows and columns defined by a schema. The dataset location matters for where data physically resides.
Partitioning concept
Partitioning divides a table into segments, often by date or ingestion time. Queries that filter on the partition field scan fewer partitions and cost less.
Clustering concept
Clustering sorts data by one or more columns. When queries filter on those columns, BigQuery can skip large chunks of data, improving performance.
Common BigQuery exam traps
Remember dataset and Cloud Storage locations must be compatible for loads, and that partitioning reduces scanned data, not the raw stored size.
Loading Data into BigQuery and Exporting to Cloud Storage
BigQuery ↔ Cloud Storage direction 1
To load data into BigQuery, create a table with source set to Google Cloud Storage, point to a `gs://` URI, choose format, set schema, and choose write mode.
Location compatibility
The dataset and the Cloud Storage bucket must be in compatible locations (for example, both in the EU). Mismatched locations cause load jobs to fail.
Exporting results out
From query results, use Save results → Google Cloud Storage, pick a format like CSV or Parquet, and specify a `gs://` path for the export file.
Permissions needed
The user or service account must have both BigQuery permissions and Cloud Storage permissions to read source objects or write export files.
Key exam pitfalls
Watch for location mismatches and wrong write disposition; overwriting a table when you meant to append is a classic configuration mistake.
Example: Simple Analytics Pipeline (Cloud Storage → BigQuery → Cloud Storage)
Pipeline scenario
Nightly CSVs land in Cloud Storage, BigQuery stores and analyzes them, then daily summaries are exported back to Cloud Storage for another tool.
Resource layout
Use an ingestion bucket and export bucket in `us` multi-region, plus a `transactions` dataset in BigQuery US with a partitioned table `daily`.
Step 1: Load into BigQuery
A scheduled job loads `gs://myproj-tx-raw-us/YYYY-MM-DD.csv` into `transactions.daily` with write mode Append, leveraging the date partition column.
Step 2: Summarize and export
A scheduled query aggregates data. The results are exported as Parquet into `gs://myproj-tx-summary-us/summary-YYYY-MM-DD.parquet`.
Exam insights
See the pattern: Cloud Storage raw → BigQuery analytics → Cloud Storage export, with matching locations and correct IAM roles for service accounts.
Quiz 1: Cloud Storage and Cloud SQL
Test your understanding of bucket configuration and Cloud SQL connectivity.
Your company runs a Cloud Run service in europe-west1 that stores user documents and needs a relational database. Requirements: documents must stay in the EU, the database must not be publicly reachable, and the app should connect with minimal latency. Which combination is the BEST fit?
- Cloud Storage bucket in us-central1, Cloud SQL instance in us-central1 with public IP and authorized networks
- Cloud Storage bucket in europe-west1, Cloud SQL instance in europe-west1 with private IP and a Serverless VPC Access connector
- Cloud Storage bucket in europe-west1, Cloud SQL instance in us-central1 with public IP, no authorized networks
- Cloud Storage bucket in EU multi-region, Cloud SQL instance in europe-west1 with public IP and authorized networks
Show Answer
Answer: B) Cloud Storage bucket in europe-west1, Cloud SQL instance in europe-west1 with private IP and a Serverless VPC Access connector
Option 2 keeps both storage and database in europe-west1 for low latency and EU residency. The Cloud SQL instance uses private IP plus a Serverless VPC Access connector for secure, non-public connectivity from Cloud Run. Option 1 violates EU-only data and uses public IP. Option 3 crosses regions and exposes the DB. Option 4 uses a public IP, which the requirement explicitly forbids.
Quiz 2: BigQuery Loading and Partitioning
Check your understanding of BigQuery datasets, locations, and partitioning.
You need to load daily log files from a Cloud Storage bucket into a BigQuery table and minimize query costs when analysts filter by log date. Which design is MOST appropriate?
- Create an unpartitioned table; analysts can always add WHERE clauses on the date column.
- Create a table partitioned by ingestion time and store all logs in a single Cloud Storage bucket in us-central1, while the dataset is in EU.
- Create a table partitioned by a DATE column (log_date), ensure the dataset and Cloud Storage bucket are both in EU, and encourage analysts to filter on log_date.
- Create a clustered but not partitioned table on the log_date column; location of dataset and bucket does not matter.
Show Answer
Answer: C) Create a table partitioned by a DATE column (log_date), ensure the dataset and Cloud Storage bucket are both in EU, and encourage analysts to filter on log_date.
Option 3 uses partitioning on the DATE column analysts filter on, which reduces scanned data and cost. It also respects the requirement that dataset and Cloud Storage bucket locations must be compatible (both in EU). Option 1 does not control scan size. Option 2 mismatches locations. Option 4 clusters but does not partition and incorrectly ignores location compatibility.
Design Exercise: End-to-End Data Pipeline
Work through this thought exercise to connect Cloud Storage, Cloud SQL, and BigQuery.
Scenario:
- A mobile game writes event JSON files every hour to `gs://game-events-raw/`.
- The game backend uses Cloud SQL (PostgreSQL) for user profiles.
- Product managers want daily analytics in BigQuery combining user profile attributes (from Cloud SQL) with event data (from Cloud Storage).
Your tasks (think and optionally jot notes):
- Bucket configuration
- Choose a location and storage class for `game-events-raw` if most users are in North America.
- Decide whether you need uniform or fine-grained access.
- Cloud SQL export strategy
- How would you periodically move a snapshot of user profile data from Cloud SQL into BigQuery? Think about:
- Exporting from Cloud SQL to Cloud Storage (format, permissions).
- Loading that export into a BigQuery table.
- BigQuery schema and partitioning
- Sketch two tables: `events` and `users`.
- Which table should be partitioned, and on which field, to support daily dashboards?
- IAM and service accounts
- Identify at least two places where a service account is needed in this pipeline.
- For each, note what broad type of roles it needs (e.g., BigQuery read, Cloud Storage write).
After you think it through, compare your design mentally with the patterns from earlier steps: raw in Cloud Storage, analytics in BigQuery, secure exports from Cloud SQL, and careful IAM role assignment.
Key Term Review
Use these flashcards to reinforce core concepts before moving on.
- Cloud Storage bucket location
- The region or multi-region where bucket data is stored. It affects latency, data residency, and must be compatible with BigQuery dataset locations for load/export jobs.
- Cloud Storage classes (name all)
- The Cloud Storage classes are: Standard, Nearline, Coldline, Archive, Regional Persistent Disk.
- Identity and Access Management (IAM)
- Identity and Access Management (IAM) lets you manage access control by defining who (identity) has what access (role) for which resource.
- service account
- A service account is a special kind of account used by an application or compute workload, not a person, to make authorized API calls and access Google Cloud resources.
- Cloud SQL private IP vs public IP
- Private IP lets resources in the same VPC connect without traversing the public internet (often with Serverless VPC Access). Public IP is reachable over the internet and should be restricted using authorized networks and secure clients.
- BigQuery dataset
- A top-level container in BigQuery that holds tables and views and is bound to a specific location (region or multi-region).
- BigQuery table partitioning
- A way to split a table into segments, usually based on a date/time column or ingestion time, so queries can scan only relevant partitions and reduce cost.
- BigQuery table clustering
- A technique that organizes data within a table or partition based on one or more columns, improving performance when queries filter or aggregate on those columns.
- Typical pipeline pattern (Cloud Storage ↔ BigQuery)
- Raw data lands in Cloud Storage, is loaded into BigQuery for analysis, and results can be exported back to Cloud Storage for downstream systems.
- Associate Cloud Engineer role focus in this module
- An Associate Cloud Engineer deploys and secures applications, services, and infrastructure, monitors operations of multiple projects, and maintains enterprise solutions to ensure that they meet target performance metrics. In this module, this means correctly deploying Cloud Storage, Cloud SQL, and BigQuery resources and wiring them together securely.
Key Terms
- BigQuery
- A serverless, highly scalable data warehouse for running analytical SQL queries on large datasets, with storage organized into datasets and tables.
- Cloud SQL
- A managed relational database service for MySQL, PostgreSQL, and SQL Server, where Google handles backups, replication, and patching.
- Service account
- A special kind of account used by an application or compute workload, not a person, to make authorized API calls and access Google Cloud resources.
- Dataset (BigQuery)
- A logical container for tables and views in BigQuery, bound to a specific geographic location.
- Cloud Storage bucket
- A top-level container for objects (files) in Cloud Storage, configured with a globally unique name, location, storage class, and access controls.
- Serverless VPC Access
- A Google Cloud feature that lets serverless services like Cloud Run and Cloud Functions connect to resources in a VPC network using internal IPs.
- Clustered table (BigQuery)
- A BigQuery table whose data is organized based on one or more clustering columns to improve query performance when filtering or aggregating on those columns.
- Partitioned table (BigQuery)
- A BigQuery table that is divided into partitions based on a partitioning column or ingestion time, enabling queries to scan only relevant partitions.
- Authorized networks (Cloud SQL)
- A list of public IP address ranges that are allowed to connect to a Cloud SQL instance when using a public IP.
- Write disposition (BigQuery load job)
- A setting that controls how a load job writes to a table: write if empty, append to table, or overwrite table.