Identity and Access Management (IAM)

Identity and Access Management (IAM) lets you manage access control by defining who (identity) has what access (role) for which resource.

A service account is a special kind of account used by an application or compute workload, not a person, to make authorized API calls and access Google Cloud resources.

Cloud SQL private IP vs public IP

Private IP lets resources in the same VPC connect without traversing the public internet (often with Serverless VPC Access). Public IP is reachable over the internet and should be restricted using authorized networks and secure clients.

A serverless, highly scalable data warehouse for running analytical SQL queries on large datasets, with storage organized into datasets and tables.

A managed relational database service for MySQL, PostgreSQL, and SQL Server, where Google handles backups, replication, and patching.

What is Service account?

A special kind of account used by an application or compute workload, not a person, to make authorized API calls and access Google Cloud resources.

Deploying Storage and Database Solutions: Cloud Storage, Cloud SQL, and BigQuery — Google Cloud Associate Cloud Engineer: Complete Exam-Ready Masterclass

Storage and Databases in the ACE Exam Context

Where this fits in ACE

Storage and database deployments are core to Deploying and implementing a cloud solution, and also matter for planning and security sections of the Associate Cloud Engineer exam.

What you must be able to do

You should be able to choose between Cloud SQL, BigQuery, Firestore, Spanner, and Bigtable, and deploy Cloud Storage buckets, Cloud SQL instances, and BigQuery datasets and tables.

Typical exam-style scenarios

Expect scenarios like: a web app needing a managed MySQL DB, batch jobs writing CSVs to Cloud Storage, and analysts querying data in BigQuery or exporting results back to Cloud Storage.

Your role mindset

Think like an Associate Cloud Engineer: turn requirements on performance, durability, and access patterns into secure, correctly configured Google Cloud resources.

Cloud Storage Buckets: Locations, Classes, and Access

What Cloud Storage is for

Cloud Storage holds objects: files, backups, logs, and static web assets. You configure buckets with a name, location, storage class, and access controls.

Locations and residency

Bucket locations are regions or multi-regions. They affect latency and data residency. Choose where your users are or where policies require data to stay.

Storage classes (exam detail)

The Cloud Storage classes are: Standard, Nearline, Coldline, Archive, Regional Persistent Disk. They differ mainly by availability and cost for storage vs. access.

Creating a bucket (console mental steps)

In the console: Storage → Buckets → Create. Pick a globally unique name, choose location and class, then configure access (uniform IAM or fine-grained ACLs).

Common Cloud Storage exam traps

Do not confuse bucket location with Network Service Tiers, remember bucket names are global, and avoid unintentionally making data public.

Example: Creating a Secure Bucket for App Uploads

Scenario requirements

Cloud Run API stores user profile pictures. Data must stay in the EU, is read frequently, only the app writes, support can read, and nothing should be public.

Design choices

Pick region `europe-west1`, Standard storage class, and uniform bucket-level access. This matches EU residency, low latency, and frequent access patterns.

Creating the bucket

In the console: Storage → Buckets → Create. Name `myproj-eu-profile-images`, choose Region `europe-west1`, Standard class, uniform access, and enable public access prevention.

Wiring permissions with IAM

Grant the Cloud Run service account an object write/read role on this bucket, and give support users `Storage Object Viewer`. Avoid any `allUsers` bindings.

Exam takeaways

Remember how service accounts access buckets, what IAM is, and how to prevent public access while still allowing app and support access.

Cloud SQL: Instances, Databases, Users, and Connectivity

What Cloud SQL provides

Cloud SQL is a managed MySQL, PostgreSQL, and SQL Server service. You manage instances, not OS or database binaries, and Google handles backups and patching.

Instance, database, user

Instance = server; database = logical container (like `appdb`); user = account used by apps or DBAs to connect and run SQL statements.

Creating an instance

In the console: SQL → Create instance → choose engine → configure CPU, RAM, storage, region, and HA; then set root password or IAM DB auth.

Secure connectivity

Prefer private IP when apps run in the same VPC. Use public IP only with authorized networks and secure methods like the Cloud SQL Auth Proxy.

Common Cloud SQL exam traps

Do not mix Cloud SQL with Spanner/Bigtable, and remember serverless services often need Serverless VPC Access to reach private IPs.

Example: Deploying Cloud SQL for a Cloud Run App

Scenario overview

Cloud Run in `us-central1` needs a MySQL database. Security policy forbids public database access, so you must design a private connection.

High-level design

Create a Cloud SQL MySQL instance in `us-central1` with private IP only, then let Cloud Run reach it through a Serverless VPC Access connector.

Creating networking pieces

First, create a Serverless VPC Access connector in the same region. Then create the Cloud SQL instance with private IP and no public IP.

Databases, users, and IAM

Inside the instance, create database `userdb` and user `appuser`. Grant the Cloud Run service account the `Cloud SQL Client` role.

Connecting from Cloud Run

In the Cloud Run service config, attach the VPC connector and specify the Cloud SQL instance connection. Pass DB credentials via env vars or secrets.

BigQuery Basics: Datasets, Tables, Partitioning, and Clustering

What BigQuery is

BigQuery is a serverless data warehouse for analytics. You think in terms of datasets and tables, not servers or disks, and you pay mainly for storage and query data scanned.

Datasets and tables

Datasets group tables and views and have a location. Tables hold rows and columns defined by a schema. The dataset location matters for where data physically resides.

Partitioning concept

Partitioning divides a table into segments, often by date or ingestion time. Queries that filter on the partition field scan fewer partitions and cost less.

Clustering concept

Clustering sorts data by one or more columns. When queries filter on those columns, BigQuery can skip large chunks of data, improving performance.

Common BigQuery exam traps

Remember dataset and Cloud Storage locations must be compatible for loads, and that partitioning reduces scanned data, not the raw stored size.

Loading Data into BigQuery and Exporting to Cloud Storage

BigQuery ↔ Cloud Storage direction 1

To load data into BigQuery, create a table with source set to Google Cloud Storage, point to a `gs://` URI, choose format, set schema, and choose write mode.

Location compatibility

The dataset and the Cloud Storage bucket must be in compatible locations (for example, both in the EU). Mismatched locations cause load jobs to fail.

Exporting results out

From query results, use Save results → Google Cloud Storage, pick a format like CSV or Parquet, and specify a `gs://` path for the export file.

Permissions needed

The user or service account must have both BigQuery permissions and Cloud Storage permissions to read source objects or write export files.

Key exam pitfalls

Watch for location mismatches and wrong write disposition; overwriting a table when you meant to append is a classic configuration mistake.

Example: Simple Analytics Pipeline (Cloud Storage → BigQuery → Cloud Storage)

Pipeline scenario

Nightly CSVs land in Cloud Storage, BigQuery stores and analyzes them, then daily summaries are exported back to Cloud Storage for another tool.

Resource layout

Use an ingestion bucket and export bucket in `us` multi-region, plus a `transactions` dataset in BigQuery US with a partitioned table `daily`.

Step 1: Load into BigQuery

A scheduled job loads `gs://myproj-tx-raw-us/YYYY-MM-DD.csv` into `transactions.daily` with write mode Append, leveraging the date partition column.

Step 2: Summarize and export

A scheduled query aggregates data. The results are exported as Parquet into `gs://myproj-tx-summary-us/summary-YYYY-MM-DD.parquet`.

Exam insights

See the pattern: Cloud Storage raw → BigQuery analytics → Cloud Storage export, with matching locations and correct IAM roles for service accounts.

Quiz 1: Cloud Storage and Cloud SQL

Test your understanding of bucket configuration and Cloud SQL connectivity.

Your company runs a Cloud Run service in europe-west1 that stores user documents and needs a relational database. Requirements: documents must stay in the EU, the database must not be publicly reachable, and the app should connect with minimal latency. Which combination is the BEST fit?

Cloud Storage bucket in us-central1, Cloud SQL instance in us-central1 with public IP and authorized networks
Cloud Storage bucket in europe-west1, Cloud SQL instance in europe-west1 with private IP and a Serverless VPC Access connector
Cloud Storage bucket in europe-west1, Cloud SQL instance in us-central1 with public IP, no authorized networks
Cloud Storage bucket in EU multi-region, Cloud SQL instance in europe-west1 with public IP and authorized networks

Show Answer

Answer: B) Cloud Storage bucket in europe-west1, Cloud SQL instance in europe-west1 with private IP and a Serverless VPC Access connector

Option 2 keeps both storage and database in europe-west1 for low latency and EU residency. The Cloud SQL instance uses private IP plus a Serverless VPC Access connector for secure, non-public connectivity from Cloud Run. Option 1 violates EU-only data and uses public IP. Option 3 crosses regions and exposes the DB. Option 4 uses a public IP, which the requirement explicitly forbids.

Quiz 2: BigQuery Loading and Partitioning

Check your understanding of BigQuery datasets, locations, and partitioning.

You need to load daily log files from a Cloud Storage bucket into a BigQuery table and minimize query costs when analysts filter by log date. Which design is MOST appropriate?

Create an unpartitioned table; analysts can always add WHERE clauses on the date column.
Create a table partitioned by ingestion time and store all logs in a single Cloud Storage bucket in us-central1, while the dataset is in EU.
Create a table partitioned by a DATE column (log_date), ensure the dataset and Cloud Storage bucket are both in EU, and encourage analysts to filter on log_date.
Create a clustered but not partitioned table on the log_date column; location of dataset and bucket does not matter.

Show Answer

Answer: C) Create a table partitioned by a DATE column (log_date), ensure the dataset and Cloud Storage bucket are both in EU, and encourage analysts to filter on log_date.

Option 3 uses partitioning on the DATE column analysts filter on, which reduces scanned data and cost. It also respects the requirement that dataset and Cloud Storage bucket locations must be compatible (both in EU). Option 1 does not control scan size. Option 2 mismatches locations. Option 4 clusters but does not partition and incorrectly ignores location compatibility.

Design Exercise: End-to-End Data Pipeline

Work through this thought exercise to connect Cloud Storage, Cloud SQL, and BigQuery.

Scenario:

A mobile game writes event JSON files every hour to `gs://game-events-raw/`.
The game backend uses Cloud SQL (PostgreSQL) for user profiles.
Product managers want daily analytics in BigQuery combining user profile attributes (from Cloud SQL) with event data (from Cloud Storage).

Your tasks (think and optionally jot notes):

Bucket configuration

Choose a location and storage class for `game-events-raw` if most users are in North America.
Decide whether you need uniform or fine-grained access.

Cloud SQL export strategy

How would you periodically move a snapshot of user profile data from Cloud SQL into BigQuery? Think about:
Exporting from Cloud SQL to Cloud Storage (format, permissions).
Loading that export into a BigQuery table.

BigQuery schema and partitioning

Sketch two tables: `events` and `users`.
Which table should be partitioned, and on which field, to support daily dashboards?

IAM and service accounts

Identify at least two places where a service account is needed in this pipeline.
For each, note what broad type of roles it needs (e.g., BigQuery read, Cloud Storage write).

After you think it through, compare your design mentally with the patterns from earlier steps: raw in Cloud Storage, analytics in BigQuery, secure exports from Cloud SQL, and careful IAM role assignment.

Key Term Review

Use these flashcards to reinforce core concepts before moving on.

Cloud Storage bucket location: The region or multi-region where bucket data is stored. It affects latency, data residency, and must be compatible with BigQuery dataset locations for load/export jobs.
Cloud Storage classes (name all): The Cloud Storage classes are: Standard, Nearline, Coldline, Archive, Regional Persistent Disk.
Identity and Access Management (IAM): Identity and Access Management (IAM) lets you manage access control by defining who (identity) has what access (role) for which resource.
service account: A service account is a special kind of account used by an application or compute workload, not a person, to make authorized API calls and access Google Cloud resources.
Cloud SQL private IP vs public IP: Private IP lets resources in the same VPC connect without traversing the public internet (often with Serverless VPC Access). Public IP is reachable over the internet and should be restricted using authorized networks and secure clients.
BigQuery dataset: A top-level container in BigQuery that holds tables and views and is bound to a specific location (region or multi-region).
BigQuery table partitioning: A way to split a table into segments, usually based on a date/time column or ingestion time, so queries can scan only relevant partitions and reduce cost.
BigQuery table clustering: A technique that organizes data within a table or partition based on one or more columns, improving performance when queries filter or aggregate on those columns.
Typical pipeline pattern (Cloud Storage ↔ BigQuery): Raw data lands in Cloud Storage, is loaded into BigQuery for analysis, and results can be exported back to Cloud Storage for downstream systems.
Associate Cloud Engineer role focus in this module: An Associate Cloud Engineer deploys and secures applications, services, and infrastructure, monitors operations of multiple projects, and maintains enterprise solutions to ensure that they meet target performance metrics. In this module, this means correctly deploying Cloud Storage, Cloud SQL, and BigQuery resources and wiring them together securely.

Key Terms

BigQuery: A serverless, highly scalable data warehouse for running analytical SQL queries on large datasets, with storage organized into datasets and tables.
Cloud SQL: A managed relational database service for MySQL, PostgreSQL, and SQL Server, where Google handles backups, replication, and patching.
Service account: A special kind of account used by an application or compute workload, not a person, to make authorized API calls and access Google Cloud resources.
Dataset (BigQuery): A logical container for tables and views in BigQuery, bound to a specific geographic location.
Cloud Storage bucket: A top-level container for objects (files) in Cloud Storage, configured with a globally unique name, location, storage class, and access controls.
Serverless VPC Access: A Google Cloud feature that lets serverless services like Cloud Run and Cloud Functions connect to resources in a VPC network using internal IPs.
Clustered table (BigQuery): A BigQuery table whose data is organized based on one or more clustering columns to improve query performance when filtering or aggregating on those columns.
Partitioned table (BigQuery): A BigQuery table that is divided into partitions based on a partitioning column or ingestion time, enabling queries to scan only relevant partitions.
Authorized networks (Cloud SQL): A list of public IP address ranges that are allowed to connect to a Cloud SQL instance when using a public IP.
Write disposition (BigQuery load job): A setting that controls how a load job writes to a table: write if empty, append to table, or overwrite table.

Storage and Databases in the ACE Exam Context

Where this fits in ACE

What you must be able to do

Typical exam-style scenarios

Your role mindset

Cloud Storage Buckets: Locations, Classes, and Access

What Cloud Storage is for

Locations and residency

Storage classes (exam detail)

Creating a bucket (console mental steps)

Common Cloud Storage exam traps

Example: Creating a Secure Bucket for App Uploads

Scenario requirements

Design choices

Creating the bucket

Wiring permissions with IAM

Exam takeaways

Cloud SQL: Instances, Databases, Users, and Connectivity

What Cloud SQL provides

Instance, database, user

Creating an instance

Secure connectivity

Common Cloud SQL exam traps

Example: Deploying Cloud SQL for a Cloud Run App

Scenario overview

High-level design

Creating networking pieces

Databases, users, and IAM

Connecting from Cloud Run

BigQuery Basics: Datasets, Tables, Partitioning, and Clustering

What BigQuery is

Datasets and tables

Partitioning concept

Clustering concept

Common BigQuery exam traps

Loading Data into BigQuery and Exporting to Cloud Storage

BigQuery ↔ Cloud Storage direction 1

Location compatibility

Exporting results out

Permissions needed

Key exam pitfalls

Example: Simple Analytics Pipeline (Cloud Storage → BigQuery → Cloud Storage)

Pipeline scenario

Resource layout

Step 1: Load into BigQuery

Step 2: Summarize and export

Exam insights

Quiz 1: Cloud Storage and Cloud SQL

Quiz 2: BigQuery Loading and Partitioning

Design Exercise: End-to-End Data Pipeline

Key Term Review

Key Terms

Finished reading?