SkarpSkarp

Chapter 19 of 27

Operating Storage and Databases: Cloud Storage, Cloud SQL, and BigQuery

Data services must be monitored, tuned, and protected; develop the skills to keep storage and database workloads reliable and cost-effective.

27 min readen

Cloud Storage Operations: Buckets, Classes, and Lifecycle

Cloud Storage in Context

Cloud Storage is Google Cloud's general-purpose object store. It often feeds databases like BigQuery or Cloud SQL but is not itself a database. You manage buckets, objects, and policies over time.

Data Product Landscape

Know the core data storage product choices: Cloud SQL, BigQuery, Firestore, Spanner, Bigtable. Cloud Storage sits alongside these as object storage used by many services.

Buckets and Locations

A bucket is the top-level container for objects. Choose a regional, dual-region, or multi-region location. Keep buckets close to compute to reduce latency and cross-region egress cost.

Storage Classes

Remember the canonical list: Standard, Nearline, Coldline, Archive, Regional Persistent Disk. Standard is for frequent access; Nearline, Coldline, and Archive trade lower storage cost for higher access cost.

Lifecycle Management

Lifecycle rules automatically transition or delete objects based on conditions like age or storage class. This is key for long-term cost control and cleanup of old or noncurrent objects.

Cloud Storage Versioning, Retention, and Access Troubleshooting

Object Versioning

With versioning enabled, overwrites and deletes create noncurrent versions instead of losing data. Use it to protect against accidental changes, then clean up older versions via lifecycle rules.

Retention Policies

Retention policies enforce a minimum age before objects can be deleted or overwritten. Once locked, a bucket-level retention policy cannot be shortened or removed, which is key for compliance.

Object Holds

Temporary and event-based holds block deletion even after the retention period. They are used for legal or audit reasons when specific objects must not be removed yet.

IAM vs ACLs

IAM is the recommended way to manage Cloud Storage access. Legacy ACLs still exist but are discouraged. Check IAM roles, the correct project, and uniform bucket-level access when debugging 403 errors.

Public Access and Signed URLs

For public content, you can grant `allUsers` read access or use signed URLs for time-limited access. Exams often favor signed URLs as a more secure, controlled approach.

Cloud Storage Operations Scenario: Cost and Availability

Workload Overview

You store daily-accessed photos, infrequently-read analytics exports, and 7-year regulatory logs. One project must balance cost, durability, and manageability for all three data types.

Locations and Classes

App in `europe-west1`: photos in a Standard bucket in `europe-west1`; analytics in Nearline or Coldline; regulatory logs in an EU dual-region Archive bucket for extra resilience.

Lifecycle Strategy

Use lifecycle rules: move photos to Nearline after 365 days, delete analytics older than 3 years, and keep logs untouched until retention requirements are met.

Versioning and Retention

Enable versioning on photos for safety and clean up old versions after 90 days. Apply and lock a 7-year retention policy on logs to enforce compliance automatically.

403 Troubleshooting

If a VM gets 403 errors on analytics, check the VM's service account IAM role, project alignment, and whether uniform bucket-level access is enabled instead of legacy ACLs.

Cloud SQL Backups, Maintenance, and High Availability

Cloud SQL Basics

Cloud SQL is a managed relational database for MySQL, PostgreSQL, and SQL Server. You focus on configuring backups, maintenance, and high availability rather than managing OS or database binaries.

Automated and On-Demand Backups

Automated backups run daily in a chosen window with a retention period. On-demand backups are manual snapshots before risky changes. Together they support recovery from data loss.

Maintenance Windows

Google applies patches and updates during maintenance windows. You choose the day and hour to minimize impact. Expect brief downtime or failovers during this period.

High Availability

HA uses a primary instance and a standby in another zone. Cloud SQL replicates data and automatically promotes the standby on failure, keeping the same connection name for clients.

Storage and Read Scaling

Choose SSD or HDD and enable automatic storage increase to avoid disk-full outages. Use read replicas to offload read-heavy workloads and improve performance.

Cloud SQL Scenario: Configuring Backups and HA

E-commerce Requirements

Production PostgreSQL Cloud SQL needs RPO 5 minutes, RTO a few minutes, and no maintenance during 09:00–21:00. You must design backups, HA, and maintenance around these goals.

Backups and PITR Setup

Enable automated backups with 14-day retention and configure point-in-time recovery. Schedule backups at 02:00 local time and monitor that they complete successfully.

Enabling High Availability

Create the instance with HA: primary in one zone, standby in another. Clients use the same connection name, so failover is transparent aside from brief connection drops.

Maintenance Window Choice

Set the maintenance window to Sunday 03:00–05:00 when traffic is lowest. Inform stakeholders so they expect brief disruptions during that time.

Backup Misconfiguration Lesson

Disabling automated backups to save money can leave you with no recovery path after corruption. Exams often highlight this trade-off: always keep backups for production.

BigQuery Monitoring, Slots, and Query Optimization Basics

BigQuery Operations Focus

BigQuery is serverless, so you do not manage servers. Instead, you monitor storage and query costs, analyze query performance, and adjust schemas and queries to be efficient.

Pricing and Slots

BigQuery charges for storage and query processing. On-demand pricing bills per TB scanned. Reservations use slots (dedicated capacity). For the exam, understand on-demand and recognize slots exist.

Monitoring Usage

Use the BigQuery console for query history and bytes processed. Cloud Monitoring and audit logs help you track who ran which queries and which ones are most expensive.

Partitioning and Clustering

Partition tables by date or ingestion time, and cluster by frequently filtered or joined columns. Filtering on the partition column reduces scanned data and cost.

Query Optimization Habits

Avoid `SELECT *`, filter early on partitioned columns, and avoid unnecessary cross-joins. Use approximate aggregate functions when exact precision is not needed.

BigQuery Scenario: Identifying and Fixing Expensive Queries

Cost Spike Problem

BigQuery costs doubled in a month. Query history shows daily jobs scanning 5 TB each over a 2-year `pageviewsraw` table with `SELECT *` and filters on `eventdate` and `country`.

Table Analysis

`pageviewsraw` is unpartitioned, so every query scans the full table. Columns include `eventdate`, `country`, and many others that are not always needed.

New Table Design

Create a new table partitioned by `eventdate` and clustered by `country` and `userid`. Backfill historical data into this optimized structure.

Query Rewrite

Update queries to read from the new table, filter on `event_date`, and select only required columns instead of `SELECT *`. This reduces scanned data dramatically.

Measuring and Exam Insight

After changes, bytes processed drop and latency improves. Exams expect partitioning and query optimization, not adding servers, since BigQuery is serverless.

Using Logs and Metrics to Troubleshoot Storage and Databases

Cloud Storage Logs and Errors

Use Cloud Logging to inspect access and error logs. 403 errors usually mean IAM or identity issues; 404 errors often indicate wrong bucket, object path, or location.

Cloud SQL Logs and Metrics

Cloud SQL logs help you see connection failures and slow queries. Metrics like CPU, memory, disk usage, and connection count highlight performance or capacity problems.

BigQuery Jobs and Metrics

BigQuery job logs show query text, bytes processed, and errors. Metrics like bytes processed per project or user help you spot cost spikes or inefficient queries.

Role of Service Accounts

A service account is used by applications and workloads to access Google Cloud. Many access issues come from the wrong service account or missing IAM roles.

Exam-Oriented Troubleshooting

Exam scenarios often give symptoms and ask which logs or metrics to inspect. Think systematically: identity and IAM first, then network, then performance metrics.

Thought Exercise: Choosing the Right Data Service

Use this mental exercise to connect workloads to the right Google Cloud data products and operational focus.

You have four workloads:

  1. A mobile game that needs to store player profiles (JSON documents), read and write frequently with low latency.
  2. A financial reporting system that runs complex SQL joins and aggregations over 5 years of transactional data.
  3. A time-series sensor ingestion pipeline from IoT devices, with millions of writes per second and occasional analytic queries.
  4. A video-sharing app storing raw video files and thumbnails.

Tasks:

  1. For each workload, pick the most appropriate product from the data storage product choices: Cloud SQL, BigQuery, Firestore, Spanner, Bigtable. (Cloud Storage is also available for raw objects.)
  2. For each choice, briefly note one operational responsibility you would focus on as an Associate Cloud Engineer.

Sample reasoning (do not peek until you try):

  • Workload 1: Firestore, focus on monitoring read/write throughput and security rules.
  • Workload 2: BigQuery, focus on partitioning, clustering, and query cost monitoring.
  • Workload 3: Bigtable or Spanner depending on consistency and query patterns; focus on capacity planning and performance metrics.
  • Workload 4: Cloud Storage for video objects; focus on storage classes, lifecycle rules, and access control.

Now, think about how Cloud SQL, Cloud Storage, and BigQuery specifically would appear in exam scenarios for each workload. Which would be primary, and which might be supporting (for example, Cloud Storage as a staging area for BigQuery)?

Quiz: Cloud Storage and Cloud SQL Operations

Answer this question to check your understanding of Cloud Storage lifecycle and Cloud SQL backups.

Your team stores daily CSV exports in a Cloud Storage bucket. Data is heavily used for 30 days, sometimes queried for up to 6 months, and rarely after that. Compliance requires that data must be kept for at least 2 years. You also run a production Cloud SQL instance. Which combination of settings best balances cost, availability, and protection?

  1. Cloud Storage: Standard class only, no lifecycle rules. Cloud SQL: disable automated backups to save cost.
  2. Cloud Storage: Standard for 30 days, then lifecycle to Nearline until 6 months, then to Archive until 2 years with a 2-year retention policy. Cloud SQL: enable automated backups and point-in-time recovery.
  3. Cloud Storage: Archive class from day 1 with no retention policy. Cloud SQL: rely only on HA without backups.
  4. Cloud Storage: Coldline class from day 1 and delete objects after 6 months. Cloud SQL: automated backups disabled, manual exports once a year.
Show Answer

Answer: B) Cloud Storage: Standard for 30 days, then lifecycle to Nearline until 6 months, then to Archive until 2 years with a 2-year retention policy. Cloud SQL: enable automated backups and point-in-time recovery.

Option 2 is correct. Standard then Nearline then Archive matches access patterns while meeting the 2-year retention requirement. A retention policy prevents early deletion. For Cloud SQL, enabling automated backups and point-in-time recovery protects production data. The other options either ignore compliance, make recent data too hard to access, or dangerously disable backups.

Quiz: BigQuery Optimization and Troubleshooting

Check your understanding of BigQuery monitoring and optimization.

A daily BigQuery job that filters on a `created_date` column and joins several tables has suddenly become very expensive and slow. Which action is the BEST first step for you as an Associate Cloud Engineer?

  1. Increase the number of BigQuery slots by creating more Compute Engine instances.
  2. Check the BigQuery job history to see bytes processed and verify that the main table is partitioned on `created_date` and that the query filters on this column.
  3. Move the data from BigQuery to Cloud SQL and rerun the query there.
  4. Disable audit logging for BigQuery to reduce overhead and speed up queries.
Show Answer

Answer: B) Check the BigQuery job history to see bytes processed and verify that the main table is partitioned on `created_date` and that the query filters on this column.

Option 2 is correct. The first step is to inspect job history and ensure the table is partitioned on `created_date` and that the query uses that column in filters. This often explains cost spikes. Option 1 misunderstands slots and serverless execution. Option 3 moves the problem to a less suitable system. Option 4 removes visibility without fixing performance.

Key Term Review: Storage and Databases Operations

Flip through these cards to reinforce key concepts from this module.

Cloud Storage lifecycle rule
A configuration on a bucket that automatically transitions objects between storage classes or deletes them based on conditions such as object age, storage class, or version state, helping manage cost and retention.
Object versioning in Cloud Storage
A feature that keeps multiple generations of an object when it is overwritten or deleted, allowing recovery of previous versions at the cost of additional storage usage.
Retention policy in Cloud Storage
A bucket-level setting that enforces a minimum time before objects can be deleted or overwritten. Once locked, it cannot be shortened or removed, supporting compliance requirements.
Automated backups in Cloud SQL
Daily backups taken automatically during a configured window, stored for a defined retention period, and used for restoring instances or enabling point-in-time recovery.
High availability (HA) in Cloud SQL
A configuration where a primary instance replicates synchronously to a standby in another zone within the same region, enabling automatic failover on primary failure.
BigQuery partitioning
A table design technique that divides data into segments, typically by date or ingestion time, so queries that filter on the partition column scan fewer bytes and cost less.
BigQuery clustering
Organizing data within a table or partition based on one or more columns, improving performance for queries that filter or join on those clustered columns.
Service account (definition)
A service account is a special kind of account used by an application or compute workload, not a person, to make authorized API calls and access Google Cloud resources.
IAM (Identity and Access Management) (definition)
Identity and Access Management (IAM) lets you manage access control by defining who (identity) has what access (role) for which resource.
BigQuery cost drivers
The main drivers are storage (bytes stored) and query processing (bytes scanned). Inefficient queries, like `SELECT *` on large unpartitioned tables, significantly increase cost.

Key Terms

IAM
Identity and Access Management (IAM) lets you manage access control by defining who (identity) has what access (role) for which resource.
BigQuery
Google Cloud's serverless, highly scalable data warehouse designed for analytical SQL queries over large datasets.
Cloud SQL
A fully managed relational database service for MySQL, PostgreSQL, and SQL Server, where Google manages maintenance, backups, and availability.
Cloud Storage
Google Cloud's unified object storage service for structured and unstructured data such as files, media, and backups.
Lifecycle rule
A policy on a Cloud Storage bucket that automatically transitions or deletes objects based on conditions like age or storage class.
Service account
A service account is a special kind of account used by an application or compute workload, not a person, to make authorized API calls and access Google Cloud resources.
Automated backup
A scheduled backup taken automatically by Cloud SQL during a configured window, used for recovery and point-in-time restore.
Retention policy
A Cloud Storage setting that enforces a minimum retention time for objects before they can be deleted or overwritten.
Object versioning
A Cloud Storage feature that preserves old versions of objects when they are overwritten or deleted.
Clustered table (BigQuery)
A BigQuery table where data is organized based on one or more columns to improve performance for filters and joins.
Partitioned table (BigQuery)
A table whose data is divided into partitions, often by date, enabling queries to scan only relevant partitions.
High availability (Cloud SQL)
A configuration with a primary and standby instance in different zones, providing automatic failover within a region.

Finished reading?

Test your understanding with a custom practice exam on this chapter.

Test yourself