SkarpSkarp

Chapter 6 of 27

Planning Data Storage: Relational, NoSQL, and Analytics Choices

Data design decisions ripple through performance, cost, and operations; learn how to map use cases to the right managed database or analytics service.

27 min readen

Big Picture: Mapping Workloads to Google Cloud Databases

Your Data Storage Toolkit

For this exam you must know 5 core data products: Cloud SQL, BigQuery, Firestore, Spanner, Bigtable. The skill is mapping real workloads to the right one.

Three Families

Group them as: relational/transactional (Cloud SQL, Spanner), NoSQL/operational (Firestore, Bigtable), and analytics/warehousing (BigQuery).

Scenario-Based Thinking

Questions describe an app (mobile, web, IoT, BI). You decide: is this transactional, real-time sync, time series, or analytics? That drives which product fits best.

Multi-Tier Architectures

Real solutions mix services: e.g. app on Cloud Run using Cloud SQL or Firestore, plus BigQuery for analytics. You will design these combinations in scenarios.

Cloud SQL: Managed Relational Workhorse

What is Cloud SQL?

Cloud SQL is a managed MySQL, PostgreSQL, and SQL Server service. Google handles backups, replication, and patching so you focus on schemas and queries.

When to Use It

Use Cloud SQL for transactional web apps, legacy migrations, and line-of-business apps that need ACID transactions, joins, and strong relational constraints.

Limits and Tradeoffs

Cloud SQL scales mainly by choosing bigger machines and adding read replicas. It is not ideal for huge analytical scans or globally distributed, ultra-high-write workloads.

Exam Signal Words

Phrases like “migrate existing MySQL,” “minimal code changes,” or “OLTP transactional app” are strong hints that Cloud SQL is the right product.

Cloud SQL in Practice: Migrating a Legacy App

Scenario Setup

You have an on-prem Java app using PostgreSQL. You want to move it to Google Cloud with minimal changes and run it on Compute Engine VMs.

Choosing Cloud SQL

Pick Cloud SQL for PostgreSQL. You keep the same engine, get managed backups and HA, and avoid rewriting SQL or schema logic.

Architecture Sketch

Visualize: VPC network, two web VMs in a managed instance group, a Cloud SQL instance on private IP, plus optional read replicas for reporting.

Ruling Out Other Options

BigQuery is for analytics, not OLTP. Firestore is a NoSQL document store and would require major schema and query rewrites.

BigQuery: Serverless Analytics Data Warehouse

What is BigQuery?

BigQuery is a serverless analytics data warehouse. You use SQL to run huge analytical queries without managing servers or storage volumes.

Strengths and Use Cases

It shines for BI dashboards, log analysis, and data warehousing where you scan millions or billions of rows, often in append-only tables.

Cost and Scaling Model

Storage and compute are separate. You pay for data stored and for query processing, so you can keep lots of data but control query costs.

When Not to Use It

BigQuery is not for transactional apps that need low-latency single-row reads/writes. Use Cloud SQL, Spanner, Firestore, or Bigtable instead.

BigQuery in Practice: Analytics Layer for an App

E‑Commerce Example

Your store runs on Cloud SQL, but product managers want rich reports. You do not want heavy analytics queries slowing down customers.

Building the Pipeline

Export Cloud SQL data to Cloud Storage, then load it into BigQuery. Also ingest clickstream logs into BigQuery for behavioral analysis.

Analytics in BigQuery

Create fact and dimension tables, define KPIs in SQL views, and connect BI tools to BigQuery for dashboards and ad-hoc queries.

Pattern to Remember

Use Cloud SQL for live transactions and BigQuery as the analytics layer. This offloads reporting and improves app performance.

Firestore, Spanner, Bigtable: Comparing NoSQL and Global Relational

Firestore in a Nutshell

Firestore is a document database with collections and documents, strong consistency on single documents, and great support for mobile/web apps and real-time sync.

Spanner in a Nutshell

Spanner is a globally distributed relational database with SQL, schemas, and strong external consistency across regions for mission-critical OLTP.

Bigtable in a Nutshell

Bigtable is a wide-column NoSQL store for petabyte-scale, low-latency workloads like time series and massive key-value datasets.

How to Choose

SQL + global consistency → Spanner. Flexible documents + real-time sync → Firestore. Extreme scale key-value/time series → Bigtable.

Thought Exercise: Match Workloads to Firestore, Spanner, or Bigtable

Use this step to practice choosing between Firestore, Spanner, and Bigtable. For each scenario, pause and decide which product fits best, then check the suggested answer.

  1. Global Banking Ledger
  • Requirements: Multi-region, strict consistency for account balances, complex SQL queries, and multi-row transactions.
  • Your choice?
  • Suggested answer: Spanner. You need global strong consistency, SQL, and ACID transactions.
  1. Mobile Chat Application
  • Requirements: Real-time message updates, offline support, flexible schema for message metadata, mainly mobile clients.
  • Your choice?
  • Suggested answer: Firestore. It integrates well with Firebase, supports real-time sync, and uses a document model.
  1. IoT Sensor Platform
  • Requirements: Millions of sensor readings per second, time series queries (by device and time range), low-latency writes and reads.
  • Your choice?
  • Suggested answer: Bigtable. It is optimized for time series and very high throughput key-based access.
  1. User Profiles with Simple Lookups
  • Requirements: Store user preferences, simple key-based lookups by user ID, high read traffic, but no complex joins.
  • Your choice?
  • Suggested answer: Either Firestore or Bigtable, depending on scale. For typical app scale and flexible fields, Firestore is often preferred on the exam.

As you practice, ask:

  • Do I need SQL and multi-row transactions? → Spanner.
  • Do I need documents and real-time sync? → Firestore.
  • Do I need ultra-scale key-value/time series? → Bigtable.

Designing Multi-Tier Architectures with Multiple Data Stores

Why Combine Databases?

Each product has strengths. Real apps often use multiple data stores so that transactional, real-time, and analytical workloads each sit on the best tool.

Example Architecture

Frontends on Cloud Run and mobile, microservices on GKE, Cloud SQL for orders, Firestore for sessions, Bigtable for metrics, and BigQuery for analytics.

Moving Data to Analytics

Regularly export operational data from Cloud SQL, Firestore, or Bigtable to Cloud Storage and into BigQuery for dashboards and ML features.

Exam Clue: Mixed Requirements

If a scenario mentions transactions + real-time sync + heavy analytics, expect a solution that combines Cloud SQL/Spanner, Firestore/Bigtable, and BigQuery.

Quick Check: Product Fit

Test your understanding of which Google Cloud data product fits each scenario.

You are designing a new analytics platform to run SQL queries over terabytes of log data stored in Google Cloud. Queries will be ad-hoc and scan entire months of data. Which primary product should you choose for the analytics store?

  1. Cloud SQL
  2. BigQuery
  3. Firestore
  4. Bigtable
Show Answer

Answer: B) BigQuery

**BigQuery** is the correct choice. It is a serverless analytics data warehouse optimized for large-scale SQL queries over huge datasets. Cloud SQL is for transactional workloads, Firestore is a document store for apps, and Bigtable is for low-latency key/time-series access, not general SQL analytics.

Scenario Quiz: Mixing Services

Another scenario to reinforce multi-tier design choices.

A startup is building a global ride-sharing app. They need: (1) strong consistency for ride payments and balances across regions using SQL, (2) real-time updates of driver locations to riders' phones, and (3) dashboards for historical analysis of rides. Which combination is most appropriate?

  1. Cloud SQL for everything
  2. Spanner for payments, Firestore for real-time locations, BigQuery for analytics
  3. Bigtable for payments, Firestore for locations, BigQuery for analytics
  4. Firestore for payments and locations, Bigtable for analytics
Show Answer

Answer: B) Spanner for payments, Firestore for real-time locations, BigQuery for analytics

The best combination is **Spanner for payments**, **Firestore for real-time locations**, and **BigQuery for analytics**. Spanner provides global, strongly consistent SQL for critical financial data. Firestore handles real-time, document-based updates to mobile clients. BigQuery is ideal for historical analytics. Cloud SQL is not designed for global OLTP at this scale, and Bigtable is not a relational store for payments.

Practical Snippets: Creating and Using Databases with gcloud

As an Associate Cloud Engineer, you should be comfortable using the gcloud CLI to provision data services. Here are practical snippets that show how these products appear in command-line workflows.

1. Create a Cloud SQL instance (MySQL example)

```bash

Create a Cloud SQL instance

gcloud sql instances create ace-demo-sql \

--database-version=MYSQL80 \

--tier=db-custom-2-7680 \

--region=us-central1

Create a database inside the instance

gcloud sql databases create appdb --instance=ace-demo-sql

```

2. Create a BigQuery dataset and table

```bash

Create a dataset

gcloud bigquery datasets create acedemods \

--location=US

Create a table from a schema definition file

gcloud bigquery tables create acedemods.events \

--schema=schema.json

```

3. Create a Firestore database in Native mode

```bash

Enable Firestore in Native mode for a project

(Location is chosen once per project for Firestore.)

gcloud firestore databases create \

--region=us-central \

--type=firestore-native

```

4. Create a Spanner instance and database

```bash

Create a Spanner instance

gcloud spanner instances create ace-spanner \

--config=regional-us-central1 \

--description="ACE demo" \

--nodes=1

Create a Spanner database

gcloud spanner databases create appdb \

--instance=ace-spanner

```

5. Create a Bigtable instance

```bash

Create a Bigtable instance

gcloud bigtable instances create ace-bt \

--display-name="ACE Bigtable" \

--cluster=ace-bt-c1 \

--cluster-zone=us-central1-b \

--cluster-num-nodes=3 \

--instance-type=PRODUCTION

```

You do not need to memorize every flag, but you should recognize the product families and be comfortable reading `gcloud` commands that create or interact with them.

Key Concepts Review: Data Storage Product Choices

Flip through these cards to reinforce the core distinctions you need for the exam.

Data storage product choices (list all 5 in order)
The data storage product choices are: Cloud SQL, BigQuery, Firestore, Spanner, Bigtable.
Cloud SQL – primary role and typical workloads
Cloud SQL is a managed relational database service (MySQL, PostgreSQL, SQL Server). It fits transactional applications, OLTP workloads, and legacy migrations that need ACID transactions, joins, and schemas with minimal code changes.
BigQuery – what is it optimized for?
BigQuery is a serverless analytics data warehouse optimized for large-scale SQL analytics, BI, and reporting over huge datasets, with separation of storage and compute.
Firestore – data model and best-fit use cases
Firestore is a NoSQL document database using collections and documents. It is best for mobile and web apps needing flexible schemas, real-time sync, and offline support, especially via Firebase.
Spanner – when choose it over Cloud SQL?
Choose Spanner over Cloud SQL when you need global or regional horizontal scaling, strong external consistency across regions, and relational SQL with multi-row ACID transactions for mission-critical OLTP.
Bigtable – core characteristics and use cases
Bigtable is a wide-column NoSQL store designed for petabyte-scale data, very high throughput, and low-latency access using a row key. It fits time series, IoT, personalization, and large key-value workloads.
Operational vs analytical workloads – mapping to products
Operational (OLTP, app backends): Cloud SQL, Spanner, Firestore, Bigtable. Analytical (OLAP, reporting, BI): BigQuery as the central warehouse, often fed from the operational stores.
Pattern: Offloading reporting from production DB
Keep transactions on Cloud SQL or Spanner, export or stream data into BigQuery, and run heavy analytical queries there to avoid impacting the production database.
Exam clue: “Real-time mobile sync and offline support”
This phrasing strongly indicates Firestore (often via Firebase) as the right choice due to its document model and real-time capabilities.
Exam clue: “Global consistency for financial transactions with SQL”
This describes Spanner: a globally distributed relational database with strong external consistency and ACID transactions.

Key Terms

OLAP
Online Analytical Processing; workloads focused on complex analytical queries and aggregations over large datasets, such as reporting and BI.
OLTP
Online Transaction Processing; workloads focused on many small, transactional read/write operations, such as order entry or banking systems.
Spanner
A globally distributed, strongly consistent relational database service on Google Cloud that supports SQL and horizontal scaling for mission-critical OLTP workloads.
BigQuery
A serverless, highly scalable analytics data warehouse on Google Cloud designed for large-scale SQL analytics and business intelligence.
Bigtable
A wide-column NoSQL database service on Google Cloud designed for petabyte-scale, low-latency workloads such as time series and large key-value datasets.
Cloud SQL
A managed relational database service on Google Cloud that supports MySQL, PostgreSQL, and SQL Server, suitable for transactional applications and legacy migrations.
Firestore
A NoSQL document database on Google Cloud, often used with Firebase, that provides real-time synchronization and offline support for mobile and web apps.
Data warehouse
A centralized repository optimized for analytical queries and reporting, typically storing historical, integrated data from multiple sources.
Document database
A type of NoSQL database that stores data as flexible JSON-like documents instead of rows in tables.
Wide-column store
A type of NoSQL database that organizes data into rows and dynamic columns, optimized for large-scale, sparse datasets accessed by row key.

Finished reading?

Test your understanding with a custom practice exam on this chapter.

Test yourself