Log bucket (Cloud Logging)

A storage container in Cloud Logging that holds log entries. Each log bucket has its own location, retention period, and access control, and is unrelated to Cloud Storage buckets.

Log sink

A routing rule attached to the Cloud Logging router that selects log entries with filters and sends them to a destination such as a log bucket, Cloud Storage, BigQuery, or Pub/Sub.

Log-based metric

A Cloud Monitoring metric derived from log entries that match a filter in Cloud Logging, used to count events or measure distributions like latency.

Counter vs Distribution log-based metrics

Counter metrics count how many log entries match a filter. Distribution metrics extract a numeric value from each matching entry and track its distribution over time.
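
To make the difference concrete, here is a minimal Python sketch that mimics both metric types over a few made-up log entries. The entry fields (`status`, `latency_ms`) and the bucket boundaries are illustrative assumptions, not the real LogEntry schema or Cloud Logging defaults.

```python
from bisect import bisect_right

# Hypothetical, simplified log entries (not the real LogEntry schema).
entries = [
    {"status": 200, "latency_ms": 120},
    {"status": 500, "latency_ms": 950},
    {"status": 503, "latency_ms": 2100},
    {"status": 200, "latency_ms": 80},
]


def matches(entry):
    """Stand-in for a log filter such as 'status >= 500'."""
    return entry["status"] >= 500


# Counter: how many entries match the filter.
counter = sum(1 for e in entries if matches(e))

# Distribution: extract a numeric value from each matching entry and
# place it in a histogram bucket. The bounds are assumed for illustration.
bounds = [100, 1000]  # buckets: <100, 100-999, >=1000
histogram = [0] * (len(bounds) + 1)
for e in entries:
    if matches(e):
        histogram[bisect_right(bounds, e["latency_ms"])] += 1

print(counter, histogram)
```

The counter answers "how often did this happen", while the histogram answers "how large were the values when it happened".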

Common IAM error pattern in logs

Messages containing PERMISSION_DENIED or insufficient authentication scopes, often with principalEmail, indicating missing or incorrect IAM roles or service accounts.
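
As a rough illustration, the sketch below pulls the failing identity out of an audit-log-style entry represented as a Python dict. It assumes the entry follows the audit log layout (`protoPayload.status.code` and `protoPayload.authenticationInfo.principalEmail`); the sample entry and its email address are invented for the example.

```python
PERMISSION_DENIED_CODE = 7  # gRPC status code for PERMISSION_DENIED


def denied_principal(entry):
    """Return the principalEmail if the entry records a permission denial."""
    payload = entry.get("protoPayload", {})
    if payload.get("status", {}).get("code") == PERMISSION_DENIED_CODE:
        return payload.get("authenticationInfo", {}).get("principalEmail")
    return None


# Made-up sample entry for illustration only.
sample = {
    "protoPayload": {
        "status": {"code": 7, "message": "PERMISSION_DENIED"},
        "authenticationInfo": {
            "principalEmail": "pipeline@example-project.iam.gserviceaccount.com"
        },
    }
}

print(denied_principal(sample))
```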

Log sink (Cloud Logging)

A Cloud Logging resource that defines how selected log entries are routed or exported, including a destination and inclusion/exclusion filters.

Log Router

The Cloud Logging component that receives all log entries and, based on configured sinks, routes them to log buckets or export destinations.

Cloud Trace

A Google Cloud service that collects and displays latency data for requests, helping you analyze and optimize performance across distributed services.

Advanced Logging, Metrics, and Troubleshooting Across Services — Google Cloud Associate Cloud Engineer: Complete Exam-Ready Masterclass

Big Picture: Why Advanced Logging and Metrics Matter

Why This Matters

Complex Google Cloud issues often span multiple services. You need Cloud Logging and Cloud Monitoring together to trace problems end‑to‑end across compute, storage, and networking.

Exam Context

These skills live mainly in "Ensuring successful operation of a cloud solution" and "Deploying and implementing a cloud solution" for the Associate Cloud Engineer exam.

What You Will Do

You will configure log buckets and routing, export logs with sinks, create custom metrics, hook them to alerts, and practice full troubleshooting workflows using logs, metrics, and traces.

Cloud Logging Architecture: Buckets, Routers, and Sinks

Core Pieces

Cloud Logging has log entries, log buckets, the log router, and log sinks. The router receives all logs and sinks define how selected logs are routed or exported.
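
The relationship between the router and sinks can be pictured with a toy Python model. This is only a mental-model sketch: real sinks use Logging query language filters rather than Python callables, and the destination labels here are made up.

```python
from collections import defaultdict


def route(entries, sinks):
    """Deliver each entry to every sink whose filter matches it."""
    delivered = defaultdict(list)
    for entry in entries:
        for sink in sinks:
            if sink["filter"](entry):
                delivered[sink["destination"]].append(entry)
    return dict(delivered)


# Hypothetical sinks: one catch-all, one that only takes errors.
sinks = [
    {"destination": "log-bucket:_Default", "filter": lambda e: True},
    {"destination": "bigquery:errors", "filter": lambda e: e["severity"] == "ERROR"},
]
entries = [
    {"severity": "INFO", "msg": "started"},
    {"severity": "ERROR", "msg": "boom"},
]

result = route(entries, sinks)
print({dest: len(items) for dest, items in result.items()})
```

Note that sinks are evaluated independently, so one entry can land in several destinations.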

Default vs Custom Buckets

Every project has a _Required and a _Default log bucket. You can add user‑defined buckets with a custom location, retention period, and access control.

Common Exam Trap

Do not confuse Cloud Logging log buckets with Cloud Storage buckets. They belong to different services. A log sink can export log entries to a Cloud Storage bucket, but that bucket is not a log bucket.

Hands‑On: Creating a Regional Log Bucket with Custom Retention

Scenario

You must keep production audit logs for 1 year in the EU, while other logs use default retention. You create a dedicated regional log bucket for these logs.

Create the Bucket

In Logging → Log buckets, create `prod-audit-eu`, choose an EU region like europe-west1, and set retention to 365 days to meet compliance.
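
The same bucket could also be created from the command line. The sketch below only assembles the `gcloud logging buckets create` invocation as a Python list so it can be reviewed before running; the project ID `my-prod-project` is a placeholder assumption.

```python
import shlex


def bucket_create_cmd(bucket_id, location, retention_days, project):
    """Build the argv for `gcloud logging buckets create`."""
    return [
        "gcloud", "logging", "buckets", "create", bucket_id,
        f"--location={location}",
        f"--retention-days={retention_days}",
        f"--project={project}",
    ]


# "my-prod-project" is a hypothetical project ID.
bucket_cmd = bucket_create_cmd("prod-audit-eu", "europe-west1", 365, "my-prod-project")
print(shlex.join(bucket_cmd))
```

Keep in mind that a log bucket's location is chosen at creation time, so it is worth double-checking the region before running the command.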

Route Logs with a Sink

In Logging → Log router, create a sink `route-prod-audit-to-eu`, choose destination = Cloud Logging bucket `prod-audit-eu`, and filter audit logs from your prod project.
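
A command-line equivalent might look like the following sketch, which builds the destination path for a log bucket and the `gcloud logging sinks create` arguments. The project ID is a placeholder, and the audit-log filter shown is one plausible choice, not necessarily the exact filter your environment needs.

```python
def log_bucket_destination(project, location, bucket_id):
    """Sink destination path for a Cloud Logging bucket."""
    return (
        f"logging.googleapis.com/projects/{project}"
        f"/locations/{location}/buckets/{bucket_id}"
    )


def sink_create_cmd(sink_name, destination, log_filter, project):
    """Build the argv for `gcloud logging sinks create`."""
    return [
        "gcloud", "logging", "sinks", "create", sink_name, destination,
        f"--log-filter={log_filter}",
        f"--project={project}",
    ]


# Assumed filter: matches log names that contain the audit log prefix.
AUDIT_FILTER = 'logName:"cloudaudit.googleapis.com"'

# "my-prod-project" is a hypothetical project ID.
destination = log_bucket_destination("my-prod-project", "europe-west1", "prod-audit-eu")
sink_cmd = sink_create_cmd(
    "route-prod-audit-to-eu", destination, AUDIT_FILTER, "my-prod-project"
)
print(sink_cmd)
```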

Log Sinks and Exports: Cloud Storage, BigQuery, Pub/Sub

What a Sink Does

A log sink defines a destination plus filters. It exports selected logs from Cloud Logging to Cloud Storage, BigQuery, Pub/Sub, or another log bucket.

Choosing Destinations

Cloud Storage is for cheap long‑term archives, BigQuery for SQL analytics and dashboards, and Pub/Sub for real‑time streaming or event‑driven processing.
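
Each destination type has its own path format when you define a sink. The sketch below maps the three use cases to those formats; the keys (`archive`, `analytics`, `streaming`) and the resource names in the usage lines are labels invented for this example.

```python
# Sink destination formats, keyed by an illustrative use-case label.
DESTINATION_FORMATS = {
    "archive": "storage.googleapis.com/{bucket}",
    "analytics": "bigquery.googleapis.com/projects/{project}/datasets/{dataset}",
    "streaming": "pubsub.googleapis.com/projects/{project}/topics/{topic}",
}


def sink_destination(need, **names):
    """Return the sink destination string for a given use case."""
    return DESTINATION_FORMATS[need].format(**names)


# Hypothetical resource names.
print(sink_destination("archive", bucket="my-log-archive"))
print(sink_destination("analytics", project="my-project", dataset="logs"))
print(sink_destination("streaming", project="my-project", topic="log-events"))
```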

IAM and Service Accounts

Each sink writes with its own service account, called the writer identity. That identity needs a role such as roles/storage.objectCreator, roles/bigquery.dataEditor, or roles/pubsub.publisher on the destination resource, or the export fails.
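
For a Cloud Storage destination, granting access could be sketched as below. The code only builds the `gcloud storage buckets add-iam-policy-binding` arguments. The bucket name and writer identity are placeholders; in practice you would read the real identity from the sink's `writerIdentity` field.

```python
# Role each destination type typically needs for the sink's identity.
SINK_ROLES = {
    "storage": "roles/storage.objectCreator",
    "bigquery": "roles/bigquery.dataEditor",
    "pubsub": "roles/pubsub.publisher",
}


def grant_storage_cmd(bucket, writer_identity):
    """Build the argv that lets a sink write objects into a bucket."""
    return [
        "gcloud", "storage", "buckets", "add-iam-policy-binding",
        f"gs://{bucket}",
        f"--member={writer_identity}",
        f"--role={SINK_ROLES['storage']}",
    ]


# Placeholder identity; a real one comes from the sink's writerIdentity.
grant_cmd = grant_storage_cmd(
    "my-log-archive",
    "serviceAccount:service-123456789@gcp-sa-logging.iam.gserviceaccount.com",
)
print(grant_cmd)
```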

Quick Check: Log Buckets and Sinks

Test your understanding of log buckets and sinks.

You need to keep all production HTTP request logs for 7 years, but you rarely query them. Which configuration best balances cost and requirements?

A) Increase retention on the _Default log bucket to 7 years.
B) Create a log sink that exports matching logs to a Cloud Storage bucket with lifecycle rules for 7‑year retention.
C) Export logs to BigQuery and keep them for 7 years.
D) Create a log sink to a Pub/Sub topic and store logs in a custom application database.

Show Answer

Answer: B) Create a log sink that exports matching logs to a Cloud Storage bucket with lifecycle rules for 7‑year retention.

Cloud Storage is the low‑cost choice for long‑term retention when you do not need frequent querying. You create a sink from Cloud Logging to a Cloud Storage bucket and manage 7‑year retention with lifecycle rules. Extending _Default retention is more expensive; BigQuery is for analytics; Pub/Sub alone does not provide long‑term storage.
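
A lifecycle rule for this scenario could be sketched as the JSON document below, generated from Python. It assumes a simple "delete after about 7 years" policy; the day count rounds up with 366-day years so leap years never shorten retention, and your compliance rules may call for a different number.

```python
import json

# Round up so leap years never cut retention short (assumption).
RETENTION_DAYS = 7 * 366

lifecycle = {
    "rule": [
        {
            "action": {"type": "Delete"},
            "condition": {"age": RETENTION_DAYS},  # object age in days
        }
    ]
}

print(json.dumps(lifecycle, indent=2))
```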

Custom Metrics in Cloud Monitoring

Why Custom Metrics

System metrics such as CPU and memory do not capture app‑specific signals like failed logins or 500 errors. Log‑based metrics turn patterns in logs into metrics you can chart and alert on.

Metric Types

Log‑based metrics can be Counters (count events) or Distributions (track numeric values like latency). They are defined by a log filter in Logs Explorer.

Creation Flow

Filter logs in Logs Explorer, click Create metric, choose type, name it, and optionally map a numeric field. The metric then appears in Cloud Monitoring.
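
Behind the console flow, a log-based metric is a small resource definition. The sketch below shows roughly what a Distribution metric might look like as a Logging API request body. The metric name, the `jsonPayload.latency_ms` field, and the bucket settings are all assumptions chosen for illustration.

```python
import json


def distribution_metric(name, log_filter, field):
    """Sketch of a log-based Distribution metric definition."""
    return {
        "name": name,
        "filter": log_filter,
        # Pull a numeric value out of each matching entry.
        "valueExtractor": f"EXTRACT({field})",
        "metricDescriptor": {"metricKind": "DELTA", "valueType": "DISTRIBUTION"},
        # Assumed histogram layout; tune for your latency range.
        "bucketOptions": {
            "exponentialBuckets": {
                "numFiniteBuckets": 10,
                "growthFactor": 2.0,
                "scale": 10.0,
            }
        },
    }


# Hypothetical metric name and payload field.
metric = distribution_metric(
    "request_latency_ms",
    'resource.type="cloud_run_revision"',
    "jsonPayload.latency_ms",
)
print(json.dumps(metric, indent=2))
```

A Counter metric is simpler: it needs only the name and filter, with no value extractor or bucket options.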

From Logs to Alerts: Building a 5xx Error Alert for Cloud Run

Step 1: Metric from Logs

Filter Cloud Run logs for 5xx status codes in Logs Explorer, then create a Counter log‑based metric named `cloudrun5xx_errors` from that filter.
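
From the command line, this step might be sketched as follows. The filter is one reasonable way to select Cloud Run 5xx responses, and the description text is invented. The code only builds the `gcloud logging metrics create` arguments.

```python
import shlex

# Assumed filter for Cloud Run request logs with a 5xx status.
FIVE_XX_FILTER = 'resource.type="cloud_run_revision" AND httpRequest.status>=500'


def metric_create_cmd(name, log_filter, description):
    """Build the argv for `gcloud logging metrics create` (counter metric)."""
    return [
        "gcloud", "logging", "metrics", "create", name,
        f"--description={description}",
        f"--log-filter={log_filter}",
    ]


metric_cmd = metric_create_cmd(
    "cloudrun5xx_errors", FIVE_XX_FILTER, "Count of Cloud Run 5xx responses"
)
print(shlex.join(metric_cmd))
```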

Step 2: View the Metric

In Metrics explorer, search for `cloudrun5xx_errors` and group it by `service_name` to see error counts per Cloud Run service over time.

Step 3: Alerting Policy

Create an alerting policy that triggers when `cloudrun5xx_errors` exceeds a threshold, such as more than 5 errors in 5 minutes, and attach notification channels.
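
Expressed as a Cloud Monitoring alerting policy document, the threshold above might look like the sketch below. Treat the aggregation settings as one plausible configuration rather than the only correct one; notification channels are left out because their IDs are specific to your project.

```python
import json


def five_xx_alert_policy(metric_name, threshold, window_seconds):
    """Sketch of an alerting policy on a user-defined log-based metric."""
    metric_filter = (
        f'metric.type="logging.googleapis.com/user/{metric_name}" '
        'AND resource.type="cloud_run_revision"'
    )
    return {
        "displayName": "Cloud Run 5xx errors",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": (
                    f"More than {threshold} 5xx errors "
                    f"in {window_seconds // 60} min"
                ),
                "conditionThreshold": {
                    "filter": metric_filter,
                    "comparison": "COMPARISON_GT",
                    "thresholdValue": threshold,
                    "duration": "0s",  # fire as soon as the window exceeds it
                    "aggregations": [
                        {
                            # Sum matching entries over each window.
                            "alignmentPeriod": f"{window_seconds}s",
                            "perSeriesAligner": "ALIGN_SUM",
                        }
                    ],
                },
            }
        ],
    }


policy = five_xx_alert_policy("cloudrun5xx_errors", 5, 300)
print(json.dumps(policy, indent=2))
```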

Thought Exercise: Picking the Right Observability Tool

Scenario A: Latency Spikes

Cloud Run requests sometimes take 10 seconds, but CPU and memory look fine. Which tool best shows where request time is spent across services?

Scenario B: Pipeline Failures

A data pipeline sometimes fails to write to BigQuery because of permissions. Where do you look to see the error details and which identity failed?

Scenario C: SSH Security Alert

Security wants near real‑time alerts for failed SSH logins on any VM. Which combination of logs, metrics, and alerts would you use?

End‑to‑End Troubleshooting Workflow Across Services

Step 1: Start from Symptoms

Check Cloud Monitoring dashboards, alerting policies, and uptime checks to see which services show increased errors or latency during the incident.

Step 2–4: Logs, Chain, Quotas

Use Logs Explorer to find 5xx errors, follow dependencies (Cloud SQL, Pub/Sub), and look for PERMISSION_DENIED or QUOTA_EXCEEDED to catch IAM or quota issues.
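
The queries for these steps can be composed from small clauses. The helper below is a hypothetical convenience function that joins Logging query language clauses with AND; the timestamp is an arbitrary example value.

```python
def build_filter(*clauses):
    """AND together query clauses, parenthesizing each one."""
    return " AND ".join(f"({clause})" for clause in clauses)


# Step 2: 5xx responses from Cloud Run.
cloud_run_5xx = build_filter(
    'resource.type="cloud_run_revision"',
    "httpRequest.status>=500",
)

# Step 4: IAM or quota problems since the incident began (example time).
iam_or_quota = build_filter(
    "severity>=ERROR",
    '"PERMISSION_DENIED" OR "QUOTA_EXCEEDED"',
    'timestamp>="2024-01-01T00:00:00Z"',
)

print(cloud_run_5xx)
print(iam_or_quota)
```

Parenthesizing each clause keeps the OR inside the second query from being absorbed by the surrounding ANDs.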

Step 5–6: Trace and Verify

Use Cloud Trace to pinpoint slow operations, apply fixes like scaling or IAM changes, then monitor metrics and logs to confirm recovery and stability.

Common Error Patterns in Logs and How to Interpret Them

IAM and Quota Errors

PERMISSION_DENIED and insufficient scopes point to IAM or service account issues. RESOURCE_EXHAUSTED or 429 errors indicate quota or rate limit problems.

Network and Resource Limits

Connection timed out or refused suggests networking or firewall issues. OOMKilled or exceeded memory limit means you must adjust resource sizing or fix leaks.

Config Mistakes

Messages like Bucket not found or Access denied usually mean wrong project, wrong resource name, or missing IAM permissions for that resource.
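
These patterns lend themselves to a quick first-pass triage. The sketch below is a naive keyword classifier over log messages. The category names are invented for the example, and simple substring matching can misfire (for instance, "429" may appear inside an unrelated number), so treat its output as a hint only.

```python
# (category, substrings that suggest it); categories are illustrative labels.
PATTERNS = [
    ("iam", ("PERMISSION_DENIED", "insufficient authentication scopes")),
    ("quota", ("RESOURCE_EXHAUSTED", "429")),
    ("network", ("Connection timed out", "Connection refused")),
    ("memory", ("OOMKilled", "exceeded memory limit")),
    ("config", ("Bucket not found", "Access denied")),
]


def classify(message):
    """Return the first category whose keywords appear in the message."""
    lowered = message.lower()
    for category, needles in PATTERNS:
        if any(needle.lower() in lowered for needle in needles):
            return category
    return "unknown"


print(classify("PERMISSION_DENIED: Caller does not have permission"))
print(classify("Container was OOMKilled"))
print(classify("Request completed"))
```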

Quiz: Interpreting Logs and Choosing Actions

Apply what you learned about error patterns and troubleshooting.

Your Cloud Run service logs show: `PERMISSION_DENIED: Caller does not have permission storage.objects.get on bucket my-prod-bucket` and `principalEmail: my-service@my-project.iam.gserviceaccount.com`. What is the most appropriate next step?

A) Increase the memory allocated to the Cloud Run service.
B) Grant the service account the Storage Object Viewer role on my-prod-bucket.
C) Create a new log-based metric counting PERMISSION_DENIED errors.
D) Move the bucket to a different region.