Chapter 21 of 26
Observability and Troubleshooting with Cloud Logging
Turn raw logs into actionable insights by filtering, routing, and analyzing logs to diagnose issues across distributed Google Cloud solutions.
Big Picture: Cloud Logging in Your Observability Stack
Cloud Logging in Context
Cloud Logging complements Cloud Monitoring. Monitoring tells you something is wrong; logging helps you see exactly what and why by recording detailed events from your Google Cloud resources and apps.
What Cloud Logging Collects
Cloud Logging ingests logs from Google Cloud services (Compute Engine, GKE, Cloud Run, Cloud Functions, Cloud SQL), your applications, and audit logs that track administrative and security-relevant actions.
Core Building Blocks
Key concepts: log entries (timestamp, severity, resource, labels, payload), log buckets and views (storage and access control), log-based metrics, the Log Router and sinks, and specialized audit logs.
Your Objectives
You will learn how logs are stored, how to filter and query them for troubleshooting, how to export logs via sinks, and how to use audit logs to investigate admin and security events for exam scenarios.
How Cloud Logging Collects and Stores Logs: Buckets and Views
Automatic Collection
Cloud Logging automatically collects logs from many Google Cloud services like Compute Engine, GKE, Cloud Run, and Cloud Functions, without extra configuration in most cases.
Anatomy of a Log Entry
Each log entry has a timestamp, severity, logName, resource (type and labels), and a payload (jsonPayload, textPayload, or protoPayload) that holds the actual message or structured data.
Log Buckets
Logs are stored in buckets: `_Default` for general logs, `_Required` for always-retained logs like Admin Activity, and custom buckets you create for specific workloads or compliance needs.
Retention, Location, Access
Each bucket has a retention period, a location (region or multi-region), and IAM-based access control. These settings determine how long logs stay and who can read them.
Log Views
Log views define filtered subsets of a bucket. You grant IAM on views to control who can see which logs, without moving data into separate buckets.
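As a rough sketch of how buckets and views fit together, the script below creates a custom bucket with its own retention period and a view that exposes only Cloud Run entries. The project, bucket, and view names are hypothetical placeholders, and the `run` helper is an assumption added here so the script prints the commands by default instead of changing anything.

```bash
#!/usr/bin/env bash
# Hypothetical names; replace with your own values.
PROJECT_ID="my-project"
BUCKET_ID="app-logs"
VIEW_ID="cloud-run-only"
LOCATION="global"

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

setup_bucket_and_view() {
  # Custom bucket with a 30-day retention period.
  run gcloud logging buckets create "$BUCKET_ID" \
    --project="$PROJECT_ID" --location="$LOCATION" --retention-days=30

  # View that exposes only Cloud Run entries stored in that bucket.
  run gcloud logging views create "$VIEW_ID" \
    --project="$PROJECT_ID" --bucket="$BUCKET_ID" --location="$LOCATION" \
    --log-filter='resource.type="cloud_run_revision"'
}

setup_bucket_and_view
```

A new custom bucket stays empty until a sink routes entries into it. Access to a view is typically granted through the Logs View Accessor role (`roles/logging.viewAccessor`), often with an IAM condition that names the view.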
Hands-On: Exploring Logs in Logs Explorer
Open Logs Explorer
Navigate in the console to Logging → Logs Explorer and select your project. This is the main UI for searching and analyzing logs across your Google Cloud resources.
Set Scope and Resource
Choose the scope (project, folder, or organization) and bucket, then filter by resource type such as `cloud_run_revision` or `gce_instance` to narrow logs to a specific service.
Filter by Severity
Add a condition like `severity>=ERROR` in the Query Builder to focus on problematic log entries instead of all INFO-level noise.
Inspect Entries
Click an entry and expand the structured JSON view to see resource.labels, payload, and fields like HTTP status codes, container exit codes, or trace IDs.
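The same severity filter can also be run from a terminal. The following is a minimal sketch using `gcloud logging read`; the function name `recent_errors`, the one-hour window, and the entry limit are illustrative assumptions rather than part of the walkthrough above.

```bash
#!/usr/bin/env bash
# recent_errors PROJECT_ID RESOURCE_TYPE
# Reads up to 20 entries at ERROR or above from the last hour for one
# resource type. Requires an authenticated gcloud CLI.
recent_errors() {
  gcloud logging read \
    "resource.type=\"$2\" AND severity>=ERROR" \
    --project="$1" \
    --limit=20 \
    --freshness=1h \
    --format=json
}

# Example call (hypothetical project ID):
#   recent_errors my-project cloud_run_revision
```

The JSON output contains the same fields you see in the expanded entry view, so it can be piped into other tools for further inspection.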
Filtering and Querying Logs with Logging Query Language
What Is the Logging Query Language?
Logging query language is a boolean filter language used in Logs Explorer. It is not SQL; instead, you write expressions over log fields to narrow down which entries you see.
Basic Filters
Common filters: resource.type, logName, severity, and payload text. Example: `resource.type="cloud_run_revision" AND severity>=ERROR` to see only Cloud Run errors.
Payload and Labels
Filter inside payloads and labels, like `textPayload:"timeout"` or `resource.labels.service_name="payment-service"` to isolate logs for a specific Cloud Run service.
Resource Types to Know
Key resource types: `gce_instance` for Compute Engine, `k8s_container` for GKE pods, `cloud_run_revision` for Cloud Run and many 2nd gen Cloud Functions.
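Because a filter is just a boolean expression, longer filters can be assembled from small clauses. The sketch below joins clauses with `AND`; the helper name `and_filter` and the `payment-service` value are illustrative assumptions.

```bash
#!/usr/bin/env bash
# and_filter CLAUSE...
# Joins any number of filter clauses with AND and prints the result.
and_filter() {
  result=""
  for clause in "$@"; do
    if [ -z "$result" ]; then
      result="$clause"
    else
      result="$result AND $clause"
    fi
  done
  printf '%s\n' "$result"
}

# Cloud Run errors for one service whose text payload mentions a timeout.
and_filter \
  'resource.type="cloud_run_revision"' \
  'resource.labels.service_name="payment-service"' \
  'severity>=ERROR' \
  'textPayload:"timeout"'
```

The printed string can be pasted into the Logs Explorer query pane as is. Note the two comparison styles: `=` requires an exact match, while `:` matches a substring.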
End-to-End Troubleshooting Workflows with Logs
Compute Engine Troubleshooting
For an unresponsive VM, filter logs by `resource.type="gce_instance"` and instance ID, focus on `severity>=ERROR`, and inspect textPayload for OOM, kernel panics, or app crashes.
GKE Pod Crashes
For CrashLoopBackOff, filter `resource.type="k8s_container"` and `container_name`. Look for exceptions, use `pod_name` to see whether all pods fail in the same way, and check cluster or node logs.
Cloud Run 500 Errors
Filter `resource.type="cloud_run_revision"` and the service label, then `httpRequest.status>=500`. Inspect stack traces and use the trace field to correlate with Cloud Trace.
Cloud Functions Timeouts
Filter by the Cloud Functions resource type (`cloud_function` for 1st gen; 2nd gen functions log as `cloud_run_revision`), then search for messages like "Function execution took" and "deadline exceeded" to identify slow external calls or heavy processing.
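The four workflows above can be summarized as one filter per scenario. This is a sketch only: the function names are hypothetical, and the caller supplies the instance ID, cluster, container, service, or function name.

```bash
#!/usr/bin/env bash
# Each function prints a Logging query language filter for one scenario.

gce_error_filter() {        # $1 = instance ID
  printf 'resource.type="gce_instance" AND resource.labels.instance_id="%s" AND severity>=ERROR\n' "$1"
}

gke_container_filter() {    # $1 = cluster name, $2 = container name
  printf 'resource.type="k8s_container" AND resource.labels.cluster_name="%s" AND resource.labels.container_name="%s" AND severity>=ERROR\n' "$1" "$2"
}

cloud_run_5xx_filter() {    # $1 = service name
  printf 'resource.type="cloud_run_revision" AND resource.labels.service_name="%s" AND httpRequest.status>=500\n' "$1"
}

function_timeout_filter() { # $1 = function name (1st gen resource type)
  printf 'resource.type="cloud_function" AND resource.labels.function_name="%s" AND (textPayload:"Function execution took" OR textPayload:"deadline exceeded")\n' "$1"
}

# Example: print the Cloud Run 5xx filter for a hypothetical service.
cloud_run_5xx_filter "payment-service"
```

Each printed filter works in Logs Explorer or as the argument to `gcloud logging read`. Starting from the resource type and one identifying label keeps the result set small before you add severity or payload conditions.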
Log Router and Sinks: Exporting Logs to BigQuery, Cloud Storage, and Pub/Sub
What Is the Log Router?
The Log Router receives logs at ingestion time and uses sinks to decide where copies should go: to Logging buckets, BigQuery, Cloud Storage, or Pub/Sub.
Sinks and Destinations
A sink has a filter and a destination. Common destinations are BigQuery for analytics, Cloud Storage for archival, and Pub/Sub for real-time processing or external integrations.
Creating a Sink
To create a sink: name it, define a filter like `severity>=ERROR`, choose a destination resource, and then grant the sink’s writer identity permission to write to that destination.
Exam-Focused Distinctions
Remember: buckets store logs inside Cloud Logging; sinks export or route logs. If exports fail, check the sink filter and IAM on the destination first.
Configuring a Log Sink via gcloud (Practical Example)
You may not need to memorize exact command syntax for the exam, but understanding the structure helps cement the concepts.
This example creates a sink that exports all ERROR or higher logs from Cloud Run to a BigQuery dataset.
```bash
# Variables
PROJECT_ID="my-project"
SINK_NAME="cloudrun-errors-to-bq"
BQ_DATASET="loggingexport"

# Create a BigQuery dataset first (if it does not exist)
bq --location=US mk --dataset "${PROJECT_ID}:${BQ_DATASET}"

# Create the log sink
gcloud logging sinks create "${SINK_NAME}" \
  "bigquery.googleapis.com/projects/${PROJECT_ID}/datasets/${BQ_DATASET}" \
  --log-filter='resource.type="cloud_run_revision" AND severity>=ERROR' \
  --project="${PROJECT_ID}"

# After running this, gcloud prints the sink's writer identity,
# a service account in the form "serviceAccount:<email>".
# Read it back from the sink instead of copying it by hand:
SINK_SA="$(gcloud logging sinks describe "${SINK_NAME}" \
  --project="${PROJECT_ID}" --format='value(writerIdentity)')"

# Grant BigQuery write permissions to the sink's identity.
# (Project-level grant shown for brevity; a dataset-level grant is narrower.)
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --member="${SINK_SA}" \
  --role="roles/bigquery.dataEditor"
```
Key takeaways:
- `gcloud logging sinks create` takes a sink name, a destination, and usually a `--log-filter`; without a filter, the sink routes every log entry.
- After creation, you must grant the sink's writer identity permission on the destination.
- The filter language is the same as in Logs Explorer.
Audit Logs and Admin Activity: Investigating Changes and Security Events
What Are Audit Logs?
Cloud Audit Logs record who did what, where, and when on your resources. They are essential for security investigations and many exam scenarios involving configuration changes.
Types of Audit Logs
Key types: Admin Activity (always on), Data Access (disabled by default for most services), System Event (system actions), and Policy Denied (requests blocked by a security policy violation, for example VPC Service Controls).
Filtering Audit Logs
Filter by logName and methodName, such as Admin Activity logs with `protoPayload.methodName="SetIamPolicy"` to find IAM policy changes.
Who Deleted the VM?
To see who deleted a VM, filter Admin Activity logs for Compute Engine delete operations and check `protoPayload.authenticationInfo.principalEmail` for the actor.
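A minimal sketch of that investigation from the command line follows. The function names and the 7-day window are assumptions; the `:` operator is used on `methodName` because the recorded value carries an API version prefix.

```bash
#!/usr/bin/env bash
# Prints a filter for Admin Activity entries that record a Compute Engine
# instance deletion.
vm_delete_filter() {
  printf '%s AND %s AND %s\n' \
    'log_id("cloudaudit.googleapis.com/activity")' \
    'resource.type="gce_instance"' \
    'protoPayload.methodName:"compute.instances.delete"'
}

# who_deleted_vm PROJECT_ID
# Lists time, actor, and affected resource for deletions in the last 7 days.
# Requires an authenticated gcloud CLI.
who_deleted_vm() {
  gcloud logging read "$(vm_delete_filter)" \
    --project="$1" \
    --freshness=7d \
    --format='table(timestamp,protoPayload.authenticationInfo.principalEmail,protoPayload.resourceName)'
}

# Print the filter so it can be pasted into Logs Explorer.
vm_delete_filter
```

Calling `who_deleted_vm my-project` (a hypothetical project ID) would list one row per deletion, with the actor in the `principalEmail` column.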
Thought Exercise: Designing Logging for a Microservices App
Imagine you are the Associate Cloud Engineer for a company running a microservices e-commerce platform on Google Cloud.
Architecture:
- Frontend: Cloud Run service `web-frontend`.
- Backend APIs: GKE cluster with several `k8s_container` workloads.
- Payments worker: Cloud Functions (2nd gen) triggered by Pub/Sub.
- Database: Cloud SQL (you already studied this in the storage module).
Your goals:
- Quickly detect and troubleshoot 5xx errors from any service.
- Keep logs for 1 year for compliance, but only detailed debug logs for 30 days.
- Investigate who changed firewall rules or IAM roles.
Pause and sketch your design (mentally or on paper), then compare with the guiding questions below.
Guiding questions:
- Which resource types and labels would you filter on for each component when troubleshooting?
- How would you use log buckets and views to separate sensitive logs (for example, security events) from general application logs?
- What sinks would you create? Which logs go to BigQuery vs Cloud Storage vs Pub/Sub?
- How would you ensure you can always answer "who changed this firewall rule?" using audit logs?
There is no single correct answer here. The goal is to practice mapping requirements to:
- Logging query filters
- Bucket and retention design
- Sinks and export destinations
- Audit log usage
After this, move to the quizzes to test your understanding of specific points.
Quiz 1: Buckets, Views, and Sinks
Test your understanding of how logs are stored and routed.
You need to let a junior support team see only application INFO and ERROR logs from a specific Cloud Run service, but NOT any security or audit logs. You also want to avoid duplicating logs into another bucket. What is the BEST approach?
- A) Create a new log bucket that receives only Cloud Run logs via a sink, and give the team Viewer access to the bucket.
- B) Create a log view on the existing bucket with a filter that includes only the Cloud Run service logs and excludes security logs, then grant the team IAM access on that view.
- C) Use the Log Router to export Cloud Run logs to BigQuery and let the team query BigQuery instead of Cloud Logging.
- D) Tell the team to use Logs Explorer and manually add filters each time; no configuration is needed.
Answer: B) Create a log view on the existing bucket with a filter that includes only the Cloud Run service logs and excludes security logs, then grant the team IAM access on that view.
A log view lets you define a filtered subset of a bucket and assign IAM permissions to that view. This avoids data duplication and hides sensitive logs. Creating a separate bucket via sinks (A) adds complexity and cost. Exporting to BigQuery (C) is unnecessary for simple viewing. Relying on manual filters (D) does not enforce access control.
Quiz 2: Audit Logs and Troubleshooting Scenarios
Check your understanding of audit logs and troubleshooting workflows.
Yesterday, a production Compute Engine VM was deleted and your site went down. You must find out who deleted the VM. Where do you look and what do you filter on first?
- A) In Logs Explorer, filter for `resource.type="gce_instance" AND severity>=ERROR` in the `_Default` bucket.
- B) In Cloud Monitoring, check the uptime check history for the VM and look for alerts.
- C) In Logs Explorer, filter Admin Activity audit logs with `logName:"cloudaudit.googleapis.com/activity"` and `protoPayload.methodName` for the delete operation.
- D) In the VM serial console logs, search for `"shutdown"` messages around the time of deletion.
Answer: C) In Logs Explorer, filter Admin Activity audit logs with `logName:"cloudaudit.googleapis.com/activity"` and `protoPayload.methodName` for the delete operation.
To see who deleted a VM, you use Admin Activity audit logs. Filter `logName:"cloudaudit.googleapis.com/activity"` and the appropriate `protoPayload.methodName` for instance deletion, then inspect `protoPayload.authenticationInfo.principalEmail`. The other options either miss audit logs or cannot identify the actor.
Key Terms Review: Cloud Logging and Troubleshooting
Use these flashcards to reinforce the core concepts before moving on to practice questions and labs.
- Log bucket
- A storage container in Cloud Logging that holds log entries with specific retention, location, and IAM settings. Examples include the default `_Default` bucket, the always-on `_Required` bucket, and custom buckets you create.
- Log view
- A filtered window into a log bucket that defines which log entries are visible. IAM can be granted on the view to control access to subsets of logs without moving data.
- Log Router sink
- A configuration object that matches logs using a filter at ingestion time and routes copies to destinations such as BigQuery, Cloud Storage, Pub/Sub, or other log buckets.
- Logging query language
- The filter language used in Logs Explorer and sinks to select log entries based on fields such as resource.type, severity, logName, labels, and payload contents.
- Admin Activity audit logs
- Audit logs that record administrative operations that modify configuration or metadata, such as creating VMs or changing IAM policies. They are always enabled and stored in the `_Required` bucket.
- Data Access audit logs
- Audit logs that record read and write operations on user data, such as reading objects from Cloud Storage. Often disabled by default for volume reasons and must be explicitly enabled.
- System Event and Policy Denied logs
- System Event logs record actions taken by Google systems on your behalf; Policy Denied logs record when a request is denied because of a security policy violation, for example by VPC Service Controls.
- Typical resource type for GKE pod logs
- Most GKE container logs use `resource.type="k8s_container"`, with labels for cluster name, namespace, pod name, and container name.
- When to export logs to BigQuery
- When you need to run analytical or long-term queries over large volumes of logs, correlate logs with other datasets, or build custom reports that go beyond Logs Explorer capabilities.
- Relationship between Cloud Logging and Cloud Monitoring
- Cloud Logging stores detailed event data (log entries), while Cloud Monitoring focuses on metrics, dashboards, and alerts. Log-based metrics can bridge logs into Monitoring for alerting.
Key Terms
- Log sink
- A configuration object for the Log Router that specifies a filter and a destination for exporting or routing logs.
- Log view
- A filtered subset of a log bucket that controls which log entries are visible to users who have IAM access to the view.
- Log entry
- An individual record in Cloud Logging that includes a timestamp, severity, logName, monitored resource, labels, and a payload (text, JSON, or proto).
- Log Router
- The component of Cloud Logging that receives logs at ingestion time and uses sinks to route copies of logs to destinations such as log buckets, BigQuery, Cloud Storage, or Pub/Sub.
- Log bucket
- A storage container in Cloud Logging that holds log entries with specific retention, location, and IAM settings.
- Cloud Logging
- Google Cloud's managed logging service that collects, stores, and lets you analyze logs from Google Cloud services, your applications, and audit logs.
- Logs Explorer
- The main Cloud Logging user interface in the Google Cloud console for searching, filtering, and analyzing logs.
- Cloud Audit Logs
- A set of logs that record administrative and data access activities on Google Cloud resources, including Admin Activity, Data Access, System Event, and Policy Denied logs.
- Log-based metric
- A metric derived from log entries that match a specific filter, which can then be used in Cloud Monitoring for charts and alerts.
- System Event logs
- Audit logs that record actions taken by Google systems on your behalf, such as maintenance or automatic restarts.
- Policy Denied logs
- Audit logs that record when a request is denied because of a security policy violation, for example by VPC Service Controls.
- Pub/Sub log export
- A type of log sink destination that publishes logs to a Pub/Sub topic for near-real-time processing or integration with external systems.
- BigQuery log export
- A type of log sink destination that sends logs to a BigQuery dataset for analytical querying.
- Data Access audit logs
- Audit logs that record read and write operations on user data; often disabled by default and must be explicitly enabled.
- Logging query language
- A boolean filter language used in Logs Explorer and sinks to select log entries based on their fields.
- Cloud Storage log export
- A type of log sink destination that writes logs to Cloud Storage buckets for long-term archival.
- Admin Activity audit logs
- Audit logs that record administrative operations that modify configuration or metadata; always enabled and retained in the `_Required` bucket.