
Probability and Uncertainty: Thinking in Chances — Data Science Foundations: From Raw Data to Insight

From Gut Feeling to Chances

Living With Uncertainty

We constantly face uncertainty: Will it rain? Will the bus be late? We usually answer with vague words like "probably" or "maybe".

Why Probability?

Probability turns vague words into clearer, numeric chances. It helps us think more consistently about uncertain events.

What You Will Learn

You will learn to express probabilities as numbers between 0 and 1, what outcomes and events are, how independent events differ from dependent ones, and how randomness and sampling connect to data.

Link to Earlier Modules

You already know data types and descriptive statistics. Now we add a new layer: thinking about what might happen next, and how confident we are.

Basic Probability: Outcomes, Events, and the 0–1 Scale

Outcomes

An outcome is a single result of a process. Coin toss: Heads or Tails. Die roll: 1, 2, 3, 4, 5, or 6.

Events

An event is a set of outcomes we care about. Example: "even number" when rolling a die is {2, 4, 6}.

Probability as a Number

Probability measures how likely an event is. It is always between 0 (impossible) and 1 (certain).

Percentages

We often use percentages: 0 → 0%, 0.5 → 50%, 1 → 100%. A fair coin has a 50% chance of Heads and a 50% chance of Tails.
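When every outcome is equally likely, as with a fair die, the probability of an event is the number of outcomes in the event divided by the total number of outcomes. The optional Python sketch below applies that counting rule and assumes equally likely outcomes. `probability` is a hypothetical helper written for this example. `Fraction` from the standard library keeps the results exact.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die: every possible outcome
sample_space = {1, 2, 3, 4, 5, 6}

# An event is a subset of the sample space: here "even number" = {2, 4, 6}
even = {outcome for outcome in sample_space if outcome % 2 == 0}

def probability(event, sample_space):
    """P(event), assuming every outcome in sample_space is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

p_even = probability(even, sample_space)
print(p_even)                                   # 1/2
print(f"{float(p_even):.0%}")                   # 50%
print(probability({7}, sample_space))           # 0 (impossible event)
print(probability(sample_space, sample_space))  # 1 (certain event)
```

The impossible event and the certain event land exactly at the two ends of the 0 to 1 scale.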

Real-World Probabilities

In practice we rarely know exact probabilities. We estimate them from data or use them as degrees of belief, like a 30% chance of rain.

Your Intuition: Translate Words to Numbers

Try this quick thought exercise. There are no perfectly right answers; the goal is to connect your intuition to numbers.

For each statement, write down (mentally or on paper) a probability between 0 and 1, or a percentage.

"It might rain later today."

What probability would you assign? (For example, 0.2, 0.5, or 0.8?)

"My phone battery will last until tonight."

Based on your experience today, what chance would you give it?

"A random student in my class has watched at least one full season of a streaming series this month."

What is your guess as a percentage?

"If I randomly pick a day of the week, it is a weekend."

Now this one you can calculate: out of 7 days, 2 are weekend days.
So: `P(weekend) = 2/7 ≈ 0.286 ≈ 28.6%`.

Reflect:

Which of your answers were guesses based on belief?
Which were calculations based on counting outcomes?

This difference (belief vs counting) is important. In data science, we often start with beliefs, then update them using data.

Probability as Long-Run Frequency

Long-Run Frequency Idea

Probability can be seen as the long-run frequency of an event when we repeat a random process many times.

Coin Toss Example

Toss a fair coin many times. As the number of tosses grows, the fraction of Heads tends to get closer to 0.5.
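You can watch the long-run frequency idea in a quick simulation (optional, Python). This sketch assumes NumPy is installed. The seed and the number of tosses are arbitrary choices. It tosses a simulated fair coin and prints the running fraction of Heads at a few checkpoints. Early fractions can wander noticeably, while later ones tend to settle near 0.5.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # seeded so reruns give the same tosses

# Simulate 100,000 tosses of a fair coin: 1 = Heads, 0 = Tails
tosses = rng.integers(0, 2, size=100_000)

# Running fraction of Heads after each toss: cumulative Heads / tosses so far
running_fraction = np.cumsum(tosses) / np.arange(1, tosses.size + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"after {n:>7,} tosses: fraction of Heads = {running_fraction[n - 1]:.4f}")
```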

From Data to Probability

Website example: 320 purchases out of 10,000 visits gives an observed frequency of 3.2%. We use this as an estimated probability.
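The estimate is a simple division: 320 / 10,000 = 0.032. A useful way to see it in code is to record each visit as 1 (purchase) or 0 (no purchase). The observed frequency is then just the mean of those values. The per-visit array below is a hypothetical reconstruction built only from the two counts in the example, because real visit logs would contain more detail.

```python
import numpy as np

# Hypothetical 0/1 record per visit, rebuilt from the counts: 1 = purchase
visits = 10_000
purchases = 320
outcomes = np.array([1] * purchases + [0] * (visits - purchases))

# Observed frequency = purchases / visits = mean of the 0/1 values
estimated_p = outcomes.mean()
print(f"estimated P(purchase) = {estimated_p:.3f} ({estimated_p:.1%})")
```

The same "mean of 0/1 values" trick appears again in the sampling simulation later in this module.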

Link to Predictive Models

Modern predictive systems often estimate probabilities from large datasets, then use them to make predictions about future events.

Independent vs Dependent Events (Concept Only)

Independent Events

Events are independent if knowing one happened does not change the chance of the other. Example: two separate fair coin tosses.

Dependent Events

Events are dependent if knowing one happened changes the chance of the other. Example: drawing cards without replacement.

Everyday Dependence

Weather: morning rain and afternoon rain are often dependent. If it rains in the morning, afternoon rain becomes more likely.

Why It Matters

In data science, variables are often dependent. Some models assume independence; others handle complex relationships.

Key Idea

Remember: independent = no effect on chance; dependent = changes the chance when you know one event happened.
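If you want to check the key idea by counting (optional, Python), compare P(B) with P(B given that A happened) over all equally likely outcomes. If the two numbers match, the events are independent. If they differ, the events are dependent. This sketch assumes equally likely outcomes. `conditional_check` is a hypothetical helper for this example. For the cards, it also assumes a standard 52-card deck with the event "the card is an Ace", where rank 0 stands for Ace.

```python
from fractions import Fraction
from itertools import permutations, product

def conditional_check(outcomes, event_a, event_b):
    """Return (P(B), P(B | A)) by counting equally likely outcomes."""
    outcomes = list(outcomes)
    n_b = sum(1 for o in outcomes if event_b(o))
    a_outcomes = [o for o in outcomes if event_a(o)]
    n_ab = sum(1 for o in a_outcomes if event_b(o))
    return Fraction(n_b, len(outcomes)), Fraction(n_ab, len(a_outcomes))

# Two separate fair coin tosses: all 4 ordered pairs are equally likely
coins = list(product(["H", "T"], repeat=2))
coin_p_b, coin_p_b_given_a = conditional_check(
    coins, lambda o: o[0] == "H", lambda o: o[1] == "H"
)
print(coin_p_b, coin_p_b_given_a)  # 1/2 1/2 -> knowing A does not change B

# Two cards drawn without replacement: 52 * 51 ordered pairs of distinct cards
deck = [(rank, suit) for rank in range(13) for suit in range(4)]

def is_ace(card):
    rank, _suit = card
    return rank == 0  # hypothetical labeling: rank 0 stands for "Ace"

draws = list(permutations(deck, 2))
card_p_b, card_p_b_given_a = conditional_check(
    draws, lambda o: is_ace(o[0]), lambda o: is_ace(o[1])
)
print(card_p_b, card_p_b_given_a)  # 1/13 1/17 -> knowing A changes B
```

For the cards, the second draw is an Ace with probability 4/52 = 1/13 overall. Once the first card is known to be an Ace, only 3 Aces remain among 51 cards, giving 3/51 = 1/17.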

Classify Events: Independent or Dependent?

Decide whether each pair of events is independent or dependent. Reason it out in your own words.

Two dice

Event A: "Die 1 shows a 6".
Event B: "Die 2 shows a 6".
Are A and B independent or dependent?

Same class

Event A: "Student 1 in your class passes the exam".
Event B: "Student 2 in your class passes the exam".
Think about shared study conditions, teaching quality, etc.

Drawing marbles without replacement

A bag has 3 red and 3 blue marbles.
Event A: "First marble drawn is red".
Event B: "Second marble drawn is red".

Drawing marbles with replacement

Same bag, but after each draw you put the marble back and mix.
Event A: "First marble drawn is red".
Event B: "Second marble drawn is red".

Pause and decide for each.

Randomness, Uncertainty, and Sampling

Randomness

Random processes have unpredictable individual outcomes but stable long-run patterns, like coin tosses or die rolls.

Uncertainty

Uncertainty is our lack of full knowledge. Even non-random systems can be modeled with probability when we cannot observe everything.

What Is a Sample?

A sample is a subset of a larger population, like surveying 200 students out of 20,000 at a university.

Sampling Variability

Different random samples from the same population give slightly different results. This natural variation is sampling variability.

Why Probability Matters

Because samples vary, our estimates are uncertain. Probability provides a language to describe and manage this uncertainty.

Simulating Sampling Variability (Optional, Python)

If you know a bit of Python, you can see sampling variability in action.

This code:

Simulates a population of 100,000 people
Each person has a 30% chance of liking a new app
Draws many random samples of size 200
Shows how the sample proportion changes from sample to sample

```python
import numpy as np

# Set a seed so results are reproducible
np.random.seed(42)

# 1. Create a population: 1 = likes the app, 0 = does not
population_size = 100_000
true_prob_like = 0.30
population = np.random.binomial(1, true_prob_like, size=population_size)

# 2. Function to take a random sample and compute the proportion who like the app
def sample_proportion(population, sample_size=200):
    # Pick sample_size distinct people (sampling without replacement)
    sample_indices = np.random.choice(len(population), size=sample_size, replace=False)
    sample = population[sample_indices]
    # The mean of 0/1 values is the proportion of 1s
    return float(sample.mean())

# 3. Take many samples and store their proportions
num_samples = 20
proportions = [sample_proportion(population) for _ in range(num_samples)]

print("True probability of liking the app:", true_prob_like)
print("Sample proportions (each from 200 people):")
print(proportions)
print("Average of sample proportions:", np.mean(proportions))
```

What you should notice:

Each sample proportion is close to 0.30, but not exactly.
This is sampling variability.
As you increase `sample_size`, the sample proportions tend to get closer to the true probability.
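To check that last point, you can measure how spread out the sample proportions are for a few sample sizes (optional, Python). This sketch uses the standard deviation as the measure of spread. The sample sizes 50, 200, and 1000 and the 300 repetitions are arbitrary choices. It uses NumPy's `default_rng` generator instead of the seed-based calls above, so its exact numbers will differ. Statistical theory predicts that the spread shrinks roughly in proportion to 1/√(sample size), so quadrupling the sample size should about halve it.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator for reproducibility

# Same kind of population as above: 1 = likes the app, 0 = does not
population = rng.binomial(1, 0.30, size=100_000)

def spread_of_sample_proportions(population, sample_size, num_samples=300):
    """Standard deviation of the sample proportion across repeated random samples."""
    proportions = [
        rng.choice(population, size=sample_size, replace=False).mean()
        for _ in range(num_samples)
    ]
    return float(np.std(proportions))

spreads = {n: spread_of_sample_proportions(population, n) for n in (50, 200, 1000)}
for n, s in spreads.items():
    print(f"sample_size={n:5d}  spread of sample proportions ≈ {s:.3f}")
```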

Check Understanding: Probability and Events

Answer this question to check your understanding of basic probability.

You roll a fair six-sided die once. Which statement is correct?

A) The probability of getting a 7 is 1/7 because there are 7 possible integers.
B) The probability of getting an even number is 3/6 because there are three even outcomes.
C) The probability of getting a 3 changes if you already rolled a 3 earlier today.
D) The probability of getting a 1 is 0 because that is very unlikely.


Answer: B) The probability of getting an even number is 3/6 because there are three even outcomes.

A fair die has outcomes {1,2,3,4,5,6}. The event "even" is {2,4,6}, which has 3 outcomes out of 6, so the probability is 3/6 = 1/2. Getting a 7 is impossible (probability 0), and earlier rolls today do not affect a new roll. Rolling a 1 has probability 1/6, not 0: an event that feels unlikely can still have a non-zero probability.

Check Understanding: Independence and Sampling

Answer this question about independent events and sampling variability.

A university surveys two different random samples of 200 students each about whether they have a part-time job. In sample 1, 40% say yes. In sample 2, 46% say yes. Which is the best interpretation?

A) The survey is useless because the results are not exactly the same.
B) This difference is expected due to sampling variability, even if the true proportion is fixed.
C) The university must have changed its policy between the two samples.
D) It is impossible for random samples to give different results if they are honest.