Chapter 1 of 9
What Is Data Science, Really?
Step into the world of data science by seeing how companies turn messy real‑world data into decisions, products, and predictions—and why this field sits at the crossroads of math, coding, and domain expertise.
Big Picture: What Is Data Science?
What Is Data Science?
Data science is about turning messy, real-world data into useful decisions, products, and predictions. It sits at the crossroads of data, math & statistics, coding, and domain knowledge.
Four Ingredients
Data (numbers, text, images), math & stats (patterns, uncertainty), coding (Python, R, SQL), and domain knowledge (knowing the business or scientific context).
More Than Just AI
Today, data science overlaps with machine learning and AI, but it is broader: it also includes cleaning data, running simple analyses, and explaining results to non-technical people.
Everyday Examples
Netflix recommendations, travel time estimates in maps, and Spotify's "Discover Weekly" are everyday examples of data science turning past data into helpful predictions.
A Day in the Life: Food Delivery App
Food Delivery Example
Imagine you work on the data team for a food delivery app. Your goal: turn raw data from orders, drivers, and restaurants into better decisions and user experiences.
Descriptive Question
Descriptive: "How many orders did we get last weekend in each city?" You clean order logs, count orders per city, and build charts so managers see where demand is rising or falling.
Predictive Question
Predictive: "How long will this order take?" You train a model using past data (distance, traffic, time of day, restaurant speed) to predict delivery times for new orders.
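As a minimal sketch of the predictive idea, the example below fits a straight line to a handful of made-up past deliveries and uses it to estimate the time for a new order. The data values, the single feature (distance), and the helper name `fit_line` are illustrative assumptions; a real model would use more features, such as traffic and time of day, and far more data.

```python
# Hypothetical past deliveries: (distance in km, delivery time in minutes).
# These values are invented for illustration only.
past_orders = [(1.0, 15.0), (2.0, 20.0), (3.0, 25.0), (4.0, 30.0)]


def fit_line(points):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(past_orders)

# Predict the delivery time for a new 5 km order.
new_distance_km = 5.0
predicted_minutes = intercept + slope * new_distance_km
print(f"Predicted delivery time: {predicted_minutes:.1f} minutes")
```

The "training" step here is the call to `fit_line`; the "prediction" step is plugging a new distance into the fitted line.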
Prescriptive Question
Prescriptive: "Which driver should get this order?" You build algorithms that choose the best driver to minimize delays, considering driver locations and current assignments.
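One very simple way to sketch this prescriptive step is a "nearest free driver" rule. The driver names, coordinates, and the `choose_driver` helper below are hypothetical, and straight-line distance stands in for real travel time; production dispatch systems typically weigh many orders and drivers at once.

```python
import math

# Hypothetical driver positions as (x, y) grid coordinates in km,
# plus whether each driver is already busy. All values are made up.
drivers = {
    "driver_a": {"position": (0.0, 0.0), "busy": False},
    "driver_b": {"position": (2.0, 1.0), "busy": True},
    "driver_c": {"position": (3.0, 4.0), "busy": False},
}
restaurant = (2.0, 2.0)


def choose_driver(drivers, pickup):
    """Return the name of the closest driver who is not busy, or None."""
    free = {name: d for name, d in drivers.items() if not d["busy"]}
    if not free:
        return None
    # Pick the free driver with the smallest straight-line distance.
    return min(free, key=lambda name: math.dist(free[name]["position"], pickup))


best = choose_driver(drivers, restaurant)
print("Assign order to:", best)
```

Note that the closest driver overall (`driver_b`) is skipped because that driver is busy, which is the "current assignments" part of the decision.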
The Mix of Skills
This work uses data (GPS, logs), math & stats (models), coding (scripts and services), and domain knowledge (how deliveries work in each city). That mix is the heart of data science.
Data Science vs Analytics vs Software Engineering
Three Related Fields
Data science, data analytics, and software engineering are closely related but focus on different goals: models and insight, business understanding, and robust software systems, respectively.
Data Science Focus
Data science focuses on building models and analyses that learn from data to make predictions or guide decisions, such as churn models or A/B test analysis.
Data Analytics Focus
Data analytics focuses on understanding what is happening and why. Outputs include dashboards, KPIs, and one-off analyses using tools like SQL and BI platforms.
Software Engineering Focus
Software engineering focuses on building reliable apps, websites, and services. The main goal is scalable, secure, maintainable code, not statistical modeling.
Overlaps in Practice
Data scientists code but aim for insight and models; analysts explain and report; software engineers build systems. In real companies, titles and duties often overlap.
Data Science vs Traditional Statistics
Close Cousins
Data science and statistics are closely related. Statistics provides many of the core ideas; data science adds computing, large-scale data, and a strong focus on applications.
Traditional Statistics
Statistics often asks: "Does this treatment work?" or "Did this policy change outcomes?" It emphasizes study design, hypothesis tests, and careful reasoning about causality.
Data Science Focus
Data science often asks: "Who will churn?" or "What should we recommend?" It emphasizes prediction, handling messy large datasets, and deploying models in real systems.
Practical Differences
In practice, data science uses more programming and software tools. Statistics focuses more on explanation and causality; data science leans toward prediction and scale.
How to Think About It
You can think of data science as applied statistics plus computing and domain work, aimed at making real-world decisions and products, not just publishing results.
Types of Data Science Problems: Descriptive, Predictive, Prescriptive
Three Problem Types
Most data science tasks fit into three types: descriptive (what happened), predictive (what will happen), and prescriptive (what should we do).
Descriptive Analytics
Descriptive analytics summarizes past data: monthly sales by region, average time on a website, or maps showing where accidents occur. It uses counts, averages, and charts.
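A small sketch of "counts and averages" using only the Python standard library is shown here. The sales records are invented for illustration, and the variable names are assumptions rather than part of any real system.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales records: (region, month, sales amount). Invented values.
sales = [
    ("North", "Jan", 100.0),
    ("North", "Feb", 120.0),
    ("South", "Jan", 80.0),
    ("South", "Feb", 60.0),
]

# Group the amounts by region.
amounts_by_region = defaultdict(list)
for region, _month, amount in sales:
    amounts_by_region[region].append(amount)

# Summarize each region with a count, a total, and an average.
summary = {
    region: {"count": len(a), "total": sum(a), "average": mean(a)}
    for region, a in amounts_by_region.items()
}
for region, stats in summary.items():
    print(region, stats)
```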
Predictive Analytics
Predictive analytics uses past data to predict future or unknown events, such as churn, demand next week, or loan default risk, using regression and other ML models.
Prescriptive Analytics
Prescriptive analytics recommends actions: which price to set, how to assign staff, or how to route trucks. It often uses optimization or simulation techniques.
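To make "which price to set" concrete, here is a toy optimization sketch. It assumes a made-up linear demand curve and simply tries a few candidate prices; the functions `expected_customers` and `revenue` are hypothetical, and real pricing work would estimate demand from data first.

```python
# Assumed demand curve: each 1-unit price increase loses 10 customers.
# The function and its numbers are hypothetical, chosen only for illustration.
def expected_customers(price):
    return max(0.0, 100.0 - 10.0 * price)


def revenue(price):
    return price * expected_customers(price)


# Candidate prices from 1 to 9 in steps of 1.
candidate_prices = [float(p) for p in range(1, 10)]

# Prescribe: pick the candidate with the highest expected revenue.
best_price = max(candidate_prices, key=revenue)
print(f"Recommended price: {best_price}, expected revenue: {revenue(best_price)}")
```

The output is an action (a recommended price), not just a summary or a forecast, which is what makes this prescriptive.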
Building Blocks
Projects often move from descriptive to predictive to prescriptive. When you face a data question, ask yourself which type it is: describe, predict, or prescribe.
Who Does What? Roles on a Modern Data Team
Data Teams Are Diverse
Modern data teams combine several roles: data scientists, data analysts, data engineers, ML engineers, product managers, and domain experts working together.
Data Scientist Role
Data scientists explore and clean data, build and evaluate models, run experiments like A/B tests, and explain results to decision-makers.
Data Analyst Role
Data analysts focus on reporting and business questions. They query data, build dashboards, track KPIs, and turn questions into clear numbers and charts.
Data Engineer and ML Engineer
Data engineers build pipelines and storage so data is reliable and accessible. ML engineers turn models into production services and monitor them over time.
The Team Sport Idea
Data science at scale is a team sport. Different roles bring technical and domain expertise so raw data can become real-world decisions and products.
Classify the Problem: Descriptive, Predictive, or Prescriptive?
Try this thought exercise. For each scenario, decide if it is mainly descriptive, predictive, or prescriptive. Then check your reasoning.
- A university tracks how many students visit the library each hour and shows it on a dashboard.
  - Your guess: descriptive / predictive / prescriptive?
  - Check: This is descriptive. It summarizes what has happened.
- A bank uses past customer data to estimate the chance that a new applicant will not repay a loan.
  - Your guess: descriptive / predictive / prescriptive?
  - Check: This is predictive. It forecasts a future risk.
- A streaming service chooses which movie thumbnails to show each user to maximize clicks.
  - Your guess: descriptive / predictive / prescriptive?
  - Check: This is prescriptive. It chooses an action (which thumbnail) to reach a goal.
- A hospital analyzes last year’s patient data to see which departments had the longest waiting times.
  - Your guess: descriptive / predictive / prescriptive?
  - Check: This is descriptive. It summarizes what happened.
- A ride-sharing app estimates the price you will pay for a trip before you confirm.
  - Your guess: descriptive / predictive / prescriptive?
  - Check: This is predictive. It predicts trip cost based on distance, time, and traffic.
If you misclassified any, revisit the definitions:
- Descriptive: summarize past.
- Predictive: estimate future/unknown.
- Prescriptive: recommend actions.
A Tiny Taste of Data Science in Python
You do not need to be a programming expert to start in data science, but code is a key tool. Here is a very small example in Python using `pandas`, a popular data library.
This example:
- Creates a tiny "orders" dataset.
- Computes the average order value.
- Counts orders by city.
```python
import pandas as pd

# 1. Create a tiny dataset
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "city": ["London", "Paris", "London", "Berlin"],
    "amount": [20.5, 35.0, 12.0, 50.0],
})
print("Orders data:")
print(orders)

# 2. Descriptive: average order value
avg_amount = orders["amount"].mean()
print("\nAverage order amount:", avg_amount)

# 3. Descriptive: number of orders by city
orders_by_city = orders.groupby("city")["order_id"].count()
print("\nOrders by city:")
print(orders_by_city)
```
Try this in a Jupyter notebook, Google Colab, or any Python environment that has `pandas` installed.
Notice:
- We used real code to do descriptive analytics.
- With larger data and more features, the same ideas scale up.
You do not need to understand every line yet. For now, focus on the idea: code helps you move from raw tables to useful summaries.
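If you want to go one small step further, the sketch below reuses the same tiny, made-up orders table to compute several summaries per city at once and to filter rows. The threshold of 30 and the variable names `city_summary` and `large_orders` are arbitrary choices for illustration.

```python
import pandas as pd

# Same tiny, made-up orders dataset as in the example above.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "city": ["London", "Paris", "London", "Berlin"],
    "amount": [20.5, 35.0, 12.0, 50.0],
})

# Several descriptive statistics per city in one step.
city_summary = orders.groupby("city")["amount"].agg(["count", "sum", "mean"])
print(city_summary)

# Keep only orders above an assumed threshold of 30.
large_orders = orders[orders["amount"] > 30]
print(large_orders)
```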
Check Your Understanding: What Is Data Science?
Answer this quick question to check your understanding of the core idea.
Which of the following BEST captures what data science is in practice?
- A) Writing software applications without using any statistics.
- B) Using data, math/statistics, and code, plus domain knowledge, to turn raw data into useful insights, predictions, and decisions.
- C) Making charts and dashboards only, without building any models.
- D) Doing theoretical math proofs about algorithms, without touching real data.
Answer: B) Using data, math/statistics, and code, plus domain knowledge, to turn raw data into useful insights, predictions, and decisions.
Data science combines data, math/statistics, coding, and domain knowledge to turn raw data into insights, predictions, and decisions. It is broader than just dashboards or just coding, and it usually works with real data rather than only theory.
Key Terms Review
Use these flashcards to review the main ideas from this module.
- Data science
  - An applied field that uses data, math/statistics, coding, and domain knowledge to turn messy real-world data into useful insights, predictions, and decisions.
- Descriptive analytics
  - Type of analysis that answers "What happened?" by summarizing past data with counts, averages, charts, and dashboards.
- Predictive analytics
  - Type of analysis that answers "What is likely to happen?" by using models trained on past data to predict future or unknown outcomes.
- Prescriptive analytics
  - Type of analysis that answers "What should we do?" by recommending actions (such as prices, routes, or assignments) to achieve a goal.
- Data scientist
  - A role focused on exploring data, building and evaluating models, running experiments, and communicating results to guide decisions.
- Data analyst
  - A role focused on querying data, building reports and dashboards, tracking KPIs, and answering business questions with clear numbers and charts.
- Data engineer
  - A role focused on building and maintaining data pipelines and infrastructure so that data is collected, stored, and made reliably available.
- Machine learning engineer
  - A role focused on turning models into production systems, such as services or APIs, and monitoring their performance over time.
- Domain knowledge
  - Understanding of the specific area (such as healthcare, finance, or marketing) where data science is applied, which makes analyses meaningful and useful.
Key Terms
- data analyst
  - A professional who focuses on reporting, dashboards, and answering business questions with data.
- data science
  - An applied field that combines data, math/statistics, coding, and domain knowledge to turn raw data into insights, predictions, and decisions.
- data engineer
  - A professional who designs and maintains systems that collect, store, and organize data for others to analyze.
- data scientist
  - A professional who builds models and performs deeper analyses to support predictions and decisions.
- domain knowledge
  - Understanding of a specific subject area where data science is applied, such as medicine, finance, or marketing.
- predictive analytics
  - Analysis that uses past data and models to estimate future or unknown outcomes.
- descriptive analytics
  - Analysis that summarizes and explains what has happened in the past using counts, averages, and visualizations.
- prescriptive analytics
  - Analysis that recommends actions to achieve specific goals, often using optimization or simulation.
- machine learning engineer
  - A professional who deploys and maintains machine learning models in production systems.
- KPI (Key Performance Indicator)
  - A measurable value that shows how well an organization is achieving a key objective, such as revenue growth or user retention.