Skarp

Chapter 1 of 9

What Is Data Science, Really?

Step into the world of data science by seeing how companies turn messy real‑world data into decisions, products, and predictions—and why this field sits at the crossroads of math, coding, and domain expertise.

15 min read

Big Picture: What Is Data Science?

What Is Data Science?

Data science is about turning messy, real-world data into useful decisions, products, and predictions. It sits at the crossroads of data, math & statistics, coding, and domain knowledge.

Four Ingredients

Data (numbers, text, images), math & stats (patterns, uncertainty), coding (Python, R, SQL), and domain knowledge (knowing the business or scientific context).

More Than Just AI

Today, data science overlaps with machine learning and AI, but is broader: it includes cleaning data, doing simple analyses, and explaining results to non-technical people.

Everyday Examples

Netflix recommendations, travel time estimates in maps, and Spotify's "Discover Weekly" are everyday examples of data science turning past data into helpful predictions.

A Day in the Life: Food Delivery App

Food Delivery Example

Imagine you work on the data team for a food delivery app. Your goal: turn raw data from orders, drivers, and restaurants into better decisions and user experiences.

Descriptive Question

Descriptive: "How many orders did we get last weekend in each city?" You clean order logs, count orders per city, and build charts so managers see where demand is rising or falling.

Predictive Question

Predictive: "How long will this order take?" You train a model using past data (distance, traffic, time of day, restaurant speed) to predict delivery times for new orders.
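As a rough illustration of what "training a model" can mean here, the sketch below fits a straight line to a few made-up past deliveries and uses it to predict the time for a new order. The data, the helper names `fit_line` and `predict_minutes`, and the choice of distance as the only input are assumptions for illustration; a real model would also use traffic, time of day, and restaurant speed.

```python
# Made-up past deliveries: (distance in km, delivery time in minutes).
past_deliveries = [(1.0, 15.0), (2.0, 20.0), (3.0, 25.0), (4.0, 30.0)]


def fit_line(points):
    """Fit time = intercept + slope * distance by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_minutes(model, distance_km):
    """Apply the fitted line to a new order's distance."""
    intercept, slope = model
    return intercept + slope * distance_km


model = fit_line(past_deliveries)
print("Predicted minutes for a 5 km order:", predict_minutes(model, 5.0))
```

The "learning" step is `fit_line`: it looks at past orders and finds the line that fits them best. Prediction is then just plugging a new distance into that line.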

Prescriptive Question

Prescriptive: "Which driver should get this order?" You build algorithms that choose the best driver to minimize delays, considering driver locations and current assignments.
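One very simple way to sketch this kind of decision is a "nearest free driver" rule. The example below is only a sketch under stated assumptions: the driver names, coordinates, and the function `choose_driver` are hypothetical, and straight-line distance stands in for real travel time.

```python
import math

# Hypothetical drivers: name, (x, y) position in km, and whether they are busy.
drivers = [
    {"name": "Ana", "position": (0.0, 0.0), "busy": False},
    {"name": "Ben", "position": (2.0, 1.0), "busy": True},
    {"name": "Cleo", "position": (3.0, 4.0), "busy": False},
]


def choose_driver(drivers, restaurant_position):
    """Return the free driver closest to the restaurant, or None if all are busy."""
    free_drivers = [d for d in drivers if not d["busy"]]
    if not free_drivers:
        return None
    return min(
        free_drivers,
        key=lambda d: math.dist(d["position"], restaurant_position),
    )


best = choose_driver(drivers, restaurant_position=(2.0, 2.0))
print("Assign order to:", best["name"])
```

Ben is closest to the restaurant but already busy, so the rule picks the closest free driver instead. Production systems usually weigh many orders and drivers at once, but the core idea is the same: choose the action that best meets a goal.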

The Mix of Skills

This work uses data (GPS, logs), math & stats (models), coding (scripts and services), and domain knowledge (how deliveries work in each city). That mix is the heart of data science.

Data Science vs Analytics vs Software Engineering

Three Related Fields

Data science, data analytics, and software engineering are closely related but pursue different goals: models and insight, business understanding, and robust software systems, respectively.

Data Science Focus

Data science focuses on building models and analyses that learn from data to make predictions or guide decisions, such as churn models or A/B test analysis.

Data Analytics Focus

Data analytics focuses on understanding what is happening and why. Outputs include dashboards, KPIs, and one-off analyses using tools like SQL and BI platforms.

Software Engineering Focus

Software engineering focuses on building reliable apps, websites, and services. The main goal is scalable, secure, maintainable code, not statistical modeling.

Overlaps in Practice

Data scientists code but aim for insight and models; analysts explain and report; software engineers build systems. In real companies, titles and duties often overlap.

Data Science vs Traditional Statistics

Close Cousins

Data science and statistics are closely related. Statistics provides many of the core ideas; data science adds computing, large-scale data, and a strong focus on applications.

Traditional Statistics

Statistics often asks: "Does this treatment work?" or "Did this policy change outcomes?" It emphasizes study design, hypothesis tests, and careful reasoning about causality.

Data Science Focus

Data science often asks: "Who will churn?" or "What should we recommend?" It emphasizes prediction, handling messy large datasets, and deploying models in real systems.

Practical Differences

In practice, data science uses more programming and software tools. Statistics focuses more on explanation and causality; data science leans toward prediction and scale.

How to Think About It

You can think of data science as applied statistics plus computing and domain work, aimed at making real-world decisions and products, not just publishing results.

Types of Data Science Problems: Descriptive, Predictive, Prescriptive

Three Problem Types

Most data science tasks fit into three types: descriptive (what happened), predictive (what will happen), and prescriptive (what should we do).

Descriptive Analytics

Descriptive analytics summarizes past data: monthly sales by region, average time on a website, or maps showing where accidents occur. It uses counts, averages, and charts.
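To make "monthly sales by region" concrete, here is a minimal sketch that uses only the Python standard library. The sales records are made up for illustration.

```python
from collections import defaultdict

# Made-up sales records: (month, region, amount).
sales = [
    ("2024-01", "North", 100.0),
    ("2024-01", "South", 80.0),
    ("2024-02", "North", 120.0),
    ("2024-02", "South", 60.0),
    ("2024-02", "North", 30.0),
]

# Sum the amounts for each (month, region) pair.
monthly_sales = defaultdict(float)
for month, region, amount in sales:
    monthly_sales[(month, region)] += amount

# Print the totals in a stable, readable order.
for (month, region), total in sorted(monthly_sales.items()):
    print(month, region, total)
```

Nothing here predicts or recommends anything. It only groups past records and adds them up, which is what makes it descriptive.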

Predictive Analytics

Predictive analytics uses past data to predict future or unknown events, such as churn, demand next week, or loan default risk, using regression and other ML models.
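A demand forecast does not have to start with a complex model. The sketch below uses a moving average, a simple baseline rather than a regression or ML model: it predicts next week's demand as the mean of the last few weeks. The weekly numbers and the function name `moving_average_forecast` are assumptions for illustration.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    if len(history) < window:
        raise ValueError("need at least `window` past values")
    recent = history[-window:]
    return sum(recent) / window


# Made-up weekly order counts, oldest first.
weekly_orders = [100, 110, 120, 130, 140]

forecast = moving_average_forecast(weekly_orders, window=3)
print("Forecast for next week:", forecast)
```

Baselines like this are useful as a yardstick: a more sophisticated model is only worth its complexity if it predicts better than the simple rule.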

Prescriptive Analytics

Prescriptive analytics recommends actions: which price to set, how to assign staff, or how to route trucks. It often uses optimization or simulation techniques.
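As a toy example of optimization, the sketch below tries a few candidate prices and picks the one with the highest expected revenue. The demand curve in `expected_demand` is a made-up assumption; in practice it would come from a predictive model fitted to real sales data.

```python
def expected_demand(price):
    """Hypothetical demand curve: higher prices sell fewer units."""
    return max(0.0, 100.0 - 5.0 * price)


def best_price(candidate_prices):
    """Return the candidate price with the highest expected revenue."""
    return max(candidate_prices, key=lambda p: p * expected_demand(p))


candidates = [5.0, 8.0, 10.0, 12.0, 15.0]
price = best_price(candidates)
print("Recommended price:", price)
print("Expected revenue:", price * expected_demand(price))
```

Notice how the prescriptive step builds on a predictive one: the recommendation is only as good as the demand estimate behind it.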

Building Blocks

Projects often move from descriptive to predictive to prescriptive. When you face a data question, ask yourself which type it is: describe, predict, or prescribe.

Who Does What? Roles on a Modern Data Team

Data Teams Are Diverse

Modern data teams combine several roles: data scientists, data analysts, data engineers, ML engineers, product managers, and domain experts working together.

Data Scientist Role

Data scientists explore and clean data, build and evaluate models, run experiments like A/B tests, and explain results to decision-makers.
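To give a feel for what analyzing an A/B test can involve, here is a minimal sketch of a two-proportion z-test, one common way to compare conversion rates between two groups. The visitor and conversion counts are made up, and the function name `two_proportion_z_test` is an assumption for illustration.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare two conversion rates; return (z statistic, two-sided p-value)."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that both versions perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up experiment: version A converts 100 of 1000 visitors, version B 130 of 1000.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print("z statistic:", round(z, 2))
print("p-value:", round(p_value, 3))
```

A small p-value suggests the difference between the two versions is unlikely to be pure chance. Explaining what that does and does not mean to decision-makers is part of the data scientist's job.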

Data Analyst Role

Data analysts focus on reporting and business questions. They query data, build dashboards, track KPIs, and turn questions into clear numbers and charts.

Data Engineer and ML Engineer

Data engineers build pipelines and storage so data is reliable and accessible. ML engineers turn models into production services and monitor them over time.

The Team Sport Idea

Data science at scale is a team sport. Different roles bring technical and domain expertise so raw data can become real-world decisions and products.

Classify the Problem: Descriptive, Predictive, or Prescriptive?

Try this thought exercise. For each scenario, decide if it is mainly descriptive, predictive, or prescriptive. Then check your reasoning.

  1. A university tracks how many students visit the library each hour and shows it on a dashboard.
  • Your guess: descriptive / predictive / prescriptive?
  • Check: This is descriptive. It summarizes what has happened.
  2. A bank uses past customer data to estimate the chance that a new applicant will not repay a loan.
  • Your guess: descriptive / predictive / prescriptive?
  • Check: This is predictive. It forecasts a future risk.
  3. A streaming service chooses which movie thumbnails to show each user to maximize clicks.
  • Your guess: descriptive / predictive / prescriptive?
  • Check: This is prescriptive. It chooses an action (which thumbnail) to reach a goal.
  4. A hospital analyzes last year’s patient data to see which departments had the longest waiting times.
  • Your guess: descriptive / predictive / prescriptive?
  • Check: Descriptive. It explains what happened.
  5. A ride-sharing app estimates the price you will pay for a trip before you confirm.
  • Your guess: descriptive / predictive / prescriptive?
  • Check: Predictive. It predicts trip cost based on distance, time, and traffic.

If you misclassified any, revisit the definitions:

  • Descriptive: summarize past.
  • Predictive: estimate future/unknown.
  • Prescriptive: recommend actions.

A Tiny Taste of Data Science in Python

You do not need to be a programming expert to start in data science, but code is a key tool. Here is a very small example in Python using `pandas`, a popular data library.

This example:

  1. Creates a tiny "orders" dataset.
  2. Computes the average order value.
  3. Counts orders by city.

```python
import pandas as pd

# 1. Create a tiny dataset
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "city": ["London", "Paris", "London", "Berlin"],
    "amount": [20.5, 35.0, 12.0, 50.0],
})

print("Orders data:")
print(orders)

# 2. Descriptive: average order value
avg_amount = orders["amount"].mean()
print("\nAverage order amount:", avg_amount)

# 3. Descriptive: number of orders by city
orders_by_city = orders.groupby("city")["order_id"].count()
print("\nOrders by city:")
print(orders_by_city)
```

Try this in a Jupyter notebook, Google Colab, or any Python environment that has `pandas` installed.

Notice:

  • We used real code to do descriptive analytics.
  • With larger data and more features, the same ideas scale up.

You do not need to understand every line yet. For now, focus on the idea: code helps you move from raw tables to useful summaries.

Check Your Understanding: What Is Data Science?

Answer this quick question to check your understanding of the core idea.

Which of the following BEST captures what data science is in practice?

  1. Writing software applications without using any statistics.
  2. Using data, math/statistics, and code, plus domain knowledge, to turn raw data into useful insights, predictions, and decisions.
  3. Making charts and dashboards only, without building any models.
  4. Doing theoretical math proofs about algorithms, without touching real data.

Answer: 2. Using data, math/statistics, and code, plus domain knowledge, to turn raw data into useful insights, predictions, and decisions.

Data science combines data, math/statistics, coding, and domain knowledge to turn raw data into insights, predictions, and decisions. It is broader than just dashboards or just coding, and it usually works with real data rather than only theory.

Key Terms Review

Use these flashcards to review the main ideas from this module.

Data science
An applied field that uses data, math/statistics, coding, and domain knowledge to turn messy real-world data into useful insights, predictions, and decisions.
Descriptive analytics
Type of analysis that answers "What happened?" by summarizing past data with counts, averages, charts, and dashboards.
Predictive analytics
Type of analysis that answers "What is likely to happen?" by using models trained on past data to predict future or unknown outcomes.
Prescriptive analytics
Type of analysis that answers "What should we do?" by recommending actions (such as prices, routes, or assignments) to achieve a goal.
Data scientist
A role focused on exploring data, building and evaluating models, running experiments, and communicating results to guide decisions.
Data analyst
A role focused on querying data, building reports and dashboards, tracking KPIs, and answering business questions with clear numbers and charts.
Data engineer
A role focused on building and maintaining data pipelines and infrastructure so that data is collected, stored, and made reliably available.
Machine learning engineer
A role focused on turning models into production systems, such as services or APIs, and monitoring their performance over time.
Domain knowledge
Understanding of the specific area (such as healthcare, finance, or marketing) where data science is applied, which makes analyses meaningful and useful.

Key Terms

data analyst
A professional who focuses on reporting, dashboards, and answering business questions with data.
data science
An applied field that combines data, math/statistics, coding, and domain knowledge to turn raw data into insights, predictions, and decisions.
data engineer
A professional who designs and maintains systems that collect, store, and organize data for others to analyze.
data scientist
A professional who builds models and performs deeper analyses to support predictions and decisions.
domain knowledge
Understanding of a specific subject area where data science is applied, such as medicine, finance, or marketing.
predictive analytics
Analysis that uses past data and models to estimate future or unknown outcomes.
descriptive analytics
Analysis that summarizes and explains what has happened in the past using counts, averages, and visualizations.
prescriptive analytics
Analysis that recommends actions to achieve specific goals, often using optimization or simulation.
machine learning engineer
A professional who deploys and maintains machine learning models in production systems.
KPI (Key Performance Indicator)
A measurable value that shows how well an organization is achieving a key objective, such as revenue growth or user retention.
