Python Environment and Setup
Summary
This chapter guides students through setting up a complete Python development environment for data science. Students will install Python, learn about package management with pip and conda, set up virtual environments, and configure their IDE. The chapter also covers Jupyter notebooks in depth, including working with cells, kernels, and importing libraries. By the end of this chapter, students will have a fully functional data science development environment.
Concepts Covered
This chapter covers the following 15 concepts from the learning graph:
- Python Installation
- Package Management
- Pip
- Conda Environment
- Virtual Environment
- IDE Setup
- VS Code
- Jupyter Notebooks
- Notebook Cells
- Code Cell
- Markdown Cell
- Cell Execution
- Kernel
- Import Statement
- Python Libraries
Prerequisites
This chapter builds on concepts from Chapter 1.
Building Your Data Science Headquarters
Every superhero needs a headquarters. Batman has the Batcave. Iron Man has Stark Tower. The Avengers have their compound. You? You're about to build something just as cool—your personal data science command center.
In Chapter 1, you discovered your new superpower: the ability to extract insights from data. But a superpower without the right tools is like having super strength but nowhere to punch. This chapter is where you assemble your utility belt, stock your armory, and set up the base of operations where all your data science magic will happen.
By the end of this chapter, you'll have:
- Python installed and ready to go
- A professional code editor configured for data science
- Jupyter notebooks for interactive exploration
- Package management skills to add new tools whenever you need them
- A clean, organized environment that won't cause headaches later
Let's build your headquarters!
Diagram: Data Science Environment Architecture
Data Science Environment Architecture
Type: diagram
Bloom Taxonomy: Understand (L2)
Learning Objective: Help students visualize how all components of their data science environment fit together
Purpose: Show the layered architecture of a data science setup
Layout: Vertical stack diagram showing layers from bottom to top
Layers (bottom to top):

1. OPERATING SYSTEM (base layer)
   - Windows, macOS, or Linux
   - Color: Dark gray
   - Icon: Computer
2. PYTHON INSTALLATION
   - Python interpreter (the engine)
   - Color: Blue (Python blue)
   - Icon: Python logo
3. PACKAGE MANAGER
   - pip or conda (the supply chain)
   - Arrows showing packages flowing in
   - Color: Orange
   - Icon: Package box
4. VIRTUAL ENVIRONMENT
   - Isolated workspace (the clean room)
   - Shows boundary separating from other environments
   - Color: Green
   - Icon: Bubble/container
5. PYTHON LIBRARIES
   - pandas, numpy, matplotlib (the tools)
   - Multiple small icons representing different libraries
   - Color: Various colors for each library
6. IDE / JUPYTER (top layer)
   - VS Code or Jupyter Notebook (the cockpit)
   - Color: Purple
   - Icon: Code editor window
Side annotations:
- Arrow from user to IDE: "You work here"
- Arrow from libraries to IDE: "Tools you use"
- Bracket around virtual environment: "Keeps projects separate"

Interactive elements:
- Hover over each layer to see description and purpose
- Click to see common problems at each layer
Visual style: Modern, clean boxes with rounded corners, superhero HQ aesthetic
Implementation: SVG with CSS hover effects
Python Installation: Powering Up Your System
Python installation is the first and most critical step. Python is the programming language that powers your data science work—it's the engine of your entire operation. Without it, nothing else works.
Think of Python like electricity in your headquarters. You can have the fanciest equipment in the world, but without power, it's all just expensive furniture. Installing Python gives your computer the ability to understand and execute Python code.
Choosing Your Python Distribution
Here's where it gets interesting: there are actually several ways to install Python. The two main approaches are:
| Approach | Best For | Includes |
|---|---|---|
| Python.org (Official) | Minimalists, learning basics | Just Python, nothing extra |
| Anaconda Distribution | Data scientists (that's you!) | Python + 250+ data science packages |
For this course, we strongly recommend Anaconda. Why? Because it comes pre-loaded with almost everything you'll need—pandas, NumPy, matplotlib, Jupyter, and hundreds of other tools. It's like buying a fully furnished headquarters instead of an empty warehouse.
Superhero Shortcut
Installing Anaconda is like getting a starter kit with all the gadgets already assembled. You could build everything from scratch, but why? Batman didn't forge his own Batarangs (okay, maybe he did, but you get the point).
Installation Steps
The installation process is straightforward:
- Visit anaconda.com and download the installer for your operating system
- Run the installer and accept the default options
- Wait for installation to complete (it might take a few minutes—grab a snack)
- Open a terminal or Anaconda Prompt to verify it worked
To verify your installation, open a terminal and type:
```bash
python --version
```
You should see something like Python 3.11.5 (the exact version may differ). If you see an error, don't panic—check the troubleshooting section at the end of this chapter.
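While you're at it, you can confirm that conda is available too (an optional check; your version number will differ):

```bash
conda --version
```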
Package Management: Your Supply Chain
Package management is how you add new tools and capabilities to your Python installation. In the superhero world, this is like having access to a warehouse full of gadgets you can requisition whenever you need them.
Python's real power comes from its ecosystem of Python libraries—pre-written code packages that handle specific tasks. Need to work with data tables? There's a library for that (pandas). Need to create visualizations? There's a library for that (matplotlib). Need to do machine learning? Yep, library for that too (scikit-learn).
But how do you get these libraries? That's where package managers come in.
Pip: The Original Package Manager
Pip stands for "Pip Installs Packages" (yes, it's a recursive acronym—programmers think they're funny). It's the original Python package manager and comes built into Python.
Using pip is simple. Open a terminal and type:
```bash
pip install pandas
```
That's it! Pip will download pandas and all its dependencies, install them, and you're ready to go. Need to install multiple packages? Just list them:
```bash
pip install pandas numpy matplotlib
```
Common pip commands you'll use:
| Command | What It Does |
|---|---|
| `pip install package_name` | Install a package |
| `pip uninstall package_name` | Remove a package |
| `pip list` | Show all installed packages |
| `pip show package_name` | Show details about a package |
| `pip install --upgrade package_name` | Update to latest version |
Conda: The Data Scientist's Choice
While pip is great, data scientists often prefer conda—the package manager that comes with Anaconda. Conda does everything pip does, plus it manages non-Python dependencies and creates isolated environments (more on that soon).
```bash
conda install pandas
```
The syntax is almost identical to pip. So why use conda? Because some data science packages have complicated dependencies involving C libraries, Fortran code, or other system-level components. Conda handles all of that automatically, while pip sometimes struggles.
Pick One (Mostly)
Using both pip and conda in the same environment can sometimes cause conflicts—like having two quarterbacks calling different plays. In general, prefer conda for data science packages. Use pip only when a package isn't available through conda.
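In practice, a conda-first workflow looks something like this. The second package name below is a made-up placeholder standing in for any package conda doesn't carry:

```bash
# Prefer conda for data science packages
conda install pandas numpy

# Fall back to pip only when conda doesn't have the package
pip install some-niche-package
```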
Diagram: Package Manager Workflow
How Package Managers Work
Type: workflow
Bloom Taxonomy: Understand (L2)
Learning Objective: Help students understand the flow of installing and using packages
Purpose: Visualize the package installation process from command to usage
Visual style: Horizontal flowchart with icons
Steps:

1. USER TYPES COMMAND
   - Icon: Keyboard
   - Example: "pip install pandas"
   - Color: Blue
2. PACKAGE MANAGER SEARCHES
   - Icon: Magnifying glass
   - Label: "Searches PyPI (pip) or Anaconda Cloud (conda)"
   - Color: Orange
3. DOWNLOADS PACKAGE
   - Icon: Download arrow
   - Label: "Downloads package + all dependencies"
   - Color: Green
4. INSTALLS TO ENVIRONMENT
   - Icon: Folder with checkmark
   - Label: "Saves files to your Python environment"
   - Color: Purple
5. READY TO IMPORT
   - Icon: Python logo with sparkles
   - Label: "import pandas as pd"
   - Color: Gold

Annotations:
- Between steps 2-3: "Internet connection required"
- Below step 3: "May download multiple packages (dependencies)"

Interactive elements:
- Hover each step to see common errors and solutions
- Animation: Package icon traveling through pipeline
Implementation: SVG with CSS animations
Virtual Environments: Your Clean Room
Here's a scenario that trips up many beginners: You're working on two different projects. Project A needs pandas version 1.5, but Project B needs pandas version 2.0. If you install version 2.0, Project A breaks. Install version 1.5, and Project B breaks. What do you do?
The answer: virtual environments.
A virtual environment is an isolated Python installation where you can install packages without affecting your main system or other projects. Think of it as having multiple separate headquarters, each set up exactly how one specific mission requires.
Why Virtual Environments Matter
Without virtual environments:
- Package conflicts are inevitable
- Upgrading one project can break another
- Your system Python gets cluttered with random packages
- Reproducing your work on another computer becomes a nightmare
With virtual environments:
- Each project has exactly the packages it needs
- No conflicts between projects
- Easy to share your exact setup with others
- Clean, organized, professional workflow
Conda Environments: Your Mission-Specific Loadout
Since you're using Anaconda, you'll create environments using conda. Here's how:
```bash
# Create a new environment named "datascience" with Python 3.11
conda create -n datascience python=3.11

# Step inside the environment
conda activate datascience

# Anything you install now goes into THIS environment only
conda install pandas numpy matplotlib

# Step back out when you're done
conda deactivate
```
Once activated, anything you install goes into that environment only. Your other environments stay clean and untouched.
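You can verify the isolation yourself with two standard conda commands, one listing your environments and one listing the packages inside the active environment:

```bash
# List every environment on your machine (* marks the active one)
conda env list

# List only the packages installed in the active environment
conda list
```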
Real-World Analogy
Imagine you're a chef who cooks Italian food, Japanese food, and Mexican food. You COULD keep all your ingredients in one giant pantry. But it's much easier to have three separate stations, each stocked with exactly what that cuisine needs. Virtual environments are your separate cooking stations.
Diagram: Virtual Environment Isolation MicroSim
Virtual Environment Isolation Simulator
Type: microsim
Bloom Taxonomy: Apply (L3)
Learning Objective: Let students experiment with creating environments and installing packages to see how isolation works
Canvas layout (750x500px):
- Left side (500x500): Visual representation of environments
- Right side (250x500): Controls and terminal simulation

Visual elements:
- Base system shown as gray platform at bottom
- Virtual environments as colored bubbles floating above
- Packages shown as small icons inside environments
- Conflict indicators (red X) when same package, different versions

Interactive controls:
- Button: "Create Environment" - Adds new bubble with name input
- Dropdown: "Select Environment" - Choose which env to work in
- Button: "Install Package" - Shows package picker
- Package picker: pandas, numpy, matplotlib with version selector
- Button: "Delete Environment" - Removes selected environment
- Toggle: "Show Conflicts" - Highlights version conflicts

Terminal simulation (right panel):
- Shows conda commands being "typed"
- Displays output messages
- Command history

Default state:
- Base system with Python
- One environment "project-a" with pandas 1.5
- One environment "project-b" with pandas 2.0
- No conflicts (isolated!)

Behavior:
- Creating environment adds new bubble
- Installing package adds icon to current environment bubble
- Installing conflicting versions in same env shows warning
- Different versions in different envs shows green checkmarks
- Hover over package shows version and description

Educational messages:
- "Notice: Each environment is completely separate!"
- "Try installing different pandas versions in different environments"
- "See? No conflicts when properly isolated!"
Implementation: p5.js with interactive elements
IDE Setup: Your Command Center
An IDE (Integrated Development Environment) is your primary workspace—the command center where you'll write code, run experiments, and analyze results. While you could technically write Python in Notepad, that's like trying to fight crime with a flashlight instead of the Batcomputer.
A good IDE provides:
- Syntax highlighting (code is color-coded for readability)
- Auto-completion (suggests code as you type)
- Error detection (catches mistakes before you run)
- Integrated terminal (run commands without switching windows)
- Debugging tools (find and fix problems)
- Extension ecosystem (add new features)
VS Code: The Modern Hero's Choice
VS Code (Visual Studio Code) is our recommended IDE. It's free, fast, incredibly powerful, and loved by millions of developers worldwide. Microsoft makes it, but don't hold that against it—it's genuinely excellent.
Why VS Code for data science?
| Feature | Benefit |
|---|---|
| Python Extension | First-class Python support with IntelliSense |
| Jupyter Integration | Run notebooks directly in VS Code |
| Git Integration | Version control built right in |
| Extensions Marketplace | Thousands of add-ons available |
| Remote Development | Code on servers, containers, WSL |
| Free Forever | No subscriptions, no premium tiers |
Setting Up VS Code for Data Science
After installing VS Code, you'll want to add some extensions. Think of extensions as upgrades to your equipment—each one adds new capabilities.
Essential extensions for data science:
- Python (by Microsoft) - Core Python support
- Jupyter (by Microsoft) - Notebook support in VS Code
- Pylance - Advanced Python language features
- Python Indent - Fixes indentation automatically
- Rainbow CSV - Makes CSV files readable
To install an extension:
- Click the Extensions icon in the left sidebar (looks like four squares)
- Search for the extension name
- Click "Install"
- That's it—no restart required!
Keyboard Shortcut Superpower
Press Ctrl+Shift+P (Windows/Linux) or Cmd+Shift+P (Mac) to open the Command Palette—your gateway to every VS Code feature. Start typing what you want to do, and VS Code will find it.
Diagram: VS Code Interface Tour
VS Code Interface Guided Tour
Type: infographic
Bloom Taxonomy: Remember (L1)
Learning Objective: Help students identify and remember the key parts of the VS Code interface
Purpose: Interactive labeled diagram of VS Code interface
Layout: Screenshot-style representation of VS Code with numbered callouts
Main areas:

1. ACTIVITY BAR (far left vertical strip)
   - Explorer, Search, Git, Debug, Extensions icons
   - Label: "Quick access to major features"
   - Color highlight: Blue
2. SIDE BAR (left panel)
   - File explorer showing project structure
   - Label: "Your project files and folders"
   - Color highlight: Green
3. EDITOR AREA (center, main area)
   - Code with syntax highlighting
   - Label: "Where you write code"
   - Color highlight: Purple
4. TABS (top of editor)
   - Multiple file tabs
   - Label: "Switch between open files"
   - Color highlight: Orange
5. MINIMAP (right edge of editor)
   - Zoomed-out code preview
   - Label: "Navigate large files quickly"
   - Color highlight: Teal
6. TERMINAL (bottom panel)
   - Integrated command line
   - Label: "Run commands without leaving VS Code"
   - Color highlight: Red
7. STATUS BAR (bottom strip)
   - Python version, line number, encoding
   - Label: "Current file info and settings"
   - Color highlight: Gray

Interactive elements:
- Hover over each numbered area to see detailed description
- Click to see common tasks performed in that area
- "Hide Labels" toggle to test recall
Visual style: Clean, modern, matching VS Code dark theme
Implementation: HTML/CSS with image map and tooltips
Jupyter Notebooks: Your Interactive Lab
While VS Code is great for writing Python scripts, data scientists have a special tool that's become essential to the craft: Jupyter Notebooks.
A Jupyter Notebook is an interactive document that combines:
- Live code you can run piece by piece
- Rich text explanations with formatting
- Visualizations and charts
- Output from your code, displayed inline
It's like having a lab notebook that can actually DO the experiments, not just record them. You write some code, run it, see the results immediately, then write more code based on what you learned. It's perfect for exploration and experimentation.
Why Notebooks Are Perfect for Data Science
Traditional programming is linear: write all the code, then run it all at once. But data science is iterative: load data, look at it, clean something, look again, try an analysis, adjust, repeat.
Notebooks support this workflow beautifully:
- Immediate feedback: See results instantly after each step
- Documentation built-in: Explain your thinking as you go
- Easy sharing: Send a notebook, and others see your code AND results
- Visual output: Charts appear right where you create them
- Reproducibility: Anyone can re-run your analysis step by step
A Data Scientist's Best Friend
"I never understood data until I started using Jupyter. Being able to see my data, tweak my code, and immediately see what changed—that's when everything clicked." — Every data scientist, basically
Starting Jupyter
If you installed Anaconda, you already have Jupyter. To launch it:
```bash
# Activate your environment first
conda activate datascience

# Launch the Jupyter Notebook server
jupyter notebook
```
This opens a browser window with the Jupyter interface. From there, you can create new notebooks, open existing ones, and organize your files.
Alternatively, in VS Code with the Jupyter extension, you can create and run notebooks directly without the browser interface.
Notebook Cells: Building Blocks of Discovery
Notebook cells are the fundamental units of a Jupyter notebook. Think of cells as individual building blocks—each one contains either code or text, and you can rearrange, add, or delete them as needed.
There are two main types of cells:
Code Cells
A code cell contains Python code that you can execute. When you run a code cell, Python processes the code and displays any output directly below the cell.
```python
# This is a code cell
x = 5
y = 10
print(f"x + y = {x + y}")
```

Output:

```
x + y = 15
```
Code cells have some special features:
- The last expression in a cell is automatically displayed (no `print()` needed)
- Variables created in one cell are available in all other cells (demonstrated below)
- You can run cells in any order (though running in order is usually best)
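Here's a tiny sketch of both features in action. It represents two separate notebook cells, written as one block with comments marking the cell boundaries:

```python
# --- Cell 1 ---
data_points = [3, 7, 11]
len(data_points)     # last expression: Jupyter displays 3 automatically

# --- Cell 2 ---
sum(data_points)     # data_points still exists here; displays 21
```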
Markdown Cells
A markdown cell contains formatted text using Markdown syntax. Use these to:
- Explain what your code does
- Document your analysis process
- Add headers and structure
- Include images or links
- Write conclusions and insights
Here's what Markdown looks like:
```markdown
# Big Header

## Smaller Header

Make text **bold** or *italic*.

- Bullet point one
- Bullet point two
```
The beauty of Markdown is that it's readable even before rendering. But when you "run" a Markdown cell, Jupyter converts it to beautifully formatted text.
| Cell Type | Contains | Run Behavior |
|---|---|---|
| Code Cell | Python code | Executes code, shows output |
| Markdown Cell | Formatted text | Renders as HTML |
Diagram: Notebook Cell Types Interactive Demo
Notebook Cell Types Interactive Demo
Type: microsim
Bloom Taxonomy: Apply (L3)
Learning Objective: Let students practice creating, editing, and running different cell types
Canvas layout (700x550px):
- Main area (700x450): Notebook simulation
- Bottom panel (700x100): Controls and instructions

Visual elements:
- Simulated notebook interface with cells
- Cell type indicator (Code/Markdown) on left side
- Run button for each cell
- Add cell buttons between cells
- Cell highlight when selected

Starting cells:
1. Markdown cell: "# My First Notebook\nWelcome to data science!"
2. Code cell: "x = 42\nprint(f'The answer is {x}')"
3. Empty code cell (ready for input)

Interactive controls:
- Click cell to select
- Button: "Run Cell" (or Shift+Enter simulation)
- Button: "Add Code Cell"
- Button: "Add Markdown Cell"
- Button: "Delete Cell"
- Button: "Change Cell Type"
- Text input area for editing selected cell

Behavior:
- Running code cell shows output below
- Running markdown cell renders formatted text
- Variables persist between cells (running cell 2 sets x=42 for later use)
- Error messages shown if code has bugs
- Cell execution order numbers appear [1], [2], etc.

Instructions panel:
- "Click a cell to select it"
- "Press Run Cell to execute"
- "Try changing the value of x and re-running!"
Implementation: p5.js with text rendering and simple Python interpreter simulation
Cell Execution: Bringing Your Code to Life
Cell execution is the process of running a cell and getting results. This is where the magic happens—where your ideas become reality.
Running Cells
There are several ways to run a cell:
- Shift + Enter: Run current cell and move to next cell
- Ctrl + Enter: Run current cell and stay on it
- Run button: Click the play icon next to the cell
- Run All: Execute all cells in order (from menu)
Execution Order Matters
Here's something crucial to understand: cells can be run in any order, but the ORDER you run them in determines the result. Watch this:
```python
# Cell 1
x = 5

# Cell 2
x = x + 10

# Cell 3
print(x)
```
If you run cells 1, 2, 3 in order: output is 15.
If you run cells 1, 2, 3, 2, 3: output is 15, then 25 (cell 2 adds 10 every time it runs).
If you run cell 3 first: ERROR! (x doesn't exist yet)
The Restart Trap
A common mistake: you run cells out of order, get confused, then can't reproduce your results. Solution? Use Kernel > Restart & Run All regularly to verify your notebook runs correctly from top to bottom.
Execution Numbers
Notice those numbers in brackets next to code cells? Like [1], [2], [3]? Those tell you:
- Which cells have been run
- What order they were run in
If you see [5] followed by [3] followed by [7], that's a red flag that cells were run out of order. In a clean, reproducible notebook, numbers should be sequential: [1], [2], [3], etc.
The Kernel: Your Python Brain
The kernel is the computational engine behind your notebook. It's a running Python process that:
- Executes your code cells
- Keeps track of variables and their values
- Remembers function definitions
- Maintains the state of your session
Think of the kernel as Python's brain. When you run a cell, you're asking the brain to process that code and remember the results. All cells share the same brain, which is why a variable defined in one cell is available in all others.
Kernel Operations
Sometimes you need to control the kernel directly:
| Operation | What It Does | When to Use |
|---|---|---|
| Restart | Clear all variables, fresh start | When things get confusing |
| Restart & Clear Output | Restart + clear all outputs | Clean slate for sharing |
| Restart & Run All | Fresh start, then run everything | Verify reproducibility |
| Interrupt | Stop a running cell | When code takes too long |
The most common kernel operation: Restart & Run All. This verifies that your notebook works from scratch—essential before sharing your work or submitting an assignment.
Kernel Health Check
See a circular icon in the top right of Jupyter? That's the kernel status indicator. Empty circle = idle (ready). Filled circle = busy (running code). If it's stuck on busy forever, you might need to interrupt or restart.
Diagram: Kernel State Visualization
How the Kernel Remembers Variables
Type: infographic
Bloom Taxonomy: Understand (L2)
Learning Objective: Help students understand that the kernel maintains state across cell executions
Purpose: Visualize the kernel as a memory bank that persists between cell runs
Layout: Split view - notebook cells on left, kernel memory on right
Left side (Notebook cells):
Cell 1: name = "Alice"
Cell 2: age = 25
Cell 3: greeting = f"Hello {name}, you are {age}"
Cell 4: print(greeting)
Right side (Kernel Memory Bank): visual representation of memory slots:
- After Cell 1: name → "Alice"
- After Cell 2: name → "Alice", age → 25
- After Cell 3: name → "Alice", age → 25, greeting → "Hello Alice, you are 25"

Animation flow:
- Running each cell shows variable flowing into memory bank
- Memory bank glows briefly when accessed
- Clear visual that all cells share the same memory

Bottom section:
- "Restart Kernel" button → Memory bank empties → Variables gone
- Message: "After restart, you must re-run cells to recreate variables"

Interactive elements:
- Step-through button to simulate running each cell
- Restart button to clear memory visualization
- Hover over variable in memory to see when it was created
Visual style: Clean, bright, "mind palace" aesthetic
Implementation: HTML/CSS/JavaScript with animation
Import Statements: Summoning Your Tools
An import statement tells Python to load a library so you can use its features. Without imports, you're limited to basic Python. With imports, you have access to the entire data science arsenal.
Basic Import Syntax
There are several ways to import libraries:
```python
# Option 1: import the whole library
import math

# Use it through the library name
print(math.sqrt(16))  # 4.0
```
```python
# Option 2: import with an alias (a short nickname)
import pandas as pd

# The alias stands in for the full library name
df = pd.DataFrame({"hero": ["Batman", "Storm"]})

print(df)
```
```python
# Option 3: import specific items from a library
from math import sqrt, pi

print(sqrt(16))  # no "math." prefix needed
print(pi)
```
Standard Data Science Imports
The data science community has agreed on standard aliases. Using these makes your code readable to others:
| Library | Standard Import | What It's For |
|---|---|---|
| pandas | `import pandas as pd` | Data manipulation |
| numpy | `import numpy as np` | Numerical computing |
| matplotlib | `import matplotlib.pyplot as plt` | Basic plotting |
| seaborn | `import seaborn as sns` | Statistical visualization |
| scikit-learn | `from sklearn import ...` | Machine learning |
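Putting the table into practice, the first code cell of many notebooks looks something like this (a common convention, not a requirement):

```python
# The standard data science import block
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```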
Import Etiquette
Always put imports at the TOP of your notebook or script, not scattered throughout. This makes it easy to see what libraries your code requires. It's like listing ingredients at the start of a recipe—polite and helpful.
What Happens During Import?
When you write `import pandas as pd`, Python:

- Searches for the pandas library (in your environment)
- Loads the library's code into memory
- Creates a reference called `pd` that points to it
- Makes all pandas functions available as `pd.something()`
If you get an error like `ModuleNotFoundError: No module named 'pandas'`, it means the library isn't installed in your current environment. Solution: `conda install pandas` or `pip install pandas`.
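If you'd like a friendlier failure message while you're learning, you can catch the error yourself. This pattern is purely optional and illustrative:

```python
# Optional: turn a cryptic ModuleNotFoundError into a helpful hint
try:
    import pandas as pd
except ModuleNotFoundError:
    print("pandas isn't installed in this environment.")
    print("Try: conda install pandas (or pip install pandas), then re-run.")
```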
Python Libraries: Your Superpower Extensions
Python libraries (also called packages or modules) are collections of pre-written code that extend Python's capabilities. They're the reason Python dominates data science—thousands of brilliant people have written code you can use for free.
The data science ecosystem includes hundreds of libraries, but you'll start with a core set:
The Essential Five
| Library | Superpower | Example Use |
|---|---|---|
| pandas | Data manipulation | Load CSVs, filter rows, calculate statistics |
| numpy | Fast math | Array operations, linear algebra |
| matplotlib | Visualization | Line plots, bar charts, histograms |
| seaborn | Beautiful stats plots | Distribution plots, heatmaps |
| scikit-learn | Machine learning | Classification, regression, clustering |
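To make the table concrete, here's a ten-second taste of the first three libraries working together on a toy dataset (the numbers are made up):

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# pandas: a small table of made-up training data
df = pd.DataFrame({"day": [1, 2, 3, 4], "pushups": [10, 12, 15, 20]})

# numpy: quick math on a column
print("Average pushups:", np.mean(df["pushups"]))

# matplotlib: a simple line plot of progress
plt.plot(df["day"], df["pushups"])
plt.xlabel("Day")
plt.ylabel("Pushups")
plt.show()
```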
The Broader Ecosystem
Beyond the essential five, you'll encounter:
- jupyter - The notebook system itself
- scipy - Scientific computing
- statsmodels - Statistical modeling
- plotly - Interactive visualizations
- pytorch / tensorflow - Deep learning
- requests - Web data fetching
- beautifulsoup - Web scraping
The beauty of package management: when you need a new capability, you can probably find a library for it. Someone else has already solved your problem—you just need to pip install their solution.
Diagram: Python Data Science Ecosystem Map
Python Data Science Library Ecosystem
Type: graph-model
Bloom Taxonomy: Remember (L1)
Learning Objective: Help students understand the landscape of Python data science libraries and how they relate
Purpose: Show relationships between major libraries and their purposes
Node types:

1. Core (large gold hexagons) - Python, NumPy
2. Data (blue rectangles) - pandas, SQL connectors
3. Visualization (green circles) - matplotlib, seaborn, plotly
4. Machine Learning (purple diamonds) - scikit-learn, XGBoost
5. Deep Learning (red stars) - PyTorch, TensorFlow
6. Utilities (gray rounded rectangles) - Jupyter, requests, BeautifulSoup
Relationships (edges):
- NumPy → pandas (built on)
- NumPy → matplotlib (uses arrays)
- pandas → seaborn (data source)
- matplotlib → seaborn (built on)
- NumPy → scikit-learn (data format)
- scikit-learn → XGBoost (similar API)
- NumPy → PyTorch (similar arrays)
- NumPy → TensorFlow (similar arrays)

Layout: Hierarchical with Python/NumPy at center

Interactive features:
- Hover node: See library description and common use cases
- Click node: See example import statement
- Filter by category (checkboxes)

Visual styling:
- Node size indicates popularity/importance
- Edge thickness shows strength of dependency
- Cluster by function area

Implementation: vis-network JavaScript
Canvas size: 800x600px
Putting It All Together: Your First Complete Setup
Let's walk through setting up a complete data science environment from scratch. Follow along on your own computer!
Step 1: Install Anaconda
- Download Anaconda from anaconda.com/download
- Run the installer (accept defaults)
- Open Anaconda Prompt (Windows) or Terminal (Mac/Linux)
Step 2: Create Your Environment
```bash
# Create an environment for this course
conda create -n datascience python=3.11

# Activate it
conda activate datascience

# Install the core libraries (Jupyter included)
conda install pandas numpy matplotlib seaborn scikit-learn jupyter
```
Step 3: Install VS Code
- Download VS Code from code.visualstudio.com
- Run the installer
- Open VS Code
- Install the Python and Jupyter extensions
Step 4: Create Your First Notebook
- In VS Code, press `Ctrl+Shift+P` and type "Create New Jupyter Notebook"
- Select your `datascience` environment as the kernel
- Add a markdown cell with a title
- Add a code cell with your first imports:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

print("Data Science HQ is online!")
```
- Run the cell. If you see the message, congratulations—you're ready!
Achievement Unlocked: Data Science HQ Online!
You now have a professional-grade data science environment. The same tools used at Google, Netflix, and NASA are now at your fingertips. The only difference between you and a professional data scientist? Practice. Let's get started!
Troubleshooting Common Issues
Even superheroes face setbacks. Here are solutions to common setup problems:
"Python not found"
Cause: Python isn't in your system PATH. Solution: Reinstall Anaconda and check "Add to PATH" option, OR always use Anaconda Prompt.
"Module not found"
Cause: Package not installed in current environment.
Solution: Activate your environment (conda activate datascience) then install the package.
"Kernel died" in Jupyter
Cause: Usually a memory issue or package conflict. Solution: Restart the kernel. If persistent, restart your computer or recreate the environment.
VS Code doesn't see my environment
Cause: VS Code hasn't refreshed its environment list.
Solution: Press Ctrl+Shift+P, type "Python: Select Interpreter", and choose your environment manually.
Everything is slow
Cause: Possibly too many packages or old hardware. Solution: Make sure you're using a clean environment with only the packages you need.
Chapter 2 Checkpoint: Test Your Understanding
Question: You're starting a new project and need pandas version 2.0, but your existing project uses pandas 1.5. How do you handle this without breaking either project?
Click to reveal answer:
Create a separate virtual environment for each project!
```bash
# Environment for the existing project (pandas 1.5)
conda create -n old-project pandas=1.5

# Environment for the new project (pandas 2.0)
conda create -n new-project pandas=2.0

# Activate whichever project you're working on
conda activate new-project
```
Virtual environments keep projects isolated so different versions don't conflict.
Key Takeaways
- Python installation (via Anaconda) is your foundation—everything else builds on it.
- Package managers (pip and conda) let you install new tools whenever you need them.
- Virtual environments keep your projects isolated and conflict-free—use them for every project.
- VS Code is your command center—customize it with extensions for data science.
- Jupyter Notebooks combine code, text, and visualizations for interactive exploration.
- Cells (code and markdown) are the building blocks of notebooks; run them with Shift+Enter.
- The kernel is Python's brain—restart it when things get confusing.
- Import statements load libraries; use standard aliases like `pd`, `np`, `plt`.
- Python libraries are your superpower extensions—pandas, numpy, matplotlib are essential.
Your headquarters is built. Your tools are ready. In the next chapter, you'll learn to wield them by working with Python's most important data structures. The real adventure begins now!