Data Science Setup
1 Version Control 🔄
Version control is essential for tracking changes in your code, collaborating with others, and ensuring you never lose your progress. Git and GitHub will be your best friends throughout the program!
1.1 Create a GitHub Account
GitHub is a platform for hosting and sharing your code. You’ll use it to collaborate on projects, submit assignments, and manage your repositories.
1.2 Download Git
Git is the version control system that allows you to track changes in your code and push updates to GitHub.
macOS
- Install Homebrew
Open Terminal (press cmd + space, or use the search bar, and type ‘terminal’) and paste the following command to install Homebrew:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Install Git Using Homebrew
Once Homebrew is installed, use the following command to install Git:
brew install git
- Verify Installation
After installation, confirm that Git is installed by checking its version:
git --version
Windows
- Download Winget (App Installer)
If your system doesn’t already have winget, download the latest App Installer package:
powershell -Command "Invoke-WebRequest -Uri https://aka.ms/getwinget -OutFile AppInstaller.msixbundle"
- Install Winget
After downloading, install the App Installer package:
powershell -Command "Add-AppxPackage .\AppInstaller.msixbundle"
- Verify Winget Installation
Check that winget is properly installed:
winget --version
- Install Git Using Winget
Now that winget is installed, you can install Git with the following command:
winget install --id Git.Git -e --source winget
- Verify Git Installation
git --version
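Once Git responds with a version number on either platform, a first tracked change could look like the sketch below. It is only an illustration: the temporary directory, the file name, and the name and email are placeholders, not values from this guide. It is written for a Unix-style shell (Terminal on macOS, or Git Bash on Windows).

```shell
# Create a throwaway repository in a temporary directory (illustrative path).
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Set the commit identity for this repository only; replace with your own details.
git config user.name "Your Name"
git config user.email "you@example.com"

# Track a new file and record the first snapshot.
echo "# My first project" > README.md
git add README.md
git commit --quiet -m "Add README"

# Count the commits recorded so far.
commit_count="$(git rev-list --count HEAD)"
echo "Commits so far: $commit_count"
```

Pushing this to GitHub would additionally need a remote, for example `git remote add origin <your-repository-URL>` followed by `git push -u origin HEAD`, where the URL comes from a repository you create in your own account.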
1.3 (Optional) Download GitHub Desktop
GitHub Desktop provides a graphical interface for Git, making it easier to manage repositories without using the command line. While optional, it can be helpful for beginners.
2 Distribution 📦
A distribution is a pre-packaged set of software and libraries that makes installation easy. It also lets you create environments that isolate each project's dependencies, avoiding conflicts between different projects.
2.1 Download Anaconda
Anaconda is an all-in-one Python and R distribution that includes essential libraries and tools for data science.
macOS
- Install Anaconda with Homebrew
Run the following command in Terminal:
brew install --cask anaconda
- Verify Installation
Check if Anaconda is installed by running:
conda --version
Windows
- Download the Anaconda Installer
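Use curl in PowerShell to download the Anaconda installer (curl.exe is spelled out because Windows PowerShell aliases plain curl to Invoke-WebRequest):
curl.exe https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Windows-x86_64.exe --output "$env:USERPROFILE\Downloads\Anaconda3-2024.10-1-Windows-x86_64.exe"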
- Run the Anaconda Installer
Start the installer from PowerShell using the command below:
Start-Process "$env:USERPROFILE\Downloads\Anaconda3-2024.10-1-Windows-x86_64.exe"
- Verify Anaconda Installation
After installation is complete, verify that Anaconda (conda) was installed correctly:
conda --version
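On either platform, once conda reports a version, the per-project environments mentioned at the start of this section can be created along these lines. This is a sketch for a bash or zsh terminal; the environment name `ds-env`, the Python version, and the choice of pandas are assumptions made for illustration. On Windows, the same two conda commands can be typed directly into Anaconda Prompt without the surrounding check.

```shell
# Environment name and Python version are illustrative assumptions.
ENV_NAME="ds-env"
PYTHON_VERSION="3.11"

if command -v conda >/dev/null 2>&1; then
    # Create an isolated environment with its own Python and packages.
    conda create --yes --name "$ENV_NAME" "python=$PYTHON_VERSION" pandas
    # List all environments to confirm the new one exists.
    conda env list
else
    echo "conda not found on PATH; open a new terminal or re-check the install."
fi
```

In an interactive terminal, `conda activate ds-env` then switches into the environment and `conda deactivate` leaves it again.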
3 Integrated Development Environments (IDEs) 🖥️
An IDE is where you’ll write, test, and debug your code. Here at the Data Science program we recommend VS Code or PyCharm.
Your IDE is like your favorite coffee shop—you’ll spend a lot of time there. So choose one that feels comfortable to you!
3.1 Download an IDE
VS Code
- Go to the VS Code Website
  - Open your browser and visit https://code.visualstudio.com/Download.
  - Click “Download for Windows” or “Download for macOS” based on your system.
- Install VS Code
  - Windows: Run the downloaded .exe file and follow the installation instructions.
  - macOS: Open the .zip file and move Visual Studio Code to the Applications folder.
- Verify Installation
  - Open VS Code.
PyCharm
- Go to the PyCharm Website
  - Visit https://www.jetbrains.com/pycharm/download.
  - Select Community Edition.
  - Click Download for Windows or macOS.
- Install PyCharm
  - Windows: Run the .exe installer and follow the instructions.
  - macOS: Open the .dmg file and drag PyCharm to the Applications folder.
- Verify Installation
  - Open PyCharm.
3.2 Connect IDE to GitHub Account
VS Code
- Sign in with GitHub
  - Profile > Turn on cloud changes > Sign in with GitHub
- Enter GitHub Credentials
- Verify Connection
  - Check if your GitHub profile appears when you go to Profile.
PyCharm
- Log In via GitHub
  - Settings > Version control > GitHub
- Enter GitHub Credentials
- Verify Connection
  - Check if your GitHub profile appears.
3.3 Extensions/Plugins
Extensions (or plugins) are add-ons that enhance functionality, such as adding language support, debugging tools, or AI-powered coding assistance. They help customize the IDE to improve productivity, automation, and development workflows.
VS Code
- Python: Adds Python support, IntelliSense, debugging, and Jupyter Notebook functionality.
- R: Provides language support for R, including syntax highlighting and code completion.
- Pylance: Offers advanced linting, type checking, and autocomplete features for Python.
- Git: Provides Git integration for version control within VS Code.
- GitHub: Allows you to manage and interact with GitHub repositories directly from VS Code.
- Code Runner: Enables you to run code snippets for multiple languages, including Python, directly within VS Code.
- Jupyter: Lets you run and edit Jupyter Notebooks directly in VS Code.
- EditCSV: Allows for easy editing of CSV files directly within VS Code.
- HTML Preview: Provides an HTML preview of your code directly in the editor.
- Quarto: Provides support for creating and rendering Quarto documents within VS Code.
- Remote - SSH: Allows you to open remote folders and develop on remote machines over SSH.
- TensorBoard: Enables the viewing of TensorFlow logs directly within VS Code.
- SVG Preview: Lets you preview SVG files within VS Code.
- Copilot (Paid): A paid extension by GitHub that provides AI-powered code suggestions and autocompletions.
- Markdown All in One: Offers a comprehensive suite of features for editing and previewing Markdown files.
- Docker: Adds Docker support to VS Code, allowing you to manage containers and images.
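The VS Code extensions above can also be installed from a terminal instead of the Extensions panel. The sketch below assumes the `code` command is on your PATH (on macOS this typically means running "Shell Command: Install 'code' command in PATH" from the Command Palette first) and that the Marketplace IDs shown for Python, Pylance, and Jupyter are still current; check each extension's Marketplace page if an install fails.

```shell
# Marketplace IDs for three of the extensions listed above (assumed current).
requested=0
for ext in ms-python.python ms-python.vscode-pylance ms-toolsai.jupyter; do
    requested=$((requested + 1))
    if command -v code >/dev/null 2>&1; then
        # Install (or update) the extension through the VS Code CLI.
        code --install-extension "$ext"
    else
        echo "Would install: $ext (the 'code' command is not on PATH)"
    fi
done
echo "Extensions processed: $requested"
```

Running `code --list-extensions` afterwards prints the IDs of everything currently installed, which is a quick way to confirm the result.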
PyCharm
- Key Promoter X: Displays keyboard shortcuts every time you use the mouse, helping you learn and use shortcuts more efficiently.
- Markdown Navigator: Provides enhanced Markdown editing and previewing support.
- R Plugin: Adds support for R scripts and notebooks, allowing you to work with R code directly in PyCharm.
- .env Files Support: Loads environment variables from .env files, helping you manage sensitive information like API keys and credentials.
- Pandas Helper: Provides quick previews, descriptions, and structure analysis for Pandas DataFrames, making it easier to inspect data.
- DeepBugs: Uses machine learning to detect common Python coding mistakes and bugs in your code.
- Docker: Adds Docker support to PyCharm, allowing you to manage containers, images, and other Docker resources directly from the IDE.
- Jupyter: Provides full Jupyter Notebook support in PyCharm, including the ability to run and edit notebooks.
- GitHub: Integrates GitHub repositories and allows you to work with your projects directly from PyCharm.
- Python Scientific: Adds support for scientific libraries like NumPy, SciPy, and Matplotlib, helping you visualize and analyze data.
- Tabnine: An AI-powered code completion tool, improving your coding speed by suggesting relevant completions.
- Python Docstring Generator: Helps you generate consistent Python docstrings with a single shortcut, saving time on documentation.
- Database Navigator: Provides easy database connection, navigation, and management capabilities within PyCharm.
- Flake8: Adds linting support for Python, helping you maintain clean and readable code by checking for errors and style issues.