Data Science Setup

🎉 Congratulations, you’ve officially entered the world of data, algorithms, and the occasional “why isn’t this running?” moment. Before you start training models, let’s get your machine geared up for the adventure ahead. 🚀

1 Version Control 🔄

Version control is essential for tracking changes in your code, collaborating with others, and ensuring you never lose your progress. Git and GitHub will be your best friends throughout the program!

1.1 Create a GitHub Account

GitHub is a platform for hosting and sharing your code. You’ll use it to collaborate on projects, submit assignments, and manage your repositories.

  1. Go to the GitHub Website
  2. Click “Sign Up”
  3. Enter Your Details
  4. Verify Your Account
  5. Confirm Email Account

1.2 Download Git

Git is the version control system that allows you to track changes in your code and push updates to GitHub.

  macOS

  1. Install Homebrew
  • Open Terminal (press cmd + space and type ‘terminal’) and paste the following command to install Homebrew:

    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  2. Install Git Using Homebrew
  • Once Homebrew is installed, use the following command to install Git:

    brew install git
  3. Verify Installation
  • After installation, confirm that Git is installed by checking its version:

    git --version

  Windows

  1. Download Winget (App Installer)
  • If your system doesn’t already have winget, download the latest App Installer package:

    powershell -Command "Invoke-WebRequest -Uri https://aka.ms/getwinget -OutFile AppInstaller.msixbundle"
  2. Install Winget
  • After downloading, install the App Installer package:

    powershell -Command "Add-AppxPackage .\AppInstaller.msixbundle"
  3. Verify Winget Installation
  • Check that winget is properly installed:

    winget --version
  4. Install Git Using Winget
  • Now that winget is installed, you can install Git with the following command:

    winget install --id Git.Git -e --source winget
  5. Verify Git Installation
  • Open a new terminal window so the updated PATH is picked up, then confirm that Git is installed by checking its version:

    git --version
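
With Git installed on either operating system, a first commit can be rehearsed in a throwaway folder. The sketch below is an illustration rather than part of the official setup: the function name `make_practice_repo`, the name and email values, and the commit message are all placeholders. It is written for a POSIX shell (Terminal on macOS, or Git Bash on Windows).

```shell
# Create a throwaway repository, make one commit, and print its path.
# All names and values here are placeholders for illustration.
make_practice_repo() {
  repo_dir="$(mktemp -d)" || return 1
  (
    cd "$repo_dir" || exit 1
    git init --quiet
    # Identity stored in this repository only (no --global flag).
    git config user.name "Your Name"
    git config user.email "you@example.com"
    echo "# Practice repo" > README.md
    git add README.md
    git commit --quiet -m "Add README"
  ) || return 1
  echo "$repo_dir"
}

practice_repo="$(make_practice_repo)"
git -C "$practice_repo" log --oneline   # lists the single commit
```

To record your identity for every repository on the machine, the same two `git config` commands accept the `--global` flag.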

1.3 (Optional) Download GitHub Desktop

GitHub Desktop provides a graphical interface for Git, making it easier to manage repositories without using the command line. While optional, it can be helpful for beginners.

  1. Go to the GitHub Desktop Website
  2. Download GitHub Desktop
  • Click the “Download” button for your OS to download the .dmg or .exe file.
  3. Install GitHub Desktop
  • Mac users: once the .dmg file is downloaded, open it and drag GitHub Desktop to your Applications folder.
  • Windows users: run the downloaded .exe file and follow the prompts.
  4. Sign In to GitHub
  • Open GitHub Desktop. You’ll be prompted to sign in with your GitHub account.
  5. Verify Installation
  • Once signed in, you can start using GitHub Desktop to manage your repositories easily.

2 Distribution 📦

A distribution is a pre-packaged set of software and libraries that makes installation easy. The distribution used here also lets you create environments that isolate each project’s dependencies, which avoids conflicts between different projects.

2.1 Download Anaconda

Anaconda is an all-in-one Python and R distribution that includes essential libraries and tools for data science.

  macOS

  1. Install Anaconda with Homebrew
  • Run the following command in Terminal:

    brew install --cask anaconda
  2. Verify Installation
  • Check if Anaconda is installed by running:

    conda --version

  Windows

  1. Download the Anaconda Installer
  • Use curl.exe in PowerShell to download the Anaconda installer (in Windows PowerShell, plain curl is an alias for Invoke-WebRequest and does not accept --output):

    curl.exe https://repo.anaconda.com/archive/Anaconda3-2024.10-1-Windows-x86_64.exe --output "$env:USERPROFILE\Downloads\Anaconda3-2024.10-1-Windows-x86_64.exe"
  2. Run the Anaconda Installer
  • Start the installer from PowerShell using the command below:

    Start-Process "$env:USERPROFILE\Downloads\Anaconda3-2024.10-1-Windows-x86_64.exe"
  3. Verify Anaconda Installation
  • After installation is complete, open Anaconda Prompt (or a new terminal, if you chose to add conda to your PATH) and verify that Anaconda (conda) was installed correctly:

    conda --version
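
The environments mentioned at the start of this section are created with `conda create`. As a hedged sketch (the function name `create_ds_env`, the environment name `ds-practice`, and the Python version `3.11` are illustrative choices, not program requirements), the helper below creates an environment and then lists the environments conda knows about. It targets a POSIX shell; on Windows the same `conda` commands can be typed directly into Anaconda Prompt.

```shell
# Create a named conda environment with a chosen Python version.
# Usage: create_ds_env <env-name> <python-version>
create_ds_env() {
  env_name="${1:-ds-practice}"     # placeholder default name
  python_version="${2:-3.11}"      # placeholder default version
  if ! command -v conda >/dev/null 2>&1; then
    echo "conda not found; install Anaconda first" >&2
    return 1
  fi
  conda create --yes --name "$env_name" "python=$python_version" || return 1
  conda env list
}

# Example call (commented out so that pasting the block changes nothing):
# create_ds_env ds-practice 3.11
```

After the environment exists, `conda activate ds-practice` switches the current terminal into it, and `conda deactivate` switches back out.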

3 Integrated Development Environments (IDEs) 🖥️

An IDE is where you’ll write, test, and debug your code. Here at the Data Science program we recommend VS Code or PyCharm.

Pick Your Coding Home Wisely!

Your IDE is like your favorite coffee shop—you’ll spend a lot of time there. So choose one that feels comfortable to you!

3.1 Download an IDE

  VS Code

  1. Go to the VS Code Website
  2. Install VS Code
  • Windows: Run the downloaded .exe file and follow the installation instructions.
  • macOS: Open the .zip file and move Visual Studio Code to the Applications folder.
  3. Verify Installation
  • Open VS Code.

  PyCharm

  1. Go to the PyCharm Website
  2. Install PyCharm
  • Windows: Run the .exe installer and follow the instructions.
  • macOS: Open the .dmg file and drag PyCharm to the Applications folder.
  3. Verify Installation
  • Open PyCharm.
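
Once the installers have finished, the command-line tools from the earlier sections can be checked in one pass. The helper below is a minimal sketch: `check_tools` is a hypothetical name, and it assumes the tools are on your PATH. On macOS the `code` command for VS Code is only available after running “Shell Command: Install 'code' command in PATH” from the Command Palette. PyCharm is checked by opening the app, as described above.

```shell
# Report whether each named command can be found on PATH.
# Returns 0 when every tool is found and 1 when any is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: MISSING"
      missing=1
    fi
  done
  return "$missing"
}

check_tools git conda code || echo "Install or fix the tools marked MISSING."
```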

3.2 Connect IDE to GitHub Account

  VS Code

  1. Sign in with GitHub
  • Profile > Turn on cloud changes > Sign in with GitHub
  2. Enter GitHub Credentials
  3. Verify Connection
  • Check if your GitHub profile appears when you go to Profile.

  PyCharm

  1. Log In via GitHub
  • Settings > Version control > GitHub
  2. Enter GitHub Credentials
  3. Verify Connection
  • Check if your GitHub profile appears.

3.3 Extensions/Plugins

Extensions (or plugins) are add-ons that enhance functionality, such as adding language support, debugging tools, or AI-powered coding assistance. They help customize the IDE to improve productivity, automation, and development workflows.

  VS Code

  • Python
    Adds Python support, IntelliSense, debugging, and Jupyter Notebook functionality.

  • R
    Provides language support for R, including syntax highlighting and code completion.

  • Pylance
    Offers advanced linting, type checking, and autocomplete features for Python.

  • Git
    Provides Git integration for version control within VS Code.

  • GitHub
    Allows you to manage and interact with GitHub repositories directly from VS Code.

  • Code Runner
    Enables you to run code snippets for multiple languages, including Python, directly within VS Code.

  • Jupyter
    Lets you run and edit Jupyter Notebooks directly in VS Code.

  • EditCSV
    Allows for easy editing of CSV files directly within VS Code.

  • HTML Preview
    Provides an HTML preview of your code directly in the editor.

  • Quarto
    Provides support for creating and rendering Quarto documents within VS Code.

  • Remote - SSH
    Allows you to open remote folders and develop on remote machines over SSH.

  • TensorBoard
    Enables the viewing of TensorFlow logs directly within VS Code.

  • SVG Preview
    Lets you preview SVG files within VS Code.

  • Copilot (Paid)
    A paid extension by GitHub that provides AI-powered code suggestions and autocompletions.

  • Markdown All in One
    Offers a comprehensive suite of features for editing and previewing Markdown files.

  • Docker
    Adds Docker support to VS Code, allowing you to manage containers and images.

  PyCharm

  • Key Promoter X
    Displays keyboard shortcuts every time you use the mouse, helping you learn and use shortcuts more efficiently.

  • Markdown Navigator
    Provides enhanced markdown editing and previewing support.

  • R Plugin
    Adds support for R scripts and notebooks, allowing you to work with R code directly in PyCharm.

  • .env Files Support
    Loads environment variables from .env files, helping you manage sensitive information like API keys and credentials.

  • Pandas Helper
    Provides quick previews, descriptions, and structure analysis for Pandas DataFrames, making it easier to inspect data.

  • DeepBugs
    Uses machine learning to detect common Python coding mistakes and bugs in your code.

  • Docker
    Adds Docker support to PyCharm, allowing you to manage containers, images, and other Docker resources directly from the IDE.

  • Jupyter
    Provides full Jupyter Notebook support in PyCharm, including the ability to run and edit notebooks.

  • GitHub
    Integrates GitHub repositories and allows you to work with your projects directly from PyCharm.

  • Python Scientific
    Adds support for scientific libraries like NumPy, SciPy, and Matplotlib, helping you visualize and analyze data.

  • Tabnine
    An AI-powered code completion tool, improving your coding speed by suggesting relevant completions.

  • Python Docstring Generator
    Helps you generate consistent Python docstrings with a single shortcut, saving time on documentation.

  • Database Navigator
    Provides easy database connection, navigation, and management capabilities within PyCharm.

  • Flake8
    Adds linting support for Python, helping you maintain clean and readable code by checking for errors and style issues.
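
For VS Code, the extensions listed above can also be installed from a terminal with the `code` command-line tool rather than through the Extensions panel. The sketch below assumes `code` is on your PATH and uses Marketplace identifiers that are believed to be current (`ms-python.python`, `ms-python.vscode-pylance`, `ms-toolsai.jupyter`, `quarto.quarto`, `ms-vscode-remote.remote-ssh`); confirm each identifier on the extension's Marketplace page before relying on it. The function name `install_vscode_extensions` is hypothetical.

```shell
# Install each VS Code extension ID passed as an argument.
install_vscode_extensions() {
  if ! command -v code >/dev/null 2>&1; then
    echo "code command not found; add it to PATH from VS Code first" >&2
    return 1
  fi
  for ext_id in "$@"; do
    # Keep going if one extension fails, but report it.
    code --install-extension "$ext_id" || echo "could not install $ext_id" >&2
  done
}

# Example call (commented out so that pasting the block installs nothing):
# install_vscode_extensions ms-python.python ms-python.vscode-pylance \
#   ms-toolsai.jupyter quarto.quarto ms-vscode-remote.remote-ssh
```

Running `code --list-extensions` afterwards prints what is installed. PyCharm plugins are installed from Settings > Plugins inside the IDE instead.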