Getting Started with Google Colab
Google Colab is a free Jupyter notebook environment that lets you run Python code in the browser. You can use GPUs without any setup, making it useful for machine learning experiments and quick prototyping.
What is Jupyter Notebook
Jupyter is an interactive development environment where you can run code cell by cell and see results immediately.
A regular .py file runs all code at once, but Jupyter notebooks (.ipynb) let you run code in pieces. When you run each cell, the result appears right below it, and variables stay in memory so you can use them in the next cell.
# Run cell 1
x = 10
# Run cell 2 - uses x defined above
print(x * 2) # 20
It's popular in data analysis and machine learning for three reasons: you can load data once and then try different preprocessing approaches without reloading it, graphs and tables render right below the cell that produced them, and you can keep explanations (Markdown) alongside the code as documentation.
Google Colab is a service that lets you run Jupyter notebooks in the cloud. To use Jupyter locally, run pip install jupyter then jupyter notebook.
How Variables Persist Across Cells
Instead of running a new Python process for each cell, a single Python process called the kernel keeps running in the background. All cells share the same global namespace of this process.
When you run python in a terminal, you get a >>> prompt where variables you define persist across lines. Jupyter works the same way: it is a REPL (Read-Eval-Print Loop) with a web-based notebook UI in front of it.
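The shared-namespace idea can be sketched in plain Python. This is only an illustration, not how IPython is actually implemented: the hypothetical run_cell helper executes each cell's source against one dictionary that stands in for the kernel's global namespace.

```python
# A single dictionary stands in for the kernel's global namespace.
kernel_namespace = {}

def run_cell(source):
    """Execute one 'cell' of source code against the shared namespace."""
    exec(source, kernel_namespace)

run_cell("x = 10")            # cell 1 defines x
run_cell("doubled = x * 2")   # cell 2 reads the x defined by cell 1

print(kernel_namespace["doubled"])  # 20
```

Because both calls write into the same dictionary, the second cell can see what the first one defined, which is the behavior you observe in a notebook.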
The notebook UI (frontend) exchanges messages with the kernel over ZeroMQ, a messaging library (for the browser-based UI, a Jupyter server relays these messages over WebSocket). When you run a cell, its code is sent to the kernel, executed, and the results are sent back to the frontend.
[Notebook UI] <-ZeroMQ-> [IPython Kernel (Python process)]
                                      ↓
                               Global Namespace
                               - x = 10
                               - df = DataFrame(...)
                               - model = ...
This is also why variables disappear when you restart the runtime. The kernel process terminates, and all variables in memory are lost.
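The effect of a restart can be imitated outside a notebook. The sketch below is an analogy, not Colab's actual mechanism: it starts two separate Python processes with a hypothetical run_in_fresh_process helper, and the second one cannot see the variable defined in the first.

```python
import subprocess
import sys

def run_in_fresh_process(source):
    """Run source in a brand-new Python process, like a freshly started kernel."""
    return subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
    )

first = run_in_fresh_process("x = 10; print(x)")
second = run_in_fresh_process("print(x)")  # x was never defined in this process

print(first.stdout.strip())           # 10
print("NameError" in second.stderr)   # True
```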
Creating a Notebook
Go to Google Colab (https://colab.research.google.com) and sign in with your Google account to get started.
To create a new notebook, click File > New notebook. The notebook is automatically saved to the Colab Notebooks folder in your Google Drive.
To open an existing .ipynb file, use File > Upload notebook to upload a local file, or File > Open notebook to load from Google Drive or GitHub.
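It can help to know that an .ipynb file is plain JSON. As a minimal sketch, the code below writes a two-cell notebook by hand and reads it back; the file name demo.ipynb and the cell contents are made up for illustration, and real notebooks usually carry more metadata.

```python
import json

# Minimal notebook in the nbformat 4 layout: a list of cells plus metadata.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo notebook"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ['print("Hello, Colab!")'],
        },
    ],
}

with open("demo.ipynb", "w", encoding="utf-8") as f:
    json.dump(notebook, f, indent=1)

with open("demo.ipynb", encoding="utf-8") as f:
    loaded = json.load(f)

print([cell["cell_type"] for cell in loaded["cells"]])  # ['markdown', 'code']
```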
Running Cells
Colab notebooks run code cell by cell. Enter code in a cell and press Shift + Enter to run it. The output appears right below the cell.
print("Hello, Colab!")
To add a new cell, click the + Code button or press Ctrl + M B (add below current cell). Text cells can be added with the + Text button and support Markdown syntax.
Common shortcuts:
- Shift + Enter: Run cell and move to next
- Ctrl + Enter: Run cell (stay in current cell)
- Ctrl + M B: Add new cell below
- Ctrl + M A: Add new cell above
- Ctrl + M D: Delete current cell
Installing Packages
Colab comes with major libraries like NumPy, Pandas, TensorFlow, and PyTorch pre-installed. If you need additional packages, run pip in a cell.
!pip install transformers
Starting a line with ! runs that line as a shell command. Installed packages are lost when the runtime ends, so you need to reinstall them each time you start a new runtime.
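Since packages have to be reinstalled on every new runtime, it can be handy to check what is already importable before installing. A minimal sketch, assuming a hypothetical is_installed helper built on the standard library's importlib:

```python
import importlib.util

def is_installed(module_name):
    """Return True if a top-level module can be imported in this runtime."""
    return importlib.util.find_spec(module_name) is not None

for name in ("json", "surely_not_a_real_module_xyz"):
    status = "available" if is_installed(name) else "missing, install it with pip"
    print(f"{name}: {status}")
```

Note that the import name can differ from the pip package name (for example, the scikit-learn package is imported as sklearn), so the check takes the import name.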
File Management
Click the folder icon in the left sidebar to open the file explorer. You can upload and download files here.
To upload files with code:
from google.colab import files
uploaded = files.upload()
This opens a file picker dialog, and selected files are uploaded to the current working directory.
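files.upload() also returns a dictionary that maps each uploaded file name to its content as bytes, so you can inspect uploads without reading them back from disk. The sketch below uses a hypothetical summarize_uploads helper; the import is guarded because google.colab exists only inside Colab.

```python
def summarize_uploads(uploaded):
    """Return {filename: size_in_bytes} for a files.upload()-style dictionary."""
    return {name: len(content) for name, content in uploaded.items()}

try:
    from google.colab import files  # only importable inside Colab
except ImportError:
    files = None

if files is not None:
    # Inside Colab: opens the file picker, then reports the size of each upload.
    print(summarize_uploads(files.upload()))
else:
    # Outside Colab: demonstrate with a stand-in dictionary of the same shape.
    print(summarize_uploads({"example.csv": b"a,b\n1,2\n"}))
```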
To download files:
from google.colab import files
files.download('result.csv')
Google Drive Integration
For large files or datasets, it's convenient to mount Google Drive.
from google.colab import drive
drive.mount('/content/drive')
This prompts for Google account authentication. Once authenticated, you can access Drive at /content/drive/MyDrive.
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/data/train.csv')
Files saved to Drive persist even after the runtime ends, making it useful for preserving work results.
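One way to make a notebook work both with and without Drive mounted is to pick the output directory at run time. A minimal sketch, assuming a hypothetical choose_output_dir helper and the mount path used above:

```python
import os

def choose_output_dir(drive_root="/content/drive/MyDrive", fallback="."):
    """Use the mounted Drive folder when it exists, otherwise a local directory."""
    return drive_root if os.path.isdir(drive_root) else fallback

output_dir = choose_output_dir()
result_path = os.path.join(output_dir, "result.csv")
print(f"Results will be written to {result_path}")
```

Files written to the fallback directory are still lost when the runtime ends, so the fallback is only useful for quick runs.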
Runtime Settings
The default runtime uses only CPU. To use GPU or TPU, you need to change the runtime type.
Click Runtime > Change runtime type and select GPU or TPU under Hardware accelerator. GPUs are available even in the free version, but there are usage limits.
To check the currently allocated GPU:
!nvidia-smi
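The same check can be done from Python, which is useful when later cells should behave differently on a CPU-only runtime. A minimal sketch with a hypothetical gpu_summary helper that returns None when nvidia-smi is not available:

```python
import shutil
import subprocess

def gpu_summary():
    """Return a 'name, total memory' line per GPU, or None if no GPU is visible."""
    if shutil.which("nvidia-smi") is None:
        return None  # CPU-only runtime: the tool is not installed
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # tool present but no usable GPU
    return result.stdout.strip() or None

summary = gpu_summary()
print(summary if summary else "No GPU allocated")
```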
Runtimes automatically terminate after a period of inactivity. In the free version, the connection drops after about 90 minutes of idle time. When the runtime ends, installed packages and uploaded files are lost, so save important results to Drive.
Pricing
Colab lets you use GPUs for free, but with limitations. You can choose paid plans based on your needs.
The Free version provides access to GPUs like T4, but heavy continuous usage may temporarily block GPU allocation. Connection drops after about 90 minutes of idle time, and runtimes last up to 12 hours.
Colab Pro ($9.99/month) gives priority access to faster GPUs like V100 and A100. You also get more memory and runtimes that last up to 24 hours.
Colab Pro+ ($49.99/month) includes all Pro features plus background execution. Notebooks keep running even after you close the browser, which is useful for long training sessions.
There's also a Pay As You Go option. You can purchase compute units separately without a subscription. Units are available in 100-unit increments (about $9.99), and purchased units are valid for 90 days. You can buy them even with a free account, so this might be better if you only occasionally need GPUs. However, if you use it frequently, a Pro subscription is more economical.
The free version is sufficient for simple experiments and prototyping. Consider paid plans if you need large-scale model training or long-running sessions.
Running Multiple Notebooks Simultaneously
What happens to costs when running multiple notebooks at the same time?
The free version limits the number of concurrent sessions, usually to one or two, and even with multiple notebooks open, a GPU is often allocated to only one of them.
For paid versions, compute units are consumed separately for each notebook. Running 2 notebooks simultaneously for 1 hour consumes 2 hours' worth of units. Consumption rates also vary by GPU type.
- T4 GPU: ~1.96 units per hour
- V100 GPU: ~5.36 units per hour
- A100 GPU: ~11.08 units per hour
- TPU: ~5 units per hour
Pro includes 100 units per month, Pro+ includes 500 units per month, and you need to purchase more if you exceed that. Running multiple notebooks simultaneously depletes units quickly, so it's best to run only what you need.
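The arithmetic above can be sketched as a small calculator. The consumption rates are the approximate figures listed above and may change; hours_available is a hypothetical helper name.

```python
# Approximate compute units consumed per hour, taken from the list above.
UNITS_PER_HOUR = {"T4": 1.96, "V100": 5.36, "A100": 11.08, "TPU": 5.0}

def hours_available(units, accelerator, notebooks=1):
    """Hours of runtime a unit balance buys when running `notebooks` sessions at once."""
    return units / (UNITS_PER_HOUR[accelerator] * notebooks)

print(f"Pro, one T4 notebook:    {hours_available(100, 'T4'):.1f} h")
print(f"Pro+, one A100 notebook: {hours_available(500, 'A100'):.1f} h")
print(f"Pro+, two T4 notebooks:  {hours_available(500, 'T4', notebooks=2):.1f} h")
```

Doubling the number of simultaneous notebooks halves the hours you get from the same balance.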
Cost Comparison with GCP VM
You could also spin up a VM instance directly on GCP (Google Cloud Platform) instead of using Colab. Which is cheaper?
Colab Pro+ is $49.99/month with 500 units included. At about 11 units per hour for A100, that's roughly 45 hours of usage. For T4 at about 2 units per hour, you get around 250 hours.
GCP VM on-demand pricing is about $3-4/hour for A100 40GB and about $0.35/hour for T4. Using A100 for 45 hours costs $135-180, and T4 for 250 hours costs about $87.
GCP Spot VMs (preemptible) are cheaper but can be interrupted anytime. A100 costs about $1-1.5/hour, T4 about $0.11/hour.
The bottom line: for light usage (tens of hours per month), Colab is cheaper and more convenient. For heavy usage (100+ hours per month), GCP Spot VMs might be cheaper, though you give up Colab's pre-configured environment. Unless you need to run multiple GPUs continuously for 24 hours, Colab Pro or Pro+ is usually the better choice.
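To compare the two on equal terms, you can convert Colab Pro+ into an effective hourly rate. The sketch below uses only the approximate figures quoted in this section and assumes you actually use all 500 included units; unused units make the effective Colab rate higher.

```python
PRO_PLUS_PRICE_USD = 49.99
PRO_PLUS_UNITS = 500
UNITS_PER_HOUR = {"T4": 1.96, "A100": 11.08}

# Approximate GCP prices from the text (low end of the quoted A100 ranges).
GCP_USD_PER_HOUR = {
    "T4": {"on_demand": 0.35, "spot": 0.11},
    "A100": {"on_demand": 3.00, "spot": 1.00},
}

def colab_usd_per_hour(accelerator):
    """Effective hourly cost on Pro+ if every included unit is used."""
    usd_per_unit = PRO_PLUS_PRICE_USD / PRO_PLUS_UNITS
    return usd_per_unit * UNITS_PER_HOUR[accelerator]

for gpu, gcp in GCP_USD_PER_HOUR.items():
    print(
        f"{gpu}: Colab Pro+ ~${colab_usd_per_hour(gpu):.2f}/h, "
        f"GCP on-demand ${gcp['on_demand']:.2f}/h, GCP Spot ${gcp['spot']:.2f}/h"
    )
```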
Forms
You can create forms to make it easy to change input values when sharing notebooks with others.
#@title Settings
learning_rate = 0.001 #@param {type:"number"}
epochs = 10 #@param {type:"slider", min:1, max:100}
model_name = "bert-base" #@param ["bert-base", "bert-large", "roberta"]
A form UI appears on the right side of the cell, letting you adjust values with dropdowns or sliders.
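The form fields are ordinary Python variables, so later cells can validate them before starting an expensive run. A minimal sketch, assuming a hypothetical validate_settings helper and the same ranges as the form above:

```python
# Values as they would come from the form fields.
learning_rate = 0.001
epochs = 10
model_name = "bert-base"

ALLOWED_MODELS = ["bert-base", "bert-large", "roberta"]

def validate_settings(learning_rate, epochs, model_name):
    """Return the settings as a dict, raising ValueError on invalid values."""
    if learning_rate <= 0:
        raise ValueError("learning_rate must be positive")
    if not 1 <= epochs <= 100:
        raise ValueError("epochs must be between 1 and 100")
    if model_name not in ALLOWED_MODELS:
        raise ValueError(f"unknown model: {model_name}")
    return {
        "learning_rate": learning_rate,
        "epochs": epochs,
        "model_name": model_name,
    }

settings = validate_settings(learning_rate, epochs, model_name)
print(settings)
```

Outside Colab the #@param annotations are plain comments, so a notebook that uses forms still runs unchanged in regular Jupyter.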
Useful Tips
Click the RAM and Disk indicators at the top of the notebook to check current resource usage. If memory is low, restart the runtime or delete unnecessary variables.
Add the %%time magic command at the top of a cell to measure its execution time.
%%time
# Code to measure
model.fit(X_train, y_train)
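When you need the elapsed time as a value rather than printed output (for example, to log it), the standard library works too. A minimal sketch with a hypothetical timed helper; it measures wall-clock time only, whereas %%time also reports CPU time.

```python
import time

def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

total, seconds = timed(sum, range(1_000_000))
print(f"sum={total}, took {seconds:.4f} s")
```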
Don't put secret keys or API keys directly in code. Use Colab's Secrets feature instead. Click the key icon in the left sidebar to register keys, then load them like this:
from google.colab import userdata
api_key = userdata.get('OPENAI_API_KEY')
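If the same notebook also has to run outside Colab, one option is to fall back to an environment variable. A minimal sketch, assuming a hypothetical load_secret helper; DEMO_API_KEY is a made-up name used only for the demonstration.

```python
import os

def load_secret(name):
    """Read a secret from Colab's Secrets, or from the environment outside Colab."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        return os.environ.get(name)  # returns None if the variable is not set
    return userdata.get(name)

# Demonstration outside Colab with a throwaway value.
os.environ.setdefault("DEMO_API_KEY", "not-a-real-key")
print(load_secret("DEMO_API_KEY") is not None)
```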
How Colab Works
You might be curious how Colab provides each user with an isolated execution environment.
Container-Based Isolation
When you connect to Colab, a Linux container (or lightweight VM) is allocated on Google's servers. Each user gets an independent environment. The container comes with Python, a Jupyter kernel, and major libraries pre-installed.
Google uses gVisor, its own open-source container sandbox technology. gVisor intercepts system calls in a user-space kernel instead of passing them straight to the host kernel, which provides stronger security isolation than a default Docker container.
Browser-Server Communication
When you enter and run code in the browser, it's sent via WebSocket to the Jupyter kernel on Google's servers. The kernel executes Python code inside the container and sends results (output, graphs, etc.) back to the browser.
[Browser UI] <-WebSocket-> [Jupyter Kernel] <-> [Python in Container]
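To give a feel for what travels over that connection, the sketch below builds an execute_request message in the shape defined by the open Jupyter messaging protocol. It assumes Colab follows that protocol; the exact wire format Colab uses is not confirmed here, and make_execute_request, the user name, and the session ID are illustrative.

```python
import json
import uuid
from datetime import datetime, timezone

def make_execute_request(code, session_id):
    """Build a Jupyter-protocol style message asking a kernel to run `code`."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session_id,
            "username": "demo-user",  # hypothetical value
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": "execute_request",
            "version": "5.3",
        },
        "parent_header": {},
        "metadata": {},
        "content": {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
    }

message = make_execute_request('print("Hello, Colab!")', session_id=uuid.uuid4().hex)
print(json.dumps(message, indent=2))
```

The kernel's replies (output text, images, errors) come back as separate messages that refer to this request through their parent_header.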
GPU Allocation
When you select a GPU runtime, you're connected to a container on a server with NVIDIA GPUs. Google's datacenters have a pool of GPU servers that are allocated on request. Free users share these resources, hence the usage limits.
Why Offer It Free
From Google's perspective, developers who become familiar with TensorFlow and GCP services through Colab are more likely to use paid cloud services later. It's essentially a marketing expense.