Google Colab for Machine Learning: The Ultimate Step-by-Step Guide for Beginners and Experts

Machine learning (ML) has become an indispensable tool across industries, but the hardware requirements for training even moderately sized models can be daunting. A powerful GPU or TPU, along with a robust development environment, often costs thousands of dollars or requires complex cloud setups. Google Colaboratory, commonly known as Google Colab, eliminates these barriers by providing a free, cloud-based Jupyter notebook environment with access to accelerated computing resources. Whether you are a student experimenting with your first neural network or a seasoned data scientist prototyping a production model, Colab offers a flexible and collaborative platform that integrates seamlessly with Google Drive, GitHub, and essential ML libraries like TensorFlow, PyTorch, and scikit-learn.

This guide will take you from the moment you open your first Colab notebook to deploying a fully functional machine learning pipeline. We will cover every critical aspect: setting up your environment, choosing the right runtime, loading and preprocessing data, training models, and saving your work for future use. We will also dive into best practices that can save you hours of debugging and help you avoid common pitfalls like running out of memory or losing progress due to idle timeouts. By the end of this article, you will not only understand how to use Colab for ML but also how to harness its advanced features to accelerate your development workflow.

Step-by-Step Guide to Using Google Colab for Machine Learning

Step 1: Accessing and Configuring Your First Notebook

To begin, navigate to colab.research.google.com while signed in with your Google account. You will be greeted with a dialog that allows you to create a new notebook, open an existing one from Google Drive, or upload a .ipynb file. For a fresh start, click “New Notebook.” The interface will resemble a standard Jupyter notebook with a toolbar across the top, a code cell below, and a sidebar on the left for file browsing and table of contents. Before writing a single line of code, it is essential to configure the runtime environment because the default is a bare-bones CPU that may be too slow for any serious ML work. Click on “Runtime” in the menu bar, then “Change runtime type.” Here you can select from three hardware accelerators: None (CPU), GPU (typically an NVIDIA T4 or K80), and TPU (Tensor Processing Unit). For most deep learning tasks, a GPU is the sweet spot between performance and availability. Once you select GPU, Colab will provision a virtual machine with the requested hardware. This allocation takes only a few seconds, and you can verify the GPU model by executing !nvidia-smi in a code cell. Remember that free-tier Colab sessions have a maximum runtime of 12 hours (though the exact limit fluctuates) and will disconnect after periods of inactivity. To avoid losing work, connect your notebook to Google Drive early (Step 4) and periodically save checkpoints.
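As a complement to running !nvidia-smi by hand, the check can be scripted so that later cells know which hardware they are on. This is a minimal sketch, assuming the nvidia-smi tool is on the PATH whenever a GPU runtime is attached; the helper name detect_gpu is made up for this example.

```python
import shutil
import subprocess


def detect_gpu():
    """Return the first GPU name reported by nvidia-smi, or None if none is visible."""
    if shutil.which("nvidia-smi") is None:
        return None  # CPU or TPU runtime, or a machine without NVIDIA drivers
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    names = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return names[0] if names else None


gpu_name = detect_gpu()
print(gpu_name or "No GPU visible: check Runtime > Change runtime type")
```

Running this as the first cell makes it obvious when a notebook was accidentally started on the default CPU runtime.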

Step 2: Understanding Runtime Options and Resource Limits

Colab offers three primary runtime types, each with distinct trade-offs. The CPU runtime is suitable for lightweight data processing, text analysis, or small traditional ML models (e.g., linear regression, decision trees). It typically provides two virtual CPU cores with roughly 12 GB of RAM. The GPU runtime adds an NVIDIA Tesla T4 (or occasionally K80) with about 15-16 GB of VRAM, enabling you to train convolutional networks, recurrent networks, and transformer models of moderate size. The TPU runtime provides a Cloud TPU v2-8, which excels at large-scale matrix operations and is well suited to large-batch training of models like BERT or ResNet, but requires you to adapt your code (e.g., using tf.distribute.TPUStrategy in TensorFlow). Importantly, free-tier users are subject to usage quotas: you may be unable to select a GPU once you exceed your allocation (which can be as low as a few hours per day). Upgrading to Colab Pro or Pro+ (paid tiers) grants priority access to faster GPUs (V100 or A100) and longer session durations (up to 24 hours). The table below summarizes the differences.

Runtime Type | Hardware | RAM / VRAM | Best For | Cost
CPU (None) | Virtual CPU (Intel Xeon), typically 2 cores | ~12 GB RAM | Small data, traditional ML, quick scripts | Free
GPU | NVIDIA Tesla T4 (or K80) | ~12 GB RAM + 15-16 GB VRAM | Deep learning (CNNs, RNNs, transformers) | Free (limited), Pro from $9.99/mo
TPU | Google Cloud TPU v2-8 | 8 TPU cores, each with 8 GB HBM | Large-scale distributed training | Free (limited), Pro+ from $49.99/mo

To monitor your resource usage, you can run !free -h for RAM and !nvidia-smi for GPU memory. Colab also displays a RAM and disk usage indicator near the top-right corner of the notebook. Keep an eye on these numbers: if you exceed the available RAM, the runtime will crash and you will lose all unsaved variables. Common ways to avoid running out of memory are to process data in batches, reduce the batch size, or use data generators.
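The same RAM figure that !free -h prints can be read from Python, which is handy if you want a training loop to warn you before the runtime crashes. The sketch below assumes a Linux VM (as Colab provides) with a readable /proc/meminfo; the helper names and the 85% warning threshold are arbitrary choices for illustration.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: kibibytes} dict."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def ram_used_fraction(info):
    """Fraction of total RAM that is not available for new allocations."""
    total = info["MemTotal"]
    available = info.get("MemAvailable", info.get("MemFree", 0))
    return (total - available) / total


meminfo = Path("/proc/meminfo")
if meminfo.exists():  # Linux only, which includes Colab VMs
    used = ram_used_fraction(parse_meminfo(meminfo.read_text()))
    print(f"RAM in use: {used:.0%}")
    if used > 0.85:  # arbitrary threshold chosen for this sketch
        print("Close to the limit: reduce batch size or stream the data")
```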

Step 3: Installing Libraries and Managing Dependencies

Out of the box, Colab includes many popular data science libraries: NumPy, pandas, matplotlib, scikit-learn, TensorFlow, and PyTorch. However, you will often need additional packages such as Hugging Face transformers, OpenCV, XGBoost, or a specific version of a framework. Colab makes it easy to install packages with pip or apt; conda is not preinstalled, so it has to be set up first if you need it. Simply prepend an exclamation mark to run shell commands. For example, to install a specific TensorFlow release, you can run !pip install tensorflow==2.15.0. For system libraries like FFmpeg or libgl-dev, use !apt-get install -y ffmpeg. One important nuance: each ! command runs in its own short-lived shell, but whatever it installs is written to the runtime’s disk, so installations remain available to later cells. If you replace a package that has already been imported, restart the runtime from the “Runtime” menu so the new version is loaded. If the runtime is deleted and recreated (after a disconnection or a manual reset), you need to reinstall everything. To make this painless, put all required installations in a single cell at the top of your notebook. Use !pip install --upgrade only when you deliberately want the newest release, because it works against reproducibility. Colab’s preinstalled packages can be listed with !pip list. For reproducibility, pin specific version numbers in a requirements.txt file and install it with !pip install -r requirements.txt. If your project uses custom packages or proprietary modules stored in Google Drive, you can add the Drive path to sys.path.
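Pinned versions only help if the runtime actually matches them, so it can be useful to verify the environment before training starts. Here is a small sketch using the standard library’s importlib.metadata; the package names and version numbers in PINS are hypothetical placeholders, not recommendations.

```python
from importlib import metadata

# Hypothetical pins for illustration; use the versions your project was tested with.
PINS = {"numpy": "1.26.4", "pandas": "2.2.2"}


def check_pins(pins):
    """Return {package: (wanted, installed)} for every package that is off its pin."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


for name, (wanted, installed) in check_pins(PINS).items():
    print(f"{name}: wanted {wanted}, found {installed}; run !pip install {name}=={wanted}")
```

An empty result means the runtime already matches your pins and no reinstall is needed.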

Step 4: Mounting Google Drive and Handling Data

Most machine learning projects involve datasets that are too large to upload each session. Google Drive integration is one of Colab’s most powerful features. To mount your Google Drive, run the following code snippet: from google.colab import drive; drive.mount('/content/drive'). This will prompt you to authorize access to your Google account (depending on the Colab version, through a pop-up consent screen or by pasting an authorization code into the notebook). Once mounted, you can access any file in your Drive under /content/drive/MyDrive. You can then read CSV files with pandas, load images with OpenCV, or access pre-trained models stored in Drive. However, do not plan to keep your entire dataset in Drive if it exceeds 15 GB (the free storage limit). Instead, consider using Google Cloud Storage (GCS) or BigQuery for large datasets. For moderate datasets, you can upload files using the file browser panel on the left (click the folder icon, then upload), but this is not recommended for files over a few hundred megabytes because uploads are slow and temporary: they disappear when the runtime is reset. A better approach is to store your dataset in Drive and mount it. For even faster I/O, copy the dataset from Drive to the Colab VM’s local disk under /content/ using !cp or shutil. The local disk is ephemeral but offers much faster read/write speeds than Drive. For instance, run !cp /content/drive/MyDrive/dataset.zip /content/ and then extract the archive with !unzip. This pattern can noticeably cut data loading time, especially for image datasets made up of many small files.
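The mount-then-copy pattern described above can be sketched in plain Python as follows. The function names and the /content/data target directory are assumptions made for this example; dataset.zip is the example file name from the text. The google.colab import is guarded so the cell does nothing harmful when run outside Colab.

```python
import shutil
import zipfile
from pathlib import Path


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; return False elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


def stage_archive(archive_path, local_dir):
    """Copy a zip archive to fast local disk and extract it there."""
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    local_copy = Path(shutil.copy(archive_path, local_dir))
    extract_dir = local_dir / local_copy.stem
    with zipfile.ZipFile(local_copy) as archive:
        archive.extractall(extract_dir)
    return extract_dir


if mount_drive():
    # dataset.zip is the example file name used in the text above.
    data_dir = stage_archive("/content/drive/MyDrive/dataset.zip", "/content/data")
    print("Dataset ready at", data_dir)
```

Copying one archive and extracting it locally is usually preferable to copying thousands of small files one by one, because each Drive read carries its own overhead.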

When dealing with very large datasets (tens of gigabytes), you may run into Drive bandwidth limits. In such cases, use Kaggle datasets (via the Kaggle API) or download directly from public URLs using !wget or !curl. Another excellent alternative is to use Google Cloud Storage buckets: you can authenticate with gcloud inside Colab and read data directly into memory. The table below lists common data loading methods and their typical throughput.

Method | Read Speed (approx.) | Persistence | Best For
Drive mount (direct read) | ~5-20 MB/s | Persistent across sessions | Small to medium datasets (<2 GB)
Copy to local SSD, then read | ~100-200 MB/s | Temporary (runtime-specific) | Large datasets (2-20 GB)
Cloud Storage (GCS) via fsspec | ~50-100 MB/s | Persistent, scalable | Very large datasets (>20 GB)
Direct URL download (wget) | Varies (network-dependent) | Requires re-download each session | Public datasets (e.g., COCO, ImageNet)
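To see why copying to local disk pays off, it helps to run the numbers. The back-of-the-envelope sketch below assumes roughly 10 MB/s for Drive and 150 MB/s for local disk (values inside the ranges in the table), assumes the one-time copy is limited by Drive’s read speed, and ignores operating-system caching. The function names are invented for this example.

```python
def read_seconds(size_gb, mb_per_s):
    """Time to read size_gb once at the given throughput (1 GB = 1000 MB)."""
    return size_gb * 1000 / mb_per_s


def total_io_seconds(size_gb, epochs, drive_mb_s=10, ssd_mb_s=150, copy_first=True):
    """Total read time over all epochs, with or without staging on local disk."""
    if copy_first:
        # One pass over Drive to copy, then every epoch reads from local disk.
        return read_seconds(size_gb, drive_mb_s) + epochs * read_seconds(size_gb, ssd_mb_s)
    return epochs * read_seconds(size_gb, drive_mb_s)


direct = total_io_seconds(3, epochs=10, copy_first=False)
staged = total_io_seconds(3, epochs=10, copy_first=True)
print(f"direct from Drive: {direct:.0f} s, staged on local disk: {staged:.0f} s")
# direct from Drive: 3000 s, staged on local disk: 500 s
```

Under these assumptions, a 3 GB dataset read for 10 epochs spends 50 minutes on I/O directly from Drive versus under 9 minutes when staged, and the gap widens with every additional epoch.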

Step 5: Writing and Executing Machine Learning Code

Once your environment is configured and data is ready, you can start building your ML pipeline. Colab supports all major frameworks through standard Python imports. For a deep learning example using TensorFlow/Keras, you could define a simple convolutional neural network (CNN) for image classification. Use tf.keras.Sequential to stack layers, compile the model with an optimizer and loss function, and fit it on your data. Because you selected a GPU runtime, TensorFlow will automatically place operations on the GPU during training; you do not need to move tensors manually. To verify that the GPU is visible, check tf.config.list_physical_devices('GPU'). In PyTorch, the equivalent check is torch.cuda.is_available(), but there you do have to move the model and tensors to the device yourself (for example with .to('cuda')). You can also use with tf.device('/GPU:0'): for explicit control in TensorFlow. Colab notebooks allow you to split your code into logical cells, which is excellent for iterative experimentation. You can train a model for a few epochs, inspect metrics, adjust hyperparameters, and re-run cells without losing state, as long as the runtime is not reset. However, note that a free GPU session may be terminated when you exceed the usage quota, so it is wise to save model checkpoints periodically to Drive. For example, you can define a ModelCheckpoint callback in Keras that saves weights to your Drive path. Additionally, Colab supports TensorBoard inside the notebook: after running %load_ext tensorboard, the %tensorboard --logdir <log directory> magic displays training curves and model graphs without leaving the environment. For hyperparameter tuning, you can integrate with keras-tuner or optuna.
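The workflow described above might look roughly like the following. This is a sketch, not a tuned model: the input shape (32x32 RGB), the 10 classes, the layer sizes, and the checkpoint file name are all assumptions chosen for illustration, and the training call is left commented out because x_train and y_train are placeholders for your own data. The TensorFlow import is guarded so the cell can be read and run outside a TensorFlow environment.

```python
try:
    import tensorflow as tf
except ImportError:  # lets the sketch run where TensorFlow is not installed
    tf = None

# Hypothetical Drive location; recent Keras versions expect a ".weights.h5" suffix
# when only weights are saved.
CHECKPOINT_PATH = "/content/drive/MyDrive/checkpoints/cnn.weights.h5"


def build_cnn(input_shape=(32, 32, 3), num_classes=10):
    """Small CNN for image classification, compiled and ready for fit()."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


def make_checkpoint_callback(path=CHECKPOINT_PATH):
    """Save the weights at the end of every epoch so a disconnect loses little."""
    return tf.keras.callbacks.ModelCheckpoint(filepath=path, save_weights_only=True)


if tf is not None:
    print("GPUs visible:", tf.config.list_physical_devices("GPU"))
    model = build_cnn()
    model.summary()
    # With your own arrays and Drive mounted, training would be started like this:
    # model.fit(x_train, y_train, epochs=5, callbacks=[make_checkpoint_callback()])
```

After a disconnect, rebuilding the model with build_cnn() and calling model.load_weights(CHECKPOINT_PATH) restores the last saved epoch.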

Step 6: Saving and Sharing Your Work

Notebooks that you create in Colab or open from Drive are saved to Google Drive automatically as you edit. Notebooks opened from GitHub or from someone else’s read-only link are not, so to keep your changes click “File” > “Save a copy in Drive” to create a Drive-backed version. The copy will then appear in your Drive under a “Colab Notebooks” folder. You can also download the notebook as a .ipynb file or a Python script, or save it to GitHub or as a GitHub gist directly from the File menu. Sharing is as simple as sending the link of the open notebook (ensure the sharing permissions are set to “Anyone with the link can view” or “Comment”). Collaborators with edit access can work on the same shared notebook and leave comments, similar to Google Docs; each collaborator connects to their own runtime, so variables are not shared between users. This collaborative workflow is invaluable for team projects and code reviews. Additionally, you can link to a notebook from blogs or documentation, for example with an “Open in Colab” badge. For reproducibility, combine your notebook with a requirements.txt file and place both in a GitHub repository; you can then open the notebook directly from GitHub by substituting github.com with colab.research.google.com/github in the URL.
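The URL substitution for opening GitHub notebooks is simple enough to automate, for instance when generating links for a README. A minimal sketch follows; the function name and the repository path in the example are made up.

```python
GITHUB_PREFIX = "https://github.com/"
COLAB_PREFIX = "https://colab.research.google.com/github/"


def github_to_colab(url):
    """Turn a GitHub notebook URL into the matching 'open in Colab' URL."""
    if not url.startswith(GITHUB_PREFIX):
        raise ValueError("expected a URL starting with " + GITHUB_PREFIX)
    return COLAB_PREFIX + url[len(GITHUB_PREFIX):]


# Hypothetical repository path, used only to show the transformation.
print(github_to_colab("https://github.com/user/repo/blob/main/notebook.ipynb"))
# https://colab.research.google.com/github/user/repo/blob/main/notebook.ipynb
```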

Tips and Best Practices for Google Colab Machine Learning

Tip 1: Optimize Session Time and Avoid Idle Disconnections

Colab’s free tier has a 12-hour session limit, but the runtime may disconnect much earlier if you leave the browser tab inactive for 90 minutes or so. To keep your session alive longer, you can run a small piece of code that periodically simulates activity, such as a loop that prints the current time every minute. However, this is considered against the spirit of fair use and may result in termination. Instead, adopt good habits: always save checkpoints to Drive at regular intervals (e.g., after each epoch using a custom callback). Also, consider using the %cp magic to copy output files back to Drive. If you are training a model that takes many hours, break the training into smaller chunks and save intermediate results. Another practical trick: use the “Runtime” > “Manage Sessions” panel to see the elapsed time and remaining quota. For long-running experiments, Colab Pro+ allows up to 24 hours continuous use with higher priority.
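Breaking training into resumable chunks can be sketched without any ML framework. In the example below, progress is stored in a small JSON file that would live on Drive; train_one_epoch stands in for your real training step, and all function names are invented for this illustration.

```python
import json
from pathlib import Path


def load_state(path):
    """Load saved progress, or start fresh if no checkpoint exists yet."""
    p = Path(path)
    if p.exists():
        return json.loads(p.read_text())
    return {"epoch": 0, "history": []}


def save_state(path, state):
    """Write to a temporary file first, then rename, to avoid half-written files."""
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    tmp = p.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(p)


def train_in_chunks(path, total_epochs, max_epochs_this_session, train_one_epoch):
    """Resume from the last saved epoch and run at most a fixed number of epochs."""
    state = load_state(path)
    stop = min(total_epochs, state["epoch"] + max_epochs_this_session)
    for epoch in range(state["epoch"], stop):
        metric = train_one_epoch(epoch)  # placeholder for the real training step
        state["history"].append(metric)
        state["epoch"] = epoch + 1
        save_state(path, state)  # progress survives a disconnect after this line
    return state
```

Calling train_in_chunks with the same path in a fresh session continues where the previous one stopped; in a real project you would save the model weights alongside this progress file.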

Tip 2: Manage Memory Efficiently with Garbage Collection and Batch Processing

The default RAM of about 12 GB is shared between the operating system and your Python objects. If you load an entire dataset into memory (e.g., all training images at once), you may quickly exhaust it. Use data generators or tf.data.Dataset with batch and prefetch to stream data from disk. When you no longer need a large array or tensor, delete it with del variable and call import gc; gc.collect() to release unreferenced memory. Monitor RAM usage with the RAM/Disk indicator in the toolbar; if it is close to full, you are near the limit. You can also split heavy tasks across separate notebooks, each in its own runtime, although they all draw on the same account quota. For GPU memory, reduce the batch size if you encounter ResourceExhaustedError (TensorFlow’s out-of-memory error). A batch size of 32 for a ResNet-50 on a 16 GB GPU may need to be lowered to 16 or 8, depending on image size. Also, use mixed-precision training (float16) via tf.keras.mixed_precision, which can substantially reduce activation memory with minimal impact on accuracy.
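The idea behind data generators is independent of any framework: only one batch is held in memory at a time. Here is a minimal pure-Python sketch; the names iter_batches and process_stream are made up, and sum stands in for real per-batch work such as a training step.

```python
import gc


def iter_batches(items, batch_size):
    """Yield lists of at most batch_size items without building all batches at once."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller batch


def process_stream(items, batch_size, handle_batch):
    """Apply handle_batch to each batch and keep only the (small) results."""
    results = [handle_batch(batch) for batch in iter_batches(items, batch_size)]
    gc.collect()  # reclaim anything left unreferenced after the pass
    return results


sums = process_stream(range(7), 3, sum)
print(sums)  # [3, 12, 6]
```

tf.data.Dataset with batch and prefetch follows the same principle, with the added benefit of loading the next batch in the background while the current one is being processed.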

Tip 3: Use Version Control and Scripts for Reproducibility

While notebooks are great for exploration, they can become messy quickly. Create a main script (.py) that contains your model definition and training loop, and then call it from a notebook cell using !python train.py. This separates the experimentation environment from the production code. To put the notebook itself under version control, use the “File” > “Save a copy in GitHub” option. Alternatively, use the %run magic to execute an external Python file inside the notebook’s own namespace. Always pin library versions in your first cell and document your environment with !pip freeze > requirements.txt at the end of each session. When sharing a notebook, include a clear comment about the runtime type required (GPU/TPU). Another best practice is to set random seeds for reproducibility: tf.random.set_seed(42); np.random.seed(42). Colab also works with Weights & Biases (WandB) for experiment tracking: install the wandb library and log your hyperparameters and metrics.
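Seeding is easy to forget for one of the libraries involved, so it can help to wrap it in a single helper called at the top of the notebook. The sketch below seeds Python’s random module and, when they are installed, NumPy and TensorFlow; the helper name set_seeds is an assumption of this example.

```python
import random


def set_seeds(seed=42):
    """Seed Python's RNG, plus NumPy and TensorFlow when they are installed."""
    random.seed(seed)
    seeded = ["random"]
    try:
        import numpy as np
        np.random.seed(seed)
        seeded.append("numpy")
    except ImportError:
        pass
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
        seeded.append("tensorflow")
    except ImportError:
        pass
    return seeded


print("Seeded:", set_seeds(42))
```

Keep in mind that seeding makes runs repeatable on the same software stack, but some GPU operations are not fully deterministic, so small differences between runs can remain.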

Frequently Asked Questions about Google Colab for Machine Learning

Q1: Is Google Colab completely free? What are the limitations?

Yes, Colab offers a free tier that includes CPU, GPU (T4 or K80), and TPU access, but with restrictions. Free users get limited GPU hours per day (sometimes as low as 2-4 hours) and sessions that can be preempted. The maximum session length is approximately 12 hours. Additionally, you cannot use certain advanced features like background execution or high‑memory VMs (25 GB RAM) without paying. For heavier usage, Colab Pro ($9.99/month) provides priority access to better GPUs (V100) and longer sessions up to 24 hours, while Pro+ ($49.99/month) offers even faster GPUs (A100) and more memory.

Q2: How can I avoid losing my work when the runtime disconnects?

The most reliable safeguard is to keep your notebook in Google Drive (use File > Save a copy in Drive if it was opened from elsewhere). Also, periodically save model checkpoints and intermediate outputs to Drive, for example with model.save_weights('/content/drive/MyDrive/checkpoints/model.weights.h5'); save_weights expects a file path rather than a bare directory, and the file name here is only an example. If your runtime crashes, you lose the in-memory state, but the saved files remain. For long training runs, consider breaking the training into increments and resuming from the latest checkpoint. Some users run a script that pings Drive every few minutes to keep the session alive, but this is discouraged and may cause account flags.

Q3: Can I use custom Python libraries that are not preinstalled?

Absolutely. Use !pip install package_name in a code cell. You can also install packages from GitHub or local files. conda is not preinstalled on Colab, so it has to be installed first (for example through the third-party condacolab package) before !conda install -c conda-forge package_name will work, and it can be slow. If you need system dependencies (e.g., poppler-utils for PDF processing), use !apt-get install -y followed by the package name. Just remember that custom installations persist only for the duration of the runtime session.

Q4: How do I connect a local runtime to Colab?

If you have a powerful local machine with a GPU, you can run Colab notebooks locally while still benefiting from the cloud interface. Click the “Connect” button dropdown and select “Connect to local runtime…” Follow the instructions to install the jupyter_http_over_ws extension and start a local Jupyter server. The Colab frontend then communicates with your local kernel. This is useful for projects that require high disk I/O or specific hardware that Colab’s free tier doesn’t provide, but it still requires your local machine to be on.

Q5: What is the best way to handle large datasets that do not fit in RAM?

Use streaming or lazy loading. For images, use tf.keras.utils.image_dataset_from_directory (older TensorFlow releases expose it as tf.keras.preprocessing.image_dataset_from_directory) with batch_size=32 and image_size=(224, 224); this creates a dataset pipeline that loads images on the fly. For tabular data, use pandas.read_csv with the chunksize parameter and process the file in chunks. If you have a very large dataset (hundreds of GB), store it in Google Cloud Storage and use TensorFlow’s tf.data.experimental.make_csv_dataset or the Hugging Face datasets library, which supports streaming so that the data never has to fit in RAM.
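For the tabular case, chunked reading with pandas looks roughly like this. The sketch computes a column mean without ever holding the whole file in memory; the function name and the tiny in-memory CSV are stand-ins for a real file path and column, and the pandas import is guarded so the cell also runs where pandas is missing.

```python
import io

try:
    import pandas as pd
except ImportError:  # pandas ships with Colab; the guard is for other environments
    pd = None


def column_mean_in_chunks(csv_source, column, chunksize=100_000):
    """Mean of one column, reading the CSV chunksize rows at a time."""
    total, count = 0.0, 0
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        total += chunk[column].sum()    # sum() skips missing values
        count += chunk[column].count()  # count() skips them too
    return float(total / count)


if pd is not None:
    toy_csv = io.StringIO("x\n1\n2\n3\n4\n5\n")  # stand-in for a large file on disk
    demo_mean = column_mean_in_chunks(toy_csv, "x", chunksize=2)
    print(demo_mean)  # 3.0
```

The same loop structure works for any aggregate that can be accumulated chunk by chunk, such as counts, sums, or minimum and maximum values.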

Q6: Can I run multiple notebooks simultaneously in the free tier?

Yes, you can open multiple Colab tabs, each with its own runtime. However, each session may share the same GPU quota and account limit. Running many notebooks in parallel will quickly exhaust your daily GPU time. Also, each runtime uses separate system resources; you might run into memory or CPU limitations on your local browser. For efficiency, it is better to use a single notebook that handles multiple experiments sequentially.

Conclusion

Google Colab has revolutionized the way machine learning practitioners and students develop and experiment with models. It eliminates the upfront cost of hardware, simplifies collaboration, and provides access to cutting-edge accelerators like GPUs and TPUs. In this guide, we walked through every essential step: getting started with notebooks, selecting the appropriate runtime, managing dependencies and data, writing and executing ML code, and saving your work for later reuse. We also covered practical tips that can help you avoid common frustrations such as session timeouts, memory overload, and reproducibility issues. By incorporating best practices like mounting Google Drive, using data generators, and saving checkpoints, you can make your Colab experience both productive and reliable.

As you continue your machine learning journey, remember that Colab is not only a tool for learning but also a platform for rapid prototyping and even deployment via its integration with TensorFlow Serving or Hugging Face Spaces. The community has created thousands of open-source Colab notebooks on GitHub covering everything from natural language processing to generative AI. Take advantage of these resources, and do not hesitate to clone and modify them for your own projects. With the knowledge from this guide, you are now equipped to leverage Google Colab’s full potential—whether you are training a simple linear regression or a state-of-the-art transformer. Go ahead, open a new notebook, and let your machine learning experiments take off.

Author: sarah antaboga
