Data science projects revolve around experimentation and visualization, where your ideas take center stage. In this post, I’ll walk you through setting up an environment that supports both and keeps your work reproducible, so you spend less time fighting tooling and more time on the analysis itself.
There are three key parts of this setup:
- Package management and virtual environment tools
- Jupyter Lab with Git support
- (Optional) Jupyter Lab extensions
Experimentation drives innovation in data science, and credible, replicable experiments depend on a stable and reproducible Python environment. This section looks at how package management and virtual environment tools provide that foundation, using two of them: venv and Poetry.
venv, which ships with Python’s standard library, provides a simple yet effective way to create isolated environments for Python projects. These environments let you control project-specific dependencies, so your experiments and code remain stable and reproducible.
To create a virtual environment with venv, run the following command in the root folder of your project:
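A minimal sketch of that command, assuming a Unix-like shell with python3 on your PATH (on Windows the activation script lives under .venv\Scripts instead):

```shell
# Create an isolated environment in the .venv folder of the current project
python3 -m venv .venv

# Activate it for the current shell session (run `deactivate` to leave it)
. .venv/bin/activate
```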
This command creates a virtual environment named “.venv” in your project’s directory, giving you a clean, isolated space for managing dependencies and running experiments.
If you’re interested in exploring more about venv and other alternatives like virtualenv and virtualenvwrapper, I recommend checking out this excellent article.
Poetry is a robust Python dependency management tool that goes beyond venv by simplifying dependency handling, project packaging, and distribution. It offers a unified solution for managing project-specific dependencies in a user-friendly format (pyproject.toml), streamlining the process of creating isolated environments, specifying version constraints, and packaging your projects. In this section, we’ll cover the fundamentals of Poetry.
Create a Poetry project:
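Assuming Poetry is already installed and you are in the project’s root folder, this is most likely the interactive init command:

```shell
# Interactively generate a pyproject.toml in the current directory
poetry init
```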
You’ll receive prompts to add packages for installation in the virtual environment, and Poetry will create a pyproject.toml file in your project directory, containing these package details. To install the dependencies within the virtual environment:
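A short sketch, again assuming you run it from the folder that contains pyproject.toml:

```shell
# Resolve and install everything listed in pyproject.toml
# into the project's virtual environment
poetry install

# If the project is a collection of notebooks and scripts rather than
# an installable package, skipping the root package may be what you want:
# poetry install --no-root
```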
Once the pyproject.toml file exists and your initial dependencies are installed, you can add a new package to your Poetry environment like this:
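For example, with pandas standing in for whichever package you need (the package name and version constraint here are purely illustrative):

```shell
# Add the latest compatible release and record it in pyproject.toml
poetry add pandas

# Or state a version constraint explicitly
poetry add "pandas>=2.0,<3.0"
```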
If you have a local package registry that you’d like to support, follow these steps:
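One way to do this is to register the registry as an extra package source. In the sketch below, the source name internal, the URL, the credentials, and the package name are all hypothetical placeholders, so swap in your own:

```shell
# Register the registry under a name of your choosing
poetry source add internal https://pypi.example.com/simple/

# If the registry needs credentials, store them for that source
poetry config http-basic.internal "your-username" "your-password"

# Install a package from that specific source
poetry add --source internal my-internal-package
```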
If you are working on multiple projects under active development and want to add a local Poetry project as a dependency of your current Poetry project, use the following command:
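A sketch of what this can look like, assuming a reasonably recent Poetry release and a sibling folder named my-library (a hypothetical path):

```shell
# Add the local project in editable mode, so changes made in
# ../my-library are picked up without reinstalling
poetry add --editable ../my-library
```

Dropping --editable installs a snapshot of the local project instead, which may be preferable once the dependency has stabilized.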
When you’re ready to publish your project as a Python package and share it with the world, use the following command to build and publish it:
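A minimal sketch, assuming you are publishing to PyPI and already have an API token for it (the token value below is a placeholder):

```shell
# One-time setup: store your PyPI API token
poetry config pypi-token.pypi "your-api-token"

# Build the source distribution and wheel, then upload them
poetry publish --build
```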
Basics of Jupyter Lab
Project Jupyter grew out of the success of IPython notebooks and has since evolved into its next-generation platform, Jupyter Lab. Jupyter Lab offers advanced features, including visual debugging and better extension support, similar to what you find in modern IDEs such as VS Code.
Install Jupyter Lab:
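A sketch using pip inside the activated environment (in a Poetry project, poetry add jupyterlab would be the equivalent and also records the dependency):

```shell
# Install Jupyter Lab into the active environment
pip install jupyterlab

# Start the server; the interface opens in your browser
jupyter lab
```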
Jupyter Lab offers a rich set of IDE features that significantly simplify your data science journey, making it more efficient and productive.
- Multiple panes: Jupyter Lab offers a multi-pane interface, allowing you to split your workspace into different sections or views.
- Console Attachment: You can attach consoles to both notebooks and scripts within Jupyter Lab. This means you can interactively run code, execute commands, and see the results directly alongside your code, enhancing the debugging and exploration process.
- Terminal: Jupyter Lab gives you access to a system terminal in the browser, so the full power of the command line is available without leaving your workspace.
To learn more about Jupyter Lab features, see the official Jupyter Lab documentation.
Jupyter notebooks are JSON files, so line-based diff and merge tools handle them poorly: a small edit to a cell can show up as a noisy change in outputs and metadata. Nbdime bridges this gap with diffing and merging tailored to notebooks. It also integrates with Git and with Jupyter Lab, which makes it easier to manage and collaborate on your data science projects.
Nbdime comes equipped with extensions designed for seamless integration with Jupyter Lab, offering the convenience of a simple diff button within your Jupyter Lab environment. To enable these extensions, you can use the following command:
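A sketch of the install and enable steps, assuming pip and an active environment that already contains Jupyter Lab. Recent nbdime releases may register the extensions during installation, in which case the second command simply confirms they are on:

```shell
# Install nbdime
pip install nbdime

# Enable the nbdime server and Jupyter Lab extensions
nbdime extensions --enable
```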
To enable the integration of nbdime with Git, run the command below. Note: this command edits your global .gitconfig file.
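The command in question is most likely nbdime’s config-git subcommand with the global flag:

```shell
# Register nbdime as the diff and merge driver for .ipynb files (global Git config)
nbdime config-git --enable --global

# To undo the change later:
# nbdime config-git --disable --global
```

Leaving out --global should limit the change to the current repository, which can be a gentler way to try it out.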
Nbdime also comes with command-line programs that open a web interface for diffing (nbdiff-web) and merging (nbmerge-web). See the nbdime documentation for more details.
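As a rough illustration of both tools, with hypothetical notebook file names:

```shell
# Open a rich, cell-aware diff of two notebooks in the browser
nbdiff-web analysis_v1.ipynb analysis_v2.ipynb

# Three-way merge: common ancestor, your version, the other version
nbmerge-web base.ipynb local.ipynb remote.ipynb --out merged.ipynb
```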
The previous section covered the basics of Jupyter Lab and Nbdime. This section looks at a few Jupyter Lab extensions that make it more pleasant to work in.
To display the execution time of each cell:
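One option is the jupyterlab-execute-time extension. This sketch assumes that is the extension meant here and that you install it with pip:

```shell
# Install the execution-time extension, then restart Jupyter Lab
pip install jupyterlab-execute-time

# The extension reads timing metadata stored with each cell. If nothing shows up,
# check that timing is recorded: Settings -> Notebook -> "recordTiming": true
```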
Interactive Matplotlib plots
To visualize Matplotlib plots with interactive features like zoom-in/out:
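A common way to get this is the ipympl package, which provides a widget backend for Matplotlib. A sketch, assuming pip:

```shell
# Install the interactive Matplotlib backend, then restart Jupyter Lab
pip install ipympl

# In a notebook, switch backends before plotting with the magic command:
#   %matplotlib widget
```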
Language Server Protocol (LSP)
LSP (Language Server Protocol) empowers Jupyter Lab with a set of IDE-like features, including the ability to:
- Locate the declaration of a variable or method.
- Perform comprehensive variable renaming throughout the entire notebook.
- Conduct PEP 8 style checking, and much more.
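A sketch of one way to set this up, assuming the jupyterlab-lsp extension together with python-lsp-server as the Python language server (other language servers can be swapped in):

```shell
# Install the Jupyter Lab LSP extension
pip install jupyterlab-lsp

# Install a Python language server, with its optional linters and formatters
pip install "python-lsp-server[all]"
```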
A collapsible-headings extension lets you collapse sections of a notebook based on heading level.
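The sketch below assumes the community Collapsible Headings extension published under the package name shown. Newer Jupyter Lab releases may already include heading collapsing out of the box, so check whether you need it first:

```shell
# Install the Collapsible Headings extension, then restart Jupyter Lab
pip install aquirdturtle_collapsible_headings
```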
Table of Contents (TOC)
This extension provides a TOC sidebar that helps you navigate between the sections of a Jupyter notebook.
To recap, in this post we’ve emphasized the significance of stable and reproducible Python environments, looked at venv and Poetry for robust dependency management, and explored the capabilities of Jupyter Lab, Nbdime, and Jupyter Lab extensions. If you found this post valuable, please consider liking and subscribing for more content like it. Thank you for being part of our community!