Having appropriate dtypes for your Series and DataFrames is important for many reasons:
- Memory management: using the right dtype for a particular Series can dramatically reduce its memory usage, and since a DataFrame is a collection of Series, the same holds for DataFrames
- Interpretation: anyone else (human or computer) will make assumptions about your data based on its dtype: if a column full of integers is stored as strings, it will be treated as text, not as integers
- Clean data: choosing a proper dtype forces you to deal with missing or mis-recorded values up front, which makes the data crunching down the road a lot easier
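To make these three points concrete, here is a minimal sketch with made-up values. It shows how the dtype changes memory usage, how integers stored as strings are treated as text, and how picking a dtype makes missing or mis-recorded values visible. Only the integer memory sizes are printed, since string memory figures vary across pandas versions.

```python
import numpy as np
import pandas as pd

# Memory: one million integers between 0 and 99
values = np.arange(1_000_000) % 100
as_int64 = pd.Series(values, dtype="int64")  # 8 bytes per value
as_int8 = as_int64.astype("int8")            # 1 byte per value; 0..99 fits in -128..127
print(as_int64.memory_usage(index=False))    # 8000000
print(as_int8.memory_usage(index=False))     # 1000000

# Interpretation: the same numbers stored as strings behave like text
nums = pd.Series([10, 9, 2])
texts = nums.astype(str)
print((nums + nums).tolist())        # [20, 18, 4]
print((texts + texts).tolist())      # ['1010', '99', '22'] (concatenation)
print(texts.sort_values().tolist())  # ['10', '2', '9'] (lexicographic order)

# Clean data: a missing value silently turns an integer column into floats...
print(pd.Series([1, 2, None]).dtype)                 # float64
# ...unless you pick pandas' nullable integer dtype
print(pd.Series([1, 2, None], dtype="Int64").dtype)  # Int64

# A mis-recorded value blocks the conversion to a numeric dtype,
# so you have to decide what to do with it (here: turn it into NaN)
raw = pd.Series(["1", "2", "three"], dtype=object)
cleaned = pd.to_numeric(raw, errors="coerce")
print(cleaned.tolist())  # [1.0, 2.0, nan]
```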
And there are probably many more reasons. Can you name a few? If so, please write them in a comment.
In this first post of my pandas series, I want to review the basics of pandas data types, or dtypes.
We will first review the dtypes pandas offers, then I’ll focus on 4 useful ones that cover about 95% of your needs: numerical dtypes, the boolean dtype, the string dtype, and the categorical dtype.
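As a quick preview of those four families, here is a minimal sketch of a DataFrame with one column per family. The column names and values are made up for illustration, and the string column explicitly requests pandas' `StringDtype` with `dtype="string"`.

```python
import pandas as pd

# One illustrative column per dtype family (names and values are made up)
df = pd.DataFrame({
    "count": pd.Series([3, 1, 4], dtype="int64"),                  # numerical (integer)
    "ratio": pd.Series([0.5, 0.25, 1.0], dtype="float64"),         # numerical (float)
    "flag": pd.Series([True, False, True], dtype="bool"),          # boolean
    "name": pd.Series(["ann", "bob", "cy"], dtype="string"),       # string (StringDtype)
    "color": pd.Series(["red", "blue", "red"], dtype="category"),  # categorical
})
print(df.dtypes)

# A categorical column stores each distinct value once and keeps small integer codes
print(df["color"].cat.categories.tolist())  # ['blue', 'red']
print(df["color"].cat.codes.tolist())       # [1, 0, 1]
```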
The end goal of this first post is to make you more comfortable with the various data types available in pandas and how they differ.
If you’re interested in pandas and time series, make sure to check out my posts on the Fourier transform for time series:
- Review how convolution relates to the Fourier transform and how fast it is:
- Deepen your understanding of convolution using image examples:
- Understand how the Fourier transform can be visually…