
The pandas library is great for analysing and manipulating tabular data. It builds on the Python ecosystem and, compared to Excel, integrates easily into software. This makes it very popular among data scientists for data exploration, visualization, and analysis.
What I often find lacking, however, is good documentation of the many features pandas has to offer. Sometimes I discover a function that solves a problem I had been working around for years. A find like that saves me a lot of time and headaches.
Today I am going to show you three useful pandas functions that can be applied to DataFrames:
- The pivot function
- The melt function
- The explode function
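As a quick preview, the sketch below calls all three methods on tiny hand-made DataFrames. The column names (penguin_id, measurement, value, islands_seen) and the values are hypothetical and chosen only to show the shape of each transformation: pivot turns long data into wide data, melt turns it back, and explode gives each list element its own row.

```python
import pandas as pd

# Hypothetical long-format data: one row per (penguin, measurement) pair.
long_df = pd.DataFrame(
    {
        "penguin_id": [1, 1, 2, 2],
        "measurement": ["bill_length_mm", "body_mass_g", "bill_length_mm", "body_mass_g"],
        "value": [39.1, 3750.0, 46.1, 4500.0],
    }
)

# pivot: long -> wide, one row per penguin and one column per measurement.
wide_df = long_df.pivot(index="penguin_id", columns="measurement", values="value")

# melt: wide -> long again, turning the measurement columns back into rows.
back_to_long = wide_df.reset_index().melt(
    id_vars="penguin_id", var_name="measurement", value_name="value"
)

# explode: a column holding lists becomes one row per list element;
# the original index label is repeated for each element.
tags = pd.DataFrame(
    {"penguin_id": [1, 2], "islands_seen": [["Biscoe", "Dream"], ["Torgersen"]]}
)
exploded = tags.explode("islands_seen")

print(wide_df)
print(back_to_long)
print(exploded)
```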
I will use the Penguins dataset to illustrate the use of these functions.
Penguin data as a Pandas DataFrame
The DataFrame is the standard type for representing tabular data in pandas. It is comparable to the data in an Excel spreadsheet: data is organized in rows and columns. Take, for example, the penguins dataset, available at https://github.com/allisonhorst/palmerpenguins under the CC0 license.
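To make the row-and-column idea concrete, here is a minimal sketch of a DataFrame built from a few of the penguins column names. The values are typed in by hand for illustration and are not meant to reproduce specific rows of the dataset. To work with the real data you would read a local copy of the CSV from the repository with pd.read_csv; the file name penguins.csv in the comment is an assumption.

```python
import pandas as pd

# For the real data (assumed local file name):
# penguins = pd.read_csv("penguins.csv")

# Illustrative values only, using column names from the penguins dataset.
penguins = pd.DataFrame(
    {
        "species": ["Adelie", "Adelie", "Gentoo"],
        "island": ["Torgersen", "Biscoe", "Biscoe"],
        "bill_length_mm": [39.1, 37.8, 46.1],
        "body_mass_g": [3750, 3400, 4500],
        "sex": ["male", "female", "female"],
        "year": [2007, 2007, 2007],
    }
)

print(penguins.shape)   # (number of rows, number of columns)
print(penguins.dtypes)  # each column has its own data type
print(penguins.head())  # first rows, like the top of a spreadsheet
```

Each key of the dictionary becomes a column, and each position in the lists becomes a row, which mirrors how a spreadsheet is laid out.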