Subclassing pandas data structures
The problem
You want to extend pandas data structures with custom attributes and methods. The classes you define are subclasses of DataFrame and Series, so instances of your classes are instances of DataFrame or Series as well. You won’t lose any functionality of the original pandas data structures, and libraries that accept pandas data structures, such as matplotlib, will treat your objects in the same manner.
A note of caution
Subclassing data structures is only recommended for advanced pandas users who aren’t scared of bug fixing. Documentation is sparse, and you might bump into issues for which Stack Overflow does not have a ready-made solution.
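Also, there are alternatives to subclassing. If you want to extend pandas data structures but you don’t need your objects to be instances of a Series/DataFrame, then consider composition: write your own class that holds a DataFrame (or Series) as an attribute and puts the custom attributes and methods around it.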
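A minimal sketch of composition could look like the following. The class name `Measurements`, its `unit` attribute and the `mean_with_unit` method are made up for illustration and are not part of pandas.

```python
import pandas as pd


class Measurements:
    """Hypothetical wrapper that holds a DataFrame instead of inheriting from it."""

    def __init__(self, data, unit):
        self.df = pd.DataFrame(data)  # the wrapped pandas object
        self.unit = unit              # custom attribute

    def mean_with_unit(self, column):
        # Custom method that delegates the computation to the wrapped DataFrame
        return float(self.df[column].mean()), self.unit


m = Measurements({"length": [1.0, 2.0, 3.0]}, unit="cm")
value, unit = m.mean_with_unit("length")
print(value, unit)
print(isinstance(m, pd.DataFrame))  # False: the wrapper is not a DataFrame
```

The trade-off is that anything expecting a real DataFrame has to be handed `m.df` explicitly.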
Are you sure you need to subclass? Then please continue reading. The guide below is based on this documentation and other sources addressing specific issues (linked below).
For this post I have used Python 3.10 and pandas 1.5.1.
Steps
- Define the subclasses
- Override constructor properties
- Define original properties
1. Define the subclasses
In this guide we will assume that you want to create two subclasses, a child for Series and a child for DataFrame. We create the child classes as usual, simply by listing the parent class in the class definition.
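As a sketch, the two child classes can start out empty. The names SubclassedSeries and SubclassedDataFrame are the ones used throughout this guide.

```python
import pandas as pd


class SubclassedSeries(pd.Series):
    """Child class of Series, no custom behaviour yet."""


class SubclassedDataFrame(pd.DataFrame):
    """Child class of DataFrame, no custom behaviour yet."""


print(issubclass(SubclassedSeries, pd.Series))        # True
print(issubclass(SubclassedDataFrame, pd.DataFrame))  # True
```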
When creating objects from these classes, the arguments are passed to the parent class (__init__ is inherited from the parent):
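For instance, the subclasses accept the same constructor arguments as their parents. The example data below is arbitrary.

```python
import pandas as pd


class SubclassedSeries(pd.Series):
    pass


class SubclassedDataFrame(pd.DataFrame):
    pass


# Same arguments as pd.Series / pd.DataFrame, because __init__ is inherited
s = SubclassedSeries([1, 2, 3], name="a")
df = SubclassedDataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

print(type(s).__name__, type(df).__name__)
print(isinstance(s, pd.Series), isinstance(df, pd.DataFrame))  # True True
```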
2. Override constructor properties
If we manipulate these structures, then the child class might be lost. For example:
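A small sketch of the problem, using multiplication by a scalar as an arbitrary example of a manipulation:

```python
import pandas as pd


class SubclassedSeries(pd.Series):
    pass


class SubclassedDataFrame(pd.DataFrame):
    pass


s = SubclassedSeries([1, 2, 3])
df = SubclassedDataFrame({"a": [1, 2, 3]})

# The results are built with the parent constructors, so the subclass is gone
print(type(s * 2) is pd.Series)
print(type(df * 2) is pd.DataFrame)
```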
When manipulating, you want a SubclassedSeries to construct and return a SubclassedSeries, and a SubclassedDataFrame to construct and return a SubclassedDataFrame. For that you need to override the _constructor property:
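A minimal sketch of the override, followed by the same scalar multiplication as a check (the example data is arbitrary):

```python
import pandas as pd


class SubclassedSeries(pd.Series):
    @property
    def _constructor(self):
        # Used by pandas when a result has the same dimension as the original
        return SubclassedSeries


class SubclassedDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return SubclassedDataFrame


s = SubclassedSeries([1, 2, 3])
df = SubclassedDataFrame({"a": [1, 2, 3]})

print(type(s * 2).__name__)
print(type(df * 2).__name__)
```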
Now manipulations that keep the dimension return the subclass instead of the plain pandas class.
Likewise, we want a SubclassedSeries to construct a SubclassedDataFrame when going from 1D to 2D, and a SubclassedDataFrame to construct a SubclassedSeries when going from 2D to 1D.
Right now, a SubclassedSeries constructs a pandas DataFrame:
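A sketch of the 1D to 2D case, using `to_frame()` as one example of such a conversion:

```python
import pandas as pd


class SubclassedSeries(pd.Series):
    @property
    def _constructor(self):
        return SubclassedSeries


s = SubclassedSeries([1, 2, 3], name="a")
frame = s.to_frame()

# Only _constructor is overridden, so the 2D result is a plain DataFrame
print(type(frame) is pd.DataFrame)
```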
Conversely, slicing a SubclassedDataFrame returns a pandas Series instead of a SubclassedSeries.
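The 2D to 1D case can be checked the same way, here by selecting a single column:

```python
import pandas as pd


class SubclassedDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return SubclassedDataFrame


df = SubclassedDataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
col = df["a"]

# Only _constructor is overridden, so the 1D result is a plain Series
print(type(col) is pd.Series)
```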
To fix this, we need to override the _constructor_expanddim and _constructor_sliced properties.
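A sketch of both overrides together. The two classes refer to each other, which works because the properties are only evaluated when pandas calls them.

```python
import pandas as pd


class SubclassedSeries(pd.Series):
    @property
    def _constructor(self):
        return SubclassedSeries

    @property
    def _constructor_expanddim(self):
        # Used when a 1D object is turned into a 2D one
        return SubclassedDataFrame


class SubclassedDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        return SubclassedDataFrame

    @property
    def _constructor_sliced(self):
        # Used when a 2D object is reduced to a 1D one
        return SubclassedSeries


df = SubclassedDataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
col = df["a"]
print(type(col).__name__)
print(type(col.to_frame()).__name__)
```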
Unfortunately, this will not copy the metadata (e.g. data you have added for original properties, see below). Here is a workaround (that may become unnecessary with a future release of pandas):
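The sketch below shows one commonly used pattern for such a workaround, and it is an assumption that it matches what is needed for your pandas version: the properties return a small function that builds the object and then calls `__finalize__` with the originating instance. The metadata name `source` and the helper names `_new_frame` and `_new_series` are made up for illustration.

```python
import pandas as pd


class SubclassedSeries(pd.Series):
    _metadata = ["source"]  # hypothetical custom attribute

    @property
    def _constructor(self):
        return SubclassedSeries

    @property
    def _constructor_expanddim(self):
        def _new_frame(*args, **kwargs):
            # Build the 2D object, then copy metadata from this Series
            return SubclassedDataFrame(*args, **kwargs).__finalize__(self)

        return _new_frame


class SubclassedDataFrame(pd.DataFrame):
    _metadata = ["source"]

    @property
    def _constructor(self):
        return SubclassedDataFrame

    @property
    def _constructor_sliced(self):
        def _new_series(*args, **kwargs):
            # Build the 1D object, then copy metadata from this DataFrame
            return SubclassedSeries(*args, **kwargs).__finalize__(self)

        return _new_series


df = SubclassedDataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df.source = "sensor-1"

col = df["a"]
frame = col.to_frame()
print(type(col).__name__, col.source)
print(type(frame).__name__, frame.source)
```

Note that `__finalize__` only copies names that appear in the `_metadata` list of both classes involved, which is why both declare `source` here.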
3. Define original properties
You are now ready to define new properties.
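As a sketch, pandas lets a subclass list the names of its extra attributes in `_metadata`, so that they are carried over to the results of manipulations. The attribute `source` and the method `describe_source` below are invented for this example.

```python
import pandas as pd


class SubclassedSeries(pd.Series):
    _metadata = ["source"]
    source = None  # default, so reading it before it is set does not fail

    @property
    def _constructor(self):
        return SubclassedSeries

    @property
    def _constructor_expanddim(self):
        return SubclassedDataFrame


class SubclassedDataFrame(pd.DataFrame):
    _metadata = ["source"]  # names pandas should pass on to results
    source = None

    @property
    def _constructor(self):
        return SubclassedDataFrame

    @property
    def _constructor_sliced(self):
        return SubclassedSeries

    def describe_source(self):
        # Custom method that uses the custom attribute
        return f"{len(self)} rows from {self.source}"


df = SubclassedDataFrame({"a": [1, 2, 3]})
df.source = "sensor-1"

doubled = df * 2
print(doubled.describe_source())
```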
Transferring metadata
If at any point you need to transfer metadata from one instance of a subclassed data structure to another, just call __finalize__ on the receiving instance and pass the source instance as the first argument: target.__finalize__(source, method='inherit').
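A minimal sketch of such a transfer, again with the made-up metadata name `source`. To my knowledge pandas treats the `method` argument only as a hint about where the call came from, so the value 'inherit' is kept here as in the text above.

```python
import pandas as pd


class SubclassedDataFrame(pd.DataFrame):
    _metadata = ["source"]  # hypothetical custom attribute
    source = None

    @property
    def _constructor(self):
        return SubclassedDataFrame


old = SubclassedDataFrame({"a": [1, 2]})
old.source = "sensor-1"

new = SubclassedDataFrame({"b": [3, 4]})
# Copy the metadata listed in _metadata from `old` onto `new`
new = new.__finalize__(old, method="inherit")

print(new.source)
```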