Datetime preprocessors

Appropriate handling of datetime features.
from enum import Enum


class DateLevel(Enum):
    """An enum representing different date levels."""

    YEAR = "year"
    QUARTER = "quarter"
    MONTH = "month"
    DAY = "day"
    HOUR = "hour"
    MINUTE = "minute"
    SECOND = "second"
    MICROSECOND = "microsecond"
    NANOSECOND = "nanosecond"
    WEEKDAY = "weekday"
    DAYOFYEAR = "dayofyear"
    DAYSINMONTH = "daysinmonth"

DateLevel

 DateLevel (value, names=None, module=None, qualname=None, type=None,
            start=1)

An enum representing different date levels.
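
As a rough illustration of what each level yields, the sketch below reads every `DateLevel` value from a single timestamp through the pandas `.dt` accessor. It assumes that the enum values are used as `.dt` attribute names, which the source does not confirm; the enum is redefined locally only so the snippet runs on its own.

```python
from enum import Enum

import pandas as pd


class DateLevel(Enum):
    """Local copy of the enum above so the sketch is self-contained."""

    YEAR = "year"
    QUARTER = "quarter"
    MONTH = "month"
    DAY = "day"
    HOUR = "hour"
    MINUTE = "minute"
    SECOND = "second"
    MICROSECOND = "microsecond"
    NANOSECOND = "nanosecond"
    WEEKDAY = "weekday"
    DAYOFYEAR = "dayofyear"
    DAYSINMONTH = "daysinmonth"


# A single timestamp is enough to show what each level extracts.
stamps = pd.Series(pd.to_datetime(["2022-03-15 13:45:30"]))

# Assumption: each level is looked up by name on the `.dt` accessor.
features = {
    level.value: int(getattr(stamps.dt, level.value).iloc[0]) for level in DateLevel
}
print(features)
```

Note that `weekday` counts from Monday as 0, which matches the example outputs further down (Saturday 2022-01-01 is encoded as 5).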


DatetimeEncoder

 DatetimeEncoder (levels:Optional[Sequence[DateLevel]]=None,
                  fmt:Optional[str]=None)

An encoder for datetime columns that outputs integer features

levels is a sequence of DateLevel members that defines which date features to extract; for example, [DateLevel.HOUR, DateLevel.MINUTE] extracts hours and minutes. With the default None, all available features are extracted initially, and zero-variance features are then dropped (for example, seconds when none of the dates carry a seconds component).

| | Type | Default | Details |
|---|---|---|---|
| levels | Optional[Sequence[DateLevel]] | None | Date features to extract. |
| fmt | Optional[str] | None | Date format for string conversion if inputs are not datetime-like objects. Follows standard pandas/stdlib formatting, for example '%Y-%m-%d %H:%M:%S'. |
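
To make the role of fmt concrete, here is a minimal sketch of string-to-datetime conversion with a strftime-style format. It assumes the conversion behaves like `pd.to_datetime(..., format=fmt)`; the encoder's actual parsing code is not shown in this page, so treat this as an illustration rather than its implementation.

```python
import pandas as pd

date_format = "%Y-%m-%d %H:%M:%S"
raw = pd.Series(["2022-01-01 08:30:00", "2022-01-02 17:05:00"])

# Assumption: string inputs are parsed with the given format before
# any date level is extracted.
parsed = pd.to_datetime(raw, format=date_format)

# Once parsed, the values are datetime-like and expose the `.dt` accessor.
hours = parsed.dt.hour.tolist()
print(parsed.dtype)
print(hours)
```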

DatetimeEncoder.fit

 DatetimeEncoder.fit (X:Union[pandas.core.frame.DataFrame,numpy.ndarray,List],
                      y=None)

Fit the DatetimeEncoder.

| | Type | Default | Details |
|---|---|---|---|
| X | Union[pd.DataFrame, np.ndarray, List] | | Datetime-like features. |
| y | NoneType | None | Unused. |
| Returns | DatetimeEncoder | | Fitted DatetimeEncoder. |

After fitting, the categories of each feature are held in the categories_ attribute.


DatetimeEncoder.transform

 DatetimeEncoder.transform (X:Union[pandas.core.frame.DataFrame,numpy.ndarray,List])

Apply transformation. Will ignore zero variance features seen during DatetimeEncoder.fit.

While this transformer is generally stateless, during DatetimeEncoder.fit it checks whether any of the extracted features have zero variance (only one unique value) and sets those levels to be ignored during DatetimeEncoder.transform.

| | Type | Details |
|---|---|---|
| X | Union[pd.DataFrame, np.ndarray, List] | The data to encode. |
| Returns | np.ndarray | Transformed input. |
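
The fit/transform split described above can be sketched as follows. `ZeroVarianceDateSketch`, `LEVELS`, and `kept_` are hypothetical names introduced for illustration; this is not the library's DatetimeEncoder, only a minimal stand-in that assumes features are read from the pandas `.dt` accessor and that "zero variance" means a single unique value at fit time.

```python
import numpy as np
import pandas as pd

# A few level names for brevity; the real enum has more members.
LEVELS = ["year", "month", "day", "hour"]


class ZeroVarianceDateSketch:
    """Hypothetical stand-in, not the library's DatetimeEncoder."""

    def _extract(self, X: pd.DataFrame) -> pd.DataFrame:
        # One integer column per (input column, level) pair.
        return pd.DataFrame(
            {
                f"{col}_{lvl}": getattr(X[col].dt, lvl)
                for col in X.columns
                for lvl in LEVELS
            }
        )

    def fit(self, X: pd.DataFrame, y=None):
        extracted = self._extract(X)
        # Keep only features that take more than one unique value.
        self.kept_ = [c for c in extracted.columns if extracted[c].nunique() > 1]
        return self

    def transform(self, X: pd.DataFrame) -> np.ndarray:
        # Reuse the selection made during fit, regardless of the new data.
        return self._extract(X)[self.kept_].to_numpy()


train = pd.DataFrame({"ts": pd.date_range("2022-01-01", periods=3, freq="D")})
new = pd.DataFrame({"ts": pd.to_datetime(["2023-06-10 12:00"])})

sketch = ZeroVarianceDateSketch().fit(train)
print(sketch.kept_)
print(sketch.transform(new))
```

In this sketch only the day feature varies in the training data, so the year, month, and hour of the new row are omitted from the output even though they differ from the training values.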

TransformerMixin.fit_transform

 TransformerMixin.fit_transform (X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

| | Type | Default | Details |
|---|---|---|---|
| X | array-like of shape (n_samples, n_features) | | Input samples. |
| y | NoneType | None | Target values (None for unsupervised transformations). |
| fit_params | | | |
| Returns | ndarray array of shape (n_samples, n_features_new) | | Transformed array. |

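import pandas as pd

# Two datetime columns: one advancing hourly, one advancing daily.
X = pd.DataFrame(
    {
        "hours": pd.date_range(start="2022-01-01", freq="H", periods=25),
        "days": pd.date_range(start="2022-01-01", freq="D", periods=25),
    }
)

# With the default levels=None, all levels are extracted and zero-variance ones are dropped.
encoder = DatetimeEncoder()
pd.DataFrame(encoder.fit_transform(X), columns=encoder.get_feature_names_out()).head()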
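
| | hours_day | hours_hour | hours_weekday | hours_dayofyear | days_day | days_weekday | days_dayofyear |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 5 | 1 | 1 | 5 | 1 |
| 1 | 1 | 1 | 5 | 1 | 2 | 6 | 2 |
| 2 | 1 | 2 | 5 | 1 | 3 | 0 | 3 |
| 3 | 1 | 3 | 5 | 1 | 4 | 1 | 4 |
| 4 | 1 | 4 | 5 | 1 | 5 | 2 | 5 |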

Dates can also be passed as strings, in which case fmt gives their format. Datetimes and strings cannot be mixed in the same input.

date_format = "%Y-%m-%d"
X = pd.DataFrame(
    {
        "days": pd.date_range(start="2022-01-01", freq="D", periods=25).strftime(
            date_format
        ),
        "quarters": pd.date_range(start="2023-01-01", freq="Q", periods=25).strftime(
            date_format
        ),
    }
)

encoder = DatetimeEncoder(fmt=date_format)
pd.DataFrame(encoder.fit_transform(X), columns=encoder.get_feature_names_out()).head()
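
| | days_day | days_weekday | days_dayofyear | quarters_year | quarters_quarter | quarters_month | quarters_day | quarters_weekday | quarters_dayofyear | quarters_daysinmonth |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 5 | 1 | 2023 | 1 | 3 | 31 | 4 | 90 | 31 |
| 1 | 2 | 6 | 2 | 2023 | 2 | 6 | 30 | 4 | 181 | 30 |
| 2 | 3 | 0 | 3 | 2023 | 3 | 9 | 30 | 5 | 273 | 30 |
| 3 | 4 | 1 | 4 | 2023 | 4 | 12 | 31 | 6 | 365 | 31 |
| 4 | 5 | 2 | 5 | 2024 | 1 | 3 | 31 | 6 | 91 | 31 |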

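A subset of date levels can be selected with levels. Zero-variance features are still dropped: in the output below no hour feature appears, because the date strings carry no time of day, and days contributes no month feature, because all of its dates fall in January.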

encoder = DatetimeEncoder(
    levels=[DateLevel.DAY, DateLevel.HOUR, DateLevel.MONTH], fmt=date_format
)
pd.DataFrame(encoder.fit_transform(X), columns=encoder.get_feature_names_out()).head()

| | days_day | quarters_day | quarters_month |
|---|---|---|---|
| 0 | 1 | 31 | 3 |
| 1 | 2 | 30 | 6 |
| 2 | 3 | 30 | 9 |
| 3 | 4 | 31 | 12 |
| 4 | 5 | 31 | 3 |