Datetime preprocessors

```python
from enum import Enum


class DateLevel(Enum):
    """An enum representing different date levels."""

    YEAR = "year"
    QUARTER = "quarter"
    MONTH = "month"
    DAY = "day"
    HOUR = "hour"
    MINUTE = "minute"
    SECOND = "second"
    MICROSECOND = "microsecond"
    NANOSECOND = "nanosecond"
    WEEKDAY = "weekday"
    DAYOFYEAR = "dayofyear"
    DAYSINMONTH = "daysinmonth"
```
DateLevel
DateLevel (value, names=None, module=None, qualname=None, type=None, start=1)
An enum representing different date levels.
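The enum values coincide with attribute names of the pandas .dt accessor (year, quarter, ..., daysinmonth). As an illustration only, the sketch below computes each level from a standard-library datetime. LEVEL_GETTERS and extract are hypothetical names, not part of this library, and nanosecond is omitted because datetime only resolves microseconds.

```python
import calendar
from datetime import datetime
from typing import Callable, Dict

# Hypothetical mapping from DateLevel values to extraction functions.
# Conventions follow pandas: weekday has Monday == 0, dayofyear starts at 1.
# "nanosecond" is left out because stdlib datetime stops at microseconds.
LEVEL_GETTERS: Dict[str, Callable[[datetime], int]] = {
    "year": lambda d: d.year,
    "quarter": lambda d: (d.month - 1) // 3 + 1,
    "month": lambda d: d.month,
    "day": lambda d: d.day,
    "hour": lambda d: d.hour,
    "minute": lambda d: d.minute,
    "second": lambda d: d.second,
    "microsecond": lambda d: d.microsecond,
    "weekday": lambda d: d.weekday(),
    "dayofyear": lambda d: d.timetuple().tm_yday,
    "daysinmonth": lambda d: calendar.monthrange(d.year, d.month)[1],
}


def extract(d: datetime) -> Dict[str, int]:
    """Return every supported level of d as an integer."""
    return {name: get(d) for name, get in LEVEL_GETTERS.items()}


features = extract(datetime(2023, 3, 31))
print(features["quarter"], features["weekday"], features["dayofyear"])  # 1 4 90
```

These values agree with the first quarters row of the string example further down.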
DatetimeEncoder
DatetimeEncoder (levels:Optional[Sequence[DateLevel]]=None, fmt:Optional[str]=None)
An encoder for datetime columns that outputs integer features.
levels is a list of DateLevel members that defines which date features to extract; e.g., [DateLevel.HOUR, DateLevel.MINUTE] extracts hours and minutes. If left at the default None, all available features are extracted initially, but zero-variance features are dropped (for example, the seconds level when the dates carry no seconds).
| | Type | Default | Details |
|---|---|---|---|
| levels | Optional[Sequence[DateLevel]] | None | Date features to extract. |
| fmt | Optional[str] | None | Date format used for string conversion if inputs are not datetime-like objects. Follows standard pandas/stdlib formatting, for example, '%Y-%m-%d %H:%M:%S'. |
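The default behaviour described above, extracting everything and then discarding levels that never change, can be sketched in plain Python. This is an assumed re-implementation for illustration: ALL_LEVELS, level_value and kept_levels are hypothetical names that operate on lists of stdlib datetime objects rather than DataFrame columns.

```python
import calendar
from datetime import datetime
from typing import List, Optional, Sequence

# Hypothetical stand-in for the default level list (DateLevel order,
# without "nanosecond", which stdlib datetime cannot represent).
ALL_LEVELS = ["year", "quarter", "month", "day", "hour", "minute",
              "second", "microsecond", "weekday", "dayofyear", "daysinmonth"]


def level_value(d: datetime, level: str) -> int:
    """Compute one date level, following pandas .dt conventions."""
    if level == "quarter":
        return (d.month - 1) // 3 + 1
    if level == "weekday":
        return d.weekday()  # Monday == 0
    if level == "dayofyear":
        return d.timetuple().tm_yday
    if level == "daysinmonth":
        return calendar.monthrange(d.year, d.month)[1]
    return getattr(d, level)  # year, month, day, hour, minute, ...


def kept_levels(dates: Sequence[datetime],
                levels: Optional[Sequence[str]] = None) -> List[str]:
    """Keep only the levels that take more than one value across dates."""
    candidates = ALL_LEVELS if levels is None else levels
    return [lv for lv in candidates
            if len({level_value(d, lv) for d in dates}) > 1]


# Timestamps without seconds: "second" and every other constant level drop out.
stamps = [datetime(2022, 1, 1, 10, 30), datetime(2022, 1, 1, 11, 45)]
print(kept_levels(stamps))  # ['hour', 'minute']
```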
DatetimeEncoder.fit
DatetimeEncoder.fit (X:Union[pandas.core.frame.DataFrame,numpy.ndarray,List], y=None)
Fit the DatetimeEncoder.
| | Type | Default | Details |
|---|---|---|---|
| X | Union[pd.DataFrame, np.ndarray, List] | | Datetime-like features. |
| y | NoneType | None | Unused. |
| Returns | DatetimeEncoder | | Fitted DatetimeEncoder. |
After fitting, the categories of each feature are held in the categories_ attribute.
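A reduced, hypothetical class can illustrate what fit records. It assumes the input is a dict mapping column names to lists of datetime objects, supports only levels that are plain datetime attributes, and stores its result in an invented kept_levels_ attribute; it is a sketch, not the library's implementation.

```python
from datetime import datetime
from typing import Dict, List, Sequence


class SketchDatetimeEncoder:
    """Hypothetical, reduced stand-in for DatetimeEncoder.fit.

    Only levels that are direct datetime attributes are supported.
    """

    def __init__(self, levels: Sequence[str] = (
            "year", "month", "day", "hour", "minute", "second")):
        self.levels = list(levels)

    def fit(self, X: Dict[str, Sequence[datetime]], y=None) -> "SketchDatetimeEncoder":
        # For each column, remember the levels with more than one unique value.
        self.kept_levels_: Dict[str, List[str]] = {
            col: [lv for lv in self.levels
                  if len({getattr(d, lv) for d in dates}) > 1]
            for col, dates in X.items()
        }
        return self  # like scikit-learn estimators, fit returns the fitted object


X = {"days": [datetime(2022, 1, d) for d in range(1, 26)]}
enc = SketchDatetimeEncoder().fit(X)
print(enc.kept_levels_)  # {'days': ['day']}
```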
DatetimeEncoder.transform
DatetimeEncoder.transform (X:Union[pandas.core.frame.DataFrame,numpy.ndarray,List])
Apply the transformation, ignoring any zero-variance features detected during DatetimeEncoder.fit.
While this transformer is generally stateless, during DatetimeEncoder.fit it checks whether any of the extracted features have zero variance (only one unique value) and sets those levels to be ignored during DatetimeEncoder.transform.
| | Type | Details |
|---|---|---|
| X | Union[pd.DataFrame, np.ndarray, List] | The data to encode. |
| Returns | np.ndarray | Transformed input. |
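Because the kept levels are fixed at fit time, transform always produces the same columns, even if new data varies in a level that was constant during fitting. The hypothetical transform function below takes the fitted per-column levels explicitly as a plain dict (an assumption made to keep the sketch self-contained) and shows hours in new data being ignored.

```python
from datetime import datetime
from typing import Dict, List, Sequence


def transform(X: Dict[str, Sequence[datetime]],
              kept_levels: Dict[str, List[str]]) -> List[List[int]]:
    """Encode rows using only the levels kept during fit (hypothetical helper)."""
    columns = list(X)
    n_rows = len(X[columns[0]])
    # One output row per input row; features are ordered column by column.
    return [
        [getattr(X[col][i], lv) for col in columns for lv in kept_levels[col]]
        for i in range(n_rows)
    ]


# Suppose fit saw dates that were all at midnight, so only "day" was kept.
kept = {"days": ["day"]}

# New data now has varying hours, but "hour" is still ignored.
new = {"days": [datetime(2022, 2, 3, 9), datetime(2022, 2, 4, 17)]}
print(transform(new, kept))  # [[3], [4]]
```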
TransformerMixin.fit_transform
TransformerMixin.fit_transform (X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
| | Type | Default | Details |
|---|---|---|---|
| X | array-like of shape (n_samples, n_features) | | Input samples. |
| y | NoneType | None | Target values (None for unsupervised transformations). |
| fit_params | dict | | Additional fit parameters. |
| Returns | ndarray of shape (n_samples, n_features_new) | | Transformed array. |
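fit_transform is inherited from scikit-learn's TransformerMixin, whose default simply fits and then transforms the same X. The pure-Python sketch below mirrors that default with hypothetical class names; newer scikit-learn versions may perform additional checks inside the real method.

```python
class FitTransformMixin:
    """Hypothetical re-creation of scikit-learn's default fit_transform."""

    def fit_transform(self, X, y=None, **fit_params):
        # Fit on X (and y, if given), then transform the same X.
        if y is None:
            return self.fit(X, **fit_params).transform(X)
        return self.fit(X, y, **fit_params).transform(X)


class DropConstantColumns(FitTransformMixin):
    """Toy transformer: drops columns that are constant in the fitted data."""

    def fit(self, X, y=None):
        n_cols = len(X[0])
        self.keep_ = [j for j in range(n_cols) if len({row[j] for row in X}) > 1]
        return self

    def transform(self, X):
        return [[row[j] for j in self.keep_] for row in X]


rows = [[2022, 1, 5], [2022, 2, 6], [2022, 3, 0]]
print(DropConstantColumns().fit_transform(rows))  # [[1, 5], [2, 6], [3, 0]]
```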
```python
import pandas as pd

X = pd.DataFrame(
    {
        "hours": pd.date_range(start="2022-01-01", freq="H", periods=25),
        "days": pd.date_range(start="2022-01-01", freq="D", periods=25),
    }
)
encoder = DatetimeEncoder()
pd.DataFrame(encoder.fit_transform(X), columns=encoder.get_feature_names_out()).head()
```

| | hours_day | hours_hour | hours_weekday | hours_dayofyear | days_day | days_weekday | days_dayofyear |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 5 | 1 | 1 | 5 | 1 |
| 1 | 1 | 1 | 5 | 1 | 2 | 6 | 2 |
| 2 | 1 | 2 | 5 | 1 | 3 | 0 | 3 |
| 3 | 1 | 3 | 5 | 1 | 4 | 1 | 4 |
| 4 | 1 | 4 | 5 | 1 | 5 | 2 | 5 |
Dates can also be given as strings, parsed according to fmt, but datetimes and strings cannot be mixed in the same input.
```python
date_format = "%Y-%m-%d"
X = pd.DataFrame(
    {
        "days": pd.date_range(start="2022-01-01", freq="D", periods=25).strftime(
            date_format
        ),
        "quarters": pd.date_range(start="2023-01-01", freq="Q", periods=25).strftime(
            date_format
        ),
    }
)
encoder = DatetimeEncoder(fmt=date_format)
pd.DataFrame(encoder.fit_transform(X), columns=encoder.get_feature_names_out()).head()
```

| | days_day | days_weekday | days_dayofyear | quarters_year | quarters_quarter | quarters_month | quarters_day | quarters_weekday | quarters_dayofyear | quarters_daysinmonth |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 5 | 1 | 2023 | 1 | 3 | 31 | 4 | 90 | 31 |
| 1 | 2 | 6 | 2 | 2023 | 2 | 6 | 30 | 4 | 181 | 30 |
| 2 | 3 | 0 | 3 | 2023 | 3 | 9 | 30 | 5 | 273 | 30 |
| 3 | 4 | 1 | 4 | 2023 | 4 | 12 | 31 | 6 | 365 | 31 |
| 4 | 5 | 2 | 5 | 2024 | 1 | 3 | 31 | 6 | 91 | 31 |
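The string example relies on fmt being a strptime-style format. A hypothetical way to parse such a column, and to refuse mixed input as the text above requires, is sketched below with the standard library. How the library itself detects mixing is not documented here, so parse_column and its TypeError are assumptions.

```python
from datetime import datetime
from typing import List, Sequence, Union


def parse_column(values: Sequence[Union[str, datetime]], fmt: str) -> List[datetime]:
    """Hypothetical parser: a column must be all strings or all datetimes."""
    if all(isinstance(v, str) for v in values):
        # Strings are parsed with a strptime-style format such as "%Y-%m-%d".
        return [datetime.strptime(v, fmt) for v in values]
    if all(isinstance(v, datetime) for v in values):
        return list(values)
    # Mixed input is rejected instead of being guessed at.
    raise TypeError("column mixes strings and datetimes")


dates = parse_column(["2023-03-31", "2023-06-30"], "%Y-%m-%d")
print([d.month for d in dates])  # [3, 6]

try:
    parse_column(["2023-03-31", datetime(2023, 6, 30)], "%Y-%m-%d")
except TypeError as err:
    print(err)  # column mixes strings and datetimes
```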
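The date levels to extract can also be chosen explicitly via levels. Zero-variance levels are still dropped, which is why DateLevel.HOUR produces no columns here.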
```python
encoder = DatetimeEncoder(
    levels=[DateLevel.DAY, DateLevel.HOUR, DateLevel.MONTH], fmt=date_format
)
pd.DataFrame(encoder.fit_transform(X), columns=encoder.get_feature_names_out()).head()
```

| | days_day | quarters_day | quarters_month |
|---|---|---|---|
| 0 | 1 | 31 | 3 |
| 1 | 2 | 30 | 6 |
| 2 | 3 | 30 | 9 |
| 3 | 4 | 31 | 12 |
| 4 | 5 | 31 | 3 |
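The three columns above can be reproduced by hand from the first five rows of the string example. The sketch assumes that requested levels are kept in the order given, that single-valued levels are dropped per column, and that features are named column_level; these conventions are inferred from the output tables rather than taken from the library, and feature_names is a hypothetical helper working on stdlib datetime values.

```python
from datetime import datetime
from typing import Dict, List, Sequence


def feature_names(X: Dict[str, Sequence[datetime]], levels: Sequence[str]) -> List[str]:
    """Hypothetical reconstruction of the output column names.

    Keeps requested levels in the order given, drops those with a single
    unique value within a column, and names features "<column>_<level>".
    """
    names = []
    for col, dates in X.items():
        for lv in levels:
            if len({getattr(d, lv) for d in dates}) > 1:
                names.append(f"{col}_{lv}")
    return names


fmt = "%Y-%m-%d"
X = {
    "days": [datetime.strptime(s, fmt) for s in
             ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04", "2022-01-05"]],
    "quarters": [datetime.strptime(s, fmt) for s in
                 ["2023-03-31", "2023-06-30", "2023-09-30", "2023-12-31", "2024-03-31"]],
}
print(feature_names(X, ["day", "hour", "month"]))
# ['days_day', 'quarters_day', 'quarters_month']
```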