```python
from enum import Enum


class DateLevel(Enum):
    """An enum representing different date levels."""

    YEAR = "year"
    QUARTER = "quarter"
    MONTH = "month"
    DAY = "day"
    HOUR = "hour"
    MINUTE = "minute"
    SECOND = "second"
    MICROSECOND = "microsecond"
    NANOSECOND = "nanosecond"
    WEEKDAY = "weekday"
    DAYOFYEAR = "dayofyear"
    DAYSINMONTH = "daysinmonth"
```
Datetime preprocessors
DateLevel
DateLevel (value, names=None, module=None, qualname=None, type=None, start=1)
An enum representing different date levels.
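The member values coincide with attribute names of the pandas `Series.dt` accessor (`year`, `quarter`, ..., `dayofyear`, `daysinmonth`), which hints at how each level can be computed. The sketch below is a stdlib-only illustration, not this library's code. It assumes each level is read from a single `datetime`; the helper `extract_level` is hypothetical, and because `datetime` only resolves microseconds, `NANOSECOND` is taken as 0 here.

```python
import calendar
from datetime import datetime
from enum import Enum


class DateLevel(Enum):
    """Copy of the documented enum so this sketch is self-contained."""

    YEAR = "year"
    QUARTER = "quarter"
    MONTH = "month"
    DAY = "day"
    HOUR = "hour"
    MINUTE = "minute"
    SECOND = "second"
    MICROSECOND = "microsecond"
    NANOSECOND = "nanosecond"
    WEEKDAY = "weekday"
    DAYOFYEAR = "dayofyear"
    DAYSINMONTH = "daysinmonth"


def extract_level(dt: datetime, level: DateLevel) -> int:
    """Hypothetical helper: return the integer feature for one date level."""
    if level is DateLevel.QUARTER:
        return (dt.month - 1) // 3 + 1
    if level is DateLevel.NANOSECOND:
        return 0  # stdlib datetime has no sub-microsecond resolution
    if level is DateLevel.WEEKDAY:
        return dt.weekday()  # Monday=0 ... Sunday=6, the same convention as pandas
    if level is DateLevel.DAYOFYEAR:
        return dt.timetuple().tm_yday
    if level is DateLevel.DAYSINMONTH:
        return calendar.monthrange(dt.year, dt.month)[1]
    # year, month, day, hour, minute, second, microsecond are datetime attributes.
    return getattr(dt, level.value)


ts = datetime(2024, 3, 31, 12, 30)
print({level.value: extract_level(ts, level) for level in DateLevel})
```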
DatetimeEncoder
DatetimeEncoder (levels:Optional[Sequence[DateLevel]]=None, fmt:Optional[str]=None)
An encoder for datetime columns that outputs integer features.
levels is a list of DateLevel members that defines which date features to extract; for example, [DateLevel.HOUR, DateLevel.MINUTE] extracts hours and minutes. If levels is left at its default of None, all available features are extracted initially, but zero-variance features are dropped (for example, seconds when the dates have no seconds component).
| | Type | Default | Details |
|---|---|---|---|
| levels | Optional[Sequence[DateLevel]] | None | Date features to extract. |
| fmt | Optional[str] | None | Date format used to parse inputs that are not datetime-like objects (for example, strings). Follows standard pandas/stdlib formatting, for example '%Y-%m-%d %H:%M:%S'. |
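The default behaviour described above (extract every level, then drop any level with a single unique value) and the role of fmt can be sketched with the standard library. This is an illustration under assumptions, not the encoder's implementation: `select_levels` and `EXTRACTORS` are hypothetical names, only a subset of levels is covered, levels are plain strings rather than DateLevel members, and strings are parsed with `datetime.strptime` as a stand-in for whatever parser the encoder uses.

```python
from datetime import datetime
from typing import Callable, Dict, List, Optional, Sequence, Union

# Hypothetical subset of level extractors (the DateLevel enum lists the full set).
EXTRACTORS: Dict[str, Callable[[datetime], int]] = {
    "year": lambda d: d.year,
    "month": lambda d: d.month,
    "day": lambda d: d.day,
    "hour": lambda d: d.hour,
    "weekday": lambda d: d.weekday(),
    "dayofyear": lambda d: d.timetuple().tm_yday,
}


def select_levels(
    values: Sequence[Union[str, datetime]],
    levels: Optional[Sequence[str]] = None,
    fmt: Optional[str] = None,
) -> List[str]:
    """Return the requested levels (all by default) whose values actually vary."""
    dates = []
    for v in values:
        if isinstance(v, str):
            if fmt is None:
                raise ValueError("fmt is required for string inputs in this sketch")
            v = datetime.strptime(v, fmt)  # parse with the user-supplied format
        dates.append(v)
    candidates = list(levels) if levels is not None else list(EXTRACTORS)
    # Keep a level only if it takes more than one distinct value (non-zero variance).
    return [
        name for name in candidates
        if len({EXTRACTORS[name](d) for d in dates}) > 1
    ]


days = ["2022-01-01", "2022-01-02", "2022-01-03"]
print(select_levels(days, fmt="%Y-%m-%d"))
```

With these three dates, year, month and hour are constant, so only day, weekday and dayofyear remain.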
DatetimeEncoder.fit
DatetimeEncoder.fit (X:Union[pandas.core.frame.DataFrame,numpy.ndarray,List], y=None)
Fit the DatetimeEncoder.
| | Type | Default | Details |
|---|---|---|---|
| X | Union[pd.DataFrame, np.ndarray, List] | | Datetime-like features. |
| y | NoneType | None | Unused. |
| Returns | DatetimeEncoder | | Fitted DatetimeEncoder. |
After fitting, the categories of each feature are held in the categories_ attribute.
DatetimeEncoder.transform
DatetimeEncoder.transform (X:Union[pandas.core.frame.DataFrame,numpy.ndarray,List])
Apply the transformation. Zero-variance features detected during DatetimeEncoder.fit are ignored.
While this transformer is generally stateless, DatetimeEncoder.fit checks whether any of the extracted features have zero variance (only one unique value) and marks those levels to be ignored during DatetimeEncoder.transform.
| | Type | Details |
|---|---|---|
| X | Union[pd.DataFrame, np.ndarray, List] | The data to encode. |
| Returns | np.ndarray | Transformed input. |
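The split between a stateful fit and an otherwise stateless transform can be illustrated with a small pure-Python class for a single column. Everything here is an assumption for illustration: the class name `SketchDatetimeEncoder`, the restricted set of levels, the `levels_` attribute, and the `prefix_level` feature-name scheme (modelled on column names such as days_day that appear on this page).

```python
from datetime import datetime, timedelta
from typing import List, Sequence

# Hypothetical subset of levels used by this sketch.
LEVELS = {
    "day": lambda d: d.day,
    "hour": lambda d: d.hour,
    "weekday": lambda d: d.weekday(),
    "dayofyear": lambda d: d.timetuple().tm_yday,
}


class SketchDatetimeEncoder:
    """Hypothetical single-column encoder: fit only learns which levels to keep."""

    def __init__(self, prefix: str):
        self.prefix = prefix

    def fit(self, dates: Sequence[datetime], y=None) -> "SketchDatetimeEncoder":
        # The only learned state: levels with more than one value in the training data.
        self.levels_ = [
            name for name, get in LEVELS.items()
            if len({get(d) for d in dates}) > 1
        ]
        return self

    def transform(self, dates: Sequence[datetime]) -> List[List[int]]:
        # Levels dropped during fit stay dropped, even if they vary in new data.
        return [[LEVELS[name](d) for name in self.levels_] for d in dates]

    def get_feature_names_out(self) -> List[str]:
        return [f"{self.prefix}_{name}" for name in self.levels_]


train = [datetime(2022, 1, 1) + timedelta(days=i) for i in range(3)]
enc = SketchDatetimeEncoder("days").fit(train)
print(enc.get_feature_names_out())
print(enc.transform([datetime(2022, 1, 1, 15)]))
```

Because every training timestamp is at midnight, hour is dropped during fit, so transforming a 15:00 timestamp still yields only day, weekday and dayofyear.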
TransformerMixin.fit_transform
TransformerMixin.fit_transform (X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
| | Type | Default | Details |
|---|---|---|---|
| X | array-like of shape (n_samples, n_features) | | Input samples. |
| y | NoneType | None | Target values (None for unsupervised transformations). |
| fit_params | | | Additional fit parameters. |
| Returns | ndarray of shape (n_samples, n_features_new) | | Transformed array. |
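fit_transform is inherited from scikit-learn's TransformerMixin; for a transformer whose fit returns self, its default behaviour is roughly equivalent to calling fit and then transform on the same X. The toy classes below (`MiniTransformerMixin`, `YearOffsetEncoder`) are hypothetical and only illustrate that contract without depending on scikit-learn.

```python
from datetime import datetime
from typing import Any, List, Optional


class MiniTransformerMixin:
    """Toy stand-in for the fit-then-transform contract of TransformerMixin."""

    # Subclasses must provide fit (returning self) and transform.
    def fit_transform(self, X: Any, y: Optional[Any] = None, **fit_params: Any) -> Any:
        return self.fit(X, y, **fit_params).transform(X)


class YearOffsetEncoder(MiniTransformerMixin):
    """Hypothetical transformer: years relative to the earliest year seen in fit."""

    def fit(self, X: List[datetime], y: Optional[Any] = None) -> "YearOffsetEncoder":
        self.min_year_ = min(d.year for d in X)
        return self

    def transform(self, X: List[datetime]) -> List[int]:
        return [d.year - self.min_year_ for d in X]


enc = YearOffsetEncoder()
print(enc.fit_transform([datetime(2023, 3, 31), datetime(2024, 3, 31)]))  # [0, 1]
```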
```python
import pandas as pd

X = pd.DataFrame(
    {
        "hours": pd.date_range(start="2022-01-01", freq="H", periods=25),
        "days": pd.date_range(start="2022-01-01", freq="D", periods=25),
    }
)
encoder = DatetimeEncoder()
pd.DataFrame(encoder.fit_transform(X), columns=encoder.get_feature_names_out()).head()
```
| | hours_day | hours_hour | hours_weekday | hours_dayofyear | days_day | days_weekday | days_dayofyear |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 5 | 1 | 1 | 5 | 1 |
| 1 | 1 | 1 | 5 | 1 | 2 | 6 | 2 |
| 2 | 1 | 2 | 5 | 1 | 3 | 0 | 3 |
| 3 | 1 | 3 | 5 | 1 | 4 | 1 | 4 |
| 4 | 1 | 4 | 5 | 1 | 5 | 2 | 5 |
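The columns in this output can be checked by hand. The stdlib sketch below rebuilds the two inputs as plain `datetime` lists (an assumption standing in for `pd.date_range`) and, for a subset of levels, keeps those with more than one distinct value; the helper `kept_levels` is hypothetical. 2022-01-01 is a Saturday, hence weekday 5 with Monday as 0, and year, quarter and month each take a single value in both columns, which is why they do not appear.

```python
from datetime import datetime, timedelta

start = datetime(2022, 1, 1)
# Stand-ins for the two pd.date_range columns: 25 hourly and 25 daily timestamps.
hours = [start + timedelta(hours=i) for i in range(25)]
days = [start + timedelta(days=i) for i in range(25)]

getters = {
    "year": lambda d: d.year,
    "quarter": lambda d: (d.month - 1) // 3 + 1,
    "month": lambda d: d.month,
    "day": lambda d: d.day,
    "hour": lambda d: d.hour,
    "weekday": lambda d: d.weekday(),
    "dayofyear": lambda d: d.timetuple().tm_yday,
}


def kept_levels(values):
    """Hypothetical helper: levels that take more than one distinct value."""
    return [name for name, get in getters.items() if len({get(v) for v in values}) > 1]


print("hours", kept_levels(hours))  # hours ['day', 'hour', 'weekday', 'dayofyear']
print("days", kept_levels(days))    # days ['day', 'weekday', 'dayofyear']
```

The 25th hourly timestamp falls on 2022-01-02, which is why hours_day and hours_dayofyear survive even though the first five rows all show 1.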
Dates can also be given as strings (parsed using fmt), but datetime objects and strings cannot be mixed in the same input.
```python
date_format = "%Y-%m-%d"
X = pd.DataFrame(
    {
        "days": pd.date_range(start="2022-01-01", freq="D", periods=25).strftime(
            date_format
        ),
        "quarters": pd.date_range(start="2023-01-01", freq="Q", periods=25).strftime(
            date_format
        ),
    }
)
encoder = DatetimeEncoder(fmt=date_format)
pd.DataFrame(encoder.fit_transform(X), columns=encoder.get_feature_names_out()).head()
```
| | days_day | days_weekday | days_dayofyear | quarters_year | quarters_quarter | quarters_month | quarters_day | quarters_weekday | quarters_dayofyear | quarters_daysinmonth |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 5 | 1 | 2023 | 1 | 3 | 31 | 4 | 90 | 31 |
| 1 | 2 | 6 | 2 | 2023 | 2 | 6 | 30 | 4 | 181 | 30 |
| 2 | 3 | 0 | 3 | 2023 | 3 | 9 | 30 | 5 | 273 | 30 |
| 3 | 4 | 1 | 4 | 2023 | 4 | 12 | 31 | 6 | 365 | 31 |
| 4 | 5 | 2 | 5 | 2024 | 1 | 3 | 31 | 6 | 91 | 31 |
The date levels to extract can also be chosen explicitly with the levels argument:
```python
encoder = DatetimeEncoder(
    levels=[DateLevel.DAY, DateLevel.HOUR, DateLevel.MONTH], fmt=date_format
)
pd.DataFrame(encoder.fit_transform(X), columns=encoder.get_feature_names_out()).head()
```
| | days_day | quarters_day | quarters_month |
|---|---|---|---|
| 0 | 1 | 31 | 3 |
| 1 | 2 | 30 | 6 |
| 2 | 3 | 30 | 9 |
| 3 | 4 | 31 | 12 |
| 4 | 5 | 31 | 3 |
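Although DateLevel.HOUR is requested, no hour column appears: strings parsed with "%Y-%m-%d" carry no time of day, so every hour is 0 and the level has zero variance. Likewise the 25 daily dates all fall in January, so days_month is dropped. The stdlib check below assumes `datetime.strptime` parsing as a stand-in for the encoder's own parser; `quarter_end` and `varying` are hypothetical helpers, with `quarter_end` mirroring pandas' freq="Q" (quarters ending in March, June, September and December).

```python
import calendar
from datetime import datetime, timedelta

date_format = "%Y-%m-%d"

# The "days" column: 25 consecutive dates as strings.
day_strings = [
    (datetime(2022, 1, 1) + timedelta(days=i)).strftime(date_format) for i in range(25)
]


def quarter_end(i: int) -> datetime:
    """i-th quarter end counting from 2023-03-31."""
    year, month = 2023 + i // 4, 3 * (i % 4 + 1)
    return datetime(year, month, calendar.monthrange(year, month)[1])


quarter_strings = [quarter_end(i).strftime(date_format) for i in range(25)]

# Only the three requested levels, in the order they were requested.
requested = {"day": lambda d: d.day, "hour": lambda d: d.hour, "month": lambda d: d.month}


def varying(strings):
    """Requested levels that take more than one value after parsing."""
    dates = [datetime.strptime(s, date_format) for s in strings]
    return [name for name, get in requested.items() if len({get(d) for d in dates}) > 1]


print(varying(day_strings))      # ['day']
print(varying(quarter_strings))  # ['day', 'month']
```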