Introduction
Time should be a data type often used in data processing. In addition to the two data types of datetime64 and timedelta64 in Numpy, pandas also integrates functions in other python libraries such as scikits.timeseries
.
Time classification
There are four time types in pandas:
- Date times: Date and time, with time zone.
datetime.datetime
similar to 06168f5c5ba20e in the standard library. - Time deltas: Absolute duration, similar to
datetime.timedelta
- Time spans: The time span defined by the time point and its associated frequency.
- Date offsets: The time calculated based on the calendar is similar to dateutil.relativedelta.relativedelta.
We use a table to represent:
type | Scalar class | Array class | pandas data type | Main creation method |
---|---|---|---|---|
Date times | Timestamp | DatetimeIndex | datetime64[ns] or datetime64[ns, tz] | to_datetime or date_range |
Time deltas | Timedelta | TimedeltaIndex | timedelta64[ns] | to_timedelta or timedelta_range |
Time spans | Period | PeriodIndex | period[freq] | Period or period_range |
Date offsets | DateOffset | None | None | DateOffset |
Look at an example of use:
In [19]: pd.Series(range(3), index=pd.date_range("2000", freq="D", periods=3))
Out[19]:
2000-01-01 0
2000-01-02 1
2000-01-03 2
Freq: D, dtype: int64
Take a look at the null value of the above data type:
In [24]: pd.Timestamp(pd.NaT)
Out[24]: NaT
In [25]: pd.Timedelta(pd.NaT)
Out[25]: NaT
In [26]: pd.Period(pd.NaT)
Out[26]: NaT
# Equality acts as np.nan would
In [27]: pd.NaT == pd.NaT
Out[27]: False
Timestamp
Timestamp is the most basic time type, we can create it like this:
In [28]: pd.Timestamp(datetime.datetime(2012, 5, 1))
Out[28]: Timestamp('2012-05-01 00:00:00')
In [29]: pd.Timestamp("2012-05-01")
Out[29]: Timestamp('2012-05-01 00:00:00')
In [30]: pd.Timestamp(2012, 5, 1)
Out[30]: Timestamp('2012-05-01 00:00:00')
DatetimeIndex
Timestamp as an index will automatically be converted to DatetimeIndex:
In [33]: dates = [
....: pd.Timestamp("2012-05-01"),
....: pd.Timestamp("2012-05-02"),
....: pd.Timestamp("2012-05-03"),
....: ]
....:
In [34]: ts = pd.Series(np.random.randn(3), dates)
In [35]: type(ts.index)
Out[35]: pandas.core.indexes.datetimes.DatetimeIndex
In [36]: ts.index
Out[36]: DatetimeIndex(['2012-05-01', '2012-05-02', '2012-05-03'], dtype='datetime64[ns]', freq=None)
In [37]: ts
Out[37]:
2012-05-01 0.469112
2012-05-02 -0.282863
2012-05-03 -1.509059
dtype: float64
date_range and bdate_range
You can also use date_range to create a DatetimeIndex:
In [74]: start = datetime.datetime(2011, 1, 1)
In [75]: end = datetime.datetime(2012, 1, 1)
In [76]: index = pd.date_range(start, end)
In [77]: index
Out[77]:
DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
'2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',
'2011-01-09', '2011-01-10',
...
'2011-12-23', '2011-12-24', '2011-12-25', '2011-12-26',
'2011-12-27', '2011-12-28', '2011-12-29', '2011-12-30',
'2011-12-31', '2012-01-01'],
dtype='datetime64[ns]', length=366, freq='D')
date_range
is the calendar range, bdate_range
is the working day range:
In [78]: index = pd.bdate_range(start, end)
In [79]: index
Out[79]:
DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
'2011-01-07', '2011-01-10', '2011-01-11', '2011-01-12',
'2011-01-13', '2011-01-14',
...
'2011-12-19', '2011-12-20', '2011-12-21', '2011-12-22',
'2011-12-23', '2011-12-26', '2011-12-27', '2011-12-28',
'2011-12-29', '2011-12-30'],
dtype='datetime64[ns]', length=260, freq='B')
Both methods can take start
, end
, and periods
parameters.
In [84]: pd.bdate_range(end=end, periods=20)
In [83]: pd.date_range(start, end, freq="W")
In [86]: pd.date_range("2018-01-01", "2018-01-05", periods=5)
origin
Using the origin
parameter, you can modify the starting point of DatetimeIndex
In [67]: pd.to_datetime([1, 2, 3], unit="D", origin=pd.Timestamp("1960-01-01"))
Out[67]: DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'], dtype='datetime64[ns]', freq=None)
By default, origin='unix'
, that is, the starting point is 1970-01-01 00:00:00
.
In [68]: pd.to_datetime([1, 2, 3], unit="D")
Out[68]: DatetimeIndex(['1970-01-02', '1970-01-03', '1970-01-04'], dtype='datetime64[ns]', freq=None)
format
Use the format parameter to format the time:
In [51]: pd.to_datetime("2010/11/12", format="%Y/%m/%d")
Out[51]: Timestamp('2010-11-12 00:00:00')
In [52]: pd.to_datetime("12-11-2010 00:00", format="%d-%m-%Y %H:%M")
Out[52]: Timestamp('2010-11-12 00:00:00')
Period
Period represents a time span, usually used with freq:
In [31]: pd.Period("2011-01")
Out[31]: Period('2011-01', 'M')
In [32]: pd.Period("2012-05", freq="D")
Out[32]: Period('2012-05-01', 'D')
Period can be calculated directly:
In [345]: p = pd.Period("2012", freq="A-DEC")
In [346]: p + 1
Out[346]: Period('2013', 'A-DEC')
In [347]: p - 3
Out[347]: Period('2009', 'A-DEC')
In [348]: p = pd.Period("2012-01", freq="2M")
In [349]: p + 2
Out[349]: Period('2012-05', '2M')
In [350]: p - 1
Out[350]: Period('2011-11', '2M')
Note that Period can only perform arithmetic operations with the same freq. Including offsets and timedelta
In [352]: p = pd.Period("2014-07-01 09:00", freq="H")
In [353]: p + pd.offsets.Hour(2)
Out[353]: Period('2014-07-01 11:00', 'H')
In [354]: p + datetime.timedelta(minutes=120)
Out[354]: Period('2014-07-01 11:00', 'H')
In [355]: p + np.timedelta64(7200, "s")
Out[355]: Period('2014-07-01 11:00', 'H')
Period as index can be automatically converted to PeriodIndex:
In [38]: periods = [pd.Period("2012-01"), pd.Period("2012-02"), pd.Period("2012-03")]
In [39]: ts = pd.Series(np.random.randn(3), periods)
In [40]: type(ts.index)
Out[40]: pandas.core.indexes.period.PeriodIndex
In [41]: ts.index
Out[41]: PeriodIndex(['2012-01', '2012-02', '2012-03'], dtype='period[M]', freq='M')
In [42]: ts
Out[42]:
2012-01 -1.135632
2012-02 1.212112
2012-03 -0.173215
Freq: M, dtype: float64
The PeriodIndex can be created through the pd.period_range method:
In [359]: prng = pd.period_range("1/1/2011", "1/1/2012", freq="M")
In [360]: prng
Out[360]:
PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06',
'2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12',
'2012-01'],
dtype='period[M]', freq='M')
It can also be created directly through PeriodIndex:
In [361]: pd.PeriodIndex(["2011-1", "2011-2", "2011-3"], freq="M")
Out[361]: PeriodIndex(['2011-01', '2011-02', '2011-03'], dtype='period[M]', freq='M')
DateOffset
DateOffset represents the frequency object. It is very similar to Timedelta in that it represents a duration, but has special calendar rules. For example, Timedelta must be 24 hours a day, while in DateOffset, depending on the daylight saving time, there may be 23, 24, or 25 hours a day.
# This particular day contains a day light savings time transition
In [144]: ts = pd.Timestamp("2016-10-30 00:00:00", tz="Europe/Helsinki")
# Respects absolute time
In [145]: ts + pd.Timedelta(days=1)
Out[145]: Timestamp('2016-10-30 23:00:00+0200', tz='Europe/Helsinki')
# Respects calendar time
In [146]: ts + pd.DateOffset(days=1)
Out[146]: Timestamp('2016-10-31 00:00:00+0200', tz='Europe/Helsinki')
In [147]: friday = pd.Timestamp("2018-01-05")
In [148]: friday.day_name()
Out[148]: 'Friday'
# Add 2 business days (Friday --> Tuesday)
In [149]: two_business_days = 2 * pd.offsets.BDay()
In [150]: two_business_days.apply(friday)
Out[150]: Timestamp('2018-01-09 00:00:00')
In [151]: friday + two_business_days
Out[151]: Timestamp('2018-01-09 00:00:00')
In [152]: (friday + two_business_days).day_name()
Out[152]: 'Tuesday'
DateOffsets and Frequency calculations are first off, take a look at the available Date Offset and its associated Frequency :
Date Offset | Frequency String | describe |
---|---|---|
DateOffset | None | General offset class |
BDay or BusinessDay | 'B' | Working day |
CDay or CustomBusinessDay | 'C' | Custom working day |
Week | 'W' | A week |
WeekOfMonth | 'WOM' | The first few days of each month |
LastWeekOfMonth | 'LWOM' | The day of the last week of each month |
MonthEnd | 'M' | Calendar month end |
MonthBegin | 'MS' | Beginning of calendar month |
BMonthEnd or BusinessMonthEnd | 'BM' | End of business month |
BMonthBegin or BusinessMonthBegin | 'BMS' | Beginning of business month |
CBMonthEnd or CustomBusinessMonthEnd | 'CBM' | Customize the end of the business month |
CBMonthBegin or CustomBusinessMonthBegin | 'CBMS' | Customize the beginning of the business month |
SemiMonthEnd | 'SM' | 15th day at the end of the calendar month |
SemiMonthBegin | 'SMS' | 15th day of the beginning of the calendar month |
QuarterEnd | 'Q' | Calendar season end |
QuarterBegin | 'QS' | Beginning of calendar season |
BQuarterEnd | 'BQ | End of working season |
BQuarterBegin | 'BQS' | Beginning of the working season |
FY5253Quarter | 'REQ' | Retail season (52-53 week) |
YearEnd | 'A' | Calendar year end |
YearBegin | 'AS' or 'BYS' | The beginning of the calendar |
BYearEnd | 'BA' | End of business year |
BYearBegin | 'BAS' | Beginning of business |
FY5253 | 'RE' | Retail year (aka 52-53 week) |
Easter | None | Easter Vacation |
BusinessHour | 'BH' | business hour |
CustomBusinessHour | 'CBH' | custom business hour |
Day | 'D' | Absolute time of day |
Hour | 'H' | One hour |
Minute | 'T' or 'min' | One minute |
Second | 'S' | one second |
Milli | 'L' or 'ms' | A subtle |
Micro | 'U' or 'us' | One millisecond |
Nano | 'N' | One nanosecond |
DateOffset also has two methods rollforward()
and rollback()
to move the time:
In [153]: ts = pd.Timestamp("2018-01-06 00:00:00")
In [154]: ts.day_name()
Out[154]: 'Saturday'
# BusinessHour's valid offset dates are Monday through Friday
In [155]: offset = pd.offsets.BusinessHour(start="09:00")
# Bring the date to the closest offset date (Monday)
In [156]: offset.rollforward(ts)
Out[156]: Timestamp('2018-01-08 09:00:00')
# Date is brought to the closest offset date first and then the hour is added
In [157]: ts + offset
Out[157]: Timestamp('2018-01-08 10:00:00')
The above operation will automatically save the hour, minute and other information, if you want to set it to 00:00:00, you can call the normalize() method:
In [158]: ts = pd.Timestamp("2014-01-01 09:00")
In [159]: day = pd.offsets.Day()
In [160]: day.apply(ts)
Out[160]: Timestamp('2014-01-02 09:00:00')
In [161]: day.apply(ts).normalize()
Out[161]: Timestamp('2014-01-02 00:00:00')
In [162]: ts = pd.Timestamp("2014-01-01 22:00")
In [163]: hour = pd.offsets.Hour()
In [164]: hour.apply(ts)
Out[164]: Timestamp('2014-01-01 23:00:00')
In [165]: hour.apply(ts).normalize()
Out[165]: Timestamp('2014-01-01 00:00:00')
In [166]: hour.apply(pd.Timestamp("2014-01-01 23:30")).normalize()
Out[166]: Timestamp('2014-01-02 00:00:00')
As index
Time can be used as an index, and it has some convenient features when used as an index.
You can directly use time to get the corresponding data:
In [99]: ts["1/31/2011"]
Out[99]: 0.11920871129693428
In [100]: ts[datetime.datetime(2011, 12, 25):]
Out[100]:
2011-12-30 0.56702
Freq: BM, dtype: float64
In [101]: ts["10/31/2011":"12/31/2011"]
Out[101]:
2011-10-31 0.271860
2011-11-30 -0.424972
2011-12-30 0.567020
Freq: BM, dtype: float64
Get data for the whole year:
In [102]: ts["2011"]
Out[102]:
2011-01-31 0.119209
2011-02-28 -1.044236
2011-03-31 -0.861849
2011-04-29 -2.104569
2011-05-31 -0.494929
2011-06-30 1.071804
2011-07-29 0.721555
2011-08-31 -0.706771
2011-09-30 -1.039575
2011-10-31 0.271860
2011-11-30 -0.424972
2011-12-30 0.567020
Freq: BM, dtype: float64
Get data for a certain month:
In [103]: ts["2011-6"]
Out[103]:
2011-06-30 1.071804
Freq: BM, dtype: float64
DF can accept time as a parameter of loc:
In [105]: dft
Out[105]:
A
2013-01-01 00:00:00 0.276232
2013-01-01 00:01:00 -1.087401
2013-01-01 00:02:00 -0.673690
2013-01-01 00:03:00 0.113648
2013-01-01 00:04:00 -1.478427
... ...
2013-03-11 10:35:00 -0.747967
2013-03-11 10:36:00 -0.034523
2013-03-11 10:37:00 -0.201754
2013-03-11 10:38:00 -1.509067
2013-03-11 10:39:00 -1.693043
[100000 rows x 1 columns]
In [106]: dft.loc["2013"]
Out[106]:
A
2013-01-01 00:00:00 0.276232
2013-01-01 00:01:00 -1.087401
2013-01-01 00:02:00 -0.673690
2013-01-01 00:03:00 0.113648
2013-01-01 00:04:00 -1.478427
... ...
2013-03-11 10:35:00 -0.747967
2013-03-11 10:36:00 -0.034523
2013-03-11 10:37:00 -0.201754
2013-03-11 10:38:00 -1.509067
2013-03-11 10:39:00 -1.693043
[100000 rows x 1 columns]
Time slice:
In [107]: dft["2013-1":"2013-2"]
Out[107]:
A
2013-01-01 00:00:00 0.276232
2013-01-01 00:01:00 -1.087401
2013-01-01 00:02:00 -0.673690
2013-01-01 00:03:00 0.113648
2013-01-01 00:04:00 -1.478427
... ...
2013-02-28 23:55:00 0.850929
2013-02-28 23:56:00 0.976712
2013-02-28 23:57:00 -2.693884
2013-02-28 23:58:00 -1.575535
2013-02-28 23:59:00 -1.573517
[84960 rows x 1 columns]
Slice and exact match
Consider the following Series object with a precision of minutes:
In [120]: series_minute = pd.Series(
.....: [1, 2, 3],
.....: pd.DatetimeIndex(
.....: ["2011-12-31 23:59:00", "2012-01-01 00:00:00", "2012-01-01 00:02:00"]
.....: ),
.....: )
.....:
In [121]: series_minute.index.resolution
Out[121]: 'minute'
If the time precision is less than minutes, a Series object is returned:
In [122]: series_minute["2011-12-31 23"]
Out[122]:
2011-12-31 23:59:00 1
dtype: int64
If the time accuracy is greater than minutes, a constant is returned:
In [123]: series_minute["2011-12-31 23:59"]
Out[123]: 1
In [124]: series_minute["2011-12-31 23:59:00"]
Out[124]: 1
Similarly, if the precision is seconds, an object will be returned if it is less than seconds, and a constant value will be returned if it is equal to seconds.
Time series operations
Shifting
Use the shift method to move the time series accordingly:
In [275]: ts = pd.Series(range(len(rng)), index=rng)
In [276]: ts = ts[:5]
In [277]: ts.shift(1)
Out[277]:
2012-01-01 NaN
2012-01-02 0.0
2012-01-03 1.0
Freq: D, dtype: float64
By specifying freq, you can set the shift mode:
In [278]: ts.shift(5, freq="D")
Out[278]:
2012-01-06 0
2012-01-07 1
2012-01-08 2
Freq: D, dtype: int64
In [279]: ts.shift(5, freq=pd.offsets.BDay())
Out[279]:
2012-01-06 0
2012-01-09 1
2012-01-10 2
dtype: int64
In [280]: ts.shift(5, freq="BM")
Out[280]:
2012-05-31 0
2012-05-31 1
2012-05-31 2
dtype: int64
Frequency conversion
The frequency of the time series can be converted by calling the asfreq method:
In [281]: dr = pd.date_range("1/1/2010", periods=3, freq=3 * pd.offsets.BDay())
In [282]: ts = pd.Series(np.random.randn(3), index=dr)
In [283]: ts
Out[283]:
2010-01-01 1.494522
2010-01-06 -0.778425
2010-01-11 -0.253355
Freq: 3B, dtype: float64
In [284]: ts.asfreq(pd.offsets.BDay())
Out[284]:
2010-01-01 1.494522
2010-01-04 NaN
2010-01-05 NaN
2010-01-06 -0.778425
2010-01-07 NaN
2010-01-08 NaN
2010-01-11 -0.253355
Freq: B, dtype: float64
asfreq can also specify the filling method after modifying the frequency:
In [285]: ts.asfreq(pd.offsets.BDay(), method="pad")
Out[285]:
2010-01-01 1.494522
2010-01-04 1.494522
2010-01-05 1.494522
2010-01-06 -0.778425
2010-01-07 -0.778425
2010-01-08 -0.778425
2010-01-11 -0.253355
Freq: B, dtype: float64
Resampling
The given time series can be resampled by calling the resample method:
In [286]: rng = pd.date_range("1/1/2012", periods=100, freq="S")
In [287]: ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
In [288]: ts.resample("5Min").sum()
Out[288]:
2012-01-01 25103
Freq: 5T, dtype: int64
resample can accept various statistical methods, such as: sum
, mean
, std
, sem
, max
, min
, median
, first
, last
, ohlc
In [289]: ts.resample("5Min").mean()
Out[289]:
2012-01-01 251.03
Freq: 5T, dtype: float64
In [290]: ts.resample("5Min").ohlc()
Out[290]:
open high low close
2012-01-01 308 460 9 205
In [291]: ts.resample("5Min").max()
Out[291]:
2012-01-01 460
Freq: 5T, dtype: int64
This article has been included in http://www.flydean.com/15-python-pandas-time/
The most popular interpretation, the most profound dry goods, the most concise tutorial, and many tips you don't know are waiting for you to discover!
Welcome to pay attention to my official account: "Program those things", know technology, know you better!
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。