Introduction

Time should be a data type often used in data processing. In addition to the two data types of datetime64 and timedelta64 in Numpy, pandas also integrates functions in other python libraries such as scikits.timeseries .

Time classification

There are four time types in pandas:

  1. Date times: Date and time, with time zone. datetime.datetime similar to 06168f5c5ba20e in the standard library.
  2. Time deltas: Absolute duration, similar to datetime.timedelta
  3. Time spans: The time span defined by the time point and its associated frequency.
  4. Date offsets: The time calculated based on the calendar is similar to dateutil.relativedelta.relativedelta.

We use a table to represent:

typeScalar classArray classpandas data typeMain creation method
Date timesTimestampDatetimeIndexdatetime64[ns] or datetime64[ns, tz]to_datetime or date_range
Time deltasTimedeltaTimedeltaIndextimedelta64[ns]to_timedelta or timedelta_range
Time spansPeriodPeriodIndexperiod[freq]Period or period_range
Date offsetsDateOffsetNoneNoneDateOffset

Look at an example of use:

In [19]: pd.Series(range(3), index=pd.date_range("2000", freq="D", periods=3))
Out[19]: 
2000-01-01    0
2000-01-02    1
2000-01-03    2
Freq: D, dtype: int64

Take a look at the null value of the above data type:

In [24]: pd.Timestamp(pd.NaT)
Out[24]: NaT

In [25]: pd.Timedelta(pd.NaT)
Out[25]: NaT

In [26]: pd.Period(pd.NaT)
Out[26]: NaT

# Equality acts as np.nan would
In [27]: pd.NaT == pd.NaT
Out[27]: False

Timestamp

Timestamp is the most basic time type, we can create it like this:

In [28]: pd.Timestamp(datetime.datetime(2012, 5, 1))
Out[28]: Timestamp('2012-05-01 00:00:00')

In [29]: pd.Timestamp("2012-05-01")
Out[29]: Timestamp('2012-05-01 00:00:00')

In [30]: pd.Timestamp(2012, 5, 1)
Out[30]: Timestamp('2012-05-01 00:00:00')

DatetimeIndex

Timestamp as an index will automatically be converted to DatetimeIndex:

In [33]: dates = [
   ....:     pd.Timestamp("2012-05-01"),
   ....:     pd.Timestamp("2012-05-02"),
   ....:     pd.Timestamp("2012-05-03"),
   ....: ]
   ....: 

In [34]: ts = pd.Series(np.random.randn(3), dates)

In [35]: type(ts.index)
Out[35]: pandas.core.indexes.datetimes.DatetimeIndex

In [36]: ts.index
Out[36]: DatetimeIndex(['2012-05-01', '2012-05-02', '2012-05-03'], dtype='datetime64[ns]', freq=None)

In [37]: ts
Out[37]: 
2012-05-01    0.469112
2012-05-02   -0.282863
2012-05-03   -1.509059
dtype: float64

date_range and bdate_range

You can also use date_range to create a DatetimeIndex:

In [74]: start = datetime.datetime(2011, 1, 1)

In [75]: end = datetime.datetime(2012, 1, 1)

In [76]: index = pd.date_range(start, end)

In [77]: index
Out[77]: 
DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04',
               '2011-01-05', '2011-01-06', '2011-01-07', '2011-01-08',
               '2011-01-09', '2011-01-10',
               ...
               '2011-12-23', '2011-12-24', '2011-12-25', '2011-12-26',
               '2011-12-27', '2011-12-28', '2011-12-29', '2011-12-30',
               '2011-12-31', '2012-01-01'],
              dtype='datetime64[ns]', length=366, freq='D')

date_range is the calendar range, bdate_range is the working day range:

In [78]: index = pd.bdate_range(start, end)

In [79]: index
Out[79]: 
DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06',
               '2011-01-07', '2011-01-10', '2011-01-11', '2011-01-12',
               '2011-01-13', '2011-01-14',
               ...
               '2011-12-19', '2011-12-20', '2011-12-21', '2011-12-22',
               '2011-12-23', '2011-12-26', '2011-12-27', '2011-12-28',
               '2011-12-29', '2011-12-30'],
              dtype='datetime64[ns]', length=260, freq='B')

Both methods can take start , end , and periods parameters.

In [84]: pd.bdate_range(end=end, periods=20)
In [83]: pd.date_range(start, end, freq="W")
In [86]: pd.date_range("2018-01-01", "2018-01-05", periods=5)

origin

Using the origin parameter, you can modify the starting point of DatetimeIndex

In [67]: pd.to_datetime([1, 2, 3], unit="D", origin=pd.Timestamp("1960-01-01"))
Out[67]: DatetimeIndex(['1960-01-02', '1960-01-03', '1960-01-04'], dtype='datetime64[ns]', freq=None)

By default, origin='unix' , that is, the starting point is 1970-01-01 00:00:00 .

In [68]: pd.to_datetime([1, 2, 3], unit="D")
Out[68]: DatetimeIndex(['1970-01-02', '1970-01-03', '1970-01-04'], dtype='datetime64[ns]', freq=None)

format

Use the format parameter to format the time:

In [51]: pd.to_datetime("2010/11/12", format="%Y/%m/%d")
Out[51]: Timestamp('2010-11-12 00:00:00')

In [52]: pd.to_datetime("12-11-2010 00:00", format="%d-%m-%Y %H:%M")
Out[52]: Timestamp('2010-11-12 00:00:00')

Period

Period represents a time span, usually used with freq:

In [31]: pd.Period("2011-01")
Out[31]: Period('2011-01', 'M')

In [32]: pd.Period("2012-05", freq="D")
Out[32]: Period('2012-05-01', 'D')

Period can be calculated directly:

In [345]: p = pd.Period("2012", freq="A-DEC")

In [346]: p + 1
Out[346]: Period('2013', 'A-DEC')

In [347]: p - 3
Out[347]: Period('2009', 'A-DEC')

In [348]: p = pd.Period("2012-01", freq="2M")

In [349]: p + 2
Out[349]: Period('2012-05', '2M')

In [350]: p - 1
Out[350]: Period('2011-11', '2M')
Note that Period can only perform arithmetic operations with the same freq. Including offsets and timedelta
In [352]: p = pd.Period("2014-07-01 09:00", freq="H")

In [353]: p + pd.offsets.Hour(2)
Out[353]: Period('2014-07-01 11:00', 'H')

In [354]: p + datetime.timedelta(minutes=120)
Out[354]: Period('2014-07-01 11:00', 'H')

In [355]: p + np.timedelta64(7200, "s")
Out[355]: Period('2014-07-01 11:00', 'H')

Period as index can be automatically converted to PeriodIndex:

In [38]: periods = [pd.Period("2012-01"), pd.Period("2012-02"), pd.Period("2012-03")]

In [39]: ts = pd.Series(np.random.randn(3), periods)

In [40]: type(ts.index)
Out[40]: pandas.core.indexes.period.PeriodIndex

In [41]: ts.index
Out[41]: PeriodIndex(['2012-01', '2012-02', '2012-03'], dtype='period[M]', freq='M')

In [42]: ts
Out[42]: 
2012-01   -1.135632
2012-02    1.212112
2012-03   -0.173215
Freq: M, dtype: float64

The PeriodIndex can be created through the pd.period_range method:

In [359]: prng = pd.period_range("1/1/2011", "1/1/2012", freq="M")

In [360]: prng
Out[360]: 
PeriodIndex(['2011-01', '2011-02', '2011-03', '2011-04', '2011-05', '2011-06',
             '2011-07', '2011-08', '2011-09', '2011-10', '2011-11', '2011-12',
             '2012-01'],
            dtype='period[M]', freq='M')

It can also be created directly through PeriodIndex:

In [361]: pd.PeriodIndex(["2011-1", "2011-2", "2011-3"], freq="M")
Out[361]: PeriodIndex(['2011-01', '2011-02', '2011-03'], dtype='period[M]', freq='M')

DateOffset

DateOffset represents the frequency object. It is very similar to Timedelta in that it represents a duration, but has special calendar rules. For example, Timedelta must be 24 hours a day, while in DateOffset, depending on the daylight saving time, there may be 23, 24, or 25 hours a day.

# This particular day contains a day light savings time transition
In [144]: ts = pd.Timestamp("2016-10-30 00:00:00", tz="Europe/Helsinki")

# Respects absolute time
In [145]: ts + pd.Timedelta(days=1)
Out[145]: Timestamp('2016-10-30 23:00:00+0200', tz='Europe/Helsinki')

# Respects calendar time
In [146]: ts + pd.DateOffset(days=1)
Out[146]: Timestamp('2016-10-31 00:00:00+0200', tz='Europe/Helsinki')

In [147]: friday = pd.Timestamp("2018-01-05")

In [148]: friday.day_name()
Out[148]: 'Friday'

# Add 2 business days (Friday --> Tuesday)
In [149]: two_business_days = 2 * pd.offsets.BDay()

In [150]: two_business_days.apply(friday)
Out[150]: Timestamp('2018-01-09 00:00:00')

In [151]: friday + two_business_days
Out[151]: Timestamp('2018-01-09 00:00:00')

In [152]: (friday + two_business_days).day_name()
Out[152]: 'Tuesday'

DateOffsets and Frequency calculations are first off, take a look at the available Date Offset and its associated Frequency :

Date OffsetFrequency Stringdescribe
DateOffsetNoneGeneral offset class
BDay or BusinessDay'B'Working day
CDay or CustomBusinessDay'C'Custom working day
Week'W'A week
WeekOfMonth'WOM'The first few days of each month
LastWeekOfMonth'LWOM'The day of the last week of each month
MonthEnd'M'Calendar month end
MonthBegin'MS'Beginning of calendar month
BMonthEnd or BusinessMonthEnd'BM'End of business month
BMonthBegin or BusinessMonthBegin'BMS'Beginning of business month
CBMonthEnd or CustomBusinessMonthEnd'CBM'Customize the end of the business month
CBMonthBegin or CustomBusinessMonthBegin'CBMS'Customize the beginning of the business month
SemiMonthEnd'SM'15th day at the end of the calendar month
SemiMonthBegin'SMS'15th day of the beginning of the calendar month
QuarterEnd'Q'Calendar season end
QuarterBegin'QS'Beginning of calendar season
BQuarterEnd'BQEnd of working season
BQuarterBegin'BQS'Beginning of the working season
FY5253Quarter'REQ'Retail season (52-53 week)
YearEnd'A'Calendar year end
YearBegin'AS' or 'BYS'The beginning of the calendar
BYearEnd'BA'End of business year
BYearBegin'BAS'Beginning of business
FY5253'RE'Retail year (aka 52-53 week)
EasterNoneEaster Vacation
BusinessHour'BH'business hour
CustomBusinessHour'CBH'custom business hour
Day'D'Absolute time of day
Hour'H'One hour
Minute'T' or 'min'One minute
Second'S'one second
Milli'L' or 'ms'A subtle
Micro'U' or 'us'One millisecond
Nano'N'One nanosecond

DateOffset also has two methods rollforward() and rollback() to move the time:

In [153]: ts = pd.Timestamp("2018-01-06 00:00:00")

In [154]: ts.day_name()
Out[154]: 'Saturday'

# BusinessHour's valid offset dates are Monday through Friday
In [155]: offset = pd.offsets.BusinessHour(start="09:00")

# Bring the date to the closest offset date (Monday)
In [156]: offset.rollforward(ts)
Out[156]: Timestamp('2018-01-08 09:00:00')

# Date is brought to the closest offset date first and then the hour is added
In [157]: ts + offset
Out[157]: Timestamp('2018-01-08 10:00:00')

The above operation will automatically save the hour, minute and other information, if you want to set it to 00:00:00, you can call the normalize() method:

In [158]: ts = pd.Timestamp("2014-01-01 09:00")

In [159]: day = pd.offsets.Day()

In [160]: day.apply(ts)
Out[160]: Timestamp('2014-01-02 09:00:00')

In [161]: day.apply(ts).normalize()
Out[161]: Timestamp('2014-01-02 00:00:00')

In [162]: ts = pd.Timestamp("2014-01-01 22:00")

In [163]: hour = pd.offsets.Hour()

In [164]: hour.apply(ts)
Out[164]: Timestamp('2014-01-01 23:00:00')

In [165]: hour.apply(ts).normalize()
Out[165]: Timestamp('2014-01-01 00:00:00')

In [166]: hour.apply(pd.Timestamp("2014-01-01 23:30")).normalize()
Out[166]: Timestamp('2014-01-02 00:00:00')

As index

Time can be used as an index, and it has some convenient features when used as an index.

You can directly use time to get the corresponding data:

In [99]: ts["1/31/2011"]
Out[99]: 0.11920871129693428

In [100]: ts[datetime.datetime(2011, 12, 25):]
Out[100]: 
2011-12-30    0.56702
Freq: BM, dtype: float64

In [101]: ts["10/31/2011":"12/31/2011"]
Out[101]: 
2011-10-31    0.271860
2011-11-30   -0.424972
2011-12-30    0.567020
Freq: BM, dtype: float64

Get data for the whole year:

In [102]: ts["2011"]
Out[102]: 
2011-01-31    0.119209
2011-02-28   -1.044236
2011-03-31   -0.861849
2011-04-29   -2.104569
2011-05-31   -0.494929
2011-06-30    1.071804
2011-07-29    0.721555
2011-08-31   -0.706771
2011-09-30   -1.039575
2011-10-31    0.271860
2011-11-30   -0.424972
2011-12-30    0.567020
Freq: BM, dtype: float64

Get data for a certain month:

In [103]: ts["2011-6"]
Out[103]: 
2011-06-30    1.071804
Freq: BM, dtype: float64

DF can accept time as a parameter of loc:

In [105]: dft
Out[105]: 
                            A
2013-01-01 00:00:00  0.276232
2013-01-01 00:01:00 -1.087401
2013-01-01 00:02:00 -0.673690
2013-01-01 00:03:00  0.113648
2013-01-01 00:04:00 -1.478427
...                       ...
2013-03-11 10:35:00 -0.747967
2013-03-11 10:36:00 -0.034523
2013-03-11 10:37:00 -0.201754
2013-03-11 10:38:00 -1.509067
2013-03-11 10:39:00 -1.693043

[100000 rows x 1 columns]

In [106]: dft.loc["2013"]
Out[106]: 
                            A
2013-01-01 00:00:00  0.276232
2013-01-01 00:01:00 -1.087401
2013-01-01 00:02:00 -0.673690
2013-01-01 00:03:00  0.113648
2013-01-01 00:04:00 -1.478427
...                       ...
2013-03-11 10:35:00 -0.747967
2013-03-11 10:36:00 -0.034523
2013-03-11 10:37:00 -0.201754
2013-03-11 10:38:00 -1.509067
2013-03-11 10:39:00 -1.693043

[100000 rows x 1 columns]

Time slice:

In [107]: dft["2013-1":"2013-2"]
Out[107]: 
                            A
2013-01-01 00:00:00  0.276232
2013-01-01 00:01:00 -1.087401
2013-01-01 00:02:00 -0.673690
2013-01-01 00:03:00  0.113648
2013-01-01 00:04:00 -1.478427
...                       ...
2013-02-28 23:55:00  0.850929
2013-02-28 23:56:00  0.976712
2013-02-28 23:57:00 -2.693884
2013-02-28 23:58:00 -1.575535
2013-02-28 23:59:00 -1.573517

[84960 rows x 1 columns]

Slice and exact match

Consider the following Series object with a precision of minutes:

In [120]: series_minute = pd.Series(
   .....:     [1, 2, 3],
   .....:     pd.DatetimeIndex(
   .....:         ["2011-12-31 23:59:00", "2012-01-01 00:00:00", "2012-01-01 00:02:00"]
   .....:     ),
   .....: )
   .....: 

In [121]: series_minute.index.resolution
Out[121]: 'minute'

If the time precision is less than minutes, a Series object is returned:

In [122]: series_minute["2011-12-31 23"]
Out[122]: 
2011-12-31 23:59:00    1
dtype: int64

If the time accuracy is greater than minutes, a constant is returned:

In [123]: series_minute["2011-12-31 23:59"]
Out[123]: 1

In [124]: series_minute["2011-12-31 23:59:00"]
Out[124]: 1

Similarly, if the precision is seconds, an object will be returned if it is less than seconds, and a constant value will be returned if it is equal to seconds.

Time series operations

Shifting

Use the shift method to move the time series accordingly:

In [275]: ts = pd.Series(range(len(rng)), index=rng)

In [276]: ts = ts[:5]

In [277]: ts.shift(1)
Out[277]: 
2012-01-01    NaN
2012-01-02    0.0
2012-01-03    1.0
Freq: D, dtype: float64

By specifying freq, you can set the shift mode:

In [278]: ts.shift(5, freq="D")
Out[278]: 
2012-01-06    0
2012-01-07    1
2012-01-08    2
Freq: D, dtype: int64

In [279]: ts.shift(5, freq=pd.offsets.BDay())
Out[279]: 
2012-01-06    0
2012-01-09    1
2012-01-10    2
dtype: int64

In [280]: ts.shift(5, freq="BM")
Out[280]: 
2012-05-31    0
2012-05-31    1
2012-05-31    2
dtype: int64

Frequency conversion

The frequency of the time series can be converted by calling the asfreq method:

In [281]: dr = pd.date_range("1/1/2010", periods=3, freq=3 * pd.offsets.BDay())

In [282]: ts = pd.Series(np.random.randn(3), index=dr)

In [283]: ts
Out[283]: 
2010-01-01    1.494522
2010-01-06   -0.778425
2010-01-11   -0.253355
Freq: 3B, dtype: float64

In [284]: ts.asfreq(pd.offsets.BDay())
Out[284]: 
2010-01-01    1.494522
2010-01-04         NaN
2010-01-05         NaN
2010-01-06   -0.778425
2010-01-07         NaN
2010-01-08         NaN
2010-01-11   -0.253355
Freq: B, dtype: float64

asfreq can also specify the filling method after modifying the frequency:

In [285]: ts.asfreq(pd.offsets.BDay(), method="pad")
Out[285]: 
2010-01-01    1.494522
2010-01-04    1.494522
2010-01-05    1.494522
2010-01-06   -0.778425
2010-01-07   -0.778425
2010-01-08   -0.778425
2010-01-11   -0.253355
Freq: B, dtype: float64

Resampling

The given time series can be resampled by calling the resample method:

In [286]: rng = pd.date_range("1/1/2012", periods=100, freq="S")

In [287]: ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)

In [288]: ts.resample("5Min").sum()
Out[288]: 
2012-01-01    25103
Freq: 5T, dtype: int64

resample can accept various statistical methods, such as: sum , mean , std , sem , max , min , median , first , last , ohlc

In [289]: ts.resample("5Min").mean()
Out[289]: 
2012-01-01    251.03
Freq: 5T, dtype: float64

In [290]: ts.resample("5Min").ohlc()
Out[290]: 
            open  high  low  close
2012-01-01   308   460    9    205

In [291]: ts.resample("5Min").max()
Out[291]: 
2012-01-01    460
Freq: 5T, dtype: int64

This article has been included in http://www.flydean.com/15-python-pandas-time/

The most popular interpretation, the most profound dry goods, the most concise tutorial, and many tips you don't know are waiting for you to discover!

Welcome to pay attention to my official account: "Program those things", know technology, know you better!


flydean
890 声望437 粉丝

欢迎访问我的个人网站:www.flydean.com