Rolling pivot table in Python
I have a df (a sample is pasted at the end of this post). I am looking to find which tradePrice had the most tradeVolume over the last 10 minutes, or any other rolling period.
Here is the pivot table built in Excel from the attached sample data:
                                minute
tradePrice Data                     0     1    10  Total Result
12548      Sum - tradeVolume                    3             3
           Count - tradePrice                   2             2
12548.5    Sum - tradeVolume                    1             1
           Count - tradePrice                   1             1
12549      Sum - tradeVolume                    1             1
           Count - tradePrice                   1             1
12549.5    Sum - tradeVolume                   95            95
           Count - tradePrice                   5             5
12550      Sum - tradeVolume                    6             6
           Count - tradePrice                   4             4
12559      Sum - tradeVolume       93                        93
           Count - tradePrice       1                         1
12559.5    Sum - tradeVolume              1                   1
           Count - tradePrice             1                   1
12560      Sum - tradeVolume              5                   5
           Count - tradePrice             4                   4
12560.5    Sum - tradeVolume        3     5                   8
           Count - tradePrice       3     2                   5
12561      Sum - tradeVolume        4     5                   9
           Count - tradePrice       2     3                   5
12561.5    Sum - tradeVolume        3     2                   5
           Count - tradePrice       3     1                   4
12562      Sum - tradeVolume        9     7                  16
           Count - tradePrice       8     1                   9
12562.5    Sum - tradeVolume        6     2                   8
           Count - tradePrice       2     2                   4
12563      Sum - tradeVolume        2                         2
           Count - tradePrice       1                         1
Total      Sum - tradeVolume      120    27   106           253
Total      Count - tradePrice      20    14    13            47
The result needs to be a df, something like this (searching for the price with the most volume traded):
              Price    Volume
02:00:00 AM   12559        93
02:01:00 AM   12562         7
02:10:00 AM   12549.5      95
In order to get 1-minute results I grouped the data and applied the following function:
import pandas as pd

def f(x):  # function to find the POC price and volume
    a = x['tradePrice'].value_counts().index[0]           # most frequently traded price in the group
    b = x.loc[x['tradePrice'] == a, 'tradeVolume'].sum()  # total volume traded at that price
    return pd.Series([a, b], ['POC_Price', 'POC_Volume'])

groupbytime = str(Time) + "min"  # Time is defined elsewhere; there is a column with this name (e.g. "1min")
groups = df.groupby(groupbytime, as_index=True)
df_POC = groups.apply(f)  # apply the POC function to each group
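Note: the same 1-minute grouping can also be expressed straight off the dateTime column with pd.Grouper, without the pre-computed label column (just a sketch of an equivalent call, assuming dateTime is a datetime64 column):
groups = df.groupby(pd.Grouper(key='dateTime', freq='1min'))  # equivalent 1-minute bins
df_POC = groups.apply(f)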
My question is: how can I get the same result per rolling time period (which cannot be less than 1 minute)? For example, the expected result (the price traded with the max volume) for the last 10 minutes is:
              Price    Volume
02:10:00 AM   12549.5      95
Thanks in advance!
The sample data:
dateTime tradePrice tradeVolume 1min time_of_day_10 time_of_day_30 date hour minute
0 2017-09-19 02:00:04 12559 93 2017-09-19 02:00:00 02:00:00 02:00:00 2017-09-19 2 0
49 2017-09-19 02:00:11 12562 1 2017-09-19 02:00:00 02:00:00 02:00:00 2017-09-19 2 0
50 2017-09-19 02:00:12 12563 2 2017-09-19 02:00:00 02:00:00 02:00:00 2017-09-19 2 0
51 2017-09-19 02:00:12 12562 1 2017-09-19 02:00:00 02:00:00 02:00:00 2017-09-19 2 0
122 2017-09-19 02:00:34 12561.5 1 2017-09-19 02:00:00 02:00:00 02:00:00 2017-09-19 2 0
123 2017-09-19 02:00:34 12562 1 2017-09-19 02:00:00 02:00:00 02:00:00 2017-09-19 2 0
127 2017-09-19 02:00:34 12562 1 2017-09-19 02:00:00 02:00:00 02:00:00 2017-09-19 2 0
129 2017-09-19 02:00:35 12561 2 2017-09-19 02:00:00 02:00:00 02:00:00 2017-09-19 2 0
130 2017-09-19 02:00:35 12560.5 1 2017-09-19 02:00:00 02:00:00 02:00:00 2017-09-19 2 0
131 2017-09-19 02:00:35 12561.5 1 2017-09-19 02:00:00 02:00:00 02:00:00 2017-09-19 2 0
135 2017-09-19 02:00:39 12562 1 2017-09-19 02:00:00 02:00:00 02:00:00 2017-09-19 2 0
136 2017-09-19 02:00:39 12562 1 2017-09-19 02:00:00 02:00:00 02:00:00 2017-09-19 2 0
137 2017-09-19 02:00:43 12561.5 1 2017-09-19 02:00:00 02:00:00 02:00:00 2017-09-19 2 0
138 2017-09-19 02:00:43 12561 2 2017-09-19 02:00:00 02:00:00 02:00:00 2017-09-19 2 0
139 2017-09-19 02:00:43 12560.5 1 2017-09-19 02:00:00 02:00:00 02:00:00 2017-09-19 2 0
140 2017-09-19 02:00:43 12560.5 1 2017-09-19 02:00:00 02:00:00 02:00:00 2017-09-19 2 0
152 2017-09-19 02:00:45 12562 2 2017-09-19 02:00:00 02:00:00 02:00:00 2017-09-19 2 0
153 2017-09-19 02:00:46 12562.5 4 2017-09-19 02:00:00 02:00:00 02:00:00 2017-09-19 2 0
166 2017-09-19 02:00:58 12562 1 2017-09-19 02:00:00 02:00:00 02:00:00 2017-09-19 2 0
167 2017-09-19 02:00:58 12562.5 2 2017-09-19 02:00:00 02:00:00 02:00:00 2017-09-19 2 0
168 2017-09-19 02:01:00 12562 7 2017-09-19 02:01:00 02:00:00 02:00:00 2017-09-19 2 1
169 2017-09-19 02:01:00 12562.5 1 2017-09-19 02:01:00 02:00:00 02:00:00 2017-09-19 2 1
170 2017-09-19 02:01:00 12562.5 1 2017-09-19 02:01:00 02:00:00 02:00:00 2017-09-19 2 1
171 2017-09-19 02:01:00 12561.5 2 2017-09-19 02:01:00 02:00:00 02:00:00 2017-09-19 2 1
175 2017-09-19 02:01:03 12561 1 2017-09-19 02:01:00 02:00:00 02:00:00 2017-09-19 2 1
176 2017-09-19 02:01:03 12561 3 2017-09-19 02:01:00 02:00:00 02:00:00 2017-09-19 2 1
187 2017-09-19 02:01:07 12560.5 2 2017-09-19 02:01:00 02:00:00 02:00:00 2017-09-19 2 1
188 2017-09-19 02:01:08 12561 1 2017-09-19 02:01:00 02:00:00 02:00:00 2017-09-19 2 1
189 2017-09-19 02:01:10 12560 1 2017-09-19 02:01:00 02:00:00 02:00:00 2017-09-19 2 1
190 2017-09-19 02:01:10 12560 1 2017-09-19 02:01:00 02:00:00 02:00:00 2017-09-19 2 1
191 2017-09-19 02:01:10 12559.5 1 2017-09-19 02:01:00 02:00:00 02:00:00 2017-09-19 2 1
192 2017-09-19 02:01:11 12560 1 2017-09-19 02:01:00 02:00:00 02:00:00 2017-09-19 2 1
193 2017-09-19 02:01:12 12560 2 2017-09-19 02:01:00 02:00:00 02:00:00 2017-09-19 2 1
194 2017-09-19 02:01:12 12560.5 3 2017-09-19 02:01:00 02:00:00 02:00:00 2017-09-19 2 1
593 2017-09-19 02:10:00 12550 1 2017-09-19 02:10:00 02:10:00 02:00:00 2017-09-19 2 10
594 2017-09-19 02:10:00 12549.5 12 2017-09-19 02:10:00 02:10:00 02:00:00 2017-09-19 2 10
604 2017-09-19 02:10:12 12548.5 1 2017-09-19 02:10:00 02:10:00 02:00:00 2017-09-19 2 10
605 2017-09-19 02:10:15 12549.5 22 2017-09-19 02:10:00 02:10:00 02:00:00 2017-09-19 2 10
606 2017-09-19 02:10:16 12549.5 21 2017-09-19 02:10:00 02:10:00 02:00:00 2017-09-19 2 10
636 2017-09-19 02:10:45 12548 1 2017-09-19 02:10:00 02:10:00 02:00:00 2017-09-19 2 10
637 2017-09-19 02:10:47 12548 2 2017-09-19 02:10:00 02:10:00 02:00:00 2017-09-19 2 10
638 2017-09-19 02:10:47 12549.5 23 2017-09-19 02:10:00 02:10:00 02:00:00 2017-09-19 2 10
639 2017-09-19 02:10:48 12549.5 17 2017-09-19 02:10:00 02:10:00 02:00:00 2017-09-19 2 10
640 2017-09-19 02:10:49 12549 1 2017-09-19 02:10:00 02:10:00 02:00:00 2017-09-19 2 10
665 2017-09-19 02:10:58 12550 1 2017-09-19 02:10:00 02:10:00 02:00:00 2017-09-19 2 10
666 2017-09-19 02:10:58 12550 1 2017-09-19 02:10:00 02:10:00 02:00:00 2017-09-19 2 10
667 2017-09-19 02:10:58 12550 3 2017-09-19 02:10:00 02:10:00 02:00:00 2017-09-19 2 10
If I understand your problem correctly, you need to choose a time granularity and a time window. Then you can do this with a combination of groupby + unstack + rolling.
First the groupby:
time_grain = '1min'
df = df.groupby([pd.Grouper(key='dateTime', freq=time_grain), 'tradePrice']).tradeVolume.sum()
dateTime tradePrice
2017-09-19 02:00:00 12559.0 93
12560.5 3
12561.0 4
12561.5 3
12562.0 9
12562.5 6
12563.0 2
2017-09-19 02:01:00 12559.5 1
12560.0 5
12560.5 5
12561.0 5
12561.5 2
12562.0 7
12562.5 2
2017-09-19 02:10:00 12548.0 3
12548.5 1
12549.0 1
12549.5 95
12550.0 6
Name: tradeVolume, dtype: int64
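As an aside, since the question title mentions pivot tables: the same wide per-minute table can be built in one step with pivot_table (a sketch run against the raw trades frame; note the groupby line above reassigns df, so this would use the original frame):
wide = df.pivot_table(index=pd.Grouper(key='dateTime', freq=time_grain),
                      columns='tradePrice',
                      values='tradeVolume',
                      aggfunc='sum')  # rows: 1-minute bins, columns: prices, values: summed volume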
Then unstack + rolling:
window_size = '10min'
df = df.unstack('tradePrice').rolling(window_size).sum()
tradePrice 12548.0 12548.5 12549.0 12549.5 12550.0 12559.0
dateTime
2017-09-19 02:00:00 NaN NaN NaN NaN NaN 93.0
2017-09-19 02:01:00 NaN NaN NaN NaN NaN 93.0
2017-09-19 02:10:00 3.0 1.0 1.0 95.0 6.0 NaN
tradePrice 12559.5 12560.0 12560.5 12561.0 12561.5 12562.0
dateTime
2017-09-19 02:00:00 NaN NaN 3.0 4.0 3.0 9.0
2017-09-19 02:01:00 1.0 5.0 8.0 9.0 5.0 16.0
2017-09-19 02:10:00 1.0 5.0 5.0 5.0 2.0 7.0
tradePrice 12562.5 12563.0
dateTime
2017-09-19 02:00:00 6.0 2.0
2017-09-19 02:01:00 8.0 2.0
2017-09-19 02:10:00 2.0 NaN
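If the NaN gaps are a nuisance, the unstack step could also fill them with zeros before rolling (a variant sketch; the maxima are unchanged, but the later stack() will then keep a row for every price in every bin). Here the long Series from the groupby step is called vol to avoid reusing df:
rolled = vol.unstack('tradePrice').fillna(0).rolling(window_size).sum()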
Finally, stack tradePrice back into the index and find, for every time period, the index entry with the highest value:
df = df.stack('tradePrice')
idx_list = df.groupby('dateTime').idxmax()
result = df.loc[idx_list]
dateTime tradePrice
2017-09-19 02:00:00 12559.0 93.0
2017-09-19 02:01:00 12559.0 93.0
2017-09-19 02:10:00 12549.5 95.0
dtype: float64
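To massage this into the Price/Volume shape asked for in the question (and to read off just the last 10-minute window), a small follow-up sketch, with column names chosen to match the question's earlier code:
poc = result.reset_index(level='tradePrice')  # dateTime index, price + volume columns
poc.columns = ['POC_Price', 'POC_Volume']

last_window = poc.iloc[[-1]]  # 2017-09-19 02:10:00 -> 12549.5, 95.0 for the sample above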
Note that with a time-offset window, rolling defaults to a minimum number of observations (min_periods) of 1. That's why you get 3 result rows.
I think the biggest drawback of this approach is that, for a big dataframe with a large number of distinct price points, it will use a lot of memory (because every price point gets its own column).
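A rougher but sparser alternative (just a sketch; it re-derives the long Series from the raw trades and uses window boundaries meant to mirror rolling('10min') over 1-minute bin labels) is to aggregate each trailing window on the fly instead of building the wide frame:
import pandas as pd

# Long (1-minute bin, tradePrice) -> summed volume, as in the groupby step above
vol = df.groupby([pd.Grouper(key='dateTime', freq='1min'), 'tradePrice']).tradeVolume.sum()

labels = vol.index.get_level_values('dateTime')
window = pd.Timedelta('10min')

records = []
for end in labels.unique():
    # bins whose label lies in (end - 10min, end], i.e. the same trailing window
    chunk = vol[(labels > end - window) & (labels <= end)]
    per_price = chunk.groupby(level='tradePrice').sum()
    records.append((end, per_price.idxmax(), per_price.max()))

poc = (pd.DataFrame(records, columns=['dateTime', 'POC_Price', 'POC_Volume'])
         .set_index('dateTime'))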