Select the data between two timestamps in Python - python-3.x

My query is regarding getting the data between two given timestamps in Python.
I need an input field where I can enter the two timestamps, and then retrieve the matching rows from the CSV.
Actual data (CSV):
Daily_KWH_System PowerScout Temperature Timestamp Visibility Daily_electric_cost kW_System
0 4136.900384 P371602077 0 07/09/2016 23:58 0 180.657705 162.224216
1 3061.657187 P371602077 66 08/09/2016 23:59 10 133.693074 174.193804
2 4099.614033 P371602077 63 09/09/2016 05:58 10 179.029562 162.774013
3 3922.490275 P371602077 63 10/09/2016 11:58 10 171.297701 169.230047
4 3957.128982 P371602077 88 11/09/2016 17:58 10 172.806125 164.099307
Example:
Input:
start date : 2-1-2017
end date : 10-1-2017
Output:
Timestamp Value
2-1-2017 10
3-1-2017 35
.
.
.
.
10-1-2017 25
The original CSV would contain all the data:
Timestamp Value
1-12-2016 10
2-12-2016 25
.
.
.
1-1-2017 15
2-1-2017 10
.
.
.
10-1-2017 25
.
.
31-1-2017 50

Use pd.read_csv to read the file:
df = pd.read_csv('my.csv', index_col='Timestamp', parse_dates=[0])
Then use your inputs to slice:
df[start_date:end_date]
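Putting it together with the input requirement from the question, a minimal end-to-end sketch (the file name my.csv and the use of input() for the two timestamps are illustrative assumptions):
import pandas as pd

df = pd.read_csv('my.csv', index_col='Timestamp', parse_dates=[0], dayfirst=True)
df = df.sort_index()  # label slicing is most reliable on a sorted DatetimeIndex

start_date = pd.to_datetime(input('start date: '), dayfirst=True)
end_date = pd.to_datetime(input('end date: '), dayfirst=True)
print(df[start_date:end_date])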

It seems you need dayfirst=True in read_csv; selecting with [] then works if both the start and end dates are present in df.index:
import pandas as pd
from io import StringIO  # pandas.compat.StringIO was removed in newer pandas versions
temp=u"""Timestamp;Value
1-12-2016;10
2-12-2016;25
1-1-2017;15
2-1-2017;10
10-1-2017;25
31-1-2017;50"""
#after testing, replace 'StringIO(temp)' with 'filename.csv'
#if necessary, add sep
#index_col=[0] converts the first column to the index
#parse_dates=[0] parses the first column to datetime
df = pd.read_csv(StringIO(temp), sep=";", index_col=[0], parse_dates=[0], dayfirst=True)
print (df)
            Value
Timestamp
2016-12-01     10
2016-12-02     25
2017-01-01     15
2017-01-02     10
2017-01-10     25
2017-01-31     50
print (df.index.dtype)
datetime64[ns]
print (df.index)
DatetimeIndex(['2016-12-01', '2016-12-02', '2017-01-01', '2017-01-02',
               '2017-01-10', '2017-01-31'],
              dtype='datetime64[ns]', name='Timestamp', freq=None)
start_date = pd.to_datetime('2-1-2017', dayfirst=True)
end_date = pd.to_datetime('10-1-2017', dayfirst=True)
print (df[start_date:end_date])
            Value
Timestamp
2017-01-02     10
2017-01-10     25
If some dates are not in the index, you need boolean indexing instead (note both comparisons, >= and <=):
start_date = pd.to_datetime('3-1-2017', dayfirst=True)
end_date = pd.to_datetime('10-1-2017', dayfirst=True)
print (df[(df.index >= start_date) & (df.index <= end_date)])
            Value
Timestamp
2017-01-10     25
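As a side note, on a sorted index DataFrame.truncate gives the same inclusive between-selection without building a mask (a sketch using the frame from above):
df = df.sort_index()
print (df.truncate(before=start_date, after=end_date))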

Related

How to sum by month in timestamp Data Frame?

I have a dataframe like this:
trx_date     trx_amount
2013-02-11   35
2014-03-10   26
2011-02-9    10
2013-02-12   5
2013-01-11   21
How do I group that by month and year, so that I can sum the trx_amount?
Example expected output:
trx_monthly  trx_sum
2013-02      40
2013-01      21
2014-03      26
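(For testing the answers below, the sample frame can be reproduced like this; the values are copied from the question:)
import pandas as pd

df = pd.DataFrame({
    'trx_date': ['2013-02-11', '2014-03-10', '2011-02-9', '2013-02-12', '2013-01-11'],
    'trx_amount': [35, 26, 10, 5, 21],
})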
You can convert the values to month periods with Series.dt.to_period and then aggregate the sum:
df['trx_date'] = pd.to_datetime(df['trx_date'])
df1 = (df.groupby(df['trx_date'].dt.to_period('m').rename('trx_monthly'))['trx_amount']
         .sum()
         .reset_index(name='trx_sum'))
print (df1)
trx_monthly trx_sum
0 2011-02 10
1 2013-01 21
2 2013-02 40
3 2014-03 26
Or convert the datetimes to strings in YYYY-MM format with Series.dt.strftime:
df2 = (df.groupby(df['trx_date'].dt.strftime('%Y-%m').rename('trx_monthly'))['trx_amount']
         .sum()
         .reset_index(name='trx_sum'))
print (df2)
trx_monthly trx_sum
0 2011-02 10
1 2013-01 21
2 2013-02 40
3 2014-03 26
Or group by year and month separately; then the output is different - 3 columns:
df2 = (df.groupby([df['trx_date'].dt.year.rename('year'),
                   df['trx_date'].dt.month.rename('month')])['trx_amount']
         .sum()
         .reset_index(name='trx_sum'))
print (df2)
year month trx_sum
0 2011 2 10
1 2013 1 21
2 2013 2 40
3 2014 3 26
You can try this (note it groups by month number only, so the same month from different years is collapsed together):
df['trx_month'] = df['trx_date'].dt.month
df_agg = df.groupby('trx_month')['trx_amount'].sum()
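If years must stay separate but the period/string conversion is unwanted, pd.Grouper on the datetime column is another option (a sketch; note that, unlike to_period, this resample-style grouping also emits zero rows for the empty months in between):
df3 = (df.groupby(pd.Grouper(key='trx_date', freq='M'))['trx_amount']  # 'ME' in newer pandas
         .sum()
         .reset_index(name='trx_sum'))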

Get the last date before an nth date for each month in Python

I am using a CSV with an accumulative number that changes daily:
Day Accumulative Number
0 9/1/2020 100
1 11/1/2020 102
2 18/1/2020 98
3 11/2/2020 105
4 24/2/2020 95
5 6/3/2020 120
6 13/3/2020 100
I am now trying to find the best way to aggregate it and compare the monthly results before a specific date. I want to check the balance on the 11th of each month, but for some months there is no activity on that exact day. As a result, I am trying to get the latest day before the 12th of each month. So, the above would become:
Day Accumulative Number
0 11/1/2020 102
1 11/2/2020 105
2 6/3/2020 120
What I managed to do so far is to just get the latest day of each month:
dateparse = lambda x: datetime.strptime(x, "%d/%m/%Y")  # requires: from datetime import datetime (pd.datetime is deprecated)
df = pd.read_csv("Accumulative.csv",quotechar="'", usecols=["Day","Accumulative Number"], index_col=False, parse_dates=["Day"], date_parser=dateparse, na_values=['.', '??'] )
df.index = df['Day']
grouped = df.groupby(pd.Grouper(freq='M')).sum()
print (df.groupby(df.index.month).apply(lambda x: x.iloc[-1]))
which returns:
Day Accumulative Number
1 2020-01-18 98
2 2020-02-24 95
3 2020-03-13 100
Is there a way to achieve this in pandas/Python, or do I have to use SQL logic in my script? Is there an easier way I am missing to get the "balance" as of the 11th day of each month?
You can do a groupby with factorize:
n = 12
df = df.sort_values('Day')
m = df.groupby(df.Day.dt.strftime('%Y-%m')).Day.transform(lambda x :x.factorize()[0])==n
df_sub = df[m].copy()
You can try filtering the dataframe where the day is less than 12, then take the last row of each group (grouped by year and month):
df['Day'] = pd.to_datetime(df['Day'],dayfirst=True)
(df[df['Day'].dt.day.lt(12)]
   .groupby([df['Day'].dt.year, df['Day'].dt.month], sort=False).last()
   .reset_index(drop=True))
Day Accumulative_Number
0 2020-01-11 102
1 2020-02-11 105
2 2020-03-06 120
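Note that .last() takes the final row of each group in row order, so this assumes the data is already sorted by Day; if it is not, add df = df.sort_values('Day') first.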
I would try:
# convert to datetime type:
df['Day'] = pd.to_datetime(df['Day'], dayfirst=True)
# select day before the 12th
new_df = df[df['Day'].dt.day < 12]
# select the last day in each month
new_df.loc[~new_df['Day'].dt.to_period('M').duplicated(keep='last')]
Output:
Day Accumulative Number
1 2020-01-11 102
3 2020-02-11 105
5 2020-03-06 120
Here's another way, expanding the date range:
# set as datetime
df2['Day'] = pd.to_datetime(df2['Day'], dayfirst=True)
# set as index
df2 = df2.set_index('Day')
# make a list of all dates
dates = pd.date_range(start=df2.index.min(), end=df2.index.max(), freq='1D')
# add dates
df2 = df2.reindex(dates)
# forward-fill missing values (this answer assumes the value column is named 'Number')
df2['Number'] = df2['Number'].ffill()
# filter to get output
df2 = df2[df2.index.day == 11].reset_index().rename(columns={'index': 'Date'})
print(df2)
Date Number
0 2020-01-11 102.0
1 2020-02-11 105.0
2 2020-03-11 120.0
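Yet another option, if the goal is strictly "last known value on or before the 11th", is pd.merge_asof against a series of target dates (a sketch; the target-date construction is an assumption, and df['Day'] is assumed to be datetime already):
import pandas as pd

# the 11th of each month in the data's span
targets = pd.DataFrame({'target': pd.date_range('2020-01-01', '2020-03-01', freq='MS')
                                  + pd.Timedelta(days=10)})
df = df.sort_values('Day')  # merge_asof requires sorted keys
out = pd.merge_asof(targets, df, left_on='target', right_on='Day')
print(out)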

How to take only the maximum date value if there are two dates in a week in a dataframe

I have a dataframe called Data:
Date Value Frequency
06/01/2020 256 A
07/01/2020 235 A
14/01/2020 85 Q
16/01/2020 625 Q
22/01/2020 125 Q
Here it is observed that 06/01/2020 and 07/01/2020 are in the same week, namely Monday and Tuesday.
Therefore I want to take the maximum date from each week.
My final dataframe should look like this:
Date Value Frequency
07/01/2020 235 A
16/01/2020 625 Q
22/01/2020 125 Q
I want the maximum date from each week, like I have shown in my final dataframe example.
I am new to Python and have been searching for an answer to this without success; please help.
First convert the column to datetimes with to_datetime, build year-week keys with Series.dt.strftime, find the row with the maximum datetime per week using DataFrameGroupBy.idxmax, and finally select those rows with DataFrame.loc:
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
print (df['Date'].dt.strftime('%Y-%U'))
0 2020-01
1 2020-01
2 2020-02
3 2020-02
4 2020-03
Name: Date, dtype: object
df = df.loc[df.groupby(df['Date'].dt.strftime('%Y-%U'))['Date'].idxmax()]
print (df)
Date Value Frequency
1 2020-01-07 235 A
3 2020-01-16 625 Q
4 2020-01-22 125 Q
If the format of the datetimes cannot be changed:
d = pd.to_datetime(df['Date'], dayfirst=True)
df = df.loc[d.groupby(d.dt.strftime('%Y-%U')).idxmax()]
print (df)
Date Value Frequency
1 07/01/2020 235 A
3 16/01/2020 625 Q
4 22/01/2020 125 Q
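One caveat with '%Y-%U': weeks are counted from Sunday and restart at year boundaries. If ISO (Monday-based) weeks are preferred, Series.dt.isocalendar() is an alternative (a sketch, assuming pandas 1.1+; not part of the original answer):
d = pd.to_datetime(df['Date'], dayfirst=True)
iso = d.dt.isocalendar()  # DataFrame with year/week/day columns
df = df.loc[d.groupby([iso['year'], iso['week']]).idxmax()]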

Split dates into time ranges in pandas

14 [2018-03-14, 2018-03-13, 2017-03-06, 2017-02-13]
15 [2017-07-26, 2017-06-09, 2017-02-24]
16 [2018-09-06, 2018-07-06, 2018-07-04, 2017-10-20]
17 [2018-10-03, 2018-09-13, 2018-09-12, 2018-08-3]
18 [2017-02-08]
This is my data; every ID has its own dates, which range between 2017-02-05 and 2018-06-30. I need to split the dates into 5 time ranges of 4 months each, so that for the first 4 months every ID keeps only the dates in that range (from 2017-02-05 to 2017-06-05), like this:
14 [2017-03-06, 2017-02-13]
15 [2017-02-24]
16 [null] # or delete empty rows, it doesn't matter
17 [null]
18 [2017-02-08]
then for 2017-06-05 to 2017-10-05, and so on for every 4-month range. Also, I can't use nested for loops because the data is too big. This is what I tried so far:
months_4 = individual_dates.copy()
for _ in months_4['Date']:
    _ = np.where(pd.to_datetime(_) <= pd.to_datetime('2017-9-02'), _, np.datetime64('NaT'))
and
months_8 = individual_dates.copy()
range_8 = pd.date_range(start='2017-9-02', end='2017-11-02')
for _ in months_8['Date']:
    _ = _[np.isin(_, range_8)]
This achieved absolutely no result; the data stays the same no matter what (rebinding the loop variable never writes back into the frame).
Update: I did what you said:
individual_dates['Date'] = individual_dates['Date'].str.strip('[]').str.split(', ')
df = pd.DataFrame({
    'Date': list(chain.from_iterable(individual_dates['Date'].tolist())),
    'ID': individual_dates['ClientId'].repeat(individual_dates['Date'].str.len())
})
df
and here is the result:
Date ID
0 '2018-06-30T00:00:00.000000000' '2018-06-29T00... 14
1 '2017-03-28T00:00:00.000000000' '2017-03-27T00... 15
2 '2018-03-14T00:00:00.000000000' '2018-03-13T00... 16
3 '2017-12-14T00:00:00.000000000' '2017-03-28T00... 17
4 '2017-05-30T00:00:00.000000000' '2017-05-22T00... 18
5 '2017-03-28T00:00:00.000000000' '2017-03-27T00... 19
6 '2017-03-27T00:00:00.000000000' '2017-03-26T00... 20
7 '2017-12-15T00:00:00.000000000' '2017-11-20T00... 21
8 '2017-07-05T00:00:00.000000000' '2017-07-04T00... 22
9 '2017-12-12T00:00:00.000000000' '2017-04-06T00... 23
10 '2017-05-21T00:00:00.000000000' '2017-05-07T00... 24
For better performance I suggest converting the lists into a flat column and then filtering with isin and boolean indexing:
from itertools import chain
df = pd.DataFrame({
    'Date': list(chain.from_iterable(individual_dates['Date'].tolist())),
    'ID': individual_dates['ID'].repeat(individual_dates['Date'].str.len())
})
range_8 = pd.date_range(start='2017-02-05', end='2017-06-05')
df['Date'] = pd.to_datetime(df['Date'])
df = df[df['Date'].isin(range_8)]
print (df)
Date ID
0 2017-03-06 14
0 2017-02-13 14
1 2017-02-24 15
4 2017-02-08 18
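To actually materialize the five 4-month windows, one option on the flattened frame (before the isin filter above) is pd.cut with explicit datetime bin edges (a sketch; the edges follow the ranges described in the question):
bins = pd.to_datetime(['2017-02-05', '2017-06-05', '2017-10-05',
                       '2018-02-05', '2018-06-05', '2018-10-05'])
df['window'] = pd.cut(df['Date'], bins=bins, labels=False)
# rows outside the bins get NaN and are dropped by groupby
for window, part in df.groupby('window'):
    print(window, part['ID'].tolist())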

Python: Subtracting two columns with date data from a CSV to get the number of weeks, months?

I have a CSV with two columns representing a start date (st_dt) and an end date (end_dt). I have to subtract these columns to get the number of weeks. I tried iterating through the columns using pandas, but it seems my output is wrong.
st_dt end_dt
---------------------------------------
20100315 20100431
Use read_csv with parse_dates to get datetimes, then subtract and take the days:
df = pd.read_csv(file, parse_dates=[0,1])
print (df)
st_dt end_dt
0 2010-03-15 2010-04-30
df['diff'] = (df['end_dt'] - df['st_dt']).dt.days
print (df)
st_dt end_dt diff
0 2010-03-15 2010-04-30 46
If some dates are invalid, like 20100431, use to_datetime with errors='coerce' to convert them to NaT:
df = pd.read_csv(file)
print (df)
st_dt end_dt
0 20100315 20100431
1 20100315 20100430
df['st_dt'] = pd.to_datetime(df['st_dt'], errors='coerce', format='%Y%m%d')
df['end_dt'] = pd.to_datetime(df['end_dt'], errors='coerce', format='%Y%m%d')
df['diff'] = (df['end_dt'] - df['st_dt']).dt.days
print (df)
st_dt end_dt diff
0 2010-03-15 NaT NaN
1 2010-03-15 2010-04-30 46.0
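Since the question asks for the number of weeks, the day difference can be converted directly; dividing the timedelta by pd.Timedelta(weeks=1) gives a float (a small sketch on top of the frame above):
df['weeks'] = (df['end_dt'] - df['st_dt']) / pd.Timedelta(weeks=1)
# or whole weeks: (df['end_dt'] - df['st_dt']).dt.days // 7
print (df)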
