Selecting rows that meet a given condition into another dataframe

I have read a CSV file into a dataframe using pandas; the CSV format is as follows. I would like to put the rows whose "time" column falls within the interval 6/3/2011 to 10/20/2011 into another dataframe. How can I do this efficiently in pandas?

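Try this method:
start_date = pd.Timestamp('2011-06-03')  # interval bounds taken from the question
end_date = pd.Timestamp('2011-10-20')
data_frame['time'] = pd.to_datetime(data_frame['time'])
# note: '>' excludes rows dated exactly start_date; use '>=' to include them
select_rows = (data_frame['time'] > start_date) & (data_frame['time'] <= end_date)
selected = data_frame.loc[select_rows]
Or you can make the time column a DatetimeIndex and then select rows by label instead.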

I think you need to_datetime first, then filter with between and boolean indexing:
df['time'] = pd.to_datetime(df['time'], format='%m/%d/%Y')
df1 = df[df['time'].between('2011-06-03','2011-10-20')]
Or create a DatetimeIndex and select with loc:
df['time'] = pd.to_datetime(df['time'], format='%m/%d/%Y')
df = df.set_index('time')
df1 = df.loc['2011-06-03':'2011-10-20']
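To make the two approaches above easy to try, here is a minimal sketch on made-up data (the sample rows and the "value" column are assumptions, not the asker's CSV). It suggests that both between and label slicing on a sorted DatetimeIndex keep the two endpoint dates.

```python
import pandas as pd

# Made-up sample data in the same m/d/Y text format as the question
df = pd.DataFrame({
    "time": ["6/2/2011", "6/3/2011", "8/15/2011", "10/20/2011", "10/21/2011"],
    "value": [1, 2, 3, 4, 5],
})
df["time"] = pd.to_datetime(df["time"], format="%m/%d/%Y")

# Boolean mask: between() is inclusive on both ends by default
by_mask = df[df["time"].between("2011-06-03", "2011-10-20")]

# DatetimeIndex: label slicing with .loc is also inclusive on both ends
by_index = df.set_index("time").sort_index().loc["2011-06-03":"2011-10-20"]

print(by_mask)
print(by_index)
```

Sorting the index before slicing is a precaution: label slices on an unsorted DatetimeIndex can raise or return unexpected rows.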

Related

How to ungroup column groups and convert them into rows using pandas?

I have the following table, obtained by downloading stock data for multiple stocks. I used the following code:
import yfinance as yf
i = ['NTPC.NS', 'GAIL.NS']
stock = yf.download(tickers=i, start='2021-01-11', end='2021-03-10', interval='5m', group_by='tickers')
The output dataframe looks like this
But I want the output to be like this

Use DataFrame.stack on the first column level, then rename the index levels and convert the last level of the MultiIndex to a column with DataFrame.reset_index:
df = stock.stack(level=0).rename_axis(['Datetime','stockname']).reset_index(level=-1)
# if necessary, change the order of the columns (this moves 'stockname' to the end)
df = df[df.columns.tolist()[1:] + df.columns.tolist()[:1]]
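Since the downloaded data is not shown here, the following sketch uses a small hand-built frame with two-level columns (ticker on top, price field below) as a stand-in. The values and the reduced set of fields are assumptions for illustration only.

```python
import pandas as pd

# Stand-in for the downloaded frame: columns grouped by ticker, then by field
idx = pd.to_datetime(["2021-01-11 09:15", "2021-01-11 09:20"])
cols = pd.MultiIndex.from_product([["GAIL.NS", "NTPC.NS"], ["Close", "Open"]])
stock = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]],
    index=idx,
    columns=cols,
)

# Move the ticker level from the columns into the index, name both index
# levels, then turn the ticker level back into an ordinary column
long_df = (
    stock.stack(level=0)
    .rename_axis(["Datetime", "stockname"])
    .reset_index(level=-1)
)

print(long_df)
```

Each timestamp now appears once per ticker, with the price fields as plain columns. Recent pandas versions may print a FutureWarning about the stack implementation; the result has the same shape either way.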

How do I filter a datetime column according to time ranges specified in two columns?

I have one Excel file with filtered EDA data, and I want to filter that data according to my second Excel file, using its two columns StartTime and EndTime as a time range.
(The time columns in both Excel files have dtype datetime64[ns].)
You can see my two Excel files in the picture.
My code is:
df1 = pd.read_excel(filename_1)
df2 = pd.read_excel(filename_2, usecols= "A,C")
df3 = df1[df1['BinaryLabels'] == 1]
df2 = df2[(df3["StartTime"] <= df2.Time) & (df2.Time <= df3["EndTime"])]
print(df2)
and I get the error: ValueError: Can only compare identically-labeled Series objects
How can I solve it?
Thanks in advance.

For general DataFrames with different numbers of rows, you have to cross join them with merge before filtering:
df1 = pd.read_excel(filename_1)
df2 = pd.read_excel(filename_2, usecols= "A,C")
df3 = df1[df1['BinaryLabels'] == 1]
df = df2.assign(a=1).merge(df3.assign(a=1), on='a', how='outer')
df = df[(df["StartTime"] <= df.Time) & (df.Time <= df["EndTime"])]
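On pandas 1.2 or newer, the helper column is not needed because merge accepts how="cross" directly. Here is a minimal sketch on made-up data; the frame names readings and intervals and all values are assumptions standing in for the two Excel files.

```python
import pandas as pd

# Made-up stand-ins for the two Excel sheets
readings = pd.DataFrame({
    "Time": pd.to_datetime(
        ["2021-01-01 10:00", "2021-01-01 10:30", "2021-01-01 11:15"]
    ),
    "EDA": [0.1, 0.2, 0.3],
})
intervals = pd.DataFrame({
    "StartTime": pd.to_datetime(["2021-01-01 09:50", "2021-01-01 11:00"]),
    "EndTime": pd.to_datetime(["2021-01-01 10:10", "2021-01-01 11:30"]),
})

# Pair every reading with every interval (3 x 2 = 6 rows)
pairs = readings.merge(intervals, how="cross")

# Keep only the pairs where the reading falls inside the interval
inside = pairs[(pairs["StartTime"] <= pairs["Time"]) & (pairs["Time"] <= pairs["EndTime"])]

print(inside)
```

A cross join produces len(readings) * len(intervals) rows, so this is practical only while that product fits comfortably in memory.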

Want to copy a selected column from an old dataframe to a column of a new dataframe

I have a dataframe named new_df and would like to create a new dataframe and copy the column "Close" into a column named "Col1" of that new dataframe. I would then open another dataframe, also named new_df, and copy its "Close" column into a column named "Col2" of the dataframe already created.
It is important to note that the imported columns may vary in length: the first column imported may have 30 records and the second may have 32.
df = pd.read_csv('RIO.L.csv',parse_dates=True)
df['Date_1'] = pd.to_datetime(df['Date'], format= '%d/%m/%Y')
df['Year'] = pd.DatetimeIndex(df['Date_1']).year
df['Month'] = pd.DatetimeIndex(df['Date_1']).month
df['Day'] = pd.DatetimeIndex(df['Date_1']).day
df.sort_values(by=['Month','Year','Day'], inplace=True)
m_Year_Select = 2019
m_Month_Select = 5
v_data_select = (df['Year'] <= m_Year_Select) & (df['Month'] == m_Month_Select)
new_df = df.loc[v_data_select]
print(new_df)

I used pd.concat():
result = pd.concat([result, df_2000['Close']], axis=1, sort=False, join='outer')
Problem solved.
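For readers who want to see how pd.concat copes with columns of different lengths, here is a small sketch. The frames first and second and their values are made up; the names Col1 and Col2 come from the question.

```python
import pandas as pd

# Two made-up selections with different numbers of rows
first = pd.DataFrame({"Close": [10.0, 11.0, 12.0]})
second = pd.DataFrame({"Close": [20.0, 21.0, 22.0, 23.0, 24.0]})

# Reset each index so rows line up by position, rename, then join side by side.
# With axis=1 the default outer join pads the shorter column with NaN.
result = pd.concat(
    [
        first["Close"].reset_index(drop=True).rename("Col1"),
        second["Close"].reset_index(drop=True).rename("Col2"),
    ],
    axis=1,
)

print(result)
```

Resetting the index is a design choice: without it, concat aligns on the original row labels, which after filtering and sorting rarely match between two selections.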

How to replace null values in a column with values stored in a list at the corresponding indexes in a CSV using Python and pandas?

If the CLOSE DATE cell in a given row is blank, I want to apply a formula that adds 3 days to the SOLVED DATE and update the cell with the result.
I'm using the pandas library, with Jupyter Notebook as my editor.
d is the dataframe holding my CSV file:
l1 = []
for index, row in d.iterrows():
    startdate = row["SOLVED DATE"]
    print(index, startdate)
    enddate = pd.to_datetime(startdate) + pd.DateOffset(days=3)
    row["CLOSE DATE"] = enddate
    #d.iloc[index,10]=enddate
    l1.append(enddate)
l1 is the list that contains the values in datetime format,
and I need to replace the values of the column named "CLOSE DATE" with the values in l1 and update my CSV file accordingly.

Welcome to the Stack Overflow community!
iterrows() is usually slow and should be avoided in most cases. There are a few ways to do your task:
Make two dataframes, one with the null rows and one with the non-null rows, impute the values in the null one, then concatenate the two.
Impute the missing values directly, without splitting the dataframe.
As a supplement, the logic for computing the updated date column is as follows.
First, take the "SOLVED DATE" column and store it in a new series;
let's call it new_date.
Then modify new_date by adding 3 days.
Once done, set new_date as the value of the column you want to update.
In terms of code:
# 1st method: split, impute, concatenate
import pandas as pd
null = d.loc[d['CLOSE DATE'].isna()].copy()  # copy so the assignment below does not warn
not_null = d.loc[d['CLOSE DATE'].notna()]
new_date = null['SOLVED DATE']
new_date = pd.to_datetime(new_date) + pd.DateOffset(days=3)
null['CLOSE DATE'] = new_date
d = pd.concat([null, not_null], axis=0)
d = d.reset_index(drop=True)
# 2nd method: fill the missing values directly
import pandas as pd
new_date = d.loc[d['CLOSE DATE'].isna(), 'SOLVED DATE']
new_date = pd.to_datetime(new_date) + pd.DateOffset(days=3)
d['CLOSE DATE'] = d['CLOSE DATE'].fillna(new_date)
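The second method can be tried end to end on a tiny made-up frame. The sample dates below are assumptions; only the column names come from the question.

```python
import pandas as pd

# Made-up data: two rows have no CLOSE DATE yet
d = pd.DataFrame({
    "SOLVED DATE": ["2021-03-01", "2021-03-05", "2021-03-10"],
    "CLOSE DATE": ["2021-03-02", None, None],
})
d["SOLVED DATE"] = pd.to_datetime(d["SOLVED DATE"])
d["CLOSE DATE"] = pd.to_datetime(d["CLOSE DATE"])

# fillna aligns on the index, so only the missing cells receive
# SOLVED DATE + 3 days; existing CLOSE DATE values are left alone
d["CLOSE DATE"] = d["CLOSE DATE"].fillna(d["SOLVED DATE"] + pd.Timedelta(days=3))

print(d)
```

To update the CSV afterwards, writing the frame back with d.to_csv(path, index=False) should be enough, where path stands for your own file name.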

grab substring in pandas series

I have a dataframe df with X columns.
I want to fill df['date'] and df['time'] with substrings taken from the column df['job.filename'].
I tried converting the Series into a list and then taking list[x:y] as the date, and also:
for i,row in df.iterrows():
    df.set_value(i,'time',row['job.filename'][-10:-4])
    df.set_value(i,'date',row['job.filename'][21:27])
But this didn't work.
Cheers

I took your sample job.filename to create a dataframe and tried the following:
df = pd.DataFrame(['IMAT list 1-3609-0-20161214-092934.csv'])
# 0 is the column name here; in your case it is 'job.filename'
df['date'] = df[0].str.extract(r'.*-\d+-(\d+)-\d+', expand=False)
df['time'] = df[0].str.extract(r'.*-\d+-\d+-(\d+)', expand=False)
You get:
                                        0      date    time
0  IMAT list 1-3609-0-20161214-092934.csv  20161214  092934
These regexes will work only if all the values follow this exact pattern.
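As a variation on the same idea, both pieces can be pulled out in one pass with named groups. This is only a sketch: it assumes every file name ends in an 8-digit date, a dash, a 6-digit time and ".csv", as in the sample.

```python
import pandas as pd

df = pd.DataFrame({"job.filename": ["IMAT list 1-3609-0-20161214-092934.csv"]})

# Named groups become the column names of the extracted frame;
# rows that do not match the pattern get NaN in both columns
parts = df["job.filename"].str.extract(r"-(?P<date>\d{8})-(?P<time>\d{6})\.csv$")

# Attach the two new columns to the original frame
df = df.join(parts)

print(df)
```

Anchoring the pattern at the end of the string avoids depending on fixed character positions, which is where the slicing attempt in the question is fragile.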
