Pandas add new column in csv and save - python-3.x

I have code like:
import pandas as pd
df = pd.read_csv('file.csv')
for id1, id2 in zip(df.iterrows(), df.loc[1:].iterrows()):
    id1[1]['X_Next'] = id2[1]['X']
As you can see, I need each row to get the next row's value of column 'X'. The iteration looks fine, but I don't know how to save the result back to a CSV file.
Can someone help me? Thanks!

IIUC use Series.shift:
df = pd.read_csv('file.csv')
df['X_Next'] = df['X'].shift(-1)
df.to_csv('file1.csv', index=False)
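For illustration, here is a minimal sketch of what Series.shift(-1) does (the data is just made up to stand in for file.csv):

import pandas as pd

# hypothetical stand-in for file.csv
df = pd.DataFrame({'X': [10, 20, 30]})
df['X_Next'] = df['X'].shift(-1)  # each row gets the next row's X; the last row becomes NaN
print(df)
#     X  X_Next
# 0  10    20.0
# 1  20    30.0
# 2  30     NaN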

Related

Add/Subtract UTC Time to Datetime 'Time' column

I have a sample dataframe as given below.
import pandas as pd
import numpy as np
data = {'InsertedDate': ['2022-01-21 20:13:19.000000', '2022-01-21 20:20:24.000000',
                         '2022-02-02 16:01:49.000000', '2022-02-09 15:01:31.000000'],
        'UTCOffset': ['-05:00', '+02:00', '-04:00', '+06:00']}
df = pd.DataFrame(data)
df['InsertedDate'] = pd.to_datetime(df['InsertedDate'])
df
The 'InsertedDate' column is a datetime column, whereas 'UTCOffset' is a string column.
I want to add the offset to the 'InsertedDate' column and show the final result in a new 'datetime' column.
It should look something like this image shown below.
Any help is greatly appreciated. Thank you!
You can use pd.to_timedelta for the offset and add it to the datetime column.
# to_timedelta needs to have [+-]HH:MM:SS format, so adding :00 to fill :SS part.
df['UTCOffset'] = pd.to_timedelta(df.UTCOffset + ':00')
df['CorrectTime'] = df.InsertedDate + df.UTCOffset
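If you would rather keep the original UTCOffset strings untouched (for example to write them back out later), a small variant of the same idea is to hold the timedelta in a separate variable:

# same approach, but without overwriting the UTCOffset column
offset = pd.to_timedelta(df['UTCOffset'] + ':00')
df['CorrectTime'] = df['InsertedDate'] + offset

For the first sample row, 2022-01-21 20:13:19 with a -05:00 offset, CorrectTime comes out as 2022-01-21 15:13:19.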

Pandas : how to consider content of certain columns as list

Let's say I have a simple pandas dataframe named df :
   0          1
0  a  [b, c, d]
I save this dataframe to a CSV file as follows:
df.to_csv("test.csv", index=False, sep="\t", encoding="utf-8")
Then later in my script I read this csv :
df = pd.read_csv("test.csv", index_col=False, sep="\t", encoding="utf-8")
Now I want to use explode() on column '1', but it does not work because the content of column '1' is no longer a list: saving to and reading from CSV turned it into a string.
What I have tried so far is changing the type of column '1' into a list with astype(), without any success.
Thank you in advance.
Try this. Since you are reading from a CSV file, the values in column A ('1' in your case) are really strings, so you need to parse them back into lists.
import pandas as pd
import ast
df = pd.DataFrame({"A": ["['a','b']", "['c']"], "B": [1, 2]})
df["A"] = df["A"].apply(ast.literal_eval)
Now the following works:
df.explode("A")

Merge duplicate rows in a text file using python based on a key column

I have a CSV file and I need to merge rows that share the same value in a key column, Name.
a.csv
Name|Acc#|ID|Age
Suresh|2345|a-b2|24
Mahesh|234|a-vf|34
Mahesh|4554|a-bg|45
Keren|344|s-bg|45
yankie|999|z-bg|34
yankie|3453|g-bgbbg|45
Expected output: records are merged by Name, so the values of both rows are combined for Mahesh and yankie:
Name|Acc#|ID|Age
Suresh|2345|a-b2|24
Mahesh|[234,4554]|[a-vf,a-bg]|[34,45]
Keren|344|s-bg|45
yankie|[999,3453]|[z-bg,g-bgbbg]|[34,45]
Can someone help me with this in Python?
import pandas as pd
df = pd.read_csv("a.csv", sep="|", dtype=str)
# group by Name; keep scalars for single rows, collect unique values into lists otherwise
new_df = df.groupby('Name', as_index=False).aggregate(lambda tdf: tdf.unique().tolist() if tdf.shape[0] > 1 else tdf)
new_df.to_csv("data.csv", index=False, sep="|")
Output:
Name|Acc#|ID|Age
Keren|344|s-bg|45
Mahesh|['234', '4554']|['a-vf', 'a-bg']|['34', '45']
Suresh|2345|a-b2|24
yankie|['999', '3453']|['z-bg', 'g-bgbbg']|['34', '45']
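If the Python-list formatting (brackets and quotes) in the output file is not wanted, the same groupby can join the merged values into comma-separated strings instead; a sketch, with the separator being an arbitrary choice:

# join duplicate values with commas instead of collecting them into lists
new_df = df.groupby('Name', as_index=False).aggregate(lambda tdf: ','.join(tdf.unique()))
new_df.to_csv("data.csv", index=False, sep="|")

With this, the Mahesh row would come out as Mahesh|234,4554|a-vf,a-bg|34,45.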

python3 - import dataframe from textfile format head1=value|head2=value

I looked into the pandas documentation and there are several options for importing data into a pandas dataframe. When it comes to text files, the common way seems to be importing a CSV file.
The data I would like to use are logfiles formatted like this:
timestamp=2018-09-08T11:11:58.362028|head1=value|head2=value|head3=value
timestamp=2018-09-08T11:15:25.860244|head1=value|head2=value|head3=value
I only need some of these elements imported into the dataframe, let's say timestamp, head1 and head3.
In a csv notation the dataframe would look like this:
timestamp;head1;head3
logfile row1 - value of timestamp; value of head1; value of head3
logfile row2 - value of timestamp; value of head1; value of head3
logfile row3 - value of timestamp; value of head1; value of head3
I could write a CSV file from this data and import it afterwards. But is there a pandas function or a direct way to import this data into a pandas dataframe?
Thank you for your help in advance!
You can do:
columns = ['timestamp', 'head1', 'head2', 'head3']
# read '|'-separated fields, drop head2, then strip the 'key=' prefix from every value
pd.read_csv('your_file.csv', sep='|', names=columns).drop(columns='head2').replace('.*=', '', regex=True)
I'd parse and process the file like this:
with open('file.csv', 'r') as fh:
    df = pd.DataFrame([dict(x.split('=') for x in l.strip().split('|')) for l in fh])
df = df[['timestamp', 'head1', 'head3']]
df
timestamp head1 head3
0 2018-09-08T11:11:58.362028 value value
1 2018-09-08T11:15:25.860244 value value
Thank you for the great solutions! I used this approach but filtered the needed fields already during import, so that other, differently structured elements in the logfile do not get in the way:
import pandas as pd
with open('logfile.txt', 'r') as fh:
    df = pd.DataFrame([dict(x.split('=') for x in l.strip().split('|')
                            if x.find("timestamp") > -1 or x.find("head1") > -1 or x.find("head3") > -1)
                       for l in fh])
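A slightly tidier version of the same filter keeps the wanted field names in a set and checks the key part of each 'key=value' pair; just a sketch of the same idea, with the field names taken from the question:

import pandas as pd

wanted = {'timestamp', 'head1', 'head3'}
with open('logfile.txt', 'r') as fh:
    rows = []
    for line in fh:
        # split each 'key=value' pair once and keep only the wanted keys
        pairs = (item.split('=', 1) for item in line.strip().split('|'))
        rows.append({key: value for key, value in pairs if key in wanted})
df = pd.DataFrame(rows)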

Python Pandas dataframe, how to integrate new columns into a new csv

Guys, I need a bit of help with Pandas and would greatly appreciate your input.
My original file looks like this:
I would like to convert it by merging some pairs of columns (generating their averages) and return a new file looking like this:
Also, if possible, I would like to split the column 'RateDateTime' into two columns, one containing the date and the other containing only the time. How should I do that? I tried the code below, but it doesn't work:
import pandas as pd
dateparse = lambda x: pd.datetime.strptime(x, '%Y/%m/%d %H:%M:%S')
df = pd.read_csv('data.csv', parse_dates=['RateDateTime'], index_col='RateDateTime',date_parser=dateparse)
a=pd.to_numeric(df['RateAsk_open'])
b=pd.to_numeric(df['RateAsk_high'])
c=pd.to_numeric(df['RateAsk_low'])
d=pd.to_numeric(df['RateAsk_close'])
e=pd.to_numeric(df['RateBid_open'])
f=pd.to_numeric(df['RateBid_high'])
g=pd.to_numeric(df['RateBid_low'])
h=pd.to_numeric(df['RateBid_close'])
df['Open'] = (a+e) /2
df['High'] = (b+f) /2
df['Low'] = (c+g) /2
df['Close'] = (d+h) /2
grouped = df.groupby('CurrencyPair')
Open=grouped['Open']
High=grouped['High']
Low=grouped['Low']
Close=grouped['Close']
w=pd.concat([Open, High,Low,Close], axis=1, keys=['Open', 'High','Low','Close'])
w.to_csv('w.csv')
Python returns:
TypeError: cannot concatenate object of type "<class 'pandas.core.groupby.groupby.SeriesGroupBy'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
Can someone help me please? Many thanks!!!
IIUYC, you don't need grouping here. You can simply add the new columns to the existing dataframe and specify which columns to save to the CSV file in the to_csv method. Here is an example:
df['Open'] = df[['RateAsk_open', 'RateBid_open']].mean(axis=1)
df['RateDate'] = df['RateDateTime'].dt.date
df['RateTime'] = df['RateDateTime'].dt.time
df.to_csv('w.csv', columns=['CurrencyPair', 'Open', 'RateDate', 'RateTime'])
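The remaining averaged columns follow the same pattern; a rough sketch, noting that if RateDateTime was read in as the index (as in the read_csv call from the question) the date and time parts have to be taken from df.index instead of a column:

# same mean-of-two-columns pattern for the other price columns
df['High'] = df[['RateAsk_high', 'RateBid_high']].mean(axis=1)
df['Low'] = df[['RateAsk_low', 'RateBid_low']].mean(axis=1)
df['Close'] = df[['RateAsk_close', 'RateBid_close']].mean(axis=1)

# if RateDateTime is the index rather than a column:
df['RateDate'] = df.index.date
df['RateTime'] = df.index.time

df.to_csv('w.csv', columns=['CurrencyPair', 'Open', 'High', 'Low', 'Close', 'RateDate', 'RateTime'])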
