I have several dataframes that I have concatenated with pandas in the line:
xspc = pd.concat([df1,df2,df3], axis = 1, join_axes = [df3.index])
In df2 the index values read one day later than those of df1 and df3. So, for instance, when the most recent date is 7/1/19, the index values for df1 and df3 read '7/1/19' while df2 reads '7/2/19'. I would like to concatenate the series so that each dataframe is joined on its most recent date; in other words, the df1 values at index '7/1/19' should be concatenated with the df2 values at index '7/2/19' and the df3 values at index '7/1/19'. What methods can I use to shift the data around so I can join on these non-matching index values?
You can reset the index of each dataframe and then concat them:
df1=df1.reset_index()
df2=df2.reset_index()
df3=df3.reset_index()
df_final = pd.concat([df1,df2,df3],axis=1, join_axes=[df3.index])
This should work, since you mentioned that the date in df2 is always one day after the dates in df1 and df3.
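If the frames use a DatetimeIndex rather than string labels, another option is to shift df2's index back one day so the labels line up before concatenating. A minimal sketch with made-up data (and using reindex instead of the join_axes argument, which was removed in pandas 1.0):

import pandas as pd

# Hypothetical frames: df2's dates run one day later than df1/df3.
idx = pd.to_datetime(['2019-06-28', '2019-07-01'])
df1 = pd.DataFrame({'a': [1, 2]}, index=idx)
df3 = pd.DataFrame({'c': [5, 6]}, index=idx)
df2 = pd.DataFrame({'b': [3, 4]}, index=idx + pd.Timedelta(days=1))

df2.index = df2.index - pd.Timedelta(days=1)  # now aligned with df1/df3
xspc = pd.concat([df1, df2, df3], axis=1).reindex(df3.index)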
I have an array in the format [27.214 27.566]; there can be several numbers. Additionally, I have a datetime variable.
from datetime import datetime
import time
import numpy as np

now = datetime.now()
datetime = now.strftime('%Y-%m-%d %H:%M:%S')  # note: this string shadows the datetime class
time.sleep(0.5)
agilent.write("MEAS:TEMP? (#101:102)")  # 'agilent' is the open instrument connection
values = np.fromstring(agilent.read(), dtype=float, sep=',')
The output from the array is [27.214 27.566]
Now I would like to write this into a dataframe with the following structure:
Datetime, FirstValueArray, SecondValueArray, ....
How can I do this? Every minute a new array is added to the dataframe.
I will assume you want to append a row to an existing dataframe df with appropriate columns: value1, value2, ..., lastvalue, datetime.
We can easily convert the array to a series:
s = pd.Series(array)
What you want to do next is append the datetime value to the series:
s = s.append(pd.Series([datetime]), ignore_index=True) cf Series.append
(Series.append was removed in pandas 2.0; there, use s = pd.concat([s, pd.Series([datetime])], ignore_index=True).)
Now you have a series whose length matches df.columns. You want to convert that series to a dataframe to be able to use pd.concat:
df_to_append = s.to_frame().T
We need the transpose because Series.to_frame() returns a dataframe with the series as a single column, and we want a single row with multiple columns.
Before you concatenate, however, you need to make sure both dataframes' column names match, or the concat will create additional columns:
df_to_append.columns = df.columns
Now we can concatenate our two dataframes:
df = pd.concat([df, df_to_append], ignore_index=True) cf pandas.concat
For further details, see the documentation
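Putting the steps together, a minimal runnable sketch (the column names are made up, and pd.concat stands in for Series.append so it also runs on pandas 2.0+):

import numpy as np
import pandas as pd
from datetime import datetime

df = pd.DataFrame(columns=['value1', 'value2', 'datetime'])  # existing frame
array = np.array([27.214, 27.566])                           # new measurement

s = pd.Series(array)
timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
s = pd.concat([s, pd.Series([timestamp])], ignore_index=True)  # values + datetime

df_to_append = s.to_frame().T        # one row, len(df.columns) columns
df_to_append.columns = df.columns    # align the column names
df = pd.concat([df, df_to_append], ignore_index=True)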
I have 2 DataFrames, df0 and df1, with df1.shape[0] > df0.shape[0].
df0 and df1 have the exact same columns.
Most of the rows of df0 are in df1.
The indices of df0 and df1 are
df0.index = range(df0.shape[0])
df1.index = range(df1.shape[0])
I then created dft
dft = pd.concat([df0, df1], axis=0, sort=False)
and removed duplicated rows with
dft.drop_duplicates(subset='this_col_is_not_index', keep='first', inplace=True)
I have some duplicates on the index of dft. For example:
dft.loc[3].shape
returns
(2, 38)
My aim is to change the index of the second row returned so that it no longer shares the index 3.
This second row should instead be indexed dft.index.sort_values()[-1] + 1.
I would like to apply this operation on all duplicates.
References:
Python Pandas: Get index of rows which column matches certain value
Pandas: Get duplicated indexes
Redefining the Index in a Pandas DataFrame object
Add the parameter ignore_index=True to concat to avoid duplicated index values:
dft = pd.concat([df0, df1], axis=0, sort=False, ignore_index=True)
Or use reset_index(drop=True):
dft = dft.reset_index(drop=True)
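A quick demonstration of both options on tiny hypothetical frames:

import pandas as pd

df0 = pd.DataFrame({'this_col_is_not_index': [1, 2]})     # index 0..1
df1 = pd.DataFrame({'this_col_is_not_index': [2, 3, 4]})  # index 0..2

# Option 1: let concat renumber the rows as it stacks them.
dft = pd.concat([df0, df1], axis=0, sort=False, ignore_index=True)

# Option 2: concat first, then renumber 0..n-1 afterwards.
dft = pd.concat([df0, df1], axis=0, sort=False).reset_index(drop=True)

dft = dft.drop_duplicates(subset='this_col_is_not_index', keep='first')
print(dft.index.is_unique)  # True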
Is there any way to subtract the values of two existing dataframes with common headers in Java?
For example
DF1
|H0|H1|H2|H3|
|00|01|02|03|
|04|05|06|07|
|08|09|10|11|
DF2
|H0|H1|H2|H3|H4|
|01|02|03|04|12|
|05|06|07|08|13|
|09|11|12|13|14|
Subtraction example:
DF2 - DF1
|H0|H1|H2|H3|H4|
|01|01|01|01|12|
|01|01|01|01|13|
|01|01|01|01|14|
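No Java answer appears in the thread. For reference, a minimal sketch of the same column-aligned subtraction in pandas (only the shared headers are subtracted; extra columns such as H4 pass through unchanged):

import pandas as pd

df1 = pd.DataFrame([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]],
                   columns=['H0', 'H1', 'H2', 'H3'])
df2 = pd.DataFrame([[1, 2, 3, 4, 12], [5, 6, 7, 8, 13], [9, 11, 12, 13, 14]],
                   columns=['H0', 'H1', 'H2', 'H3', 'H4'])

common = df2.columns.intersection(df1.columns)  # headers present in both
result = df2.copy()
result[common] = df2[common] - df1[common]      # subtraction aligned on index and columns

In Java the analogous approach would be to iterate over the headers present in both tables and subtract column by column; the exact code depends on the dataframe library in use.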
I want to get data from only df2 (all columns) by comparing the 'no' field in both df1 and df2.
My 3-line code is below; with it I'm getting all columns from both df1 and df2 and am not able to trim the fields coming from df1. How can I achieve this?
I've 2 pandas dataframes like below:
df1:
no,name,salary
1,abc,100
2,def,105
3,abc,110
4,def,115
5,abc,120
df2:
no,name,salary,dept,addr
1,abc,100,IT1,ADDR1
2,abc,101,IT2,ADDR2
3,abc,102,IT3,ADDR3
4,abc,103,IT4,ADDR4
5,abc,104,IT5,ADDR5
6,abc,105,IT6,ADDR6
7,abc,106,IT7,ADDR7
8,abc,107,IT8,ADDR8
df1 = pd.read_csv("D:\\data\\data1.csv")
df2 = pd.read_csv("D:\\data\\data2.csv")
resDF = pd.merge(df1, df2, on='no' , how='inner')
I think you need to filter only the no column; the on and how parameters are then not necessary:
resDF = pd.merge(df1[['no']], df2)
Or use boolean indexing with filtering by isin:
resDF = df2[df2['no'].isin(df1['no'])]
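A cut-down demonstration of both approaches (reading from strings as a stand-in for the CSV files):

import pandas as pd
from io import StringIO

df1 = pd.read_csv(StringIO('no,name,salary\n1,abc,100\n2,def,105\n3,abc,110'))
df2 = pd.read_csv(StringIO('no,name,salary,dept,addr\n1,abc,100,IT1,ADDR1\n'
                           '2,abc,101,IT2,ADDR2\n6,abc,105,IT6,ADDR6'))

print(pd.merge(df1[['no']], df2))      # merge on the only shared column, 'no'
print(df2[df2['no'].isin(df1['no'])])  # boolean-indexing equivalent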
I have two dataframes that contain daily end-of-day market data. They are supposed to have identical start dates, end dates, and numbers of rows, but when I print the len of each, one is bigger than the other by one:
DF1
close
date
2008-01-01 45.92
2008-01-02 45.16
2008-01-03 45.33
2008-01-04 42.09
2008-01-07 46.98
...
[2870 rows x 1 columns]
DF2
close
date
2008-01-01 60.48
2008-01-02 59.71
2008-01-03 58.43
2008-01-04 56.64
2008-01-07 56.98
...
[2871 rows x 1 columns]
How can I show which row either:
is a duplicate row,
or has an extra date
so that I can delete the [probable] weekend/holiday date row that is in DF2 but not in DF1?
I have tried things like:
df1 = df1.drop_duplicates(subset='date', keep='first')
df2 = df2.drop_duplicates(subset='date', keep='first')
but can't get it to work [ValueError: not enough values to unpack (expected 2, got 0)].
Extra:
How do I remove weekend dates from a dataframe?
You can use .loc:
DF2=DF2.loc[DF1.index]
To check the index difference between DF1 and DF2:
DF2.index.difference(DF1.index)
To check whether DF2 has duplicated index values:
DF2[DF2.index.duplicated(keep=False)]
To check for weekends:
df.index.weekday_name.isin(['Sunday','Saturday'])
(weekday_name was removed in pandas 1.0; there, use df.index.day_name().isin(['Saturday','Sunday']).)
To fix your code:
df1 = df1.reset_index().drop_duplicates(subset='date', keep='first').set_index('date')
df2 = df2.reset_index().drop_duplicates(subset='date', keep='first').set_index('date')
Also, for this I recommend duplicated (the ~ keeps only the rows whose index is not a repeat):
df2 = df2[~df2.index.duplicated()]
About the business days:
def B_day(date):
    # pd.bdate_range(date, date) is empty when date is not a business day
    return bool(len(pd.bdate_range(date, date)))
df.index.map(B_day)
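A short end-to-end sketch of the cleanup on hypothetical frames (find the date DF2 has that DF1 lacks, drop it, then drop duplicated and weekend dates):

import pandas as pd

DF1 = pd.DataFrame({'close': [45.16, 45.33, 42.09]},
                   index=pd.to_datetime(['2008-01-02', '2008-01-03', '2008-01-04']))
DF2 = pd.DataFrame({'close': [59.71, 58.43, 56.64, 56.98]},
                   index=pd.to_datetime(['2008-01-02', '2008-01-03',
                                         '2008-01-04', '2008-01-05']))

extra = DF2.index.difference(DF1.index)         # DatetimeIndex(['2008-01-05'])
DF2 = DF2.drop(extra)                           # drop the extra weekend/holiday row
DF2 = DF2[~DF2.index.duplicated(keep='first')]  # drop repeated dates, keep the first
DF2 = DF2[~DF2.index.dayofweek.isin([5, 6])]    # drop Saturdays (5) and Sundays (6)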