I have a dataframe df with X columns.
I want to fill df['date'] and df['time'] with a substring located inside the column df['job.filename'].
I tried converting the Series into a list and then grabbing list[x:y] for the date, and I also tried:
for i, row in df.iterrows():
    df.set_value(i, 'time', row['job.filename'][-10:-4])
    df.set_value(i, 'date', row['job.filename'][21:27])
But this didn't work
Cheers
I took your sample job.filename to create a dataframe and tried the following:
df = pd.DataFrame(['IMAT list 1-3609-0-20161214-092934.csv'])
df['date'] = df[0].str.extract(r'.*-\d+-(\d+)-\d+', expand=False)  # 0 is the column name, in your case job.filename
df['time'] = df[0].str.extract(r'.*-\d+-\d+-(\d+)', expand=False)
You get:
                                         0      date    time
0  IMAT list 1-3609-0-20161214-092934.csv  20161214  092934
This regex will work only if all the values follow this exact pattern.
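If the tail of the filename is always -YYYYMMDD-HHMMSS.csv, vectorized string slicing via the .str accessor is an alternative to the regex. A minimal sketch using the sample filename; the negative offsets assume that exact suffix layout:

import pandas as pd

df = pd.DataFrame({'job.filename': ['IMAT list 1-3609-0-20161214-092934.csv']})

# Slices are counted from the end of the string, so a variable-length
# prefix before the date does not break the extraction.
df['date'] = df['job.filename'].str[-19:-11]  # '20161214'
df['time'] = df['job.filename'].str[-10:-4]   # '092934'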
I have an array in the format [27.214 27.566] - there can be several numbers. Additionally I have a Datetime variable.
import time
from datetime import datetime
import numpy as np

now = datetime.now()
timestamp = now.strftime('%Y-%m-%d %H:%M:%S')  # renamed from datetime so the class is not shadowed
time.sleep(0.5)
agilent.write("MEAS:TEMP? (#101:102)")
values = np.fromstring(agilent.read(), dtype=float, sep=',')
The output from the array is [27.214 27.566]
Now I would like to write this into a dataframe with the following structure:
Datetime, FirstValueArray, SecondValueArray, ....
How can I do this? Every minute a new array is added to the dataframe.
I will assume you want to append a row to an existing dataframe df with appropriate columns: value1, value2, ..., lastvalue, datetime.
We can easily convert the array to a Series:
s = pd.Series(values)
Next, append the timestamp to the Series (Series.append returns a new Series rather than modifying s in place; it was removed in pandas 2.0 in favour of pd.concat):
s = s.append(pd.Series([timestamp]), ignore_index=True) cf Series.append
Now you have a series whose length matches df.columns. You want to convert that series to a dataframe to be able to use pd.concat:
df_to_append = s.to_frame().T
We need the transpose because Series.to_frame() returns a dataframe with the series as a single column, and we want a single row with multiple columns.
Before you concatenate, however, you need to make sure the column names of both dataframes match, or the concat will create additional columns:
df_to_append.columns = df.columns
Now we can concatenate our two dataframes:
df = pd.concat([df, df_to_append], ignore_index=True) cf pandas.concat
For further details, see the documentation
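Putting the pieces together, a minimal sketch of the whole flow under those assumptions (the instrument read is replaced by a hard-coded array, and the column names value1, value2, datetime are placeholders you would adapt):

import numpy as np
import pandas as pd
from datetime import datetime

df = pd.DataFrame(columns=['value1', 'value2', 'datetime'])

values = np.array([27.214, 27.566])  # stand-in for np.fromstring(agilent.read(), ...)
timestamp = datetime.now().strftime('%Y-%m-%d %H:%M:%S')

# Build a one-row frame: measured values first, timestamp last.
s = pd.Series(list(values) + [timestamp])
df_to_append = s.to_frame().T
df_to_append.columns = df.columns

df = pd.concat([df, df_to_append], ignore_index=True)

Running this once per reading (e.g. inside the measurement loop) appends one row per minute.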
I have a dataframe df as below.
I want the final dataframe to look as follows: for each unique Name, only the last 2 rows must be present in the final output.
I tried the following snippet, but it's not working.
df = df[df['Name']].tail(2)
Use GroupBy.tail:
df1 = df.groupby('Name').tail(2)
Just one more way to solve this using GroupBy.nth:
df1 = df.groupby('Name').nth([-1, -2])  # this will pick the last 2 rows of each group
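For illustration, a small self-contained example (the Name and Value columns are made up to mirror the question):

import pandas as pd

df = pd.DataFrame({
    'Name':  ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'Value': [1, 2, 3, 4, 5, 6, 7],
})

df1 = df.groupby('Name').tail(2)
#   Name  Value
# 1    A      2
# 2    A      3
# 5    B      6
# 6    B      7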
I have searched but not found exactly what I need. I have a dataframe which has 50 columns. The first one is of date dtype, the rest are float dtypes.
Now I want to convert ONLY the float columns into integer but NOT the date column.
Can someone guide please?
When I slice the df like this, df_sub1 = df_sub.iloc[:, 1:].apply(np.int64), and then concat with the date column afterwards, it crashes my laptop and therefore did not work. I hope there is a better way.
Well, assuming that date is your first column:
import pandas as pd
cols = df.columns
df[cols[1:]] = df[cols[1:]].apply(pd.to_numeric, errors='coerce')
You can do it like this:
new_df = df.drop('nameoffirstcolumn', axis=1)
new_df = new_df.apply(np.int64)
Then you can do something like:
final_df = pd.concat([df['nameoffirstcolumn'], new_df], axis=1)
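Another option that avoids touching the date column at all is to pick the float columns by dtype with select_dtypes. A minimal sketch with an invented two-row dataframe; it assumes the float columns contain no NaN values, since casting NaN to int raises:

import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2020-01-01', '2020-01-02']),  # invented sample data
    'a': [1.0, 2.0],
    'b': [3.5, 4.5],
})

# Select only the float columns, leaving the date column untouched.
float_cols = df.select_dtypes(include='float').columns
df[float_cols] = df[float_cols].astype('int64')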
I have read a csv file into a dataframe using Pandas; the csv format is as follows. I would like to put the rows whose time column falls within the interval 6/3/2011 to 10/20/2011 into another dataframe. How can I do this efficiently in Pandas?
Try this method:
data_frame['time'] = pd.to_datetime(data_frame['time'])
start_date, end_date = '2011-06-03', '2011-10-20'
select_rows = (data_frame['time'] >= start_date) & (data_frame['time'] <= end_date)
data_frame.loc[select_rows]
Or you can make the time column a DatetimeIndex and then select rows based on that as well.
I think you need to_datetime first and then filter by between with boolean indexing:
df['time'] = pd.to_datetime(df['time'], format='%m/%d/%Y')
df1 = df[df['time'].between('2011-06-03','2011-10-20')]
Create DatetimeIndex and select by loc:
df['time'] = pd.to_datetime(df['time'], format='%m/%d/%Y')
df = df.set_index('time')
df1 = df.loc['2011-06-03':'2011-10-20']
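A quick self-contained check of the between approach (the sample rows are invented, using the m/d/Y format from the question):

import pandas as pd

df = pd.DataFrame({'time': ['05/30/2011', '06/03/2011', '08/15/2011', '11/01/2011'],
                   'value': [1, 2, 3, 4]})
df['time'] = pd.to_datetime(df['time'], format='%m/%d/%Y')

df1 = df[df['time'].between('2011-06-03', '2011-10-20')]
# keeps only the 06/03/2011 and 08/15/2011 rows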
Is there a way in pandas to give the same column of a pandas dataframe two names, so that I can index the column by only one of the two names? Here is a quick example illustrating my problem:
import pandas as pd
index=['a','b','c','d']
# The list of tuples here is really just to
# somehow visualize my problem below:
columns = [('A','B'), ('C','D'),('E','F')]
df = pd.DataFrame(index=index, columns=columns)
# I can index like that:
df[('A','B')]
# But I would like to be able to index like this:
df[('A',*)] #error
df[(*,'B')] #error
You can create a multi-index column:
df.columns = pd.MultiIndex.from_tuples(df.columns)
Then you can do:
df.loc[:, ("A", slice(None))]
Or: df.loc[:, (slice(None), "B")]
Here slice(None) selects all labels at that level, so (slice(None), "B") picks every column whose second level is "B", regardless of the first-level name; it is the tuple equivalent of :. You can also write it with pandas' IndexSlice helper: df.loc[:, pd.IndexSlice[:, "B"]] for the second case.
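A short end-to-end sketch of the whole idea, reusing the index and column tuples from the question (the zero fill values are just placeholders):

import pandas as pd

index = ['a', 'b', 'c', 'd']
columns = pd.MultiIndex.from_tuples([('A', 'B'), ('C', 'D'), ('E', 'F')])
df = pd.DataFrame(0, index=index, columns=columns)

df.loc[:, ('A', slice(None))]      # all columns whose first level is 'A'
df.loc[:, (slice(None), 'B')]      # all columns whose second level is 'B'
df.loc[:, pd.IndexSlice[:, 'B']]   # same selection written with IndexSlice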