I am trying to add a new column to a pandas DataFrame, then update the column's value row by row:
my_df['col_A'] = 0
for index, row in my_df.iterrows():
    my_df.loc[index]['col_A'] = 100  # value here changes in real case
    print(my_df.loc[index]['col_A'])
my_df
However, in the printout, all values in col_A are still 0. Why is that? What did I miss? Thanks!
You are assigning to a slice in this line: my_df.loc[index]['col_A'] = 100. The chained indexing my_df.loc[index][...] may operate on a copy, so the assignment never reaches the original frame.
Instead do
my_df['col_A'] = 0
for index, row in my_df.iterrows():
    my_df.loc[index, 'col_A'] = 100  # value here changes in real case
    print(my_df.loc[index, 'col_A'])
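A minimal sketch of the fix on a throwaway frame (the column contents here are made up):

```python
import pandas as pd

# Hypothetical stand-in for my_df
my_df = pd.DataFrame({'col_B': [1, 2, 3]})
my_df['col_A'] = 0

for index, row in my_df.iterrows():
    # a single .loc[row, col] call writes straight into the frame,
    # unlike the chained my_df.loc[index]['col_A'] form
    my_df.loc[index, 'col_A'] = 100

print(my_df['col_A'].tolist())  # [100, 100, 100]
```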
Related
I want to populate all rows in a DataFrame column with the value from one cell in a separate DataFrame. Both dfs are based on data read in from the same CSV.
data_description = pd.read_csv('file.csv', nrows=1)
# this is two rows of data: one row of column headers and one row of values.
# The value I want to use later is under the header "average duration"
data_table = pd.read_csv('file.csv', skiprows=3)
# this is a multi-row data table located directly below the description. I want to add
# a "duration" column with all rows populated by "average duration" from above.
df1 = pd.DataFrame(data_description)
df2 = pd.DataFrame(data_table)
df2['duration'] = df1['average duration']
The final line only works for the first row in the column. How can I extend it down all rows?
If I directly assign the 'average duration' value it works, e.g. df2['duration'] = 60, but I want it to be dynamic.
You have to extract the value from df1 and then assign the value to df2. What you're assigning is a Series, not the value.
data_description = pd.read_csv('file.csv', nrows=1)
data_table = pd.read_csv('file.csv', skiprows=3)
df1 = pd.DataFrame(data_description)
df2 = pd.DataFrame(data_table)
df2['duration'] = df1['average duration'].iloc[0]
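A toy sketch of the scalar-versus-Series point, with made-up stand-ins for the two frames:

```python
import pandas as pd

# Hypothetical stand-ins for the frames read from the CSV
df1 = pd.DataFrame({'average duration': [60]})
df2 = pd.DataFrame({'item': ['a', 'b', 'c']})

# df1['average duration'] is a Series; .iloc[0] extracts the scalar,
# which then broadcasts to every row of df2
df2['duration'] = df1['average duration'].iloc[0]
print(df2['duration'].tolist())  # [60, 60, 60]
```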
I have a DataFrame (t_df_noNA) with over 4000 columns. I wanted to select all columns having at least one value over 70.
I am using the following code to extract columns having values above the cutoff (70):
filtertdf = t_df_noNA[t_df_noNA.columns[(t_df_noNA > 70).any()]]
The code works, but it selects columns having at least one value above 70.
Now I have to select columns that have at least 4 values over 70. How do I handle that?
Here's a way to do it: count the values above 70 in each column, then keep columns where that count is at least 4.
cols = t_df_noNA.columns[(t_df_noNA > 70).sum().ge(4)].tolist()
filtertdf = t_df_noNA[cols]
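A small check of the counting idea on hypothetical data, where only one column has four values over the cutoff:

```python
import pandas as pd

# 'a' has four values over 70, 'b' only two, 'c' none (made-up data)
t_df_noNA = pd.DataFrame({'a': [71, 80, 90, 75, 1],
                          'b': [71, 80, 1, 1, 1],
                          'c': [1, 1, 1, 1, 1]})

# (t_df_noNA > 70).sum() counts values above 70 per column;
# .ge(4) keeps columns with at least four such values
cols = t_df_noNA.columns[(t_df_noNA > 70).sum().ge(4)].tolist()
print(cols)  # ['a']
```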
From this data frame I would like to select rows with the same concentration and almost the same name. For example, the first three rows have the same concentration and the same name, except the names end in Dig_I, Dig_II, Dig_III. I would like to select these three rows, take the mean value of each column, and then create a new data frame.
Here is the whole data frame:
import pandas as pd
df = pd.read_csv("https://gist.github.com/akash062/75dea3e23a002c98c77a0b7ad3fbd25b.js")
import pandas as pd
df = pd.read_csv("https://gist.github.com/akash062/75dea3e23a002c98c77a0b7ad3fbd25b.js")
new_df = df.groupby('concentration').mean()
Note: this will only find the averages for columns with dtype float or int; it will drop the img_name column and take the averages of all numeric columns.
This may be faster...
df = pd.read_csv("https://gist.github.com/akash062/75dea3e23a002c98c77a0b7ad3fbd25b.js").groupby('concentration').mean()
If you would like to preserve the img_name...
df = pd.read_csv("https://gist.github.com/akash062/75dea3e23a002c98c77a0b7ad3fbd25b.js")
new = df.groupby('concentration').mean()
pd.merge(df, new, on='concentration', how='inner')
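A toy check of the groupby-mean plus merge idea, using hypothetical column names in place of the gist data:

```python
import pandas as pd

# Made-up frame mimicking the structure: an img_name column plus numeric columns
df = pd.DataFrame({'img_name': ['Dig_I', 'Dig_II', 'Dig_III', 'Other'],
                   'concentration': [10, 10, 10, 20],
                   'value': [1.0, 2.0, 3.0, 4.0]})

# numeric_only=True averages just the numeric columns per concentration
means = df.groupby('concentration', as_index=False).mean(numeric_only=True)
print(means['value'].tolist())  # [2.0, 4.0]

# merging back on 'concentration' keeps img_name alongside the group mean
merged = df[['img_name', 'concentration']].merge(means, on='concentration')
```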
Does that help?
I'm getting the error:
ValueError: Wrong number of items passed 3, placement implies 1.
What I want to do is import a dataset, count the duplicated values, drop the duplicates, and add a column that says there were x duplicates of that number.
This is to sort a dataset of 13,000 rows and 45 columns.
I've tried different solutions found online, but nothing seems to help. I'm pretty new to programming and all help is really appreciated.
import pandas as pd

# Making file ready
data = pd.read_excel(r'Some file.xlsx', header=0)
data.rename(columns={'Dato': 'Last ordered', 'ArtNr': 'Item No:'}, inplace=True)

# Formatting dates
data['Last ordered'] = pd.to_datetime(data['Last ordered'], format='%Y-%m-%d %H:%M:%S')

# Creates new table content and order
df = data[['Item No:', 'Last ordered', 'Description']]
df['Last ordered'] = df['Last ordered'].dt.strftime('%Y-/%m-/%d')
df = df.sort_values('Last ordered', ascending=False)

# Adds total sold quantity column
df['Quantity'] = df.groupby('Item No:').transform('count')
df2 = df.drop_duplicates('Item No:').reset_index(drop=True)

# Prints to environment and creates new excel file
print(df2)
df2.to_excel(r'New Sorted File.xlsx')
I expect it to provide a new excel file with columns:
Item No | Last ordered | Description | Quantity
And I want to be able to add other columns from the original dataset as well if I need to later on.
The problem is at this line:
df['Quantity'] = df.groupby('Item No:').transform('count')
The right-hand side of the assignment is a DataFrame, and you are trying to fit it into a single column. You need to select only one of its columns. Something like
df['Quantity'] = df.groupby('Item No:').transform('count')['Description']
should work.
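Selecting the column before the transform is equivalent; a minimal sketch with made-up order data:

```python
import pandas as pd

# Hypothetical order data with duplicated item numbers
df = pd.DataFrame({'Item No:': ['A', 'A', 'B', 'A', 'B'],
                   'Description': ['x', 'x', 'y', 'x', 'y']})

# transform('count') on a single column returns one Series, aligned
# with the original rows, so it fits into a new column
df['Quantity'] = df.groupby('Item No:')['Description'].transform('count')

df2 = df.drop_duplicates('Item No:').reset_index(drop=True)
print(df2['Quantity'].tolist())  # [3, 2]
```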
I have a df that looks like this
data.answers.1542213647002.subItemType data.answers.1542213647002.value.1542213647003
thank you for the response TRUE
How do I slice out only the names of columns whose name contains the string .value. and whose values include TRUE, into a new df like so?:
new_df
old_column_names
data.answers.1542213647002.value.1542213647003
I have roughly 100 more columns with .value. in the name, but not all of them have TRUE among their values.
Assume this sample df:
df = pd.DataFrame({'col': [1, 2] * 5,
                   'col2.value.something': [True, False] * 5,
                   'col3.value.something': [5] * 10,
                   'col4': [True] * 10})
then
# boolean indexing with stack; regex=False treats '.value.' literally
# (with the default regex=True, each '.' would match any character)
new = pd.DataFrame(list(df[((df == True) & (df.columns.str.contains('.value.', regex=False)))].stack().index))
# drop duplicates
new = new.drop(columns=0).drop_duplicates()
1
0 col2.value.something
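An equivalent sketch that filters the column names directly instead of using stack, on the same sample df:

```python
import pandas as pd

# The sample frame from above
df = pd.DataFrame({'col': [1, 2] * 5,
                   'col2.value.something': [True, False] * 5,
                   'col3.value.something': [5] * 10,
                   'col4': [True] * 10})

# keep names containing the literal '.value.' whose column holds a True
value_cols = df.columns[df.columns.str.contains('.value.', regex=False)]
hits = [c for c in value_cols if (df[c] == True).any()]

new_df = pd.DataFrame({'old_column_names': hits})
print(new_df['old_column_names'].tolist())  # ['col2.value.something']
```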