I'm trying to do something like this:
cur.executemany("UPDATE tableA SET Col1 = 'S' WHERE Col2 = %s AND Col3 = %s", data[:][0], data[:][4])
where "data" is a list. I need to run an UPDATE for each row in my list (data). For each row, Col2 should match element 0 and Col3 should match element 4.
You need to transform the list before passing it to executemany, extracting the elements you need, perhaps like this (keep whichever placeholder style your driver expects, %s or ?):
cur.executemany("UPDATE tableA SET Col1 = 'S' WHERE Col2 = ? AND Col3 = ?",
                [(row[0], row[4]) for row in data])
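A self-contained sketch of the fix, using sqlite3 (hence the ? placeholder) and an illustrative one-row table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tableA (Col1 TEXT, Col2 TEXT, Col3 TEXT)")
cur.execute("INSERT INTO tableA VALUES ('N', 'a', 'b')")

# element 0 maps to Col2, element 4 maps to Col3
data = [('a', 1, 2, 3, 'b')]
cur.executemany("UPDATE tableA SET Col1 = 'S' WHERE Col2 = ? AND Col3 = ?",
                [(row[0], row[4]) for row in data])
conn.commit()
print(cur.execute("SELECT Col1 FROM tableA").fetchone())  # → ('S',)
```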
I have a data frame as follows:
df = pd.DataFrame()
df['col1'] = [2,2,3,4,5]
df['col2'] = [2,2,2,2,2]
I want to create a new column, col3, which combines col1 and col2 so that col2 is rendered as a superscript of col1. Below is the expected output for col3 (image omitted).
How can I achieve this?
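The expected-output image is missing, but one reading is that col3 should show col1 with col2 appended as a Unicode superscript; a purely illustrative sketch of that idea:

```python
import pandas as pd

# Map ordinary digits to their Unicode superscript forms.
SUPERSCRIPTS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")

df = pd.DataFrame({'col1': [2, 2, 3, 4, 5], 'col2': [2, 2, 2, 2, 2]})
df['col3'] = (df['col1'].astype(str)
              + df['col2'].astype(str).str.translate(SUPERSCRIPTS))
print(df['col3'].tolist())  # → ['2²', '2²', '3²', '4²', '5²']
```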
I have a pandas dataframe with 235607 records and 94 attributes. I am very new to Python. I was able to create a correlation matrix between all of the attributes, but it is a lot to look through individually. I tried writing a for loop to print a list of the columns with a correlation greater than 80%, but I keep getting the error "'DataFrame' object has no attribute 'c1'".
This is the code I used to create the correlation between the attributes, as well as the sample for loop. Thank you in advance for your help:
corr = data.corr() # data is the pandas dataframe
c1 = corr.abs().unstack()
c1.sort_values(ascending = False)
drop = [cols for cols in upper.c1 if any (upper[c1] > 0.80)]
drop
Sort in place if you need to keep using the same variable c1, then grab the column-name pairs with a list comprehension over the index:
c1.sort_values(ascending=True, inplace=True)
columns_above_80 = [(col1, col2) for col1, col2 in c1.index if c1[col1,col2] > 0.8 and col1 != col2]
Edit: added col1 != col2 to the list comprehension so you don't grab the auto-correlation.
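As a runnable version of the idea above (with a small illustrative frame in place of the question's 94-column data):

```python
import pandas as pd

# Small illustrative frame; the real data in the question has 94 columns.
data = pd.DataFrame({'a': [1, 2, 3, 4],
                     'b': [1, 2, 3, 5],   # strongly correlated with 'a'
                     'c': [4, 1, 3, 2]})

corr = data.corr()
c1 = corr.abs().unstack()
c1.sort_values(ascending=True, inplace=True)

pairs = [(col1, col2) for col1, col2 in c1.index
         if c1[col1, col2] > 0.8 and col1 != col2]
print(pairs)  # both orderings of the ('a', 'b') pair survive
```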
You can simply build a boolean mask, like this:
mask = (corr.abs() > 0.8) & (corr.abs() < 1.0)  # < 1.0 drops the self-correlations
corr.columns[mask.any()]
The output is an Index with the names of the columns that have an absolute correlation greater than 0.8 with some other column.
EDIT: I hope this will work; I edited the code above a little.
Given a Data Frame like the following:
df = pd.DataFrame({'term' : ['analys','applic','architectur','assess','item','methodolog','research','rs','studi','suggest','test','tool','viewer','work'],
'newValue' : [0.810419, 0.631963 ,0.687348, 0.810554, 0.725366, 0.742715, 0.799152, 0.599030, 0.652112, 0.683228, 0.711307, 0.625563, 0.604190, 0.724763]})
df = df.set_index('term')
print(df)
newValue
term
analys 0.810419
applic 0.631963
architectur 0.687348
assess 0.810554
item 0.725366
methodolog 0.742715
research 0.799152
rs 0.599030
studi 0.652112
suggest 0.683228
test 0.711307
tool 0.625563
viewer 0.604190
work 0.724763
I want to add n new empty columns (each filled with the empty string "").
Therefore, I have a value stored in variable n which indicates the number of required new columns.
n = 5
Thanks for your help in advance!
According to this answer,
Each non-empty DataFrame has columns, an index and some values.
So your dataframe cannot have a column without a name anyway.
This is the shortest way that I know of to achieve your goal:
n = 5
for i in range(n):
    df[len(df.columns)] = ""
newValue 1 2 3 4 5
term
analys 0.810419
applic 0.631963
architectur 0.687348
assess 0.810554
item 0.725366
methodolog 0.742715
research 0.799152
rs 0.599030
studi 0.652112
suggest 0.683228
test 0.711307
tool 0.625563
viewer 0.604190
work 0.724763
IIUC, you can use:
n = 5
df = (pd.concat([df, pd.DataFrame(columns=['col' + str(i) for i in range(n)])],
                axis=1, sort=False)
        .fillna(''))
print(df)
            newValue col0 col1 col2 col3 col4
analys 0.810419
applic 0.631963
architectur 0.687348
assess 0.810554
item 0.725366
methodolog 0.742715
research 0.799152
rs 0.599030
studi 0.652112
suggest 0.683228
test 0.711307
tool 0.625563
viewer 0.604190
work 0.724763
Note: You can remove the fillna() if you want NaN.
I have a data frame with two columns - col1 and col2, but when I use df.plot.barh, the plot returns results in col2 and col1 order. Is there a way to get the plot to display results in col1 and col2 order?
df = pd.DataFrame(np.random.randint(0,10,(5,2)), columns=['col1','col2'])
df.plot.barh()
will yield this (plot image omitted):
Or, using bar():
df = pd.DataFrame(np.random.randint(0,10,(5,2)), columns=['col1','col2'])
df.plot.bar()
(Plot image omitted.) In both instances, col1 is first in that it is closest to the x axis. To reverse the order of the columns, you need to reverse the order in which they appear in your dataframe. For just two columns you can use:
df = df[df.columns[::-1]]
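A quick check of the reversal (random data, so only the column order matters here):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 10, (5, 2)), columns=['col1', 'col2'])
df = df[df.columns[::-1]]       # reverse the column order before plotting
print(df.columns.tolist())      # → ['col2', 'col1']
```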
COL1   COL2            COL3
Hi     T_M12345678     T_455462
       T_M12345670     T_M12345678
bye    T_M123456781    T_M12345670
       T_M123          T_M589646
       T_M894545       T_M123456781
                       T_M418554651
                       T_M4546565
I need to compare COL2 and COL3; if a match is found, I then need to check COL1 for that matching row, and depending on whether COL1 has a value, print one of the results described in the scenarios below in COL4.
For example:
Scenario 1:
The value T_M12345678 is present in both COL2 and COL3, so a match is found. I then check whether there is a value in COL1 for this entry in COL2; in this case there is ("Hi" is the value in COL1), so I should print TRUE in COL4.
Scenario 2:
The value T_M12345670 is present in both COL2 and COL3, so a match is found. I then check whether there is a value in COL1 for this entry in COL2; in this case there is not, so I should print TRUE1 in COL4.
Scenario 3:
The value T_M589646 in COL3 is not present in COL2, so I should print FALSE in COL4.
Since you did not post the expected outcome, I created two additional columns (one checking the values in COL2, the other the values in COL3). The following formulas work as you defined:
COL2 value check:
=IFERROR(IF(AND(MATCH(B2,$C$2:$C$8,0),ISBLANK(A2)),"TRUE1","TRUE"),"FALSE")
COL3 value check:
=IFERROR(IF(AND(MATCH(C2,$B$2:$B$8,0),ISBLANK(A2)),"TRUE1","TRUE"),"FALSE")
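For reference, here is a pandas sketch of the COL2-side check. The placement of the blank cells follows my reading of the table in the question, so treat the frame below as illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "COL1": ["Hi", None, "bye", None, None, None, None],
    "COL2": ["T_M12345678", "T_M12345670", "T_M123456781",
             "T_M123", "T_M894545", None, None],
    "COL3": ["T_455462", "T_M12345678", "T_M12345670", "T_M589646",
             "T_M123456781", "T_M418554651", "T_M4546565"],
})

col3_values = set(df["COL3"].dropna())

def check(row):
    # COL2-side check: match the COL2 value against COL3, then look at COL1.
    if row["COL2"] in col3_values:
        return "TRUE" if pd.notna(row["COL1"]) else "TRUE1"
    return "FALSE"

df["COL4"] = df.apply(check, axis=1)
print(df["COL4"].tolist())
# → ['TRUE', 'TRUE1', 'TRUE', 'FALSE', 'FALSE', 'FALSE', 'FALSE']
```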