Read columns in dataframe - python-3.x

I have a dataframe like the one below:
Emplid,Name_x,Age_x,Name_y,Age_y
1,ABC,23,ABC,23
2,XYZ,24,PQR,24
I want to compare Name_x with Name_y and Age_x with Age_y. If they all match, I want to add a record saying all values matched (as in the 1st case); if any column doesn't match, it should say "<column_name> not matched for Emplid" (as in the 2nd case, where it should say "age didn't match for empl_id 2").
My column names change each time based on user input, so I currently capture them in a list and try to scan that list with a for loop, but it doesn't work properly.
Any leads on what approach or pseudocode I can use?

It seems like you're trying to compare the values in two columns.
df['new_column'] = np.where(df['Name_x']==df['Name_y'], 'matched names', 'no_match')
This assumes NumPy is imported as np. It adds a column that says 'matched names' where the names are equal and 'no_match' where they are not. You can wrap the same check in a loop to cover the other column pairs.
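One way to extend this to a changing list of columns is sketched below. It assumes every compared column comes as a `<name>_x` / `<name>_y` pair and that the base names (here `base_columns`) are collected from user input; the helper `match_report` and the output column `result` are hypothetical names chosen for illustration.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Emplid": [1, 2],
    "Name_x": ["ABC", "XYZ"],
    "Age_x": [23, 24],
    "Name_y": ["ABC", "PQR"],
    "Age_y": [23, 24],
})

# Hypothetical: base names supplied by the user at run time.
base_columns = ["Name", "Age"]


def match_report(row, base_columns, id_column):
    """Return one message per row listing every column pair that differs."""
    mismatched = [
        base for base in base_columns
        if row[f"{base}_x"] != row[f"{base}_y"]
    ]
    if not mismatched:
        return "all values matched"
    return "; ".join(
        f"{base} not matched for {id_column} {row[id_column]}"
        for base in mismatched
    )


# apply(axis=1) calls match_report once per row.
df["result"] = df.apply(match_report, axis=1, args=(base_columns, "Emplid"))
print(df[["Emplid", "result"]])
```

With the sample data, row 1 reports that all values matched and row 2 reports the Name pair, since XYZ and PQR differ while both ages are 24. Missing values compare as unequal to each other here, so rows with blanks in both columns would be reported as mismatches unless handled separately.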

Related

How do I drop complete rows (including all values in it) that contain a certain value in my Pandas dataframe?

I'm trying to write a Python script that finds unique values (names) and reports how often each occurs, using the pandas library. There are around 90 unique names in total, which I've anonymised in the head of the dataframe pasted below.
,1,2,3,4,5
0,monday09-01-2022,tuesday10-01-2022,wednesday11-01-2022,thursday12-01-2022,friday13-01-2022
1,Anonymous 1,Anonymous 1,Anonymous 1,Anonymous 1,
2,Anonymous 2,Anonymous 4,Anonymous 5,Anonymous 5,Anonymous 5
3,Anonymous 3,Anonymous 3,,Anonymous 6,Anonymous 3
4,,,,,
I'm trying to drop any row (the full row) that matches the regular expression "^monday.*", meaning the word "monday" followed by any number of other characters. I want to drop every cell/value within that row.
To achieve this goal, I've tried using the line of code below (and many other approaches I found on SO).
df = df[df[1].str.contains("^monday.*", case = True, regex=True) == False]
To clarify, I'm trying to search the values of column "1" for the pattern "^monday.*" and then deselect the matching rows along with all values in them. I've successfully removed "monday09-01-2022", "tuesday10-01-2022", etc., but I'm also losing random names that are not in the matching rows.
Any help would be very much appreciated! Thank you!
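A minimal sketch of one possible cause and fix, assuming the extra rows being lost are ones where column 1 is empty: `str.contains` returns NaN for missing values, and `NaN == False` evaluates to False, so those rows are filtered out together with the matches. Passing `na=False` and negating the mask with `~` keeps them. The two-column frame below is a cut-down, hypothetical version of the data in the question.

```python
import pandas as pd

# Cut-down version of the question's data; the last row is empty.
df = pd.DataFrame({
    1: ["monday09-01-2022", "Anonymous 1", "Anonymous 2", "Anonymous 3", None],
    2: ["tuesday10-01-2022", "Anonymous 1", "Anonymous 4", "Anonymous 3", None],
})

# na=False makes missing cells count as "no match" instead of NaN.
mask = df[1].str.contains(r"^monday", case=True, regex=True, na=False)

# ~mask keeps every row that did not match, including rows with empty cells.
cleaned = df[~mask]
print(cleaned)
```

Only the header-like row starting with "monday" is removed; the empty row can then be dropped explicitly with `dropna(how="all")` if that is wanted.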

Comparing two columns and their values and outputting the greater value

I'm trying to compare two columns ("Shows") from different tables and show, in another table, which one has the greater number ("Rating") associated with it.
Ignore the operation column above as part of the solution I'm trying to get; it's just there to illustrate what I'm trying to compare.
Important note: if the names are duplicated, compare the matching pairs in their corresponding order (1st with 1st, 2nd with 2nd, 3rd with 3rd, etc.), as illustrated in the table below:
Thanks
You can try the following in cell F3 for an array solution that spills the entire result at once:
=LET(sA, A3:A6, rA, B3:B6, sB, C3:C6, rB, D3:D6, CNTS, LAMBDA(x,
LET(seq, SEQUENCE(ROWS(x)), MAP(seq, LAMBDA(s,ROWS(FILTER(x,(x=INDEX(x,s))
*(seq<=s))))))), cntsA, CNTS(sA), cntsB, CNTS(sB), eval, MAP(sA, rA, cntsA,
LAMBDA(s,r,c,IF(r > FILTER(rB, (sB=s) * (cntsB=c)), "Table 1", "Table 2"))),
HSTACK(sA, eval))
Here is the output:
Explanation
The main idea is to count repeated show values. We define a user LAMBDA function, CNTS, to avoid writing the same formula twice. Once we have the counts (cntsA, cntsB), we use MAP to iterate over the Table 1 elements together with their counts, and look up the matching show and count in the Table 2 columns. The FILTER function always returns a single value (based on the sample data). Finally, we prepare the output as expected using HSTACK.
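For readers working in pandas rather than Excel, the same occurrence-counting idea can be sketched as follows. The table contents are hypothetical, since the original sample data was only shown in an image, and the column name `occ` is an illustrative choice. Ties go to "Table 2", mirroring the IF in the formula.

```python
import numpy as np
import pandas as pd

# Hypothetical sample tables; "A" appears twice in each.
t1 = pd.DataFrame({"Show": ["A", "B", "A", "C"], "Rating": [5, 3, 2, 4]})
t2 = pd.DataFrame({"Show": ["A", "A", "B", "C"], "Rating": [4, 6, 3, 1]})

# Number each repeat of a show: 0 for the 1st occurrence, 1 for the 2nd, ...
t1["occ"] = t1.groupby("Show").cumcount()
t2["occ"] = t2.groupby("Show").cumcount()

# Pair the n-th occurrence in Table 1 with the n-th occurrence in Table 2.
merged = t1.merge(t2, on=["Show", "occ"], how="left", suffixes=("_1", "_2"))

# Strictly greater rating in Table 1 wins; otherwise report Table 2.
merged["Greater"] = np.where(
    merged["Rating_1"] > merged["Rating_2"], "Table 1", "Table 2"
)
print(merged[["Show", "Greater"]])
```

`cumcount` plays the role of CNTS here, and the merge on both the show and its occurrence number replaces the FILTER lookup.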
Try:
=IF(INDEX(FILTER($B$3:$B$6,$A$3:$A$6=G3),COUNTIFS($G$3:$G3,G3))>INDEX(FILTER($E$3:$E$6,$D$3:$D$6=G3),COUNTIFS($G$3:$G3,G3)),"Table-1","Table-2")

Create a text cell value based on row entries and corresponding columns

I realize this is an awkward way to word my problem, so please bear with me.
I want to create a column called Orders whose cells are built from the corresponding item values.
So if I have the columns FlatNo, Truffle, Pineapple, Mango, and Chocochips, I want to create a column called Orders which has the value:
FlatNo - A51
Mango - 1
Chocochips - 1
(if there are no values in the Pineapple and Truffle columns, they do not show up in the Orders column)
See image
How do I do that? Thank you in advance.
You can use IF and &. The & operator simply concatenates the pieces you want into one string.
The following formula should give you the result for the Orders column. I have put the quantity of each item ordered in parentheses before the item name.
="Flat No. "&A2&IF(ISBLANK(B2),"","-("&B2&")"&$B$1)&IF(ISBLANK(C2),"","-("&C2&")"&$C$1)&IF(ISBLANK(D2),"","-("&D2&")"&$D$1)&IF(ISBLANK(E2),"","-("&E2&")"&$E$1)
For instance the third order is shown like this: Flat No. E-23-(1)Truffle -1 Pc Rs 60-(3)Mango -1 Pc Rs 60
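The same "skip blanks, join the rest" logic can also be sketched in pandas, for anyone handling the sheet outside Excel. The rows below are hypothetical (the original data was only shown in an image); the column names come from the question and the helper `build_order` is an illustrative name. The output follows the layout asked for in the question rather than the formula's parenthesised style.

```python
import pandas as pd

# Hypothetical rows; None marks an item that was not ordered.
orders = pd.DataFrame({
    "FlatNo": ["A51", "B12"],
    "Truffle": [None, 2],
    "Pineapple": [None, None],
    "Mango": [1, None],
    "Chocochips": [1, 1],
})
item_columns = ["Truffle", "Pineapple", "Mango", "Chocochips"]


def build_order(row):
    """Join the flat number with every item that has a quantity."""
    parts = [f"FlatNo - {row['FlatNo']}"]
    for item in item_columns:
        quantity = row[item]
        if pd.notna(quantity):  # blank cells are skipped entirely
            parts.append(f"{item} - {int(quantity)}")
    return "\n".join(parts)


orders["Orders"] = orders.apply(build_order, axis=1)
print(orders["Orders"].iloc[0])
```

For the first row this prints the flat number followed by the Mango and Chocochips lines only, matching the example in the question.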

Compare two columns and output new column based on order of reference column

I'm trying to compare two columns (lists) that contain the same IDs, just in a different order. I want to take the first column's order as the reference, compare it to the second column, and then write the second column's values into a new column (or list) arranged in the first column's order. From there I can pull the corresponding columns that match the order of the first column (price, demographic, etc.).
Input:
First column (reference column):
12321
12323
324214
32313452
1232132
fs2421
sfasrfas
asfasd
Second column (re-order necessary):
12321
sfasrfas
12323
324214
1232132
fs2421
asfasd
32313452
I have tried writing a for loop in Python with two separate lists of column IDs, as well as INDEX/MATCH in Excel, but I can only seem to output the 'matching' IDs.
Excel
=INDEX($A$2:$A$589,MATCH(C2,$A$2:$A$589,0),2)
Python
## set up empty lists and extract the matched values from both lists made above ##
matched_IDs = []
unique_IDs = []
for Part_No in updated_2_list:
    if Part_No in updated_1_list:
        matched_IDs.append(Part_No)
    else:
        # Part_No appears only in updated_2_list
        unique_IDs.append(Part_No)
print(matched_IDs)
print(len(matched_IDs))
I expect to match the order of first column in new column (or list).
Output:
Third column (new column after second column was re-ordered)
12321
12323
324214
32313452
1232132
fs2421
sfasrfas
asfasd
Do you mean something like this?
=INDEX(C:C,MATCH(A1,C:C,0))
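On the Python side, a minimal sketch of the same lookup walks the reference list and keeps each ID that also exists in the second list. The variable names `reference`, `second`, `reordered`, and `row_order` are illustrative; the IDs are the ones from the question.

```python
# IDs from the question: the reference order and the column to re-order.
reference = ["12321", "12323", "324214", "32313452",
             "1232132", "fs2421", "sfasrfas", "asfasd"]
second = ["12321", "sfasrfas", "12323", "324214",
          "1232132", "fs2421", "asfasd", "32313452"]

# Map each ID in the second column to its current row position.
position = {part_no: i for i, part_no in enumerate(second)}

# Walk the reference order; keep IDs that exist in the second column.
reordered = [part_no for part_no in reference if part_no in position]

# Row positions in the second table, arranged in reference order.
# These can be used to pull other columns (price, demographic, ...).
row_order = [position[part_no] for part_no in reordered]

# IDs in the reference column with no partner in the second column.
missing = [part_no for part_no in reference if part_no not in position]

print(reordered)
print(row_order)
```

Iterating over the reference list, rather than over the second list as in the attempt above, is what makes the result follow the first column's order.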

Using tbl.Lookup to match just part of a column value

This question relates to the Schematiq add-in for Microsoft Excel.
I am using =tbl.Lookup(table, columnsToSearch, valuesToFind, resultColumn, [defaultValue]). The values in the valuesToFind column have a consistent 3 characters on the left followed by varying characters (e.g. 908-123456 or 908-321654, i.e. 908 is always the same).
How can I tell the function to look up the value based on the first 3 characters only? The expected answer should be the sum of the results for the codes above, i.e. 500 + 300 = 800.
tbl.Lookup() works by looking for an exact match - this helps ensure it's fast but in this case it means you need an extra step to calculate a column of lookup values, something like this:
A2: =tbl.CalculateColumn(A1, "code", "x => LEFT(x, 3)", "startOfCode")
This will give you a new column that you can use for the columnsToSearch argument, however tbl.Lookup() also looks for just one match - it doesn't know how to combine values together if there is more than one matching row in the table, so I think you also need one more step to group your table by the first 3 chars of the code, like this:
A3: =tbl.Group(A2, "startOfCode", "amount")
Because tbl.Group() adds values together by default, this will give you a table with a row for each distinct value of startOfCode and the subtotal of amount for each of those values. Finally, you can do the lookup exactly as you requested, which for your input table will return 800:
A4: =tbl.Lookup(A3, "startOfCode", "908", "amount")
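The three steps (derive a prefix column, group and sum, look up one key) are not specific to Schematiq. As a rough illustration only, here is how they might look in pandas. The column names `code`, `amount`, and `startOfCode` follow the answer; the first two rows use the codes and amounts from the question, and the third row is hypothetical, added so the grouping has more than one key.

```python
import pandas as pd

# First two rows follow the question; the third is a made-up extra row.
table = pd.DataFrame({
    "code": ["908-123456", "908-321654", "777-000001"],
    "amount": [500, 300, 50],
})

# Step 1: calculate a column holding the first 3 characters of each code.
table["startOfCode"] = table["code"].str[:3]

# Step 2: group by that prefix and add the amounts together.
subtotals = table.groupby("startOfCode")["amount"].sum()

# Step 3: look up the subtotal for one prefix, with 0 as the default.
result = subtotals.get("908", 0)
print(result)
```

Grouping before the lookup is what turns several matching rows into a single value, which is the same reason tbl.Group() is needed ahead of tbl.Lookup().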
