Creating a derived field based on df value comparison in python pandas

I have two dataframes: one is a data source dataframe and the other is a reference dataframe.
I want to create an additional column in df1 based on a comparison of the two dataframes.
df1 - data source
No | Name
213344 | Apple
242342 | Orange
234234 | Pineapple
df2 - reference table
RGE_FROM | RGE_TO | Value
2100 | 2190 | Sweet
2200 | 2322 | Bitter
2400 | 5000 | Neutral
Final result
If the first 4 characters of df1.No fall within the range df2.RGE_FROM to df2.RGE_TO, take df2.Value for the derived column df1.DESC; otherwise leave it blank.
No | Name | DESC
213344 | Apple | Sweet
242342 | Orange | Neutral
234234 | Pineapple |
Any help is appreciated!
Thank you!

We can build an IntervalIndex from the columns RGE_FROM and RGE_TO and set it as the index of the Value column to create a mapping series. Then we slice the first four characters of the column No, convert them to integers, and use Series.map to substitute the values from the mapping series.
import pandas as pd
# closed='both' makes RGE_FROM and RGE_TO inclusive bounds
i = pd.IntervalIndex.from_arrays(df2['RGE_FROM'], df2['RGE_TO'], closed='both')
df1['Value'] = df1['No'].astype(str).str[:4].astype(int).map(df2.set_index(i)['Value'])
No Name Value
0 213344 Apple Sweet
1 242342 Orange Neutral
2 234234 Pineapple NaN
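For reference, here is a self-contained sketch of the same IntervalIndex approach, using the sample data from the question. It assumes the ranges in df2 do not overlap. The names intervals, mapping and prefix are only illustrative, and writing the result to a DESC column with blanks instead of NaN is an adaptation to the question's wording, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"No": [213344, 242342, 234234],
                    "Name": ["Apple", "Orange", "Pineapple"]})
df2 = pd.DataFrame({"RGE_FROM": [2100, 2200, 2400],
                    "RGE_TO": [2190, 2322, 5000],
                    "Value": ["Sweet", "Bitter", "Neutral"]})

# One inclusive interval per reference row (assumed non-overlapping).
intervals = pd.IntervalIndex.from_arrays(df2["RGE_FROM"], df2["RGE_TO"], closed="both")
mapping = df2.set_index(intervals)["Value"]

# First four digits of No as integers, e.g. 213344 -> 2133.
prefix = df1["No"].astype(str).str[:4].astype(int)

# Prefixes that fall in no interval map to NaN; replace those with a blank.
df1["DESC"] = prefix.map(mapping).fillna("")
print(df1)
```

Dropping the final fillna("") keeps NaN for unmatched rows, as in the output shown above.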

Related

How to merge two rows in the same dataframe

I have a data frame which contains two rows. The value in the column "ID" for both these rows is the same. How can I create a new data frame and bring all the values in both the rows into one row, but in separate columns?
For example, if the input data frame has a column called "Amount" in both rows, the new data frame should contain one row with two different columns, Amount_1 and Amount_2.
groupby does not work, as I do not want all the information in the same columns.
I cannot merge, as the rows do not come from two different data frames.
Turn:
+----+--------+------+-------+
| ID | Amount | Name | State |
+----+--------+------+-------+
| 1  | 16     | A    | CA    |
| 2  | 32     | B    | GA    |
| 2  | 64     | C    | NY    |
+----+--------+------+-------+
into:
+----+----------+----------+--------+--------+---------+---------+
| ID | Amount_1 | Amount_2 | Name_1 | Name_2 | State_1 | State_2 |
+----+----------+----------+--------+--------+---------+---------+
| 1  | 16       |          | A      |        | CA      |         |
| 2  | 32       | 64       | B      | C      | GA      | NY      |
+----+----------+----------+--------+--------+---------+---------+

Add a column that will contain the column names of the new DataFrame by using cumcount. After that, use pivot:
df['amountnr'] = 'Amount_' + df.groupby('ID').cumcount().add(1).astype(str)
df.pivot(index='ID', columns= 'amountnr', values='Amount')
#amountnr Amount_1 Amount_2
#ID
#1 16.0 NaN
#2 32.0 64.0
Edit
With your new specifications, I feel you should really use a MultiIndex, like so:
df['cumcount'] = df.groupby('ID').cumcount().add(1)
df2 = df.set_index(['ID', 'cumcount']).unstack()
# Amount Name State
#cumcount 1 2 1 2 1 2
#ID
#1 16.0 NaN A NaN CA NaN
#2 32.0 64.0 B C GA NY
If you insist, you can always join the levels of the MultiIndex columns afterwards:
df2.columns = ['_'.join([coltype, str(count)]) for coltype, count in df2.columns.values]
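Putting the cumcount, unstack and column-joining steps together, a minimal end-to-end sketch on the question's sample data could look like this. The variable name wide and the final reset_index() are illustrative choices, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"ID": [1, 2, 2],
                   "Amount": [16, 32, 64],
                   "Name": ["A", "B", "C"],
                   "State": ["CA", "GA", "NY"]})

# Number the rows within each ID: 1 for the first row, 2 for the second, ...
df["cumcount"] = df.groupby("ID").cumcount().add(1)

# Move the per-ID row number into the columns (MultiIndex: field, number).
wide = df.set_index(["ID", "cumcount"]).unstack()

# Flatten ('Amount', 1) -> 'Amount_1' and so on.
wide.columns = [f"{field}_{number}" for field, number in wide.columns]

# Turn ID back into an ordinary column.
wide = wide.reset_index()
print(wide)
```

IDs with only one row get NaN in the *_2 columns, matching the blank cells in the desired output.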

How to Assign row numbers based on row values - Excel

In the Excel sheet below, the values in the third column are entered manually. I need a formula to automate this.
TYPE | CATEGORY | Expected_Value
fruits | apple | 1
fruits | apple | 2
fruits | apple | 3
fruits | bananna | 1
fruits | bananna | 2
fruits | mango | 1
fruits | mango | 2
fruits | mango | 3
fruits | mango | 4
fruits | mango | 5
Expected_Value represents the n-th occurrence of a given (TYPE, CATEGORY) pair.
Could someone help?

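You need to use COUNTIF with a range that mixes an absolute and a relative reference. If your CATEGORY column spans B2 to B12, enter this in the first data row and fill it down; the range grows from $B$2 to the current row, so each row counts how often its value has appeared so far: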
=COUNTIF($B$2:B2,B2)
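To make the expanding-range logic explicit, here is a small Python sketch that mimics what the filled-down formula computes. It is only an illustration of the counting rule, not something to paste into Excel, and running_count is a hypothetical helper name.

```python
# CATEGORY values from the question's sheet, top to bottom (B2 downwards).
categories = ["apple", "apple", "apple", "bananna", "bananna",
              "mango", "mango", "mango", "mango", "mango"]


def running_count(values):
    """For each row, count matches from the first row down to that row,
    like =COUNTIF($B$2:B2,B2) filled down."""
    return [values[: i + 1].count(v) for i, v in enumerate(values)]


expected_value = running_count(categories)
print(expected_value)  # [1, 2, 3, 1, 2, 1, 2, 3, 4, 5]
```

Like the formula, this counts on CATEGORY alone; if the same category could appear under several TYPE values, the count would need to consider both columns.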

Excel Formula - How to get the highest 2 values of a cell based on another cell

I need some help with the following requirement:
Raw Data
A B C
-------------------------------------
1 | Opp | Vendor | Amount |
|-------------|-------------|---------|
2 | 101 | Vendor1 | 100000 |
3 | 101 | Vendor2 | 5000 |
4 | 103 | Vendor1 | 30000 |
5 | 103 | Vendor2 | 5000 |
6 | 103 | Vendor3 | 50000 |
Output Table
A B C D E
---------------------------------------------------------
1 | Opp | MainVendor | Amount1 | 2Vendor | Amount2 |
|-------------|-------------|---------|---------|---------|
2 | 101 | Vendor1 | 100000 | Vendor2 | 5000 |
3 | 103 | Vendor3 | 50000 | Vendor1 | 30000 |
MainVendor: Vendor with the highest Amount
Amount1: Amount for MainVendor
2Vendor: Second highest vendor
Amount2: Amount for 2Vendor
I was only able to get Amount1 value using the following array formula:
{=MAX(IF(A:A=[#[Opp]];C:C))}
in column C.
I'm failing to get the values for columns B, D and E.

Here's a solution. It may be a bit convoluted, so I'm happy to hear constructive criticism if people can refactor the formulas.
To get the Amount1 and Amount2 values you can use:
=INDEX(LARGE($C$2:$C$10*--($A$2:$A$10=$A13),1),1)
Where:
$C$2:$C$10 are the values from the Amount column
these are multiplied by --($A$2:$A$10=$A13), i.e. {1;1;0;0;0;0;0;0;0} * {100000;5000;30000;5000;50000;45000;50000;40000;51000}, which zeroes out the values in the Amount column that do not correspond to the Opp of the row
the LARGE function then picks the largest of 100000 and 5000, as the other values in the array are now 0
to get the 2nd largest, 3rd largest, etc., supply 2, 3, etc. as the second argument of LARGE
LARGE is wrapped in INDEX to get the actual value out, as I believe it returns a 1-element array (although this isn't clear from the formula inspector)
To get the MainVendor and 2Vendor values you can use:
=INDEX($B$2:$B$10,INDEX(MATCH(LARGE($C$2:$C$10*--($A$2:$A$10=$A13),1),$C$2:$C$10*--($A$2:$A$10=$A13),0),1))
Which is:
$B$2:$B$10 are the values from the Vendor column
the index is retrieved by using MATCH on the same formula used for AmountX against its own source array
the MATCH is wrapped in another INDEX to get the actual value out as, once again, I think the value is returned as a 1-element array
I created another set of Opp values, and the formula still works when the Amount column contains duplicate values, as long as they belong to different Opps.
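The mask, LARGE, MATCH and INDEX steps can be traced in a short Python sketch on the question's raw data. This is only an illustration of the logic behind the formulas; kth_vendor is a hypothetical helper name.

```python
# Rows from the question's raw data: (Opp, Vendor, Amount).
rows = [(101, "Vendor1", 100000), (101, "Vendor2", 5000),
        (103, "Vendor1", 30000), (103, "Vendor2", 5000), (103, "Vendor3", 50000)]


def kth_vendor(rows, opp, k):
    """Return (vendor, amount) with the k-th largest Amount for one Opp."""
    # Amount * --(Opp = opp): amounts belonging to other Opps become 0.
    masked = [amount * (row_opp == opp) for row_opp, _, amount in rows]
    # LARGE(masked, k): the k-th largest value of the masked array.
    kth = sorted(masked, reverse=True)[k - 1]
    # MATCH(kth, masked, 0): position of the first exact match.
    pos = masked.index(kth)
    # INDEX(vendors, pos): the vendor on that row.
    return rows[pos][1], kth


for opp in (101, 103):
    print(opp, kth_vendor(rows, opp, 1), kth_vendor(rows, opp, 2))
```

As with MATCH, two equal amounts within the same Opp would both resolve to the first matching row, so ties inside one Opp need separate handling.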

Excel: count items from A and sum up value from B if identical

I want to do something very similar to what's been asked and answered here, although I don't want to have duplicates: Find count of items in column and add the values from another column
Basically, I have a column A with duplicated names and different values associated with them in column B. In column C I want each name and in column D the summed-up value. The problem with SUMIF is that it shows the name and the summed-up value several times, while I want to see each only once.
To make it more clear, here's the result I'm looking for:
A | B | C | D
------------+----------+-----------+---------
Name | Value | Name | Sum
------------+----------+-----------+---------
Potatoes | 9 | Potatoes | 13
Tomatoes | 4 | Tomatoes | 11
Carrots | 1 | Carrots | 16
Potatoes | 4 | Eggs | 5
Eggs | 5 | |
Tomatoes | 7 | |

Excel search value from whatever row and if found on another row replace that row with the value in the first row

Here is what I am trying to do.
Let's say I have an Excel sheet with these rows:
ProductNo | Product | Sku | Price | Image | Thumb
25 | Shirt Blue | 4251 | $10 | shirt.jpg | shirtthumb.jpg
2 | Shirt Green | 4581 | $17 | green.jpg | greenthumb.jpg
8 | Shirt Black | 4561 | $15 | black.jpg | blackthumb.jpg
and, in different rows or on another Excel sheet, these:
ProductNo | Product | Sku | Price | Image | Thumb
25 | Shirt Blue | 4251 | $52 | |
2 | Shirt Green | 4581 | $42 | |
8 | Shirt Black | 4561 | $65 | |
How can I update the first table wherever the second table (or sheet) has different data in the specified columns? Empty cells in the second table should be ignored; all other values from the second table should replace those in the first.
The final result would be
ProductNo | Product | Sku | Price | Image | Thumb
25 | Shirt Blue | 4251 | $52 | shirt.jpg | shirtthumb.jpg
2 | Shirt Green | 4581 | $42 | green.jpg | greenthumb.jpg
8 | Shirt Black | 4561 | $65 | black.jpg | blackthumb.jpg
I have tried a couple of Excel functions, but they do not work for me since I have too many products to be adding cells by hand.
I tried doing it in Vl but got confused, and I don't even know what a macro is.
I'm open to anything, visual tools or functions, as long as I can perform the task.
If anybody knows how, let me know.
Thank you

Instead of having fixed values, I propose you use permanent formulas in the specified columns.
To do this I would use the VLOOKUP() function. I am assuming that your ProductNo is the element that never changes, so all the other columns will get a VLOOKUP() formula.
If I understand correctly, the 2nd table MIGHT contain an update for the 1st table, but any empty cells in the 2nd table should be ignored.
I am also assuming you wish to see which elements change because of the update, so I propose the following:
In the first table, add two blocks of columns for the elements that might need an update: the first with the combined result (the COMB-block) and the second with the lookups from the 2nd table (the LOOKUP-block). For ease of explanation I put the two tables in the same workbook, on sheets called table1 and table2.
ProductNo | Product | Sku | Price | Image | Thumb | Product_comb | Sku_comb | Price_comb | Image_comb | Thumb_comb | Product_lookup | Sku_lookup | Price_lookup | Image_lookup | Thumb_lookup
Start with the formulas in the LOOKUP-block. Use VLOOKUP(), such as this one for the *Product_lookup* column:
=IFERROR(VLOOKUP($A2,table2!$A:B,COLUMNS(table2!$A:B),FALSE),"")
The IFERROR handles the case where the product in table1 cannot be found in table2.
The formulas in the COMB-block prefer the table2 result over the table1 result. A VLOOKUP of a matching ProductNo with an empty element (for example the Image) returns 0 (zero), and the IFERROR returns an empty string when the ProductNo is not found, so both are treated as failed lookups. This is the formula for the *Product_comb* column:
=IF(OR(L2="",L2=0),B2,L2)
To identify the products that changed, you can either add a column that compares the original values with the _comb values:
=AND(B2=G2,C2=H2,D2=I2,E2=J2,F2=K2) (this returns TRUE if no column changed and FALSE if any column changed)
or use conditional formatting, on each element separately or on the combination that the AND() formula expresses.
As a final step in your updating process, you could copy all values from the COMB-block and paste them (as values) over the original elements.
If you have any further questions please ask.
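The lookup-then-combine logic can be summarised in a short Python sketch on the question's sample data. It only mirrors the rule the formulas implement (prefer the table2 value unless it is empty, 0, or the ProductNo is missing). The function name combine and the dictionary layout are hypothetical choices for the illustration.

```python
# Sample rows from the question, keyed by ProductNo.
table1 = {
    25: {"Product": "Shirt Blue", "Sku": 4251, "Price": "$10",
         "Image": "shirt.jpg", "Thumb": "shirtthumb.jpg"},
    2: {"Product": "Shirt Green", "Sku": 4581, "Price": "$17",
        "Image": "green.jpg", "Thumb": "greenthumb.jpg"},
    8: {"Product": "Shirt Black", "Sku": 4561, "Price": "$15",
        "Image": "black.jpg", "Thumb": "blackthumb.jpg"},
}
table2 = {
    25: {"Product": "Shirt Blue", "Sku": 4251, "Price": "$52", "Image": "", "Thumb": ""},
    2: {"Product": "Shirt Green", "Sku": 4581, "Price": "$42", "Image": "", "Thumb": ""},
    8: {"Product": "Shirt Black", "Sku": 4561, "Price": "$65", "Image": "", "Thumb": ""},
}


def combine(original, update):
    """Prefer the update's value unless it is missing, empty, or 0."""
    merged = {}
    for product_no, row in original.items():
        # Like IFERROR(VLOOKUP(...), ""): a missing product yields no values.
        looked_up = update.get(product_no, {})
        merged[product_no] = {
            col: value if looked_up.get(col, "") in ("", 0) else looked_up[col]
            for col, value in row.items()
        }
    return merged


merged = combine(table1, table2)
# Like NOT(AND(B2=G2, ...)): True where any column changed.
changed = {no: merged[no] != table1[no] for no in table1}
print(merged[25], changed)
```

With this data every product ends up with its table2 price while keeping its table1 image and thumbnail, as in the desired final table.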
