Excel: copy data within a column based on duplicate values in another column

I have a table as:
Company Name (Column A) | Company Number (Column B)
------ | ------
ABC | 123
ABC |
CBA |
CBA | 234
ACB | 567
ACB |
In Column B, I need to fill row 2 with the same value as row 1 (or vice versa), because rows 1 and 2 in Column A contain the same company name. My table has about 6M such rows, so I'm looking for some help.

I think you need to filter the table and then paste the values you need into the blank cells.
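For 6M rows, a scripted fill may be easier than filtering by hand (a single Excel worksheet holds at most 1,048,576 rows, so the data cannot sit on one sheet anyway). The sketch below assumes the table has been exported to CSV and loaded into pandas; the helper name `fill_company_numbers` is hypothetical. `groupby(...).transform("first")` takes the first non-missing number for each company name and writes it to every row with that name.

```python
import pandas as pd


def fill_company_numbers(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: copy each company's number to all rows with the same name."""
    out = df.copy()
    # "first" skips missing values, so it picks 123 for ABC, 234 for CBA, 567 for ACB.
    out["Company Number"] = out.groupby("Company Name")["Company Number"].transform("first")
    return out


# Sample data from the question; blank cells are represented as None (read as NaN).
sample = pd.DataFrame({
    "Company Name": ["ABC", "ABC", "CBA", "CBA", "ACB", "ACB"],
    "Company Number": [123, None, None, 234, 567, None],
})
filled = fill_company_numbers(sample)
print(filled)
```

For the real data, a frame loaded with `pd.read_csv(...)` from the exported file would replace `sample`. A company name with no number on any of its rows stays blank (NaN).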

Related

Pandas: combine unique values from two columns into one column while preserving order

I have data in four columns, as shown below. Some values in column 1 are duplicated in column 3. I would like to combine column 1 with column 3 while removing the duplicates from column 3, and I would also like to preserve the order of the columns. Column 1 is associated with column 2 and column 3 is associated with column 4, so it would be nice if column 1 items moved together with column 2, and column 3 items together with column 4, during the merge. Any help will be appreciated.
Input table:
Item | Price | Item | Price
------ | ------ | ------ | ------
Car | 105 | Truck | 54822
Chair | 20 | Pen | 1
Cup | 2 | Car | 105
 | | Glass | 1
Output table:
Item | Price
------ | ------
Car | 105
Chair | 20
Cup | 2
Truck | 54822
Pen | 1
Glass | 1
Thank you in advance.
After separating the input table into its left and right parts, we can concatenate the left-hand items with the non-duplicated right-hand items quite simply, using boolean indexing:
import pandas as pd
# this initial section only recreates your sample input table
from io import StringIO
df = pd.read_table(StringIO("""| Item | Price | Item | Price |
|-------|-------|------|-------|
| Car | 105 | Truck| 54822 |
| Chair | 20 | Pen | 1 |
| Cup | 2 | Car | 105 |
| | | Glass| 1 |
"""), sep=r' *\| *', engine='python', usecols=[1, 2, 3, 4], skiprows=[1], keep_default_na=False)
# pandas renames the duplicate headers (Item.1, Price.1), so restore the original names
df.columns = list(df.columns[:2]) * 2
# now separate the input table into the left and right part
left = df.iloc[:, :2].replace("", pd.NA).dropna().set_index('Item')
right = df.iloc[:, 2:].set_index('Item')
# finally construct the output table by concatenating without duplicates
output = pd.concat([left, right[~right.index.isin(left.index)]])
print(output)
Price
Item
Car 105
Chair 20
Cup 2
Truck 54822
Pen 1
Glass 1
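An alternative sketch uses `drop_duplicates` instead of boolean indexing: stack the left part on top of the right part and keep only the first occurrence of each `Item`. The two small frames below are rebuilt by hand from the sample table (an assumption about what the halves look like after splitting).

```python
import pandas as pd

# Left and right halves of the sample table, rebuilt by hand (assumed layout).
left = pd.DataFrame({"Item": ["Car", "Chair", "Cup"], "Price": [105, 20, 2]})
right = pd.DataFrame({"Item": ["Truck", "Pen", "Car", "Glass"], "Price": [54822, 1, 105, 1]})

# Stack left above right, then keep the first occurrence of each Item.
# keep="first" means a left-hand row wins over a right-hand duplicate,
# and the surviving rows stay in their original order.
output = (
    pd.concat([left, right], ignore_index=True)
    .drop_duplicates(subset="Item", keep="first")
    .reset_index(drop=True)
)
print(output)
```

Unlike the `isin` version, `drop_duplicates` also removes repeats that occur within the same half, which may or may not be what you want.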

Formula for count of distinct values, multiple conditions, one of which = or <> all repeating values

I'm looking for an Excel formula (I know this may work with a pivot table, but I want a formula) to count distinct values. Suppose this is my table in Excel:
Region | Name | Criteria
------ | ------ | ------
1 | Jill | A
1 | Jill | A
1 | John | B
1 | John | A
2 | Jane | B
2 | Jane | B
2 | Bill | A
2 | Bill | B
3 | Mary | B
3 | Mary | B
3 | Gary | A
3 | Gary | A
In this example, I have the following formula to count the distinct names within each region: =SUM(--(FREQUENCY(IF((Table1[Region]=A2)*(Table1[Name]<>""),MATCH(Table1[Name],Table1[Name],0)),ROW(Table1[Name])-ROW(Table!B2)+1)>0)), which returns 2 for each region (Region 1 = Jill & John; 2 = Jane & Bill; 3 = Mary & Gary; each distinct name counted once).
I have an additional formula that counts how many distinct names within each region have at least one "B", made by adding *(Table1[Criteria]="B") after <>"") ... In this example, it returns Region 1 = 1, Region 2 = 2, Region 3 = 1, because neither Jill nor Gary has a "B"; all the others have at least one.
Now I'm stuck on my last formula, where I want to count how many distinct names within each region have a B in ALL their occurrences. The outcome should be Region 1 = 0 (Jill has no B's, and John has a B but also an A), Region 2 = 1 (Jane appears twice, counts as 1 distinct name, and both occurrences are B; Bill has a B in only one of his two rows), and Region 3 = 1 (Mary has all B's).
It's complex for a formula-only task, but feasible.
The following array formula does the job. Although you did not specify it, I suppose that if "Mary" has an A in another region, this should not cancel her count in region 3, as long as all records with the name "Mary" in region 3 have a "B". In other words, names can repeat in different regions but should not interfere across regions. This made the formula even longer; I added a test case for it (Mary in region 4 with an A did not interfere with Mary in region 3).
=SUM(IF((Table1[Region]=Table1[#Region])*(0=COUNTIFS(Table1[Region],Table1[#Region],
Table1[Name],Table1[Name],Table1[Criteria],"<>B")), 1/COUNTIFS(Table1[Name],Table1[Name],
Table1[Criteria],"B",Table1[Region],Table1[#Region]), 0))
Enter it with Ctrl+Shift+Enter, then copy/paste it down the column.
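To cross-check the expected counts outside Excel, here is a small pandas sketch of the same three rules (distinct names, names with at least one B, names with only B's), grouping by region and name together so that a name in one region never affects another. It is a verification aid built on the sample table, not a replacement for the array formula.

```python
import pandas as pd

# Sample table from the question.
df = pd.DataFrame({
    "Region": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "Name": ["Jill", "Jill", "John", "John", "Jane", "Jane",
             "Bill", "Bill", "Mary", "Mary", "Gary", "Gary"],
    "Criteria": ["A", "A", "B", "A", "B", "B", "A", "B", "B", "B", "A", "A"],
})

# Group by (Region, Name) so the same name in different regions is kept separate.
is_b = df["Criteria"].eq("B").groupby([df["Region"], df["Name"]])
per_name = pd.DataFrame({
    "any_b": is_b.any(),  # at least one B for this name in this region
    "all_b": is_b.all(),  # every row for this name in this region is B
}).reset_index()

# Per region: distinct names, and how many names satisfy each condition.
summary = per_name.groupby("Region").agg(
    distinct=("Name", "nunique"),
    at_least_one_b=("any_b", "sum"),
    all_b=("all_b", "sum"),
)
print(summary)
```

The `all_b` column reproduces the expected 0, 1, 1 for regions 1 to 3.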

Excel macro to find a date in a range and then count for next value change in an adjacent column

I am attempting to write a macro to find February 2nd of each year in column A and then count the number of rows (days) until the value in column B changes. This count could be put in a new column, column C, but on the same row as the February 2nd that it correlates to, in this case row 3.
Using the table below, the output in C3 would be 5. I am not counting February 2nd itself, but I am counting the day the change occurs. I will need to loop through 100+ years of data.
id | A | B | C
----------------------------
1 | 1946/01/31 | 0 |
2 | 1946/02/01 | 0 |
3 | 1946/02/02 | 0 |
4 | 1946/02/03 | 0 |
5 | 1946/02/04 | 0 |
6 | 1946/02/05 | 0 |
7 | 1946/02/06 | 0 |
8 | 1946/02/07 | 2 |
9 | 1946/02/08 | 0 |
The real challenge is to do it with a formula. Well, 2 formulas.
In this layout the dates are in column B and the values in column C (one column to the right of the question's layout). The first formula, in cell E2, checks whether the date in column B is 2 February (MONTH = 2 and DAY = 2); if it is, it takes the column C value of the row above. Otherwise, if the cell above in column E is non-blank and equal to the column C value of the row above, a match was previously found, so it carries that value down; otherwise it returns blank. This produces the run of zeros you can see in column E from 2 February up to and including the row where column C changes.
Formula for E2 and then autofill down to the end of your data
=IF(AND(MONTH(B2)=2,DAY(B2)=2),C1,IF(AND(E1<>"",E1=C1),E1,""))
Now all we need to do is count the cells in that run. The formula in column D looks for the first non-blank cell of a run in column E, AND(E1="",E2<>""), and then counts all the cells that match it. I'm not sure what gap you're expecting, but you can change the 200 to make sure you count everything (keep the range short enough that it does not reach the next year's run, because COUNTIF counts every matching cell in the range). Finally, subtract 1 so that the 2 Feb row itself is not counted.
Formula for D2 and then autofill down to the end of your data
=IF(AND(E1="",E2<>""),COUNTIF(E2:E200,E2)-1,"")
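The counting rule from the question can also be sketched as a plain loop, which may be a useful reference when checking the formulas or writing the macro. This is Python rather than VBA, and the helper name `days_until_change` is hypothetical. Unlike the E-column formula, which compares against the previous row's value, it compares each later row against the 2 February value itself, as the question describes.

```python
from datetime import date, timedelta


def days_until_change(dates, values):
    """Hypothetical helper: for each 2 February, count the rows after it up to and
    including the first row whose value differs from the 2 February value.
    Returns {index_of_2_Feb_row: count}; a 2 Feb with no later change is skipped."""
    counts = {}
    for i, d in enumerate(dates):
        if d.month == 2 and d.day == 2:
            start_value = values[i]
            for j in range(i + 1, len(values)):
                if values[j] != start_value:
                    counts[i] = j - i  # excludes the 2 Feb row, includes the change row
                    break
    return counts


# Sample from the question: 1946-01-31 to 1946-02-08.
dates = [date(1946, 1, 31) + timedelta(days=k) for k in range(9)]
values = [0, 0, 0, 0, 0, 0, 0, 2, 0]
print(days_until_change(dates, values))  # {2: 5}
```

Index 2 is zero-based, so it corresponds to id 3 (the 2 Feb row) in the question's table, and the count of 5 matches the expected value in C3.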

Formatting a table

Say I have a two-column table in Excel whose first column contains only integers from 1 to N. I want to rearrange the second column into a table of N columns so that each value lands in the column whose number is the integer next to it in the first table. Here's what I mean:
First table
2 | 2.56
3 | 3.12
5 | 5.55
1 | 8.12
1 | 1.00
2 | 9.30
Second table (consists of 5 columns)
8.12 | 9.30 | 3.12 | - | 5.55
1.00 | 2.56
Can I do that automatically?
Head your first column 'Col' and your second column 'Number'. Make a third column with a header of 'Row'. In the third column, put
=COUNTIF($A$2:A2,A2)
And fill down as far as your data goes.
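Make a pivot table from the data. Put 'Col' in Column Labels, 'Row' in Row Labels, and 'Number' in Values (it appears as 'Sum of Number'; since each Col/Row pair occurs only once, the sum is just the original value).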
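The same reshaping can be sketched in pandas, using the 'Col' and 'Number' headers from the answer. `groupby(...).cumcount() + 1` plays the role of the running =COUNTIF($A$2:A2,A2) helper column, and `pivot` plays the role of the pivot table. N = 5 is an assumption taken from the example; `reindex` adds the empty column 4, which has no data.

```python
import pandas as pd

# First table from the question.
df = pd.DataFrame({
    "Col": [2, 3, 5, 1, 1, 2],
    "Number": [2.56, 3.12, 5.55, 8.12, 1.00, 9.30],
})

# Running count within each Col value (1 for the first occurrence, 2 for the second, ...).
df["Row"] = df.groupby("Col").cumcount() + 1

# One output column per Col value, one output row per occurrence number.
n_columns = 5  # assumed N from the example
wide = (
    df.pivot(index="Row", columns="Col", values="Number")
    .reindex(columns=list(range(1, n_columns + 1)))  # empty columns become NaN
)
print(wide)
```

This keeps the values in order of appearance, so column 2 lists 2.56 above 9.30, the same as the pivot-table approach; the question's sample second table shows them the other way round.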

Vlookup a Cell which Contains a Part of Other Cell but not that Straightforward

Hi guys and all Excel gurus, I'm stuck on an Excel problem I can't solve. I tried using INDEX, MATCH and VLOOKUP, but to no avail.
Basically, I want Column D to display the Column B value from whichever row's Column A value is contained in Column C.
So what I'm dealing with is something kind of like this:
Header | Column A | Column B | Column C | Column D
------ | ------ | ------ | ------ | ------
Row 1 | 111 | AAA | 1111 |
Row 2 | 222 | BBB | 112 |
Row 3 | 333 | CCC | 2225 |
Row 4 | 444 | DDD | 333 |
So my expected result would be:
Header | Column A | Column B | Column C | Column D
------ | ------ | ------ | ------ | ------
Row 1 | 111 | AAA | 1111 | AAA
Row 2 | 222 | BBB | 112 | N/A
Row 3 | 333 | CCC | 2225 | BBB
Row 4 | 444 | DDD | 333 | CCC
Sorry for the poor table display and explanation. Thanks Guys.
=INDEX($C$2:$C$5, MATCH(1,IF(ISERR(FIND($B$2:$B$5, $D2)),0,1),0))
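Here, row 5 is the last data row, and the row labels occupy sheet column A, so the question's Columns A to D are sheet columns B to E. Enter the formula in E2 as an array formula (Ctrl+Shift+Enter), then drag it down.
BTW, on row 4 it gives CCC, not N/A, because 333 contains 333.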
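A Python sketch of the same lookup rule makes the matching order explicit: MATCH(1, ..., 0) returns the first row, top to bottom, whose Column A value is found inside the Column C text, and FIND is case-sensitive, like Python's `in` on strings. The helper name `lookup_contained` is hypothetical, and the numbers are treated as text, as FIND does.

```python
def lookup_contained(keys, values, text):
    """Hypothetical helper: return the value paired with the first key that occurs
    inside text, or None (Excel would show #N/A) when no key occurs."""
    for key, value in zip(keys, values):
        if key in text:  # case-sensitive substring test, like FIND
            return value
    return None


col_a = ["111", "222", "333", "444"]
col_b = ["AAA", "BBB", "CCC", "DDD"]
col_c = ["1111", "112", "2225", "333"]

results = [lookup_contained(col_a, col_b, text) for text in col_c]
print(results)  # ['AAA', None, 'BBB', 'CCC']
```

If a Column C value contained several Column A values, both the formula and this sketch would return the one from the topmost row.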
