I have over 100k rows of data like below:
ALLA,ALLA,"Company1, Inc.","Company1, Inc.",PSA,PSA,1,1,FALSE,FALSE
BCCO,BCCO,"Company2, Inc.","Company2, Inc.",PSB,PSB,1,1,FALSE,FALSE
CTTP,CTTP,"Company3, Inc.","Company3, Inc.",PSC,PSC,1,1,FALSE,FALSE
CMMZ,CMMZ,"Company4, Inc.","Company4, Inc.",PSD,PSD,1,1,FALSE,FALSE
I want to know how to figure out whether the data in column 1 is the same as in column 2, column 3 the same as column 4, and so on. How could I do that in Excel?
Following Cory's formula, I found that I can compare whole columns using:
=if(A:A=B:B, "yay", "aww")
Problem is I have a header in the file:
c - symbol, symbol, c - companyname, companyname, c - tradingvenue, tradingvenue, c - tierrank, tierrank, c - iscaveatemptor, iscaveatemptor
Shouldn't this cause A:A=B:B to be false?
Given this:
| A | B |
---+-----+-----+
1 | X | X |
---+-----+-----+
2 | Y | Y |
---+-----+-----+
3 | Z | Z |
The formula =SUMPRODUCT(--(A1:A3=B1:B3)) will tell you how many times the A value matches the B value.
You should get 3 as a result here. If, for example, you change B3 to Q then it will give you 2.
To do this on two columns without specifying the end of the range, try:
=SUMPRODUCT(--(A:A=B:B),--(LEN(A:A)>0))
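For anyone who wants to sanity-check the logic outside Excel, here is a minimal Python sketch of what the SUMPRODUCT formula above computes: it counts the rows where the A value equals the B value and A is not empty. The function name count_matches and the sample lists are my own illustrative assumptions, not anything Excel provides.

```python
def count_matches(col_a, col_b):
    """Count rows where col_a equals col_b, skipping rows where col_a is empty.

    Note: Excel's = is case-insensitive for text; this sketch compares exactly.
    """
    return sum(1 for a, b in zip(col_a, col_b) if a == b and len(str(a)) > 0)


col_a = ["X", "Y", "Z"]
print(count_matches(col_a, ["X", "Y", "Z"]))  # 3
print(count_matches(col_a, ["X", "Y", "Q"]))  # 2
```

The len(...) > 0 check plays the same role as the LEN(A:A)>0 term: without it, pairs of empty cells would be counted as matches.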
I've been using Excel since 1991, and unless you want to write a VB macro, I think the best way is to do the simple IF statement suggested in the comments. If you need to test several columns at once, which is what your question suggests, then I'd do
=IF(AND(A1=B1,C1=D1,E1=F1,G1=H1),0,1)
Fill that formula down the column and then you'll be able to instantly count the number of rows that don't match. With a data filter, you can also select all the rows which have a '1', so you'll be able to examine the rows that don't match.
I need help on creating a formula/code on data filtering.
I have 4 columns of data as below:
Column A Column B Column C Column D
__________________________________________________
| ID TEST FUNCTION SCRATCH |
|_________________________________________________|
|92018211 Y WELL |
|72937191 |
|01221921 WELL Yes |
|72901921 Y Yes |
|00192839 Y WELL Yes |
|_________________________________________________|
I want to filter my data so that a row stays visible if any of columns B, C and D is blank.
Rows that have a value in all three columns B, C and D are the ones I do not want.
Example: rows that have the value Y in column B, WELL in column C and Yes in column D.
So from my data above, after filter the data should be like below:
Column A Column B Column C Column D
__________________________________________________
| ID TEST FUNCTION SCRATCH |
|_________________________________________________|
|92018211 Y WELL |
|72937191 |
|01221921 WELL Yes |
|72901921 Y Yes |
| |
|_________________________________________________|
I would like to do this with a formula or code using Excel macro AutoFilter.
Any help is much appreciated. I really hope someone is able to help me with this.
You could add a column E with the formula =OR(ISBLANK(B2),ISBLANK(C2),ISBLANK(D2))
This formula returns FALSE when data is present in all three columns (B, C, D).
After this you can apply a filter and delete the rows where column E is FALSE. This would give you the desired result.
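As a rough illustration of the same test outside Excel, the Python sketch below keeps a row when any of columns B, C or D is empty, using the sample data from the question. The names rows, has_blank and visible are hypothetical, and empty cells are modelled as empty strings, which is an assumption about how the sheet would be exported.

```python
# Sample rows from the question: (ID, TEST, FUNCTION, SCRATCH)
rows = [
    ("92018211", "Y", "WELL", ""),
    ("72937191", "", "", ""),
    ("01221921", "", "WELL", "Yes"),
    ("72901921", "Y", "", "Yes"),
    ("00192839", "Y", "WELL", "Yes"),
]


def has_blank(row):
    """True when any of TEST, FUNCTION, SCRATCH (columns B, C, D) is empty."""
    _id, test, function, scratch = row
    return any(value == "" for value in (test, function, scratch))


# Keep only the rows that should stay visible after filtering.
visible = [row for row in rows if has_blank(row)]
for row in visible:
    print(row[0])
```

Only the last row (00192839) has values in all three columns, so it is the one that drops out.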
I currently have two columns that look like this:
value | condition
50 | Y
60 | N
30 | Y
10 | Y
I can't seem to make use of the IF function to get a sum of all the rows. Basically, the aim is to sum the values only where condition is Y and display the total in a cell; where condition is N, the value should count as 0. I want to do this without creating an additional column, even though I understand it is easily done with a new column.
As Nick mentioned, you can use SUMIF:
A | B
1 value | condition
2 50 | Y
3 60 | N
4 30 | Y
5 10 | Y
sumif structure:
=sumif(Range, Criteria, Sum_Range)
sumif for table above:
=sumif(B2:B5,"Y",A2:A5)
Excel formulas are not case-sensitive, and including the header row makes no difference: =sumif(B1:B5,"Y",A1:A5) gives the same result, because the header text does not match "Y". What does matter is that Range and Sum_Range start and end on the same rows.
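To make the pairing of Range and Sum_Range concrete, here is a small Python sketch of the same calculation. The sumif function below is a hypothetical helper written for this example (it only handles exact, case-insensitive text criteria, not the full SUMIF criteria syntax).

```python
def sumif(criteria_range, criteria, sum_range):
    """Sum the sum_range entries whose paired criteria_range entry equals criteria.

    The comparison is case-insensitive, mirroring how SUMIF treats text.
    """
    return sum(
        value
        for cond, value in zip(criteria_range, sum_range)
        if str(cond).lower() == str(criteria).lower()
    )


values = [50, 60, 30, 10]          # column A, rows 2-5
conditions = ["Y", "N", "Y", "Y"]  # column B, rows 2-5
print(sumif(conditions, "Y", values))  # 90
```

The two lists are walked in step, which is why the two ranges need to line up row for row.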
I've been struggling between the SUMPRODUCT and COUNTIFS formulas as there are a lot of specific dependencies in my data. Wondering if anyone can shed a bit more light on this issue.
Have tried SUMPRODUCT and COUNTIFS which give me calculations based on 1 set, but I need to include additional if/or statements.
I have the following:
| ID | Size    | Dead/Alive | Duration  | Days | Pass/Fail | Reason   |
|----|---------|------------|-----------|------|-----------|----------|
| 1  | Full    | Dead       | Permanent | 125  | Pass      | Comments |
| 2  | Partial | Alive      | Permanent | 500  | Pass      |          |
| 3  | Other   | Dead       | Temporary | 180  | Fail      | Comments |
| 4  | No      | Dead       | Temporary | 225  | Fail      | Comments |
| 5  | Yes     | Alive      | Permanent | 200  | Pass      |          |
with the following rules:
Only Count the ID/ROW if:
1) Values in column A = Full, Partial or Other
OR...
2) Values in column A = No AND values in column B = Dead
OR...
3) If values in column C = Permanent AND values in column D = >=100 or <=200
OR
4) If values in column C = Temporary AND values in column E = Pass, Fail AND column F=not blank
By my calculations, the total should be 5, but this is just a small sampling of my total data. I'm just not sure how to get that in Excel with SUMPRODUCT or COUNTIFS; someone even suggested a lookup function, although I've never used that one.
Given that you have so many different conditions, I have to break it down one by one and create a few helper columns to account for each condition.
In my solution I created 10 helper columns as shown below, and I have added some sample data (ID 6 to 29) to test the solution.
I also named 7 conditions in my solution:
Cond_1 Values in column A = Full, Partial or Other
Cond_2 Values in column A = No AND values in column B = Dead
Cond_3A Values in column C = Permanent
Cond_3B Values in column D >=100
Cond_3C Values in column D <=200
Cond_3A, Cond_3B and Cond_3C must be TRUE at the same time
Cond_4 Values in column C = Temporary AND values in column E = Pass
Cond_5A Values in column C = Temporary AND values in column E = Fail
Cond_5B Column F is not blank (I did not give a name to this condition)
Cond_5A and Cond_5B must be TRUE at the same time
Please note my Cond_4, Cond_5A and Cond_5B are all related to your original condition 4), which reads a bit odd, and I am not 100% sure if my interpretation of the condition is correct. If not please re-state your last condition and I can amend my answer accordingly.
As shown in my screen-shot, the formulas in I2 to Q2 are listed in Column U. I only used MAX, AND, SUM, =, &, and/or <> to interpret each condition. Please note some of the formulas are Array Formula so you need to press Ctrl+Shift+Enter to make it work.
The To Count column simply asks whether the SUM of the previous 9 columns is greater than 0, which means at least one of the conditions is met. If so it returns 1, otherwise 0.
Then you just need to work out the total of To Count column. In my example it is 22. I have highlighted the entries that did not meet any of the given condition.
You can use only one helper column to capture all conditions in one formula, but I would not recommend it as it would be too long to be easily understood and modified in future.
{=--(SUM(MAX(--(A2=Cond_1)),MAX(--(A2&B2=Cond_2)),--(SUM(--(C2=Cond_3A),--(AND(D2>=Cond_3B,D2<=Cond_3C)))=2),MAX(--((C2&E2)=Cond_4)),--(SUM(MAX(--((C2&E2)=Cond_5)),--(F2<>""))=2))>0)}
Ps. I would also wonder if there is a formula-based solution without using any helper column...? :)
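For comparison, here is a short Python sketch of the same rules applied to the five sample rows from the question, following my interpretation of the conditions above (in particular the split of the original condition 4 into "Temporary and Pass" and "Temporary and Fail with a non-blank Reason"). The field names and the should_count helper are hypothetical, chosen just for this illustration.

```python
rows = [
    # (id, size, dead_alive, duration, days, pass_fail, reason)
    (1, "Full", "Dead", "Permanent", 125, "Pass", "Comments"),
    (2, "Partial", "Alive", "Permanent", 500, "Pass", ""),
    (3, "Other", "Dead", "Temporary", 180, "Fail", "Comments"),
    (4, "No", "Dead", "Temporary", 225, "Fail", "Comments"),
    (5, "Yes", "Alive", "Permanent", 200, "Pass", ""),
]


def should_count(row):
    """True when the row satisfies at least one of the conditions."""
    _id, size, dead_alive, duration, days, pass_fail, reason = row
    cond_1 = size in ("Full", "Partial", "Other")
    cond_2 = size == "No" and dead_alive == "Dead"
    cond_3 = duration == "Permanent" and 100 <= days <= 200
    cond_4 = duration == "Temporary" and pass_fail == "Pass"
    cond_5 = duration == "Temporary" and pass_fail == "Fail" and reason != ""
    return cond_1 or cond_2 or cond_3 or cond_4 or cond_5


total = sum(1 for row in rows if should_count(row))
print(total)  # 5
```

On the five sample rows this gives 5, which agrees with the total expected in the question.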
I have an If formula in Excel as seen below:
=IF((Q2="O")*AND(R2="O")*AND(S2="O")*AND(T2="O")*AND(U2="O"),"O","X")
Basically, if cells Q2 to U2 are all O, the cell with the formula will have O written in it. Otherwise it will have X. Now I want to change it into a nested IF statement due to new conditions.
These conditions, in order are:
If any one of cells Q2 to U2 = X, cell = X
If any one of cells Q2 to U2 = date format, cell = ∆
If Q2 to U2 = O, cell = O
If none of the conditions are met, the cell will have value "FALSE". (Default appears)
Each of the cells have this condition to follow,
If any one of cells Q2 to U2 = -, ignore that cell and count the other cells to get final result.
I tried switching to Or in my original formula to test out conditions 1 and 3.
=IF((Q2="O")*or(R2="O")*or(S2="O")*or(T2="O")*or(U2="O"),"O","X")
But it doesn't work. Plus I'm not sure how to do condition 5 as well. Any help?
Is it possible to do something so complex just by using Excel formula? Or do I need to go into VBA?
Things like this are usually easier if you tackle the problem in order of priority. It looks like 1 has the highest priority:
=IF(COUNTIF(Q2:U2,"X")>0,"X","Does not contain X")
COUNTIF(Q2:U2,"X") returns the number of occurrences of X in the range Q2:U2.
In place of "Does not contain X", we can first check for condition 2:
=IF(SUMPRODUCT(--ISNUMBER(Q2:U2))>0,"∆","Does not contain X or dates")
A date in excel is literally a number with some decorative formatting. I am using ISNUMBER to find the numbers and SUMPRODUCT to count the identified cells in the range. If there are more than 0 (at least 1), then it will become ∆.
In place of "Does not contain X or dates" now, we could check for Os. I would count the Os and add the cells with - (conditions 3 and 5 together) and see if they add up to the total cells in Q2:U2 (which is 5 in this case):
=IF(COUNTIF(Q2:U2,"O")+COUNTIF(Q2:U2,"-")=5,"O","FALSE")
When combined, it would become:
=IF(COUNTIF(Q2:U2,"X")>0,"X",IF(SUMPRODUCT(--ISNUMBER(Q2:U2))>0,"∆",IF(COUNTIF(Q2:U2,"O")+COUNTIF(Q2:U2,"-")=5,"O","FALSE")))
Since all these involve counts, it might be easier setting up helper columns, something like:
| Q | R | S | T | U | V | W | X | Y | Z
1 | | | | | | Count of X | Count of dates | Count of O | Count of - | Final value
2 | | | | | | A | B | C | D | E
A will be:
=COUNTIF(Q2:U2,"X")
B will be:
=SUMPRODUCT(--ISNUMBER(Q2:U2))
C will be:
=COUNTIF(Q2:U2,"O")
D will be:
=COUNTIF(Q2:U2,"-")
E will be:
=IF(V2>0,"X",IF(W2>0,"∆",IF(X2+Y2=5,"O","FALSE")))
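If it helps to see the priority order as plain logic, here is a minimal Python sketch of the same decision: X wins first, then any date, then "all O once the - cells are ignored". The final_value function is a hypothetical name, and modelling date cells as datetime.date objects is an assumption (in Excel the ISNUMBER test would match any number, not only dates).

```python
from datetime import date


def final_value(cells):
    """Apply the checks in priority order: X first, then dates, then O (ignoring '-')."""
    if any(cell == "X" for cell in cells):
        return "X"
    if any(isinstance(cell, date) for cell in cells):
        return "∆"
    # Every cell must be either O or the ignored placeholder "-".
    if all(cell in ("O", "-") for cell in cells):
        return "O"
    return "FALSE"


print(final_value(["O", "O", "X", date(2020, 1, 1), "O"]))  # X
print(final_value(["O", "-", date(2020, 1, 1), "O", "O"]))  # ∆
print(final_value(["O", "-", "O", "O", "O"]))               # O
print(final_value(["O", "", "O", "O", "O"]))                # FALSE
```

Like the COUNTIF version, a row made up only of - cells would come out as O, so add a separate check if that case should be treated differently.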
I am trying to sort data imported from a csv file. The data comes in like such:
Columns
A | B
--------
t1 | 1
t3 | 9
t1 | 2
t2 | 5
t1 | 1
t3 | 13
t1 | 3
t3 | 11
t2 | 4
t2 | 7
t3 | 10
t3 | 10
and i want output similar to this:
Columns
D | E | F
----------------
t1 | 1 | 3
t2 | 4 | 7
t3 | 9 | 13
Explanation: Basically what I need to do is find the lowest and highest values from column B for each different value in column A, and list them neatly as shown in the second example.
I've worked with VBA before, so if this has to be done via VBA, that's fine. I'm just at a loss as to how to accomplish this task. Any help would be appreciated.
EDIT: Forgot to mention, if it would make the task simpler, it's fine if I have to manually sort the data alphabetically based on col A (thus putting the same values together).
I agree with @chrisneilsen that a Pivot Table is the best way to go. If you are set on using formulas, you can try using the following (both entered as array formulas with Ctrl+Shift+Enter):
In cell E1, which will represent the minimum value:
=MIN(IF($A$1:$A$12=D1,1,MAX($B$1:$B$12)+1)*$B$1:$B$12)
And in cell F1, which will represent the maximum value:
=MAX(IF($A$1:$A$12=D1,1,MIN($B$1:$B$12)-1)*$B$1:$B$12)
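The general idea is to check which values in column A are equal to your target value (column D). The result is an array of 1's where there is a match and, using MIN as an example, the maximum of the column + 1 everywhere else. Multiplying that array by column B leaves the matching values unchanged and scales the non-matching ones up to values that can't possibly be the minimum in your current setup, so MIN returns a legitimate value. Note that this relies on the values in column B looking like your sample (all at least 1, with an overall minimum of 1). A more general alternative is =MIN(IF($A$1:$A$12=D1,$B$1:$B$12)), and likewise with MAX, since the FALSE entries that IF returns for non-matching rows are ignored.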
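As a cross-check of the expected output, here is a small Python sketch that finds the lowest and highest column B value for each distinct column A value, using the sample data from the question. The helper name min_max_by_key is hypothetical; this is just one straightforward way to express the grouping, not a translation of the array formulas.

```python
data = [
    ("t1", 1), ("t3", 9), ("t1", 2), ("t2", 5), ("t1", 1), ("t3", 13),
    ("t1", 3), ("t3", 11), ("t2", 4), ("t2", 7), ("t3", 10), ("t3", 10),
]


def min_max_by_key(pairs):
    """Return {key: (lowest, highest)} over all values seen for each key."""
    result = {}
    for key, value in pairs:
        low, high = result.get(key, (value, value))
        result[key] = (min(low, value), max(high, value))
    return result


for key, (low, high) in sorted(min_max_by_key(data).items()):
    print(key, low, high)
# t1 1 3
# t2 4 7
# t3 9 13
```

The input does not need to be sorted by column A first, since each key keeps its own running minimum and maximum.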
Here is a Pivot Table using Excel 2007. To create, add column headers to your data, select your data and then in the Ribbon click Insert -> Pivot Table. In the dialog box, you decide where you want to put it (it is commonly put in a New Worksheet, so you can leave the default if you want - I left it in the same worksheet for illustration purposes). From there, you can arrange it by dragging each field so it matches the pictures. For the Max/Min fields, just drag the Value field into the Values section twice. Then, in the actual Pivot Table, you can right-click on one of the values in the column and select Summarize Data By -> Min to summarize by the minimum value for each key: