I need to be able to do a summary of rows based on certain column conditions.
There's a table with the following columns:
ID# (row #)
Part
Customer
Job
QTY
Dept
Pass/Fail
ID# may be the only unique value.
From the table the following needs to be obtained:
1. All jobs.
2. All new jobs, which should be all jobs minus any duplicates, i.e., the first entry for each unique Part-Customer-Job-QTY-Dept combination (where all 5 values are equal).
3. All repeat jobs, which should be all ID#-Part-Customer-Job-QTY-Dept-Pass entries.
For example:
1-Part; Customer; Job; QTY; Dept; Fail
2-Part; Customer; Job; QTY; Dept; Pass
Where Part-Customer-Job-QTY-Dept are equal, with 1 failing and 2 passing.
1 and 2 are easy, but 3 is a little tricky.
Should I just find the ID#s (rows) that include a fail prior to a pass?
Can it be done in a single loop?
While I'm here, #2 might be tricky as well. Is there any easy way to sum, without duplicates?
Any help will be highly appreciated!
Please let me know if any additional info is needed.
The sample data below should return:
10 for All
8 for New
2 for Repeats
Working through the sample has me wondering whether simply subtracting New from All will return all repeats.
I am not sure you actually need VBA, as the question seems solvable with built-in functions.
The key problem here is to identify all the repeating records by the unique key Part-Customer-Job-QTY-Dept. Note that you don't need to take the ID and Pass/Fail into account, as these values do not affect the calculation.
Once you know the unique key, you can solve it with the following steps:
Make a new column that concatenates the fields to produce the unique key (adjust the references to wherever the five key columns sit). Formula in F1: =A1&B1&C1&D1&E1
For each row, count the appearances of the unique key in that column. Formula in G1: =COUNTIF(F:F,F1)
A record is a duplicate when the count is larger than one, meaning multiple lines in the data use the same unique key. Formula: =COUNTIF(G:G,">1")
Once you have the Yes/No answer on each row, you simply count the Yes rows to get the repeat count, and from that the number of new jobs according to your definition.
This can also be implemented in VBA by the same logic.
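To illustrate the single-loop idea outside Excel, here is a minimal Python sketch. It assumes that "new" means the first row seen for each Part-Customer-Job-QTY-Dept key and that every later row with the same key is a repeat; the function name, the dictionary field names, and the sample rows are made up for illustration and are not the asker's data.

```python
def summarize_jobs(rows):
    """Return (all_count, new_count, repeat_count) in a single pass over the rows."""
    seen = set()      # keys already encountered
    new_count = 0
    for row in rows:
        # The key ignores ID and Pass/Fail, as described above.
        key = (row["Part"], row["Customer"], row["Job"], row["QTY"], row["Dept"])
        if key not in seen:
            seen.add(key)
            new_count += 1
    all_count = len(rows)
    # Under this definition, repeats are simply All minus New.
    return all_count, new_count, all_count - new_count


# Hypothetical sample rows (not the asker's data).
rows = [
    {"ID": 1, "Part": "P1", "Customer": "C1", "Job": "J1", "QTY": 5, "Dept": "D1", "Result": "Fail"},
    {"ID": 2, "Part": "P1", "Customer": "C1", "Job": "J1", "QTY": 5, "Dept": "D1", "Result": "Pass"},
    {"ID": 3, "Part": "P2", "Customer": "C1", "Job": "J2", "QTY": 1, "Dept": "D1", "Result": "Pass"},
    {"ID": 4, "Part": "P3", "Customer": "C2", "Job": "J3", "QTY": 9, "Dept": "D2", "Result": "Pass"},
]
all_count, new_count, repeat_count = summarize_jobs(rows)
```

If a repeat should only count when a Fail precedes a Pass, the loop would also need to remember the last result per key; that variation is not shown here.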
Related
I have a Databricks Delta table of financial transactions that is essentially a running log of all changes that ever took place on each record. Each record is uniquely identified by 3 keys, so a given record can have multiple instances in this table, each representing a historical entry of a change (across one or more columns of that record). If I wanted to find cases where a specific column value changed, I could easily achieve that with something like this:
SELECT t1.Key1, t1.Key2, t1.Key3, t1.Col12 AS "Before", t2.Col12 AS "After"
FROM table1 t1
INNER JOIN table1 t2
  ON t1.Key1 = t2.Key1 AND t1.Key2 = t2.Key2 AND t1.Key3 = t2.Key3
WHERE t1.Col12 != t2.Col12
However, these tables have a large number of columns. What I'm trying to achieve is a way to identify any columns that changed in a self-join like this: essentially a list of all columns that changed. I don't care about the actual values, just the names of the columns that changed across all records. It doesn't even have to be per row. The 3 keys will always be excluded, since they uniquely define a record.
In other words, I'm trying to find any columns that are susceptible to change, so that I can focus on them specifically for another purpose.
Any suggestions would be really appreciated.
Databricks has change data feed (CDF / CDC) functionality that can simplify this type of use case: https://docs.databricks.com/delta/delta-change-data-feed.html
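Independent of the change data feed, the logic of "which non-key columns ever take more than one value within the same record" can be sketched in plain Python. This is only an illustration of the idea on in-memory rows, not Databricks or Spark code; the function name and the sample rows are hypothetical, with Key1 to Key3 and Col12 borrowed from the query in the question and Col11 invented.

```python
from collections import defaultdict

KEYS = ("Key1", "Key2", "Key3")  # the three identifying columns from the question


def changed_columns(rows, keys=KEYS):
    """Return the set of non-key columns that hold more than one distinct value
    within at least one record (a record being one combination of the keys)."""
    values_seen = defaultdict(set)  # (record id, column name) -> distinct values
    for row in rows:
        record_id = tuple(row[k] for k in keys)
        for col, val in row.items():
            if col not in keys:
                values_seen[(record_id, col)].add(val)
    return {col for (_, col), vals in values_seen.items() if len(vals) > 1}


# Hypothetical log rows: record (1, "A", "x") has two versions.
rows = [
    {"Key1": 1, "Key2": "A", "Key3": "x", "Col11": "EUR", "Col12": 100},
    {"Key1": 1, "Key2": "A", "Key3": "x", "Col11": "EUR", "Col12": 120},
    {"Key1": 2, "Key2": "B", "Key3": "y", "Col11": "USD", "Col12": 50},
]
result = changed_columns(rows)
```

On a table the same idea amounts to grouping by the three keys and counting distinct values per column, then keeping the columns where any group has a count above one.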
To clarify, I will tackle the code myself; it's the order and techniques I need some guidance with.
I have a table with 4 columns. The table has 40000 rows. The columns are Date, Name, location, company.
I want to see how many times someone in the name column appears in the database. I then want to apply a date filter to that. This I have managed to do, if somewhat clunkily.
My problem is that the UNIQUE function only returns the name when I apply it to that column, and I need it to return the location and company as well so that I can sort on those too. How would you go about doing that?
I have used UNIQUE on the name column. It returns a spilled list of about 4000 names. I then use this formula to return the number of entries per month:
=COUNTIFS(MyName,$A2,MyDate,">="&DATE(List!$D$2,List!M$4,List!M$5),MyDate,"<="&DATE(List!$D$2,List!M$4,List!M$3))
A2 is in the column returned by UNIQUE.
Any guidance would be appreciated.
Allan
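One way to think about the order of operations is to treat name, location and company together as the thing being counted, then apply the date window while counting. In Excel, UNIQUE applied to a multi-column range returns unique rows rather than unique single values, which is the same idea. Below is a rough Python sketch of that logic; it assumes each name carries a consistent location and company, and the function name, field names, and sample rows are invented for illustration.

```python
from datetime import date


def count_by_person(rows, start, end):
    """Count rows per (name, location, company) with start <= Date <= end."""
    counts = {}
    for row in rows:
        if start <= row["Date"] <= end:
            key = (row["Name"], row["Location"], row["Company"])
            counts[key] = counts.get(key, 0) + 1
    return counts


# Hypothetical rows using the four columns described above.
rows = [
    {"Date": date(2023, 1, 5), "Name": "Ann", "Location": "Leeds", "Company": "Acme"},
    {"Date": date(2023, 1, 20), "Name": "Ann", "Location": "Leeds", "Company": "Acme"},
    {"Date": date(2023, 2, 3), "Name": "Ann", "Location": "Leeds", "Company": "Acme"},
    {"Date": date(2023, 1, 9), "Name": "Bob", "Location": "York", "Company": "Birch"},
]
january = count_by_person(rows, date(2023, 1, 1), date(2023, 1, 31))

# Because location and company are part of the key, the result can be sorted on them.
by_location = sorted(january.items(), key=lambda item: item[0][1])
```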
I have a huge epidemiological dataset containing registry data with pathology reports and clinical information. I have merged several files into one master file in order to get all the information from one file. Every patient is assigned a unique ID number. Each patient can have several reports, and hence the same ID number can be repeated several times in the ID column. Each ID entry is a new row (a pathology or clinical report) with the date on which that sample or information was reported.
My goal is to be able to read all pathology/clinical info for a particular ID within one row.
By sorting the IDs, I get a clear picture of how many times each ID has been entered. The problem arises when there are several reports, i.e. multiple rows with an identical ID, because the dates within the rows for that one patient do not match. The dates come from pathology (sample date, answer date, clinical info date, etc.). The pathology and clinical dates for one patient do not have to match exactly to the day, but should fall within a reasonable timeframe, e.g. within 1-2 months. This is best illustrated with an example.
I want to sort the columns so that dates from a particular row match together. I am sure there is a way to do that but I cannot figure it out.
Thanks in advance
The mismatched records seem to arise once the two separate tables are merged into one. There are several options for fixing this:
Redo the merge, but strengthen the way in which the tables are joined.
Instead of merging only on ID, see if there is another field that could easily connect the records, perhaps a medical record #, case #, or event #, and merge the tables based on this new field AND ID. This would be the strongest solution; however, it will only work if you can find such a field to strengthen the link.
A separate solution would be to first sort the original tables by date so that they match up, and then re-merge them.
In theory this should solve your problem, as I assume that the merge currently grabs the first instance of patient X01 from both tables and matches them together. This can be confirmed by checking the merged query and seeing whether the mismatched records are in the same order as in the original tables. This is not perfect, as it relies on no clinical dates occurring between pathology dates for the record, so I would proceed with caution.
And to address your concern about losing track of IDs with multiple rows: this should not matter, as after the merge you can sort by ID. You can also add multiple sort levels by selecting the data and going to Data -> Sort -> Add Level, which lets you change the order in which the data is sorted (first by ID and then by Date).
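If no extra linking field exists, another way to strengthen the join is to pair rows on ID plus date proximity, using the 1-2 month window mentioned in the question. The Python sketch below is only an illustration of that idea, not the merge tool's own behaviour: the greedy nearest-date pairing, the 60-day default, the function name, and the field names `id` and `date` are all assumptions.

```python
from datetime import date, timedelta


def match_reports(pathology, clinical, max_gap=timedelta(days=60)):
    """Pair each pathology row with the closest-dated, not yet used clinical row
    of the same ID, provided the two dates are within max_gap of each other."""
    unused = list(clinical)
    merged = []
    for p in sorted(pathology, key=lambda r: (r["id"], r["date"])):
        candidates = [
            c for c in unused
            if c["id"] == p["id"] and abs(c["date"] - p["date"]) <= max_gap
        ]
        best = min(candidates, key=lambda c: abs(c["date"] - p["date"]), default=None)
        if best is not None:
            unused.remove(best)  # each clinical row is used at most once
        merged.append((p, best))  # best is None when nothing falls in the window
    return merged


# Hypothetical rows; the clinical table is deliberately in a different order.
pathology = [
    {"id": "X01", "date": date(2020, 1, 10)},
    {"id": "X01", "date": date(2020, 6, 5)},
    {"id": "X02", "date": date(2021, 3, 1)},
]
clinical = [
    {"id": "X01", "date": date(2020, 6, 20)},
    {"id": "X01", "date": date(2020, 1, 25)},
]
pairs = match_reports(pathology, clinical)
```

Rows left with no partner are kept with an empty match, so they can be reviewed by hand rather than silently paired with the wrong report.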
Brief:
I have a large dataset containing individual customer orders by item and quantity. What I'm trying to do is get Excel to tell me which order numbers are exact matches (in terms of items and quantities) to each other. Ideally, I'd like a tolerance of, say, 80% accuracy that I can flex as needed, but I'll take anything to get me off the ground.
Existing Solution:
At the moment, I've used concatenation to pair item with quantity, then pivoted with the order references as columns, the concatenated values as rows, and quantity as data (sorted by quantity descending). I'm visually scrolling across and down to find matches and then manually stripping them in my main data where necessary. I have about 2,500 columns to check, so I was hoping to find a more suitable solution with Excel doing the legwork on identification.
INDEX/MATCH works for cross-referencing a match for the concatenation, but of course the order numbers (which are unique) are different, so it's not giving me matches ACROSS orders.
Fingers crossed!
EDIT:
Data set with outcomes
As you can see, the bottom order reference has no correlation to the orders above it, so it is not listed as a match to them; three of the orders are identical, and one has a slightly different item but MOSTLY matches.
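The comparison described above can be sketched as a pairwise similarity score between orders, where each order is the set of its (item, quantity) pairs. The Python below is a rough illustration only: using Jaccard similarity (shared pairs divided by all distinct pairs) as the measure of "80% accuracy" is my assumption, and the function name, order references, and items are made up.

```python
from itertools import combinations


def similar_orders(lines, threshold=0.8):
    """Return (order_a, order_b, score) for every pair of orders whose
    Jaccard similarity over (item, quantity) pairs is at least threshold."""
    orders = {}
    for order_ref, item, qty in lines:
        orders.setdefault(order_ref, set()).add((item, qty))
    matches = []
    for a, b in combinations(sorted(orders), 2):
        shared = len(orders[a] & orders[b])
        total = len(orders[a] | orders[b])
        score = shared / total
        if score >= threshold:
            matches.append((a, b, score))
    return matches


# Hypothetical order lines: (order reference, item, quantity).
lines = [
    ("1001", "A", 2), ("1001", "B", 1),
    ("1002", "A", 2), ("1002", "B", 1),
    ("1003", "A", 2), ("1003", "C", 1),
    ("1004", "Z", 5),
]
exact = similar_orders(lines, threshold=1.0)
loose = similar_orders(lines, threshold=0.3)
```

A threshold of 1.0 gives exact matches only, and lowering it admits orders that mostly match. With about 2,500 orders this compares roughly three million pairs, which is slow to do by eye but modest for a script.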
I have an issue regarding data validation in Excel, namely how to dynamically set the validation source.
I have three tables, where the first contains a product ID and a product name. The second table contains a product ID together with a serial number. A third table has three columns; one for product ID, one for serial number and a description for e.g. error reporting.
What I want to do relates to the third table, where I select the product ID in a drop-down box which is linked to the first table. This works perfectly fine. The second column, though, must only allow serial numbers related to the selected product ID, according to the relationship in the second table. Hence, the data validation list must be dynamically generated from the input in the first column.
The reason for having it in Excel is due to corporate reasons; personally I'd use an SQL database for this very issue. E.g. if I were to use SQL syntax to generate the validation list, the corresponding SQL statement would be:
SELECT serialNumber WHERE productId = 12345;
I've tried using INDEX-MATCH, but unfortunately MATCH only returns a scalar value rather than an array. I had not come across array functions before today, but I assume they might be needed to accomplish this, and I have tried a bit without success.
If I somehow were to acquire an array returning the row numbers where there is a match, the INDEX-function would accomplish my needs, I presume.
My question is therefore: is there a method to acquire an array of matched values, or can my problem be solved with a more elegant solution? It would be of value if it can be done without VBA, also for corporate security reasons.
Thanks in advance!
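For reference, the lookup being asked for is a filter that returns every matching value rather than the first one. A tiny Python sketch of that logic follows; the function name and the sample product IDs and serial numbers are hypothetical. In Excel versions with dynamic arrays, the FILTER function plays a similar role, though whether it is available in this environment is an assumption.

```python
def serials_for(product_id, product_serials):
    """Return every serial number whose product ID matches, in table order."""
    return [serial for pid, serial in product_serials if pid == product_id]


# Hypothetical second table: (product ID, serial number).
product_serials = [
    (12345, "SN-001"),
    (67890, "SN-002"),
    (12345, "SN-003"),
]
allowed = serials_for(12345, product_serials)
```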