Excel: Order by date within multiple IDs - excel

I have a huge epidemiological dataset containing registry data with pathology reports and clinical information. I have merged several files into one masterfile in order to get all information from one file. Every patient is assigned an unique ID-number. Each patient can have several reports and hence the same ID number can be repeated several times in the ID column. For each ID entry = new row (= pathology or clinical report) there is a date of that sample/information reported.
My goal is to be able to read all pathology/clinical info for a particular ID within one row.
By sorting the IDs, I get a clear picture of the number of each ID that has been entered. The problem arises when there are several reports = multiple rows with identical ID because the dates within this one patients with several IDs = rows do not match. The dates come from pathology (sample date, answer date, clinical info date etc). The dates from pathology and clinical within one patient does not have to match exactly on the day but still within a reasonable timeframe e.g. within 1-2 months. This is best illustrated with an example.
I want to sort the columns so that dates from a particular row match together. I am sure there is a way to do that but I cannot figure it out.
Thanks in advance

The issue of mismatching records seems to arise once the two separate tables are merged into one. In order to fix this, there are several options you can take:
Re-do the merge but strengthen the way in which the tables are joined on.
Instead of only merging based on ID, see if there is another field that could easily connect the records, perhaps a medical record #, case #, or event #, and merge the tables based on this new field AND ID. This would be the strongest solution, however it will only work if you can find said field to strengthen the link.
A separate solution would be to first sort the original tables based on the dates so that they match up and then re-merging them together.
In theory this should solve your problem as I assume currently when matching up the two separate tables it is grabbing the first instance of patient X01 from both tables and matching them together. This can be confirmed by checking the merged query and looking to see if the mismatched records are in the same order as presented in the original tables. This is not perfect, as it relies on no clinical dates occurring between pathology dates for the record, so I would proceed with caution.
And to address your concern about losing track of ID's with multiple rows, this should not matter as in the end result after merged you can then sort by ID, however you can add multiple levels of sort by selecting the data and going to Data -> Sort -> Add Level. You can change the order in which the data is sorted (First by ID and then by Date).

Related

How to identify all columns that have different values in a Spark self-join

I have a Databricks delta table of financial transactions that is essentially a running log of all changes that ever took place on each record. Each record is uniquely identified by 3 keys. So given that uniqueness, each record can have multiple instances in this table. Each representing a historical entry of a change(across one or more columns of that record) Now if I wanted to find out cases where a specific column value changed I can easily achieve that by doing something like this -->
SELECT t1.Key1, t1.Key2, t1.Key3, t1.Col12 as "Before", t2.Col12 as "After"
from table1 t1 inner join table t2 on t1.Key1= t2.Key1 and t1.Key2 = t2.Key2
and t1.Key3 = t2.Key3 where t1.Col12 != t2.Col12
However, these tables have a large amount of columns. What I'm trying to achieve is a way to identify any columns that changed in a self-join like this. Essentially a list of all columns that changed. I don't care about the actual value that changed. Just a list of column names that changed across all records. Doesn't even have to be per row. But the 3 keys will always be excluded, since they uniquely define a record.
Essentially I'm trying to find any columns that are susceptible to change. So that I can focus on them dedicatedly for some other purpose.
Any suggestions would be really appreciated.
Databricks has change data feed (CDF / CDC) functionality that can simplify these type of use cases. https://docs.databricks.com/delta/delta-change-data-feed.html

Excel sheets with scores by same ID of person (Kahoot) - How to extract and summarize scores from several quizzes?

I've used Kahoot in the classroom and have several excel files with scores from quizzes.
Students attended quizzes by using unique IDs. In each file, scores are visible for each ID (but ordered by success on each quiz). There are also some students missing or stating wrong IDs (I'll ignore it).
Now I would like to accumulate all scores for all student IDs in one sheet and summarize them by Student ID.
How can I do that most efficiently?
Any pointer or advice is appreciated.
Thanks,
B.
Here's a high level guide to getting what you want along with a sample in this file.
Step 1 - Combine Files to Sheet with Unified Columns
Objective
The goal here is to:
Combine all of your data from other files to single sheet
Merge the data to be in a single column for each field (i.e. Column A has ID, Column B has score).
No breaks in rows.
No formulas.
To illustrate, I made this fake list based loosely on your
description.
Method
You probably can do this manually, but a macro could also be used. If you expect to do this year over year, you might look into vba to open close files in a folder. However, since that wasn't part of question, you can do copy-paste (better yet make a kid do it!). Just make sure there's only one header for each column, and all of the data records align. Probably should do copy paste value if you have any formulas.
Step 2 - Show Summation
There's a couple ways this could be done. A pivot table is probably the most sensible because you could include each quiz as a column to see the total. You could also use a pivot table to do averages by student etc.
TO make a pivot table, I would recommend going on YouTube and they will do a better job of explaining than me.
On that same file I made as an example, I included some tabs to illustrate the power of pivot tables and a couple graphs.
Hope that helps. If you have specific technical questions on this, you might consider asking separately.

SSRS Cells auto-merge

I'm having trouble unmerging cells on the report.
3 Suppliers for the query
I have a SQL query that shows 3 instances of a supplier (left joined to contact) as shown below. However, when running the report for the query the 3 instance of the supplier is merged into one. This is not desirable in my case because when exporting the report to excel, I'd like to be able to sort columns based on other properties, however, this would not be possible due the the merging of the rows. How can I get results to show individually?
Cells are Merged on the report
Within the properties of each Row Group you can specify which columns to group on. You generally don't need a separate group for each field, but that's OK. In your last group, the one called "(Details)", if it is not grouped by anything, it will show one row per line of results from the query. So take a look at what it's grouped by. As long as the rows are in your dataset, the report will group or show them based on how you configure the grouping here. Grouping on nothing means it will show all rows.
Another tip is to align the end of your header textbox with the line of one of your columns. This will prevent it from creating an extra column in Excel for the "City" field.
Your report does not need all of those groupings - the SSRS grouping is not like SQL. You should only group when you want to aggregate data on that field. Normally you might have a company with its address in various fields in one group but you only need to group once on the Company Name or (preferably) ID - not on each field and not a separate group for each. You could then show details of various invoices in other columns that aren't grouped.
But since you want to display the company data on each row, you would not want ANY grouping on the company.
To fix your issues, remove all the groupings (but not the rows) and just leave the detail group (which doesn't have a Grouping).
You can check out MS Docs: Understanding Groups for a better explanation.

Excel Matching Customer Orders by Item and Quantity

Brief:
I have a large dataset, inside of which are Individual customer orders by item and quantity. What I'm trying to do is get excel to tell me which order numbers contain exact matches (in terms of items and quantities) to each other. Ideally, I'd like to have a tolerance of say 80% accuracy which I can flex to purpose but I'll take anything to get me off the ground.
Existing Solution:
At the moment, I've used concatenation to pair item with quantity, pivoted and then put the order references as column and concat as rows with quantity as data (sorted by quantity desc) and I'm visually scrolling across/down to find matches and then manually stripping in my main data where necessary. I have about 2,500 columns to check so was hoping I could find a more suitable solution with excel doing the legwork on identification.
Index/matching works at cross referencing a match for the concatenation but of course, the order numbers (which are unique) are different so its not giving me matches ACROSS orders.
Fingers crossed!
EDIT:
Data set with outcomes
As you can see, the bottom order reference has no correlation to the orders above it so is not listed as a match to the orders above but 3 are identical and 1 has a slightly different item but MOSTLY matches.

Looking up values from different tables including newly found values

I have several documents which contain statistical data of performance of companies. There are about 60 different excel sheets representing different months and I want to collect data into one big table. Original tables looks something like this, but are bigger:
Each company takes two rows which represent their profit from the sales of the product and cost to manufacture the product.I need both of these numbers.
As I said, there are ~60 these tables and I want to extract information about Product2. I want to put everything into one table where columns would represent months and rows - profit and costs of each company. It could be easily done (I think) with INDEX function as all sheets are named similarly. The problem I faced is that at some periods of time other companies enter the market:
Some of them stay, some of them fail. I would like to collect information on all companies that exist today or ever existed, but newly found companies distort the list (in second picture we see, that company BA is in 4th row, not BB). As row of a company changes from time to time, using INDEX becomes problematic, because in some cases results of different companies get into one row. Adjusting them one by one seems very painful.
Maybe there is some quick and efficient method to solve such problem?
Any help or ideas would be appreciated.
One think you may want to try is linking the Excel spreadsheets as tables in Access. From there you can create a query that ties the tables together. As data changes in the spreadsheets, the query will reflect those changes.

Resources