Restructure duplicate csv indexes conditionally - excel

I have a csv file that has husband and wife under the same section if there's a husband or spouse for this specific column, and multiple date of birth columns for when there are. Is there a way I can restructure this so that There's an individual column for each spouse, copying name to phone, (grabbing the individual first names if possible), and creating a new row with the individual data from date of birth to PD while removing the extra columns from the first?
I have no experience in excel or csv files to know how to do this, but I was hoping I could check with you guys and see if you're able to format this properly:
Here's a link to some dummy data: https://drive.google.com/file/d/1OGRY0JpMvhVFFEGzdSJwBtYqKXxN-uNO/view?usp=sharing
I would love to know if there's any way to do this, and if so how I can do it for multiple pages of thousands of clients

This data clean-up can easily be done with Power Query (Get & Transform) in the Data ribbon. There different approaches to arrive at your desired result, but I would use the following:
Load the table with all columns into Power Query and call it QHusband
remove the columns with the wife data
remove rows with blanks in column DOB: that leaves just husband data
split the first name column by the & delimiter
remove the column with the wife name
save the query as a connection, do not load into the grid.
Now you have a query that returns just the husband data. Next,
Load the table with all columns again into Power Query as QWife
remove the columns with the husband data
remove rows with blanks in column DOB; that leaves just the wife data
split the first name column by the & delimiter
remove the column with the husband name
save the query as a connection only, do not load into the grid.
Now you have a query that returns just the wife data.
Create a new query that appends the two queries. Load that combined query into the spreadsheet as a table.
When you get new data, you can simply refresh the query

Related

Excel data tables: Multiple outputs with only one input column

I am trying to create a data table with multiple outputs across periods, but for the same scenarios.
Is it possible to create that without inserting an extra column between each output column to deliver input for the data table (i.e. input column = index 50-110).
Is this in any way possible? See picture of what I would usually mark to create the data table (this does only cover one period/output though). But if I were to make the scenario for FY23, then I would need to insert a column between FY22 and FY23 where I copy the index 50-110 again. I would like to not have to do that.

How to invert a merge query in power query

I have a single column table of customer account numbers and a main table containing 400,000 records pulling from an access database. I want to remove all records from the table where the customer account number can be found in the single column table.
The merge query capability in power query allows me to return only the records where there is a match on the customer list (in addition to a variety of other variations on this theme) but I would like to know whether there is a way to invert this so that I return all records where the customer number does not appear in this list.
I have achieved this already by using the List.Contains function and adding a custom column to identify the rows to exclude and then filtering them out, but I think this is severely impacting the performance of my workbook. Refreshing the table that initially has 400,000 rows prior to this series of transformations takes a very long time, and all queries that depend on this table then also take a long time to refresh.
Thank you
If you do a Left Anti Join of your table with a single column, this will give you your table filtered to only have the rows which do not match to the single column.

Import 2 or more columns from Excel into 1 column Access

I have an Excel report that is the output of an opinion tool. In this Excel I have all the responses that the people submit for my quizz, in the questions that are multiple choise answer the tool output those questions like one question per option and only the selected option is the column with data in the Excel. For example, if my quizz is like this:
Q1 Your name:
R1 =
Q2 Options
opt 1
opt 2
opt 3
The Excel report will appear like this
Excel Report
So I want that when I import the Excel to Access it can automatically merge those columns to have only to headers in the Access table: "Q1 Your name:" & "Q2 Options"
Also, for context of the job, I will make some other editions to that imported table and then copy to another Access table (table 2) so even if there is a way to merge those Access columns before copy to the another one I will accept it like, I don't know, insert from this column and if empty insert from that column, I'm not good at doing queries sorry. Only the table 2 will have information, the first table would be like a temporary one so I will daily delete information from that one and preserve the important data en the table 2
Thanks for the support
Simplest way I can see to achieve your goal is to concatenate the three columns; since by the sound of it you will only ever have a value in one column per question per record. You could do this in Excel prior to the import, you could use a calculated field on the table or you could build a query that concatenates all your questions. My suggestion would be Excel since using the =CONCATENATE() function is probably going to be easiest option for you.
If you do import your raw data into Access you will need to assign unique column names, ie Q2_Op1, Q2_Op2, Q2_Op3.
The query syntax to concatenate these fields one would be something like:
SELECT Q1_Name, [Q2_Op1] & [Q2_Op2] & [Q3_Op3] AS Q2_Options
FROM Table1;
Where Q1_Name, Q2_Op1, Q2_Op2, Q3_Op3 are the column names on the imported data table.

Missing rows when merging

I am working with Excel 2010, Power Query, and PowerPivot.
I have a query named Database that consists of 60+ merged tables containing a total of 2m+ rows. I also have a separate query that consists of two columns PrimaryKey3 and Members (a count of members per month). The entries in PrimaryKey3 are unique, consisting of ID-MMM-YY.
Both queries have PrimaryKey3 in common, however in Database there can be multiple rows with the same PrimaryKey3.
In order to match a member amount to each row in Database, I tried a Left Outer join. There were no errors, but when I try to upload to PowerPivot it says there are only 169K rows. I then tried Full Outer join and Inner Join, and received an error "could not convert value to number," coming from a column already formatted as a text in Database. This column contains numbers and numbers proceeding with a letter: 1234, A234. Every non-blank row has a PrimaryKey3. Why is it trying to reformat my columns/ how do I get around that?
Should I be using a different type of join, or is there another way besides merging to do this?
Hope this makes sense, thank you for any help in advance!
I uploaded both queries to PowerPivot, and created a relationship through PrimaryKey3. I then created a new column in Database with =Related(Enrollment[Members]).

Excel Pivot Table Count of Sub-Rows

I have a Pivot Table structure as follows:
ROWS:
+-State
+---Customer
+-----Brand
Columns:
+-Cost
I would like to have another column that contains the number of Customers in each state. The issue being that my data contains every order that the customers had placed, so when I try to get the count of Customers it is returning every instance of said customer in the column. Another issue is that my data is 40,000 rows, so I want to try and avoid having to edit the raw data.
I can easily do this with brute force, but I was wondering if there is anyway to do this with standard pivot tables and no add-ons. The pivot table already does a nice job of consolidating the unique values for customers, now I just need a count of those unique values.

Resources