Excel 2010: Merge and Concatenate Rows

I need help.
It seems all the macros I stumble upon on this website require me to write out every row I'm concatenating and merging. I was wondering if I could do this with a while loop or an if statement instead.
I need to merge a table of over 21,000 names (more than half of which are duplicates). Each duplicate either has data that the original is missing, or vice versa, or sometimes has different data under each column, and I need to merge them. There are also about 32 columns (A through AF).
Thanks,
Eddie
P.S. Apparently I need at least 10 reputation to post images, so message me via my account name if you want a screenshot of what I'm looking for.
P.P.S.
After consulting with someone who helped me in a comment, I wrote this Java-style pseudocode. Could someone help me translate it to VBA while I start learning VBA myself? Could they also verify that it works in theory? I'd like to have this done by the end of the day, which is why I'm asking for translation help, but I plan to be able to do this on my own in the near future.
//Create primary keys (PK) for comparison, using the last cell (column AF) as the PK for easy line finishing
//Create concatenation comparison keys (conKey) to compare cells and merge them
//Create a new sheet and a cell location (writeLine) to write to in the new sheet
Create PK1 = (cell) AF1
Create PK2 = (cell) AF2
Create conKey1 = (cell) A1
Create conKey2 = (cell) A2
Create newSheet = [a new sheet]
Create writeLine = (cell) A1 of newSheet
//Outer while loop: runs until it reaches the last person's name
//(maxClient = last used row, which shrinks by one every time a row is deleted)
while (conKey2.row <= maxClient) {
    //If statement: finds out whether a merge is necessary (duplicates must be on adjacent rows, so sort by the PK first)
    if (PK1.equals(PK2)) {
        //Inner while loop: runs until PK1 no longer equals PK2
        while (PK1.equals(PK2)) {
            //If the two cells are not equal, concatenate conKey2 onto conKey1
            if (!conKey1.equals(conKey2)) {
                conKey1 = concatenate(conKey1, ", ", conKey2)
            }
            //Export the cell to writeLine of newSheet, then shift everything one column to the right
            exportTo(newSheet.writeLine, conKey1)
            conKey1.shiftsRight
            writeLine.column = conKey1.column
            //Clear the duplicate cell for safekeeping; clearing AF2 (= PK2) is what ends this loop
            conKey2 = ""
            conKey2.shiftsRight
        }
        //After this while loop is finished, delete the now-blank row
        //PK1 and PK2 keep the same coordinates, so PK2 now points at the next row
        deleteRow(PK2)
        //Move conKey1, conKey2 and writeLine back to column A before the next comparison
        conKey1.returnToColumnA
        conKey2.returnToColumnA
        writeLine.returnToColumnA
    //If the merge was not necessary, copy the row to newSheet and shift each variable down a row
    } else {
        exportRow(newSheet.writeLine, conKey1.row)
        PK1.nextRow
        PK2.nextRow
        conKey1.nextRow
        conKey2.nextRow
        writeLine.nextRow
    }
}
//The last row never reaches the else branch, so export it after the loop
exportRow(newSheet.writeLine, conKey1.row)
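To check that the merge logic works in theory, it can be sketched in Python (not VBA). This is a minimal sketch, assuming each row is a list of cell values with the key in the last column (like AF) and that a blank cell is an empty string or None; `merge_duplicate_rows` is a hypothetical helper name. Unlike the pseudocode, it groups rows with a dictionary, so the sheet does not have to be sorted by the key first.

```python
def merge_duplicate_rows(rows, key_index=-1):
    """Merge rows that share the same key (by default the last column, like AF).

    For each column, the distinct non-blank values of all rows with that key are
    joined with ", " in the order they first appear. Rows must all be the same width.
    """
    merged = {}  # key -> one list of collected values per column (dicts keep insertion order)
    for row in rows:
        key = row[key_index]
        columns = merged.setdefault(key, [[] for _ in row])
        for col, value in enumerate(row):
            # Skip blanks and values already collected, mirroring "only concatenate if not equal"
            if value not in ("", None) and value not in columns[col]:
                columns[col].append(value)
    return [[", ".join(str(v) for v in col) for col in columns] for columns in merged.values()]


rows = [
    ["Smith", "", "NY", "id1"],
    ["Smith", "555-1234", "NY", "id1"],
    ["Jones", "555-9999", "", "id2"],
    ["Smith", "555-0000", "", "id1"],
]
print(merge_duplicate_rows(rows))
# → [['Smith', '555-1234, 555-0000', 'NY', 'id1'], ['Jones', '555-9999', '', 'id2']]
```

The same steps (look up the key, append only new non-blank values, join at the end) translate directly to a VBA loop over the used range with a Scripting.Dictionary.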

If SQL is an acceptable solution, and taking the MAX of each column's values is acceptable, then an ODBC connection, a defined name, and a self-reference may work:
1. Select Formulas, then Define Name.
2. Define the range of existing data.
3. Select where you want the combined results to display (for example, Sheet 2, cell A1, instead of Sheet 1).
4. Save the workbook.
5. Select Data, then From Other Sources.
6. Select Data Connection Wizard.
7. Select ODBC DSN.
8. Select Excel Files.
9. Find the saved file.
10. Select the new table (the one defined in step 1).
11. Complete the wizard.
12. Go to the Data menu.
13. Click Properties.
14. Click the Connection Properties button.
15. Select the Definition tab.
16. Modify the command text to fit your needs.
It takes data from Sheet 1 (screenshot not preserved). Provided the wizard is completed (steps 5-11) and the SQL is then updated in step 16, you get one combined row per name (screenshot not preserved).
This is the command text I used. Since it's SQL, it can be altered to fit your needs (for example, swap MAX for another aggregate function).
Select firstName, MiddleName, LastName, max(attrib1), max(attrib2), max(attrib3), max(attrib4)
From `yourPath&FileName.xlsx`.`YourDefinedName`
GROUP BY firstName, MiddleName, LastName
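To see what the GROUP BY / MAX query does with duplicate rows, here is a small sketch that runs the same shape of query with Python's built-in sqlite3 module. SQLite is only a stand-in for the Excel ODBC driver (whose SQL dialect differs), and the `people` table and its columns are hypothetical. Note that MAX keeps only one value per column, so if two duplicates hold different non-blank values, one of them is dropped.

```python
import sqlite3


def merge_with_max(records):
    """Collapse rows with the same name into one row, keeping the MAX of each attribute."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE people (firstName TEXT, MiddleName TEXT, LastName TEXT, attrib1 TEXT, attrib2 TEXT)"
    )
    conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?, ?)", records)
    rows = conn.execute(
        """
        SELECT firstName, MiddleName, LastName, MAX(attrib1), MAX(attrib2)
        FROM people
        GROUP BY firstName, MiddleName, LastName
        ORDER BY LastName
        """
    ).fetchall()
    conn.close()
    return rows


records = [
    ("Ann", "B", "Cole", "x", None),  # duplicate missing attrib2
    ("Ann", "B", "Cole", None, "y"),  # duplicate missing attrib1
    ("Tom", "", "Reed", "z", None),
]
print(merge_with_max(records))
# → [('Ann', 'B', 'Cole', 'x', 'y'), ('Tom', '', 'Reed', 'z', None)]
```

MAX ignores NULLs, which is why the two half-filled "Ann Cole" rows combine cleanly into one.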

Related

How to Exclude a Column(s) in a Structured Reference to Table[#Data] (or similar)

I want to reference all the data in my dynamic table, except for the first two columns. My goal is to return the header of the first column that isn't blank, starting with the third column. I have the formula figured out for everything except the starting with the third column part. Is there an easy way to accomplish this? I'm thinking I might have to just do something like
`=Table[#Data] unless in the range of the first two columns`
Hoping for an easier way though.
EDIT: In case my request isn't clear enough: I am looking for a formula that would produce the following exact result in these circumstances (screenshot not preserved). It must work in a table that can change size without issue, it must ignore the first two columns, it must scan a complete column of data from top to bottom before moving on to the next column (most of the formulas I've tried would give the result Aug-21 here), and it must return the header in basically any format.
I don't have the time to write up a full answer for this, but you should use the "From Table" button in the "Get & Transform" section of the Data ribbon.
Then, in the Query Editor window, on the Home ribbon, click Manage > Reference. Finding the position of the first non-blank column will be harder and requires learning the Power Query (M) language; it would probably look something like clicking Advanced Editor and adding steps like:
let
    Source = #"YourSourceQueryName",
    ColumnNames = Table.ColumnNames(Source),
    ColumnsToRemove = 2 + List.PositionOf( // PositionOf is zero-based, returning -1 if all are blank
        List.Transform(
            List.RemoveFirstN( // list of column names except the first two
                ColumnNames,
                2
            ),
            // true if the column contains only nulls
            (columnName) => List.IsEmpty(List.RemoveNulls(Table.Column(Source, columnName)))
        ),
        false
    ), // Power Query is lazy, so this should stop at the first non-blank column instead of checking every column
    ColumnNamesToKeep = List.RemoveFirstN(
        ColumnNames,
        ColumnsToRemove
    ),
    ReturnTable = if (ColumnsToRemove = 1) then
        "All columns were blank!" // PositionOf returned -1!
    else
        Table.SelectColumns(Source, ColumnNamesToKeep)
in
    ReturnTable
You can now use this in other queries, or you can load it to your spreadsheet. Unfortunately, Power Query doesn't refresh live; you have to either explicitly refresh the query or use the "Refresh All" button on the Data ribbon.
(I stressed the word "like" because I didn't debug it. It may contain syntax errors or other issues for you to debug.)
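The rule the question asks for (skip the first two columns, check each column completely from top to bottom, and return the header of the first column that isn't blank) can be sketched outside Excel like this. It is a Python illustration only, assuming columns are given as lists of values parallel to the headers and that blank means None or an empty string; `first_nonblank_header` is a hypothetical name.

```python
def first_nonblank_header(headers, columns, skip=2):
    """Return the header of the first non-blank column after the first `skip` columns.

    `columns` holds one list of cell values per header. Each column is checked
    completely before moving right; a cell is blank if it is None or "".
    Returns None when every remaining column is blank.
    """
    for header, values in zip(headers[skip:], columns[skip:]):
        if any(v not in (None, "") for v in values):
            return header
    return None


headers = ["Name", "ID", "C", "D", "E"]
columns = [["a", "b"], [1, 2], [None, ""], [None, 5], [3, None]]
print(first_nonblank_header(headers, columns))  # → D
```

A row-by-row scan would return E instead, because E is the only column with a value in the first row; scanning column by column is what gives D.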

Use Power Query to grab top row of CSV files in a folder. Place in Excel

I would like to grab the first rows of all CSV files in a folder. I have read that power query would probably be best.
I have gone to Excel > Data > Get Data > From Folder > OK. That has brought me to a table of all the CSVs in the folder. I would like to grab the first row of each of these files. I do not want to import all rows of the tables because there are way too many rows. There are also too many tables to do one by one. Please tell me what I should do next. Thank you!
The first image shows where I am; the second shows where I would like to be (images not preserved).
The approach below should give you a single table, wherein each column contains a given CSV's first row's values. It's not exactly what you've shown in your second image (namely, there are no blank columns in between each column of values), but it might still be okay for you.
You can parse a CSV with the Csv.Document function, which should give you a table.
You can get the first row of that table using either Table.First and Record.FieldValues, or Table.PromoteHeaders and Table.ColumnNames.
(It makes sense to create a custom function that does the above steps for you and then invoke it for each CSV. See GetFirstRowOfCsv in the code below.)
The function above returns a list (containing the CSV's first row's values). Calling the function for all your CSVs should give you a list of lists, which you can then combine into a single table with Table.FromColumns.
Overall, starting from the Folder.Files call, the code looks like:
let
    filesInFolder = Folder.Files("C:\Users\"),
    GetFirstRowOfCsv = (someFile as binary) as list =>
        let
            csv = Csv.Document(someFile, [Delimiter=",", Encoding=65001, QuoteStyle=QuoteStyle.Csv]),
            promoted = Table.PromoteHeaders(csv, [PromoteAllScalars=true]),
            firstRow = Table.ColumnNames(promoted)
        in
            firstRow,
    firstRowExtracted = Table.AddColumn(filesInFolder, "firstRowExtracted", each GetFirstRowOfCsv([Content]), type list),
    combined =
        let
            columns = firstRowExtracted[firstRowExtracted],
            headers = List.Transform(firstRowExtracted[Name], each Text.BeforeDelimiter(_, ".csv")),
            toTable = Table.FromColumns(columns, headers)
        in
            toTable
in
    combined
which gives me the following (screenshot not preserved).
The null values are because there were more values in the first row of my ActionLinkTemplate.csv than the first rows of the other CSVs.
You will need to change the folder path in the above code to whatever it is on your machine.
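For comparison, the same idea (read just the first row of every CSV in a folder and lay those rows side by side as columns) can be sketched with Python's standard csv module. This is an illustration, not Power Query: it assumes UTF-8 files with comma delimiters, uses hypothetical helper names, and pads shorter rows with None, much like the nulls Table.FromColumns produced above.

```python
import csv
from itertools import zip_longest
from pathlib import Path


def first_rows_of_csvs(folder):
    """Map each CSV's file name (without .csv) to the values in its first row."""
    result = {}
    for path in sorted(Path(folder).glob("*.csv")):
        # utf-8-sig reads UTF-8 with or without a byte-order mark
        with path.open(newline="", encoding="utf-8-sig") as f:
            result[path.stem] = next(csv.reader(f), [])  # only the first row is read
    return result


def side_by_side(first_rows):
    """Turn {name: row} into headers plus body rows, one column per file.

    Shorter rows are padded with None, like the nulls from Table.FromColumns.
    """
    headers = list(first_rows)
    body = [list(r) for r in zip_longest(*first_rows.values(), fillvalue=None)]
    return headers, body
```

Only the first line of each file is parsed, which is the point of the exercise when the files are large.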
In the GUI, you can keep only the top N row(s) of each file, where you choose N. Then you can expand the remaining rows.

Cleaning Excel Table using VBA without impacting the entire table and formatting

Hi, I am trying to write VBA for Excel to clean up data elements that have extra information without impacting the other elements.
I am writing VBA for the first time. My table is in the middle of the sheet.
Given table and requested output (images not preserved).
I think your question was not clear in regard to the "steps" that you want to perform on your data (i.e. the exact logic or transformation that needs to be applied).
Based purely on your images and your comment, I make the "steps" to be:
Split any customer IDs in column valueC into multiple rows.
If column valueC does not contain customer IDs (i.e. is blank or contains non-customer ID text), leave it untouched.
My answer uses Power Query instead of VBA. If you are interested in trying it out, in Excel try clicking Data > Get Data > From Other Sources > Blank Query, then click Advanced Editor near the top-left, copy-paste the code below, then click Done.
You might need to change the name of the table in the first line of the code (below), as it was "Table1" for me, but I imagine yours is named something else. Also, the code below is case-sensitive. So if there is no column named exactly valueC, then you will get an error.
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    fxProcessSomeText = (textToProcess as any) =>
        let
            canBeSplit = Text.StartsWith(textToProcess, "### customer id"),
            result =
                if textToProcess is null then null
                else if canBeSplit then Text.Split(Text.BetweenDelimiters(textToProcess, "### customer id", " ###"), ",")
                else {textToProcess}
        in
            result,
    invokeFunction = Table.TransformColumns(Source, {{"valueC", fxProcessSomeText}}),
    expanded = Table.ExpandListColumn(invokeFunction, "valueC"),
    reindex =
        let
            removeIndex = Table.RemoveColumns(expanded, {"index"}),
            addIndex = Table.AddIndexColumn(removeIndex, "index", 1, 1),
            moveIndex = Table.ReorderColumns(addIndex, List.Distinct(List.InsertRange(Table.ColumnNames(addIndex), 0, {"index"})))
        in
            moveIndex
in
    reindex
My output table contains more rows than yours. Also, the value in column valueA, row 11 is 1415 for me (it is 1234 in your requested output). I'm not sure if this is a mistake in your example or if I'm missing some logic.
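The transformation the M code performs can be sketched in Python to make the steps explicit: split a valueC cell that starts with "### customer id" on commas, give each piece its own row, then renumber the index column from 1. This is a sketch under the same assumptions as the M code (the marker text and the " ###" end delimiter come from it); rows are hypothetical dicts, and, like Text.Split, it does not trim spaces around each ID.

```python
MARKER = "### customer id"  # start text checked by the M code
END = " ###"                # end delimiter used by Text.BetweenDelimiters


def process_text(text):
    """Mirror fxProcessSomeText: return the list of values one valueC cell expands to."""
    if text is None:
        return [None]  # a null cell stays a single null row
    if text.startswith(MARKER):
        between = text[len(MARKER):].split(END, 1)[0]  # text between the marker and " ###"
        return between.split(",")
    return [text]  # blank or non-customer-ID text is left untouched


def expand_and_reindex(rows):
    """Give each split value its own row, then renumber 'index' from 1 and keep it first."""
    expanded = [dict(row, valueC=value) for row in rows for value in process_text(row["valueC"])]
    return [
        {"index": i, **{k: v for k, v in row.items() if k != "index"}}
        for i, row in enumerate(expanded, start=1)
    ]
```

Because every split adds rows, the output naturally has more rows than the input, which matches the row-count difference noted above.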

Excel Power Query - from web with dynamic worksheet cell value

We have a spreadsheet that gets updated monthly, which queries some data from our server.
The query url looks like this:
http://example.com/?2016-01-31
The returned data is in a json format, like below:
{"CID":"1160","date":"2016-01-31","rate":{"USD":1.22}}
We only need the value of 1.22 from the above and I can get that inserted into the worksheet with no problem.
My questions:
1. How can I use a cell value (containing the date) to pass the date parameter (2016-01-31) to the query and display the result in the cell next to it?
2. There's a long list of dates in a column; can this query be filled down automatically for each date?
3. When I load the query result to the worksheet, it always loads in pairs, taking up two cells: one says "Value", the other contains the value, which is 1.22 in my case. Ideally I would only need 1.22, not the title; can this be removed? (Deleting it won't work; it gives you "Column 1" instead, or you have to hide the entire row, which messes up the layout.)
I know this is a lot to ask, but I've done a lot of searching and reading in the last few days, and I have to say the M language beats me.
Thanks in advance.
Convert your Web.Contents() request into a function:
let
    myFunct = (param as date) =>
        let
            // format the date as yyyy-MM-dd to match the URL in the question
            x = Web.Contents(.... & Date.ToText(param, "yyyy-MM-dd") & ....)
        in
            x
in
    myFunct
Reference your data request function from a new query and include any transformations you need (in this case Json.Document, table expansions, and removing extraneous data). Feel free to delete all the extra data here, including columns that just contain the label 'Value'.
Assuming your table of domain values already exists, add a custom column like
=myFunct([someparameter])
and then expand the resulting column.
Edit: I got home and got into my bookmarks. Here is a more detailed reference for what you are looking to do: http://datachix.com/2014/05/22/power-query-functions-some-scenarios/
For a table, add a column where you get the data and parse the JSON:
let
    tt = #table(
        {"date"}, {
            {"2017-01-01"},
            {"2017-01-02"},
            {"2017-01-03"}
        }),
    add_col = Table.AddColumn(tt, "USD", each Json.Document(Web.Contents("http://example.com/?" & [date]))[rate][USD])
in
    add_col
If you need only one value:
Json.Document(Web.Contents("http://example.com/?" & YOUR_DATE_STRING))[rate][USD]
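The two pieces of work these queries do, building the URL from a date and pulling only the USD rate out of the JSON, can be sketched in Python. This is an illustration only: it uses the example.com URL format and the JSON sample from the question, the helper names are hypothetical, and it does not actually download anything.

```python
import json
from datetime import date

BASE_URL = "http://example.com/?"  # URL format from the question: http://example.com/?2016-01-31


def build_url(day: date) -> str:
    """Append the date as yyyy-MM-dd, which is what date.isoformat() produces."""
    return BASE_URL + day.isoformat()


def extract_usd_rate(payload: str) -> float:
    """Return only the rate value (e.g. 1.22), dropping the label and the other fields."""
    return json.loads(payload)["rate"]["USD"]


def urls_for_dates(days):
    """Equivalent of filling the query down a column of dates: one URL per date."""
    return [build_url(d) for d in days]


sample = '{"CID":"1160","date":"2016-01-31","rate":{"USD":1.22}}'
print(build_url(date(2016, 1, 31)))  # → http://example.com/?2016-01-31
print(extract_usd_rate(sample))      # → 1.22
```

In the workbook, the payload would come from the web request; here it is the sample string from the question.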

Excel Power Query -- Select value in column specified in related table -- INDEX+MATCH alternative

Problem
I have two queries, one contains product data (data_query), the other (recode_query) contains product names from within the data_query and assigns them specific id_tags. id_tags are also column names within the data_query.
What I need to achieve and fail at
I need the data_query to look at the id_tag of the specific product name within the data_query, as parsed from the recode_query (this is already working and in place) and input the retrieved value within the specific custom column cell. In Excel, I would be using INDEX/MATCH combo:
{=INDEX(data_query[#Data];; MATCH(data_query[#id_tag]; data_query[#Headers]; 0))}
I have searched near and far, but I probably can't even spot the solution, even if I have come across it, as I am not that deep in the data manipulation and power query myself.
Is this what you're wanting?
let
DataQuery = Table.FromColumns({{1,2,3}, {"Boxed", "Bagged", "Rubberbanded"}}, {"ID","Pkg"}),
RecodeQuery = Table.FromColumns({{"Squirt Gun", "Coffee Maker", "Trenching Tool"}, {1,2,3}}, {"Prod Name", "ID2"}),
Rzlt = Table.Join(DataQuery, "ID", RecodeQuery, "ID2", JoinKind.Inner)
in
Rzlt
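What Table.Join with JoinKind.Inner does here can be sketched in Python: every DataQuery row is paired with each RecodeQuery row whose ID2 matches its ID, and rows without a match are dropped. This is a minimal illustration using the same sample data, with a hypothetical `inner_join` helper; it is not how Power Query implements the join.

```python
def inner_join(left, left_key, right, right_key):
    """Pair each left row with every right row whose key matches (like JoinKind.Inner)."""
    index = {}
    for r in right:
        index.setdefault(r[right_key], []).append(r)
    # Rows on the left without a match produce nothing, as in an inner join
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]


data_query = [{"ID": 1, "Pkg": "Boxed"}, {"ID": 2, "Pkg": "Bagged"}, {"ID": 3, "Pkg": "Rubberbanded"}]
recode_query = [
    {"Prod Name": "Squirt Gun", "ID2": 1},
    {"Prod Name": "Coffee Maker", "ID2": 2},
    {"Prod Name": "Trenching Tool", "ID2": 3},
]
for row in inner_join(data_query, "ID", recode_query, "ID2"):
    print(row)
```

Building a dictionary on the right-hand key keeps the lookup fast, which is the same job MATCH does in the INDEX/MATCH formula.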
