VBA or function to combine two CSV files into one XLSX - Excel

Greetings,
I have the following scenario with three different files:
1) One CSV file has abbreviations in Column A (minus the first row) that need to be copied into an XLSX file (also into Column A)
+
2) Another CSV has many rows and columns containing the explanations for those abbreviations, and I have to look up each explanation in that big file (so I used VLOOKUP).
=
3) A separate XLSX file has to combine both CSVs into one, with the abbreviations in Column A and the explanations of the terms in Column B.
I tried using formulas with simple external references:
Column A1 ='C:\Users\MirzaV\Desktop\1\[0528-matrix.csv]0528-matrix'!A3
Column B1 =VLOOKUP(A1;'C:\Users\MirzaV\Desktop\1\[variantendb.csv]variantendb'!$C:$D;2;0)
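To make the intent of the two formulas concrete, here is a minimal Python sketch of the same logic using only the standard library. It is an illustration under assumptions, not a replacement for the workbook: it assumes comma-delimited files (pass delimiter=";" otherwise) and that the abbreviations start on row 3 of the matrix file, as in the A3 reference; the function names and sample data are hypothetical. Real files would be read with open(path, newline="") instead of the inline sample text, and writing the result to .xlsx would need a third-party library such as openpyxl, which is not shown.

```python
import csv
import io


def read_rows(text, delimiter=","):
    """Parse CSV text into a list of rows, each a list of strings."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def vlookup_exact(value, rows, key_col, result_col):
    """Like VLOOKUP(value; C:D; 2; 0): return result_col of the first row whose
    key_col equals value, or "#N/A" when there is no exact match."""
    for row in rows:
        if len(row) > max(key_col, result_col) and row[key_col] == value:
            return row[result_col]
    return "#N/A"


def combine(matrix_rows, varianten_rows, first_row=3):
    """Build (abbreviation, explanation) pairs.

    Column A comes from the matrix file starting at first_row (A3 in the question);
    the explanation is looked up in columns C:D (zero-based 2 and 3) of the
    varianten file. Blank rows are skipped instead of producing 0 as a formula would.
    """
    pairs = []
    for row in matrix_rows[first_row - 1:]:
        if not row or row[0] == "":
            continue
        abbr = row[0]
        pairs.append((abbr, vlookup_exact(abbr, varianten_rows, 2, 3)))
    return pairs


matrix = read_rows("Header\nSubheader\nAB1\nZZ9\n")
varianten = read_rows("x,y,AB1,First term\nx,y,AB2,Second term\n")
print(combine(matrix, varianten))  # [('AB1', 'First term'), ('ZZ9', '#N/A')]
```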
So it doesn't seem hard, but the problem is that I have XXX of these CSV files plus one main CSV file with the explanations (called "varianten"), and all of these files will be updated periodically.
Instead of opening three files at the same time just to refresh my formulas, is there a quicker way with code or other functions? I would like the result to be in an XLSX file.
I tried to record a macro, thinking I could reuse it for the rest of the files, but it doesn't work well and always gives an error.
Sub FillMatrixFromCsv()
    ' Cleaned-up version of the recorded macro. The Application.Left/Top/Width/Height
    ' and ScrollRow lines only moved windows and ChDir only changed the default folder,
    ' so they are dropped. The ActiveWindow.Close calls depended on whichever window
    ' happened to be active, so they are left out as well.
    ' Assumes 0528-matrix.csv, variantendb.csv and 0528-matrix1.xlsx are all open.
    With Workbooks("0528-matrix1.xlsx").ActiveSheet
        ' Column A: abbreviations from the matrix CSV, one row down (skips its header)
        .Range("A1:A500").FormulaR1C1 = "='0528-matrix.csv'!R[1]C"
        ' Column B: explanation from the 2nd column of C:D in the varianten CSV (exact match)
        .Range("B1:B500").FormulaR1C1 = "=VLOOKUP(RC[-1],variantendb.csv!C3:C4,2,0)"
    End With
End Sub

Since you're using Office 365, we can use the Get & Transform feature to create links to your CSV files. As long as you keep the same filenames for the CSVs, Excel will pick up the new data whenever you refresh (Data > Refresh All).
We'll complete this data merge in 3 stages:
Link the reference CSV (the second file you listed) to a table
Link the data CSV (the first file) to a table
Write an INDEX/MATCH formula to pull the descriptions.
Stage 1: Linking the reference file to a table
In a new Excel workbook, click on the Data tab, then click on the New Query dropdown in the Get & Transform section. Mouse over "From File >" and select "From CSV"
Navigate to CSV 2 and click Import
On the next window that pops up, click "Load"
Your lookup data will now load into a table on a new sheet. Now let's clean up the references here:
Click on the Formulas tab, then click on Name Manager
Select your new table (it will be named the same as your file)
Change the name to "Reference" and click Ok.
Go to your table and change the column names from "Column 1" and "Column 2" to "Abbr" and "Desc"
And that's it for stage 1! Now that we have the reference table set up and linked, we can move on to loading the data table we want to find the descriptions for.
Stage 2: Linking the data file to a table
We're going to link to the data file in the same way we did the reference file. Go to Data > Get & Transform > New Query > From File > From CSV. Select your file and click Import, then click Load.
On the new table, rename Column 1 to "Code" (I would use Abbr, but Code will help keep the next step looking clear).
Add another column to this table. The simplest way is to just click in B1, type "Desc" (or whatever name of your choosing) and hit Enter.
Stage 3: The Index function that makes the magic
On your new data table with the blank description column, click in the first data cell.
Type in the formula =INDEX(Reference[Desc],MATCH([@Code],Reference[Abbr],0)) and press Enter.
Watch the magic happen as Excel copies our formula to every cell in that table column!
By setting up our CSV files as external connections in this manner, we create a dynamic table that updates from the CSVs whenever the workbook is refreshed.
By using INDEX/MATCH, we get away from the constraints of VLOOKUP (the lookup value must be in the left-most column of the range, and approximate matches require sorted data) and move to a system that lets us look up the value we need from any column, in any order.
Breaking it down, INDEX returns the value at a given row (and column) position of the specified array or table. Because we specified the target array as a single column of data, we can use INDEX([array], [row number]), or, using the formula above, INDEX(Reference[Desc], [row number]). What really makes this work is MATCH. MATCH returns the position of a target value within an array, so MATCH([@Code],Reference[Abbr],0) returns the position of the current row's code in the Abbr column (the final 0 requests an exact match). That position is passed to INDEX, which then pulls the data from the corresponding cell of the Desc column.
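The INDEX/MATCH combination can be mirrored in a few lines of Python, which may help show why the column order no longer matters. This is a sketch of the formula's behaviour, not of how Excel implements it; the table contents and function names are made up for illustration.

```python
def match_exact(value, array):
    """Like MATCH(value, array, 0): 1-based position of the first exact match."""
    for position, item in enumerate(array, start=1):
        if item == value:
            return position
    raise LookupError(f"{value!r} not found")  # Excel would show #N/A


def index(array, position):
    """Like INDEX(array, position) on a single column (1-based)."""
    return array[position - 1]


# Hypothetical Reference table stored column by column. Desc is deliberately
# listed before Abbr: MATCH and INDEX each see a single column, so their order
# in the table does not matter (unlike VLOOKUP).
reference = {
    "Desc": ["First description", "Second description"],
    "Abbr": ["AB1", "AB2"],
}


def lookup_desc(code):
    # =INDEX(Reference[Desc], MATCH([@Code], Reference[Abbr], 0))
    return index(reference["Desc"], match_exact(code, reference["Abbr"]))


print(lookup_desc("AB2"))  # Second description
```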
There are some additional steps we can do using the Power Query Editor to ensure the column headers always stay the same, but that's a tutorial for a different day. Hope this helps!

Related

Excel sheet.range().Value holding wrong data

I am trying to take data from a named range in an Excel sheet and import it into an Access database. However, one range, containing text data, is filled in from a dropdown list. For some reason, its value somehow turns into a Variant(1 to 1).
Watch window:
+ xlx.Range("Talk_codes").Value(7)    Variant(1 to 1)
Form_Logs.GetRangeNames
Select Case xlx.Range("Talk_codes").Value
Case "Show ID"
Logs.Show_Promo = Logs.Show_Promo + DateAdd("s", xlx.Range("Talk_Time").Value, Logs.Show_Promo)
I know that at least the first of those cells contains "Show_ID".
Any suggestions to force it to have the actual data?

Execute macro commands in csv files or create a macro that transforms the file into csv

I have a file with a table of information about each entry. One column in the table holds the social security number; the ID number in my country is 9 digits long and often begins with a 0. Excel always drops the leading 0, so I wrote a macro that adds it back. But when I convert the file to CSV, the 0 is dropped again. I want to know how to run my macro on the CSV (where the 0 disappeared). I would be happy either to run the macro on CSV files or at least to have a macro that converts the file to .csv while keeping the 0s that disappear. Here is my macro, which works on .xlsm:
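Sub Add_Zeros()
    Dim CL As Range
    ' Text format ("@") stops Excel from turning "012345678" back into a number
    Selection.NumberFormat = "@"
    For Each CL In Selection.Cells
        ' Pad non-empty values shorter than 9 characters with leading zeros
        If CL.Value <> "" And Len(CL.Value) < 9 Then
            CL.Value = String(9 - Len(CL.Value), "0") & CL.Value
        End If
    Next CL
End Sub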
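The padding rule itself is independent of Excel. Here is a Python sketch of it (the function name, the width parameter and the float handling are illustrative assumptions, not part of the macro), which can be handy for checking what the padded values should look like:

```python
def add_zeros(value, width=9):
    """Left-pad an ID with zeros to `width` characters, as the Add_Zeros macro does.

    Empty values are returned unchanged, and IDs that are already `width`
    characters or longer are left alone. Whole-number floats such as 12345678.0
    (how numbers often come back from spreadsheet readers) are treated as ints.
    """
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    text = str(value).strip()
    if text == "":
        return text
    return text.rjust(width, "0")


print(add_zeros(12345678))  # 012345678
```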
The CSV file will still contain the leading 0s. You can check that by opening it in Notepad. If you double-click the CSV and open it in Excel, the 0s will disappear.
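The point that the zeros survive in the file but not in a conversion to numbers can be demonstrated with Python's csv module. In this sketch the sample IDs are made up, and int() stands in, as an approximation, for the conversion Excel applies with General formatting when it opens a CSV:

```python
import csv
import io


def write_ids_csv(ids):
    """Write one ID per row and return the raw CSV text, i.e. what Notepad shows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for id_text in ids:
        writer.writerow([id_text])
    return buffer.getvalue()


raw = write_ids_csv(["012345678", "000000123"])
print(repr(raw))  # '012345678\n000000123\n'  (the zeros are in the file)

# Reading the values back as text keeps the zeros...
as_text = [row[0] for row in csv.reader(io.StringIO(raw))]
# ...while converting them to numbers drops them.
as_numbers = [int(value) for value in as_text]
print(as_text, as_numbers)  # ['012345678', '000000123'] [12345678, 123]
```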
To preserve the 0s, you will have to import the data directly in Excel. Follow the steps mentioned below.
Click on the Data tab | From Text
Select the CSV in the file selection dialog box
In the Text Import Wizard (Step 1), select 'Delimited' and click Next.
In the Text Import Wizard (Step 2), select 'Comma' and click Next.
In the Text Import Wizard (Step 3), select all columns and choose 'Text' under 'Column data format'
Click Finish
Select the cell where you want to import the data and click 'OK'

Making a vector out of excel columns using python

Hi everyone,
I started with Python only a couple of days ago because I need to handle some Excel data in order to automatically copy the data of certain cells from one file into another.
However, I'm kind of stuck: I have barely programmed before and this is my first time using Python, but my job requires me to find a solution, and I'm trying to make it work even though it's not my field of expertise.
I used the xlrd library, opened my file and managed to print the columns I need. However, I can't find a way to put those columns into a matrix so I can handle the data like this:
Matrix = [DataColumnA DataColumnG DataColumnH] with size [nrows x 3]
As for now, I have 3 different outputs for the 3 different columns I need, but I'm trying to join them together into one big matrix.
So far my code looks like this:
import xlrd

workbook = xlrd.open_workbook("190219_serviciosWRAmanualV5.xls")
worksheet = workbook.sheet_by_name("ServiciosDWDM")
# Note: xlrd 2.0 and later only read .xls files; .xlsx needs an older xlrd or another library
workbook2 = xlrd.open_workbook("Potencia2.xlsx")
worksheet2 = workbook2.sheet_by_name("Hoja1")

filas = worksheet.nrows
filas2 = worksheet2.nrows
columnas = worksheet.ncols

# Skip the first two rows, then print columns 12-14 (zero-based) of each row
for row in range(2, filas):
    Equipo_A = worksheet.cell(row, 12).value
    Client_A = worksheet.cell(row, 13).value
    Line_A = worksheet.cell(row, 14).value
    print(Equipo_A, Line_A, Client_A)
So, as mentioned above, all I have so far is the data in those columns, which I'm printing, as you can see.
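One way to get the [nrows x 3] matrix is to keep each row as a list and pick out the wanted columns. The sketch below is plain Python and works on any list of rows; the sample sheet and the function name are hypothetical, and the column indices (0, 6, 7) correspond to A, G and H as described in the question, whereas the code above reads columns 12 to 14.

```python
def build_matrix(rows, columns, start_row=0):
    """Return an [nrows x len(columns)] list of lists containing only `columns`.

    rows: list of rows, each a list of cell values. With xlrd this could be
          [worksheet.row_values(r) for r in range(worksheet.nrows)].
    columns: zero-based column indices to keep, e.g. (0, 6, 7) for A, G, H.
    """
    return [[row[c] for c in columns] for row in rows[start_row:]]


# Hypothetical sheet: a header row plus two data rows, columns A..H
sheet = [
    ["ID", "b", "c", "d", "e", "f", "G", "H"],
    ["A1", "b", "c", "d", "e", "f", "G1", "H1"],
    ["A2", "b", "c", "d", "e", "f", "G2", "H2"],
]
matrix = build_matrix(sheet, (0, 6, 7), start_row=1)
print(matrix)  # [['A1', 'G1', 'H1'], ['A2', 'G2', 'H2']]
# matrix[i][0] is column A of data row i, matrix[i][1] column G, matrix[i][2] column H
```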
What I'm trying to do, the main thing I need, is to read the cell of the first row in Column A and look for it in the other Excel file. If the names match, I have to validate that, for the same row (in file 1), the data in both Column G and Column H is the same as the data in the second file.
If they match, I have to update Column J in the first file with the data from the second file.
My other approach is to retrieve the value of the cell in Column A and look for it in Column A of the second file, then use an if conditional to check whether Columns G and H are equal to Column C of the second file, and so on...
The problem is that I have no idea how to pinpoint the position of the cell and extract the data to build the conditional for this second approach.
I'm not sure whether building that matrix is a good approach or whether the second way is better, so any suggestion would be greatly appreciated.
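The second approach (find the matching name, compare two columns, copy a value) can be sketched in plain Python on lists of rows such as those built from row_values(). The question does not say which columns of the second file hold the comparison values or the value for Column J, so check2=(2, 3) and source2=4 below are placeholder assumptions, as are the function name and sample data. Note also that xlrd only reads workbooks; saving the updated Column J back to a file would need a different library (not shown).

```python
def update_from_second_file(rows1, rows2, key1=0, key2=0,
                            check1=(6, 7), check2=(2, 3),
                            target1=9, source2=4):
    """Copy data from file 2 into file 1 where the rows agree.

    For each row of rows1, find the row of rows2 with the same key (column A by
    default). If the checked columns match (file 1 G, H against file 2 check2),
    write file 2's source2 column into file 1's target1 column (J). rows1 is
    modified in place; the number of updated rows is returned.
    All indices are zero-based; check2 and source2 are placeholder guesses.
    """
    # Index file 2 by key once, so each lookup is a dictionary access
    by_key = {}
    for row in rows2:
        by_key.setdefault(row[key2], row)  # keep the first occurrence of a key

    updated = 0
    for row in rows1:
        other = by_key.get(row[key1])
        if other is None:
            continue  # name not found in file 2
        if all(row[c1] == other[c2] for c1, c2 in zip(check1, check2)):
            row[target1] = other[source2]
            updated += 1
    return updated


# Hypothetical data: file 1 has columns A..J, file 2 has columns A..E
file1 = [
    ["K1", "", "", "", "", "", "g", "h", "", ""],
    ["K2", "", "", "", "", "", "x", "y", "", ""],
]
file2 = [
    ["K1", "", "g", "h", "NEW"],
    ["K2", "", "g", "h", "OTHER"],
]
print(update_from_second_file(file1, file2))  # 1
print(file1[0][9], repr(file1[1][9]))  # NEW ''
```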
Thank you in advance!

Power Query: Adding characters to a set limit across several columns/rows

Very new to PQ, and I'm pretty sure it can do what I need in this situation, but I need help figuring out how to get there.
I have a timesheet report with 20 columns and 50 rows that needs to be formatted into a Word document for uploading into a separate system. The original values in the cells range from 0 down to any negative two-digit number (e.g. "-20"), but they need to be formatted as a seven-character string ending in ".00".
Examples:
0 will need to become "0000.00"
-4 will need to become "-004.00"
-25 will need to become "-025.00"
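The target is a fixed-width number with zero padding placed after the sign. As a reference for checking results (not a Power Query solution), Python's format specification expresses the same rule; the function name is hypothetical:

```python
def to_timesheet_text(value):
    """Fixed-width text with two decimals, zero-padded after the sign."""
    # "07.2f": total width 7, two decimals, '0' fill placed after any minus sign
    return format(value, "07.2f")


for n in (0, -4, -25):
    print(n, to_timesheet_text(n))
# 0 0000.00
# -4 -004.00
# -25 -025.00
```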
I think I should be able to use the Text.Insert function, but I'm not familiar enough with the M language to get it to do what I want.
Any solutions/suggestions?
Here's my previous answer revisited, set up to use a function. You can invoke the function once for each column you want to reformat, passing it the name of that column each time.
Create a new blank query.
Open the new query in the Advanced Editor and highlight everything in it.
Paste this over the highlighted text in the Advanced Editor:
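let
    // Formats whole numbers from 0 to -99 as 7-character text such as "-004.00"
    FormatIt = (SourceColumn) =>
        let
            // Scale down so -4 becomes -0.04 and its digits sit right after the separator
            Base = Number.Round(SourceColumn, 2) * .01,
            // Pad with "000" so -20 ("-0.2") becomes "-0.2000"; padding with "001"
            // gave "-0.2001", which turned -20 into "-020.01"
            Source = try Text.Start(Text.Range(
                    if Base < 7 then Text.From(Base) & "000" else
                    Text.From(Base), 0, 7), 2) & Text.Range(Text.Range(
                    if Base < 7 then Text.From(Base) & "000" else
                    Text.From(Base), 0, 7), 3, 2) & "." & Text.End(Text.Range(
                    if Base < 7 then Text.From(Base) & "000" else
                    Text.From(Base), 0, 7), 2)
                // 0 becomes "0000", too short for Text.Range(..., 0, 7), so it ends up here
                otherwise "0000.00"
        in
            Source
in
    FormatIt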
...and click Done.
You'll see a new function has been created and listed in the Queries list on the left side of the screen.
Then go to your query with the columns you want to reformat (click on the name of your query that has the numbers you want to change in it, on the left side of the screen) and...
Click Invoke Custom Function
And fill out the pop-up like this:
- You can make up a different New column name than Custom.1.
- Function Query is the name of your query you are calling (the one you just created when you pasted the code)...for me, it's called Query1.
- Source Column is the column with the numbers you want to format.
...and click OK.
You can invoke this function once for each column. It will create a new formatted column for each.
You can use the formula = Text.PadStart(Text.From([Column1]),4,"0") & ".00" in PQ to add a new column that comes close to what you need; it handles 0 correctly, but negative values need extra handling for the minus sign.
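The reason this only comes close: Text.PadStart pads the whole text, minus sign included, so -4 ends up as "00-4.00". Padding only the digits and putting the sign back in front avoids that. The idea is sketched here in Python rather than M (function names hypothetical):

```python
def pad_start_naive(value):
    """Mirrors Text.PadStart(Text.From(x), 4, "0") & ".00"."""
    return str(value).rjust(4, "0") + ".00"


def pad_start_signed(value):
    """Pad only the digits, then put the minus sign back in front."""
    digits = str(abs(value))
    if value < 0:
        return "-" + digits.rjust(3, "0") + ".00"  # sign + 3 digits = 4 characters
    return digits.rjust(4, "0") + ".00"


print(pad_start_naive(-4), pad_start_signed(-4))  # 00-4.00 -004.00
```

An M expression along the same lines (untested) might be: if [Column1] < 0 then "-" & Text.PadStart(Text.From(-[Column1]), 3, "0") & ".00" else Text.PadStart(Text.From([Column1]), 4, "0") & ".00"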
Here's an admittedly "busy" formula to do it:
= Table.AddColumn(#"Changed Type", "Custom", each try Text.Start(Text.Range(if Number.Round([Column1],2)*.01 < 7 then Text.From(Number.Round([Column1],2)*.01) & "000" else Text.From(Number.Round([Column1],2)*.01),0,7),2) & Text.Range(Text.Range(if Number.Round([Column1],2)*.01 < 7 then Text.From(Number.Round([Column1],2)*.01) & "000" else Text.From(Number.Round([Column1],2)*.01),0,7),3,2) & "." & Text.End(Text.Range(if Number.Round([Column1],2)*.01 < 7 then Text.From(Number.Round([Column1],2)*.01) & "000" else Text.From(Number.Round([Column1],2)*.01),0,7),2) otherwise "0000.00")
It assumes your numbers that you want formatted are in Column1 to start. It creates a new column...Custom...with the formatted result.
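To try it out, start with Column1 already populated and loaded into Power Query; then click the Add Column tab and then the Custom Column button. In the pop-up window, name the new column Custom and enter, as the custom column formula, everything in the formula above between "each" and the final closing parenthesis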
...and click OK.
With more time, the repetitive parts could be made with variables to shorten this up a bit. This could also be turned into a function, given some time. But I don't have the time right now, so I figured I'd give you at least "something."

How can I add a 1 to the most recent, repeated row in Excel?

I have a dataset in Excel with 60,000+ rows and about 20 columns. The ID column sometimes repeats, and I want to add a column that returns 1 only in the most recent row of an ID, and only IF that ID repeats.
Here is the example. I have…
ID DATE ColumnX
AS1 Jan-2013 DATA
AS2 Feb-2013 DATA
AS3 Jan-2013 DATA
AS4 Dec-2013 DATA
AS2 Dec-2013 DATA
I want…
ID DATE ColumnX New Column
AS1 Jan-2013 DATA 1
AS2 Feb-2013 DATA 0
AS3 Jan-2013 DATA 1
AS4 Dec-2013 DATA 1
AS2 Dec-2013 DATA 1
I've been trying a combination of sorting and nested IFs, but it depends on my data always being in the same order (so that it looks up the ID in the previous row).
Bonus points: my dataset is fairly large for Excel, so the most efficient approach, one that won't eat up the processor, would be appreciated!
An approach you could use is to point MSQuery at your table and use SQL to apply the business rules. On the positive side, this runs very quickly (a couple of seconds in my tests against 64k rows). A huge minus is that the query engine does not seem to support Excel tables exceeding 64k rows, but there might be ways to work around this. Regardless, I offer the solution in case it gives you some ideas.
To set it up, first give your data set a named range. I called it MYTABLE. Save. Next select a cell to the right of your table in row 1, and click through Data | From Other Sources | From Microsoft Query. Choose Excel Files* | OK, and browse for your file. The Query Wizard should open, showing MYTABLE available; add all the columns. Click Cancel (really), and click Yes, you want to continue editing.
The MSQuery interface should open, click the SQL button and replace the code with the following. You will need to edit some specifics, such as the file path. (Also, note I used different column names. This was sheer paranoia on my part. The Jet engine is very finicky and I wanted to rule out conflicts with reserved words as I built this.)
SELECT
MYTABLE.ID_X,
MYTABLE.DATE_X,
MYTABLE.COLUMN_X,
IIF(MAXDATES.ID_x IS NULL,0,1) * IIF(DUPTABLE.ID_X IS NULL,0,1) AS NEW_DATA
FROM ((`C:\Users\andy3h\Desktop\SOTEST1.xlsx`.MYTABLE MYTABLE
LEFT OUTER JOIN (
SELECT MYTABLE1.ID_X, MAX(MYTABLE1.DATE_X) AS MAXDATE
FROM `C:\Users\andy3h\Desktop\SOTEST1.xlsx`.MYTABLE MYTABLE1
GROUP BY MYTABLE1.ID_X
) AS MAXDATES
ON MYTABLE.ID_X = MAXDATES.ID_X
AND MYTABLE.DATE_X = MAXDATES.MAXDATE)
LEFT OUTER JOIN (
SELECT MYTABLE2.ID_X
FROM `C:\Users\andy3h\Desktop\SOTEST1.xlsx`.MYTABLE MYTABLE2
GROUP BY MYTABLE2.ID_X
HAVING COUNT(1) > 1
) AS DUPTABLE
ON MYTABLE.ID_X = DUPTABLE.ID_X)
With the code in place MSQuery will complain the query can't be represented graphically. It's OK. The query will execute -- it might take longer than expected to run at this stage. I'm not sure why, but it should run much faster on subsequent refreshes. Once results return, File | Return data to Excel. Accept the defaults on the Import Data dialog.
That's the technique. To refresh the query against new data, simply click Data | Refresh. If you need to tweak the query, you can get back to it through Excel via Data | Connections | Properties | Definition tab.
The code I provided returns your original data plus the NEW_DATA column, which has value 1 if the ID is duplicated and the date is the maximum date for that ID, otherwise 0. This code will not sort out ties if an ID's maximum date is on several rows. All such rows will be tagged 1.
Edit: The code is easily modified to ignore the duplication logic and show most recent row for all IDs. Simply change the last bit of the SELECT clause to read
IIF(MAXDATES.ID_x IS NULL,0,1) AS NEW_DATA
In that case, you could also remove the final LEFT JOIN with alias DUPTABLE.
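The logic of the query is two aggregates joined back onto the table: the maximum date per ID (MAXDATES) and the IDs that occur more than once (DUPTABLE). A Python sketch of the same rule follows, for checking results on a small sample; it assumes the dates are real date values (text such as "Jan-2013" would compare alphabetically), and the function name and the require_duplicate switch are illustrative.

```python
from collections import Counter
from datetime import date


def flag_latest(rows, require_duplicate=True):
    """Return one 0/1 flag per (id, date) row, mirroring the MSQuery logic.

    A row gets 1 when its date equals the maximum date for its ID (the MAXDATES
    join) and, if require_duplicate is True, the ID occurs more than once (the
    DUPTABLE join). require_duplicate=False matches the edited query.
    Ties on the maximum date all get 1, as in the SQL.
    """
    counts = Counter(row_id for row_id, _ in rows)
    max_date = {}
    for row_id, row_date in rows:
        if row_id not in max_date or row_date > max_date[row_id]:
            max_date[row_id] = row_date
    return [
        int(row_date == max_date[row_id]
            and (counts[row_id] > 1 or not require_duplicate))
        for row_id, row_date in rows
    ]


example = [
    ("AS1", date(2013, 1, 1)),
    ("AS2", date(2013, 2, 1)),
    ("AS3", date(2013, 1, 1)),
    ("AS4", date(2013, 12, 1)),
    ("AS2", date(2013, 12, 1)),
]
print(flag_latest(example))                           # [0, 0, 0, 0, 1]
print(flag_latest(example, require_duplicate=False))  # [1, 0, 1, 1, 1]
```

With require_duplicate=False the flags match the "I want" table in the question.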
Sort by ID, then by DATE (ascending). Define entries in new column to be 1 if previous row has the same ID and next row has a different ID or is empty (for last row), 0 otherwise.
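The sort-and-compare rule can be sketched the same way. This Python version (hypothetical names, dates assumed to be real date values) sorts row positions rather than the data itself, so the flags come back in the original order:

```python
from datetime import date


def flag_by_sorting(rows, require_duplicate=True):
    """Sort-and-compare version; returns 0/1 flags in the ORIGINAL row order.

    Rows are visited sorted by ID, then date. A row is the latest for its ID
    when the next sorted row has a different ID (or there is none). With
    require_duplicate=True the previous sorted row must also share the ID,
    which is the rule described above.
    """
    order = sorted(range(len(rows)), key=lambda i: (rows[i][0], rows[i][1]))
    flags = [0] * len(rows)
    for pos, i in enumerate(order):
        row_id = rows[i][0]
        next_differs = pos + 1 == len(order) or rows[order[pos + 1]][0] != row_id
        prev_same = pos > 0 and rows[order[pos - 1]][0] == row_id
        if next_differs and (prev_same or not require_duplicate):
            flags[i] = 1
    return flags


example = [
    ("AS1", date(2013, 1, 1)),
    ("AS2", date(2013, 2, 1)),
    ("AS3", date(2013, 1, 1)),
    ("AS4", date(2013, 12, 1)),
    ("AS2", date(2013, 12, 1)),
]
print(flag_by_sorting(example))                           # [0, 0, 0, 0, 1]
print(flag_by_sorting(example, require_duplicate=False))  # [1, 0, 1, 1, 1]
```

Unlike the SQL version, when several rows of one ID share the maximum date, only the one that sorts last is flagged.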
It could be done in VBA. I'd be interested to know whether this is possible with formulas alone; I had to do something similar once before.
Sub Macro1()
    Dim ws As Worksheet
    Dim rowCount As Long
    Dim counter As Long   ' Long, not Integer: Integer overflows above 32,767 rows

    Set ws = ActiveWorkbook.Worksheets("Sheet1")
    ws.Activate
    rowCount = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row

    ' Turn AutoFilter on for A:D (only if it is off) and clear the old 0/1 flags
    If Not ws.AutoFilterMode Then ws.Columns("A:D").AutoFilter
    ws.Range("D2:D" & rowCount).ClearContents

    ' Sort by ID (column A) first, then by DATE (column B), both ascending
    With ws.AutoFilter.Sort
        .SortFields.Clear
        .SortFields.Add Key:=ws.Range("A1:A" & rowCount), SortOn:=xlSortOnValues, Order:=xlAscending
        .SortFields.Add Key:=ws.Range("B1:B" & rowCount), SortOn:=xlSortOnValues, Order:=xlAscending
        .Apply
    End With

    ' 1 = last (most recent) row for its ID; 0 = the next row has the same ID
    For counter = 2 To rowCount
        ws.Cells(counter, 4).Value = 1
        If ws.Cells(counter, 1).Value = ws.Cells(counter + 1, 1).Value Then ws.Cells(counter, 4).Value = 0
    Next counter
End Sub
So you activate the sheet and get the count of rows.
Then turn on AutoFilter and clear out Column D, which holds the 0s and 1s. Then sort on the values mbroshi suggested (ID, then date), which you say you're already using. Then loop over every record, setting the value to 1, but changing it back to 0 if the next row has the same ID.
Depending on your processor, I don't think this would take more than a minute or two to run. If you do find something using formulas, I would be interested to see it!
