I am going to show a bit of a contrived example, so bear with it.
Our product uses CSV files for transitional data: data sent between an Excel user interface and a Java program that manipulates it and transfers it to a SQL backend. We have a VBA script that handles all the Excel work in the following order:
Load all 8 CSV files into 8 sheets in a single Excel document. Then iterate over batches of the data doing the following:
'Loop over data:
Dim r As Range
...
r.NumberFormat = "General"
r.Formula = r.Formula
'End loop
Loading the CSV files populates the entire sheet, but number cells have a text appearance and Excel formulas remain unevaluated. Running r.Formula = r.Formula makes all the functions evaluate properly. The only problem is the number formatting.
The CSV files sometimes contain nested CSV. For example, a single cell may contain "1,2,3,15,654". These cells always appear as text. However, there is an edge case wherein the cells could be pretty-printed numbers, such as "10,456,345". Excel converts these into Number cells after evaluating all the functions and strips out all of the commas. While the 20,000 or so rows in the document are otherwise correct, the 4 or so rows this affects break the entire system.
Is there a way to trigger Excel to evaluate the functions from CSV without changing the cell formatting entirely from VBA? Changing formats from CSV to SYLK is not an option, as the Java CSV Generator is handled by a different division.
You could set the format of all the cells to Text (Cells.NumberFormat = "@"), then loop over them and apply your code only to the cells that start with '='.
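A minimal sketch of that suggestion, assuming the CSV data is already loaded into the worksheet passed in; `EvaluateOnlyFormulas` is a hypothetical name. A formula typed into a Text-formatted cell stays as text, which is why each formula cell is switched back to General before its formula is re-entered:

```vba
' Hypothetical sketch: keep everything as Text, evaluate only the formulas.
Public Sub EvaluateOnlyFormulas(ByVal ws As Worksheet)
    Dim c As Range
    ' Text format keeps "10,456,345"-style values from being turned into numbers
    ws.UsedRange.NumberFormat = "@"
    For Each c In ws.UsedRange.Cells
        If Left$(CStr(c.Formula), 1) = "=" Then
            ' A formula in a Text cell stays text, so switch this cell back first
            c.NumberFormat = "General"
            c.Formula = c.Formula
        End If
    Next c
End Sub
```

Only the formula cells lose the Text format; every other cell keeps whatever the CSV contained.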
If performance is an issue you should put the worksheet content in an array, work on the array and put it back to the sheet.
If you post more code and sample data people will be able to have a closer look.
EDIT
For example, put the following values in column A (A1 to A4) of "Sheet1", formatted as Text:
13246
13564,4654,4565
654
=A1+A3
and using the following code:
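Sub test()
    Dim a As Variant
    Dim result As Variant
    Dim i As Long, j As Long
    ' Read the formulas; for Text-formatted cells this is simply their text
    a = Sheets("Sheet1").UsedRange.Formula
    ReDim result(1 To UBound(a, 1), 1 To UBound(a, 2))
    For i = 1 To UBound(a, 1)
        For j = 1 To UBound(a, 2)
            If Left(a(i, j), 1) = "=" Then
                ' Formulas are written back unchanged, so Excel evaluates them
                result(i, j) = a(i, j)
            Else
                ' The apostrophe prefix makes Excel keep the value as text
                result(i, j) = "'" & a(i, j)
            End If
        Next j
    Next i
    ' Write the whole array back in one go, starting at B1
    Sheets("Sheet1").Cells(1, 2).Resize(UBound(result, 1), UBound(result, 2)).Value = result
End Sub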
The result, written to column B, is:
13246
13564,4654,4565
654
13900
I have an alternative technique that may be suitable.
I created a worksheet with 26,000 values: strings, numbers, dates, numbers with embedded commas and formulae.
I ran a loop over the above sheet of which the inner code was:
ValueCell = .Cells(RowCrnt, ColCrnt).Formula
If IsNumeric(Replace(ValueCell, ",", "")) Then
.Cells(RowCrnt, ColCrnt).Formula = Replace(ValueCell, ",", "|")
End If
.Cells(RowCrnt, ColCrnt).Formula gives the formula if the cell contains one or the value if it does not. If the value or formula with any commas removed is numeric, I replace any commas with pipes.
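The snippet above is only the loop body. A self-contained sketch of the same rule, assuming you want it applied to a whole sheet: the test is pulled into a hypothetical helper `PipeIfNumeric`, and a hypothetical driver `PipeNumericCommas` walks the used range with For Each instead of RowCrnt/ColCrnt counters, writing only the cells that actually change:

```vba
' Hypothetical helper: the "numeric once commas are removed" rule.
Public Function PipeIfNumeric(ByVal valueCell As String) As String
    If IsNumeric(Replace(valueCell, ",", "")) Then
        PipeIfNumeric = Replace(valueCell, ",", "|")   ' 10,456,345 -> 10|456|345
    Else
        PipeIfNumeric = valueCell   ' e.g. "ab,cd" or "=A1+A3" is left as-is
    End If
End Function

' Hypothetical driver: applies the rule to every cell of the used range.
Public Sub PipeNumericCommas(ByVal ws As Worksheet)
    Dim c As Range, newValue As String
    For Each c In ws.UsedRange.Cells
        newValue = PipeIfNumeric(CStr(c.Formula))
        If newValue <> c.Formula Then c.Formula = newValue   ' skip unchanged cells
    Next c
End Sub
```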
With 26,000 cells this took 59 seconds. Does this compare favourably with your extra 45 seconds per 500 rows?
Any values like "1,2,3,15,654" would now be "1|2|3|15|654" but I assume that is not a problem. If you have nested strings such as "ab,cd,ef" they would still contain commas. Perhaps testing for a leading "=" to eliminate formulae and automatically replacing commas in everything else would be a possibility.
The following code took 78 seconds to run against 26,000 cells.
ValueCell = .Cells(RowCrnt, ColCrnt).Formula
If Left(ValueCell, 1) <> "=" Then
.Cells(RowCrnt, ColCrnt).Formula = Replace(ValueCell, ",", "|")
End If
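Both loops above touch the sheet once per cell. A sketch that combines the leading-"=" rule with the earlier suggestion of working on the worksheet content in an array: read the used range's formulas once, change them in memory, and write them back once. `PipeCommasInArray` and `PipeCommasOnSheet` are hypothetical names, and the driver assumes the used range holds more than one cell (for a single cell, `.Formula` returns a plain string, not an array):

```vba
' Hypothetical helper: pipe the commas of every non-formula entry in a 2-D array.
Public Sub PipeCommasInArray(ByRef data As Variant)
    Dim i As Long, j As Long, s As String
    For i = LBound(data, 1) To UBound(data, 1)
        For j = LBound(data, 2) To UBound(data, 2)
            s = CStr(data(i, j))
            ' Formulas keep their commas (they separate function arguments)
            If Left$(s, 1) <> "=" And InStr(s, ",") > 0 Then
                data(i, j) = Replace(s, ",", "|")
            End If
        Next j
    Next i
End Sub

' Hypothetical driver: one read and one write for the whole used range.
Public Sub PipeCommasOnSheet(ByVal ws As Worksheet)
    Dim rng As Range, data As Variant
    Set rng = ws.UsedRange
    data = rng.Formula
    PipeCommasInArray data
    rng.Formula = data
End Sub
```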
Hope this helps if only to give you new ideas.
I propose an alternative, more straightforward approach:
Have you considered writing to an Excel spreadsheet directly from your Java program, using e.g. Apache POI? Seems to me it would be much more straightforward, less contrived, and less error-prone than this whole formulas-in-a-CSV business.
Related
I have inherited a very large spreadsheet and am trying to migrate it to a database. The table has over 300 columns, many of which reference other columns.
By converting it to a table (ListObject) in Excel, I thought it would be easier to deconstruct the logic... basically turn the formula:
=CJ6-CY6
into
=[@[Sale Price]]-[@[Standard Cost]]
Converting it to a table worked great... unfortunately it didn't change any of the embedded formulas. They still reference the ranges.
I think I may notionally understand why -- if a formula references a value in another row, then it's no longer a primitive calculation. But for formulas that are all on the same row, I'm wondering if there is any way to convert them without manually going into each of these 300+ columns and re-writing them. Some of them are beastly. No joke, this is an example:
=IF(IF(IF(HD6="",0,IF(HD6=24,0,IF(HD6="U",((FI6-(ES6*12))*$I6),($I6*FI6)*HS6)))<0,0,IF(HD6="",0,IF(HD6=24,0,IF(HD6="U",((FI6-(ES6*12))*$I6),($I6*FI6)*HS6))))>GO6,GO6,IF(IF(HD6="",0,IF(HD6=24,0,IF(HD6="U",((FI6-(ES6*12))*$I6),($I6*FI6)*HS6)))<0,0,IF(HD6="",0,IF(HD6=24,0,IF(HD6="U",((FI6-(ES6*12))*$I6),($I6*FI6)*HS6)))))
And it's not the worst one.
If anyone has ideas, I'd welcome them. I'm open to anything. VBA included.
I would never use this to teach computer science, but this is the hack that did the trick. To keep things simple, I transposed the header names, together with their column letters, into A17:B363 (column letters in column A, header names in column B):
And then this VBA code successfully transformed each range into the corresponding column property.
Sub FooBomb()
    Dim ws As Worksheet
    Dim r As Range, rw As Range, translate As Range
    Dim col As Long, row As Long
    Dim find As String, anchored As String, repl As String
    Set ws = ActiveWorkbook.ActiveSheet
    Set rw = ws.Rows(6)
    Set translate = ws.Range("A17:B363")   ' column letters in A, header names in B
    For col = 12 To 347
        Set r = rw.Cells(1, col)
        ' Bottom-up: with the list in column order, e.g. FI is replaced before I
        For row = 363 To 17 Step -1
            find = ws.Cells(row, 1).Value2 & "6"
            anchored = "$" & find
            repl = "[@[" & ws.Cells(row, 2).Value2 & "]]"
            r.Formula = VBA.Replace(r.Formula, anchored, repl)
            r.Formula = VBA.Replace(r.Formula, find, repl)
        Next row
    Next col
End Sub
Hard-coded and not scalable, but I'm not looking to repeat this ever again.
-- EDIT --
A word to the wise to help performance, especially with as many columns and formulas as there are in this spreadsheet:
Set formula calculation to manual beforehand.
Check that the reference exists before doing a replacement; skipping happens more often than not.
With these changes the program ran in a few seconds (it took minutes before):
If InStr(r.Formula, anchored) > 0 Then
r.Formula = VBA.Replace(r.Formula, anchored, repl)
End If
If InStr(r.Formula, find) > 0 Then
r.Formula = VBA.Replace(r.Formula, find, repl)
End If
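Taking those tips one step further, each cell's formula can be read once, translated as a string in memory, and written back only if it changed. A sketch under the same assumptions as FooBomb (column letters in A17:A363 in column order, header names in B17:B363, formulas in L6:MI6); `TranslateRow6Refs` and `FooBombFaster` are hypothetical names:

```vba
' Hypothetical helper: translate row-6 references in one formula string.
Public Function TranslateRow6Refs(ByVal f As String, ByVal colLetters As Variant, _
                                  ByVal headers As Variant) As String
    Dim k As Long, find As String, repl As String
    ' Same bottom-up order as FooBomb, so FI6 is handled before I6
    For k = UBound(colLetters) To LBound(colLetters) Step -1
        find = colLetters(k) & "6"
        repl = "[@[" & headers(k) & "]]"
        ' Replace only when the reference is present
        If InStr(f, "$" & find) > 0 Then f = Replace(f, "$" & find, repl)
        If InStr(f, find) > 0 Then f = Replace(f, find, repl)
    Next k
    TranslateRow6Refs = f
End Function

Public Sub FooBombFaster()
    Dim ws As Worksheet, r As Range
    Dim letters As Variant, names As Variant
    Dim col As Long, f As String, prevCalc As XlCalculation
    Set ws = ActiveWorkbook.ActiveSheet
    ' 1-D arrays of column letters and header names
    letters = Application.WorksheetFunction.Transpose(ws.Range("A17:A363").Value)
    names = Application.WorksheetFunction.Transpose(ws.Range("B17:B363").Value)
    prevCalc = Application.Calculation
    Application.Calculation = xlCalculationManual
    For col = 12 To 347
        Set r = ws.Cells(6, col)
        f = TranslateRow6Refs(r.Formula, letters, names)
        If f <> r.Formula Then r.Formula = f   ' at most one write per cell
    Next col
    Application.Calculation = prevCalc
End Sub
```

Keeping the string work in `TranslateRow6Refs` also makes the translation easy to check on sample formulas before touching the sheet.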
I have some larger files I need to validate the data in. I have most of it automated to input the formulas I need automatically. This helps eliminate errors of copy and paste on large files. The problem is with this latest validation.
One of the latest validations involves counting the number of rows that match 3 columns. The 3 columns are in Sheet 2 and the rows to count are in Sheet 1. Then compare this count with an expected number based on Sheet 2. It is easy enough to do with COUNTIFS, but these are large files and it can take up to an hour on some of them. I am trying to find something faster.
I am using a smaller file and it is still taking about 1 minute. There are only about 1800 rows.
I have something like this:
In Check1 I am using: =COUNTIFS(Sheet1!A:A,A2,Sheet1!B:B,B2,Sheet1!C:C,C2)
My code puts that formula in the active cell. Is there a better way to do this?
Is there any way, using VB or anything else, to improve the performance?
When the rows start getting into the tens of thousands, it is time to start this and go get lunch, and then hope it is done when I get back to my desk!
Thanks.
COUNTIFS basically has to iterate over all rows for each criteria column, which is expensive. You might be able to split this into two tasks:
Merge your columns A-C into one value in a helper column, e.g. =CONCAT(A2,"|",B2,"|",C2) (the separator keeps "1","23" and "12","3" from producing the same key)
Then do only a single COUNTIF on this column: =COUNTIF(D:D,D2)
That way you get rid of two time-expensive criteria at the cost of the new CONCAT column.
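A sketch of the helper-column approach for the two-sheet layout in the question, assuming data in columns A:C from row 2 on both sheets, and that column D on Sheet1 and columns E:F on Sheet2 are free (all hypothetical choices, as is the name `AddHelperKeyCounts`). It also limits the COUNTIF range to the filled rows rather than whole columns. Keep in mind that COUNTIF treats * and ? in the criteria as wildcards:

```vba
' Hypothetical sketch: one delimited key per row, then a single-criterion COUNTIF.
Public Sub AddHelperKeyCounts()
    Dim n1 As Long, n2 As Long
    n1 = Worksheets("Sheet1").Cells(Worksheets("Sheet1").Rows.Count, 1).End(xlUp).Row
    n2 = Worksheets("Sheet2").Cells(Worksheets("Sheet2").Rows.Count, 1).End(xlUp).Row
    If n1 < 2 Or n2 < 2 Then Exit Sub   ' nothing below the header rows
    ' Key column on Sheet1: A|B|C
    Worksheets("Sheet1").Range("D2:D" & n1).Formula = "=A2&""|""&B2&""|""&C2"
    With Worksheets("Sheet2")
        .Range("E2:E" & n2).Formula = "=A2&""|""&B2&""|""&C2"
        ' Count against the filled key rows only, not the whole column
        .Range("F2:F" & n2).Formula = "=COUNTIF(Sheet1!$D$2:$D$" & n1 & ",E2)"
    End With
End Sub
```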
You should narrow the range COUNTIFS acts on from entire columns to the actual used range.
Your code could also write the result of the formula instead of the formula itself,
as follows:
Dim sheet1Rng As Range, cell As Range
With Sheet1
    Set sheet1Rng = Intersect(.UsedRange, .Range("A:C"))
End With
With Sheet2
    ' One criteria row per cell in column A; the count goes to column D
    For Each cell In Intersect(.UsedRange, .Range("A:A"))
        cell.Offset(, 3).Value = WorksheetFunction.CountIfs( _
            sheet1Rng.Columns(1), cell.Value, _
            sheet1Rng.Columns(2), cell.Offset(, 1).Value, _
            sheet1Rng.Columns(3), cell.Offset(, 2).Value)
    Next cell
End With
I set up a mock sheet, using a layout similar to what you show, with 10,000 rows, and manually filled it with the COUNTIFS formula you show. Changing a single item in the data triggered a recalculation that took about ten seconds to execute.
I then tried the following macro, which executed in well under one second. All of the counting is done within the VBA macro. So this Dictionary method may be an answer to your speed problems.
Before running this, you may want to set the Calculation state to Manual (or do it in the code) if you have COUNTIFS on the worksheet.
Option Explicit
'set reference to Microsoft Scripting Runtime
Sub CountCol123()
Dim DCT As Dictionary
Dim V As Variant
Dim WS As Worksheet, R As Range
Dim I As Long
Dim sKey As String
Set WS = Worksheets("sheet2")
'read the info into an array
With WS
Set R = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp)).Resize(columnsize:=4)
V = R
End With
'Get count of the matches
Set DCT = New Dictionary
For I = 2 To UBound(V, 1)
sKey = V(I, 1) & "|" & V(I, 2) & "|" & V(I, 3)
If DCT.Exists(sKey) Then
DCT(sKey) = DCT(sKey) + 1
Else
DCT.Add Key:=sKey, Item:=1
End If
Next I
'Get the results and write them out
For I = 2 To UBound(V, 1)
sKey = V(I, 1) & "|" & V(I, 2) & "|" & V(I, 3)
V(I, 4) = DCT(sKey)
Next I
'If you have COUNTIFS on the worksheet when testing this,
' or any other formulas that will be triggered,
' then uncomment the next line
'Application.Calculation = xlCalculationManual
With R
.EntireColumn.Clear
.Value = V
End With
End Sub
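Note that the macro above counts duplicate A|B|C combinations within sheet2 itself. If, as in the question, the rows to count are on Sheet1 and the criteria on Sheet2, the same Dictionary idea can be split into a tally pass and a lookup pass. A sketch, assuming headers in row 1 of both sheets, criteria in Sheet2 A:C, results in Sheet2 column D, and plain-value criteria (no wildcards, comparison operators, or error values in A:C). The routine name is hypothetical, and the Dictionary is late-bound, so no library reference is needed:

```vba
' Hypothetical adaptation: count Sheet1 rows matching each Sheet2 criteria row.
Public Sub CountSheet1MatchesForSheet2()
    Dim dct As Object                    ' late-bound Scripting.Dictionary
    Dim src As Variant, crit As Variant, result() As Variant
    Dim i As Long, n As Long, sKey As String
    Set dct = CreateObject("Scripting.Dictionary")
    dct.CompareMode = vbTextCompare      ' COUNTIFS ignores case; mirror that
    ' Pass 1: tally every A|B|C combination on Sheet1
    With Worksheets("Sheet1")
        src = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp)).Resize(, 3).Value
    End With
    For i = 2 To UBound(src, 1)
        sKey = src(i, 1) & "|" & src(i, 2) & "|" & src(i, 3)
        If dct.Exists(sKey) Then
            dct(sKey) = dct(sKey) + 1
        Else
            dct.Add sKey, 1
        End If
    Next i
    ' Pass 2: look up each Sheet2 criteria row and write the counts once
    With Worksheets("Sheet2")
        crit = .Range(.Cells(1, 1), .Cells(.Rows.Count, 1).End(xlUp)).Resize(, 3).Value
        n = UBound(crit, 1)
        If n < 2 Then Exit Sub
        ReDim result(1 To n - 1, 1 To 1)
        For i = 2 To n
            sKey = crit(i, 1) & "|" & crit(i, 2) & "|" & crit(i, 3)
            If dct.Exists(sKey) Then result(i - 1, 1) = dct(sKey) Else result(i - 1, 1) = 0
        Next i
        .Range("D2").Resize(n - 1, 1).Value = result
    End With
End Sub
```

Each sheet is read in a single pass, so the work grows roughly with the number of rows instead of rows times rows.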
The Excel alternative named Cell in Hancom Office 2020 is insanely fast at countifs. Not sure why. On my i7-5775C, Excel 2019 takes 90 seconds for a countifs with two criteria for populating 10,000 rows with the results. Using Cell, the exact same operation completes in less than 28 seconds. I have verified that the results match those generated by Excel 2019.
In excel, I have a text string, which contains numbers and text, that I strip the numbers out of with a formula. However, sometimes it is possible that the text string will be blank to begin with so if this happens I use "" to return a blank instead of a 0. When I link this excel sheet to an access database it will not let me format this column as currency because it is picking up the "" as text along with the stripped out numbers as numbers. Any solutions on how to fix this? Is there another way besides "" to make a cell completely blank and not a zero length string?
This is along the lines of the problem I am having:
http://support.microsoft.com/kb/162539
Here is a quick routine that reduces formulas to their values and strips zero length strings to truly blank cells.
Sub strip_zero_length_string()
Dim c As Long
With Sheets("Sheet1").Cells(1, 1).CurrentRegion
.Cells = .Cells.Value
For c = 1 To .Columns.Count
.Columns(c).TextToColumns Destination:=.Cells(1, c), _
DataType:=xlFixedWidth, FieldInfo:=Array(0, 1)
Next c
End With
End Sub
Because I'm using .CurrentRegion, there can be no completely blank rows or columns within your data block, but that is usually the case when preparing to export to a database.
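A possible alternative to the TextToColumns step, sketched under the assumption that you only want to blank out zero-length strings: convert the formulas to values, then call ClearContents on each cell whose value is a zero-length string. `ClearZeroLengthStrings` is a hypothetical name. Unlike TextToColumns with General format, this does not re-parse the remaining cells, so text such as codes with leading zeros should stay as it is:

```vba
' Hypothetical sketch: formulas -> values, then "" -> truly empty cells.
Public Sub ClearZeroLengthStrings(ByVal rng As Range)
    Dim c As Range
    rng.Value = rng.Value                  ' same first step as the routine above
    For Each c In rng.Cells
        If VarType(c.Value) = vbString Then
            If Len(c.Value) = 0 Then c.ClearContents
        End If
    Next c
End Sub
```

Usage: `ClearZeroLengthStrings Sheets("Sheet1").Cells(1, 1).CurrentRegion`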
I would like to find and replace a large list of words in Excel. The words I'd like to search through are in Sheet1, column A; Sheet2, column A holds what is to be found and column B holds the word(s) to replace each found word with (all comma-separated values). How do I go about doing this so that the replacements end up back in Sheet1 column A?
I suspect this requires a macro, which I am not very familiar with.
Many thanks in advance for your time and assistance!
It's not super efficient but it will get the job done. You will have to account for the length of your two lists as well as if you change your sheet names.
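Public Sub findsometext()
    Dim wsList As Worksheet, wsData As Worksheet
    Dim i As Long, j As Long
    Dim target As String, replacer As String
    Set wsList = Worksheets("Sheet2")   ' find text in A, replacement in B
    Set wsData = Worksheets("Sheet1")   ' text to process in column A
    For i = 1 To 10
        ' change 10 to however many items are in your replacement list
        ' start at 2 if your data has headers
        target = CStr(wsList.Cells(i, 1).Value)
        replacer = CStr(wsList.Cells(i, 2).Value)
        If Len(target) > 0 Then
            For j = 1 To 10
                ' change 10 to however many items are in your data list to be processed
                ' start at 2 if your data has headers
                wsData.Cells(j, 1).Value = Replace(wsData.Cells(j, 1).Value, target, replacer)
            Next j
        End If
    Next i
End Sub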
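If the fixed 10-row limits are the main nuisance, Excel's built-in Range.Replace can do the inner loop for a whole column at once. A sketch, assuming the same sheet layout and no header row on Sheet2; `ReplaceFromList` is a hypothetical name. Be aware that Range.Replace treats * and ? in the find text as wildcards (and ~ as an escape character), which the VBA Replace function does not. MatchCase:=True mirrors the case-sensitive default of VBA Replace:

```vba
' Hypothetical sketch: one Range.Replace call per find/replace pair.
Public Sub ReplaceFromList()
    Dim wsList As Worksheet, wsData As Worksheet
    Dim lastRow As Long, i As Long
    Set wsList = Worksheets("Sheet2")   ' find text in A, replacement in B
    Set wsData = Worksheets("Sheet1")   ' text to process in column A
    lastRow = wsList.Cells(wsList.Rows.Count, 1).End(xlUp).Row
    For i = 1 To lastRow                 ' start at 2 if Sheet2 has a header row
        If Len(CStr(wsList.Cells(i, 1).Value)) > 0 Then
            wsData.Columns(1).Replace What:=CStr(wsList.Cells(i, 1).Value), _
                Replacement:=CStr(wsList.Cells(i, 2).Value), _
                LookAt:=xlPart, MatchCase:=True
        End If
    Next i
End Sub
```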
I'm using VBA to do some further formatting of a generated CSV file that's always in the same format. I have a problem with my For Each loop: the loop should delete an entire row whenever there is more than one consecutive blank row, which can be determined from the first column alone.
Dim rowCount As Integer
For Each cell In Columns("A").Cells
rowCount = rowCount + 1
'
' Delete blank row
'
If cell = "" And cell.Offset(1, 0) = "" Then
Rows(rowCount + 1).EntireRow.Delete
spaceCount = 0
End If
Next
At some point, one of the cells in the loop does not have a value of ""; it's just empty, and that causes a crash. To solve this I think that changing the type of the cell to text before the comparison would work, but I can't figure out how (no IntelliSense!).
So how do you convert a cell type in VBA or how else would I solve the problem?
Thanks.
Use cell.Value instead of cell.Text, as it returns the value of the cell regardless of its formatting. Press F1 over .Value and .Text to read more about both.
Be careful with the statement For Each cell In Columns("A").Cells, as it will test every row in the sheet (over a million in Excel 2010) and could make Excel crash.
Edit:
Consider also the VBA function Trim, which removes leading and trailing spaces from a string. If a cell contains a space " ", it looks empty to the human eye, but because it has a space inside it is different from "". If you want to treat it like an empty cell, try:
If Trim(cell.value) = "" then
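These suggestions leave one more trap in the question's loop: deleting rows while walking forward shifts the remaining rows up, so a row that moves up is never compared with the blank row above it and some consecutive blanks survive. A sketch that combines .Value and Trim with a bottom-up loop over the filled part of column A; `DeleteRepeatedBlankRows` and `IsBlankText` are hypothetical names, and error values are treated as non-blank so they do not raise a type mismatch:

```vba
' Hypothetical sketch: collapse each run of blank rows (judged by column A) to one.
Public Sub DeleteRepeatedBlankRows(ByVal ws As Worksheet)
    Dim r As Long, lastRow As Long
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    ' Bottom-up, so deleting row r never shifts a row we still have to check
    For r = lastRow To 2 Step -1
        If IsBlankText(ws.Cells(r, 1).Value) And IsBlankText(ws.Cells(r - 1, 1).Value) Then
            ws.Rows(r).Delete
        End If
    Next r
End Sub

Public Function IsBlankText(ByVal v As Variant) As Boolean
    If IsError(v) Then Exit Function       ' #N/A etc. count as non-blank
    IsBlankText = (Len(Trim$(CStr(v))) = 0)
End Function
```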
As @andy (https://stackoverflow.com/users/1248931/andy-holaday) said in a comment, For Each is definitely the way to go. This even allows for blank rows in between.
Example code:
Sub ListFirstCol()
Worksheets("Sheet1").Activate
Range("A1").Activate
For Each cell In Application.Intersect(Range("A:A"), Worksheets("Sheet1").UsedRange)
MsgBox (cell)
Next
End Sub
Thanks Andy!