Transform a worksheet to an array containing only strings - excel

I need to extract the data from an excel worksheet to an array that will be used in an application that uses VBScript as scripting language (Quick Test Professional). We can use the following code for that:
' ws must be an object of type Worksheet
Public Function GetArrayFromWorksheet(byref ws)
GetArrayFromWorksheet = ws.UsedRange.Value
End Function
myArray = GetArrayFromWorksheet(myWorksheet)
MsgBox "The value of cell C2 = " & myArray(2, 3)
All nice and well, but unfortunately the array that gets returned does not only contain the literal text strings, but also primitives of type date, integer, double etc. It happened multiple times that that data got transformed.
[edit] Example: when entering =NOW() in a cell and set the cell formatting to hh:mm makes the displayed value 17:45, the above method retuns a variable of type double and a value like 41194.7400990741
The following solution worked better: I can get the literal text from a cell by using the .Text property, but they only work on one cell and not on a range of cells. I cannot do this at once for an array as I could with the .Value property, so I have to fill the array one cell at a time:
Public Function GetArrayFromWorksheet_2(byref ws)
Dim range, myArr(), row, col
Set range = ws.UsedRange
' build a new array with the row / column count as upperbound
ReDim myArr(range.rows.count, range.columns.count)
For row = 1 to range.rows.count
For col = 1 to range.columns.count
myArr(row, col) = range.cells(row, col).text
Next
Next
GetArrayFromWorksheet_2 = myArr
End Function
But ouch... a nested for loop. And yes, on big worksheets there is a significant performance drop noticable.
Does somebody know a better way to do this?

As we covered in the comments, in order to avoid the issue you will need to loop through the array at some point. However, I am posting this because it may give you a significant speed boost depending on the type of data on your worksheet. With 200 cells half being numeric, this was about 38% faster. With 600 cells with the same ratio the improvement was 41%.
By looping through the array itself, and only retrieving the .Text for values interpreted as doubles (numeric), you can see speed improvement if there is a significant amount of non-double data. This will not check .Text for cells with Text, dates formatted as dates, or blank cells.
Public Function GetArrayFromWorksheet_3(ByRef ws)
Dim range, myArr, row, col
Set range = ws.UsedRange
'Copy the values of the range to temporary array
myArr = range
'Confirm that an array was returned.
'Value will not be an array if the used range is only 1 cells
If IsArray(myArr) Then
For row = 1 To range.Rows.Count
For col = 1 To range.Columns.Count
'Make sure array value is not empty and is numeric
If Not IsEmpty(myArr(row, col)) And _
IsNumeric(myArr(row, col)) Then
'Replace numeric value with a string of the text.
myArr(row, col) = range.Cells(row, col).Text
End If
Next
Next
Else
'Change myArr into an array so you still return an array.
Dim tempArr(1 To 1, 1 To 1)
tempArr(1, 1) = myArr
myArr = tempArr
End If
GetArrayFromWorksheet_3 = myArr
End Function

Copy your worksheet into a new worksheet.
Copy Paste values to remove formulas
Do a text to columns for each column, turning each column into Text
Load your array as you were initially doing
Delete the new worksheet

You cant do this quickly and easily without looping through the worksheet.
If you use the technique above with 2 lines of code it must a variant type array.
I've included a real example from my code that does it in 6 lines because I like to A) work with the worksheet object and B) keep a variable handy with the original last row.
Dim wsKeyword As Worksheet
Set wsKeyword = Sheets("Keywords")
Dim iLastKeywordRow As Long
iLastKeywordRow = wsKeyword.Range("A" & wsKeyword.Rows.Count).End(xlUp).Row
Dim strKeywordsArray As Variant
strKeywordsArray = wsKeyword.Range("A1:N" & iLastKeywordRow).Value
Note your array MUST be a variant to be used this way.
The reason that Variants work like this is that when you create an array of variants, each 'cell' in the array is set to a variant type. Each cell then get's it's variant type set to whatever kind of value is assigned to it. So a variant being assigned a string gets set to variant.string and can now only be used as a string. In your original example it looks like you had time values which were kind of stored as variant.time instead of variant.string.
There are two ways you can approach your original problem
1) loop through and do the process with more control, like the double nested for loop. explained in another answer which gives you complete control
2) store all the data in the array as is and then either re-format it into a second array, or format it as desired text as you use it (both should be faster)

Related

Finding cells that do not match a predefined specific pattern in Excel using VBA

Am trying to make a VBA validation sheet on Excel to find all the cells that do not match a predefined pattern and copy it to another sheet
My pattern is "4 numbers/5 numbers"
Ex: 1234/12345 is accepted
2062/67943 is accepted
372/13333 is not accepted
1234/1234 is not accepted etc...
I tried to put the following in the conditions sheet : <>****/***** and <>????/????? and both did not work (am not sure about the correctness of the approach as am still a beginner in VBA)
For the code itself, this is what I wrote :
Sub GuaranteeElig()
Sheets.Add After:=ActiveSheet
ActiveSheet.Name = SheetName
Sheets("MainSheet").UsedRange.AdvancedFilter Action:= _
xlFilterCopy,
CriteriaRange:=Sheets("ConditionsSheet").Range("B1:B2"), _
CopyToRange:=Range("A1"), Unique:=False
End Sub
Any tips on how I can do it ?
Thanks in advance :)
As long as the values of the numbers are independent and do not matter, and it is only the Length of the numerical strings that count, you could use a for loop on the cells from the "search" sheet (I assume this is the MainSheet as shown in your code?) where your values are contained.
From there, I'll give you a couple ways to place the data in the validation sheet (assuming this is your ConditionsSheet as shown in your code?) where you are trying to pinpoint the values.
(You may need to change part of your approach depending on how you want the incorrect set of values laid out on your secondary sheet - but this should get you started.) I added a TON of comments as you say you're new to VBA - these will help you understand what is being done.
Sub GuaranteeElig()
'Adding this to help with performance:
Application.ScreenUpdating = False
'Assuming you are adding a sheet here to work with your found criteria.
Sheets.Add After:=ActiveSheet
ActiveSheet.Name = "ConditionsSheet"
'Using the naming bits below I am assuming the data you are searching for is on MainSheet
'Get used range (most accurate and efficient way I have found yet, others on S.O.
'may have better ways for this - research it if this does not work for you)
'I have had problems using the Sheets().UsedRange method.
Dim c as Long 'This may not be necessary for you if you are looping through only column "A"
Dim r as Long
'Cells(y,x) method uses numerical values for each row (y) or column (x).
c = Cells(1, Columns.Count).End(xlToLeft).Column 'May not be necessary depending on your needs.
'Using this because you have "UsedRange" in your
'code.
'.End(xlToLeft) signifies we are going to the end of the available cell range of
'Row 1 and then performing a "Ctrl+Left Arrow" to skip all blank cells until we hit
'the first non-blank cell.
r = Cells(Rows.Count, 1).End(xlUp).Row
'.End(xlUp) method is similar - we go to the end of the available cell range for the
'column ("A" in this case), then performing a "Ctrl+Up Arrow" to skip all blank cells.
'If you have a header row which spans across the sheet, this is your best option,
'unless you have 'helper' cells which extend beyond the final column of this header
'row. I am assuming Row 1 is a header in this case - change to your needs.
'For your Rows - choose the column which contains congruent data to the bottom of
'your used range - I will assume column 1 in this case - change to suit your needs.
Dim i as long
Dim j as integer
Dim cel as Range
Dim working_Str() as String 'String Array to use later
Dim string1 as String
Dim string2 as String
Dim badString as Boolean
For i = 2 to r Step 1 'Step down from row 2 to the end of data 1 Row at a time
'Row 1 is header.
set cel=Cells(i, 1) 'Sets the cell to check - assuming data is in Column "A"
'i will change from for loop so 'cel' changes from "A2555"
'to "A2554" to "A2553" etc.
working_Str=Split(cel.Value, "/", -1) 'Splits the value based on "/" inside of cel
string1=working_Str(0) 'what we hope will always be 4 digits
string2=working_Str(1) 'what we hope will always be 5 digits
If Len(string1)<>4 Then 'string1 _(xxxx)_(/)(don't care) does not equal 4 digits in length
badString = True
Elseif Len(string2)<>5 Then ''string1 (don't care)(/)_(xxxxx)_ does not equal 5 digits in length
badString = True
End If
If badString Then 'If either strings above were not correct length, then
'We will copy cell value over to the new sheet "ConditionsSheet"
'Comment the next 2 commands to change from going to one row at a time to
'Matching same row/Cell on the 2nd sheet. Change to suit your needs.
j = j + 1 'Counter to move through the cells as you go, only moving one cell
'at a time as you find incorrect values.
Sheets("ConditionsSheet").Range("A" & j).Value=cel.Value 'sets the value on other sheet
'UNComment the next command to change from going to one row at a time to
'matching same row/cell on the 2nd sheet. Change to suit your needs.
'Sheets("ConditionsSheet").Range("A" & i).Value=cel.Value
End if
badString = False 'resets your boolean so it will not fail next check if strings are correct
Next i
'Returning ScreenUpdating back to True to prevent Excel from suppressing screen updates
Application.ScreenUpdating = True
End Sub
UPDATE
Check the beginning and ending lines I just added into the subroutine. Application.ScreenUpdating will suppress or show the changes as they happen - suppressing them makes it go MUCH quicker. You also do not want to leave this setting disabled, as it will prevent Excel from showing updates as you try to work in the cell (like editing cell values, scrolling etc. . . Learned the hard way. . .)
Also, if you have a lot of records in the given row, you could try putting the data into an array first. There is a great example here at this StackOverflow Article.
Accessing the values of a range across multiple rows takes a LOT of bandwidth, so porting the range into an Array first will make this go much quicker, but it still may take a bit. Additionally, how you access the array information will be a little different, but it'll make sense as you research it a little more.
Alternative To VBA
If you want to try using a formula instead, you can use this - just modify for the range you are looking to search. This will potentially take longer depending on processing speed. I am entering the formula on 'Sheet2' and accessing 'Sheet1'
=IF(COUNTIF(Sheet1!A1,"????/?????"),1,0)
You are spot on with the search pattern you want to use, you just need to use a function which uses wildcard characters within an "if" function. What you do with the "If value is true" vs "If value is false" bits are up to you. COUNTIF will parse wildcards, so if it is able to "count" the cell matching this string combination, it will result in a "True" value for your if statement.
Regex method, this will dump the mismatched value in a worksheet named Result, change the input range and worksheet name accordingly.
In my testing, 72k cells in UsedRange takes about 4seconds~:
Option Explicit
Sub GuaranteeElig()
Const outputSheetName As String = "Result"
Dim testValues As Variant
testValues = ThisWorkbook.Worksheets("MainSheet").UsedRange.Value 'Input Range, change accordingly
Const numPattern As String = "[\d]{4}\/[\d]{5}"
Dim regex As Object
Set regex = CreateObject("VBScript.Regexp")
regex.Pattern = numPattern
Dim i As Long
Dim n As Long
Dim failValues As Collection
Set failValues = New Collection
'Loop through all the values and test if it fits the regex pattern - 4 digits + / + 5 digits
'Add the value to failValues collection if it fails the test.
For i = LBound(testValues, 1) To UBound(testValues, 1)
For n = LBound(testValues, 2) To UBound(testValues, 2)
If Not regex.Test(testValues(i, n)) Then failValues.Add testValues(i, n)
Next n
Next i
Erase testValues
Set regex = Nothing
If failValues.Count <> 0 Then
'If there are mismatched value(s) found
'Tranfer the values to an array for easy output later
Dim outputArr() As String
ReDim outputArr(1 To failValues.Count, 1 To 1) As String
For i = 1 To failValues.Count
outputArr(i, 1) = failValues(i)
Next i
'Test if output worksheet exist
Dim outputWS As Worksheet
On Error Resume Next
Set outputWS = ThisWorkbook.Worksheets(outputSheetName)
On Error GoTo 0
'If output worksheet doesn't exist, create a new sheet else clear the first column for array dump
If outputWS Is Nothing Then
Set outputWS = ThisWorkbook.Worksheets.Add
outputWS.Name = outputSheetName
Else
outputWS.Columns(1).Clear
End If
'Dump the array starting from cell A1
outputWS.Cells(1, 1).Resize(UBound(outputArr, 1)).Value = outputArr
Else
MsgBox "No mismatched value found in range"
End If
Set failValues = Nothing
End Sub
If you do not need duplicate values in the list of mismatched (i.e. unique values) then sound out in the comment.

Copy last 3 char of text in one column to another column if cell is blank in excel spreadsheet with vba

If cell in Range("H1:H104000") is "" Then
Range("H1:H104000) = LEFT(Range("D1:D104000), 3
End If
This is the code I am trying with no success.
You're missing the Loop. You need to write the loop to iterate through the collection of cells, the code can't do it implicitly even if you compare a single range to a collection of ranges.
Also, to compare values use =. The Is operator is only used for Objects.
Dim Cell As Range
For Each Cell In Range("H1:H104000").Cells
If Cell.Value = "" Then
Cell.Value = Right(Cell.Offset(0, -4).Value, 3)
End If
Next
Once you're iterating through the column H. An easy way to refer to "column D in the current row" is by using Offset, which will return a cell, relative to your given starting position. In this case, we just need to move 4 columns to the left so I do .Offset(0,-4)
You might consider
assigning a worksheet related formula evaluation to a variant datafield array data and
writing data back to the referenced column H instead of looping through the given range; this approach lasts only a fraction of a second while the loop through each cell by means of VBA needs several seconds:
Option Explicit ' head of code module
Sub ExampleCall()
Dim t As Double: t = Timer ' start timer
Dim ws As Worksheet
Set ws = Sheet1 ' << change to your project's sheet Code(Name)
Dim data As Variant ' provide for a 1-based 2-dim datafield array
'assign worksheet related evaluation to data
data = ws.Evaluate("=If(IsBlank(H2:H104000),Right(D2:D104000,3),H2:H104000)")
'write to target
ws.Range("H2").Resize(UBound(data), 1).Value = data
Debug.Print Format(Timer - t, "0.00 secs") ' 0.20 secs
End Sub
Further note A worksheet related evaluation via ws.Evaluate(...) guarantees a fully qualified range reference, whereas Evaluate(...) would need more detailed range indications like Sheet1!H2:H104000 or a further textual insertion of ws.Range("H2:H104000").Address(External:=True) to the formula string.

Extract/Split multiple ranges from Range.Area method when it returns original Range object instead of collection?

I overlooked an obvious case scenario while writing the following, and am hoping the experts here can shed light on what path I should go down next to recover this code...
If Application.counta(wurgtbl.DataBodyRange.Columns(7)) > 0 Then 'checks if Note column contains any
wurgtbl.DataBodyRange.AutoFilter Field:=7, Criteria1:="=*" 'uses autofilter to show only rows with a note
Dim noterange As Range 'store note row addresses in range, each row (was supposed to) be treated as separate area
Set noterange = wurgtbl.DataBodyRange.SpecialCells(xlCellTypeVisible)
'oops, this only works if 100% discontinuous (multiple selection) ranges. when any contiguous rows have a note,
'their ranges are treated as a single selection which breaks my assumption of each row being a separate area!
'I had initially spaced my test notes out on different rows and it only broke when testing two contiguous notes
'add range to array: since discontinuous range is certain, cant mainnotes = noterange.Value, must loop instead
Dim mainnotes() As String
ReDim mainnotes(0 To 6, 0 To noterange.Areas.Count - 1)
Dim area As Range 'each group (area) will get its own row in array
Dim acell As Range 'this will be each actual cell in the sub-groups
Dim areaNum As Long
areaNum = 0
For Each area In noterange.Areas
i = 0
For Each acell In area
mainnotes(i, areaNum) = acell.Value
i = i + 1
Next acell
areaNum = areaNum + 1
Next
End If
From Microsoft documentation:
For a single selection, the Areas property returns a collection that contains one object—the original Range object itself.
For a multiple-area selection, the Areas property returns a collection that contains one object for each selected area.
Therein is the problem: I assumed Areas property would always return each row as an Area object, but in hindsight it makes sense that it would join the Areas by default if they are contiguous ranges.
The overall goal is to store any notes the user placed in a column before refreshing the workbook data. Once refreshed, it loops through the array and places the notes back into matching entries. I was using hidden helper worksheets (to store) previously but wanted to use arrays instead for all the obvious benefits. My initial assumption of only non-contiguous note rows was admittedly a terrible assumption in hindsight. While the workbook will absolutely have non-contiguous notes, it could also have them contiguous, and will most likely contain both, making noterange return a combination of areas where some areas are individual rows as I want them, but other areas are contiguous rows that need to be split somehow to work with my design.
So is it possible to take a contiguous range and then split it into component rows?
For example, the range object noterange.Address returns contiguous:
$A$4:$G$6
But I assumed it would return as a collection (before reading the documentation for Range.Areas)
$A$4:$G$4,$A$5:$G$5,$A$6:$G$6
containing N = 3 areas. (just as an example, in practice N is ~ 140K rows that might potentially contain a value in the note column.)
To be clear, the actual VBA error occurs here mainnotes(i, areaNum) = acell.Value because if an Area contains more than one row, the number of cells (or i) then exceeds the bounds of the array definition. Everything works great if there are no contiguous rows, where it then treats each row as a separate area.
In typing this out, hopefully my subconscious begins to solve it. But I feel I've reached my limit of knowledge on this and would appreciate being steered in the right direction!
Huge credit to BigBen for providing the necessary clues in his comments and introducing me to .CountLarge in an Answer he subsequently deleted.
I implemented most of his suggestions (though column# still hard-coded):
Dimensioned the array using standard practice of rows/columns instead of the reverse.
Dimensioned the array using noterange.Cells.CountLarge / (# of columns) to get actual number of rows that will exist, rather than using the number of Areas (because some areas will contain more than one row in this scenario.)
Iterated through each Range.Area, then each Row within that Range.Area, then through each Cell within that Row, assigning the Range.Value to it's position within the array.
Here is the final code which works no matter what the configuration of the Notes are in the note column:
'check if notes exist in note column
If Application.counta(wurgtbl.DataBodyRange.Columns(7)) > 0 Then
'use autofilter to show only rows with a note
wurgtbl.DataBodyRange.AutoFilter Field:=7, Criteria1:="=*"
'store visible addresses in range
Dim noterange As Range
Set noterange = wurgtbl.DataBodyRange.SpecialCells(xlCellTypeVisible)
'add range to array: since discontinuous range is certain, cant mainnotes = noterange.Value, must loop instead
Dim mainnotes() As String
ReDim mainnotes(0 To noterange.Cells.CountLarge / 7 - 1, 0 To 6) 'Credit to BigBen on SO for this (rows/columns)
Dim areaiterate As Long
Dim rowNumb As Long
Dim colNumb as Long
Dim arow As Range
Dim rcell As Range
rowNumb = 0
For areaiterate = 1 To noterange.Areas.Count
For Each arow In noterange.Areas(areaiterate).Rows
colNumb = 0
For Each rcell In arow.Cells
mainnotes(rowNumb, colNumb) = rcell.Value
colNumb = colNumb + 1
Next
rowNumb = rowNumb + 1
Next arow
Next areaiterate
End If
Thank you to BigBen for the suggestions and snippet!

Searching a Range in Excel VBA

I have a string array, and in each slot of the array is a 'range of columns' in a string format e.g "B:J" "k:W" "AC:AG"
The method is being passed a string which is a column name e.g "C"
I need to search see if "C" is inside "B:J".
So basically it needs to check to see if "C" is in "BCDEFGHIJ" which it is and if it is break from a loop
But if I input "A" it should then go to the next slot in the array.
Thank
This function will return the position in the array where the single column is included in the range. It uses the Intersect property to determine if the two ranges intersect.
Function ArrayPos(sColLetter As String, vaRanges As Variant) As Long
Dim i As Long
Dim sh As Worksheet
Dim lReturn As Long
Set sh = Sheet1
For i = LBound(vaRanges) To UBound(vaRanges)
If Not Intersect(sh.Columns(sColLetter), sh.Columns(vaRanges(i))) Is Nothing Then
lReturn = i
Exit For
End If
Next i
ArrayPos = lReturn
End Function
It's used like this, from the Immediate Window for example
?arraypos("M",array("B:J","K:W"))
1
You would want to add some error checking to make sure the arguments can be converted to ranges, but I'll leave that to you. You could do this without Range objects by splitting the range string on colon and comparing the ASCII values of the letters.
You might take a look at
How to convert a column number (eg. 127) into an excel column (eg. AA)
sheet.Application.ActiveCell.Column //for current selected column
int col=sheet.Range("A").Column; // for any choosen column
returns current column number, just apply the same for your range B:J, check if J (10) > C (3) > B ( 2)
Then you can say that it is inside that range.
In case of need the other way around this function with the link I gave you.
ConvertToAlphaColumnReferenceFromInteger(sheet.Application.ActiveCell.Column)
Done for C#/.NET

Excel Lookup return multiple values horizontally while removing duplicates

I would like to do a vertical lookup for a list of lookup values and then have multiple values returned into columns for each lookup value. I actually managed to do this after a long Google search, this is the code:
=INDEX(Data!$H$3:$H$70000, SMALL(IF($B3=Data!$J$3:$J$70000, ROW(Data!$J$3:$J$70000)-MIN(ROW(Data!$J$3:$J$70000))+1, ""), COLUMN(A$2)))
Now, my problem is, as you can see in the formula, my lookup range contains 70,000 rows, which means a lot of return values. But most of these return values are double. This means I have to drag above formula over many columns until all lookup values (roughly 200) return #NUM!.
Is there any possible way, I guess VBA is necessary, to return the values after duplicates have been removed? I'm new at VBA and I am not sure how to go about this. Also it takes forever to calculate having so many cells.
[Edited]
You can do what you want with a revised formula, not sure how efficient it will be with 70,000 rows, though.
Use this formula for the first match
=IFERROR(INDEX(Data!$H3:$H70000,MATCH($B3,Data!$J3:$J70000,0)),"")
Now assuming that formula in in F5 use this formula in G5 confirmed with CTRL+SHIFT+ENTER and copied across
=IFERROR(INDEX(Data!$H3:$H70000,MATCH(1,($B3=Data!$J3:$J70000)*ISNA(MATCH(Data!$H3:$H70000,$F5:F5,0)),0)),"")
changed the bolded part depending on location of formula 1
This will give you a list without repeats.....and when you run out of values you get blanks rather than an error
Not sure if you're still after a VBA answer but this should do the job - takes about 25 seconds to run on my machine - it could probably be accelerated by the guys on this forum:
Sub ReturnValues()
Dim rnSearch As Range, rnLookup As Range, rnTemp As Range Dim varArray
As Variant Dim lnIndex As Long Dim strTemp As String
Set rnSearch = Sheet1.Range("A1:A200") 'Set this to your 200 row value range
Set rnLookup = Sheet2.Range("A1:B70000") 'Set this to your lookup range (assume 2
columns)
varArray = rnLookup
For Each rnTemp In rnSearch
For lnIndex = LBound(varArray, 1) To UBound(varArray, 1)
strTemp = rnTemp.Value
If varArray(lnIndex, 1) = strTemp Then
If WorksheetFunction.CountIf(rnTemp.EntireRow, varArray(lnIndex, 2)) = 0 Then 'Check if value exists already
Sheet1.Cells(rnTemp.Row, rnTemp.EntireRow.Columns.Count).End(xlToLeft).Offset(0, 1).Value =
varArray(lnIndex, 2)
End If
End If
Next Next
End Sub

Resources