Replace weird or special character in a string - excel

I'm trying to convert an Excel file into a SQL query. My problem is that there are special characters in the file I was given. I can't replace them with CTRL+H because they don't show at all in the Excel file. When I write my query (either in UTF-8 or ANSI), they show. With Ultra-Edit, they show as hex C2 92. With Notepad++ in UTF-8, they show as PU2. In ANSI, they show as Â’. I suspect it's an apostrophe. This is a French file, by the way.
So far I tried to put it in a string and do these operations, but nothing worked.
Dim Line as String
Line = Wb.Worksheets(1).Cells(LineNo, ColNo)
Line = Replace(Line, "Â’", "''")
Line = Replace(Line, "’", "''")
Line = Replace(Line, "Â", "''")
Line = Replace(Line, Chr(194) & Chr(146), "''") 'decimal value of C2 92
Line = Replace(Line, Chr(146) & Chr(194), "''") 'inverted decimal value of C2 92
Thanks!

Instead of trying to eliminate various junk characters, try to focus on keeping only the good ones. Say we know valid characters are upper and lower case letters, numbers, and the underscore. This code will keep only the good ones:
Public Function KeepOnlyGood(s As String) As String
    Dim CH As String, L As Long, i As Long
    KeepOnlyGood = ""
    L = Len(s)
    For i = 1 To L
        CH = Mid(s, i, 1)
        If CH Like "[0-9a-zA-Z]" Or CH = "_" Then
            KeepOnlyGood = KeepOnlyGood & CH
        End If
    Next i
End Function
If you want to replace junk with a space, the code can be modified to do just that.
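If the goal is only to turn the hidden character from the question into an escaped apostrophe, a targeted replacement may be enough. The UTF-8 bytes C2 92 decode to the code point U+0092 (decimal 146), so the sketch below assumes that this is the character stored in the cell. FixHiddenApostrophe is a hypothetical name used for illustration.

```vba
' Sketch only: assumes the invisible character is U+0092 (decimal 146).
Public Function FixHiddenApostrophe(ByVal s As String) As String
    ' ChrW takes a Unicode code point. Chr(146) goes through the ANSI
    ' code page instead and, on a Windows-1252 system, yields the curly
    ' apostrophe U+2019, which would explain why it found no match.
    FixHiddenApostrophe = Replace(s, ChrW(146), "''")
End Function
```

It could be called as Line = FixHiddenApostrophe(Line) in place of the Replace calls in the question. If the file also contains real curly apostrophes, a second Replace on ChrW(&H2019) would be needed.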

VBA: Add Carriage Return + Line Feed at the start of Uppercase phrase

I have cells that contain various information.
In these cells, there are multiple Uppercase phrases.
I would like to be able to split the contents of the cell by adding the CHAR(13) + CHAR(10) Carriage return - linefeed combination
to the start of each new Uppercase phrase.
The only consistency is that the multiple Uppercase phrases begin after a period (.) and before open parenthesis "("
Example:
- Add CRLF to start of PERSUADER
- Add CRLF to start of RIVER JEWEL
- Add CRLF to start of TAHITIAN DANCER
- Add CRLF to start of AMBLEVE
- Add CRLF to start of GINA'S HOPE
NOTE:
There are multiple periods (.) in the text.
I have highlighted the text in red for a visual purpose only (normal text/font during import).
I am OK with either formula, UDF or VBA sub.
TEXT
PERSUADER (1) won by a margin first up at Kyneton. Bit of authority about her performance there and with the stable finding form it's easy to see her going right on with that. Ran really well when placed at Caulfield second-up last prep and that rates well against these. RIVER JEWEL (2) has been racing well at big odds. I have to like the form lines that she brings back in class now. Shapes as a key danger. TAHITIAN DANCER (5) will run well. She was okay without a lot of room at Flemington last time. AMBLEVE (13) is winning and can measure up while GINA'S HOPE (11) wasn't too far from River Jewel at Flemington and ties in as a hope off that form line.
I was able to extract with this function - but not able to manipulate the data in the cell
This is my code so far:
Function UpperCaseWords(ByVal S As String) As String
    Dim X As Long, Words() As String
    Const OkayPunctuation As String = ",."";:'&,-?!"
    For X = 1 To Len(OkayPunctuation)
        S = Replace(S, Mid(OkayPunctuation, X, 1), " ")
    Next
    Words = Split(WorksheetFunction.Trim(S))
    For X = 0 To UBound(Words)
        If Words(X) Like "*[!A-Z]*" Then Words(X) = ""
    Next
    UpperCaseWords = Trim(Join(Words))
End Function
Your description is not the same as your examples.
None of your examples start after a dot.
Most start after a dot-space except
PERSUADER starts at the start of the string
GINA'S HOPE starts after a space
I incorporated those rules into a regular expression, but, since your upper case words can include punctuation, for brevity I just looked for
- words that excluded lower case letters and digits
- words at least three characters long
If that is not sufficient in your real data, the regex can easily be made more specific:
Option Explicit
Function upperCaseWords(S As String) As String
    Dim RE As Object
    Set RE = CreateObject("vbscript.regexp")
    With RE
        .Global = True
        .MultiLine = True
        .Pattern = "^|\s(\b[^a-z0-9]+\b\s*\()"
        upperCaseWords = .Replace(S, vbCrLf & "$1")
    End With
End Function
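Since the question mentions not being able to change the data in the cell itself, here is a minimal sketch of a macro that writes the result back. It assumes a function named upperCaseWords (such as the one above) exists in the same project and that the cells to change are selected. SplitSelectedCells is a hypothetical name.

```vba
' Sketch only: overwrites the selected cells with the split text.
Sub SplitSelectedCells()
    Dim c As Range
    ' Do nothing if a chart or shape is selected instead of cells
    If TypeName(Selection) <> "Range" Then Exit Sub
    For Each c In Selection.Cells
        ' Skip formulas, error values and empty cells so that only
        ' plain text is touched (nested Ifs, as VBA does not short-circuit)
        If Not c.HasFormula Then
            If Not IsError(c.Value) Then
                If Len(c.Value) > 0 Then
                    c.Value = upperCaseWords(CStr(c.Value))
                    c.WrapText = True   ' lets the inserted line breaks show
                End If
            End If
        End If
    Next c
End Sub
```

Because the macro overwrites the original text, it may be safer to try it on a copy of the data first.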
as per your wording
The only consistency is that the multiple Uppercase phrases begin
after a period (.) and before open parenthesis "("
this should do:
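Function UpperCaseWords(ByVal s As String) As String
    Dim parts() As String
    Dim i As Long
    parts = Split(s, ". ")
    For i = LBound(parts) To UBound(parts)
        ' Prefix a CRLF to every sentence that contains an opening parenthesis
        If InStr(parts(i), "(") > 0 Then parts(i) = Chr(13) & Chr(10) & parts(i)
    Next i
    ' Join puts back the ". " separators that Split removed
    UpperCaseWords = Join(parts, ". ")
End Function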
Since the OP is open to a formula solution, here is a formula answer.
Assume the data is in A1.
In B1, enter this formula and copy it across until it returns a blank:
=TRIM(RIGHT(SUBSTITUTE(TRIM(MID(SUBSTITUTE(SUBSTITUTE(" (. "&$A1," while ",". ")," (",REPT(" ",700)),COLUMN(A1)*700,700))&" ",". ",REPT(" ",300)),300))

How to trim spaces

I have text in Excel like this:
120
124569 abasd 12345
There are spaces both to the left and to the right side.
I copy this from Excel and paste it as text, then check the result when I click a button.
Code:
abArray = abArray & "," & gridview1.Rows(i).Cells(2).Text
For k = 3 To 17
    abArray = abArray & "," & Val(gridview1.Rows(i).Cells(k).Text)
Next
In abArray this shows as:
0, abasd ,12345,0,0,0,0,0
I want to remove/trim spaces both from left and right.
I have tried abArray.Trim() but this still shows spaces.
If you want to remove all the spaces out of the end result consider String.Replace:
Returns a new string in which all occurrences of a specified Unicode character or String in the current string are replaced with another specified Unicode character or String.
Example use:
Dim s As String = "0, abasd ,12345,0,0,0,0,0"
s = s.Replace(" ", "")
This would output:
0,abasd,12345,0,0,0,0,0
It may also be worth using a StringBuilder to join all your values together, as this is good practice when looping as you are. At this point you could use String.Trim on each value. This would preserve any spaces that are within your value. In other words, it would only remove the spaces from the beginning and the end of the value.
Example use:
Dim sb As New StringBuilder
For k = 0 To 17
sb.Append(String.Format("{0},", gridview1.Rows(i).Cells(k).Text.Trim()))
Next
Dim endResult As String = sb.ToString().TrimEnd(","c)
endResult would output:
0,abasd,12345,0,0,0,0,0
You will have to import System.Text in order to make use of the StringBuilder class.
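Use the VB.NET Trim function to remove leading and trailing spaces. Change this one line of code:
abArray = abArray & "," & Val(Trim(gridview1.Rows(i).Cells(k).Text))
abArray.Trim() does not work because it only trims the two ends of the whole joined string, and it returns a new string rather than changing abArray. The spaces you see sit around the individual values in the middle.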
Try it like this
abArray = abArray & "," & gridview1.Rows(i).Cells(2).Text.Trim
For k = 3 To 17
    abArray = abArray & "," & Val(gridview1.Rows(i).Cells(k).Text.Trim)
Next

Preserving leading 0's in string - number - string conversion

I am working on a macro for a document-tracking sheet at work. I use a button that prompts the user to enter in the document number and I'd like to specify a default number based on the following numbering convention. The first two characters of the document number are the latter two year digits (15 in this case), then there is a "-" followed by a five digit serialization.
My current code looks at the last-entered document and increments those last 5 characters, but chops off any leading zeroes, which I want to keep. This is an extraction of the code to generate this default number (assuming the variable "prevNCRF" is the previous document name found in the doc):
Sub codeChunkTester()
    Dim prevNCRF, defNCRFNum As String
    Dim NCRFNumAr() As String
    'pretend like we found this in the sheet.
    prevNCRF = "15-00100"
    'split the string into "15" and "00100" and throw those into an array.
    NCRFNumAr() = Split(prevNCRF, "-")
    'reconstruct the number by reusing the first part and dash, then converting
    'the "00100" to a number with Val(), adding 1, then back to a string with CStr().
    defNCRFNum = NCRFNumAr(0) & "-" & CStr(Val(NCRFNumAr(1)) + 1)
    'message box shows "15-101" rather than "15-00101" as I had hoped.
    MsgBox (defNCRFNum)
End Sub
So can anyone help me preserve those zeroes? I suppose I could include a loop that checks the length of the string and adds a leading zero until there are 5 characters, but perhaps there's a better way...
Converting "00100" to a Double using Val turned it into 100, so CStr(100) returns "100" as it should.
You need to format the string to what you want it to look like:
defNCRFNum = NCRFNumAr(0) & "-" & Format(Val(NCRFNumAr(1)) + 1, "00000")
If you need to parameterize the length of the string, you can use the String function to generate the format string:
Const digits As Integer = 5
Dim formatString As String
formatString = String(digits, "0")
defNCRFNum = NCRFNumAr(0) & "-" & Format(Val(NCRFNumAr(1)) + 1, formatString)
Here is that loop solution I mentioned above. If anyone's got something better, I'm all ears!
prevNCRF = "15-00100"
NCRFNumAr() = Split(prevNCRF, "-")
zeroAdder = CStr(Val(NCRFNumAr(1)) + 1)
'loop: everytime the zeroAdder string is not 5 characters long,
'put a zero in front of it.
Do Until Len(zeroAdder) = 5
zeroAdder = "0" & zeroAdder
Loop
defNCRFNum = NCRFNumAr(0) & "-" & zeroAdder
MsgBox (defNCRFNum)
Another option is to take the width of the format from the existing serial number:
defNCRFNum = NCRFNumAr(0) & "-" & Format(CStr(Val(NCRFNumAr(1)) + 1), String(Len(NCRFNumAr(1)), "0"))

VBA Trim leaving leading white space

I'm trying to compare strings in a macro and the data isn't always entered consistently. The difference comes down to the amount of leading white space (i.e. " test" vs. "test" vs. "   test").
For my macro the three strings in the example should be equivalent. However I can't use Replace, as any spaces in the middle of the string (ex. "test one two three") should be retained. I had thought that was what Trim was supposed to do (as well as removing all trailing spaces). But when I use Trim on the strings, I don't see a difference, and I'm definitely left with white space at the front of the string.
So A) What does Trim really do in VBA? B) Is there a built in function for what I'm trying to do, or will I just need to write a function?
Thanks!
So as Gary's Student alluded to, the character wasn't 32. It was in fact 160. Now, me being the simple man I am, white space is white space. So in line with that view I created the following function, which strips from both ends of a string ALL characters that don't actually display to the human eye (i.e. anything that is not a special character or alphanumeric). That function is below:
Function TrueTrim(v As String) As String
    Dim out As String
    Dim bad As String
    Dim c As Long
    bad = "||127||129||141||143||144||160||173||" 'Characters that don't output something
    'the human eye can see based on http://www.gtwiki.org/mwiki/?title=VB_Chr_Values
    out = v
    'Chop off the first character so long as it's white space.
    'Len(out) is tested first because AscW raises an error on an empty string,
    'which would otherwise happen when the input is nothing but white space.
    Do While Len(out) > 0
        c = AscW(Left(out, 1))
        'AscW returns a negative Integer for code points above 32767; keep those.
        If (c < 0 Or c >= 33) And InStr(1, bad, "||" & c & "||") = 0 Then Exit Do
        out = Mid(out, 2)
    Loop
    'Chop off the last character so long as it's white space
    Do While Len(out) > 0
        c = AscW(Right(out, 1))
        If (c < 0 Or c >= 33) And InStr(1, bad, "||" & c & "||") = 0 Then Exit Do
        out = Left(out, Len(out) - 1)
    Loop
    'Capture result for return
    TrueTrim = out
End Function
Trim() will remove all leading and trailing spaces:
Sub demo()
    Dim s As String, s2 As String, msg As String
    Dim i As Long
    s = " test "
    s2 = Trim(s)
    msg = ""
    For i = 1 To Len(s2)
        msg = msg & i & vbTab & Mid(s2, i, 1) & vbCrLf
    Next i
    MsgBox msg
End Sub
It is possible your data has characters that are not visible, but are not spaces either.
Without seeing your code it is hard to know, but you could also use the Application.WorksheetFunction.Clean() method, which removes non-printable characters, in conjunction with Trim().
MSDN Reference page for WorksheetFunction.Clean()
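A minimal sketch of that combination follows. Clean targets control characters (codes 0 to 31) and VBA's Trim only strips ordinary spaces, so the sketch first converts non-breaking spaces (code 160) to ordinary ones. CleanTrim is a hypothetical helper name, and the code assumes it runs inside Excel so that WorksheetFunction is available.

```vba
' Sketch only: CleanTrim is a hypothetical helper, not a built-in.
Public Function CleanTrim(ByVal s As String) As String
    ' Turn non-breaking spaces (code 160) into ordinary spaces first,
    ' because neither Clean nor Trim removes them.
    s = Replace(s, ChrW(160), " ")
    ' Clean drops control characters such as tabs and line breaks;
    ' VBA's Trim then strips leading and trailing spaces only,
    ' leaving single or multiple interior spaces alone.
    CleanTrim = Trim(Application.WorksheetFunction.Clean(s))
End Function
```

Note that Clean also removes control characters inside the string, so intentional line breaks in the middle of a value would be lost.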
Why don't you try using the Instr function instead? Something like this
Function Comp2Strings(str1 As String, str2 As String) As Boolean
    If InStr(str1, str2) <> 0 Or InStr(str2, str1) <> 0 Then
        Comp2Strings = True
    Else
        Comp2Strings = False
    End If
End Function
Basically you are checking whether str1 contains str2 or str2 contains str1, so you don't have to trim the data. Be aware that this is a containment test rather than an equality test: "test" and "test one" would also compare as equal.
VBA's Trim function is limited to dealing with spaces. It will remove spaces at the start and end of your string.
In order to deal with things like newlines and tabs, I've always imported the Microsoft VBScript RegEx library and used it to replace whitespace characters.
In your VBA window, go to Tools, References, then find Microsoft VBScript Regular Expressions 5.5. Check it and hit OK.
Then you can create a fairly simple function to trim all white space, not just spaces.
Private Function TrimEx(ByVal stringToClean As String) As String
    ' ByVal so that the caller's variable is left untouched
    Dim re As New RegExp
    ' Matches any whitespace at start of string
    re.Pattern = "^\s*"
    stringToClean = re.Replace(stringToClean, "")
    ' Matches any whitespace at end of string
    re.Pattern = "\s*$"
    stringToClean = re.Replace(stringToClean, "")
    TrimEx = stringToClean
End Function
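If adding the reference is not convenient, the same idea can be sketched with late binding through CreateObject. This version also uses a single pattern for both ends. TrimWhitespace is a hypothetical name, and the sketch assumes the VBScript regular expression library is installed on the machine.

```vba
' Sketch only: late-bound variant, no entry under Tools > References needed.
Public Function TrimWhitespace(ByVal s As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")
    ' Global is required so that both the leading and the trailing
    ' match are replaced rather than just the first one found.
    re.Global = True
    ' Whitespace run at the start of the string, or at its end
    re.Pattern = "^\s+|\s+$"
    TrimWhitespace = re.Replace(s, "")
End Function
```

Interior whitespace is left as it is, so "test one two three" keeps its spaces.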
Non-printable characters separate the lines of text taken from a web page. In the examples below I have replaced them with X, Y and Z respectively.
Debug.Print Trim(Mid("X test ", 2)) ' start at position 2 to skip the first character
Debug.Print Trim(Mid("XY test ", 3)) ' start at position 3 to skip the first two characters
Debug.Print Trim(Mid("X Y Z test ", 2)) ' only X is skipped here, so more rounds are needed :)
Programmers often prefer to grab one large block of text, since it can be chopped up neatly with built-in tools (InStr, Mid, Left, and others). However, text gathered from several child elements (i.e. taking .textContent versus .innerText) may contain several non-printable characters to cope with, and DOM and regex are not for beginners. Addressing the sub-elements precisely for their inner text (child elements one by one!) may help avoid non-printable characters.

How to remove newline character from string in Fortran77?

I am specifying a filename to my Fortran77 program from the command line. However, I get a newline character appended to the filename string (obtained using getarg).
How can I remove the new line character?
You can use an alternative to len_trim from https://stackoverflow.com/a/1259426/721644 adapted to find the newline character
integer function findnl(s)
  character(len=*) :: s
  integer i
  findnl = len(s)+1
  do i = 1, len(s)
    if (s(i:i) .eq. achar(10)) then
      findnl = i
      return
    end if
  end do
end function
After that, change the rest of the string to spaces
l = findnl(str)
if (l .le. len(str)) str(l:) = " "
