Convert a string to Hex in VB - string

In VB, I'm reading a file line by line using the ReadLine() method. Each line of the file contains a string similar to the following:
":1A2C003F4EDCFE3A2F5D66\r\n"
For each line I read, I want to:
1. Remove the ":" and "\r\n" from the line.
2. Pair the values as bytes, e.g. "1A 2C 00"... (the line would then be "1A 2C 00 3F 4E DC FE 3A 2F 5D 66").
3. Add all the bytes together and check whether the result is zero, e.g. (1A+2C+00+3F+4E+DC+FE+3A+2F+5D+66) = 0?
How can I proceed?
So far I have done:
While endofstream = False
    stringReader = fileReader.ReadLine()
    If stringReader.StartsWith(":") Then
        ' Strip the carriage return and the leading colon
        stringReader = stringReader.Replace(vbCr, "")
        stringReader = stringReader.Replace(":", "")
        MsgBox(stringReader)
    End If
End While

But be careful: shouldn't the groups be 4 characters long (1A2C 003F 4EDC ...)?
All you have to do is convert each pair of hex digits to a number and sum them:
Dim sum As Integer
For index As Integer = 0 To stringReader.Length - 2 Step 2
    ' Take 2 chars and parse them as one hex byte with Convert.ToInt32(String, 16)
    ' http://msdn.microsoft.com/en-us/library/f1cbtwff.aspx
    sum += Convert.ToInt32(stringReader.Substring(index, 2), 16)
Next
' use sum
In my case, the result is 985.
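As a cross-check of the arithmetic, here is a minimal Python sketch of the same three steps. The function name record_byte_sum is made up for illustration, and the final mask to the low 8 bits is an assumption: the question only asks whether the sum "is zero", which for a record checksum usually means zero modulo 256.

```python
def record_byte_sum(line: str) -> int:
    """Strip the leading ':' and line ending, then sum the hex byte pairs."""
    payload = line.strip().lstrip(":")     # step 1: drop ":" and "\r\n"
    data = bytes.fromhex(payload)          # step 2: pair the digits into bytes
    return sum(data)                       # step 3: add the bytes together


total = record_byte_sum(":1A2C003F4EDCFE3A2F5D66\r\n")
print(total)                # 985, the same value as the VB loop reports
print(total & 0xFF == 0)    # False: the low byte is 0xD9, not zero
```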

Related

The vbUnicode argument in StrConv is not giving me the desired output

I am trying to use the StrConv function to convert the letter H to its decimal value 72 and then from 72 back again to H.
Public Sub Testing()
    Dim Text As String, Back As String, Deci() As Byte
    Text = "H"
    Deci = StrConv(Text, vbFromUnicode)
    Debug.Print Deci(0)
    Back = StrConv(Deci(0), vbUnicode)
    Debug.Print Back
End Sub
In the above example I am using StrConv(Text, vbFromUnicode) to get the decimal value 72 which works great.
However, when I then try to go from 72 back to H by using StrConv(Deci(0), vbUnicode), I do not get H but 72 with a strange character in between.
Is it not correct to use the argument vbUnicode here? What am I doing wrong?
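One plausible reading of the symptom, sketched in Python rather than VBA: Deci(0) is a single Byte, so passing it to StrConv first turns it into the text "72", and vbUnicode then widens each byte of that text's two-byte-per-character form into its own character. That would give "7", a NUL, "2", a NUL. The model below is an assumption about what VBA does here, not a statement of its documented behaviour; if it holds, passing the whole byte array (StrConv(Deci, vbUnicode)) instead of one element should give back "H".

```python
text = "H"

# Model of StrConv(Text, vbFromUnicode): text -> single-byte (ANSI) bytes
deci = text.encode("cp1252")
print(deci[0])                                   # 72

# Model of converting the whole byte array back (the round trip that works)
print(deci.decode("cp1252"))                     # H

# Model of passing the number 72 instead of the bytes: it becomes the
# text "72", stored as UTF-16LE, and every byte is widened to a character
as_text = str(deci[0])                           # "72"
widened = as_text.encode("utf-16-le").decode("latin-1")
print(repr(widened))                             # '7\x002\x00'
```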

Converting surrogate pairs to emoji - python3

I found a solution to a similar question in another topic, but unfortunately it's not working for me.
Here is my problem:
I'm building a dataframe from the surrogate-pair escapes that I'd like to search for in another file (examples: "\uD83C\uDFF3", "\u26F9", "\uD83C\uDDE6\uD83C\uDDE8"):
with open("unicodes.csv", "rt") as csvfile:
    emoticons = pd.read_csv(csvfile, names=["xy"])
    emoticons = pd.DataFrame(emoticons)
    emoticons = emoticons.astype(str)
Next I read my text file, in which some lines contain surrogate-pair escapes:
for chunk in pd.read_csv(path, names=["xy"], encoding="utf-8", chunksize=chunksize):
    spam = pd.DataFrame(chunk)
    spam = spam.astype(str)
In this for loop I check whether a line contains a surrogate-pair escape and, if it does, I'd like to print that escape as an emoji. That's why I encode and decode the value "i", which is a str
(solution from: How to work with surrogate pairs in Python?):
for i in emoticons.xy:
    if spam["xy"].str.contains(i, regex=False).any():
        print(i.encode('utf-16', 'surrogatepass').decode('utf-16'))
#printing:
#\uD83C\uDFF3
#\u26F9
#\uD83C\uDDE6\uD83C\uDDE8
So when I run the program it still prints the surrogate-pair escapes as plain text, not as emoji, but when I type the escape into the print function myself, it works:
print("\uD83C\uDFF3".encode("utf-16", "surrogatepass").decode("utf-16", "surrogatepass"))
#printing:
#🏳
What am I doing wrong? I tried building a string from this i, and other solutions, but it still doesn't work.
EDIT:
hexdump -C file.csv
00004b70 5c 75 44 38 33 44 5c 75 44 45 45 39 0a 5c 75 44 |\uD83D\uDEE9.\uD|
00004b80 38 33 44 5c 75 44 45 45 42 0a 5c 75 44 38 33 44 |83D\uDEEB.\uD83D|
00004b90 5c 75 44 45 45 43 0a 5c 75 44 38 33 44 5c 75 44 |\uDEEC.\uD83D\uD|
00004ba0 43 42 41 0a 5c 75 44 38 33 44 5c 75 44 45 38 31 |CBA.\uD83D\uDE81|
EDIT2:
So I've found something that kind of works, but it still needs improvement:
https://stackoverflow.com/a/54918256/4789281
The text from my other file that I want to convert looks like:
"O żółtku zapomniałaś \uD83D\uDE02"
"Piękny outfit \uD83D\uDE0D"
When I do what was recommended in the other topic:
print(codecs.decode(i,encoding='unicode_escape',errors='surrogateescape').encode('utf-16', 'surrogatepass').decode('utf-16'))
I've got something like this:
O żóÅtku zapomniaÅaÅ 😂
PiÄkny outfit 😍
So my surrogate pairs are replaced, but my Polish characters are replaced with something strange.
You are on the right track.
What you are trying to do breaks because what you have in your str after reading the file are not surrogate pairs. They are backslash-escaped code points for your surrogate pairs, stored as plain text.
That is: the sequence "5c 75 44 38 33 44" in your file is the actual ASCII characters "\uD83D" (6 characters in total), not the surrogate code point 0xD83D (which, when properly decoded along with the next surrogate "\uDE0D", becomes a single character in your string).
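A quick way to see the difference, using the first six bytes from the hexdump in the question:

```python
raw = bytes.fromhex("5c 75 44 38 33 44")   # bytes copied from the hexdump

as_text = raw.decode("ascii")
print(as_text)         # \uD83D
print(len(as_text))    # 6: a backslash, a "u" and four hex digits

surrogate = "\ud83d"   # the code point that the escape stands for
print(len(surrogate))  # 1
print(as_text == surrogate)  # False
```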
Where you are on the right track: you really do have to encode this into a bytes sequence and then decode it back. What is wrong is the choice of codecs. Encode with "latin1" (a charmap encoding, chosen to try to preserve any other non-ASCII characters in the string; it will break if the string has code points not representable in latin1) and decode with the special "unicode escape" codec. At that point, both surrogates will be present as two characters in a Python string:
In [16]: "\\uD83D\\uDE0D".encode("latin1").decode("unicode escape", "surrogatepass")
Out[16]: '\ud83d\ude0d'
The bad news is that this is not a valid str: surrogate characters should not exist by themselves in the internal representation. Instead, they should be combined into the desired final character. So trying to print it will break:
In [19]: a = "\\uD83D\\uDE0D".encode("utf-8").decode("unicode escape")
In [20]: print(a)
---------------------------------------------------------------------------
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-20-bca0e2660b9f> in <module>
----> 1 print(a)
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-1: surrogates not allowed
Using the "surrogatepass" error policy here will be of no help: you will get an unprintable byte sequence.
Therefore this has to be encoded and decoded a second time. This time, the characters you have in the text are actual surrogate code points that are valid utf-16. So the path now is to encode this sequence, forcing these characters through with "surrogatepass", and decode them back from utf-16, which finally understands the surrogate pair as a single character:
In [30]: a = "\\uD83D\\uDE0D".encode("latin1").decode("unicode escape")
In [31]: a
Out[31]: '\ud83d\ude0d'
In [32]: b = a.encode("utf-16", "surrogatepass").decode("utf-16")
In [33]: print(b)
😍
Summarising:
You read your file as utf-8 text, so that other non-ASCII characters are read correctly, encode the result as "latin1" and decode it back with "unicode escape". This converts the human-readable "\uXXXX" sequences in your file into the surrogate code points. Then you encode that to utf-16, telling Python to pass surrogates through "as is", and decode back from utf-16:
def decode_surrogate_ascii(txt):
    interm = txt.encode("latin1").decode("unicode escape")
    return interm.encode("utf-16", "surrogatepass").decode("utf-16")
All you have to do is apply the above function to the columns of interest in your data frame:
emoticons = emoticons.apply(lambda col: col.map(lambda item: decode_surrogate_ascii(item) if isinstance(item, str) else item))
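The latin1 step fails on text such as "O żółtku zapomniałaś", because "ż", "ł" and "ś" have no latin1 encoding. One way around that, sketched here as an alternative rather than as the approach described in this answer, is to leave the text alone and rewrite only the literal \uXXXX escapes with a regular expression before the utf-16 round trip. The name decode_escaped_surrogates is made up for this example.

```python
import re

_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def decode_escaped_surrogates(txt: str) -> str:
    """Turn literal \\uXXXX escapes into characters, pairing up surrogates."""
    # Replace each six-character escape with the code point it names;
    # everything else (including non-latin1 letters) is left untouched.
    interm = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), txt)
    # The utf-16 round trip merges each high/low surrogate pair into one character.
    return interm.encode("utf-16", "surrogatepass").decode("utf-16")


print(decode_escaped_surrogates("O żółtku zapomniałaś \\uD83D\\uDE02"))
print(decode_escaped_surrogates("Piękny outfit \\uD83D\\uDE0D"))
```

An escape for an unpaired surrogate would still make the final decode raise UnicodeDecodeError, so real data may need a try/except around the call.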

Combining hex bytes into a string in VBA, without losing the leading Zero?

In VBA, I am trying to select 4 bytes out of a hex array and convert them to decimal. However, if a byte is F or smaller (a single hex digit), its leading 0 is lost when the string is built, and the conversion is therefore wrong.
I have tried various solutions from this forum, without success.
The string I need to convert looks like this variable (called measHex):
AA 00 00 22 00 03 00 00 1F 07 00 BC 07
I am trying to convert bytes 7 to 10, to look like this:
00001F07
but what I get is 1F7
The following code is my function.
Private Function ToHexStringMeas(ByRef measHex As Variant) As String
    ReDim bytes(LBound(measHex) + 6 To LBound(measHex) + 9)
    Dim i As Long
    For i = LBound(measHex) + 6 To LBound(measHex) + 9
        bytes(i) = Hex(measHex(i))
    Next
    ToHexStringMeas = Strings.Join(bytes, "")
End Function
Any help would be highly appreciated.
After some more research, the solution was to pad each value to two digits, as follows:
ReDim h(LBound(measHex) + 6 To LBound(measHex) + 9)
Dim i As Long
Dim l As Integer
l = 2 ' width of each hex value
For i = LBound(measHex) + 6 To LBound(measHex) + 9
    ' Left-pad the hex value with "0" up to l characters
    h(i) = Replace(Space(l - Len(Hex(measHex(i)))), " ", "0") & Hex(measHex(i))
Next
ToHexStringMeas = Strings.Join(h, "")
You can also accomplish what I think is your goal using string functions.
VBA
Function ToHexStringShoot(ByRef measHex As String, Optional first As Long = 7, Optional last As Long = 10) As String
ToHexStringShoot = Replace(Mid(measHex, (first - 1) * 3, last * 3 - (first - 1) * 3), " ", "")
End Function
Worksheet Formula using the same logic
=SUBSTITUTE(MID(A1,6*3,10*3-6*3)," ","")
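The padding idea is language-independent. As a minimal Python sketch (the function name and the 1-based first/last parameters are chosen here to mirror ToHexStringShoot, not taken from any library), formatting each byte with a fixed width of two keeps the leading zeros:

```python
def to_hex_string(values, first=7, last=10):
    """Join bytes first..last (1-based, inclusive) as zero-padded hex."""
    # "02X" means: uppercase hex, at least 2 digits, padded with zeros
    return "".join(format(b, "02X") for b in values[first - 1:last])


meas_hex = bytes.fromhex("AA 00 00 22 00 03 00 00 1F 07 00 BC 07")
joined = to_hex_string(meas_hex)
print(joined)            # 00001F07
print(int(joined, 16))   # 7943
```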

Adding space every 2 chars in VB.NET

Good day, experts. I'm getting a 16-character string value from UART, like "0000000000001110", and I want to add a space every 2 chars: "00 00 00 00 00 00 11 10". My idea was a For...Next loop that steps through "data" 2 chars at a time and adds a space between the pairs, but I have no idea how to accomplish it. This is what I have tried so far:
Dim i As Long
Dim data As String = "0000000000001110"
For i = 0 To Len(data) Step 2 ' counting every 2 chars
    data = Mid(data, i + 1, 2) ' assign 2 chars to data
    ' stuck here
Next i
Any input appreciated, thanks.
You can use a StringBuilder and a backwards loop:
Dim data As String = "0000000000001110"
Dim builder As New StringBuilder(data)
' Last multiple of 2 that lies strictly inside the string
Dim startIndex = ((builder.Length - 1) \ 2) * 2
For i As Int32 = startIndex To 2 Step -2
    builder.Insert(i, " "c)
Next i
data = builder.ToString()
The integer division finds the start index (looking from the end of the string): the last multiple of 2 that still has characters after it, whether the string has an even or odd number of characters, so no space is appended at the very end. I use the backwards loop because inserting characters changes the size of the StringBuilder, which would otherwise shift the indexes used later in the for-loop.
Here is an extension method that encapsulates the complexity and improves reusability:
Imports System.Runtime.CompilerServices
Imports System.Text
Module StringExtensions
    <Extension()>
    Public Function InsertEveryNthChar(str As String, insertString As String, nthChar As Int32) As String
        If String.IsNullOrEmpty(str) Then Return str
        Dim builder As New StringBuilder(str)
        ' Last multiple of nthChar that lies strictly inside the string
        Dim startIndex = ((builder.Length - 1) \ nthChar) * nthChar
        For i As Int32 = startIndex To nthChar Step -nthChar
            builder.Insert(i, insertString)
        Next i
        Return builder.ToString()
    End Function
End Module
Usage:
Dim data = "00000000000011101"
data = data.InsertEveryNthChar("[foo]", 3) ' 000[foo]000[foo]000[foo]000[foo]111[foo]01
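To see why the loop runs backwards, here is the same idea as a minimal Python sketch. The function name is borrowed from the extension method; the start index is chosen as the last multiple of n strictly inside the text, an assumption made here so that no separator is appended at the end.

```python
def insert_every_nth_char(text: str, insert: str, n: int) -> str:
    """Insert `insert` after every n characters, working from the end."""
    chars = list(text)
    # Last multiple of n that still has characters after it
    start = ((len(chars) - 1) // n) * n
    # Going backwards means earlier insert positions are not shifted
    for i in range(start, 0, -n):
        chars.insert(i, insert)
    return "".join(chars)


print(insert_every_nth_char("0000000000001110", " ", 2))
# 00 00 00 00 00 00 11 10
print(insert_every_nth_char("00000000000011101", "[foo]", 3))
# 000[foo]000[foo]000[foo]000[foo]111[foo]01
```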
I know you have already accepted an answer, but you could also do the required task like this. Make sure to import System.Text so you can use the StringBuilder.
Output: 00 00 00 00 00 00 11 10
Dim data As String = "0000000000001110"
Dim sb As New StringBuilder()
Dim addSpace As Boolean = False
For Each c As Char In data
    If addSpace Then
        sb.Append(c & " ")
        addSpace = False
    Else
        sb.Append(c)
        addSpace = True
    End If
Next
sb.Length = sb.Length - 1 ' Remove the trailing space (assumes an even number of characters)
Console.WriteLine(sb.ToString())
If you NuGet "System.Interactive" you get a very neat Buffer operator for IEnumerable(Of T). Then this works:
Dim data As String = "0000000000001110"
Dim result = String.Join(" ", data.Buffer(2).Select(Function (x) New String(x.ToArray())))
If you want to use straight LINQ then this works:
Dim result = String.Join(" ", data.Select(Function (x, n) New With { x, n }).GroupBy(Function (x) x.n \ 2, Function (x) x.x).Select(Function (y) New String(y.ToArray())))
' This is the easiest, layman-level solution
' (mac is assumed to hold the input string; the separator here is "-")
Dim i As Long
Dim A, b, C As String
A = Mid(mac, i + 1, 2) ' assign the first 2 characters to the variable
C = A ' start the result with them
For i = 0 To Len(mac) - 4 Step 2 ' step through 2 chars at a time, stopping 4 characters before the end
    b = Mid(mac, i + 3, 2) ' take the next 2 chars
    b = "-" + b ' put a dash in front of each pair
    C = C + b
Next i

Perl CGI: get request not working unless URL is hard-coded

I'm trying to identify the differences between the two strings during a URL get request (using LWP::Simple).
I have a URL, say http://www.example.com?param1=x&param2=y&param3=z
I make sure any blank inputs are also taken care of, but that is irrelevant at this point, because I am making sure all parameters are exactly the same.
Also, the hard-coded URL is copied and pasted from the generated URL.
This URL works when I do the following:
my $url = "http://www.example.com?param1=x&param2=y&param3=z";
my $content = get($url);
Yet, when I build the URL from parameters provided by a user, the get request does not work (Error: 500 from the site).
I have compared the two URLs by printing them out, and see zero differences. I've tried removing all of the potential invisible characters.
The output for the generated code and static string, assuming user input is the same as the static string (which is what I'm making sure to do):
http://www.example.com?param1=x&param2=y&param3=z
http://www.example.com?param1=x&param2=y&param3=z
I'm assuming printing the outputs removes characters I can't see.
I've also followed a solution at http://www.perlmonks.org/?node_id=882590 and it is pointing out differences, but I don't know why, considering I see none at all.
Has anyone run into this problem before? Please let me know if I need to clarify anything or need to provide additional information.
EDIT: Problem and Solution
So, after using mob's suggestion to identify differences, I found there was a null character in the generated URL that was not getting printed in the output. That is: http://www.example.com?param1=x&param2=y&param3=z was actually http://www.example.com?param1=x&param2=y&param3=\000z.
I used a simple regex: $url =~ s/\000//g; to remove that (and any other) null value.
Use a data serialization function to inspect your strings for hidden characters.
$url1 = "http://www.example.com?param1=x&param2=y";
$url2 = "http://www.example.com?param1=x&param2=y\0";
$url3 = "http://www.example.com?param1=x&param2=y\n";
use JSON;
print JSON->new->pretty(1)->encode( [$url1,$url2,$url3] );
# Result:
# [
# "http://www.example.com?param1=x&param2=y",
# "http://www.example.com?param1=x&param2=y\u0000",
# "http://www.example.com?param1=x&param2=y\n"
# ]
use Data::Dumper;
$Data::Dumper::Useqq = 1;
print Dumper($url1,$url2,$url3);
# Result:
# $VAR1 = "http://www.example.com?param1=x&param2=y";
# $VAR2 = "http://www.example.com?param1=x&param2=y\0";
# $VAR3 = "http://www.example.com?param1=x&param2=y\n";
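The same inspection can be sketched in Python, where both repr() and json.dumps() print control characters as escapes instead of sending them to the terminal. The URLs are the three from the Perl example.

```python
import json

url1 = "http://www.example.com?param1=x&param2=y"
url2 = "http://www.example.com?param1=x&param2=y\0"
url3 = "http://www.example.com?param1=x&param2=y\n"

for url in (url1, url2, url3):
    # json.dumps escapes the NUL as \u0000 and the newline as \n
    print(json.dumps(url))
    # repr shows them as \x00 and \n
    print(repr(url))

# Removing the NULs, as the regex in the question's edit does
cleaned = url2.replace("\0", "")
print(cleaned == url1)   # True
```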
Clearly the string you have built is different from the hard-coded one. If you write code like this
my $ss = 'http://www.example.com?param1=x&param2=y&param3=z';
print join(' ', map " $_", $ss =~ /./g), "\n";
print join(' ', map sprintf('%02X', ord), $ss =~ /./g), "\n";
then you will be able to see the hex value of each character in the string, and you can compare the two of them more accurately. For instance, the code above outputs
h t t p : / / w w w . e x a m p l e . c o m ? p a r a m 1 = x & p a r a m 2 = y & p a r a m 3 = z
68 74 74 70 3A 2F 2F 77 77 77 2E 65 78 61 6D 70 6C 65 2E 63 6F 6D 3F 70 61 72 61 6D 31 3D 78 26 70 61 72 61 6D 32 3D 79 26 70 61 72 61 6D 33 3D 7A
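A comparable Python sketch, which also reports where two strings first differ. The helper names hex_dump and first_difference are made up for this example, and the generated URL reuses the null character described in the question's edit.

```python
def hex_dump(text: str) -> str:
    """Return the code point of every character as two or more hex digits."""
    return " ".join(f"{ord(ch):02X}" for ch in text)


def first_difference(a: str, b: str) -> int:
    """Index of the first differing character, or -1 if the strings are equal."""
    for index, (ch_a, ch_b) in enumerate(zip(a, b)):
        if ch_a != ch_b:
            return index
    # One string may simply be a prefix of the other
    return -1 if len(a) == len(b) else min(len(a), len(b))


hard_coded = "http://www.example.com?param1=x&param2=y&param3=z"
generated = "http://www.example.com?param1=x&param2=y&param3=\0z"

print(hex_dump(hard_coded[:7]))                  # 68 74 74 70 3A 2F 2F
print(first_difference(hard_coded, generated))   # 48
```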

Resources