I am working with sockets (some DNS stuff) and I can't figure out how from this:
a = 'www'
make
b = b'\x77\x77\x77'
I know/think I need to:
1) convert each to hex value with hex(ord(char))
2) format it from '0x77' to '\x77'
3) convert it to bytes with bytes(a,'utf-8')
I tried many combinations, but I always failed at 2), and in general I think my steps are too complicated. Is there a simple solution to this?
I'll attempt an answer.
What you wish to do is work with the binary messages being exchanged with a DNS server.
You are wondering how to convert strings and integers into their binary form.
Have a look at the struct module to pack and unpack binary messages.
You will also need to convert IP addresses from their binary form into strings.
Have a look at socket.inet_ntoa and socket.inet_aton.
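As a rough sketch of what that could look like (not a full DNS client): for ASCII text, str.encode already gives the bytes the question asks for, and struct plus the socket helpers cover the rest. The helper name encode_qname, the hostname www.example.com, the query ID 0x1234 and the address 192.0.2.1 are illustrative assumptions, not taken from the question.

```python
import socket
import struct

# For ASCII text, encoding the str yields exactly the bytes asked for:
# b'\x77\x77\x77' and b'www' are two spellings of the same bytes object.
label = "www".encode("ascii")

def encode_qname(hostname):
    """Hypothetical helper: encode a hostname as DNS length-prefixed labels."""
    parts = hostname.rstrip(".").split(".")
    # Each label is one length byte followed by the label bytes;
    # a zero byte terminates the name.
    return b"".join(
        struct.pack("!B", len(part)) + part.encode("ascii") for part in parts
    ) + b"\x00"

qname = encode_qname("www.example.com")

# A DNS header is six unsigned 16-bit integers in network (big-endian) order:
# ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT. The values here are examples.
header = struct.pack("!6H", 0x1234, 0x0100, 1, 0, 0, 0)

# IPv4 addresses: dotted string <-> 4 packed bytes.
packed_ip = socket.inet_aton("192.0.2.1")
dotted_ip = socket.inet_ntoa(packed_ip)
```

The point is that there is no need to go through hex(ord(char)) at all: the \x77 notation is only how Python displays or spells a byte, not a different kind of data.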
Barry
I have some complicated array of structures and I want to write it into CSV file. So I need "variable to string" conversion.
Beckhoff, as always, doesn't care about documentation, and their INT_TO_STRING function doesn't work (I get UNEXPECTED INT_TO_STRING TOKEN when I try to write INT_TO_STRING(20)).
Moreover, their string functions only work correctly with strings of up to 255 characters.
So I need one of the following:
working functions or function blocks or library which allows to convert different types to string
something like sprintf without limitations
some functions to convert between number and ascii char (0x55 is letter 'U') in both directions.
btw. Beckhoff gives us some weird CSV example code, but without data conversion (array has already strings in cells).
Thanks in advance!
I tried to use:
INT_TO_STRING() BYTE_TO_STRING() WHATEVER_TO_STRING()
but none of them work. There is no clue as to how many arguments they take or anything else, and there is no documentation in the Beckhoff Information System.
If Dart and Kotlin code communicate through binary data (an array of 8-bit integers, 0-255), how is the end of a String, or even of an int, represented in or determined from the sequence of bytes? Is there some special charCode or something else?
Also, is there a way to save a List<int> as-is to a file.txt, so that it can be read directly back into a List<int> without serialization?
Please guide this new dev,
Thanking you...
Since Flutter handles the MethodChannel on both the Dart side and the Kotlin side, it is free to use its own internal protocol to communicate between the native layer and Flutter. In theory it could use JSON, but it probably uses something else that is based on the supported types and is more efficient: https://docs.flutter.dev/development/platform-integration/platform-channels?tab=type-mappings-kotlin-tab#codec
For saving a List<int> to a file, you need to determine how you want to encode the content in the file and then how you want to decode it. It can be as simple as saving each number separated by a comma, or encoding the list as JSON.
If your list of numbers can be represented with Uint8List or Int8List, then you can basically just save the numbers as raw bytes to the file and then read them again.
But List<int> is a list of 64-bit numbers and you should therefore determine how you want to encode this exactly.
For writing to files, there are several ways to do it but the specific way depends on what you exactly want. So without any more details I can just suggest you check the API: https://api.dart.dev/stable/2.17.3/dart-io/File-class.html
I have a string with commas in it. How should I convert this string into an integer? I tried using
x?number
but that gives me the following error
Exceptionfreemarker.core.NonNumericalException
e.g. The string is "453,000". I need to convert this to 453000.
Is there any other way of doing this?
There's no function built in for parsing numbers with national formats. ?number only deals with computer format, because when numbers are transferred as strings (which should be already rare), that's what used to be used. So in principle x should be already a number when it gets to FreeMarker, or at least it should use computer format. If that's not possible, you will need a custom function (or method) for that.
I have a file which contains some data (text copied and pasted from the "What You Will Learn" portion of this PDF). First, I converted the contents of the file to bits successfully. However, when I try to convert it back to the original format, some of the characters are not converted correctly, as shown below:
Cisco has
developed the Cisco Open Network Environment (ONE)
architecture as a multifaceted approach to network
programmability delivered across three pillars:
??)É¥ Í?н??ÁÁ±¥?Ñ¥½¸ÁɽÉ?µµ¥¹?¥¹Ñ?É???Ì?¡A%̤?)?áÁ½Í??¥É?ѱ佸Íݥѡ?Ì?¹É½ÕÑ?ÉÌѼ?Õµ?¹Ð?)?á¥ÍÑ¥¹?=Á?¹±½ÜÍÁ?¥?¥?Ñ¥½¹Ì* ¤&öGV7F?öâ×&VG?÷VäfÆ÷r6öçG&öÆÆW"æB÷VäfÆ÷r ¦vVçG0¨?HÝZ]HÙ??ÙXÝÈÈ[]?\??\X[Ý?\?^\Ë?\X[?Ù\?XÙ\Ë[??\ÛÝ\?ÙHÜ?Ú\Ý?][Û?Ø\X?[]Y\È[?H?]HÙ[
As you can see here some characters are converted successfully, others are not.
My code is below:
file = open("test.txt",'r')
myfile = ''.join(map(str,file))
l = []
for i in myfile:
    asc11 = ord(i)
    b = "{0:08b}".format(asc11)
    l.extend(int(y) for y in b)
string_bin = ''.join(map(str,l))
mydata = ''.join(chr(int(string_bin[i:i+8], 2)) for i in range(0,len(string_bin), 8))
print(mydata)
What is wrong with my code? What do I need to change to make it work properly?
What's Going On?
You are running into an encoding issue because some characters in the PDF are non-ASCII characters. For example, the bullet points are U+2022, which requires 3 bytes of storage in UTF-8.
When Python reads from your file, it doesn't know what encoding you used to write that data. Thus it reads bytes from the file and uses a character encoding to translate them into strs, which are stored using Python's own internal Unicode format. (This differs from Python 2, where open() returned raw bytes stored in a str which you could then manually decode to unicode.)
Thus, in Python 3, open() accepts an encoding keyword argument, for example open("test.txt", 'r', encoding='ascii'). Because you don't specify the encoding when you call open(), you end up using your system's default encoding. For instance, on my laptop, the default encoding is CP1252 (Windows-1252, often loosely called Latin-1). Yours may differ.
Whatever encoding Python uses to interpret your file, it then uses its own internal Unicode format to store your string. This means that your string may internally use multi-byte characters even if the original encoding did not. For example, my laptop uses CP1252 to interpret the three UTF-8 bytes of U+2022 (E2 80 A2) as â€¢, which is internally stored as U+00E2, U+20AC and U+00A2. The € is stored using a multi-byte character even though it was just one byte in the original file.
Let's assume your computer is sane and uses UTF-8 by default (this explanation is similar for many multi-byte characters). When you reach a bullet point, it is stored as U+2022. When you call ord('\u2022') the result is 8226. When you then call "{0:08b}".format(8226) this returns "10000000100010". That's a 14 character string. Your parsing code assumes all of the ordinals will generate 8 character strings. Because of this, the "binary" output becomes misaligned. This means that when you then parse the binary string in 8-character segments, it gets thrown off and starts interpreting things as control characters and all sorts of foreign language characters.
If you call open(..., encoding='ascii'), Python will actually raise an exception (a UnicodeDecodeError) when it reads the file, because the file contains bytes that are not valid ASCII.
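The misalignment can be reproduced in a few lines. This is only an illustration using a made-up three-character string, with the same formatting and chunking logic as the code in the question:

```python
bullet = "\u2022"

# The code point is larger than 255, so "08b" pads nothing and
# the result is wider than 8 characters.
code_point = ord(bullet)                  # 8226
bits = "{0:08b}".format(code_point)       # '10000000100010'
width = len(bits)                         # 14

# Mixing 8-character and 14-character groups in one stream...
text = "a\u2022b"
stream = "".join("{0:08b}".format(ord(ch)) for ch in text)

# ...and then cutting it into 8-character chunks no longer lines up
# with the original character boundaries.
decoded = "".join(
    chr(int(stream[i:i + 8], 2)) for i in range(0, len(stream), 8)
)
print(len(stream), repr(decoded), decoded == text)
```

The stream is 30 characters long (8 + 14 + 8), so every chunk after the first is cut in the wrong place and the decoded text no longer matches the input.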
Possible Solutions
I'm not sure why exactly you are converting the input string into the representation that you are using. It's not binary, as your question title would suggest. Rather, you've converted the data into a textual representation of its binary encoding.
Technically speaking, when you store encoded text to a file, it's stored using a binary representation. Python, and any text editor, has to decode those bytes into its internal character representation before it can display them as text. Thus, calling open("test.txt", "r", encoding="utf-8") reads the binary data out of your text file and converts it into Python's internal Unicode format. Similarly, calling myfile.encode('utf-8') will return the UTF-8 encoded bytes, which can then be written to a file, network socket, etc.
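A small sketch of that round trip, assuming UTF-8 throughout. The sample text and the temporary file path are made up for the example:

```python
import os
import tempfile

text = "Cisco ONE \u2022 three pillars"

# str -> bytes: the bullet becomes three bytes, every other character one.
encoded = text.encode("utf-8")

# Write the text with an explicit encoding instead of the system default.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading in binary mode shows the bytes that are actually on disk.
with open(path, "rb") as f:
    raw = f.read()

# Reading in text mode with the same encoding gives the original str back.
with open(path, "r", encoding="utf-8") as f:
    round_tripped = f.read()
```

Here raw is the same bytes object that encode() produced, and round_tripped equals the original string, because the same encoding is named on both the write and the read.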
If, however, you do need to use a format similar to what you are currently using, first, I still recommend you specify an encoding when you call open() (I recommend UTF-8). Then you can consider these options:
Detect and omit non-ASCII characters. They will have an ordinal >= 128.
Mimic UTF-16 or UTF-32 and output multi-byte output for all characters. For example, use "{0:032b}".format(asc11) and then parse the result in 32-character chunks. It's memory and storage inefficient, but it will preserve multi-byte characters.
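Both options can be sketched roughly as follows. The function names to_bits, from_bits and ascii_only are invented for this example, and the 32-character width is the one suggested above:

```python
WIDTH = 32  # fixed number of "bit" characters per code point

def to_bits(text):
    """Render every character as a fixed-width string of 0s and 1s."""
    return "".join("{0:032b}".format(ord(ch)) for ch in text)

def from_bits(bits):
    """Cut the stream into WIDTH-sized chunks and rebuild the characters."""
    return "".join(
        chr(int(bits[i:i + WIDTH], 2)) for i in range(0, len(bits), WIDTH)
    )

def ascii_only(text):
    """Alternative: drop every character with an ordinal of 128 or more."""
    return "".join(ch for ch in text if ord(ch) < 128)

sample = "a\u2022b"
restored = from_bits(to_bits(sample))   # same as sample
filtered = ascii_only(sample)           # 'ab'
```

Because every code point fits in 32 bits, each character always produces exactly WIDTH characters, so the chunk boundaries stay aligned. The cost is four times the size of the 8-character scheme for plain ASCII text.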
Regardless, I highly recommend reading the Dive Into Python 3 chapter about strings.
This is a Python 101 type question, but it had me baffled for a while when I tried to use a package that seemed to convert my string input into bytes.
As you will see below I found the answer for myself, but I felt it was worth recording here because of the time it took me to unearth what was going on. It seems to be generic to Python 3, so I have not referred to the original package I was playing with; it does not seem to be an error (just that the particular package had a .tostring() method that was clearly not producing what I understood as a string...)
My test program goes like this:
import mangler # spoof package
stringThing = """
<Doc>
<Greeting>Hello World</Greeting>
<Greeting>你好</Greeting>
</Doc>
"""
# print out the input
print('This is the string input:')
print(stringThing)
# now make the string into bytes
bytesThing = mangler.tostring(stringThing) # pseudo-code again
# now print it out
print('\nThis is the bytes output:')
print(bytesThing)
The output from this code gives this:
This is the string input:
<Doc>
<Greeting>Hello World</Greeting>
<Greeting>你好</Greeting>
</Doc>
This is the bytes output:
b'\n<Doc>\n <Greeting>Hello World</Greeting>\n <Greeting>\xe4\xbd\xa0\xe5\xa5\xbd</Greeting>\n</Doc>\n'
So, there is a need to be able to convert between bytes and strings, to avoid ending up with non-ascii characters being turned into gobbledegook.
The 'mangler' in the above code sample was doing the equivalent of this:
bytesThing = stringThing.encode(encoding='UTF-8')
There are other ways to write this (notably using bytes(stringThing, encoding='UTF-8')), but the above syntax makes it obvious what is going on, and also what to do to recover the string:
newStringThing = bytesThing.decode(encoding='UTF-8')
When we do this, the original string is recovered.
Note, using str(bytesThing) just transcribes all the gobbledegook without converting it back into Unicode, unless you specifically request UTF-8, viz., str(bytesThing, encoding='UTF-8'). No error is reported if the encoding is not specified.
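The difference between the three calls can be checked directly. This is a short illustration using the two Chinese characters from the test program, with variable names invented for the example:

```python
bytes_thing = "你好".encode(encoding="UTF-8")

# Without an encoding, str() returns the repr of the bytes object,
# including the b'...' wrapper and the \x escapes.
as_repr = str(bytes_thing)

# With an encoding, str() decodes, exactly like bytes.decode().
as_text = str(bytes_thing, encoding="UTF-8")
also_text = bytes_thing.decode(encoding="UTF-8")

# bytes(..., encoding=...) is the mirror image of str.encode().
same_bytes = bytes("你好", encoding="UTF-8")

print(as_repr)
print(as_text)
```

The first print shows the escaped form, the second shows the recovered characters.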
In Python 3, there is a bytes() constructor that takes the same encoding argument as encode().
str1 = b'hello world'
str2 = bytes("hello world", encoding="UTF-8")
print(str1 == str2) # prints True
I didn't read anything about this in the docs, but perhaps I wasn't looking in the right place. This way you can explicitly turn strings into byte streams, which I find more readable than using encode and decode, and without having to put the b prefix in front of the quotes.
This is a Python 101 type question,
It's a simple question but one where the answer is not so simple.
In Python 3, a "bytes" object represents a sequence of bytes, and a "string" object represents a sequence of Unicode code points.
To convert from "bytes" to "string" and from "string" back to "bytes", you use the bytes.decode and str.encode methods. These methods take two parameters: an encoding and an error handling policy.
Sadly, there are an awful lot of cases where sequences of bytes are used to represent text but it is not necessarily well defined what encoding is being used. Take, for example, filenames on Unix-like systems: as far as the kernel is concerned, they are a sequence of bytes with a handful of special values. On most modern distros most filenames will be UTF-8, but there is no guarantee that all filenames will be.
If you want to write robust software then you need to think carefully about those parameters. You need to think carefully about what encoding the bytes are supposed to be in and how you will handle the case where they turn out not to be a valid sequence of bytes for the encoding you thought they should be in. Python defaults to UTF-8 and erroring out on any byte sequence that is not valid UTF-8.
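To make the second parameter concrete, here is a small sketch of a few standard error handling policies applied to bytes that are not valid UTF-8. The sample bytes are made up for the example: they are the Latin-1 encoding of "café".

```python
raw = b"caf\xe9"  # valid Latin-1, but not valid UTF-8

# Default policy ("strict"): decoding raises UnicodeDecodeError.
strict_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    strict_error = exc

# "replace" substitutes U+FFFD for the undecodable byte.
replaced = raw.decode("utf-8", errors="replace")

# "ignore" silently drops the undecodable byte (data loss).
ignored = raw.decode("utf-8", errors="ignore")

# "surrogateescape" smuggles the byte through as a lone surrogate,
# so the original bytes can be recovered later.
escaped = raw.decode("utf-8", errors="surrogateescape")
recovered = escaped.encode("utf-8", errors="surrogateescape")

# Decoding with the encoding the bytes were really written in just works.
latin = raw.decode("latin-1")
```

Which policy is appropriate depends on the application: "strict" surfaces the problem early, "replace" and "ignore" lose information, and "surrogateescape" is what Python itself uses for undecodable filenames on Unix-like systems.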
print(bytesThing)
Python uses "repr" as a fallback conversion to string. repr attempts to produce python code that will recreate the object. In the case of a bytes object this means among other things escaping bytes outside the printable ascii range.
TRY THIS:
StringVariable=ByteVariable.decode('UTF-8','ignore')
TO TEST TYPE:
print(type(StringVariable))
Here 'StringVariable' is a string and 'ByteVariable' is a bytes object. These names are placeholders and are not related to the variables in the question.