Put space every two characters in text string

Given the following string:
6e000000b0040000044250534bb4f6fd02d6dc5bc0790c2fde3166a14146009c8684a4624
It is a representation of a byte array; every two characters represent one byte.
I would like to put a space between each byte using Sublime Text, something like this:
6e 00 00 00 b0 04 00 00 04 42 50
Does Sublime Text help me on that issue?
As a bonus, I would like to split the bytes onto separate lines and add 0x before each byte.
I found a similar question, Split character string multiple times every two characters, but it is not related to Sublime Text.

Go to Find->Replace... and enable regular expressions.
Replace: (.{2})
With: $1SPACE
Where SPACE is a space.

To split it onto separate lines and add 0x before each byte, do this:
Find: (.{2})
Replace with: 0x\1\n
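If you would rather script the same transformations outside Sublime Text, here is a minimal Python sketch of the equivalent substitutions (the variable names and the shortened sample string are illustrative):

import re

s = "6e000000b0040000044250534b"          # a prefix of the string above

spaced = re.sub(r"(.{2})", r"\1 ", s).strip()
print(spaced)                             # 6e 00 00 00 b0 04 00 00 04 42 50 53 4b

one_per_line = re.sub(r"(.{2})", r"0x\1\n", s)
print(one_per_line)                       # 0x6e, 0x00, ... one byte per line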

Related

Is a text file with 16 leading binary bytes a "normal" file format?

I've encountered files with leading byte values as follows when viewed in a hex editor:
00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f
I've seen this in 2 cases:
*.csproj.CopyComplete files. My Windows .NET Xunit projects contain these files. They consist of only 16 bytes with the sequential byte signature as shown above.
Macintosh text files (as generated from Excel file save in "Text (Macintosh) (*.txt)"). In this case the first 16 bytes follow the signature above, followed by the expected document text.
My understanding is that text files may have a leading binary byte signature if their encoding is not UTF-8.
Can anyone provide more information about this byte signature?
Right you are! My bad. The *.CopyComplete files are 0-length, so the hex editor display misled me.
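Not from the original exchange, but if you want to double-check what a file really contains rather than trusting the hex-editor view, a short Python sketch (the path is a placeholder) would be:

import os

path = "MyProject.csproj.CopyComplete"    # placeholder path
print(os.path.getsize(path))              # prints 0 for these marker files
with open(path, "rb") as f:
    print(f.read(16).hex(" "))            # the bytes actually present, if any (Python 3.8+ for the separator)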

How to switch gdb byte output grouping

When I try to see what's inside stack space, I use the following command:
x/100x $sp
However, sometimes the output is formatted like this, grouped by 4 bytes:
0xbffff0ac: 0x00000000 0xb7fbc000 0xb7fbc000 0xbffff4e8
...
While sometimes I get this:
0xbffff0ac: 00 00 00 00 00 c0 fb b7 00 c0 fb b7 e8 f4 ff bf
But I can't figure out how to switch between these formats, or how gdb decides which format to use for the output. Any suggestions?
This is because the x command remembers the last size you used.
If you want a particular size with your x command, just specify it directly:
(gdb) x/100wx $sp
Documentation.
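For completeness (standard gdb behaviour rather than anything spelled out in the answer above): the unit letter in x/NFU is b for bytes, h for 2-byte halfwords, w for 4-byte words and g for 8-byte giant words, so either of the two layouts can be forced explicitly:
(gdb) x/64bx $sp
(gdb) x/16wx $sp
Both commands cover the same 64 bytes of stack; only the grouping differs.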

BASH - Convert textfile containing binary numbers into a binary file

I have a long text file that looks something like this:
00000000
00001110
00010001
00010000
00001110
00000001
00010001
00001110
...and so on...
I'd like to take this data that is represented in ASCII and write it to a binary file. That is, I do NOT want to convert the ASCII characters to binary, but rather take the actual 1s and 0s and put them in a binary file.
The purpose of this is so that my EPROM programmer can read the file.
I've heard that od and hexdump are useful in this case, but I never really understood how they work.
If it's to any help I also have the data in hex form:
00 0E 11 10 0E 01 11 0E
How do I do this using a shell script?
Something like perl -ne 'print pack "B*", $_' input should get you most of the way there.
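The one-liner likely only gets "most of the way there" because $_ still carries the trailing newline, which pack "B*" folds in as an extra bit on every line; adding chomp (perl -ne 'chomp; print pack "B*", $_' input > output.bin) should take care of that. If you would rather avoid Perl, here is a minimal Python sketch of the same conversion, assuming one 8-bit value per line and placeholder file names:

with open("bits.txt") as src, open("rom.bin", "wb") as dst:
    for line in src:
        line = line.strip()
        if line:                          # skip blank lines
            dst.write(bytes([int(line, 2)]))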

What is std::wifstream::getline doing to my wchar_t array? It's treated like a byte array after getline returns

I want to read lines of Unicode text (UTF-16 LE, line feed delimited) from a file. I'm using Visual Studio 2012 and targeting a 32-bit console application.
I was not able to find a ReadLine function within WinAPI so I turned to Google. It is clear I am not the first to seek such a function. The most commonly recommended solution involves using std::wifstream.
I wrote code similar to the following:
wchar_t buffer[1024];
std::wifstream input(L"input.txt");
while (input.good())
{
    input.getline(buffer, 1024);
    // ... do stuff...
}
input.close();
For the sake of explanation, assume that input.txt contains two UTF-16 LE lines which are less than 200 wchar_t chars in length.
Prior to calling getline the first time, Visual Studio correctly identifies that buffer is an array of wchar_t. You can mouse over the variable in the debugger and see that the array is made up of 16-bit values. However, after the call to getline returns, the debugger displays buffer as if it were a byte array.
After the first call to getline, the contents of buffer are correct (aside from buffer being treated like a byte array). If the first line of input.txt contains the UTF-16 string L"123", it is correctly stored in buffer as (hex) "31 00 32 00 33 00".
My first thought was to reinterpret_cast<wchar_t *>(buffer) which does produce the desired result (buffer is now treated like a wchar_t array) and it contains the values I expect.
However, after the second call to getline (the second line of input.txt contains the string L"456"), buffer contains (hex) "00 34 00 35 00 36 00". Note that this is incorrect (it should be [hex] "34 00 35 00 36 00").
The fact that the byte ordering gets messed up prevents me from using reinterpret_cast as a solution to work around this. More importantly, why is std::wifstream::getline even converting my wchar_t buffer into a char buffer anyways?? I was under the impression that if one wanted to use chars they would use ifstream and if they want to use wchar_t they use wifstream...
I am terrible at making sense of the stl headers, but it almost looks as if wifstream is intentionally converting my wchar_t to a char... why??
I would appreciate any insights and explanations for understanding these problems.
wifstream reads bytes from the file and converts them to wide characters using the codecvt facet installed in the stream's locale. The default facet assumes the system-default code page and calls mbstowcs on those bytes.
To treat your file as UTF-16, you need to use codecvt_utf16. Like this:
#include <fstream>
#include <locale>
#include <codecvt>  // std::codecvt_utf16

std::wifstream fin("text.txt", std::ios::binary);
// apply a UTF-16 little-endian facet to the stream's locale
fin.imbue(std::locale(fin.getloc(),
    new std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>));

Decoding legacy binary format

I am trying to figure out how to decode a "legacy" binary file produced by a Windows application (circa 1990). Specifically, I am having trouble understanding what encoding is used for the strings that are stored in it.
Example: the Unicode string "Düsseldorf" is represented as "Du\06sseldorf", or hex "44 75 06 73 73 65 6C 64 6F 72 66", where everything is a single byte except the "u + \06" that mysteriously becomes a u-umlaut (ü).
Is it completely proprietary? Any ideas?
Since this app pre-dates DBCS and Unicode, I suspect that the format is proprietary. It looks like they might be using the control values below 31 to represent the various accent marks.
\06 may indicate "put an umlaut on the previous character".
Try replacing the string with "Du\05sseldorf" and see if the accent changes over the u. Then try other escaped values between 1 and 31, and I suspect you may be able to come up with a map for these escape characters. Of course, once you have the map, you could easily create a routine to replace all of the strings with proper modern Unicode strings with the accents in place.
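As a sketch of what that routine could look like once the map is known (the table below is hypothetical; only \06 is suggested by the example, and the remaining bytes are assumed to be plain ASCII):

import unicodedata

ACCENTS = {0x06: "\u0308"}                # combining diaeresis; other codes TBD

def decode_legacy(raw):
    chars = [ACCENTS.get(b, chr(b)) for b in raw]
    # the accent code follows its base letter, which is also the order Unicode
    # combining marks use, so NFC normalisation folds "u" + mark into "ü"
    return unicodedata.normalize("NFC", "".join(chars))

print(decode_legacy(b"Du\x06sseldorf"))   # Düsseldorf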
