Question mark not showing up in string - string

I copied a piece of text from a website. This piece of text contains a space. I later try to manipulate this string in C#, but my code doesn't recognize the space.
I started digging deeper, so I tried the following Powershell command to convert the string to hexadecimal to see what's going on:
"2+1 53" | Format-Hex
(see screenshot here: Powershell code)
As you can see in the image, it shows that the result is:
32 2B 31 3F 35 33
which converted back to normal text is
2+1?53
Notice that the question mark wasn't present in my original string. What is going on? How can a question mark be present but not show up? Or where did it come from if it was not present in my original string?
Update:
Perhaps I should stress that I need to figure out what that "space" character is, so that I can later get rid of it using "replace" method.

Most likely, there's another character in that text, which is not a space. You can check it by putting the text into a file and then using
Get-Content C:\temp\file.txt | Format-Hex
To reproduce, I used that text:
Get-Service –Name BITS
# ^ it's not a normal dash, check it at http://asciivalue.com/index.php
This happens if I paste into console window:
31.88 ms | C:\> "Get-Service -Name BITS" | Format-Hex
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 47 65 74 2D 53 65 72 76 69 63 65 20 3F 4E 61 6D Get-Service ?Nam
00000010 65 20 42 49 54 53 e BITS
And that when I get it from the script:
60.02 ms | C:\> Get-Content C:\temp\script.ps1 | Format-Hex
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 47 65 74 2D 53 65 72 76 69 63 65 20 3F 3F 3F 4E Get-Service ???N
00000010 61 6D 65 20 42 49 54 53 ame BITS
As you can see, that character is being converted to question mark (3F in hex output) or triple question mark (3F 3F 3F) while getting content from file.

It depends on the version of PowerShell. I have 5.1, there the default encoding of Format-Hex is ASCII, which will replace every non-ASCII character (like your space) with a question mark.
Specify a different encoding to prevent non-ASCII characters from being replaced. Example:
PS> "⇆" | Format-Hex -Encoding Unicode
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 C6 21 Æ!
Here, the code point is U+21C6. Google that and you will find out what it represents.

A regular space is 0x20. There are many unicode spaces. http://jkorpela.fi/chars/spaces.html How did you make the string? Here's an example of EN SPACE (nut), U+2002. You should be able to copy and paste this yourself. Hmm, in powershell 7 for windows, the special space doesn't paste.
[int[]][char[]]'foo bar' | % tostring x
66
6f
6f
2002
62
61
72
'foo bar' | Format-Hex -Encoding BigEndianUnicode
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000 00 66 00 6F 00 6F 20 02 00 62 00 61 00 72 .f.o.o ..b.a.r
Format-hex will translate it to ascii by default in powershell 5. Unicode characters will be replaced by 3F or ?.
'foo bar' | format-hex -Encoding ascii
Label: String (System.String) <72F012A4>
Offset Bytes Ascii
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
------ ----------------------------------------------- -----
0000000000000 66 6F 6F 3F 62 61 72 foo?bar
UTF8 will encode in one to three bytes depending on how high the unicode number is (how many bits to encode). Three bytes in this case (U+2002), always starting with 'E' and then the first number, in this case '2'.
' ' | format-hex -Encoding utf8
# "`u{2002}" | format-hex # powershell 7
Label: String (System.String) <03ACFE1C>
Offset Bytes Ascii
00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
------ ----------------------------------------------- -----
0000000000000 E2 80 82 �

I found the answer to my question here: String Comparison, .NET and non breaking space
The "space" that was present in the string was the "non-breaking space" and getting rid of it in C# was as easy as:
string cellText = "String with non breaking spaces.";
cellText = Regex.Replace(cellText, #"\u00A0", " ");

Related

Can't seem to be able to grab non-string output from run

I can't think of other way to run a command line that outputs binary files, so I'll have to go with this.
Let's add a binary file to a git repository
mkdir test
cd test
git init .
wget https://upload.wikimedia.org/wikipedia/commons/thumb/8/85/Camelia.svg/320px-Camelia.svg.png
git add 320px-Camelia.svg.png
git commit -am "Added Camelia"
Grab the commit hash that that outputs, we'll use it as <grabbed hash> below.
Now, run this:
say (run "git", "show", "<grabbed hash>:Camelia.svg.png", :out).out
This will return a Malformed UTF-8 message. Fair enough, it's not binary. However, I have tried to capture that exception with try and there's no way. I've tried to separate the run from the out, I still get an exception that can't be captured. Any idea?
Pass the :bin option to run in order to have it do binary I/O instead. Example using curl:
$ raku -e 'say (run "curl", "--no-progress-meter", "https://raku.org/camelia-logo.png", :out, :bin).out.slurp'
Buf[uint8]:0x<89 50 4E 47 0D 0A 1A 0A 00 00 00 0D 49 48 44 52 00 00 01 05 00 00 00 F3 08 06 00 00 00 8F 2A 03 21 00 00 00 01 73 52 47 42 00 AE CE 1C E9 00 00 00 09 70 48 59 73 00 00 0F 61 00 00 0F 61 01 A8 3F A7 69 00 00 00 07 74 49 4D 45 07 D9 07 11 03 07 3A 28 6B FA 81 00 00 00 1A 74 45 58 74 43 6F 6D 6D 65 6E ...>

Update row with Buffer into bytea type column, using Postgres and NodeJS

I'm trying to store a Buffer into a bytea type column. I'm using a Postgres database and I have successfully connected to this database with node-postgres. I am able to update any other field, but I just can't find out what the syntax is to properly store a Buffer.
At the moment, there are already images in that database, that were written with a different system and language. I am not able to to re-use this system to achieve what we need.
The output of those existing images is also a Buffer:
<Buffer 89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 00 00 04 38 00 00 04 38 08 06 00 00 00 ec 10 6c 8f 00 00 00 04 73 42 49 54 08 08 08 08 7c 08 64 88 00 ... 13315 more bytes>
And I have prepared the an image that should overwrite this value:
<Buffer 75 ab 5a 8a 66 a0 7b fa 67 81 b6 ac 7b ae 22 54 13 91 c3 42 86 82 80 00 00 03 52 52 11 14 80 00 00 2a 00 00 00 2a 02 00 00 00 00 14 48 3e 9a 00 00 00 ... 3153 more bytes>.
All good, so far.
I now need to use the proper SQL UPDATE statement, but I have not been able to figure that out. I have found some answers suggesting converting it using .toString('hex') and prepending it with \\x, but this does not result in the same value format.
My update statement now looks something like this (where imageData is the second Buffer example above):
await pool.query(
`UPDATE image
SET data = '${imageData}'::bytea
WHERE id = '00413567-fdd7-4765-be30-7f80c2d8ce57'`
)
Some requirements:
I can not use an external file
I can not use a different value format
I can not use a different tech stack

How ImageMagic determines the ColorSpace of a PNG?

Suppose I create a simple PNG with:
convert -size 1x1 canvas:red red.png
Here is a similar image (bigger size) for reference:
Then run the command identify on it. It tells me the ColorSpace of the image is sRGB but there seems to be NO indication of this inside the file. In fact running
$ hexdump -C red.png
00000000 89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52 |.PNG........IHDR|
00000010 00 00 00 01 00 00 00 01 01 03 00 00 00 25 db 56 |.............%.V|
00000020 ca 00 00 00 04 67 41 4d 41 00 00 b1 8f 0b fc 61 |.....gAMA......a|
00000030 05 00 00 00 20 63 48 52 4d 00 00 7a 26 00 00 80 |.... cHRM..z&...|
00000040 84 00 00 fa 00 00 00 80 e8 00 00 75 30 00 00 ea |...........u0...|
00000050 60 00 00 3a 98 00 00 17 70 9c ba 51 3c 00 00 00 |`..:....p..Q<...|
00000060 06 50 4c 54 45 ff 00 00 ff ff ff 41 1d 34 11 00 |.PLTE......A.4..|
00000070 00 00 01 62 4b 47 44 01 ff 02 2d de 00 00 00 07 |...bKGD...-.....|
00000080 74 49 4d 45 07 e5 01 0d 17 04 37 80 ef 04 02 00 |tIME......7.....|
00000090 00 00 0a 49 44 41 54 08 d7 63 60 00 00 00 02 00 |...IDAT..c`.....|
000000a0 01 e2 21 bc 33 00 00 00 25 74 45 58 74 64 61 74 |..!.3...%tEXtdat|
000000b0 65 3a 63 72 65 61 74 65 00 32 30 32 31 2d 30 31 |e:create.2021-01|
000000c0 2d 31 33 54 32 33 3a 30 34 3a 35 35 2b 30 30 3a |-13T23:04:55+00:|
000000d0 30 30 2d af d4 01 00 00 00 25 74 45 58 74 64 61 |00-......%tEXtda|
000000e0 74 65 3a 6d 6f 64 69 66 79 00 32 30 32 31 2d 30 |te:modify.2021-0|
000000f0 31 2d 31 33 54 32 33 3a 30 34 3a 35 35 2b 30 30 |1-13T23:04:55+00|
00000100 3a 30 30 5c f2 6c bd 00 00 00 00 49 45 4e 44 ae |:00\.l.....IEND.|
00000110 42 60 82 |B`.|
00000113
does not provide a clue, that I know of.
I understand that identifying the ColorSpace of an image, that does not contain that information, is a very hard problem -- see one proposed solution looking at the histogram of colors here.
So how identify, from the ImageMagick suite, determines the ColorSpace of this image?
It is common, but not standardized to assume that an image without an embedded or sidecar ICC profile or without an explicit encoding description is encoded according to IEC 61966-2-1:1999, i.e. sRGB specification.
This is just a bug in ImageMagick. You can use exiftool to check whether sRGB + intent chunk is present. In this case, no.
Gamma 2.2 is not sRGB. Thus ImageMagic is wrong here. That is a common problem on Wikipedia, all SVG images when converted to PNG have this and it destroys the colours. See: https://phabricator.wikimedia.org/T26768
We will have to reencode all images on Wikipedia, since we use ImageMagick. Sigh.

Vim: calling xxd with system command in substitution results in conversion error

Background is that I have a log file that contains hex dumps that I want to convert with xxd to get that nice ASCII column that shows possible strings in the binary data.
The log file format looks like this:
My interesting hex dump:
00 53 00 6f 00 6d 00 65 00 20 00 74 00 65 00 78
00 74 00 20 00 65 00 78 00 61 00 6d 00 70 00 6c
00 65 00 20 00 75 00 73 00 69 00 6e 00 67 00 20
00 55 00 54 00 46 00 2d 00 31 00 36 00 20 00 69
00 6e 00 20 00 6f 00 72 00 64 00 65 00 72 00 20
00 74 00 6f 00 20 00 67 00 65 00 74 00 20 00 30
00 78 00 30 00 30 00 20 00 62 00 79 00 74 00 65
00 73 00 2e
Visually selecting the hex dump and do xxd -r -p followed by a xxd -g1 on the result does exactly what I'm aiming for.
However, since the number of dumps I want to convert are quite a few I would rather automate the process.
So I'm using the following substitute command to do the conversion:
:%s/\(\x\{2\} \?\)\{16\}\_.*/\=system('xxd -g1',system('xxd -r -p',submatch(0)))
The expression matches the entire hex dump in the log file. The match is sent to xxd -r -p as stdin and its output is used as stdin for xxd -g1.
Well, that's the idea at least.
The thing is that the above almost works. It produces the following result:
My interesting hex dump:
00000000: 01 53 01 6f 01 6d 01 65 01 20 01 74 01 65 01 78 .S.o.m.e. .t.e.x
00000010: 01 74 01 20 01 65 01 78 01 61 01 6d 01 70 01 6c .t. .e.x.a.m.p.l
00000020: 01 65 01 20 01 75 01 73 01 69 01 6e 01 67 01 20 .e. .u.s.i.n.g.
00000030: 01 55 01 54 01 46 01 2d 01 31 01 36 01 20 01 69 .U.T.F.-.1.6. .i
00000040: 01 6e 01 20 01 6f 01 72 01 64 01 65 01 72 01 20 .n. .o.r.d.e.r.
00000050: 01 74 01 6f 01 20 01 67 01 65 01 74 01 20 01 30 .t.o. .g.e.t. .0
00000060: 01 78 01 30 01 30 01 20 01 62 01 79 01 74 01 65 .x.0.0. .b.y.t.e
00000070: 01 73 01 2e .s..
All 00 bytes have mysteriously transformed into 01.
It should have produced the following:
My interesting hex dump:
00000000: 00 53 00 6f 00 6d 00 65 00 20 00 74 00 65 00 78 .S.o.m.e. .t.e.x
00000010: 00 74 00 20 00 65 00 78 00 61 00 6d 00 70 00 6c .t. .e.x.a.m.p.l
00000020: 00 65 00 20 00 75 00 73 00 69 00 6e 00 67 00 20 .e. .u.s.i.n.g.
00000030: 00 55 00 54 00 46 00 2d 00 31 00 36 00 20 00 69 .U.T.F.-.1.6. .i
00000040: 00 6e 00 20 00 6f 00 72 00 64 00 65 00 72 00 20 .n. .o.r.d.e.r.
00000050: 00 74 00 6f 00 20 00 67 00 65 00 74 00 20 00 30 .t.o. .g.e.t. .0
00000060: 00 78 00 30 00 30 00 20 00 62 00 79 00 74 00 65 .x.0.0. .b.y.t.e
00000070: 00 73 00 2e .s..
What am I not getting here?
Of course I can use macros and other ways of doing this, but I want to understand why my substitution command doesn't do what I expect.
Edit:
For anyone that want to achieve the same thing I provide the substitution expression that works on an entire file. The expression above was only for testing purposes using the log file example also from above.
The one below is the one that performs a correct conversion, modified based on the information Kent provided in his answer.
:%s/\(\(\x\{2\} \)\{16\}\_.\)\+/\=system('xxd -p -r | xxd -g1',submatch(0))
very likely, the problem is string conversion in the system() The input will be converted into a string by vim, so does the output of your first xxd command.
You can try to extract that hex parts into a file. then:
xxd -r -p theFile|vim -
And then calling the system('xxd -g1', alltext), you are gonna get something else than 00 too.
This doesn't work in the same way of a pipe (xxd ...|xxd...). But unfortunately, the system() function doesn't accept pipes.
If you want to fix your :s command, you need to call systemlist() on your first xxd call to get the data in binary format, then pass it to the 2nd xxd:
:%s/\(\x\{2\} \?\)\{16\}\_.*/\=system('xxd -g1',systemlist('xxd -r -p',submatch(0)))
The cmd above will generate the 00s. since there is no string conversion.
However, when working with some data format other than plain string, perhaps we can use filters instead of calling system(). It would be a lot eaiser. For your example:
2,$!xxd -r -p|xxd -g1

unable to unzip zip file in linux centos

I am unable to unzip file in linux centos. Getting following error
End-of-central-directory signature not found. Either this file is not
a zipfile, or it constitutes one disk of a multi-part archive. In the
latter case the central directory and zipfile comment will be found on
the last disk(s) of this archive.
As you are mentioning jar in your comments we can consider this a programming question ;-)
First of all you should try to validate your file. If available you can even compare the checksum provided for this file and / or the filesize with the location you downloaded it from.
To verify the zip file on a low level you can use this command:
hexdump -C -n 100 file.zip
This will show you the first 100 bytes of the zips structure which will look similar to this:
00000000 50 4b 03 04 0a 00 00 00 00 00 88 43 65 47 11 7a |PK.........CeG.z|
00000010 39 1e 15 00 00 00 15 00 00 00 0e 00 1c 00 66 69 |9.............fi|
00000020 6c 65 31 69 6e 7a 69 70 2e 74 78 74 55 54 09 00 |le1inzip.txtUT..|
00000030 03 0f 05 3b 56 2f 05 3b 56 75 78 0b 00 01 04 e8 |...;V/.;Vux.....|
00000040 03 00 00 04 e8 03 00 00 54 68 69 73 20 69 73 20 |........This is |
00000050 61 20 66 69 6c 65 0a 1b 5b 31 37 7e 0a 50 4b 03 |a file..[17~.PK.|
00000060 04 0a 00 00 |....|
The first two byte of the file have to be PK, if not the file is invalid. Some bytes later you will find the name of the first file stored. In this example it is file1inzip.txt.

Resources