I am using Perl v5.16.2
I am using the Net::SMPP module and it returns some data.
If I dump this data, I get this (simplified):
$VAR1 = bless( {
'receipted_message_id' => '400002F6E09C61701222120140',
'30' => '400002F6E09C61701222120140'
}, 'Net::SMPP::PDU' );
Now, let's assume this data is in $pdu and I do this:
$message_id = $pdu->{30}; # or $pdu->{receipted_message_id}, same result
myfunction($message_id);
Then I have myfunction defined as:
use Data::Dumper;
use Data::Hexdumper qw(hexdump);

sub myfunction {
my $message_id = shift;
my $message_id_static = '400002F6E09C61701222120140';
print Dumper($message_id);
print Dumper($message_id_static);
print hexdump($message_id);
print hexdump($message_id_static);
if ($message_id eq $message_id_static)
{
print "match\n";
}
else
{
print "no match\n";
}
}
The output of the program is:
$VAR1 = '400002F6E09C61701222120140';
$VAR1 = '400002F6E09C61701222120140';
Data::Hexdumper: data length isn't an integer multiple of lines
so has been padded with NULLs at the end.
0x0000 : 34 30 30 30 30 32 46 36 45 30 39 43 36 31 37 30 : 400002F6E09C6170
0x0010 : 31 32 32 32 31 32 30 31 34 30 00 00 00 00 00 00 : 1222120140......
Data::Hexdumper: data length isn't an integer multiple of lines
so has been padded with NULLs at the end.
0x0000 : 34 30 30 30 30 32 46 36 45 30 39 43 36 31 37 30 : 400002F6E09C6170
0x0010 : 31 32 32 32 31 32 30 31 34 30 00 00 00 00 00 00 : 1222120140......
no match
This doesn't make any sense to me!
If I try to use $message_id to do a SQLite query, it fails miserably. If I use $message_id_static instead, it works perfectly.
So, is this a weird internal Perl bug, or am I missing something?
This has been driving me nuts for hours...
EDIT:
Using the Perl debugger, I get this:
DB<3> x $message_id_static
0 '400002F6E09C61701222120140'
DB<4> x $message_id
0 "400002F6E09C61701222120140\c#"
So at least I can see there is a difference in the strings, but why isn't it visible in the hexdump, and what is that \c@?
Thanks!
The \c@ character is Ctrl-@, which is the ASCII NUL character at code point zero.
You can't see it in your hexdump output because it is indistinguishable from the 00 padding at the end of the dump.
If you set $Data::Dumper::Useqq = 1 then it will be visible in the output from print Dumper $message_id.
You can remove it from the variable with s/\0\z// or tr/\0//d, but you should really investigate why it is there in the first place.
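A minimal sketch of those suggestions, using a stand-in string in place of the real $pdu->{30} value (assuming the extra byte is a single trailing NUL, as the hexdump padding suggests):
use strict;
use warnings;
use Data::Dumper;

# Stand-in for $pdu->{30}; assumed to carry one trailing NUL byte.
my $message_id = "400002F6E09C61701222120140\0";

$Data::Dumper::Useqq = 1;       # make control characters visible
print Dumper($message_id);      # the trailing NUL is now shown explicitly

$message_id =~ s/\0\z//;        # strip a single trailing NUL
# or: $message_id =~ tr/\0//d;  # remove every NUL in the string
print Dumper($message_id);      # now equal to the bare literal string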
RFC Reference
I am working on a project which involves socket programming and interpreting the output from DIG DNS queries.
I'm using RFC 1035 as my reference. Although this is quite old now (1987), as far as I can tell from later RFCs (for example RFC 8490) the DNS headers are still the same.
https://www.rfc-editor.org/rfc/rfc1035
Code Overview: IPv6 TCP query
I have written a short program in C which reads from an IPv6 TCP socket. I send data to this socket using DIG. (My program simply reads all data it sees on the socket and prints it to stdout.)
Note that there are two unusual things here:
Firstly the use of IPv6
Secondly the use of TCP (DNS messages are often UDP)
Here is the command used:
dig @::1 -p 8053 duckduckgo.com +tcp
I am running dig version DiG 9.16.13-Debian, on Debian Testing. (cera 2021-May)
Output, Discussion and Question
Here is the hexadecimal and printable character output which is read from the socket:
Hex:
00 37 61 78 01 20 00 01 00 00 00 00 00 01 0A 64 75 63 6B 64 75 63 6B 67 6F 03 63 6F 6D 00 00 01 00 01 00 00 29 10 00 00 00 00 00 00 0C 00 0A 00 08 00 7A 4* 48 2C 16 0* 33
Char:
00 7 61 x 01 20 00 01 00 00 00 00 00 01 0A d u c k d u c k g o 03 c o m 00 00 01 00 01 00 00 ) 10 00 00 00 00 00 00 0C 00 0A 00 08 00 z 4* H , 16 0* 33
If non-printable characters are encountered, the hex value is printed instead.
Although this is a fairly long stream of data, the question relates to the length of the header.
According to RFC 1035, the length of the header should be 12 bytes.
4.1.1. Header section format
The header contains the following fields:
1 1 1 1 1 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| ID |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|QR| Opcode |AA|TC|RD|RA| Z | RCODE |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| QDCOUNT |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| ANCOUNT |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| NSCOUNT |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| ARCOUNT |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
The header is followed by a QUESTION SECTION. The question section begins with a single byte which specifies the length.
Inspecting the data stream above, we see that the byte at offset 12 has a value of 0. I repeat it below with offset numbers to make it clear. The data is in the middle row, the row above and below are byte offsets.
0 1 2 3 4 5 6 7 8 9 10 11 <- byte 12
00 37 61 78 01 20 00 01 00 00 00 00 00 01 0A 64 75 63 6B ...
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 <- byte 15
This clearly doesn't make any sense.
Looking again at the stream, we can see that "duckduckgo" is preceded by the byte 0A. This is 10 in decimal and corresponds to the 10 characters of "duckduckgo". This string is followed by a byte 03, which corresponds to the 3 bytes of "com".
The offset of the byte 0A is 15. Not 12.
I must have misunderstood the RFC specification. But what have I misunderstood? Does the header itself start at a different offset to what I think it is? (Byte zero.) Or is there perhaps some padding between the end of the header and the beginning of the first question section?
Existing Question on this site:
Comments: The link below states that there is no padding; this is the only answer on that question. That question is about DNS responses rather than queries, and does not ask about the header section of a query. (Information about one should presumably apply to the other, but possibly does not.)
Do DNS messages pad names to an even number of bytes?
Comments: The link below asks about the best way to build a data structure to handle DNS data. Additionally, the answer notes that one has to be careful about network byte order and machine byte order. I am already aware of this and I use ntohs() to convert from network byte order to x86_64 byte order before printing information to stdout. This is not the problem and does not explain why I see information about the DNS query starting at byte 15 instead of 12, when the header should be a fixed size of 12 bytes.
Implementing a DNS Query in c++ according to RFC 1035
Thanks to @SteffenUllrich who prompted the solution for this in the comments.
RFC 1035 4.2.2 states
4.2.2. TCP usage
Messages sent over TCP connections use server port 53 (decimal). The
message is prefixed with a two byte length field which gives the message
length, excluding the two byte length field. This length field allows
the low-level processing to assemble a complete message before beginning
to parse it.
I had removed the 2-byte field at the start of my struct at some point.
This is what the structure looks like with the 2 byte length field re-enabled.
struct __attribute__((__packed__)) dns_header
{
unsigned short ID;
union
{
unsigned short FLAGS;
struct
{
unsigned short QR : 1;
unsigned short OPCODE : 4;
unsigned short AA : 1;
unsigned short TC : 1;
unsigned short RD : 1;
unsigned short RA : 1;
unsigned short Z : 3;
unsigned short RCODE : 4;
};
};
unsigned short QDCOUNT;
unsigned short ANCOUNT;
unsigned short NSCOUNT;
unsigned short ARCOUNT;
};
struct __attribute__((__packed__)) dns_struct_tcp
{
unsigned short length; // length excluding 2 bytes for length field
struct dns_header header;
};
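As a sanity check (my addition, not part of the original code), a C11 static assertion can confirm that the packed layout really is 2 + 12 = 14 bytes:
/* Both structs use __attribute__((__packed__)), so no padding is inserted. */
_Static_assert(sizeof(struct dns_struct_tcp) == 14,
               "expected 2-byte length prefix plus 12-byte DNS header");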
For example: I received a TCP packet of length 53 bytes. The value of length is set to 51.
To read data into this struct:
memcpy(&dnsdata, buf, sizeof(struct dns_struct_tcp));
To interpret this data (since it is stored in network byte order):
void dns_header_print(FILE *file, const struct dns_header *header)
{
fprintf(file, "ID: %u\n", ntohs(header->ID));
char str_FLAGS[8 * sizeof(unsigned short) + 1];
str_FLAGS[8 * sizeof(unsigned short)] = '\0';
print_binary_16_fixed_width(str_FLAGS, header->FLAGS);
fprintf(file, "FLAGS: %s\n", str_FLAGS);
fprintf(file, "FLAGS: QOP ATRRZZZR \n");
fprintf(file, " RCODEACDA CODE\n");
fprintf(file, "QDCOUNT: %u\n", ntohs(header->QDCOUNT));
fprintf(file, "ANCOUNT: %u\n", ntohs(header->ANCOUNT));
fprintf(file, "NSCOUNT: %u\n", ntohs(header->NSCOUNT));
fprintf(file, "ARCOUNT: %u\n", ntohs(header->ARCOUNT));
}
Note that the flags are unchanged, since each field of the flags is less than 8 bits in length. However, on x86_64 systems unsigned short is stored in little-endian format, hence ntohs() is used to convert data which is in big-endian (network) byte order to little-endian (host) byte order.
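As a complementary sketch (my own illustration, not the poster's code; the helper names are made up), the two-byte length prefix can also be consumed from the TCP stream before the header, which tells you exactly how many bytes of DNS message to expect:
#include <stdint.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>

/* Read exactly n bytes from a stream socket, looping over recv(). */
static int read_full(int fd, void *dst, size_t n)
{
    unsigned char *p = dst;
    while (n > 0) {
        ssize_t r = recv(fd, p, n, 0);
        if (r <= 0)
            return -1;              /* error or peer closed the connection */
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Read one DNS-over-TCP message: a 2-byte big-endian length prefix
 * (RFC 1035 4.2.2) followed by exactly that many bytes. Returns the
 * message length or -1; the 12-byte header then starts at buf[0]. */
static int read_dns_tcp_message(int fd, unsigned char *buf, size_t bufsize)
{
    uint16_t len_be;
    if (read_full(fd, &len_be, sizeof len_be) != 0)
        return -1;
    uint16_t len = ntohs(len_be);   /* network (big-endian) to host order */
    if (len > bufsize)
        return -1;
    if (read_full(fd, buf, len) != 0)
        return -1;
    return (int)len;
}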
I have a list of times which I want to add to a string:
0900 1730
0900 1730
1000 1700
0930 1700
I need to break these up into hours and minutes, like so:
09 00 17 30
09 00 17 30
10 00 17 00
09 30 17 00
To do this I am using the MID() function to get the first two characters from the cell and then the last two. But when I do this for numbers that start with 0 or have 00, it drops the first 0, like so:
0930 = ",MID(B2,1,2),",",MID(B2,3,2)," output - 93 0 what i want = 09 30
0900 = ",MID(B2,1,2),",",MID(B2,3,2)," output - 90 0 what i want = 09 00
1000 = ",MID(B2,1,2),",",MID(B2,3,2)," output - 10 0 what i want = 10 00
Is there a way to solve this?
You can use a mid of a pre-formatted block:
=MID(RIGHT("0000"&B2,4),1,2) =MID(RIGHT("0000"&B2,4),3,2)
This should give you two strings like 09 & 30.
If you want two numeric values you can add a value function:
=VALUE(MID(RIGHT("0000"&B2,4),1,2))
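If B2 holds an actual number rather than text, an alternative (not part of the answer above, just another option) is to re-pad it with TEXT before splitting:
=MID(TEXT(B2,"0000"),1,2)
=MID(TEXT(B2,"0000"),3,2)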
One way is to place a single quote (') before the 0; then it will store 0930 as text in the cell and your formula will also work, with no need to change the formula.
So the value 0930 will be entered as '0930.
I have a binary file; the definition of its content is as below (all data is stored in little endian, i.e. least significant byte first). The example numbers below are hex.
11 63 39 46 --- Time, UTC in seconds since 1 Jan 1970.
01 00 --- 0001 = No Fix, 0002 = SPS
97 85 ff e0 7b db 4c 40 --- Latitude, as double
a1 d5 ce 56 8d 26 28 40 --- Longitude, as double
f0 37 e1 42 --- Height in meters, as float
fe 2b f0 3a --- Speed in km/h, as float
00 00 00 00 --- Heading (degrees ?), as float
01 00 --- RCR, log reason. 0001=Time, 0004=Distance
59 20 6a f3 4a 26 e3 3f --- Distance in meters, as double,
2a --- ? Don't know
a8 --- Checksum, xor of all bytes above not including 0x2a
The data from the binary file, in hex, is as below:
"F25D39460200269652F5032445401F4228D79BCC54C09A3A2743B4ADE73F2A83"
I would appreciate it if you could help me translate this data line based on the definitions above.
Probably wrong, but here's a shot at it using Ruby:
hex = "F25D39460200269652F5032445401F4228D79BCC54C09A3A2743B4ADE73F2A83"
ints = hex.scan(/../).map{ |s| s.to_i(16) }
raw = ints.pack('C*')
fields = raw.unpack( 'VvEEVVVvE')
p fields
#=> [1178164722, 2, 42.2813707974677, -83.1970117467067, 1126644378, 1072147892, nil, 33578, nil]
p Time.at( fields.first )
#=> 2007-05-02 21:58:42 -0600
I'd appreciate it if someone well-versed in #pack and #unpack would show me a better way to accomplish the first three lines.
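Not authoritative, but here is one possible tightening (a sketch, assuming the sample record really does omit the heading/RCR/distance fields, since it is only 32 bytes long): pack('H*') converts the hex string straight to raw bytes, and a matching unpack template pulls out each field in one pass.
hex = "F25D39460200269652F5032445401F4228D79BCC54C09A3A2743B4ADE73F2A83"
raw = [hex].pack('H*')            # hex string -> raw bytes, replaces the scan/map/pack dance
# V = uint32 LE, v = uint16 LE, E = double LE, e = float LE, C = byte
time, fix, lat, lon, height, speed, unknown, checksum = raw.unpack('VvEEeeCC')
p Time.at(time)                   #=> 2007-05-02 ... (same timestamp as above)
p [fix, lat, lon, height, speed, unknown, checksum]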
My Cygnus Hex Editor could load such a file and, using structure templates, display the data in its native formats.
Beyond that, it's just a matter of going through each value and working out the translation for each byte.
I'm trying to solve Problem 11 of Project Euler in Haskell. I almost did it, but right now I'm stuck: I want to transform a matrix represented as [String] into a matrix represented as [[Int]].
I "drew" the matrices:
What I want:
"08 02 22 97 38 15 00 40 [ ["08","02","22","97","38","15","00","40"], [[08,02,22,97,38,15,00,40]
49 49 99 40 17 81 18 57 map words lines ["49","49","99","40","17","81","18","57"], ??a [49,49,99,40,17,81,18,57]
81 49 31 73 55 79 14 29 ----------> ["81","49","31","73","55","79","14","29"], ---------> [81,49,31,73,55,79,14,29]
52 70 95 23 04 60 11 42 ["52","70","95","23","04","60","11","42"], [52,70,95,23,04,60,11,42]
22 31 16 71 51 67 63 89 ["22","31","16","71","51","67","63","89"], [22,31,16,71,51,67,63,89]
24 47 32 60 99 03 45 02" ["24","47","32","60","99","03","45","02"] ] [24,47,32,60,99,03,45,02]]
I'm stuck on the last transformation (??a).
Out of curiosity (and for learning), I also want to know how to do a matrix of digits:
Input:
"123456789 [ "123456789" [ [1,2,3,4,5,6,7,8,9]
124834924 lines "124834924" ??b [1,2,4,8,3,4,9,2,4]
328423423 ---------> "328423423" ---------> [3,2,8,4,2,3,4,2,3]
334243423 "334243423" [3,3,4,2,4,3,4,2,3]
932402343" "932402343" ] [9,3,2,4,0,2,3,4,3] ]
What is the best way to do (??a) and (??b)?
What you want is the read function:
read :: (Read a) => String -> a
This thoughtfully parses a string into whatever you're expecting (as long as it's an instance of the class Read, but fortunately Int is such).
So just map that over the words, like so:
parseMatrix :: (Read a) => String -> [[a]]
parseMatrix s = map (map read . words) $ lines s
Just use that in a context that expects [[Int]] and Haskell's type inference will take it from there.
To get the digits, just remember that String is actually just [Char]. Instead of using words, map a function that turns each Char into a single-element list; everything else is the same.
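For (??b), a minimal sketch of exactly that idea -- turn each character into a one-character string and read it (Data.Char.digitToInt would work just as well):
parseDigitMatrix :: String -> [[Int]]
parseDigitMatrix = map (map (\c -> read [c])) . lines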
This question already has answers here:
Capturing multiple line output into a Bash variable
(7 answers)
Closed 7 years ago.
I'm writing a shell script which will store the output of a command in a variable, process the output, and later echo the results. Here's what I've got:
stuff=$(diff -u pens tape)
# process the output
echo $stuff
The problem is, the output I get from running the script is this:
--- pens 2009-09-27 10:29:06.000000000 -0400 +++ tape 2009-09-18 16:45:08.000000000 -0400 @@ -1,4 +1,2 @@ -highlighter -marker -pencil -POSIX +masking +duct
Whereas I was expecting this:
--- pens 2009-09-27 10:29:06.000000000 -0400
+++ tape 2009-09-18 16:45:08.000000000 -0400
@@ -1,4 +1,2 @@
-highlighter
-marker
-pencil
-POSIX
+masking
+duct
It looks like the newline characters are being removed somehow. How do I get them to stay in?
If you want to preserve the newlines, enclose the variable in double quotes:
echo "$stuff"
When you write it without the double quotes, the shell expands $stuff into a space-separated list of words (where 'words' are sequences of non-space characters, and the space characters are blanks and tabs and newlines; upon experimentation, it seems that form feeds, carriage returns and back-spaces are not counted as space).
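A quick way to see the difference with nothing but printf and echo:
$ stuff=$(printf 'line one\nline two\n')
$ echo $stuff
line one line two
$ echo "$stuff"
line one
line two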
Demonstrating the interpretation of control characters as white space: ASCII 8 is backspace, 9 is tab, 10 is newline (LF), 11 is vertical tab, 12 is form feed, 13 is carriage return. The first command generates a sequence of characters separated by the various control characters. The second command echoes the result with the original characters preserved - see the hex dump. The third command echoes the result with the shell splitting the words; you can see that the tab and newline were replaced by blanks (0x20).
$ x=$(./ascii 64 65 8 66 67 9 68 69 10 70 71 11 72 73 12 74 75 13 76 77)
$ echo "$x" | odx
0x0000: 40 41 08 42 43 09 44 45 0A 46 47 0B 48 49 0C 4A @A.BC.DE.FG.HI.J
0x0010: 4B 0D 4C 4D 0A K.LM.
0x0015:
$ echo $x | odx
0x0000: 40 41 08 42 43 20 44 45 20 46 47 0B 48 49 0C 4A @A.BC DE FG.HI.J
0x0010: 4B 0D 4C 4D 0A K.LM.
0x0015:
$