I was trying to understand the internal workings of how JWT performs RS256 signature verification. The signature algorithm works in the following basic steps:
Hash the original data
Encrypt the hash with RSA private key
And for verification, it follows these steps:
Decrypt using RSA public key
Match the hash generated from decryption with the SHA256 hash of original message.
While trying to test this on the following JWT:
eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiYWRtaW4iOnRydWUsImlhdCI6MTUxNjIzOTAyMn0.POstGetfAytaZS82wHcjoTyoqhMyxXiWdR7Nn7A29DNSl0EiXLdwJ6xC6AfgZWF1bOsS_TuYI3OG85AmiExREkrS6tDfTQ2B3WXlrr-wp5AokiRbz3_oB4OxG-W9KcEEbDRcZc0nH3L7LzYptiy1PtAylQGxHTWZXtGz4ht0bAecBgmpdgXMguEIcoqPJ1n3pIWk_dUZegpqx0Lka21H6XxUTxiy8OcaarA8zdnPUnV6AmNP3ecFawIFYdvJB_cm-GvpCSbr8G8y_Mllj8f4x9nBH8pQux89_6gUY618iYv7tuPWBFfEbLxtF2pZS6YC1aSfLQxeNe8djT9YjpvRZA
I found that the hash obtained from the signature contains some extra characters.
E.g. the SHA-256 of the original message for the above JWT, in hex encoding, is
8041fb8cba9e4f8cc1483790b05262841f27fdcb211bc039ddf8864374db5f53
but the hash obtained from the signature of the above JWT after decryption is
3031300d0609608648016503040201050004208041fb8cba9e4f8cc1483790b05262841f27fdcb211bc039ddf8864374db5f53
which has the extra characters 3031300d060960864801650304020105000420 in front of the hash.
What do these characters represent and shouldn't the hash obtained from message and signature be identical?
rfc7518 3.3 defines the JWS algorithms RS256, RS384 and RS512:
This section defines the use of the RSASSA-PKCS1-v1_5 digital
signature algorithm as defined in Section 8.2 of RFC 3447 [RFC3447]
(commonly known as PKCS #1), using SHA-2 [SHS] hash functions.
and rfc3447 8.2 defines RSASSA-PKCS1-v1_5:
RSASSA-PKCS1-v1_5 combines the RSASP1 and RSAVP1 primitives with the
EMSA-PKCS1-v1_5 encoding method. ....
where EMSA-PKCS1-v1_5 is defined in rfc3447 9.2 as:
1. Apply the hash function to the message M to produce a hash value
H:
H = Hash(M).
If the hash function outputs "message too long," output "message
too long" and stop.
2. Encode the algorithm ID for the hash function and the hash value
into an ASN.1 value of type DigestInfo (see Appendix A.2.4) with
the Distinguished Encoding Rules (DER), where the type DigestInfo
has the syntax
DigestInfo ::= SEQUENCE {
digestAlgorithm AlgorithmIdentifier,
digest OCTET STRING
}
The first field identifies the hash function and the second
contains the hash value. Let T be the DER encoding of the
DigestInfo value (see the notes below) and let tLen be the length
in octets of T.
3. If emLen < tLen + 11, output "intended encoded message length too
short" and stop.
4. Generate an octet string PS consisting of emLen - tLen - 3 octets
with hexadecimal value 0xff. The length of PS will be at least 8
octets.
5. Concatenate PS, the DER encoding T, and other padding to form the
encoded message EM as
EM = 0x00 || 0x01 || PS || 0x00 || T.
6. Output EM. [added: which is then modexp'ed with d by RSASP1 to
sign, or matched to the value modexp'ed with e by RSAVP1 to verify]
Notes.
1. For the six hash functions mentioned in Appendix B.1, the DER
encoding T of the DigestInfo value is equal to the following:
MD2: (0x)30 20 30 0c 06 08 2a 86 48 86 f7 0d 02 02 05 00 04
10 || H.
MD5: (0x)30 20 30 0c 06 08 2a 86 48 86 f7 0d 02 05 05 00 04
10 || H.
SHA-1: (0x)30 21 30 09 06 05 2b 0e 03 02 1a 05 00 04 14 || H.
SHA-256: (0x)30 31 30 0d 06 09 60 86 48 01 65 03 04 02 01 05 00
04 20 || H.
SHA-384: (0x)30 41 30 0d 06 09 60 86 48 01 65 03 04 02 02 05 00
04 30 || H.
SHA-512: (0x)30 51 30 0d 06 09 60 86 48 01 65 03 04 02 03 05 00
04 40 || H.
The prefix you discovered corresponds exactly to that specified in Note 1 for the encoding in step 2 of a DigestInfo structure for a SHA-256 hash, as expected.
Note rfc3447=PKCS1v2.1 has been superseded by rfc8017=PKCS1v2.2, but the only relevant change in this area is the addition of the SHA512/224 and SHA512/256 hashes, which JWS doesn't use.
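To make the construction concrete, here is a minimal sketch (in Python) of the EMSA-PKCS1-v1_5 encoding used by RS256; the 19-byte prefix is exactly the SHA-256 line from Note 1, and the key values mentioned in the final comment are placeholders for the verification key, which is not included in the question.
import hashlib

# The 19-byte DER prefix for a SHA-256 DigestInfo -- the bytes the question found
# in front of the hash (the SHA-256 line of Note 1 above).
SHA256_DIGESTINFO_PREFIX = bytes.fromhex("3031300d060960864801650304020105000420")

def emsa_pkcs1_v15_sha256(message: bytes, em_len: int) -> bytes:
    """EM = 0x00 || 0x01 || PS || 0x00 || T, where T = prefix || SHA-256(message)."""
    t = SHA256_DIGESTINFO_PREFIX + hashlib.sha256(message).digest()
    if em_len < len(t) + 11:
        raise ValueError("intended encoded message length too short")
    ps = b"\xff" * (em_len - len(t) - 3)        # at least 8 octets of 0xff
    return b"\x00\x01" + ps + b"\x00" + t

# For a JWT, message is b"<base64url(header)>.<base64url(payload)>" and em_len is
# the modulus size in bytes (e.g. 256 for a 2048-bit key).  Verification (RSAVP1)
# checks that pow(signature_as_int, e, n), written out as em_len bytes, equals
# this EM -- which is why the "decrypted" value starts with the DigestInfo prefix
# rather than the bare hash.  (n and e are the RSA public key.)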
Describing signing and verifying as 'encrypting' and 'decrypting' the hash (really, the encoding aka padding of the hash) is considered obsolete. It was used originally, decades ago, and only for RSA, not other signature algorithms, because of the mathematical similarity between the modexp operations used for encrypting and decrypting versus those used for signing and verifying; but it was found that thinking of these as the same or interchangeable resulted in system implementations that were vulnerable and broken. In particular see rfc3447 5.2:
The main mathematical operation in each primitive is
exponentiation, as in the encryption and decryption primitives of
Section 5.1. RSASP1 and RSAVP1 are the same as RSADP and RSAEP
except for the names of their input and output arguments; they are
distinguished as they are intended for different purposes.
nodejs uses this obsolete terminology because it uses OpenSSL, which, via its predecessor SSLeay, dates back to the early 1990s when this mistake was still common.
However, that isn't really a programming/development issue and is more on topic for crypto.SX and security.SX; see some of the links I collected at https://security.stackexchange.com/questions/159282/can-openssl-decrypt-the-encrypted-signature-in-an-amazon-alexa-request#159289 .
Stumped as to where to go next with this issue.
My Node.js Azure function takes in a file represented as base64. It does some processing on the file and returns the processed file in the response:
module.exports = async function (context, req) {
    const buf = Buffer.from(req.body.file, 'base64')
    // ... do processing on buf, producing `filebuf` ...
    const file = filebuf.toString('base64')
    context.res.status(200).json({ file })
}
In my personal Azure tenant this works fine. In my client's Azure tenant, the file is corrupted. This is with the exact same file as input.
Not sure if it's interesting, but the first few bytes of the two tenants' processed files are as follows:
{
"file": "UEsDBAoAAAAAAGh+r1SoFh7uRAwAAEQMAAA..."
}
{
"file": "UEsDBAoAAAAAANh8r1SoFh7uRAwAAEQMAAA..."
}
For some reason, the character at around the 14th position is different, and I can't think why that would be. Both function apps are running on Linux and the runtime versions of the apps are the same.
Thanks
EDIT:
Following the first answer, and my new understanding that the differing byte is to be expected, I am still no closer to understanding why one file is corrupt and the other is not.
I now notice that the content length of the body is different:
Non-corrupt:
"headers":{"Transfer-Encoding":"chunked","Vary":"Accept-Encoding","Strict-Transport-Security":"max-age=31536000; includeSubDomains","x-ms-apihub-cached-response":"true","x-ms-apihub-obo":"true","Cache-Control":"private","Date":"Mon, 16 May 2022 08:25:38 GMT","Content-Type":"application/json; charset=utf-8","Content-Length":"2926356"}
Corrupt:
"headers":{"Transfer-Encoding":"chunked","Request-Context":"appId=cid-v1:XXX","x-ms-apihub-cached-response":"true","x-ms-apihub-obo":"true","Date":"Mon, 16 May 2022 08:24:48 GMT","Content-Type":"application/json; charset=utf-8","Content-Length":"2926368"}
Where could these extra bytes have come from? I'm wondering if it might be a difference between Node versions, but I would be surprised.
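One way to pin down where the two outputs diverge is to decode both base64 payloads and compare them byte by byte. A rough sketch in Python (the full strings are truncated above, so the inputs here are hypothetical placeholders):
import base64

def first_difference(a_b64: str, b_b64: str):
    """Return the offset of the first differing byte between two base64 payloads,
    plus the byte values there (None, None if one is simply a prefix of the other)."""
    a, b = base64.b64decode(a_b64), base64.b64decode(b_b64)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i, x, y
    return min(len(a), len(b)), None, None

# e.g. first_difference(good_response["file"], corrupt_response["file"]) -- the
# two "file" values from the responses above (placeholders; they are truncated here).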
The 2 bytes which are different are part of the ZIP file header and describe the modification time.
Let's take a look at the first partial base64 string
base64
UEsDBAoAAAAAAGh+r1SoFh7uRAwAAEQMAAA
hex
50 4b 03 04 0a 00 00 00 00 00 68 7e af 54 a8 16 1e ee 44 0c 00 00 44 0c 00 00
The first 4 bytes 50 4b 03 04 are the ZIP local file header signature, 0a 00 the version, 00 00 the flags, 00 00 the compression method, and finally 68 7e the file modification time. This is in MS-DOS format, stored little-endian (so the value is 0x7e68), which translates to 15:51:16. Needless to say, different time values in there do not indicate a corrupted file.
More information on the ZIP format: https://users.cs.jmu.edu/buchhofp/forensics/formats/pkzip.html
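For reference, the decoding above can be reproduced directly from the base64 prefix with a few lines of Python (a sketch):
import base64, struct

b64 = "UEsDBAoAAAAAAGh+r1SoFh7uRAwAAEQMAAA"          # prefix from the question
hdr = base64.b64decode(b64 + "=" * (-len(b64) % 4))   # re-pad and decode

# ZIP local file header (little-endian): signature, version needed, flags,
# compression method, modification time, modification date, ...
sig, version, flags, method, mtime, mdate = struct.unpack_from("<IHHHHH", hdr, 0)

print(hex(sig))                                       # 0x4034b50, i.e. "PK\x03\x04"
# MS-DOS time: bits 15-11 = hours, bits 10-5 = minutes, bits 4-0 = seconds / 2
print(f"{mtime >> 11:02}:{(mtime >> 5) & 0x3f:02}:{(mtime & 0x1f) * 2:02}")   # 15:51:16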
RFC Reference
I am working on a project which involves sockets programming and interpreting the output from DIG DNS queries.
I'm using RFC 1035 as my reference. Although this is quite old now (1987) as far as I can tell from later RFCs (for example 8490) the DNS headers are still the same.
https://www.rfc-editor.org/rfc/rfc1035
Code Overview: IPv6 TCP query
I have written a short program in C which reads from an IPv6 TCP socket. I send data to this socket using DIG. (My program simply reads all data it sees on a socket, and prints it to stdout.)
Note that there are two unusual things here:
Firstly the use of IPv6
Secondly the use of TCP (DNS messages are often UDP)
Here is the command used:
dig @::1 -p 8053 duckduckgo.com +tcp
I am running dig version DiG 9.16.13-Debian, on Debian Testing. (circa 2021-May)
Output, Discussion and Question
Here is the hexadecimal and printable character output which is read from the socket:
Hex:
00 37 61 78 01 20 00 01 00 00 00 00 00 01 0A 64 75 63 6B 64 75 63 6B 67 6F 03 63 6F 6D 00 00 01 00 01 00 00 29 10 00 00 00 00 00 00 0C 00 0A 00 08 00 7A 4* 48 2C 16 0* 33
Char:
00 7 61 x 01 20 00 01 00 00 00 00 00 01 0A d u c k d u c k g o 03 c o m 00 00 01 00 01 00 00 ) 10 00 00 00 00 00 00 0C 00 0A 00 08 00 z 4* H , 16 0* 33
If non-printable characters are encountered, the hex value is printed instead.
Although this is a fairly long stream of data, the question relates to the length of the header.
According to RFC 1035, the length of the header should be 12 bytes.
4.1.1. Header section format
The header contains the following fields:
1 1 1 1 1 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| ID |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|QR| Opcode |AA|TC|RD|RA| Z | RCODE |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| QDCOUNT |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| ANCOUNT |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| NSCOUNT |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| ARCOUNT |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
The header is followed by a QUESTION SECTION. The question section begins with a single byte which specifies the length.
Inspecting the data stream above, we see that the byte at offset 12 has a value of 0. I repeat it below with offset numbers to make it clear. The data is in the middle row, the row above and below are byte offsets.
0  1  2  3  4  5  6  7  8  9  10 11 <- byte 12
00 37 61 78 01 20 00 01 00 00 00 00 00 01 0A 64 75 63 6B ...
0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 <- byte 15
This clearly doesn't make any sense.
Looking again at the stream, we can see that "duckduckgo" is preceded by the byte 0A. This is 10 in decimal and corresponds to the 10 characters of "duckduckgo". This string is followed by the byte 03, which corresponds to the 3 bytes of "com".
The offset of the byte 0A is 15. Not 12.
I must have misunderstood the RFC specification. But what have I misunderstood? Does the header itself start at a different offset to what I think it is? (Byte zero.) Or is there perhaps some padding between the end of the header and the beginning of the first question section?
Existing Question on this site:
Comments: The link below states that there is no padding, and that is the only answer on that question. The question is about DNS responses rather than queries, and does not ask about the header section of a query. (Although information from one should presumably apply to the other, it possibly does not.)
Do DNS messages pad names to an even number of bytes?
Comments: The link below asks about the best way to build a data structure to handle DNS data. Additionally, the answer notes that one has to be careful about network byte order and machine byte order. I am already aware of this and use ntohs() to convert from network byte order to x86_64 byte order before printing information to stdout. This is not the problem, and it does not explain why I see the DNS query data starting at byte 15 instead of 12, when the header should be a fixed size of 12 bytes.
Implementing a DNS Query in c++ according to RFC 1035
Thanks to @SteffenUllrich, who prompted the solution to this in the comments.
RFC 1035 4.2.2 states
4.2.2. TCP usage
Messages sent over TCP connections use server port 53 (decimal). The
message is prefixed with a two byte length field which gives the message
length, excluding the two byte length field. This length field allows
the low-level processing to assemble a complete message before beginning
to parse it.
I had removed the 2-byte field at the start of my struct at some point.
This is what the structure looks like with the 2 byte length field re-enabled.
struct __attribute__((__packed__)) dns_header
{
unsigned short ID;
union
{
unsigned short FLAGS;
struct
{
unsigned short QR : 1;
unsigned short OPCODE : 4;
unsigned short AA : 1;
unsigned short TC : 1;
unsigned short RD : 1;
unsigned short RA : 1;
unsigned short Z : 3;
unsigned short RCODE : 4;
};
};
unsigned short QDCOUNT;
unsigned short ANCOUNT;
unsigned short NSCOUNT;
unsigned short ARCOUNT;
};
struct __attribute__((__packed__)) dns_struct_tcp
{
unsigned short length; // length excluding 2 bytes for length field
struct dns_header header;
};
For example: I received a TCP packet of length 53 bytes. The value of length is set to 51.
To read data into this struct:
memcpy(&dnsdata, buf, sizeof(struct dns_struct_tcp));
To interpret this data (since it is stored in network byte order):
void dns_header_print(FILE *file, const struct dns_header *header)
{
fprintf(file, "ID: %u\n", ntohs(header->ID));
char str_FLAGS[8 * sizeof(unsigned short) + 1];
str_FLAGS[8 * sizeof(unsigned short)] = '\0';
print_binary_16_fixed_width(str_FLAGS, header->FLAGS);
fprintf(file, "FLAGS: %s\n", str_FLAGS);
fprintf(file, "FLAGS: QOP ATRRZZZR \n");
fprintf(file, " RCODEACDA CODE\n");
fprintf(file, "QDCOUNT: %u\n", ntohs(header->QDCOUNT));
fprintf(file, "ANCOUNT: %u\n", ntohs(header->ANCOUNT));
fprintf(file, "NSCOUNT: %u\n", ntohs(header->NSCOUNT));
fprintf(file, "ARCOUNT: %u\n", ntohs(header->ARCOUNT));
}
Note that the flags are unchanged, since each field of flags is less than 8 bits in length. However, on x86_64 systems unsigned short is stored in little-endian format, hence ntohs() is used to convert data which is in big-endian (network) byte order to little-endian (host) byte order.
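Independently of the C struct, the wire layout (two-byte TCP length prefix followed by the 12-byte header) can be cross-checked against the hex dump from the question with a short sketch in Python, where the "!" format character plays the role of ntohs():
import struct

# First bytes of the TCP stream from the question: length prefix, header, start of question section
data = bytes.fromhex(
    "0037"                                          # two-byte TCP length prefix: 55 bytes follow
    "6178" "0120"                                   # ID, flags
    "0001" "0000" "0000" "0001"                     # QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    "0a" "6475636b6475636b676f" "03" "636f6d" "00"  # QNAME: 10 "duckduckgo" 3 "com" 0
    "0001" "0001"                                   # QTYPE = A, QCLASS = IN
)

(msg_len,) = struct.unpack_from("!H", data, 0)        # "!" = network (big-endian) byte order
ident, flags, qd, an, ns, ar = struct.unpack_from("!6H", data, 2)
print(msg_len, hex(ident), qd, an, ns, ar)            # 55 0x6178 1 0 0 1
print(data[2 + 12])                                   # 10 -> the 0A label length, 12 bytes after the prefix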
I have a card supporting the T=0 protocol, with an applet installed on it. The host sends a "multi-record reading" command to get record data. The records to read are specified by record identifiers in the data field of the command. These are the steps that I did:
Select DF
Send a command to read a sequence of records:
00 B2 00 06 16 73 0A 51 02 40 01 54 04 00 10 00 04 73 08 51 02 40 02 54 02 00 01 00
The meaning of the command is as below:
INS = 'B2': read record(s)
P1 = '00': references the current record (ISO 7816-4, 7.3.3, table 48)
P2 = '06' = '00000 110':
'00000' indicates the current (short) EF id (ISO 7816-4, 7.3.2, table 47)
'110' means read all records from the last up to P1 (ISO 7816-4, 7.3.3, table 49)
Lc = '16': length of the data field (22 bytes)
The data field follows BER-TLV encoding, for example:
73 0A 51 02 40 01 54 04 00 10 00 04
Tag '73' indicates that the following bytes form a constructed (hierarchical) data object structure in the data field (length = '0A')
Tag '51' references a 2-byte EF identifier = '40 01'
Tag '54' references one or more record identifiers, in this case '00 10' and '00 04'
Le = '00'
This is the expected response from the card:
53 |length of data| record data| 53| length of data| record data|......
When I test this command with the card, it returns an 'Unknown Error' message.
Could you tell me what is wrong with the command? Have I misunderstood anything?
Thanks.
This cannot be answered without knowing the actual implementation. 6F00 - the status word indicating an unknown error - should only be returned when the implementation has an internal error. For Java Card implementations, for instance, 6F00 is returned for uncaught exceptions in the process method handling the APDUs.
But just like the rest of ISO/IEC 7816-4, nothing is set in stone. It's not even defined when a specific error should be returned, so even the above is unsure. ISO/IEC 7816-4 is thoroughly useless in that regard.
Thank you for your answer. My problem is solved.
Actually, the returned SW = 61 XY is not an error. According to ISO 7816-3 it means:
Process completed normally (SW2 encodes Nx, i.e., the number of extra data bytes still available). In cases 1 and 3, the card should not use such a value. In cases 2 and 4, for transferring response data bytes, the card shall be ready to receive a GET RESPONSE command with P3 set to the minimum of Nx and Ne.
So one just needs to send a GET RESPONSE command to get the response data:
00 C0 00 00 XY
XY: the number of extra data bytes still available
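Putting the two commands together, the exchange can be scripted roughly as below (a sketch in Python; transmit() is a hypothetical stand-in for whatever PC/SC wrapper is actually in use):
def read_records(transmit):
    """Case-4 exchange under T=0: send READ RECORD(S), then issue GET RESPONSE
    when the card answers 61 XY.  transmit(apdu) -> (data, sw1, sw2) is assumed."""
    apdu = bytes.fromhex(
        "00B2000616"                    # CLA INS P1 P2 Lc (0x16 = 22 data bytes)
        "730A51024001540400100004"      # DO '73': EF '4001', records '0010' and '0004'
        "73085102400254020001"          # DO '73': EF '4002', record '0001'
        "00"                            # Le
    )
    data, sw1, sw2 = transmit(apdu)
    if sw1 == 0x61:                     # processed OK, sw2 response bytes are waiting
        data, sw1, sw2 = transmit(bytes([0x00, 0xC0, 0x00, 0x00, sw2]))
    return data, sw1, sw2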
I'm trying to reverse engineer a binary file format, but it has no magic bytes, and no specific extension. I can influence only a single aspect of the file: a short string. By trying different strings, I was able to figure out how data is stored in the file. It seems that the whole file uses some sort of simple encoding. I hope that finding the exact encoding allows me to narrow down my search for the file format. I know the file is generated by a Windows program written in C++.
Now, after much trial-and-error, I found that some sections of the file are encoded in runs. Each run starts with a byte that indicates how many bytes will follow and where the data is retrieved.
000ddddd (1 byte): Take the following (ddddd)+1 bytes from the encoded data.
111····· ···ddddd ···bbbbb (3 bytes): Go back (bbbbb)+1 bytes in the decoded data, and take the next (ddddd)+9 bytes from it.
ddd····· ··bbbbbb (2 bytes): Go back (bbbbbb)+1 bytes in the decoded data, and take the next (ddd)+2 bytes from it.
Here's an example:
This is the start of the file, with the UTF-16 string abracadabra encoded in it:
. . . a . b . r . . c . . d . € .
0C 20 03 04 61 00 62 00 72 20 05 00 63 20 03 00 64 20 03 80 0D
To decode the string:
0C number of Unicode chars: 12 (11 chars + \0)
20 03 . . . ??
04 next 5
61 00 a .
62 00 b .
72 r
20 05 . a . back 6, take 3
00 next 1
63 c
20 03 . a . back 4, take 3
00 next 1
64 d
20 03 . a . back 4, take 3
80 0D b . r . a . back 14, take 6
This results in (UTF-16):
a . b . r . a . c . a . d . a . b . r . a .
61 00 62 00 72 00 61 00 63 00 61 00 64 00 61 00 62 00 72 00 61 00
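For what it's worth, the three rules above are enough to reproduce this example mechanically. A decoder sketch in Python follows; the buffer is seeded with four 0x00 bytes as an assumption about the (unseen) data decoded earlier in the file, which the first run (20 03) points back into:
def decode(runs: bytes, seed: bytes = b"") -> bytes:
    """Decode the three run types described above.
    seed stands in for data decoded earlier in the file, which back-references
    are allowed to reach into (the leading 20 03 run does exactly that)."""
    out = bytearray(seed)
    i = 0
    while i < len(runs):
        b = runs[i]
        if b >> 5 == 0b000:                       # 000ddddd: copy (ddddd)+1 literal bytes
            n = (b & 0x1F) + 1
            out += runs[i + 1:i + 1 + n]
            i += 1 + n
        elif b >> 5 == 0b111:                     # 111..... ...ddddd ...bbbbb
            take = (runs[i + 1] & 0x1F) + 9
            back = (runs[i + 2] & 0x1F) + 1
            i += 3
            for _ in range(take):                 # byte-by-byte, so overlapping copies work
                out.append(out[-back])
        else:                                     # ddd..... ..bbbbbb
            take = (b >> 5) + 2
            back = (runs[i + 1] & 0x3F) + 1
            i += 2
            for _ in range(take):
                out.append(out[-back])
    return bytes(out[len(seed):])

# Runs from the example, after the 0C character count:
encoded = bytes.fromhex("20 03 04 61 00 62 00 72 20 05 00 63 20 03 00 64 20 03 80 0d")
decoded = decode(encoded, seed=b"\x00" * 4)       # assumed zero bytes preceding this snippet
print(decoded.hex(" "))                           # 00 00 00 61 00 62 00 72 00 61 00 63 ...
print(decoded[3:].decode("utf-16-le"))            # abracadabra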
However, I have no clue as to what encoding/compression algorithm this might be. It looks like some variant of LZ that doesn't use a dictionary (like LZ77), but so far I haven't been able to find any algorithm that matches this description. I'm also not sure whether the entire file is encoded like this, or only portions of it.
Do you know this encoding? Or do you have any hints for things I might look for in the file to identify the encoding?
After your edit I think it's LZF with the following differences to your observations:
The magic header and indication of compressed vs uncompressed has been removed in your example (not too surprising if it's embedded in a file).
You took the block length as one byte, but it is two bytes and big-endian, so the prior 0x00 is part of the length, which still works.
Could be NTFS compression, which is LZNT1. This idea is supported by the platform and the apparent 2-byte structure, along with the byte-alignment of the actual data.
The following elements are specific to this algorithm.
Chunks: Segments of data that are compressed, uncompressed, or that denote the end of the buffer.
Chunk header: The header for a compressed or uncompressed chunk of data.
Flag bytes: A bit flag whose bits, read from low order to high order, specify the formats of the data elements that follow. For example, bit 0 corresponds to the first data element, bit 1 to the second, and so on. If the bit corresponding to a data element is set, the element is a 2-byte compressed word; otherwise, it is a 1-byte literal value.
Flag group: A flag byte followed by zero or more data elements, each of which is a single literal byte or a 2-byte compressed word.
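To illustrate that last definition, splitting a flag group into its elements looks roughly like this (a sketch in Python; it only classifies elements and deliberately does not decode the offset/length packing inside a compressed word):
def split_flag_group(data: bytes, pos: int):
    """Split one flag group into its data elements: a flag byte, then up to
    eight elements that are either a 1-byte literal or a 2-byte compressed word."""
    flags = data[pos]
    pos += 1
    elements = []
    for bit in range(8):                 # bits are read from low order to high order
        if pos >= len(data):
            break
        if flags >> bit & 1:             # set bit -> 2-byte compressed word
            elements.append(("word", data[pos:pos + 2]))
            pos += 2
        else:                            # clear bit -> 1-byte literal value
            elements.append(("literal", data[pos:pos + 1]))
            pos += 1
    return elements, pos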
I have a binary file; the definition of its content is as below (all data is stored
in little endian, i.e. least significant byte first). The example numbers below are hex.
11 63 39 46 --- Time, UTC in seconds since 1 Jan 1970.
01 00 --- 0001 = No Fix, 0002 = SPS
97 85 ff e0 7b db 4c 40 --- Latitude, as double
a1 d5 ce 56 8d 26 28 40 --- Longitude, as double
f0 37 e1 42 --- Height in meters, as float
fe 2b f0 3a --- Speed in km/h, as float
00 00 00 00 --- Heading (degrees ?), as float
01 00 --- RCR, log reason. 0001=Time, 0004=Distance
59 20 6a f3 4a 26 e3 3f --- Distance in meters, as double,
2a --- ? Don't know
a8 --- Checksum, xor of all bytes above not including 0x2a
The data from the binary file, in hex, is as below:
"F25D39460200269652F5032445401F4228D79BCC54C09A3A2743B4ADE73F2A83"
I would appreciate it if you could help me translate this data line based on the definition above.
Probably wrong, but here's a shot at it using Ruby:
hex = "F25D39460200269652F5032445401F4228D79BCC54C09A3A2743B4ADE73F2A83"
ints = hex.scan(/../).map{ |s| s.to_i(16) }
raw = ints.pack('C*')
fields = raw.unpack( 'VvEEVVVvE')
p fields
#=> [1178164722, 2, 42.2813707974677, -83.1970117467067, 1126644378, 1072147892, nil, 33578, nil]
p Time.at( fields.first )
#=> 2007-05-02 21:58:42 -0600
I'd appreciate it if someone well-versed in #pack and #unpack would show me a better way to accomplish the first three lines.
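As a cross-check against the field list in the question, the documented layout can also be walked field by field. A sketch in Python, stopping once the 32-byte sample line runs out of bytes (it is shorter than the full field list, so the later fields simply don't get filled in):
import struct
from datetime import datetime, timezone

# Field layout from the question (all little-endian).
FIELDS = [
    ("time",      "<I"),   # UTC seconds since 1 Jan 1970
    ("fix",       "<H"),   # 0001 = No Fix, 0002 = SPS
    ("latitude",  "<d"),
    ("longitude", "<d"),
    ("height",    "<f"),   # metres
    ("speed",     "<f"),   # km/h
    ("heading",   "<f"),   # degrees(?)
    ("rcr",       "<H"),   # log reason
    ("distance",  "<d"),   # metres
]

raw = bytes.fromhex("F25D39460200269652F5032445401F4228D79BCC54C09A3A2743B4ADE73F2A83")

offset, record = 0, {}
for name, fmt in FIELDS:
    size = struct.calcsize(fmt)
    if offset + size > len(raw) - 2:      # keep the unknown byte and checksum out of the loop
        break
    (record[name],) = struct.unpack_from(fmt, raw, offset)
    offset += size

record["time"] = datetime.fromtimestamp(record["time"], tz=timezone.utc)
print(record)              # time 2007-05-03 03:58:42 UTC, fix 2 (SPS), lat ~42.28, lon ~-83.20,
                           # height ~167.2, speed ~1.8
print(raw[offset:].hex())  # leftover bytes: 2a83 -> the unknown byte and the checksum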
My Cygnus Hex Editor could load such a file and, using structure templates, display the data in its native formats.
Beyond that, it's just a matter of going through each value and working out the translation for each byte.