I can see why it is useful to cast sockaddr to sockaddr_in, but I don't understand how this is possible. From what I've read, they're the same size and sockaddr_in is added with sin_zero to make it the same size. I would like to know how the compiler knows where to get the information from sockaddr_in if it is layed out differently to sockaddr.
It is possible because you normally cast pointers, not the structures themselves. You do what in natural language means "please treat this pointer to a socket structure as a pointer to an internet socket structure instead". Compiler has no problems to re-interpret the pointer.
Here is more detailed description taken up from comments:
A sockaddr is 16 bytes in size - the first two bytes are the sa_family, and the remaining 14 bytes are the sa_data which is arbitrary data. A sockaddr_in is also 16 bytes in size - the first 2 bytes are the sin_family (always AF_INET), the next 2 bytes are the sin_port, the next 4 bytes are the sin_addr (IP address), and the last 8 bytes are the sin_zero which is unused in IPv4 and provided only to ensure 16 bytes. This way, you can look at sockaddr.sa_family first, and if it is AF_INET then interpret the entire sockaddr as a sockaddr_in.
A sockaddr_in is not stored inside of sockaddr.sa_data field. The entire sockaddr is the entire sockaddr_in (when sockaddr.sa_family is AF_INET, that is). If you take a sockaddr* pointer and cast it to a sockaddr_in* pointer, then:
sockaddr.sa_family is sockaddr_in.sin_family
bytes 0-1 of sockaddr.sa_data are sockaddr_in.sin_port
bytes 2-5 are sockaddr_in.sin_addr
bytes 6-13 are sockaddr_in.sin_zero.
Related
I am using the NT native API NtQueryObject()/ZwQueryObject() from user mode (and I am aware of the risks in general and I have written kernel mode drivers for Windows in the past in my professional capacity).
Generally when one uses the typical "query information" function (of which there are a few) the protocol is first to ask with a too small buffer to retrieve the required size with STATUS_INFO_LENGTH_MISMATCH, then allocate a buffer of said size and query again -- this time using the buffer and previously returned size.
In order to get the list of object types (67 on my build) on the system I am doing just that:
ULONG Size = 0;
NTSTATUS Status = NtQueryObject(NULL, ObjectTypesInformation, &Size, sizeof(Size), &Size);
And in Size I get 8280 (WOW64) and 8968 (x64). I then proceed to allocate the buffer with calloc() and query again:
ULONG Size2 = 0;
BYTE* Buf = (BYTE*)::calloc(1, Size);
Status = NtQueryObject(NULL, ObjectTypesInformation, Buf, Size, &Size2);
NB: ObjectTypesInformation is 3. It isn't declared in winternl.h, but Nebbett (as ObjectAllTypesInformation) and others describe it. Since I am not querying for a particular object's traits but the system-wide list of object types, I pass NULL for the object handle.
Curiously on WOW64, i.e. 32-bit, the value in Size2 upon return from the second query is 16 Bytes (= 8296) bigger than the previously returned required size.
As far as alignment is concerned, I'd expect at most 8 Bytes for this sort of thing and indeed neither 8280 nor 8296 are at a 16 Byte alignment boundary, but on an 8 Byte one.
Certainly I can add some slack space on top of the returned required size (e.g. ALIGN_UP to the next 32 Byte alignment boundary), but this seems highly irregular to be honest. And I'd rather want to understand what's going on than to implement a workaround that breaks, because I miss something crucial.
The practical issue for the code is that in Debug configurations it tells me there's a corrupted heap somewhere, upon freeing Buf. Which suggests that NtQueryObject() was indeed writing these extra 16 Bytes beyond the buffer I provided.
Question: Any idea why it is doing that?
As usual for NT native API the sources of information are scarce. The x64 version of the exact same code returns the exact number of bytes required. So my thinking here is that WOW64 is the issue. A somewhat cursory look into wow64.dll with IDA didn't reveal any immediate points for suspicion regarding what goes wrong in translating the results to 32-bit here.
PS: Windows 10 (10.0.19043, ntdll.dll "timestamp" 77755782)
PPS: this may be related: https://wj32.org/wp/2012/11/30/obquerytypeinfo-and-ntqueryobject-buffer-overrun-in-windows-8/ Tested it, by checking that OBJECT_TYPE_INFORMATION::TypeName.Length + sizeof(WCHAR) == OBJECT_TYPE_INFORMATION::TypeName.MaximumLength in all returned items, which was the case.
The only part of ObjectTypesInformation that's public is the first field defined in winternl.h header in the Windows SDK:
typedef struct __PUBLIC_OBJECT_TYPE_INFORMATION {
UNICODE_STRING TypeName;
ULONG Reserved [22]; // reserved for internal use
} PUBLIC_OBJECT_TYPE_INFORMATION, *PPUBLIC_OBJECT_TYPE_INFORMATION;
For x86 this is 96 bytes, and for x64 this is 104 bytes (assuming you have the right packing mode enabled). The difference is the pointer in UNICODE_STRING which changes the alignment in x64.
Any additional memory space should be related to the TypeName buffer.
UNICODE_STRING accounts for 8 bytes of the difference between 8280 and 8296. The function uses the sizeof(ULONG_PTR) for alignment of the returned string plus an extra WCHAR, so that could easily account for the remaining 8 bytes.
AFAIK: The public use of NtQueryObject is supposed to be limited to kernel-mode use which of course means it always matches the OS native bitness (x86 code can't run as kernel in x64 native OS), so it's probably just a quirk of using the NT functions via the WOW64 thunk.
Alright, I think I figured out the issue with the help of WinDbg and a thorough look at wow64.dll using IDA.
NB: the wow64.dll I have has the same build number, but differs slightly in data only (checksum, security directory entry, pieces from version resources). The code is identical, which was to be expected, given deterministic builds and how they affect the PE timestamp.
There's an internal function called whNtQueryObject_SpecialQueryCase (according to PDBs), which covers the ObjectTypesInformation class queries.
For the above wow64.dll I used the following points of interest in WinDbg, from a 32 bit program which calls NtQueryObject(NULL, ObjectTypesInformation, ...) (the program itself is irrelevant, though):
0:000> .load wow64exts
0:000> bp wow64!whNtQueryObject_SpecialQueryCase+B0E0
0:000> bp wow64!whNtQueryObject_SpecialQueryCase+B14E
0:000> bp wow64!whNtQueryObject_SpecialQueryCase+B1A7
0:000> bp wow64!whNtQueryObject_SpecialQueryCase+B24A
0:000> bp wow64!whNtQueryObject_SpecialQueryCase+B252
Explanation of the above points of interest:
+B0E0: computing length required for 64 bit query, based on passed length for 32 bit
+B14E: call to NtQueryObject()
+B1A7: loop body for copying 64 to 32 bit buffer contents, after successful NtQueryObject() call
+B24A: computing written length by subtracting current (last + 1) entry from base buffer address
+B252: downsizing returned (64 bit) required length to 32 bit
The logic of this function in regards to just ObjectTypesInformation is roughly as follows:
Common steps
Take the ObjectInformationLength (32 bit query!) argument and size it up to fit the 64 bit info
Align the retrieved size up to the next 16 byte boundary
If necessary allocate the resulting amount from some PEB::ProcessHeap and store in TLS slot 3; otherwise using this as a scratch space
Call NtQueryObject() passing the buffer and length from the two previous steps
The length passed to NtQueryObject() is the one from step 1, not the one aligned to a 16 byte boundary. There seems to be some sort of header to this scratch space, so perhaps that's where the 16 byte alignment comes from?
Case 1: buffer size too small (here: 4), just querying required length
The up-sized length in this case equals 4, which is too small and consequently NtQueryObject() returns STATUS_INFO_LENGTH_MISMATCH. Required size is reported as 8968.
Down-size from the 64 bit required length to 32 bit and end up 16 bytes too short
Return the status from NtQueryObject() and the down-sized required length form the previous step
Case 2: buffer size supposedly (!) sufficient
Copy OBJECT_TYPES_INFORMATION::NumberOfTypes from queried buffer to 32 bit one
Step to the first entry (OBJECT_TYPE_INFORMATION) of source (64 bit) and target (32 bit) buffer, 8 and 4 byte aligned respectively
For for each entry up to OBJECT_TYPES_INFORMATION::NumberOfTypes:
Copy UNICODE_STRING::Length and UNICODE_STRING::MaximumLength for TypeName member
memcpy() UNICODE_STRING::Length bytes from source to target UNICODE_STRING::Buffer (target entry + sizeof(OBJECT_TYPE_INFORMATION32)
Add terminating zero (WCHAR) past the memcpy'd string
Copy the individual members past the TypeName from 64 to 32 bit struct
Compute pointer of next entry by aligning UNICODE_STRING::MaximumLength up to an 8 byte boundary (i.e. the ULONG_PTR alignment mentioned in the other answer) + sizeof(OBJECT_TYPE_INFORMATION64) (already 8 byte aligned!)
The next target entry (32 bit) gets 4 byte aligned instead
At the end compute required (32 bit) length by subtracting the value we arrived at for the "next" entry (i.e. one past the last) from the base address of the buffer passed by the WOW64 program (32 bit) to NtQueryObject()
In my debugged scenario these were: 0x008ce050 - 0x008cbfe8 = 0x00002068 (= 8296), which is 16 bytes larger than the buffer length we were told during case 1 (8280)!
The issue
That crucial last step differs between merely querying and actually getting the buffer filled. There is no further bounds checking in that loop I described for case 2.
And this means it will just overrun the passed buffer and return a written length bigger than the buffer length passed to it.
Possible solutions and workarounds
I'll have to approach this mathematically after some sleep, the workaround is obviously to top up the required length returned from case 1 in order to avoid the buffer overrun. The easiest method is to use my up_size_from_32bit() from the example below and use that on the returned required size. This way you are allocating enough for the 64 bit buffer, while querying the 32 bit one. This should never overrun during the copy loop.
However, the fix in wow64.dll is a little more involved, I guess. While adding bounds checking to the loop would help avert the overrun, it would mean that the caller would have to query for the required size twice, because the first time around it lies to us.
Which means the query-only case (1) would have to allocate that internal buffer after querying the required length for 64 bit, then get it filled and then walk the entries (just like the copy loop), skipping over the last entry to compute the required length the same as it is now done after the copy loop.
Example program demonstrating the "static" computation by wow64.dll
Build for x64, just the way wow64.dll was!
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#include <cstdio>
typedef struct
{
ULONG JustPretending[24];
} OBJECT_TYPE_INFORMATION32;
typedef struct
{
ULONG JustPretending[26];
} OBJECT_TYPE_INFORMATION64;
constexpr ULONG size_delta_3264 = sizeof(OBJECT_TYPE_INFORMATION64) - sizeof(OBJECT_TYPE_INFORMATION32);
constexpr ULONG down_size_to_32bit(ULONG len)
{
return len - size_delta_3264 * ((len - 4) / sizeof(OBJECT_TYPE_INFORMATION64));
}
constexpr ULONG up_size_from_32bit(ULONG len)
{
return len + size_delta_3264 * ((len - 4) / sizeof(OBJECT_TYPE_INFORMATION32));
}
// Trying to mimic the wdm.h macro
constexpr size_t align_up_by(size_t address, size_t alignment)
{
return (address + (alignment - 1)) & ~(alignment - 1);
}
constexpr auto u32 = 8280UL;
constexpr auto u64 = 8968UL;
constexpr auto from_64 = down_size_to_32bit(u64);
constexpr auto from_32 = up_size_from_32bit(u32);
constexpr auto from_32_16_byte_aligned = (ULONG)align_up_by(from_32, 16);
int wmain()
{
wprintf(L"32 to 64 bit: %u -> %u -(16-byte-align)-> %u\n", u32, from_32, from_32_16_byte_aligned);
wprintf(L"64 to 32 bit: %u -> %u\n", u64, from_64);
return 0;
}
static_assert(sizeof(OBJECT_TYPE_INFORMATION32) == 96, "Size for 64 bit struct does not match.");
static_assert(sizeof(OBJECT_TYPE_INFORMATION64) == 104, "Size for 64 bit struct does not match.");
static_assert(u32 == from_64, "Must match (from 64 to 32 bit)");
static_assert(u64 == from_32, "Must match (from 32 to 64 bit)");
static_assert(from_32_16_byte_aligned % 16 == 0, "16 byte alignment failed");
static_assert(from_32_16_byte_aligned > from_32, "We're aligning up");
This does not mimic the computation that happens in case 2, though.
On a Linux 2.6.32, i'm looking at /proc/net/tcp and wondering what is the unit of tx_queue and rx_queue.
I can't find this information about receive-queue and transmit-queue in https://www.kernel.org/doc/Documentation/networking/proc_net_tcp.txt
Nor in man 5 proc which shows only:
The "tx_queue" and "rx_queue" are the outgoing and incoming data queue
in terms of kernel memory usage.
Is it bytes? or number of buffers? or maybe i missed a great documentation about this?
Thanks
Short answer - These count bytes. By running netperf TCP_RR with different sizes you could see the exact value of 1 count (only 1 packet in the air in a given time). This value would always show the packet size.
More info:
According to this post:
tx_queue:rx_queue
The size of the transmit and receive queues.
This is per socket. For TCP, the values are updated in get_tcp4_sock() function. It is a bit different in 2.6.32 and 4.14, but the idea is the same. According to the socket state the rx_queue value is updated to either sk->sk_ack_backlog or tp->rcv_nxt - tp->copied_seq. The second value might be negative and is fixed in later kernels to 0 if it is. sk_ack_backlog counts the unacked segments, this is a bit strange since this doesn't seem to be in bytes. Probably missing something here.
From tcp.h:
struct tcp_sock {
...
u32 rcv_nxt; /* What we want to receive next */
u32 copied_seq; /* Head of yet unread data */
Both count in bytes, so tp->rcv_nxt - tp->copied_seq is counting the pending bytes in the receive buffer for incoming packets.
tx_queue is set to tp->write_seq - tp->snd_una. Again from tcp.h:
struct tcp_sock {
...
u32 snd_una; /* First byte we want an ack for */
u32 write_seq; /* Tail(+1) of data held in tcp send buffer */
Here is a bit more clear to see the count is in bytes.
For UDP, it is simpler. The values are updated in udp4_format_sock():
static void udp4_format_sock(struct sock *sp, struct seq_file *f,
int bucket)
{
...
seq_printf(f, "%5d: %08X:%04X %08X:%04X"
" %02X %08X:%08X %02X:%08lX %08X %5u %8d %lu %d %pK %d",
bucket, src, srcp, dest, destp, sp->sk_state,
sk_wmem_alloc_get(sp),
sk_rmem_alloc_get(sp),
sk_wmem_alloca_get and sk_rmem_alloc_get return sk_wmem_alloc and sk_rmem_alloc respectively, both are in bytes.
Hope this helps.
I've experienced a smashing stack (= buffer overflow) problem recently when trying to run iperf3. I pinpointed the reason to the getsockname() call (https://github.com/esnet/iperf/blob/master/src/net.c#L463) that makes the kernel copy more data (sizeof(sin_addr)) at the designed address (&sa) than the size of the variable on the stack at that address.
getsockname() redirects the call to getname() (AF_INET family) :
https://github.com/torvalds/linux/blob/master/net/ipv4/af_inet.c#L698
If I believe the manpage (ubuntu) it says:
int getsockname(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
The addrlen argument should be initialized to indicate the amount of space (in bytes) pointed to by addr. On return it contains the actual size of the socket address.
The returned address is truncated if the buffer provided is too small; in this case, addrlen will return a value greater than was supplied to the call.
But in the previous code excerpt, getname() does not care about the addrlen input value and uses the parameter as an output value only.
I had found a link (can't find it anymore) saying that BSD respects the previous manpage excerpt contrary to linux.
Am I missing something? I find it awkward that the documentation would be that much off, I've checked other linux XXX_getname calls and all I saw didn't care about the input length.
Short answer
I believe that the addrlen value is not checked in kernel just to not waste some CPU cycles, because it should always be of known type (e.g. struct sockaddr), therefore it should always has known and fixed size (which is 16 bytes). So kernel just rewrites addrlen to 16, no matter what.
Regarding the issue you are having: I'm not sure why it's happening, but it doesn't actually seem that it's about size mismatch. I'm pretty sure kernel and userspace both have the same size of that structure which should be passed to getsockname() syscall (proof is below). So basically the situation you are describing here:
...that makes the kernel copy more data (sizeof(sin_addr)) at the designed address (&sa) than the size of the variable on the stack at that address
is not the case. I could only imagine how many application would fail if it was true.
Detailed explanation
Userspace side
In iperf sources you have next definition of sockaddr struct (/usr/include/bits/socket.h):
/* Structure describing a generic socket address. */
struct sockaddr
{
__SOCKADDR_COMMON (sa_); /* Common data: address family and length. */
char sa_data[14]; /* Address data. */
};
And __SOCKADDR_COMMON macro defined as follows (/usr/include/bits/sockaddr.h):
/* This macro is used to declare the initial common members
of the data types used for socket addresses, `struct sockaddr',
`struct sockaddr_in', `struct sockaddr_un', etc. */
#define __SOCKADDR_COMMON(sa_prefix) \
sa_family_t sa_prefix##family
And sa_family_t defined as:
/* POSIX.1g specifies this type name for the `sa_family' member. */
typedef unsigned short int sa_family_t;
So basically sizeof(struct sockaddr) is always 16 bytes (= sizeof(char[14]) + sizeof(short)).
Kernel side
In inet_getname() function you see that addrlen param is rewritten by next value:
*uaddr_len = sizeof(*sin);
where sin is:
DECLARE_SOCKADDR(struct sockaddr_in *, sin, uaddr);
So you see that sin has type of struct sockaddr_in *. This structure is defined as follows (include/uapi/linux/in.h):
/* Structure describing an Internet (IP) socket address. */
#define __SOCK_SIZE__ 16 /* sizeof(struct sockaddr) */
struct sockaddr_in {
__kernel_sa_family_t sin_family; /* Address family */
__be16 sin_port; /* Port number */
struct in_addr sin_addr; /* Internet address */
/* Pad to size of `struct sockaddr'. */
unsigned char __pad[__SOCK_SIZE__ - sizeof(short int) -
sizeof(unsigned short int) - sizeof(struct in_addr)];
};
So sin variable is also 16 bytes long.
UPDATE
I'll try to reply to your comment:
if getsockname wants to allocate an ipv6 instead that may be why it overflows the buffer
When calling getsockname() for AF_INET6 socket, kernel will figure (in getsockname() syscall, by sockfd_lookup_light() function) that inet6_getname() should be called to handle your request. In that case, uaddr_len will be assigned with next value:
struct sockaddr_in6 *sin = (struct sockaddr_in6 *)uaddr;
...
*uaddr_len = sizeof(*sin);
So if you are using sockaddr_in6 struct in your user-space program too, the size will be the same. Of course, if your userspace application is passing sockaddr structure to getsockname for AF_INET6 socket, there will be some sort of overflow (because sizeof(struct sockaddr_in6) > sizeof(struct sockaddr)). But I believe it's not the case for iperf3 tool you are using. And if it is -- it's iperf that should be fixed in the first place, and not the kernel.
From this example
void process_packet(u_char *args, const struct pcap_pkthdr *header, const u_char *buffer)
{
int size = header->len;
//Get the IP Header part of this packet , excluding the ethernet header
struct iphdr *iph = (struct iphdr*)(buffer + sizeof(struct ethhdr));
++total;
switch (iph->protocol) //Check the Protocol and do accordingly...
{
case 1: //ICMP Protocol
++icmp;
print_icmp_packet( buffer , size);
break;
case 2: //IGMP Protocol
++igmp;
break;
case 6: //TCP Protocol
++tcp;
print_tcp_packet(buffer , size);
break;
case 17: //UDP Protocol
++udp;
print_udp_packet(buffer , size);
break;
default: //Some Other Protocol like ARP etc.
++others;
break;
}
printf("TCP : %d UDP : %d ICMP : %d IGMP : %d Others : %d Total : %d\r", tcp , udp , icmp , igmp , others , total);
}
variable size is, I guess, the size of the header. How do I get the size of the whole packet?
Also, how do I convert uint32_t IP addresses to human readable IP addresses of the form xxx.xxx.xxx.xxx?
variable size is, I guess, the size of the header.
You have guessed incorrectly.
To quote the pcap man page:
Packets are read with pcap_dispatch() or pcap_loop(), which
process one or more packets, calling a callback routine for each
packet, or with pcap_next() or pcap_next_ex(), which return the
next packet. The callback for pcap_dispatch() and pcap_loop() is
supplied a pointer to a struct pcap_pkthdr, which includes the
following members:
ts a struct timeval containing the time when the packet
was captured
caplen a bpf_u_int32 giving the number of bytes of the
packet that are available from the capture
len a bpf_u_int32 giving the length of the packet, in
bytes (which might be more than the number of bytes
available from the capture, if the length of the
packet is larger than the maximum number of bytes to
capture).
so "len" is the total length of the packet. However, there may not be "len" bytes of data available; if the capture was done with a "snapshot length", for example with tcpdump, dumpcap, or TShark using the -s option, the packet could have been cut short, and "caplen" would indicate how many bytes of data you actually have.
Note, however, that Ethernet packets have a minimum length of 60 bytes (not counting the 4-byte FCS at the end, which you probably won't get in your capture), including the 14-byte Ethernet header; this means that short packets must be padded. 60-14 = 46, so if a host sends, over Ethernet, an IP packet that's less than 46 bytes long, it must pad the Ethernet packet.
This means that the "len" field gives the total length of the Ethernet packet, but if you subtract the l4 bytes of Ethernet header from "len", you won't get the length of the IP packet. To get that, you'll need to look in the IP header at the "total length" field. (Don't assume it'll be less than or equal to the value of "len" - 14 - a machine might have sent an invalid IP packet.)
Also, how do I convert uint32_t IP addresses to human readable IP addresses of the form xxx.xxx.xxx.xxx?
By calling routines such as inet_ntoa(), inet_ntoa_r(), or inet_ntop().
No, header->len is length of this packet, just what you want.
see header file pcap.h
struct pcap_pkthdr {
struct timeval ts; /* time stamp */
bpf_u_int32 caplen; /* length of portion present */
bpf_u_int32 len; /* length this packet (off wire) */
};
you can use sprintf() to convert uint32_t ip field to xxx.xxx.xxx.xxx
So I am trying to dynamically allocate a buffer on module initialization. The buffer needs to be in scope at all times as it stores data that user space programs interact with. So here is my code:
static char* file_data
#define MAX_SIZE 256
.
.
.
{
file_data = kzalloc(MAX_SIZE, GFP_KERNEL)
.
.
.
}
However when I do sizeof file_data it always returns 4. What am I doing wrong?
Edit: The buffer stores input from a user space program, but 4 characters is all that can be stored.
size_t read_file(char* __user buf, size_t count)
{
unsigned int len = 0;
len = copy_to_user(buf, file_data, count);
return count;
}
ssize_t write_file(char* __user buf, size_t count)
{
if(count >= MAX_SIZE)
return -EINVAL;
copy_from_user(file_data, buf,count)
return count;
}
file_data is a pointer. On a 32-bit platform, it's size is 32 bits, or 4 bytes. What you want to know is the size of the data pointed to by file_data. You can't use the sizeof operator for this because sizeof is a compile time operation. You can't use it on things allocated dynamically at run time.
(Besides, you already know the size of the data pointed to by file_data -- it's MAX_SIZE?)
char *file_data is a pointer to a char. Evidently you're on a 32-bit system so any pointer is 4 bytes. The compiler (which handles sizeof) doesn't know or care how much memory you're allocating for file_data to point to, it just knows you're asking for the size of the pointer (which you are, whether you meant to or not). If you want the size of the memory it points to, you'll have to keep track of it yourself.