First packet to be sent when starting to browse - browser

Imagine a user sitting at an Ethernet-connected PC. He has a browser open. He types "www.google.com" in the address bar and hits enter.
Now tell me what the first packet to appear on the Ethernet is.
I found this question here: Interview Questions on Socket Programming and Multi-Threading
As I'm not a networking expert, I'd like to hear the answer (I'd assume it is "It depends" ;) ).
With a tool like Wireshark, I can obviously check my own computers behaviour. I'd like to know whether the packets I see (e.g. ARP, DNS, VRRP) are the same in each ethernet configuration (is it dependent on the OS? the driver? the browser even :)?) and which are the conditions in which they appear. Being on the data-link layer, is it maybe even dependent on the physical network (connected to a hub/switch/router)?

The answers that talk about using ARP to find the DNS server are generally wrong.
In particular, IP address resolution for off-net IP addresses is never done using ARP, and it's not the router's responsibility to answer such an ARP query.
Off-net routing is done by the client machine knowing which IP addresses are on the local subnets to which it is connected. If the requested IP address is not local, then the client machine refers to its routing table to find out which gateway to send the packet to.
Hence in most circumstances the first packet sent out will be an ARP request to find the MAC address of the default gateway, if it's not already in the ARP cache.
Only then can it send the DNS query via the gateway. In this case the packet is sent with the DNS server's IP address in the IP destination field, but with the gateway's MAC address on the ethernet packet.

You can always download wireshark and take a look.
Though to spoil the fun.
Assuming, the IP address of the host is not cached, and the MAC address of the DNS server is not cached, the first thing that will be sent will be a broadcast ARP message trying to find out the MAC address of the DNS server (which the router will respond to with its own address).
Next, the host name will be resolved using DNS. Then the returned IP address will be resolved using ARP (again the router will respond with its own address), and finally, the HTTP message will actually be sent.

Actually, it depends on a variety of initial conditions you left unspecified.
Assuming the PC is running an operating system containing a local DNS caching resolver (mine does), the first thing that happens before any packets are sent is the cache is searched for an IP address. This is complicated, because "www.google.com" isn't a fully-qualified domain name, i.e. it's missing the trailing dot, so the DNS resolver will accept any records already in its cache that match its search domain list first. For example, if your search domain list is "example.com." followed by "yoyodyne.com." then cached resources matching the names "www.google.com.example.com." "www.google.com.yoyodyne.com." and finally "www.google.com." will be used if available. Also note: if the web browser is one of the more popular ones, and the PC is running a reasonably current operating system, and the host has at least one network interface with a global scope IPv6 address assigned (and the host is on a network where www.google.com has AAAA records in its DNS horizon), then the remote address of the server might be IPv6 not IPv4. This will be important later.
If the remote address of the Google web server was locally cached in DNS, and the ARP/ND6 cache contains an entry for the IPv4/IPv6 address (respectively) of a default router, then the first transmitted packet will be a TCP SYN packet sourced from the interface address attached to the router and destined for the cached remote IPv4/IPv6 address. Alternatively, the default router could be reachable over some kind of layer-2 or layer-3 tunnel, in which case, the SYN packet will be appropriately encapsulated.
If the remote address of the Google web server was not locally cached, then the host will first need to query for the A and/or AAAA records in the DNS domain search list in sequence until it gets a positive response. If the first DNS resolving server address in the resolver configuration is in one of the local IPv4 subnet ranges, or in a locally attached IPv6 prefix with the L=1 bit set in the router advertisement, and the ARP/ND6 cache already contains an entry for the address in question, then the first packet the host will send is a direct DNS query for either an A record or a AAAA record matching the first fully-qualified domain name in the domain search list. Alternatively, if the first DNS server is not addressable on-link, and a default router has an ARP/ND6 cache entry already, then the DNS query packet will be sent to the default router to forward to the DNS server.
In the event the local on-link DNS server or a default router (respectively, as the case above may be) has no entry in the ARP/ND6 cache, then the first packet the host will send is either an ARP request or an ICMP6 neighbor solicitation for the corresponding address.
Oh, but wait... it's even more horrible. There are tweaky weird edge cases where the first packet the host sends might be a LLMNR query, an IKE initiation, or... or... or... how much do you really care about all this, buckaroo?

It depends
Got that right. E.g. does the local DNS cache contain the address? If not then a DNS lookup is likely to be the first thing.

If the host name is not in DNS cache nor in hosts file, first packet will go to DNS.
Otherwise, the first packet will be HTTP GET.

Well, whatever you try to do, the first thing happening is some Ethernet protocol related data. Notably, Ethernet adapters have to decide whether the Ethernet bus is available (so there's some collision detection taking place here)
It's hard to answer your question because it depends a lot on the type of ethernet network you're using. More information on Ethernet transmission can be found here and here

Related

Questions about mDns response packets

I have a few questions about the mdns protocol.
1) mdns additional records add additional data about the services and domains specified in the Answers section. Can the Answer section contain more than one service, and each additional record "point" to a differnt service, depending on its offset flag? in other words, when querying an mdns packet, should we assume additional records refer to different answers, if more than one answer exists? is this scenario possible?
2) mdns provide an "A" type which is the IP address of the service. can this IP mismatch the IP address the response packet was sent from? usually, the service response on its behalf, providing its IP address when responding. but the IP address is anyway known to the receiver because this is the source IP. can the mdns responder provide an IP different than the IP it currently using?
thanks!
Can the Answer section contain more than one service, and each additional record "point" to a different service, depending on its offset flag?
Yes, a single response can contain multiple answers for the same name, e.g. multiple services being announced by the same mdns responder, and so can the additional answers.
In other words, when querying an mdns packet, should we assume additional records refer to different answers?
The additional answer records refer to their name, like any record in dns. This might, or might not, be a name that is also used by/in a record in the answer section.
Can the IP from the "A" record mismatch the IP address the response packet was sent from?
Yes. A mdns responder can resolve any host name to any ip address, not just its own name to its own ip address. An example of this might be a proxy with two network interfaces that connects the two networks by responding on the behalf of the hosts in the other. Or simpler, in the context of service discovery, a local host might announce a service that is running elsewhere, not under a .local name, and has an IP address outside the network.

does the echo reply do another DNS lookup?

When I do ping www.google.com the protocol ICMP does a DNS lookup and send a echo request to the google IP. When google responds, another dns lookup happens?
And if I do ping -n www.google.com do I save time but not performing the two dns lookups?
the protocol ICMP does a DNS lookup
This is incorrect. ICMP, like IP deals with addresses. When you use names, applications (that is the ping program) will query the OS to translate the name to an IP.
This mapping may be done in different way and the DNS is only one among others. On Unix systems, see the /etc/nsswitch.conf file that dictates how various things should be resolved, like hostsnames. Typically it is at least a mix of the content of the /etc/hosts file and DNS queries.
So once the ping program, typically using getaddringo OS class has resolved the name to an IP address, then it starts using the ICMP protocol towards that address.
When google responds, another dns lookup happens?
No, why should it? Again, when ICMP happens we are already and now solely in a world of addresses, there are no names there.
do I save time but not performing the two dns lookups?
Yes, probably, but just once (DNS uses caches) and the difference will be marginal and lost normally in many other things.
But the interesting question is why do you have this question, I mean what do you try to attempt going that way? Note that ping (because it uses ICMP) is a poor troubleshooting tool as ICMP traffic is often configured very differently in networks than IP traffic.
(And you should not use www.google.com as an anchor for tests, have a look at RIPE anchors instead for example, but again it depends a lot on what is your true underlying problem here).

IGMPv2 flood source detection

In wireshark I can see Membership Query, general IGMPv2 requests coming over and over from 0.0.0.0 source which suggests ( according to RFC ) machine that hasn't received address yet. My question is how in Linux environment I can find such machine. This query triggers many answers and causes significant network communication slowdown.
When a machine is connected to a network for the first time, it will try to find the DHCP servers in order to get an IP address configuration. Untill then, as you already said, it has no IP address and the only identifier it has is it's MAC address, which is used to keep a comunication alive while it negotiates with the DHCP server (during this period it does not have an IP address until the very last).
Answering your question, you'd find the machine you are looking for making use of the MAC address. If you are on a small network, a manual check (ifconfig) will do it but, if you are on a big one, you better check the ARP table of your switch(es) to have a better idea where it could be.

Accuracy of remote IP address retrieved from packet

If it were possible to retrieve the remote IP from a packet received by my Apache2 server (through a custom plugin perhaps), would it always be guaranteed to be accurate? Or is this value as easy to spoof as the referrer header?
My intended use case is to rate-limit unauthenticated API calls.
If it's a TCP packet, then it'll be accurate as to the sending host. IPs in TCP packets cannot be spoofed unless you've got control of the routers involved. With spoofed source packets, only the initial SYN packet will come back, and then the SYN+ACK response from the server will go to the spoofed address, not wherever the forgery came from - e.g. you cannot do the full 3-way handshake unless you can control packet routing from the targetted machine.
UDP packets, on the other hand, can be trivially forged and you cannot trust anything about them.
As well, even simple things like proxy servers and NAT gateways can cloak the 'real' ip from where the packet originated. You'll get an IP, but it'll be the IP of the gateway/proxy, not the original machine.
It is not reliable. Not only because it can be spoofed, but also because a network element can make your server see a different IP address.
For example, it is very typical in a company to access the Internet through a proxy. Depending on the configuration, from your server point of view, all the different users come from the same IP address.
In any case is a filter you can use in many scenarios. For example, show a captcha when you detect too many login requests from the same IP address.
If your intention is to rate-limit invalid API calls you might want to consider using a service like spamhaus. They list IP's that are likely bots and probes. There are other companies and lists as well. But if your intention is to ever ID the 'bad guy' the source IP is very unlikely to be correct.

If IP addresses can be spoofed so

If IP addresses can be spoofed by creating false or manipulated http headers, and therefore it should not be relied upon in validating the incoming request in our PHP/ASP pages, how come servers take that and rely on it? For example, denying IPs or allowing them are all based on IP.
do servers get the IP information some other ( and more reliable ) way than say PHP/ASP gets it thru server variables?
Servers are typically willing to rely upon the IP address of a connection for low-risk traffic because setting up a TCP session requires a three-way handshake. This handshake can only succeed if the IP address in the packets is routable and some machine is prepared to handle the connection. A rogue router could fake IP addresses but in general, it is more difficult to fake connections the further away from either endpoint the router is, so most people are prepared to rely on it for low-risk uses. (DNS spoofing is far more likely way to misrepresent a connection endpoint, for example.)
Higher-risk users must use something more like TLS, IPsec, or CIPSO (rare) to validate the connection end-point, or build user authentication onto the lower layers to authenticate specific connections (OpenSSH).
But the actual contents of the TCP session can be anything and everything -- and a server should not rely upon the contents of the TCP session (such as HTTP headers) to faithfully report IP addresses or anything else vital.
IP addresses cannot be spoofed. The address is needed for the server to send a reply.
PHP gets the IP address for its $_SERVER global from the server (hence the variable name!), which determines the address from lower in the protocol stack.
EDIT:
sarnold makes a good point that, in principle, one could corrupt routing tables to misdirect traffic. (Indeed, I believe there was an incident of this in a Tier 1 router in Asia a couple years ago.) So I should clarify that my comment that "IP addresses cannot be spoofed" was narrowly tailored to point out that the server variables will always faithfully reflect the destination IP. What goes on beyond the the server's borders is another matter altogether.

Resources