Do Web browsers use different port numbers for open tabs? - multithreading

I'm wondering how browsers internally works.
Now, connecting to different Web sites using 'Tabs' within the same browser can be handled in one of two ways:
1 - Using threads
2 - Using Different Source Port Numbers for each open tab
I know there might not be a single answer for this question, and it might differ from one browser to another, however all responses are appreciated.
Thanks
Thanks for everyone.I really appreciate that.
My question relates to the Source Port at the client side. I'm asking if the browser uses different source ports for each tab it opens, or the same source port for the entire process 'I mean window that includes different tabs', or the same source port for the entire windows?
Or, do web browsers use threads?

Threads and ports are separate and mostly unrelated concepts.
Threads are what the local computer processor does to handle computations, such as drawing to the screen or waiting for Internet traffic. There's probably a separate thread (and more) for those operations in each tab.
Ports are what the traffic itself is identified by (in TCP and UDP). In order to communicate your browser would open a local port (usually something big like ~5000, and that doesn't matter as long as its unique) and connect to the server on usually port 80 (the one the server is listening on). If your computer didn't know the remote port it couldn't connect, but its standard to use 80 for HTTP, for example.
Browsers open tabs in separate threads (and new ones even in separate processes for security and reliability reasons), and use separate ports on the client side. So yes, the answer is both threads and ports. They always use the same remote port unless you physically specify otherwise (for example, connecting to a website using https:// instead of http:// uses a separate port because that's how that protocol was made). You can specify a port to use in modern browsers with :# after the name, too. (example: http://www.google.com:81/, however that will fail because that's not what port they listen on!)

A quick check using netstat (or sockstat on BSD machines) reveals that different source port numbers are used for different connections. In that regard, you are right.
Firefox uses at least one thread for each tab. Each thread could open multiple connections for different data (for example, loading images from a media server and content from the web server). Each connection should have its own source port.

Depending on the browser it uses different threads or different processes for each tab. The local ports used probably don't have much to do with different tabs.

Copied from Cisco CCNA course:
For instance, assume a host is initiating a web page request from a web server. When the host initiates the web page request, the source port number is dynamically generated by the host to uniquely identify the conversation. Each request generated by a host will use a different dynamically created source port number. This process allows multiple conversations to occur simultaneously.
In the request, the destination port number is what identifies the type of service being requested of the destination web server.. For example, when a client specifies port 80 in the destination port, the server that receives the message knows that web services are being requested.

no! it usually uses port 80 by default unless specified. for example www.someweb.com:8080.
Tabs within browser i am assuming ran on different threads

Related

What port number shall I use?

I am writing an app that is to be running on a Windows PC. I need to create a server socket listening on 127.0.0.1, and another client socket which is to connect with this server socket.
Since the data exchange between the two sockets are within the same machine and there is no client connecting from outside of the machine, what port to use is insignificant, as long as the two sockets use the same port number.
So, how do I decide which port number to use? Shall it be a hard-coded port number such as 49500? What if another unrelated app on this machine happen to use this port number? Or shall I get the list of all used ports and programmatically pick an unused port?
Just want to know what is the best approach. Thanks.
ports within 0 to 1023 are generally controlled and you should assign your socket with higher port numbers, although in that range ports within 1024 and 49151 can be registered for others to be informed about that and not use them.
if you want to avoid conflicts you can see registered ports on your machine and assign a port number to your socket which is empty but ports higher than that (49152 to 65535) are completely free and are not even registered.
generally, it is not common to worry about that. for example, two major applications like VMware and apache web server operate on the same port number (443), and if you want to use VMware workstation and Xampp (which works with apache) you have to simply make one of them listen on another port and its not a big deal. so in my opinion the best practice is to let your users change this via a config file or something similar.
for further information, you can search google. for instance this link might be useful:
https://www.sciencedirect.com/topics/computer-science/registered-port#:~:text=Well%2Dknown%20ports%E2%80%94Ports%20in,1023%20are%20assigned%20and%20controlled.&text=Registered%20ports%E2%80%94Ports%20in%20the,be%20registered%20to%20prevent%20duplication.&text=Dynamic%20ports%E2%80%94Ports%20in%20the,assigned%2C%20controlled%2C%20or%20registered.

How is it possible to identify a website using the process id and port?

I am trying to link a website to a particular opened port. However, using both netstat and lsof, I found that when I open a single website on firefox (say facebook), there are multiple process and ports opened.I don't understand why a single website open multiple processes and ports? Should it not be that one application associates with one single process and port number?
Could anyone please suggest me, how is it possible to identify a website using the process id and port used ?
As mentioned above, I have tried netstat and lsof, which gives multiple process id and port number which makes it hard to uniquely identify the website.
Thank you
If you are the server providing services, the port is manage by you and the service example http is 80, https is 443.
In the other hand if you are the client, your browser will use an available port, that means that the port you use is not going to be the same and is not going to be 80 or 443.
Websites can have content from many sources, images can be on a server, javascript, css can be on another server and the main content must be from the actual web-site.
Hope it helps.

Can TCPv4 source and destination ports conflict with each other? Or do source and destination ports live in their own address spaces?

Let me be more specific about my question with an example: Let's say that I have a slew of little servers that all start up on different ports using TCPv4. These ports are going to be destination ports, of course. Let's further assume that these little servers don't just start up at boot time like a typical server, but rather they churn dynamically based on demand. They start up when needed, and may shut themselves down for a while, and then later start up again.
Now let's say that on this same computer, we also have lots of client processes making requests to server processes on other computers via TCPv4. When a client makes such a request, it is assigned a source port by the OS.
Let's say for the sake of this example that a client processes makes a web request to a RESTful server running on a different computer. Let's also say that the source port assigned by the OS to this request is port 7777.
For this example let's also say that while the above request is still occurring, one of our little servers wants to start up, and it wants to start up on destination port 7777.
My question is will this cause a conflict? I.e., will the server get an error because port 7777 is already in use? Or will everything be fine because these two different kinds of ports live in different address spaces that cannot conflict with each other?
One reason I'm worried about the potential for conflict here is that I've seen web pages that say that "ephemeral source port selection" is typically done in a port number range that begins at a relatively high number. Here is such a web page:
https://www.cymru.com/jtk/misc/ephemeralports.html
A natural assumption for why source ports would begin at high numbers, rather than just starting at 1, is to avoid conflict with the destination ports used by server processes. Though I haven't yet seen anything that explicitly comes out and says that this is the case.
P.S. There is, of course, a potential distinction between what the TCPv4 protocol spec has to say on this issue, and what OSes actually do. E.g., perhaps the protocol is agnostic, but OSes tend to only use a single address space? Or perhaps different OSes treat the issue differently?
Personally, I'm most interested at the moment in what Linux would do.
The TCP specification says that connections are identified by the tuple:
{local addr, local port, remote addr, remote port}
Based on this, there theoretically shouldn't be a conflict between a local port used in an existing connection and trying to bind that same port for a server to listen on, since the listening socket doesn't have a remote address/port (these are represented as wildcards in the spec).
However, most TCP implementations, including the Unix sockets API, are more strict than this. If the local port is already in use in any existing socket, you won't be able to bind it, you'll get the error EADDRINUSE. A special exception is made if the existing sockets are all in TIME_WAIT state and the new socket has the SO_REUSEADDR socket option; this is used to allow a server to restart while the sockets left over from a previous process are still waiting to time out.
For this reason, the port range is generally divided into ranges with different uses. When a socket doesn't bind a local port (either because it just called connect() without calling bind(), or by specifying IPPORT_ANY as the port in bind()), the port is chosen from the ephemeral range, which is usually very high numbered ports. Servers, on the other hand, are expected to bind to low-numbered ports. If network applications follow this convention, you should not run into conflicts.

how to efficiently transfer file between 2 node.js instances?

I'm developing chat application using app.js which is webkit+node.js framework.
So i have node.js plus bridged web browser environment on both sides.
I want to make file transfer feature somewhat similar to Skype one.
So, initial idea is to:
1.connect clients to main server.
2.Each client gets ip of oposite ones.
3.Start socket or websocket server on both clients and connect to each other.
4.Sender reads the file and transmits it to the reciver.
Question are:
1.Im not really sure that one client can "see" the other.
2.file is a binary data, but websockets are made for text messages so i need some kind of coding/decoding stuff. I thought about base 64 but it has 30% of "overhead" information. So i need something more effitient (base 128?).
3.If it is not efficient to use websocket should i use TCP sockets instead? What problems can appear if i decide to use them?
Yeah i know about node2node and BinaryJS, i just dont know should i use them or not. And i really what to do something myself.
OK, with your communication looking like this:
(C->N)<->N<->(N->C)
(...) is installed on one client's machine. N's are node servers, C's are web clients.
This is out of your control. Some file sharing apps send test packets from the central server to clients, to check whether ports are open and NAT rules are configured correctly, etc. Your clients will start their own servers on some port, your master server can potentially create a test connection to these servers to see whether they're started correctly and open to the web, BEFORE telling other clients that they can send files.
Websockets are great for status messages from your servers to the web GUIs and general client-to-client communication. For the actual file transfers, I would use TCP sockets, see the next answer. On the other hand base64 encoding is really not a slow process, play with it and benchmark its performance, then decide with some data to back up your decision.
You could use a combination: websockets from your servers to the web GUIs, but TCP communication between the servers themselves. TCP servers (and streams) aren't hard to set up in Node, I see no disadvantages. It might actually be less complicated than installing node2node on those servers, since TCP is already built-in.

tcp_tw_reuse vs tcp_tw_recycle : Which to use (or both)?

I have a website and application which use a significant number of connections. It normally has about 3,000 connections statically open, and can receive anywhere from 5,000 to 50,000 connection attempts in a few seconds time frame.
I have had the problem of running out of local ports to open new connections due to TIME_WAIT status sockets. Even with tcp_fin_timeout set to a low value (1-5), this seemed to just be causing too much overhead/slowdown, and it would still occasionally be unable to open a new socket.
I've looked at tcp_tw_reuse and tcp_tw_recycle, but I am not sure which of these would be the preferred choice, or if using both of them is an option.
According to Linux documentation, you should use the TCP_TW_REUSE flag to allow reusing sockets in TIME_WAIT state for new connections.
It seems to be a good option when dealing with a web server that have to handle many short TCP connections left in a TIME_WAIT state.
As described here, The TCP_TW_RECYCLE could cause some problems when using load balancers...
EDIT (to add some warnings ;) ):
as mentionned in comment by #raittes, the "problems when using load balancers" is about public-facing servers. When recycle is enabled, the server can't distinguish new incoming connections from different clients behind the same NAT device.
NOTE: net.ipv4.tcp_tw_recycle has been removed from Linux in 4.12 (4396e46187ca tcp: remove tcp_tw_recycle).
SOURCE: https://vincent.bernat.im/en/blog/2014-tcp-time-wait-state-linux
pevik mentioned an interesting blog post going the extra mile in describing all available options at the time.
Modifying kernel options must be seen as a last-resort option, and shall generally be avoided unless you know what you are doing... if that were the case you would not be asking for help over here. Hence, I would advise against doing that.
The most suitable piece of advice I can provide is pointing out the part describing what a network connection is: quadruplets (client address, client port, server address, server port).
If you can make the available ports pool bigger, you will be able to accept more concurrent connections:
Client address & client ports you cannot multiply (out of your control)
Server ports: you can only change by tweaking a kernel parameter: less critical than changing TCP buckets or reuse, if you know how much ports you need to leave available for other processes on your system
Server addresses: adding addresses to your host and balancing traffic on them:
behind L4 systems already sized for your load or directly
resolving your domain name to multiple IP addresses (and hoping the load will be shared across addresses through DNS for instance)
According to the VMWare document, the main difference is TCP_TW_REUSE works only on outbound communications.
TCP_TW_REUSE uses server-side time-stamps to allow the server to use a time-wait socket port number for outbound communications once the time-stamp is larger than the last received packet. The use of these time-stamps allows duplicate packets or delayed packets from the old connection to be discarded safely.
TCP_TW_RECYCLE uses the same server-side time-stamps, however it affects both inbound and outbound connections. This is useful when the server is the first party to initiate connection closure. This allows a new client inbound connection from the source IP to the server. Due to this difference, it causes issues where client devices are behind NAT devices, as multiple devices attempting to contact the server may be unable to establish a connection until the Time-Wait state has aged out in its entirety.

Resources