How do Transfer Protocols work?

How do Transfer Protocols work? - protocols

Hypothetically, lets say that I wanted to study/create (a) transfer protocol such as http, ftp or ptp. How would I go about doing so? What do I need to know about the internet and servers and what do I need to make to be able to send and receive data through my own homemade transfer protocol?

That's a little backwards.
First you have a problem you need to solve that involves multiple machines.
Then you write software to solve it, which requires communication between those machines.
The details of that communication is called a 'protocol'.
Since the protocol is the interface between machines, it's beneficial if it is generic enough to let you swap out the software on one side or the other.
In this way, HTTP was invented to serve web pages to browsers, FTP was invented to let users transfer files, etc. The details of the protocol indicate the elements of communication required to solve the problem in the desired way.

Related

Transfer protocol for sending user uploaded files to a remote server?

I'm used to working with user-uploaded files to the same server, and transferring my own files to a remote server. But not transferring user-uploaded files to a remote server.
I'm looking for the best (industry) practice for selecting a transfer protocol in this regard.
My application is running Django on a Linux Server and the files live on a Windows Server.
Does it not matter which protocol I choose as long as it's secure (FTPS, SFTP, HTTPS)? Or is one better than the other in terms of performance/security specifically in regards to user-uploaded files?
Please do not link to questions that explain the differences of protocols, I am asking specifically in the context of user-uploaded files.

As long as you choose a standard protocol that provides (mutual) authentication, encryption and message authentication, there is not much difference security-wise. If all of this is provided by a layer of TLS in your chosen protocol (like in all of your examples), you can't make a big mistake on a design level (but implementation is key, many security bugs are bugs of implementation, and not design flaws). Such protocols might differ in the supported list of algorithms for different purposes though.
Performance-wise there can be quite significant differences, it depends on what you want to optimize for. If you choose HTTPS, you won't be able to keep a connection open for a long time, and would most probably have to bear the overhead of the whole connection setup with authentication and everything, for every transmitted file. (Well, you can actually keep a https connection open, but that would be quite a custom implementation for such file uploads.) Choosing FTPS/SFTP you will be able to keep a connection open and transmit as many files as you want, but would probably have to have more complex error handling logic (sometimes connections terminate without the underlying sockets knowing about it for a while and so on). So in short I think HTTPS would be more resilient, but secure FTP would be more performant for many small files.
It's also an architecture question, by using HTTPS, you would be able to implement all of this in your application code, while something like FTP would mean dependence on external components, which might be important from an operational point of view (think about how this will actually be deployed and whether there is already a devops function to manage proper operations).
Ultimately it's just a design decision you have to make, the above is just a few things that came to mind without knowing all the circumstances, and not at all a comprehensive list of things to consider.

What security risks are posed by using a local server to provide a browser-based gui for a program?

I am building a relatively simple program to gather and sort data input by the user. I would like to use a local server running through a web browser for two reasons:
HTML forms are a simple and effective means for gathering the input I'll need.
I want to be able to run the program off-line and without having to manage the security risks involved with accessing a remote server.
Edit: To clarify, I mean that the application should be accessible only from the local network and not from the Internet.
As I've been seeking out information on the issue, I've encountered one or two remarks suggesting that local servers have their own security risks, but I'm not clear on the nature or severity of those risks.
(In case it is relevant, I will be using SWI-Prolog for handling the data manipulation. I also plan on using the SWI-Prolog HTTP package for the server, but I am willing to reconsider this choice if it turns out to be a bad idea.)
I have two questions:
What security risks does one need to be aware of when using a local server for this purpose? (Note: In my case, the program will likely deal with some very sensitive information, so I don't have room for any laxity on this issue).
How does one go about mitigating these risks? (Or, where I should look to learn how to address this issue?)
I'm very grateful for any and all help!

There are security risks with any solution. You can use tools proven by years and one day be hacked (from my own experience). And you can pay a lot for security solution and never be hacked. So, you need always compare efforts with impact.
Basically, you need protect 4 "doors" in your case:
1. Authorization (password interception or, for example improper, usage of cookies)
2. http protocol
3. Application input
4. Other ways to access your database (not using http, for example, by ssh port with weak password, taking your computer or hard disk etc. In some cases you need properly encrypt the volume)
1 and 4 are not specific for Prolog but 4 is only one which has some specific in a case of local servers.
Protect http protocol level means do not allow requests which can take control over your swi-prolog server. For this purpose I recommend install some reverse-proxy like nginx which can prevent attacks on this level including some type of DoS. So, browser will contact nginx and nginx will redirect request to your server if it is a correct http request. You can use any other server instead of nginx if it has similar features.
You need install proper ssl key and allow ssl (https) in your reverse proxy server. It should be not in your swi-prolog server. Https will encrypt all information and will communicate with swi-prolog by http.
Think about authorization. There are methods which can be broken very easily. You need study this topic, there are lot of information. I think it is most important part.
Application input problem - the famose example is "sql injection". Study examples. All good web frameworks have "entry" procedures to clean all possible injections. Take an existing code and rewrite it with prolog.
Also, test all input fields with very long string, different charsets etc.
You can see, the security is not so easy, but you can select appropriate efforts considering with the impact of hacking.
Also, think about possible attacker. If somebody is very interested particulary to get your information all mentioned methods are good. But it can be a rare case. Most often hackers just scan internet and try apply known hacks to all found servers. In this case your best friend should be Honey-Pots and prolog itself, because the probability of hacker interest to swi-prolog internals is extremely low. (Hacker need to study well the server code to find a door).
So I think you will found adequate methods to protect all sensitive data.
But please, never use passwords with combinations of dictionary words and the same password more then for one purpose, it is the most important rule of security. For the same reason you shouldn't give access for your users to all information, but protection should be on the app level design.
The cases specific to a local server are a good firewall, proper network setup and encription of hard drive partition if your local server can be stolen by "hacker".
But if you mean the application should be accessible only from your local network and not from Internet you need much less efforts, mainly you need check your router/firewall setup and the 4th door in my list.
In a case you have a very limited number of known users you can just propose them to use VPN and not protect your server as in the case of "global" access.

I'd point out that my post was about a security issue with using port forwarding in apache
to access a prolog server.
And I do know of a successful prolog injection DOS attack on a SWI-Prolog http framework based website. I don't believe the website's author wants the details made public, but the possibility is certainly real.
Obviously this attack vector is only possible if the site evaluates Turing complete code (or code which it can't prove will terminate).
A simple security precaution is to check the Request object and reject requests from anything but localhost.
I'd point out that the pldoc server only responds by default on localhost.
- Anne Ogborn

I think SWI_Prolog http package is an excellent choice. Jan Wielemaker put much effort in making it secure and scalable.
I don't think you need to worry about SQL injection, indeed would be strange to rely on SQL when you have Prolog power at your fingers...
Of course, you need to properly manage the http access in your server...
Just this morning there has been an interesting post in SWI-Prolog mailing list, about this topic: Anne Ogborn shares her experience...

flow-based traffic classification for traffic shaping

I’m wondering if there are ways to achieve flow-based traffic shaping with linux.
Traditional traffic shaping approaches seem be based on creating classes for specific protocols or types of packets (such as ssh, http, SYN or ACK) that need high troughput.
Here I want to see every TCP connection as a flow characterized by a certain data-rate.
There’ll be
quick flows such as interactive ssh or IRC chat and
slow flows (bulk data) such as scp or http file transfers
Now I’m looking for a way to characterize / classify an incoming packet to one of these classes, so I can run a tc based traffic shaper on it. Any hints?

Since you mention a dedicated machine I'll assume that you are managing from a network bridge and, as such, have access to the entirety of the packet for the lifetime it is in your system.
First and foremost: throttling at the receiving side of a connection is meaningless when you are speaking of link saturation. By the time you see the packet it has already consumed resources. This is true even if you are a bridge; you can only realistically do anything intelligent on the egress interface.
I don't think you will find an off-the-shelf product that is going to do exactly what you want. You are going to have to modify something like dummynet to be dynamic according to rules you derive during execution or you are going to have to program a dynamic software router using some existing infrastructure. One I am familiar with is Click modular router, but there are others. I really dont know how things like tc and ipfw will react to being configured/reconfigured with high frequency - I suspect poorly.
There are things that you should address ahead of time, however. Things that are going to make this task difficult regardless of the implementation. For instance,
How do you plan on differentiating between scp bulk and ssh interactive behavior? Will you monitor initial behavior and apply a rule based on that?
You mention HTTP-specific throttling; this implies DPI. Will you be able to support that on this bridge/router? How many classes of application traffic will you support?
How do you plan on handling contention? (you allot for 'bulk' flows to each get 30% of the capacity but get 10 'bulk' flows trying to consume)
Will you hard-code the link capacity or measure it? Is it fixed or will it vary?
In general, you can get a fairly rough idea of 'flow' by just hashing the networking 5-tuple. Once you start dealing with applications semantics, however, all bets are off and you need to plow through packet contents to get what you want.
If you had a more specific purpose it might render some of these points moot.

Chat program without a central server

I'm developing a chat application (in VB.Net). It will be a "secure" chat program. All traffic will be encrypted (I also need to find the best approach for this, but that's not the question for now).
Currently the program works. I have a server application and a client application. However I want to setup the application so that it doesn't need a central server for it to work.
What approach can I take to decentralize the network?
I think I need to develop the clients in a way so that they do also act as a server.
How would the clients know what server it needs to connect with / what happens if a server is down? How would the clients / servers now what other nodes there are in the network without having a central server?
At best I don't want the clients to know what the IP addresses are of the different nodes, however I don't think this would be possible without having a central server.
As stated the application will be written in VB.Net, but I think the language doesn't really matter at this point.
Just want to know the different approaches I can follow.

Look for example at the paper of the Kademlia protocol (you can find it here). If you just want a quick overview, look at the Wikipedia page http://en.wikipedia.org/wiki/Kademlia. The Kademlia protocol defines a way of node lookups in a network in a decentral way. It has been successfully applied in the eMule software - so it is tested to really work.
It should cause no serious problems to apply it to your chat software.

You need some known IP address for clients to initially get into a network. Once a client is part of a network, things can be more decentralized, but that first step needs something.
There are basically only two options - either the user provides one (for an existing node of the network - essentially how BitTorrent trackers work), or you hard-code in a gateway node (which is effectively a central server).

Maybe you can see uChat program. It's a program from uTorrent creator with chat without server in mind.
The idea is connect to a swarm from a magnetlink and use it to send an receive messages. This is as Amber answer, you need an access point, may it be a server, a know swarm, manual ip, etc.
Here is uChat presentation: http://blog.bittorrent.com/2011/06/30/uchat-we-just-need-each-other/

What protocol should I use for fast command/response interactions?

I need to set up a protocol for fast command/response interactions. My instinct tells me to just knock together a simple protocol with CRLF separated ascii strings like how SMTP or POP3 works, and tunnel it through SSH/SSL if I need it to be secured.
While I could just do this, I'd prefer to build on an existing technology so people could use a friendly library rather than the socket library interface the OS gives them.
I need...
Commands and responses passing structured data back and forth. (XML, S expressions, don't care.)
The ability for the server to make unscheduled notifications to the client without being polled.
Any ideas please?

If you just want request/reply, HTTP is very simple. It's already a request/response protocol. The client and server side are widely implemented in most languages. Scaling it up is well understood.
The easiest way to use it is to send commands to the server as POST requests and for the server to send back the reply in the body of the response. You could also extend HTTP with your own verbs, but that would make it more work to take advantage of caching proxies and other infrastructure that understands HTTP.
If you want async notifications, then look at pub/sub protocols (Spread, XMPP, AMQP, JMS implementations or commercial pub/sub message brokers like TibcoRV, Tibco EMS or Websphere MQ). The protocol or implementation to pick depends on the reliability, latency and throughput needs of the system you're building. For example, is it ok for notifications to be dropped when the network is congested? What happens to notifications when a client is off-line -- do they get discarded or queued up for when the client reconnects.

AMQP sounds promising. Alternatively, I think XMPP supports much of what you want, though with quite a bit of overhead.
That said, depending on what you're trying to accomplish, a simple ad hoc protocol might be easier.

How about something like SNMP? I'm not sure if it fits exactly with the model your app uses, but it supports both async notify and pull (i.e., TRAP and GET).

That's a great question with a huge number of variables to consider, and the question only mentioned a few them: packet format, asynchronous vs. synchronized messaging, and security. There are many, many others one could think about. I suggest going through a description of the 7-layer protocol stack (OSI/ISO) and asking yourself what you need at those layers, and whether you want to build that layer or get it from somewhere else. (You seem mostly interested in layer 6 and 7, but also mentioned bits of lower layers.)
Think also about whether this is in a safety-critical application or part of a system with formal V&V. Really good, trustworthy communication systems are not easy to design; also an "underpowered" protocol can put a lot of coding burden on application to do error-recovery.
Finally, I would suggest looking at how other applications similar to yours do the job (check open source, read books, etc.) Also useful is the U.S. Patent Office database, etc; one can get great ideas just from reading the description of the communication problem they were trying to solve.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string