Uploading & extracting archive (zip, rar, targz, tarbz) automatically - security issue?

I'd like to create the following functionality for my web-based application:
the user uploads an archive file (zip, rar, tar.gz, tar.bz, etc.) containing several image files
the archive is automatically extracted after upload
the images are shown in an HTML list (or similar)
Are there any security issues involved in the extraction process? For example, could malicious code contained in the uploaded files (or in a carefully prepared archive file) be executed, or is there anything else to watch for?

Aside from the possibility of exploiting the system through things like buffer overflows in a carelessly implemented extractor, there can be issues if you blindly extract a well-crafted compressed file containing a large file full of redundant patterns (a zip bomb). The compressed version is very small, but when you extract it, it can fill the whole disk, causing denial of service and possibly crashing the system.
Also, if you are not careful, the client might upload a zip file with server-side executable contents (.php, .asp, .aspx, ...) and then request one of those files over HTTP, which, if the server is not configured properly, can result in arbitrary code execution on the server.
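A minimal sketch of capping decompressed size, so that a zip bomb is rejected instead of filling the disk. Node's core library has no zip reader, so this uses a gzip stream (as found in a .tar.gz) to show the principle. The 1 MiB limit and the function name gunzipWithLimit are assumptions made for illustration. The same idea applies per entry with whatever archive library is actually used, since sizes declared in archive headers cannot be trusted.

```javascript
const zlib = require('zlib');

// Hypothetical limit: refuse anything that inflates beyond 1 MiB.
const MAX_OUTPUT_BYTES = 1024 * 1024;

// Returns the decompressed Buffer, or null if the output would exceed the cap.
function gunzipWithLimit(compressed, maxBytes = MAX_OUTPUT_BYTES) {
  try {
    // maxOutputLength makes zlib abort as soon as the output grows past the cap.
    return zlib.gunzipSync(compressed, { maxOutputLength: maxBytes });
  } catch (err) {
    if (err.code === 'ERR_BUFFER_TOO_LARGE') return null;
    throw err; // corrupt data and other failures are reported as usual
  }
}

// 16 MiB of zeros compresses to a tiny payload: a miniature "bomb".
const bomb = zlib.gzipSync(Buffer.alloc(16 * 1024 * 1024));
console.log('compressed size:', bomb.length);
console.log('bomb rejected:', gunzipWithLimit(bomb) === null);

// A small, honest payload passes through unchanged.
const small = zlib.gzipSync(Buffer.from('hello'));
console.log(gunzipWithLimit(small).toString()); // prints "hello"
```

The cap is enforced while inflating rather than after, so the oversized output is never held in memory or written to disk.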

In addition to Medrdad's answer: hosting user-supplied content is a bit tricky. If you are hosting a zip file, it can be used to store Java class files (the zip container is also used for other formats), and therefore the "same origin policy" can be broken. (There was the GIFAR attack, where a zip was appended to the end of another file, but that no longer works with the Java PlugIn/WebStart.) Image files should at the very least be checked to confirm that they actually are image files. There is also the problem of web browsers having buffer overflow vulnerabilities, which means your site could be used to attack your visitors (this may make you unpopular). You may find some client-side software using, say, regexes to parse data, so data in the middle of the image file can end up being executed. Zip files may have naughty file names (for instance, directory traversal with ../ and strange characters).
What to do (not necessarily an exhaustive list):
Host user supplied files on a completely different domain.
The domain with user files should use different IP addresses.
If possible decode and re-encode the data.
There's another Stack Overflow question on zip bombs. I suggest decompressing using ZipInputStream and stopping if the output gets too big.
Where native code touches user data, do it in a chroot gaol.
White list characters or entirely replace file names.
Potentially you could use an IDS of some description to scan for suspicious data (I really don't know how much this gets done - make sure your IDS isn't written in C!).
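The file-name advice above (whitelist characters, guard against ../ traversal) can be sketched roughly as follows. The extraction root /srv/uploads/extracted, the accepted extensions, and both function names are assumptions for illustration only. The sketch deliberately discards any directory part of the entry name and keeps just a whitelisted base name.

```javascript
const path = require('path');

// Hypothetical extraction root; entries must never resolve outside it.
const EXTRACT_ROOT = '/srv/uploads/extracted';

// Returns a whitelisted base name for an archive entry, or null to reject it.
function safeEntryName(entryName) {
  // Keep only the final component, treating both / and \ as separators.
  const base = entryName.split(/[\\/]/).pop();
  // Whitelist: letters, digits, dot, dash, underscore; no leading dot; max 100 chars.
  if (!/^[A-Za-z0-9_-][A-Za-z0-9._-]{0,99}$/.test(base)) return null;
  // Only the image extensions this sketch expects.
  if (!/\.(png|jpe?g|gif)$/i.test(base)) return null;
  return base;
}

// Maps an entry name to an absolute path under the root, or null to reject it.
function resolveInsideRoot(entryName, root = EXTRACT_ROOT) {
  const name = safeEntryName(entryName);
  if (name === null) return null;
  const absRoot = path.resolve(root);
  const target = path.resolve(absRoot, name);
  // Defence in depth: confirm the result is still under the root.
  return target.startsWith(absRoot + path.sep) ? target : null;
}

console.log(safeEntryName('photos/cat.png'));   // prints "cat.png"
console.log(safeEntryName('../../etc/passwd')); // prints null
console.log(safeEntryName('..\\..\\evil.php')); // prints null
```

A matching extension says nothing about the content, so the check that a file really is an image still has to happen separately.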

Related

OverlayFS on a single large file

I would like to solve the following set of constraints:
I want to be able to mount a copy of a large (16 GB) remote file.
If a part of the file is written to by the application, it is written to the local copy and not synced over the network.
If a part of the file is read and it was previously written to by the application, the read is served from the local copy. If it was never written to, it is first copied from the remote to the local copy, then read from the local copy.
Parts of the file that are never read before being written to should never be transmitted over the network (this is the most important constraint).
The file will always be the same size, so there is never ambiguity about what should happen when we read a specific byte from the file.
The reason for these constraints is that the vast majority of a single file will never be read, there are many such files (at least a small portion of each file is read), and network bandwidth is extremely limited.
OverlayFS comes very close to what I want. If I was able to apply overlayfs at the file level instead of the directory level, I would use the (perhaps nfs-mounted) remote file as the lower_file and an empty, sparse file as the upper_file.
Is there something that would allow me to do the above?

Counter file placement and naming convention

OK, this one might be stupid, but I'm losing too much time overthinking a solution.
I have a web app with two different kinds of payment modules.
Each of these modules needs a counter file, incremented each time someone wants to pay, and locked while incrementing to make sure each payment gets a unique payment reference.
The files were placed inside the main directory (public_html) and were overwritten by a bad versioning move.
So I want to move them outside of public_html, where I have already placed the main config file.
But having these critical files at the root of my FTP account sounds stupid and dangerous, so I'll create a directory for them.
This is a lot of text just to ask this:
What would you call this directory?

IMO, your question is not specifically related to PHP; it's a common issue. You can use one of the standard directories for sharing data between applications.
/var
From the Filesystem Hierarchy Standard (FHS):
/var contains variable data files. This includes spool directories and files, administrative and logging data, and transient and temporary files.
Some options:
You can store your files directly in /var.
Alternatively, /var/tmp can hold temporary files for a longer time and is not cleaned on reboot (depending on your system).
Or you can create a custom subdirectory in /var/opt with a name relevant to your application.
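For context, the locked counter described in the question can be sketched as below, in Node.js rather than the asker's own stack. The file names and the temporary directory are placeholders; in practice the directory would be one of the locations listed above, outside the web root. The lock here is a separate lock file created exclusively, which is one common approach and not necessarily what the original modules do.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Placeholder location for the demo; a real deployment would use a
// directory outside the web root.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'counters-'));
const counterFile = path.join(dir, 'payment.counter');
const lockFile = counterFile + '.lock';

// Returns the next unique reference number, or throws EEXIST if another
// process currently holds the lock (the caller should then retry).
function nextReference() {
  // 'wx' creates the file and fails if it already exists, so only one
  // process at a time gets past this line.
  const lock = fs.openSync(lockFile, 'wx');
  try {
    const current = fs.existsSync(counterFile)
      ? parseInt(fs.readFileSync(counterFile, 'utf8'), 10)
      : 0;
    const next = current + 1;
    fs.writeFileSync(counterFile, String(next));
    return next;
  } finally {
    // Always release the lock, even if reading or writing failed.
    fs.closeSync(lock);
    fs.unlinkSync(lockFile);
  }
}

console.log(nextReference()); // prints 1 on a fresh counter
```

A process that crashes while holding the lock leaves the lock file behind, so a real implementation would also need some way to detect and clear stale locks.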

Sending multiple sha1 checksums in a single file?

I'm working on an application that needs to verify a large number of files that have been transferred over the web. Our current thinking is to put the checksums for multiple files into a single file and name it foobar.sha1.
I have some hesitation about using this extension as it seems to mostly be used for communicating a single checksum (as opposed to a large batch). Is this common usage?
Google has not yielded a clear answer.
Thanks
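For reference, the GNU sha1sum tool writes one line per file (a 40-character hex digest, two spaces, then the file name), and sha1sum -c accepts a file containing many such lines. A rough Node.js sketch of building and parsing a manifest in that layout follows. The function names and the in-memory file list are assumptions for illustration, and the sketch does not settle whether .sha1 is the conventional extension for such a file.

```javascript
const crypto = require('crypto');

// Hex SHA-1 digest of a string or Buffer.
function sha1Hex(data) {
  return crypto.createHash('sha1').update(data).digest('hex');
}

// files: array of { name, data } objects (in-memory stand-ins for real files).
// Produces one "<digest>  <name>" line per file, as sha1sum does in text mode.
function buildManifest(files) {
  return files.map(f => `${sha1Hex(f.data)}  ${f.name}`).join('\n') + '\n';
}

// Parses manifest text back into a Map of name -> digest.
// The character after the first space is ' ' (text mode) or '*' (binary mode).
function parseManifest(text) {
  const entries = new Map();
  for (const line of text.split('\n')) {
    const m = /^([0-9a-f]{40}) [ *](.+)$/.exec(line);
    if (m) entries.set(m[2], m[1]);
  }
  return entries;
}

const manifest = buildManifest([
  { name: 'a.txt', data: 'abc' },
  { name: 'empty.bin', data: '' },
]);
process.stdout.write(manifest);
console.log(parseManifest(manifest).size); // prints 2
```

Verification on the receiving side then amounts to recomputing each digest and comparing it with the manifest entry for the same name.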

Node .fs Working with a HUGE Directory

Picture a directory with a ton of files. As a rough gauge of magnitude I think the most that we've seen so far is a couple of million but it could technically go another order higher. Using node, I would like to read files from this directory, process them (upload them, basically), and then move them out of the directory. Pretty simple. New files are constantly being added while the application is running, and my job (like a man on a sinking ship holding a bucket) is to empty this directory as fast as it's being filled.
So what are my options? fs.readdir is not ideal, it loads all of the filenames into memory which becomes a problem at this kind of scale. Especially as new files are being added all the time and so it would require repeated calls. (As an aside for anybody referring to this in the future, there is something being proposed to address this whole issue which may or may not have been realised within your timeline.)
I've looked at the myriad of fs drop-ins (graceful-fs, chokidar, readdirp, etc.), none of which have this particular use case within their remit.
I've also come across a couple of people suggesting that this can be handled with child_process, and there's a wrapper called inotifywait which tasks itself with exactly what I am asking but I really don't understand how this addresses the underlying problem, especially at this scale.
I'm wondering if what I really need to do is find a way to just get the first file (or, realistically, batch of files) from the directory without having the overhead of reading the entire directory structure into memory. Some sort of stream that could be terminated after a certain number of files had been read? I know Go has a parameter for reading the first n files from a directory but I can't find a node equivalent, has anybody here come across one or have any interesting ideas? Left-field solutions more than welcome at this point!

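You can use your operating system's file-listing command and stream the result into Node.js.
For example, on Linux:
var cp = require('child_process');
// spawn streams the output as it arrives; exec buffers it and aborts once maxBuffer is exceeded
// -f asks ls not to sort, so it does not have to read the whole directory before printing
var ls = cp.spawn('ls', ['-f']);
ls.stdout.on('data', function (chunk) {
  // chunk is a Buffer and may end in the middle of a file name
  console.log(chunk.toString());
});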
RunKit: https://runkit.com/aminanadav/57da243180f3bb140059a31d
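Newer Node.js versions (12.12 and later) also provide fs.opendir and fs.opendirSync, which return a directory handle that yields entries a few at a time instead of building the full listing. A minimal sketch of reading only the first batch of files is shown below. The helper name firstEntries, the batch size, and the throwaway demo directory are assumptions for illustration.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Returns up to `limit` regular-file names from dirPath without
// loading the complete directory listing into memory.
function firstEntries(dirPath, limit) {
  const dir = fs.opendirSync(dirPath);
  const names = [];
  try {
    let entry;
    // readSync() returns the next fs.Dirent, or null when the directory is exhausted.
    while (names.length < limit && (entry = dir.readSync()) !== null) {
      if (entry.isFile()) names.push(entry.name);
    }
  } finally {
    dir.closeSync();
  }
  return names;
}

// Demo on a throwaway directory standing in for the real one.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'huge-'));
for (let i = 0; i < 50; i++) {
  fs.writeFileSync(path.join(demoDir, `f${i}.txt`), '');
}
console.log(firstEntries(demoDir, 10).length); // prints 10
```

Entries come back in whatever order the file system returns them, so a batch is not necessarily the oldest files. Each processed batch has to be moved out of the directory before the next call, otherwise the same names may be returned again.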

How to transfer large file from local to remote box with auto-resume and transfer only what has changed?

I tried the following command:
rsync -av --progress --inplace --rsh='ssh' /home/tom/workspace/myapp.war root@172.241.181.124:/home/rtom/uploads
But it seems to transfer the whole file again each time I run the command after making a small change in the app that regenerates myapp.war.
I also want the transfer to resume automatically if the connection is lost. I think this part is working.
The transfer should occur over ssh.
The connection is very slow and can also break, so it is important that only what has changed is transferred. Of course, it must also ensure that the file was transferred correctly.

rsync does handle relatively small changes and partial uploads in a file efficiently. A good deal of effort has gone into the rsync algorithm for exactly this purpose.
The problem is that WAR files are "extended" JAR files, which are essentially ZIP archives and therefore compressed.
A small change in an uncompressed file will change the whole compressed segment that the file belongs to and, most importantly, it can also change that segment's size significantly. That can defeat rsync's ability to detect and handle changes in the final compressed file.
In ZIP archives each uncompressed file has its own compressed segment. Therefore the order in which files are placed in the archive also matters for achieving a degree of similarity to a previous version. Depending on how the WAR file is created, just adding a new file or renaming one can cause segments to move, essentially making the WAR file unrecognisable. In other words:
A small change in your application normally means a rather large change in your WAR file.
rsync is not designed to handle changes in compressed files. However, it can handle changes in your application. One solution would be to use it to upload your application files and then create the WAR file on the remote host.
A slightly different approach - that does not need any development tools on the remote host - would be to unpack (i.e. unzip) the WAR file locally, upload its contents and then pack (i.e. zip) it again on the remote host. This solution only requires a zip or jar implementation on the remote host.
