Are /../ and /./ the only file system symbolic links? - security

I want to check that a file system path is valid and safe to use relative to another path. So I want to know if there are any other special characters like /../ and /./ which might cause a path to actually point somewhere else.
If that is all I have to worry about then a quick replace of those chars followed by something like this to check for any other bad filesystem chars should work right?
[^a-z0-9\.\-_]
(On windows stuff like C:\ would also have to be allowed)
The use case is that I have a folder which site administrators can create directories in and I want to FORCE them to only create directories in that folder. In other words, no being sneaky with ...path/uploads/../../../var/otherfolder/ if you know what I mean ;)

Which language are you using?
In PHP, for example, you can get the realpath of any string and then compare it to a base directory. If your base directory (including its trailing separator) is a prefix of the realpath, then you're good to go.
Although that's only for PHP, you should be able to find a similar approach in other languages.
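As an illustration of the same idea in another language, here is a minimal Python sketch. The function name is_inside is made up for this example; os.path.realpath and os.path.commonpath are standard library calls. It assumes the base directory itself is trusted and that the candidate is meant to be interpreted relative to it.

```python
import os

def is_inside(base_dir, candidate):
    """Return True if candidate resolves to base_dir or to something below it."""
    base = os.path.realpath(base_dir)
    # Joining first means an absolute candidate replaces the base entirely,
    # which the comparison below then rejects.
    target = os.path.realpath(os.path.join(base, candidate))
    try:
        # commonpath compares whole components, so "/srv/uploads-evil"
        # is not mistaken for a child of "/srv/uploads".
        return os.path.commonpath([base, target]) == base
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False
```

Comparing whole path components rather than raw string prefixes avoids accepting a sibling directory whose name merely starts with the base directory's name.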

There are several oddities on Windows/DOS. Opening any of these names will read from or write to unexpected places. I haven't tried how .NET handles these, but I presume that you would get some kind of security exception.
CON Console. Reads from keyboard, writes to screen.
"COPY CON temp.txt", end input with ctrl-z.
PRN Printer. (Defaults to LPT1?)
LPTn Parallel ports.
AUX "Auxiliary device." Have never seen anyone use this myself.
COMn Serial ports.
NUL /dev/null

When resolving paths, . and .. (and in most cases // on Unix and \\ on Windows) are the main things you need to worry about. RFC 3986 gives the following algorithm for resolving relative paths in URIs; for the most part, it also applies to file system paths.
An algorithm, remove_dot_segments:
1. The input buffer is initialized with the now-appended path components and the output buffer is initialized to the empty string.
2. While the input buffer is not empty, loop as follows:
   A. If the input buffer begins with a prefix of "../" or "./", then remove that prefix from the input buffer; otherwise,
   B. If the input buffer begins with a prefix of "/./" or "/.", where "." is a complete path segment, then replace that prefix with "/" in the input buffer; otherwise,
   C. If the input buffer begins with a prefix of "/../" or "/..", where ".." is a complete path segment, then replace that prefix with "/" in the input buffer and remove the last segment and its preceding "/" (if any) from the output buffer; otherwise,
   D. If the input buffer consists only of "." or "..", then remove that from the input buffer; otherwise,
   E. Move the first path segment in the input buffer to the end of the output buffer, including the initial "/" character (if any) and any subsequent characters up to, but not including, the next "/" character or the end of the input buffer.
3. Finally, the output buffer is returned as the result of remove_dot_segments.
Example run:
STEP   OUTPUT BUFFER         INPUT BUFFER
 1 :                         /a/b/c/./../../g
 2E:   /a                    /b/c/./../../g
 2E:   /a/b                  /c/./../../g
 2E:   /a/b/c                /./../../g
 2B:   /a/b/c                /../../g
 2C:   /a/b                  /../g
 2C:   /a                    /g
 2E:   /a/g
STEP   OUTPUT BUFFER         INPUT BUFFER
 1 :                         mid/content=5/../6
 2E:   mid                   /content=5/../6
 2E:   mid/content=5         /../6
 2C:   mid                   /6
 2E:   mid/6
Don't forget that it's possible to do things like specify more ".." segments than there are parent directories. So if you're trying to resolve a path, you could end up trying to resolve beyond /, or in the case of Windows, C:\.
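For anyone who wants to experiment with it, here is a rough Python transcription of the remove_dot_segments steps quoted above. It is only a sketch: it works on plain strings, knows nothing about symbolic links or Windows drive letters, and keeping the output buffer as a list of segments is my own choice rather than anything the RFC prescribes.

```python
def remove_dot_segments(path):
    """Apply the RFC 3986 dot-segment removal loop to a path string."""
    inp = path
    out = []  # each entry is one segment, including its leading "/" if any
    while inp:
        if inp.startswith("../"):          # step 2A
            inp = inp[3:]
        elif inp.startswith("./"):         # step 2A
            inp = inp[2:]
        elif inp.startswith("/./"):        # step 2B: "/./x" becomes "/x"
            inp = inp[2:]
        elif inp == "/.":                  # step 2B
            inp = "/"
        elif inp.startswith("/../"):       # step 2C: "/../x" becomes "/x"
            inp = inp[3:]
            if out:
                out.pop()
        elif inp == "/..":                 # step 2C
            inp = "/"
            if out:
                out.pop()
        elif inp in (".", ".."):           # step 2D
            inp = ""
        else:                              # step 2E: move one segment across
            end = inp.find("/", 1)
            if end == -1:
                end = len(inp)
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)
```

With the two inputs from the example run this returns "/a/g" and "mid/6". Extra ".." segments are simply dropped once the output buffer is empty, so "/../../x" comes back as "/x" instead of climbing above the root.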

The answer depends on the filesystem used. It's different on Windows, different on *nix.
For example, on Windows-based desktop platforms, invalid path characters might include quote ("), less than (<), greater than (>), pipe (|), backspace (\b), null (\0), and Unicode characters 16 through 18 and 20 through 25.
I don't know which platform/language you are using, but if you are using .NET you can get the list of characters that cannot appear in a file name by calling Path.GetInvalidFileNameChars, and the list of characters that cannot appear in a path by calling Path.GetInvalidPathChars.

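Unix symbolic links can be tricky, and can even be created to cause pathing loops on some systems. You should stat() each filename to get the actual inode and device numbers and see whether two pathnames are actually the same file; lstat() reports on the link itself rather than its target, so it is the call to use for detecting that a name is a symlink in the first place.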
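A minimal Python sketch of the inode/device comparison, assuming a Unix-like system where both paths exist. The helper name same_file is made up here; the standard library also ships os.path.samefile, which does a comparable check. Note that os.stat follows symbolic links, while os.lstat describes the link itself.

```python
import os

def same_file(path_a, path_b):
    """Return True if both paths refer to the same underlying file."""
    a = os.stat(path_a)  # follows symlinks to the final target
    b = os.stat(path_b)
    # A file is identified by the pair (device number, inode number).
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```

A symlink and its target compare equal with this check even though their pathnames look unrelated.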

Have you considered using something like chroot? You can create something called a "chroot jail" that will prevent people from getting outside it. This is enforced by the OS, so you don't have to write it yourself. Note that this only works on *nix, and on some variants of *nix, it does not have all the security features necessary to make it foolproof (i.e. there are known ways of escaping).

I've already directly answered the question, but as Tom said, what you're trying to do is inherently dangerous. What you should probably do instead is create one directory at a time. Pass it through a regexp validator and don't let them use dot segments at all. Just have a text field in a form for the directory name and a "Make Directory" button. Let them traverse the directory tree to create sub-directories. This way you can be absolutely confident that the files are going where they should.
This has the advantage of working on both Windows and *nix without the need for chroot.
Addenda:
This Regexp will only match illegitimate directory names, assuming that you're accepting directories one at a time:
/^(\.\.?|.*?[^a-zA-Z0-9\. _-]+.*?|^)$/
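If you want to try the pattern out, this is roughly how it could be wired up in Python. The pattern is copied from the line above with the surrounding slashes removed; the function name is_valid_dirname is just something I picked for the sketch.

```python
import re

# Matches names that should be rejected: "." and "..", the empty string,
# and anything containing a character outside letters, digits, dot,
# space, underscore and hyphen.
INVALID_DIRNAME = re.compile(r"^(\.\.?|.*?[^a-zA-Z0-9\. _-]+.*?|^)$")

def is_valid_dirname(name):
    """Return True if a single directory name passes the whitelist."""
    return INVALID_DIRNAME.match(name) is None
```

Because the pattern describes the bad names, the check is inverted: a name is accepted only when the pattern does not match.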
Valid directory names:
"This is a directory"
".hidden"
"example.com"
"10-28-2009"
Invalid directory names:
""
"."
".."
"../somewhere/else"
"/etc/passwd"
"would:be?rejected!by;OS"


Rust reserved names for modules [duplicate]

I'm not asking about general syntactic rules for file names. I mean gotchas that jump out of nowhere and bite you. For example, trying to name a file "COM<n>" on Windows?
From: http://www.grouplogic.com/knowledge/index.cfm/fuseaction/view_Info/docID/111.
The following characters are invalid as file or folder names on Windows using NTFS: / ? < > \ : * | " and any character you can type with the Ctrl key.
In addition to the above illegal characters the caret ^ is also not permitted under Windows Operating Systems using the FAT file system.
Under Windows using the FAT file system file and folder names may be up to 255 characters long.
Under Windows using the NTFS file system file and folder names may be up to 256 characters long.
Under Windows the length of a full path under both systems is 260 characters.
In addition to these characters, the following conventions are also illegal:
Placing a space at the end of the name
Placing a period at the end of the name
The following file names are also reserved under Windows:
aux,
com1,
com2,
...
com9,
lpt1,
lpt2,
...
lpt9,
con,
nul,
prn
Full description of legal and illegal filenames on Windows: http://msdn.microsoft.com/en-us/library/aa365247.aspx
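To make the rules above concrete, here is a small Python sketch that flags names likely to cause trouble on Windows. It encodes only the rules listed in this answer (the invalid characters, trailing space or period, and the reserved device names); treating "any character you can type with the Ctrl key" as code points below 32 is my interpretation, and the function name is hypothetical. Reserved names are also flagged when they carry an extension, since names like NUL.TXT are reported to cause problems too.

```python
RESERVED_NAMES = (
    {"con", "prn", "aux", "nul"}
    | {"com%d" % n for n in range(1, 10)}
    | {"lpt%d" % n for n in range(1, 10)}
)
INVALID_CHARS = set('/?<>\\:*|"')

def is_problematic_windows_name(name):
    """Return True if a single file or folder name breaks the rules above."""
    if not name:
        return True
    if name[-1] in " .":
        return True  # trailing space or period
    if any(ch in INVALID_CHARS or ord(ch) < 32 for ch in name):
        return True  # listed punctuation or a control character
    stem = name.split(".", 1)[0]  # part before the first dot
    return stem.lower() in RESERVED_NAMES
```

This is a pre-check only; the operating system remains the final authority on what it will accept.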
A tricky Unix gotcha if you don't know about it:
Files which start with - or -- are legal but a pain in the butt to work with, as many command line tools think you are providing options to them.
Many of those tools have a special marker "--" to signal the end of the options:
gzip -9vf -- -mydashedfilename
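The same convention applies when a program builds the command line itself. A tiny Python sketch (build_argv is a made-up helper) that always places "--" between the options and the file names, so a name starting with a dash cannot be taken for an option by tools that honor the marker:

```python
def build_argv(command, options, filenames):
    """Build an argument list with "--" separating options from file names."""
    return [command, *options, "--", *filenames]
```

For example, build_argv("gzip", ["-9vf"], ["-mydashedfilename"]) yields the same arguments as the gzip command shown above, ready to hand to something like subprocess.run.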
As others have said, device names like COM1 are not possible as filenames under Windows because they are reserved devices.
However, there is an escape method to create and access files with these reserved names, for example, this command will redirect the output of the ver command into a file called COM1:
ver > "\\?\C:\Users\username\COM1"
Now you will have a file called COM1 that 99% of programs won't be able to open, and will probably freeze if you try to access.
Here's the Microsoft article that explains how this "file namespace" works. Basically it tells Windows not to do any string processing on the text and to pass it straight through to the filesystem. This trick can also be used to work with paths longer than 260 characters.
The boost::filesystem Portability Guide has a lot of good info.
Well, for MSDOS/Windows, NUL, PRN, LPT<n> and CON. They even cause problems if used with an extension: "NUL.TXT"
Unless you're touching special directories, the only illegal names on Linux are '.' and '..'. Any other name is possible, although accessing some of them from the shell requires using escape sequences.
EDIT: As Vinko Vrsalovic said, files starting with '-' and '--' are a pain from the shell, since those character sequences are interpreted by the application, not the shell.

Is there an alternative for the slash in a path?

I have an application which correctly escapes slashes ("/") in file names to avoid path traversal attacks.
The secret file has this path:
/tmp/secret.txt
I want to access this file by uploading a file with a specially crafted file name (something like \/tmp\/secret.txt)
Is there any alternative syntax without the slashes which I can use so that Linux will read this file?
(I'm aware of URL encoding but as the escaping is done in the backend this has no use for me.)
No. The / is not allowed in a filename, no matter if it's escaped as \/ or not.
It is one out of only two characters that are not allowed in filenames, the other being \0.
This means that you obviously could use _tmp_secret.txt or -tmp-secret.txt, or replace the / in the path with any other character that you wish, to create a filename with a path "encoded into it". But in doing so, you cannot encode pathnames that include the chosen delimiter character in one or more of their path components and expect to decode the result back into the original pathname.
This is, by the way, how OpenBSD's ports system encodes filenames for patches to software. In (for example) /usr/ports/shells/fish/patches we find files with names like
patch-share_tools_create_manpage_completions_py
which comes from the pathname of a particular file in the fish shell source distribution (probably share/tools/create_manpage_completions.py). These pathnames are however never parsed, and the encoding is only there to create unique and somewhat intelligible filenames for the patches themselves. The real paths are included in the patch files.
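A short Python sketch of why such an encoding cannot be decoded reliably. The two helpers are hypothetical and only replace the slash; they are not meant to reproduce the exact scheme the OpenBSD ports tree uses.

```python
def encode_path(path, delimiter="_"):
    """Flatten a pathname into a single file name."""
    return path.replace("/", delimiter)

def decode_path(name, delimiter="_"):
    """Naive inverse: turn every delimiter back into a slash."""
    return name.replace(delimiter, "/")

original = "share/tools/create_manpage_completions.py"
encoded = encode_path(original)   # "share_tools_create_manpage_completions.py"
decoded = decode_path(encoded)    # "share/tools/create/manpage/completions.py"
```

The underscores that were already part of the file name come back as slashes, so the round trip does not give back the original path.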

How to rename a folder that contains smart quotes

I have a folder that was created automatically. The user unintentionally provided smart (curly) quotes as part of the name, and the process that sanitizes the inputs did not catch these. As a result, the folder name contains the smart quotes. For example:
this-is-my-folder’s-name-“Bob”
I'm now trying to rename/remove said folder on the command line, and none of the standard tricks for dealing with files/folders with special characters (enclosing in quotes, escaping the characters, trying to rename it by inode, etc.) are working. All result in:
mv: cannot move this-is-my-folder’s-name-“Bob” to this-is-my-folders-name-BOB: No such file or directory
Can anyone provide some advice as to how I can achieve this?
To get the name in a format you can copy-and-paste into your shell:
printf '%q\n' this*
...will print out the filename in a manner the shell will accept as valid input. This might look something like:
$'this-is-my-folder\342\200\231s-name-\342\200\234Bob\342\200\235'
...which you can then use as an argument to mv:
mv $'this-is-my-folder\342\200\231s-name-\342\200\234Bob\342\200\235' this-is-my-folders-name-BOB
Incidentally, if your operating system works the same way mine does (when running the test above), this would explain why using single-character globs such as ? for those characters didn't work: They're actually more than one byte long each!
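To see the individual bytes for yourself, here is a small Python sketch. It assumes the folder name is stored as UTF-8, which is the usual case on Linux; octal_escape is a made-up helper that prints non-ASCII bytes as backslash-octal escapes.

```python
name = "this-is-my-folder\u2019s-name-\u201cBob\u201d"
raw = name.encode("utf-8")

def octal_escape(data):
    """Render bytes with printable ASCII as-is and everything else as \\ooo."""
    return "".join(
        chr(b) if 32 <= b < 127 else "\\%03o" % b
        for b in data
    )

print(len("\u2019".encode("utf-8")))  # 3: one curly apostrophe is three bytes
print(octal_escape(raw))
```

Each curly quote occupies three bytes in UTF-8, beginning with \342\200.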
You can use shell globbing token ? to match any single character, so matching the smart quotes using ? should do:
mv this-is-my-folder?s-name-?Bob? new_name
Here replacing the smart quotes with ? to match the file name.
There are several possibilities.
If an initial substring of the file name ending before the first quote is unique within the directory, then you can use filename completion to help you type an appropriate command. Type "mv" (without the quotes) and the unique initial substring, then press the TAB key to request filename completion. Bash will complete the filename with the correct characters, correctly escaped.
Use a graphical file browser. Then you can select the file to rename by clicking on it. (Details of how to proceed from there depend on the browser.) If you don't have a graphical terminal and can't get one, then you may be able to do the same with a text-mode browser such as Midnight Commander.
A simple glob built with the ? or * wildcard should be able to match the filename.
Use a more complex glob to select the filename, and perhaps others with the same problem. Maybe something like *[^a-zA-Z0-9-]* would do. Use a pattern substitution to assign a new name. Something like this:
for f in *[^a-zA-Z0-9-]*; do
mv "$f" "${f//[^a-zA-Z0-9-]/}"
done
The substitution replaces every character that is not a decimal digit, an uppercase or lowercase Latin letter, or a hyphen with nothing (i.e. it strips them). Do take care before you use this, though, to make sure you're not going to make more changes than you intend.
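One way to check what would change before touching anything is a dry run. This Python sketch (plan_renames is a hypothetical helper) applies the same character class to a list of names and only reports the proposed renames; it does not move any files.

```python
import re

def plan_renames(names):
    """Map each name that would change to its stripped replacement."""
    plan = {}
    for name in names:
        new_name = re.sub(r"[^a-zA-Z0-9-]", "", name)
        if new_name != name:
            plan[name] = new_name
    return plan
```

Feeding it the output of os.listdir(".") shows, for instance, that a name like notes.txt would become notestxt, which is the kind of unintended change to look for before running the real loop.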

Fortran: odd space-padding string behavior when opening files

I have a Fortran program which reads data from a bunch of input files. The first file contains, among other things, the names of three other files that I will read from, specified in the input file (which I redirect to stdin at execution of the program) as follows
"data/file_1.dat" "data/file2.dat" "data/file_number_3.txt"
They're separated by regular spaces and there's no trailing spaces on the line, just a line break. I read the file names like this:
character*30 fnames(3)
read *, fnames
and then I proceed to read the data, through calling on a function which takes the file name as parameter:
subroutine read_from_data_file(fname)
implicit none
character*(*) fname
open(15,file=fname)
! read some data
end subroutine read_from_data_file
! in the main program:
do i=1,3
call read_from_data_file(trim(fnames(i)))
end do
For the third file, regardless of the order in which I put the file names in the input file, the padding doesn't work and Fortran tries to open a file with a name like "data/file_number_3.txt ", i.e. with a bunch of trailing spaces. This creates an empty file named data/file_number_3.txt (White Space Conflict) in my folder, and as soon as I try to read from the file the program crashes with an EOF error.
I've tried adding trim() in various places, e.g. open(15,file=trim(fname)), without any success. I assume it has something to do with the fixed length of character arrays in Fortran, but I thought trim() would take care of that - is that assumption incorrect?
How do I troubleshoot and fix this?
Hmmm. I wonder if there is a final character on the last line of your input file which is not whitespace, such as an EOF marker from a Linux system popping up on a Windows system or vice-versa. Try, if you are on a Linux box, dos2unix; on a Windows box try something else (I'm not sure what).
If that doesn't work, try using the intrinsic IACHAR function to examine each individual character in the misbehaving string and examine the entrails.
Like you, I expect trim to trim trailing whitespace from a string, but not all the characters which are not displayed are regarded as whitespace.
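If it is easier to inspect the input file from outside the Fortran program, a few lines of Python can do a similar character-by-character check. This is only a sketch; find_suspicious is a made-up helper that reports anything outside printable ASCII, such as a carriage return left over from Windows line endings.

```python
def find_suspicious(text):
    """Return (index, code point) pairs for non-printable or non-ASCII characters."""
    return [
        (i, ord(ch))
        for i, ch in enumerate(text)
        if ord(ch) < 32 or ord(ch) > 126
    ]
```

Reading the line in binary mode and decoding it as latin-1 before the check keeps every byte visible, including ones a text-mode read might translate away.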
And, while I'm writing, your use of declarations such as
character*30
is obsolescent, the modern alternative is
character(len=30)
and
character(len=*)
is preferred to
character*(*)
EDIT
Have you tried both reading those names from a file and reading them from stdin ?

How to escape colon (:) in $PATH on UNIX?

I need to parse the $PATH environment variable in my application.
So I was wondering what escape characters would be valid in $PATH.
I created a test directory called /bin:d and created a test script called funny inside it. It runs if I call it with an absolute path.
I just can't figure out how to escape : in $PATH I tried escaping the colon with \ and wrapping it into single ' and double " quotes. But always when I run which funny it can't find it.
I'm running CentOS 6.
This is impossible according to the POSIX standard. This is not a function of a specific shell, PATH handling is done within the execvp function in the C library. There is no provision for any kind of quoting.
This is why including characters outside the "portable filename character set" in directory names is strongly recommended against; the colon is specifically called out as an example.
From SUSv7:
Since <colon> is a separator in this context, directory names that might be used in PATH should not include a <colon> character.
See also source of GLIBC execvp. We can see it uses the strchrnul and memcpy functions for processing the PATH components, with absolutely no provision for skipping over or unescaping any kind of escape character.
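The effect is easy to mimic. This Python sketch is not glibc's code, just an illustration of separator-only splitting with no escape handling; split_path is a made-up name.

```python
def split_path(value, sep=":"):
    """Split a PATH-style string on every separator, with no escaping."""
    return value.split(sep)

print(split_path("/usr/bin:/bin:d"))     # ['/usr/bin', '/bin', 'd']
print(split_path("/usr/bin:/bin\\:d"))   # ['/usr/bin', '/bin\\', 'd']
```

A backslash in front of the colon just ends up as part of the preceding component, so the directory /bin:d can never come out as a single entry.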
Looking at the function
extract_colon_unit
it seems to me that this is impossible. The : is unconditionally and
inescapably used as the path separator.
Well, this is valid at least for bash. Other shells may vary.
You could try bind-mounting it under a name without a colon:
mount --bind /bin:d /bind
PATH=/bind
According to http://tldp.org/LDP/abs/html/special-chars.html single quotes should preserve all special characters, so without trying it, I would think that '/bin:d' would work (with)in $PATH.
