How to handle a space in a file name in MarkLogic?

If a file name or directory name has a space in it, how do I handle it in MarkLogic?
Loading such a file through xdmp:document-load throws an error.
In the case of MLCP, the space is replaced with %.

fn:escape-uri() may be what you want.
I suggest that the second parameter be set to false. That way slashes will still show up as slashes (important for some uses within MarkLogic).
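As a rough illustration of the idea (this is Python, not MarkLogic code, and only an analogue of fn:escape-uri), percent-encoding a path while leaving slashes alone looks like the sketch below. The sample path is hypothetical.

```python
from urllib.parse import quote

# Hypothetical path containing spaces in both the directory and the file name.
raw_path = "my dir/my file.xml"

# safe="/" keeps slashes literal, so the directory structure stays visible
# while each space becomes %20.
escaped = quote(raw_path, safe="/")
print(escaped)
```

The exact set of characters that fn:escape-uri leaves unescaped differs from Python's quote, so treat this only as a picture of "escape the spaces, keep the slashes".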

Related

How to invalidate a cloudfront path that contains a tilde ~ character?

Trying to invalidate an AWS CloudFront path that contains a tilde ~ character results in an invalid argument error. A tilde is a valid URL character, and invoking things like encodeURI or encodeURIComponent against strings that contain it does not encode it.
What I've tried
I've tried encoding the tilde as %7E in the invalidation URL. This does not yield an invalid argument error, but it does not invalidate the path for the desired file either.
Temporary workaround
I've been able to work around this by finding the first index of ~, replacing it with a *, and chopping off the rest of the string afterward. This creates the needed invalidation, though not the desired invalidation, as it can also invalidate paths that may be optimized by remaining cached.
AWS Support says that this is an issue with invalidating paths that contain special characters, that they are aware of the issue, and are actively working toward a solution.
%7E doesn't work because CloudFront caches it separately from the tilde.

Is there an alternative for the slash in a path?

I have an application which correctly escapes slashes ("/") in file names to avoid path traversal attacks.
The secret file has this path:
/tmp/secret.txt
I want to access this file by uploading a file with a specially crafted file name (something like \/tmp\/secret.txt).
Is there any alternative syntax without the slashes which I can use so that Linux will read this file?
(I'm aware of URL encoding but as the escaping is done in the backend this has no use for me.)
No. The / is not allowed in a filename, no matter if it's escaped as \/ or not.
It is one of only two characters that are not allowed in filenames, the other being \0.
This means that you obviously could use _tmp_secret.txt or -tmp-secret.txt, or replace the / in the path with any other character that you wish, to create a filename with a path "encoded into it". But in doing so, you cannot encode pathnames that include the chosen delimiter character in one or more of their path components and expect to decode them into the original pathname.
This is, by the way, how OpenBSD's ports system encodes filenames for patches to software. In (for example) /usr/ports/shells/fish/patches we find files with names like
patch-share_tools_create_manpage_completions_py
which comes from the pathname of a particular file in the fish shell source distribution (probably share/tools/create_manpage_completions.py). These pathnames are however never parsed, and the encoding is only there to create unique and somewhat intelligible filenames for the patches themselves. The real paths are included in the patch files.
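A minimal Python sketch of why this kind of encoding is one-way, assuming "_" as the delimiter. The helper names encode_path and naive_decode are made up for illustration, and the sample path is the one guessed at above.

```python
def encode_path(path: str, delimiter: str = "_") -> str:
    """Flatten a pathname into one filename by replacing every '/'."""
    return path.replace("/", delimiter)


def naive_decode(name: str, delimiter: str = "_") -> str:
    """Try to reverse encode_path by turning every delimiter back into '/'."""
    return name.replace(delimiter, "/")


original = "share/tools/create_manpage_completions.py"
flat = encode_path(original)
round_trip = naive_decode(flat)

print(flat)
# The path already contained '_' characters, so decoding cannot tell them
# apart from encoded slashes and does not return the original path.
print(round_trip == original)
```

The flattened name contains no "/", so it is a legal filename, but the original path cannot be recovered from it.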

Possible names for a process in Linux?

I am trying to write a script in which I read from /proc/.../stat. One of the values in the space separated list is the name of the process, which does not interest me for the time being. I would like to read some other value after it. My idea was to move forward a certain number of values using spaces as the separator. A potential problem with this though is that I could have /proc/.../stat containing something like 1234 (asdf asdf) S .... The space in the process name would cause the program to read asdf) instead of S as intended.
So my question is can the process name have spaces in it? If so how could I differentiate between the values in /proc/.../stat?
I, personally, hate the way this file is laid out for precisely the reason you stated. With that said, it is possible to parse it unambiguously no matter what the process name is. This is important, because not only may the process name contain spaces, it may also contain the close bracket character.
The method I suggest is to manually parse out the process name, and use space delimiting for everything else.
The process name should be defined as starting at the first open-bracket character "(" on the line and ending at the last close bracket ")" on the line. Since the other fields on the line don't have a user-controlled format, this should reliably single the process name out, no matter how weirdly the process is named.
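The approach above can be sketched in Python as follows. The function name split_stat_line is hypothetical, and the sample lines are invented rather than read from a real /proc entry.

```python
from typing import List, Tuple


def split_stat_line(line: str) -> Tuple[int, str, List[str]]:
    """Split a /proc/<pid>/stat style line into (pid, name, remaining fields)."""
    start = line.index("(")   # first open bracket on the line
    end = line.rindex(")")    # last close bracket on the line
    pid = int(line[:start])   # int() ignores the surrounding whitespace
    name = line[start + 1:end]
    rest = line[end + 1:].split()  # everything after the name is space-delimited
    return pid, name, rest


pid, name, fields = split_stat_line("1234 (asdf asdf) S 1 1234")
print(pid, repr(name), fields[0])
```

Because the split uses the first "(" and the last ")", a name such as "a) b) (c" is still extracted whole.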

Fortran: odd space-padding string behavior when opening files

I have a Fortran program which reads data from a bunch of input files. The first file contains, among other things, the names of three other files that I will read from, specified in the input file (which I redirect to stdin at execution of the program) as follows
"data/file_1.dat" "data/file2.dat" "data/file_number_3.txt"
They're separated by regular spaces and there's no trailing spaces on the line, just a line break. I read the file names like this:
character*30 fnames(3)
read *, fnames
and then I proceed to read the data, through calling on a function which takes the file name as parameter:
subroutine read_from_data_file(fname)
  implicit none
  character*(*) fname
  open(15,file=fname)
  ! read some data
end subroutine read_from_data_file
! in the main program:
do i=1,3
  call read_from_data_file(trim(fnames(i)))
end do
For the third file, regardless of the order in which I put the file names in the input file, the trimming doesn't work and Fortran tries to open a file with a name like "data/file_number_3.txt ", i.e. with a bunch of trailing spaces. This creates an empty file named data/file_number_3.txt (White Space Conflict) in my folder, and as soon as I try to read from the file the program crashes with an EOF error.
I've tried adding trim() in various places, e.g. open(15,file=trim(fname)), without any success. I assume it has something to do with the fixed length of character arrays in Fortran, but I thought trim() would take care of that. Is that assumption incorrect?
How do I troubleshoot and fix this?
Hmmm. I wonder if there is a final character on the last line of your input file which is not whitespace, such as an EOF marker from a Linux system popping up on a Windows system or vice-versa. Try, if you are on a Linux box, dos2unix; on a Windows box try something else (I'm not sure what).
If that doesn't work, try using the intrinsic IACHAR function to examine each individual character in the misbehaving string and examine the entrails.
Like you, I expect trim to trim trailing whitespace from a string, but not all the characters which are not displayed are regarded as whitespace.
And, while I'm writing, your use of declarations such as
character*30
is obsolescent; the modern alternative is
character(len=30)
and
character(len=*)
is preferred to
character*(*)
EDIT
Have you tried both reading those names from a file and reading them from stdin?

Are /../ and /./ the only file system symbolic links?

I want to check that a file system path is valid and safe to use relative to another path. So I want to know if there are any other special characters like /../ and /./ which might cause a path to actually point somewhere else.
If that is all I have to worry about then a quick replace of those chars followed by something like this to check for any other bad filesystem chars should work right?
[^a-z0-9\.\-_]
(On windows stuff like C:\ would also have to be allowed)
The use case is that I have a folder which site administrators can create directories in and I want to FORCE them to only create directories in that folder. In other words, no being sneaky with ...path/uploads/../../../var/otherfolder/ if you know what I mean ;)
Which language are you using?
In PHP, for example, you can get the realpath of any string and then compare it to a base directory. If you find your base directory is a prefix of the realpath, then you're good to go.
Although that's only for PHP, you should be able to find a similar approach in other languages.
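A comparable check in Python might look like the sketch below. The function name is_inside and the base directory /srv/uploads are assumptions for illustration. It compares whole path components with os.path.commonpath rather than raw string prefixes, so that a sibling such as /srv/uploads-evil does not pass as being inside /srv/uploads.

```python
import os


def is_inside(base_dir: str, candidate: str) -> bool:
    """Return True if candidate, resolved relative to base_dir, stays inside it."""
    base = os.path.realpath(base_dir)
    # realpath collapses "." and ".." and resolves any symlinks that exist.
    target = os.path.realpath(os.path.join(base, candidate))
    try:
        return os.path.commonpath([base, target]) == base
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False


print(is_inside("/srv/uploads", "photos/2009"))
print(is_inside("/srv/uploads", "../../var/otherfolder"))
```

Note that os.path.join discards the base when the candidate is an absolute path, so a name like /etc/passwd is resolved as given and rejected by the comparison.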
There are several oddities on Windows/DOS in the form of reserved device names. Opening any of these will read from or write to unexpected places. I haven't tried how .NET handles these, but I presume that you would get some kind of security exception.
CON   Console. Reads from keyboard, writes to screen.
      Example: "COPY CON temp.txt", end input with ctrl-z.
PRN   Printer. (Defaults to LPT1?)
LPTn  Parallel ports.
AUX   "Auxiliary device." Have never seen anyone use this myself.
COMn  Serial ports.
NUL   The null device, like /dev/null.
When resolving paths, . and .. (and in most cases // on Unix and \\ on Windows) are the main things you really need to worry about. RFC 3986 gives the following algorithm for resolving relative paths in URIs. For the most part, it also applies to file system paths.
An algorithm, remove_dot_segments:
1. The input buffer is initialized with the now-appended path components and the output buffer is initialized to the empty string.
2. While the input buffer is not empty, loop as follows:
   A. If the input buffer begins with a prefix of "../" or "./", then remove that prefix from the input buffer; otherwise,
   B. If the input buffer begins with a prefix of "/./" or "/.", where "." is a complete path segment, then replace that prefix with "/" in the input buffer; otherwise,
   C. If the input buffer begins with a prefix of "/../" or "/..", where ".." is a complete path segment, then replace that prefix with "/" in the input buffer and remove the last segment and its preceding "/" (if any) from the output buffer; otherwise,
   D. If the input buffer consists only of "." or "..", then remove that from the input buffer; otherwise,
   E. Move the first path segment in the input buffer to the end of the output buffer, including the initial "/" character (if any) and any subsequent characters up to, but not including, the next "/" character or the end of the input buffer.
3. Finally, the output buffer is returned as the result of remove_dot_segments.
Example run:
STEP  OUTPUT BUFFER     INPUT BUFFER
 1 :                    /a/b/c/./../../g
 2E:  /a                /b/c/./../../g
 2E:  /a/b              /c/./../../g
 2E:  /a/b/c            /./../../g
 2B:  /a/b/c            /../../g
 2C:  /a/b              /../g
 2C:  /a                /g
 2E:  /a/g
STEP  OUTPUT BUFFER     INPUT BUFFER
 1 :                    mid/content=5/../6
 2E:  mid               /content=5/../6
 2E:  mid/content=5     /../6
 2C:  mid               /6
 2E:  mid/6
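The algorithm quoted above translates fairly directly into Python. This is only a sketch: the output buffer is kept as a list of segments (each carrying its leading "/", if it had one) so that removing the last segment is a single pop. That list representation is my own choice and not part of the RFC text.

```python
def remove_dot_segments(path: str) -> str:
    """Resolve "." and ".." segments following the quoted RFC 3986 steps."""
    inp = path   # input buffer
    out = []     # output buffer, one entry per segment with its leading "/"
    while inp:
        if inp.startswith("../"):          # step 2A
            inp = inp[3:]
        elif inp.startswith("./"):         # step 2A
            inp = inp[2:]
        elif inp.startswith("/./"):        # step 2B: "/./" becomes "/"
            inp = inp[2:]
        elif inp == "/.":                  # step 2B: "/." becomes "/"
            inp = "/"
        elif inp.startswith("/../"):       # step 2C: "/../" becomes "/"
            inp = inp[3:]
            if out:
                out.pop()                  # drop last output segment, if any
        elif inp == "/..":                 # step 2C: "/.." becomes "/"
            inp = "/"
            if out:
                out.pop()
        elif inp in (".", ".."):           # step 2D
            inp = ""
        else:                              # step 2E: move one segment across
            cut = inp.find("/", 1)
            if cut == -1:
                cut = len(inp)
            out.append(inp[:cut])
            inp = inp[cut:]
    return "".join(out)


print(remove_dot_segments("/a/b/c/./../../g"))   # /a/g
print(remove_dot_segments("mid/content=5/../6"))  # mid/6
```

Extra ".." segments at the top simply find an empty output buffer and are dropped, so "/../../x" comes out as "/x".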
Don't forget that it's possible to do things like specify more ".." segments than there are parent directories. So if you're trying to resolve a path, you could end up trying to resolve beyond /, or in the case of Windows, C:\.
The answer depends on the filesystem used. It's different on Windows, different on *nix.
For example, on Windows-based desktop platforms, invalid path characters might include quote ("), less than (<), greater than (>), pipe (|), backspace (\b), null (\0), and Unicode characters 16 through 18 and 20 through 25.
I don't know which platform/language you are using, but if you are using .NET you can get the list of chars which cannot be in a filename by calling Path.GetInvalidFileNameChars and the list of chars which cannot be in a path by calling Path.GetInvalidPathChars.
Unix symbolic links can be tricky, and can even be created to cause pathing loops on some systems. You should lstat() the filename to get the actual inode and devno numbers to see if two pathnames are actually the same file.
Have you considered using something like chroot? You can create something called a "chroot jail" that will prevent people from getting outside it. This is enforced by the OS, so you don't have to write it yourself. Note that this only works on *nix, and on some variants of *nix, it does not have all the security features necessary to make it foolproof (i.e. there are known ways of escaping).
I've already directly answered the question, but as Tom said, what you're trying to do is inherently dangerous. What you should probably do instead is create one directory at a time. Pass it through a regexp validator and don't let them use dot segments at all. Just have a text field in a form for the directory name and a "Make Directory" button. Let them traverse the directory tree to create sub-directories. This way you can be absolutely confident that the files are going where they should.
This has the advantage of working on both Windows and *nix without the need for chroot.
Addenda:
This Regexp will only match illegitimate directory names, assuming that you're accepting directories one at a time:
/^(\.\.?|.*?[^a-zA-Z0-9\. _-]+.*?|^)$/
Valid directory names:
"This is a directory"
".hidden"
"example.com"
"10-28-2009"
Invalid directory names:
""
"."
".."
"../somewhere/else"
"/etc/passwd"
"would:be?rejected!by;OS"
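To try the regexp against the names listed above, here is a small Python sketch. The pattern is the one given above with the surrounding / delimiters dropped, and the helper name is_invalid_dirname is made up.

```python
import re

# Same pattern as above, without the /.../ delimiters.
INVALID_DIRNAME = re.compile(r"^(\.\.?|.*?[^a-zA-Z0-9\. _-]+.*?|^)$")


def is_invalid_dirname(name: str) -> bool:
    """Return True if the regexp flags the name as an illegitimate directory name."""
    return INVALID_DIRNAME.match(name) is not None


valid = ["This is a directory", ".hidden", "example.com", "10-28-2009"]
invalid = ["", ".", "..", "../somewhere/else", "/etc/passwd",
           "would:be?rejected!by;OS"]

for name in valid + invalid:
    print(repr(name), "invalid" if is_invalid_dirname(name) else "valid")
```

A name is accepted only when the pattern does not match, which fits the "one directory at a time" form described above.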

Resources