Is sed line deletion atomic? - linux

Let's say I have a file 'queue'. From what I understand, appending to it with '>>' is atomic as defined by POSIX. But if I have multiple processes appending, is it safe to do the following without data loss?
sed -e '1d' -i queue
I have several different script services on Linux that may want to pass information between them. Each has a 'queue' file: new data is appended to the bottom, and the next item to be processed is popped from the top.
Is my use of sed guaranteed not to miss an append from another process between the time sed reads the file into a buffer and the time it writes it back to disk?
Thanks.

I'll defer to anyone else who makes a stronger claim, but I think it is not safe to use sed that way: -i works by making a temporary copy of the file and then moving that copy back into place, which would discard any appends made to the original file in the meantime.
Source: http://www.gnu.org/software/sed/manual/sed.html
-i[SUFFIX]
--in-place[=SUFFIX]
This option specifies that files are to be edited in-place. GNU sed does this by creating a temporary file and sending output to this file rather than to the standard output.
This option implies -s.
When the end of the file is reached, the temporary file is renamed to the output file's original name. The extension, if supplied, is used to modify the name of the old file before renaming the temporary file, thereby making a backup copy.
This rule is followed: if the extension doesn't contain a *, then it is appended to the end of the current filename as a suffix; if the extension does contain one or more * characters, then each asterisk is replaced with the current filename. This allows you to add a prefix to the backup file, instead of (or in addition to) a suffix, or even to place backup copies of the original files into another directory (provided the directory already exists).
If no extension is supplied, the original file is overwritten without making a backup.
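In other words, the in-place edit is roughly equivalent to the following sequence (the temp file name is illustrative):
sed -e '1d' queue > queue.tmp    # read 'queue', write the edited copy to a temporary file
mv queue.tmp queue               # rename the temporary file over the original
Any line another process appends to 'queue' between sed reading the file and the rename replacing it is written to the old inode and is lost once the mv completes.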

Related

copy and append specific lines to a file with specific name format?

I am copying some specific lines from one file to another.
grep '^stringmatch' /path/sfile-*.cfg >> /path/nfile-*.cfg
Here is what's happening: it's creating a new file called nfile-*.cfg and copying those lines into it. The file names sfile-* and nfile-* are randomly generated and are generally followed by a number. Both sfile-* and nfile-* are existing files and there is only one such file of each in the same directory; only the number that follows is randomly generated. The numbers following sfile and nfile need not be the same. The files are not created simultaneously, but are generated when a specific command is given. Some lines from one file need to be appended to the other.
I'm guessing you actually want something like
for f in /path/sfile-*.cfg; do
    grep '^stringmatch' "$f" > "/path/nfile-${f#/path/sfile-}"
done
This will loop over all sfile matches and create an nfile target file with the same number after the dash as the corresponding source sfile. (The parameter substitution ${variable#prefix} returns the value of variable with any leading match on the pattern prefix removed.)
If there is only one matching file, the loop will only run once. If there are no matches for the wildcard, the loop will still run once unless you enable nullglob, which changes the shell's globbing behaviour so that wildcards with no matches expand to nothing instead of to the wildcard expression itself. If you don't want to enable nullglob, a common workaround is to add this inside the loop, before the grep:
test -e "$f" || break
If you want the loop to only process the first match if there are several, add break on a line by itself before the done.
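For completeness, a minimal sketch of the nullglob variant mentioned above (paths as in the question):
shopt -s nullglob
for f in /path/sfile-*.cfg; do
    grep '^stringmatch' "$f" > "/path/nfile-${f#/path/sfile-}"
done
With nullglob set, an unmatched wildcard expands to nothing, so the loop body simply never runs and the test -e guard is not needed.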
If I interpret your question correctly, you want to output to an existing nfile, which has a random number in it, but instead the shell is creating a file with an asterisk in it, literally named nfile-*.cfg.
This happens because the nfile doesn't exist when you first run the command. If no file matches the pattern, bash fails to expand nfile-*.cfg and instead passes the * through as a literal character. This is correct behaviour in bash.
So it looks like the problem is that the nfile doesn't exist when you start your grep; you'll need to create it first.
I'll leave code to others, but I hope the explanation is useful.
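To make the behaviour concrete, here is a hypothetical session in a directory containing only sfile-42.cfg:
$ echo nfile-*.cfg
nfile-*.cfg
$ grep '^stringmatch' sfile-*.cfg >> nfile-*.cfg
$ ls
nfile-*.cfg  sfile-42.cfg
Because nothing matched nfile-*.cfg, the pattern was passed through literally and the redirect created a file whose name actually contains an asterisk.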

Paste header line in multiple tsv (tab separated) files

I have multiple .tsv files named choochoo1.tsv, choochoo2.tsv, ... choochoo(n).tsv. I also have a main.tsv file. I want to extract the header line from main.tsv and paste it into all of the choochoo(n).tsv files. Please note that there are other .tsv files in the directory that I don't want to change, so I can't use *.tsv to select all the .tsv files (I need to match on the choochoo string to get the files I want). This is what I have tried using a bash script, but I could not make it work. Please suggest the right way to do it.
for x in *choochoo; do
head -n1 main.tsv > $x
done
You have a problem with the file glob, as well as the redirect:
the file glob will catch things like AAchoochoo but not choochoo1.tsv and not even AAchoochoo.tsv
the redirect will overwrite the existing files instead of adding to them. The redirect operator for adding to a file is >>, but that appends text to the end, and you want to prepend text at the beginning.
The problem with prepending text to an existing file is that you have to open the file for both reading and writing, and then stream both the prepended text and the original text, in order. That is usually where people fail, because the shell can't open files like that. (There is a slightly more complex way of doing this directly, by opening the file for both reading and writing, but I'm not going to address that further.)
You might want to use a temporary file, something like this:
for x in choochoo[0-9]*.tsv; do
    mv "$x"{,.orig}
    (head -n1 main.tsv; cat "$x.orig") > "$x"
    rm "$x.orig"
done
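An equivalent sketch that writes to a temporary file first and only replaces the original on success (mktemp is standard; the choochoo names are from the question):
header=$(head -n1 main.tsv)
for x in choochoo[0-9]*.tsv; do
    tmp=$(mktemp) &&
    { printf '%s\n' "$header"; cat "$x"; } > "$tmp" &&
    mv "$tmp" "$x"
done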

Cygwin - How can I keep original file name when outputting results of a command on a file (cut command)?

I'm using a cut command to split up a file. I need the output of the file to keep the original file name. I will not know the name of the file, just what folder it is located in. I need to ultimately add a suffix and prefix to original file after the cut, which I've got figured out. My issue is that I do not know how to keep the original file name after I output the cut.
cut -d, -f1,2,3 for file in * $file > originalfilename.txt
There should only be 1 file in the "dropbox" folder at one time. So if I can store the variable of that file name somewhere and use later that works for me.
Also if there is a way to just modify the file using cut, rather than needing to output it somewhere this would satisfy my needs too, because I would obviously still have original file name then.
I just started using Cygwin a few days ago so I apologize if there is really an obvious answer to this! I have googled everything and couldn't find what I needed.
The answer is no: unix cut does not offer an in-place option. However, you can look at alternate options here.
You can define a variable to store the name of the file and use that variable in the commands:
orig_file='originalfilename.txt'
cut -d, -f1,2,3 "$orig_file" > "$orig_file.tmp" && mv "$orig_file.tmp" "$orig_file"
echo "The name of the original file is $orig_file"

How does linux redirect IO work internally

When we use the redirect IO operator in a shell script, does the operator keep all the data to be written in memory and write it all at once, or does it write to the file line by line?
Here is what I am working on.
I have about 200 small files, ~1000 lines each, in a specific format. I want to process each line in all the files (apply a regex and change the format a little) and have the new transformed lines in a single combined file.
I have a transformscript.sh that takes a single file and applies the transformation. I run it in the following manner:
sh transformscript.sh somefile.txt > newfile.txt
This works fine and fast for a single file.
How do I extend this to all the files? Will it be efficient to change transformscript.sh to take a directory as an argument instead of a filename and add a for loop to transform all the lines of all the files together? Or should I run the above transformscript.sh for each file, creating a new file for each one, and then combine them separately?
Thanks.
The redirect operator simply opens the file for writing and installs that file descriptor as the command's standard output. The command then writes to the file directly as it goes; the shell does not accumulate the output in memory and write it all at the end.
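If you want to see this for yourself and have strace installed, you can watch the shell open the output file and duplicate its descriptor onto stdout (the exact syscall names may vary by platform):
strace -f -e trace=open,openat,dup2 sh -c 'echo hi > out.txt'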
You probably do NOT want to run the script separately for each file since you will incur the overhead of bash process creation for each pass. For example:
# don't do it this way
for somefile in somefiles*.txt; do
    newfile=${somefile//some/new}
    sh transformscript.sh "$somefile" > "$newfile"
done
The above starts one shell for every file found, which is pretty inefficient. It would be better to rewrite transformscript.sh to handle multiple files if possible. Depending on how complicated your transform is and whether you need to keep the original filenames, you might be able to use a single sed process. For example, assume you have 200 files named test1.txt through test200.txt, all with a "Hello world" line you want to change to "Hello joe". You could do something as simple as this:
sed -i.save 's/Hello world/Hello joe/' test*.txt
The -i tells sed to do an "in place" edit (edit the original file), and the optional ".save" argument to -i makes a backup copy of the original file with a .save extension before editing it. Note that this leaves the original contents in the .save files and the new content in the files with the original names, which may not be what you want.
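Since the question asks for a single combined output file rather than in-place edits, here is a minimal variant of the same idea (the s/Hello world/Hello joe/ expression is just the placeholder from above; substitute the real transformation):
sed 's/Hello world/Hello joe/' test*.txt > combined.txt
One sed process reads every input file in turn, and the shell writes all of the transformed lines to combined.txt.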

Shell script to nullify a file everytime it reaches a specific size

I am in the middle of writing a shell script to nullify/truncate a file when it reaches a certain size. The file is also being opened/written by another process the whole time. Every time I nullify the file, will the file pointer be repositioned to the start of the file, or will it remain at its previous position? Is there a way to reset the file pointer once the file has been truncated?
The position of the file pointer depends on how the file was opened by the process that has it open.
If it was opened in append mode, then after truncation new data will be written at the end of the file, which, for the first write after the truncation, is also the beginning.
If it was not opened in append mode, then truncating the file simply means there is a run of virtual zero bytes at the start of the file, and the real data continues to be written at the same offset where the last write finished.
If the other process reopens the file for each write rather than holding it open, roughly the same rules apply, but there is a better chance that writes will land at the beginning.
It all depends on how the first process to write to the file after the truncation manages its file pointer.
You can't reset the file pointer of another process, AFAIK.
A cron job or something like it will do the task; it finds every file bigger than 4096 bytes and then truncates it:
find . -type f -size +4096c -print0 | while IFS= read -r -d '' file; do cat /dev/null > "$file"; done
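If coreutils' truncate is available, the same thing can be done without the read loop (a sketch using the same size threshold):
find . -type f -size +4096c -exec truncate -s 0 {} +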
