Weird behavior when prepending to a file with cat and tee - linux

One solution to the problem from "prepend to a file one liner shell?" is:
cat header main | tee main > /dev/null
As some of the comments noticed, this doesn't work for large files.
Here's an example where it works:
$ echo '1' > h
$ echo '2' > t
$ cat h t | tee t > /dev/null
$ cat t
1
2
And where it breaks:
$ head -1000 /dev/urandom > h
$ head -1000 /dev/urandom > t
$ cat h t | tee t > /dev/null
^C
The command hangs and after killing it we are left with:
$ wc -l t
7470174 t
What causes the above behavior, where the command gets stuck and keeps adding lines indefinitely? What is different in the one-line-file scenario?

The behavior is completely non-deterministic. When you do cat header main | tee main > /dev/null, the following things happen:
1) cat opens header
2) cat opens main
3) cat reads header and writes its content to stdout
4) cat reads main and writes its content to stdout
5) tee opens main for writing, truncating it
6) tee reads stdin and writes the data read into main
The ordering above is one possible ordering, but these events may occur in many different orders. 5 must precede 6, 2 must precede 4, and 1 must precede 3, but it is entirely possible for the ordering to be 5,1,3,2,4,6. In any case, if the files are large, it is very likely that step 5 will take place before step 4 is complete, which will cause portions of data to be discarded. It is entirely possible that step 5 happens first, in which case all of the data previously in main will be lost.
The particular case that you are seeing is very likely a result of cat blocking on a write and going to sleep before it has finished reading the input. tee then writes more data to t and tries to read from the pipe, then goes to sleep until cat writes more data. cat writes a buffer, tee puts it into t, and the cycle repeats, with cat re-reading the data that tee is writing into t.
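One way to watch the interleaving yourself, assuming strace is available (h and t here are the small one-line files from the question, so the pipeline actually finishes):
$ strace -f -e trace=open,openat,read,write -o trace.log sh -c 'cat h t | tee t > /dev/null'
$ grep O_TRUNC trace.log
The log shows, per process, when tee opened t with O_TRUNC relative to cat's reads of h and t; repeated runs can show different orderings.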

cat header main | tee main > /dev/null
That is a terrible, terrible idea. You should never have a pipeline both reading from and writing to a file.
You can put the result in a temporary file first, and then move it into place:
cat header main >main.new && mv main{.new,}
Or to minimize the amount of time two copies of the file exist and never have both visible in the directory at the same time, you could delete the original once you've opened it up for reading and write the new file directly into its previous location. However, this does mean there's a brief gap during which the file doesn't exist at all.
exec 3<main && rm main && cat header - <&3 >main && exec 3<&-
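If moreutils is available, sponge (which also comes up in the answers further down) gives another safe one-liner, since it reads all of its input before opening the output file; a sketch:
cat header main | sponge main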

Related

How to get printf to write a new file, append an existing file, and write to stdout?

I have a printf command that will write a file but won't print to stdout. I would like to have both so I can let the user see what's happening, and at the same time, write a record to a log file.
printf "%s\n" "This is some text" "That will be written to a file" "There will be several lines" | tee -a bin/logfile.log > bin/newfile.conf
That command appends to the log file and writes to the new file, but writes no output to the screen :(
OS: CentOS 7
It's because you're redirecting the screen output with > bin/newfile.conf in addition to what you're doing with tee. Just drop the > and everything after it. If you want to output to both of those files at once in addition to the screen, you can use tee twice, e.g.:
printf ... | tee -a bin/logfile.log | tee bin/newfile.conf
That appends to logfile.log and overwrites newfile.conf, and also writes out to the screen. Use or omit the -a option as needed.
As John1024 points out, you can also use tee once, since it accepts multiple filenames. In that case -a applies to all of the filenames, which is useful when you want the same append-vs-overwrite behavior for every file.
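A sketch of that single-tee form, assuming here that you want both files appended to:
printf "%s\n" "This is some text" | tee -a bin/logfile.log bin/newfile.conf
This appends the text to both files and still echoes it to the screen.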

Reading FIFO doesn't show for the first time

In Unix, I've made a FIFO and I tried to read it with tail:
mkfifo fifo.file
tail -f fifo.file
Then I try to write messages into it from another process so I do as below:
cat > fifo.file
Then I type messages such as:
abc
def
Before I type Ctrl-D, nothing is printed at the first process (tail -f fifo.file).
Then I type Ctrl-D, the two lines above are printed.
Now if I do cat > fifo.file again and type one line such as qwe followed by Enter, this string is printed immediately by the first process.
I'm wondering why I get two different behaviors with the same command.
Is it possible to make it the second behavior without the first, meaning that when I cat the first time, I can see messages printed once I type Enter, instead of Ctrl-D?
This is just how tail works: it outputs the specified file's contents only once it sees EOF, which Ctrl-D effectively sends from the terminal. The -f switch merely keeps tail from exiting so that it continues reading after that point.
In other words, no matter the switches, tail still needs to see EOF before it outputs anything at all.
To test this, you can use plain cat instead of tail:
term_1$ mkfifo fifo.file
term_1$ cat < fifo.file
...
term_2$ cat > fifo.file
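The same thing can be shown from a single terminal with a short sketch; abc appears immediately and def about a second later, and no Ctrl-D is needed because the writer's redirection closes by itself:
mkfifo fifo.file
cat < fifo.file &                              # reader: prints each chunk as soon as it arrives
{ echo abc; sleep 1; echo def; } > fifo.file   # writer: two lines, then the FIFO is closed
wait                                           # wait for the background cat to see EOF and exit
rm fifo.file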

"cat a | cat b" ignoring contents of a

The formal definition of a pipe states that the stdout of the command on the left is immediately piped to the stdin of the command on the right. I have two files, hello.txt and human.txt. cat hello.txt returns Hello and cat human.txt returns I am human. Now if I do cat hello.txt | cat human.txt, shouldn't that return Hello I am human? Instead I'm seeing command not found. I am new to shell scripting. Can someone explain?
Background: A pipe arranges for the output of the command on the left (that is, contents written to FD 1, stdout) to be delivered as input to the command on the right (on FD 0, stdin). It does this by connecting the processes with a "pipe", or FIFO, and executing them at the same time; attempts to read from the FIFO will wait until the other process has written something, and attempts to write to the FIFO will wait until the other process is ready to read.
cat hello.txt | cat human.txt
...feeds the content of hello.txt into the stdin of cat human.txt, but cat human.txt isn't reading from its stdin; instead, it's been directed by its command line arguments to read only from human.txt.
Thus, that content on the stdin of cat human.txt is ignored and never read, and cat hello.txt receives a SIGPIPE when cat human.txt exits, and thereafter exits as well.
cat hello.txt | cat - human.txt
...by contrast will have the second cat read first from stdin (you could also use /dev/stdin in place of - on many operating systems, including Linux), then from a file.
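With the same two files, the expected transcript then looks like this (a sketch):
$ cat hello.txt | cat - human.txt
Hello
I am human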
You don't need to pipe them; instead, you can have cat read from multiple files, as below, which concatenates the contents of both files:
cat hello.txt human.txt
| is generally used when you want to feed the output of the first command to the second command in the pipe. In this case your second command reads from a file, so it doesn't need to be piped into. If you still want to use a pipe, you can do it like this:
echo "Hello" | cat - human.txt
First of all, the command will not give an error; it will print I am human, i.e. the contents of human.txt.
You are right about the definition of a pipe, but on the right side of the pipe there has to be a command.
If that command reads its input and produces output from it, it will give you output based on what it received; otherwise the command just does its own thing.
Here there is a command on the right side, namely cat human.txt, but it prints its own file's contents and performs no operation on the received input.
The command not found error appears when you write something like
cat hello.txt | human.txt
bash will give you this error :
human.txt: command not found

Cat several thousand files

I have several (60,000) files in a folder that need to be combined into 3 separate files.
How would I cat this so that I could have each file containing the contents of ~20,000 of these files?
I know it would be like a loop:
for i in {1..20000}
do
cat file-$i > new_file_part_1
done
Doing:
cat file-$i > new_file_part_1
Will truncate new_file_part_1 every time the loop iterates. You want to append to the file:
cat file-$i >> new_file_part_1
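One caveat: >> also appends across runs, so if new_file_part_1 is left over from a previous attempt you may want to empty it first, e.g.:
> new_file_part_1
for i in {1..20000}
do
cat file-$i >> new_file_part_1
done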
The other answers close and open the file on every iteration. I would prefer
for i in {1..20000}
do
cat file-$i
done > new_file_part_1
so that the output of all the cat runs is piped into one file that is opened only once.
Assuming it doesn't matter which input file goes to which output file:
for i in {1..60000}
do
cat file$i >> out$(($i % 3))
done
This script uses the modulo operator % to divide the input into 3 bins; it will generate 3 output files:
out0 contains file3, file6, file9, ...
out1 contains file1, file4, file7, ...
out2 contains file2, file5, file8, ...
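If the inputs aren't named with a strict numeric suffix, the same round-robin idea can be sketched over a glob instead (file* and the out names below are just placeholders):
i=0
for f in file*
do
cat "$f" >> "out$((i % 3))"
i=$((i + 1))
done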
#!/bin/bash
cat file-{1..20000} > new_file_part_1
This launches cat only once and opens and closes the output file only once. No loop required, since cat can accept all 20000 arguments.
An astute observer noted that on some systems, the 20000 arguments may exceed the system's ARG_MAX limit. In such a case, xargs can be used, with the penalty that cat will be launched more than once (but still significantly fewer than 20000 times).
echo file-{1..20000} | xargs cat > new_file_part_1
This works because, in Bash, echo is a shell built-in and as such is not subject to ARG_MAX.
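If any of the names could contain whitespace, a null-delimited variant is slightly more robust; printf is also a Bash built-in, so it is likewise not subject to ARG_MAX (a sketch):
printf '%s\0' file-{1..20000} | xargs -0 cat > new_file_part_1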

Problem with Bash output redirection [duplicate]

This question already has answers here:
Why doesn't "tail" work to truncate log files?
I was trying to remove all the lines of a file except the last line but the following command did not work, although file.txt is not empty.
$cat file.txt |tail -1 > file.txt
$cat file.txt
Why is it so?
Redirecting from a file through a pipeline back to the same file is unsafe; if file.txt is overwritten by the shell when setting up the last stage of the pipeline before tail starts reading off the first stage, you end up with empty output.
Do the following instead:
tail -1 file.txt >file.txt.new && mv file.txt.new file.txt
...well, actually, don't do that in production code; particularly if you're in a security-sensitive environment and running as root, the following is more appropriate:
tempfile="$(mktemp file.txt.XXXXXX)"
chown --reference=file.txt -- "$tempfile"
chmod --reference=file.txt -- "$tempfile"
tail -1 file.txt >"$tempfile" && mv -- "$tempfile" file.txt
Another approach (avoiding temporary files, unless <<< implicitly creates them on your platform) is the following:
lastline="$(tail -1 file.txt)"; cat >file.txt <<<"$lastline"
(The above implementation is bash-specific, but works in cases where echo does not -- such as when the last line contains "--version", for instance).
Finally, one can use sponge from moreutils:
tail -1 file.txt | sponge file.txt
You can use sed to delete all lines but the last from a file:
sed -i '$!d' file
-i tells sed to replace the file in place; otherwise, the result would be written to stdout.
$ is the address that matches the last line of the file.
d is the delete command. In this case, it is negated by !, so all lines not matching the address will be deleted.
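A quick check of what that does (assuming GNU sed for -i):
$ printf 'a\nb\nc\n' > file
$ sed -i '$!d' file
$ cat file
c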
Before 'cat' gets executed, Bash has already opened 'file.txt' for writing, clearing out its contents.
In general, don't write to files you're reading from in the same statement. This can be worked around by writing to a different file, as above:
$cat file.txt | tail -1 > anotherfile.txt
$mv anotherfile.txt file.txt
or by using a utility like sponge from moreutils:
$cat file.txt | tail -1 | sponge file.txt
This works because sponge waits until its input stream has ended before opening its output file.
When you submit your command string to bash, it does the following:
Creates an I/O pipe.
Starts "/usr/bin/tail -1", reading from the pipe, and writing to file.txt.
Starts "/usr/bin/cat file.txt", writing to the pipe.
By the time 'cat' starts reading, 'file.txt' has typically already been truncated by the redirection set up for 'tail'.
That's all part of the design of Unix and the shell environment, and goes back all the way to the original Bourne shell. 'Tis a feature, not a bug.
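You can see that the truncation is done by the redirection itself, before and independently of whatever runs on that side of the pipeline; a minimal sketch:
$ printf 'a\nb\nc\n' > file.txt
$ > file.txt
$ wc -c file.txt
0 file.txt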
tmp=$(tail -1 file.txt); echo "$tmp" > file.txt
This works nicely in a Linux shell:
replace_with_filter() {
local filename="$1"; shift
local dd_output byte_count filter_status dd_status
dd_output=$("$#" <"$filename" | dd conv=notrunc of="$filename" 2>&1; echo "${PIPESTATUS[#]}")
{ read; read; read -r byte_count _; read filter_status dd_status; } <<<"$dd_output"
(( filter_status > 0 )) && return "$filter_status"
(( dd_status > 0 )) && return "$dd_status"
dd bs=1 seek="$byte_count" if=/dev/null of="$filename"
}
replace_with_filter file.txt tail -1
dd's "notrunc" option is used to write the filtered contents back, in place, while dd is needed again (with a byte count) to actually truncate the file. If the new file size is greater than or equal to the old file size, the second dd invocation is not necessary.
The advantages of this over a file copy method are: 1) no additional disk space necessary, 2) faster performance on large files, and 3) pure shell (other than dd).
As Lewis Baumstark says, it doesn't like it that you're writing to the same filename.
This is because the shell opens up "file.txt" and truncates it to do the redirection before "cat file.txt" is run. So, you have to
tail -1 file.txt > file2.txt; mv file2.txt file.txt
echo "$(tail -1 file.txt)" > file.txt
Just for this case it's possible to use cat < file.txt | (rm file.txt; tail -1 > file.txt)
That opens "file.txt" just before connecting "cat" to the subshell in "(...)". "rm file.txt" then removes the directory entry before the subshell opens the file for writing on behalf of "tail", but the contents are still available through the already-open descriptor passed to "cat" until cat closes its stdin. So you'd better be sure this command finishes, or the contents of "file.txt" will be lost.
It seems to not like the fact you're writing it back to the same filename. If you do the following it works:
$cat file.txt | tail -1 > anotherfile.txt
tail -1 > file.txt will overwrite your file, causing cat to read an empty file, because the truncation performed by the redirection happens before any of the commands in your pipeline actually run.
