Problem with Bash output redirection [duplicate] - linux

This question already has answers here:
Why doesnt "tail" work to truncate log files?
(6 answers)
Closed 1 year ago.
I was trying to remove all the lines of a file except the last line but the following command did not work, although file.txt is not empty.
$cat file.txt |tail -1 > file.txt
$cat file.txt
Why is it so?

Redirecting from a file through a pipeline back to the same file is unsafe; if file.txt is overwritten by the shell when setting up the last stage of the pipeline before tail starts reading off the first stage, you end up with empty output.
Do the following instead:
tail -1 file.txt >file.txt.new && mv file.txt.new file.txt
...well, actually, don't do that in production code; particularly if you're in a security-sensitive environment and running as root, the following is more appropriate:
tempfile="$(mktemp file.txt.XXXXXX)"
chown --reference=file.txt -- "$tempfile"
chmod --reference=file.txt -- "$tempfile"
tail -1 file.txt >"$tempfile" && mv -- "$tempfile" file.txt
Another approach (avoiding temporary files, unless <<< implicitly creates them on your platform) is the following:
lastline="$(tail -1 file.txt)"; cat >file.txt <<<"$lastline"
(The above implementation is bash-specific, but works in cases where echo does not -- such as when the last line contains "--version", for instance).
Finally, one can use sponge from moreutils:
tail -1 file.txt | sponge file.txt

You can use sed to delete all lines but the last from a file:
sed -i '$!d' file
-i tells sed to replace the file in place; otherwise, the result would write to STDOUT.
$ is the address that matches the last line of the file.
d is the delete command. In this case, it is negated by !, so all lines not matching the address will be deleted.

Before 'cat' gets executed, Bash has already opened 'file.txt' for writing, clearing out its contents.
In general, don't write to files you're reading from in the same statement. This can be worked around by writing to a different file, as above:$cat file.txt | tail -1 >anotherfile.txt
$mv anotherfile.txt file.txtor by using a utility like sponge from moreutils:$cat file.txt | tail -1 | sponge file.txt
This works because sponge waits until its input stream has ended before opening its output file.

When you submit your command string to bash, it does the following:
Creates an I/O pipe.
Starts "/usr/bin/tail -1", reading from the pipe, and writing to file.txt.
Starts "/usr/bin/cat file.txt", writing to the pipe.
By the time 'cat' starts reading, 'file.txt' has already been truncated by 'tail'.
That's all part of the design of Unix and the shell environment, and goes back all the way to the original Bourne shell. 'Tis a feature, not a bug.

tmp=$(tail -1 file.txt); echo $tmp > file.txt;

This works nicely in a Linux shell:
replace_with_filter() {
local filename="$1"; shift
local dd_output byte_count filter_status dd_status
dd_output=$("$#" <"$filename" | dd conv=notrunc of="$filename" 2>&1; echo "${PIPESTATUS[#]}")
{ read; read; read -r byte_count _; read filter_status dd_status; } <<<"$dd_output"
(( filter_status > 0 )) && return "$filter_status"
(( dd_status > 0 )) && return "$dd_status"
dd bs=1 seek="$byte_count" if=/dev/null of="$filename"
}
replace_with_filter file.txt tail -1
dd's "notrunc" option is used to write the filtered contents back, in place, while dd is needed again (with a byte count) to actually truncate the file. If the new file size is greater or equal to the old file size, the second dd invocation is not necessary.
The advantages of this over a file copy method are: 1) no additional disk space necessary, 2) faster performance on large files, and 3) pure shell (other than dd).

As Lewis Baumstark says, it doesn't like it that you're writing to the same filename.
This is because the shell opens up "file.txt" and truncates it to do the redirection before "cat file.txt" is run. So, you have to
tail -1 file.txt > file2.txt; mv file2.txt file.txt

echo "$(tail -1 file.txt)" > file.txt

Just for this case it's possible to use cat < file.txt | (rm file.txt; tail -1 > file.txt)
That will open "file.txt" just before connection "cat" with subshell in "(...)". "rm file.txt" will remove reference from disk before subshell will open it for write for "tail", but contents will be still available through opened descriptor which is passed to "cat" until it will close stdin. So you'd better be sure that this command will finish or contents of "file.txt" will be lost

It seems to not like the fact you're writing it back to the same filename. If you do the following it works:
$cat file.txt | tail -1 > anotherfile.txt

tail -1 > file.txt will overwrite your file, causing cat to read an empty file because the re-write will happen before any of the commands in your pipeline are executed.

Related

Netcat: cat hello.txt | nc -l 14223 > hello.txt does not transfer things in hello.txt to another machine [duplicate]

Basically I want to take as input text from a file, remove a line from that file, and send the output back to the same file. Something along these lines if that makes it any clearer.
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name > file_name
however, when I do this I end up with a blank file.
Any thoughts?
Use sponge for this kind of tasks. Its part of moreutils.
Try this command:
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | sponge file_name
You cannot do that because bash processes the redirections first, then executes the command. So by the time grep looks at file_name, it is already empty. You can use a temporary file though.
#!/bin/sh
tmpfile=$(mktemp)
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name > ${tmpfile}
cat ${tmpfile} > file_name
rm -f ${tmpfile}
like that, consider using mktemp to create the tmpfile but note that it's not POSIX.
Use sed instead:
sed -i '/seg[0-9]\{1,\}\.[0-9]\{1\}/d' file_name
try this simple one
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | tee file_name
Your file will not be blank this time :) and your output is also printed to your terminal.
You can't use redirection operator (> or >>) to the same file, because it has a higher precedence and it will create/truncate the file before the command is even invoked. To avoid that, you should use appropriate tools such as tee, sponge, sed -i or any other tool which can write results to the file (e.g. sort file -o file).
Basically redirecting input to the same original file doesn't make sense and you should use appropriate in-place editors for that, for example Ex editor (part of Vim):
ex '+g/seg[0-9]\{1,\}\.[0-9]\{1\}/d' -scwq file_name
where:
'+cmd'/-c - run any Ex/Vim command
g/pattern/d - remove lines matching a pattern using global (help :g)
-s - silent mode (man ex)
-c wq - execute :write and :quit commands
You may use sed to achieve the same (as already shown in other answers), however in-place (-i) is non-standard FreeBSD extension (may work differently between Unix/Linux) and basically it's a stream editor, not a file editor. See: Does Ex mode have any practical use?
One liner alternative - set the content of the file as variable:
VAR=`cat file_name`; echo "$VAR"|grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' > file_name
Since this question is the top result in search engines, here's a one-liner based on https://serverfault.com/a/547331 that uses a subshell instead of sponge (which often isn't part of a vanilla install like OS X):
echo "$(grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name)" > file_name
The general case is:
echo "$(cat file_name)" > file_name
Edit, the above solution has some caveats:
printf '%s' <string> should be used instead of echo <string> so that files containing -n don't cause undesired behavior.
Command substitution strips trailing newlines (this is a bug/feature of shells like bash) so we should append a postfix character like x to the output and remove it on the outside via parameter expansion of a temporary variable like ${v%x}.
Using a temporary variable $v stomps the value of any existing variable $v in the current shell environment, so we should nest the entire expression in parentheses to preserve the previous value.
Another bug/feature of shells like bash is that command substitution strips unprintable characters like null from the output. I verified this by calling dd if=/dev/zero bs=1 count=1 >> file_name and viewing it in hex with cat file_name | xxd -p. But echo $(cat file_name) | xxd -p is stripped. So this answer should not be used on binary files or anything using unprintable characters, as Lynch pointed out.
The general solution (albiet slightly slower, more memory intensive and still stripping unprintable characters) is:
(v=$(cat file_name; printf x); printf '%s' ${v%x} > file_name)
Test from https://askubuntu.com/a/752451:
printf "hello\nworld\n" > file_uniquely_named.txt && for ((i=0; i<1000; i++)); do (v=$(cat file_uniquely_named.txt; printf x); printf '%s' ${v%x} > file_uniquely_named.txt); done; cat file_uniquely_named.txt; rm file_uniquely_named.txt
Should print:
hello
world
Whereas calling cat file_uniquely_named.txt > file_uniquely_named.txt in the current shell:
printf "hello\nworld\n" > file_uniquely_named.txt && for ((i=0; i<1000; i++)); do cat file_uniquely_named.txt > file_uniquely_named.txt; done; cat file_uniquely_named.txt; rm file_uniquely_named.txt
Prints an empty string.
I haven't tested this on large files (probably over 2 or 4 GB).
I have borrowed this answer from Hart Simha and kos.
This is very much possible, you just have to make sure that by the time you write the output, you're writing it to a different file. This can be done by removing the file after opening a file descriptor to it, but before writing to it:
exec 3<file ; rm file; COMMAND <&3 >file ; exec 3>&-
Or line by line, to understand it better :
exec 3<file # open a file descriptor reading 'file'
rm file # remove file (but fd3 will still point to the removed file)
COMMAND <&3 >file # run command, with the removed file as input
exec 3>&- # close the file descriptor
It's still a risky thing to do, because if COMMAND fails to run properly, you'll lose the file contents. That can be mitigated by restoring the file if COMMAND returns a non-zero exit code :
exec 3<file ; rm file; COMMAND <&3 >file || cat <&3 >file ; exec 3>&-
We can also define a shell function to make it easier to use :
# Usage: replace FILE COMMAND
replace() { exec 3<$1 ; rm $1; ${#:2} <&3 >$1 || cat <&3 >$1 ; exec 3>&- }
Example :
$ echo aaa > test
$ replace test tr a b
$ cat test
bbb
Also, note that this will keep a full copy of the original file (until the third file descriptor is closed). If you're using Linux, and the file you're processing on is too big to fit twice on the disk, you can check out this script that will pipe the file to the specified command block-by-block while unallocating the already processed blocks. As always, read the warnings in the usage page.
The following will accomplish the same thing that sponge does, without requiring moreutils:
shuf --output=file --random-source=/dev/zero
The --random-source=/dev/zero part tricks shuf into doing its thing without doing any shuffling at all, so it will buffer your input without altering it.
However, it is true that using a temporary file is best, for performance reasons. So, here is a function that I have written that will do that for you in a generalized way:
# Pipes a file into a command, and pipes the output of that command
# back into the same file, ensuring that the file is not truncated.
# Parameters:
# $1: the file.
# $2: the command. (With $3... being its arguments.)
# See https://stackoverflow.com/a/55655338/773113
siphon()
{
local tmp file rc=0
[ "$#" -ge 2 ] || { echo "Usage: siphon filename [command...]" >&2; return 1; }
file="$1"; shift
tmp=$(mktemp -- "$file.XXXXXX") || return
"$#" <"$file" >"$tmp" || rc=$?
mv -- "$tmp" "$file" || rc=$(( rc | $? ))
return "$rc"
}
There's also ed (as an alternative to sed -i):
# cf. http://wiki.bash-hackers.org/howto/edit-ed
printf '%s\n' H 'g/seg[0-9]\{1,\}\.[0-9]\{1\}/d' wq | ed -s file_name
You can use slurp with POSIX Awk:
!/seg[0-9]\{1,\}\.[0-9]\{1\}/ {
q = q ? q RS $0 : $0
}
END {
print q > ARGV[1]
}
Example
This does the trick pretty nicely in most of the cases I faced:
cat <<< "$(do_stuff_with f)" > f
Note that while $(…) strips trailing newlines, <<< ensures a final newline, so generally the result is magically satisfying.
(Look for “Here Strings” in man bash if you want to learn more.)
Full example:
#! /usr/bin/env bash
get_new_content() {
sed 's/Initial/Final/g' "${1:?}"
}
echo 'Initial content.' > f
cat f
cat <<< "$(get_new_content f)" > f
cat f
This does not truncate the file and yields:
Initial content.
Final content.
Note that I used a function here for the sake of clarity and extensibility, but that’s not a requirement.
A common usecase is JSON edition:
echo '{ "a": 12 }' > f
cat f
cat <<< "$(jq '.a = 24' f)" > f
cat f
This yields:
{ "a": 12 }
{
"a": 24
}
Try this
echo -e "AAA\nBBB\nCCC" > testfile
cat testfile
AAA
BBB
CCC
echo "$(grep -v 'AAA' testfile)" > testfile
cat testfile
BBB
CCC
I usually use the tee program to do this:
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | tee file_name
It creates and removes a tempfile by itself.

Bash add line numbers to a file and save the output to the input file itself [duplicate]

Basically I want to take as input text from a file, remove a line from that file, and send the output back to the same file. Something along these lines if that makes it any clearer.
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name > file_name
however, when I do this I end up with a blank file.
Any thoughts?
Use sponge for this kind of tasks. Its part of moreutils.
Try this command:
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | sponge file_name
You cannot do that because bash processes the redirections first, then executes the command. So by the time grep looks at file_name, it is already empty. You can use a temporary file though.
#!/bin/sh
tmpfile=$(mktemp)
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name > ${tmpfile}
cat ${tmpfile} > file_name
rm -f ${tmpfile}
like that, consider using mktemp to create the tmpfile but note that it's not POSIX.
Use sed instead:
sed -i '/seg[0-9]\{1,\}\.[0-9]\{1\}/d' file_name
try this simple one
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | tee file_name
Your file will not be blank this time :) and your output is also printed to your terminal.
You can't use redirection operator (> or >>) to the same file, because it has a higher precedence and it will create/truncate the file before the command is even invoked. To avoid that, you should use appropriate tools such as tee, sponge, sed -i or any other tool which can write results to the file (e.g. sort file -o file).
Basically redirecting input to the same original file doesn't make sense and you should use appropriate in-place editors for that, for example Ex editor (part of Vim):
ex '+g/seg[0-9]\{1,\}\.[0-9]\{1\}/d' -scwq file_name
where:
'+cmd'/-c - run any Ex/Vim command
g/pattern/d - remove lines matching a pattern using global (help :g)
-s - silent mode (man ex)
-c wq - execute :write and :quit commands
You may use sed to achieve the same (as already shown in other answers), however in-place (-i) is non-standard FreeBSD extension (may work differently between Unix/Linux) and basically it's a stream editor, not a file editor. See: Does Ex mode have any practical use?
One liner alternative - set the content of the file as variable:
VAR=`cat file_name`; echo "$VAR"|grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' > file_name
Since this question is the top result in search engines, here's a one-liner based on https://serverfault.com/a/547331 that uses a subshell instead of sponge (which often isn't part of a vanilla install like OS X):
echo "$(grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name)" > file_name
The general case is:
echo "$(cat file_name)" > file_name
Edit, the above solution has some caveats:
printf '%s' <string> should be used instead of echo <string> so that files containing -n don't cause undesired behavior.
Command substitution strips trailing newlines (this is a bug/feature of shells like bash) so we should append a postfix character like x to the output and remove it on the outside via parameter expansion of a temporary variable like ${v%x}.
Using a temporary variable $v stomps the value of any existing variable $v in the current shell environment, so we should nest the entire expression in parentheses to preserve the previous value.
Another bug/feature of shells like bash is that command substitution strips unprintable characters like null from the output. I verified this by calling dd if=/dev/zero bs=1 count=1 >> file_name and viewing it in hex with cat file_name | xxd -p. But echo $(cat file_name) | xxd -p is stripped. So this answer should not be used on binary files or anything using unprintable characters, as Lynch pointed out.
The general solution (albiet slightly slower, more memory intensive and still stripping unprintable characters) is:
(v=$(cat file_name; printf x); printf '%s' ${v%x} > file_name)
Test from https://askubuntu.com/a/752451:
printf "hello\nworld\n" > file_uniquely_named.txt && for ((i=0; i<1000; i++)); do (v=$(cat file_uniquely_named.txt; printf x); printf '%s' ${v%x} > file_uniquely_named.txt); done; cat file_uniquely_named.txt; rm file_uniquely_named.txt
Should print:
hello
world
Whereas calling cat file_uniquely_named.txt > file_uniquely_named.txt in the current shell:
printf "hello\nworld\n" > file_uniquely_named.txt && for ((i=0; i<1000; i++)); do cat file_uniquely_named.txt > file_uniquely_named.txt; done; cat file_uniquely_named.txt; rm file_uniquely_named.txt
Prints an empty string.
I haven't tested this on large files (probably over 2 or 4 GB).
I have borrowed this answer from Hart Simha and kos.
This is very much possible, you just have to make sure that by the time you write the output, you're writing it to a different file. This can be done by removing the file after opening a file descriptor to it, but before writing to it:
exec 3<file ; rm file; COMMAND <&3 >file ; exec 3>&-
Or line by line, to understand it better :
exec 3<file # open a file descriptor reading 'file'
rm file # remove file (but fd3 will still point to the removed file)
COMMAND <&3 >file # run command, with the removed file as input
exec 3>&- # close the file descriptor
It's still a risky thing to do, because if COMMAND fails to run properly, you'll lose the file contents. That can be mitigated by restoring the file if COMMAND returns a non-zero exit code :
exec 3<file ; rm file; COMMAND <&3 >file || cat <&3 >file ; exec 3>&-
We can also define a shell function to make it easier to use :
# Usage: replace FILE COMMAND
replace() { exec 3<$1 ; rm $1; ${#:2} <&3 >$1 || cat <&3 >$1 ; exec 3>&- }
Example :
$ echo aaa > test
$ replace test tr a b
$ cat test
bbb
Also, note that this will keep a full copy of the original file (until the third file descriptor is closed). If you're using Linux, and the file you're processing on is too big to fit twice on the disk, you can check out this script that will pipe the file to the specified command block-by-block while unallocating the already processed blocks. As always, read the warnings in the usage page.
The following will accomplish the same thing that sponge does, without requiring moreutils:
shuf --output=file --random-source=/dev/zero
The --random-source=/dev/zero part tricks shuf into doing its thing without doing any shuffling at all, so it will buffer your input without altering it.
However, it is true that using a temporary file is best, for performance reasons. So, here is a function that I have written that will do that for you in a generalized way:
# Pipes a file into a command, and pipes the output of that command
# back into the same file, ensuring that the file is not truncated.
# Parameters:
# $1: the file.
# $2: the command. (With $3... being its arguments.)
# See https://stackoverflow.com/a/55655338/773113
siphon()
{
local tmp file rc=0
[ "$#" -ge 2 ] || { echo "Usage: siphon filename [command...]" >&2; return 1; }
file="$1"; shift
tmp=$(mktemp -- "$file.XXXXXX") || return
"$#" <"$file" >"$tmp" || rc=$?
mv -- "$tmp" "$file" || rc=$(( rc | $? ))
return "$rc"
}
There's also ed (as an alternative to sed -i):
# cf. http://wiki.bash-hackers.org/howto/edit-ed
printf '%s\n' H 'g/seg[0-9]\{1,\}\.[0-9]\{1\}/d' wq | ed -s file_name
You can use slurp with POSIX Awk:
!/seg[0-9]\{1,\}\.[0-9]\{1\}/ {
q = q ? q RS $0 : $0
}
END {
print q > ARGV[1]
}
Example
This does the trick pretty nicely in most of the cases I faced:
cat <<< "$(do_stuff_with f)" > f
Note that while $(…) strips trailing newlines, <<< ensures a final newline, so generally the result is magically satisfying.
(Look for “Here Strings” in man bash if you want to learn more.)
Full example:
#! /usr/bin/env bash
get_new_content() {
sed 's/Initial/Final/g' "${1:?}"
}
echo 'Initial content.' > f
cat f
cat <<< "$(get_new_content f)" > f
cat f
This does not truncate the file and yields:
Initial content.
Final content.
Note that I used a function here for the sake of clarity and extensibility, but that’s not a requirement.
A common usecase is JSON edition:
echo '{ "a": 12 }' > f
cat f
cat <<< "$(jq '.a = 24' f)" > f
cat f
This yields:
{ "a": 12 }
{
"a": 24
}
Try this
echo -e "AAA\nBBB\nCCC" > testfile
cat testfile
AAA
BBB
CCC
echo "$(grep -v 'AAA' testfile)" > testfile
cat testfile
BBB
CCC
I usually use the tee program to do this:
grep -v 'seg[0-9]\{1,\}\.[0-9]\{1\}' file_name | tee file_name
It creates and removes a tempfile by itself.

Paste files from list of paths into single output file

I have a file containing a list of filenames and their paths, as in the example below:
$ cat ./filelist.txt
/trunk/data/9.20.txt
/trunk/data/9.30.txt
/trunk/data/50.3.txt
/trunk/data/55.100.txt
...
All of these files, named as X.Y.txt, contain a list of double values. For example:
$ cat ./9.20.txt
1.23
1.0e-6
...
I'm trying to paste all of these X.Y.txt files into a single file, but I'm not sure about how to do it. Here's what I've been able to do so far:
cat ./filelist.txt | xargs paste output.txt >> output.txt
Any ideas on how to do it properly?
You could simply cat-append each file into your output file, as in:
$ cat <list_of_paths> | xargs -I {} cat {} >> output.txt
In the above command, each line from your input file will be taken by xargs, and will be used to replace {}, so that each actual command being run is:
$ cat <X.Y.txt> >> output.txt
If all you're looking to do is to read each line from filelist.txt and append the contents of the file that the line refers to to a single output file, use this:
while read -r file; do
[[ -f "$file" ]] && cat "$file"
done < "filelist.txt" > "output.txt"
Edit: If you know your input file to only contain lines that are file paths (and optionally empty lines) - and no comments, etc. - #Rubens' xargs-based solution is the simplest.
The advantage of the while loop is that you can pre-process each line from the input file, as demonstrated by the -f test above, which ensures that the input line refers to an existing file.
More complex but without argument length limit
Well, the limit here is the available computer memory.
The file buffer.txt must not exist already.
touch buffer.txt
cat filelist.txt | xargs -iXX bash -c 'paste buffer.txt XX > output.txt; mv output.txt buffer.txt';
mv buffer.txt output.txt
What this does, by line:
Create a buffer.txt file which must be initially empty. (paste does not seem to like non-existent files. There does not seem to be a way to make it treat such files as empty.)
Run paste buffer.txt XX > output.txt; mv output.txt buffer.txt. XX is replaced by each file in the filelist.txt file. You can't just do paste buffer.txt XX > buffer.txt because buffer.txt will be truncated before paste processes it. Hence the mv rigmarole.
Move buffer.txt to output.txt so that you get your output with the file name you wanted. Also makes it safe to rerun the whole process.
The previous version forced xargs to issue exactly one paste per file you want to paste but for even better performance, you can do this:
touch buffer.txt;
cat filelist.txt | xargs bash -c 'paste buffer.txt "$#" > output.txt; mv output.txt buffer.txt' FILLER;
mv buffer.txt output.txt
Note the presence of "$#" in the command that bash executes. So paste gets the list of arguments from the list of arguments given to bash. The FILLER parameter passed to bash is to give it a value for $0. If it were not there, then the first file that xargs gives to bash would be used for $0 and thus paste would skip some files.
This way, xargs can pass hundreds of parameters to paste with each invocation and thus reduce dramatically the number of times paste is invoked.
Simpler but limited way
This method suffer from limitations on the number of arguments that a shell can pass to a command it executes. However, in many cases it is good enough. I can't count the number of times when I was performing spur-of-the-moment operations where using xargs would have been superfluous. (As part of a long term solution, that's another matter.)
The simpler way is:
paste `cat filelist.txt` > output.txt
It seems you were thinking that xargs would execute paste output.txt >> output.txt multiple times but that's not how it works. The redirection applies to the entire cat ./filelist.txt | xargs paste output.txt (as you initially had it). If you want to have redirection apply to the individual commands launched by xargs you have it launch a shell, like I do above.
#!/usr/bin/env bash
set -x
while read -r
do
echo "${REPLY}" >> output.txt
done < filelist.txt
OR, to get the files directly:-
#!/usr/bin/env bash
set -x
find *.txt -type f | while read $files
do
echo "${files}" >> output.txt
done
A simple while loop should do the trick:
while read line; do
cat ${line} >> output.txt
done < filelist.txt

Remove a specific line from a file WITHOUT using sed or awk

I need to remove a specific line number from a file using a bash script.
I get the line number from the grep command with the -n option.
I cannot use sed for a variety of reasons, least of which is that it is not installed on all the systems this script needs to run on and installing it is not an option.
awk is out of the question because in testing, on different machines with different UNIX/Linux OS's (RHEL, SunOS, Solaris, Ubuntu, etc.), it gives (sometimes wildly) different results on each. So, no awk.
The file in question is just a flat text file, with one record per line, so nothing fancy needs to be done, except for remove the line by number.
If at all possible, I need to avoid doing something like extracting the contents of the file, not including the line I want gone, and then overwriting the original file.
Since you have grep, the obvious thing to do is:
$ grep -v "line to remove" file.txt > /tmp/tmp
$ mv /tmp/tmp file.txt
$
But it sounds like you don't want to use any temporary files - I assume the input file is large and this is an embedded system where memory and storage are in short supply. I think you ideally need a solution that edits the file in place. I think this might be possible with dd but haven't figured it out yet :(
Update - I figured out how to edit the file in place with dd. Also grep, head and cut are needed. If these are not available then they can probably be worked around for the most part:
#!/bin/bash
# get the line number to remove
rline=$(grep -n "$1" "$2" | head -n1 | cut -d: -f1)
# number of bytes before the line to be removed
hbytes=$(head -n$((rline-1)) "$2" | wc -c)
# number of bytes to remove
rbytes=$(grep "$1" "$2" | wc -c)
# original file size
fsize=$(cat "$2" | wc -c)
# dd will start reading the file after the line to be removed
ddskip=$((hbytes + rbytes))
# dd will start writing at the beginning of the line to be removed
ddseek=$hbytes
# dd will move this many bytes
ddcount=$((fsize - hbytes - rbytes))
# the expected new file size
newsize=$((fsize - rbytes))
# move the bytes with dd. strace confirms the file is edited in place
dd bs=1 if="$2" skip=$ddskip seek=$ddseek conv=notrunc count=$ddcount of="$2"
# truncate the remainder bytes of the end of the file
dd bs=1 if="$2" skip=$newsize seek=$newsize count=0 of="$2"
Run it thusly:
$ cat > file.txt
line 1
line two
line 3
$ ./grepremove "tw" file.txt
7+0 records in
7+0 records out
0+0 records in
0+0 records out
$ cat file.txt
line 1
line 3
$
Suffice to say that dd is a very dangerous tool. You can easily unintentionally overwrite files or entire disks. Be very careful!
Try ed. The here-document-based example below deletes line 2 from test.txt
ed -s test.txt <<!
2d
w
!
You can do it without grep using posix shell builtins which should be on any *nix.
while read LINE || [ "$LINE" ];do
case "$LINE" in
*thing_you_are_grepping_for*)continue;;
*)echo "$LINE";;
esac
done <infile >outfile
If n is the line you want to omit:
{
head -n $(( n-1 )) file
tail +$(( n+1 )) file
} > newfile
Given dd is deemed too dangerous for this in-place line removal, we need some other method where we have fairly fine-grained control over the file system calls. My initial urge is to write something in c, but while possible, I think that is a bit of overkill. Instead it is worth looking to common scripting (not shell-scripting) languages, as these typically have fairly low-level file APIs which map to the file syscalls in a fairly straightforward manner. I'm guessing this can be done using python, perl, Tcl or one of many other scripting language that might be available. I'm most familiar with Tcl, so here we go:
#!/bin/sh
# \
exec tclsh "$0" "$#"
package require Tclx
set removeline [lindex $argv 0]
set filename [lindex $argv 1]
set infile [open $filename RDONLY]
for {set lineNumber 1} {$lineNumber < $removeline} {incr lineNumber} {
if {[eof $infile]} {
close $infile
puts "EOF at line $lineNumber"
exit
}
gets $infile line
}
set bytecount [tell $infile]
gets $infile rmline
set outfile [open $filename RDWR]
seek $outfile $bytecount start
while {[gets $infile line] >= 0} {
puts $outfile $line
}
ftruncate -fileid $outfile [tell $outfile]
close $infile
close $outfile
Note on my particular box I have Tcl 8.4, so I had to load the Tclx package in order to use the ftruncate command. In Tcl 8.5, there is chan truncate which could be used instead.
You can pass the line number you want to remove and the filename to this script.
In short, the script does this:
open the file for reading
read the first n-1 lines
get the offset of the start of the next line (line n)
read line n
open the file with a new FD for writing
move the file location of the write FD to the offset of the start of line n
continue reading the remaining lines from the read FD and write them to the write FD until the whole read FD is read
truncate the write FD
The file is edited exactly in place. No temporary files are used.
I'm pretty sure this can be re-written in python or perl or ... if necessary.
Update
Ok, so in-place line removal can be done in almost-pure bash, using similar techniques to the Tcl script above. But the big caveat is that you need to have truncate command available. I do have it on my Ubuntu 12.04 VM, but not on my older Redhat-based box. Here is the script:
#!/bin/bash
n=$1
filename=$2
exec 3<> $filename
exec 4<> $filename
linecount=1
bytecount=0
while IFS="" read -r line <&3 ; do
if [[ $linecount == $n ]]; then
echo "omitting line $linecount: $line"
else
echo "$line" >&4
((bytecount += ${#line} + 1))
fi
((linecount++))
done
exec 3>&-
exec 4>&-
truncate -s $bytecount $filename
#### or if you can tolerate dd, just to do the truncate:
# dd of="$filename" bs=1 seek=$bytecount count=0
#### or if you have python
# python -c "open(\"$filename\", \"ab\").truncate($bytecount)"
I would love to hear of a more generic (bash-only?) way to do the partial truncate at the end and complete this answer. Of course the truncate can be done with dd as well, but I think that was already ruled out for my earlier answer.
And for the record this site lists how to do an in-place file truncation in many different languages - in case any of these could be used in your environment.
If you can indicate under which circumstances on which platform(s) the most obvious Awk script is failing for you, perhaps we can devise a workaround.
awk "NR!=$N" infile >outfile
If course, obtaining $N with grep just to feed it to Awk is pretty bass-ackwards. This will delete the line containing the first occurrence of foo:
awk '/foo/ { if (!p++) next } 1' infile >outfile
Based on Digital Trauma's answere, I found an improvement that just needs grep and echo, but no tempfile:
echo $(grep -v PATTERN file.txt) > file.txt
Depending on the kind of lines your file contains and whether your pattern requires a more complex syntax or not, you can embrace the grep command with double quotes:
echo "$(grep -v PATTERN file.txt)" > file.txt
(useful when deleting from your crontab)

How can i add StdOut to a top of a file (not the bottom)?

I am using bash with linux to accomplish adding content to the top of a file.
Thus far i know that i am able to get this done by using a temporary file. so
i am doing it this way:
tac lines.bar > lines.foo
echo "a" >> lines.foo
tac lines.foo > lines.bar
But is there a better way of doing this without having to write a second file?
echo a | cat - file1 > file2
same as shellter's
and sed in one line.
sed -i -e '1 i<whatever>' file1
this will insert to file1 inplace.
the sed example i referred to
tac is very 'expensive' solution, especially as you need to use it 2x. While you still need to use a tmp file, this will take less time:
edit per notes from KeithThompson, now using '.$$' filename and condtional /bin/mv.
{
echo "a"
cat file1
} > file1.$$ && /bin/mv file1.$$ file1
I hope this helps
Using a named pipe and in place replacement with sed, you could add the output of a command at the top of a file without explicitly needing a temporary file:
mkfifo output
your_command >> output &
sed -i -e '1x' -e '1routput' -e '1d' -e '2{H;x}' file
rm output
What this does is buffering the output of your_command in a named pipe (fifo), and inserts in place this output using the r command of sed. For that, you need to start your_command in the background to avoid blocking on output in the fifo.
Note that the r command output the file at the end of the cycle, so we need to buffer the 1st line of file in the hold space, outputting it with the 2nd line.
I write without explicitly needing a temporary file as sed might use one for itself.

Resources