Best way to overwrite a file with itself - Linux

What is the fastest / most elegant way to read out a file and then write the content to that same file?
On Linux, this is not always the same as 'touching' a file, e.g. if the file represents some hardware device.
One possibility that worked for me is echo $(cat $file) > $file but I wondered if this is best practice.

You cannot generically do this in one step because writing to the file can interfere with reading from it. If you try this on a regular file, it's likely the > redirection will blank the file out before anything is even read.
Your safest bet is to split it into two steps. You can hide it behind a function call if you want.
rewrite() {
    local file=$1
    local contents=$(< "$file")       # read the whole file into memory first
    cat <<< "$contents" > "$file"     # only then truncate and write it back
}
...
rewrite /sys/misc/whatever

I understand that you are not interested in a simple touch $file but rather in some form of transformation of the file.
Since transforming the file in place can leave it temporarily corrupted, it is NOT good to do it in place while others may be reading it at the same moment.
Because of that, a temporary file is your best choice; only once you have finished converting the file should you replace the original, in a single step:
cat "$file" | process >"$file.$$"
mv "$file.$$" "$file"
This is the safest sequence of operations.
Of course I have omitted operations like 'check if the file really exists' and so on, leaving that to you.
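As a rough sketch of that pattern wrapped in a function (the process step is a stand-in for whatever transformation you need, and mktemp is just one way to get a safe temporary name):
safe_rewrite() {
    local file=$1 tmp
    tmp=$(mktemp "$file.XXXXXX") || return                      # temporary file next to the original
    process < "$file" > "$tmp" || { rm -f "$tmp"; return 1; }   # transform into the temp file
    mv "$tmp" "$file"                                           # replace the original in one step
}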

The sponge utility in the moreutils package allows this
~$ cat file
foo
~$ cat file | sponge file
~$ cat file
foo
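sponge is useful here because it soaks up all of its input before opening the output file, so the same file can safely appear on both sides. A small sketch with an actual transformation (sort is just an illustrative example):
~$ sort file | sponge file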

Related

How to copy the contents of a file to the end of the same file?

I need to copy the contents of a file to the end of the same file.
I wrote the following code.
#!/bin/bash
cat file.txt >> file1.txt
cat file1.txt >> file.txt
rm file1.txt
But it creates an additional file. How can this be done without creating an additional file?
You could do something like:
input=file1.txt
dd bs="$(wc -c < "$input")" count=1 if="$input" >> "$input"
but... why? There's nothing wrong with using auxiliary files, and if you remove one shortly after it is created there is almost zero chance it will ever cause any actual disk operations. Just create the temp file and remove it. Note that this has a race condition: the computation of the file size (and wc -c is a terrible way to do that!) may produce a value that is wrong if any other process is mutating the file, but that issue is inherent in your problem description.
You can create a variable that holds the contents of your file and then append it to the end :)
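A minimal sketch of that idea (note that command substitution strips trailing newlines and cannot hold NUL bytes, so this only suits ordinary text files):
contents=$(< file1.txt)                    # read the whole file into a variable
printf '%s\n' "$contents" >> file1.txt     # append it to the same file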
Or use sponge - that's exactly what it's there for.
cat file1.txt file1.txt | sponge file1.txt

Copy a txt file twice to a different file using bash

I am trying to cat file.txt, loop over its whole content twice, and copy it to a new file, file_new.txt. The bash command I am using is as follows:
for i in {1..3}; do cat file.txt > file_new.txt; done
The above command is just giving me the same file contents as file.txt. Hence file_new.txt is also of the same size (1 GB).
Basically, if file.txt is a 1GB file, then I want file_new.txt to be a 2GB file, double the contents of file.txt. Please, can someone help here? Thank you.
Simply apply the redirection to the for loop as a whole:
for i in {1..3}; do cat file.txt; done > file_new.txt
The advantage of this over using >> (aside from not having to open and close the file multiple times) is that you needn't ensure that a preexisting output file is truncated first.
Note that the generalization of this approach is to use a group command ({ ...; ...; }) to apply redirections to multiple commands; e.g.:
$ { echo hi; echo there; } > out.txt; cat out.txt
hi
there
Given that whole files are being output, the cost of invoking cat for each repetition will probably not matter that much, but here's a robust way to invoke cat only once:[1]
# Create an array of repetitions of filename 'file' as needed.
files=(); for ((i=0; i<3; ++i)); do files[i]='file'; done
# Pass all repetitions *at once* as arguments to `cat`.
cat "${files[#]}" > file_new.txt
[1] Note that, hypothetically, you could run into your platform's command-line length limit, as reported by getconf ARG_MAX - given that on Linux that limit is 2,097,152 bytes (2MB) that's not likely, though.
You could use the append operator, >>, instead of >. Then adjust your loop count as needed to get the output size desired.
You should adjust your code so it is as follows:
for i in {1..3}; do cat file.txt >> file_new.txt; done
The >> operator appends data to a file rather than overwriting it (as > does).
If file.txt is a 1 GB file:
cat file.txt > file_new.txt
cat file.txt >> file_new.txt
The > operator creates file_new.txt (1 GB); the >> operator then appends to it, making it 2 GB.
for i in {1..3}; do cat file.txt >> file_new.txt; done
This command will make file_new.txt 3 GB, because for i in {1..3} runs three times.
As others have mentioned, you can use >> to append. But, you could also just invoke cat once and have it read the file 3 times. For instance:
n=3; cat $( yes file.txt | sed ${n}q ) > file_new.txt
Note that this solution exhibits a common anti-pattern and fails to properly quote the arguments, which will cause issues if the filename contains whitespace. See mklement's solution for a more robust solution.

Sort files according to their filetype

After an HD problem and some recovery work, I have a bunch of files with names like "f1234", "f1235", etc.
My goal is to sort these files according to their filetype. For example, I want to move all the PDF files into the "pdfs" directory.
For one file, I can do "file f1234", and if it's a PDF, "mv f1234 pdfs/". But I have thousands of files... Can you help me with a bash or zsh command to sort all the PDFs in one pass? Thanks
The hard part here is reliably turning the output of file into a directory name. I think probably the best candidate for that is the mime-type of the file rather than the human readable output of file. I'd use something like:
mkdir sorted
for f in f*
do
    d=$(file -b --mime-type "$f" | tr / -)   # e.g. application/pdf becomes application-pdf
    mkdir -p "sorted/$d"
    mv "$f" "sorted/$d/"
done
Obviously I'd test that out a bit before running it on your files, but something pretty close to that should work.
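If you only care about the PDFs from the question, here is a narrower sketch of the same idea (it relies on file reporting application/pdf as the MIME type for PDF files):
mkdir -p pdfs
for f in f*
do
    [[ $(file -b --mime-type "$f") == application/pdf ]] && mv "$f" pdfs/
done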

Append data without using cat command

I was wondering whether it is possible to append data to a file without using the cat command.
I've considered using sed to append data, but as far as I know sed only operates after loading the full data into memory. Please do correct me if I'm wrong on this.
If you want to append data to a file, you can simply use the append I/O-redirection >>. For instance:
echo "first line" > file
echo "next line" >> file
Or you could append an entire file
echo "$(<otherfile)" >> file
This command is, however, not advisable, since it first loads the entire file into memory.
A better way is to use tee:
tee < otherfile >> file
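Another option along the same lines is tee's -a (append) flag, which appends to the named file directly; the duplicate copy that tee writes to stdout can simply be discarded:
tee -a file < otherfile > /dev/null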
Instead of cat, you can also use the echo command to do the same.
And of course, the >> operator does it.

How to make many edits to files, without writing to the hard drive very often, in BASH?

I often need to make many edits to text files. The files are typically 20 MB in size and require ~500,000 individual edits, all which must be made in a very specific order. Here is a simple example of a script I might need to use:
while read -r line
do
    ...
    (20-100 lines of BASH commands preparing $a and $b)
    ...
    sed -i "s/$a/$b/g" ./editfile.txt
    ...
done < ./readfile.txt
As many other lines of code appear before and after the sed script, it seems the only option for editing the file is sed with the -i option. Many have warned me against using sed -i, as that makes too many writes to the file. Recently, I had to replace two computers, as the hard drives stopped working after running the scripts. I need to find a solution that does not damage my computer's hardware.
Is there some way to send files somewhere else, such as storing the whole file into a BASH variable, or into RAM, where I sed, grep, and awk, can make the edits without making millions of writes to the hard drive?
Don't use sed -i once per transform. A far better approach -- leaving you with more control -- is to construct a pipeline (if you can't use a single sed with multiple -e arguments to perform multiple operations within a single instance), and redirect to or from disk at only the beginning and end.
This can even be done recursively, if you use a FD other than stdin for reading from your file:
editstep() {
    read -u 3 -r                 # read a line from readfile (FD 3) into REPLY
    if [[ $REPLY ]]; then        # we read something new from readfile
        sed ... | editstep       # perform the edits, then a recursive call!
    else
        cat
    fi
}
editstep <editfile.txt >editfile.txt.new 3<readfile.txt
Better than that, though, is to consolidate to a single sed instance.
sed_args=( )
while read -r line; do
    sed_args+=( -e "s/in/out/" )
done <readfile.txt
sed -i "${sed_args[@]}" editfile.txt
...or, for edit lists too long to pass in on the command line:
sed_args=( )
while read -r line; do
    sed_args+=( "s/in/out/" )
done <readfile.txt
sed -i -f <(printf '%s\n' "${sed_args[@]}") editfile.txt
(Please don't read the above as an endorsement of sed -i, which is a non-POSIX extension and has its own set of problems; the POSIX-specified editor intended for in-place rather than streaming operations is ex, not sed).
Even better? Don't use sed at all, but keep all the operations inline in native bash.
Consider the following:
content=$(<editfile.txt)
while IFS= read -r; do
    # put your own logic here to set `in` and `out`
    content=${content//$in/$out}
done <readfile.txt
printf '%s\n' "$content" >editfile.new
One important caveat: This approach treats in as a literal string, not a regular expression. Depending on the edits you're actually making, this may actually improve correctness over the original code... but in any event, it's worth being aware of.
Another caveat: Reading the file's contents into a bash string is not necessarily a lossless operation; expect content to be truncated at the first NUL byte (if any exist), and a trailing newline to be added at the end of the file if none existed before.
Simple: instead of trying too many tricks, you can simply copy all your files and directories to /dev/shm.
This is a RAM-backed filesystem (tmpfs), so edits made there do not touch the hard drive. When you are done editing, copy everything back to the original destination. Do not forget to run sync after you are done :-)
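A minimal sketch of that workflow (the scratch directory name and run_edits.sh are purely illustrative placeholders for your own script):
workdir=$(mktemp -d /dev/shm/edits.XXXXXX)   # scratch space in RAM (tmpfs)
cp editfile.txt readfile.txt "$workdir"
(cd "$workdir" && ./run_edits.sh)            # hypothetical: run your editing script against the RAM copies
cp "$workdir/editfile.txt" .                 # write the final result back to disk once
sync
rm -rf "$workdir"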
