Paste files from list of paths into single output file - linux

I have a file containing a list of filenames and their paths, as in the example below:
$ cat ./filelist.txt
/trunk/data/9.20.txt
/trunk/data/9.30.txt
/trunk/data/50.3.txt
/trunk/data/55.100.txt
...
All of these files, named as X.Y.txt, contain a list of double values. For example:
$ cat ./9.20.txt
1.23
1.0e-6
...
I'm trying to paste all of these X.Y.txt files into a single file, but I'm not sure about how to do it. Here's what I've been able to do so far:
cat ./filelist.txt | xargs paste output.txt >> output.txt
Any ideas on how to do it properly?

You could simply cat-append each file into your output file, as in:
$ cat <list_of_paths> | xargs -I {} cat {} >> output.txt
In the above command, each line from your input file will be taken by xargs, and will be used to replace {}, so that each actual command being run is:
$ cat <X.Y.txt> >> output.txt
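Here is a minimal sketch of a batched alternative. It assumes xargs supports -0, which the GNU and BSD versions do. Converting the list's newlines to NUL bytes stops xargs from interpreting quotes, backslashes or blanks in the paths. Without -I, xargs also passes many paths to each cat instead of one. The scratch directory and file names below are made up for the demo.

```shell
#!/usr/bin/env bash
# Demo data in a scratch directory (illustrative names, one containing a space and a quote).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1.23\n1.0e-6\n' > '9.20.txt'
printf '4.5\n' > "it's 9.30.txt"
printf '%s\n' "$workdir/9.20.txt" "$workdir/it's 9.30.txt" > filelist.txt

# NUL-delimit the list, then let xargs batch the paths into as few cat calls as possible.
tr '\n' '\0' < filelist.txt | xargs -0 cat -- > output.txt
```

Plain xargs would reject the second path with an "unmatched single quote" error. Note that an empty line in the list would become an empty argument here.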

If all you're looking to do is to read each line from filelist.txt and append the contents of the file it names to a single output file, use this:
while read -r file; do
[[ -f "$file" ]] && cat "$file"
done < "filelist.txt" > "output.txt"
Edit: If you know that your input file contains only file paths (and optionally empty lines), with no comments etc., @Rubens' xargs-based solution is the simplest.
The advantage of the while loop is that you can pre-process each line from the input file, as demonstrated by the -f test above, which ensures that the input line refers to an existing file.
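The pre-processing idea can be taken further. This is a minimal sketch that assumes, beyond the original, that lines starting with # are comments. It skips blank lines and comment lines, as well as paths that are not regular files. The scratch directory and file names are illustrative.

```shell
#!/usr/bin/env bash
# Demo data in a scratch directory (illustrative names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1.23\n1.0e-6\n' > 9.20.txt
printf '4.5\n' > 9.30.txt
# Assumed list format: '#' starts a comment line, blank lines are allowed,
# and one entry points at a file that does not exist.
printf '%s\n' '# data files' "$workdir/9.20.txt" '' "$workdir/missing.txt" "$workdir/9.30.txt" > filelist.txt

while IFS= read -r file; do
  case $file in
    ''|'#'*) continue ;;       # skip blank lines and comments
  esac
  if [[ -f $file ]]; then      # only existing regular files
    cat -- "$file"
  fi
done < filelist.txt > output.txt
```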

More complex but without argument length limit
Well, the limit here is the available computer memory.
The file buffer.txt must not exist already.
touch buffer.txt
cat filelist.txt | xargs -iXX bash -c 'paste buffer.txt XX > output.txt; mv output.txt buffer.txt';
mv buffer.txt output.txt
What this does, by line:
Create a buffer.txt file which must be initially empty. (paste does not seem to like non-existent files. There does not seem to be a way to make it treat such files as empty.)
Run paste buffer.txt XX > output.txt; mv output.txt buffer.txt. XX is replaced by each file in the filelist.txt file. You can't just do paste buffer.txt XX > buffer.txt because buffer.txt will be truncated before paste processes it. Hence the mv rigmarole.
Move buffer.txt to output.txt so that you get your output with the file name you wanted. Also makes it safe to rerun the whole process.
The previous version forced xargs to issue exactly one paste per file you want to paste, but for even better performance, you can do this:
touch buffer.txt;
cat filelist.txt | xargs bash -c 'paste buffer.txt "$@" > output.txt; mv output.txt buffer.txt' FILLER;
mv buffer.txt output.txt
Note the presence of "$@" in the command that bash executes. So paste gets the list of arguments from the list of arguments given to bash. The FILLER parameter passed to bash gives it a value for $0. If it were not there, the first file that xargs gives to bash would be used for $0, and paste would skip some files.
This way, xargs can pass hundreds of parameters to paste with each invocation and thus reduce dramatically the number of times paste is invoked.
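Here is a self-contained run of the batched version in a scratch directory, as a sketch. It NUL-delimits the list with tr so that xargs -0 (GNU/BSD) keeps paths with spaces intact. It also writes the intermediate result to tmp.txt rather than output.txt. One detail is worth knowing: because buffer.txt starts empty, paste emits an empty first field, so every line of the result begins with a tab. The sketch strips that column with cut -f2-.

```shell
#!/usr/bin/env bash
# Scratch directory with three small illustrative input files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1\n2\n' > a.txt
printf '3\n4\n' > b.txt
printf '5\n6\n' > c.txt
printf '%s\n' a.txt b.txt c.txt > filelist.txt

: > buffer.txt    # start from an empty buffer, as in the answer
# "$@" expands to the whole batch; FILLER occupies $0 so no file name is swallowed.
tr '\n' '\0' < filelist.txt |
  xargs -0 bash -c 'paste buffer.txt "$@" > tmp.txt && mv tmp.txt buffer.txt' FILLER
# Drop the empty leading column contributed by the initially empty buffer.
cut -f2- buffer.txt > output.txt
rm buffer.txt
```

The result is one tab-separated column per input file: `1 3 5` on the first line and `2 4 6` on the second.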
Simpler but limited way
This method suffers from limitations on the number of arguments that a shell can pass to a command it executes. However, in many cases it is good enough. I can't count the number of times when I was performing spur-of-the-moment operations where using xargs would have been superfluous. (As part of a long-term solution, that's another matter.)
The simpler way is:
paste `cat filelist.txt` > output.txt
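The unquoted backtick form splits paths that contain spaces. A bash-only variant avoids this by reading the list into an array first. This sketch assumes bash 4 or later for mapfile. It is still subject to the same argument-length limit.

```shell
#!/usr/bin/env bash
# Scratch directory with illustrative files, one name containing a space.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '1\n2\n' > 'first file.txt'
printf '3\n4\n' > second.txt
printf '%s\n' 'first file.txt' second.txt > filelist.txt

# One array element per line of the list (-t drops the trailing newline).
mapfile -t files < filelist.txt
# The quoted expansion passes each path as a single argument, spaces included.
paste "${files[@]}" > output.txt
```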
It seems you were thinking that xargs would execute paste output.txt >> output.txt multiple times, but that's not how it works. The redirection applies to the entire cat ./filelist.txt | xargs paste output.txt (as you initially had it). If you want the redirection to apply to the individual commands launched by xargs, you have to have xargs launch a shell, like I do above.
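Here is a small sketch of the difference, run in a scratch directory. In the first pipeline, the > belongs to the whole pipeline. In the second, each sh started by xargs performs its own redirection. The item names a and b are arbitrary.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The redirection belongs to the whole pipeline: one file, opened once.
printf '%s\n' a b | xargs echo > whole.out

# Each shell started by xargs applies its own redirection: one output file per item.
# -n 1 passes one item per shell; "sh" fills $0 so the item lands in $1.
printf '%s\n' a b | xargs -n 1 sh -c 'echo "item $1" > "$1.out"' sh
```

Afterwards, whole.out contains `a b`, while a.out and b.out each contain their own line.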

#!/usr/bin/env bash
set -x
while read -r
do
cat -- "${REPLY}" >> output.txt
done < filelist.txt
Or, to get the files directly:
#!/usr/bin/env bash
set -x
find *.txt -type f | while read -r file
do
cat -- "${file}" >> output.txt
done

A simple while loop should do the trick:
while read -r line; do
cat -- "$line" >> output.txt
done < filelist.txt

Related

Copy a txt file twice to a different file using bash

I am trying to cat a file.txt and loop it twice through the whole content and copy it to a new file file_new.txt. The bash command I am using is as follows:
for i in {1..3}; do cat file.txt > file_new.txt; done
The above command is just giving me the same file contents as file.txt. Hence file_new.txt is also of the same size (1 GB).
Basically, if file.txt is a 1GB file, then I want file_new.txt to be a 2GB file, double the contents of file.txt. Please, can someone help here? Thank you.
Simply apply the redirection to the for loop as a whole:
for i in {1..3}; do cat file.txt; done > file_new.txt
The advantage of this over using >> (aside from not having to open and close the file multiple times) is that you needn't ensure that a preexisting output file is truncated first.
Note that the generalization of this approach is to use a group command ({ ...; ...; }) to apply redirections to multiple commands; e.g.:
$ { echo hi; echo there; } > out.txt; cat out.txt
hi
there
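Here is the loop-level redirection applied to the question's actual goal of two copies, with a tiny stand-in for the 1GB file. This is a sketch run in a scratch directory.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'line 1\nline 2\n' > file.txt

# The redirection applies to the loop as a whole: file_new.txt is truncated once,
# then receives the output of every iteration. {1..2} gives two copies.
for i in {1..2}; do cat file.txt; done > file_new.txt
```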
Given that whole files are being output, the cost of invoking cat for each repetition will probably not matter that much, but here's a robust way to invoke cat only once:[1]
# Create an array of repetitions of filename 'file.txt' as needed.
files=(); for ((i=0; i<3; ++i)); do files[i]='file.txt'; done
# Pass all repetitions *at once* as arguments to `cat`.
cat "${files[@]}" > file_new.txt
[1] Note that, hypothetically, you could run into your platform's command-line length limit, as reported by getconf ARG_MAX. Given that on Linux that limit is typically 2,097,152 bytes (2 MiB), that's not likely, though.
You could use the append operator, >>, instead of >. Then adjust your loop count as needed to get the output size desired.
You should adjust your code so it is as follows:
for i in {1..3}; do cat file.txt >> file_new.txt; done
The >> operator appends data to a file rather than overwriting it, as > does.
If file.txt is a 1GB file, then running
cat file.txt > file_new.txt
cat file.txt >> file_new.txt
works like this: the > operator creates (or truncates) file_new.txt and writes 1GB into it, and the >> operator then appends another 1GB, giving 2GB.
for i in {1..3}; do cat file.txt >> file_new.txt; done
This command makes file_new.txt 3GB (assuming it was empty or absent beforehand), because for i in {1..3} runs three times.
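Because >> keeps whatever file_new.txt already holds, rerunning the loop grows the file again. The following sketch truncates the output first with the : > file idiom, so every run gives the same result. The scratch directory and contents are illustrative.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'abc\n' > file.txt
printf 'stale\n' > file_new.txt    # simulate output left over from an earlier run

: > file_new.txt                   # truncate (or create) the output file first
for i in {1..2}; do cat file.txt >> file_new.txt; done
```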
As others have mentioned, you can use >> to append. But, you could also just invoke cat once and have it read the file 3 times. For instance:
n=3; cat $( yes file.txt | sed ${n}q ) > file_new.txt
Note that this solution exhibits a common anti-pattern and fails to properly quote the arguments, which will cause issues if the filename contains whitespace. See mklement's solution for a more robust solution.

How to append contents of multiple files into one file

I want to copy the contents of five files to one file as is. I tried doing it using cp for each file. But that overwrites the contents copied from the previous file. I also tried
paste -d "\n" 1.txt 0.txt
and it did not work.
I want my script to add a newline at the end of each text file.
E.g., for files 1.txt, 2.txt and 3.txt, put the contents of 1, 2 and 3 in 0.txt.
How do I do it?
You need the cat (short for concatenate) command, with shell redirection (>) into your output file
cat 1.txt 2.txt 3.txt > 0.txt
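The question also asks for a newline at the end of each file. cat copies bytes unchanged, so if a file's last line has no trailing newline, it runs into the next file's first line. One option, sketched here under the assumption that the inputs are plain text, is awk 1. awk prints every record followed by a newline, which terminates an unterminated last line and leaves the other lines as they are.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'one' > 1.txt        # no trailing newline
printf 'two\n' > 2.txt
printf 'three' > 3.txt      # no trailing newline

# The pattern "1" is always true, so awk prints each record followed by a newline.
awk 1 1.txt 2.txt 3.txt > 0.txt
```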
Another option, for those of you who still stumble upon this post like I did, is to use find -exec:
find . -type f -name '*.txt' -exec cat {} + >> output.file
In my case, I needed a more robust option that would look through multiple subdirectories so I chose to use find. Breaking it down:
find .
Look within the current working directory.
-type f
Only interested in files, not directories, etc.
-name '*.txt'
Whittle down the result set by name
-exec cat {} +
Execute the cat command on the results. "+" passes as many results as fit on one command line to each cat, so usually only one instance of cat is spawned (thx @gniourf_gniourf).
>> output.file
As explained in other answers, append the cat-ed contents to the end of an output file.
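find returns files in directory order, which is not guaranteed to be sorted. If the order matters, one option is to sort the NUL-separated names before handing them to cat. This sketch assumes GNU find, sort and xargs for -print0, -z and -0.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p sub
printf 'b\n' > b.txt
printf 'a\n' > sub/a.txt
printf 'c\n' > c.txt

# NUL-separated names survive spaces and newlines; sort -z orders them by path.
find . -type f -name '*.txt' -print0 | LC_ALL=C sort -z | xargs -0 cat -- > output.file
```

With these files, the concatenation order is ./b.txt, ./c.txt, then ./sub/a.txt.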
If your files share a certain extension, do something like this:
cat /path/to/files/*.txt >> finalout.txt
If all your files are named similarly you could simply do:
cat *.log >> output.log
If all your files are in single directory you can simply do
cat * > 0.txt
Files 1.txt,2.txt, .. will go into 0.txt
for i in {1..3}; do cat "$i.txt" >> 0.txt; done
I found this page because I needed to join 952 files together into one. I found this to work much better if you have many files. This will do a loop for however many numbers you need and cat each one using >> to append onto the end of 0.txt.
Edit:
As brought up in the comments:
cat {1..3}.txt >> 0.txt
or
cat {0..3}.txt >> all.txt
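Brace expansion also controls the order. {1..12}.txt expands numerically, whereas a glob sorts by name, so 10.txt comes before 2.txt. Here is a small demonstration in a scratch directory, where each file just contains its own number.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for i in {1..12}; do echo "$i" > "$i.txt"; done

cat {1..12}.txt > numeric.out    # 1, 2, ..., 12 in numeric order
cat [0-9]*.txt > glob.out        # sorted by name, so 10.txt comes before 2.txt
```

For numbered files such as the 952 above, the brace form keeps them in numeric order.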
Another option is sed:
sed r 1.txt 2.txt 3.txt > merge.txt
Or...
sed h 1.txt 2.txt 3.txt > merge.txt
Or...
sed -n p 1.txt 2.txt 3.txt > merge.txt # -n is mandatory here
Or without redirection ...
sed wmerge.txt 1.txt 2.txt 3.txt
Note that the last command also writes merge.txt (not wmerge.txt!). You can use w"merge.txt" to avoid confusion with the file name, and -n for silent output.
Of course, you can also shorten the file list with wildcards. For instance, in case of numbered files as in the above examples, you can specify the range with braces in this way:
sed -n w"merge.txt" {1..3}.txt
If your files contain headers and you want to remove them in the output file, you can use:
for f in *.txt; do sed '2,$!d' "$f" >> 0.out; done
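Here is an equivalent sketch without the loop. awk's FNR restarts at 1 for every input file, so the condition FNR > 1 drops the first line of each file in a single pass. The file names and contents are illustrative.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'header\n1\n2\n' > a.txt
printf 'header\n3\n' > b.txt

# Print every line except the first line of each input file.
awk 'FNR > 1' *.txt > 0.out
```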
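All of the (text) files into one:
find . -type f ! -name outfile -print0 | xargs -0 cat > outfile
xargs makes the file names printed by find the arguments of cat. -print0 and -0 keep names with spaces intact, -type f skips directories, and ! -name outfile keeps the output file out of its own input.
find has many more options, like -name '*.txt'.
You should check them out if you want to use it in your pipeline.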
If the original file contains non-printable characters, they will be lost when using the cat command. Using 'cat -v', the non-printables will be converted to visible character strings, but the output file would still not contain the actual non-printables characters in the original file. With a small number of files, an alternative might be to open the first file in an editor (e.g. vim) that handles non-printing characters. Then maneuver to the bottom of the file and enter ":r second_file_name". That will pull in the second file, including non-printing characters. The same could be done for additional files. When all files have been read in, enter ":w". The end result is that the first file will now contain what it did originally, plus the content of the files that were read in.
Send multiple files to one file (textall.txt):
cat *.txt > textall.txt
If you want to append contents of 3 files into one file, then the following command will be a good choice:
cat file1 file2 file3 | tee -a file4 > /dev/null
It will combine the contents of all files into file4, throwing console output to /dev/null.

How to insert a text at the beginning of a file?

So far I've been able to find out how to add a line at the beginning of a file but that's not exactly what I want. I'll show it with an example:
File content
some text at the beginning
Result
<added text> some text at the beginning
It's similar but I don't want to create any new line with it...
I would like to do this with sed if possible.
sed can operate on an address:
$ sed -i '1s/^/<added text> /' file
What is this magical 1s you see on every answer here? Line addressing!
Want to add <added text> on the first 10 lines?
$ sed -i '1,10s/^/<added text> /' file
Or you can use Command Grouping:
$ { echo -n '<added text> '; cat file; } >file.new
$ mv file{.new,}
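Here is a slightly more defensive sketch of the grouping approach. It writes to a temporary file created by mktemp next to the original, and replaces the original only if both commands succeeded. mktemp creates the file with mode 0600, so restore the original permissions afterwards if they matter. printf '%s' is used instead of echo -n because echo's handling of -n varies between shells.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'some text at the beginning\nsecond line\n' > file

# Write the new content to a temporary file next to the original,
# and replace the original only if both commands succeeded.
tmp=$(mktemp file.XXXXXX) || exit 1
{ printf '%s' '<added text> '; cat file; } > "$tmp" && mv -- "$tmp" file
```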
If you want to add a whole line at the beginning of a file, you need to add \n at the end of the string in the top solution above.
That solution adds the string, but does not follow it with a newline. With GNU sed:
sed -i '1s/^/your text\n/' file
If the file is only one line, you can use:
sed 's/^/insert this /' oldfile > newfile
If it's more than one line, use one of:
sed '1s/^/insert this /' oldfile > newfile
sed '1,1s/^/insert this /' oldfile > newfile
I've included the latter so that you know how to do ranges of lines. Both of these "replace" the start line marker on their affected lines with the text you want to insert. You can also (assuming your sed is modern enough) use:
sed -i 'whatever command you choose' filename
to do in-place editing.
Use command substitution:
echo "$(echo -n 'hello'; cat filename)" > filename
Unfortunately, command substitution removes trailing newlines from the file's content, and echo adds back only one. To keep them, one can use:
echo -n "hello" | cat - filename > /tmp/filename.tmp
mv /tmp/filename.tmp filename
Neither grouping nor command substitution is needed.
To insert just a newline:
sed '1i\\'
You can use cat -
printf '%s' "some text at the beginning" | cat - filename
To add a line to the top of the file:
sed -i '1iText to add' file
my two cents:
sed -i '1i /path/of/file.sh' filename
This works even if the string contains forward slashes ("/").
To insert the text followed by a newline:
sed -i '1s/^/your text\n/' file
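Note that on OS X, sed -i <pattern> file does not work as intended, because BSD sed takes the argument after -i as the backup extension. If you provide an extension, as in sed -i .old <pattern> file, then file is modified in place while file.old is created. You can then delete file.old in your script. Passing an empty extension, sed -i '' <pattern> file, edits in place without keeping a backup.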
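Here is one way to write the same in-place edit for both variants, as a sketch. It assumes that only GNU sed accepts --version, since BSD/macOS sed rejects it as an unknown option. Other sed implementations are not handled.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'some text at the beginning\n' > file

if sed --version >/dev/null 2>&1; then
  sed -i '1s/^/<added text> /' file       # GNU sed: -i takes an optional attached suffix
else
  sed -i '' '1s/^/<added text> /' file    # BSD/macOS sed: -i needs a (possibly empty) suffix
fi
```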
There is a very easy way:
echo "your header" > headerFile.txt
cat yourFile >> headerFile.txt
PROBLEM: tag a file, at the top of the file, with the base name of the parent directory.
I.e., for
/mnt/Vancouver/Programming/file1
tag the top of file1 with Programming.
SOLUTION 1 -- non-empty files:
bn=${PWD##*/} ## bn: basename
sed -i '1s/^/'"$bn"'\n/' <file>
1s places the text at line 1 of the file.
SOLUTION 2 -- empty or non-empty files:
The sed command, above, fails on empty files. Here is a solution, based on https://superuser.com/questions/246837/how-do-i-add-text-to-the-beginning-of-a-file-in-bash/246841#246841
printf "${PWD##*/}\n" | cat - <file> > temp && mv -f temp <file>
Note that the - in the cat command is required (reads standard input: see man cat for more information). Here, I believe, it's needed to take the output of the printf statement (to STDIN), and cat that and the file to temp ... See also the explanation at the bottom of http://www.linfo.org/cat.html.
I also added -f to the mv command, to avoid being asked for confirmations when overwriting files.
To loop over every file in the current directory (this does not recurse into subdirectories):
for file in *; do printf '%s\n' "${PWD##*/}" | cat - "$file" > temp && mv -f temp "$file"; done
Quoting "$file" keeps this working for names that contain spaces. To recurse into subdirectories, use something like find . -type f ... instead.
ADDENDUM: Re: my last comment, this script will allow you to recurse over directories with spaces in the paths:
#!/bin/bash
## https://stackoverflow.com/questions/4638874/how-to-loop-through-a-directory-recursively-to-delete-files-with-certain-extensi
## To allow spaces in filenames,
## at the top of the script include: IFS=$'\n'; set -f
## at the end of the script include: unset IFS; set +f
IFS=$'\n'; set -f
# ----------------------------------------------------------------------------
# SET PATHS:
IN="/mnt/Vancouver/Programming/data/claws-test/corpus test/"
# https://superuser.com/questions/716001/how-can-i-get-files-with-numeric-names-using-ls-command
# FILES=$(find $IN -type f -regex ".*/[0-9]*") ## recursive; numeric filenames only
FILES=$(find $IN -type f -regex ".*/[0-9 ]*") ## recursive; numeric filenames only (may include spaces)
# echo '$FILES:' ## single-quoted, (literally) prints: $FILES:
# echo "$FILES" ## double-quoted, prints path/, filename (one per line)
# ----------------------------------------------------------------------------
# MAIN LOOP:
for f in $FILES
do
# Tag top of file with basename of current dir:
printf "[top] Tag: ${PWD##*/}\n\n" | cat - $f > temp && mv -f temp $f
# Tag bottom of file with basename of current dir:
printf "\n[bottom] Tag: ${PWD##*/}\n" >> $f
done
unset IFS; set +f
Just for fun, here is a solution using ed which does not have the problem of not working on an empty file. You can put it into a shell script just like any other answer to this question.
ed Test <<EOF
a
.
0i
<added text>
.
1,+1 j
$ g/^$/d
wq
EOF
The above script adds the text to insert as the first line, and then joins the first and second lines. To avoid ed exiting on error with an invalid join, it first creates a blank line at the end of the file and removes it later if it still exists.
Limitations: This script does not work if <added text> is exactly equal to a single period.
echo -n "text to insert " ;tac filename.txt| tac > newfilename.txt
The first tac pipes the file backwards (last line first) so the "text to insert" appears last. The 2nd tac wraps it once again so the inserted line is at the beginning and the original file is in its original order.
The simplest solution I found is:
echo -n "<text to add>" | cat - myFile.txt | tee myFile.txt
Notes:
Remove | tee myFile.txt if you don't want to change the file contents.
Remove the -n parameter if you want to append a full line.
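Add &> /dev/null to the end if you don't want to see the output (the generated file).
Caution: tee truncates myFile.txt as soon as it starts, which can happen before cat has read the file, so this can silently lose the original contents. Writing to a temporary file and renaming it avoids that race.
This can be used to prepend a shebang to the file. Example: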
# make it executable (use u+x to allow only current user)
chmod +x cropImage.ts
# prepend the shebang
echo '#''!'/usr/bin/env ts-node | cat - cropImage.ts | tee cropImage.ts &> /dev/null
# execute it
./cropImage.ts myImage.png
Another solution with shell functions. Add to your init rc/env file:
addtail () { find . -type f ! -path "./.git/*" -exec sh -c "echo $@ >> {}" \; }
addhead () { find . -type f ! -path "./.git/*" -exec sh -c "sed -i '1s/^/$@\n/' {}" \; }
Usage:
addhead "string to add at the beginning of file"
addtail "string to add at the end of file"
With the echo approach, if you are on macOS/BSD like me, lose the -n switch that other people suggest. And I like to define a variable for the text.
So it would be like this:
Header="my complex header that may have difficult chars \"like these quotes\" and line breaks \n\n "
{ echo "$Header"; cat "old.txt"; } > "new.txt"
mv new.txt old.txt
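bash's builtin echo does not expand \n unless it is given -e or the xpg_echo option is set. Whether the \n\n in Header turns into blank lines therefore depends on the shell and its settings. printf '%b\n' expands backslash escapes in its argument consistently. Here is a sketch with an illustrative header and file.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'body line\n' > old.txt
Header='my header with "quotes"\nand a second line'

# %b expands backslash escapes such as \n in the argument; the format adds a final newline.
{ printf '%b\n' "$Header"; cat old.txt; } > new.txt && mv new.txt old.txt
```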
TL;DR:
Consider using ex. Since you want to change the front of a given line, the syntax is basically the same as what you might find for sed, but in-place editing is built in.
I cannot imagine an environment where you have sed but not ex/vi, unless it is an MS Windows box with some special "sed.exe", maybe.
sed & grep sort of evolved from ed, the ancestor of ex/vi, so it might be better to say sed syntax is the same as ex's.
You can change the line number to something besides 1, or search for a line and change that one.
source=myFile.txt
Front="This goes IN FRONT "
man true > $source
ex -s ${source} <<EOF
1s/^/$Front/
wq
EOF
$ head -n 3 $source
This goes IN FRONT TRUE(1) User Commands TRUE(1)
NAME
Long version: I recommend ex (or ed if you are one of the cool kids).
I like ex because it is portable, extremely powerful, allows me to write in-place, and/or make backups all without needing GNU (or even BSD) extensions.
Additionally, if you know the ex way, then you know how to do it in vi - and probably vim if that is your jam.
Notice that EOF is not quoted here. The "i"nsert text comes from a backquoted echo, which the shell must expand inside the here-document:
str="+++ TOP +++" && ex -s <<EOF
r!man true
1i
`echo "$str"`
.
"0r!echo "${str}"
wq! true.txt
EOF
0r!echo "${str}" might also be used as shorthand for :0read! or :0r! that you have likely used in vi mode (it is literally the same thing) but the : is optional here and some implementations do not support "r"ead address of zero.
"r"eading directly to the special line #0 (or from line 1) would automatically push everything "down", and then you just :wq to save your changes.
$ head -n 3 true.txt | nl -ba
1 +++ TOP +++
2 TRUE(1) User Commands TRUE(1)
3
Also, most classic sed implementations do not have extensions (like \U&) that ex should have by default.
cat concatenates multiple files, and <() (process substitution) makes a command's output available as a file name. Combining these two, we can insert lines at the beginning and end of a file:
cat <(echo "line before the file") file.txt <(echo "line after the file")

How can i add StdOut to a top of a file (not the bottom)?

I am using bash with linux to accomplish adding content to the top of a file.
Thus far I know that I am able to get this done by using a temporary file, so I am doing it this way:
tac lines.bar > lines.foo
echo "a" >> lines.foo
tac lines.foo > lines.bar
But is there a better way of doing this without having to write a second file?
echo a | cat - file1 > file2
same as shellter's
And with sed, in one line:
sed -i -e '1 i<whatever>' file1
This inserts into file1 in place.
tac is a very 'expensive' solution, especially as you need to use it twice. While you still need a temporary file, this will take less time:
Edit per notes from KeithThompson: now using a '.$$' filename and a conditional /bin/mv.
{
echo "a"
cat file1
} > file1.$$ && /bin/mv file1.$$ file1
I hope this helps
Using a named pipe and in place replacement with sed, you could add the output of a command at the top of a file without explicitly needing a temporary file:
mkfifo output
your_command >> output &
sed -i -e '1x' -e '1routput' -e '1d' -e '2{H;x}' file
rm output
This buffers the output of your_command in a named pipe (fifo) and inserts that output in place using the r command of sed. For that, you need to start your_command in the background so that it doesn't block while writing to the fifo.
Note that the r command outputs the file at the end of the cycle, so we need to buffer the first line of file in the hold space and output it with the second line.
I wrote "without explicitly needing a temporary file" because sed -i might use one itself.

Problem with Bash output redirection [duplicate]

This question already has answers here:
Why doesnt "tail" work to truncate log files?
(6 answers)
Closed 1 year ago.
I was trying to remove all the lines of a file except the last line but the following command did not work, although file.txt is not empty.
$ cat file.txt | tail -1 > file.txt
$ cat file.txt
Why is it so?
Redirecting from a file through a pipeline back to the same file is unsafe. If the shell truncates file.txt while setting up the last stage of the pipeline, before cat in the first stage has read it, you end up with empty output.
Do the following instead:
tail -1 file.txt >file.txt.new && mv file.txt.new file.txt
...well, actually, don't do that in production code; particularly if you're in a security-sensitive environment and running as root, the following is more appropriate:
tempfile="$(mktemp file.txt.XXXXXX)"
chown --reference=file.txt -- "$tempfile"
chmod --reference=file.txt -- "$tempfile"
tail -1 file.txt >"$tempfile" && mv -- "$tempfile" file.txt
Another approach (avoiding temporary files, unless <<< implicitly creates them on your platform) is the following:
lastline="$(tail -1 file.txt)"; cat >file.txt <<<"$lastline"
(The above implementation is bash-specific, but works in cases where echo does not -- such as when the last line contains "--version", for instance).
Finally, one can use sponge from moreutils:
tail -1 file.txt | sponge file.txt
You can use sed to delete all lines but the last from a file:
sed -i '$!d' file
-i tells sed to replace the file in place; otherwise, the result would write to STDOUT.
$ is the address that matches the last line of the file.
d is the delete command. In this case, it is negated by !, so all lines not matching the address will be deleted.
Before 'cat' gets executed, Bash has already opened 'file.txt' for writing, clearing out its contents.
In general, don't write to files you're reading from in the same statement. This can be worked around by writing to a different file, as above:
$ cat file.txt | tail -1 >anotherfile.txt
$ mv anotherfile.txt file.txt
or by using a utility like sponge from moreutils:
$ cat file.txt | tail -1 | sponge file.txt
This works because sponge waits until its input stream has ended before opening its output file.
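Where moreutils is not installed, a small shell function can approximate the same "wait for the end of input before opening the output" behaviour. This is a sketch, and soak is a hypothetical name, not part of moreutils. Writing back with cat rather than mv keeps the original file's inode, owner and permissions.

```shell
#!/usr/bin/env bash
# soak: hypothetical helper that buffers all of stdin in a temp file, then writes it to "$1".
soak() {
  local tmp status
  tmp=$(mktemp) || return
  # Read all of stdin first; the target is only opened after the input reaches EOF.
  cat > "$tmp" && cat -- "$tmp" > "$1"
  status=$?
  rm -f -- "$tmp"
  return "$status"
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'first\nsecond\nlast\n' > file.txt
tail -n 1 file.txt | soak file.txt
```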
When you submit your command string to bash, it does the following:
Creates an I/O pipe.
Starts "/usr/bin/tail -1", reading from the pipe, and writing to file.txt.
Starts "/usr/bin/cat file.txt", writing to the pipe.
By the time 'cat' starts reading, 'file.txt' has already been truncated by the redirection set up for 'tail'.
That's all part of the design of Unix and the shell environment, and goes back all the way to the original Bourne shell. 'Tis a feature, not a bug.
tmp=$(tail -1 file.txt); echo "$tmp" > file.txt
This works nicely in a Linux shell:
replace_with_filter() {
local filename="$1"; shift
local dd_output byte_count filter_status dd_status
dd_output=$("$#" <"$filename" | dd conv=notrunc of="$filename" 2>&1; echo "${PIPESTATUS[#]}")
{ read; read; read -r byte_count _; read filter_status dd_status; } <<<"$dd_output"
(( filter_status > 0 )) && return "$filter_status"
(( dd_status > 0 )) && return "$dd_status"
dd bs=1 seek="$byte_count" if=/dev/null of="$filename"
}
replace_with_filter file.txt tail -1
dd's "notrunc" option is used to write the filtered contents back, in place, while dd is needed again (with a byte count) to actually truncate the file. If the new file size is greater or equal to the old file size, the second dd invocation is not necessary.
The advantages of this over a file copy method are: 1) no additional disk space necessary, 2) faster performance on large files, and 3) pure shell (other than dd).
As Lewis Baumstark says, it doesn't like it that you're writing to the same filename.
This is because the shell opens up "file.txt" and truncates it to do the redirection before "cat file.txt" is run. So, you have to
tail -1 file.txt > file2.txt; mv file2.txt file.txt
echo "$(tail -1 file.txt)" > file.txt
Just for this case it's possible to use cat < file.txt | (rm file.txt; tail -1 > file.txt)
That opens "file.txt" for cat just before connecting "cat" to the subshell in "(...)". "rm file.txt" removes the file's directory entry before the subshell opens a new file.txt for writing by "tail". The old contents are still available through the open descriptor passed to "cat" until it closes its stdin. So you'd better be sure that this command finishes, or the contents of "file.txt" will be lost.
It seems to not like the fact you're writing it back to the same filename. If you do the following it works:
$cat file.txt | tail -1 > anotherfile.txt
tail -1 > file.txt will overwrite your file, causing cat to read an empty file because the re-write will happen before any of the commands in your pipeline are executed.
