Combining two files in different folders in Linux

I have two sets of folders that contain files with the same file names and structure. The folder structure is something like this:
\outputfolder\
|---\folder1\
|   |---file1.txt
|   |---file2.txt
|
|---\folder2\
    |---file1.txt
    |---file2.txt
So I need to combine (append) all files with the same name in these folders (file1.txt with file1.txt, and so on) into one file per name inside outputfolder. After that, I also need to create a tar.gz archive of all the combined files.
How can I do this from a Linux command line? The folder names (folder1, folder2, etc.) vary, so they have to be passed in, but the file names should not have to be: the command should automatically combine all files that share a name.
Also, these files have a header line with the column names, so I need to remove it from each file I append.

Here's some code to get you started:
topdir=outputfolder
dir1=folder1
dir2=folder2
for f in "$topdir/$dir1"/*.txt
do
    name=$(basename "$f" .txt)
    outf="$topdir/$name-concat.txt"
    cp "$f" "$outf"                                    # first file, header included
    sed -e '1d' "$topdir/$dir2/$name.txt" >> "$outf"   # second file minus its header
done
tar czf foo.tar.gz "$topdir"/*-concat.txt
Edit: added the part removing the header of the 2nd file.
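The script above is hardwired to two folders. A minimal sketch generalizing it to any number of folders, assuming every folder sits directly under the top directory and every file has exactly one header line. The function name `combine_txt` and its argument order are hypothetical, and it needs bash 4+ for the associative array:

```shell
#!/bin/bash
# combine_txt TOPDIR ARCHIVE DIR...   (hypothetical helper, bash 4+)
# For every *.txt name found in any TOPDIR/DIR, writes TOPDIR/NAME-concat.txt:
# the first copy whole, later copies without their header line.
# Then archives all the combined files into ARCHIVE.
combine_txt() {
    local topdir=$1 archive=$2
    shift 2
    local dir f name outf
    local -A seen=()
    local -a outputs=()
    for dir in "$@"; do
        for f in "$topdir/$dir"/*.txt; do
            [ -f "$f" ] || continue              # glob matched nothing
            name=$(basename "$f" .txt)
            outf="$topdir/$name-concat.txt"
            if [ -n "${seen[$outf]:-}" ]; then
                tail -n +2 "$f" >> "$outf"       # later copy: drop line 1 (header)
            else
                cp "$f" "$outf"                  # first copy: keep the header
                seen[$outf]=1
                outputs+=("$outf")
            fi
        done
    done
    (( ${#outputs[@]} )) || return 1             # nothing to archive
    tar czf "$archive" "${outputs[@]}"
}
```

Usage would be along the lines of `combine_txt outputfolder foo.tar.gz folder1 folder2 folder3`. Tracking which outputs were created in this run (rather than testing whether the file exists) means a rerun overwrites old results instead of appending to them.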

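find . -name 'file1.txt' -exec cat {} + > file1_concat.txt
(Using -exec ... + instead of piping into xargs keeps file names that contain spaces intact. Note that this keeps the header of every file.)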
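Plain cat keeps every file's header. If each file has exactly one header line, awk can drop all but the first one, because FNR restarts at 1 for each input file while NR keeps counting across all files. A sketch, with the helper name `concat_keep_first_header` made up for illustration:

```shell
#!/bin/bash
# concat_keep_first_header OUT FILE...   (hypothetical helper)
# Writes to OUT the header (line 1) of the first FILE, then every line
# except line 1 of each FILE.
concat_keep_first_header() {
    local out=$1
    shift
    awk 'FNR == 1 && NR != 1 { next } { print }' "$@" > "$out"
}
```

For the layout in the question: `concat_keep_first_header file1_concat.txt folder1/file1.txt folder2/file1.txt`.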

This will work even if there are some files only in folder1 and some files only in folder2:
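concat_files() {
    local dir file this
    local -a outputs=()
    for dir in "$@"; do
        for file in "$dir"/*; do
            [[ -f "$file" ]] || continue            # skip subdirectories
            this=$(basename "$file")
            if [[ -f "$this" ]]; then
                sed 1d "$file" >> "$this"           # later copies: drop the header
            else
                cat "$file" > "$this"               # first copy: keep the header
                outputs+=("$this")
            fi
        done
    done
    tar zcvf allfiles.tar.gz "${outputs[@]}"        # archive only the combined files
}
concat_files folder1 folder2
It also works with more than two folders.
Each result keeps the header once, taken from the first file with that name. The combined files are written to the current directory, so run it from a directory that does not already contain files with the same names.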

Have you tried the cat command (concatenation)?
cat file1 file2 >> outputfile
Might want to chuck this in a small bash script to go through the directory. This should start you off.
Best of luck.
Leo
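Putting cat into a small loop, as that answer suggests, could look like this sketch. It assumes each file in the first folder has its counterpart under the same name in the second, and that keeping both headers is acceptable (plain cat does not strip them). The function name `cat_pairs` and the `combined_` prefix are made up for illustration:

```shell
#!/bin/bash
# cat_pairs DIR1 DIR2 OUTDIR   (hypothetical helper)
# Writes OUTDIR/combined_NAME = DIR1/NAME followed by DIR2/NAME.
cat_pairs() {
    local d1=$1 d2=$2 outdir=$3 f name
    mkdir -p "$outdir"
    for f in "$d1"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        if [ -f "$d2/$name" ]; then
            cat "$f" "$d2/$name" > "$outdir/combined_$name"
        else
            echo "no match for $name in $d2" >&2   # report and skip
        fi
    done
}
```

It uses a single > so that rerunning it overwrites the combined files rather than appending to them again.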

Related

Splitting a large directory into smaller ones in Linux

I have a large directory named application_pdf that contains 93k files. I want to split it into 3 smaller subdirectories (in a different location than the original large directory) containing around 30k files each.
Can this be done directly from the command line?
Thanks!
Using bash:
x=("path/to/dir1" "path/to/dir2" "path/to/dir3")
c=0
for f in *
do
mv "$f" "${x[c]}"
c=$(( (c+1)%3 ))
done
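The loop above deals the files out round-robin and assumes the three target directories already exist. If contiguous batches are preferred, a sketch that computes each file's batch from a running counter; the function name `split_into` and its batch-size argument are hypothetical:

```shell
#!/bin/bash
# split_into SRC SIZE DEST...   (hypothetical helper)
# Moves the regular files of SRC into DEST1, DEST2, ... in batches of SIZE
# (in glob order); any files beyond the last full batch go to the last DEST.
split_into() {
    local src=$1 size=$2
    shift 2
    local -a dests=("$@")
    local f d idx=0 last=$(( $# - 1 ))
    mkdir -p -- "${dests[@]}"
    for f in "$src"/*; do
        [ -f "$f" ] || continue                  # skip subdirectories
        d=$(( idx / size ))                      # batch number of this file
        if (( d > last )); then d=$last; fi      # leftovers go to the last dir
        mv -- "$f" "${dests[d]}/"
        idx=$(( idx + 1 ))
    done
}
```

For 93k files, something like `split_into application_pdf 31000 /path/to/dir1 /path/to/dir2 /path/to/dir3` (paths hypothetical) would give three batches of 31000. The glob is expanded inside bash and mv is run once per file, so the argument-length limit is never hit.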
If you have the rename command from Perl, you could try it like this:
rename --dry-run -pe 'my @d=("dirA","dirB","dirC"); $_=$d[$N%3] . "/$_"' *.pdf
In case you are not that familiar with the syntax:
-p says to create output directories, à la mkdir -p
-e says to execute the following Perl snippet
$d[$N%3] selects one of the directories in array @d as a function of the serially incremented counter $N provided to the snippet by rename
The output value is passed back to rename by setting $_
Remove the --dry-run if it looks good. Please run on a small directory with a copy of 8-10 files first, and make a backup before trying on all your 93k files.
Test
touch {0,1,2,3,4,5,6}.pdf
rename --dry-run -pe 'my @d=("dirA","dirB","dirC"); $_=$d[$N%3] . "/$_"' *.pdf
'0.pdf' would be renamed to 'dirB/0.pdf'
'1.pdf' would be renamed to 'dirC/1.pdf'
'2.pdf' would be renamed to 'dirA/2.pdf'
'3.pdf' would be renamed to 'dirB/3.pdf'
'4.pdf' would be renamed to 'dirC/4.pdf'
'5.pdf' would be renamed to 'dirA/5.pdf'
'6.pdf' would be renamed to 'dirB/6.pdf'
More for my own reference, but if you don't have the Perl rename command, you could do it just in Perl:
perl -e 'use File::Copy qw(move);my @d=("dirA","dirB","dirC"); my $N=0; @files = glob("*.pdf"); foreach $f (@files){my $t=$d[$N++%3] . "/$f"; print "Moving $f to $t\n"; move $f,$t}'
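Something like this might work (bash):
files=(originPath/*.pdf)    # expanded by bash itself, so 93k names do not hit the argument-length limit
for f in "${files[@]:0:30000}"; do
    mv -- "$f" destinationPath/
done
Rerun it with a different destinationPath for each batch; the files already moved no longer match the glob.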

Linux - How to rename files in batch while removing pattern in filename

I have hundreds of files in a directory that all have the suffix "_aac" in their names. For example: weoi32rijwef_aac.mp4
How can I rename all of these files in a batch process to remove the "_aac" from their filenames?
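A much simpler way, with the Perl rename:
cd /to/that/directory
rename 's/_aac//' *
With the util-linux rename that some distributions ship instead, the equivalent is rename _aac '' *_aac*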
Something like this?:
for i in *_aac.*
do
    mv -- "$i" "${i/_aac./.}"    # replace the first "_aac." with "."
done
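Both answers silently overwrite a file if, say, both a_aac.mp4 and a.mp4 exist. A sketch of a more cautious loop that uses bash's `${var/pattern/replacement}` expansion and skips any rename whose target already exists; the helper name `strip_aac` and the `DRY_RUN` switch are made-up conveniences:

```shell
#!/bin/bash
# strip_aac [DIR]   (hypothetical helper)
# Renames DIR/*_aac.* to the same name without "_aac", skipping a file if the
# new name already exists. Set DRY_RUN=1 to only print what would happen.
strip_aac() {
    local dir=${1:-.} f name new
    for f in "$dir"/*_aac.*; do
        [ -e "$f" ] || continue                 # glob matched nothing
        name=${f##*/}                           # file name without directory
        new="$dir/${name/_aac./.}"              # first "_aac." becomes "."
        if [ -e "$new" ]; then
            echo "skip: $new exists" >&2
        elif [ -n "${DRY_RUN:-}" ]; then
            echo "would rename $f -> $new"
        else
            mv -- "$f" "$new"
        fi
    done
}
```

Running `DRY_RUN=1 strip_aac /to/that/directory` first shows the planned renames without touching anything.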

Copy text from multiple files, same names to different path in bash (linux)

I need help copying content from various files to others (same name and format, different path).
For example, $HOME/initial/baby.desktop has text that I need to write into $HOME/scripts/baby.desktop. This is very simple for a single file, but I have 2500 files in $HOME/initial/ and the same number in $HOME/scripts/ with corresponding names (same names and format). I want to append (copy) the content of each file in path A to the end of the file with the same name in path B, without erasing the existing content of the file in path B.
In other words, the content of $HOME/initial/*.desktop should be appended to $HOME/scripts/*.desktop. I tried the following, but it doesn't work:
cd $HOME/initial/
for i in $( ls *.desktop ); do egrep "Icon" $i >> $HOME/scripts/$i; done
First, I would back up $HOME/initial and $HOME/scripts, because there is a lot of scope for people misunderstanding your question. Like this:
cd $HOME
tar -cvf initial.tar initial
tar -cvf scripts.tar scripts
That will put all the files in $HOME/initial into a single tarfile called initial.tar and all the files in $HOME/scripts into a single tarfile called scripts.tar.
Now for your question... in general, if you want to put the contents of FileB onto the end of FileA, the command is
cat FileB >> FileA
Note the DOUBLE ">>" which means "append" rather than single ">" which means overwrite.
So, I think you want to do this for a single file first:
cd "$HOME/initial"
cat baby.desktop >> "$HOME/scripts/baby.desktop"
Check that this has worked and then, if you are happy with it, run the same command inside a loop:
cd "$HOME/initial"
for SOURCE in *.desktop
do
DESTINATION="$HOME/scripts/$SOURCE"
echo Appending "$SOURCE" to "$DESTINATION"
#cat "$SOURCE" >> "$DESTINATION"
done
When the output looks correct, remove the "#" at the start of the penultimate line and run it again.
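Because >> creates the destination when it is missing, a wrong path or an unmatched name silently produces new files in $HOME/scripts. A sketch that only appends when the same-named file already exists there; the function name `append_matching` is hypothetical:

```shell
#!/bin/bash
# append_matching SRC_DIR DEST_DIR   (hypothetical helper)
# Appends each SRC_DIR/*.desktop to DEST_DIR/<same name>, but only if that
# destination file already exists; otherwise it reports and skips it.
append_matching() {
    local src=$1 dest=$2 f name
    for f in "$src"/*.desktop; do
        [ -f "$f" ] || continue
        name=${f##*/}
        if [ -f "$dest/$name" ]; then
            cat -- "$f" >> "$dest/$name"
        else
            echo "no $name in $dest, skipped" >&2
        fi
    done
}
```

For the question's layout: `append_matching "$HOME/initial" "$HOME/scripts"`.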
I solved it. In case anyone wants to know how, it is very simple:
Using sed
I only needed the matching line "Icon=/usr/share/some_picture.png" from $HOME/initial/example.desktop appended to the file with the same name, $HOME/scripts/example.desktop, but I had a lot of .desktop files (2500 files):
cd "$HOME/initial"
for i in *.desktop; do
    grep -q "Icon=" "$i" && sed -n '/Icon=/p' "$i" >> "$HOME/scripts/$i"
done
_________
If you only need to copy the whole file to the file with the same name (this still only touches files that contain "Icon="):
Using cat
cd "$HOME/initial"
for i in *.desktop; do
    grep -q "Icon=" "$i" && cat "$i" >> "$HOME/scripts/$i"
done

Move files and rename - one-liner

I'm encountering many files with the same content and the same name on some of my servers. I need to quarantine these files for analysis, so I can't just remove the duplicates. The OS is Linux (CentOS and Ubuntu).
I list the file names and locations in a text file.
Then I use a for loop to move the files to quarantine:
for file in $(cat bad-stuff.txt); do mv $file /quarantine ;done
The problem is that they have the same file name and I just need to add something unique to the filename to get it to save properly. I'm sure it's something simple but I'm not good with regex. Thanks for the help.
Since you're using Linux, you can take advantage of GNU mv's --backup.
while read -r file
do
mv --backup=numbered "$file" "/quarantine"
done < "bad-stuff.txt"
Here's an example that shows how it works:
$ cat bad-stuff.txt
./c/foo
./d/foo
./a/foo
./b/foo
$ while read -r file; do mv --backup=numbered "$file" "./quarantine"; done < "bad-stuff.txt"
$ ls quarantine/
foo foo.~1~ foo.~2~ foo.~3~
$
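I'd use this:
for file in $(cat bad-stuff.txt); do mv "$file" "/quarantine/$(basename "$file").$(date -u +%s%N)"; done
Each file gets a timestamp in nanoseconds (GNU date's %N) appended to its name. The basename is needed because the entries in the list include their directories.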
You can build the new file name from the directory and the file name together, by changing the destination argument in your original loop:
for ...; do mv "$file" "/quarantine/$(echo "$file" | sed 's:/:_:g')"; done
Replace the _ with a character that does not otherwise appear in your paths, so that two different paths cannot map to the same name.
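The same idea without sed, reading the list line by line so that names with spaces survive. This is a sketch: it strips a leading "./" or "/" before flattening (otherwise ./c/foo would become the hidden file ._c_foo), and the helper name `quarantine_flat` and the "__" separator are arbitrary choices:

```shell
#!/bin/bash
# quarantine_flat LIST DEST   (hypothetical helper)
# Moves every path listed in LIST (one per line) into DEST, naming it after
# its full path with "/" replaced by "__", e.g. ./c/foo -> c__foo.
quarantine_flat() {
    local list=$1 dest=$2 file flat
    while IFS= read -r file; do
        [ -n "$file" ] || continue      # ignore blank lines
        flat=${file#./}                 # drop a leading "./"
        flat=${flat#/}                  # and a leading "/" of absolute paths
        flat=${flat//\//__}             # every remaining "/" becomes "__"
        mv -- "$file" "$dest/$flat"
    done < "$list"
}
```

With the question's list this would be `quarantine_flat bad-stuff.txt /quarantine`.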

Script for renaming files with logic

Someone very kindly helped me get started on a mass-rename script for PDF files.
As you can see below, I need to add a bit of logic to stop this from happening. Could I add something like a unique number to duplicate file names?
rename 's/^(.{5}).*(\..*)$/$1$2/' *
rename -n 's/^(.{5}).*(\..*)$/$1$2/' *
Annexes 123114345234525.pdf renamed as Annex.pdf
Annexes 123114432452352.pdf renamed as Annex.pdf
Hope this makes sense?
Thanks
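for i in *.pdf
do
    x=''                            # counter, empty for the first candidate
    j="${i:0:2}"                    # new base name: first two characters
    e="${i##*.}"                    # extension
    [ "$i" = "$j.$e" ] && continue  # already has the short name
    while [ -e "$j$x.$e" ]          # name taken? try the next number
    do
        ((x++))                     # inc counter ('' counts as 0)
    done
    mv -- "$i" "$j$x.$e"            # rename, keeping the extension
done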
before
$ ls
he.pdf hejjj.pdf hello.pdf wo.pdf workd.pdf world.pdf
after
$ ls
he.pdf he1.pdf he2.pdf wo.pdf wo1.pdf wo2.pdf
This should check whether there will be any duplicates:
rename -n [...] | grep -o ' renamed as .*' | sort | uniq -d
If you get any output of the form renamed as [...], then you have a collision.
Of course, this won't work in a couple of corner cases, for example if your file names contain newlines or the literal string "renamed as".
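A collision check that does not parse rename's output, sketched for bash 4+ with an associative array: it computes each target name with the same five-characters-plus-extension rule and prints any target that more than one file would get. The helper name `find_collisions` and the hard-coded rule are assumptions for illustration:

```shell
#!/bin/bash
# find_collisions FILE...   (hypothetical helper; requires bash 4+)
# Prints each target name that two or more FILEs would be renamed to under
# the rule s/^(.{5}).*(\..*)$/$1$2/.
find_collisions() {
    local -A count=()
    local re='^(.{5}).*(\..*)$' f target
    for f in "$@"; do
        if [[ $f =~ $re ]]; then
            target=${BASH_REMATCH[1]}${BASH_REMATCH[2]}
        else
            target=$f                            # rule does not match: name unchanged
        fi
        count[$target]=$(( ${count[$target]:-0} + 1 ))
    done
    for target in "${!count[@]}"; do
        if [ "${count[$target]}" -gt 1 ]; then
            printf '%s\n' "$target"
        fi
    done
}
```

Run as `find_collisions *.pdf`; empty output means no two files would end up with the same name.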
As noted in my answer on your previous question:
for f in *.pdf; do
    tmp=$(echo "$f" | sed -r 's/^(.{5}).*(\..*)$/\1\2/')   # sed uses \1 \2, not Perl's $1 $2
    mv -b -- "./$f" "./$tmp"
done
With -b, GNU mv backs up an existing destination file (by default by appending ~ to its name) instead of silently overwriting it. A better alternative would be this script:
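#!/bin/bash
# Backs up each file to /tmp/backup.tar, then renames it to its first five
# characters plus extension, appending -1, -2, ... if that name is taken.
re='^(.{5}).*(\..*)$'
for f in "$@"; do
    dir=$(dirname "$f")
    name=$(basename "$f")
    [[ $name =~ $re ]] || continue          # name too short or no extension
    base=${BASH_REMATCH[1]}
    ext=${BASH_REMATCH[2]}
    [ "$base$ext" = "$name" ] && continue   # already has the short name
    tar -rvf /tmp/backup.tar "$f"
    tmp="$dir/$base$ext"
    i=1
    while [ -e "$tmp" ]; do
        tmp="$dir/$base-$i$ext"
        i=$((i+1))
    done
    mv -- "$f" "$tmp"
done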
Run the script like this:
find . -type f -name '*.pdf' -exec thescript '{}' +
The find command gives you lots of options for specifying which files to run on, works recursively, and passes the file names to the script (-type f keeps it from being run on directories, including . itself). The script backs each file up with tar (uncompressed) and then renames it, appending -1, -2, and so on when the short name is already taken.
This isn't the best script: it finds a free name by probing one candidate at a time in a loop.
