Renaming long file names in bulk - linux

I have file names like:
5_END_1033_ACAGTG_L002_R1_001.fastq.gz
5_END_1033_ACAGTG_L002_R2_001.fastq.gz
40_END_251_GTGAAA_L002_R1_001.fastq.gz
40_END_251_GTGAAA_L002_R2_001.fastq.gz
I want something like:
END_1033_R1.fastq.gz
END_1033_R2.fastq.gz
END_251_R1.fastq.gz
END_251_R2.fastq.gz
Are there good ways to rename these files in Linux?

You could try using a loop to extract the important part of the filename:
for file in ./*.gz; do newname=$(echo $file | sed -re 's/^([^ACAGTG]+).*(R[1-3]).*/\1\2\.fastq\.gz/g'); echo $newname; done
This will simply give you a new list of filenames. You can then move them:
for file in ./*.gz; do newname=$(echo "$file" | sed -re 's/^([^ACAGTG]+).*(R[1-3]).*/\1\2\.fastq\.gz/g'); mv "$file" "$newname"; done
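To break this down a little:
loop over the *.gz files
create a variable which strips out the unnecessary content from the name
move the file to that new name
Note that [^ACAGTG] is a character class (any character other than A, C, G or T), not the literal barcode, so the pattern only works here because nothing before the barcode contains one of those letters. I expect there are better ways to do this, but it's what I came up with off the top of my head.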
Test:
$ ls
40_END_251_GTGAAA_L002_R1_001.fastq.gz 40_END_251_GTGAAA_L002_R2_001.fastq.gz 5_END_1033_ACAGTG_L002_R1_001.fastq.gz 5_END_1033_ACAGTG_L002_R2_001.fastq.gz
$ for file in ./*.gz; do newname=$(echo $file | sed -re 's/^([^ACAGTG]+).*(R[1-3]).*/\1\2\.fastq\.gz/g'); echo $newname; done
./40_END_251_R1.fastq.gz
./40_END_251_R2.fastq.gz
./5_END_1033_R1.fastq.gz
./5_END_1033_R2.fastq.gz
$ for file in ./*.gz; do newname=$(echo $file | sed -re 's/^([^ACAGTG]+).*(R[1-3]).*/\1\2\.fastq\.gz/g'); mv $file $newname; done
$ ls
40_END_251_R1.fastq.gz 40_END_251_R2.fastq.gz 5_END_1033_R1.fastq.gz 5_END_1033_R2.fastq.gz
Note I'm doing this in bash 4.4.5
EDIT
Given I'm not entirely sure which columns in the name are the most important, awk might work better:
for file in ./*.gz; do newname=$(echo $file | awk -F'_' '{print $2 "_" $3 "_" $6}' -); echo $newname; done
This will split the filename by _ and allow you to reference the columns you want using $X:
for file in ./*.gz; do newname=$(echo "$file" | awk -F'_' '{print $2 "_" $3 "_" $6}' -); mv "$file" "${newname}.fastq.gz"; done
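The same field-based idea can be sketched in pure bash, assuming every name has the seven underscore-separated fields shown in the question. The function name short_name is hypothetical, and the loop only prints the mv commands, so nothing is renamed until the echo is removed.

```shell
#!/bin/bash
# Hypothetical helper: build the short name from fields 2, 3 and 6.
short_name() {
    local base=${1##*/}          # drop a leading ./ or directory part
    local f1 f2 f3 f4 f5 f6 rest
    # split on "_": 5 END 1033 ACAGTG L002 R1 001.fastq.gz
    IFS=_ read -r f1 f2 f3 f4 f5 f6 rest <<< "$base"
    printf '%s_%s_%s.fastq.gz\n' "$f2" "$f3" "$f6"
}

# Dry run: print the commands instead of executing them.
for file in ./*_R[12]_*.fastq.gz; do
    [ -e "$file" ] || continue   # the glob matched nothing
    echo mv -- "$file" "./$(short_name "$file")"
done
```

Setting IFS only on the read line keeps the change local to that one command, so the rest of the script still splits on whitespace as usual.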

Related

Created directory with for loop in bash

I have these files. Imagine that each "test" represents the name of one server:
test10.txt
test11.txt
test12.txt
test13.txt
test14.txt
test15.txt
test16.txt
test17.txt
test18.txt
test19.txt
test1.txt
test20.txt
test21.txt
test22.txt
test23.txt
test24.txt
test25.txt
test26.txt
test27.txt
test28.txt
test29.txt
test2.txt
test30.txt
test31.txt
test32.txt
test33.txt
test34.txt
test35.txt
test36.txt
test37.txt
test38.txt
test39.txt
test3.txt
test40.txt
test4.txt
test5.txt
test6.txt
test7.txt
test8.txt
test9.txt
In each txt file, I have this type of data:
2019-10-14-00-00;/dev/hd1;1024.00;136.37;/
2019-10-14-00-00;/dev/hd2;5248.00;4230.53;/usr
2019-10-14-00-00;/dev/hd3;2560.00;481.66;/var
2019-10-14-00-00;/dev/hd4;3584.00;67.65;/tmp
2019-10-14-00-00;/dev/hd5;256.00;26.13;/home
2019-10-14-00-00;/dev/hd1;1024.00;476.04;/opt
2019-10-14-00-00;/dev/hd5;384.00;0.38;/usr/xxx
2019-10-14-00-00;/dev/hd4;256.00;21.39;/xxx
2019-10-14-00-00;/dev/hd2;512.00;216.84;/opt
2019-10-14-00-00;/dev/hd3;128.00;21.46;/var/
2019-10-14-00-00;/dev/hd8;256.00;75.21;/usr/
2019-10-14-00-00;/dev/hd7;384.00;186.87;/var/
2019-10-14-00-00;/dev/hd6;256.00;0.63;/var/
2019-10-14-00-00;/dev/hd1;128.00;0.37;/admin
2019-10-14-00-00;/dev/hd4;256.00;179.14;/opt/
2019-10-14-00-00;/dev/hd3;2176.00;492.93;/opt/
2019-10-14-00-00;/dev/hd1;256.00;114.83;/opt/
2019-10-14-00-00;/dev/hd9;256.00;41.73;/var/
2019-10-14-00-00;/dev/hd1;3200.00;954.28;/var/
2019-10-14-00-00;/dev/hd10;256.00;0.93;/var/
2019-10-14-00-00;/dev/hd10;64.00;1.33;/
2019-10-14-00-00;/dev/hd2;1664.00;501.64;/opt/
2019-10-14-00-00;/dev/hd4;256.00;112.32;/opt/
2019-10-14-00-00;/dev/hd9;2176.00;1223.1;/opt/
2019-10-14-00-00;/dev/hd11;22784.00;12325.8;/opt/
2019-10-14-00-00;/dev/hd12;256.00;2.36;/
2019-10-14-06-00;/dev/hd12;1024.00;137.18;/
2019-10-14-06-00;/dev/hd1;256.00;2.36;/
2019-10-14-00-00;/dev/hd1;1024.00;136.37;/
2019-10-14-00-00;/dev/hd2;5248.00;4230.53;/usr
2019-10-14-00-00;/dev/hd3;2560.00;481.66;/var
2019-10-14-00-00;/dev/hd4;3584.00;67.65;/tmp
2019-10-14-00-00;/dev/hd5;256.00;26.13;/home
2019-10-14-00-00;/dev/hd1;1024.00;476.04;/opt
2019-10-14-00-00;/dev/hd5;384.00;0.38;/usr/xxx
2019-10-14-00-00;/dev/hd4;256.00;21.39;/xxx
2019-10-14-00-00;/dev/hd2;512.00;216.84;/opt
2019-10-14-00-00;/dev/hd3;128.00;21.46;/var/
2019-10-14-00-00;/dev/hd8;256.00;75.21;/usr/
2019-10-14-00-00;/dev/hd7;384.00;186.87;/var/
2019-10-14-00-00;/dev/hd6;256.00;0.63;/var/
2019-10-14-00-00;/dev/hd1;128.00;0.37;/admin
2019-10-14-00-00;/dev/hd4;256.00;179.14;/opt/
2019-10-14-00-00;/dev/hd3;2176.00;492.93;/opt/
2019-10-14-00-00;/dev/hd1;256.00;114.83;/opt/
2019-10-14-00-00;/dev/hd9;256.00;41.73;/var/
2019-10-14-00-00;/dev/hd1;3200.00;954.28;/var/
2019-10-14-00-00;/dev/hd10;256.00;0.93;/var/
2019-10-14-00-00;/dev/hd10;64.00;1.33;/
2019-10-14-00-00;/dev/hd2;1664.00;501.64;/opt/
2019-10-14-00-00;/dev/hd4;256.00;112.32;/opt/
I would like to create a directory for each server, create in each directory a txt file for each FS, and put into each txt file the lines that correspond to that FS.
For that, I've tried this loop:
#!/bin/bash
directory=(ls *.txt | cut -d'.' -f1)
for d in $directory
do
if [ ! -d $d ]
then
mkdir $d
fi
done
for i in $(cat *.txt)
do
file=$(echo $i | awk -F';' '{print $2}' | sort | uniq | cut -d'/' -f3 )
data=$(echo $i | awk -F';' '{print $2}' )
echo $i | grep -w $data >> /xx/xx/xx/xx/xx/${directory/${file}.txt
done
But this loop doesn't work properly. The directories are created, but not the files inside each directory.
I would like something like:
test1/hd1.txt (containing every line for the hd1 FS)
And the same thing for each server.
Can you show me how to do that?
#!/bin/bash
for src in *.txt; do
# start a subshell so we don't need to cd back afterwards
# make "$src" be stdin before cd, so we don't need full path
# be careful that in subshell only awk reads from stdin
(
# extract server name to use as directory
dir=/xx/xx/xx/xx/xx/"${src%.txt}"
# chain with "&&" so failures don't cause bad files
mkdir -p "$dir" &&
cd "$dir" &&
awk -F \; '{ split($2, dev, "/"); print > (dev[3] ".txt") }'
) < "$src"
done
The awk script reads each line and splits it into fields on semicolons.
It splits the second field on slashes to extract the device name (the assumption is that devices always have the form /dev/name).
Finally, the > sends the whole line to the file named after that device.
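As a worked variant of the same awk technique, here is a minimal sketch that takes the source file and the output directory as arguments instead of using cd. The function name split_by_device and its arguments are hypothetical. It appends with >> and closes each file after writing, which is an assumption made to stay safe when there are many devices; running it twice on the same input will therefore duplicate lines.

```shell
#!/bin/bash
# Hypothetical helper: split one server file into per-device files in outdir.
split_by_device() {
    local src=$1 outdir=$2
    mkdir -p "$outdir" || return
    awk -F';' -v out="$outdir" '{
        n = split($2, dev, "/")          # "/dev/hd1" -> "", "dev", "hd1"
        target = out "/" dev[n] ".txt"
        print >> target                  # append the whole input line
        close(target)                    # keep the number of open files low
    }' "$src"
}

# Example call (paths are placeholders):
#   split_by_device test1.txt ./test1
```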
For reference, your own script needs several fixes: use directory=$(...) instead of directory=(...); add the prefix to mkdir (assuming the prefix directories already exist); close the reference ${directory}; and quote all variable references for safety. Because $directory would then hold every server name at once, the version below also handles one source file at a time, so that each line lands in its own server's directory:
#!/bin/bash
for src in *.txt
do
    # server name = file name without the .txt suffix
    d=${src%.txt}
    mkdir -p /xx/xx/xx/xx/xx/"$d"
    # read the file line by line instead of word-splitting $(cat *.txt)
    while IFS= read -r i
    do
        # device name: third "/"-separated part of the second field
        file=$(echo "$i" | awk -F';' '{print $2}' | cut -d'/' -f3)
        echo "$i" >> /xx/xx/xx/xx/xx/"$d"/"$file".txt
    done < "$src"
done
for file in *.txt
do
    echo "$file"
    directory=${file%.txt}
    mkdir -p "$directory"
    # distinct device names: third "/"-separated part of the second field
    FS=$(awk -F';' '{print $2}' "$file" | sort -u | cut -d'/' -f3)
    for f in $FS
    do
        grep -w -e "$f" "$file" > "$directory/$f.txt"
    done
done
Explanation:
For each file in the current directory, the outer for loop will run.
In the loop for the selected file, a respective directory will be created first.
Next using the FS variable we take all the possible file systems from that selected file.
Finally, an inner loop will be run using the FS types to grep and create separate file system files in the directory.

How to rename a file by retaining first 6 characters of my file name and remove rest of the characters?

I have file names like: Rv0012_gyrB.txt, Rv0001_Rv.txt
How to rename a file by retaining first 6 characters of my file name and remove rest of the characters?
My desired output should be:
Rv0012.txt and Rv0001.txt
Please let me know, how to do it using a script in Linux for multiple files.
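for file in *_*; do
    filename=${file%%_*}    # everything before the first underscore
    fileext=${file##*.}     # everything after the last dot
    if [ "$fileext" = "$file" ]; then
        # no dot in the name, so there is no extension to keep
        mv "$file" "$filename"
    else
        mv "$file" "$filename.$fileext"
    fi
done
This should do it, assuming you want to split at the first occurrence of an underscore.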
If you want to keep first 6 characters, then this:
for file in *.txt
do
    extension="${file##*.}"
    filename="${file%.*}"
    filename=${filename:0:6}
    echo "$filename.$extension"
    mv "$file" "$filename.$extension"
done
If you want to get all characters before "_", then this will do the job
for file in *.txt
do
    extension="${file##*.}"
    filename="${file%.*}"
    filename=$(echo "$filename" | cut -d "_" -f1)
    echo "$filename.$extension"
    mv "$file" "$filename.$extension"
done
In case you have some files without extensions, try this
for file in *
do
    extension="${file##*.}"
    filename="${file%.*}"
    filename=$(echo "$filename" | cut -d "_" -f1)
    # a name without a dot comes back unchanged from ${file##*.}
    if [ "$file" = "$extension" ]
    then
        mv "$file" "$filename"
    else
        mv "$file" "$filename.$extension"
    fi
done
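Truncating names can make two files collide (for example Rv0012_gyrB.txt and a second Rv0012_ file). A minimal sketch that keeps the first six characters but refuses to overwrite an existing file is shown here. The function name truncate_names and its directory argument are hypothetical, and it assumes every file of interest ends in .txt.

```shell
#!/bin/bash
# Hypothetical helper: rename each *.txt in a directory to its first six
# characters plus ".txt", skipping any rename that would overwrite a file.
truncate_names() {
    local dir=$1 file base target
    for file in "$dir"/*.txt; do
        [ -e "$file" ] || continue          # the glob matched nothing
        base=${file##*/}                    # strip the directory part
        base=${base%.txt}                   # strip the extension
        target=$dir/${base:0:6}.txt
        [ "$file" = "$target" ] && continue # already has the short name
        if [ -e "$target" ]; then
            echo "skipping $file: $target already exists" >&2
        else
            mv -- "$file" "$target"
        fi
    done
}

# Example call for the current directory:
#   truncate_names .
```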

Rename multiple files with sed

How can I rename files with titles like "Stargate SG-1 Season 01 Episode 01" to just "s01e01"? The numbers vary, of course.
I already have something like this:
for file in *.mkv; do mv "$file" "$(echo "$file" | sed -e "REGEX HERE")"; done
I just need the sed command that does what I need.
Thanks
No need for sed, try this:
#!/bin/bash
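for f in *.mkv;
do
    # deliberately unquoted: split the name into words $1..$6
    set -- $f
    # $6 is "01.mkv", so the extension is carried over
    mv "$f" "s${4}e${6}"
done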
in action:
$ ls
Stargate SG-1 Season 01 Episode 01.mkv
$ ./l.sh
$ ls
s01e01.mkv
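With GNU sed (the \s before the first group makes it capture the whole season number rather than only its last digit):
for file in *.mkv; do mv "$file" "$(echo "$file" | sed -e 's/.*\s\(\S\+\)\s\+\S\+\s\+\(\S\+\)$/s\1e\2/')"; done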
Awk is also good for this
for file in *.mkv; do
    # $6 is "01.mkv", so the result already ends in .mkv
    mv "$file" "$(awk '{print "s" $4 "e" $6}' <<< "$file")"
done
I think that this is not a problem for sed :)
I would go this way to rename all *.mkv files:
ls *.mkv | awk '{print "mv \"" $0 "\" s" $4 "e" $6}' | sh
or, with xargs (-n 2 passes one source/target pair to each mv):
ls *.mkv | awk '{print "\"" $0 "\" s" $4 "e" $6}' | xargs -n 2 mv
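If you would rather not depend on word positions, bash's own regex matching can pull out the two numbers. This is a minimal sketch, assuming the names contain "Season NN Episode NN" immediately before the extension. The function name episode_name is hypothetical, and the loop is a dry run that only prints the mv commands.

```shell
#!/bin/bash
# Hypothetical helper: "... Season 01 Episode 01.mkv" -> "s01e01.mkv"
episode_name() {
    local re='Season ([0-9]+) Episode ([0-9]+)\.([^.]+)$'
    if [[ $1 =~ $re ]]; then
        printf 's%se%s.%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    else
        return 1                 # name does not follow the pattern
    fi
}

for file in *.mkv; do
    [ -e "$file" ] || continue   # the glob matched nothing
    new=$(episode_name "$file") || { echo "no match: $file" >&2; continue; }
    echo mv -- "$file" "$new"    # dry run; drop "echo" to rename
done
```

Names that do not match are reported and skipped instead of being renamed to something unexpected.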

splitting folder names with awk in a directory

There are some directories in the working directory with this template
cas-2-32
sat-4-64
...
I want to loop over the directory names and grab the second and third parts of each name. I have written this script. The body shows what I want to do, but the awk command seems to be wrong:
#!/bin/bash
for file in `ls`; do
if [ -d $file ]; then
arg2=`awk -F "-" '{print $2}' $file`
echo $arg2
arg3=`awk -F "-" '{print $3}' $file`
echo $arg3
fi
done
but it says
awk: cmd. line:1: fatal: cannot open file `cas-2-32' for reading (Invalid argument)
awk treats its last argument as the name of a file to read. Since cas-2-32 etc. are directories, awk cannot read them, which is the error you see.
Feed the directory names to awk on standard input using echo:
#!/bin/bash
for file in `ls`; do
if [ -d $file ]; then
arg2=$(echo $file | awk -F "-" '{print $2}')
echo $arg2
arg3=$(echo $file | awk -F "-" '{print $3}')
echo $arg3
fi
done
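Simple command: ls | awk -F- '{ print $2" "$3 }'
(The separator must be set with -F or in a BEGIN block; assigning FS inside the main action only takes effect from the second line on.) If you want each value on its own line, use "\n" instead of the space in awk's print.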
When executed like this
awk -F "-" '{print $2}' $file
awk treats $file's value as the file to be parsed, instead of parsing $file's value itself.
The minimal fix is to use a here-string which can feed the value of a variable into stdin of a command:
awk -F "-" '{print $2}' <<< "$file"
By the way, you don't need ls if you merely want a list of files in current directory, use * instead, i.e.
for file in *; do
One way:
#!/bin/bash
for file in *; do
if [ -d "$file" ]; then
tmp="${file#*-}"
arg2="${tmp%-*}"
arg3="${tmp#*-}"
echo "$arg2"
echo "$arg3"
fi
done
The other:
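#!/bin/bash
IFS="-"
for file in *; do
    # quote "$file" here: with IFS="-" an unquoted expansion would be split
    if [ -d "$file" ]; then
        # deliberately unquoted: split the name on "-" into $1, $2, $3
        set -- $file
        arg2="$2"
        arg3="$3"
        echo "$arg2"
        echo "$arg3"
    fi
done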
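A variant of the IFS approach that does not change IFS for the whole script is sketched here, assuming names of the form prefix-number-number. The function name name_parts is hypothetical. If a name has more than three parts, everything after the second dash ends up in the last variable.

```shell
#!/bin/bash
# Hypothetical helper: print parts 2 and 3 of a name like "cas-2-32".
name_parts() {
    local prefix arg2 arg3
    # IFS is changed only for this one read, not for the rest of the script
    IFS=- read -r prefix arg2 arg3 <<< "$1"
    printf '%s %s\n' "$arg2" "$arg3"
}

# The trailing slash makes the glob match directories only.
for dir in */; do
    [ -d "$dir" ] || continue    # the glob matched nothing
    name_parts "${dir%/}"
done
```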

File name printed twice when using wc

For printing number of lines in all ".txt" files of current folder, I am using following script:
for f in *.txt;
do l="$(wc -l "$f")";
echo "$f" has "$l" lines;
done
But in output I am getting:
lol.txt has 2 lol.txt lines
Why is lol.txt printed twice (especially after the 2)? I guess some sort of stream flush is required, but I don't know how to achieve that in this case. What changes should I make to the script to get this output:
lol.txt has 2 lines
You can remove the filename with 'cut':
for f in *.txt;
do l="$(wc -l "$f" | cut -f1 -d' ')";
echo "$f" has "$l" lines;
done
The filename is printed twice, because wc -l "$f" also prints the filename after the number of lines. Try changing it to cat "$f" | wc -l.
wc prints the filename, so you could just write the script as:
ls *.txt | while read f; do wc -l "$f"; done
or, if you really want the verbose output, try
ls *.txt | while read f; do wc -l "$f" | awk '{print $2, "has", $1, "lines"}'; done
There is a trick here. Get wc to read stdin and it won't print a file name:
for f in *.txt; do
l=$(wc -l < "$f")
echo "$f" has "$l" lines
done
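Some wc implementations (BSD and macOS, for example) pad the count with leading spaces even when reading from stdin. A minimal sketch that strips that padding follows; the function name count_lines is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the line count of a file as a bare number.
count_lines() {
    local n
    n=$(wc -l < "$1") || return          # fails if the file cannot be read
    printf '%s\n' "${n//[[:space:]]/}"   # remove any padding around the number
}

for f in *.txt; do
    [ -e "$f" ] || continue              # the glob matched nothing
    echo "$f has $(count_lines "$f") lines"
done
```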
