Rename part of file name based on exact match in contents of another file - linux

I would like to rename a bunch of files by changing only one part of the file name and doing that based on an exact match in a list in another file. For example, if I have these file names:
sample_ACGTA.txt
sample_ACGTA.fq.abc
sample_ACGT.txt
sample_TTTTTC.tsv
sample_ACCCGGG.fq
sample_ACCCGGG.txt
otherfile.txt
and I want to find and replace based on these exact matches, which are found in another file called replacements.txt:
ACGT name1
TTTTTC longername12
ACCCGGG nam7
ACGTA another4
So that the desired resulting file names would be:
sample_another4.txt
sample_another4.fq.abc
sample_name1.txt
sample_longername12.tsv
sample_nam7.fq
sample_nam7.txt
otherfile.txt
I do not want to change the contents. So far I have tried sed and mv based on my search results on this website. With sed I found out how to replace the contents of the file using my list:
while read from to; do
sed -i "s/$from/$to/" infile ;
done < replacements.txt
and with mv I have found a way to rename files if there is one simple replacement:
for files in sample_*; do
mv "$files" "${files/ACGTA/another4}"
done
But how can I put them together to do what I would like?
Thank you for your help!

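You can combine your for and while loops and use only mv. Matching _CODE. (the underscore, the code and the following dot) rather than the bare code keeps the match exact, so the ACGT line does not also rename sample_ACGTA.txt:
while read -r from to ; do
for i in sample_* ; do
new=${i/_$from./_$to.}   # replace only a whole code between "_" and "."
if [ "$i" != "$new" ] ; then
mv -- "$i" "$new"
fi
done
done < replacements.txt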
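A minimal sketch of a lookup-based variant, assuming bash 4 or later (for declare -A) and file names of the form sample_CODE followed by an extension that starts at the first dot. The table is read once and each file's code is looked up as a whole, so ACGT can never match inside ACGTA. The demo directory and the variable names (new_name, code, ext) are illustrative only.

```bash
#!/usr/bin/env bash
# Sketch: exact-match renaming through an associative array (bash 4+).
# Assumption: targets look like sample_<CODE><extension>, where <CODE>
# contains no "." and the extension starts at the first ".".

# Demo setup in a throwaway directory, using the names from the question.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch sample_ACGTA.txt sample_ACGTA.fq.abc sample_ACGT.txt \
      sample_TTTTTC.tsv sample_ACCCGGG.fq sample_ACCCGGG.txt otherfile.txt
printf '%s\n' 'ACGT name1' 'TTTTTC longername12' 'ACCCGGG nam7' 'ACGTA another4' > replacements.txt

# Load the code -> new name table once.
declare -A new_name
while read -r from to; do
  new_name[$from]=$to
done < replacements.txt

for f in sample_*; do
  rest=${f#sample_}        # e.g. ACGTA.fq.abc
  code=${rest%%.*}         # text before the first "." -> ACGTA
  ext=${rest#"$code"}      # everything from the first "." -> .fq.abc
  # Rename only when the whole code is a key of the table (exact match).
  if [[ -n ${new_name[$code]+set} ]]; then
    mv -- "$f" "sample_${new_name[$code]}$ext"
  fi
done
ls
```

Because the lookup is by whole key, the order of the lines in replacements.txt no longer matters.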
An alternative solution with sed uses the e flag of GNU sed's s command, which executes the result of the substitution as a shell command (use with caution! Try it without the trailing e first to print the commands that would be executed). Hence:
sed 's/\(\w\+\)\s\+\(\w\+\)/mv sample_\1\.txt sample_\2\.txt/e' replacements.txt
would parse your replacements.txt file and rename all your .txt files as desired.
We just have to add a loop to handle the other extensions:
for j in .txt .tsv .fq .fq.abc ; do
sed "s/\(\w\+\)\s\+\(\w\+\)/mv 'sample_\1$j' 'sample_\2$j'/e" replacements.txt
done
(Note that you will get error messages when it tries to rename files that do not exist, for example when it executes mv 'sample_ACGT.fq' 'sample_name1.fq' although sample_ACGT.fq does not exist.)
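If those error messages bother you, here is a sketch that checks for each candidate file before calling mv. The extension list is an assumption taken from the question's example names, and the demo directory is illustrative.

```bash
#!/usr/bin/env bash
# Sketch: rename sample_<from><ext> to sample_<to><ext> only if the file exists.
# Assumption: the extensions below are the only ones in use.

demo=$(mktemp -d)
cd "$demo" || exit 1
touch sample_ACGTA.txt sample_ACGT.txt sample_ACCCGGG.fq
printf '%s\n' 'ACGT name1' 'ACCCGGG nam7' 'ACGTA another4' > replacements.txt

while read -r from to; do
  for j in .txt .tsv .fq .fq.abc; do
    # The old name is rebuilt from the exact code, so ACGT cannot touch sample_ACGTA.txt.
    if [ -e "sample_$from$j" ]; then
      mv -- "sample_$from$j" "sample_$to$j"
    fi
  done
done < replacements.txt
```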

You could use awk to generate commands:
% awk '{print "for files in sample_*; do mv $files ${files/" $1 "/" $2 "}; done" }' replacements.txt
for files in sample_*; do mv $files ${files/ACGT/name1}; done
for files in sample_*; do mv $files ${files/TTTTTC/longername12}; done
for files in sample_*; do mv $files ${files/ACCCGGG/nam7}; done
for files in sample_*; do mv $files ${files/ACGTA/another4}; done
Then either copy/paste or pipe the output directly to your shell:
% awk '{print "for files in sample_*; do mv $files ${files/" $1 "/" $2 "}; done" }' replacements.txt | bash
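If you want the longer match string to be used first, sort the replacements on the code in reverse order first, so that ACGTA is processed before its prefix ACGT. (Files that do not contain a given code are passed to mv unchanged, which only produces harmless "are the same file" errors.)
% sort -r -k1,1 replacements.txt | awk '{print "for files in sample_*; do mv $files ${files/" $1 "/" $2 "}; done" }' | bash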
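Another way to make "longer codes first" explicit is to sort on the length of the code rather than alphabetically. A minimal sketch; sorted.txt is a hypothetical intermediate file:

```bash
#!/usr/bin/env bash
# Sketch: order replacements.txt by the length of field 1, longest first,
# so a code is always tried before any shorter code that is a prefix of it.

demo=$(mktemp -d)
cd "$demo" || exit 1
printf '%s\n' 'ACGT name1' 'TTTTTC longername12' 'ACCCGGG nam7' 'ACGTA another4' > replacements.txt

# Prefix each line with the length of its first field, sort numerically
# in reverse on that helper column, then strip it again.
awk '{ print length($1), $0 }' replacements.txt | sort -k1,1nr | cut -d' ' -f2- > sorted.txt
cat sorted.txt
```

You could then feed sorted.txt to the awk command from this answer in place of replacements.txt.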

Related

bash/awk/unix detect changes in lines of csv files

I have a timestamp in this format:
(normal_file.csv)
timestamp
19/02/2002
19/02/2002
19/02/2002
19/02/2002
19/02/2002
19/02/2002
The dates are usually uniform; however, some files have an irregular date pattern, as in this example:
(abnormal_file.csv)
timestamp
19/02/2002
19/02/2003
19/02/2005
19/02/2006
My directory contains hundreds of files, some normal and some abnormal.
I want to write a bash or awk script that detects the date pattern in every file of a directory. Abnormal files should be moved automatically to a new, separate directory (let's say dir_different/).
Currently, I have tried the following:
#!/bin/bash
mkdir dir_different
for FILE in *.csv;
do
# pipe 1: detect the changes in the line
# pipe 2: print the timestamp column (first column, columns are comma-separated)
awk '$1 != prev {print ; prev = $1}' < $FILE | awk -F , '{print $1}'
done
If the timestamp in a given file is normal, then only one single timestamp will be printed; but for abnormal files, multiple dates will be printed.
I am not sure how to separate the abnormal files from the normal files, and I have tried the following:
for FILE in *.csv;
do
output=$(awk 'FNR==3{print $0}' $FILE)
echo ${output}
if [[ ${output} =~ ([[:space:]]) ]]
then
mv $FILE dir_different/
fi
done
Or is there an easier method to detect changes in lines and separate files that have different lines? Thank you for any suggestions :)
Assuming that none of your "normal" CSV files end with empty lines, this should do the separation just fine:
#!/bin/bash
mkdir -p dir_different
for FILE in *.csv;
do
if awk -F, '{a[$1]++}END{if(length(a)<=2){exit 1}}' "$FILE" ; then
echo mv "$FILE" dir_different
fi
done
After a dry-run just get rid of the echo :)
Edit:
{a[$1]++} This creates an array a indexed by the first field of each line, and increments the entry every time the same value is seen.
END{if(length(a)<=2){exit 1}} This checks how many distinct values the array holds. If there are fewer than 3 (which is the case when there is one header and always the same date), awk exits with status 1, so the if condition is false and the file is not moved.
"$FILE" is part of the bash script, not awk. I quoted the variable out of habit; should you ever have file names with spaces, you'll see why :)
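A small self-contained check of this approach on the two example files. This sketch swaps length(a), which POSIX awk only requires for strings, for a counter n of distinct first-column values, so it should also work with older awk implementations. The -F, (comma-separated columns) is an assumption based on the question.

```bash
#!/usr/bin/env bash
# Sketch: move a CSV when its first column holds 3+ distinct values
# (the header plus more than one date).

demo=$(mktemp -d)
cd "$demo" || exit 1
printf '%s\n' timestamp 19/02/2002 19/02/2002 19/02/2002 > normal_file.csv
printf '%s\n' timestamp 19/02/2002 19/02/2003 19/02/2005 19/02/2006 > abnormal_file.csv

mkdir -p dir_different
for FILE in *.csv; do
  # n counts distinct first-column values; awk exits 0 (so mv runs)
  # only when n > 2, and exits 1 otherwise.
  if awk -F, '!seen[$1]++ { n++ } END { exit (n <= 2) }' "$FILE"; then
    mv -- "$FILE" dir_different/
  fi
done
```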
So, a "normal" file contains only two different lines:
timestamp
dd/mm/yyyy
Testing if a file is normal is thus as simple as:
[ $(sort -u file.csv | wc -l) -eq 2 ]
This leads to the following possible solution:
#!/usr/bin/env bash
mkdir -p dir_different
for FILE in *.csv;
do
if [ $(sort -u "$FILE" | wc -l) -ne 2 ] ; then
echo mv "$FILE" dir_different
fi
done
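If the files carry more columns than just the timestamp (an assumption; the question only shows one), whole-line sort -u would count rows that share a date but differ elsewhere as different lines. A sketch that looks only at the first comma-separated column:

```bash
#!/usr/bin/env bash
# Sketch: count distinct values of column 1 only. A normal file yields
# exactly 2: the header and the single date.

demo=$(mktemp -d)
cd "$demo" || exit 1
printf '%s\n' timestamp,value 19/02/2002,1 19/02/2002,2 > normal_file.csv
printf '%s\n' timestamp,value 19/02/2002,1 19/02/2003,1 > abnormal_file.csv

mkdir -p dir_different
for FILE in *.csv; do
  if [ $(cut -d, -f1 "$FILE" | sort -u | wc -l) -ne 2 ]; then
    mv -- "$FILE" dir_different/
  fi
done
```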

One liner terminal command for renaming all files in directory to their hash values

I am new to bash loops and trying to rename all files in a directory to their appropriate md5 values.
There are 5 sample files in the directory.
For testing purposes, I first tried just printing the md5 hashes of all files in the directory with the command below, and it works fine:
for i in `ls`; do md5sum $i; done
Output:
edc47be8af3a7d4d55402ebae9f04f0a file1
72cf1321d5f3d2e9e1be8abd971f42f5 file2
4b7b590d6d522f6da7e3a9d12d622a07 file3
357af1e7f8141581361ac5d39efa4d89 file4
1445c4c1fb27abd9061ada3b30a18b44 file5
Now I am trying to rename each file to its md5 hash with the following command:
for i in `ls`; do mv $i `md5sum $i`; done
Failed Output:
mv: target 'file1' is not a directory
mv: target 'file2' is not a directory
mv: target 'file3' is not a directory
mv: target 'file4' is not a directory
mv: target 'file5' is not a directory
What am I missing here?
Your command is expanded to
mv file1 edc47be8af3a7d4d55402ebae9f04f0a file1
When mv has more than two non-option arguments, it understands the last argument to be the target directory to which all the preceding files should be moved. But there's no directory file1.
You can use parameter expansion to remove the filename from the string.
Parameter expansion is usually faster than running an external command like cut or sed, but if you aren't renaming thousands of files, it probably doesn't matter.
for f in *; do
m=$(md5sum "$f")
mv -- "$f" "${m%% *}" # keep only the text before the first space (the hash)
done
Also note that I don't parse the output of ls, but let the shell expand the glob. It's safer and works (with proper quoting) for filenames containing whitespace.
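One caveat with renaming to hashes: files with identical contents get the same name, so a later mv replaces an earlier file. A sketch of the loop above with mv -n (GNU mv: do not overwrite an existing file) and a -f test that skips directories; both tweaks are additions, not part of the answer.

```bash
#!/usr/bin/env bash
# Sketch: rename regular files to their md5 hash without clobbering.

demo=$(mktemp -d)
cd "$demo" || exit 1
printf 'hello\n' > file1
printf 'world\n' > file2
printf 'hello\n' > file3            # same contents as file1, same hash

for f in *; do
  [ -f "$f" ] || continue           # skip directories and other non-files
  m=$(md5sum -- "$f")
  mv -n -- "$f" "${m%% *}"          # -n: leave "$f" alone if the target exists
done
```

Here file3 stays under its old name, which makes duplicates easy to spot.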
Syntax. Yes, I was using the wrong syntax.
With some trial and error I finally came up with the correct syntax.
I noticed that md5sum $i gives a two-column output:
edc47be8af3a7d4d55402ebae9f04f0a file1
72cf1321d5f3d2e9e1be8abd971f42f5 file2
4b7b590d6d522f6da7e3a9d12d622a07 file3
357af1e7f8141581361ac5d39efa4d89 file4
1445c4c1fb27abd9061ada3b30a18b44 file5
By running the second command, for i in `ls`; do mv $i `md5sum $i`; done, I was effectively telling the shell to do something like:
mv $i `md5sum $i`
which, as far as I can tell, turns out to be
mv file1 <md5 value> file1 <-- this was the issue.
How did I resolve the issue?
I used the cut command to extract just the hash and wrote a new one-liner:
for i in `ls`; do mv $i "$(md5sum $i | cut -d " " -f 1)"; done
[Edit]
According to another answer and comments by @stark, @choroba and @tripleee, it is better to use * instead of ls.
for i in *; do mv -- "$i" "$(md5sum "$i" | cut -d " " -f 1)"; done
@choroba's answer is also a good addition here. As a one-liner, his solution is:
for i in *; do m=$(md5sum "$i"); mv -- "$i" "${m%% *}"; done

Rename file by swapping text

I need to rename files by swapping some text.
For example, I have:
CATEGORIE_2017.pdf
CLASSEMENT_2016.pdf
CATEGORIE_2018.pdf
PROPRETE_2015.pdf
...
and I want them to become:
2017_CATEGORIE.pdf
2016_CLASSEMENT.pdf
2018_CATEGORIE.pdf
2015_PROPRETE.pdf
I came up with this bash version:
ls *.pdf | while read i
do
new_name=$(echo $i |sed -e 's/\(.*\)_\(.*\)\.pdf/\2_\1\.pdf/')
mv $i $new_name
echo "---"
done
It works, but it seems quite clumsy to me. Does anyone have a smarter solution, for example with rename?
Using rename you can do the renaming like this:
rename -n 's/([^_]+)_([^.]+)\.pdf$/$2_$1.pdf/' *.pdf
The option -n does nothing; it only prints what would happen. If you are satisfied, remove the -n option. (This is the Perl rename, sometimes installed as prename or file-rename; the util-linux rename uses a different syntax.)
I use [^_]+ and [^.]+ to capture the part of the file name before and after the _. The syntax [^_] matches any character except _.
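Without the Perl rename, the same two capture groups can be used with bash's own =~ operator, which stores the groups in the BASH_REMATCH array. A minimal sketch on the example names (put echo in front of mv for a dry run):

```bash
#!/usr/bin/env bash
# Sketch: swap NAME_YEAR.pdf to YEAR_NAME.pdf with a bash regex.

demo=$(mktemp -d)
cd "$demo" || exit 1
touch CATEGORIE_2017.pdf CLASSEMENT_2016.pdf PROPRETE_2015.pdf

for f in *.pdf; do
  # Group 1: text before the first "_"; group 2: text up to the first ".".
  if [[ $f =~ ^([^_]+)_([^.]+)\.pdf$ ]]; then
    mv -- "$f" "${BASH_REMATCH[2]}_${BASH_REMATCH[1]}.pdf"
  fi
done
```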
One way:
ls *.pdf | awk -F"[_.]" '{print "mv "$0" "$2"_"$1"."$3}' | sh
This uses awk to swap the two parts, build the mv command, and pass it to the shell.
Using only bash:
for file in *_*.pdf; do
no_ext=${file%.*}
new_name=${no_ext##*_}_${no_ext%_*}.${file##*.}
mv -- "$file" "$new_name"
done

Linux : Move files that have more than 100 commas in one line

I have 100 files in a specific directory, each containing several records whose fields are delimited by commas.
I need a Linux command that checks the lines in each file and, if any line contains more than 100 commas, moves that file to another directory.
Is it possible ?
Updated Answer
Although my original answer below works, Glenn's (@glennjackman) suggestion in the comments is far more concise, idiomatic and elegant, as follows:
#!/bin/bash
mkdir -p subdir
for f in file*; do
# a line with N commas has N+1 fields, so "more than 100 commas" is NF>101
awk -F, 'NF>101{exit 1}' "$f" || mv "$f" subdir
done
It basically relies on awk's exit status generally being 0, and then only setting it to 1 when encountering files that need moving.
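A quick way to check the threshold: with -F, a line with N commas has N+1 fields. This sketch builds one file with exactly 100 commas and one with 101 (commas is a hypothetical helper) and runs the loop with NF>101; only the second file should move.

```bash
#!/usr/bin/env bash
# Sketch: boundary test for "more than 100 commas".

demo=$(mktemp -d)
cd "$demo" || exit 1
commas() { printf "%${1}s\n" '' | tr ' ' ','; }   # hypothetical helper: one line of $1 commas
commas 100 > file_100
commas 101 > file_101

mkdir -p subdir
for f in file*; do
  awk -F, 'NF>101{exit 1}' "$f" || mv -- "$f" subdir/
done
```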
Original Answer
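This will tell you if a file has more than 100 commas on any line (with -F, a line with N commas has N+1 fields):
awk -F, 'NF>101{found=1; exit} END{print found+0}' someFile
It prints 1 if the file has any line with more than 100 commas, stopping at the first such line without parsing the remainder of the file, and 0 otherwise. (exit still runs the END block, which is why the result is printed there.)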
If you want to move them as well, use
#!/bin/bash
mkdir -p subdir
for f in file*; do
if [[ $(awk -F, 'NF>101{found=1; exit} END{print found+0}' "$f") != "0" ]]; then
echo mv "$f" subdir
fi
done
Try this and see if it selects the correct files, and, if you like it, remove the word echo and run it again so it actually moves them. Back up first!

Change Names of Multiple Files Linux

I have a number of files named a1.txt, b1.txt, c1.txt... on an Ubuntu machine.
Is there any quick way to change all file names to a2.txt, b2.txt, c2.txt...?
In particular, I'd like to replace part of the name string. For instance, every file name contains a string called "apple" and I want to replace "apple" with "pear" in all file names.
Any command or script?
Without any extra software you can:
for FILE in *1.txt; do mv "$FILE" "$(echo "$FILE" | sed 's/1/2/')"; done
for f in {a..c}1.txt; do echo "$f" "${f/1/2}"; done
replace 'echo' with 'mv' if the output looks correct.
For the second part, replacing "apple" with "pear":
for f in *apple*; do mv "$f" "${f/apple/pear}"; done
The brace expansion {a..c} works in bash (and zsh), but not in a plain POSIX sh.
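The ${f/1/2} form replaces only the first match, which matters for names like a11.txt. A short sketch of the related bash forms: // replaces every match, and /% anchors the pattern at the end of the string.

```bash
#!/usr/bin/env bash
# Sketch: bash pattern substitution variants on one name.
f=a11.txt
first=${f/1/2}          # first match only: a21.txt
all=${f//1/2}           # every match: a22.txt
last=${f/%1.txt/2.txt}  # only at the end of the name: a12.txt
echo "$first $all $last"
```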
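With the util-linux rename (on Debian and Ubuntu it is installed as rename.ul, while rename, if present, is usually the Perl version), the following command renames the specified files by replacing the first occurrence of 1 in their names with 2:
rename 1 2 *1.txt
Alternatively, with a Perl one-liner:
ls *1.txt | perl -ne 'chomp; $x = $_; $x =~ s/1/2/; rename $_, $x;'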
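Here's another option that worked for me (following the examples above) for files in different subdirectories. Quote the pattern so the shell does not expand it, and substitute only in the file name so that a 1 in a directory name is left alone (this still breaks on names containing spaces):
for FILE in $(find . -name '*1.txt'); do base=${FILE##*/}; mv "$FILE" "${FILE%/*}/${base/1/2}"; done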
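To also handle paths with spaces, a sketch that has find print NUL-terminated names and lets read -d '' split on NUL, so no word splitting happens. The demo directory layout is made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: recursive rename of *1.txt to *2.txt, editing only the file name.

demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p "dir1/sub dir"
touch a1.txt dir1/b1.txt "dir1/sub dir/c 1.txt"

find . -type f -name '*1.txt' -print0 |
while IFS= read -r -d '' path; do
  dir=${path%/*}     # directory part, e.g. ./dir1
  base=${path##*/}   # file name, e.g. b1.txt
  mv -- "$path" "$dir/${base/1/2}"
done
```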
Something like this should work:
for i in *1.txt; do
name=$(echo "$i" | cut -b1)
mv -- "$i" "${name}2.txt"
done
This assumes the part before 1.txt is a single character; modify to suit your needs.
