Capture part of name so that I can loop in linux - linux

How do I capture a pattern in a filename and use it in a loop in Linux?
example in a folder contains these files:
BBB137O19_rc.fa
BBB921N08_cleaned.fa
BBB002O19_cc.fa
I would like to capture the front part of the filename and use that to do things like renaming, run a program etc. Apparently, basename is greedy and works for everything before the extension.
thanks in advance
I tried this command but failed
for i in *.fa; base=$(basename $i _*.fa); comb="${base}_ec.txt"; mv ec.txt $comb; done

You can use BASH string manipulations:
s='BBB921N08_cleaned.fa'
echo "${s%%_*}"
BBB921N08
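Applied to the loop from the question, the same expansion might look like the sketch below. It only prints the names it would use; the scratch directory, the empty sample files, and the prefixes variable are assumptions made for the demonstration.

```bash
# Sketch only: build a scratch directory with the sample names from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch BBB137O19_rc.fa BBB921N08_cleaned.fa BBB002O19_cc.fa

prefixes=""
for i in *.fa; do
    base=${i%%_*}            # drop the longest suffix that starts with "_"
    comb="${base}_ec.txt"    # target name, following the question's naming
    echo "$i -> $comb"       # replace echo with the real command once verified
    prefixes="$prefixes $base"
done
```

Note the do after the for clause, which the command in the question is missing.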

Also, sed can be used:
echo "$i"|sed 's/_.*//'
This removes _ and any character (.) occurring any number of times after it (*).
Sed with its regular expressions is especially useful if you have more complicated patterns to process.

Related

How do I replace ".net" with space using sed in Linux?

I'm using for loop, with arguments i. Each argument contains ".net" at the end and in directory they are in one line, divided by some space. Now I need to get rid of these ".net" using substitution of sed, but it's not working. I went through different options, the most recent one is
sed 's/\.(net)//g' $i;
which is obviously not correct, but I just can't find anything online about this.
To make it clear, lets say I have a directory with 5 files with names
file1.net
file2.net
file3.net
file4.net
file5.net
I would like my output to be
file1
file2
file3
file4
file5
...Could somebody give me some advice?
You can use
for f in *.net; do mv "$f" "${f%.*}"; done
Details:
for f in *.net; - iterates over files with net extension
mv "$f" "${f%.*}" - renames each file to its name without the net extension (${f%.*} removes the shortest match of .* from the end of f, that is, everything from the last . onward; see Parameter expansion).
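The difference between % and %% only shows up when a name has more than one dot. A small sketch, using a made-up file name:

```bash
f='report.v2.net'        # hypothetical name with two dots
shortest=${f%.*}         # % removes the shortest match: report.v2
longest=${f%%.*}         # %% removes the longest match: report
echo "$shortest $longest"
```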
This is a job for Perl's rename:
rename -n 's/\.net$//' *.net
The -n option is a dry run that only prints what would be renamed. Remove it if the output looks right.
This way:
sed -i.backup 's/\.net$//g' "$1";
The -i.backup option edits the file in place and keeps the original as a backup copy with a .backup suffix.

Linux - rename all files by replacing last hyphen with '##'

Please anyone.
How do I in Linux rename a bunch of files like:
abc-def-0001.xxx
acb-def-0002.xxx
to:
abc-def##0001.xxx
...
I have tried several suggestions from SO like:
rename 's/(.*)-/$1##/' *.xxx
But it didn't work as expected in my environment.
You can use lookahead in your regex:
rename -n 's/-(?=\d)/##/' *.xxx
This will match and replace the first - that is followed by a digit.
Your pattern 's/(.*)-/$1##/' would also work for the given examples, but it assumes you are always replacing the last hyphen.
So I ended up using:
for i in *; do mv "$i" "`echo $i | sed "s/\(.*\)-/\1##/"`"; done
I think my version of the rename command does not support the perl expressions...
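If neither Perl's rename nor sed is wanted, shell parameter expansion can split the name at the last hyphen. This is a sketch of a different technique from the answers above, and it only prints the mv commands rather than running them; the sample names come from the question.

```bash
for f in abc-def-0001.xxx acb-def-0002.xxx; do
    head=${f%-*}             # everything before the last "-"
    tail=${f##*-}            # everything after the last "-"
    new="${head}##${tail}"   # rejoin the two halves with "##"
    echo mv -- "$f" "$new"   # drop the echo to actually rename
done
```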

How to extract string in shell script

I have file names like Tarun_Verma_25_02_2016_10_00_10.csv. How can I extract the string like 25_02_2016_10_00_10 from it in shell script?
The number of numeric parts after "firstName"_"lastName" is not fixed.
A one-line solution would be preferred.
with sed
$ echo Tarun_Verma_25_02_2016_10_00_10.csv | sed -r 's/[^0-9]*([0-9][^.]*)\..*/\1/'
25_02_2016_10_00_10
extract everything between the first digit and dot.
If you want some control over which parts you pick out (assuming the format is always like <firstname>_<lastname>_<day>_<month>_<year>_<hour>_<minute>_<second>.csv) awk would be pretty handy
echo "Tarun_Verma_25_02_2016_10_00_10.csv" | awk -F"[_.]" 'BEGIN{OFS="_"}{print $3,$4,$5,$6,$7,$8}'
Here awk splits by both underscore and period, sets the Output Field Separator to an underscore, and then prints the parts of the file name that you are interested in.
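Under the same assumption as the awk answer (exactly two name fields before the date), plain parameter expansion can do the job without an external process. A sketch, with illustrative variable names:

```bash
f='Tarun_Verma_25_02_2016_10_00_10.csv'
stem=${f%.*}       # remove the extension
rest=${stem#*_}    # remove the first field and its underscore
rest=${rest#*_}    # remove the second field and its underscore
echo "$rest"
```

Unlike the awk version, this does not care how many numeric fields follow the two name fields.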
ksh93 supports the syntax bash calls extglobs out-of-the-box. Thus, in ksh93, you can do the following:
f='Tarun_Verma_25_02_2016_10_00_10.csv'
f=${f##+([![:digit:]])} # trim everything before the first digit
f=${f%%+([![:digit:]])} # trim everything after the last digit
echo "$f"
To do the same in bash, you'll want to run the following command first
shopt -s extglob
Since this uses shell-native string manipulation, it runs much more quickly than invoking an external command (sed, awk, etc) when processing only a single line of input. (When using ksh93 rather than bash, it's quite speedy even for large inputs).

Shell scripting : to print selected text in the string

Log file name: "/home/msubra/WORK/tmo/LOG/BCH1043.9987.log"
From the above string I need to extract the content BCH1043.
The directory structure may differ, so the solution should look for the string starting with BCH and take everything up to the first dot.
No need to call basename, you can use parameter substitution that is built-in to the shell for the whole thing:
$ cat x.sh
filepath="/home/msubra/WORK/tmo/LOG/BCH1043.9987.log"
# Strip off the path. Everything between and including the slashes.
filename=${filepath##/*/}
# Then strip off everything after and including the first dot.
part1=${filename%%.*}
echo $part1
$ ./x.sh
BCH1043
$
A dot in the filepath will not cause trouble either.
See section 4.5.4 here for more info: http://docstore.mik.ua/orelly/unix3/korn/ch04_05.htm
Oh and resist the temptation to get tricky and do it all in one line. Breaking into separate components is much easier to debug and maintain down the road, and who knows you may need to use those components too (the path and the rest of the file name).
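Keeping the pieces in separate variables might look like this sketch; the variable names dir, filename, id and rest are illustrative.

```bash
filepath="/home/msubra/WORK/tmo/LOG/BCH1043.9987.log"
dir=${filepath%/*}          # directory part: strip the last "/" and what follows
filename=${filepath##*/}    # file name: strip everything up to the last "/"
id=${filename%%.*}          # part before the first dot
rest=${filename#*.}         # part after the first dot
printf '%s\n' "$dir" "$filename" "$id" "$rest"
```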
basename will reduce /home/msubra/WORK/tmo/LOG/BCH1043.9987.log to BCH1043.9987.log
basename /home/msubra/WORK/tmo/LOG/BCH1043.9987.log
You can use regular expressions, awk, perl, sed etc to extract "BCH1043" from "BCH1043.9987.log". First I need to know what the range of possible filenames is before I can suggest a regular expression for you.
Use basename to extract only the filename and then use parameter expansion to strip off the data you don't want.
log=/home/msubra/WORK/tmo/LOG/BCH1043.9987.log
log=$(basename "$log")
echo "${log%%.*}"
The following is almost equivalent but doesn't use the external basename process. However there are cases where it will give different results (though whether those cases are relevant here is up to you and your usage/input). See this answer for examples/details.
log=/home/msubra/WORK/tmo/LOG/BCH1043.9987.log
log=${log##*/}
echo "${log%%.*}"
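One case where the two approaches differ is a path with a trailing slash: basename ignores the slash, while the ##*/ expansion returns an empty string. A quick check, using a made-up path:

```bash
p='/tmp/logs/'                  # hypothetical path ending in a slash
with_basename=$(basename "$p")  # basename drops the trailing slash first
with_expansion=${p##*/}         # nothing is left after the last "/"
echo "basename: '$with_basename'  expansion: '$with_expansion'"
```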
try like this:
a="/home/msubra/WORK/tmo/LOG/BCH1043.9987.log"
echo ${a##*/} | cut -d "." -f 1
or
basename "$a" | cut -d "." -f 1
or
var=${a##*/}; echo ${var%%.*}
output:
BCH1043
The result doesn't include the dot. Your question is not entirely clear, but you can extract the value like that.
${a##*/} extracts everything after the last /, much like basename.

Extracting sub-strings in Unix

I'm using cygwin on Windows 7. I want to loop through a folder consisting of about 10,000 files and perform a signal processing tool's operation on each file. The problem is that the files names have some excess characters that are not compatible with the operation. Hence, I need to extract just a certain part of the file names.
For example if the file name is abc123456_justlike.txt.rna I need to use abc123456_justlike.txt. How should I write a loop to go through each file and perform the operation on the shortened file names?
I tried the cut -b1-10 command but that doesn't let my tool perform the necessary operation. I'd appreciate help with this problem.
Try some shell scripting, using the ${NAME%TAIL} parameter substitution: the contents of variable NAME are expanded, but any suffix material which matches the TAIL glob pattern is chopped off.
$ NAME=abc12345.txt.rna
$ echo ${NAME%.rna}
abc12345.txt
# process all files in the directory, taking off their .rna suffix
$ for x in *; do signal_processing_tool "${x%.rna}" ; done
If there are variations among the file names, you can classify them with a case:
for x in * ; do
case $x in
*.rna )
# do something with .rna files
;;
*.txt )
# do something else with .txt files
;;
* )
# default catch-all-else case
;;
esac
done
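A runnable version of the loop could look like the following sketch. The scratch directory and sample files are made up, and echo stands in for the actual tool.

```bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch abc123456_justlike.txt.rna notes.txt README

seen=""
for x in *; do
    case $x in
        *.rna) name=${x%.rna}; kind=rna ;;    # strip the .rna suffix
        *.txt) name=$x;        kind=txt ;;    # leave .txt names alone
        *)     name=$x;        kind=other ;;  # catch-all
    esac
    echo "$kind: $name"                       # stand-in for the real tool
    seen="$seen $kind:$name"
done
```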
Try sed:
echo a.b.c | sed 's/\.[^.]*$//'
The s command in sed performs a search-and-replace operation, in this case it replaces the regular expression \.[^.]*$ (meaning: a dot, followed by any number of non-dots, at the end of the string) with the empty string.
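Applied to the file name from the question, the substitution gives the shortened name. For a single trailing suffix like this, the result should match what ${name%.*} produces.

```bash
name='abc123456_justlike.txt.rna'
via_sed=$(printf '%s\n' "$name" | sed 's/\.[^.]*$//')   # strip the last extension with sed
via_shell=${name%.*}                                    # same idea via parameter expansion
echo "$via_sed"
```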
If you are not yet familiar with regular expressions, this is a good point to learn them. I find manipulating strings using regular expressions much more straightforward than using tools like cut (or their equivalents).
If you are trying to extract the list of filenames from a directory use the below command.
ls -ltr | awk -F " " '{print $9}' | cut -c1-10
