I have an exercise in which I have to print all the file names contained in the current folder which contain each of [a-k], [m-p], and [1-9] at least once.
I probably have to use ls (glob-style).
If order is important then you can use globbing:
$ ls *[a-k]*[m-p]*[1-9]*
ajunk404 am1 cn5
Else just grep for each group separately:
ls | grep "[a-k]" | grep "[m-p]" | grep "[1-9]"
1ma
ajunk404
am1
cn5
m1a
Note: ls will also show directories. If you really want only files, use find instead:
find . -maxdepth 1 -type f | grep "[a-k]" | grep "[m-p]" | grep "[1-9]"
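Since each -name test in find is ANDed with the next and matches the whole file name independently of order, you could also keep the filtering inside find itself. A minimal sketch (assuming GNU find, as -maxdepth is an extension):
find . -maxdepth 1 -type f -name '*[a-k]*' -name '*[m-p]*' -name '*[1-9]*'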
A 100% pure bash (and funny!) possibility:
#!/bin/bash
shopt -s nullglob
a=( *[a-k]* )
b=(); for i in "${a[@]}"; do [[ "$i" = *[m-p]* ]] && b+=( "$i" ); done
c=(); for i in "${b[@]}"; do [[ "$i" = *[1-9]* ]] && c+=( "$i" ); done
printf "%s\n" "${c[@]}"
No external processes whatsoever! No pipes! Only pure bash! It is 100% safe with files that have funny symbols in their names (e.g., newlines), which is not the case for the other methods that parse the output of ls. And if you want to actually see the funny symbols in the file names and have them properly quoted, so as to reuse the output, use
printf "%q\n" "${c[@]}"
in place of the last printf statement.
Note: the patterns [a-k] and [m-p] are locale-dependent. You might want to set LC_ALL=C to be sure that [a-k] really means [abcdefghijk] and not something else, e.g., [aAbBcCdDeEfFgGhHiIjJk].
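For example, a minimal sketch of pinning the locale just for this listing, run in a subshell so the setting does not leak into your session:
( export LC_ALL=C; ls *[a-k]*[m-p]*[1-9]* )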
Hope this helps!
If order isn't important, and the letters appear once or more, you can use chained greps.
ls | egrep "[a-k]" | egrep "[m-p]" | egrep "[1-9]"
If order matters, then just use a glob pattern
ls *[a-k]*[m-p]*[1-9]*
To be complete, you need to cover all the orderings; with three character classes that is 3! = 6 patterns:
ls *[a-k]*[m-p]*[1-9]* *[a-k]*[1-9]*[m-p]* \
*[m-p]*[a-k]*[1-9]* *[m-p]*[1-9]*[a-k]* \
*[1-9]*[m-p]*[a-k]* *[1-9]*[a-k]*[m-p]*
I have multiple fasta files, where the first line always contains a > with multiple words, for example:
File_1.fasta:
>KY620313.1 Hepatitis C virus isolate sP171215 polyprotein gene, complete cds
File_2.fasta:
>KY620314.1 Hepatitis C virus isolate sP131957 polyprotein gene, complete cds
File_3.fasta:
>KY620315.1 Hepatitis C virus isolate sP127952 polyprotein gene, complete cds
I would like to take the word starting with sP* from each file and rename each file to this string (for example: File_1.fasta to sP171215.fasta).
So far I have this:
$ for match in "$(grep -ro '>')";do
fname=$("echo $match|awk '{print $6}'")
echo mv "$match" "$fname"
done
But it doesn't work, I always get the error:
grep: warning: recursive search of stdin
I hope you can help me!
you can use something like this:
grep '>' *.fasta | while read -r line ; do
new_name="$(echo "$line" | cut -d' ' -f 6)"
old_name="$(echo "$line" | cut -d':' -f 1)"
mv "$old_name" "$new_name.fasta"
done
It searches the *.fasta files and handles every matched line:
it splits each grep result on spaces and takes the 6th field as the new name
it splits each grep result on : and takes the first field as the old name
it moves/renames the old filename to the new filename
There are several things going on with this code.
For a start, I actually don't get this particular warning, which might be down to grep versions: grep -r without a path operand reads standard input in some versions and defaults to the current directory in others. Giving it an explicit starting directory, e.g. grep -ro '>' ., removes the ambiguity.
Secondly:
fname=$("echo $match|awk '{print $6}'")
The quotes inside serve an unintended purpose. Your code should look like this, if anything:
fname="$(echo $match|awk '{print $6}')"
Lastly, to properly retrieve your data, this should be your final code. Note that a for loop over "$(...)" sees the whole output as a single word, so read the matches line by line instead:
grep -Hr ">" . | while IFS= read -r match; do
fname="$(echo "$match" | cut -d: -f1)"
new_fname="$(echo "$match" | grep -o "sP[^ ]*")".fasta
echo mv "$fname" "$new_fname"
done
Explanations:
grep -H -> you want your grep to explicitly use "Include Filename", just in case other shell environments decide to alias grep to grep -h (no filenames)
you don't want to be doing grep -o on your file search, as you want to have both the filename and the "new filename" in one data entry.
Although, I don't see why you would search for '>' and not directly for 'sP', as in:
grep -Hro "sP[0-9]*" . | while IFS= read -r match; do
This is not the exact same behaviour, and has different edge cases, but it just might work for you.
Quite straightforward in (g)awk:
create a file "script.awk":
FNR == 1 {
for (i=1; i<=NF; i++) {
if (index($i, "sP")==1) {
print "mv", FILENAME, $i ".fasta"
nextfile
}
}
}
use it:
awk -f script.awk *.fasta > cmmd.txt
check the content of the output:
mv File_1.fasta sP171215.fasta
mv File_2.fasta sP131957.fasta
if it looks OK, launch the renames with . cmmd.txt
For all fasta files in directory, search their first line for the first word starting with sP and rename them using that word as the basename.
Using a bash array:
for f in *.fasta; do
arr=( $(head -1 "$f") )
for word in "${arr[@]}"; do
[[ "$word" =~ ^sP ]] && echo mv "$f" "${word}.fasta" && break
done
done
or using grep:
for f in *.fasta; do
word=$(head -1 "$f" | grep -o "\bsP\w*")
[ -z "$word" ] || echo mv "$f" "${word}.fasta"
done
Note: remove echo after you are ok with testing.
I'm trying to grep multiple arguments in shell.
I call it like ./script arg1 arg2 .. argN
and I want it to act like
egrep -i "arg1" mydata | egrep -i "arg2" | ... egrep -i "argN" | awk -f display.awk
in order to match patterns in AND format.
What's wrong with my approach?
Is it even right to write something like
egrep -i "arg1" mydata | egrep -i "arg2" | ... egrep -i "argN" | awk -f display.awk
to match multiple patterns in AND fashion?
if [ $# -eq 0 ]
then
echo "Usage:phone searchfor [...searchfor]"
echo "(You didn't tell me what you want to search for.)"
exit 0
else
for arg in $*
do
if [ $arg -eq $1 ]
then
egrep -i "arg" mydata |
else
egrep -i "arg" |
fi
done
awk -f display.awk
fi
If my data has
'happy sunny bunny',
'sleepy bunny',
and 'happy sunny'
then running ./script happy sunny bunny should print only
'happy sunny bunny'
and running ./script bunny should print both
'happy sunny bunny'
'sleepy bunny'
The immediate fix is to move the pipe character to after the done.
Also, you should loop over "$@" to preserve the quoting of your arguments, and generally quote your variables.
if [ $# -eq 0 ]
then
# print diagnostics to stderr
echo "Usage: phone searchfor [...searchfor]" >&2
echo "(You didn't tell me what you want to search for.)" >&2
exit 0
fi
for arg in "$#"
do
# Use = for string comparison, and quote the variables
if [ "$arg" = "$1" ]
then
# Surely you want "$arg" here, not the static string "arg"?
grep -E -i "$arg" mydata
else
grep -E -i "$arg"
fi
done |
awk -f display.awk
The overall logic still seems flawed; you will be grepping standard input for the first argument if there are more than two arguments. Perhaps you want to add an option to allow the user to specify an input file name, with - to specify standard input? And then all the regular arguments will be search strings, like the usage message suggests.
If indeed the intent is to loop over all the arguments to produce a logical AND, try this:
also () {
local what
what=$1
shift
if [ $# -gt 0 ]; then
grep -E -i "$what" | also "$#"
else
grep -E -i "$what"
fi
}
also "$#" <mydata | awk -f display.awk
... though a better implementation might be to build a simple Awk or sed script from the arguments:
script='1'
for arg in "$#"; do
script="$script && tolower(\$0) ~ tolower(\"$arg\")"
done
awk "$script" | awk -f display.awk
This breaks down if the search phrases could contain regex specials, though (which of course is true for the grep -E version as well; but then you could easily switch to grep -F).
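For instance, a minimal sketch of that grep -F variant, assuming bash (for printf %q) and the same mydata input file as above:
# Build an AND chain of fixed-string, case-insensitive greps from the arguments
cmd='cat mydata'
for arg in "$@"; do
cmd="$cmd | grep -F -i -e $(printf '%q' "$arg")"
done
eval "$cmd" | awk -f display.awk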
Merging the two Awk scripts into one should probably not be hard either, though without seeing display.awk, this is speculative.
You can solve it recursively:
#! /bin/bash
if (( $# == 0)); then
exec cat
else
arg=$1; shift
egrep "$arg" | "$0" "$#"
fi
The recursion ends if the script is called with no arguments; in this case it behaves like cat. In your example you can put your awk there. If the script is called with one or more arguments, it calls egrep with the first argument ($1) and passes the remaining arguments ($@ after shift) to itself ($0).
Example:
$ ./recursive-egrep sys < /etc/passwd
sys:x:3:3:sys:/dev:/usr/sbin/nologin
systemd-timesync:x:100:102:systemd Time Synchronization,,,:/run/systemd:/bin/false
systemd-network:x:101:103:systemd Network Management,,,:/run/systemd/netif:/bin/false
systemd-resolve:x:102:104:systemd Resolver,,,:/run/systemd/resolve:/bin/false
systemd-bus-proxy:x:103:105:systemd Bus Proxy,,,:/run/systemd:/bin/false
$ ./recursive-egrep sys no < /etc/passwd
sys:x:3:3:sys:/dev:/usr/sbin/nologin
Use G from https://gitlab.com/ole.tange/tangetools/tree/master/G which does this (except for the awk part).
SYNOPSIS
G [[grep options] string] [[grep options] string] ...
DESCRIPTION
G is shorthand for writing (search for single lines matching expressions):
grep --option string | grep --option2 string2
or with -g (search full files matching expressions):
find . -type f | xargs grep -l string1 | xargs grep -l string2
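For this question, the invocation might look something like this (a guess based only on the synopsis above):
G -i happy -i sunny -i bunny < mydata | awk -f display.awk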
Assignment: I have to create a shell script using diff and sort, and a pipeline using ls -l, grep '^d', and awk '{print $9}' to print a full directory tree.
I wrote a C program to display what I am looking for. Here is the output:
ryan#chrx:~/Documents/OS-Projects/Project5_DirectoryTree$ ./a.out
TestRoot/
[Folder1]
[FolderC]
[FolderB]
[FolderA]
[Folder2]
[FolderD]
[FolderF]
[FolderE]
[Folder3]
[FolderI]
[FolderG]
[FolderH]
I wrote this so far:
ls -R -l $1 | grep '^d' | awk '{print $9}'
to print the directory tree, but now I need a way to sort it by folder depth, and possibly indent it (not required). Any suggestions? I can't use the find or tree commands.
EDIT: The original assignment & restrictions were mistaken and changed at a later date. The current answers are good solutions if you disregard the restrictions, so please leave them for anyone with similar issues. As for the new assignment, in case anybody was wondering: I was to recursively print all subdirectories, sort them, then compare the result with my program's output to make sure they were similar. Here was my solution:
#!/bin/bash
echo Program:
./a.out $1 | sort
echo Shell Script:
ls -R -l $1 | grep '^d' | awk '{print $9}' | sort
diff <(./a.out $1 | sort) <(ls -R -l $1 | grep '^d' | awk '{print $9}' | sort)
DIFF=$?
if [[ $DIFF -eq 0 ]]
then
echo "The outputs are similar!"
fi
You need neither ls nor grep nor awk to get the tree. A simple recursive bash function will be enough, like:
#!/bin/bash
walk() {
local indent="${2:-0}"
printf "%*s%s\n" $indent '' "$1"
for entry in "$1"/*; do
[[ -d "$entry" ]] && walk "$entry" $((indent+4))
done
}
walk "$1"
If you run it as bash script.sh /etc it will print the dir-tree like:
/etc
/etc/apache2
/etc/apache2/extra
/etc/apache2/original
/etc/apache2/original/extra
/etc/apache2/other
/etc/apache2/users
/etc/asl
/etc/cups
/etc/cups/certs
/etc/cups/interfaces
/etc/cups/ppd
/etc/defaults
/etc/emond.d
/etc/emond.d/rules
/etc/mach_init.d
/etc/mach_init_per_login_session.d
/etc/mach_init_per_user.d
/etc/manpaths.d
/etc/newsyslog.d
/etc/openldap
/etc/openldap/schema
/etc/pam.d
/etc/paths.d
/etc/periodic
/etc/periodic/daily
/etc/periodic/monthly
/etc/periodic/weekly
/etc/pf.anchors
/etc/postfix
/etc/postfix/postfix-files.d
/etc/ppp
/etc/racoon
/etc/security
/etc/snmp
/etc/ssh
/etc/ssl
/etc/ssl/certs
/etc/sudoers.d
Borrowing from @jm666's idea of running it on /etc:
$ find /etc -type d -print | awk -F'/' '{printf "%*s[%s]\n", 4*(NF-2), "", $0}'
[/etc]
[/etc/alternatives]
[/etc/bash_completion.d]
[/etc/defaults]
[/etc/defaults/etc]
[/etc/defaults/etc/pki]
[/etc/defaults/etc/pki/ca-trust]
[/etc/defaults/etc/pki/nssdb]
[/etc/defaults/etc/profile.d]
[/etc/defaults/etc/skel]
[/etc/fonts]
[/etc/fonts/conf.d]
[/etc/fstab.d]
[/etc/ImageMagick]
[/etc/ImageMagick-6]
[/etc/pango]
[/etc/pkcs11]
[/etc/pki]
[/etc/pki/ca-trust]
[/etc/pki/ca-trust/extracted]
[/etc/pki/ca-trust/extracted/java]
[/etc/pki/ca-trust/extracted/openssl]
[/etc/pki/ca-trust/extracted/pem]
[/etc/pki/ca-trust/source]
[/etc/pki/ca-trust/source/anchors]
[/etc/pki/ca-trust/source/blacklist]
[/etc/pki/nssdb]
[/etc/pki/tls]
[/etc/postinstall]
[/etc/preremove]
[/etc/profile.d]
[/etc/sasl2]
[/etc/setup]
[/etc/skel]
[/etc/ssl]
[/etc/texmf]
[/etc/texmf/tlmgr]
[/etc/texmf/web2c]
[/etc/xml]
Sorry, I couldn't find a sensible way to use the other tools you mentioned, so this may not help you, but maybe it'll help others with the same question who don't have the requirement to use specific tools.
I have many files with matching strings in file names.
foostring.bar
barstring.bar
fuustring.bar
aha_foostring.abc
meh_barstring.abc
lol_fuustring.abc
...
I need to find the bar and abc files with matching strings, and rename the *.bar files' basenames to look like the *.abc files' basenames. In other words, add a string prefix.
The result I'm looking for should look like this:
aha_foostring.bar
meh_barstring.bar
lol_fuustring.bar
aha_foostring.abc
meh_barstring.abc
lol_fuustring.abc
...
Clarification Edit: The string in the *.abc files is always situated after the last underscore (_) and before the dot (.). The string only contains letters and numbers. The prefix can contain any number of characters of any type, including _ and the dot. This means I also need to take the example below into consideration.
dingdongstring.bar
w_h.a.t_e_v_e.r_dingdongstring.abc
I've been experimenting with find, prefix and basename, but I need help and advice here.
Thanks
I would go with something like this (I am sure there are more elegant ways to do it with awk/sed):
#!/bin/bash
for filename in *.abc
do
prefix=${filename%_*}
searchstring=${filename%.abc}
searchstring=${searchstring#*_}
if [[ -f "$searchstring.bar" ]]
then
mv "${searchstring}.bar" "${prefix}_${searchstring}.bar"
fi
done
# show the result
ls -al
Apologies for adding this in your answer, but I've deleted my own answer and yours is closest to what the OP needs. (I don't mind... I care about solutions =)
EDIT: Probably this is what OP wants:
for f in *.abc; do
prefix=${f%_*}
bar=${f%.abc}
bar="${bar##*_}.bar"
[[ -f "$bar" ]] && mv "$bar" "${prefix}_${bar}"
done
I suggest trying the following "magic":
$ join -j 2 <(ls -1 . | sed -n '/\.bar/s/^\(.*\)\(\.[^.]\+\)$/\1\2\t\1/p' | sort -k2) <(ls -1 . | sed -n '/\.abc/s/^\(.\+_\)\?\([a-zA-Z0-9]\+\)\(\.[^.]\+\)$/\1\2\3\t\2\t\1/p' | sort -k2) | awk '{print $2 " " $4}' | while read FILE PREFIX; do echo mv -v "$FILE" "$PREFIX$FILE"; done
mv -v barstring.bar meh_barstring.bar
mv -v dingdongstring.bar w_h.a.t_e_v_e.r_dingdongstring.bar
mv -v foostring.bar aha_foostring.bar
mv -v fuustring.bar lol_fuustring.bar
If it shows the expected commands, remove the echo before mv and run it again to make the changes.
Note also that I use the ls -1 . command to list the files of the current directory; you'll probably need to change directory or run the command in the directory with the files.
Little explanation:
The idea behind that code is to create pairs of filename and common part for the .bar and .abc files:
$ ls -1 . | sed -n '/\.bar/s/^\(.*\)\(\.[^.]\+\)$/\1\2\t\1/p' | sort -k2
barstring.bar barstring
dingdongstring.bar dingdongstring
foostring.bar foostring
fuustring.bar fuustring
$ ls -1 . | sed -n '/\.abc/s/^\(.\+_\)\?\([a-zA-Z0-9]\+\)\(\.[^.]\+\)$/\1\2\3\t\2\t\1/p' | sort -k2
meh_barstring.abc barstring meh_
w_h.a.t_e_v_e.r_dingdongstring.abc dingdongstring w_h.a.t_e_v_e.r_
aha_foostring.abc foostring aha_
lol_fuustring.abc fuustring lol_
As you can see, the 2nd field is the common part. After that we join these lists together on the common part and keep only the .bar filename and the prefix:
$ join -j 2 <(ls -1 . | sed -n '/\.bar/s/^\(.*\)\(\.[^.]\+\)$/\1\2\t\1/p' | sort -k2) <(ls -1 . | sed -n '/\.abc/s/^\(.\+_\)\?\([a-zA-Z0-9]\+\)\(\.[^.]\+\)$/\1\2\3\t\2\t\1/p' | sort -k2) | awk '{print $2 " " $4}'
barstring.bar meh_
dingdongstring.bar w_h.a.t_e_v_e.r_
foostring.bar aha_
fuustring.bar lol_
And the final step is to rename the files by adding the appropriate prefix to them.
Hey, I'm stuck on how to count the different file types/extensions recursively in a folder. I also need to print the counts to a .txt file.
For example, I have 10 .txt and 20 .docx files mixed up in multiple folders.
Help me!
find ./ -type f | awk -F . '{print $NF}' | sort | awk '{count[$1]++} END {for (j in count) print j, "(" count[j] " occurrences)"}'
Gets all filenames with find, then uses awk to extract the extension, then uses awk again to count the occurrences.
Just with bash (version 4 required for this code):
#!/bin/bash
shopt -s globstar nullglob
declare -A exts
for f in * **/*; do
[[ -f $f ]] || continue # only count files
filename=${f##*/} # remove directories from pathname
ext=${filename##*.}
[[ $filename == $ext ]] && ext="no_extension"
: ${exts[$ext]=0} # initialize array element if unset
(( exts[$ext]++ ))
done
for ext in "${!exts[@]}"; do
echo "$ext ${exts[$ext]}"
done | sort -k2nr | column -t
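To also get the counts into a .txt file, as the question asks, just redirect the output (count_exts.sh is a hypothetical name for the script above):
bash count_exts.sh > counts.txt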
This one seems unsolved so far, so here is how far I got counting files and ordering them:
find . -type f | sed -n 's/..*\.//p' | sort -f | uniq -ic
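To order that list by count as well, one could append another sort (an assumed refinement, not in the original):
find . -type f | sed -n 's/..*\.//p' | sort -f | uniq -ic | sort -rn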