Collect only numbers in a file's extension - linux

I need some help. I'm creating a script that goes through a text file line by line and validates it against the images in a folder.
My problem is: when I search for the images, I only want the number, not the extension.
find /mnt/62-PASTA/01.GERAL/ -mindepth 2 | head -19 | cut -d/ -f6
I get:
111066.jpg
88008538.jpg
11241.jpg
88008563.jpg
116071.PNG
But I want
111066
88008538
11241
88008563
116071
Any help?

A really simple way given the examples shown would be to use cut again to split on .:
find /mnt/62-PASTA/01.GERAL/ -mindepth 2 | head -19 | cut -d/ -f6 | cut -d'.' -f1

What we can do here is use another cut command:
cut -d .
This splits each line into fields using . as the delimiter. Then we keep just the part before the first dot:
cut -d . -f 1
I think this should work for the file names shown.
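A quick sanity check on one of the sample names shown above:
echo "116071.PNG" | cut -d '.' -f 1
116071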

Suggesting a sed pipe instead of cut:
find /mnt/62-PASTA/01.GERAL/ -mindepth 2 | sed 's|\.[[:alpha:]]*$||'
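If you also want to drop the leading directories, not just the extension, a variation on the same sed idea could be (a sketch, untested against your exact tree):
find /mnt/62-PASTA/01.GERAL/ -mindepth 2 | sed 's|.*/||; s|\.[[:alpha:]]*$||'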

For a pure shell solution, try the following. A simple explanation: a for loop iterates over the .jpg and .PNG files in your current directory, and inside the loop bash's parameter substitution replaces everything that is not a digit with nothing, which leaves only the digits from each file's name.
Running the code from the directory /mnt/62-PASTA/01.GERAL/:
for file in *.jpg *.PNG;
do
echo "${file//[^0-9]/}"
done
OR, with the full path (/mnt/62-PASTA/01.GERAL/) so it can be run from any other location, try the following code:
for file in /mnt/62-PASTA/01.GERAL/*.jpg /mnt/62-PASTA/01.GERAL/*.PNG;
do
file1="${file##*/}" ## Strip everything up to the last / to keep only the filename.
echo "${file1//[^0-9]/}" ## Strip everything that is not a digit from the filename.
done
Output will be as follows:
111066
11241
88008538
88008563
116071
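One small caveat to this answer: if either glob matches nothing, bash passes the pattern through literally, so enabling nullglob first is safer:
shopt -s nullglob
for file in *.jpg *.PNG; do echo "${file//[^0-9]/}"; done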

Related

How to sort and print array listing of specific file type in shell

I am trying to write a loop that extracts the text file names in all sub-directories and appends certain strings to them. Additionally, I want the text file names sorted by the number after ^.
For example, I have three sub-directories mydir1, mydir2, mydir3. I have,
in mydir1,
file223^1.txt
file221^2.txt
file666^3.txt
in mydir2,
file111^1.txt
file4^2.txt
In mydir3,
file1^4.txt
file5^5.txt
The expected result final.csv:
STRINGmydir1file223^1
STRINGmydir1file221^2
STRINGmydir1file666^3
STRINGmydir2file111^1
STRINGmydir2file4^2
STRINGmydir3file1^4
STRINGmydir3file5^5
This is the code I tried:
for dir in my*/; do
array=(${dir}/*.txt)
IFS=$'\n' RGBASE=($(sort <<<"${array[@]}"));
for RG in ${RGBASE[@]}; do
RGTAG=$(basename ${RG/.txt//})
echo "STRING${dir}${RGTAG}" >> final.csv
done
done
Can someone please explain what is wrong with my code? Also, there could be other better ways to do this, but I want to use the for-loop.
The output with this code:
$ cat final.csv
STRINGdir1file666^3.txt
STRINGdir2file4^2.txt
STRINGdir3file5^5.txt
As a starting point which works for your special case, here is a two-liner:
mapfile -t array < <( find my* -name "*.txt" -printf "STRING^^%H^^%f\n" | cut -d"." -f1 | LANG=C sort -t"^" -k3,3 -k6 )
printf "%s\n" "${array[#]//^^/}"
To restrict the directory depth, you can add -maxdepth with the number of sub-directory levels to search. The find command can also use a regex in the search, which is applied to the whole path, so it can be used on a more complex directory tree.
The difficulty was the sort on two positions and the delimiter.
My idea was to add a delimiter which can easily be removed afterwards.
The sort command can only handle one delimiter, so I used the double hat ^^ as the delimiter; it can be removed afterwards without removing the single hat in the filename.
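For example, limiting the search to files directly inside the my* directories only needs an extra -maxdepth (a sketch under the same assumptions as the two-liner above):
mapfile -t array < <( find my* -maxdepth 1 -name "*.txt" -printf "STRING^^%H^^%f\n" | cut -d"." -f1 | LANG=C sort -t"^" -k3,3 -k6 )
printf "%s\n" "${array[@]//^^/}"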
A solution using decorate-sort-undecorate idiom could be:
printf "%s\n" my*/*.txt |
sed -E 's_(.*)/(.*)\^([0-9]+).*_\1\t\3\tSTRING\1\2^\3_' |
sort -t$'\t' -k1,1 -k2,2n |
cut -f3
assuming filenames don't contain tab or newline characters.
A basic explanation: The printf prints each pathname on a separate line. The sed converts the pathname dir/file^number.txt into dir\tnumber\tSTRINGdirfile^number (\t represents a tab character). The aim is to use the tab character as a field separator in the sort command. The sort sorts the lines by the first (lexicographically) and second fields (numerically). The cut discards the first and second fields; the remaining field is what we want.
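To make the decoration concrete: mydir1/file223^1.txt becomes the line mydir1<TAB>1<TAB>STRINGmydir1file223^1, the sort orders on the first two fields, and the cut keeps only the third field, STRINGmydir1file223^1.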

Grep a word out of a file and save the file as that word

I am using Ubuntu Linux and grepping info out of a file (let's say filename.log) and want to save the file using some of the info inside of filename.log.
example:
The info in filename.log includes version_name and date.
When displaying this info on screen using cat, it shows:
version_name=NAME
date=TODAY
I then want to save the file as NAME-TODAY.log and have no idea how to do this.
Any help will be appreciated
You can chain a bunch of basic Linux commands with the pipe character |. Combined with command substitution (taking the output of a complex command to use in another command; syntax: $(your command)), you can achieve what you want to do.
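As a tiny standalone illustration of command substitution (nothing to do with your log file yet):
echo "backup-$(date +%F).log"     # prints something like backup-2024-05-01.log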
This is what I came up with, based on your question:
cp filename.log $(grep -E "(version_name=)|(date=)" filename.log | cut -f 2 -d = | tr '\n' '-' | rev | cut -c 2- | rev).log
So here I used cp, $(), grep, cut, tr and finally rev.
Since you said you had no idea where to start, let me walk you through this one-liner:
cp - used to copy the filename.log file to a new file,
whose name is based on the values of version_name and date (steps 2 and up)
command substitution $() - the entire command between the round brackets is 'resolved' before the cp command in step 1 finishes. In your example it would be NAME-TODAY. Notice the .log at the end, outside of the round brackets, to give the copy a proper file extension. The output of this part in your example will be NAME-TODAY.log
grep -E "(version_name=)|(date=)" grep with regexp flag -E to be able to do what we are doing. Matches any lines that contain version_name= OR date=. The expected output is:
version_name=NAME
date=TODAY
cut -f 2 -d = - because I am not interested in version_name, but in the value associated with that field, I use cut to split the line at the equals character with the flag -d =. I then select the value after the equals character (the second field) with the flag -f 2. The expected output is:
NAME
TODAY
tr '\n' '-' - because the grep output spans multiple lines, I replace every newline with a dash. Expected output:
NAME-TODAY-
rev | cut -c 2- | rev - I am grouping these. rev reverses the string built so far. With cut -c 2- I keep everything from the second character of the reversed string onward, dropping its first character. This is required because I replaced newlines with dashes, which left me with NAME-TODAY-. Basically this is just an extra step to remove the trailing dash. See the expected output of each step:
-YADOT-EMAN
YADOT-EMAN
NAME-TODAY
Remember this value is inside the command substitution of step 2, so the end result will be:
cp filename.log NAME-TODAY.log
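As a side note that is not part of the walkthrough above: assuming the log really contains plain version_name=... and date=... lines, a shorter variant could build the same name with a single awk call:
newname=$(awk -F= '/^version_name=/{n=$2} /^date=/{d=$2} END{print n "-" d ".log"}' filename.log)
cp filename.log "$newname"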
I managed to solve this by doing the following:
grep filename.log > /tmp/filename.info && filename=$(echo $(grep "version_name" /tmp/filename.info | cut -d " " -f 3)-$(grep "date" /tmp/filename.info | cut -d " " -f 3)-$filename.log)

Using cut in Linux Mint Terminal more precisely

In the directory /usr/lib on Linux Mint there are files, among other things, that go by the name of xxx.so.d, where xxx is their name and d is a number. The assignment is to find all files with a .so ending and write out their name, xxx. The code I have so far is
ls | grep "\.so\." | cut -d "." -f 1
The problem now is that cut cuts some filenames short. As an example, there is a file called libgimp-2.0.so.0, where the wanted output would be libgimp-2.0, since that part is in front of .so.
Is there any way to make cut cut at ".so" instead of at the first .?
The answer given by pacholik can give you wrong files (i.e. 'xyz.socket' will appear on your list). To correct his script:
for i in *.so.*; do echo "${i%%.so*}"; done
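A quick illustration of the difference, using hypothetical file names:
i=xyz.socket;       echo "${i%.so*}"      # the *.so* glob picks this file up and prints "xyz", a false positive
i=libgimp-2.0.so.0; echo "${i%%.so*}"     # the *.so.* glob matches it, and %% prints "libgimp-2.0"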
Another way to do this (easier to read in my opinion) is to use a little Perl:
ls | grep "\.so\." | perl -n0e "print ((split(/\.so/))[0], \"\n\")"
Sorry, I don't think there is a way to use only "cut" as you asked.
for i in *.so*; do echo "${i%.so*}"; done
just a bash parameter substitution
http://www.tldp.org/LDP/abs/html/parameter-substitution.html
Just use sed instead:
ls | grep -v ".socket" | grep .so | sed "s/.so.*//"
This deletes everything after the first .so found in the file names. That way even files named xxx.so.so would work.
Depending on the size of the directory, using find could be the best option. As a starting point, give this a try:
find . -iname "*.so.*" -exec basename {} \; | cut -d "." -f 1
Besides cut there are many other options, like sed and awk, that in some cases could achieve the same result in a faster way.

Shell Scripting - URL manipulation

I need to manipulate a URL using values from a file. This is what I have so far:
var=$(grep -A2 -i "some_text" /path/to/file | grep -v "some_text" | cut -d'"' -f 4-5 | cut -d'"' -f 1 | tr -d '\n')
This gives the output: /text/to/be/appended/to/domain
Now, I need to append the domain name to var value.
So I did,
var1="http://mydomain"
and then
echo ${var1}${var}
So I expect
http://mydomain/text/to/be/appended/to/domain
to be the output, but I am getting just /text/to/be/appended/to/domain.
I guessed it would be due to the / as the first character, but if I use cut to remove the first /, I get the value of var1 as output.
Where did I go wrong?
Update (not sure if this helps even a bit, but still):
If I do echo ${var}${var1}, I get /text/to/be/appended/to/domainhttp://mydomain
Sample entry :
<tr><td><a id="value">some_text</a></td></tr>
<tr><td><a id="value" href="/text/to/be/appended/to/domain">2013</a></td></tr>
This line ending (^M) indicates that at some point the file was edited (or created) in a DOS-like environment. Use dos2unix yourfile to fix the problem, on BOTH your script and the sample entries.
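If converting the files is not an option, a minimal sketch (assuming bash) that strips the stray carriage returns from the variable itself:
var=${var//$'\r'/}      # delete every CR left over from the DOS line endings
echo "${var1}${var}"    # now prints http://mydomain/text/to/be/appended/to/domain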

Count the number of occurrences in a string. Linux

Okay, so what I am trying to figure out is how to count the number of periods in a string and then cut everything up to that count minus 2. Meaning like this:
string="aaa.bbb.ccc.ddd.google.com"
number_of_periods="5"
number_of_periods=`expr $number_of_periods-2`
string=`echo $string | cut -d"." -f$number_of_periods`
echo $string
result: "aaa.bbb.ccc.ddd"
The way that I was thinking of doing it was sending the string to a text file and then just grepping for the number of occurrences like this:
grep -c "." infile
The reason I don't want to do that is that I want to avoid creating another text file, since I do not have permission to do so. It would also be simpler for the code I am trying to build right now.
EDIT
I don't think I made it clear, but I want finding the number of periods to be dynamic, because the address I will be looking at will change as the script moves forward.
If you don't need to count the dots, but just want to remove the penultimate dot and everything after it, you can use Bash's built-in string manipulation.
${string%substring}
Deletes shortest match of $substring from back of $string.
Example:
$ string="aaa.bbb.ccc.ddd.google.com"
$ echo ${string%.*.*}
aaa.bbb.ccc.ddd
Nice and simple and no need for sed, awk or cut!
What about this:
echo "aaa.bbb.ccc.ddd.google.com"|awk 'BEGIN{FS=OFS="."}{NF=NF-2}1'
(further shortened by helpful comment from #steve)
gives:
aaa.bbb.ccc.ddd
The awk command:
awk 'BEGIN{FS=OFS="."}{NF=NF-2}1'
works by separating the input line into fields (FS) by ., then joining them as output (OFS) with ., but the number of fields (NF) has been reduced by 2. The final 1 in the command is responsible for the print.
This will reduce a given input line by eliminating the last two period separated items.
This approach is "shell-agnostic" :)
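The same command reads just as well from a file instead of echo (hosts.txt here is only a hypothetical input file):
awk 'BEGIN{FS=OFS="."}{NF=NF-2}1' hosts.txt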
Perhaps this will help:
#!/bin/sh
input="aaa.bbb.ccc.ddd.google.com"
number_of_fields=$(echo $input | tr "." "\n" | wc -l)
interesting_fields=$(($number_of_fields-2))
echo $input | cut -d. -f-${interesting_fields}
grep -o "\." <<<"aaa.bbb.ccc.ddd.google.com" | wc -l
5
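Tying the count back to the original goal, it can feed straight into cut (a sketch, assuming bash for the here-strings):
string="aaa.bbb.ccc.ddd.google.com"
n=$(grep -o "\." <<<"$string" | wc -l)    # 5 dots, so 6 fields
cut -d"." -f-"$((n - 1))" <<<"$string"    # keep fields 1 through 4: aaa.bbb.ccc.ddd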
