Need to cut part of a string in shell scripting - linux

I have the string example.123.ytu.tar.gz.
I want to get example.123.ytu.tar. How can I do this in a shell script?
I tried:
echo example.123.ytu.tar | cut -d "." -f3
But it gives only tar.

I would use basename for this.
$ basename --suffix ".gz" example.123.ytu.tar.gz
example.123.ytu.tar
You can find out more about it via man basename

echo example.123.ytu.tar.gz | cut -d "." -f1-4
example.123.ytu.tar
The -f option takes any comma separated list of fields and/or field ranges (with '-'). Eg
echo example.123.ytu.tar.gz | cut -d "." -f1,2,3,4
example.123.ytu.tar
gives the same output.

You could simply use grep
echo example.123.ytu.tar.gz | grep -Eo '.*tar'
output: example.123.ytu.tar
The -E option enables extended regular expression mode and the -o option makes grep print only the part of each line that matched the regex.
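As a further option, the suffix can be removed with shell parameter expansion, which avoids starting an external process. This is a minimal sketch; the function name strip_gz is made up for illustration.

```shell
# strip_gz NAME: print NAME with one trailing ".gz" removed.
# ${1%.gz} deletes the shortest suffix matching ".gz", if present.
strip_gz() {
    printf '%s\n' "${1%.gz}"
}

strip_gz example.123.ytu.tar.gz
```

Unlike cut -f1-4, this does not depend on how many dots the name contains.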

Related

how to remove the extension of multiple files using cut in a shell script?

I'm studying about how to use 'cut'.
#!/bin/bash
for file in *.c; do
name = `$file | cut -d'.' -f1`
gcc $file -o $name
done
What's wrong with the following code?
There are a number of problems on this line:
name = `$file | cut -d'.' -f1`
First, to assign a shell variable, there should be no whitespace around the assignment operator:
name=`$file | cut -d'.' -f1`
Secondly, you want to pass $file to cut as a string, but what you're actually doing is trying to run $file as if it were an executable program (which it may very well be, but that's beside the point). Instead, use echo or shell redirection to pass it:
name=`echo "$file" | cut -d. -f1`
Or:
name=`cut -d. -f1 <<< "$file"`
I would actually recommend that you approach it slightly differently. Your solution will break if you get a file like foo.bar.c. Instead, you can use shell expansion to strip off the trailing extension:
name=${file%.c}
Or you can use the basename utility:
name=`basename "$file" .c`
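To make the difference concrete, here is a small sketch comparing the cut approach with suffix stripping on a name that contains an extra dot. The file name foo.bar.c is only an example value.

```shell
file="foo.bar.c"

# cut keeps only the text before the first dot
by_cut=$(printf '%s\n' "$file" | cut -d. -f1)

# parameter expansion removes only the trailing ".c"
by_expansion=${file%.c}

printf 'cut: %s\nexpansion: %s\n' "$by_cut" "$by_expansion"
```

With cut, foo.bar.c and foo.baz.c would both compile to an output named foo, so one would overwrite the other.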
You should use the command substitution (https://www.gnu.org/software/bash/manual/bashref.html#Command-Substitution) to execute a command in a script.
With this the code will look like this
#!/bin/bash
for file in *.c; do
name=$(echo "$file" | cut -f 1 -d '.')
gcc "$file" -o "$name"
done
echo writes the value of $file to standard output.
The pipe feeds that output to cut as its input.
cut splits the file name on the . delimiter and keeps the first field.
The $(...) command substitution captures the result, which is assigned to the name variable.
Hope this answer helps

Script to replace tokens with values mentioned in properties file

I have a file values.properties which contain data, like:
$ABC=10
$XYZ=20
I want to create a shell script that will take each element one by one from above file.
Say $ABC, then go to file ABC.txt & replace the value of $ABC with 10.
Similarly, then go to file XYZ.txt and replace $XYZ with 20.
I think maybe this should be in the Unix and Linux section, the solution I've hacked together is as follows:
cat values.properties | grep "=" | cut -d "$" -f2 | awk -F "=" '{print "s/$"$1"/"$2"/g "$1".txt"}' | xargs -n2 sed -i
The flow is like so:
Filter out all the value assignments via: grep "="
Remove the '$' via: cut -d "$" -f2
Use awk to split the variable name and value and construct sed replacement command
Use xargs to pull in the replacement parameter and target file via: xargs -n2
Finally, pass sed -i as the command to xargs: xargs -n2 sed -i
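The same flow can also be written as an explicit loop, which may be easier to read and debug than the one-liner. This is only a sketch: the function name apply_properties and its two arguments are made up for illustration, and it assumes the values contain no "/" or "&" characters (both are special in a sed replacement).

```shell
# apply_properties PROPS_FILE DIR
# For each line of the form $NAME=VALUE in PROPS_FILE, replace the literal
# text $NAME with VALUE inside DIR/NAME.txt.
apply_properties() {
    props=$1
    dir=$2
    while IFS='=' read -r key value; do
        case $key in
            '$'?*) name=${key#?} ;;   # drop the leading "$"
            *) continue ;;            # skip lines that are not $NAME=VALUE
        esac
        target="$dir/$name.txt"
        [ -f "$target" ] || continue
        # sed receives s/\$NAME/VALUE/g, where \$ matches a literal dollar sign
        sed "s/\\\$$name/$value/g" "$target" > "$target.tmp" &&
            mv "$target.tmp" "$target"
    done < "$props"
}

# Example call (assumes the files live in the current directory):
# apply_properties values.properties .
```

Writing to a temporary file and renaming it avoids relying on sed -i, whose syntax differs between GNU and BSD sed.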

Bash script to return domains instead of URL's

I have this bash script that I wrote to analyse the HTML of any given web page. It is supposed to return the domains on that page, but currently it returns the number of URLs on that page.
#!/bin/sh
echo "Enter a url eg www.bbc.com:"
read url
content=$(wget "$url" -q -O -)
echo "Enter file name to store URL output"
read file
echo $content > $file
echo "Enter file name to store filtered links:"
read links
found=$(cat $file | grep -o -E 'href="([^"#]+)"' | cut -d '"' -f2 | sort | uniq | awk '/http/' > $links)
output=$(egrep -o '^http://[^/]+/' $links | sort | uniq -c > out)
cat out
How can I get it to return the domains instead of the URLs? From my programming knowledge I know it is supposed to parse from the right, but I am a newbie at bash scripting. Can someone please help me? This is as far as I have gotten.
I know there's a better way to do this in awk but you can do this with sed, by appending this after your awk '/http/':
| sed -e 's;https\?://;;' | sed -e 's;/.*$;;'
Then you want to move your sort and uniq to the end of that.
So that the whole line will look like:
found=$(cat $file | grep -o -E 'href="([^"#]+)"' | cut -d '"' -f2 | awk '/http/' | sed -e 's;https\?://;;' | sed -e 's;/.*$;;' | sort | uniq -c > out)
You can get rid of this line:
output=$(egrep -o '^http://[^/]+/' $links | sort | uniq -c > out)
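For reference, the two sed steps plus de-duplication can be wrapped in a small filter. This is a sketch, not the exact pipeline above: the function name unique_hosts is made up, and it uses sed -E (extended regular expressions) so that ? works without a backslash.

```shell
# unique_hosts: read URLs on stdin, print each distinct host once.
unique_hosts() {
    # first expression strips the scheme, second strips the path
    sed -E -e 's;^https?://;;' -e 's;/.*$;;' | sort -u
}

printf '%s\n' \
    'http://www.zelleke.com/feed/' \
    'https://wordpress.org/' \
    'http://www.zelleke.com/' | unique_hosts
```

Replace sort -u with sort | uniq -c to get a count per host instead.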
EDIT 2:
Please note, that you might want to adapt the search patterns in the sed expressions to your needs. This solution considers only http[s]?://-protocol and www.-servers...
EDIT:
If you want count and domains:
lynx -dump -listonly http://zelleke.com | \
sed -n '4,$ s#^.*https\?://\([^/]*\).*$#\1#p' | \
sort | \
uniq -c | \
sed 's/www\.//'
gives
2 wordpress.org
10 zelleke.com
Original Answer:
You might want to use lynx for extracting links from URL
lynx -dump -listonly http://zelleke.com
gives
# blank line at the top of the output
References
1. http://www.zelleke.com/feed/
2. http://www.zelleke.com/comments/feed/
3. http://www.zelleke.com/
4. http://www.zelleke.com/#content
5. http://www.zelleke.com/#secondary
6. http://www.zelleke.com/
7. http://www.zelleke.com/wp-login.php
8. http://www.zelleke.com/feed/
9. http://www.zelleke.com/comments/feed/
10. http://wordpress.org/
11. http://www.zelleke.com/
12. http://wordpress.org/
Based on this output you achieve desired result with:
lynx -dump -listonly http://zelleke.com | \
sed -n '4,$ s#^.*http://\([^/]*\).*$#\1#p' | \
sort -u | \
sed 's/www\.//'
gives
wordpress.org
zelleke.com
You can remove the scheme and path from a URL with sed:
sed 's#http://##; s#/.*##'
I also want to point out that these two lines are wrong:
found=$(cat $file | grep -o -E 'href="([^"#]+)"' | cut -d '"' -f2 | sort | uniq | awk '/http/' > $links)
output=$(egrep -o '^http://[^/]+/' $links | sort | uniq -c > out)
You must use either redirection ( > out ) or command substitution $(), but not both at the same time, because the redirection sends the output to the file and leaves the variable empty.
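A quick way to see this, sketched with a temporary file (the variable names are arbitrary):

```shell
out_file=$(mktemp)

# The redirection wins: the text lands in the file and nothing is
# left on standard output for $(...) to capture.
captured=$(printf 'hello\n' > "$out_file")

printf 'captured=[%s] file=[%s]\n' "$captured" "$(cat "$out_file")"
```

The variable stays empty while the file holds the text, so one of the two mechanisms is redundant.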
This part
content=$(wget "$url" -q -O -)
echo $content > $file
would also be better written this way:
wget "$url" -q -O - > "$file"
You may also be interested in this:
https://www.rfc-editor.org/rfc/rfc3986#appendix-B
It explains how to parse a URI using a regular expression.
This lets you parse a URI from the left and extract the "authority", which contains the domain and subdomain names.
sed -r 's_^([^:/?#]+:)?(//([^/?#]*))?.*_\3_g';
grep -Eo '[^\.]+\.[^\.]+$' # pipe with first line, give what you need
This is interesting too:
http://www.scribd.com/doc/78502575/124/Extracting-the-Host-from-a-URL
Assuming that a URL always begins this way
https?://(www\.)?
is really hazardous.
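As a sketch of the RFC 3986 approach, the regex can be wrapped in a helper that extracts the authority from a single URI. The function name url_host is made up for illustration, and sed -E is used in place of -r since both GNU and BSD sed accept it.

```shell
# url_host URI: print the authority component of URI.
# Group 1 is the optional scheme, group 3 the text after "//".
url_host() {
    printf '%s\n' "$1" | sed -E 's_^([^:/?#]+:)?(//([^/?#]*))?.*_\3_'
}

url_host 'http://www.zelleke.com/feed/'
url_host 'ftp://example.org/pub/file.txt'
```

The authority can still include user information or a port (user@host:8080), so further trimming may be needed depending on the input.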

How to trim specific text with grep

I need to trim some text with grep. I have tried various other methods and haven't had much luck. For example:
C:\Users\Admin\Documents\report2011.docx: My Report 2011
C:\Users\Admin\Documents\newposter.docx: Dinner Party Poster 08
How can I trim the text file so that the ":" and all characters after it are removed?
E.g. so the output would be:
C:\Users\Admin\Documents\report2011.docx
C:\Users\Admin\Documents\newposter.docx
Use awk?
awk -F: '{print $1 ":" $2}' inputFile > outFile
You can use grep
(note that -o returns only the matching text)
grep -oe "^C:[^:]*" inputFile > outFile
That is pretty simple to do with grep -o:
$ grep -o '^C:[^:]*' input
C:\Users\Admin\Documents\report2011.docx
C:\Users\Admin\Documents\newposter.docx
If you can have other drives just replace C by .:
$ grep -o '^.:[^:]*' input
If a line can start with something other than a drive name, you can handle both the case where a drive name occurs at the beginning of the line and the case where there is no drive name:
$ grep -o '^\(.:\|\)[^:]*' input
cat inputFile | cut -f1,2 -d":"
The -d specifies your delimiter, in this case ":". The -f1,2 means you want the first and second fields.
The first part doesn't necessarily have to be cat inputFile, it's just whatever it takes to get the text that you referred to. The key part being cut -f1,2 -d":"
Your text looks like output of grep. If what you're asking is how to print filenames matching a pattern, use GNU grep option --files-with-matches
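You can use this as well for your example:
grep -E -o "^C\S+" inputFile | sed 's/:$//'
egrep -o "^C\S+" inputFile | sed 's/:$//'
\S here matches any non-space character, so this assumes the path itself contains no spaces. The sed step strips the trailing colon while keeping the one after the drive letter.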
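Another possibility, sketched here under the assumption that the file name is always followed by a colon and a space, is to delete everything from the first ": " onward with sed. The function name trim_description is made up for illustration.

```shell
# trim_description: read lines on stdin and drop the first ": " plus
# everything after it. The colon in "C:\" is followed by a backslash,
# not a space, so it is left alone.
trim_description() {
    sed 's/: .*$//'
}

printf '%s\n' 'C:\Users\Admin\Documents\report2011.docx: My Report 2011' |
    trim_description
```

This works for any drive letter and for paths without one, but it cuts too early if a path contains a colon followed by a space.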

xargs with multiple arguments

I have a source input, input.txt
a.txt
b.txt
c.txt
I want to feed these inputs into a program as follows:
my-program --file=a.txt --file=b.txt --file=c.txt
So I tried to use xargs, but with no luck.
cat input.txt | xargs -i echo "my-program --file="{}
It gives
my-program --file=a.txt
my-program --file=b.txt
my-program --file=c.txt
But I want
my-program --file=a.txt --file=b.txt --file=c.txt
Any idea?
Don't listen to all of them. :) Just look at this example:
echo argument1 argument2 argument3 | xargs -l bash -c 'echo this is first:$0 second:$1 third:$2'
Output will be:
this is first:argument1 second:argument2 third:argument3
None of the solutions given so far deals correctly with file names containing space. Some even fail if the file names contain ' or ". If your input files are generated by users, you should be prepared for surprising file names.
GNU Parallel deals nicely with these file names and gives you (at least) 3 different solutions. If your program takes 3 and only 3 arguments then this will work:
(echo a1.txt; echo b1.txt; echo c1.txt;
echo a2.txt; echo b2.txt; echo c2.txt;) |
parallel -N 3 my-program --file={1} --file={2} --file={3}
Or:
(echo a1.txt; echo b1.txt; echo c1.txt;
echo a2.txt; echo b2.txt; echo c2.txt;) |
parallel -X -N 3 my-program --file={}
If, however, your program takes as many arguments as will fit on the command line:
(echo a1.txt; echo b1.txt; echo c1.txt;
echo d1.txt; echo e1.txt; echo f1.txt;) |
parallel -X my-program --file={}
Watch the intro video to learn more: http://www.youtube.com/watch?v=OpaiGYxkSuQ
How about:
echo $'a.txt\nb.txt\nc.txt' | xargs -n 3 sh -c '
echo my-program --file="$1" --file="$2" --file="$3"
' argv0
It's simpler if you use two xargs invocations: 1st to transform each line into --file=..., 2nd to actually do the xargs thing ->
$ cat input.txt | xargs -I# echo --file=# | xargs echo my-program
my-program --file=a.txt --file=b.txt --file=c.txt
You can use sed to prefix --file= to each line and then call xargs:
sed -e 's/^/--file=/' input.txt | xargs my-program
Here is a solution using sed for three arguments, but is limited in that it applies the same transform to each argument:
cat input.txt | sed 's/^/--file=/g' | xargs -n3 my-program
Here's a method that will work for two args, but allows more flexibility:
cat input.txt | xargs -n 2 | xargs -I{} sh -c 'V="{}"; my-program --file=${V% *} --file=${V#* }'
I stumbled on a similar problem and found a solution which I think is nicer and cleaner than those presented so far.
The syntax for xargs that I have ended with would be (for your example):
xargs -I X echo --file=X
with a full command line being:
my-program $(cat input.txt | xargs -I X echo --file=X)
which will work as if
my-program --file=a.txt --file=b.txt --file=c.txt
was done (providing input.txt contains data from your example).
Actually, in my case I needed to first find the files and also needed them sorted so my command line looks like this:
my-program $(find base/path -name "some*pattern" -print0 | sort -z | xargs -0 -I X echo --files=X)
Few details that might not be clear (they were not for me):
some*pattern must be quoted since otherwise shell would expand it before passing to find.
-print0, then -z and finally -0 use null-separation to ensure proper handling of files with spaces or other weird names.
Note however that I didn't test it deeply yet. Though it seems to be working.
xargs doesn't work that way. Try:
myprogram $(sed -e 's/^/--file=/' input.txt)
It's because echo prints a newline. Try something like
echo my-program `xargs --arg-file input.txt -i echo -n " --file "{}`
I was looking for a solution to this exact problem and came to the conclusion of coding a script in the middle.
To pass each whole input line as a single argument in the next example, use the -d '\n' delimiter option.
example:
user#mybox:~$ echo "file1.txt file2.txt" | xargs -d '\n' -n1 ScriptInTheMiddle.sh
inside the ScriptInTheMiddle.sh:
#!/bin/bash
var1=`echo "$1" | cut -d ' ' -f1 `
var2=`echo "$1" | cut -d ' ' -f2 `
myprogram "--file1=$var1" "--file2=$var2"
For this solution to work, you need a space between the arguments file1.txt and file2.txt, or whatever delimiter you choose. One more thing: inside the script, check -f1 and -f2, as they mean "take the first word" and "take the second word" according to the position of the delimiter (the delimiter could be ' ', ';', '.', or whatever you wish, between single quotes).
Add as many parameters as you wish.
Problem solved using xargs, cut , and some bash scripting.
Cheers!
if you wanna pass by I have some useful tips http://hongouru.blogspot.com
Actually, it's relatively easy:
... | sed 's/^/--prefix=/g' | xargs echo | xargs -I PARAMS your_cmd PARAMS
The sed 's/^/--prefix=/g' is optional, in case you need to prefix each param with some --prefix=.
The xargs echo turns the list of param lines (one param in each line) into a list of params in a single line and the xargs -I PARAMS your_cmd PARAMS allows you to run a command, placing the params where ever you want.
So cat input.txt | sed 's/^/--file=/g' | xargs echo | xargs -I PARAMS my-program PARAMS does what you need (assuming all lines within input.txt are simple and qualify as a single param value each).
There is another nice way of doing this, if you do not know the number of files upfront:
my-program $(find . -name '*.txt' -printf "--file=%p ")
Nobody has mentioned echoing out from a loop yet, so I'll put that in for completeness sake (it would be my second approach, the sed one being the first):
for line in $(< input.txt) ; do echo --file=$line ; done | xargs echo my-program
Old but this is a better answer:
cat input.txt | gsed "s/\(.*\)/\-\-file=\1/g" | tr '\n' ' ' | xargs my_program
# i like clean one liners
gsed is GNU sed, used here so the syntax matches across systems (brew install gsed); if you are already on GNU/Linux, plain sed works.
test it:
cat input.txt | gsed "s/\(.*\)/\-\-file=\1/g" | tr '\n' ' ' | xargs echo my_program
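Several of the approaches above split on whitespace, so a name like "my file.txt" would turn into two arguments. As a sketch of one way around that without xargs, the argument list can be built in the shell's positional parameters. The function name run_with_files is made up for illustration.

```shell
# run_with_files LIST COMMAND [ARGS...]
# Runs COMMAND once, appending one --file=NAME argument for every
# non-empty line of LIST. Each line stays a single argument, even if
# it contains spaces or quotes.
run_with_files() {
    list=$1
    shift
    while IFS= read -r line; do
        if [ -n "$line" ]; then
            set -- "$@" "--file=$line"
        fi
    done < "$list"
    "$@"
}

# Example call:
# run_with_files input.txt my-program
```

This runs the program exactly once, so unlike xargs it does not split the work if the argument list grows beyond the system's command-line length limit.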

Resources