Shell Scripting - URL manipulation - linux

I need to build a URL from values in a file. This is what I have so far:
var=$(grep -A2 -i "some_text" /path/to/file | grep -v "some_text" | cut -d'"' -f 4-5 | cut -d'"' -f 1 | tr -d '\n')
This gives the output: /text/to/be/appended/to/domain
Now I need to prepend the domain name to the value of var.
So I did
var1="http://mydomain"
and then
echo ${var1}${var}
So I expect
http://mydomain/text/to/be/appended/to/domain
to be the output, but I am getting just /text/to/be/appended/to/domain.
I guessed it might be due to the / being the first character, but if I use cut to remove the first /, I get the value of var1 as the output.
Where did I go wrong?
Update (not sure if this helps, but just in case):
If I do echo ${var}${var1}, I get /text/to/be/appended/to/domainhttp://mydomain
Sample entry:
<tr><td><a id="value">some_text</a></td></tr>
<tr><td><a id="value" href="/text/to/be/appended/to/domain">2013</a></td></tr>

The line ending (^M) indicates that at some point the file was edited (or created) in a DOS-like environment. Use "dos2unix yourfile" to fix the problem, on BOTH your script and the file containing the sample entries.
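If running dos2unix is not an option, a minimal sketch of stripping the carriage returns inline (this assumes the stray ^M is the only problem: a leading ^M in var makes the terminal print var over the top of var1, which matches the symptom described):
# same pipeline as before, but tr now deletes \r as well as \n
var=$(grep -A2 -i "some_text" /path/to/file | grep -v "some_text" | cut -d'"' -f 4-5 | cut -d'"' -f 1 | tr -d '\r\n')
echo "${var1}${var}"
# now prints http://mydomain/text/to/be/appended/to/domain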


Grep a word out of a file and save the file as that word

I am using Ubuntu Linux and grepping info out of a file (let's say filename.log), and I want to save the file under a name built from some of the info inside filename.log.
example:
The info in filename.log includes version_name and date.
Displaying this info on screen using cat shows:
version_name=NAME
date=TODAY
I then want to save the file as NAME-TODAY.log and have no idea how to do this.
Any help will be appreciated
You can chain basic Linux commands with the pipe character |. Combined with command substitution (taking the output of a complex command and using it in another command; syntax: $(your command)), you can achieve what you want to do.
This is what I came up with, based on your question:
cp filename.log $(grep -E "(version_name=)|(date=)" filename.log | cut -f 2 -d = | tr '\n' '-' | rev | cut -c 2- | rev).log
So here I used cp, $(), grep, cut, tr and finally rev.
Since you said you had no idea where to start, let me walk you through this one-liner:
cp - copies the filename.log file to a new file, with the name based on the values of version_name and date (steps 2 and up)
command substitution $() - the entire command between the round brackets is 'resolved' before the cp command in step 1 finishes; in your example it resolves to NAME-TODAY. Notice the .log at the end, outside the round brackets, to give the file a proper extension. The output of this command in your example will be NAME-TODAY.log
grep -E "(version_name=)|(date=)" - grep with the extended-regexp flag -E so the alternation works. It matches any line that contains version_name= OR date=. The expected output is:
version_name=NAME
date=TODAY
cut -f 2 -d = - because I am not interested in version_name itself, but in the value associated with that field, I use cut to split the line at the equals character with the flag -d =. I then select the value behind the equals character (the second field) with the flag -f 2. The expected output is:
NAME
TODAY
tr '\n' '-' - because grep outputs multiple lines, I replace every newline with a dash. Expected output:
NAME-TODAY-
rev | cut -c 2- | rev - I am grouping these. rev reverses the string built so far. cut -c 2- keeps everything from the second character on, dropping the first character of the reversed string; that character is the trailing dash left over from the tr step, so this is just an extra step to remove the last dash. See the expected output of each step:
-YADOT-EMAN
YADOT-EMAN
NAME-TODAY
remember this value is inside the command substitution of step 2, so the end result will be:
cp filename.log NAME-TODAY.log
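For comparison, a sketch of the same rename done with two separate command substitutions instead of the tr/rev juggling (this assumes the log contains exactly one version_name= line and one date= line):
# pull each value out on its own, then build the target name directly
version=$(grep '^version_name=' filename.log | cut -d= -f2)
today=$(grep '^date=' filename.log | cut -d= -f2)
cp filename.log "${version}-${today}.log"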
I managed to solve this by doing the following: cat filename.log > /tmp/filename.info && filename=$(echo $(grep "version_name" /tmp/filename.info | cut -d " " -f 3)-$(grep "date" /tmp/filename.info | cut -d " " -f 3)).log

Looping through a file in bash and filtering for a directory name beginning

I'm really new to bash scripting and I am trying to write a bash script to use with subversion hooks.
The goal is to get a list of the java projects that were committed, then write that list to a file (so I can create a change log from it and display it inside an application) along with some other information gathered using svnlook that I've already worked out.
Since subversion itself doesn't really know about the projects I commit in eclipse and instead works with directories, I have to use the "svnlook changed" command, which spits out every file included in the commit with its full path.
Here's an example of what "svnlook changed" returns:
U Project/branches/11.4.11.001/LIB1/com/some/other/directory/some_java_class.java
U Project/branches/11.4.11.001/LIB2/com/some/other/directory/another_thingy.java
U Project/branches/11.4.11.001/new/directories/LIB1/com/something/some_java_class.java
U Project/branches/11.4.11.001/PRJ1/com/directory/some_java_class.java
Now I can't guarantee that this directory structure will always be the same, so what I want to do is to find the java project names by filtering for "PRJ" and "LIB", since they will always begin with those letters and I can be sure this will not change.
So what I am trying to do in the script:
Step 1: Put the output of "svnlook changed" into a file:
tempfile=/data/svn/scratch.txt
input=/data/svn/scratch2.txt
touch $tempfile
touch $input
$SVNLOOK changed -r "$REVISION" "$REPOS" >> $input
This works.
Step 2: iterate over every line of the file, get the substring that represents the project by looking for "LIB" or "PRJ" in the path. Write these into a file. For now I am just assuming there are no more than 9 child directories and I am looping over the path 10 times to look at each:
for i in {1..10}
do
cat $input | cut -d "/" -f $i | grep -i -s -e PRJ -e LIB | tr [:lower:]äöü [:upper:]ÄÖÜ >> $tempfile
done
This doesn't work in the script. The file at /data/svn/scratch.txt is always empty. When I run the command from the command line and replace $input and $i, it works and all of the following commands work, too.
Step 3: Set all the other variables. I iterate over the temporary file, filtering out the duplicates and inserting commas to separate them:
DATE=$(date '+%d-%m-%Y')
REVISION=$($SVNLOOK youngest "$REPOS")
CHANGEDPRJ=$(cat $tempfile | sort -u | xargs | sed -e 's/ /, /g')
COMMENT=$($SVNLOOK log -t "$TXN" "$REPOS")
Step 4: Output this variable along with other values into the file in which I'd like to store it all:
echo "$DATE" , "$REVISION" , "$CHANGEDPRJ" , "$COMMENT" >> /data/svn/commit.log
So right now I am getting:
17-02-2020 , 571 , , Cleanup of outdated comments
Instead of:
17-02-2020 , 571 , LIB1,LIB2,PRJ1 , Cleanup of outdated comments
I'm pretty sure there is a very obvious and easy solution to get this working, but I'm bash scripting for the very first time today and I can't seem to google for the right thing to find out what I'm doing wrong. If someone could point me in the right direction, that'd be amazing!
The following:
cat <<EOF |
U Project/branches/11.4.11.001/LIB1/com/some/other/directory/some_java_class.java
U Project/branches/11.4.11.001/LIB2/com/some/other/directory/another_thingy.java
U Project/branches/11.4.11.001/new/directories/LIB1/com/something/some_java_class.java
U Project/branches/11.4.11.001/PRJ1/com/directory/some_java_class.java
EOF
# remove the leading 'U '
sed 's/^U //' |
# replace `/` with a newline
tr '/' '\n' |
# grep only lines with PRJ and LIB
grep -e 'PRJ\|LIB' |
# sort unique
sort -u |
# join elements with a comma
paste -sd,
outputs:
LIB1,LIB2,PRJ1
So you want:
"$SVNLOOK" changed -r "$REVISION" "$REPOS" |
sed 's/^U //' | tr '/' '\n' | grep -e 'PRJ\|LIB' | sort -u | paste -sd,
Note: remember to quote variable expansions. Don't write touch $tempfile; write touch "$tempfile".
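Putting it together, a sketch of how the pieces could fit back into the hook script with no temp files at all (note that REVISION has to be set before it is used by svnlook changed, which the step ordering in the question gets backwards):
DATE=$(date '+%d-%m-%Y')
REVISION=$("$SVNLOOK" youngest "$REPOS")
COMMENT=$("$SVNLOOK" log -t "$TXN" "$REPOS")
# build the comma-separated project list in a single pipeline
CHANGEDPRJ=$("$SVNLOOK" changed -r "$REVISION" "$REPOS" |
    sed 's/^U //' | tr '/' '\n' | grep -e 'PRJ\|LIB' | sort -u | paste -sd,)
echo "$DATE , $REVISION , $CHANGEDPRJ , $COMMENT" >> /data/svn/commit.log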

Creating a short shell script to print out a table using cut, sort, and head to arrange values

I need help on this homework. I thought I had basically solved it, but two results do not match: I get "psychology" at line 5 where it's supposed to be line 1, and "finance" as the last row instead of "Political science". The output (autograder) is attached below for clarity.
Can anyone figure out what I'm doing wrong? Any help would be greatly appreciated.
Question is:
Write a short shell script to first download the data set with wget from the included URL
Next, print out "Major,Total" (the column names of interest) to the screen
Then, using cut, sort, and head, print out the n most popular majors (largest Total) in descending order
You will want to use the -k and -t arguments of sort (on top of ones you should already know) to achieve this
The value of n will be passed into the script as a command-line argument
I attached output differences between mine and the autograder below. My code goes like this:
number=$1
if [ ! -f recent-grads.csv ]; then
wget https://raw.githubusercontent.com/fivethirtyeight/data/master/college-majors/recent-grads.csv
fi
echo Major,Total
cat recent-grads.csv | sed -i | sort -k4 -n -r -t, recent-grads.csv | cut -d ',' -f 3-4 | head -n ${number}
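For reference, a minimal sketch of a corrected last line (an assumption about what the autograder wants, not verified): the stray sed -i has no script and simply errors, sort is reading the file directly so the cat is bypassed anyway, and the header row needs to be skipped before sorting or it gets sorted in with the data. This also assumes Total really is column 4 and that no Major value contains an embedded comma, which naive cut/sort handling of CSV cannot cope with:
number=$1
echo "Major,Total"
# tail -n +2 drops the header line so it cannot land among the results
tail -n +2 recent-grads.csv | sort -t, -k4 -n -r | cut -d ',' -f 3-4 | head -n "${number}"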

Count the number of occurrences in a string. Linux

Okay, so what I am trying to figure out is how to count the number of periods in a string and then cut the string at that count minus 2, keeping everything before it. Meaning like this:
string="aaa.bbb.ccc.ddd.google.com"
number_of_periods="5"
number_of_periods=`expr $number_of_periods-2`
string=`echo $string | cut -d"." -f$number_of_periods`
echo $string
result: "aaa.bbb.ccc.ddd"
The way I was thinking of doing it was sending the string to a text file and then just grepping for the number of occurrences, like this:
grep -c "." infile
The reason I don't want to do that is that I don't have permission to create another text file, and avoiding it also keeps the code I am building simpler.
EDIT
I don't think I made it clear but I want to make finding the number of periods more dynamic because the address I will be looking at will change as the script moves forward.
If you don't need to count the dots, but just want to remove the penultimate dot and everything after it, you can use Bash's built-in string manipulation.
${string%substring}
Deletes shortest match of $substring from back of $string.
Example:
$ string="aaa.bbb.ccc.ddd.google.com"
$ echo ${string%.*.*}
aaa.bbb.ccc.ddd
Nice and simple and no need for sed, awk or cut!
What about this:
echo "aaa.bbb.ccc.ddd.google.com"|awk 'BEGIN{FS=OFS="."}{NF=NF-2}1'
(further shortened by a helpful comment from @steve)
gives:
aaa.bbb.ccc.ddd
The awk command:
awk 'BEGIN{FS=OFS="."}{NF=NF-2}1'
works by separating the input line into fields (FS) by ., then joining them as output (OFS) with ., but the number of fields (NF) has been reduced by 2. The final 1 in the command is responsible for the print.
This will reduce a given input line by eliminating the last two period separated items.
This approach is "shell-agnostic" :)
Perhaps this will help:
#!/bin/sh
input="aaa.bbb.ccc.ddd.google.com"
number_of_fields=$(echo $input | tr "." "\n" | wc -l)
interesting_fields=$(($number_of_fields-2))
echo $input | cut -d. -f-${interesting_fields}
grep -o "\." <<<"aaa.bbb.ccc.ddd.google.com" | wc -l
5
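And if you do need the count itself without creating any file, a sketch using only Bash parameter expansion (no external commands at all):
string="aaa.bbb.ccc.ddd.google.com"
no_dots=${string//./}                  # delete every dot
echo $(( ${#string} - ${#no_dots} ))   # length difference = number of dots, prints 5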

Using grep in an if statement to get all items, ignoring spaces

This is part of a homework problem in a beginning bash class.
I need to bring in the passwd file, which I have done with my passfile variable, and then extract certain pieces of it and display the different fields. When I grep manually from the CLI using the statement below, it works fine: I get all the fields I want.
grep 1000 passfile | cut -c1-
However, when I do this from the script, it breaks and starts a new iteration at the first blank space in the user's full name. John D. Doe will return 3 lines when I only want one. I can see this by echoing the value of i, as follows.
for i in `grep 1000 ${passfile} | cut -c1-`
do
user=`echo $i | cut -d : -f1`
userID=`echo $i | cut -d : -f3`
For example, if the line reads
jdoe:x:123:1000:John D Doe:/home/jdoe:/bin/bash
I get the following:
i = jdoe:x:123:1000:John
which gives me:
User is jdoe, UID is 509
but then in the next line i starts at R.
i = R. so User is R., UID is R.
next line
i = Johnson:/home/jjohnson:/bin/bash
which returns User is Johnson, UID is /bin/bash
The passwd file holds many users, so I need to use the for loop to process them all. I think if I can get it to ignore the space I can get it working. But not knowing a whole lot about Linux, I'm not sure if I'm even going down the right path. Thanks in advance for guidance/help.
The splitting is done by the shell's for loop, which by default breaks its word list on whitespace, not colons, so each space in the full-name field starts a new iteration; cut never sees whole lines.
You probably want to use IFS=: and a read statement in a while loop to get the values in:
while IFS=: read user password uid gid comment home shell
do
...whatever...
done < /etc/passwd
Or you can pipe the output of grep into the while loop.
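A sketch of that loop with your filter folded in (this assumes, as in your sample line, that the 1000 you are matching is the GID in field 4):
while IFS=: read -r user password uid gid comment home shell
do
    # whole lines survive intact; spaces in the comment field are no problem
    [ "$gid" = "1000" ] && echo "User is $user, UID is $uid"
done < "$passfile"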
Are you allowed to use any external program? If so, I'd recommend awk
uid=1000    # lowercase on purpose: UID is a readonly variable in bash
awkcmd="\$4==\"$uid\" {print \"user:\",\$1}"
cat "$PASSWORDFILE" | awk -F ":" "$awkcmd"
When parsing structured files with specific field delimiters, such as the passwd file, the appropriate tool for the job is awk.
uid=1000
awk -F: -v uid="$uid" '$4==uid{print "user: "$1}' /etc/passwd
You do not have to use grep or cut or anything else. (Of course, you can also use a pure bash while-read loop, as demonstrated above.)
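A quick run against the sample line from the question:
echo 'jdoe:x:123:1000:John D Doe:/home/jdoe:/bin/bash' |
    awk -F: -v uid=1000 '$4==uid{print "user: "$1}'
# prints: user: jdoe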
