Grep a word out of a file and save the file as that word - linux

I am using Ubuntu Linux and grepping info out of a file (let's say filename.log), and I want to save the file using some of the info inside filename.log.
example:
The info in filename.log includes version_name and date.
When displaying this info on screen using cat, it shows:
version_name=NAME
date=TODAY
I then want to save the file as NAME-TODAY.log and have no idea how to do this.
Any help will be appreciated

You can chain a bunch of basic Linux commands with the pipe character |. Combined with command substitution (taking the output of a command and using it in another command; syntax: $(your command)), you can achieve what you want to do.
This is what I came up with, based on your question:
cp filename.log $(grep -E "(version_name=)|(date=)" filename.log | cut -f 2 -d = | tr '\n' '-' | rev | cut -c 2- | rev).log
So here I used cp, $(), grep, cut, tr and finally rev.
Since you said you had no idea where to start, let me walk you through this one-liner:
cp - it is used to copy the filename.log file to a new file,
with the name based on the values of version_name and date (steps 2 and up)
command substitution $() - the entire command between the round brackets is 'resolved' before the cp command in step 1 finishes. In your example it resolves to NAME-TODAY. Notice the .log at the end, outside of the round brackets, to give it a proper file extension. The output of this part in your example will be NAME-TODAY.log
grep -E "(version_name=)|(date=)" - grep with the extended-regexp flag -E so we can use alternation. It matches any line that contains version_name= OR date=. The expected output is:
version_name=NAME
date=TODAY
cut -f 2 -d = - because I am not interested in version_name itself, but in the value associated with that field, I use cut to split the line at the equals character = with the flag -d =. I then select the value behind the equals sign (the second field) with the flag -f 2. The expected output is:
NAME
TODAY
tr '\n' '-' - because grep outputs multiple lines, I want to remove all newlines and replace them with a dash. Expected output:
NAME-TODAY-
rev | cut -c 2- | rev - I am grouping these. rev reverses the string I have created. cut -c 2- then keeps everything from the second character onward, i.e. it drops the first character of the reversed string. This is needed because replacing newlines with dashes left a trailing dash (NAME-TODAY-), so this is just an extra step to remove that last dash. See the expected outputs of each step:
-YADOT-EMAN
YADOT-EMAN
NAME-TODAY
Remember this value is inside the command substitution of step 2, so the end result will be:
cp filename.log NAME-TODAY.log
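If you prefer something more readable than a one-liner, the same idea can be written as a short script (just a sketch, assuming the same filename.log layout; it strips the trailing dash with parameter expansion instead of rev | cut | rev):
#!/bin/bash
# pull the values of version_name and date out of filename.log
newname=$(grep -E "(version_name=)|(date=)" filename.log | cut -f 2 -d = | tr '\n' '-')
# newname is now "NAME-TODAY-"; drop the trailing dash
newname=${newname%-}
# copy the log to NAME-TODAY.log
cp filename.log "$newname.log"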

I managed to solve this by doing the following:
grep filename.log > /tmp/filename.info && filename=$(echo $(grep "version_name" /tmp/filename.info | cut -d " " -f 3)-$(grep "date" /tmp/filename.info | cut -d " " -f 3)-$filename.log)

Related

Creating 3 column TAB file using name of files in directory

I have over 100 files in a directory with format xxx_1_sequence.fastq.gz and xxx_2_sequence.fastq.gz
The goal is to create a TAB file with 3 columns in this format:
xxx ---> xxx_1_sequence.fastq.gz ---> xxx_2_sequence.fastq.gz
where ---> is a tab.
I was thinking of creating a for loop or maybe using string manipulation in order to achieve this. My knowledge is rudimentary at this stage, so any help would be much appreciated.
Would you please try the following:
shopt -s extglob                  # enable extended pattern matching
suffix="sequence.fastq.gz"
for f in !(*"$suffix"); do        # files which do not match the pattern
    if [[ -f ${f}_1_$suffix && -f ${f}_2_$suffix ]]; then
        # check the existence of the files just in case
        printf "%s\t%s\t%s\n" "$f" "${f}_1_$suffix" "${f}_2_$suffix"
    fi
done
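To write the table to a file instead of the screen, you can redirect the whole loop, e.g. change the last line to:
done > out.tsv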
If your files are in a directory called files:
paste -d '\t' \
<(printf "%s\n" files/*_1_sequence.fastq.gz | sort) \
<(printf "%s\n" files/*_2_sequence.fastq.gz | sort) \
| sed 's/\(.*\)_1_sequence.fastq.gz/\1\t\1_1_sequence.fastq.gz/' \
> out.tsv
Explanation:
printf "%s\n" will print every argument in a new line. So:
printf "%s\n" files/*_1_sequence.fastq.gz | sort
prints a sorted list of the first type of files (the second column in your output). And of course it's symmetrical with *_2_sequence.fastq.gz (the third column).
(We probably don't need the sort part, but it helps clarify the intention.)
The syntax <(some shell command) runs some shell command, puts its output into a temporary input file, and passes that file as an argument. You can see the temporary file like so:
$ echo <(echo a) <(echo b)
/dev/fd/63 /dev/fd/62
So we are passing 2 (temporary) files to paste. If each of those files has N lines, then paste outputs N lines, where line number K is a concatenation of line K of each of the files, in order.
For example, if line 4 of the first file is hello and line 4 of the second file is world, paste will have hello\tworld as line 4 of the output. But instead of trusting the default, we're setting the delimiter to TAB explicitly with -d '\t'.
That gives us the last 2 columns of our tab-separated-values file, but the first column is the * part of *_1_sequence.fastq.gz, which is where sed comes in.
We tell sed to replace \(.*\)_1_sequence.fastq.gz with \1\t\1_1_sequence.fastq.gz. .* will match anything, and \(some-pattern\) tells sed to remember the text that matched the pattern.
The text captured by the first parentheses in sed's regex can be read back into the replacement pattern as \1, which is why we have \1_1_sequence.fastq.gz in the replacement pattern.
But now we can also use \1 to create the first column of our tsv, which is why we have \1\t.
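To see just the sed step in isolation, here is a minimal check on one made-up line (assuming GNU sed, which interprets \t in the replacement as a tab):
printf '%s\t%s\n' files/xxx_1_sequence.fastq.gz files/xxx_2_sequence.fastq.gz |
  sed 's/\(.*\)_1_sequence.fastq.gz/\1\t\1_1_sequence.fastq.gz/'
# prints: files/xxx<TAB>files/xxx_1_sequence.fastq.gz<TAB>files/xxx_2_sequence.fastq.gz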
Thank you for the help guys - I was thrown into a coding position a week ago with no prior experience and have been struggling.
I ended up with this:
printf "%s\n" *_1_sequence.fastq.gz | sort | sed 's/\(.*\)_1_sequence.fastq.gz/\1\t\1_1_sequence.fastq.gz\t\1_2_sequence.fastq.gz/' > NULLARBORformat.tab
and it does the job perfectly!

Looping through a file in bash and filtering for a directory name beginning

I'm really new to bash scripting and I am trying to write a bash script to use with subversion hooks.
The goal is to get a list of the java projects that were committed, then write that list to a file (so I can create a change log from it and display it inside an application), along with some other information gathered using svnlook that I've already worked out.
Since subversion itself doesn't really care for projects that I commit in eclipse and instead works with directories, I have to use the "svnlook changed" command which spits out each and every file with its full path that was included in the commit.
Here's an example of what "svnlook changed" returns:
U Project/branches/11.4.11.001/LIB1/com/some/other/directory/some_java_class.java
U Project/branches/11.4.11.001/LIB2/com/some/other/directory/another_thingy.java
U Project/branches/11.4.11.001/new/directories/LIB1/com/something/some_java_class.java
U Project/branches/11.4.11.001/PRJ1/com/directory/some_java_class.java
Now I can't guarantee that this directory structure will always be the same, so what I want to do is to find the java project names by filtering for "PRJ" and "LIB", since they will always begin with those letters and I can be sure this will not change.
So what I am trying to do in the script:
Step 1: Put the output of "svnlook changed" into a file:
tempfile=/data/svn/scratch.txt
input=/data/svn/scratch2.txt
touch $tempfile
touch $input
$SVNLOOK changed -r "$REVISION" "$REPOS" >> $input
This works.
Step 2: iterate over every line of the file, get the substring that represents the project by looking for "LIB" or "PRJ" in the path. Write these into a file. For now I am just assuming there are no more than 9 child directories and I am looping over the path 10 times to look at each:
for i in {1..10}
do
cat $input | cut -d "/" -f $i | grep -i -s -e PRJ -e LIB | tr [:lower:]äöü [:upper:]ÄÖÜ >> $tempfile
done
This doesn't work in the script. The file at /data/svn/scratch.txt is always empty. When I run the command from the command line and replace $input and $i, it works and all of the following commands work, too.
Step 3: Set all the other variables. I iterate over the temporary file, filtering out the duplicates and adding commas to separate them:
DATE=$(date '+%d-%m-%Y')
REVISION=$($SVNLOOK youngest "$REPOS")
CHANGEDPRJ=$(cat $tempfile | sort -u | xargs | sed -e 's/ /, /g')
COMMENT=$($SVNLOOK log -t "$TXN" "$REPOS")
Step 4: Output this variable along with other stuff into the file in which I'd like to store it all:
echo "$DATE" , "$REVISION" , "$CHANGEDPRJ" , "$COMMENT" >> /data/svn/commit.log
So right now I am getting:
17-02-2020 , 571 , , Cleanup of outdated comments
Instead of:
17-02-2020 , 571 , LIB1,LIB2,PRJ1 , Cleanup of outdated comments
I'm pretty sure there is a very obvious and easy solution to get this working, but I'm bash scripting for the very first time today and I can't seem to google for the right thing to find out what I'm doing wrong... If someone could point me in the right direction, that'd be amazing!
The following:
cat <<EOF |
U Project/branches/11.4.11.001/LIB1/com/some/other/directory/some_java_class.java
U Project/branches/11.4.11.001/LIB2/com/some/other/directory/another_thingy.java
U Project/branches/11.4.11.001/new/directories/LIB1/com/something/some_java_class.java
U Project/branches/11.4.11.001/PRJ1/com/directory/some_java_class.java
EOF
# remove the U<space><space>
sed 's/^U //' |
# replace `/` with a newline
tr '/' '\n' |
# grep only lines with PRJ and LIB
grep -e 'PRJ\|LIB' |
# sort unique
sort -u |
# join elements with a comma
paste -sd,
outputs:
LIB1,LIB2,PRJ1
So you want:
"$SVNLOOK" changed -r "$REVISION" "$REPOS" |
sed 's/^U //' | tr '/' '\n' | grep -e 'PRJ\|LIB' | sort -u | paste -sd,
Note: remember to quote variable expansions. Don't write touch $tempfile, write touch "$tempfile".
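Folded back into your script, the pipeline could replace the temp-file logic when building CHANGEDPRJ (a sketch, assuming the same $SVNLOOK, $REVISION and $REPOS variables from your step 3):
CHANGEDPRJ=$("$SVNLOOK" changed -r "$REVISION" "$REPOS" |
  sed 's/^U //' | tr '/' '\n' | grep -e 'PRJ\|LIB' | sort -u | paste -sd,)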

bash - Diff a command with a file (specific)

So it's pretty hard for me to describe what I want to do, but I'll try:
(Because of some private information I changed the names)
I want to "diff" a command output with a text file created from me.
The command output looks like:
'Blabla1' '12.34.56.78' (24 objects + dependencies), STATUS: 'RUNNING'
'Blabla3' '12.34.56.89' (89 objects + dependencies), STATUS: 'RUNNING'
And the txtfile:
Blabla1
Blabla2
If it finds Blabla1 anywhere in the command output, that's fine. But as you can see, it will not find Blabla2 anywhere in the command output, and that difference is what I want as output.
I hope you understand what I mean and that you can help me.
Greetings,
Can
UPDATE:
@hek2mgl
So my command is:
./factory.sh listapplications | grep -i running
This command shows this:
'ftp' '1' (7 objects + dependencies), STATUS: 'RUNNING' - 'XSD Da
'abc' '5.1.0' (14 objects + dependencies), STATUS: 'RUNNING' - '2017-10-13: Fix fuer Bug 2150'
'name' '1.0.2' (5 objects + dependencies), STATUS: 'RUNNING'
And I want to compare that output with my textfile:
ftp
abc
name
missing
alsomissing
So if I compare these two now, it should check whether it finds the words from my textfile ANYWHERE in the command output. If it does find a word anywhere -> no output for it.
And as you can see, it will not find "missing" and "alsomissing". I want these two as the output at the end.
What you might be interested in is grep in combination with 'process substitution'. If your file with patterns is file.txt and your command to execute is cmd then you can use
grep -o -F -f file.txt <(cmd) | grep -v -F -f - file.txt
This will output the patterns in file.txt which are not matched in the output of cmd.
In case of the Blabla example, the above line will output
Blabla2
How it works is the following. The first part will search the output of cmd for all patterns listed in file.txt and will only output the matched parts. This means that:
% grep -o -F -f file.txt <(cmd)
Blabla1
This output is now piped to another command that will try to find all lines in file.txt which do not match any of the patterns coming from the pipe (-f -):
% grep -o -F -f file.txt <(cmd) | grep -v -F -f - file.txt
Blabla2
So ... this seems to do it, using bash process substitution:
$ cat file1
'Blabla1' '12.34.56.78' (24 objects + dependencies), STATUS: 'RUNNING'
'Blabla3' '12.34.56.89' (89 objects + dependencies), STATUS: 'RUNNING'
$ cat file2
Blabla1
Blabla2
$ grep -vFf <(awk '{gsub(/[^[:alnum:]]/,"",$1);print $1}' file1) file2
Blabla2
The awk script takes the first field, strips non-alphanumeric characters from it (i.e. the single quotes) and outputs just that first field. The grep option -f uses the "virtual" file created by the aforementioned process substitution as a list of fixed strings to search for within the input file (file2), and the -v reverses the search, showing you only what was not found.
If the regex in the gsub() is too greedy, you might replace it with something like $1=substr($1,2,length($1)-2).
You could alternately do this in (POSIX) awk alone, without relying on bash process substitution:
$ awk 'NR==FNR{a[substr($1,2,length($1)-2)];next} $1 in a{next} 1' file1 file2
Blabla2
This reads the stripped first field of file1 into the keys of an array, then for each line of file2 checks for the existence of that key in the array, skipping lines that match and printing any left over. (The 1 at the end of the script is short-hand for "print this line".)
You can also use awk only:
awk '
# Store patterns of text.file in an array (p)atterns.
# Initialize their count of occurrence with 0.
NR==FNR{
    p[$0]=0
    next
}
# Replace the quotes around BlaBla... in cmd output.
# Increase the count of occurrence of the pattern.
{
    gsub("'\''", "")
    p[$1]++
}
# At the end of the input print those patterns which
# did not appear in cmd output, meaning their count of
# occurrence is zero.
END{
    for(i in p){
        if(p[i]==0){
            print i
        }
    }
}' text.file cmd.txt
PS: Alternatively, you can use process substitution instead of storing the command output in a file. Then replace cmd.txt with <(cmd).
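With your actual command that would look something like this (a sketch; text.file stands for your pattern file and '...' for the awk script above):
awk '...' text.file <(./factory.sh listapplications | grep -i running)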

Shell Scripting - URL manipulation

I need to manipulate a URL using values from a file. This is what I could do:
var=$(grep -A2 -i "some_text" /path/to/file | grep -v "some_text" | cut -d'"' -f 4-5 | cut -d'"' -f 1 | tr -d '\n')
This will give output : /text/to/be/appended/to/domain
Now, I need to prepend the domain name to the value of var.
So I did,
var1="http://mydomain"
and then
echo ${var1}${var}
So I expect
http://mydomain/text/to/be/appended/to/domain
to be the output. But I am getting just /text/to/be/appended/to/domain.
I guessed it'd be due to the / as the first char, but if I use cut to remove the first /, I get the value of var1 as the output.
Where did I go wrong?
Update (not sure if this will help even a bit, but still):
If I do echo ${var}${var1}, I am getting /text/to/be/appended/to/domainhttp://mydomain
Sample entry :
<tr><td><a id="value">some_text</a></td></tr>
<tr><td><a id="value" href="/text/to/be/appended/to/domain">2013</a></td></tr>
This line ending (^M) indicates that at some point the file was edited (or created) in a DOS-like environment. Use "dos2unix yourfile" to fix the problem, for BOTH your script and the sample entries.
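If you cannot (or would rather not) run dos2unix on the file, a minimal workaround sketch, assuming the stray ^M is what ends up in $var, is to delete carriage returns in the pipeline alongside the newlines you already delete:
var=$(grep -A2 -i "some_text" /path/to/file | grep -v "some_text" | cut -d'"' -f 4-5 | cut -d'"' -f 1 | tr -d '\n\r')
var1="http://mydomain"
echo "${var1}${var}"   # http://mydomain/text/to/be/appended/to/domain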

Count the number of occurrences in a string. Linux

Okay, so what I am trying to figure out is how to count the number of periods in a string and then cut everything up to that point, minus 2. Meaning like this:
string="aaa.bbb.ccc.ddd.google.com"
number_of_periods="5"
number_of_periods=`expr $number_of_periods-2`
string=`echo $string | cut -d"." -f$number_of_periods`
echo $string
result: "aaa.bbb.ccc.ddd"
The way that I was thinking of doing it was sending the string to a text file and then just grepping for the number of occurrences, like this:
grep -c "." infile
The reason I don't want to do that is that I want to avoid creating another text file, as I do not have permission to do so. It would also keep the code I am trying to build right now simpler.
EDIT
I don't think I made it clear but I want to make finding the number of periods more dynamic because the address I will be looking at will change as the script moves forward.
If you don't need to count the dots, but just remove the penultimate dot and everything after it, you can use Bash's built-in string manipulation.
${string%substring}
Deletes shortest match of $substring from back of $string.
Example:
$ string="aaa.bbb.ccc.ddd.google.com"
$ echo ${string%.*.*}
aaa.bbb.ccc.ddd
Nice and simple and no need for sed, awk or cut!
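Because the pattern is anchored to the end of the string, this stays dynamic no matter how many dots the address contains, e.g. with a made-up host name:
$ string="x.y.z.example.org"
$ echo ${string%.*.*}
x.y.z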
What about this:
echo "aaa.bbb.ccc.ddd.google.com"|awk 'BEGIN{FS=OFS="."}{NF=NF-2}1'
(further shortened by helpful comment from #steve)
gives:
aaa.bbb.ccc.ddd
The awk command:
awk 'BEGIN{FS=OFS="."}{NF=NF-2}1'
works by separating the input line into fields (FS) by ., then joining them as output (OFS) with ., but the number of fields (NF) has been reduced by 2. The final 1 in the command is responsible for the print.
This will reduce a given input line by eliminating the last two period separated items.
This approach is "shell-agnostic" :)
Perhaps this will help:
#!/bin/sh
input="aaa.bbb.ccc.ddd.google.com"
number_of_fields=$(echo $input | tr "." "\n" | wc -l)
interesting_fields=$(($number_of_fields-2))
echo $input | cut -d. -f-${interesting_fields}
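For the sample string this gives number_of_fields=6 and interesting_fields=4, so the final cut prints:
$ echo "aaa.bbb.ccc.ddd.google.com" | cut -d. -f-4
aaa.bbb.ccc.ddd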
grep -o "\." <<<"aaa.bbb.ccc.ddd.google.com" | wc -l
5
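Here grep -o prints each match on its own line, so wc -l counts the dots (five for the sample string).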
