Cut a string after a specific character, but just one field [closed] - linux

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
Hello,
This is from a vhost file. This is the output I get:
ServerName uat3-dam-something1.prg-dc.brb.com
I'm wondering how to cut this output so that only this part remains:
something1.prg-dc.brb.com
Keep in mind that "something" could be "something4141411" or "something23", so length-based operations won't work. I tried the cut command and awk, but neither worked. I would be happy to receive tips from the bash experts :)

Like this:
grep -o 'something.*' file
or more specific:
grep -oE 'something[0-9]+\..*' file
Output:
something1.prg-dc.brb.com

Could you please try the following, written and tested with the provided samples only.
awk -F'uat3-dam-' '{print $NF}' Input_file
Description: set uat3-dam- as the field separator and print the last field.
2nd solution:
awk 'match($0,/something.*/){print substr($0,RSTART,RLENGTH)}' Input_file

Using:
echo "ServerName uat3-dam-something1.prg-dc.brb.com" |cut -d\- -f3-4
Will return:
something1.prg-dc.brb.com
And if you change the string (as you mention):
echo "ServerName uat3-dam-something111111.prg-dc.brb.com" |cut -d\- -f3-4
It will keep returning:
something111111.prg-dc.brb.com
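The field range 3-4 above relies on the line containing exactly three hyphens. A minimal sketch, assuming the wanted part always starts after the second hyphen of the line: an open-ended range (-f3-) keeps every remaining field, so extra hyphens in the domain do no harm. The function name and the "-east" host below are made up for illustration.

```shell
# Hypothetical helper: print everything after the second "-" of each input line.
host_after_second_dash() {
    # -f3- means "field 3 through the last field", joined by the delimiter again
    cut -d- -f3-
}

# Illustrative host with one more hyphen than the sample in the question
echo "ServerName uat3-dam-something1.prg-dc-east.brb.com" | host_after_second_dash
```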

$ echo 'ServerName uat3-dam-something1.prg-dc.brb.com' | awk -F- '{sub(".*" $2 FS,"")}1'
something1.prg-dc.brb.com

This will work:
echo "ServerName uat3-dam-something1.prg-dc.brb.com" | sed -E 's/.*(something.*)/\1/'
Or, if the string is in a file named file
sed -E 's/.*(something.*)/\1/' file
Explanation:
-E enables extended regular expressions.
.*(something.*) means "any character zero or more times, followed by something and then any character zero or more times".
\1 prints only the part matched inside the parentheses.
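Because the leading .* is greedy, the pattern above keeps the text starting at the last "something" on the line. A possible alternative, assuming the prefix is always the literal uat3-dam- shown in the question, is to anchor on the ServerName directive and delete the prefix instead. The function name is hypothetical.

```shell
# Hypothetical helper: print the part of each ServerName line after "uat3-dam-".
# Assumption: the prefix is always the literal "uat3-dam-" from the question.
vhost_suffix() {
    # -n with the p flag prints only lines where the substitution happened,
    # so other vhost directives (DocumentRoot, etc.) are skipped.
    sed -nE 's/^[[:space:]]*ServerName[[:space:]]+uat3-dam-//p' "$@"
}

printf '  ServerName uat3-dam-something23.prg-dc.brb.com\nDocumentRoot /var/www\n' | vhost_suffix
```

Called with a file name (vhost_suffix file), it reads the file instead of standard input.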

You could also use parameter expansion:
echo "${test#*dam-}"
Example:
test="ServerName uat3-dam-something1.prg-dc.brb.com"
echo "${test#*dam-}"
which gives:
something1.prg-dc.brb.com
Note that the opposite version would be echo "${test%something*}", which removes everything from the last "something" to the end of the string.
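For reference, here is a short sketch of how the prefix-removal operators behave on the sample line: # removes the shortest matching prefix and ## the longest. The helper name after_dam is made up for illustration.

```shell
# Hypothetical helper: print what follows the first "dam-" in its argument.
after_dam() {
    printf '%s\n' "${1#*dam-}"
}

line='ServerName uat3-dam-something1.prg-dc.brb.com'

after_dam "$line"               # something1.prg-dc.brb.com
printf '%s\n' "${line#*-}"      # shortest prefix ending in "-" removed: dam-something1.prg-dc.brb.com
printf '%s\n' "${line##*-}"     # longest prefix ending in "-" removed: dc.brb.com
```

No external process is started, which can matter when this runs inside a loop over many lines.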

Related

How to create a Unix script to segregate data Line by Line? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 3 years ago.
I have some data in a MyFile.CSV file like this:
id,name,country
100,tom cruise,USA
101,Johnny depp,USA
102,John,India
What will be the shell script to take the above file as input and segregate the data in 2 different files as per the country?
I tried using the FOR loop and then using 2 IFs inside it but I am unable to do so. How to do it using awk?
For LINE in MyFile.CSV
Do
If grep "USA" $LINE >0 Then
$LINE >> Out_USA.csv
Else
$LINE >> Out_India.csv
Done
You can try this:
grep "USA" /path/to/file >> Out_USA.csv
grep "India" /path/to/file >> Out_India.csv
There are many ways to do this.
One way:
$ for i in $(awk -F"," 'NR>1{print $3}' MyFile.csv | sort -u);
do
echo "$i";
grep -E "${i}|country" MyFile.csv > "Out_${i}.csv";
done
This assumes that the country name does not clash with values in other columns.
If it does, you can fine-tune the match with a stricter regex.
For example, if country is always the last field, you can add $ to the grep pattern.
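Since the question asks about awk, here is a minimal single-pass sketch. It assumes the country is always the third comma-separated field, that no field contains an embedded comma, and that there are few enough distinct countries to keep one output file open per country. The function name is made up; the Out_<country>.csv naming follows the question.

```shell
# Hypothetical helper: split a CSV into Out_<country>.csv files, one per
# value of the third column, repeating the header line in each file.
split_by_country() {
    awk -F, '
        NR == 1 { header = $0; next }      # remember the header, do not route it
        {
            out = "Out_" $3 ".csv"         # output file name from the country field
            if (!(out in seen)) {          # first row seen for this country
                print header > out
                seen[out] = 1
            }
            print > out                    # awk keeps the file open, so this appends
        }
    ' "$1"
}

# Usage: split_by_country MyFile.csv
```

Unlike the grep loop, this matches on the third field only, so a name such as "India Jones" in another column would not end up in the wrong file.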

How can I shorten header in a fasta file? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I have a file that looks like this:
>Gene.10::S0008.1::g.10::m.10 Gene.10::S0008.1::g.10 ORF type:complete len:250 (-),score=22.42 S_0008.1:286-1035(-)
MKGDDFNIITAPVPINRIWWYSLTNRQRIALVFYMSFYVAGTLTNTASMFIDKFYIYIMR
LESLQMGSADPIDYKYLLEVQIVRGFWREDVHEVVDKVFRGKSIGYIKTNLMIPVEIWNN
CQVRSFRGIPCHSVAIICLIFGMLILYYHCTTVALFRTFMILNANLAAILLFIAMSMEYS
AAVEYDYCVNSVFMNRKTGGKAFVRGRYYNRTLEASGSTFKLMMVGDILFFCPMIGLGCY
LLFCNRENL*
>Gene.11::S0009.1::g.10::m.11 Gene.11::S0009.1::g.10 ORF type:complete len:250 (-),score=22.42 S_0008.1:286-1035(-)
QSAISNDEELNKIMDA
....
I want to delete everything in the header after the first space. How can I do this easily in linux?
Resultant file:
>Gene.10::S0008.1::g.10::m.10
MKGDDFNIITAPVPINRIWWYSLTNRQRIALVFYMSFYVAGTLTNTASMFIDKFYIYIMR
LESLQMGSADPIDYKYLLEVQIVRGFWREDVHEVVDKVFRGKSIGYIKTNLMIPVEIWNN
CQVRSFRGIPCHSVAIICLIFGMLILYYHCTTVALFRTFMILNANLAAILLFIAMSMEYS
AAVEYDYCVNSVFMNRKTGGKAFVRGRYYNRTLEASGSTFKLMMVGDILFFCPMIGLGCY
LLFCNRENL*
>Gene.11::S0009.1::g.10::m.11
QSAISNDEELNKIMDA
I would use sed:
sed '/^>/s/^>\([^ ]*\) .*/>\1 /'
If a line starts with > then remove everything after the first space. The following:
echo '>Gene.10::S0008.1::g.10::m.10 Gene.10::S0008.1::g.10 ORF type:complete len:250 (-),score=22.42 Sxl_rink_0008.1:286-1035(-)
MKGDDFNIITAPVPINRIWWYSLTNRQRIALVFYMSFYVAGTLTNTASMFIDKFYIYIMR
LESLQMGSADPIDYKYLLEVQIVRGFWREDVHEVVDKVFRGKSIGYIKTNLMIPVEIWNN
CQVRSFRGIPCHSVAIICLIFGMLILYYHCTTVALFRTFMILNANLAAILLFIAMSMEYS
AAVEYDYCVNSVFMNRKTGGKAFVRGRYYNRTLEASGSTFKLMMVGDILFFCPMIGLGCY
LLFCNRENL*
>Gene.11::S0009.1::g.10::m.11 Gene.11::S0009.1::g.10 ORF type:complete len:250 (-),score=22.42 Sxl_rink_0008.1:286-1035(-)
QSAISNDEELNKIMDA' | sed '/^>/s/^>\([^ ]*\) .*/>\1 /'
outputs:
>Gene.10::S0008.1::g.10::m.10
MKGDDFNIITAPVPINRIWWYSLTNRQRIALVFYMSFYVAGTLTNTASMFIDKFYIYIMR
LESLQMGSADPIDYKYLLEVQIVRGFWREDVHEVVDKVFRGKSIGYIKTNLMIPVEIWNN
CQVRSFRGIPCHSVAIICLIFGMLILYYHCTTVALFRTFMILNANLAAILLFIAMSMEYS
AAVEYDYCVNSVFMNRKTGGKAFVRGRYYNRTLEASGSTFKLMMVGDILFFCPMIGLGCY
LLFCNRENL*
>Gene.11::S0009.1::g.10::m.11
QSAISNDEELNKIMDA
I don't know whether the single space left after the header matters, but I left it.
If those long lines of characters contain no spaces anywhere, you can simply keep everything up to the first space with cut:
cut -d' ' -f1
which removes all characters after the first space (including the space itself; I don't know whether the space is relevant).
Edit: since the OP changed both the input and the output, the answer now keeps everything up to the first space, rather than up to the second space.
Using awk you will have a more readable solution:
awk 'NR==1{print $1}NR!=1{print}' test.txt
Then you can redirect the output to a new file to store the result:
awk 'NR==1{print $1}NR!=1{print}' test.txt > new_test.txt
EDIT
I thought there were multiple files with just one header per file.
awk '{print $1}' test.txt
would work on your example, as the other lines do not contain spaces.
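If the sequence lines might contain spaces, a variant that touches only the header lines may be safer. This is a sketch assuming every header starts with > in the first column; the function name is hypothetical.

```shell
# Hypothetical helper: keep only the first whitespace-separated word of each
# FASTA header line (lines starting with ">"); all other lines pass through.
shorten_fasta_headers() {
    awk '/^>/ { print $1; next } { print }' "$@"
}

# Usage: shorten_fasta_headers test.txt > new_test.txt
```

With no file argument it reads standard input, so it can also sit at the end of a pipeline.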
Perl to the rescue!
perl -pe 's/ .*// if /^>/' -- file.fasta

Grep the most recent value of a particular column from a CSV file [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
"cola","colb","colc","cold","cole","colf"
"a","b","c","d","e","f"
"a1","b1","c1","d1","e1","f1"
"a2","b2","c2","d2","e2","f2"
Assuming this is the CSV file, I want to grep the value "e" from the column "cole" and store it into a shell variable. And then use the shell variable as a part of a wget command.
How would I do this?
set -f # disable globbing
variable="$(awk -F, 'NR==2 {gsub(/"/, ""); print $5}' file)"
set +f
Awk is well suited to this. The fields in the sample are comma-separated and double-quoted, so set the field separator with -F, and strip the quotes with gsub. If you know the column number you can simply do:
$ awk -F, 'NR==2{gsub(/"/,""); print $5}' file.csv
e
This prints the fifth field of the second line. If you want to use the column name instead:
$ awk -F, '{gsub(/"/,"")} NR==1{for(i=1;i<=NF;i++)c[$i]=i} NR==2{print $c[col]}' col="cole" file.csv
e
Just set col="<name of column to use>".
You can use command substitution to store the value in a variable:
$ val="$(awk -F, 'NR==2{gsub(/"/,""); print $5}' file.csv)"
$ wget --what-ever-option "$val"
Or just use it in place:
$ wget --what-ever-option "$(awk -F, 'NR==2{gsub(/"/,""); print $5}' file.csv)"
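The lookup by column name can be wrapped in a small function. This is a sketch under stated assumptions: fields are separated by commas, every double quote is a field delimiter, and no field contains an embedded comma or quote. The function name and argument order are made up for illustration.

```shell
# Hypothetical helper: csv_cell FILE COLUMN_NAME ROW
# Prints the value of COLUMN_NAME in data row ROW (row 1 is the line after the header).
csv_cell() {
    awk -F, -v col="$2" -v row="$3" '
        { gsub(/"/, "") }                                   # drop quotes; awk re-splits the line
        NR == 1 { for (i = 1; i <= NF; i++) idx[$i] = i; next }   # map header names to positions
        NR == row + 1 {
            if (col in idx) print $idx[col]                 # print nothing for an unknown column
            exit
        }
    ' "$1"
}

# Usage: val="$(csv_cell file.csv cole 1)"
```

For CSV files with embedded commas or escaped quotes, a real CSV parser would be a better fit than field splitting.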

How can I take a specific word from a line in Linux [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 years ago.
Suppose I have a one-line script.
The script is named script1.sh and contains the line below:
# sh script.sh #
How can I extract only the name script.sh from script1.sh?
Here is what I have tried, but it does not give me the exact output I want.
while read line
do
called_script= awk -F ':' '{print $1 }' final_calling_script_name
qwe= grep '*.sh' $called_script
echo $called_script " : $qwe"
done<'file_that_contains_data_of_script1_line_by_line'
Can anybody help me?
If what you want here is basically "the second word" you can use "cut"
echo "sh script.sh" | cut -d ' ' -f 2
The -d ' ' tells cut that the "delimiting character" is a space, the -f 2 tells cut that you want column number 2.
echo "sh script.sh" | { read a b; echo "$b"; }
EDIT:
After you've clarified your requirements in the notes below, I would propose this command:
echo "script1.ksh script2.pig script3.sh" | grep -oe '\w*\.sh'
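The \w escape in that pattern is a GNU grep extension. A sketch of a more portable spelling uses a POSIX bracket expression instead; the function name is hypothetical, and -o itself is supported by GNU and BSD grep but is not required by POSIX.

```shell
# Hypothetical helper: print every word ending in ".sh" found in the input,
# one per line. A "word" here is letters, digits and underscores only.
list_sh_names() {
    grep -oE '[[:alnum:]_]+\.sh' "$@"
}

echo "# sh script.sh #" | list_sh_names
echo "script1.ksh script2.pig script3.sh" | list_sh_names
```

Note that the pattern has no trailing anchor, so a name such as foo.sh.bak would still yield foo.sh.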

Bash script manipulation [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I work with Bash scripts and I want to get a line from a large text by matching a specific word.
For example, I have these lines:
first fffffffffffffffffffffffffff
.................................
second ssssssssssssssssssssssssss
.................................
third ttttttttttttttttttttttttttt
and I want to get the ssssssssssssssssssssssssss string.
Can anybody help me?
Is this what you want?
echo "$longstring" | awk '$1 == "second" { print $2 }'
Since you don't seem to have any criterion for which line you want to output, I suggest something like:
echo "ssssssssssssssssssssssssss"
This is pretty robust regarding the content of your input, doesn't depend on a "file", and is a fast solution.
cat filename | grep "^second" | cut -d " " -f 2
Or, if you are ALF:
<filename grep "^second" | cut -d " " -f 2
Or
grep "^second" filename | cut -d " " -f 2
