Move last words of strings from one file to another - linux

I have a file that contains:
/usr/bin/alias, /usr/bin/clear, /usr/bin/echo, /usr/bin/cat, /usr/bin/netstat,
/usr/sbin/shutdown, /usr/bin/less
and I need to move the last words to another file
alias
clear
echo
cat
netstat
shutdown
less
I have tried all kinds of combinations of awk, grep, sed, and cut, but can't seem to get the right result.
Thank you in advance for your help.

I used this:
grep -Po '(?<=/)[^/]+$' filename
-P means to use Perl style regex
-o means to output just the matched text
(?<=/) is a zero-width lookbehind to match the leading slash
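[^/]+$ matches one or more non-slash characters up to the end of the line, so this prints only the last path component of each line; it assumes one path per line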
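The sample input in the question holds several comma-separated paths per line, so an end-of-line anchor alone would return only the last entry of each line. A minimal sketch that handles that layout, assuming the two-line sample from the question; the file names paths.txt and names.txt are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory and file names, used for illustration only.
workdir=$(mktemp -d)
cat > "$workdir/paths.txt" <<'EOF'
/usr/bin/alias, /usr/bin/clear, /usr/bin/echo, /usr/bin/cat, /usr/bin/netstat,
/usr/sbin/shutdown, /usr/bin/less
EOF

# Match a run of characters that are not a slash, comma, or whitespace and that
# is directly followed by a comma or the end of the line, then drop the commas.
grep -oE '[^/,[:space:]]+(,|$)' "$workdir/paths.txt" | tr -d ',' > "$workdir/names.txt"

cat "$workdir/names.txt"
```

This uses only extended regex (-E), so it does not depend on grep being built with PCRE support.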

Related

Bash script - Get part of a line of text from another file

I'm quite new to bash scripting. I have a script where I want to extract part of the value of a particular line in a separate config file and then use that value as a variable in the script.
For example:
Line 75 in a file named config.cfg
"ssl_cert_location=/etc/ssl/certs/thecert.cer"
I want just the value at the end, "thecert.cer", to then use in the script. I've tried awk and various uses of grep, but I can't quite get just the name of the certificate.
Any help would be appreciated. Thanks
These are some examples of the commands I ran:
awk -F "/" '{print $4}' config.cfg
grep -o *.cer config.cfg
Is it possible to extract the value on that line and then edit the output so it just contains the name of the certificate file?
This is a pure Bash version of the basic functionality of basename:
cert=${line##*/}
which removes everything up to and including the final slash. It presupposes that you've already read the line.
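A minimal sketch of how the line might be read before applying the expansion, assuming a hypothetical temporary config file that contains the setting from the question:

```shell
#!/usr/bin/env bash
# Hypothetical config file holding the setting from the question.
cfg=$(mktemp)
printf '%s\n' '# sample config' 'ssl_cert_location=/etc/ssl/certs/thecert.cer' > "$cfg"

cert=
while IFS= read -r line; do
    case $line in
        ssl_cert_location=*)
            cert=${line##*/}   # strip the longest prefix that ends in a slash
            break
            ;;
    esac
done < "$cfg"

echo "$cert"
```

No external process is started here, which can matter if the lookup runs many times in a loop.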
Or, using sed:
cert=$(sed -n '75s/^.*\///p' filename)
or
cert=$(sed -n '/^ssl_cert_location=/s/^.*\///p' filename)
This gets the specified line based on the line number or the setting name and replaces everything up to and including the final slash with nothing. It ignores all other lines in the file (unless the setting is repeated in the case of the text match version). The text match version is better because it works no matter what line number the setting is on.
grep uses regular expressions (as does sed). The grep command in your question appears to use a glob expression, which won't work. One way to use grep (GNU grep) is with its PCRE feature (Perl Compatible Regular Expressions):
cert=$(grep -Po '^ssl_cert_location=.*/\K.*' filename)
This works similarly to the sed command.
I have anchored the regular expressions to the beginning of the line. If there may be leading white spaces (the line may be indented), change the regex so it looks something like this:
^[[:space:]]*ssl_cert_location=
which works for both indented and unindented lines.
There are many variants, but a simple one that comes to mind with grep is first getting the line, then matching only non-slashes at the end of the line:
<config.cfg grep '^ssl_cert_location=' | grep -o '[^/]*$'
Why didn't your grep command (grep -o *.cer config.cfg) work? Because *.cer is a shell glob pattern and will be expanded by the shell to matching file names before the grep process is even started. If there are no matching files, it is passed verbatim, but * in regular expressions is a quantifier that needs a preceding expression, and . in regex means "match any single character". So what you wanted is probably grep -o '.*\.cer', but .* matches anything, including slashes.
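To illustrate the last point, here is a small sketch run against the sample line from the question. The second pattern, [^/]*\.cer, is a variant not given above; it simply excludes slashes from the match:

```shell
#!/usr/bin/env bash
line='ssl_cert_location=/etc/ssl/certs/thecert.cer'

# Quoted so the shell does not glob it; .* is greedy and also crosses slashes,
# so the whole line comes back.
greedy=$(printf '%s\n' "$line" | grep -o '.*\.cer')

# Excluding slashes from the match leaves only the file name.
name=$(printf '%s\n' "$line" | grep -o '[^/]*\.cer')

printf '%s\n' "$greedy" "$name"
```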
An awk solution would look like the following:
awk -F/ '/^ssl_cert_location=/{print $NF}' config.cfg
It uses "/" as the field separator, finds only lines starting with "ssl_cert_location=", and then prints the last field ($NF) of each such line.
Or an equivalent sed solution, which matches the same line and then deletes everything up to and including the last slash:
sed -n '/^ssl_cert_location=/s#^.*/##p' config.cfg
To store the output of any command in a variable, use command substitution:
var="$(command with arguments)"

How to add character at the end of specific line in UNIX/LINUX?

Here is my input file. I want to add the character ":" to the end of lines that begin with ">". I tried sed -i 's|$|:|' input.txt, but ":" was added to the end of every line. Calling out specific line numbers is impractical because the lines that begin with ">" fall on different line numbers in each of my input files, and I want to run a loop over multiple files.
>Pas_pyrG_2
AAAGTCACAATGGTTAAAATGGATCCTTATATTAATGTCGATCCAGGGACAATGAGCCCA
TTCCAGCATGGTGAAGTTTTTGTTACCGAAGATGGTGCAGAAACAGATCTGGATCTGGGT
>Pas_rpoB_4
CAAACTCACTATGGTCGTGTTTGTCCAATTGAAACTCCTGAAGGTCCAAACATTGGTTTG
ATCAACTCGCTTTCTGTATACGCAAAAGCGAATGACTTCGGTTTCTTGGAAACTCCATAC
CGCAAAGTTGTAGATGGTCGTGTAACTGATGATGTTGAATATTTATCTGCAATTGAAGAA
>Pas_cpn60_2
ATGAACCCAATGGATTTAAAACGCGGTATCGACATTGCAGTAAAAACTGTAGTTGAAAAT
ATCCGTTCTATTGCTAAACCAGCTGATGATTTCAAAGCAATTGAACAAGTAGGTTCAATC
TCTGCTAACTCTGATACTACTGTTGGTAAACTTATTGCTCAAGCAATGGAAAAAGTAGGT
AAAGAAGGCGTAATCACTGTAGAAGAAGGCTCAGGCTTCGAAGACGCATTAGACGTTGTA
Here is the expected output file:
>Pas_pyrG_2:
AAAGTCACAATGGTTAAAATGGATCCTTATATTAATGTCGATCCAGGGACAATGAGCCCA
TTCCAGCATGGTGAAGTTTTTGTTACCGAAGATGGTGCAGAAACAGATCTGGATCTGGGT
>Pas_rpoB_4:
CAAACTCACTATGGTCGTGTTTGTCCAATTGAAACTCCTGAAGGTCCAAACATTGGTTTG
ATCAACTCGCTTTCTGTATACGCAAAAGCGAATGACTTCGGTTTCTTGGAAACTCCATAC
CGCAAAGTTGTAGATGGTCGTGTAACTGATGATGTTGAATATTTATCTGCAATTGAAGAA
>Pas_cpn60_2:
ATGAACCCAATGGATTTAAAACGCGGTATCGACATTGCAGTAAAAACTGTAGTTGAAAAT
ATCCGTTCTATTGCTAAACCAGCTGATGATTTCAAAGCAATTGAACAAGTAGGTTCAATC
TCTGCTAACTCTGATACTACTGTTGGTAAACTTATTGCTCAAGCAATGGAAAAAGTAGGT
AAAGAAGGCGTAATCACTGTAGAAGAAGGCTCAGGCTTCGAAGACGCATTAGACGTTGTA
Does sed have more options for this, or can other commands solve this problem?
sed -i '/^>/ s/$/:/' input.txt
Search the input for lines that match ^> (regex for "starts with the > character"). For those that do, substitute : for end-of-line (you got this part right).
Slashes (/) are the conventional delimiter in sed, but any other character works, for example s|$|:|. Just be sure to quote the expression: unlike /, the | character is meaningful to the shell, so an unquoted s|$|:| would be parsed as a pipeline. It's simplest to stick with slashes unless the pattern itself contains slashes, in which case things get unwieldy.
Be careful with sed -i. Make a backup - make sure you know what's changing by using diff to compare the files.
On macOS (BSD sed), -i requires an argument: a backup suffix, which may be empty (-i '').
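A minimal sketch that combines these cautions for the multi-file case from the question. It relies on the attached-suffix form -i.bak, which both GNU sed and BSD sed accept; the scratch directory, file names, and shortened sequences are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory with two shortened FASTA-style sample files.
workdir=$(mktemp -d)
printf '%s\n' '>Pas_pyrG_2' 'AAAGTCACAATGG' '>Pas_rpoB_4' 'CAAACTCACTATG' > "$workdir/a.txt"
printf '%s\n' 'ATGAACCCAATGG' '>Pas_cpn60_2' 'ATCCGTTCTATTG' > "$workdir/b.txt"

for f in "$workdir"/*.txt; do
    # -i.bak edits in place and keeps the original as "$f.bak".
    sed -i.bak '/^>/ s/$/:/' "$f"
    # Show what changed; diff exits non-zero when the files differ.
    diff "$f.bak" "$f" || true
done
```

Once the diff output looks right, the .bak files can be deleted.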
Using ed to edit the file:
printf "%s\n" 'g/^>/s/$/:/' w | ed -s input.txt
For every line starting with >, add a colon to the end, and then write the changed file back to disk.

Sed: Extracting regex pattern from lines

I have an input stream of many lines which look like this:
path/to/file: example: 'extract_me.proto'
path/to/other-file: example: 'me_too.proto'
path/to/something/else: example: 'and_me_2.proto'
...
I'd like to just extract the *.proto filenames from these lines, and I have tried:
[INPUT] | sed 's/^.*\([a-zA-Z0-9_]+\.proto\).*$/\1/'
I know that part of my problem is that .* is greedy and I'm going to get things like e.proto and o.proto and 2.proto, but I can't even get that far... it just outputs with the same lines as the input. Any help would be greatly appreciated.
I find it helpful to use extended regex for this purpose (-r), in which case you need not escape the parentheses or the +.
sed -r 's/^.*[^a-zA-Z0-9_]([a-zA-Z0-9_]+\.proto).*$/\1/'
The .* is still greedy, but the added [^a-zA-Z0-9_] forces it to stop at the last non-word character before the filename, so the whole name is captured.
Since you tagged your question with linux, I'll assume you have GNU grep. Pick one of
grep -oP '\w+\.proto' file
grep -oE "[^']+\\.proto" file
One way to do it:
sed 's/^.*[^a-zA-Z0-9_]\([a-zA-Z0-9_]\+\.proto\).*$/\1/'
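Here the + is escaped (\+), so that it means "one or more" in basic regex,
and a negated class is placed before the alphanumeric-plus-underscore group to delimit the leading characters.
Another way: use the single-quote delimiters; after all, that's what they are there for: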
sed "s/^.*'\([a-zA-Z0-9_]\+\.proto\)'.*\$/\1/"
Use this sed:
sed "s/^.*'\([a-zA-Z0-9_]\+\.proto\).*$/\1/"
+ is an extended-regex operator, so in basic regex you need to escape it (\+, a GNU extension) to get its special meaning: the preceding item will be matched one or more times.
Another way:
sed "s/^.*'\([^']\+\.proto\)'.*$/\1/"
With GNU sed:
sed -E "s/.*'([^']+)'$/\1/"
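The difference between the original attempt and the working variants can be checked with a short sketch, using the three sample lines from the question. sed -E is used here because both GNU and BSD sed accept it:

```shell
#!/usr/bin/env bash
input="path/to/file: example: 'extract_me.proto'
path/to/other-file: example: 'me_too.proto'
path/to/something/else: example: 'and_me_2.proto'"

# Basic regex: an unescaped + is a literal plus sign, so the pattern never
# matches and every line passes through unchanged.
unchanged=$(printf '%s\n' "$input" | sed 's/^.*\([a-zA-Z0-9_]+\.proto\).*$/\1/')

# Extended regex: + means "one or more"; the single quotes delimit the name.
names=$(printf '%s\n' "$input" | sed -E "s/^.*'([^']+\.proto)'.*\$/\1/")

printf '%s\n' "$names"
```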

Conditional replace using sed

My question is probably rather simple. I'm trying to replace sequences of strings that are at the beginning of lines in a file. For example, I would like to replace any instance of the pattern "GN" with "N" or "WR" with "R", but only if they are the first 2 characters of that line. For example, if I had a file with the following content:
WRONG
RIGHT
GNOME
I would like to transform this file to give
RONG
RIGHT
NOME
I know I can use the following to replace any instance of the patterns above:
sed -i 's/GN/N/g' file.txt
sed -i 's/WR/R/g' file.txt
The issue is that I want this to happen only if the above patterns are the first 2 characters in any given line. Possibly an IF statement, although I'm not sure what the condition would look like. Any pointers in the right direction would be much appreciated, thanks.
Just add the circumflex (^) and remove the g suffix (unnecessary, since you want at most one replacement per line). You can also combine them in one script:
sed -i 's/^GN/N/;s/^WR/R/' file.txt
Use the start-of-string regexp anchor ^:
sed -i 's/^GN/N/' file.txt
sed -i 's/^WR/R/' file.txt
Since sed is line-oriented, start-of-string == start-of-line.
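A quick sketch of the effect of the anchor, assuming one extra made-up line (SIGNWRITER) that contains both patterns in the middle and should therefore be left alone:

```shell
#!/usr/bin/env bash
# Anchored: only a leading GN or WR is rewritten.
anchored=$(printf '%s\n' WRONG RIGHT GNOME SIGNWRITER | sed 's/^GN/N/;s/^WR/R/')

# Unanchored with g, for contrast: every occurrence is rewritten.
unanchored=$(printf '%s\n' WRONG RIGHT GNOME SIGNWRITER | sed 's/GN/N/g;s/WR/R/g')

printf '%s\n' "$anchored"
```

Reading from a pipe instead of using -i keeps the sketch free of side effects; the same script works with -i on a file.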

Filter out only matched values from a text file in each line

I have a file "test.txt" with the lines below, plus a lot of extra stuff after the "version" field:
soainfra_metrics{metric_group="sca_composite",partition="test",is_active="true",state="on",is_default="true",composite="test123"} map:stats version:1.0
soainfra_metrics{metric_group="sca_composite",partition="gello",is_active="true",state="on",is_default="true",composite="test234"} map:stats version:1.8
soainfra_metrics{metric_group="sca_composite",partition="bolo",is_active="true",state="on",is_default="true",composite="3415"} map:stats version:3.1
soainfra_metrics{metric_group="sca_composite",partition="solo",is_active="true",state="on",is_default="true",composite="hji"} map:stats version:1.1
I tried:
egrep -r 'partition|is_active|state|is_default|composite' test.txt
It displays every line in full, but I need only the specific fields shown below, ignoring the rest of the data on each line.
In a nutshell, I want to display only these fields from each line, not the rest:
partition="test",is_active="true",state="on",is_default="true",composite="test123"
partition="gello",is_active="true",state="on",is_default="true",composite="test234"
partition="bolo",is_active="true",state="on",is_default="true",composite="3415"
partition="solo",is_active="true",state="on",is_default="true",composite="hji"
If your version of grep supports Perl-style regular expressions, then I'd use this:
grep -oP '.*?,\K[^}]+' file
It skips everything up to the first comma (\K drops what has been matched so far from the reported match) and prints everything up to the }.
Alternatively, using awk:
awk -F'}' '{ sub(/[^,]+,/, ""); print $1 }' file
This sets the field separator to } so the part you're interested in is the first field. It then uses sub to remove the part up to the first comma.
For completeness, you could also use sed:
sed 's/[^,]*,\([^}]*\).*/\1/' file
This captures the part after the first , up to the } and replaces the content of the line with it.
After the grep to pick out the lines you want, use sed to edit the lines:
sed 's/.*\(partition[^}]*\)} map.*/\1/'
This means: "whenever you see anything (.*), followed by partition and any number of non-} characters, then } map and anything else, grab the part from partition up to but not including the brace, \(...\), as group 1." The replacement text is just group 1, \1.
Use a pipe | to connect the output of egrep to the input of sed:
egrep ... | sed ...
As far as I understand, your file might have more lines you don't want to see, so I would use:
sed -n 's/.*\(partition.*\)}.*/\1/p' file
We use -n together with the p flag to print only the lines where a substitution was made. The substitution replaces the whole line with the captured part you need.
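A minimal sketch of the -n/p behaviour, using two of the sample lines plus one made-up unrelated line. As a small variation on the command above, the capture uses [^}]* instead of .* so that it stops at the first closing brace even if more braces appear later on the line:

```shell
#!/usr/bin/env bash
input='soainfra_metrics{metric_group="sca_composite",partition="test",is_active="true",state="on",is_default="true",composite="test123"} map:stats version:1.0
an unrelated line that should be dropped
soainfra_metrics{metric_group="sca_composite",partition="gello",is_active="true",state="on",is_default="true",composite="test234"} map:stats version:1.8'

# -n suppresses automatic printing; the p flag prints only lines where the
# substitution succeeded, so the unrelated line disappears.
fields=$(printf '%s\n' "$input" | sed -n 's/.*\(partition[^}]*\)}.*/\1/p')

printf '%s\n' "$fields"
```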
This might work for you (GNU sed):
sed -r 's/(partition|is_active|state|is_default|composite)="[^"]*"/\n&\n/g;s/[^\n]*\n([^\n]*)\n[^\n]*/\1,/g;s/,$//' file
Treat the problem as if it were a "decomposed club sandwich". Identify the fillings, remove the bread and tidy up.

Resources