Extract substring after a character

I'm trying to extract the substring after the last period (dot).
Examples below:
echo "filename..txt" should return "txt"
echo "filename.txt." should return ""
echo "filename" should return ""
echo "filename.xml" should return "xml"
I tried the command below, but it works only if the dot occurs exactly once. My filename may contain zero or more dots.
echo "filename.txt" | cut -d "." -f2

Let's use awk!
awk -F"." '{print (NF>1)? $NF : ""}' file
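This sets the field separator to . and prints the last field. If the line contains no dot (NF is 1), it prints an empty string instead.
Test (the second and third input lines each produce an empty output line, which is not visible below):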
$ cat file
filename..txt
filename.txt.
filename
filename.xml
$ awk -F"." '{print (NF>1)? $NF : ""}' file
txt
xml

One can make this portable (so it's not Linux-only), avoiding an ERE dependency, with the following:
$ sed -ne 's/.*\.//p' <<< "file..txt"
txt
$ sed -ne 's/.*\.//p' <<< "file.txt."
$ sed -ne 's/.*\.//p' <<< "file"
$ sed -ne 's/.*\.//p' <<< "file.xml"
xml
Note that for testing purposes, I'm using a "here-string" in bash. If your shell is not bash, use whatever your shell uses to feed data to sed.
The important bit here is the use of sed's -n option, which tells it not to print anything by default, combined with the substitute command's explicit p flag, which tells sed to print only upon a successful substitution, which obviously requires a dot to be included in the pattern.
With this solution, the difference between "file.txt." and "file" is that the former returns the input line replaced with null (so you may still get a newline depending on your usage), whereas the latter returns nothing, as sed is not instructed to print, as no . is included in the input. The end result may well be the same, of course:
$ printf "#%s#\n" $(sed -ne 's/.*\.//p' <<< "file.txt.")
##
$ printf "#%s#\n" $(sed -ne 's/.*\.//p' <<< "file")
##

Simple to do with awk:
awk -F"." '{ print $NF }'
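What this does: with dot as the delimiter, print the last field of each input line. Note that a line without any dot consists of a single field, so for "filename" this prints "filename" rather than an empty string.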

Use sed in 2 steps: first blank out any line that contains no dot, and then remove everything up to and including the last dot:
sed -e 's/^[^.]*$//' -e 's/.*\.//'
Test:
for s in file.txt.. file.txt. file.txt filename file.xml; do
echo "$s -> $(echo "$s" | sed -e 's/^[^.]*$//' -e 's/.*\.//')"
done
Test result:
file.txt.. ->
file.txt. ->
file.txt -> txt
filename ->
file.xml -> xml
Actually the answer of @ghoti is roughly the same, just a bit shorter (better).
This solution can be used by other readers who want to do something like this in another language.
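For readers who prefer to stay in the shell, the same logic (empty result when there is no dot, otherwise everything after the last dot) can also be sketched with POSIX parameter expansion instead of sed or awk. This is a minimal sketch and not taken from the answers above; the function name ext_after_last_dot is hypothetical.

```shell
# Hypothetical helper: print the text after the last dot of $1,
# or an empty line when $1 contains no dot at all.
ext_after_last_dot() {
    case $1 in
        *.*) printf '%s\n' "${1##*.}" ;;  # drop the longest prefix ending in a dot
        *)   printf '\n' ;;               # no dot: empty result
    esac
}

# Run the four examples from the question; brackets make empty results visible.
for s in filename..txt filename.txt. filename filename.xml; do
    printf '%s -> [%s]\n' "$s" "$(ext_after_last_dot "$s")"
done
```

The case statement is what separates "filename" (no dot) from "filename.txt." (dot present, nothing after it); a bare ${1##*.} would return the whole name when there is no dot.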

Related

How to search and delete a pattern from a line?

I need to write a simple bash script that takes a text line
some-pattern something-else
and erases some-pattern and returns only something-else. I wrote a script to do the opposite with grep -o, but I don't know how to handle this case. Any help is very much appreciated.
sample input:
"SNMPv2::sysLocation.0 = STRING: someLocation"
Desired Output:
"someLocation"

Assuming the " characters are NOT part of your sample Input_file and expected output, you could try the following with GNU grep:
grep -oP '.*STRING: \K(.*)' Input_file
someLocation
For \K explanation:
\K is a PCRE extension to regex syntax discarding content prior to
that point from being included in match output

You can use sed to delete the part in front of what you want to keep.
Given:
$ echo "$s"
"SNMPv2::sysLocation.0 = STRING: someLocation"
You can do:
$ echo "$s" | sed -nE 's/^.*(someLocation)/\1/p'
someLocation
And if you want to add quotes:
$ echo "$s" | sed -nE 's/^.*(someLocation)/"\1"/p'
"someLocation"
If the portion after STRING: is variable, not fixed, you can match on STRING: and use a capture group:
$ echo "$s" | sed -nE 's/^.*STRING:[[:space:]]*(.*)/"\1"/p'
"someLocation"
Or, use sed to capture and print the last word after the last space:
$ echo "$s" | sed -nE 's/^.*[[:space:]]([^[:space:]]*)$/\1/p'
You can also use awk if the last word is space separated from the other fields:
$ echo "$s" | awk '{print $NF}'
Or a pipeline with cut and rev works too:
$ echo "$s" | rev | cut -d' ' -f 1 | rev

You can use: echo "$STRING" | awk -F" " '/someLocation/ { print $NF }'
-F" " sets a space as the field separator (this is also awk's default); /someLocation/ selects lines containing your location; { print $NF } prints the last field of the line, which, I believe, is where the location is.
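If the line is already in a shell variable, the marker can also be stripped without an external tool. The following is a minimal sketch using POSIX parameter expansion, not one of the methods from the answers above; the function name value_after is hypothetical, and the sample line is written without the surrounding double quotes.

```shell
# Hypothetical helper: print what follows the last occurrence of marker $2
# in line $1; return non-zero when the marker is absent.
value_after() {
    case $1 in
        *"$2"*) printf '%s\n' "${1##*"$2"}" ;;  # quoting $2 keeps it literal
        *)      return 1 ;;
    esac
}

s='SNMPv2::sysLocation.0 = STRING: someLocation'
value_after "$s" 'STRING: '
```

Quoting "$2" inside the expansion matters: without it, characters such as * or ? in the marker would be treated as glob patterns.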

How to add single quotes in a shell script using sed

Need help in making a sed script to find and replace user input along with single quotes. Input file admins.py:
Script:
read adminsid
while [[ $adminsid == "" ]];
do
echo "You did not enter anything. Please re-enter AdminID"
read adminsid
done
## Please enter Admin's ID
9999999999,8888888888,1111111111
## Script To Replace ADMIN_IDS = [] to ADMIN_IDS = ['9999999999,8888888888,1111111111'] in file
sed -i "s|ADMIN_IDS = \[.*\]|ADMIN_IDS = ['$adminsid']|g" $file
## Current results:
ADMIN_IDS = ['9999999999,8888888888,1111111111']
## Expected results:
ADMIN_IDS = ['9999999999','8888888888','1111111111']

Assign the variable to the data
adminsid=9999999999,8888888888,1111111111
Then use sed -e (script) option to add the quoting, and square brackets.
echo "$adminsid" | sed -e "s/,/\',\'/g" -e "s/^/[\'/" -e "s/$/\']/"
or to apply the same changes in place to a file that contains only the ID list (filename in $file; note that this rewrites every line of the file):
sed -i -e "s/,/\',\'/g" -e "s/^/[\'/" -e "s/$/\']/" "$file"

You can do this with awk too:
Suppose you have assigned the variable as :
adminsid=9999999999,8888888888,1111111111
Then the solution:
echo "$adminsid"| awk -F"," -v quote="'" -v OFS="','" '$1=$1 {print "["quote $0 quote"]"}'
-F"," -v OFS="','" :: Replacing separator (,) with (',')
print "["quote $0 quote"]" :: Add single quotes(') and ([) and (]) to the begin and end of line

This might work for you (GNU sed & bash):
<<<"$adminsid" sed 's/[^,]\+/'\''&'\''/g;s/.*/[&]/'
Surround all non-comma characters by single quotes and then surround the entire string by square brackets.

Replace the , with ',' in the variable and add characters at the beginning and at the end.
sed "s/.*/['&']/" <<< "${adminsid//,/','}"

echo "('${adminsid//,/\\',\\'}')"
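To tie the quoting step back to the original replacement of the ADMIN_IDS line, the two can be combined. This is a minimal sketch under the assumption that the IDs contain only digits and commas (so nothing in them is special to sed); the function name quote_ids is hypothetical, and the sample line is fed through a pipe instead of editing a real file.

```shell
# Hypothetical helper: turn 1,2,3 into '1','2','3'.
quote_ids() {
    # First replace each comma with ',' and then wrap the whole line in quotes.
    printf '%s\n' "$1" | sed "s/,/','/g; s/.*/'&'/"
}

adminsid=9999999999,8888888888,1111111111

# Only the ADMIN_IDS line is rewritten; other lines would pass through unchanged.
printf 'ADMIN_IDS = []\n' |
    sed "s|ADMIN_IDS = \[.*\]|ADMIN_IDS = [$(quote_ids "$adminsid")]|"
```

To modify the real file, the same sed expression can be given the file name (with -i on GNU sed) instead of reading from the pipe.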

Filename manipulation

Kindly help me with a unix script to modify the filename in required format as shown below:
AN_555a_orange_20190513.txt
AN_555b_apple_20190513.txt
Required format: the first character of the fruit name should be capitalized, and the fruit name should move to the second position:
AN_Orange_555a_20190513.txt
AN_Apple_555b_20190513.txt
This should apply to all files present in the directory.
Below is the command I'm trying, which is not working:
for in in aaal*
do
out=${in#*_}
out=${out%_*_*_*}
out=${out%[0-9]}
out1=${out#*_}
out2=${out%_*}
AAAI_$out1$out2.txt
done

This script is simple, but worked with your sample:
#!/bin/bash
for i in AN*; do
NAME=$(echo "$i" | awk -F_ '{printf "%s_%s%s_%s_%s", $1, toupper(substr($3,1,1)), substr($3,2), $2, $4}')
echo "--> $NAME"
done

An interesting solution for this case is to use sed, just like this:
$ ls -1 | sed 's/\(AN_\)\([^_]*_\)\([a-z]*_\)\([0-9]*.txt\)/mv "&" "\1\u\3\2\4"/e'
Note the final e at the end of the sed command. This is a GNU sed extension that tells sed to execute the result of the substitution as a shell command.
So if you remove the e (which you could do at first, to check the substitution works as expected), you would get in the console:
$ ls -1 | sed 's/\(AN_\)\([^_]*_\)\([a-z]*_\)\([0-9]*.txt\)/mv "&" "\1\u\3\2\4"/'
mv "AN_555a_orange_20190513.txt" "AN_Orange_555a_20190513.txt"
mv "AN_555b_apple_20190513.txt" "AN_Apple_555b_20190513.txt"
(The sed substitution matches the several groups of characters, reorders them and creates the mv ... ... line. Note that & in the replacement pattern denotes the whole pattern matched, and \u tells sed to put the next character as upper case.)
Then add back that final e, and instead of printing these lines sed will execute them, effectively renaming the files.

This one-liner could give you more ideas:
awk -F_ '{printf "mv %s %s_%s%s_%s_%s\n", $0, $1,toupper(substr($3,1,1)), substr($3, 2),$2,$4}' <(ls *.txt)
This will print something like:
mv AN_555a_orange_20190513.txt AN_Orange_555a_20190513.txt
mv AN_555b_apple_20190513.txt AN_Apple_555b_20190513.txt
Then, if you are happy with the results, pipe it to sh, for example:
awk -F_ '{printf "mv %s %s_%s%s_%s_%s\n", $0, $1,toupper(substr($3,1,1)), substr($3, 2),$2,$4}' <(ls *.txt) | sh
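Generating mv commands from ls output and piping them to sh breaks on file names that contain spaces or shell metacharacters. A minimal sketch of an alternative that builds the new name inside a plain loop is shown below; it assumes the four-part PREFIX_CODE_FRUIT_DATE.txt layout from the question, and the function name new_name is hypothetical.

```shell
# Hypothetical helper: AN_555a_orange_20190513.txt -> AN_Orange_555a_20190513.txt
new_name() {
    # Split the name on underscores into its four parts.
    IFS=_ read -r prefix code fruit rest <<EOF
$1
EOF
    # Uppercase only the first character of the fruit name.
    first=$(printf '%s' "$fruit" | cut -c1 | tr '[:lower:]' '[:upper:]')
    printf '%s_%s%s_%s_%s\n' "$prefix" "$first" "${fruit#?}" "$code" "$rest"
}

# Dry run: print the mv commands for matching files in the current directory.
for f in AN_*_*_*.txt; do
    [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
    echo mv -- "$f" "$(new_name "$f")"
done
```

Once the printed commands look right, removing the echo in the loop performs the renames.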

Extract field after colon for lines where field before colon matches pattern

I have a file file1 which looks as below:
tool1v1:1.4.4
tool1v2:1.5.3
tool2v1:1.5.2.c8.5.2.r1981122221118
tool2v2:32.5.0.abc.r20123433554
I want to extract the values of tool2v1 and tool2v2.
My output should be 1.5.2.c8.5.2.r1981122221118 and 32.5.0.abc.r20123433554.
I have written the following awk commands, but they are not giving the correct result:
awk -F: '/^tool2v1/ {print $2}' file1
awk -F: '/^tool2v2/ {print $2}' file1

grep -E can also do the job:
grep -E "tool2v[12]" file1 |sed 's/^.*://'

If you have a grep that supports Perl compatible regular expressions such as GNU grep, you can use a variable-sized look-behind:
$ grep -Po '^tool2v[12]:\K.*' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
The -o option is to retain just the match instead of the whole matching line; \K is the same as "the line must match the things to the left, but don't include them in the match".
You could also use a normal look-behind:
$ grep -Po '(?<=^tool2v[12]:).*' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554
And finally, to fix your awk which was almost correct (and as pointed out in a comment):
$ awk -F: '/^tool2v[12]/ { print $2 }' infile
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554

You can filter with grep:
grep '\(tool2v1\|tool2v2\)'
And then remove the part before the : with sed:
sed 's/^.*://'
This sed operation means:
^ - match from beginning of string
.* - all characters
up to and including the :
... and replace this matched content with nothing.
The format is sed 's/<MATCH>/<REPLACE>/'
Whole command:
grep '\(tool2v1\|tool2v2\)' file1|sed 's/^.*://'
Result:
1.5.2.c8.5.2.r1981122221118
32.5.0.abc.r20123433554

The question has already been answered, but you can also use pure bash to achieve the desired result:
#!/usr/bin/env bash
while IFS= read -r line; do
    if [[ "$line" =~ ^tool2v[12]: ]]; then
        echo "${line#*:}"
    fi
done < ./file1
The while loop reads every line of file1; =~ does a regexp match to check whether the value of $line starts with tool2v1: or tool2v2:; then ${line#*:} removes everything up to and including the first colon.
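A prefix match such as /^tool2v1/ would also select a key like tool2v10. A minimal sketch that compares the whole key and keeps any further colons in the value is shown below; the function name values_for is hypothetical, and the key pattern is assumed to be a plain extended regular expression without backslashes.

```shell
# Hypothetical helper: read key:value lines on stdin and print the value of
# every line whose entire key matches the ERE given in $1.
values_for() {
    awk -F: -v re="^($1)\$" '
        # substr() keeps everything after the first colon, later colons included
        $1 ~ re { print substr($0, length($1) + 2) }
    '
}

printf '%s\n' tool1v1:1.4.4 tool2v1:1.5.2.c8 tool2v2:32.5.0.abc tool2v10:9 |
    values_for 'tool2v[12]'
```

With the sample input above, the tool2v10 line is skipped because the pattern is anchored at both ends of the key.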

linux shell title case

I am writing a shell script and have a variable like this: something-that-is-hyphenated.
I need to use it in various points in the script as:
something-that-is-hyphenated, somethingthatishyphenated, SomethingThatIsHyphenated
I have managed to change it to somethingthatishyphenated by stripping out - using sed "s/-//g".
I am sure there is a simpler way, and I also need to know how to get the camel-cased version.
Edit: Working function derived from @Michał's answer
function hyphenToCamel {
tr '-' '\n' | awk '{printf "%s%s", toupper(substr($0,1,1)), substr($0,2)}'
}
CAMEL=$(echo something-that-is-hyphenated | hyphenToCamel)
echo $CAMEL
Edit: Finally, a sed one-liner thanks to @glenn
echo a-hyphenated-string | sed -E "s/(^|-)([a-z])/\u\2/g"

a GNU sed one-liner
echo something-that-is-hyphenated |
sed -e 's/-\([a-z]\)/\u\1/g' -e 's/^[a-z]/\u&/'
\u in the replacement string is documented in the sed manual.

Pure bashism:
var0=something-that-is-hyphenated
var1=(${var0//-/ })
var2=${var1[*]^}
var3=${var2// /}
echo $var3
SomethingThatIsHyphenated
Line 1 is trivial.
Line 2 is the bashism for replaceAll or 's/-/ /g', wrapped in parens, to build an array.
Line 3 uses ${foo^}, which means uppercase (while ${foo,} would mean 'lowercase' [note how ^ points up while , points down]), but to operate on the first letter of every word, we address the whole array with ${foo[*]} (or ${foo[@]}, if you would prefer that).
Line 4 is again a replace-all: blank with nothing.
Line 5 is trivial again.

You can define a function:
hyphenToCamel() {
tr '-' '\n' | awk '{printf "%s%s", toupper(substr($0,1,1)), substr($0,2)}'
}
CAMEL=$(echo something-that-is-hyphenated | hyphenToCamel)
echo $CAMEL

In the shell you are stuck with being messy:
aa="aaa-aaa-bbb-bbb"
echo " $aa" | sed -e 's/--*/ /g' -e 's/ a/A/g' -e 's/ b/B/g' ... -e 's/  *//g'
Note the carefully placed space in the echo and the double space in the last -e.
I leave it as an exercise to complete the code.
In perl it is a bit easier as a one-line shell command:
perl -e 'print map{ $a = ucfirst; $a =~ s/ +//g; $a} split( /-+/, $ARGV[0] ), "\n"' $aa

For the record, here's a safe pure Bash method (one that is not subject to pathname expansion), using Bash≥4:
var0=something-that-is-hyphenated
IFS=- read -r -d '' -a var1 < <(printf '%s\0' "${var0,,}")
printf '%s' "${var1[@]^}"
This (safely) splits the lowercase expansion of var0 at the hyphens, with each split part in array var1. Then we use the ^ parameter expansion to uppercase the first character of the fields of this array, and concatenate them.
If your variable may also contain spaces and you want to act on them too, change IFS=- into IFS='- '.
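Where neither GNU sed's \u nor Bash 4 is available, the conversion can also be sketched with POSIX awk alone. This is a minimal sketch rather than any of the answers above; the function name hyphen_to_camel is hypothetical.

```shell
# Hypothetical helper: something-that-is-hyphenated -> SomethingThatIsHyphenated
hyphen_to_camel() {
    printf '%s\n' "$1" | awk -F- '{
        out = ""
        # Uppercase the first character of each hyphen-separated field.
        for (i = 1; i <= NF; i++)
            out = out toupper(substr($i, 1, 1)) substr($i, 2)
        print out
    }'
}

var0=something-that-is-hyphenated
echo "$var0"                       # original form
echo "$var0" | tr -d '-'           # all hyphens removed
hyphen_to_camel "$var0"            # camel-cased form
```

Unlike the tr-and-awk pipeline, this version handles each input line as a unit and ends its output with a newline.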
