A Linux Shell Script Problem - linux

I have a dot-separated string in a Linux shell:
example=This.is.My.String
I want to
1.Add some string before the last dot, for example, I want to add "Goood.Long" before the last dot, so I get:
This.is.My.Goood.Long.String
2.Get the part after the last dot, so I will get
String
3.Turn the dot into underscore except the last dot, so I will get
This_is_My.String
If you have time, please explain a little bit, I am still learning Regular Expression.
Thanks a lot!

I don't know what you mean by 'Linux Shell' so I will assume bash. This solution will also work in zsh, etcetera:
example=This.is.My.String
before_last_dot=${example%.*}
after_last_dot=${example##*.}
echo ${before_last_dot}.Goood.Long.${after_last_dot}
This.is.My.Goood.Long.String
echo ${before_last_dot//./_}.${after_last_dot}
This_is_My.String
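The interim variables before_last_dot and after_last_dot should explain my usage of the % and ## operators: % strips the shortest matching suffix and ## strips the longest matching prefix. I think the // (replace every match) is also self-explanatory, but I'd be happy to clarify if you have any questions.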
This doesn't use sed (or even regular expressions), but bash's inbuilt parameter substitution. I prefer to stick to just one language per script, with as few forks as possible :-)
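For readers new to these operators, here is a minimal sketch that puts the four prefix/suffix-removal expansions side by side. The variable names are made up for the illustration; only example comes from the question.

```bash
example=This.is.My.String

# % and %% remove a matching suffix; # and ## remove a matching prefix.
# The single form removes the shortest match, the doubled form the longest.
shortest_suffix_removed=${example%.*}    # This.is.My
longest_suffix_removed=${example%%.*}    # This
shortest_prefix_removed=${example#*.}    # is.My.String
longest_prefix_removed=${example##*.}    # String

printf '%s\n' "$shortest_suffix_removed" "$longest_suffix_removed" \
              "$shortest_prefix_removed" "$longest_prefix_removed"
```

The answer above only needs ${example%.*} and ${example##*.}; the other two are shown for contrast.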

Other users have given good answers for #1 and #2. There are some disadvantages to some of the answers for #3. In one case, you have to run the substitution twice. In another, if your string has other underscores they might get clobbered. This command works in one go and only affects dots:
sed 's/\(.*\)\./\1\n./;h;s/[^\n]*\n//;x;s/\n.*//;s/\./_/g;G;s/\n//'
Step by step:
1. It splits the line before the last dot by inserting a newline and copies the result into hold space:
s/\(.*\)\./\1\n./;h
2. It removes everything up to and including the newline from the copy in pattern space and swaps hold space and pattern space:
s/[^\n]*\n//;x
3. It removes everything after and including the newline from the copy that's now in pattern space:
s/\n.*//
4. It changes all dots into underscores in the copy in pattern space and appends hold space onto the end of pattern space:
s/\./_/g;G
5. It removes the newline that the append operation adds:
s/\n//
Then the sed script is finished and the pattern space is output.
At the end of each numbered step (some consist of two actual steps):
Step  Pattern Space          Hold Space
1     This.is.My\n.String    This.is.My\n.String
2     This.is.My\n.String    .String
3     This.is.My             .String
4     This_is_My\n.String    .String
5     This_is_My.String      .String
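To check the claim that only dots are affected, the same command can be run on a string that already contains an underscore. This is a sketch assuming GNU sed (it relies on \n in the replacement and inside a bracket expression); the input string is made up for the demonstration.

```bash
# Made-up input with an underscore that must survive untouched.
input='This.is_My.Long.String'

result=$(printf '%s\n' "$input" |
  sed 's/\(.*\)\./\1\n./;h;s/[^\n]*\n//;x;s/\n.*//;s/\./_/g;G;s/\n//')

# Every dot except the last becomes an underscore; the original one is kept.
echo "$result"   # This_is_My_Long.String
```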

Solution
1. Two versions of this, too:
Complex: sed 's/\(.*\)\([.][^.]*$\)/\1.Goood.Long\2/'
Simple: sed 's/.*\./&Goood.Long./' - thanks Dennis Williamson
2. What do you want?
Complex: sed 's/.*[.]\([^.]*\)$/\1/'
Simpler: sed 's/.*\.//' - thanks, glenn jackman.
3. sed 's/\([^.]*\)[.]\([^.]*[.]\)/\1_\2/g'
With 3, you probably need to run the substitute (in its entirety) at least twice, in general.
Explanation
Remember, in sed, the notation \(...\) is a 'capture' that can be referenced as '\1' or similar in the replacement text.
1. Capture everything up to the last dot, then capture that dot together with the sequence of non-dots that follows it; replace with what came before the last dot, the new material, and the last dot and what came after it.
2. Ignore everything up to the last dot, which is followed by a captured sequence of non-dots; replace with the capture only.
3. Find and capture a sequence of non-dots, then a dot (not captured), followed by a captured sequence of non-dots and a dot; replace the first dot with an underscore. This is done globally, but the second and subsequent matches won't touch anything already matched. Therefore, I think you need ceil(log2(N+1)) passes, where N is the number of dots to be replaced. One pass deals with 1 dot to replace; two passes deal with 2 or 3; three passes deal with 4-7, and so on.
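The pass count for the third substitution can be checked with a small loop that reapplies it until the string stops changing. This is only a sketch: the helper name one_pass and the test string are made up for the illustration.

```bash
# One pass of the global substitution for task 3.
one_pass() {
  sed 's/\([^.]*\)[.]\([^.]*[.]\)/\1_\2/g'
}

s='a.b.c.d.e.f.g.h'   # 7 dots in total, so 6 dots to replace
passes=0
while :; do
  next=$(printf '%s\n' "$s" | one_pass)
  [ "$next" = "$s" ] && break   # nothing changed, so we are done
  s=$next
  passes=$((passes + 1))
done

echo "$s after $passes passes"   # a_b_c_d_e_f_g.h after 3 passes
```

Six dots to replace fall in the 4-7 range, which matches the three passes predicted above.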

Here's a version that uses Bash's regex matching (Bash 3.2 or greater).
[[ $example =~ ^(.*)\.(.*)$ ]]
echo ${BASH_REMATCH[1]//./_}.${BASH_REMATCH[2]}
Here's a Bash version that uses IFS (Internal Field Separator).
saveIFS=$IFS
IFS=.
array=($example) # * split the string at each dot
lastword=${array[@]: -1}
unset "array[${#array[@]}-1]" # *
IFS=_
echo "${array[*]}.$lastword" # The asterisk as a subscript when inside quotes causes IFS (an underscore in this case) to be inserted between each element of the array
IFS=$saveIFS
* use declare -p array after these steps to see what the array looks like.

1.
$ echo 'This.is.my.string' | sed 's}[^\.][^\.]*$}Good Long.&}'
This.is.my.Good Long.string
The pattern matches the run of non-dot characters at the end of the line, i.e. everything after the last dot. In the replacement, & stands for whatever the pattern matched.
2.
$ echo 'This.is.my.string' | sed 's}.*\.}}'
string
sed matches greedily, so it extends the first closure (.*) as far as possible, i.e. up to the last dot.
3.
$ echo 'This.is.my.string' | tr . _ | sed 's/_\([^_]*\)$/\.\1/'
This_is_my.string
convert all dots to _, then turn the last _ to a dot.
(caveat: this will turn 'This.is.my.string_foo' to 'This_is_my_string.foo', not 'This_is_my.string_foo')

You don't need regular expressions at all (those complex things hurt my eyes!) if you use Awk and are a little creative.
1. echo $example| awk -v ins="Good.long" -F . '{OFS="."; $NF = ins"."$NF;print}'
What this does:
-v ins="Good.long" tells awk to create a variable called 'ins' with "Good.long" as content,
-F . tells awk to use the dot as a separator for your fields for input,
OFS="." tells awk to use the dot as the output field separator,
NF is the number of fields, so $NF represents the last field,
the $NF=... part replaces the last field, it appends the current last string to what you want to insert (the variable called "ins" declared earlier).
2. echo $example| awk -F . '{print $NF}'
$NF is the last field, so that's all!
3. echo $example| awk -F . '{OFS="_"; $(NF-1) = $(NF-1)"."$NF; NF=NF-1; print}'
Here we have to be creative, as Awk AFAIK doesn't allow deleting fields. Of course, we set the output field separator to an underscore.
$(NF-1) = $(NF-1)"."$NF: First, we replace the second last field with the last glued to the second last, with a dot between.
Then, we fool awk to make it think the Number of fields is equal to the number of fields minus one, hence deleting the last field!
Note you can't say $NF="", because then it would display two underscores.

Related

how to transpose values two by two using shell?

I have my data in a file store by lines like this :
3.172704445659,50.011996744997,3.1821975358417,50.012335988197,3.2174797791605,50.023182479597
And I would like 2 columns :
3.172704445659 50.011996744997
3.1821975358417 50.012335988197
3.2174797791605 50.023182479597
I know the sed command to replace ',' (sed "s/,/ /"), but I don't know how to insert a line break after every two numbers.
Do you have any ideas ?
One in awk:
$ awk -F, '{for(i=1;i<=NF;i++)printf "%s%s",$i,(i%2&&i!=NF?OFS:ORS)}' file
Output:
3.172704445659 50.011996744997
3.1821975358417 50.012335988197
3.2174797791605 50.023182479597
Solution viable for those without knowledge of awk command - simple for loop over an array of numbers.
IFS=',' read -ra NUMBERS < file
NUMBERS_ON_LINE=2
INDEX=0
for NUMBER in "${NUMBERS[@]}"; do
    if (($INDEX==$NUMBERS_ON_LINE-1)); then
        INDEX=0
        echo "$NUMBER"
    else
        ((INDEX++))
        echo -n "$NUMBER "
    fi
done
Since you already tried sed, here is a solution using sed:
sed -r "s/(([^,]*,){2})/\1\n/g; s/,\n/\n/g" YOURFILE
-r uses sed extended regexp
there are two substitutions used:
the first substitution, with the (([^,]*,){2}) part, captures two comma separated numbers at once and store them into \1 for reuse: \1 holds in your example at the first match: 3.172704445659,50.011996744997,. Notice: both commas are present.
(([^,]*,){2}) means capture a sequence consisting of NOT comma - that is the [^,]* part followed by a ,
we want two such sequences - that is the (...){2} part
and we want to capture it for reuse in \1 - that is the outer pair of parentheses
then substitute with \1\n - that just inserts the newline after the match, in other words a newline after each second comma
as we have now a comma before the newline that we need to get rid of, we do a second substitution to achieve that:
s/,\n/\n/g
a comma followed by a newline is replaced with only the newline - in other words the comma is deleted
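To see what each substitution contributes, the two stages can be run separately on the sample line. This is a sketch assuming GNU sed. As far as I can tell, the two substitutions leave the comma inside each pair in place, so the last command adds a third substitution, which is my own addition and not part of the answer above, to turn the remaining commas into spaces.

```bash
line='3.172704445659,50.011996744997,3.1821975358417,50.012335988197,3.2174797791605,50.023182479597'

# First substitution only: a newline is inserted after every second comma.
step1=$(printf '%s\n' "$line" | sed -r 's/(([^,]*,){2})/\1\n/g')

# Both substitutions from the answer: the comma before each newline is removed.
step2=$(printf '%s\n' "$line" | sed -r 's/(([^,]*,){2})/\1\n/g; s/,\n/\n/g')

# Extra step (an assumption about the desired output): commas become spaces.
final=$(printf '%s\n' "$line" | sed -r 's/(([^,]*,){2})/\1\n/g; s/,\n/\n/g; s/,/ /g')

printf '%s\n' "$final"
```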
awk and sed are powerful tools, and in fact constitute programming languages in their own right. So, they can, of course, handle this task with ease.
But so can bash, which will have the benefits of being more portable (no outside dependencies), and executing faster (as it uses only built-in functions):
IFS=$', \n'
values=($(</path/to/file))
printf '%.13f %.13f\n' "${values[@]}"

sed is misbehaving when replacing certain regular expressions

I am trying to remove numbers - but only when they immediately follow periods. Similar replaces seem to work correctly, but not with periods.
I have tried the following which was given as a solution in another post:
echo "fr.r1.1.0" | sed s/\.[0-9][0-9]*/\./g
I get fr..... It seems that even though I escape the period it is matching arbitrary characters instead of only periods.
This expression seems to work for the previous example:
echo "fr.r1.1.0" | sed s/[[:punct:]][0-9][0-9]*/\./g
and gives me fr.r1.. but then for
echo "ge.s1_1.0" | sed s/[[:punct:]][0-9][0-9]*/\./g
I get ge.s1.. instead of ge.s1_1.
You will have to put the sed instructions between single quotes to avoid interpretation of some of the special characters by your shell:
echo "fr.r1.1.0" | sed 's/\.[0-9][0-9]*/\./g'
fr.r1..
Also you do not need to escape the dot in the replacement part (.) and [0-9][0-9]* can be simplified into [0-9]\+ giving the simplified command:
echo "fr.r1.1.0" | sed 's/\.[0-9]\+/./g'
fr.r1..
Last but not least, as POSIX [:punct:] character class is defined as
punctuation (all graphic characters except letters and digits)
https://en.wikibooks.org/wiki/Regular_Expressions/POSIX_Basic_Regular_Expressions
it will also include underscore (and a lot of other stuff), therefore, if you want to limit your matches to . followed by digits you will need to explicitly use dot (escaped or via its ascii value)
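The effect of the missing quotes can be made visible by printing the argument instead of handing it to sed. This is a small sketch; it assumes no file in the current directory happens to match the unquoted word as a glob pattern.

```bash
# Unquoted: the shell removes the backslashes before sed ever sees the script.
unquoted=$(printf '%s\n' s/\.[0-9][0-9]*/\./g)
# Single-quoted: the backslashes reach sed intact.
quoted=$(printf '%s\n' 's/\.[0-9][0-9]*/\./g')

echo "unquoted: $unquoted"   # s/.[0-9][0-9]*/./g
echo "quoted:   $quoted"     # s/\.[0-9][0-9]*/\./g

# With quotes, only a literal dot followed by digits is replaced.
fixed=$(echo "ge.s1_1.0" | sed 's/\.[0-9][0-9]*/./g')
echo "$fixed"                # ge.s1_1.
```

In the unquoted case the leading . in the pattern is an unescaped "any character", which is why r1 was also replaced in the question.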

How can I get substring from a string in linux?

I am trying to extract a specific string from a string in linux.
For example, I want to extract 'android.content.pm.PackageParser.parseBaseApplication' from the below string.
The String has a regular format and only the string within parenthesis is changeable.
Join point 'method-execution(boolean android.content.pm.PackageParser.parseBaseApplication(android.content.pm.PackageParser$Package, android.content.res.Resources, org.xmlpull.v1.XmlPullParser, android.util.AttributeSet, int, java.lang.String[]))' in Type
However, I have a trouble in finding a proper approach to do this.
At first, I tried the sed command, but it got too complicated and I couldn't complete the work.
Could you recommend any other approach to do this?
Thanks a lot.
If the string of interest is always the second word after the first (, then:
echo "..." | awk -F '[()]' '{split($2,a," "); printf a[2]}'
extracts it.
It splits the line using the delimiters ( and ), so $2 will be the text between the first ( and the next ( or ). split then splits $2 on spaces, and you get the second word, which is
android.content.pm.PackageParser.parseBaseApplication
for your example.
This looks like AOP syntax. So with certain assumption, this can be done as :
echo "Join point...." | cut -d'(' -f2 | cut -d' ' -f2
Explanation : cut based on ( and get second field, which is the method signature except parameters. Since we are not interested in return type as well, split the signature based on blank space and get the second field, which is the method name.
Based on your stated invariant, that the substring you're capturing is the only part that varies from file to file, here is a perl solution:
Extract=$(perl -ne 'print $1 if /\s*Join point \x27method-execution\(boolean\s+([^(]*)/' file_to_search)
echo $Extract
android.content.pm.PackageParser.parseBaseApplication
I used the full lead-in because it reduced the chance of false-positive, but if you find other things change and want to use yet a substring of that (e.g., "method-execution(boolean "), that's your choice to make.
This matches up to where the variant substring starts, and the variant runs to the next invariant, the open parenthesis, so we can just capture everything that is not an open parenthesis. Since it's probably some human interaction changing the variant, I allowed for extra spaces with the \s+ (one or more whitespace characters).
You could use almost the same regex with sed, but would need to consume the entire string to avoid it becoming part of the output. e.g., in shorthand:
sed -r 's/.*LEAD_IN(CAPTURE_TEXT).*/\1/'
Where LEAD_IN is the constant leader, "Join point..." and CAPTURE_TEXT the same capture group as in the perl solution. The main difference is the leading and trailing ".*" to consume the entire subject.
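Filled in, the shorthand might look like the sketch below. It uses a shortened lead-in ("method-execution(boolean") and [[:space:]]+ in place of \s+, both of which are my choices rather than part of the answer, and -E as the extended-regexp switch (equivalent to -r in GNU sed).

```bash
# The sample line from the question; \$ keeps $Package literal inside double quotes.
line="Join point 'method-execution(boolean android.content.pm.PackageParser.parseBaseApplication(android.content.pm.PackageParser\$Package, android.content.res.Resources, org.xmlpull.v1.XmlPullParser, android.util.AttributeSet, int, java.lang.String[]))' in Type"

# Leading and trailing .* consume the rest of the line, so only \1 is printed.
method=$(printf '%s\n' "$line" |
  sed -E 's/.*method-execution\(boolean[[:space:]]+([^(]*).*/\1/')

echo "$method"   # android.content.pm.PackageParser.parseBaseApplication
```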

How to remove characters from a word if they are also in the next word (sed)?

I'm trying to find a way to delete all characters in the first word IF that character is in the second word. The input looks like this:
computer cost
And the result should be: "mpuer" because the c, o and t were deleted. There are multiple lines like this separated by a return, the 2 words are separated by a space.
I've been searching quite some time for the solution, but I'm really stuck. All help is appreciated.
This might work for you:
echo "computer cost" |
sed ':a;s/\(.\)\(.* .*\1.*\)/\2/;ta;s/ .*//'
mpuer
Explanation:
Make a label for future branch command :a;
Delete a character in the first word that matches with the same character in the second word s/\(.\)\(.* .*\1.*\)/\2/
If the substitution occurred branch to label ta
When no more substitutions delete the second word. s/ .*//
The substitution regexp can be further explained:
\(.\) matches any character in word one (later referred to as \1)
\(.* .*\1.*\) matches any characters in the remainder of word one .* followed by a space followed by some or no characters in word two .* followed by a matching character from word one \1 followed by the remaining characters from word two .* this grouping will later be known as \2.
If the above matches replace it by \2 thus effectively deleting the matching character \1
This works (as does the solution by potong):
sed -e ': loop' \
    -e 's/\([a-z]*\)\([a-z]\)\([a-z]*\) \([a-z]*\2[a-z]*\)/\1\3 \4/' \
    -e 't loop' \
    -e 's/ .*//' \
    "$@"
The first line establishes a label. The third line branches to the label if there's been a successful substitute since the line was read and the last time the t was executed, so that establishes a loop while the substitute command finds something to do. The last line removes the word after the space once the loop is complete.
All eyes concentrate on the regexes, now. The key insight is that you can look for a repeat of a remembered pattern later in the string using \n where n is a digit. The first part of the regex partitions the line into 5 pieces. The first part is a (possibly empty) sequence of letters that aren't interesting; the second is a single letter that is interesting; the third is another (possibly empty) sequence of letters that aren't interesting; the fourth is the space separating the first word from the second. The final part can itself be subdivided into 3 parts, though they are all grouped together into a single capture expression. It consists of a sequence of zero or more uninteresting letters, a repeat of the interesting letter from the first word on the line (the \2), and another sequence of zero or more non-interesting letters.
The replacement string keeps the before and after parts of the first word, plus the space and the second word.
In combination, it finds each of the letters c, o and t in turn, eliminating them from the first word and leaving them alone in the second.
The conditional branching in sed is hard to use, but it can really score on occasion. When your hands are tied by the assignment like this, it makes the solution feasible.
$ al 'computer cost' 'encyclopedia brittanica' 'security privacy' |
> sed -e ': loop; s/\([a-z]*\)\([a-z]\)\([a-z]*\) \([a-z]*\2[a-z]*\)/\1\3 \4/; t loop; s/ .*//'
mpuer
eyloped
seut
$
al simply lists its arguments one per line - hence the mnemonic Argument List:
#include <stdio.h>
int main(int argc, char **argv)
{
    while (*++argv)
        puts(*argv);
    return 0;
}
Potong's solution is essentially equivalent to a 'Code Golf' version of mine:
sed ':a;s/\(.\)\(.* .*\1.*\)/\2/;ta;s/ .*//'
It uses the same general technique that mine does, but simplifies the regex. One simplification is the use of . (any character) in place of [a-z] (any letter). Another is to realize that the leading pattern doesn't matter; it will be left alone. The last is to group the tail of the first word with the whole of the second. In retrospect, I could (should?) have added a ^ anchor to my pattern. Potong's label is simply a.
You can do this with tr:
echo computer cost | while read x y; do echo "$x" | tr -d "$y"; done
if you have a file (words) like
computer cost
computer mop
The following command will do the replacement:
while read x y; do echo "$x" | tr -d "$y"; done < words
If you want to use sed, just replace tr -d "$y" with sed "s/[$y]//g"
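The same idea also works without any external command, using bash's pattern substitution. This is a sketch only: the helper name strip_shared is made up, and it assumes the second word is non-empty and consists of plain letters (characters such as ], ^ or - would be special inside the bracket expression).

```bash
# Hypothetical helper: print the first word minus every character of the second.
strip_shared() {
  local first=$1 second=$2
  # ${var//pattern/} deletes every match of pattern; [$second] is a bracket
  # expression listing the characters of the second word.
  printf '%s\n' "${first//[$second]/}"
}

printf '%s\n' 'computer cost' 'encyclopedia brittanica' 'security privacy' |
while read -r first second; do
  strip_shared "$first" "$second"
done
```

For the three sample lines this prints mpuer, eyloped and seut, the same results as the sed loop shown earlier in the thread.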

sed: Find pattern over two lines, not replace after that pattern

Wow, this one has really got me. Gonna need some tricky sed skill here I think. Here is the output value of command text I'm trying to replace:
...
fast
n : abstaining from food
The value I'd like to replace it with, is:
...
Noun
: abstaining from food
This turns out to be trickier than I thought, because 'fast' is listed a number of times and also appears at the beginning of the line in other places. So I came up with this to define the range:
sed '/fast/,/^ n : / s/fast/Noun/'
Which I thought would do, but... Unfortunately, this doesn't end the replacement and the rest of the output following this match are replaced with Noun. How to get sed to stop replacement after the match? Even better, can I find a two line pattern match and replace it?
Try this:
sed "h; :b; \$b ; N; /^${1}\n n/ {h;x;s//Noun\n/; bb}; \$b ; P; D"
Unfortunately, Paul's answer reads the whole file in which makes any additional processing you might want to do difficult. This version reads the lines in pairs.
By enclosing the sed script in double quotes instead of single quotes, you can include shell variables such as positional parameters. I would recommend surrounding them with curly braces so they are set apart from the adjacent characters. When using double quotes, you'll have to be careful of the shell wanting to do its various expansions. In this example, I've escaped the dollar signs that signify the last line of the input file for the branch commands. Otherwise the shell will try to substitute the value of a variable $b which is likely to be null thus making sed unhappy.
Another technique would be to use single quotes and close and open them each time you have a shell variable:
sed 'h; :b; $b ; N; /^'${1}'\n n/ {h;x;s//Noun\n/; bb}; $b ; P; D'
#   ↑open        close↑    ↑open                            close↑
I'm assuming that the "[/code]" in your expected result is a typo. Let me know if it's not.
This seems to do what you want:
sed -e ':a;N;$!ba;s/fast\n n/Noun\n/'
I essentially stole the answer from here.
This might work for you:
sed '$!N;s/^fast\n\s*n :/Noun\n :/;P;D' file
...
Noun
: abstaining from food
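To see that the pairwise approach leaves other occurrences of 'fast' alone, it can be tried on a small sample. The sample lines below are invented for the demonstration (the real output was elided in the question), and the sketch assumes GNU sed because of \s and \n.

```bash
# Made-up sample: "fast" appears twice, but only once before the noun line.
sample=$(printf '%s\n' 'fast' ' adj : acting quickly' 'fast' ' n : abstaining from food')

# $!N joins each line with the next one; P and D then slide the
# two-line window forward by one line.
result=$(printf '%s\n' "$sample" | sed '$!N;s/^fast\n\s*n :/Noun\n :/;P;D')

printf '%s\n' "$result"
```

Only the 'fast' that is directly followed by the n line is replaced; the first one is printed unchanged.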

Resources