Perl Unable to truncate string

Perl Unable to truncate string - string

I am trying to extract AAA and BBB from the output of the command "dspmq".
$dspmq <- this command gives output as -->
QMNAME(AAA) STATUS(Running)
QMNAME(BBB) STATUS(Running)
But it doesn't work with the below code.
perl -e 'use Data::Dumper qw(Dumper);my #qmgrlist = `dspmq`;$size = #qmgrlist;foreach my $i (#qmgrlist){my #temp1 = split /QMNAME\(/, $i;print #temp1;}'
AAA) STATUS(Running)
BBB) STATUS(Running)
I am able to truncate "QMNAME(" but unable to truncate those to the right of AAA and BBB. Basically I want to get the string between "QMNAME(" and the immediate ")". Please assist.

I think a regex approach is better than split() here, but you could use split() by splitting on parentheses and taking the second item in the returned list.
for (#qmgrlist) {
say +(split /[()]/)[0];
}
And a brief note on your use of command-line options to run this code. You can make it simpler if you a) pipe the output of qspmq into your code and b) use -n to process a record at a time.
$ perl -nE 'say +(split /[()]/)[1]' `dspmq`
There's also -M to load modules (e.g. -MData::Dumper), but you don't seem to be using Data::Dumper any more.

split isn't going to do what you need. I would just use a regular expression to match the sub-string you need
So change the loop from this
foreach my $i (#qmgrlist)
{
my #temp1 = split /QMNAME\(/, $i;
print #temp1;
}
to this
foreach my $i (#qmgrlist)
{
print "$1\n"
if /QMNAME\((.+?)\)/;
}

Try this perl one-liner:
dspmq | perl -lne 'print for m{ QMNAME [(] ( [^)]* ) [)] }x'
Here, dspmq STDOUT is fed using a pipe | into STDIN of the perl code, which has these flags:
-e tells Perl interpreter to look for the code inline rather than in a separate script file.
-n feeds the input line by line to the inline code (this way you do not need to store the output in an array - this matters for large outputs, not in your case).
-l strips the input record separator (newline on *NIX) before feeding it to the code, and appends it automatically after during print.
The print ... for ... m{... (...) ...} code prints every pattern captured in parentheses.
The captured pattern is [^)]*, which is maximum number (0 or more) chars that are not (^) listed in the character class, that is, that are not closing parens.
[(] ... [)] are literal parentheses escaped as character classes for readability. I prefer this to escaping like so: \( ... \).
QMNAME is used to make the programmer's intentions clear: you want the string that follows QMNAME in parens. I prefer this to using the field index, such as 1, which protects you against minor variation in output of your command used with different options, on different systems, etc.
Finally, the x regex modifier in m{...}x enables comments and whitespace to be ignored, and is preferred for readability.
RELATED:
Cutting the output of a dspmq command

Desired output can be achieved with following code
use strict;
use warnings;
use feature 'say';
map{ say $1 if /QMNAME\((.+?)\)/ } <DATA>;
__DATA__
QMNAME(AAA) STATUS(Running)
QMNAME(BBB) STATUS(Running)
output
AAA
BBB
and one liner (not tested - I am on Windows computer)
dspmq | perl -lne 'print $1 if /QMNAME\((.+?)\)/'

Related

Way to replace one variable with another in a string

I need to replace one variable with another variable in a multiple strings.
For example:
string1="One,two"
string2="three.four"
string3="five:six"
y=";"
for str in string1 string2 string3; do
x="$(echo "$str" | sed 's/[a-zA-Z]//g')" # extracting a character between letters
sed 's/$x/$y/'$str # I tried this, but it does not work at all.
echo "$str"
done
Expecting output:
One;two
three;four
five;six
In my output, nothing changes:
One,two
three.four
five:six

You can use bash's substitution operator instead of sed. And simply replace anything that isn't a letter with $y.
#!/bin/bash
string1="One,two"
string2="three.four"
string3="five:six"
y=";"
for str in "$string1" "$string2" "$string3"; do
x=${str//[^a-zA-Z]+/$y}
echo "$x"
done
Output is:
One;two
three;four
five;six
Note that your general approach wouldn't work if the input string has muliple delimiters, e.g. One,two,three. When you remove all the letters you get ,,, but that doesn't appear anywhere in the string.

Addressing issues with OP's current code:
referencing variables requires a leading $, preferably a pair of {}, and (usually) double quotes (eg, to insure embedded spaces are considered as part of the variable's value)
sed can take as input a) a stream of text on stdin, b) a file, c) process substitution or d) a here-document/here-string
when building a sed script that includes variable refences the sed script must be wrapped in double quotes (not single quotes)
Pulling all of this into OP's current code we get:
string1="One,two"
string2="three.four"
string3="five:six"
y=";"
for str in "${string1}" "${string2}" "${string3}"; do # proper references of the 3x "stringX" variables
x="$(echo "$str" | sed 's/[a-zA-Z]//g')"
sed "s/$x/$y/" <<< "${str}" # feeding "str" as here-string to sed; allowing variables "x/y" to be expanded in the sed script
echo "$str"
done
This generates:
One;two # generated by the 2nd sed call
One,two # generated by the echo
;hree.four # generated by the 2nd sed call
three.four # generated by the echo
five;six # generated by the 2nd sed call
five:six # generated by the echo
OK, so we're now getting some output but there are obviously some issues:
the results of the 2nd sed call are being sent to stdout/terminal as opposed to being captured in a variable (presumably the str variable - per the follow-on echo ???)
for string2 we find that x=. which when plugged into the 2nd sed call becomes sed "s/./;/"; from here the . matches the first character it finds which in this case is the 1st t in string2, so the output becomes ;hree.four (and the . is not replaced)
dynamically building sed scripts without knowing what's in x (and y) becomes tricky without some additional coding; instead it's typically easier to use parameter substitution to perform the replacements for us
in this particular case we can replace both sed calls with a single parameter substitution (which also eliminates the expensive overhead of two subprocesses for the $(echo ... | sed ...) call)
Making a few changes to OP's current code we can try:
string1="One,two"
string2="three.four"
string3="five:six"
y=";"
for str in "${string1}" "${string2}" "${string3}"; do
x="${str//[^a-zA-Z]/${y}}" # parameter substitution; replace everything *but* a letter with the contents of variable "y"
echo "${str} => ${x}" # display old and new strings
done
This generates:
One,two => One;two
three.four => three;four
five:six => five;six

How can I truncate a line of text longer than a given length?

How would you go about removing everything after x number of characters? For example, cut everything after 15 characters and add ... to it.
This is an example sentence should turn into This is an exam...

GnuTools head can use chars rather than lines:
head -c 15 <<<'This is an example sentence'
Although consider that head -c only deals with bytes, so this is incompatible with multi-bytes characters like UTF-8 umlaut ü.
Bash built-in string indexing works:
str='This is an example sentence'
echo "${str:0:15}"
Output:
This is an exam
And finally something that works with ksh, dash, zsh…:
printf '%.15s\n' 'This is an example sentence'
Even programmatically:
n=15
printf '%.*s\n' $n 'This is an example sentence'
If you are using Bash, you can directly assign the output of printf to a variable and save a sub-shell call with:
trim_length=15
full_string='This is an example sentence'
printf -v trimmed_string '%.*s' $trim_length "$full_string"

Use sed:
echo 'some long string value' | sed 's/\(.\{15\}\).*/\1.../'
Output:
some long strin...
This solution has the advantage that short strings do not get the ... tail added:
echo 'short string' | sed 's/\(.\{15\}\).*/\1.../'
Output:
short string
So it's one solution for all sized outputs.

Use cut:
echo "This is an example sentence" | cut -c1-15
This is an exam
This includes characters (to handle multi-byte chars) 1-15, c.f. cut(1)
-b, --bytes=LIST
select only these bytes
-c, --characters=LIST
select only these characters

Awk can also accomplish this:
$ echo 'some long string value' | awk '{print substr($0, 1, 15) "..."}'
some long strin...
In awk, $0 is the current line. substr($0, 1, 15) extracts characters 1 through 15 from $0. The trailing "..." appends three dots.

Todd actually has a good answer however I chose to change it up a little to make the function better and remove unnecessary parts :p
trim() {
if (( "${#1}" > "$2" )); then
echo "${1:0:$2}$3"
else
echo "$1"
fi
}
In this version the appended text on longer string are chosen by the third argument, the max length is chosen by the second argument and the text itself is chosen by the first argument.
No need for variables :)

Using Bash Shell Expansions (No External Commands)
If you don't care about shell portability, you can do this entirely within Bash using a number of different shell expansions in the printf builtin. This avoids shelling out to external commands. For example:
trim () {
local str ellipsis_utf8
local -i maxlen
# use explaining variables; avoid magic numbers
str="$*"
maxlen="15"
ellipsis_utf8=$'\u2026'
# only truncate $str when longer than $maxlen
if (( "${#str}" > "$maxlen" )); then
printf "%s%s\n" "${str:0:$maxlen}" "${ellipsis_utf8}"
else
printf "%s\n" "$str"
fi
}
trim "This is an example sentence." # This is an exam…
trim "Short sentence." # Short sentence.
trim "-n Flag-like strings." # Flag-like strin…
trim "With interstitial -E flag." # With interstiti…
You can also loop through an entire file this way. Given a file containing the same sentences above (one per line), you can use the read builtin's default REPLY variable as follows:
while read; do
trim "$REPLY"
done < example.txt
Whether or not this approach is faster or easier to read is debatable, but it's 100% Bash and executes without forks or subshells.

bash Changing every other comma to point

I am working with set of data which is written in Swedish format. comma is used instead of point for decimal numbers in Sweden.
My data set is like this:
1,188,1,250,0,757,0,946,8,960
1,257,1,300,0,802,1,002,9,485
1,328,1,350,0,846,1,058,10,021
1,381,1,400,0,880,1,100,10,418
Which I want to change every other comma to point and have output like this:
1.188,1.250,0.757,0.946,8.960
1.257,1.300,0.802,1.002,9.485
1.328,1.350,0.846,1.058,10.021
1.381,1.400,0.880,1.100,10.418
Any idea of how to do that with simple shell scripting. It is fine If I do it in multiple steps. I mean if I change first the first instance of comma and then the third instance and ...
Thank you very much for your help.

Using sed
sed 's/,\([^,]*\(,\|$\)\)/.\1/g' file
1.188,1.250,0.757,0.946,8.960
1.257,1.300,0.802,1.002,9.485
1.328,1.350,0.846,1.058,10.021
1.381,1.400,0.880,1.100,10.418

For reference, here is a possible way to achieve the conversion using awk:
awk -F, '{for(i=1;i<=NF;i=i+2) {printf $i "." $(i+1); if(i<NF-2) printf FS }; printf "\n" }' file
The for loop iterates every 2 fields separated by a comma (set by the option -F,) and prints the current element and the next one separated by a dot.
The comma separator represented by FS is printed except at the end of line.

As a Perl one-liner, using split and array manipulation:
perl -F, -e '#a = #b = (); while (#b = splice #F, 0, 2) {
push #a, join ".", #b} print join ",", #a' file
Output:
1.188,1.250,0.757,0.946,8.960
1.257,1.300,0.802,1.002,9.485
1.328,1.350,0.846,1.058,10.021
1.381,1.400,0.880,1.100,10.418

Many sed dialects allow you to specify which instance of a pattern to replace by specifying a numeric option to s///.
sed -e 's/,/./9' -e 's/,/./7' -e 's/,/./5' -e 's/,/./3' -e 's/,/./'
ISTR some sed dialects would allow you to simplify this to
sed 's/,/./1,2'
but this is not supported on my Debian.
Demo: http://ideone.com/6s2lAl

Using Perl to remove n characters from the end of multiple lines

I want to remove n characters from each line using PERL.
For example, I have the following input:
catbathatxx (length 11; 11%3=2 characters) (Remove 2 characters from this line)
mansunsonx (length 10; 10%3=1 character) (Remove 1 character from this line)
#!/usr/bin/perl -w
open FH, "input.txt";
#array=<FH>;
foreach $tmp(#array)
{
$b=length($tmp)%3;
my $c=substr($tmp, 0, length($tmp)-$b);
print "$c\n";
}
I want to output the final string (after the characters have been removed).
However, this program is not giving the correct result. Can you please guide me on what the mistake is?
Thanks a lot. Please let me know if there are any doubts/clarifications.

I am assuming trailing whitespace is not significant.
#!/usr/bin/env perl
use strict; use warnings;
use constant MULTIPLE_OF => 3;
while (my $line = <DATA>) {
$line =~ s/\s+\z//;
next unless my $length = length $line;
my $chars_to_remove = $length % MULTIPLE_OF;
$line =~ s/.{$chars_to_remove}\z//;
print $line, "\n";
}
__DATA__
catbathatxx
mansunsonx
0123456789
012345678

The \K regex sequence makes this a lot clearer; it was introduced in Perl v5.10.0.
The code looks like this
use 5.10.0;
use warnings;
for (qw/ catbathatxx mansunsonx /) {
(my $s = $_) =~ s/^ (?:...)* \K .* //x;
say $s;
}
output
catbathat
mansunson

In general you would want to post the result you are getting. That being said...
Each line in the file has a \n (or \r\n on windows) on the end of it that you're not accounting for. You need to chomp() the line.
Edit to add: My perl is getting rusty from non-use but if memory serves me correct you can actually chomp() the entire array after reading the file: chomp(#array)

You should use chomp() on your array, like this:
#array=<FH>;
chomp(#array);

perl -plwe 'chomp; $c = length($_) % 3; chop while $c--' < /tmp/zock.txt
Look up the options in perlrun. Note that line endings are characters, too. Get them out of the way using chomp; re-add them on output using the -l option. Use chop to efficiently remove characters from the end of a string.

Reading your code, you are trying to print just the first 'nx3' characters for the largest value of n for each line.
The following code does this using a simple regular expression.
For each line, it first removes the line ending, then greedy matches
as many .{3} as it can (. matches any character, {3} asks for exactly 3 of them).
The memory requirement of this approach (compared with using an array the size of your file) is fixed. Not too important if your file is small compared with your free memory, but sometimes files are gigabytes, and sometimes memory is very small.
It's always worth using variable names that reflect the purpose of the variable, rather than things like $a or #array. In this case I used only one variable, which I called $line.
It's also good practice to close files as soon as you have finished with them.
#!/usr/bin/perl
use strict;
use warnings; # This will apply warnings even if you use command perl to run it
open FH, '<', 'input.txt'; # Use three part file open - single quote where no interpolation required.
for my $line (<FH>){
chomp($line);
$line =~ s/((.{3})*).*/$1\n/;
print $line;
}
close FH;

How do I count the number of occurrences of a string in an entire file?

Is there an inbuilt command to do this or has anyone had any luck with a script that does it?
I am looking to count the number of times a certain string (not word) appears in a file. This can include multiple occurrences per line so the count should count every occurrence not just count 1 for lines that have the string 2 or more times.
For example, with this sample file:
blah(*)wasp( *)jkdjs(*)kdfks(l*)ffks(dl
flksj(*)gjkd(*
)jfhk(*)fj (*) ks)(*gfjk(*)
If I am looking to count the occurrences of the string (*) I would expect the count to be 6, i.e. 2 from the first line, 1 from the second line and 3 from the third line. Note how the one across lines 2-3 does not count because there is a LF character separating them.
Update: great responses so far! Can I ask that the script handle the conversion of (*) to \(*\), etc? That way I could just pass any desired string as an input parameter without worrying about what conversion needs to be done to it so it appears in the correct format.

You can use basic tools such as grep and wc:
grep -o '(\*)' input.txt | wc -l

Using perl's "Eskimo kiss" operator with the -n switch to print a total at the end. Use \Q...\E to ignore any meta characters.
perl -lnwe '$a+=()=/\Q(*)/g; }{ print $a;' file.txt
Script:
use strict;
use warnings;
my $count;
my $text = shift;
while (<>) {
$count += () = /\Q$text/g;
}
print "$count\n";
Usage:
perl script.pl "(*)" file.txt

This loops over the lines of the file, and on each line finds all occurrences of the string "(*)". Each time that string is found, $c is incremented. When there are no more lines to loop over, the value of $c is printed.
perl -ne'$c++ while /\(\*\)/g;END{print"$c\n"}' filename.txt
Update: Regarding your comment asking that this be converted into a solution that accepts a regex as an argument, you might do it like this:
perl -ne'BEGIN{$re=shift;}$c++ while /\Q$re/g;END{print"$c\n"}' 'regex' filename.txt
That ought to do the trick. If I felt inclined to skim through perlrun again I might see a more elegant solution, but this should work.
You could also eliminate the explicit inner while loop in favor of an implicit one by providing list context to the regexp:
perl -ne'BEGIN{$re=shift}$c+=()=/\Q$re/g;END{print"$c\n"}' 'regex' filename.txt

You can use basic grep command:
Example: If you want to find the no of occurrence of "hello" word in a file
grep -c "hello" filename
If you want to find the no of occurrence of a pattern then
grep -c -P "Your Pattern"
Pattern example : hell.w, \d+ etc

I have used below command to find particular string count in a file
grep search_String fileName|wc -l

text="(\*)"
grep -o $text file | wc -l
You can make it into a script which accepts arguments like this:
script count:
#!/bin/bash
text="$1"
file="$2"
grep -o "$text" "$file" | wc -l
Usage:
./count "(\*)" file_path

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Perl Unable to truncate string - string

split isn't going to do what you need. I would just use a regular expression to match the sub-string you need So change the loop from this foreach my $i (#qmgrlist) { my #temp1 = split /QMNAME\(/, $i; print #temp1; } to this foreach my $i (#qmgrlist) { print "$1\n" if /QMNAME\((.+?)\)/; }

Related

Way to replace one variable with another in a string

How can I truncate a line of text longer than a given length?

bash Changing every other comma to point

Using Perl to remove n characters from the end of multiple lines

How do I count the number of occurrences of a string in an entire file?

Categories

Resources