How preserve space separated groups in bash - string

I want to build a string with contains quoted groups of words.
These groups should go to same function argument.
I tried to play with arrays.
Literally constructed arrays works, but I still hope to find
a magic syntax hack for bare string.
# literal array
LA=(a "b c")
function printArgs() { # function should print 2 lines
while [ $# -ne 0 ] ; do print $1 ; shift; done
}
printArgs "${LA[#]}" # works fine
# but how to use string to split only unquoted spaces?
LA="a \"b c\""
printArgs "${LA[#]}" # doesn't work :(
LA=($LA)
printArgs "${LA[#]}" # also doesn't work :(
bash arrays have a problem they are not transferable over conveyor
- (echo/$()).

A dirty approach would be :
#!/bin/bash
LA=(a "b c")
function printArgs()
{ # function should print 2 lines
while [ $# -ne 0 ]
do
echo "${1//_/ }" #Use parameter expansion to globally replace '_' with space
#Do double quote as we don't want to have word splitting
shift
done
}
printArgs "${LA[#]}" # works fine
LA="a b__c" # Use a place holder '_' for space, note the two '_' for two spaces
printArgs $LA #Don't double quote '$LA' here. We wish word splitting to happen. And works fine :-)
Sample Output
a
b c
a
b c
Note that the number of spaces inside grouped entities are preserved
Sidenote
The choice of place-holder is critical here. Hopefully you could find one that won't appear in the actual string.

Related

How do you interpret ${VAR#*:*:*} in Bourne Shell

I am using Bourne Shell. Need to confirm if my understanding of following is correct?
$ echo $SHELL
/bin/bash
$ VAR="NJ:NY:PA" <-- declare an array with semicolon as separator?
$ echo ${VAR#*} <-- show entire array without separator?
NJ:NY:PA
$ echo ${VAR#*:*} <-- show array after first separator?
NY:PA
$ echo ${VAR#*:*:*} <-- show string after two separator
PA
${var#pattern} is a parameter expansion that expands to the value of $var with the shortest possible match for pattern removed from the front of the string.
Thus, ${VAR#*:} removes everything up and including to the first :; ${VAR#*:*:} removes everything up to and including the second :.
The trailing *s on the end of the expansions given in the question don't have any use, and should be avoided: There's no reason whatsoever to use ${var#*:*:*} instead of ${var#*:*:} -- since these match the smallest amount of text possible, and * is allowed to expand to 0 characters, the final * matches and removes nothing.
If what you really want is an array, you might consider using a real array instead.
# read contents of string VAR into an array of states
IFS=: read -r -a states <<<"$VAR"
echo "${states[0]}" # will echo NJ
echo "${states[1]}" # will echo NY
echo "${#states[#]}" # count states; will emit 3
...which also gives you the ability to write:
printf ' - %s\n' "${states[#]}" # put *all* state names into an argument list

Concatenating remaining arguments beyond the first N in bash

I did not have to write any bash script before. Here is what I need to do.
My script will be run with a set of string arguments. Number of stings will be more than 8. I will have to concatenate strings 9 and onward and make a single string from those. Like this...
myscript s1 s2 s3 s4 s5 s6 s7 s8 s9 s10....(total unknown)
in the script, I need to do this...
new string = s9 + s10 + ...
I am trying something like this...(from web search).
array="${#}"
tLen=${#array[#]}
# use for loop to read string beyond 9
for (( i=8; i<${tLen}; i++ ));
do
echo ${array[$i]} --> just to show string beyond 9
done
Not working. It prints out if i=0. Here is my input.
./tastest 1 2 3 4 5 6 7 8 A B C
I am expecting A B C to be printed. Finally I will have to make ABC.
Can anyone help?
It should be a lot simpler than the looping in the question:
shift 8
echo "$*"
Lose arguments 1-8; print all the other arguments as a single string with a single space separating arguments (and spaces within arguments preserved).
Or, if you need it in a variable, then:
nine_onwards="$*"
Or if you can't throw away the first 8 arguments in the main shell process:
nine_onwards="$(shift 8; echo "$*")"
You can check that there are at least 9 arguments, of course, complaining if there aren't. Or you can accept an empty string instead — with no error.
And if the arguments must be concatenated with no space (as in the amendment to the question), then you have to juggle with $IFS:
nine_onwards="$(shift 8; IFS=""; echo "$*")"
If I'm interpreting the comments from below this answer correctly, then you want to save the first 8 arguments in 8 separate simple (non-array) variables, and then arguments 9 onwards in another simple variable with no spaces between the argument values.
That's trivially doable:
var1="$1"
var2="$2"
var3="$3"
var4="$4"
var5="$5"
var6="$6"
var7="$7"
var8="$8"
var9="$(shift 8; IFS=""; echo "$*")"
The names don't have to be as closely related as those are. You could use:
teflon="$1"
absinthe="$2"
astronomy="$3"
lobster="$4"
darkest_peru="$5"
mp="$6"
culinary="$7"
dogma="$8"
concatenation="$(shift 8; IFS=""; echo "$*")"
You don't have to do them in that order, either; any sequence (permutation) will do nicely.
Note, too, that in the question, you have:
array="${#}"
Despite the name, that creates a simple variable containing the arguments. To create an array, you must use parentheses like this, where the spaces are optional:
array=( "$#" )
# Create a 0-index-based copy of the array of input arguments.
# (You could, however, work with the 1-based pseudo array $# directly.)
array=( "${#}" )
# Print a concatenation of all input arguments starting with the 9th
# (starting at 0-based index 8), which are passed *individually* to
# `printf`, due to use of `#` to reference the array [slice]
# `%s` as the `printf` format then joins the elements with no separator
# (and no trailing \n).
printf '%s' "${array[#]:8}"
# Alternative: Print the elements separated with a space:
# Note that using `*` instead of `#` causes the array [slice] to be expanded
# to a *single* string using the first char. in `$IFS` as the separator,
# which is a space by default; here you could add a trailing \n by using
# '%s\n' as the `printf` format string.
printf '%s' "${array[*]:8}"
Note that array="${#}" does not create an array - it simply creates a string scalar comprising the concatenation of the input array's elements (invariably) separated by a space each; to create an array, you must enclose it in (...).
To create a space-separated single string from the arguments starting with the 9th enclosed in double quotes, as you request in your follow-up question, use the following:
printf -v var10 '"%s"' "${array[*]:8}"
With the last sample call from your question $var10 will then contain literal "A B C", including the double quotes.
As for assigning arguments 1 through 8 to individual variables.:
Jonathan Leffler's helpful answer shows how to save the first 8 arguments in individual variables.
Here's an algorithmic alternative that creates individual variables based on a given name prefix and sequence number:
n=8 # how many arguments to assign to individual variables
# Create n 'var<i>' variables capturing the first n arguments.
i=0 # variable sequence number
for val in "${array[#]:0:n}"; do
declare "var$((++i))=$val" # create $var<i>, starting with index 1
done
# Print the variables created and their values, using variable indirection.
printf "\nvar<i> variables:\n"
for varName in "${!var#}"; do
printf '%s\n' "$varName=${!varName}"
done
You are close - something like this would work:
array=( ${*} )
# use for loop to read string beyond 9
for (( i=8; i<${#array[*]}; i++ ));
do
echo -n ${array[$i]}
done

Perl: Count number of times a word appears in text and print out surrounding words

I want to do two things:
1) count the number of times a given word appears in a text file
2) print out the context of that word
This is the code I am currently using:
my $word_delimiter = qr{
[^[:alnum:][:space:]]*
(?: [[:space:]]+ | -- | , | \. | \t | ^ )
[^[:alnum:]]*
}x;
my $word = "hello";
my $count = 0;
#
# here, a file's contents are loaded into $lines, code not shown
#
$lines =~ s/\R/ /g; # replace all line breaks with blanks (cannot just erase them, because this might connect words that should not be connected)
$lines =~ s/\s+/ /g; # replace all multiple whitespaces (incl. blanks, tabs, newlines) with single blanks
$lines = " ".$lines." "; # add a blank at beginning and end to ensure that first and last word can be found by regex pattern below
while ($lines =~ m/$word_delimiter$word$word_delimiter/g ) {
++$count;
# here, I would like to print the word with some context around it (i.e. a few words before and after it)
}
Three problems:
1) Is my $word_delimiter pattern catching all reasonable characters I can expect to separate words? Of course, I would not want to separate hyphenated words, etc. [Note: I am using UTF-8 throughout but only English and German text; and I understand what reasonably separates a word might be a matter of judgment]
2) When the file to be analzed contains text like "goodbye hello hello goodbye", the counter is incremented only once, because the regex only matches the first occurence of " hello ". After all, the second time it could find "hello", it is not preceeded by another whitespace. Any ideas on how to catch the second occurence, too? Should I maybe somehow reset pos()?
3) How to (reasonably efficiently) print out a few words before and after any matched word?
Thanks!
1. Is my $word_delimiter pattern catching all reasonable characters I can expect to separate words?
Word characters are denoted by the character class \w. It also matches digits and characters from non-roman scripts.
\W represents the negated sense (non-word characters).
\b represents a word boundary and has zero-length.
Using these already available character classes should suffice.
2. Any ideas on how to catch the second occurence, too?
Use zero-length word boundaries.
while ( $lines =~ /\b$word\b/g ) {
++$count;
}

string alignment in perl / match alignment

I have two strings $dna1 and $dna2. Print the two strings as concatenated, and then print the second string lined up over its copy at the end of the concatenated strings. For example, if the input
strings are AAAA and TTTT, print:
AAAATTTT
TTTT
this is a self exercise question .. not a homework ,
i tried using index
#!/usr/bin/perl -w
$a ='AAAAAAAAAATTTTTTTTT';
$b ='TTTTTTTTTT';
print $a,"\n";
print ''x index($a,$b),$b,"\n";
but it is not working as needed .help please
Start by checking what index($a,$b) is returning... Perhaps you should pick a $b that's actually in $a!
Then realise that concatenating 10 instances of an empty string is an empty string, not 10 spaces.
This is a fun little exercise. I did this:
perl -lwe'$a="AAAA"; $b="TTTT"; $c = $a.$b; $i = index($c,$b) + length($b);
print $c; printf "%${i}s\n", $b;'
AAAAAAATTTT
TTTT
Note that generally speaking, using the variable names $a through $c is a bad idea, and only acceptable here because it is a one-liner. $a and $b are also reserved variable names used with sort.

How to grep/split a word in middle of %% or $$

I have a variable from which I have to grep the which in middle of %% adn the word which starts with $$. I used split it works... but for only some scenarios.
Example:
#!/usr/bin/perl
my $lastline ="%Filters_LN_RESS_DIR%\ARC\Options\Pega\CHF_Vega\$$(1212_GV_DATE_LDN)";
my #lastline_temp = split(/%/,$lastline);
print #lastline_temp;
my #var=split("\\$\\$",$lastline_temp[2]);
print #var;
I get the o/p as expected. But can i get the same using Grep command. I mean I dont want to use the array[2] or array[1]. So that I can replace the values easily.
I don't really see how you can get the output you expect. Because you put your data in "busy" quotes (interpolating, double, ...), it comes out being stored as:
'%Filters_LN_RESS_DIR%ARCOptionsPegaCHF_Vega$01212_GV_DATE_LDN)'
See Quote and Quote-like Operators and perhaps read Interpolation in Perl
Notice that the backslashes are gone. A backslash in interpolating quotes simply means "treat the next character as literal", so you get literal 'A', literal 'O', literal 'P', ....
That '0' is the value of $( (aka $REAL_GROUP_ID) which you unwittingly asked it to interpolate. So there is no sequence '$$' to split on.
Can you get the same using a grep command? It depends on what "the same" is. You save the results in arrays, the purpose of grep is to exclude things from the arrays. You will neither have the arrays, nor the output of the arrays if you use a non-trivial grep: grep {; 1 } #data.
Actually you can get the exact same result with this regular expression, assuming that the single string in #vars is the "result".
m/%([^%]*)$/
Of course, that's no more than
substr( $lastline, rindex( $lastline, '%' ) + 1 );
which can run 8-10 times faster.
First, be very careful in your use of quotes, I'm not sure if you don't mean
'%Filters_LN_RESS_DIR%\ARC\Options\Pega\CHF_Vega\$$(1212_GV_DATE_LDN)'
instead of
"%Filters_LN_RESS_DIR%\ARC\Options\Pega\CHF_Vega\$$(1212_GV_DATE_LDN)"
which might be a different string. For example, if evaluated, "$$" means the variable $PROCESS_ID.
After trying to solve riddles (not sure about that), and quoting your string
my $lastline =
'%Filters_LN_RESS_DIR%\ARC\Options\Pega\CHF_Vega\$$(1212_GV_DATE_LDN)'
differently, I'd use:
my ($w1, $w2) = $lastline =~ m{ % # the % char at the start
([^%]+) # CAPTURE everything until next %
[^(]+ # scan to the first brace
\( # hit the brace
([^)]+) # CAPTURE everything up to closing brace
}x;
print "$w1\n$w2";
to extract your words. Result:
Filters_LN_RESS_DIR
1212_GV_DATE_LDN
But what do you mean by replace the values easily. Which values?
Addendum
Now lets extract the "words" delimited by '\'. Using a simple split:
my #words = split /\\/, # use substr to start split after the first '\\'
substr $lastline, index($lastline,'\\');
you'll get the words between the backslashes if you drop the last entry (which is the $$(..) string):
pop #words; # remove the last element '$$(..)'
print join "\n", #words; # print the other elements
Result:
ARC
Options
Pega
CHF_Vega
Does this work better with grep? Seems to:
my #words = grep /^[^\$%]+$/, split /\\/, $lastline;
and
print join "\n", #words;
also results in:
ARC
Options
Pega
CHF_Vega
Maybe that is what you are after? What do you want to do with these?
Regards
rbo

Resources