Perl String Modifying - string

Say I have a file called hello.txt which contains "Hello World!".
If I wanted to make a script which opened the file and read the contents (I know how to do that) and added stuff to the string, how would I go about doing this?
For example: Hello World would have '..' inserted at the start of the string/content, and then every 2 characters later, except at the end. Also consider the contents will not always be "Hello World".

Since you already know how to read from a file, I take it your only real question is how to add .. after every 2 characters of any given string:
my $string = "Hello World";
$string =~ s/^|(..)(?!$)/$1../g;
print "$string\n";
Output:
..He..ll..o ..Wo..rl..d
Though I can't imagine how that would ever be useful.
The regex looks for the start of string or two characters not followed by the end of the string, using negative look-ahead, and replaces all matches with any captured characters followed by two periods.
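For readers working in Bash rather than Perl, the same insertion can be sketched as a plain loop over two-character chunks. This is only an illustration, not the method used in the answer, and the function name insert_dots is made up:

```bash
#!/bin/bash
# insert_dots is a hypothetical helper name, not part of the original answer.
# It prefixes the string with ".." and adds ".." after every two characters,
# except after the final chunk.
insert_dots() {
    local s=$1 out=".." i
    for ((i = 0; i < ${#s}; i += 2)); do
        out+=${s:i:2}
        # Only add a separator when more characters follow this chunk.
        if (( i + 2 < ${#s} )); then
            out+=".."
        fi
    done
    printf '%s\n' "$out"
}

insert_dots "Hello World"   # prints ..He..ll..o ..Wo..rl..d
```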

Related

How to compare and recursively modify strings in Bash

I need to write Bash code that performs the tasks explained below.
The input: two uppercase strings of the same length, whatever that length is. E.g.:
CYVFGDDAS --> string1 , unchangeable reference string
CRFDGVEAT --> string2 , modifiable string
I am trying to write Bash code that compares the characters with the same index recursively, starting from the first position:
-- beginning of the cycle --
if the characters are the same, skip any action and go to the next position,
while
if the characters are not the same:
the character of string1 replaces the character of string2 at that position
the new string2 is saved in a file
a substitution code is also written to the same file (I will explain this below)
the old string2 is replaced by the new string2, so that its changes are retained
start another cycle from the beginning
------
Repeat the cycle until the last character is processed.
So, for the example above, the code should start checking from the first position, where two C characters are placed. They match, so no action is taken and both strings are left unchanged.
Going to the second position, Y should replace R in the second string,
the modified string should be saved and written to a text file together with the substitution code YA2R (Y is the replacing character from string1, A is a constant character that must be present in all substitution codes, 2 is the positional index where the substitution occurred, and R is the replaced character from string2).
I am proficient in Python, which has a large number of modules for string manipulation, but because the code should be added to a pre-existing Bash program I need to get this done in a Bash environment (builtin commands, awk, sed, etc.; it does not matter which). It looks to me as if Bash does not have an extended arsenal of tools like Python, so I am first of all wondering whether this project is feasible.
What I have tried so far is to convert the strings into blank-separated fields by inserting spaces between the characters, so that awk can deal with them as fields, but I did not get very far with this.
Sorry for the lengthy explanation. Any help is greatly appreciated.
No recursion is needed, just iterate over the strings. You can use parameter expansion with a for loop:
#!/bin/bash
s1=CYVFGDDAS
s2=CRFDGVEAT
for ((i=0; i<${#s1} ; ++i)) ; do
    if [[ ${s1:i:1} != ${s2:i:1} ]] ; then
        printf '%s\n' "${s1:0:i+1}${s2:i+1}"
        printf '%s\n' "${s1:i:1}A$((i+1))${s2:i:1}"
    fi
done
${s1:i:1} means extract the substring of $s1 from position $i of length 1. If the length is omitted, it extracts as much as it can.
It just prints the strings; redirect them to files as you need. Output:
CYFDGVEAT
YA2R
CYVDGVEAT
VA3F
CYVFGVEAT
FA4D
CYVFGDEAT
DA6V
CYVFGDDAT
DA7E
CYVFGDDAS
SA9T
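To get the file output the question asks for, the loop can be wrapped in a function and its output redirected. A sketch, assuming the hypothetical names substitution_steps and steps.txt; it tracks the current state of string2 explicitly instead of rebuilding it from slices of string1:

```bash
#!/bin/bash
# substitution_steps is a hypothetical wrapper around the loop shown above.
# $1 = reference string (string1), $2 = modifiable string (string2).
substitution_steps() {
    local ref=$1 cur=$2 i code
    for ((i = 0; i < ${#ref}; ++i)); do
        if [[ "${ref:i:1}" != "${cur:i:1}" ]]; then
            # Substitution code: new character, constant "A",
            # 1-based position, old character.
            code="${ref:i:1}A$((i + 1))${cur:i:1}"
            # Carry the change forward so later steps keep earlier substitutions.
            cur="${cur:0:i}${ref:i:1}${cur:i+1}"
            printf '%s\n%s\n' "$cur" "$code"
        fi
    done
}

# Assumed usage: collect every step in one file (the file name is an assumption).
# substitution_steps CYVFGDDAS CRFDGVEAT > steps.txt
```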

inserting a number from stdout into a string from stdout

I'm working on a Linux terminal.
I have a string followed by a number as stdout and I need a command that replaces the middle of the string by the number and writes the result to stdout.
This is the string and number: librarian 16
and this is what the output should be: l16n
I have tried using echo librarian 16|sed s/[a-z]*/16/g and this gives me 9 999. The problems are that it replaces every letter separately, that it also replaces the first and last letters, and that I can't make it use the number from stdout.
I have also tried using cut -c 1-1, sed s/[^0-9]*//g and cut -c 9-9 to generate l, 16 and n respectively, but I can't find how to combine their outputs into a single line.
Lastly I have tried using text editors to copy the number and paste it into the string but I haven't made much progress since I don't know how to use editors directly from the command line.
So what you want is to capture the first letter, the last letter and the number while ignoring the middle.
In regex we use ( and ) to tell the engine what we want to capture, anything else simply gets matched, or "eaten", but not captured. So the pattern should look like this:
([a-z])[a-z]*([a-z]) ([0-9]+)
([a-z]) to capture the first letter.
[a-z]* to match zero or more characters but not capture them. We choose "*" here because there might not be anything to match in the middle, as when the word has only two letters.
([a-z]) to capture the last letter.
A literal space to "eat" the whitespace.
([0-9]+) to capture the number. We use + instead of * because we require a number at this position.
sed uses a different syntax for some of these constructs, so we'll use the -E flag. You could do without it, but you'd have to escape the ()+ characters, which IMO makes the pattern a little bit confusing.
Now, to retrieve the captured content, we have to use an engine-specific sequence of characters. sed uses \N, where N is the number of the capturing group, so our replacement should look like this:
\1\3\2
\1: First letter
\3: Number
\2: Last letter
Now we put everything together:
$ echo librarian 16 | sed -E 's/([a-z])[a-z]*([a-z]) ([0-9]+)/\1\3\2/g'
l16n
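If sed is not a requirement, the three pieces the question tried to produce with cut can be combined with Bash parameter expansion instead. A sketch, with abbreviate as a made-up function name; unlike the sed pattern, it does not check that the word has at least two letters:

```bash
#!/bin/bash
# abbreviate is a hypothetical helper: it reads "word number" lines from stdin
# and prints first letter + number + last letter.
abbreviate() {
    local word number
    while read -r word number; do
        # ${word:0:1} is the first character; ${word: -1} is the last
        # (the space before -1 is required).
        printf '%s%s%s\n' "${word:0:1}" "$number" "${word: -1}"
    done
}

echo "librarian 16" | abbreviate   # prints l16n
```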

Bash split an array, add a variable and concatenate it back together

I've been trying to figure this out, but unfortunately I can't. I am trying to create a function that finds the ';' character, puts four spaces before it, and then puts the code back together in a neat sentence. I've been cracking at this for a bit and can't figure out a couple of things. I can't get the output to display what I want it to. I've tried finding the index of the ';' character, and it seems I'm going about it the wrong way. The other mistake I seem to be making is that I'm trying to split an array in a for loop, and then split the individual words in the array by letter, but I can't figure out how to do that either. If someone can give me a pointer, it would be greatly appreciated. This is in bash version 4.3.48.
commentPlacer()
{
    arg=($1) #argument
    len=${#arg[@]} #length of the argument
    comment=; #character to look for in second loop
    commaIndex=(${arg[@]#;}) #the attempted index look up
    commentSpace="    ;" #the variable being concatenated into the array
    for(( count1=0; count1 <= ${#arg[@]}; count1++ )) #search the argument looking for comment space
    do if [[ ${arg[count1]} != commentSpace ]] #if no commentSpace variable then
        then for (( count2=0; count2 < ${#arg[count1]} ; count2++ )) #loop through again
            do if [[ ${arg[count2]} != comment ]] #if no comment
                then A=(${arg[@]:0:commaIndex})
                    A+=(commentSpace)
                    A+=(${arg[@]commaIndex:-1}) #concatenate array
                    echo "$A"
                fi
            done
        fi
    done
}
If I understand what you want correctly, it's basically to put 4 spaces in front of each ";" in the argument, and print the result. This is actually simple to do in bash with a string substitution:
commentPlacer() {
    echo "${1//;/    ;}"
}
The expansion here has the format ${variable//pattern/replacement}, and it gives the contents of the variable, with each occurrence of pattern replaced by replacement. Note that with only a single / before the pattern, it would replace only the first occurrence.
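The difference between the single-slash and double-slash forms can be seen side by side in a small sketch (the sample string and variable names are made up):

```bash
#!/bin/bash
text='a;b;c'
first_only="${text/;/    ;}"    # single slash: only the first ';' is replaced
every_one="${text//;/    ;}"    # double slash: every ';' is replaced
printf '%s\n' "$first_only"     # prints a    ;b;c
printf '%s\n' "$every_one"      # prints a    ;b    ;c
```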
Now, I'm not sure I understand how your script is supposed to work, but I see several things that clearly aren't doing what you expect them to do. Here's a quick summary of the problems I see:
arg=($1) #argument
This doesn't create an array of characters from the first argument. var=(...) treats the thing in ( ) as a list of words, not characters. Since $1 isn't in double-quotes, it'll be split into words based on whitespace (generally spaces, tabs, and linefeeds), and then any of those words that contain wildcards will be expanded to a list of matching filenames. I'm pretty sure this isn't at all what you want (in fact, it's almost never what you want, so variable references should almost always be double-quoted to prevent it). Creating a character array in bash isn't easy, and in general isn't something you want to do. You can access individual characters in a string variable with ${var:index:1}, where index is the character you want (counting from 0).
commaIndex=(${arg[@]#;}) #the attempted index look up
This doesn't do a lookup. The substitution ${var#pattern} gives the value of var with pattern removed from the front (if it matches). If there are multiple possible matches, it uses the shortest one. The variant ${var##pattern} uses the longest possible match. With ${array[@]#pattern}, it'll try to remove the pattern from each element -- and since it's not in double-quotes, the result of that gets word-split and wildcard-expanded as usual. I'm pretty sure this isn't at all what you want.
if [[ ${arg[count1]} != commentSpace ]] #if no commentSpace variable then
Here (and in a number of other places), you're using a variable without $ in front; this doesn't use the variable at all, it just treats "commentSpace" as a static string. Also, in several places it's important to have double-quotes around it, e.g. to keep the spaces in $commentSpace from vanishing due to word splitting. There are some places where it's safe to leave the double-quotes off, but in general it's too hard to keep track of them, so just use double-quotes everywhere.
General suggestions: don't try to write C (or Java or whatever) programs in bash; it works too differently, and you have to think differently. Use shellcheck.net to spot common problems (like non-double-quoted variable references). Finally, you can see what bash is doing by putting set -x before a section that doesn't do what you expect; that'll make bash print each line as it executes it, showing the equivalent of what it's executing.
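The prefix-removal expansions described above can also stand in for the index lookup the question attempted. A sketch with made-up sample data; note that %% (longest match removed from the end) is the suffix counterpart of ## and is not mentioned in the answer itself:

```bash
#!/bin/bash
line='a=1;b=2;c=3'
after_first="${line#*;}"      # shortest "*;" removed from the front -> b=2;c=3
after_last="${line##*;}"      # longest "*;" removed from the front  -> c=3
before_first="${line%%;*}"    # longest ";*" removed from the end    -> a=1
# The length of the text before the first ';' is its 0-based index.
# (If the string contains no ';', this is simply the full length.)
index=${#before_first}
char_at_index="${line:index:1}"
printf '%s\n' "$after_first" "$after_last" "$before_first" "$index" "$char_at_index"
```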
Make a little function using pattern substitution on stdin:
semicolon4s() { while IFS= read -r x; do echo "${x//;/    ;}"; done; }
semicolon4s <<< 'foo;bar;baz'
Output:
foo    ;bar    ;baz

bash difference between raw string and string in variable

I wrote a little script in bash, but it only worked when I stored the string as a variable, and I'd like to know why. Here's the summary:
When I use the string itself, bash treats it as a single entity
for word in "this is a sentence"; do
    echo $word
done
# => this is a sentence
If I save the exact same string into a variable, bash iterates over the words
sentence="this is a sentence"
for word in $sentence; do
    echo $word
done
# => this
# is
# a
# sentence
Why are these being treated differently?
Is there a simple way to iterate through the words in the string without first saving the string as a variable?
The quotes tell bash to treat a thing in quotes as a single parameter in a parameter list at the time the expression is evaluated. The quotes (unless protected with \ or ') are removed.
echo "" # prints a newline, no quotes
echo '""' # prints ""
export X='""'
env | grep X # X contains ""
export X=""
env | grep X # X is empty
When you use a variable, bash unpacks it as is (i.e. as if you typed the variable's contents in the variable's place). For a for-loop, bash determines the list elements to iterate over by separating the for-loop's parameters by whitespace, but treating (as always) quote-protected items as a single parameter/list element. Your variable expansion was unquoted, so its words are treated as separate parameters.
As comments suggested, quotes are important. A for loop will step through a list of values terminated by a semicolon, and that list is a set of strings. Unquoted strings are delimited usually by whitespace. Whitespace inside a quoted string does not separate the string from its brethren, it's simply part of the quoted string. There's some truly excellent documentation about quotes in bash at http://mywiki.wooledge.org/Quotes . Read it. Read it now. You'll find a part that says
The quotes are not actually passed along to the command. They are removed by the shell (this process is cleverly called "quote removal").
To step through the words in a sentence that's stored in a variable (if I've inferred your question correctly), you could perhaps use an array to separate the words by whitespace:
#!/bin/bash
sentence="this is a sentence"
IFS=" " read -r -a words <<< "$sentence"
for word in "${words[@]}"; do
    echo "$word"
done
In bash, read -a will divide a string by $IFS and place the divided parts into elements of the array. See http://mywiki.wooledge.org/BashGuide/Arrays for more information about how bash arrays work.
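Because read -a splits on whatever $IFS holds for that one command, the same approach handles other separators. A small sketch with made-up sample data:

```bash
#!/bin/bash
csv="alpha,beta,gamma"
# IFS is changed only for this one read command; -r keeps backslashes literal.
IFS=, read -r -a fields <<< "$csv"
echo "${#fields[@]}"        # prints 3
for field in "${fields[@]}"; do
    echo "$field"
done
```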
If you want more details in pursuit of a specific problem, you might want to tell us what the problem is, or risk making this an XY problem.
In the assignment
sentence="this is a sentence"
there are no unquoted spaces, so everything to the right of the = is treated as a single word. (Something like sentence=this is a sentence would be parsed as a single assignment sentence=this followed by an attempt to run a program called is.) As a result, the value of sentence is a sequence of 18 characters. It is identical to
sentence=this\ is\ a\ sentence
because again, there are no unquoted spaces.
For the same reason
for word in "this is a sentence"; do
    echo $word
done
has word being set to each word in the following sequence, which only contains a single word because there are no unquoted spaces.
The key difference with your other loop is that parameter expansions are subject to word-splitting after the fact. The loop
for word in $sentence; do
    echo $word
done
after parameter expansion looks like
for word in this is a sentence; do
    echo $word
done
so now word is set to each of the 4 words in the list following the in keyword.
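One way to see the word counts directly is to pass each form to a function and print how many arguments arrive. A minimal sketch; count_args is a made-up helper name:

```bash
#!/bin/bash
# count_args is a hypothetical helper that reports how many arguments it received.
count_args() { echo "$#"; }

sentence="this is a sentence"
count_args "this is a sentence"   # prints 1: a quoted literal is one word
count_args "$sentence"            # prints 1: a quoted expansion is not split
count_args $sentence              # prints 4: an unquoted expansion is word-split
count_args this is a sentence     # prints 4: unquoted literal words
```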
It's not clear what you are actually asking at the end of your question, but the preceding is legal code. There is no requirement that a string be placed in quotes in bash; quotes do not define something as a string value, but simply escape every character that appears within the quotes. "foo" and \f\o\o are the same thing in shell.
Quoting turns any string into a single unit. If you drop the quotes, the loop iterates over the individual words.

How do I erase printed characters in a console application (Linux)?

I am creating a small console app that needs a progress bar. Something like...
Conversion: 175/348 Seconds |========== | 50%
My question is, how do you erase characters already printed to the console? When I reach the 51st percentage, I have to erase this line from the console and insert a new line. In my current solution, this is what happens...
Conversion: 175/348 Seconds |========== | 50%
Conversion: 179/348 Seconds |========== | 52%
Conversion: 183/348 Seconds |========== | 54%
Conversion: 187/348 Seconds |=========== | 56%
Code I use is...
print "Conversion: $converted_seconds/$total_time Seconds $progress_bar $converted_percentage%\n";
I am doing this on Linux using PHP (only I will use the app, so please excuse the language choice). So the solution should work on the Linux platform, but if you have a solution that's cross-platform, that would be preferable.
I don't think you need to apologize for the language choice. PHP is a great language for console applications.
Try this out:
<?php
for ($i = 0; $i < 10; $i++) {
    print "$i \r";
    sleep(1);
}
?>
The "\r" will overwrite the line with the new text. To make a new line you can just use "\n", but I'm guessing you already knew that.
Hope this helps! I know this works in Linux, but I don't know if it works in Windows or other operating systems.
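The carriage-return technique is a property of the terminal rather than of PHP, so it can be sketched in Bash as well. The function names, the sample values and the 20-column default bar width below are assumptions for illustration:

```bash
#!/bin/bash
# progress_line is a hypothetical helper: it builds one progress line, no newline.
# $1 = units done, $2 = total units, $3 = bar width (default 20).
progress_line() {
    local current=$1 total=$2 width=${3:-20}
    local percent=$(( current * 100 / total ))
    local filled=$(( current * width / total ))
    local bar
    printf -v bar '%*s' "$filled" ''   # a run of $filled spaces...
    bar=${bar// /=}                    # ...turned into '=' characters
    printf 'Conversion: %d/%d Seconds |%-*s| %3d%%' \
        "$current" "$total" "$width" "$bar" "$percent"
}

# progress_demo redraws the same terminal line once per second.
progress_demo() {
    local current
    for current in 0 87 174 261 348; do
        # \r moves the cursor to the start of the line before redrawing.
        printf '\r%s' "$(progress_line "$current" 348)"
        sleep 1
    done
    printf '\n'
}
```

Calling progress_demo in a terminal shows a single line that fills up. Padding the bar and the percentage to fixed widths keeps each redraw at least as long as the previous one as long as the counters do not shrink, so no stale characters are left behind.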
To erase a previously printed character you have three options:
echo chr(8) . " "; echoes the backspace character, which moves the cursor back one place; the space then overwrites the character. You can use chr(8) multiple times in a row to move back multiple characters.
echo "\r"; will return the cursor to the start of the current line. You can now replace the line with new text.
The third option is to set the line and column of the cursor position using ANSI escape codes, then print the replacement characters. It might not work with all terminals:
function movecursor($line, $column){
    echo "\033[{$line};{$column}H";
}
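The same escape sequence can be emitted from Bash with printf. In this sketch move_cursor mirrors the PHP function above, while clear_line is an extra, hypothetical helper that uses the ANSI "erase in line" sequence; as with the PHP version, it may not work on every terminal:

```bash
#!/bin/bash
# move_cursor: place the cursor at a 1-based line and column.
move_cursor() {
    printf '\033[%d;%dH' "$1" "$2"
}

# clear_line (hypothetical helper): return to column 0 and erase the whole
# current line; "\033[2K" is the ANSI "erase in line" sequence.
clear_line() {
    printf '\r\033[2K'
}
```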
\r did the trick.
For future reference, \b does not work in PHP on Linux. I was curious, so I did a couple of experiments in other languages as well (I did this on Linux; I don't know if the result will be the same on Windows/Mac).
\b Works in...
Perl
Ruby
Tcl - with code puts -nonewline "Hello\b"
\b Doesn't work in
PHP - the code print "Hello\b"; prints out Hello\b
Python - code print "Hello\b" prints out Hello<new line> . Same result with print "Hello\b",
I'm not sure if it's the same in Linux but in Windows console apps you can print \r and the cursor will return to the first left position of the line allowing you to overwrite all the characters to the right.
You can use \b to move back a single character but since you're going to be updating your progress bar \r would be simpler to use than printing \b x number of times.
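A backspace on its own only moves the cursor; to blank a character you step back, print a space, and step back again. A Bash sketch of that idea, with erase_chars as a made-up name:

```bash
#!/bin/bash
# erase_chars is a hypothetical helper: blank out the last N printed characters
# by emitting backspace, space, backspace for each one.
erase_chars() {
    local count=$1 i
    for ((i = 0; i < count; i++)); do
        printf '\b \b'
    done
}

printf 'Hello'
erase_chars 2     # on a typical terminal the line now shows "Hel"
printf '\n'
```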
This seems to be a pretty old topic, but I will drop in my five cents.
// _POSITION_ is a placeholder for the number of characters to move back
for ($i = 0; $i < _POSITION_; $i++) {
    echo "\010"; //issue backspace
}
I found this on the internet some time ago and unfortunately don't remember where, so all credit goes to the original author.
To erase a previously printed character, I print a backspace after it:
print "a"
print "\b"
will appear to print nothing (it actually prints a and then a backspace, but you probably won't notice it)
