Perl string parsing code

I was wondering if someone could help me better understand what this given code to parse a text file is doing.
while ($line = <STDIN>) {
    @flds = split("\t", $line);
    foreach $fld (@flds) {
        if ($fld =~ s/^"(.*)"$/\1/) {
            $fld =~ s/""/"/g;
        }
    }
    print join("\t", @flds), "\n";
}
We are given this block of code as a start to parse a text file such as:
Name Problem #1 Comments for P1 E.C. Problem Comments Email
Park, John 17 Really bad. 5 park@gmail.edu
Doe, Jane 100 Well done! 0 Why didn't you do this? doe2@gmail.edu
Smith, Bob 0 0 smith9999@gmail.com
...which will be used to set up a formatted output based on the parsed text.
I'm having trouble fully understanding how the block of code is parsing and holding the information so that I can know how to access certain parts of the information I want. Could someone better explain what the above code is doing at each step?

This actually looks like a fairly crude way to parse a CSV-style (here tab-separated) file.
while ($line = <STDIN>) { #Read from STDIN one line at a time.
    @flds = split("\t", $line); #Split the line on the tab character and assign the pieces to the array @flds.
    foreach $fld (@flds) { #Loop through each item/column in @flds; $fld aliases the element, so changes to it change the array.
        if ($fld =~ s/^"(.*)"$/\1/) { #Is the column surrounded by double quotes? If so, strip the surrounding quotes.
            $fld =~ s/""/"/g; #Inside a quoted column, turn each doubled quote "" back into a single ".
        }
    }
    print join("\t", @flds), "\n"; #Join the columns back together using the tab character and print them. Append a line break at the end.
}
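To get at individual columns, the same logic can be wrapped in a function that returns the cleaned fields. This is only a sketch: parse_line is a hypothetical name, and the sample line is made up in the style of the question's data (the quoting in it is an assumption). It also adds chomp and a -1 limit to split, neither of which is in the original code; without chomp the last field keeps its newline and the final print adds a second one.

```perl
use strict;
use warnings;

# Hypothetical helper: split one tab-separated line into cleaned fields.
sub parse_line {
    my ($line) = @_;
    chomp $line;                         # drop the trailing newline first
    my @flds = split /\t/, $line, -1;    # -1 keeps trailing empty fields
    for my $fld (@flds) {
        # strip one pair of surrounding quotes, then turn "" back into "
        if ($fld =~ s/^"(.*)"$/$1/) {
            $fld =~ s/""/"/g;
        }
    }
    return @flds;
}

# Made-up sample line; column 0 is the name, column 5 the email.
my @flds = parse_line(qq("Park, John"\t17\t"Really ""bad""."\t5\t\tpark\@gmail.edu\n));
print "Name:  $flds[0]\n";
print "Email: $flds[5]\n";
```

Once the fields are in an array, any column is reachable by its index, which is what the formatted output step needs.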

Related

How can I remove all vowels in a sentence except for the vowel at the first letter?

What is wrong specifically with this code? How can I correct it?
$x = "without any vowels after the first letter\n";
foreach $i (@x[1..]) {
    if ($i =~ /[AEIOUaeiou]/) {
        $x =~ tr/A E I O U a e i o u//d;
    }
}
print "$x\n";
I tried [1..] to exclude the first letter. If it does not work, how else can I remove the first letter?
EDIT I edited code to make it syntactically (mostly) correct to convey their obvious original idea, except for the attempt to index into a string which isn't correct in Perl. (Clarifying that is a part of what I consider useful in this question.)

First, most of that is not Perl, or any programming language for that matter. I'd suggest working through a Perl tutorial of your choice first, before trying to get solutions for specific problems. However, here's an answer, since the problem itself is of enough general interest.
Next, in Perl you can't directly index into a string, so you can't skip the first character(s) like that.
But you can of course separate the first character of the string and process the rest (removing vowels). One way is with a regex:†
use warnings;
use strict;
use feature 'say';
my $str = shift // 'out with ALL vowels after first';
$str =~ s/.\K(.*)/ $1 =~ tr{aeiouAEIOU}{}dr /e;
say $str; #--> ot wth LL vwls ftr frst
This relies on the /e modifier, which makes the replacement side be evaluated as code, so it runs an independent transliteration (tr) there, processing the captured substring.
Then we need the /r modifier in that embedded tr (or regex) to return the new string instead of changing the old one in place, which wouldn't be possible anyway since $1 is read-only.
One can also use a regex instead of tr, which is less efficient but comes with its many conveniences:
$str =~ s/.\K(.*)/ $1 =~ s{[aeiou]}{}igr /e;
Now we can use far more sophisticated tools in that regex than in tr; in this case it's only the i flag, for case-insensitive matching.
If more than the first character is to be kept, change . to .{N}.
† Regex is not compulsory, of course. A more elementary take: split the string into its first character and the rest, then use tr on the rest:
use warnings;
use strict;
use feature 'say';
my $str = shift // q(out with ALL vowels after first);
my ($result, $rest) = split //, $str, 2; # first char, rest of string
$result .= $rest =~ tr/aeiouAEIOU//dr; # prune $rest of vowels, append to $result
say $result;
Then put this in a small subroutine. To change the original string in place, instead of getting a new ($result) string, use $str everywhere in place of $result.
I am not sure how it compares efficiency-wise, but it may well fare well.
For curiosity's sake, here it is in a single statement:
$str = join '', map { length > 1 ? tr/aeiouAEIOU//dr : $_ } split //, $str, 2;
This specifically uses the fact that only the first (one) character need be skipped; that is easily made dynamical, as long as the criterion does involve the length of substrings.
More importantly, this assumes that the rest of the string is longer than 1 character. To drop that assumption, change the criterion:
use feature 'state';
$str = join '', map {
    state $chr_cnt = 0;
    ++$chr_cnt > 1 ? tr/aeiouAEIOU//dr : $_
}
split //, $str, 2;
This also relies on leaving aside just one character. It uses the state feature, which keeps a lexical variable's value across executions.
A more generic solution uses the fact that substr can be written to (it can be used as an lvalue):
substr($str, 1) =~ tr/aeiouAEIOU//d;
Here it's much cleaner and simpler to relax the limitation to the first character: just change that 1 in order to skip more characters. The tricky -- unexpected -- part here may be that normally builtins can't be written to like that, as they aren't lvalue subroutines.
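The substr approach fits naturally into the small subroutine suggested earlier. A minimal sketch, where the name strip_vowels_after and its optional second argument are made up for illustration; it works on its own copy of the string and returns the result rather than changing the caller's variable:

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first $keep characters untouched and
# delete the vowels from everything after them.
sub strip_vowels_after {
    my ($str, $keep) = @_;
    $keep //= 1;                                # default: keep one character
    return $str if length($str) <= $keep;       # nothing after the kept prefix
    substr($str, $keep) =~ tr/aeiouAEIOU//d;    # substr used as an lvalue
    return $str;
}

print strip_vowels_after('out with ALL vowels after first'), "\n";
print strip_vowels_after('out with ALL vowels after first', 3), "\n";
```

The length check is there because using substr as an lvalue with a start position past the end of the string is an error.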

The algorithm for solving the problem is in your question:
add a letter to the output string if it isn't a vowel
add a letter to the output string if it is the first vowel in the input string
use strict;
use warnings;
my $x = "without any vowels after the first letter\n";
my($o,$count) = ('',0);
print 'IN: ' . $x;
for ( split('',$x) ) {
    $o .= $_ unless $count != 0 and /[aeiou]/i;
    $count++ if /[aeiou]/i;
}
print 'OUT: ' . $o;
Output
IN: without any vowels after the first letter
OUT: witht ny vwls ftr th frst lttr
Addendum: OP's clarification of the problem
look at each word in the sentence
if a word starts with a vowel, delete all vowels but the first one
if a word starts with a non-vowel, delete all vowels
use strict;
use warnings;
use feature 'say';
my $x = 'I like apples more than oranges';
my @o;
say 'IN: ' . $x;
for ( split(' ', $x) ) {
    if ( /^[aeiou]/i ) {
        s/.\K(.*)/$1 =~ tr|aeiouAEIOU||dr/e;
    } else {
        tr|aeiouAEIOU||d;
    }
    push @o, $_;
}
say 'OUT: ' . join(' ', @o);
Output
IN: I like apples more than oranges
OUT: I lk appls mr thn orngs
Or, in a more Perlish style:
use strict;
use warnings;
use feature 'say';
my $x = "I like apples more than oranges";
say 'IN: ' . $x;
say 'OUT: ' . join(' ', map { s/.\K(.*)/$1 =~ tr|aeiouAEIOU||dr/e && $_ } split('[ ]+', $x));
Output
IN: I like apples more than oranges
OUT: I lk appls mr thn orngs
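The per-word rule can also be written without a regex, since "keep the first character, strip vowels from the rest" covers both the vowel and the non-vowel case. A sketch under that reading of the clarified problem; strip_vowels_per_word is a hypothetical name, and words are rejoined with single spaces:

```perl
use strict;
use warnings;

# Hypothetical helper: in every word, keep the first character and
# delete the vowels from the rest of the word.
sub strip_vowels_per_word {
    my ($sentence) = @_;
    return join ' ',
        map { substr($_, 0, 1) . (substr($_, 1) =~ tr/aeiouAEIOU//dr) }
        split ' ', $sentence;
}

print strip_vowels_per_word('I like apples more than oranges'), "\n";
```

Here tr needs the /r flag because substr is used only to read the rest of the word; the result is concatenated instead of written back.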

Log Parsing via Powershell - print all array elements after nth element

I'm parsing a log file that is space delimited for the first 7 elements and then a log message or sentence follows. I know just enough to get around in PS, and I'm learning more each day, so I'm not sure this is the best way to do this and apologies if I'm not leveraging a more efficient means that would be second nature to you. I'm using -split(' ')[n] to extract each field of the log file line by line. I'm able to extract the first parts fine as they are space-delimited, but I'm not sure how to get the rest of the elements up to the end of the line.
$logFile=Get-Content $logFilePath
$dateStamp=$logfile -split(' ')[0]
$timeStamp=$logfile -split(' ')[1]
$requestID=$logfile -split(' ')[3]
$binaryID=$logfile -split(' ')[4]
$logID=$logfile -split(' ')[5]
$action=$logfile -split(' ')[6]
$logMessage=$logfile -split(' ')[?]
This is not a CSV that I can import. I'm more familiar with string manipulation in bash so I am able to successfully replace spaces in the first 7 elements, and the end, with "," :
#!/bin/bash
inputFile="/cygdrive/c/Temp/logfile.log"
outputFile="/cygdrive/c/Temp/test_log.csv"
echo "\"DATE\",\"TIME\",\"HYPEN\",\"REQUESTID\",\"BINARY\",\"PROC_NUMBER\",\"MESSAGE\"" > $outputFile
while read -a line
do
    arrLength=${#line[@]}
    echo \"${line[0]}\",\"${line[1]}\",\"${line[2]}\",\"${line[3]}\",\"${line[4]}\",\"${line[5]}\",\"${line[@]:6:$arrLength}\"
done < $inputFile >> $outputFile
Can you help either printing the array elements from position n to the end, or replacing the spaces appropriately in PS so I have a CSV that I can import? Just trying to avoid the two-step process of converting it in bash, then importing it in PS but I'm still researching. I did find this post, "Parsing Text file and placing contents into an Array Powershell", for importing the file assuming it's space-delimited, and that works for the first 7 elements, but I'm not sure about everything after that.
Of course I welcome any other PS solutions such as one of those [something]::SOMETHING things I've seen by googling that might do all this much more seamlessly.

You can specify the maximum number of substrings into which the string is split, like this:
$splittedRow = $logfile.split(' ',8) # $logfile is assumed to hold a single line here
$dateStamp=$splittedRow[0]
$timeStamp=$splittedRow[1]
$requestID=$splittedRow[3]
$binaryID=$splittedRow[4]
$logID=$splittedRow[5]
$action=$splittedRow[6]
$logMessage=$splittedRow[7]

As an addition to Viktor Be's answer:
$data = "111 22222 333 4444444 5 6 77 888888 9999999 0" #this is the content of file below for testing purposes
#$data = get-content -path C:\temp\mytest.txt
foreach ($line in $data){
    $splitted = $line.split(' ',8)
    $line_output= ""
    for ($i = 0;$i -lt 7;$i++){
        $line_output += "$($splitted[$i]);"
    }
    $line_output += $splitted[7]
    $line_output | out-file "C:\temp\MyCsvThatPowershellCanRead.csv" -append
}

You should be able to iterate over each line in the log file and get the information you need the way you are doing. However, the message field, which can contain any number of spaces, is easy to grab with a regular expression.
The following regex should work for you, assuming $line is the current line:
$line -match '(?<=(\S+\s+){6}).*'
$logMessage = $matches[0]
This expression looks for .* (any character, zero or more times) that comes after 6 occurrences of non-whitespace characters followed by whitespace characters. The .* in this expression should match your log message.

search a specific sub string pattern in a string using perl

I'm a newbie to Perl. I went through "Check whether a string contains a substring" to learn how to check whether a substring is present in a string, but my scenario is a little different.
I have a string like
/home/me/Desktop/MyWork/systemfile/directory/systemfile64.elf ,
The end of it might instead be systemfile32.elf or systemfile16.elf, so in my Perl script I need to check whether this string contains a substring in the format systemfile*.elf.
How can I achieve this in Perl?
I'm planning to do something like this
if(index($mainstring, _serach_for_pattern_systemfile*.elf_ ) ~= -1) {
say" Found the string";
}

You can use pattern matching:
if ($string =~ /systemfile\d\d\.elf$/){
# DoSomething
}
\d stands for a digit (0-9)
$ stands for end of string

Well,
if( $mainstring =~ m'/systemfile(16|32|64)\.elf$' ) {
    say "Found the string";
}
does the job.
For your information:
$string =~ m' ... '
is the same as
$string =~ / ... /
(except that with ' as the delimiter, variables in the pattern are not interpolated), which checks the string against the given regular expression. This is one of the most useful features of the Perl language.
More info at http://perldoc.perl.org/perlre.html
(I used the m'' syntax to improve readability, because of the presence of another '/' character in the regexp. I could also write /\/systemfile\d+\.elf$/.)

if ($string =~ /systemfile.*\.elf/) {
# Do something with the string.
}
That should match the strings you seek (given that each time a string is stored in $string), although .* also accepts anything else between systemfile and .elf. Inside the curly brackets you should write your logic.
The . stands for "any character" and the * stands for "zero or more repetitions of the preceding element". So, .* means "any run of characters, possibly empty". If you know that the string will end in this pattern, then it is safer to add $ at the end of the pattern to require that the string ends there:
$string =~ /systemfile.*\.elf$/
Just don't forget to chomp $string to avoid any line-breaks that might mess with your desired output.
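If the number in the file name matters, a capture group can return it along with the yes/no answer. A small sketch, not taken from any of the answers: systemfile_number is a hypothetical name, and the second path is invented to show a non-match (note that the systemfile directory in the path does not trigger a match).

```perl
use strict;
use warnings;

# Hypothetical helper: return the digits in "systemfile<digits>.elf" when
# the path ends in that pattern, or undef when it does not.
sub systemfile_number {
    my ($path) = @_;
    return $path =~ /systemfile(\d+)\.elf$/ ? $1 : undef;
}

my @paths = (
    '/home/me/Desktop/MyWork/systemfile/directory/systemfile64.elf',
    '/home/me/Desktop/MyWork/systemfile/directory/notes.txt',    # made-up path
);
for my $path (@paths) {
    my $n = systemfile_number($path);
    print defined $n ? "Found systemfile$n.elf\n" : "No match in $path\n";
}
```

Using \d+ instead of .* keeps names such as systemfile_old.elf from matching, at the cost of assuming the variable part is always numeric.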

use strict;
use warnings;
my $string = 'systemfile16.elf';
if ($string =~ /^systemfile.*\.elf$/) {
    print "Found string $string";
} else {
    print "String not found";
}
This will match systemfile'anythinghere'.elf when $string holds just the file name, with no directory part.
If you want to search the entire string, including the directory, then:
my $string = 'c:\\windows\\system\\systemfile16.elf';
if ($string =~ /systemfile.*\.elf$/) {
    print "Found string $string";
} else {
    print "String not found";
}
If you only want to match systemfile followed by 2 numeric characters and then .elf, use the methods from the other answers. But if you want systemfile'anything'.elf, use one of these.

perl extract numbers from string, edit, put back into string at their original position

I'm trying to edit the numbers in a string and put it back in the same place as they have been before.
Example:
$string = "struct:{thin:[[75518103,75518217],[75518338,75518363],[75532810,75533910],],thick:[[75518363,75518424],[75521257,75521463],],}";
I need to edit the numbers, but want to keep the rest of the string as it is. Additionally, the number of brackets can vary.
Until now I split the string at "," with
@array = split (',',$string);
and extracted the numbers for editing with
foreach (@array) {
    $_ =~ s/\D//g;
    $_ = $number - $_;
}
Now I want to put the numbers back in their original place in the string, but I don't know how.
I hope there is a better way to edit the numbers in the string without splitting it and extracting the numbers. Hope you can help me.

You could use a regular expression substitution with the /e flag, search for long numbers and run Perl code in the substitution part.
use strict;
use warnings;
use feature 'say';
my $number = 100_000_000;
my $string = "struct:{thin:[[75518103,75518217],[75518338,75518363],[75532810,75533910],],thick:[[75518363,75518424],[75521257,75521463],],}";
$string =~ s/(\d+)/{$number - $1}/eg;
say $string;
__END__
struct:{thin:[[24481897,24481783],[24481662,24481637],[24467190,24466090],],thick:[[24481637,24481576],[24478743,24478537],],}
If there are no other numbers in the string, that would work. In case there is more logic involved, you can also move it into a subroutine and just call that in the substitution.
sub replace {
    my ($n) = @_; # the captured number, passed in as $1
    return $n % 2 ? $n * 2 : $n / 4;
}
$string =~ s/(\d+)/{replace($1)}/eg;
You might also need to revise the search pattern to be a bit more precise.
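The subroutine idea can be taken one step further by passing the edit itself as a code reference. This is a sketch only: edit_numbers is a hypothetical name, the sample string is a shortened version of the one in the question, and \d+ is kept as the pattern, so every run of digits is treated as a number to edit.

```perl
use strict;
use warnings;

# Hypothetical helper: apply $edit to every run of digits in $string and
# return the result; all other characters stay where they are.
sub edit_numbers {
    my ($string, $edit) = @_;
    (my $copy = $string) =~ s/(\d+)/$edit->($1)/eg;
    return $copy;
}

my $string = 'thin:[[75518103,75518217],],thick:[[75518363,75518424],],';
print edit_numbers($string, sub { 100_000_000 - $_[0] }), "\n";
```

Because the substitution runs on a copy, the original string is left unchanged and the function can be reused with different edits.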

I just found the evaluation modifier for regex! I now did it with
$string =~ s/([0-9]+)/$number-$1/eg;
and it worked!

Extract a substring using PowerShell

How can I extract a substring using PowerShell?
I have this string ...
"-----start-------Hello World------end-------"
I have to extract ...
Hello World
What is the best way to do that?

The -match operator tests a regex; combine it with the automatic variable $matches to get your result:
PS C:\> $x = "----start----Hello World----end----"
PS C:\> $x -match "----start----(?<content>.*)----end----"
True
PS C:\> $matches['content']
Hello World
Whenever in doubt about regex-y things, check out this site: http://www.regular-expressions.info

The Substring method lets us extract a particular string from the original string based on a starting position and length. If only one argument is provided, it is taken to be the starting position, and the remainder of the string is output.
PS > "test_string".Substring(0,4)
test
PS > "test_string".Substring(4)
_string
But this is easier...
$s = 'Hello World is in here Hello World!'
$p = 'Hello World'
$s -match $p
And finally, to recurse through a directory selecting only the .txt files and searching for occurrence of "Hello World":
dir -rec -filter *.txt | Select-String 'Hello World'

Not sure if this is efficient or not, but strings in PowerShell can be referred to using array index syntax, in a similar fashion to Python.
It's not completely intuitive, since the first letter is referred to by index 0, but it does the following:
Allow a second index number that is longer than the string, without generating an error
Extract substrings in reverse
Extract substrings from the end of the string
Here are some examples:
PS > 'Hello World'[0..2]
Yields the result (index values included for clarity - not generated in output):
H [0]
e [1]
l [2]
Which can be made more useful by passing -join '':
PS > 'Hello World'[0..2] -join ''
Hel
There are some interesting effects you can obtain by using different indices:
Forwards
Use a first index value that is less than the second and the substring will be extracted in the forwards direction as you would expect. This time the second index value is far in excess of the string length but there is no error:
PS > 'Hello World'[3..300] -join ''
lo World
Unlike:
PS > 'Hello World'.Substring(3,300)
Exception calling "Substring" with "2" argument(s): "Index and length must refer to a location within the string.
Backwards
If you supply a second index value that is lower than the first, the string is returned in reverse:
PS > 'Hello World'[4..0] -join ''
olleH
From End
If you use negative numbers you can refer to a position from the end of the string. To extract 'World', the last 5 letters, we use:
PS > 'Hello World'[-5..-1] -join ''
World

PS> $a = "-----start-------Hello World------end-------"
PS> $a.substring(17, 11)
or
PS> $a.Substring($a.IndexOf('H'), 11)
$a.Substring(argument1, argument2) --> Here argument1 = starting position of the desired character and argument2 = length of the substring you want as output.
Here 17 is the index of the character 'H', and since we want to print up to the end of Hello World, we provide 11 as the second argument.

Building on Matt's answer, here's one that searches across newlines and is easy to modify for your own use
$String="----start----`nHello World`n----end----"
$SearchStart="----start----`n" #Will not be included in results
$SearchEnd="`n----end----" #Will not be included in results
$String -match "(?s)$SearchStart(?<content>.*)$SearchEnd"
$result=$matches['content']
$result
--
NOTE: if you want to run this against a file keep in mind Get-Content returns an array not a single string. You can work around this by doing the following:
$String=[string]::join("`n", (Get-Content $Filename))

other solution
$template="-----start-------{Value:This is a test 123}------end-------"
$text="-----start-------Hello World------end-------"
$text | ConvertFrom-String -TemplateContent $template

Since the string is not complex, no need to add RegEx strings. A simple match will do the trick
$line = "----start----Hello World----end----"
$line -match "Hello World"
$matches[0]
Hello World
$result = $matches[0]
$result
Hello World

I needed to extract a few lines from a log file and this post helped me solve my issue, so I thought I'd add it here. If someone needs to extract multiple lines, you can use this script to get the index of a word in each line (I'm searching for "Root") and extract the content from that word to the end of the line.
$File_content = Get-Content "Path of the text file"
$result = @()
foreach ($val in $File_content){
    $Index_No = $val.IndexOf("Root")
    if ($Index_No -ge 0) { # IndexOf returns -1 when "Root" is not in the line
        $result += $val.Substring($Index_No)
    }
}
$result | Select-Object -Unique
Cheers..!
