Log Parsing via Powershell - print all array elements after nth element - string

I'm parsing a log file that is space delimited for the first 7 elements and then a log message or sentence follows. I know just enough to get around in PS, and I'm learning more each day, so I'm not sure this is the best way to do this and apologies if I'm not leveraging a more efficient means that would be second nature to you. I'm using -split(' ')[n] to extract each field of the log file line by line. I'm able to extract the first parts fine as they are space-delimited, but I'm not sure how to get the rest of the elements up to the end of the line.
$logFile=Get-Content $logFilePath
$dateStamp=$logfile -split(' ')[0]
$timeStamp=$logfile -split(' ')[1]
$requestID=$logfile -split(' ')[3]
$binaryID=$logfile -split(' ')[4]
$logID=$logfile -split(' ')[5]
$action=$logfile -split(' ')[6]
$logMessage=$logfile -split(' ')[?]
This is not a CSV that I can import. I'm more familiar with string manipulation in bash so I am able to successfully replace spaces in the first 7 elements, and the end, with "," :
#!/bin/bash
inputFile="/cygdrive/c/Temp/logfile.log"
outputFile="/cygdrive/c/Temp/test_log.csv"
echo "\"DATE\",\"TIME\",\"HYPEN\",\"REQUESTID\",\"BINARY\",\"PROC_NUMBER\",\"MESSAGE\"" > $outputFile
while read -a line
do
arrLength=${#line[@]}
echo \"${line[0]}\",\"${line[1]}\",\"${line[2]}\",\"${line[3]}\",\"${line[4]}\",\"${line[5]}\",\"${line[@]:6:$arrLength}\"
done < $inputFile >> $outputFile
Can you help with either printing the array elements from position n to the end, or replacing the spaces appropriately in PS so I have a CSV that I can import? I'm just trying to avoid the two-step process of converting it in bash and then importing it in PS, but I'm still researching. I did find this post, "Parsing Text file and placing contents into an Array Powershell", for importing the file assuming it's space-delimited. That works for the first 7 elements, but I'm not sure about everything after that.
Of course I welcome any other PS solutions such as one of those [something]::SOMETHING things I've seen by googling that might do all this much more seamlessly.

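You can specify the maximum number of substrings into which the string is split, like this (this assumes $logfile holds a single line; if it holds the whole file from Get-Content, do the same inside a loop over the lines):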
$splittedRow = $logfile.split(' ',8)
$dateStamp=$splittedRow[0]
$timeStamp=$splittedRow[1]
$requestID=$splittedRow[3]
$binaryID=$splittedRow[4]
$logID=$splittedRow[5]
$action=$splittedRow[6]
$logMessage=$splittedRow[7]
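As a minimal sketch of how the limited split might be applied line by line, the following builds one object per log line. The sample lines and the property names are assumptions made up for illustration; in practice $lines would come from Get-Content $logFilePath.

```powershell
# Hypothetical sample lines standing in for: Get-Content $logFilePath
$lines = @(
    '2020-01-01 12:00:00 - REQ1 bin.exe 1234 START Service started on port 8080',
    '2020-01-01 12:00:05 - REQ2 bin.exe 1234 STOP Service stopped'
)

$records = foreach ($line in $lines) {
    # At most 8 substrings: 7 space-delimited fields, then the untouched remainder
    $parts = $line.Split(' ', 8)
    [pscustomobject]@{
        Date      = $parts[0]
        Time      = $parts[1]
        RequestID = $parts[3]
        BinaryID  = $parts[4]
        LogID     = $parts[5]
        Action    = $parts[6]
        Message   = $parts[7]
    }
}

# Alternative: split on every space, then rejoin elements 7..end
# (assumes the line has at least 8 tokens)
$all  = $lines[0] -split ' '
$tail = $all[7..($all.Count - 1)] -join ' '

# Emit CSV text; pipe $records to Export-Csv instead to write a file
$records | ConvertTo-Csv -NoTypeInformation
```

Because each record is an object, the message keeps its embedded spaces and the CSV quoting is handled by ConvertTo-Csv rather than by hand.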

As an addition to Viktor Be's answer:
$data = "111 22222 333 4444444 5 6 77 888888 9999999 0" #this is the content of file below for testing purposes
#$data = get-content -path C:\temp\mytest.txt
foreach ($line in $data){
$splitted = $line.split(' ',8)
$line_output= ""
for ($i = 0;$i -lt 7;$i++){
$line_output += "$($splitted[$i]);"
}
$line_output += $splitted[7]
$line_output | out-file "C:\temp\MyCsvThatPowershellCanRead.csv" -append
}

You should be able to iterate over each line in the log file and get the information you need the way you are doing it. However, the message field, which may contain any number of spaces, is easy to grab with a regular expression.
The following regex should work for you. Assuming $line is the current line you are on:
$line -match '(?<=(\S+\s+){6}).*'
$logMessage = $matches[0]
The way this expression works is that it looks for .* (which means any character 0 or more times) that comes after 6 occurrences of non-whitespace characters followed by whitespace characters. The .* in this expression should match your log message.
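A variation on the same idea, sketched here under assumptions, is to capture every field in one pass with named groups instead of a lookbehind. The sample line and the group names are hypothetical; adjust the number of \S+ groups to match the real log layout.

```powershell
# Hypothetical sample line; in practice $line is the current line of the log
$line = '2020-01-01 12:00:00 - REQ1 bin.exe 1234 START Service started on port 8080'

# Seven whitespace-delimited tokens (the third is skipped), then the rest of the line
$pattern = '^(?<date>\S+)\s+(?<time>\S+)\s+\S+\s+(?<requestId>\S+)\s+' +
           '(?<binaryId>\S+)\s+(?<logId>\S+)\s+(?<action>\S+)\s+(?<message>.*)$'

if ($line -match $pattern) {
    # $Matches is only populated when -match succeeds
    $dateStamp  = $Matches['date']
    $action     = $Matches['action']
    $logMessage = $Matches['message']
}
```

Checking the result of -match before reading $Matches avoids picking up stale values from an earlier successful match.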

Related

How to parse string in powershell

I have a Powershell command that outputs multiple lines.
I want to output only one line that contains the name of a .zip file.
Currently, all lines are returned when substring .zip is found:
$p.Start() | Out-Null
$p.WaitForExit()
$output = $p.StandardOutput.ReadToEnd()
$output += $p.StandardError.ReadToEnd()
foreach($line in $output)
{
if($line.Contains(".zip"))
{
$line
}
}
Since you're using .ReadToEnd(), $output receives a single, multi-line string, not an array of lines.
You must therefore split that string into individual lines yourself, using the -split operator.
You can then apply a string-comparison operator such as -match or -like directly to the array of lines to extract matching lines:
# Sample multi-line string.
$output = @'
line 1
foo.zip
another line
'@
$output -split '\r?\n' -match '\.zip' # -> 'foo.zip'
-split is regex-based, and regex \r?\n matches newlines (line breaks) of either variety (CRLF, as typical on Windows, as well as LF, as typical on Unix-like platforms).
-match is also regex-based, which is why the . in \.zip is \-escaped, given that . is a regex metacharacter (it matches any character other than LF by default).
Note that -match, like PowerShell in general, is case-insensitive by default, so both foo.zip and foo.ZIP would match, for instance;
if you do want case-sensitivity, use -cmatch.
As an aside:
I wonder why you're running your command via a [System.Diagnostics.Process] instance, given that you seem to be invoking synchronously while capturing its standard streams.
PowerShell allows you to do that much more simply by direct invocation, optionally with redirection:
$output = ... 2>&1

PowerShell split not working using words (read from file)

I am trying to extract the sentences that appear between a particular pattern of word, from a file. The intention is to extract the sentences that appear between the first pair of 'GO' words from the file. The logic implemented here is to split the file based on the word 'GO', and then print the second element of the array(the sentences starting with SET in this example). However, PowerShell is not recognizing the separator (GO); instead it seems to be recognizing 'new line' as the separator, and is printing the second sentence.
Please note that I need to read the file and then get the extraction done.
Content of the file
Home address "TJ One way"
Office address "C company Two way"
GO
SET ANSI_NULLS, ANSI_PADDING, ANSI_WARNINGS, ARITHABORT, CONCAT_NULL_YIELDS_NULL, QUOTED_IDENTIFIER ON;
SET NUMERIC_ROUNDABORT OFF;
GO
Home address "TJ One way"
Office address "C company Two way"
GO
:on error exit
GO
My code
$path = 'D:\Scripts'
$deltaFile = 'GoSampleFile.txt'
$modifiedDelta = 'GoSampleFile1.txt'
New-Item -path $path -Name $modifiedDelta -ItemType file -Force
#Split for each appearing GO, after escaping the double quotes
(Get-Content $path'\'$deltaFile).replace('"', '`"') | Set-Content $path'\'$modifiedDelta
$separator = 'GO'
$modifiedDeltaString = Get-Content $path'\'$modifiedDelta
#Write-Host $modifiedDeltaString
#Write-Host $separator
$goArray = $modifiedDeltaString -split "GO", 0, "SimpleMatch"
Write-Output $goArray[1]
#Housekeeping of the temporary file
Remove-Item $path'\'$modifiedDelta
Use Get-Content -Raw ... to read the contents as one string instead of an array of strings for each line
Might as well be a new answer as there's another problem and I'll provide more detail.
As DAX has said you need to use -Raw as Get-Content returns an array of strings, one for each line. When you use -split on it each element is treated separately.
Eg when used on the following array
[0] "Testing"
[1] "This is a test"
[2] "'tis still a test"
$array -split "is", 0, "SimpleMatch"
[0] "Testing"
[1] "Th"
[2] " "
[3] " a test"
[4] "'t"
[5] " still a test"
When you use the -Raw switch, Get-Content returns the entire file as a single string with newline characters.
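A minimal sketch of the single-string approach follows. A here-string stands in for Get-Content -Raw so the example is self-contained, and the regex that splits only on lines consisting of GO is an assumption swapped in for the SimpleMatch split, since a plain 'GO' match would also split inside any word containing those letters.

```powershell
# Stand-in for: Get-Content -Raw 'D:\Scripts\GoSampleFile.txt'
$raw = @'
Home address "TJ One way"
Office address "C company Two way"
GO
SET ANSI_NULLS, ANSI_PADDING ON;
SET NUMERIC_ROUNDABORT OFF;
GO
Home address "TJ One way"
GO
:on error exit
GO
'@

# Split only where a whole line is GO (optionally followed by whitespace)
$goArray = $raw -split '(?m)^GO\s*$'

# The text between the first pair of GO lines, without surrounding newlines
$firstBatch = $goArray[1].Trim()
$firstBatch
```

Note that -split is case-insensitive by default, so a line containing only "go" would also act as a separator; use -csplit if that matters.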
The other thing I'll point out is you're escaping the quotes, but this isn't necessary. The reason you need to escape quotes is so PowerShell doesn't assume you're terminating the string:
$t = "This is a "bad" test"
> At line:1 char:18
+ $t = "This is a "bad" test"
+ ~~~~~~~~~~
Unexpected token 'bad" test"' in expression or statement.
You need to escape the quotes so that "bad" is still part of the string.
However when you are reading from a file the quotes are already part of the string:
Get-Content C:\test.txt
> This is a "bad" test
Because you are not typing the quotes into the console, they do not need to be escaped. To show you with your own code, check the full content of your temp file:
Home address `"TJ One way`"
Office address `"C company Two way`"
I can't think of any reason you would need to be doing this. Perhaps if you wanted to copy and paste into a console for some reason but that's it.
This may appear to work for now, but only because the SQL query I assume you are trying to run doesn't contain double quotes. I'm not sure whether they are used in SQL, but if they were, the added backticks would cause an error. Regardless, it's an extra step you don't need, so you can scrap the whole temp file and read straight from the original.

Parse a file under linux

I'm trying to compute some news article popularity based on twitter data. However, while retrieving the tweets I forgot to escape the characters ending up with an unusable file.
Here is a line from the file:
1369283975$,$337427565662830592$,$0$,$username$,$Average U.S. 401(k) balance tops $80$,$000$,$ up 75 pct since 2009 http://t.co/etHHMUFpoo #news$,$http://www.reuters.com/article/2013/05/23/funds-fidelity-401k-idUSL2N0E31ZC20130523?feedType=RSS&feedName=marketsNews
The '$,$' pattern occurs not only as a field delimiter but also in the tweet, from where I want to remove it.
A correct line would be:
1369283975$,$337427565662830592$,$0$,$username$,$Average U.S. 401(k) balance tops $80000 up 75 pct since 2009 http://t.co/etHHMUFpoo #news$,$http://www.reuters.com/article/2013/05/23/funds-fidelity-401k-idUSL2N0E31ZC20130523?feedType=RSS&feedName=marketsNews
I tried to use cut and sed but I'm not getting the results I want. What would be a good strategy to solve this?
If we can assume that there are never extra separators in the time, id, retweets, username, and link fields, then you could take the middle part and remove all $,$ from it, for example like this:
perl -ne 'chomp; @a=split(/\$,\$/); $_ = join("", @a[4..($#a-1)]); print join("\$,\$", @a[0..3], $_, $a[$#a]), "\n"' < data.txt
What this does:
splits the line using $,$ as delimiter
takes the middle part = fields[4] .. fields[N-1]
joins again by $,$ the first 4 fields, the fixed middle part, and the last field (the link)
This works with your example, but I don't know what other corner cases you might have.
A good way to validate the result is to check that every line splits on $,$ into exactly 6 fields (that is, contains 5 separators). You can do that by piping the result to this:
... | perl -ne 'print scalar split(/\$,\$/), "\n"' | sort -u
(should output a single line, with "6")

Trying to count the characters in a line in Perl and failed

At the beginning I simply used the following to count the length of each line:
while(<FH>){
chomp;
$length=length($_);
}
But when I compared my result with the one produced by the Linux command wc, I found a problem:
all tab characters in my file are treated as 1 character long in Perl, whereas wc counts them as 8, so I made the following modification:
while(<FH>){
chomp;
my $length=length($_);
my $tabCount= tr/\t/\t/;
my $lineLength=$length-$tabCount+($tabCount*8);
}
The code above now works for almost all cases except one: wc does not count every tab as a full 8 characters, only those not partly taken up by preceding characters. For example, if I type 1234 at the start of a line and then press Tab, wc does not count it as a full tab, but the code above does. Is there any way to solve this issue? Thanks
Solved it, used tab expansion, here is the code:
1 while $string =~ s/\t+/' ' x (length($&) * 8 - length($`) % 8)/e;
$length=length($string);
if anyone could give it an explanation, that would be awesome, I tested it to be working, but don't quite understand it. Anyways, thanks for all the help
I don't think tabs are your problem, wc doesn't count a tab as eight characters. I think your problem is that you're stripping EOLs but wc counts them. Also, you're not accumulating the lengths, you were just tracking the length of the last line. This:
while(<FH>){
chomp;
$length=length($_);
}
Should be more like this:
my $length = 0;
while(<FH>) {
$length += length($_);
}
# $length now has the total number of characters
How about just calling wc from within perl?
$result = `wc -l /path/to/file`

Extract a substring using PowerShell

How can I extract a substring using PowerShell?
I have this string ...
"-----start-------Hello World------end-------"
I have to extract ...
Hello World
What is the best way to do that?
The -match operator tests a regex, combine it with the magic variable $matches to get your result
PS C:\> $x = "----start----Hello World----end----"
PS C:\> $x -match "----start----(?<content>.*)----end----"
True
PS C:\> $matches['content']
Hello World
Whenever in doubt about regex-y things, check out this site: http://www.regular-expressions.info
The Substring method provides us a way to extract a particular string from the original string based on a starting position and length. If only one argument is provided, it is taken to be the starting position, and the remainder of the string is outputted.
PS > "test_string".Substring(0,4)
test
PS > "test_string".Substring(4)
_string
But this is easier...
$s = 'Hello World is in here Hello World!'
$p = 'Hello World'
$s -match $p
And finally, to recurse through a directory selecting only the .txt files and searching for occurrence of "Hello World":
dir -rec -filter *.txt | Select-String 'Hello World'
Not sure if this is efficient or not, but strings in PowerShell can be referred to using array index syntax, in a similar fashion to Python.
It's not completely intuitive because of the fact the first letter is referred to by index = 0, but it does:
Allow a second index number that is longer than the string, without generating an error
Extract substrings in reverse
Extract substrings from the end of the string
Here are some examples:
PS > 'Hello World'[0..2]
Yields the result (index values included for clarity - not generated in output):
H [0]
e [1]
l [2]
Which can be made more useful by passing -join '':
PS > 'Hello World'[0..2] -join ''
Hel
There are some interesting effects you can obtain by using different indices:
Forwards
Use a first index value that is less than the second and the substring will be extracted in the forwards direction as you would expect. This time the second index value is far in excess of the string length but there is no error:
PS > 'Hello World'[3..300] -join ''
lo World
Unlike:
PS > 'Hello World'.Substring(3,300)
Exception calling "Substring" with "2" argument(s): "Index and length must refer to a location within the string.
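If you prefer to stay with Substring, one way to avoid that exception is to clamp the requested length first. This is only a sketch; the variable names are made up for illustration, and it assumes the start index itself lies within the string.

```powershell
$s      = 'Hello World'
$start  = 3
$length = 300   # deliberately longer than the string

# Clamp the requested length to what is actually left after $start
$safeLength = [Math]::Min($length, $s.Length - $start)
$sub = $s.Substring($start, $safeLength)
$sub   # lo World
```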
Backwards
If you supply a second index value that is lower than the first, the string is returned in reverse:
PS > 'Hello World'[4..0] -join ''
olleH
From End
If you use negative numbers you can refer to a position from the end of the string. To extract 'World', the last 5 letters, we use:
PS > 'Hello World'[-5..-1] -join ''
World
PS> $a = "-----start-------Hello World------end-------"
PS> $a.substring(17, 11)
or
PS> $a.Substring($a.IndexOf('H'), 11)
$a.Substring(argument1, argument2) --> Here argument1 = starting position of the desired character and argument2 = length of the substring you want as output.
Here 17 is the index of the character 'H', and since we want to print up to the end of Hello World, we provide 11 as the second argument.
Building on Matt's answer, here's one that searches across newlines and is easy to modify for your own use
$String="----start----`nHello World`n----end----"
$SearchStart="----start----`n" #Will not be included in results
$SearchEnd="`n----end----" #Will not be included in results
$String -match "(?s)$SearchStart(?<content>.*)$SearchEnd"
$result=$matches['content']
$result
--
NOTE: if you want to run this against a file keep in mind Get-Content returns an array not a single string. You can work around this by doing the following:
$String=[string]::join("`n", (Get-Content $Filename))
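On PowerShell 3.0 and later, Get-Content -Raw is another way to get the file as a single string. The sketch below writes a temporary file purely so the example is self-contained; the file contents and the \r?\n handling are assumptions for illustration.

```powershell
# Create a throwaway file so the example runs on its own
$Filename = [System.IO.Path]::GetTempFileName()
Set-Content -Path $Filename -Value '----start----', 'Hello World', '----end----'

# -Raw returns the whole file as one string, newlines included
$String = Get-Content -Path $Filename -Raw

# \r?\n tolerates both CRLF and LF line endings; .*? stops at the first end marker
if ($String -match '(?s)----start----\r?\n(?<content>.*?)\r?\n----end----') {
    $result = $Matches['content']
}
$result

Remove-Item -Path $Filename
```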
Another solution:
$template="-----start-------{Value:This is a test 123}------end-------"
$text="-----start-------Hello World------end-------"
$text | ConvertFrom-String -TemplateContent $template
Since the string is not complex, no need to add RegEx strings. A simple match will do the trick
$line = "----start----Hello World----end----"
$line -match "Hello World"
$matches[0]
Hello World
$result = $matches[0]
$result
Hello World
I needed to extract a few lines from a log file, and this post was helpful in solving my issue, so I thought of adding it here. If someone needs to extract multiple lines, you can use this script to get the index of a word in each line (I'm searching for "Root") and extract the content from that word to the end of the line.
$File_content = Get-Content "Path of the text file"
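$result = @()
foreach ($val in $File_content){
$Index_No = $val.IndexOf("Root")
# IndexOf returns -1 when "Root" is absent; skip those lines so Substring does not throw
if ($Index_No -ge 0) { $result += $val.Substring($Index_No) }
}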
$result | Select-Object -Unique
Cheers..!
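The same extraction can also be sketched with a regex instead of IndexOf and Substring. The sample lines below are hypothetical stand-ins for the file content; -cmatch is used so the comparison stays case-sensitive, as IndexOf is.

```powershell
# Hypothetical lines standing in for: Get-Content "Path of the text file"
$File_content = @('C:\Root\a', 'no match here', 'x Root\b', 'C:\Root\a')

$result = $File_content |
    ForEach-Object {
        # 'Root.*' matches from the first "Root" to the end of the line
        if ($_ -cmatch 'Root.*') { $Matches[0] }
    } |
    Select-Object -Unique

$result
```

Lines without "Root" simply produce no output, so no index check is needed.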

Resources