Specifying certain characters for powershell script to run on - text

This may be a complete shot in the dark; I have done some research and can't seem to find anything on this. But anything's possible with PowerShell, I guess!
I asked a question earlier here! on how to change certain characters in a script.
$infopath = Get-ChildItem "C:\Users\X\Desktop\Info\*.txt" -Recurse
$infopath | %{
(gc $_) -replace "bs", "\" -replace "fs", "/" -replace "co", ":" -replace ".name", "" | Set-Content $_.fullname
}
However there are some parts of the text file, that may contain bs, fs or co, that I don't want changing. Therefore what I would like to do, is add some sort of parameter into this existing script that only changes the text in the last 70 characters, or after the 4th # or after the 78th character.
Like I said, this could well be a ridiculous idea, but I would like to see other people's views.
Any help greatly appreciated!

Using your suggestion of getting the substring after the 78th character, I've created the following using 78 as the starting point. This leaves the first 78 characters unedited, and replaces the given strings after the 78th character. The whole file is replaced with the output.
$infopath = Get-ChildItem "C:\Users\X\Desktop\Info\*.txt" -Recurse
$startpos = 78
$infopath | %{
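# -Raw (PowerShell 3+) reads the file as one string; this assumes the file is longer than $startpos characters
$content = Get-Content $_.FullName -Raw
$edited = $content.Substring(0,$startpos)+($content.Substring($startpos) -replace "bs", "\" -replace "fs", "/" -replace "co", ":" -replace ".name", "")
Set-Content $_.FullName -Value $edited -NoNewline # -NoNewline (PowerShell 5+) avoids appending an extra line break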
}
I hope this helps.
Thanks, Chris.
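The question also mentioned changing only the text after the 4th #. A minimal sketch of that variant follows, assuming the text is handled as a single string. The function name Edit-AfterNthHash and the sample input are made up for illustration, and the dot in '\.name' is escaped so that it only matches a literal period.

```powershell
# Hypothetical helper: apply the replacements only to the text after the Nth '#'.
function Edit-AfterNthHash {
    param(
        [string]$Text,
        [int]$N = 4
    )
    # Walk forward to the position of the Nth '#'.
    $pos = -1
    for ($i = 0; $i -lt $N; $i++) {
        $pos = $Text.IndexOf('#', $pos + 1)
        if ($pos -lt 0) { return $Text }   # fewer than N '#' characters: leave the text untouched
    }
    $head = $Text.Substring(0, $pos + 1)   # everything up to and including the Nth '#'
    $tail = $Text.Substring($pos + 1)      # only this part is edited
    $head + ($tail -replace 'bs', '\' -replace 'fs', '/' -replace 'co', ':' -replace '\.name', '')
}

# The "co" and "bs" before the 4th '#' stay as they are.
$edited = Edit-AfterNthHash 'cobs#a#b#c#XbsYfsZco.name'
$edited
```

Inside the existing loop, the same function could be applied to the file content before it is passed to Set-Content.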

Related

Manipulate strings in a txt file with Powershell - Search for words in each line containing "." and save them in new txt file

I have a text file with different entries. I want to extract from each line the word that contains a dot (using PowerShell).
$file = "C:\Users\test_folder\test.txt"
Get-Content $file
Output:
Compass Zype.Compass 1.1.0 thisisaword
Pomodoro Logger zxch3n.PomodoroLogger 0.6.3 thisisaword
......
......
......
Bla Word Program.Name 1.1.1 this is another entry
As you can see, in all lines, the "second" "word" contains a dot, like "Program.Name".
I want to create a new file, which contains just those words, each line one word.
So my file should look something like:
Zype.Compass
zxch3n.PomodoroLogger
Program.Name
What I have tried so far:
Clear-Host
$folder = "C:\Users\test_folder"
$file = "C:\Users\test_folder\test.txt"
$content_txtfile = Get-Content $file
foreach ($line in $content_textfile)
{
if ($line -like "*.*"){
$line | Out-File "$folder\test_filtered.txt"
}
}
But my output is not what I want.
I hope you get what my problem is.
Thanks in advance! :)
Here is a solution using Select-String to find sub strings by RegEx pattern:
(Select-String -Path $file -Pattern '\w+\.\w+').Matches.Value |
Set-Content "$folder\test_filtered.txt"
You can find an explanation and the ability to experiment with the RegEx pattern at RegEx101.
Note that while the RegEx101 demo also shows matches for the version numbers, Select-String gives you only the first match per line (unless argument -AllMatches is passed).
This looks like fixed-width fields, and if so you can reduce it to this:
Get-Content $file | # Read the file
%{ $_.Substring(29,36).Trim()} | # Extract the column
?{ $_.Contains(".") } | # Filter for values with "."
Set-Content "$folder\test_filtered.txt" # Write result
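If the columns turn out not to be fixed-width, another option is to split each line on whitespace and keep the first token that contains a dot. This is only a sketch on the sample lines from the question. It assumes the wanted word is always the first dotted token on the line, which would break if a program name itself contained a dot.

```powershell
# Sample lines copied from the question; in practice they would come from Get-Content $file.
$lines = @(
    'Compass Zype.Compass 1.1.0 thisisaword',
    'Pomodoro Logger zxch3n.PomodoroLogger 0.6.3 thisisaword',
    'Bla Word Program.Name 1.1.1 this is another entry'
)

$ids = foreach ($line in $lines) {
    # Split on runs of whitespace and keep the first token that contains a dot.
    $line -split '\s+' | Where-Object { $_ -like '*.*' } | Select-Object -First 1
}

$ids   # could be piped to Set-Content "$folder\test_filtered.txt"
```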
Get-content is slow and -like is sometimes slower than -match. I prefer -match but some prefer -like.
$filename = "c:\path\to\file.txt"
$output = "c:\path\to\output.txt"
foreach ($line in [System.IO.File]::ReadLines($filename)) {
if ($line -match "\.") {
$line | out-file $output -append
}
}
Otherwise for a shorter option, maybe
$filename = "c:\path\to\file.txt"
$output = "c:\path\to\output.txt"
Get-Content $filename | Where-Object { $_ -match "\." } | Out-File $output
To match only in the first column, either name the column (which you are not doing here) or use different search criteria.
\. matches a period anywhere in the whole line.
If the period is always at the beginning, you can anchor the pattern to the start of the line:
"^\." which means the first character is a period.
If the period always comes before the first tab, match anything except a tab, then a period, then anything except a tab:
"^[^\t]*\.[^\t]*" which means: at the start of the line, any number of non-tab characters, then a period, then any number of non-tab characters.

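To make the difference between those three patterns concrete, here is a small sketch that runs them against made-up tab-separated strings. The sample strings and variable names are assumptions for illustration only.

```powershell
$anywhere    = '\.'          # a period anywhere in the line
$firstChar   = '^\.'         # the first character is a period
$firstColumn = '^[^\t]*\.'   # a period before the first tab

$dotInSecondColumn = "Compass`tZype.Compass`t1.1.0"
$dotInFirstColumn  = "Zype.Compass`t1.1.0"
$leadingDot        = ".hidden`tvalue"

$r1 = $dotInSecondColumn -match $anywhere      # True
$r2 = $dotInSecondColumn -match $firstColumn   # False: the first column has no period
$r3 = $dotInFirstColumn  -match $firstColumn   # True
$r4 = $dotInFirstColumn  -match $firstChar     # False
$r5 = $leadingDot        -match $firstChar     # True
$r1, $r2, $r3, $r4, $r5
```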
Powershell: Replace string in File1 based on string in File2

I am being forced to use Powershell because of my work. I have used it to do a couple of things but one of my codes is now trash because I have to update a string in a file to include a year that is in a second file. Here is what I'm working with:
File1: Contains a few strings, but among them there are 48 strings that say:
Jenga_Sequence-XXXX.consensus_Bob_0.6_quality_20
The main point of the string is Sequence-XXXX, sorry for the random place holders.
File2: is a table that has the strings:
John/USA/Sequence-XXXX/Year
I need to replace the strings in File1 with the corresponding Strings in File2.
Sample Text of File1:
Jenga_Sequence-0001.consensus_Bob_0.6_quality_20
AAAAAAAAAAAAAAAAAAAAAAAAA
Jenga_Sequence-0002.consensus_Bob_0.6_quality_20
aaaaaaaaaaaaaaaaaaaaaaaaa
Jenga_Sequence-0003.consensus_Bob_0.6_quality_20
bbbbbbbbbbbbbbbbbbbbbbbbb
Jenga_Sequence-0004.consensus_Bob_0.6_quality_20
BBBBBBBBBBBBBBBBBBBBBBBBB
Jenga_Sequence-0005.consensus_Bob_0.6_quality_20
QQQQQQQQQQQQQQQQQQQQQ
Sample Table of File2:
|Sequence_ID|Date|
|---------------------------|----------|
|John/USA/Sequence-0003/2020|10/11/2020|
|John/USA/Sequence-0001/2021|1/5/2021|
|John/USA/Sequence-0005/2021|1/10/2021|
|John/USA/Sequence-0004/2020|12/23/2020|
|John/USA/Sequence-0002/2021|1/6/2021|
So, I need a Powershell code that replaces
Jenga_Sequence-0001.consensus_Bob_0.6_quality_20 with John/USA/Sequence-0001/2021,
Jenga_Sequence-0002.consensus_Bob_0.6_quality_20 with John/USA/Sequence-0002/2021,
Jenga_Sequence-0003.consensus_Bob_0.6_quality_20 with John/USA/Sequence-0003/2020, and so on. There are typically 48 of these in a file.
My previous code simply replaced "Jenga_" with "John/USA/" and ".consensus_Bob_0.6_quality_20" with "/2020", but now that we are seeing "/2021" the static code will not work.
I am still open to replacing pieces of the string and having a code that sets the year replacement to the correct year.
That was the angle I was doing a broad search on but I could never find anything specific enough to help.
Any help will be appreciated!
EDIT: Here is the part of my previous code that dealt with the finding and replacing, even though I feel it needs to be trashed:
$filePath = 'Jenga_Combined.txt'
$tempFilePath = "$env:TEMP\$($filePath | Split-Path -Leaf)"
$find = 'Jenga_'
$replace = 'John/USA/'
$find2 = '.consensus_Bob_0.6_quality_20'
$replace2 = '/2020'
(Get-Content -Path $filePath) -replace $find, $replace -replace $find2, $replace2 | Add-Content -Path $tempFilePath
Remove-Item -Path $filePath
Move-Item -Path $tempFilePath -Destination $filePath
EDIT2: The "Real Data" from file2. File2 is a Tab Delimited .txt file which makes it not "look great" when copy and pasting. Hopefully this helps. File1 is exactly like above (although the AAAAA stuff is roughly 30,000 letters long)
Sequence_ID date
John/USA/Sequence-0003/2020 2020-10-11
John/USA/Sequence-0001/2021 2021-01-05
John/USA/Sequence-0005/2021 2021-01-10
John/USA/Sequence-0004/2020 2020-12-23
John/USA/Sequence-0002/2021 2021-01-06
Dan
The common factor here is the Sequence_ID number in both files.
You can do this like:
$csvData = Import-Csv -Path 'D:\Test\File2.txt' -Delimiter "`t"
$result = switch -Regex -File 'D:\Test\Jenga_Combined.txt' {
'^Jenga_Sequence-(\d+).*' {
$replace = $csvData | Where-Object { $_.Sequence_ID -like "*Sequence-$($matches[1])*" }
if (!$replace) { Write-Warning "No corresponding Sequence_ID $($matches[1]) found!"; $_ }
else { $replace.Sequence_ID }
}
default { $_ }
}
# output on screen
$result
# output to new file
$result | Set-Content -Path 'D:\Test\Jenga_Combined_NEW.txt' -Force
Output on screen:
John/USA/Sequence-0001/2021
AAAAAAAAAAAAAAAAAAAAAAAAA
John/USA/Sequence-0002/2021
aaaaaaaaaaaaaaaaaaaaaaaaa
John/USA/Sequence-0003/2020
bbbbbbbbbbbbbbbbbbbbbbbbb
John/USA/Sequence-0004/2020
BBBBBBBBBBBBBBBBBBBBBBBBB
John/USA/Sequence-0005/2021
QQQQQQQQQQQQQQQQQQQQQ
Of course, you need to change the file paths to match your environment.
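With 48 headers per file, filtering the table once per header is fine. If the table were much larger, one possible variation is to build a hashtable keyed on the sequence number first, so that each header becomes a single lookup. The sketch below uses in-memory sample rows instead of Import-Csv so that it runs on its own. The names $lookup and $lines are illustrative, and Sequence-0009 is an invented ID that shows the no-match case.

```powershell
# Sample rows standing in for: Import-Csv -Path 'D:\Test\File2.txt' -Delimiter "`t"
$csvData = @(
    [pscustomobject]@{ Sequence_ID = 'John/USA/Sequence-0003/2020'; date = '2020-10-11' },
    [pscustomobject]@{ Sequence_ID = 'John/USA/Sequence-0001/2021'; date = '2021-01-05' }
)

# Build the lookup once: '0003' -> 'John/USA/Sequence-0003/2020'
$lookup = @{}
foreach ($row in $csvData) {
    if ($row.Sequence_ID -match 'Sequence-(\d+)') {
        $lookup[$Matches[1]] = $row.Sequence_ID
    }
}

# Sample lines standing in for the content of Jenga_Combined.txt
$lines = @(
    'Jenga_Sequence-0001.consensus_Bob_0.6_quality_20',
    'AAAAAAAAAAAAAAAAAAAAAAAAA',
    'Jenga_Sequence-0009.consensus_Bob_0.6_quality_20'
)

$result = foreach ($line in $lines) {
    if ($line -match '^Jenga_Sequence-(\d+)' -and $lookup.ContainsKey($Matches[1])) {
        $lookup[$Matches[1]]   # header with a matching Sequence_ID
    } else {
        $line                  # sequence data, or a header that is not in the table
    }
}
$result
```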

Excluding lines, which are not containing one or multiple strings from text file

I have multiple server log files. In total they contain around 500.000 lines of log text. I only want to keep the lines that contain "Downloaded" and "Log". Lines I want to exclude are focussing on error logs and basic system operations like "client startup", "client restart" and so on.
An example of the lines we are looking for is this one:
[22:29:05]: Downloaded 39 /SYSTEM/SAP logs from System-4, customer (000;838) from 21:28:51,705 to 21:29:04,671
The lines that are to be kept should be complemented by the date string, which is part of the log-file name. ($date)
Further, as the received logs are rather unstructured, the filtered files should be transformed into one CSV file (columns: timestamp, log downloads, system directory, system type, customer, start time, end time, date [to be added to every line from the file name]). The replace operation that turns spaces into commas is just a first attempt to bring some structure to the data. This file is supposed to be loaded into a Python dashboard program.
At the moment it takes 2.5 minutes to preprocess 3 text files, while the target is 5-10 seconds at most, if that is even possible.
Thank you very much for your support, as I have been struggling with this since Monday last week. Maybe PowerShell is not the best way to go? I'm open to any help!
At the moment I'm running this powershell script:
$files = Get-ChildItem "C:\Users\AnonUser\RestLogs\*" -Include *.log
New-Item C:\Users\AnonUser\RestLogs\CleanedLogs.txt -ItemType file
foreach ($f in $files){
$date = $f.BaseName.Substring(22,8)
(Get-Content $f) | Where-Object { ($_ -match 'Downloaded' -and $_ -match 'SAP')} | ForEach-Object {$_ -replace " ", ","}{$_+ ','+ $date} | Add-Content CleanedLogs.txt
}
This is about the fastest I could manage. I didn't test using -split vs -replace or special .NET methods:
$files = Get-ChildItem "C:\Users\AnonUser\RestLogs\*" -Include *.log
New-Item C:\Users\AnonUser\RestLogs\CleanedLogs.txt -ItemType file
foreach ($f in $files) {
$date = $f.BaseName.Substring(22,8)
(((Get-Content $f) -match "Downloaded.*?SAP") -replace " ",",") -replace '$',",$date" | add-content CleanedLogs.txt
}
In general, speed is gained by removing loops and Where-Object "filtering."
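Replacing every space with a comma also splits values that already contain commas, such as 21:28:51,705. As a rough sketch of a more structured alternative, a single regex with named groups can pull the fields out of each matching line. The pattern, the property names, and the way the columns from the question map onto the groups are all assumptions based on the one sample line shown above, and the date value is a placeholder.

```powershell
# Assumed pattern for the "Downloaded ... logs" lines; the group names are illustrative.
$pattern = '^\[(?<Timestamp>[\d:]+)\]: Downloaded (?<Downloads>\d+) (?<Directory>\S+) logs from (?<System>[^,]+), customer \((?<Customer>[^)]*)\) from (?<StartTime>\S+) to (?<EndTime>\S+)'

$date = '20210301'   # placeholder; the original script cuts this out of the file name
$line = '[22:29:05]: Downloaded 39 /SYSTEM/SAP logs from System-4, customer (000;838) from 21:28:51,705 to 21:29:04,671'

$row = if ($line -match $pattern) {
    [pscustomobject]@{
        Timestamp = $Matches['Timestamp']
        Downloads = $Matches['Downloads']
        Directory = $Matches['Directory']
        System    = $Matches['System']
        Customer  = $Matches['Customer']
        StartTime = $Matches['StartTime']
        EndTime   = $Matches['EndTime']
        Date      = $date
    }
}

# ConvertTo-Csv quotes the values, so the commas inside the times do not break the columns.
$row | ConvertTo-Csv -NoTypeInformation
```

Lines that do not match the pattern produce no row, so the same test doubles as the filter.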

powershell - remove string containing line breaks and spaces

I have a script running in powershell (v2), that removes strings from a file.
The basic process is:
(Get-Content $Local_Dir1\$filename1) -replace 'longString', 'shortString' | `
Set-Content $cfg_Local_Dir\$filename1
Get-Content $Local_Dir1\$filename1 | `
Where-Object {$_ -notmatch 'stringToMatch'} | `
Where-Object {$_ -notmatch 'secondStringToMatch'} | `
Set-Content $Local_Dir1\$filename
This works fine. However, I have an annoying string that I can't get rid of.
It basically consists of: a line break and carriage return, 4 spaces, and then a line break and carriage return. In HEX it is 0D 0A 20 20 20 20 0D 0A
How can I remove this?
I tried simply:
Where-Object {$_ -notmatch '    '} #4 x spaces
But that removed all content after that line (and this is on the second line).
I looked at:
Where-Object {$_ -notmatch '$([char]0x0D)'}
(I would have expanded it if it had removed all the Carriage Returns) which I saw in another post somewhere, but that did nothing.
What is the correct way of dealing with this problem?
Additional: 2015-11-24 13:49
Example Data:
<?xml version="1.0" encoding="UTF-8"?>
<start_of_data>
<job>123456</job>
<name>ABC123</name>
<start></start>
</start_of_data>
<start_of_data>
<job>789012</job>
<name>DEF345</name>
<start></start>
</start_of_data>
Initially there is a string on line 2 which is removed by 'stringToMatch', and the spaces are on line 3.
Couple of things worth pointing out here. When you use -match/-notmatch you are using regex. We can consolidate your strings and space issue into one string.
Get-Content $Local_Dir1\$filename1 |
Where-Object {$_ -notmatch 'stringToMatch|secondStringToMatch|\s{4,}'} |
Set-Content $Local_Dir1\$filename
That works by using alternation to match any of the elements separated by pipes. This is by no means perfect, as we don't have sample data to work with, but if you have lines with either of those two strings or at least 4 consecutive spaces, they will be omitted.
From talking in the comments and looking at the example file you are just trying to omit lines that are blank. Using another string class or regex could fix that. These lines function differently but would both ignore lines that are just white-space.
![string]::IsNullOrWhiteSpace($_)
$_ -notmatch '^\s+$'
I would opt for the former as it is more intuitive.
Where-Object {![string]::IsNullOrWhiteSpace($_) -and $_ -notmatch 'stringToMatch|secondStringToMatch'}
Like I said in the comments, if you are strict about this requirement you could filter out lines with exactly 4 white-space characters with -notmatch '^\s{4}$'
Also like sodawillow says you should have used double quotes to allow variable expansion. Since you are using regex \r would have worked just as well.
Where-Object {$_ -notmatch "$([char]0x0D)"}
However I don't think you would have seen that character anyway in order to exclude it. Get-Content would scrub that out to make a string array. That might depend on encoding.
Try .Net String class:
Where-Object {-not[string]::IsNullOrEmpty(([string]$_).trim())}
Trim will remove spaces and IsNullOrEmpty will check the rest.
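As a quick check of the Trim approach, here is a sketch on an in-memory array that mimics the example data, with a blank line, a line of 4 spaces, and a line holding only a tab. The array content is made up. One point that may matter for PowerShell v2: [string]::IsNullOrWhiteSpace was added in .NET Framework 4, so it may not be available in a v2 session, while Trim plus IsNullOrEmpty does not have that dependency.

```powershell
# Made-up sample: two real lines surrounded by blank and whitespace-only lines.
$lines = @('<start_of_data>', '    ', '', '<job>123456</job>', "`t", '</start_of_data>')

# Trim first, then test for empty.
$kept = @($lines | Where-Object { -not [string]::IsNullOrEmpty(([string]$_).Trim()) })

# Regex form with the same effect: keep lines that contain at least one non-whitespace character.
$keptRegex = @($lines | Where-Object { $_ -match '\S' })

$kept
```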

Replacing a set of strings containing any character

Is there a way to replace a set of strings no matter what the strings contain?
I am trying to replace one string containing: quotes(""), brackets([]), #, e.
gci C:\test *.txt -recurse | ForEach {(Get-Content $_ | ForEach {$_ -replace '"my"', "money"}) | Set-Content $_ }
but what if a string I want to replace has EVERYTHING in <>:
PowerPlayReport Product_version="10.2.6100.36" xmlns="http://www.cognos.com/powerplay/report[1234#1]" Author="PPWIN" Version="4.0"
So you want to replace everything in [] in the sample text you included in your question. In case you were not aware (although I think you are now), -replace supports regular expressions. A simple regex can find the text you are looking for. I am also going to remove some of the redundancy in your code.
Get-ChildItem C:\test -Filter *.txt -Recurse | ForEach-Object{
$file = $_.FullName
(Get-Content $file) -replace "\[.*?]","[bagel]" | Set-Content $file
}
Explanation borrowed from regex101.com
\[ matches the character [ literally
.*? matches any character (except newline). Quantifier: *? Between zero and unlimited times, as few times as possible, expanding as needed [lazy]
] matches the character ] literally
So that line would then appear as the following inside the source file.
PowerPlayReport Product_version="10.2.6100.36" xmlns="http://www.cognos.com/powerplay/report[bagel]" Author="PPWIN" Version="4.0"
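For the more general case in the title, where a known string should be replaced literally no matter which special characters it contains, [regex]::Escape can turn it into a safe pattern for -replace. A minimal sketch follows, using a shortened version of the sample line. The variable names and the replacement text are only for illustration.

```powershell
$line    = 'PowerPlayReport xmlns="http://www.cognos.com/powerplay/report[1234#1]" Author="PPWIN"'
$literal = 'report[1234#1]"'            # contains [, #, ] and a quote

# Escape the regex metacharacters so the string is matched character for character.
$pattern  = [regex]::Escape($literal)
$viaRegex = $line -replace $pattern, 'report[bagel]"'

# The .Replace() string method treats its argument literally (and is case-sensitive).
$viaMethod = $line.Replace($literal, 'report[bagel]"')

$viaRegex
```

Note that a $ in the replacement text of -replace still has a special meaning, so the method form can be the simpler choice when the replacement contains one.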
