I'm using PowerShell, but I'm open to other potential solutions.
I have a long string. I need to replace several sequences of characters, identified by their position in that string, with a mask character (period or space). I don't know what those characters are going to be, only that they need to be replaced. I have written code that iterates through the string using Mid and position numbers, but that is a bit cumbersome, and I'm wondering if there is a faster or more elegant method.
Example:
Given the 2 strings:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
12345678901234567890123456
I want to replace characters 2-4, 9-10, 17-21, and 25 with ., yielding:
A...EFGH..KLMNOP.....VWX.Z
1...5678..123456.....234.6
I can do that with a series of Mid calls, but I wanted to know if there is some sort of faster masking function to make this happen. I have to do this for millions of rows, and seconds count.
Try this:
$regex = [regex]'(.).{3}(.{4}).{2}(.{6}).{5}(.{3}).(.+)'
$replace = '$1...$2..$3.....$4.$5'
('ABCDEFGHIJKLMNOPQRSTUVWXYZ',
'12345678901234567890123456') -Replace $regex,$replace
A...EFGH..KLMNOP.....VWX.Z
1...5678..123456.....234.6
The -replace operator is slower than string.replace() for a single operation, but has the advantage of being able to operate on an array of strings, which is faster than the string method plus a foreach loop.
Here's a sample implementation (requires V4):
$regex = [regex]'(.).{3}(.{4}).{2}(.{6}).{5}(.{3}).(.+)'
$replace = '$1...$2..$3.....$4.$5'
filter fix-file {
$_ -replace $regex,$replace |
add-content "c:\mynewfiles\$($file.name)"
}
get-childitem c:\myfiles\*.txt -PipelineVariable file |
get-content -ReadCount 1000 | fix-file
If you want to use the mask method, you can generate $regex and $replace from that:
$mask = '-...----..------.....---.-'
$regex = [regex]($mask -replace '(-+)','($1)').replace('-','.')
$replace =
([char[]]($mask -replace '-+','-') |
foreach {$i=1}{if ($_ -eq '.'){$_} else {'$'+$i++}} {}) -join ''
$regex.ToString()
$replace
(.)...(....)..(......).....(...).(.)
$1...$2..$3.....$4.$5
Here is another approach:
C:\PS> $mask ="-...----..------.....---.-"
C:\PS> ([char[]]'ABCDEFGHIJKLMNOPQRSTUVWXYZ' | % {$i=0}{if ($mask[$i++] -eq '-') {$_} else {'.'}}) -join ''
A...EFGH..KLMNOP.....VWX.Z
And if we are going to take advantage of V4 features :-), try this:
C:\PS> $i=0;([char[]]'ABCDEFGHIJKLMNOPQRSTUVWXYZ').Foreach({if ($mask[$i++] -eq '-') {$_} else {'.'}}) -join ''
Here is yet another approach:
C:\PS> $mask = "{0}...{4}{5}{6}{7}..{10}{11}{12}{13}{14}{15}.....{21}{22}{23}.{25}"
C:\PS> $singlecharstrings = [string[]][char[]]'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
C:\PS> $mask -f $singlecharstrings
A...EFGH..KLMNOP.....VWX.Z
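Since the question asks about masking by position, here is a minimal sketch that works directly from position ranges instead of a mask string or regex. The function name Set-PositionMask and its parameters are hypothetical, and the ranges are assumed to be 1-based and inclusive. I have not benchmarked it against the regex versions above.

```powershell
# Hypothetical helper: mask 1-based, inclusive position ranges in a string.
function Set-PositionMask {
    param(
        [string]$Text,
        $Ranges,                 # array of two-element arrays: @(start, end)
        [char]$MaskChar = '.'
    )
    $chars = $Text.ToCharArray()
    foreach ($range in $Ranges) {
        # Stop at the end of the string if a range runs past it
        for ($i = $range[0]; $i -le $range[1] -and $i -le $chars.Length; $i++) {
            $chars[$i - 1] = $MaskChar
        }
    }
    -join $chars
}

$ranges = @(2,4), @(9,10), @(17,21), @(25,25)
$masked = Set-PositionMask -Text 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' -Ranges $ranges
$masked   # A...EFGH..KLMNOP.....VWX.Z
```

The character array is modified in place and joined once at the end, so no intermediate strings are created per range.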
I have a text file with different entries. Using PowerShell, I want to extract from every line the word that contains a dot.
$file = "C:\Users\test_folder\test.txt"
Get-Content $file
Output:
Compass Zype.Compass 1.1.0 thisisaword
Pomodoro Logger zxch3n.PomodoroLogger 0.6.3 thisisaword
......
......
......
Bla Word Program.Name 1.1.1 this is another entry
As you can see, in all lines, the "second" "word" contains a dot, like "Program.Name".
I want to create a new file, which contains just those words, each line one word.
So my file should look something like:
Zype.Compass
zxch3n.PomodoroLogger
Program.Name
What I have tried so far:
Clear-Host
$folder = "C:\Users\test_folder"
$file = "C:\Users\test_folder\test.txt"
$content_txtfile = Get-Content $file
foreach ($line in $content_textfile)
{
if ($line -like "*.*"){
$line | Out-File "$folder\test_filtered.txt"
}
}
But my output is not what I want.
I hope you get what my problem is.
Thanks in advance! :)
Here is a solution using Select-String to find sub strings by RegEx pattern:
(Select-String -Path $file -Pattern '\w+\.\w+').Matches.Value |
Set-Content "$folder\test_filtered.txt"
You can find an explanation and the ability to experiment with the RegEx pattern at RegEx101.
Note that while the RegEx101 demo also shows matches for the version numbers, Select-String gives you only the first match per line (unless argument -AllMatches is passed).
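To illustrate the first-match behaviour, here is a small sketch that pipes a single sample line (taken from the question) into Select-String instead of reading a file. The variable names are only for illustration.

```powershell
$line = 'Compass Zype.Compass 1.1.0 thisisaword'

# Default: only the first match on the line is reported
$first = ($line | Select-String -Pattern '\w+\.\w+').Matches.Value
$first          # Zype.Compass

# With -AllMatches, the start of the version number matches as well
$all = ($line | Select-String -Pattern '\w+\.\w+' -AllMatches).Matches.Value
$all            # Zype.Compass, 1.1
```

So without -AllMatches the version numbers do not end up in the output file, as long as the dotted word comes before the version on each line.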
This looks like fixed-width fields, and if so you can reduce it to this:
Get-Content $file | # Read the file
%{ $_.Substring(29,36).Trim()} | # Extract the column
?{ $_.Contains(".") } | # Filter for values with "."
Set-Content "$folder\test_filtered.txt" # Write result
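If the columns turn out not to sit at fixed offsets, a possible alternative is to split each line on whitespace and keep the first token that contains a dot. This is only a sketch: it assumes the wanted word never contains spaces and always appears before the version number, which holds for the sample lines in the question.

```powershell
$lines = 'Compass Zype.Compass 1.1.0 thisisaword',
         'Pomodoro Logger zxch3n.PomodoroLogger 0.6.3 thisisaword',
         'Bla Word Program.Name 1.1.1 this is another entry'

$words = foreach ($line in $lines) {
    # Split on runs of whitespace, keep tokens with a dot, take the first one
    ($line -split '\s+') | Where-Object { $_ -match '\.' } | Select-Object -First 1
}
$words
# Zype.Compass
# zxch3n.PomodoroLogger
# Program.Name
```

In a real script, $lines would come from Get-Content and $words would go to Set-Content.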
Get-Content is slow, and -like is sometimes slower than -match. I prefer -match, but some prefer -like.
$filename = "c:\path\to\file.txt"
$output = "c:\path\to\output.txt"
foreach ($line in [System.IO.File]::ReadLines($filename)) {
if ($line -match "\.") {
$line | out-file $output -append
}
}
Otherwise, for a shorter option, maybe:
$filename = "c:\path\to\file.txt"
$output = "c:\path\to\output.txt"
Get-Content $filename | where {$_ -match "\."} | Out-File $output
To restrict the match to the first column, either name the column (not what you do here) or use a different search pattern.
\. matches a period anywhere in the whole line.
If the period is always at the beginning of the line, you can anchor the pattern to the start of the line:
"^\." means the first character is a period.
If the period always comes before the first tab, match anything except a tab, then a period, then anything except a tab:
"^[^\t]*\.[^\t]*" means: at the start of the line, any number of non-tab characters, then a period, then any number of non-tab characters.
I want to insert ":" after every second character - the end result should look like this 51:40:2e:c0:11:0b:3e:3c.
My solution
$s = "51402ec0110b3e3c"
for ($i = 2; $i -lt $s.Length; $i+=3)
{
$s.Insert($i,':')
}
Write-Host $s
returns
51:402ec0110b3e3c
51402:ec0110b3e3c
51402ec0:110b3e3c
51402ec0110:b3e3c
51402ec0110b3e:3c
51402ec0110b3e3c
Why is $s being returned multiple times when I only put Write-Host at the very end? Also, it looks like $s keeps returning to its original value after every loop, overwriting the previous loops...
I would've thought that the loop accomplishes the same as this:
$s = $s.Insert(2,':').Insert(5,':').Insert(8,':').Insert(11,':').Insert(14,':').Insert(17,':').Insert(20,':')
You can also do this without looping via the -replace operator:
$s = "51402ec0110b3e3c"
$s = $s -replace '..(?!$)','$0:'
A combination of -split and -join works too:
$s = "51402ec0110b3e3c"
$s = $s -split '(..)' -ne '' -join ':'
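This is because Insert returns a new string; it does not modify the original in place. Since the return value is not captured, each new string is written to the output, which is why you see one line per loop iteration, each with a single colon. So you have to change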
$s.Insert($i,':')
to
$s = $s.Insert($i,':')
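Put together, the corrected loop from the question would look roughly like this. The only change is the assignment.

```powershell
$s = "51402ec0110b3e3c"
# $s.Length is re-evaluated on every iteration, so the loop keeps going
# as the string grows by one character per inserted colon.
for ($i = 2; $i -lt $s.Length; $i += 3) {
    $s = $s.Insert($i, ':')
}
$s   # 51:40:2e:c0:11:0b:3e:3c
```

Because each result is now assigned back to $s, nothing leaks to the output inside the loop, and only the final value is printed.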
an alternate method for converting a hex string to a colon-delimited string is to use the -split operator with a pattern that specifies two consecutive characters. this ...
$s = "51402ec0110b3e3c"
($s -split '(..)').Where({$_}) -join ':'
... will give this = 51:40:2e:c0:11:0b:3e:3c.
the .Where() filters out the blank entries the split leaves behind. [grin]
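To see where those blank entries come from, here is a short sketch on a shortened sample string (the 6-character input is just an assumption to keep the output small). Because the pattern is wrapped in a capture group, -split keeps the matched pairs and puts an empty string before, between, and after them.

```powershell
$parts = '51402e' -split '(..)'
$parts.Count                       # 7: '', '51', '', '40', '', '2e', ''

# .Where() needs PowerShell 4 or later; empty strings are falsy and get dropped
$nonEmpty = $parts.Where({ $_ })
$joined = $nonEmpty -join ':'
$joined                            # 51:40:2e
```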
I have a first text file whose content looks like this: 12AB34.US. The second text file contains CD 34 EF.
I want to find out whether any part of the second text file exists in the first text file.
I tried cutting off the last 3 characters of the first text file (.US). Then I split the rest into groups of 2 characters (because the second text file consists of 2-character groups). Then I tried this code, and it always returns "Not Found".
$String = Get-Content "C:\Users\te2.txt"
$Data = Get-Content "C:\Users\Fixed.txt"
$Split = $Data -split '(..)'
$Cut = $String.Substring(0,6)
$String_Split = $Cut -split '(..)'
$String_Split
$Check= $String_Split | %{$_ -match $Split}
if ($Check-contains $true) {
Write-Host "0"
} else {
Write-Host "1"
}
There are a number of problems with your current approach.
The 2-char groups don't align:
# strings split into groups of two
'12' 'AB' '34' # first string
'CD' ' 3' '4 ' # second string
When you test multiple strings with -match, you need to
escape the input string to avoid matchings on meta characters (like .), and
place the collection on the left-hand side of the operator, the pattern on the right:
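# -split '(..)' also emits empty strings; drop them, because an empty pattern matches everything
$Compare = $String_Split | ? {$_} | % {$Split -match [regex]::Escape($_)}
# with a collection on the left, -match returns the matching elements, not $true/$false
if ($Compare) {
Write-Host "Found"
} else {
Write-Host "Not Found"
}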
For a more general solution to find out if any substring of N chars of one string is also a substring of another, you could probably do something like this instead:
$a = '12AB34.US'
$b = 'CD 34 EF'
# we want to test all substrings of length 2
$n = 2
$possibleSubstrings = 0..($n - 1) | ForEach-Object {
# grab substrings of length $n at every offset from 0 to $n - 1
$a.Substring($_) -split "($('.'*$n))" | Where-Object Length -eq $n |ForEach-Object {
# escape the substring for later use with `-match`
[regex]::Escape($_)
}
} |Sort-Object -Unique
# We can construct a single regex pattern for all possible substrings:
$pattern = $possibleSubstrings -join '|'
# And finally we test if it matches
if($b -match $pattern){
Write-Host "Found!"
}
else {
Write-Host "Not found!"
}
This approach will give you the correct answer, but it will become extremely slow on large inputs, at which point you may want to look at non-regex-based strategies like Boyer-Moore.
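As one possible non-regex strategy, here is a sketch that uses a hash set lookup (not Boyer-Moore): collect every substring of length $n from the first string, then check each substring of length $n from the second string against the set. The variable names $set and $found are only for illustration, and the comparison is case-sensitive, unlike -match.

```powershell
$a = '12AB34.US'
$b = 'CD 34 EF'
$n = 2

# Collect every substring of length $n from $a
$set = New-Object 'System.Collections.Generic.HashSet[string]'
for ($i = 0; $i -le $a.Length - $n; $i++) {
    [void]$set.Add($a.Substring($i, $n))
}

# Look up every substring of length $n from $b
$found = $false
for ($i = 0; $i -le $b.Length - $n; $i++) {
    if ($set.Contains($b.Substring($i, $n))) {
        $found = $true
        break
    }
}

if ($found) { Write-Host "Found!" } else { Write-Host "Not found!" }
```

Each lookup is a constant-time hash check on average, so the work grows roughly linearly with the combined length of both strings.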
Excel guy here who occasionally turns to automating PowerShell via VBA.
I tried to solve https://stackoverflow.com/q/36538022/641067 (now closed) and couldn't get there with my basic PowerShell knowledge and Google-fu alone.
In essence, the problem the OP presented is:
1. There is a list of names in a text file.
2. The aim is to capture only those names that occur more than once (so discard unique names, see point 3).
3. Names occurring more than once include partial matches, i.e. Will and William can be considered duplicates and should be retained, whereas Bill is not a duplicate of William.
I tried various approaches, including:
Group
Compare-Object (see example below)
But I was stymied by part (3). I suspect that a loop is required to do this, but I am curious whether there is a direct PowerShell approach.
Looking forward to hearing from the experts.
What I tried:
$a = Get-Content "c:\temp\in.txt"
$b = $a | select -unique
[regex] $a_regex = '(?i)(' + (($a | foreach {[regex]::escape($_)}) -join "|") + ')'
$c = $b -match $a_regex
Compare-Object -ReferenceObject $c -IncludeEqual $a
The following test script, which uses a loop, should work for the rules you outlined:
$t = ('first', 'will', 'william', 'williamlong', 'unique', 'lieve', 'lieven')
$s = $t | sort-object
[String[]]$r = @()
$i = 0;
while ($i -lt $s.Count - 1) {
if ($s[$i+1].StartsWith($s[$i])) {
$r += $s[$i]
$r += $s[$i+1]
}
$i++
}
$r | Sort-Object -Unique
And the following test script, which uses a regex, might get you started.
$content = "nomatch`nevenmatch1`nevenmatch12`nunevenmatch1`nunevenmatch12`nunevenmatch123"
$string = (($content.Split("`n") | Sort-Object -Unique) -join "`n")
$regex = [regex] '(?im)^(\w+)(\n\1\w+)+'
$matchdetails = $regex.Match($string)
while ($matchdetails.Success) {
$matchdetails.Value
$matchdetails = $matchdetails.NextMatch()
}
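One limitation of comparing only adjacent sorted entries is that a name can be missed when another entry sorts in between (for example will, willa, william). As a sketch under the assumption that "partial match" means one name is a prefix of another, every pair can be compared instead. This is quadratic in the number of names, so it only suits modest lists, and the variable names are just for illustration.

```powershell
$names = 'first', 'will', 'william', 'williamlong', 'unique', 'lieve', 'lieven'
$ignoreCase = [System.StringComparison]::OrdinalIgnoreCase

$kept = for ($i = 0; $i -lt $names.Count; $i++) {
    for ($j = 0; $j -lt $names.Count; $j++) {
        if ($i -eq $j) { continue }
        # Keep the name if it is a prefix of another entry, or another entry is a prefix of it
        if ($names[$i].StartsWith($names[$j], $ignoreCase) -or
            $names[$j].StartsWith($names[$i], $ignoreCase)) {
            $names[$i]
            break
        }
    }
}
$result = $kept | Sort-Object -Unique
$result   # lieve, lieven, will, william, williamlong
```

Exact duplicates are kept as well, because two identical entries are prefixes of each other.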
$FilePath = 'Z:\next\ResourcesConfiguration.config'
$oldString = 'Z:\next\Core\Resources\'
$NewString = 'G:\PublishDir\next\Core\Resources\'
Any idea how to replace a string that has a : sign in it? I want to change the path in a config file. Simple code is not working for this. I tried the following:
(Get-Content $original_file) | Foreach-Object {
$_ -replace $oldString, $NewString
} | Set-Content $destination_file
The -replace operator takes a regular expression pattern, and '\' has a special meaning in regex: it's the escape character. You need to double each backslash or, better, use the Escape method:
$_ -replace [regex]::escape($oldString), $NewString
Alternatively, you can use the String.Replace method, which takes a literal string and needs no special care:
$_.Replace($oldString,$NewString)
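Here is a small sketch comparing the two on a single line. The sample line is made up for illustration; the two path strings are the ones from the question.

```powershell
$line      = '<add path="Z:\next\Core\Resources\img.png" />'   # hypothetical config line
$oldString = 'Z:\next\Core\Resources\'
$newString = 'G:\PublishDir\next\Core\Resources\'

# Regex-based: the pattern must be escaped, the replacement can stay as-is here
$viaOperator = $line -replace [regex]::Escape($oldString), $newString

# Literal: no escaping needed
$viaMethod = $line.Replace($oldString, $newString)

$viaOperator   # <add path="G:\PublishDir\next\Core\Resources\img.png" />
$viaMethod     # <add path="G:\PublishDir\next\Core\Resources\img.png" />
```

One difference to keep in mind: -replace is case-insensitive by default, while the Replace method is case-sensitive.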
Try this:
$oldString = [REGEX]::ESCAPE('Z:\next\Core\Resources\')
You need to escape the pattern you are searching for.
This works:
$Source = 'Z:\Next\ResourceConfiguration.config'
$Dest = 'G:\PublishDir\next\ResourceConfiguration.config'
$RegEx = "Z:\\next\\Core\\Resources"
$Replace = 'G:\PublishDir\next\Core\Resources'
(Get-Content $Source) | Foreach-Object { $_ -replace $RegEx,$Replace } | Set-Content $Dest
The reason your attempt wasn't working is that -replace expects its first argument to be a regular expression. Simply put, you needed to escape the backslashes in the directory path, which is done by adding an additional backslash (\\). It expects the second argument to be a string, so no changes are needed there.