PowerShell script to look for a particular word in a file and add "4" at the beginning of the line

I am the best. Page
I am good. Page
I am funny. Page
Output:
4 I am the best. Page
4 I am good. Page
4 I am funny. Page
The PowerShell script needs to look for "page" and add "4" at the beginning of the line. I have created this script:
powershell -Command "sed 's/^Page/4 &/' c:\users*.txt >test.txt"
but it didn't work in PowerShell.

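This ought to do it (note that it prefixes every non-empty line with "4 ", whether or not the line contains "page"):
$content = Get-Content "C:\path\to\my\file.txt"
$newcontent = $null
Foreach($line in $content)
{
    if($line -ne "")
    {
        # Prefix the line and re-append the line break
        $line = "4 "+"$line`r`n"
        $newcontent += $line
    }
    else
    {
        # Keep empty lines as they are
        $newcontent += "`r`n"
    }
}
Set-Content -Path "C:\path\to\my\file.txt" -Value $newcontent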

powershell -command "foreach($ln in cat 'c:\users*.txt'){if($ln -match 'page'){write-host '4'$ln}}"
or
powershell -command "foreach($ln in cat 'c:\users*.txt'){if($ln -match 'page'){write-host '4'$ln}else{echo $ln}}"
depending on whether you only want to output lines with "page" in them.
Note also that PowerShell does not have a built-in alias for sed, and your /^Page/ would only have matched "Page" at the beginning of a line in any case.

sed is a Unix command-line tool that isn't commonly installed on Windows (although there are Windows ports of it).
The PowerShell way of doing what you're asking is
(Get-Content 'c:\users\*.txt') -replace '.*page','4 $&' | Set-Content 'test.txt'
or (using aliases and redirection for reduced typing):
(cat 'c:\users\*.txt') -replace '.*page','4 $&' > 'test.txt'
If you want to update each file separately (note: that is NOT what your Unix code snippet does) you'd do something like this:
Get-ChildItem 'C:\users\*.txt' | ForEach-Object {
    (Get-Content $_.FullName) -replace '.*page','4 $&' | Set-Content $_.FullName
}
or (again using aliases; the sc alias for Set-Content exists only in Windows PowerShell, not in PowerShell 6 and later):
ls 'c:\users\*.txt' | %{(cat $_.FullName) -replace '.*page','4 $&' | sc $_.FullName}
Note that you cannot use redirection in this case, because the redirection operator would open the file for writing before cat could read it, which would effectively truncate the file.
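For anyone who wants to try the idea without touching any files, here is a minimal sketch on an in-memory array. The sample lines and the variable names $lines and $result are made up for illustration, and the lookahead pattern is just one possible way to prefix only the lines that contain "page"; it is not the pattern used in the answers above.

```powershell
# Hypothetical sample lines held in memory instead of a file
$lines = 'I am the best. Page', 'No match here', 'I am funny. Page'

# -replace is case-insensitive by default.
# ^(?=.*page) matches the (empty) start of any line that contains "page"
# somewhere, so the replacement text is inserted in front of that line.
$result = $lines -replace '^(?=.*page)', '4 '

$result
```

Lines without "page" pass through unchanged, so the same expression can be applied to the output of Get-Content.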

Related

Manipulate strings in a txt file with Powershell - Search for words in each line containing "." and save them in new txt file

I have a text file with different entries. I want to manipulate it to filter out always the word, containing a dot (using Powershell)
$file = "C:\Users\test_folder\test.txt"
Get-Content $file
Output:
Compass Zype.Compass 1.1.0 thisisaword
Pomodoro Logger zxch3n.PomodoroLogger 0.6.3 thisisaword
......
......
......
Bla Word Program.Name 1.1.1 this is another entry
As you can see, in all lines, the "second" "word" contains a dot, like "Program.Name".
I want to create a new file, which contains just those words, each line one word.
So my file should look something like:
Zype.Compass
zxch3n.PomodoroLogger
Program.Name
What I have tried so far:
Clear-Host
$folder = "C:\Users\test_folder"
$file = "C:\Users\test_folder\test.txt"
$content_txtfile = Get-Content $file
foreach ($line in $content_textfile)
{
if ($line -like "*.*"){
$line | Out-File "$folder\test_filtered.txt"
}
}
But my output is not what I want.
I hope you get what my problem is.
Thanks in advance! :)
Here is a solution using Select-String to find sub strings by RegEx pattern:
(Select-String -Path $file -Pattern '\w+\.\w+').Matches.Value |
Set-Content "$folder\test_filtered.txt"
You can find an explanation and the ability to experiment with the RegEx pattern at RegEx101.
Note that while the RegEx101 demo also shows matches for the version numbers, Select-String gives you only the first match per line (unless argument -AllMatches is passed).
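To see the difference that -AllMatches makes, here is a small sketch that pipes two of the sample lines from the question into Select-String instead of reading a file. The variable names $lines, $first and $all are only for illustration.

```powershell
# Two sample lines from the question, held in memory
$lines = @(
    'Compass Zype.Compass 1.1.0 thisisaword'
    'Pomodoro Logger zxch3n.PomodoroLogger 0.6.3 thisisaword'
)

# Default behaviour: only the first match on each line is reported
$first = @(($lines | Select-String -Pattern '\w+\.\w+').Matches.Value)

# With -AllMatches, the start of each version number is reported as well
$all = @(($lines | Select-String -Pattern '\w+\.\w+' -AllMatches).Matches.Value)

"First match per line: $($first -join ', ')"
"All matches:          $($all -join ', ')"
```

Since the wanted word comes before the version number on every line, the default behaviour is what the question needs.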
This looks like fixed-width fields, and if so you can reduce it to this:
Get-Content $file | # Read the file
%{ $_.Substring(29,36).Trim()} | # Extract the column
?{ $_.Contains(".") } | # Filter for values with "."
Set-Content "$folder\test_filtered.txt" # Write result
Get-Content is slow, and -like is sometimes slower than -match. I prefer -match, but some prefer -like.
$filename = "c:\path\to\file.txt"
$output = "c:\path\to\output.txt"
foreach ($line in [System.IO.File]::ReadLines($filename)) {
if ($line -match "\.") {
$line | out-file $output -append
}
}
Otherwise, for a shorter option, maybe
$filename = "c:\path\to\file.txt"
$output = "c:\path\to\output.txt"
Get-Content $filename | where {$_ -match "\."} | Out-File $output
To match only within the first column, either parse the file into named columns (which is not what you do here) or use a different search pattern:
"\." matches a period anywhere in the whole line.
"^\." matches only when the first character of the line is a period.
"^[^\t]*\.[^\t]*" matches when a period appears before the first tab: from the start of the line, any number of non-tab characters, then a period, then any number of non-tab characters.
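A quick way to compare the three patterns is to run them against a few tab-delimited strings. The sample strings below are invented for this sketch and are not taken from the question's file.

```powershell
# Hypothetical tab-delimited samples: the first column is before the tab
$samples = "Zype.Compass`t1.1.0", ".hidden`tvalue", "NoDot`t1.0"

# A period anywhere in the line
$anywhere = @($samples | Where-Object { $_ -match '\.' })

# A period as the very first character
$firstChar = @($samples | Where-Object { $_ -match '^\.' })

# A period somewhere before the first tab (that is, in the first column)
$firstColumn = @($samples | Where-Object { $_ -match '^[^\t]*\.[^\t]*' })

"Anywhere:     $($anywhere.Count) line(s)"
"First char:   $($firstChar.Count) line(s)"
"First column: $($firstColumn.Count) line(s)"
```

The third sample has a period only after the tab, so it is caught by the first pattern but not by the last one.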

Powershell: Replace string in File1 based on string in File2

I am being forced to use Powershell because of my work. I have used it to do a couple of things but one of my codes is now trash because I have to update a string in a file to include a year that is in a second file. Here is what I'm working with:
File1: Contains a few strings, among them 48 strings that say:
Jenga_Sequence-XXXX.consensus_Bob_0.6_quality_20
The main point of the string is Sequence-XXXX, sorry for the random place holders.
File2: is a table that has the strings:
John/USA/Sequence-XXXX/Year
I need to replace the strings in File1 with the corresponding Strings in File2.
Sample Text of File1:
Jenga_Sequence-0001.consensus_Bob_0.6_quality_20
AAAAAAAAAAAAAAAAAAAAAAAAA
Jenga_Sequence-0002.consensus_Bob_0.6_quality_20
aaaaaaaaaaaaaaaaaaaaaaaaa
Jenga_Sequence-0003.consensus_Bob_0.6_quality_20
bbbbbbbbbbbbbbbbbbbbbbbbb
Jenga_Sequence-0004.consensus_Bob_0.6_quality_20
BBBBBBBBBBBBBBBBBBBBBBBBB
Jenga_Sequence-0005.consensus_Bob_0.6_quality_20
QQQQQQQQQQQQQQQQQQQQQ
Sample Table of File2:
|Sequence_ID|Date|
|---------------------------|----------|
|John/USA/Sequence-0003/2020|10/11/2020|
|John/USA/Sequence-0001/2021|1/5/2021|
|John/USA/Sequence-0005/2021|1/10/2021|
|John/USA/Sequence-0004/2020|12/23/2020|
|John/USA/Sequence-0002/2021|1/6/2021|
So, I need a Powershell code that replaces
Jenga_Sequence-0001.consensus_Bob_0.6_quality_20 with John/USA/Sequence-0001/2021,
Jenga_Sequence-0002.consensus_Bob_0.6_quality_20 with John/USA/Sequence-0002/2021,
Jenga_Sequence-0003.consensus_Bob_0.6_quality_20 with John/USA/Sequence-0003/2020, and so on. There are typically 48 of these in a file.
My previous code simply replaced "Jenga_" with "John/USA/" and ".consensus_Bob_0.6_quality_20" with "/2020", but now that we are seeing "/2021" the static code will not work.
I am still open to replacing pieces of the string and having a code that sets the year replacement to the correct year.
That was the angle I was doing a broad search on but I could never find anything specific enough to help.
Any help will be appreciated!
EDIT: Here is the part of my previous code that dealt with the finding and replacing, even though I feel it needs to be trashed:
$filePath = 'Jenga_Combined.txt'
$tempFilePath = "$env:TEMP\$($filePath | Split-Path -Leaf)"
$find = 'Jenga_'
$replace = 'John/USA/'
$find2 = '.consensus_Bob_0.6_quality_20'
$replace2 = '/2020'
(Get-Content -Path $filePath) -replace $find, $replace -replace $find2, $replace2 | Add-Content -Path $tempFilePath
Remove-Item -Path $filePath
Move-Item -Path $tempFilePath -Destination $filePath
EDIT2: The "Real Data" from file2. File2 is a Tab Delimited .txt file which makes it not "look great" when copy and pasting. Hopefully this helps. File1 is exactly like above (although the AAAAA stuff is roughly 30,000 letters long)
Sequence_ID date
John/USA/Sequence-0003/2020 2020-10-11
John/USA/Sequence-0001/2021 2021-01-05
John/USA/Sequence-0005/2021 2021-01-10
John/USA/Sequence-0004/2020 2020-12-23
John/USA/Sequence-0002/2021 2021-01-06
Dan
The common factor here is the Sequence_ID number in both files.
You can do this like:
$csvData = Import-Csv -Path 'D:\Test\File2.txt' -Delimiter "`t"
$result = switch -Regex -File 'D:\Test\Jenga_Combined.txt' {
'^Jenga_Sequence-(\d+).*' {
$replace = $csvData | Where-Object { $_.Sequence_ID -like "*Sequence-$($matches[1])*" }
if (!$replace) { Write-Warning "No corresponding Sequence_ID $($matches[1]) found!"; $_ }
else { $replace.Sequence_ID }
}
default { $_ }
}
# output on screen
$result
# output to new file
$result | Set-Content -Path 'D:\Test\Jenga_Combined_NEW.txt' -Force
Output on screen:
John/USA/Sequence-0001/2021
AAAAAAAAAAAAAAAAAAAAAAAAA
John/USA/Sequence-0002/2021
aaaaaaaaaaaaaaaaaaaaaaaaa
John/USA/Sequence-0003/2020
bbbbbbbbbbbbbbbbbbbbbbbbb
John/USA/Sequence-0004/2020
BBBBBBBBBBBBBBBBBBBBBBBBB
John/USA/Sequence-0005/2021
QQQQQQQQQQQQQQQQQQQQQ
Of course, you need to change the file paths to match your environment
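If File2 grows large, filtering $csvData with Where-Object for every header line can become slow. A possible variation, sketched here under the assumption that each sequence number occurs only once in File2, is to build a hashtable keyed by the sequence number first. The in-memory sample data and the names $lookup and $file1 are stand-ins for illustration; in practice the data would come from Import-Csv and Get-Content as in the answer above.

```powershell
# Stand-in for File2 (tab-delimited); normally: Import-Csv -Delimiter "`t"
$csvData = @"
Sequence_ID`tdate
John/USA/Sequence-0003/2020`t2020-10-11
John/USA/Sequence-0001/2021`t2021-01-05
"@ | ConvertFrom-Csv -Delimiter "`t"

# Build the lookup once: '0001' -> 'John/USA/Sequence-0001/2021'
$lookup = @{}
foreach ($row in $csvData) {
    if ($row.Sequence_ID -match 'Sequence-(\d+)') {
        $lookup[$Matches[1]] = $row.Sequence_ID
    }
}

# Stand-in for File1; normally read with Get-Content or switch -File
$file1 = @(
    'Jenga_Sequence-0001.consensus_Bob_0.6_quality_20'
    'AAAAAAAAAAAAAAAAAAAAAAAAA'
    'Jenga_Sequence-0003.consensus_Bob_0.6_quality_20'
    'bbbbbbbbbbbbbbbbbbbbbbbbb'
)

# Replace header lines that have a known sequence number, keep all others
$result = foreach ($line in $file1) {
    if ($line -match '^Jenga_Sequence-(\d+)' -and $lookup.ContainsKey($Matches[1])) {
        $lookup[$Matches[1]]
    }
    else {
        $line
    }
}

$result
```

With only 48 headers per file the difference is unlikely to matter, so this is mainly worth considering for much larger inputs.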

Powershell - Optimizing a very, very large csv and text file search and replace

I have a directory with ~ 3000 text files in it, and I'm doing periodic search and replaces on those text files as I transition a program to a new server.
Each text file may have an average of ~3000 lines, and I need to search the files for maybe 300 - 1000 terms at a time.
I'm replacing the server prefix which is related to the string I'm searching for. So for every one of the CSV entries, I'm looking for "Search_String" and "\\Old_Server\Search_String", and making sure that after the program completes, the result is "\\New_Server\Search_String".
I cobbled together a powershell program, and it works. But it's so slow I've never seen it complete.
Any suggestions for making it faster?
EDIT 1:
I changed get-content as suggested, but it still took 3 minutes to search two files (~8000 lines) for 9 separate search terms. I must still be screwing up; a notepad++ search and replace would still be way faster if done manually 9 times.
I'm not sure how to get rid of the first (Get-Content) because I want to make a copy of the file for backup before I make any changes to it.
EDIT 2:
So this is an order of magnitude faster; it's searching a file in maybe 10 seconds. But now it doesn't write changes to files, and it only searches the first file in the directory! I didn't change that code, so I don't know why it broke.
EDIT 3:
Success! I adapted a solution posted below to make it much, much faster. It's searching each file in a couple of seconds now. I may reverse the loop order, so that it loads the file into the array and then searches and replaces each entry in the CSV rather than the other way around. I'll post that if I get it to work.
Final script is below for reference.
#get input from the user
$old = Read-Host 'Enter the old cimplicity qualifier (F24, IRF3 etc)'
$new = Read-Host 'Enter the new cimplicity qualifier (CB3, F24_2 etc)'
$DirName = Get-Date -format "yyyy_MM_dd_hh_mm"
New-Item -ItemType directory -Path $DirName -force
New-Item "$DirName\log.txt" -ItemType file -force -Value "`nMatched CTX files on $dirname`n"
$logfile = "$DirName\log.txt"
$VerbosePreference = "SilentlyContinue"
$points = import-csv SearchAndReplace.csv -header find #Import CSV File
#$ctxfiles = Get-ChildItem . -include *.ctx | select -expand fullname #Import local directory of CTX Files
$points | foreach-object { #For each row of points in the CSV file
$findvar = $_.find #Store column 1 as string to search for
$OldQualifiedPoint = "\\\\"+$old+"\\" + $findvar #Escape each individual backslash so it's not read as a regex metacharacter
$NewQualifiedPoint = "\\"+$new+"\" + $findvar #escape slashes are NOT required on the new string
$DuplicateNew = "\\\\" + $new + "\\" + "\\\\" + $new + "\\"
$QualifiedNew = "\\" + $new + "\"
dir . *.ctx | #Grab all CTX Files
select -expand fullname | #grab all of those file names and...
foreach {#iterate through each file
$DateTime = Get-Date -Format "hh:mm:ss"
$FileName = $_
Write-Host "$DateTime - $FindVar - Checking $FileName"
$FileCopied = 0
#Check file contents, and copy matching files to newly created directory
If (Select-String -Path $_ -Pattern $findvar -Quiet ) {
If (!($FileCopied)) {
Copy $FileName -Destination $DirName
$FileCopied = 1
Add-Content $logfile "`n$DateTime - Found $Findvar in $filename"
Write-Host "$DateTime - Found $Findvar in $filename"
}
$FileContent = Get-Content $Filename -ReadCount 0
$FileContent =
$FileContent -replace $OldQualifiedPoint,$NewQualifiedPoint -replace $findvar,$NewQualifiedPoint -replace $DuplicateNew,$QualifiedNew
$FileContent | Set-Content $FileName
}
}
}
If I'm reading this correctly, you should be able to read a 3000 line file into memory, and do those replaces as an array operation, eliminating the need to iterate through each line. You can also chain those replace operations into a single command.
dir . *.ctx | #Grab all CTX Files
select -expand fullname | #grab all of those file names and...
foreach {#iterate through each file
$DateTime = Get-Date -Format "hh:mm:ss"
$FileName = $_
Write-Host "$DateTime - $FindVar - Checking $FileName"
#Check file contents, and copy matching files to newly created directory
If (Select-String -Path $_ -Pattern $findvar -Quiet ) {
Copy $FileName -Destination $DirName
Add-Content $logfile "`n$DateTime - Found $Findvar in $filename"
Write-Host "$DateTime - Found $Findvar in $filename"
$FileContent = Get-Content $Filename -ReadCount 0
$FileContent =
$FileContent -replace $OldQualifiedPoint,$NewQualifiedPoint -replace $findvar,$NewQualifiedPoint -replace $DuplicateNew,$QualifiedNew
$FileContent | Set-Content $FileName
}
}
On another note, Select-String will take the filepath as an argument, so you don't have to do a Get-Content and then pipe that to Select-String.
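The loop reversal mentioned in EDIT 3 of the question could look roughly like the sketch below: each file is read once, every search term is applied in memory, and the file is written back only if something changed. The temp folder, the sample .ctx file and the $terms list are invented so the sketch can run on its own; the three chained -replace operations are the ones from the question.

```powershell
# Hypothetical sample data in a temp folder so the sketch is self-contained
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
$sample = Join-Path $dir 'sample.ctx'
Set-Content -Path $sample -Value 'A=\\F24\POINT1', 'B=POINT2', 'C=unrelated'

$old   = 'F24'
$new   = 'CB3'
$terms = 'POINT1', 'POINT2'   # stand-in for the "find" column of the CSV

# Patterns that do not depend on the search term are built once
$DuplicateNew = '\\\\' + $new + '\\' + '\\\\' + $new + '\\'
$QualifiedNew = '\\' + $new + '\'

foreach ($file in Get-ChildItem -Path $dir -Filter *.ctx) {
    # Read the whole file once
    $original = Get-Content -Path $file.FullName -ReadCount 0
    $content  = $original

    # Apply every search term to the in-memory copy
    foreach ($findvar in $terms) {
        $OldQualifiedPoint = '\\\\' + $old + '\\' + $findvar
        $NewQualifiedPoint = '\\' + $new + '\' + $findvar
        $content = $content -replace $OldQualifiedPoint, $NewQualifiedPoint `
                            -replace $findvar, $NewQualifiedPoint `
                            -replace $DuplicateNew, $QualifiedNew
    }

    # Write the file back once, and only if something changed
    # (-cne compares case-sensitively)
    if (($content -join "`n") -cne ($original -join "`n")) {
        $content | Set-Content -Path $file.FullName
    }
}

Get-Content -Path $sample
```

The backup copy and logging from the original script are left out here; they would go inside the final if block, just before Set-Content.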
Yes, you can make it much faster by not using Get-Content... Use a StreamReader instead.
$file = New-Object System.IO.StreamReader -Arg "test.txt"
while (($line = $file.ReadLine()) -ne $null) {
# $line has your line
}
$file.dispose()
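A slightly fuller sketch of the same StreamReader loop follows, with a try/finally so the reader is closed even if something throws. The temp file and its contents are made up for the example. Note that .NET resolves relative paths against the process working directory, which is not always PowerShell's current location, so passing a full path is the safer choice.

```powershell
# Hypothetical sample file so the sketch can run on its own
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value 'first line', 'second ERROR line', 'third line'

$hits = New-Object 'System.Collections.Generic.List[string]'
$reader = New-Object System.IO.StreamReader -ArgumentList $path
try {
    # ReadLine() returns $null at the end of the file
    while ($null -ne ($line = $reader.ReadLine())) {
        if ($line -match 'ERROR') {
            $hits.Add($line)
        }
    }
}
finally {
    # Always release the file handle
    $reader.Dispose()
}

$hits
```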
I wanted to use PowerShell for this and created a script like the one below:
$filepath = "input.csv"
$newfilepath = "input_fixed.csv"
filter num2x { $_ -replace "aaa","bbb" }
measure-command {
    Get-Content -ReadCount 1000 $filepath | num2x | add-content $newfilepath
}
It took 19 minutes on my laptop to process a 6.5 GB file. The code above reads the file in batches (using -ReadCount) and uses a filter, which should help performance.
But then I tried FART (a command-line find-and-replace tool) and it did the same thing in 3 minutes! Quite a difference!

Optimizing simple search script in PowerShell

I need to create a script to search through just below a million files of text, code, etc. to find matches and then output all hits on a particular string pattern to a CSV file.
So far I made this;
$location = 'C:\Work*'
$arr = "foo", "bar" #Where "foo" and "bar" are string patterns I want to search for (separately)
for($i=0;$i -lt $arr.length; $i++) {
Get-ChildItem $location -recurse | select-string -pattern $($arr[$i]) | select-object Path | Export-Csv "C:\Work\Results\$($arr[$i]).txt"
}
This returns to me a CSV file named "foo.txt" with a list of all files with the word "foo" in it, and a file named "bar.txt" with a list of all files containing the word "bar".
Is there any way anyone can think of to optimize this script to make it work faster? Or ideas on how to make an entirely different, but equivalent script that just works faster?
All input appreciated!
If your files are not huge and can be read into memory, then this version should be considerably faster (and my quick and dirty local test seems to confirm that):
$location = 'C:\ROM'
$arr = "Roman", "Kuzmin"
# remove output files
foreach($test in $arr) {
Remove-Item ".\$test.txt" -ErrorAction 0 -Confirm
}
Get-ChildItem $location -Recurse | .{process{ if (!$_.PSIsContainer) {
# read all text once
$content = [System.IO.File]::ReadAllText($_.FullName)
# test patterns and output paths once
foreach($test in $arr) {
if ($content -match $test) {
$_.FullName >> ".\$test.txt"
}
}
}}}
Notes: 1) mind the changed paths and patterns in the example; 2) output files are not CSV but plain text; there is not much point in CSV if you are interested just in paths, since plain text files with one path per line will do.
Let's suppose that 1) the files are not too big and you can load it into memory, 2) you really just want the Path of the file, that matches (not the line etc.).
I tried to read each file only once and then iterate through the regexes. There is some gain (it's faster than the original solution), but the final result will depend on other factors like file sizes, number of files etc.
Also, removing 'ignorecase' makes it a little bit faster.
$res = @{}
$arr | % { $res[$_] = @() }
Get-ChildItem $location -recurse |
? { !$_.PsIsContainer } |
% { $file = $_
    # read each file only once
    $text = [Io.File]::ReadAllText($file.FullName)
    $arr |
    % { $regex = $_
        if ([Regex]::IsMatch($text, $regex, 'ignorecase')) {
            # collect every matching path for this pattern
            $res[$regex] += $file.FullName
        }
    }
}
$res.GetEnumerator() | % {
    # one path per line; Export-Csv on plain strings would only export their Length
    $_.Value | Set-Content "d:\temp\so-res$($_.Key).txt"
}

PowerShell FINDSTR equivalent?

What's the DOS FINDSTR equivalent for PowerShell? I need to search a bunch of log files for "ERROR".
Here's the quick answer
Get-ChildItem -Recurse -Include *.log | select-string ERROR
I found it here, which has a great in-depth answer!
For example, find all instances of "#include" in the c files in this directory and all sub-directories.
gci -r -i *.c | select-string "#include"
gci is an alias for get-childitem
Just to expand on Monroecheeseman's answer. gci is an alias for Get-ChildItem (which is the equivalent to dir or ls), the -r switch does a recursive search and -i means include.
Piping the result of that query to select-string has it read each file and look for lines matching a regular expression (the provided one in this case is ERROR, but it can be any .NET regular expression).
The result will be a collection of match objects, showing the matching line, the file, and other related information.
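To make the "match objects" more concrete, here is a small sketch that creates a throwaway log file and inspects a few properties of the results. The temp folder, the file name app.log and its contents are invented for the example; Path, Filename, LineNumber and Line are properties of the MatchInfo objects that Select-String returns.

```powershell
# Hypothetical log file in a temp folder so the sketch is self-contained
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'app.log') -Value 'start', 'ERROR disk full', 'ok', 'error again'

# Select-String is case-insensitive by default, so both lines are found
$hits = @(Get-ChildItem -Path $dir -Recurse -Include *.log | Select-String -Pattern 'ERROR')

# Add -CaseSensitive to match only the upper-case ERROR
$strict = @(Get-ChildItem -Path $dir -Recurse -Include *.log |
    Select-String -Pattern 'ERROR' -CaseSensitive)

# Each result carries the file name, line number and line text
$hits | Select-Object Filename, LineNumber, Line
```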
if ($entry.EntryType -eq "Error")
Being Object Oriented, you want to test the property in question with one of the standard comparison operators you can find here.
I have a PS script watching logs remotely for me right now - some simple modification should make it work for you.
edit: I suppose I should also add that there is a cmdlet built for this already if you don't want to unroll it the way I did. Check out:
man Get-EventLog
Get-EventLog -newest 5 -logname System -EntryType Error
On a related note, here's a search that will list all the files containing a particular regex search or string. It could use some improvement so feel free to work on it. Also if someone wanted to encapsulate it in a function that would be welcome.
I'm new here, so if this should go in its own topic just let me know. I figured I'd put it here since this looks mostly related.
# Search in Files Script
# ---- Set these before you begin ----
$FolderToSearch="C:\" # UNC paths are ok, but remember you're mass reading file contents over the network
$Search="Looking For This" # accepts regex format
$IncludeSubfolders=$True #BUG: if this is set $False then $FileIncludeFilter must be "*" or you will always get 0 results
$AllMatches=$False
$FileIncludeFilter="*".split(",") # Restricting to specific file types is faster than excluding everything else
$FileExcludeFilter="*.exe,*.dll,*.wav,*.mp3,*.gif,*.jpg,*.png,*.ghs,*.rar,*.iso,*.zip,*.vmdk,*.dat,*.pst,*.gho".split(",")
# ---- Initialize ----
if ($AllMatches -eq $True) {$SelectParam=@{AllMatches=$True}}
else {$SelectParam=@{List=$True}}
if ($IncludeSubfolders -eq $True) {$RecurseParam=@{Recurse=$True}}
else {$RecurseParam=@{Recurse=$False}}
# ---- Build File List ----
#$Files=Get-Content -Path="$env:userprofile\Desktop\FileList.txt" # For searching a manual list of files
Write-Host "Building file list..." -NoNewline
$Files=Get-ChildItem -Include $FileIncludeFilter -Exclude $FileExcludeFilter -Path $FolderToSearch -ErrorAction silentlycontinue @RecurseParam|Where-Object{-not $_.psIsContainer} # @RecurseParam is basically -Recurse=[$True|$False]
#$Files=$Files|Out-GridView -PassThru -Title 'Select the Files to Search' # Manually choose files to search, requires powershell 3.0
Write-Host "Done"
# ---- Begin Search ----
Write-Host "Searching Files..."
$Files|
Select-String $Search @SelectParam| #The @ instead of $ lets me pass the hashtable as a list of parameters. @SelectParam is either -List or -AllMatches
Tee-Object -Variable Results|
Select-Object Path
Write-Host "Search Complete"
#$Results|Group-Object path|ForEach-Object{$path=$_.name; $matches=$_.group|%{[string]::join("`t", $_.Matches)}; "$path`t$matches"} # Show results including the matches separated by tabs (useful if using regex search)
<# Other Stuff
#-- Saving and restoring results
$Results|Export-Csv "$env:appdata\SearchResults.txt" # $env:appdata can be replaced with any UNC path, this just seemed like a logical place to default to
$Results=Import-Csv "$env:appdata\SearchResults.txt"
#-- alternate search patterns
$Search="(\d[-|]{0,}){15,19}" #Rough CC Match
#>
This is not the best way to do this:
gci <the_directory_path> -filter *.csv | where { $_.OpenText().ReadToEnd().Contains("|") -eq $true }
This helped me find all csv files which had the | character in them.
PowerShell has basically precluded the need for findstr.exe as the previous answers demonstrate. Any of these answers should work fine.
However, if you actually need to use findstr.exe (as was my case) here is a PowerShell wrapper for it:
Use the -Verbose option to output the findstr command line.
function Find-String
{
[CmdletBinding(DefaultParameterSetName='Path')]
param
(
[Parameter(Mandatory=$true, Position=0)]
[string]
$Pattern,
[Parameter(ParameterSetName='Path', Mandatory=$false, Position=1, ValueFromPipeline=$true)]
[string[]]
$Path,
[Parameter(ParameterSetName='LiteralPath', Mandatory=$true, ValueFromPipelineByPropertyName=$true)]
[Alias('PSPath')]
[string[]]
$LiteralPath,
[Parameter(Mandatory=$false)]
[switch]
$IgnoreCase,
[Parameter(Mandatory=$false)]
[switch]
$UseLiteral,
[Parameter(Mandatory=$false)]
[switch]
$Recurse,
[Parameter(Mandatory=$false)]
[switch]
$Force,
[Parameter(Mandatory=$false)]
[switch]
$AsCustomObject
)
begin
{
$value = $Pattern.Replace('\', '\\\\').Replace('"', '\"')
$findStrArgs = @(
'/N'
'/O'
@('/R', '/L')[[bool]$UseLiteral]
"/c:$value"
)
if ($IgnoreCase)
{
$findStrArgs += '/I'
}
function GetCmdLine([array]$argList)
{
($argList | foreach { @($_, "`"$_`"")[($_.Trim() -match '\s')] }) -join ' '
}
}
process
{
$PSBoundParameters[$PSCmdlet.ParameterSetName] | foreach {
try
{
$_ | Get-ChildItem -Recurse:$Recurse -Force:$Force -ErrorAction Stop | foreach {
try
{
$file = $_
$argList = $findStrArgs + $file.FullName
Write-Verbose "findstr.exe $(GetCmdLine $argList)"
findstr.exe $argList | foreach {
if (-not $AsCustomObject)
{
return "${file}:$_"
}
$split = $_.Split(':', 3)
[pscustomobject] @{
File = $file
Line = $split[0]
Column = $split[1]
Value = $split[2]
}
}
}
catch
{
Write-Error -ErrorRecord $_
}
}
}
catch
{
Write-Error -ErrorRecord $_
}
}
}
}
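FYI:
PowerShell 7 does not bundle grep itself, but since it also runs on Linux and macOS you can call the system's native grep from it...
I know egrep is available in PowerShell on Azure CLI...
Either way, Select-String (alias sls) is always there!
An old article here: [https://devblogs.microsoft.com/powershell/select-string-and-grep/]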

Resources