The files on my computer are named like this:
quant-ph9501001
math9901001
cond-mat0001001
hep-lat0308001
gr-qc0703001
but in the HTTP links the names include a / character:
quant-ph/9501001
math/9901001
cond-mat/0001001
hep-lat/0308001
gr-qc/0703001
I can't rename my files from quant-ph9501001 to quant-ph/9501001 because / is an illegal character in filenames, so my script can't parse the names and look them up correctly.
My filenames follow this pattern:
letters + 8 digits
letters + '-' + letters + 8 digits
I can change quant-ph9501001 to quant-ph_9501001, but I need the script to read the missing character in the filename as if it were a / (slash character).
So if I have strings like
gr-qc0701001
gr-qc_0701001
they should be read like
quant-ph/9501001
My script doesn't work (no parsing) for gr-qc/0701001 because I can't rename files using an illegal character. The error is 404:
iwr : The remote server returned an error: (404) Not Found.
If the script works correctly, PowerShell should return this string:
General Relativity and Quantum Cosmology (gr-qc)
and the filename should be
Spectral Broadening of Radiation from Relativistic Collapsing Objects
My script is
$list1 = @"
quant-ph9802001
quant-ph9802004
"@
$list2 = @"
quant-ph/9802001
quant-ph/9802004
"@
Write-Output "Adding forward slashes"
$list1 -split "`r`n" | % {
$item = $_.Trim()
$newItem = $item -replace '(.*)(\d{7})', '$1/$2'
Write-Output $("{0} ==> {1}" -f $item, $newItem)
}
Write-Output "Removing forward slashes"
$list2 -split "`r`n" | % {
$item = $_.Trim()
$newItem = $item -replace '(.*)/(\d{7})', '$1$2'
Write-Output $("{0} ==> {1}" -f $item, $newItem)
}
Function Clean-InvalidFileNameChars {
param(
[Parameter(Mandatory=$true,
Position=0,
ValueFromPipeline=$true,
ValueFromPipelineByPropertyName=$true)]
[String]$Name
)
$invalidChars = [IO.Path]::GetInvalidFileNameChars() -join ''
$re = "[{0}]" -f [RegEx]::Escape($invalidChars)
$res=($Name -replace $re)
return $res.Substring(0, [math]::Min(260, $res.Length))
}
Function Clean-InvalidPathChars {
param(
[Parameter(Mandatory=$true,
Position=0,
ValueFromPipeline=$true,
ValueFromPipelineByPropertyName=$true)]
[String]$Name
)
$invalidChars = [IO.Path]::GetInvalidPathChars() -join ''
$re = "[{0}]" -f [RegEx]::Escape($invalidChars)
$res=($Name -replace $re)
return $res.Substring(0, [math]::Min(248, $res.Length))
}
$rootpath="c:\temp2"
$rootpathresult="c:\tempresult"
$template=@'
[3] arXiv:1611.00057 [pdf, ps, other]
Title: {title*:Holomorphy of adjoint $L$ functions for quasisplit A2}
Authors: Joseph Hundley
Comments: 18 pages
Subjects: {subject:Number Theory (math.NT)}
[4] arXiv:1611.00066 [pdf, other]
Title: {title*:Many Haken Heegaard splittings}
Authors: Alessandro Sisto
Comments: 12 pages, 3 figures
Subjects: {subject:Geometric Topology (math.GT)}
[5] arXiv:1611.00067 [pdf, ps, other]
Title: {title*:Subsumed homoclinic connections and infinitely many coexisting attractors in piecewise-linear maps}
Authors: David J.W. Simpson, Christopher P. Tuffley
Subjects: {subject:Dynamical Systems (math.DS)}
[21] arXiv:1611.00114 [pdf, ps, other]
Title: {title*:Faces of highest weight modules and the universal Weyl polyhedron}
Authors: Gurbir Dhillon, Apoorva Khare
Comments: We recall preliminaries and results from the companion paper arXiv:1606.09640
Subjects: {subject:Representation Theory (math.RT)}; Combinatorics (math.CO); Metric Geometry (math.MG)
'@
#extract utils data and clean
$listbook = gci $rootpath -File -filter *.pdf |
    foreach { New-Object psobject -Property @{file=$_.fullname; books=((iwr "https://arxiv.org/abs/$($_.BaseName)").ParsedHtml.body.outerText | ConvertFrom-String -TemplateContent $template)} } |
    select file -ExpandProperty books |
    select file, @{N="Subject";E={Clean-InvalidPathChars $_.subject}}, @{N="Title";E={Clean-InvalidFileNameChars $_.title}}
#build dirs and copy+rename file
$listbook | %{$newpath="$rootpathresult\$($_.subject)"; New-Item -ItemType Directory -Path "$newpath" -Force; Copy-Item $_.file "$newpath\$($_.title).pdf" -Force}
EDIT: The error is still 404 after Kori Gill's answer:
http://i.imgur.com/ZOZyMad.png
The problem is the difference between the local filenames and the online names. I need to add this illegal character to the local filenames temporarily, in memory only, otherwise the script doesn't work.
I can't say I totally understand your question, but it sounds like you just need to convert these names to and from a format that has a forward slash. You mention 8 digits, but your examples have 7. You can adjust as needed.
I think something like this will help you...
$list1 = @"
quant-ph9501001
math9901001
cond-mat0001001
hep-lat0308001
gr-qc0703001
"@
$list2 = @"
quant-ph/9501001
math/9901001
cond-mat/0001001
hep-lat/0308001
gr-qc/0703001
"@
Write-Output "Adding forward slashes"
$list1 -split "`r`n" | % {
$item = $_.Trim()
$newItem = $item -replace '(.*)(\d{7})', '$1/$2'
Write-Output $("{0} ==> {1}" -f $item, $newItem)
}
Write-Output "Removing forward slashes"
$list2 -split "`r`n" | % {
$item = $_.Trim()
$newItem = $item -replace '(.*)/(\d{7})', '$1$2'
Write-Output $("{0} ==> {1}" -f $item, $newItem)
}
Outputs:
Adding forward slashes
quant-ph9501001 ==> quant-ph/9501001
math9901001 ==> math/9901001
cond-mat0001001 ==> cond-mat/0001001
hep-lat0308001 ==> hep-lat/0308001
gr-qc0703001 ==> gr-qc/0703001
Removing forward slashes
quant-ph/9501001 ==> quant-ph9501001
math/9901001 ==> math9901001
cond-mat/0001001 ==> cond-mat0001001
hep-lat/0308001 ==> hep-lat0308001
gr-qc/0703001 ==> gr-qc0703001
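To tie this back to the 404 in the question: the file on disk never needs to be renamed, because the slash can be added in memory just before the URL is built. The following is a minimal sketch under that assumption. ConvertTo-ArxivId is a hypothetical helper name, and the pattern assumes 7 digits (as in the examples) and an optional underscore separator.

```powershell
# Hypothetical helper: turn a local base name into the identifier used in arXiv URLs.
function ConvertTo-ArxivId {
    param([Parameter(Mandatory = $true)][string]$BaseName)
    # letters (optionally "letters-letters"), an optional underscore, then 7 digits
    $BaseName -replace '^([A-Za-z]+(?:-[A-Za-z]+)?)_?(\d{7})$', '$1/$2'
}

# The local names stay untouched; only the string used in the URL changes.
'gr-qc0701001', 'gr-qc_0701001', 'math9901001' | ForEach-Object {
    $id = ConvertTo-ArxivId $_
    "{0} ==> https://arxiv.org/abs/{1}" -f $_, $id
}
```

In the original pipeline this would mean passing (ConvertTo-ArxivId $_.BaseName) to iwr instead of $_.BaseName.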
Related
I wrote code like this:
$count = 0
$path = "C:\Videos\"
$oldvids = Get-ChildItem -Path $path -Include *.* -Recurse
foreach ($oldvid in $oldvids) {
$curpath = $oldvid.DirectoryName
$name = [System.IO.Path]::GetFileNameWithoutExtension($oldvid)
$names = $name.Split(" - ")
$names[0] = ""
$metadata_title = $names -join "-"
$ext = [System.IO.Path]::GetExtension($oldvid)
if ($name.StartsWith("new_") -eq $false)
{
$newvid = $curpath + "/new_" + $name + ".mp4"
if ([System.IO.File]::Exists($newvid) -eq $false)
{
$count++
Write-Output $metadata_title
}
}
}
But this code causes a file name like this:
Chapter 1 - New Video
to become:
Chapter 1---New Video
How can I make sure a single - is actually only one? Do I have to escape it?
The idea is to eliminate first part of the file names, so from:
01 - Chapter 1 - Video 1
to:
Chapter 1 - Video 1
So I wanted to split using " - " and then join everything back without the first element in the split array.
Looking at your example and your explanation of changing metadata with ffmpeg on each file, I guess this is what you need:
$count = 0
$path = 'C:\Videos'
# get a list of old video files (these do not start with 'new_')
$oldvids = Get-ChildItem -Path $path -Filter *.mp4 -File -Recurse |
Where-Object { $_.Name -notmatch '^new_' }
foreach ($oldvid in $oldvids) {
# if the file is called 'C:\Videos\01 - Chapter 1 - Video 1.mp4'
$tempName = $oldvid.Name -replace '^\d+\s*-\s*(.+)', 'new_$1' # --> new_Chapter 1 - Video 1.mp4
# or do
# $tempName = 'new_' + ($oldvid.Name -split '-', 2)[-1].Trim() # --> new_Chapter 1 - Video 1.mp4
# or
# $tempName = $oldvid.Name -replace '^\d+\s*-\s*', 'new_' # --> new_Chapter 1 - Video 1.mp4
# combine the current file path with the temporary name
$outputFile = Join-Path -Path $oldvid.DirectoryName -ChildPath $tempName
#######################################################################
# next do your ffmpeg command to change metadata
# for input you use $oldvid.FullName and for output you use $outputFile
Write-Host "Updated file $($oldvid.Name) as $tempName"
#######################################################################
# when done with ffmpeg, delete the original (or for safety move it to somewhere else)
Write-Host "Deleting file '$($oldvid.Name)'"
$oldvid | Remove-Item -WhatIf
# and rename the updated file by removing the 'new_' part from its name
$newName = ($tempName -replace '^new_').Trim()
Write-Host "Renaming updated file to '$newName'"
# Rename-Item needs the path of the updated file, not just its name
Rename-Item -LiteralPath $outputFile -NewName $newName
# all done, proceed with the next file
$count++
}
Note: I have added switch -WhatIf to the Remove-Item line. This is a safety measure that will only display what file would be deleted without actually deleting it.
If you are sure the correct file will be deleted, remove the -WhatIf switch so the original file gets destroyed after manipulating it with ffmpeg.
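As for why the original attempt produced extra hyphens: in Windows PowerShell 5.1 the .Split(" - ") method call is bound to the char[] overload, so the name is split on every single space and hyphen rather than on the three-character separator. The renaming alternatives from the loop above avoid this, and can be tried in isolation. This is a small sketch with a made-up sample name:

```powershell
$name = '01 - Chapter 1 - Video 1'

# The -split operator treats ' - ' as one (regex) separator;
# the second argument limits the result to at most 2 pieces.
$viaSplit = ($name -split ' - ', 2)[-1]

# Alternatively, strip the leading number and the first separator.
$viaReplace = $name -replace '^\d+\s*-\s*'

$viaSplit
$viaReplace
```

Both expressions leave the later " - " separators inside the title untouched.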
As per your comment, to send items to the Recycle bin instead of destroying them like Remove-Item does, here's two ways of achieving that:
Method 1: Use COM
function RemoveTo-RecycleBin {
[CmdletBinding()]
param (
[Parameter(Mandatory = $true, ValueFromPipeline = $true, ValueFromPipelineByPropertyName = $true)]
[Alias('FullName')]
[string[]]$Path
)
begin {
$shell = New-Object -ComObject 'Shell.Application'
$Recycler = $Shell.NameSpace(0xa)
}
process {
foreach ($item in $Path) {
[void]$Recycler.MoveHere($item)
}
}
end {
# clean-up the used COM objects
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($Recycler)
$null = [System.Runtime.Interopservices.Marshal]::ReleaseComObject($shell)
$null = [System.GC]::Collect()
$null = [System.GC]::WaitForPendingFinalizers()
}
}
# usage example, remove all files from the D:\Test directory
Get-ChildItem -Path 'D:\Test' -Filter '*.*' -File | RemoveTo-RecycleBin
# usage example, remove all files and subdirectories from the D:\Test directory
Get-ChildItem -Path 'D:\Test' | RemoveTo-RecycleBin
Method 2: Use the Microsoft.VisualBasic assembly
function RemoveTo-RecycleBin {
[CmdletBinding()]
param (
[Parameter(Mandatory = $true, ValueFromPipeline = $true, ValueFromPipelineByPropertyName = $true)]
[Alias('FullName')]
[string[]]$Path,
[switch]$ShowConfirmationDialog
)
begin {
Add-Type -AssemblyName Microsoft.VisualBasic
$showUI = if ($ShowConfirmationDialog) { 'AllDialogs' } else { 'OnlyErrorDialogs' }
}
process {
foreach ($item in $Path) {
Write-Host $item
# detect if this is a file or a directory
if ((Get-Item -Path $item) -is [System.IO.DirectoryInfo]) {
# first parameter: the absolute full path
# second parameter: one of Microsoft.VisualBasic.FileIO.UIOption values: OnlyErrorDialogs or AllDialogs
# third parameter: one of Microsoft.VisualBasic.FileIO.RecycleOption values: DeletePermanently or SendToRecycleBin
[Microsoft.VisualBasic.FileIO.FileSystem]::DeleteDirectory($item, $showUI, 'SendToRecycleBin')
}
else {
# first parameter: the absolute full path and file name
# second parameter: one of Microsoft.VisualBasic.FileIO.UIOption values: OnlyErrorDialogs or AllDialogs
# third parameter: one of Microsoft.VisualBasic.FileIO.RecycleOption values: DeletePermanently or SendToRecycleBin
[Microsoft.VisualBasic.FileIO.FileSystem]::DeleteFile($item,$showUI, 'SendToRecycleBin')
}
}
}
}
# usage example, remove all files from the D:\Test directory
Get-ChildItem -Path 'D:\Test' -Filter '*.*' -File | RemoveTo-RecycleBin
# usage example, remove all files and subdirectories from the D:\Test directory
Get-ChildItem -Path 'D:\Test' | RemoveTo-RecycleBin
Just choose any of the above functions, put it on top of your script and then change line
$oldvid | Remove-Item -WhatIf
into
$oldvid | RemoveTo-RecycleBin
I have this PowerShell script that strips HTML tags, leaves just the text, and displays the word count for that HTML file when the script is executed. My question is, when I execute:
function Html-ToText {
param([System.String] $html)
# remove line breaks, replace with spaces
$html = $html -replace "(`r|`n|`t)", " "
# write-verbose "removed line breaks: `n`n$html`n"
# remove invisible content
@('head', 'style', 'script', 'object', 'embed', 'applet', 'noframes', 'noscript', 'noembed') | % {
$html = $html -replace "<$_[^>]*?>.*?</$_>", ""
}
# write-verbose "removed invisible blocks: `n`n$html`n"
# Condense extra whitespace
$html = $html -replace "( )+", " "
# write-verbose "condensed whitespace: `n`n$html`n"
# Add line breaks
@('div','p','blockquote','h[1-9]') | % { $html = $html -replace "</?$_[^>]*?>.*?</$_>", ("`n" + '$0' )}
# Add line breaks for self-closing tags
@('div','p','blockquote','h[1-9]','br') | % { $html = $html -replace "<$_[^>]*?/>", ('$0' + "`n")}
# write-verbose "added line breaks: `n`n$html`n"
#strip tags
$html = $html -replace "<[^>]*?>", ""
# write-verbose "removed tags: `n`n$html`n"
# replace common entities
@(
@("&bull;", " * "),
@("&lsaquo;", "<"),
@("&rsaquo;", ">"),
@("&(rsquo|lsquo);", "'"),
@("&(quot|ldquo|rdquo);", '"'),
@("&trade;", "(tm)"),
@("&frasl;", "/"),
@("&(quot|#34|#034|#x22);", '"'),
@('&(amp|#38|#038|#x26);', "&"),
@("&(lt|#60|#060|#x3c);", "<"),
@("&(gt|#62|#062|#x3e);", ">"),
@('&(copy|#169);', "(c)"),
@("&(reg|#174);", "(r)"),
@("&nbsp;", " "),
@("&(.{2,6});", "")
) | % { $html = $html -replace $_[0], $_[1] }
# write-verbose "replaced entities: `n`n$html`n"
return $html + $a | Measure-Object -word
}
And then run:
Html-ToText (new-object net.webclient).DownloadString("test.html")
it displays 4 words in the PowerShell output. How do I export that output from the PowerShell window into an Excel spreadsheet with the column Words and the count 4?
The CSV you want just looks like this:
Words
4
It's easy enough to just write that to a text file; Excel will read it. But you're in luck: the output of Measure-Object is already an object with 'Words' as a property and '4' as its value, and you can feed that straight into Export-Csv. Use Select-Object to pick just the property you want:
$x = Html-ToText (new-object net.webclient).DownloadString("test.html")
# drop the Lines/Characters/etc fields, just export words
$x | select-Object Words | Export-Csv out.csv -NoTypeInformation
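The shape of that CSV can be checked without downloading anything. This is a small sketch that uses a literal four-word string in place of the stripped HTML, and ConvertTo-Csv in place of Export-Csv so the rows are shown in the console instead of written to a file:

```powershell
# A literal string stands in for the text returned by Html-ToText
$stats = 'just four words here' | Measure-Object -Word

# Keep only the Words property and show the CSV rows that Export-Csv would write
$csv = $stats | Select-Object Words | ConvertTo-Csv -NoTypeInformation
$csv
```

The first row is the header and the second row is the count.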
I'd be tempted to see if I could use
$x = Invoke-WebRequest http://www.google.com
$x.AllElements.InnerText
to get the words out of the HTML, before I tried stripping the content with replaces.
I figured it out. What I did was add
+ $a | Measure-Object -Word after the $html variable in the script, and then I ran:
Html-ToText (new-object net.webclient).DownloadString("test.html") + select-Object Words | Export-Csv out.csv -NoTypeInformation and it exported the word count. – josh s
I normally find the answer to my problem by going through the site, but this time I have read every question yet still I am in despair and really need an experienced eye.
What I have is basically a structural health monitoring system. I measure strains and receive raw data. This raw data is processed by a MATLAB executable that I wrote myself and then uploaded to an ftp-server. We had a student that automated this with a PowerShell script which was working perfectly until I changed literally one small line in MATLAB and recompiled the code.
I do not understand much about PowerShell, so please be patient with me. The error I receive is you cannot call a method on a null-valued expression. This occurs when I try to replace a set of strings (just called xxx_xxx) with a date that exists as a variable in PowerShell. I can see xxx_xxx in the command window (see attached image), I can print out the date that I want to use as replacement, but somehow it does not work.
I cannot provide a working code snippet because you would need the DAQ to generate data, and as I said, I don't understand the language much. But below is the code. For easier reading, the line on which I receive the error is the following:
$outData = $cmdOutput.Replace("xxx_xxx",$snaps[$i].Substring(6,4)+"-"+$snaps[$i].Substring(3,2)+"-"+$snaps[$i].Substring(0,2)+" "+$snaps[$i].Substring(11,8)+";")
If anyone could help me with this, I would be eternally grateful!
$retry=3
while(1){
#$dir = "C:\Users\Petar\Documents\Zoo\PetarData\INPUT DATA\New folder\"
$dir = "C:\Users\Yunus\Documents\Micron Optics\ENLIGHT\Data\" + $(get-date -f yyyy) + "\" + $(get-date -f MM) + "\"
#$outdir = "C:\Users\Petar\Documents\Zoo\PetarData\OUTPUT DATA\New folder\"
$archivedirin = "C:\Users\Yunus\Documents\Elefantenhaus\Archive\IN\"
$archivedirout = "C:\Users\Yunus\Documents\Elefantenhaus\Archive\OUT\"
$tempdir = "C:\Users\Yunus\Documents\Elefantenhaus\Archive\TEMP\"
$prefix = "EHZZ";
$filecount=(Get-ChildItem $dir).Count
$latest = Get-ChildItem -Path $dir | Sort-Object LastAccessTime -Descending | Select-Object -First 1
if($filecount -gt 1){
$exclude = $latest.name
$Files = GCI -path $dir | Where-object {$_.name -ne $exclude}
$dest = $archivedirin + "batch_"+$(get-date -f MM-dd-yyyy_HH_mm_ss)+"\"
new-item -type directory $dest
foreach ($file in $Files){move-item -path ($dir+$file) -destination $dest}
$latest = Get-ChildItem -Path $dest | Sort-Object LastAccessTime -Descending | Select-Object -First 1
$filename = $dest + $latest.name
$s=Get-Content $filename
while($s -eq $null){
if($retry -lt 0){break}
write-host "could not read file"
$retry = $retry -1
$s=Get-Content $filename
}
#read content of input file
$snaps = $s
#loop through the lines in the file until the first occurence of a timestamp, that is our desired line
for ($i = 0; $i -lt $snaps.length; $i++)
{
$ismatch =[regex]::Matches($snaps[$i], '^(\d\d.\d\d.\d\d\d\d\s\d\d+)')
if ( $ismatch -ne $null -and $ismatch[0].Groups[1].Value)
{
$temp=Get-Content $filename | select -skip $i
$filenametemp = $tempdir+"\temp.txt" #temp file path, don't change the filename "temp.txt"
#$filename3 = $tempdir+"\test.txt"
Add-Content $filenametemp $temp
$filename = $archivedirout+$prefix+"_"+$snaps[$i].Substring(8,2)+$snaps[$i].Substring(3,2)+$snaps[$i].Substring(0,2)+"_"+$snaps[$i].Substring(11,2)+$snaps[$i].Substring(14,2)+$snaps[$i].Substring(17,2)+".txt"
$cmdOutput = (cmd /c new_modified.exe $tempdir) | Out-String
write-output $cmdOutput #"$cmdOutput is:"
#IF ([string]::IsNullOrWhitespace($cmdOutput)){
# break
#}
$outData = $cmdOutput.Replace("xxx_xxx",$snaps[$i].Substring(6,4)+"-"+$snaps[$i].Substring(3,2)+"-"+$snaps[$i].Substring(0,2)+" "+$snaps[$i].Substring(11,8)+";")
Add-Content $filename $outData
remove-item -path $filenametemp
break
}
}
#break
}
else
{
write-host "waiting for file"
}
Start-Sleep -s 30
}
I think what is happening is that the output of the external program isn't being piped into a variable correctly. I haven't had a chance to test this but Tee-Object looks like the appropriate method for you.
I would suggest you try replacing...
$cmdOutput = (cmd /c new_modified.exe $tempdir) | Out-String
with...
cmd /c new_modified.exe $tempdir | Tee-Object -Variable cmdOutput
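Note that -Variable takes the variable name without the leading $. Whichever way the output is captured, it may help to check it before calling .Replace on it. The following is only a sketch: a script block stands in for new_modified.exe, and the timestamp is a made-up value rather than one read from $snaps.

```powershell
# Stand-in for the external program; the real call would be: cmd /c new_modified.exe $tempdir
$fakeProgram = { 'xxx_xxx 1.23 4.56' }

# Capture the output in $cmdOutput (name passed without the $)
& $fakeProgram | Tee-Object -Variable cmdOutput | Out-Null

# Join the captured lines into one string and guard against empty output
$text = $cmdOutput | Out-String
if ([string]::IsNullOrWhiteSpace($text)) {
    Write-Warning 'The program produced no output; nothing to replace.'
}
else {
    # Made-up timestamp, for illustration only
    $outData = $text.Replace('xxx_xxx', '2024-01-31 12:00:00;')
    $outData
}
```

If the warning appears, the problem lies with the executable's output rather than with the replacement itself.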
I have a directory with ~ 3000 text files in it, and I'm doing periodic search and replaces on those text files as I transition a program to a new server.
Each text file may have an average of ~3000 lines, and I need to search the files for maybe 300 - 1000 terms at a time.
I'm replacing the server prefix which is related to the string I'm searching for. So for every one of the csv entries, I'm looking for "Search_String" and "\\Old_Server\Search_String", and making sure that after the program completes, the result is "\\New_Server\Search_String".
I cobbled together a powershell program, and it works. But it's so slow I've never seen it complete.
Any suggestions for making it faster?
EDIT 1:
I changed get-content as suggested, but it still took 3 minutes to search two files (~8000 lines) for 9 separate search terms. I must still be screwing up; a notepad++ search and replace would still be way faster if done manually 9 times.
I'm not sure how to get rid of the first (Get-Content) because I want to make a copy of the file for backup before I make any changes to it.
EDIT 2:
So this is an order of magnitude faster; it's searching a file in maybe 10 seconds. But now it doesn't write changes to files, and it only searches the first file in the directory! I didn't change that code, so I don't know why it broke.
EDIT 3:
Success! I adapted a solution posted below to make it much, much faster. It's searching each file in a couple of seconds now. I may reverse the loop order, so that it loads the file into the array and then searches and replaces each entry in the CSV rather than the other way around. I'll post that if I get it to work.
Final script is below for reference.
#get input from the user
$old = Read-Host 'Enter the old cimplicity qualifier (F24, IRF3 etc)'
$new = Read-Host 'Enter the new cimplicity qualifier (CB3, F24_2 etc)'
$DirName = Get-Date -format "yyyy_MM_dd_hh_mm"
New-Item -ItemType directory -Path $DirName -force
New-Item "$DirName\log.txt" -ItemType file -force -Value "`nMatched CTX files on $dirname`n"
$logfile = "$DirName\log.txt"
$VerbosePreference = "SilentlyContinue"
$points = import-csv SearchAndReplace.csv -header find #Import CSV File
#$ctxfiles = Get-ChildItem . -include *.ctx | select -expand fullname #Import local directory of CTX Files
$points | foreach-object { #For each row of points in the CSV file
$findvar = $_.find #Store column 1 as string to search for
$OldQualifiedPoint = "\\\\"+$old+"\\" + $findvar #Escape each individual backslash so it's not read as a regex escape
$NewQualifiedPoint = "\\"+$new+"\" + $findvar #escape slashes are NOT required on the new string
$DuplicateNew = "\\\\" + $new + "\\" + "\\\\" + $new + "\\"
$QualifiedNew = "\\" + $new + "\"
dir . *.ctx | #Grab all CTX Files
select -expand fullname | #grab all of those file names and...
foreach {#iterate through each file
$DateTime = Get-Date -Format "hh:mm:ss"
$FileName = $_
Write-Host "$DateTime - $FindVar - Checking $FileName"
$FileCopied = 0
#Check file contents, and copy matching files to newly created directory
If (Select-String -Path $_ -Pattern $findvar -Quiet ) {
If (!($FileCopied)) {
Copy $FileName -Destination $DirName
$FileCopied = 1
Add-Content $logfile "`n$DateTime - Found $Findvar in $filename"
Write-Host "$DateTime - Found $Findvar in $filename"
}
$FileContent = Get-Content $Filename -ReadCount 0
$FileContent =
$FileContent -replace $OldQualifiedPoint,$NewQualifiedPoint -replace $findvar,$NewQualifiedPoint -replace $DuplicateNew,$QualifiedNew
$FileContent | Set-Content $FileName
}
}
# no $File.Dispose() needed here: this version does not open a StreamReader
}
If I'm reading this correctly, you should be able to read a 3000 line file into memory, and do those replaces as an array operation, eliminating the need to iterate through each line. You can also chain those replace operations into a single command.
dir . *.ctx | #Grab all CTX Files
select -expand fullname | #grab all of those file names and...
foreach {#iterate through each file
$DateTime = Get-Date -Format "hh:mm:ss"
$FileName = $_
Write-Host "$DateTime - $FindVar - Checking $FileName"
#Check file contents, and copy matching files to newly created directory
If (Select-String -Path $_ -Pattern $findvar -Quiet ) {
Copy $FileName -Destination $DirName
Add-Content $logfile "`n$DateTime - Found $Findvar in $filename"
Write-Host "$DateTime - Found $Findvar in $filename"
$FileContent = Get-Content $Filename -ReadCount 0
$FileContent =
$FileContent -replace $OldQualifiedPoint,$NewQualifiedPoint -replace $findvar,$NewQualifiedPoint -replace $DuplicateNew,$QualifiedNew
$FileContent | Set-Content $FileName
}
}
On another note, Select-String will take the filepath as an argument, so you don't have to do a Get-Content and then pipe that to Select-String.
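The reversed loop order mentioned in EDIT 3 (load each file once, then apply every search term to it in memory) can be sketched as follows. This is not the asker's script: the sample terms and lines are made up, and the three chained replaces are swapped for a single pattern with an optional old or new prefix, so an already-qualified point is not prefixed twice.

```powershell
# Hypothetical sample data standing in for the CSV rows and the lines of one .ctx file
$old   = 'F24'
$new   = 'CB3'
$terms = 'PUMP1', 'VALVE2'
$lines = 'x = \\F24\PUMP1', 'y = VALVE2', 'z = \\CB3\PUMP1'

# Optional "\\old\" or "\\new\" prefix, built once and escaped for regex use
$prefix = '(?:\\\\(?:' + [regex]::Escape($old) + '|' + [regex]::Escape($new) + ')\\)?'

# The file content stays in memory while every term is applied to it
foreach ($term in $terms) {
    $pattern     = $prefix + [regex]::Escape($term)
    $replacement = '\\' + $new + '\' + $term   # assumes the term contains no '$'
    $lines = $lines -replace $pattern, $replacement
}
$lines
```

In a real run, $lines would come from Get-Content -ReadCount 0 and be written back with Set-Content once per file, instead of once per search term.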
Yes, you can make it much faster by not using Get-Content... Use Stream Reader instead.
$file = New-Object System.IO.StreamReader -Arg "test.txt"
while (($line = $file.ReadLine()) -ne $null) {
# $line has your line
}
$file.dispose()
I wanted to use PowerShell for this and created a script like the one below:
$filepath = "input.csv"
$newfilepath = "input_fixed.csv"
filter num2x { $_ -replace "aaa","bbb" }
measure-command {
Get-Content -ReadCount 1000 $filepath | num2x | add-content $newfilepath
}
It took 19 minutes on my laptop to process a 6.5 GB file. The code above reads the file in batches (using -ReadCount) and uses a filter, which should optimize performance.
But then I tried FART and it did the same thing in 3 minutes! Quite a difference!
What's the DOS FINDSTR equivalent for PowerShell? I need to search a bunch of log files for "ERROR".
Here's the quick answer
Get-ChildItem -Recurse -Include *.log | select-string ERROR
I found it here, which has a great in-depth answer!
For example, find all instances of "#include" in the c files in this directory and all sub-directories.
gci -r -i *.c | select-string "#include"
gci is an alias for get-childitem
Just to expand on Monroecheeseman's answer. gci is an alias for Get-ChildItem (which is the equivalent to dir or ls), the -r switch does a recursive search and -i means include.
Piping the result of that query to select-string has it read each file and look for lines matching a regular expression (the provided one in this case is ERROR, but it can be any .NET regular expression).
The result will be a collection of match objects, showing the matching line, the file, and other related information.
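To see what those match objects carry, here is a small self-contained sketch. It writes a throwaway log file to a temporary folder (the folder and file names are made up for the demo), searches it, and shows a few of the properties:

```powershell
# Create a throwaway folder with one small log file
$dir = Join-Path ([IO.Path]::GetTempPath()) ("ss-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'app.log') -Value 'start', 'ERROR: disk full', 'end'

# Same shape as the one-liner above; each hit is a match object
$hits = Get-ChildItem -Path $dir -Recurse -Include *.log | Select-String -Pattern 'ERROR'
$hits | Select-Object Filename, LineNumber, Line

# Clean up the demo folder
Remove-Item -Recurse -Force $dir
```

Path, Filename, LineNumber, Line and Matches are among the properties available on each hit.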
if ($entry.EntryType -eq "Error")
Being Object Oriented, you want to test the property in question with one of the standard comparison operators you can find here.
I have a PS script watching logs remotely for me right now - some simple modification should make it work for you.
edit: I suppose I should also add that there is a cmdlet built for this already if you don't want to roll your own the way I did. Check out:
man Get-EventLog
Get-EventLog -newest 5 -logname System -EntryType Error
On a related note, here's a search that will list all the files containing a particular regex or string. It could use some improvement, so feel free to work on it. Also, if someone wanted to encapsulate it in a function, that would be welcome.
I'm new here, so if this should go in its own topic just let me know. I figured I'd put it here since it looks mostly related.
# Search in Files Script
# ---- Set these before you begin ----
$FolderToSearch="C:\" # UNC paths are ok, but remember you're mass reading file contents over the network
$Search="Looking For This" # accepts regex format
$IncludeSubfolders=$True #BUG: if this is set $False then $FileIncludeFilter must be "*" or you will always get 0 results
$AllMatches=$False
$FileIncludeFilter="*".split(",") # Restricting to specific file types is faster than excluding everything else
$FileExcludeFilter="*.exe,*.dll,*.wav,*.mp3,*.gif,*.jpg,*.png,*.ghs,*.rar,*.iso,*.zip,*.vmdk,*.dat,*.pst,*.gho".split(",")
# ---- Initialize ----
if ($AllMatches -eq $True) {$SelectParam=@{AllMatches=$True}}
else {$SelectParam=@{List=$True}}
if ($IncludeSubfolders -eq $True) {$RecurseParam=@{Recurse=$True}}
else {$RecurseParam=@{Recurse=$False}}
# ---- Build File List ----
#$Files=Get-Content -Path="$env:userprofile\Desktop\FileList.txt" # For searching a manual list of files
Write-Host "Building file list..." -NoNewline
$Files=Get-ChildItem -Include $FileIncludeFilter -Exclude $FileExcludeFilter -Path $FolderToSearch -ErrorAction silentlycontinue @RecurseParam|Where-Object{-not $_.psIsContainer} # @RecurseParam is basically -Recurse:[$True|$False]
#$Files=$Files|Out-GridView -PassThru -Title 'Select the Files to Search' # Manually choose files to search, requires powershell 3.0
Write-Host "Done"
# ---- Begin Search ----
Write-Host "Searching Files..."
$Files|
Select-String $Search @SelectParam| #The @ instead of $ lets me pass the hashtable as a list of parameters. @SelectParam is either -List or -AllMatches
Tee-Object -Variable Results|
Select-Object Path
Write-Host "Search Complete"
#$Results|Group-Object path|ForEach-Object{$path=$_.name; $matches=$_.group|%{[string]::join("`t", $_.Matches)}; "$path`t$matches"} # Show results including the matches separated by tabs (useful if using regex search)
<# Other Stuff
#-- Saving and restoring results
$Results|Export-Csv "$env:appdata\SearchResults.txt" # $env:appdata can be replaced with any UNC path, this just seemed like a logical place to default to
$Results=Import-Csv "$env:appdata\SearchResults.txt"
#-- alternate search patterns
$Search="(\d[-|]{0,}){15,19}" #Rough CC Match
#>
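The script above relies on splatting: a hashtable of parameter names and values is passed to a cmdlet by writing @ instead of $ in front of the variable name. Here is a minimal sketch of the same idea, using made-up input strings instead of files:

```powershell
# Parameter names and values for Select-String, collected in a hashtable
$selectParam = @{
    Pattern     = 'ERROR'
    SimpleMatch = $true    # treat the pattern as plain text, not as a regex
}

# @selectParam expands to: -Pattern 'ERROR' -SimpleMatch
$hits = 'all good', 'ERROR: disk full' | Select-String @selectParam
$hits | ForEach-Object { $_.Line }
```

Switching between -List and -AllMatches, as the script does, then only requires building a different hashtable.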
This is not the best way to do this:
gci <the_directory_path> -filter *.csv | where { $_.OpenText().ReadToEnd().Contains("|") -eq $true }
This helped me find all csv files which had the | character in them.
PowerShell has basically precluded the need for findstr.exe as the previous answers demonstrate. Any of these answers should work fine.
However, if you actually need to use findstr.exe (as was my case) here is a PowerShell wrapper for it:
Use the -Verbose option to output the findstr command line.
function Find-String
{
[CmdletBinding(DefaultParameterSetName='Path')]
param
(
[Parameter(Mandatory=$true, Position=0)]
[string]
$Pattern,
[Parameter(ParameterSetName='Path', Mandatory=$false, Position=1, ValueFromPipeline=$true)]
[string[]]
$Path,
[Parameter(ParameterSetName='LiteralPath', Mandatory=$true, ValueFromPipelineByPropertyName=$true)]
[Alias('PSPath')]
[string[]]
$LiteralPath,
[Parameter(Mandatory=$false)]
[switch]
$IgnoreCase,
[Parameter(Mandatory=$false)]
[switch]
$UseLiteral,
[Parameter(Mandatory=$false)]
[switch]
$Recurse,
[Parameter(Mandatory=$false)]
[switch]
$Force,
[Parameter(Mandatory=$false)]
[switch]
$AsCustomObject
)
begin
{
$value = $Pattern.Replace('\', '\\\\').Replace('"', '\"')
$findStrArgs = @(
'/N'
'/O'
@('/R', '/L')[[bool]$UseLiteral]
"/c:$value"
)
if ($IgnoreCase)
{
$findStrArgs += '/I'
}
function GetCmdLine([array]$argList)
{
($argList | foreach { @($_, "`"$_`"")[($_.Trim() -match '\s')] }) -join ' '
}
}
process
{
$PSBoundParameters[$PSCmdlet.ParameterSetName] | foreach {
try
{
$_ | Get-ChildItem -Recurse:$Recurse -Force:$Force -ErrorAction Stop | foreach {
try
{
$file = $_
$argList = $findStrArgs + $file.FullName
Write-Verbose "findstr.exe $(GetCmdLine $argList)"
findstr.exe $argList | foreach {
if (-not $AsCustomObject)
{
return "${file}:$_"
}
$split = $_.Split(':', 3)
[pscustomobject] @{
File = $file
Line = $split[0]
Column = $split[1]
Value = $split[2]
}
}
}
catch
{
Write-Error -ErrorRecord $_
}
}
}
catch
{
Write-Error -ErrorRecord $_
}
}
}
}
FYI:
If you update to PowerShell version 7 you can use grep...
I know egrep is in PowerShell on Azure CLI...
But Select-String (alias sls) is there!
An old article here: [https://devblogs.microsoft.com/powershell/select-string-and-grep/]