I'm having trouble using multithreading in PowerShell. I've tried creating a synchronized hash table and running ForEach-Object -Parallel -ThrottleLimit 3 -AsJob{...}, but this just gives me errors.
A sample of what I'm trying to do is:
$index = [System.Collections.Hashtable]::Synchronized(@{})
Import-Csv -path .\csvfile.csv | ForEach-Object -Parallel -ThrottleLimit 3 -AsJob{
$nameKey = $_.($FILE_HEADER)
$dirKey = $_.($DIR_HEADER)
$extKey = $_.($EXTENSION_HEADER)
$nameLetter = $nameKey.Substring(0,1) #Retrieves the very first character of the name for indexing
#Confirm we are indexing into non-null array as we step through dimensions
if($null -eq $index[$dirKey])
{$index[$dirKey] = @{}}
if($null -eq $index[$dirKey][$extKey])
{$index[$dirKey][$extKey] = @{}}
if($null -eq $index[$dirKey][$extKey][$nameLetter])
{$index[$dirKey][$extKey][$nameLetter] = @{}}
if($null -eq $index[$dirKey][$extKey][$nameLetter][$nameKey])
{$index[$dirKey][$extKey][$nameLetter][$nameKey] = 0}
$index[$dirKey][$extKey][$nameLetter][$nameKey]++
}
As a result I would have a four-level nested hash table where I can call $index[$dirKey][$extKey][$nameLetter][$nameKey] to get a counter representing the number of times this name was added.
I am doing this because this CSV is half a million lines long and simply building my hash table linearly takes two hours. The next stage where I go through these entries takes even longer.
What I am looking for is the ability to run through all the entries of the CSV once and build my index file using as many threads as I want. What is the best way to go about this? Also how do I determine the most sensible number of threads?
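One possible shape for this, as a minimal sketch rather than a definitive answer: in PowerShell 7+, -Parallel takes the script block as its own argument, and outer variables are only visible inside that block through $using:. The sketch swaps the four nested tables for a single flat table with a composite key (an assumption made here to keep the locking simple), and it uses in-memory rows with hypothetical column names Dir, Ext and Name in place of the CSV. The read-then-write of the counter is wrapped in an explicit lock on SyncRoot, because a synchronized hashtable only protects each individual read or write, not the pair.

```powershell
# Requires PowerShell 7+ (ForEach-Object -Parallel).
# Stand-in rows; in the real script these would come from Import-Csv.
# The column names Dir, Ext and Name are hypothetical.
$rows = 1..200 | ForEach-Object {
    [PSCustomObject]@{ Dir = "d$($_ % 2)"; Ext = 'txt'; Name = "n$($_ % 5)" }
}

$index = [System.Collections.Hashtable]::Synchronized(@{})

$rows | ForEach-Object -ThrottleLimit 3 -Parallel {
    # Outer variables must be brought in with $using:
    $idx = $using:index

    # Flat composite key instead of four nested hashtables (assumption).
    $key = '{0}|{1}|{2}' -f $_.Dir, $_.Ext, $_.Name

    # Lock around the read-modify-write so no increment is lost.
    [System.Threading.Monitor]::Enter($idx.SyncRoot)
    try {
        $idx[$key] = 1 + [int]$idx[$key]   # [int]$null is 0 for a new key
    }
    finally {
        [System.Threading.Monitor]::Exit($idx.SyncRoot)
    }
}
```

A count is then read back with $index['d1|txt|n1']. For the thread count, [Environment]::ProcessorCount is a common starting point for CPU-bound work, but with per-row work this small the parallel overhead may outweigh any gain, so it is worth timing a plain single-threaded pass over the same flat table first.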
What I'm trying to do
The below script loops through every item in an Array of data streams and requests a summary value for output to a text file. This external request is by far the most expensive part of the process, and so I am now using a Runspacepool to run multiple (5) requests in parallel, and whichever finishes first outputs its results.
These requests all write to a synchronised hashtable, $hash, which holds a running total ($hash.counter) and tracks which thread ($hash.thread) is updating the total and a .txt output file, to avoid potential write collisions.
What isn't working
Each thread is able to update the counter easily enough with $hash.counter+=$r, but when I try to read the value into an Add-Content statement:
Add-Content C:\Temp\test.txt "$hash.counter|$r|$p|$ThreadID"
it adds an object reference rather than a number:
System.Collections.Hashtable+SyncHashtable.counter|123|MyStreamName|21252
And so I've ended up passing the counter through a temporary variable that can be used in the string:
[int]$t = $hash.counter+0
Add-Content C:\Temp\test.txt "$t|$r|$p|$ThreadID"
Which does output the true total:
14565423|123|MyStreamName|21252
What I'm asking
Is it possible to remove this temporary variable and output directly from the hashtable? Why does the object reference have a '+' in the middle?
I've had to add logic to 'lock' the hashtable to prevent data collisions. Should this be necessary? I'd been told that synchronised hashtables were supposed to be thread-safe for R/W operations, but without this logic my counter doesn't reach the correct total.
Full code for the loop itself below - I've left out setup of the Runspacepool etc
ForEach($i in $Array){
# Save down the data stream name and create parameter list for passing to the new job
$p = $i.Point.Name
$parameters = @{
hash = $hash
conn = $Conn
p = $p
}
# Instantiate new powershell runspace and send a script to it
$PowerShell = [powershell]::Create()
$PowerShell.RunspacePool = $RunspacePool
[void]$Powershell.AddScript({
# Receive parameter list, retrieve threadid
Param (
$hash,
$conn,
$p
)
$ThreadID = [appdomain]::GetCurrentThreadId()
# Send data request to the PI Data Archive using the existing connection
$q = Get-something (actual code removed)
[int]$r = $q.Values.Values[0].Value
# Lock out other threads from writing to the hashtable and output file to prevent collisions
# If the thread isn't locked, attempt to lock it. If it is locked, sleep for 1ms and try again. Tracked by synchronised Hashtable.
Do{
if($hash.thread -eq 0){
$hash.thread = $ThreadID
}
# Sleep for 1ms and check that the lock wasn't overwritten by another parallel thread.
Start-Sleep -Milliseconds 1
}Until($hash.thread -eq $ThreadID)
# Increment the synchronised hash counter. Save the new result to a temporary variable (can't figure out how to get the hash counter itself to output to the file)
$hash.counter+=$r
[int]$t = $hash.counter+0
# Write to output file new counter total, result, pointName and threadID
Add-Content C:\Temp\test.txt "$t|$r|$p|$ThreadID"
# release lock on the hashtable and output file
$hash.thread = 0
})
# Add parameter list to instance (matching param() list from the script. Invoke the new instance and save a handle for closing it
[void]$Powershell.AddParameters($parameters)
$Handle = $PowerShell.BeginInvoke()
# Save down the handle into the $jobs list for closing the instances afterwards
$temp = [PSCustomObject]@{
PowerShell=$Powershell
Handle=$Handle
}
[void]$jobs.Add($Temp)
}
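On the two questions above, a minimal sketch under stated assumptions. Inside a double-quoted string PowerShell expands only the variable itself, so "$hash.counter" becomes the string form of $hash followed by the literal text .counter; wrapping the property access in a subexpression, $(...), expands the value instead. The '+' in System.Collections.Hashtable+SyncHashtable is how .NET writes the name of a nested type (SyncHashtable is declared inside Hashtable). As for locking: a synchronized hashtable makes each single read or write safe, but += is a read followed by a write, so two threads can still interleave. The sketch uses [System.Threading.Monitor] on the table's SyncRoot in place of the hand-rolled $hash.thread flag; the values of $r and $p are stand-ins.

```powershell
# Stand-in values for illustration; in the real script these come from the data request.
$hash = [System.Collections.Hashtable]::Synchronized(@{ counter = 0 })
$r = 123
$p = 'MyStreamName'

# Take the table's lock for the whole read-modify-write plus the formatting.
[System.Threading.Monitor]::Enter($hash.SyncRoot)
try {
    $hash.counter += $r
    # $(...) evaluates the property access inside the string,
    # so no temporary variable is needed.
    $line = "$($hash.counter)|$r|$p"
}
finally {
    [System.Threading.Monitor]::Exit($hash.SyncRoot)
}

$line   # 123|123|MyStreamName
```

In the original loop the Add-Content call would go inside the same try block, so that the file write is serialized together with the counter update.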
For various reasons, I started to write a PowerShell portscanner, not least to start learning it.
First iteration used Test-Netconnection. This seemed as if it would be too slow; so I went one level down to use sockets, specifically System.Net.Sockets.TcpClient. (Have started looking at System.Net.Sockets.Socket as the MS docs make mention of the Socket.BeginConnect() method which can begin an asynchronous request for a remote connection, but not sure if this will help yet.)
This still seemed too slow, so I looked at jobs. All this did was consume more resources for not much speed increase, so after much googling, I managed to make threading (or what PowerShell calls threading any way) work through the use of RunSpacePools. I thought it was pretty much done, and performance is ok if you're looking at an input file of 5 IP addresses. However, tried it out with a CIDR /24 this morning, and it took about 20-30 minutes.
[Edit] I should add that this script will take a 'thread' value, but if none is provided uses a default thread value of number of cores + 1 in order to take advantage of the RunSpacePool multithreading.
So I started looking at how Fyodor increased his speed, and in The Art of Scanning in PHRACK article 11 he states (whilst talking about TCP Connect() scanning):
While making a separate connect() call for every targeted port in a
linear fashion would take ages over a slow connection, you can hasten
the scan by using many sockets in parallel.
That is clearly where some optimisation is available.
So, is anyone able to point me in the direction of how I enable this - as I say, still quite new to PoSH, so am pushing the limits of my comprehension with RunSpacePools.
Specifically, I would like some advice on a) if my instincts are right to increase the scan speed to increase socket parallelism, b) how to do that and c) if System.Net.Sockets.Socket is more appropriate.
function doConnect {
$ipLoopCount = 0
$portLoopCount = 0
# check for randomise switch
if ($randomise) {
$ipArray = makeRange | Sort-Object {Get-Random}
$portArray = makePortRange | Sort-Object {Get-Random}
} else {
# Connects to IPs in order
$ipArray = makeRange
$portArray = makePortRange
}
# initialise runspaces
if ($threads) {
$useThreads = $threads
} else {
$useThreads = ([int]$env:NUMBER_OF_PROCESSORS + 1)
}
$pool = [RunspaceFactory]::CreateRunspacePool(1, $useThreads)
$pool.ApartmentState = "MTA"
$pool.Open()
$runspaces = @()
# set higher priority for powershell process
if ($priority) {
$proc = Get-Process -Id $pid;
$proc.PriorityClass = 'High'
} else{
$proc = Get-Process -Id $pid;
$proc.PriorityClass = 'Normal'
}
# info object
$infoDisplay = @{
InputFile = $inFile
Target_IPs = $ipArray
Target_Ports = $portArray
Process_Priority = $proc.PriorityClass
Threads = $useThreads
}
[PSCustomObject]$infoDisplay
# set up scriptblock to pass to runspaces
$scriptblock = {
Param (
[IPAddress]$sb_ip,
[int]$sb_port
)
# This progress bar doesn't work yet
Write-Progress -Activity "Scan range $StartIPaddress - $EndIPAddress" -Status "% Complete:" -PercentComplete((($portLoopCount)/($ipArray.Length*$portArray.Length))*100)
if ($delay) {
$delay = Get-Random -Maximum 1000 -Minimum 1;
Start-Sleep -m $delay
}
$socket = New-Object System.Net.Sockets.TcpClient
$socket.Connect($sb_ip, $sb_Port)
if ($socket.Connected) {
Write-Output "Connected to $sb_port on $sb_ip"
#} else {
# Write-Output "Failed to connect to port $sb_port on $sb_ip"
}
$socket.Close()
}
foreach ($nIP in $ipArray) {
$ipLoopCount++
foreach ($nPort in $portArray) {
$portLoopCount++
$runspace = [PowerShell]::Create()
$null = $runspace.AddScript($scriptblock)
$null = $runspace.AddArgument($nIP)
$null = $runspace.AddArgument($nPort)
$runspace.RunspacePool = $pool
$runspaces += [PSCustomObject]@{
Pipe = $runspace;
Status = $runspace.BeginInvoke()
}
}
}
while ($runspaces.Status -ne $null) {
$completed = $runspaces | Where-Object { $_.Status.IsCompleted -eq $true }
foreach ($runspace in $completed) {
$runspace.Pipe.EndInvoke($runspace.Status)
$runspace.Status = $null
}
}
$pool.Close()
$pool.Dispose()
}
It may be that PowerShell is entirely the wrong thing to attempt this in, but is a useful exercise as the environment is quite locked down, and installing a 'proper' portscanner - i.e. nmap - is impossible.
[Edit 2] I don't think reducing the timeout and plumbing that into the logic is the solution that I'm after.
[Edit 3] The parallel switch didn't help.
[Edit 4] Have been thinking about Asynchronous socket connections, as this may help the overall connections speed - but then you have to have another thread/process looking after the incoming traffic. Unsure as to the efficacy.
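One possible direction for (b), sketched under assumptions rather than offered as the definitive fix: TcpClient.Connect() blocks until the operating system gives up on an unanswered port, so each runspace can sit idle for a long time on filtered targets. The sketch below starts every connection first with TcpClient.ConnectAsync() and only then waits, sharing one deadline across all sockets, so the total wait is roughly one timeout rather than one per port. The function name Test-TcpPortsAsync and its parameters TargetHost, Ports and TimeoutMs are hypothetical, and no separate thread is needed to watch the pending connections.

```powershell
function Test-TcpPortsAsync {
    # Hypothetical helper: probes several TCP ports on one host concurrently.
    param(
        [Parameter(Mandatory)][string]$TargetHost,
        [Parameter(Mandatory)][int[]]$Ports,
        [int]$TimeoutMs = 500
    )

    # Start every connection attempt before waiting on any of them.
    $pending = foreach ($port in $Ports) {
        $client = New-Object System.Net.Sockets.TcpClient
        [PSCustomObject]@{
            Port   = $port
            Client = $client
            Task   = $client.ConnectAsync($TargetHost, $port)
        }
    }

    # One shared deadline for the whole batch.
    $deadline = [DateTime]::UtcNow.AddMilliseconds($TimeoutMs)

    foreach ($p in $pending) {
        $remaining = [int][Math]::Max(0, ($deadline - [DateTime]::UtcNow).TotalMilliseconds)
        $open = $false
        try {
            # Wait() returns $false on timeout and throws if the connect failed.
            $open = $p.Task.Wait($remaining) -and $p.Client.Connected
        }
        catch {
            $open = $false
        }
        $p.Client.Close()
        [PSCustomObject]@{ Port = $p.Port; Open = $open }
    }
}
```

A call might look like Test-TcpPortsAsync -TargetHost '10.1.2.3' -Ports 22,80,443 -TimeoutMs 300. For a whole /24 it would probably be wise to feed the ports in batches, since opening many thousands of sockets at once can exhaust local resources.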
I have a script that cycles through a folder and condenses multiple CSVs into one xlsx file, using the names of the CSVs as worksheet names. However, when the script runs as part of a larger script, it fails when it refreshes the query.
$Query.Refresh()
On its own the script runs fine, but when added to the larger one it fails. Can anyone advise why this is the case?
Below is the error I get:
Insufficient memory to continue the execution of the program.
At C:\Temp\Scripts\Shares_Complete.psm1:254 char:13
+ $Query.Refresh()
+ ~~~~~~~~~~~~~~~~
+ CategoryInfo : OperationStopped: (:) [], OutOfMemoryException
+ FullyQualifiedErrorId : System.OutOfMemoryException
I have tried a single CSV with the same code and still get the same result.
$script:SP = "C:\Temp\Servers\"
$script:TP = "C:\Temp\Servers\Pc.txt"
$script:FSCSV = "C:\Temp\Server_Shares\Server Lists\"
$script:Message1 = "Unknown Hosts"
$script:Message2 = "Unable to connect"
$script:Message3 = "Unknown Errors Occurred"
$script:Txt = ".txt"
$script:OT = ".csv"
$script:FSERROR1 = $FSCSV+$Message1+$OT
$script:FSERROR2 = $FSCSV+$Message2+$OT
$script:FSERROR3 = $FSCSV+$Message3+$OT
$script:ERL3 = $E4 + "Shares_Errors_$Date.txt"
$script:ECL1 = $E4 + "Shares_Exceptions1_$Date.txt"
$script:ERL1 = $E4 + "Shares_Errors1_$Date.txt"
$script:ECL3 = $E4 + "Shares_Exceptions_$Date.txt"
function Excel-Write {
if ($V -eq "1") {
return
}
[System.GC]::Collect()
$RD = $FSCSV + "*.csv"
$CsvDir = $RD
$Ma4 = $FSCSV + "All Server Shares for Domain $CH4"
$csvs = dir -path $CsvDir # Collects all the .csv files from the directory
$FSh = $csvs | Select-Object -First 1
$FSh = ($FSh -Split "\\")[4]
$FSh = $FSh -replace ".{5}$"
$FSh
$outputxls = "$Ma4.xlsx"
$script:Excel = New-Object -ComObject Excel.Application
$Excel.DisplayAlerts = $false
$workbook = $excel.Workbooks.Add()
# Loops through each CSV, pulling all the data from each one
foreach ($iCsv in $csvs) {
$script:iCsv
$WN = ($iCsv -Split "\\")[-1]
$WN = $WN -replace ".{4}$"
if ($WN.Length -gt 30) {
$WN = $WN.Substring(0, [Math]::Min($WN.Length, 20))
}
$Excel = New-Object -ComObject Excel.Application
$Excel.DisplayAlerts = $false
$Worksheet = $workbook.Worksheets.Add()
$Worksheet.Name = $WN
$TxtConnector = ("TEXT;" + $iCsv)
$Connector = $worksheet.Querytables.Add($txtconnector,$worksheet.Range("A1"))
$query = $Worksheet.QueryTables.Item($Connector.Name)
$query.TextfileOtherDelimiter = $Excel.Application.International(5)
$Query.TextfileParseType = 1
$Query.TextFileColumnDataTypes = ,2 * $worksheet.Cells.Column.Count
$query.AdjustColumnWidth = 1
$Query.Refresh()
$Query.Delete()
$Worksheet.Cells.EntireColumn.AutoFit()
$Worksheet.Rows.Item(1).Font.Bold = $true
$Worksheet.Rows.Item(1).HorizontalAlignment = -4108
$Worksheet.Rows.Item(1).Font.Underline = $true
$Workbook.Save()
}
$Empty = $workbook.Worksheets.Item("Sheet1")
$Empty.Delete()
$Workbook.SaveAs($outputxls,51)
$Workbook.Close()
$Excel.Quit()
$ObjForm.Close()
Delete
}
Expected result: the script should continue and create the xlsx file.
Looking at your script, it doesn't surprise me that you eventually run out of memory, because you are continuously creating COM objects and never releasing them from memory.
Whenever you have created COM objects and are finished with them, use these lines to free up the memory:
$Excel.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($workbook) | Out-Null
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($Excel) | Out-Null
[System.GC]::Collect()
[System.GC]::WaitForPendingFinalizers()
Also, take a good look at the code.
You are creating a $Script:Excel = New-Object -ComObject excel.application object before the foreach loop, but you don't use it. Instead, you create a new Excel object inside the loop over and over again, for which there is no reason at all, since you can reuse the one you created before the loop.
As an aside: The following characters are not allowed in excel worksheet names
\/?*[]:
Length limitation is 31 characters.
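The two rules in this aside can be applied in one small helper. This is a sketch only: the function name ConvertTo-WorksheetName is hypothetical, and it simply strips the forbidden characters and truncates to 31 characters, without handling other cases such as an empty result or duplicate sheet names.

```powershell
function ConvertTo-WorksheetName {
    # Hypothetical helper: makes a string acceptable as an Excel worksheet name.
    param(
        [Parameter(Mandatory)][string]$Name
    )

    # Remove the characters Excel does not allow in sheet names: \ / ? * [ ] :
    $clean = $Name -replace '[\\/?*:\[\]]', ''

    # Sheet names may be at most 31 characters long.
    if ($clean.Length -gt 31) {
        $clean = $clean.Substring(0, 31)
    }

    $clean
}

ConvertTo-WorksheetName -Name 'a/b:c[1]?*\d'   # abc1d
```

In the loop from the question, this could replace the manual length check, for example $Worksheet.Name = ConvertTo-WorksheetName -Name $WN.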
EDIT
I had a look at your project and especially the Shares_Complete.psm1 file.
Although I'm not willing of course to rewrite your entire project, I do have some remarks that may help you:
[System.Net.Dns]::GetHostByName() is obsolete. Use GetHostEntry()
when done with a Windows form, use $ObjForm.Dispose() to clear it from memory
you do a lot of [System.GC]::Collect(); [System.GC]::WaitForPendingFinalizers() for no reason
Why not use [System.Windows.MessageBox]::Show() instead of using a COM object $a = new-object -comobject wscript.shell? Again, you leave that object lingering in memory.
use Join-Path cmdlet instead of $RD = $FSCSV + "*.csv" or $Cop = $FSCSV + "*.csv" constructs
remove invalid characters from Excel worksheet names (-replace '[\\/?*:[\]]', '')
use Verb-Noun names for your functions so it becomes clear what they do. Now you have functions like Location, Delete and File that don't mean anything
you are calling functions before they are defined like in line 65 where you call function Shares. At that point it does not exist yet because the function itself is written in line 69
add [System.Runtime.Interopservices.Marshal]::ReleaseComObject($worksheet) | Out-Null in function Excel-Write
there is no need to use the variable $Excel in script scope ($Script:Excel = New-Object -ComObject excel.application) where it is used only locally to the function.
you may need to look at Excel specifications and limits
fix your indentation of code so it is clear when a loop or if starts and ends
I would recommend using variable names with more meaning. For an outsider or even yourself after a couple of months two-letter variable names become confusing
Keep in mind I'm new to this and be gentle.
I have a full file path for a document "C:\folder1\folder2\01.03.2017 - FileName.csv" and I want to manipulate it to return the dir that the file is stored in (C:\folder1\folder2), minus the filename (01.03.2017 - FileName.csv).
I'm trying to make this modular so that it will work regardless of the amount of sub-folders a file sits in; we also won't know the FileName in advance, so again this needs to be modular and remove up to and including the last "\"
For background info on how this is currently built, I nicked a bit of code from a previous question I saw on StackOverflow:
Function Get-FileName($initialDirectory)
{
[System.Reflection.Assembly]::LoadWithPartialName(“System.windows.forms”) |
Out-Null
$OpenFileDialog = New-Object System.Windows.Forms.OpenFileDialog
$OpenFileDialog.initialDirectory = $initialDirectory
$OpenFileDialog.filter = “All files (*.*)| *.*”
$OpenFileDialog.ShowDialog() | Out-Null
$OpenFileDialog.filename
} #end function Get-FileName
# *** Entry Point to Script ***
$originalData = Get-FileName -initialDirectory “c:\” | Out-String
Write-Host $originalData
$originalDir = $originalData.Split('\')
$originalDir
Running this currently brings up the standard Windows "Open" dialog box. You select a file and the output is currently:
C:\folder1\folder2\01.03.2017 - FileName.csv
C:
folder1
folder2
01.03.2017 - FileName.csv
I've tried a few different -join attempts but none successful.
We will have an input of C:\folder1\folder2\01.03.2017 - FileName.csv as a variable $originalData.
We want the output to be C:\folder1\folder2 as a variable $originalDir.
Function Get-FileName($initialDirectory)
{
    [System.Reflection.Assembly]::LoadWithPartialName("System.windows.forms") | Out-Null
    $OpenFileDialog = New-Object System.Windows.Forms.OpenFileDialog
    $OpenFileDialog.initialDirectory = $initialDirectory
    $OpenFileDialog.filter = "All files (*.*)| *.*"
    $OpenFileDialog.ShowDialog() | Out-Null
    $OpenFileDialog.filename
} #end function Get-FileName
$originalData = Get-FileName -initialDirectory "c:\"
Write-Host $originalData
$originalDir = (Get-ChildItem $originalData).DirectoryName
You can use it like this: take the result of your function and pass it to Get-ChildItem, whose DirectoryName property holds the folder the file sits in.
edit: notice there's no | Out-String on the third to last line
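As a possible alternative to Get-ChildItem, the folder can also be derived from the path string alone, without touching the file system. This is a minimal sketch using the path from the question as a literal; both Split-Path -Parent and [System.IO.Path]::GetDirectoryName() cut everything from the last separator onward, however many sub-folders there are. The result shown in the comments is what Windows gives, since the example path uses Windows separators. Leaving out | Out-String matters here too, because Out-String appends a line break to the path.

```powershell
# The sample path from the question; normally this comes from Get-FileName.
$originalData = 'C:\folder1\folder2\01.03.2017 - FileName.csv'

# Cmdlet form: returns the parent container of the path.
$originalDir = Split-Path -Path $originalData -Parent

# .NET form: same idea, pure string handling.
$originalDirNet = [System.IO.Path]::GetDirectoryName($originalData)

$originalDir      # on Windows: C:\folder1\folder2
$originalDirNet   # on Windows: C:\folder1\folder2
```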
I am using a PowerShell script to write to a text file. A client showed me this PowerShell script as a replacement for an Excel macro I used to use...
$computers= gc "C:\Users\Powershell\DeviceList.txt"
foreach ($computername in $computers)
{
write-output "<$computername>
active = yes
group =
interval = 5min
name = $computername
host = $computername
community =
version = 1
timeout = 0
retries = default
port = 161
qos_source = 1
</$computername>" | Out-File -filepath "C:\Users\Powershell\Cisco_Mon.txt" -append
}
It works great, but now I want to build on it by adding more variables. In a perfect world I would like it to read from an Excel spreadsheet, grabbing each row of data, with each column defined as a variable. For now, using another text file is fine as well. Here is what I started with (it doesn't work), but you can see where I am going with it...
$computers= gc "C:\Users\Powershell\devicelist.txt"
$groups= gc "C:\Users\Powershell\grouplist.txt"
foreach ($computername in $computers) + ($groupname in $groups)
{
write-output "<$computername>
active = yes
group = $groupname
interval = 5min
name = $computername
host = $computername
community =
version = 1
timeout = 0
retries = default
port = 161
qos_source = 1
</$computername>" | Out-File -filepath "C:\Users\Powershell\Cisco_Mon.txt" -append
}
Of course it is not working. Essentially, I would LOVE it if I could define each of the above options as a variable from an Excel spreadsheet, such as $community, $interval, $active, etc.
Any help with this would be very much appreciated. If someone could show me how to use an Excel spreadsheet, have each column defined as a variable, and write the above text with the variables, that would be GREAT!
Thanks,
smt1228#gmail.com
An Example of this would be the following...
Excel data (columns separated with "|"):
IP | String | Group
10.1.2.3 | Public | Payless
Desired Output:
<10.1.2.3>
active = yes
group = Payless
interval = 5min
name = 10.1.2.3
host = 10.1.2.3
community =
version = 1
timeout = 0
retries = default
port = 161
qos_source = 1
</10.1.2.3>
Addition:
Pulling data from CSV for IP, String, Group where data is as follows in CSV...
10.1.2.3,public,group1
10.2.2.3,default,group2
10.3.2.3,public,group3
10.4.2.3,default,group4
to be written into a .txt file as
IP = 10.1.2.3.
String = public
Group = Group1
and loop for each line in the CSV
Ok, new answer now. The easiest way would be to save your Excel document as CSV so that it looks like this (i.e. very similar to how you presented your data above):
IP,String,Group
10.1.2.3,Public,Payless
You can still open that in Excel, too (and you avoid having to use the Office interop to try parsing out the values).
PowerShell can parse CSV just fine with Import-Csv:
Import-Csv grouplist.csv | ForEach-Object {
"<{0}>
active = yes
group = {1}
interval = 5min
name = {0}
host = {0}
community =
version = 1
timeout = 0
retries = default
port = 161
qos_source = 1
</{0}>" -f $_.IP, $_.Group
}
I'm using a format string here where {0}, etc. are placeholders. -f then is the format operator which takes a format string on the left and arguments for the placeholders on the right. You can also see that you can access the individual columns by their name, thanks to Import-Csv.
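For the headerless CSV shown in the "Addition" above, a minimal sketch, assuming the three columns are always IP, String and Group in that order: the -Header parameter supplies the column names the file lacks (Import-Csv accepts the same parameter for a real file). The data is inlined here with ConvertFrom-Csv so the example runs on its own, and the variable names $csvText and $blocks are introduced only for illustration.

```powershell
# Inline stand-in for the headerless CSV file from the question.
$csvText = @'
10.1.2.3,public,group1
10.2.2.3,default,group2
'@

# -Header names the columns because the data has no header row.
$blocks = ($csvText -split '\r?\n') |
    ConvertFrom-Csv -Header IP, String, Group |
    ForEach-Object {
        # Each placeholder {n} is filled from the matching argument after -f.
        "IP = {0}`nString = {1}`nGroup = {2}" -f $_.IP, $_.String, $_.Group
    }

$blocks
```

With a file on disk, the first two stages would become Import-Csv with the same -Header list, and the result could be piped to Add-Content or Out-File -Append as in the original script.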