PowerShell: Extract several strings from txt files and create a table from them

I need to create a CSV file out of values that are spread over many txt files. Here is an example of one of the txt files (they are all formatted the same way and stored in one folder, let's say c:\user\txtfiles):
System: asdf
Store: def
processid: 00001
Language: english
prodid: yellowshoes12
email: asdf@asdf.com
prodid: blueshoes34
some
other
text blabla
The resulting CSV should look like this (I added values from a second sample txt just to make it clear):
processid, prodid
00001, yellowshoes12
00001, blueshoes34
00002, redtshirt12
00002, greensocks34
That means every prodid in a txt file should be paired with that file's single processid and added as its own line to the CSV.
I tried to reach the result as follows:
$pathtofiles = Get-ChildItem c:\user\txtfiles | select -ExpandProperty FullName
$parsetxt = $pathtofiles |
    ForEach {
        $orderdata = Import-Csv $_ |
            Where-Object { ($_ -like '*processid*') -or ($_ -like '*prodid*') } |
            foreach {
                write-output $orderdata -replace 'processid: ','' -replace 'prodid: ',''
            }
    }
$orderdata
So my intention was to isolate the relevant lines, delete everything that is not wanted, assign the values to variables and build a table out of it. One problem is that if I move $orderdata from the end of the code to the end of the first foreach loop, nothing is printed. But after deliberating quite a while, I am not sure my approach is a good one anyway. Any help would be very appreciated!
Daniel

I think this is best done using a switch -Regex -File construct while iterating over the files in your folder.
# get the files in the folder and loop over them
$result = Get-ChildItem -Path 'c:\user\txtfiles' -Filter '*.txt' -File | ForEach-Object {
    # the switch processes each line of a file and matches the regex to it
    switch -Regex -File $_.FullName {
        '^processid:\s+(\d+)' { $id = $matches[1] }
        '^prodid:\s+(\w+)'    { [PsCustomObject]@{ 'processid' = $id; 'prodid' = $matches[1] } }
    }
} | Sort-Object processid, prodid

# output on console screen
$result

# output to CSV file
$result | Export-Csv -Path 'c:\user\txtfiles\allids.csv'
Result on screen:
processid prodid
--------- ------
00001     blueshoes34
00001     yellowshoes12
00002     greensocks34
00002     redtshirt12
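One note on the Export-Csv call above: Windows PowerShell 5.x writes a "#TYPE ..." type-information line at the top of the file unless you add -NoTypeInformation (PowerShell 7 omits it by default), so for a clean two-column CSV you may want:
$result | Export-Csv -Path 'c:\user\txtfiles\allids.csv' -NoTypeInformation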

Related

How can I replace a string in multiple files with a value from a list, sequentially, in PowerShell?

Let's say I have a bunch of text files named after people, which all have this as their content:
number
I want to replace "number" with a value from a CSV or text file, sequentially, and based on the file name. CSV has two columns, name and number:
Joe 5551011000
Gary 5551011001
Clark 5551011002
So I want to find the text file named Joe, and replace the "number" with "5551011000", and the text file named Gary, and replace "number" with "5551011001".
Thank you!
I didn't get too far:
Get-ChildItem "C:\test\*.txt" -Recurse | ForEach-Object -Process {
(Get-Content $_) -Replace 'changeme', 'MyValue' | Set-Content $_
}
This gets me partly there, but I don't know how to find a specific file and then replace "number" in that file with the correct value that matches the name.
I also tried a different approach, with manual entry, and it works, but I need it to just be automated:
get-childitem c:\Marriott -recurse -include *.txt |
    select -expand fullname |
    foreach {
        $new = Read-Host 'What is the new value you want for ' $_
        (Get-Content $_) -replace 'number', $new |
            Set-Content $_
    }
I would convert your CSV to a hashtable, then this gets pretty simple.
$ReplaceHT = @{}
Import-Csv c:\path\to\file.csv -Delimiter ' ' -Header 'FileName','Number' |
    ForEach-Object { $ReplaceHT.Add($_.FileName, $_.Number) }

Get-ChildItem c:\Marriott -Recurse -Include *.txt -PipelineVariable 'File' |
    Where { $_.BaseName -in $ReplaceHT.Keys } |
    ForEach-Object {
        # BaseName drops the .txt extension so it matches the names in the CSV
        (Get-Content $File.FullName) -replace 'number', $ReplaceHT[$File.BaseName] | Set-Content $File.FullName
    }
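Two assumptions worth calling out: -Header tells Import-Csv the file has no header row of its own, so if your CSV already starts with a header line, drop that parameter (otherwise the header is read as a data row); and the lookup uses BaseName because the hashtable keys ("Joe", "Gary", ...) have no .txt extension.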

Date content comparison if it is greater than today's date

I am looking for a PowerShell script that will read a date stored as content inside a text file and check whether that date is more than 15 days after today; if so, it should print the file name.
Also, the script should be able to combine that date check with a match on another string, printing the file name only when both conditions are met.
The command below gives me the output for files that contain a string matching 'Hello' and that were created some days back. But now I want to fulfill the two conditions above no matter when the file was created.
Get-ChildItem -Path C:\Users\vpaul\Downloads\functional-script\*.txt -Recurse | Select-String -Pattern 'Hello', 'Hell' | Where CreationTime -lt (Get-Date).AddDays(-6)| Export-Csv C:\Users\vpaul\Downloads\functional-script\File_Name.csv -NoTypeInformation
The output from Select-String doesn't have a CreationTime property, which is why your filtering fails: CreationTime doesn't resolve to anything, so it's always "less than" any value you provide.
Either do the filtering on CreationTime before piping to Select-String:
Get-ChildItem ... |Where-Object CreationTime -lt (Get-Date).AddDays(-6) |Select-String 'Hell' | ...
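For instance, a complete version of that first option using the path from your question (a sketch; adjust the day offset to whatever age cutoff you actually want):
Get-ChildItem -Path C:\Users\vpaul\Downloads\functional-script\*.txt -Recurse |
    Where-Object CreationTime -lt (Get-Date).AddDays(-6) |
    Select-String -Pattern 'Hello', 'Hell' |
    Export-Csv C:\Users\vpaul\Downloads\functional-script\File_Name.csv -NoTypeInformation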
Or use the Path property on the output from Select-String to look up the file's attributes again:
Get-ChildItem ... |Select-String 'Hell' |Where-Object {(Get-ItemPropertyValue -LiteralPath $_.Path -Name CreationTime) -lt (Get-Date).AddDays(-6)} |...
Since it looks like you're trying to get and compare a date from a matched text string inside the file, as well as the CreationTime file attribute... +15 days and -6 days respectively...
Example Text file Content:
Hello 4/1/2021
You could try something similar to this:
$ALL_RECURSED_TXTs = Get-ChildItem -Path '[Folder to Recurse]\*.txt' -Recurse |
    Where-Object { $_.CreationTime -lt (Get-Date).AddDays(-6) }

foreach ($File in $ALL_RECURSED_TXTs) {
    Get-Content -Path $File.FullName | Select-String -Pattern 'Hello', 'Hell' |
        ForEach-Object {
            # Find a RegEx match for your Date String that is in the File
            $_ -match 'Hello\s(\d+\/\d+\/\d{4}).*' | Out-Null
            if ((Get-Date $matches[1]) -gt ((Get-Date).AddDays(15))) {
                "$($File.FullName)" | Out-File -FilePath '[Path to Output]\MyPrintedFileNames.txt' -Append
            }
        }
}
If you want to see your matched lines in your outfile...
"$_ : $($File.FullName)" | Out-File -FilePath '[Path to Output]\MyPrintedFileNames.txt' -Append;
"but now I want to fulfill the above two conditions no matter when the file was created."
Scrap the Where-Object filter on Get-ChildItem if you want all txt files.
Edit: Getting confused again, lol. If your txt file's date string is not on the same line as your "Hello|Hell" match, it will get more complex. Good luck!

PowerShell: Replace string in File1 based on string in File2

I am being forced to use PowerShell because of my work. I have used it to do a couple of things, but one of my scripts is now trash because I have to update a string in a file to include a year that comes from a second file. Here is what I'm working with:
File1 contains a few strings, but among them are 48 strings that say:
Jenga_Sequence-XXXX.consensus_Bob_0.6_quality_20
The key part of the string is Sequence-XXXX; sorry for the random placeholders.
File2 is a table that has the strings:
John/USA/Sequence-XXXX/Year
I need to replace the strings in File1 with the corresponding Strings in File2.
Sample Text of File1:
Jenga_Sequence-0001.consensus_Bob_0.6_quality_20
AAAAAAAAAAAAAAAAAAAAAAAAA
Jenga_Sequence-0002.consensus_Bob_0.6_quality_20
aaaaaaaaaaaaaaaaaaaaaaaaa
Jenga_Sequence-0003.consensus_Bob_0.6_quality_20
bbbbbbbbbbbbbbbbbbbbbbbbb
Jenga_Sequence-0004.consensus_Bob_0.6_quality_20
BBBBBBBBBBBBBBBBBBBBBBBBB
Jenga_Sequence-0005.consensus_Bob_0.6_quality_20
QQQQQQQQQQQQQQQQQQQQQ
Sample Table of File2:
|Sequence_ID|Date|
|---------------------------|----------|
|John/USA/Sequence-0003/2020|10/11/2020|
|John/USA/Sequence-0001/2021|1/5/2021|
|John/USA/Sequence-0005/2021|1/10/2021|
|John/USA/Sequence-0004/2020|12/23/2020|
|John/USA/Sequence-0002/2021|1/6/2021|
So, I need PowerShell code that replaces
Jenga_Sequence-0001.consensus_Bob_0.6_quality_20 with John/USA/Sequence-0001/2021,
Jenga_Sequence-0002.consensus_Bob_0.6_quality_20 with John/USA/Sequence-0002/2021,
Jenga_Sequence-0003.consensus_Bob_0.6_quality_20 with John/USA/Sequence-0003/2020, and so on. There are typically 48 of these in a file.
My previous code simply replaced "Jenga_" with "John/USA/" and ".consensus_Bob_0.6_quality_20" with "/2020", but now that we are seeing "/2021", the static code will not work.
I am still open to replacing pieces of the string and having code that sets the year replacement to the correct year.
That was the angle I was doing a broad search on but I could never find anything specific enough to help.
Any help will be appreciated!
EDIT: Here is the part of my previous code that dealt with the finding and replacing, even though I feel it needs to be trashed:
$filePath = 'Jenga_Combined.txt'
$tempFilePath = "$env:TEMP\$($filePath | Split-Path -Leaf)"
$find = 'Jenga_'
$replace = 'John/USA/'
$find2 = '.consensus_Bob_0.6_quality_20'
$replace2 = '/2020'
(Get-Content -Path $filePath) -replace $find, $replace -replace $find2, $replace2 | Add-Content -Path $tempFilePath
Remove-Item -Path $filePath
Move-Item -Path $tempFilePath -Destination $filePath
EDIT2: The "Real Data" from file2. File2 is a Tab Delimited .txt file which makes it not "look great" when copy and pasting. Hopefully this helps. File1 is exactly like above (although the AAAAA stuff is roughly 30,000 letters long)
Sequence_ID date
John/USA/Sequence-0003/2020 2020-10-11
John/USA/Sequence-0001/2021 2021-01-05
John/USA/Sequence-0005/2021 2021-01-10
John/USA/Sequence-0004/2020 2020-12-23
John/USA/Sequence-0002/2021 2021-01-06
Dan
The common factor here is the Sequence_ID number in both files.
You can do this like:
$csvData = Import-Csv -Path 'D:\Test\File2.txt' -Delimiter "`t"

$result = switch -Regex -File 'D:\Test\Jenga_Combined.txt' {
    '^Jenga_Sequence-(\d+).*' {
        $replace = $csvData | Where-Object { $_.Sequence_ID -like "*Sequence-$($matches[1])*" }
        if (!$replace) { Write-Warning "No corresponding Sequence_ID $($matches[1]) found!"; $_ }
        else { $replace.Sequence_ID }
    }
    default { $_ }
}

# output on screen
$result

# output to new file
$result | Set-Content -Path 'D:\Test\Jenga_Combined_NEW.txt' -Force
Output on screen:
John/USA/Sequence-0001/2021
AAAAAAAAAAAAAAAAAAAAAAAAA
John/USA/Sequence-0002/2021
aaaaaaaaaaaaaaaaaaaaaaaaa
John/USA/Sequence-0003/2020
bbbbbbbbbbbbbbbbbbbbbbbbb
John/USA/Sequence-0004/2020
BBBBBBBBBBBBBBBBBBBBBBBBB
John/USA/Sequence-0005/2021
QQQQQQQQQQQQQQQQQQQQQ
Of course, you need to change the file paths to match your environment.
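If the table ever grows well beyond 48 rows, a variant worth considering is to index File2 by sequence number once instead of scanning $csvData with Where-Object for every matched line. A minimal sketch, assuming every Sequence_ID contains a 'Sequence-NNNN' part:
$lookup = @{}
foreach ($row in $csvData) {
    # key each row by its sequence number, e.g. '0001'
    if ($row.Sequence_ID -match 'Sequence-(\d+)') { $lookup[$matches[1]] = $row.Sequence_ID }
}
The switch case can then return $lookup[$matches[1]] directly, falling back to the warning when the key is missing.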

Outputting PowerShell data to a string

This is really PowerShell 101, I realise, but I'm stuck.
I'm trying to iterate through a folder tree, getting each subfolder name and a count of files. No problems there.
The new requirement is to get the ACLs on each subfolder as well. All of this data needs to be output as a CSV file, with a line consisting of each folder name, the file count, and the ACLs in a single string in one field of the CSV (I was going to delimit them with semicolons).
I am open to exporting to XML if the data can be viewed in Excel.
The part where I'm stuck is getting the ACL information into a single string for the CSV.
Get-ACL on each directory shows the data as follows (I'm doing a Select to just get the IdentityReference and FileSystemRights, which is all we're interested in):
IdentityReference FileSystemRights
----------------- ----------------
BUILTIN\Users ReadAndExecute, Synchronize
BUILTIN\Users AppendData
BUILTIN\Users CreateFiles
I would like the output file formatted with one line per subdirectory, similar to
#filecount,folder,perms
51,C:\temp,BUILTIN\Users:ReadAndExecute,Synchronize;BUILTIN\Users:AppendData...
However, I can't get any kind of join working to present it this way. I don't care what combination of delimiters is used (again, it must be readable in Excel).
The script, such as it is, is as follows. The output file has its line of data appended with each directory it traverses. I'm sure this isn't very efficient, but I don't want the process consuming all the server memory either. The bits I can't figure out are prepended with ###.
(Get-ChildItem C:\temp -Recurse | Where-Object { $_.PSIsContainer -eq $True }) | foreach {
    $a = ($_.GetFiles().Count)
    $f = $_.FullName
    $p = (Get-Acl $_.FullName).Access | Select-Object IdentityReference, FileSystemRights
    ### do something with $p?
    Out-File -FilePath c:\outfile.csv -Append -InputObject $a`,$f`,###$p?
}
Since you want all ACEs of a folder mangled into a single line, you need something like this:
Get-ChildItem 'C:\temp' -Recurse | ? { $_.PSIsContainer } | % {
    # build a list of "trustee:permissions" pairs
    $perms = (Get-Acl $_.FullName).Access | % {
        "{0}:{1}" -f $_.IdentityReference, $_.FileSystemRights
    }
    New-Object -Type PSObject -Property @{
        'Filecount'   = $_.GetFiles().Count
        'Folder'      = $_.FullName
        'Permissions' = $perms -join ';'   # join the list to a single string
    }
} | Export-Csv 'c:\outfile.csv' -NoType
Repeated appending inside a loop usually guarantees poor performance, so it should be avoided whenever possible. The outer loop creates a list of custom objects, which can then be exported via Export-Csv in a single go.
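As a side note on the object creation: on PowerShell 3.0 and later, the [PSCustomObject] accelerator is a terser equivalent of New-Object -Type PSObject -Property, and it keeps the properties in the order they are written:
[PSCustomObject]@{
    Filecount   = $_.GetFiles().Count
    Folder      = $_.FullName
    Permissions = $perms -join ';'
}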

Optimizing simple search script in PowerShell

I need to create a script to search through just below a million files of text, code, etc. to find matches and then output all hits on a particular string pattern to a CSV file.
So far I made this;
$location = 'C:\Work*'
$arr = "foo", "bar"   # "foo" and "bar" are the string patterns I want to search for (separately)
for ($i = 0; $i -lt $arr.Length; $i++) {
    Get-ChildItem $location -Recurse |
        Select-String -Pattern $arr[$i] |
        Select-Object Path |
        Export-Csv "C:\Work\Results\$($arr[$i]).txt"
}
This returns to me a CSV file named "foo.txt" with a list of all files with the word "foo" in it, and a file named "bar.txt" with a list of all files containing the word "bar".
Is there any way anyone can think of to optimize this script to make it work faster? Or ideas on how to make an entirely different, but equivalent script that just works faster?
All input appreciated!
If your files are not huge and can be read into memory, then this version should be quite a bit faster (my quick and dirty local test seems to confirm that):
$location = 'C:\ROM'
$arr = "Roman", "Kuzmin"

# remove output files
foreach ($test in $arr) {
    Remove-Item ".\$test.txt" -ErrorAction 0 -Confirm
}

Get-ChildItem $location -Recurse | .{process{ if (!$_.PSIsContainer) {
    # read all text once
    $content = [System.IO.File]::ReadAllText($_.FullName)
    # test patterns and output paths once
    foreach ($test in $arr) {
        if ($content -match $test) {
            $_.FullName >> ".\$test.txt"
        }
    }
}}}
Notes: 1) mind the changed paths and patterns in the example; 2) the output files are not CSV but plain text. There is not much point in CSV if you are only interested in paths; plain text files with one path per line will do.
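If you would rather stay with Select-String, two of its built-in switches can also speed things up: -SimpleMatch skips regex parsing and -List stops scanning a file after the first match. A minimal sketch (note this produces one combined result set rather than one output file per pattern):
Get-ChildItem $location -Recurse -File |
    Select-String -Pattern $arr -SimpleMatch -List |
    Select-Object Path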
Let's suppose that 1) the files are not too big and you can load each one into memory, and 2) you really just want the Path of the matching file (not the matching line, etc.).
I tried reading each file only once and then iterating through the regexes. There is some gain (it's faster than the original solution), but the final result will depend on other factors like file sizes, file count, etc.
Also, removing 'ignorecase' makes it a little faster.
$res = @{}
$arr | % { $res[$_] = @() }

Get-ChildItem $location -Recurse |
    ? { !$_.PsIsContainer } |
    % { $file = $_
        $text = [Io.File]::ReadAllText($file.FullName)
        $arr | % { $regex = $_
            if ([Regex]::IsMatch($text, $regex, 'ignorecase')) {
                # += accumulates every matching path instead of overwriting the last one
                $res[$regex] += $file.FullName
            }
        }
    }

$res.GetEnumerator() | % {
    # wrap each path in an object so Export-Csv emits a Path column
    # (bare strings would only export their Length property)
    $_.Value | Select-Object @{ n = 'Path'; e = { $_ } } |
        Export-Csv "d:\temp\so-res$($_.Key).txt"
}
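One more micro-optimization if the hit counts grow large: += on an array copies the whole array on every addition, so a generic list scales better. A hypothetical variant of the setup and accumulation lines (PowerShell 5+ for the ::new() syntax):
$res = @{}
$arr | % { $res[$_] = [System.Collections.Generic.List[string]]::new() }
# ...and inside the match branch:
$res[$regex].Add($file.FullName)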
