PowerShell - Pulling a string from a .txt file, splitting it, then concatenating it for archiving

I have an application where I get a list of new/modified files from git status, take the partial paths from that file, concatenate them with the root directory path, and then move those files to an archive. I have it half working, but the way I am running PowerShell does not show error reports, and the process is obviously failing.
Here is the code I am trying to use (it has gone through several iterations, so please excuse the commented-out portions). Basically, I am trying to Get-Content from the .txt file, replace / with \ (for some reason the process that creates the .txt file loves forward slashes), and then split each line at the spaces. The only part of the string I am interested in is the last part, which I am trying to concatenate with the known-working root directory before moving the resulting files to an archive location.
Before you ask: this is something we are not willing to track in git, due to the nature of the files (they are time-stamped test outputs that we want to save per test run, not in git). I am still fairly new to PowerShell and have been banging my head against this rock for far too long.
Get-Content $outfile | Foreach-Object
{
#$_.Replace("/","\")
#$lineSplit = $_.Split(' ')
$_.Split(" ")
$filePath = "$repo_dir\$_[-1]"
$filePath.Replace('/','\')
"File Path Created: $filePath"
$untrackedLegacyTestFiles += $filePath
}
Get-Content $untrackedLegacyTestFiles | Foreach-Object
{
Copy-Item $_ $target_root -force
"Copying File: $_ to $target_root"
}
}
The $outfile is a text file in which each line holds a partial path to a .txt file generated by a test application we use. This information comes from git, so a line in $outfile looks like this:
!! Some/File/Path/Doc.txt
The "!!" is git's status code for the file ("!!" marks an ignored file, "??" an untracked one, " M" a modified one), so the prefix can be any of several codes. That is why I am trying to split each line on the spaces and take only the last element.
My desired output is to take the last element of the split string from $outfile (Some/File/Path/Doc.txt), concatenate it with $repo_dir to form a complete file path, and then move Doc.txt to the archive location ($target_root).

To combine paths in PowerShell, use the Join-Path cmdlet. To extract the path from your string, you can use a regex:
$extractedPath = [regex]::Match('!! Some/File/Path/Doc.txt', '.*\s(.+)$').Groups[1].Value
$filePath = Join-Path $repo_dir $extractedPath
The Join-Path cmdlet also converts the forward slashes to backslashes, so there's no need to replace them :-).
Your whole script could look like this:
Get-Content $outfile | Foreach-Object {
$path = Join-Path $repo_dir ([regex]::Match($_, '.*\s(.+)$').Groups[1].Value)
Copy-Item $path $target_root -force
}
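The question asks to move the files (the script above copies them) and mentions that no errors were being reported. A minimal sketch of a variant that moves each file and warns about lines or paths it cannot handle is below. The function name Move-StatusFiles and its parameters are hypothetical, and it assumes $target_root already exists as a folder: if it does not, Move-Item would rename the file to that path instead of moving it into a folder.

```powershell
# Hypothetical helper (name and parameters are illustrative): moves each file listed
# in a git-status-style text file ("XY path" per line) from $RepoDir to $TargetRoot.
function Move-StatusFiles {
    param(
        [Parameter(Mandatory)] [string] $OutFile,
        [Parameter(Mandatory)] [string] $RepoDir,
        [Parameter(Mandatory)] [string] $TargetRoot   # assumed to be an existing folder
    )
    Get-Content -LiteralPath $OutFile | ForEach-Object {
        # Same regex as above: everything after the last whitespace is the relative path
        $relative = [regex]::Match($_, '.*\s(.+)$').Groups[1].Value
        if (-not $relative) {
            Write-Warning "Could not extract a path from line: '$_'"
            return   # inside ForEach-Object, 'return' moves on to the next line
        }
        $path = Join-Path $RepoDir $relative
        if (Test-Path -LiteralPath $path -PathType Leaf) {
            # -Force overwrites a file of the same name already in the archive
            Move-Item -LiteralPath $path -Destination $TargetRoot -Force
            "Moved: $path -> $TargetRoot"
        }
        else {
            # Report missing files instead of failing silently
            Write-Warning "File not found: $path"
        }
    }
}
```

Usage with the question's variables: Move-StatusFiles -OutFile $outfile -RepoDir $repo_dir -TargetRoot $target_root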
If you'd rather not use a regex in your code, you can also extract the path using:
$extractedPath = '!! Some/File/Path/Doc.txt' -split ' ' | select -Last 1
or
$extractedPath = ('!! Some/File/Path/Doc.txt' -split ' ')[-1]
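To check that the three extraction approaches agree on the status prefixes mentioned in the question, here is a small sketch using hypothetical sample lines. Keep in mind that all three take only the text after the last space, so a path that itself contains spaces would be cut short.

```powershell
# Hypothetical sample lines using the status prefixes mentioned in the question
$sampleLines = @(
    '!! Some/File/Path/Doc.txt'
    ' M Some/File/Path/Doc.txt'
    '?? Some/File/Path/Doc.txt'
)

$extracted = foreach ($line in $sampleLines) {
    $viaRegex  = [regex]::Match($line, '.*\s(.+)$').Groups[1].Value
    $viaSelect = $line -split ' ' | Select-Object -Last 1
    $viaIndex  = ($line -split ' ')[-1]
    # All three approaches should return the same relative path
    if ($viaRegex -ne $viaSelect -or $viaRegex -ne $viaIndex) {
        throw "Approaches disagree for line '$line'"
    }
    $viaRegex
}
$extracted
```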

Related

How to recursively search all files in a directory and sub-directories using PowerShell?

I don't understand where the recursion occurs, or how it's used, in the tree function below (which is meant to emulate some of the output of the Linux tree command).
From the tree function, how are files (or file names and their paths) passed to, in this case, the SearchFile function?
For context, here's a REPL session demonstrating the end goal on a single file: getting the PSPath property of a file and using that property for a simple regex.
Session transcript:
posh> $dir = "/home/nicholas/Calibre Library/Microsoft Office User/549 (1476)"
posh> $files = Get-ChildItem -Path $dir -File
posh> $files.Length
3
posh> $files[0].Extension
.txt
posh> $files[0].PSPath
Microsoft.PowerShell.Core\FileSystem::/home/nicholas/Calibre Library/Microsoft Office User/549 (1476)/549 - Microsoft Office User.txt
posh> $pattern = '(?=.*?foo)(?=.*?bar)'
posh> $string = Get-Content $files[0]
posh> $string | Select-String $pattern
This file doesn't have any "foo" and "bar" matches. The goal is to search the entire Calibre library using PowerShell as above.
Large output from a tree of the Calibre library, trimmed to a single result:
    Directory: /home/nicholas/Calibre Library/Microsoft Office User/548 (1474)
Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-----          2/20/2021  3:22 AM         159883 548 - Microsoft Office User.txt
-----          2/20/2021  2:13 AM         351719 cover.jpg
-----          2/20/2021  2:39 AM           1126 metadata.opf
posh> ./worker.ps1
How is the above file and path passed to the SearchFile function?
The goal is to iterate through the entire library and search all plain-text files (assuming that plain-text files have a ".txt" extension).
library code:
function SearchFile($dir,$file)
{
$path = [string]::Concat($dir,"/",$file)
$pattern='(?=.*?foo)(?=.*?bar)'
$string = Get-Content $path
$result = $string | Select-String $pattern
$result
}
function tree($dir)
{
"$dir"
$tree = Get-ChildItem -Recurse
$tree = Get-ChildItem -Path $dir -Recurse
# get any files and invoke SearchFile here ?
$tree
}
worker code:
. /home/nicholas/powershell/functions/library.ps1
$dir = "/home/nicholas/Calibre Library"
tree $dir
The execution of the SearchFile function should be triggered when a ".txt" file is found. That logic is missing. But the larger missing piece is how to invoke SearchFile from the tree function so that every file gets searched.
How is that done, leaving aside the file type or extension? I'm not seeing where the recursion occurs.
You are really overcomplicating things. You can do this very easily by using Get-ChildItem to find your .txt files recursively under $dir and piping the resulting FileInfo objects directly to the Select-String cmdlet. Select-String accepts pipeline input, takes the PSPath from each FileInfo object passed to it, and searches that file. It does this for every FileInfo object that Get-ChildItem sends it, i.e. for every .txt file found recursively under $dir.
$dir = '/home/nicholas/Calibre Library/Microsoft Office User/549 (1476)'
Get-ChildItem -Recurse -Path $dir -Filter *.txt |
Select-String -Pattern '(?=.*?foo)(?=.*?bar)'
Get-ChildItem already does the recursion for you when you specify -Recurse. From your code's point of view it makes no difference: you get a flat list of file objects that you can process with ForEach-Object exactly as if you hadn't specified -Recurse.
Regarding your requirement that SearchFile should run whenever a ".txt" file is found:
Use the -Filter parameter to specify *.txt. Also, when you want files only, always pass -File. This lets the filesystem provider skip directories up front, which is faster and also more correct (in theory there could be a directory named e.g. foo.txt, which would make SearchFile fail).
function tree($dir)
{
    "$dir"
    Get-ChildItem -Path $dir -Recurse -File -Filter *.txt | ForEach-Object {
        # DirectoryName is the full path of the folder containing the file
        SearchFile -dir $_.DirectoryName -file $_.Name
    }
}
I don't know why your SearchFile function has separate parameters for the directory and the file name. Get-ChildItem already outputs the full path in $_.PSPath, so it doesn't make much sense to split the path apart and join it together again in SearchFile. I suggest replacing them with a single Path parameter.
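A minimal sketch of that suggestion, keeping the question's function names (SearchFile, tree) and pattern: SearchFile takes a single -Path, and Select-String reads the file itself, so each result also carries the file name and line number. This is an illustration, not the answer author's own code.

```powershell
# Sketch of the single-parameter version suggested above; SearchFile and tree are
# the question's function names, -Path replaces the separate -dir/-file parameters.
function SearchFile([string] $Path) {
    $pattern = '(?=.*?foo)(?=.*?bar)'   # lines containing both "foo" and "bar"
    # Select-String opens the file itself, so each match records the file name and line number
    Select-String -LiteralPath $Path -Pattern $pattern
}

function tree($dir) {
    "$dir"
    # -Recurse does the recursion; the pipeline just sees a flat list of .txt files
    Get-ChildItem -LiteralPath $dir -Recurse -File -Filter *.txt | ForEach-Object {
        SearchFile -Path $_.FullName
    }
}
```

Usage: tree '/home/nicholas/Calibre Library'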

Excluding lines, which are not containing one or multiple strings from text file

I have multiple server log files containing around 500,000 lines of log text in total. I only want to keep the lines that contain "Downloaded" and "Log". The lines I want to exclude cover error logs and basic system operations such as "client startup", "client restart" and so on.
An example of the lines we are looking for is this one:
[22:29:05]: Downloaded 39 /SYSTEM/SAP logs from System-4, customer (000;838) from 21:28:51,705 to 21:29:04,671
The lines that are kept should be supplemented with the date string, which is part of the log file name ($date).
Furthermore, because the logs are rather unstructured, the filtered lines should be transformed into one CSV file (columns: timestamp, log downloads, system directory, system type, customer, start time, end time, and date [added to every line from the file name]). The replace operation that turns spaces into commas is just a first attempt to bring some structure to the data. This file is meant to be loaded into a Python dashboard program.
At the moment it takes 2.5 minutes to preprocess 3 files, while the target is 5-10 seconds at most, if that is even possible.
Thank you very much for your support; I have been struggling with this since Monday last week. Maybe PowerShell is not the best tool for this? I'm open to any help!
At the moment I'm running this powershell script:
$files = Get-ChildItem "C:\Users\AnonUser\RestLogs\*" -Include *.log
New-Item C:\Users\AnonUser\RestLogs\CleanedLogs.txt -ItemType file
foreach ($f in $files){
$date = $f.BaseName.Substring(22,8)
(Get-Content $f) | Where-Object { ($_ -match 'Downloaded' -and $_ -match 'SAP')} | ForEach-Object {$_ -replace " ", ","}{$_+ ','+ $date} | Add-Content CleanedLogs.txt
}
This is about the fastest I could manage. I didn't test using -split vs -replace or special .NET methods:
$files = Get-ChildItem "C:\Users\AnonUser\RestLogs\*" -Include *.log
New-Item C:\Users\AnonUser\RestLogs\CleanedLogs.txt -ItemType file
foreach ($f in $files) {
$date = $f.BaseName.Substring(22,8)
(((Get-Content $f) -match "Downloaded.*?SAP") -replace " ",",") -replace '$',",$date" | Add-Content C:\Users\AnonUser\RestLogs\CleanedLogs.txt
}
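In general, speed is gained by avoiding per-line ForEach-Object loops and Where-Object filtering: operators such as -match and -replace, applied to the whole array returned by Get-Content, process all lines in a single expression.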
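Since the goal is a CSV with named columns, one option is to parse each matching line into an object and let Export-Csv do the quoting. This matters because the start and end times contain commas (21:28:51,705), so replacing spaces with commas shifts those values into extra columns. The sketch below also uses switch -Regex -File, which reads a file line by line without a pipeline stage per line and is often considerably faster than Get-Content | Where-Object (not benchmarked here). The function name, the column names, and the regex are assumptions derived from the single sample line in the question; lines that do not match that shape are skipped.

```powershell
# Hypothetical helper: keeps the "Downloaded ..." lines that match the sample line's
# shape and writes them, split into columns, to a single CSV file.
function Convert-SapLogsToCsv {
    param(
        [Parameter(Mandatory)] [System.IO.FileInfo[]] $LogFiles,
        [Parameter(Mandatory)] [string] $CsvPath
    )
    # Assumed line shape, based only on the one sample line in the question
    $linePattern = '^\[(?<Timestamp>[^\]]+)\]: Downloaded (?<Downloads>\d+) (?<Directory>\S+) logs from (?<System>[^,]+), customer \((?<Customer>[^)]+)\) from (?<Start>\S+) to (?<End>\S+)$'

    $rows = foreach ($f in $LogFiles) {
        # Same assumption as the original script: 8 date characters at offset 22 of the base name
        $date = $f.BaseName.Substring(22, 8)
        # switch -File reads the file line by line; -Regex tests each line against the pattern
        switch -Regex -File $f.FullName {
            $linePattern {
                [pscustomobject]@{
                    Timestamp       = $Matches.Timestamp
                    LogDownloads    = $Matches.Downloads
                    SystemDirectory = $Matches.Directory
                    System          = $Matches.System
                    Customer        = $Matches.Customer
                    StartTime       = $Matches.Start
                    EndTime         = $Matches.End
                    Date            = $date
                }
            }
        }
    }
    # Export-Csv quotes the fields, so values like 21:28:51,705 stay in one column
    $rows | Export-Csv -LiteralPath $CsvPath -NoTypeInformation
}
```

Usage: Convert-SapLogsToCsv -LogFiles (Get-ChildItem 'C:\Users\AnonUser\RestLogs\*' -Include *.log) -CsvPath 'C:\Users\AnonUser\RestLogs\CleanedLogs.csv'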

Rename multiple files with string from .txt file using PowerShell

I'm currently working on a program that takes an .xml file, reads it into an Oracle database, and afterwards exports a new .xml file. The problem is that the new file must have exactly the same name as the original file.
I saved the original file names into a .txt file, and I'm now trying to search for a keyword inside the lines so I can rename the right files with the correct names from the .txt file. Here is an example:
My 4 files (exported from the Database):
PM.Data_information.xml
PM.Data_location.xml
PM.Cover_quality.xml
PM.Cover_adress.xml
Content of Namefile.txt (original names):
PM.Data_information_provide_SE-R-SO_V0220_657400509_3_210.xml
PM.Data_location_provide_SE-R-SO_V0220_9191200509_3_209.xml
PM.Cover_quality_provide_SE-R-SO_V0220_354123509_3_211.xml
PM.Cover_adress_provide_SE-R-SO_V0220_521400509_3_212.xml
So far I have only worked out how to get a line by selecting its line number:
$content = Get-Content C:\Namefile.txt
$informationname = $content[0]
Rename-Item PM.Data_information.xml -NewName $informationname
Isn't there a way to select that line by searching for the keyword inside the string?
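$content = Get-Content C:\temp\ps\NewFile.txt
$files = Get-ChildItem c:\temp\ps\
$content |
%{
    $currentLine = $_
    # find the file whose name (without .xml) is the start of the current line
    $file = $files | Where-Object { $currentLine.StartsWith($_.Name.Replace(".xml", "")) }
    # use the full path so the rename works regardless of the current directory
    if ($file) { Rename-Item $file.FullName $currentLine }
}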
This code should do the trick. Note that all of the files that need renaming must be in one folder. Set that folder path in the $files line (currently c:\temp\ps), and set the path of your NewFile.txt in the $content line.
The code loops over each line in NewFile.txt and finds any file whose name (without the .xml extension) matches the start of the line. If some files do not follow this pattern you will need to adapt the code, but hopefully it gives you a good starting point.
Another solution:
gci -Path "c:\temp" -File -Filter "*.xml" | % { rni $_.fullname (sls "C:\temp\Namefile.txt" -Pattern ([System.IO.Path]::GetFileNameWithoutExtension($_.fullname))).Line }
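A plain StartsWith check can pick the wrong line when one exported name is a prefix of another (a hypothetical PM.Data.xml would match every PM.Data_... line). A minimal sketch that requires the base name to be followed by "_", renames only when exactly one line matches, and supports -WhatIf for a dry run is below; the function name Rename-FromNameList and its parameters are hypothetical.

```powershell
# Hypothetical helper: renames each .xml file in $Folder to the line of the name list
# that starts with the file's base name followed by "_". Supports -WhatIf.
function Rename-FromNameList {
    [CmdletBinding(SupportsShouldProcess)]
    param(
        [Parameter(Mandatory)] [string] $NameListPath,
        [Parameter(Mandatory)] [string] $Folder
    )
    $originalNames = Get-Content -LiteralPath $NameListPath
    foreach ($file in (Get-ChildItem -LiteralPath $Folder -File -Filter *.xml)) {
        # e.g. "PM.Data_information" + "_" must be a prefix of exactly one original name
        $prefix = $file.BaseName + '_'
        $candidates = @($originalNames | Where-Object { $_.StartsWith($prefix, [System.StringComparison]::Ordinal) })
        if ($candidates.Count -eq 1) {
            # -WhatIf on the function propagates to Rename-Item via $WhatIfPreference
            Rename-Item -LiteralPath $file.FullName -NewName $candidates[0]
        }
        else {
            Write-Warning "$($file.Name): $($candidates.Count) matching names found, skipped"
        }
    }
}
```

Usage: Rename-FromNameList -NameListPath C:\Namefile.txt -Folder C:\temp\export -WhatIf, then run again without -WhatIf once the listed renames look right.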

Outputting PowerShell data to a string

This is really PowerShell 101, I realise, but I'm stuck.
I'm trying to iterate through a folder tree, getting each subfolder name and a count of files. No problems there.
The new requirement is to get the ACLs on each subfolder as well. All of this data needs to be output as a CSV file, with a line consisting of each folder name, the file count, and the ACLs in a single string in one field of the CSV (I was going to delimit them with semicolons).
I am open to exporting to XML if the data can be viewed in Excel.
The part where I'm stuck is getting the ACL information into a single string for the CSV.
Get-ACL on each directory shows the data as follows (I'm doing a Select to just get the IdentityReference and FileSystemRights, which is all we're interested in):
IdentityReference FileSystemRights
----------------- ----------------
BUILTIN\Users     ReadAndExecute, Synchronize
BUILTIN\Users     AppendData
BUILTIN\Users     CreateFiles
I would like the output file formatted with one line per subdirectory, similar to
#filecount,folder,perms
51,C:\temp,BUILTIN\Users:ReadAndExecute,Synchronize;BUILTIN\Users:AppendData...
However, I can't get any kind of join working to present it this way. I don't care which combination of delimiters is used (again, it must be readable in Excel).
The script, such as it is, is as follows. The output file has its line of data appended with each directory it traverses. I'm sure this isn't very efficient, but I don't want the process consuming all the server memory either. The bits I can't figure out are prepended with ###.
(Get-ChildItem C:\temp -recurse | Where-Object {$_.PSIsContainer -eq $True}) | foreach {
$a = ($_.GetFiles().Count)
$f = $_.FullName
$p = (get-acl $_.FullName).Access | select-object identityreference,filesystemrights
### do something with $p?
Out-File -FilePath c:\outfile.csv -Append -InputObject $a`,$f`,###$p?
}
Since you want all ACEs of a folder mangled into a single line you need something like this:
Get-ChildItem 'C:\temp' -Recurse | ? { $_.PSIsContainer } | % {
# build a list of "trustee:permissions" pairs
$perms = (Get-Acl $_.FullName).Access | % {
"{0}:{1}" -f $_.IdentityReference, $_.FileSystemRights
}
New-Object -Type PSObject -Property @{
'Filecount' = $_.GetFiles().Count
'Folder' = $_.FullName
'Permissions' = $perms -join ';' # join the list to a single string
}
} | Export-Csv 'c:\outfile.csv' -NoType
Appending to a file repeatedly inside a loop usually performs poorly, so avoid it whenever possible. Here the pipeline emits one custom object per folder, and Export-Csv writes them all in a single pass.
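One caveat with New-Object -Property and a plain hashtable is that the property order, and therefore the CSV column order, is not guaranteed. A sketch of the same idea using [pscustomobject]@{...} (PowerShell 3.0+), which keeps the keys in the order written, and Get-ChildItem -Directory instead of the PSIsContainer filter is below. The function names Join-AceList and Export-FolderAclReport are hypothetical, and Get-Acl is only available on Windows.

```powershell
# Hypothetical helper: joins ACE-like objects (anything with IdentityReference and
# FileSystemRights properties, e.g. (Get-Acl $path).Access) into one string.
function Join-AceList {
    param($Access)
    if ($null -eq $Access) { return '' }
    ($Access | ForEach-Object { '{0}:{1}' -f $_.IdentityReference, $_.FileSystemRights }) -join ';'
}

# Hypothetical wrapper around the answer's pipeline (Get-Acl requires Windows).
function Export-FolderAclReport {
    param(
        [Parameter(Mandatory)] [string] $Root,
        [Parameter(Mandatory)] [string] $CsvPath
    )
    Get-ChildItem -LiteralPath $Root -Recurse -Directory | ForEach-Object {
        # [pscustomobject] keeps the properties (CSV columns) in the order written
        [pscustomobject]@{
            Filecount   = $_.GetFiles().Count
            Folder      = $_.FullName
            Permissions = Join-AceList (Get-Acl -LiteralPath $_.FullName).Access
        }
    } | Export-Csv -LiteralPath $CsvPath -NoTypeInformation
}
```

Usage: Export-FolderAclReport -Root 'C:\temp' -CsvPath 'C:\outfile.csv'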

Save as "proper" CSV / delete quotes from CSV except where a comma exists

I am downloading a CSV from a SharePoint site. It comes with a .csv file extension.
When I inspect the file's contents by opening it in Notepad, I see data that looks like this sample row:
"TITLE",OFFICE CODE,="","CUSTOMER'S NAME",ACCOUNT
I want the data look like this:
TITLE,OFFICE CODE,,"CUSTOMER'S NAME",ACCOUNT
One way to solve this problem is manually. When I open the file in Excel and save it (without altering anything), it prompts me with the following: fileOrig.csv may contain features that are not compatible with CSV (Comma delimited). Do you want to keep the workbook in this format? When I save it and then inspect it in Notepad, the data is formatted the way I want it to look.
Is there a quick way to resave the original CSV with PowerShell?
If there is no quick way to resave the file with PowerShell, I would like to use PowerShell to parse it.
These are the parsing rules I want to introduce:
Remove the enclosing double quotes from cells that do not contain a comma
Remove the = character
I tried writing a test script that looks only at the column that may contain commas. It is supposed to find the cells that do not contain a comma and remove the double quotes that enclose the text. It does not work, I think because Import-Csv already discards the double quotes.
$source = 'I:\dir\fileOrig.csv'
$dest = 'I:\dir\fileStaging.csv'
$dest2 = 'I:\dir\fileFinal.csv'
get-content $source |
select -Skip 1 |
set-content "$file-temp"
move "$file-temp" $dest -Force
$testcsv = Import-Csv $dest
foreach($test in $testcsv)
{
#Write-Host $test."CUSTOMER NAME"
if($test."CUSTOMER NAME" -NotLike "*,*") {
$test."CUSTOMER NAME" -replace '"', ''
}
}
$testcsv | Export-Csv -path $dest2 -Force
Can someone please help me either with implementing the logic above, or if you know of a better way to save the file as a proper CSV, can you please let me know?
Since Excel can handle the problem, why not automate it with a VBScript? Use Notepad to create "Fix.vbs" with the following lines:
Set objExcel = CreateObject("Excel.Application")
Set objWorkbook = objExcel.Workbooks.Open("C:\test\test.csv")
objworkbook.Application.DisplayAlerts = False
objworkbook.Save
objexcel.quit
Run it from a command prompt (for example, cscript Fix.vbs) and it should do the trick.
I see that there's already an accepted answer; I'm just offering an alternative.
If you want to keep it in PowerShell you could do this:
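$File = 'I:\dir\fileOrig.csv'
$dest = 'I:\dir\fileStaging.csv'
$Output = 'I:\dir\fileFinal.csv'
$CSV = Import-Csv $File
$Members = $CSV | Get-Member -MemberType Properties | Select-Object -ExpandProperty Name
# Blank out any cell that contains no word characters (such as ="") and pass the row on
$CSV | ForEach-Object { $row = $_; $Members | ForEach-Object { if ($row.$_ -notmatch '\w') { $row.$_ = $null } }; $row } | Export-Csv $dest -NoTypeInformation -Force
# Re-read the exported file as text, strip the quotes around each comma-separated piece, and re-join
Get-Content $dest | ForEach-Object { ($_.Split(',') -replace '^"(.*)"$', '$1') -join ',' } | Out-File $Output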
That imports the CSV, blanks out every cell that contains no word characters (letters, digits, or underscores; don't ask me why underscores count as word characters, regex demands that it be so!), exports the CSV, and then reads the exported file again as plain text: it splits each line at the commas, strips the double quotes from any piece enclosed in them, re-joins the line, and writes it to the output file. The only difference from the preferred output in the question is that you get CUSTOMER'S NAME instead of "CUSTOMER'S NAME".
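If PowerShell 7.0 or later is available, Export-Csv has a -UseQuotes parameter whose AsNeeded value quotes only the fields that need it (Windows PowerShell 5.1 always quotes every field). Combined with a text-level removal of the = in front of quoted fields, that covers both parsing rules from the question. A minimal sketch follows; the function name is hypothetical, and it assumes the first line of the source file is the header row. Like the answer above, it drops the quotes around CUSTOMER'S NAME because that value contains no comma.

```powershell
# Requires PowerShell 7.0 or later (-UseQuotes does not exist in Windows PowerShell 5.1).
# Hypothetical helper name; assumes the first line of $Source is the header row.
function Convert-CsvToMinimalQuotes {
    param(
        [Parameter(Mandatory)] [string] $Source,
        [Parameter(Mandatory)] [string] $Destination
    )
    # Drop an "=" that directly precedes a quoted field, so ="" becomes ""
    $lines = (Get-Content -LiteralPath $Source) -replace '(^|,)="', '$1"'
    # Parse the text as CSV, then re-export it, quoting only fields that need it (e.g. ones containing a comma)
    $lines | ConvertFrom-Csv | Export-Csv -LiteralPath $Destination -UseQuotes AsNeeded -NoTypeInformation
}
```

Usage: Convert-CsvToMinimalQuotes -Source 'I:\dir\fileOrig.csv' -Destination 'I:\dir\fileFinal.csv'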
