Powershell: Splitting String into CSV

I have a (hopefully) quick question that I can't seem to work out on my own.
I have a string:
ITEM1 Quantity: 12x 355ml bottlePrice: $23.95 $23.50 SAVE $0.45
That I would like to split out and insert into a CSV file. The columns in the generated CSV, and their values, would be:
Name: (everything before "Quantity:" in the above string.)
Quantity: (between Quantity and x)
Size: (Everything between x and bottle)
OrigPrice: (Everything between Price: and the second $ sign)
SalePrice: (Everything between OrigPrice and SAVE)
Savings: (Everything after "SAVE")
I hope this all makes sense, I can provide more info if needed.
I appreciate the help!

How about something like:
$subject = 'ITEM1 Quantity: 12x 355ml bottlePrice: $23.95 $23.50 SAVE $0.45'
if ($subject -cmatch '^(?<Name>.*)Quantity:(?<Quantity>.*)x(?<Size>.*)bottle\s*Price:(?<OrigPrice>.*)\s*(?<SalePrice>\$.*)SAVE(?<Savings>.*)$') {
$result = $matches[0]
} else {
$result = ''
}
"Matches:"
$matches
I couldn't tell if there really needed to be a space between bottle & Price (it didn't look like it, but it'll handle it if there is).
If you need the name, you can access it like:
$matches["Name"]
A better solution (and one that actually gets it to CSV format) would be something like the following (thanks to @nickptrvc for pointing out what I missed):
function Read-Data {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)][string]$Path
    )
    $reader = [System.IO.File]::OpenText($Path)
    try {
        # Read the file line by line; ReadLine() returns $null at end of file
        while ($null -ne ($subject = $reader.ReadLine())) {
            if ($subject -cmatch '^(?<Name>.*)Quantity:(?<Quantity>.*)x(?<Size>.*)bottle\s*Price:(?<OrigPrice>.*)\s*(?<SalePrice>\$.*)SAVE(?<Savings>.*)$') {
                # Emit one object per matching line, with surrounding whitespace trimmed
                [PSCustomObject]@{
                    Name      = $matches["Name"].Trim()
                    Quantity  = $matches["Quantity"].Trim()
                    Size      = $matches["Size"].Trim()
                    OrigPrice = $matches["OrigPrice"].Trim()
                    SalePrice = $matches["SalePrice"].Trim()
                    Savings   = $matches["Savings"].Trim()
                }
            }
        }
    }
    finally {
        # Release the file handle even if something above throws
        $reader.Dispose()
    }
}
Then, to use it, save this to a file (I called mine Read-Data.ps1), dot-source the file, and then you have two options: you can use ConvertTo-Csv to convert the objects to CSV and return the result to the screen, or you can use Export-Csv to save it to a file:
. C:\Test\Read-Data.ps1
Read-Data -Path C:\Test\datafile.dat | ConvertTo-Csv -NoTypeInformation
or
Read-Data -Path C:\Test\datafile.dat | Export-Csv -NoTypeInformation -Path C:\Test\datafile.csv
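To try the pattern without a data file, the same idea can be sketched against the sample string from the question. This is only a minimal illustration: the variable names $pattern, $row and $csv are made up for the example, and it assumes every input line follows exactly the layout shown in the question.

```powershell
# Sample line from the question; single quotes keep the $ signs literal.
$subject = 'ITEM1 Quantity: 12x 355ml bottlePrice: $23.95 $23.50 SAVE $0.45'

# Same regex as above, stored in a variable for readability.
$pattern = '^(?<Name>.*)Quantity:(?<Quantity>.*)x(?<Size>.*)bottle\s*Price:(?<OrigPrice>.*)\s*(?<SalePrice>\$.*)SAVE(?<Savings>.*)$'

if ($subject -cmatch $pattern) {
    # Each named group becomes one CSV column.
    $row = [PSCustomObject]@{
        Name      = $Matches['Name'].Trim()
        Quantity  = $Matches['Quantity'].Trim()
        Size      = $Matches['Size'].Trim()
        OrigPrice = $Matches['OrigPrice'].Trim()
        SalePrice = $Matches['SalePrice'].Trim()
        Savings   = $Matches['Savings'].Trim()
    }
    # ConvertTo-Csv returns the header line followed by one line per object.
    $csv = $row | ConvertTo-Csv -NoTypeInformation
    $csv
}
# "Name","Quantity","Size","OrigPrice","SalePrice","Savings"
# "ITEM1","12","355ml","$23.95","$23.50","$0.45"
```

Because the groups use greedy .* matching, a product name that itself contains "Quantity:" would shift the split, so it is worth checking a few real lines before relying on it.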

Related

How to modify excel data and export to text file using PowerShell script?

First time poster here. Apologies if I am not following best practices for posting this question.
I am very new to scripting and PowerShell.
Problem:
I have data in an excel sheet in this format.
Excel Data Image Link
I want to modify and export this data into a text file. In this format.
Required Output Image Link
So far I have tried to modify the Excel data by accessing each cell, using code similar to the following.
for (($i = 1); $i -lt 4; $i++)
{
$column=$ExcelWorkSheet.Columns.Item(1).Rows.Item($i).Text
$dataType=$ExcelWorkSheet.Columns.Item(2).Rows.Item($i).Text
$c1=("`"" + "$column" + "`""+":")
$c2=("`"" + "$dataType" + "`"" + ",")
$ExcelWorkSheet.Columns.Item(1).Rows.Item($i).Value=$c1
$ExcelWorkSheet.Columns.Item(2).Rows.Item($i).Value=$c2
}
I am still not sure if this is the correct way to go.
What would be the best way to solve this?
Just want to understand what I should do to solve this problem. I am not looking for the exact code.
Step by step instructions or some resources would be helpful.
Thanks!

This might help... maybe...
# Import Stuff
$Data = Import-Csv -Path .\Desktop\data.csv
# New Array
$Output = @()
# Run through Unique Owners
foreach ($Owner in ($Data | Select-Object OWNER -Unique)) {
$Lines = $Data | Where-Object {$_.OWNER -eq $Owner.OWNER}
# Lazy way to do a bit of checking, if same then use it or Break
if ($Lines[0].TABLE_NAME -eq $Lines[1].TABLE_NAME) {
$Out_TableName = $Lines[0].TABLE_NAME
# ID and NAME data
$Out_ID = $Lines | Where-Object {$_.COLUMN_NAME -eq "ID"} | Select-Object COLUMN_NAME, DATA_TYPE, DATA_LENGTH
$Out_NAME = $Lines | Where-Object {$_.COLUMN_NAME -eq "NAME"} | Select-Object COLUMN_NAME, DATA_TYPE, DATA_LENGTH
} else {
# Show the user that something is wrong with this owner's data
Write-Host "Problem with Owner ""$($Owner.OWNER)"" Data?!" -ForegroundColor Red
Break
}
# Output into the array in format
$Output += @"
"$($Owner.OWNER).$($Out_TableName)":{
"$($Out_ID.COLUMN_NAME)": "$($Out_ID.DATA_TYPE) ($($Out_ID.DATA_LENGTH))",
"$($Out_NAME.COLUMN_NAME)": "$($Out_NAME.DATA_TYPE) ($($Out_NAME.DATA_LENGTH))"
}
"@
}
# Put Output in a text file
$Output | Set-Content .\Desktop\output.txt -Force
I should add, that I had your data in a CSV like this...
OWNER,TABLE_NAME,COLUMN_NAME,DATA_TYPE,DATA_LENGTH
A,Employee,ID,NUMBER,22
A,Employee,NAME,VARCHAR2,22
B,Department,ID,NUMBER,23
B,Department,NAME,VARCHAR2,24

Powershell: Replace string in File1 based on string in File2

I am being forced to use PowerShell because of my work. I have used it to do a couple of things, but one of my scripts is now trash because I have to update a string in a file to include a year that is in a second file. Here is what I'm working with:
File1: contains a few strings, among them 48 strings that look like:
Jenga_Sequence-XXXX.consensus_Bob_0.6_quality_20
The main point of the string is Sequence-XXXX, sorry for the random place holders.
File2: is a table that has the strings:
John/USA/Sequence-XXXX/Year
I need to replace the strings in File1 with the corresponding Strings in File2.
Sample Text of File1:
Jenga_Sequence-0001.consensus_Bob_0.6_quality_20
AAAAAAAAAAAAAAAAAAAAAAAAA
Jenga_Sequence-0002.consensus_Bob_0.6_quality_20
aaaaaaaaaaaaaaaaaaaaaaaaa
Jenga_Sequence-0003.consensus_Bob_0.6_quality_20
bbbbbbbbbbbbbbbbbbbbbbbbb
Jenga_Sequence-0004.consensus_Bob_0.6_quality_20
BBBBBBBBBBBBBBBBBBBBBBBBB
Jenga_Sequence-0005.consensus_Bob_0.6_quality_20
QQQQQQQQQQQQQQQQQQQQQ
Sample Table of File2:
|Sequence_ID|Date|
|---------------------------|----------|
|John/USA/Sequence-0003/2020|10/11/2020|
|John/USA/Sequence-0001/2021|1/5/2021|
|John/USA/Sequence-0005/2021|1/10/2021|
|John/USA/Sequence-0004/2020|12/23/2020|
|John/USA/Sequence-0002/2021|1/6/2021|
So, I need a Powershell code that replaces
Jenga_Sequence-0001.consensus_Bob_0.6_quality_20 with John/USA/Sequence-0001/2021,
Jenga_Sequence-0002.consensus_Bob_0.6_quality_20 with John/USA/Sequence-0002/2021,
Jenga_Sequence-0003.consensus_Bob_0.6_quality_20 with John/USA/Sequence-0003/2020, and so on. There are typically 48 of these in a file.
My previous code simply replaced "Jenga_" with "John/USA/" and ".consensus_Bob_0.6_quality_20" with "/2020", but now that we are seeing "/2021" the static code will not work.
I am still open to replacing pieces of the string and having a code that sets the year replacement to the correct year.
That was the angle I was doing a broad search on but I could never find anything specific enough to help.
Any help will be appreciated!
EDIT: Here is the part of my previous code that dealt with the finding and replacing, even though I feel it needs to be trashed:
$filePath = 'Jenga_Combined.txt'
$tempFilePath = "$env:TEMP\$($filePath | Split-Path -Leaf)"
$find = 'Jenga_'
$replace = 'John/USA/'
$find2 = '.consensus_Bob_0.6_quality_20'
$replace2 = '/2020'
(Get-Content -Path $filePath) -replace $find, $replace -replace $find2, $replace2 | Add-Content -Path $tempFilePath
Remove-Item -Path $filePath
Move-Item -Path $tempFilePath -Destination $filePath
EDIT2: The "Real Data" from file2. File2 is a Tab Delimited .txt file which makes it not "look great" when copy and pasting. Hopefully this helps. File1 is exactly like above (although the AAAAA stuff is roughly 30,000 letters long)
Sequence_ID date
John/USA/Sequence-0003/2020 2020-10-11
John/USA/Sequence-0001/2021 2021-01-05
John/USA/Sequence-0005/2021 2021-01-10
John/USA/Sequence-0004/2020 2020-12-23
John/USA/Sequence-0002/2021 2021-01-06
Dan

The common factor here is the Sequence_ID number in both files.
You can do this like:
$csvData = Import-Csv -Path 'D:\Test\File2.txt' -Delimiter "`t"
$result = switch -Regex -File 'D:\Test\Jenga_Combined.txt' {
'^Jenga_Sequence-(\d+).*' {
$replace = $csvData | Where-Object { $_.Sequence_ID -like "*Sequence-$($matches[1])*" }
if (!$replace) { Write-Warning "No corresponding Sequence_ID $($matches[1]) found!"; $_ }
else { $replace.Sequence_ID }
}
default { $_ }
}
# output on screen
$result
# output to new file
$result | Set-Content -Path 'D:\Test\Jenga_Combined_NEW.txt' -Force
Output on screen:
John/USA/Sequence-0001/2021
AAAAAAAAAAAAAAAAAAAAAAAAA
John/USA/Sequence-0002/2021
aaaaaaaaaaaaaaaaaaaaaaaaa
John/USA/Sequence-0003/2020
bbbbbbbbbbbbbbbbbbbbbbbbb
John/USA/Sequence-0004/2020
BBBBBBBBBBBBBBBBBBBBBBBBB
John/USA/Sequence-0005/2021
QQQQQQQQQQQQQQQQQQQQQ
Of course, you need to change the file paths to match your environment.
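Since the sequence number is the shared key, one possible variation is to build a hashtable from File2 once and look each number up directly, instead of filtering the whole table with Where-Object for every matching line. The following is only a sketch: it uses small inline arrays ($file1Lines, $file2Lines) as stand-ins for the two files, and the variable names are assumptions made for the example.

```powershell
# Inline stand-ins for the two files (the real script would read them from disk).
$file2Lines = @(
    "Sequence_ID`tdate"
    "John/USA/Sequence-0003/2020`t2020-10-11"
    "John/USA/Sequence-0001/2021`t2021-01-05"
)
$file1Lines = @(
    'Jenga_Sequence-0001.consensus_Bob_0.6_quality_20'
    'AAAAAAAAAAAAAAAAAAAAAAAAA'
    'Jenga_Sequence-0003.consensus_Bob_0.6_quality_20'
    'bbbbbbbbbbbbbbbbbbbbbbbbb'
)

# Build the lookup once: sequence number -> full Sequence_ID.
$lookup = @{}
foreach ($row in ($file2Lines | ConvertFrom-Csv -Delimiter "`t")) {
    if ($row.Sequence_ID -match 'Sequence-(\d+)') {
        $lookup[$Matches[1]] = $row.Sequence_ID
    }
}

# Replace header lines that have a known sequence number; pass everything else through.
$result = foreach ($line in $file1Lines) {
    if ($line -match '^Jenga_Sequence-(\d+)' -and $lookup.ContainsKey($Matches[1])) {
        $lookup[$Matches[1]]
    }
    else {
        $line
    }
}
$result
```

With 48 headers per file the difference is unlikely to matter, but the lookup keeps the cost per line constant if File2 grows.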

Powershell: Extract several strings from txt and create table out of it

I need to create a csv file out of values that are spread over many txt files. Here is an example of one of the txt files (they are all formatted the same way and stored in one folder, let's say c:\user\txtfiles):
System: asdf
Store: def
processid: 00001
Language: english
prodid: yellowshoes12
email: asdf@asdf.com
prodid: blueshoes34
some
other
text blabla
The resulting csv should look like this (I added values from another sample txt just to make it clear):
processid, prodid
00001, yellowshoes12
00001, blueshoes34
00002, redtshirt12
00002, greensocks34
That means that every product ID in the txt should be assigned to the one processid in the txt and added as single line to the csv.
I tried to reach the result as follows:
$pathtofiles = Get-ChildItem c:\user\txtfiles | select -ExpandProperty FullName
$parsetxt = $pathtofiles |
ForEach {
$orderdata = Import-Csv $_ |
Where-Object {($_ -like '*processid*') -or ($_ -like '*prodid*')} |
foreach {
write-output $orderdata -replace 'processid: ','' -replace 'prodid: ',''
}
}
$orderdata
So my intention was to isolate the relevant lines, delete everything that is not wanted, assign the values to variables and build a table out of it. One problem is that if I replace $orderdata from the end of the code into the end of the first foreach-loop nothing is printed. But after deliberating quite a while I am not sure if my approach is a good one anyway. So any help would be very appreciated!
Daniel

I think this is best done using a switch -Regex -File construct while iterating over the files in your folder.
# get the files in the folder and loop over them
$result = Get-ChildItem -Path 'c:\user\txtfiles' -Filter '*.txt' -File | ForEach-Object {
# the switch processes each line of a file and matches the regex to it
switch -Regex -File $_.FullName {
'^processid:\s+(\d+)' { $id = $matches[1] }
'^prodid:\s+(\w+)' { [PsCustomObject]@{'processid' = $id; 'prodid' = $matches[1]}}
}
} | Sort-Object processid, prodid
# output on console screen
$result
# output to CSV file
$result | Export-Csv -Path 'c:\user\txtfiles\allids.csv' -NoTypeInformation
Result on screen:
processid prodid
--------- ------
00001 blueshoes34
00001 yellowshoes12
00002 greenshoes56
00002 purpleshoes88
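The way the switch carries the process id forward can be seen in isolation by running it over an in-memory array instead of a file. This is a minimal sketch using the sample text from the question; $lines, $rows and $csv are names invented for the example.

```powershell
# Lines of one sample txt file, taken from the question.
$lines = @(
    'System: asdf'
    'Store: def'
    'processid: 00001'
    'Language: english'
    'prodid: yellowshoes12'
    'email: asdf@asdf.com'
    'prodid: blueshoes34'
    'some'
)

# switch -Regex tests every array element against each pattern in turn.
$rows = switch -Regex ($lines) {
    # Remember the most recent process id.
    '^processid:\s+(\d+)' { $id = $Matches[1] }
    # Emit one object per product id, paired with the remembered process id.
    '^prodid:\s+(\w+)'    { [PSCustomObject]@{ processid = $id; prodid = $Matches[1] } }
}

$csv = $rows | ConvertTo-Csv -NoTypeInformation
$csv
# "processid","prodid"
# "00001","yellowshoes12"
# "00001","blueshoes34"
```

This relies on the processid line appearing before the prodid lines in each file, as it does in the sample.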

Powershell comparing data in a CSV against files in a folder

I'm fairly new to powershell.
I'm trying to compare data in a CSV File against random files in a specific folder.
I want to see if and what has changed and then log that in another column called "Changed".
Here's what I've done below, it seems to create a new column called 'Changed' but doesn't input the changes in it.
$Spreadsheet = 'C:\Powershell\CSV\inv.csv'
$SpreadSheetPath = "C:\Powershell\CSV"
Import-Csv $Spreadsheet -Delimiter "|" -Encoding Default | ForEach-Object -
{
$Path += $_.Path
$Filename += $_.Filename
$DateModified += $_.DateModified
$FileSize += $_.FileSize
$MD5Hash += $_.MD5Hash
}
{
$Msg1 = "Path changed"
$Msg2 = "File Name changed"
$Msg3 = "Date Modified changed"
$Msg4 = "File Size changed"
$Msg5 = "MD5 changed"
$Msg6 = "Files are the same"
$psdata = "D:\ps-test\data\*.*"
}
If (($Path -eq $psdata))
{
Import-Csv C:\Powershell\CSV\inv.csv |
Select-Object *,@{Name='Changed';Expression={$Msg6}} |
Export-Csv C:\Powershell\CSV\NewSpreadsheet4.csv
}
Else
{
Import-Csv C:\Powershell\CSV\inv.csv |
Select-Object *,@{Name='Changed';Expression={$Msg1}} |
Export-Csv C:\Powershell\CSV\NewSpreadsheet4.csv
}
Here is an example of what the CSV looks like:
Path Filename Date Modified File Size MD5 Hash
D:\ps-test\data adminmodeinfo.htm 03/11/2010 22:42 1079 BD1C9468D71FD33BB35716630C4EC6AC
E:\ps-test\data admintoolinfo.htm 03/11/2010 22:42 868 24B99B6316F0C49C23F27FEA6FF1C6AC
E:\ps-test\data admin_ban.bmp 03/11/2010 22:42 63480 C856F1F3C58962B456E749F2EA9C933A
E:\ps-test\data baseline.dat 03/20/2010 03:18:33 173818 F13183D88AABD1A725437802F8551A06
E:\ps-test\data blueRule.gif 03/11/2010 22:42 815 D1AEFE884935095DAB42DAFD072AA46F
E:\ps-test\data deffactory.dat 03/20/2010 03:18:33 706 862D4DFD2F49021BB7C145BDAFE62F6F
E:\ps-test\data dividerArt.jpg 03/11/2010 22:42 367 F7050C596C097C0B01A443058CD15E35

There are many issues with your code. I will try to highlight a few of them, link to documentation, and point you in the right direction so that you can resolve them yourself. A proper solution would require many more requirements, or writing the code for you (off-topic for StackOverflow).
Change
| ForEach-Object -
{
to
| ForEach-Object {
In the Foreach-Object, you are concatenating values from each line because you are using +=.
On the first run, $Path contains D:\ps-test\data.
After the second run, it contains D:\ps-test\dataE:\ps-test\data.
At the end of your test data, it contains D:\ps-test\dataE:\ps-test\dataE:\ps-test\dataE:\ps-test\dataE:\ps-test\dataE:\ps-test\dataE:\ps-test\data
The messages are contained in a script block, but it does not look like this is intentional as this is never executed. So after the scriptblock, the variable $Msg1 has not been created; it's blank.
If (($Path -eq $psdata))
The double brackets are not required.
The condition will always be false because the variable $psdata does not exist: it is assigned inside a script block that never runs.
It will also always be false because -eq compares the strings literally; your input never contains the literal text "D:\ps-test\data\*.*". You probably want -like instead of -eq.
Even if the paths did compare as equal, the result would be inaccurate because there is no check that the file actually exists on the system.
Useful links
Test-Path to check if file exists.
Get-FileHash to get MD5 hash and compare to file.
Get-ChildItem to get a list of directories/files in a directory.
Write-Output so that you can print variables and make sure they contain what you expect.
about_comparison_operators - -in and -contains will help you.
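As a rough illustration of how Test-Path and Get-FileHash could fit together for a single CSV row, here is a minimal sketch. The function name Get-ChangeStatus and the demo file are hypothetical, and the row object only mimics the Path, Filename and MD5Hash columns; it is not a complete solution to the question.

```powershell
function Get-ChangeStatus {
    param(
        [Parameter(Mandatory)] $Row   # expects Path, Filename and MD5Hash properties
    )
    $fullName = Join-Path $Row.Path $Row.Filename
    # Check existence first; Get-FileHash would fail on a missing file.
    if (-not (Test-Path -LiteralPath $fullName -PathType Leaf)) {
        return 'File does not exist'
    }
    $actual = (Get-FileHash -LiteralPath $fullName -Algorithm MD5).Hash
    if ($actual -ne $Row.MD5Hash) {
        return 'MD5 changed'
    }
    'Files are the same'
}

# Demo with a throwaway file in the temp folder (file name is made up).
$demoFile = Join-Path ([System.IO.Path]::GetTempPath()) 'inv-demo.txt'
Set-Content -LiteralPath $demoFile -Value 'first version'
$row = [PSCustomObject]@{
    Path     = Split-Path -Path $demoFile -Parent
    Filename = Split-Path -Path $demoFile -Leaf
    MD5Hash  = (Get-FileHash -LiteralPath $demoFile -Algorithm MD5).Hash
}

$before = Get-ChangeStatus -Row $row      # hash recorded from the current content
Set-Content -LiteralPath $demoFile -Value 'second version'
$after = Get-ChangeStatus -Row $row       # content changed since the hash was recorded
Remove-Item -LiteralPath $demoFile
$gone = Get-ChangeStatus -Row $row        # file no longer exists
$before, $after, $gone
```

The same function could be called from a calculated property in Select-Object to fill the "Changed" column for every imported row.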

This is a suggestion to help you get started. It's not complete and not tested! Let me know if it works as expected and if you have any questions.
Import-Csv 'C:\Powershell\CSV\inv.csv' -Delimiter "|" -Encoding Default | foreach {
    # Assign with = (not +=) so each row's values replace the previous row's
    $Path = $_.Path
    $Filename = $_.Filename
    $DateModified = $_.DateModified
    $FileSize = $_.FileSize
    $MD5Hash = $_.MD5Hash
    $file = [System.IO.FileInfo](Join-Path $Path $Filename)
    # Reset the message for every row so a stale value is never reused
    $message = "Files are the same"
    if (-not $file.Exists) {
        $message = "File does not exist"
    }
    elseif ($file.LastWriteTime -ne [DateTime]$DateModified) {
        $message = "Dates differ"
    }
    elseif ($file.Length -ne [int]$FileSize) {
        $message = "Sizes differ"
    }
    # and so on...
    # (You cannot really compare a changed file name btw)
    New-Object -Type PSObject -Prop @{
        Path = $Path
        Filename = $Filename
        DateModified = $DateModified
        FileSize = $FileSize
        MD5Hash = $MD5Hash
        Message = $message
    }
} | Export-CSV 'C:\Powershell\CSV\NewSpreadsheet4.csv'

Optimizing simple search script in PowerShell

I need to create a script to search through just below a million files of text, code, etc. to find matches and then output all hits on a particular string pattern to a CSV file.
So far I made this;
$location = 'C:\Work*'
$arr = "foo", "bar" #Where "foo" and "bar" are string patterns I want to search for (separately)
for($i=0;$i -lt $arr.length; $i++) {
Get-ChildItem $location -recurse | select-string -pattern $($arr[$i]) | select-object Path | Export-Csv "C:\Work\Results\$($arr[$i]).txt"
}
This returns to me a CSV file named "foo.txt" with a list of all files with the word "foo" in it, and a file named "bar.txt" with a list of all files containing the word "bar".
Is there any way anyone can think of to optimize this script to make it work faster? Or ideas on how to make an entirely different, but equivalent script that just works faster?
All input appreciated!

If your files are not huge and can be read into memory, then this version should work considerably faster (and my quick and dirty local test seems to confirm that):
$location = 'C:\ROM'
$arr = "Roman", "Kuzmin"
# remove output files
foreach($test in $arr) {
Remove-Item ".\$test.txt" -ErrorAction 0 -Confirm
}
Get-ChildItem $location -Recurse | .{process{ if (!$_.PSIsContainer) {
# read all text once
$content = [System.IO.File]::ReadAllText($_.FullName)
# test patterns and output paths once
foreach($test in $arr) {
if ($content -match $test) {
$_.FullName >> ".\$test.txt"
}
}
}}}
Notes: 1) note that the paths and patterns in the example differ from yours; 2) output files are not CSV but plain text; there is not much point in CSV if you are only interested in paths, since plain text files with one path per line will do.

Let's suppose that 1) the files are not too big and you can load them into memory, and 2) you really just want the path of each matching file (not the line etc.).
I tried to read each file only once and then iterate through the regexes. There is some gain (it is faster than the original solution), but the final result will depend on other factors like file sizes, number of files etc.
Removing 'ignorecase' also makes it a little faster.
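$res = @{}
$arr | % { $res[$_] = @() }
Get-ChildItem $location -recurse |
    ? { !$_.PsIsContainer } |
    % { $file = $_
        $text = [Io.File]::ReadAllText($file.FullName)
        $arr |
            % { $regex = $_
                if ([Regex]::IsMatch($text, $regex, 'ignorecase')) {
                    # Append the path; plain assignment would keep only the last match
                    $res[$regex] += $file.FullName
                }
            }
    }
$res.GetEnumerator() | % {
    $key = $_.Key
    # Wrap each path in an object so Export-Csv writes a Path column, not the string's Length
    $_.Value | Select-Object @{Name='Path';Expression={$_}} | Export-Csv "d:\temp\so-res$key.txt"
}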
