Read csv stream line by line to create an array for Excel Range - excel

This is my first post - I will be happy to make any corrections required for any mistakes made in the post.
I have been looking through the forums here for a few months and have learned a lot but I cannot seem to accomplish my goal with what I have found.
I need to read a read-only CSV file whenever it changes and place the resulting array into an active, open Excel 2016 tab. I can do this using COM and System.IO.WatcherChangeTypes, but that is too slow and requires copy and paste.
I need to read the CSV as fast as possible (under a second) and convert the lines into a usable array for Excel. The whole process has to take under 2 seconds at most. Some of the CSVs will exceed 180,000 lines as the day goes on.
I work for a Trading Company.
I would be happy with a single column, tab-delimited, with multiple rows, but I can't get the multiple rows.
I have to write the range line by line, and that takes too long.
I was looking at this one but I am not clear on how to make the whole thing dynamic. There is no set amount of headers and the rows will change as well. I cannot work with any static data at all.
This is the post which prompted me to ask for help: How to use powershell to reorder CSV columns
$export = "\\UNC\to\file\Name.csv"
#$excel = New-Object -ComObject Excel.Application
#$excel.visible = $true
#$workbook = $excel.Workbooks.Add()
$reader = [System.IO.File]::OpenText($export)
$writer = New-Object System.IO.StreamWriter "data2.csv"
for(;;) {
    $line = $reader.ReadLine()
    if ($null -eq $line) {
        break
    }
    $i=1
    $data = $line.Split(",") | %{
        if($_ -ne $null)
        {
            Write-Host $_ $i
            ++$i
        }
    }
    [void]$data.Length
    # $data.GetValue()
    #$writer.WriteLine('{0},{1},{2}', $data[0], $data[1], $data[2])
}
$reader.Close()
#$writer.Close()
Any help will be greatly appreciated!
UPDATE:
I figured it out. The result is probably not the most efficient, but it gets me what I need for now while I explore how to better accomplish it with what I have learned.
# Pass 1: count the lines so the 2-D array can be sized up front.
(Measure-Command {
    $data = [System.IO.File]::Open($export, 'Open', 'Read', 'ReadWrite')
    $reader = New-Object System.IO.StreamReader($data)
    $count = 0
    # Compare against $null explicitly so a blank line does not end the loop early.
    while ($null -ne $reader.ReadLine())
    {
        ++$count
    }
    $reader.Close()
    $data.Close()
}).TotalSeconds
$array2 = New-Object 'object[,]' $count,1
$end = ++$count
$file = New-Object System.IO.StreamReader -ArgumentList $export
$stringBuilder = New-Object System.Text.StringBuilder
$list = New-Object System.Collections.Generic.List[System.String]
$a = 0
# Pass 2: convert commas to tabs and store each line in column 0 of the array.
Measure-Command {
    while ($null -ne ($i = $file.ReadLine()))
    {
        $i = $i -replace ",","`t"
        $null = $stringBuilder.Append($i)
        $list.Add($i)
        $array2[$a,0] = $i
        ++$a
    }
    $file.Close()
    $outputString = $stringBuilder.ToString()
    $array = $list.ToArray()
}
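As a sketch of how the lines could become a multi-column block instead of a single column, the hypothetical helper below (ConvertTo-Grid is not from the post) splits each line on commas and fills an object[,] array. It assumes a naive split, so quoted fields containing commas are not handled, and it pads short rows with empty cells.

```powershell
# Hypothetical helper: turn CSV lines into a 2-D array sized to the widest row.
function ConvertTo-Grid {
    param([string[]]$Lines)

    # Split every line once and remember the widest row.
    $split = New-Object System.Collections.Generic.List[object]
    $cols = 0
    foreach ($line in $Lines) {
        $fields = $line.Split(',')   # naive split: no support for quoted commas
        if ($fields.Length -gt $cols) { $cols = $fields.Length }
        $split.Add($fields)
    }

    $rowCount = $split.Count
    $grid = New-Object 'object[,]' $rowCount, $cols
    for ($r = 0; $r -lt $rowCount; $r++) {
        $fields = $split[$r]
        for ($c = 0; $c -lt $fields.Length; $c++) {
            $grid[$r, $c] = $fields[$c]
        }
    }

    # The leading comma stops PowerShell from unrolling the array on output.
    return ,$grid
}

$grid = ConvertTo-Grid -Lines @('a,b,c', '1,2')
"{0} rows x {1} columns" -f $grid.GetLength(0), $grid.GetLength(1)
```

With a live worksheet, a 2-D array like this can usually be assigned to a range of the same size through its Value2 property in a single COM call, which avoids writing the range line by line.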

You can do something like this:
import numpy as np
import pandas as pd
data = pd.read_csv("data1.csv", sep=r'\s+', header=None)
dataarraynew13phse = np.array(data)
dataarraynew13phse = dataarraynew13phse.flatten()
sep=r'\s+' splits fields on any run of whitespace, which is useful for decoding tab-delimited data across multiple lines.
flatten() then collapses the result into a single one-dimensional array.

Related

Powershell - Delete excel rows that contain a word

I'm really new to Powershell and I feel like I've looked all over and can't quite figure out what is wrong with my code.
My goal is a powershell script that can run against an Excel workbook and delete rows with a specific string in the cell (in this case it is local admin accounts).
Currently my script launches Excel and opens the sheet, but no rows are deleted. The code exits without error. Any help would be greatly appreciated.
$ObjExcelCellTypeLastCell = 11
$ObjExcel = New-Object -ComObject Excel.Application
$ObjExcel.Visible = $True
$ObjExcel.DisplayAlerts = $True
$Workbook = $ObjExcel.Workbooks.Open("File\Path\")
$Worksheet = $Workbook.Worksheets.Item(1)
$used = $Worksheet.usedRange
$lastCell = $used.SpecialCells($ObjExcelCellTypeLastCell)
$row = $lastCell.row
for ($i = $Worksheet.usedrange.rows.count; $i -gt 0; $i--)
{
    If ($Worksheet.Cells.Item($i, 1) = "Local Admin") {
        $Range = $Worksheet.Cells.Item($i, 1).EntireRow
        $Range.Delete()
        $i = $i + 1
    Else
        Break
    }
    Exit
}
I don't know much about PowerShell, but I think your if statement $Worksheet.Cells.Item($i, 1) = "Local Admin" is wrong; you should use -eq.
Also, you may need to call the Close method on the workbook object that you opened.
I am not sure if this is solved, but my code looks like the following. It is not exactly the same as what I use, but I think it would work.
#get last row
$rowLast = $WorkSheet.UsedRange.Rows.Count
#for loop
for ($row = $rowLast; $row -gt 0; $row--) {
    if($WorkSheet.Cells.Item($row, 1).Text -eq "Local Admin"){
        #delete the row. Without "[void]", "True" is printed each time a row is deleted successfully.
        [void]$WorkSheet.Rows($row).Delete()
    }
}
I think you need ".Text" after "$Worksheet.Cells.Item($i, 1)".
Also, I think the following lines should be removed.
$i = $i + 1
Else
Break
Exit
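The bottom-up loop matters because deleting an item shifts everything after it. As a small illustration that needs no Excel, the sketch below uses an in-memory list as a stand-in for column A (the sample values are made up) and removes matches while walking from the last index to the first.

```powershell
# Stand-in for column A of a worksheet; the values are made up for illustration.
$rows = New-Object System.Collections.Generic.List[string]
foreach ($v in 'Header', 'Local Admin', 'Local Admin', 'jsmith', 'Local Admin') {
    $rows.Add($v)
}

# Walk from the last index to the first so removals never shift unvisited items.
for ($i = $rows.Count - 1; $i -ge 0; $i--) {
    if ($rows[$i] -eq 'Local Admin') {
        $rows.RemoveAt($i)
    }
}

$rows -join ', '   # Header, jsmith
```

Because the index only ever moves toward rows that have not been touched, there is no need to adjust $i after a delete, which is why the $i = $i + 1 line can go.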

Issues pulling value of cell using excel com objects in powershell

I am writing a script that scans each cell in an excel file for PII. I've got most of it working, but I am experiencing two issues which may be related.
First of all, I am not convinced that the "Do" loop is performing as intended. The goal here is if the text in a cell matches the regex string, create a PSCustomObject with the location information, then use the object to add a line to a csv file.
It appears that the loop is running for every file, regardless of whether or not it actually found a match.
The other issue is that I can't seem to actually pull the cell value for the matched cell. I've tried several different variables and methods, the latest attempt being "$target.text," but the value of the variable is always null.
I've been racking my brain on this for days, but I'm sure it'll be obvious once I see it.
Any help here would be appreciated.
Thanks.
$searchtext = "\b(?!0{3}|6{3})([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}\b"
$xlsFiles = Get-ChildItem $searchpath -recurse -include *.xlsx, *.xls, *.xlxm | Select-object -Expand FullName
$Excel = New-Object -ComObject Excel.Application
$excel.DisplayAlerts = $false;
$excel.AskToUpdateLinks = $false;
foreach ($xlsfile in $xlsfiles) {
    Write-host (Get-Date -f yyyymmdd:hhmm) $xlsfile
    try{
        $Workbook = $Excel.Workbooks.Open($xlsFile, 0, 0, 5, "password")
    }
    Catch {
        Write-host $xlsfile 'is password protected. Skipping...' -ForegroundColor Yellow
        continue
    }
    ForEach ($Sheet in $($Workbook.Sheets)) {
        $i = $sheet.Index
        $Range = $Workbook.Sheets.Item($i).UsedRange
        $Target = $Sheet.UsedRange.Find($Searchtext)
        $First = $Target
        Do {
            $Target = $Range.Find($Target)
            $Violation = [PSCustomObject]@{
                Path = $xlsfile
                Line = "SSN Found" + $target.text
                LineNumber = "Sheet: " + $i
            }
            $Violation | Select-Object Path, Line, LineNumber | export-csv $outputpath\$PIIFile -append -NoTypeInformation
        }
        While ($NULL -ne $Target -and $Target.AddressLocal() -ne $First.AddressLocal())
    }
    $Excel.Quit()
}
Figured it out. Just a simple case of faulty logic in the loops.
Thanks to everyone who looked at this.
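The thread does not show the corrected loop. As a sketch of one possible approach, and not the poster's actual fix: Excel's Range.Find treats its search text as literal text with wildcards rather than as a regular expression, so a regex has to be applied in PowerShell with -match. The hypothetical helper below (Find-PatternInGrid, with its parameter names, is an assumption) scans a 2-D array of cell values, such as one read from a used range, and emits an object only when a cell actually matches.

```powershell
# Hypothetical helper: scan a 2-D array of cell values for regex matches.
function Find-PatternInGrid {
    param($Grid, [string]$Pattern, [string]$Path, [string]$SheetName)

    # Use the array's own bounds so 0-based and 1-based arrays both work.
    for ($r = $Grid.GetLowerBound(0); $r -le $Grid.GetUpperBound(0); $r++) {
        for ($c = $Grid.GetLowerBound(1); $c -le $Grid.GetUpperBound(1); $c++) {
            $text = [string]$Grid[$r, $c]   # empty cells become ""
            if ($text -match $Pattern) {
                # Only matching cells produce an output object.
                [PSCustomObject]@{
                    Path       = $Path
                    Line       = "SSN Found: $text"
                    LineNumber = "Sheet: $SheetName, array row $r, array column $c"
                }
            }
        }
    }
}

# Same pattern as in the question, single-quoted so nothing is expanded.
$pattern = '\b(?!0{3}|6{3})([0-6]\d{2}|7([0-6]\d|7[012]))([ -]?)(?!00)\d\d\3(?!0000)\d{4}\b'

# Made-up sample values standing in for worksheet cells.
$cells = New-Object 'object[,]' 2, 2
$cells[0, 0] = 'name'
$cells[0, 1] = '123-45-6789'
$cells[1, 0] = '000-12-3456'
$hits = @(Find-PatternInGrid -Grid $cells -Pattern $pattern -Path 'sample.xlsx' -SheetName 'Sheet1')
$hits.Count
```

The row and column numbers reported here are positions inside the array, not worksheet addresses, so they would need an offset if the used range does not start at A1.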

Excel add Row Grouping using powershell

I have the CSV file below. I want to import it into Excel and add row grouping for the child items using PowerShell. I was able to open the file and format a cell, but I am not sure how to add row grouping.
Data
name,,
one,,
,value1,value2
,value3 ,value4
two,,
,value4,sevalue4
,value5,sevalue5
,value6,sevalue6
,value7,sevalue7
three,,
,value8,sevalue8
,value9,sevalue9
,value10,sevalue10
,value11,sevalue11
I want to convert like this in excel.
Here is the code I have to open it in Excel.
$a = New-Object -comobject Excel.Application
$a.visible = $True
$b = $a.Workbooks.Open("C:\shared\c1.csv")
$c = $b.Worksheets.Item(1)
$d = $c.Cells(1,1)
$d.Interior.ColorIndex = 19
$d.Font.ColorIndex = 11
$d.Font.Bold = $True
$b.SaveAs("C:\shared\c1.xlsx", 51) # Save() takes no arguments; SaveAs with file format 51 (xlOpenXMLWorkbook) writes an .xlsx
How do I add row grouping for this data?
Thanks
SR
Logic Applied:
Group all the consecutive rows for which the value in column A is blank
In the following code, I have opened a CSV file, made the required grouping as per the data shared by you and saved it. While saving it, because of the row grouping, I was not able to save it in csv format. So, I had to change the format to a normal workbook. But, it works.
Code
$objExl = New-Object -ComObject Excel.Application
$objExl.visible = $true
$objExl.DisplayAlerts = $false
$strPath = "C:\Users\gurmansingh\Documents\a.csv" #Enter the path of csv
$objBook = $objExl.Workbooks.open($strPath)
$objSheet = $objBook.Worksheets.item(1)
$intRowCount = $objSheet.usedRange.Rows.Count
for($i=1; $i -le $intRowCount; $i++)
{
    if($objSheet.Cells.Item($i,1).text -like "")
    {
        $startRow = $i
        for($j=$i+1; $j -le $intRowCount; $j++)
        {
            if($objSheet.cells.Item($j,1).text -ne "" -or $j -eq $intRowCount)
            {
                $endRow = $j-1
                if($j -eq $intRowCount)
                {
                    $endRow = $j
                }
                break
            }
        }
        $str = "A"+$startRow+":A"+$endRow
        $objSheet.Range($str).Rows.Group()
        $i=$j
    }
}
$objBook.SaveAs("C:\Users\gurmansingh\Documents\b",51) #saving in a different format.
$objBook.Close()
$objExl.Quit()
Before:
a.csv
Output after running the code:
b.xlsx
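The grouping logic can also be checked without Excel. The sketch below is a hypothetical helper (Get-BlankRun is not part of the answer) that takes the column A values as strings and returns the 1-based start and end row of every run of consecutive blanks, using the sample data from the question.

```powershell
# Hypothetical helper: find runs of consecutive blank values, as 1-based row numbers.
function Get-BlankRun {
    param([string[]]$ColumnA)

    $runs = New-Object System.Collections.Generic.List[object]
    $start = 0   # 0 means "not currently inside a blank run"
    for ($row = 1; $row -le $ColumnA.Count; $row++) {
        $isBlank = [string]::IsNullOrEmpty($ColumnA[$row - 1])
        if ($isBlank -and $start -eq 0) {
            $start = $row
        }
        elseif (-not $isBlank -and $start -ne 0) {
            $runs.Add([PSCustomObject]@{ Start = $start; End = $row - 1 })
            $start = 0
        }
    }
    # Close a run that extends to the last row.
    if ($start -ne 0) {
        $runs.Add([PSCustomObject]@{ Start = $start; End = $ColumnA.Count })
    }
    return $runs
}

# Column A of the sample CSV from the question.
$columnA = 'name', 'one', '', '', 'two', '', '', '', '', 'three', '', '', '', ''
$ranges = @(Get-BlankRun -ColumnA $columnA | ForEach-Object { "A$($_.Start):A$($_.End)" })
$ranges -join ' '   # A3:A4 A6:A9 A11:A14
```

Each resulting address could then be passed to $objSheet.Range($str).Rows.Group() as in the answer's loop.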
Also, check out how easy it is to do using my Excel PowerShell module.
Install-Module ImportExcel
https://github.com/dfinke/ImportExcel/issues/556#issuecomment-469897886

Powershell script to match condition of excel cell values

I am a novice PowerShell programmer trying to search an Excel file and change the cell format and font. The snippet below searches for the word "PASSED" and should change the font to green and bold, but the script exits without making the expected change. I cannot figure out what is wrong and would appreciate some help.
$excel = New-Object -ComObject Excel.Application
$excel.Visible = $false
$excel.DisplayAlerts = $False
$workbook = $excel.Workbooks.Open("C:\test.xlsx")
$sheet = $workbook.ActiveSheet
$xlCellTypeLastCell = 11
$used = $sheet.usedRange
$lastCell = $used.SpecialCells($xlCellTypeLastCell)
$row = $lastCell.row # goes to the last used row in the worksheet
for ($i = 1; $i -lt $row.length; $i++) {
    If ($sheet.cells.Item(1,2).Value() = "PASSED") {
        $sheet.Cells.Item(1,$i+1).Font.ColorIndex = 10
        $sheet.Cells.Item(1,$i+1).Font.Bold = $true
    }
}
$workbook.SaveAs("C:\output.xlsx")
$workbook.Close()
The input file (test.xlsx) contains the following:
Module | test | Status
ABC | a | PASSED
It is quite a large file, with a different status for each unit test.
$row holds the last row number as a single value, so comparing against its Length property in the for loop will land you in trouble: PowerShell reports a Length of 1 for a single value, so the condition $i -lt $row.length is false from the start and the loop body never runs.
Change it to:
for ($i = 1; $i -lt $row; $i++) {
In the if statement inside the loop, there's another problem: =
To compare two values for equality, use the -eq operator instead of = (= is only for assignment):
if ($sheet.cells.Item($i,2).Value() -eq "PASSED") {
    $sheet.Cells.Item(1,$i+1).Font.ColorIndex = 10
    $sheet.Cells.Item(1,$i+1).Font.Bold = $true
}
Lastly, Excel cell references are not zero-based, so Item(1,2) refers to the cell that, in your example, has the value "test" (note that it takes the row as the first parameter and the column as the second). Change it to Item(2,3) to test against the correct cell, and transpose the cell coordinates inside the if block as well.
You may want to update the for loop to reflect this as well:
for ($i = 2; $i -le $row; $i++) {
    if ($sheet.cells.Item($i,3).Value() -eq "PASSED") {
        $sheet.Cells.Item($i,3).Font.ColorIndex = 10
        $sheet.Cells.Item($i,3).Font.Bold = $true
    }
}

Read Excel data with Powershell and write to a variable

Using PowerShell I would like to capture user input, compare the input to data in an Excel spreadsheet and write the data in corresponding cells to a variable. I am fairly new to PowerShell and can't seem to figure this out. Example would be: A user is prompted for a Store Number, they enter "123". The input is then compared to the data in Column A. The data in the corresponding cells is captured and written to a variable, say $GoLiveDate.
Any help would be greatly appreciated.
User input can be read like this:
$num = Read-Host "Store number"
Excel can be handled like this:
$xl = New-Object -COM "Excel.Application"
$xl.Visible = $true
$wb = $xl.Workbooks.Open("C:\path\to\your.xlsx")
$ws = $wb.Sheets.Item(1)
Looking up a value in one column and assigning the corresponding value from another column to a variable could be done like this:
for ($i = 1; $i -le 3; $i++) {
    if ( $ws.Cells.Item($i, 1).Value -eq $num ) {
        $GoLiveDate = $ws.Cells.Item($i, 2).Value
        break
    }
}
Don't forget to clean up after you're done:
$wb.Close()
$xl.Quit()
[System.Runtime.Interopservices.Marshal]::ReleaseComObject($xl)
I find it preferable to use an OleDB connection to interact with Excel. It's faster than COM interop and less error prone than import-csv. You can prepare a collection of psobjects (one psobject is one row, each property corresponding to a column) to match your desired target grid and insert it into the Excel file. Similarly, you can insert a DataTable instead of a PSObject collection, but unless you start by retrieving data from some data source, PSObject collection way is usually easier.
Here's a function i use for writing a psobject collection to Excel:
function insert-OLEDBData ($file,$sheet,$ocol) {
    # Pick the connection string from the file extension.
    $cs = Switch -regex ($file)
    {
        "xlsb$"
        {"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=`"$File`";Extended Properties=`"Excel 12.0;HDR=YES;IMEX=1`";"}
        "xlsx$"
        {"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=`"$File`";Extended Properties=`"Excel 12.0 Xml;HDR=YES;IMEX=1`";"}
    }
    $OLEDBCon = New-Object System.Data.OleDb.OleDbConnection($cs)
    $hdr = $oCol|gm -MemberType NoteProperty|%{$_.name}
    $names = '[' + ($hdr-join"],[") + ']'
    $vals = (@("?")*([array]$hdr).length)-join','
    $sql = "insert into [$sheet`$] ($names) values ($vals)"
    $sqlCmd = New-Object system.Data.OleDb.OleDbCommand($sql)
    $sqlCmd.connection = $oledbcon
    $cpary = @($null)*([array]$hdr).length
    $i=0
    [array]$hdr|%{([array]$cpary)[$i] = $sqlCmd.parameters.add($_,"VarChar",255);$i++}
    $oledbcon.open()
    for ($i=0;$i-lt([array]$ocol).length;$i++)
    {
        for ($k=0;$k-lt([array]$hdr).length;$k++)
        {
            ([array]$cpary)[$k].value = ([array]$oCol)[$i].(([array]$hdr)[$k])
        }
        $res = $sqlCmd.ExecuteNonQuery()
    }
    $OLEDBCon.close()
}
This does not seem to work anymore. I swear it used to, but maybe an update to O365 killed it? or I last used it on Win 7, and have long since moved to Win 10:
$GoLiveDate = $ws.Cells.Item($i, 2).Value
I can still use .Value for writing to a cell, but not for reading it into a variable. Instead of the contents of the cell, it returns: "Variant Value (Variant) {get} {set}"
But after some digging, I found this does work to read a cell into a variable:
$GoLiveDate = $ws.Cells.Item($i, 2).Text
Regarding the next question/comment from squishy79 about slowness, and the subsequent OleDB solutions: I can't seem to get those to work on modern operating systems either, but my own performance trick is to have all my Excel PowerShell scripts write to a tab-delimited .txt file like so:
Add-Content -Path "C:\FileName.txt" -Value $Header1`t$Header2`t$Header3...
Add-Content -Path "C:\FileName.txt" -Value $Data1`t$Data2`t$Data3...
Add-Content -Path "C:\FileName.txt" -Value $Data4`t$Data5`t$Data6...
then when done writing all the data, open the .txt file using the very slow Com "Excel.Application" just to do formatting then SaveAs .xlsx (See comment by SaveAs):
Function OpenInExcelFormatSaveAsXlsx
{
    Param ($FilePath)
    If (Test-Path $FilePath)
    {
        $Excel = New-Object -ComObject Excel.Application
        $Excel.Visible = $true
        $Workbook = $Excel.Workbooks.Open($FilePath)
        $Sheet = $Workbook.ActiveSheet
        $UsedRange = $Sheet.UsedRange
        $RowMax = ($Sheet.UsedRange.Rows).count
        $ColMax = ($Sheet.UsedRange.Columns).count
        # This code gets the Alpha character for Columns, even for AA AB, etc.
        # After the loop, $Asc holds the letters of the last used column.
        For ($Col = 1; $Col -le $ColMax; $Col++)
        {
            $Asc = ""
            $Asc1 = ""
            $Asc2 = ""
            If ($Col -lt 27)
            {
                $Asc = ([char]($Col + 64))
                Write-Host "Asc: $Asc"
            }
            Else
            {
                $First = [math]::truncate($Col / 26)
                $Second = $Col - ($First * 26)
                If ($Second -eq 0)
                {
                    $First = ($First - 1)
                    $Second = 26
                }
                $Asc1 = ([char][int]($First + 64))
                $Asc2 = ([char][int]($Second + 64))
                $Asc = "$Asc1$Asc2"
            }
        }
        Write-Host "Col: $Col"
        Write-Host "Asc + 1: $Asc" + "1"
        # Format the header row: bold with a bottom border.
        $Range = $Sheet.Range("a1", "$Asc" + "1")
        $Range.Select() | Out-Null
        $Range.Font.Bold = $true
        $Range.Borders.Item(9).LineStyle = 1
        $Range.Borders.Item(9).Weight = 2
        $UsedRange = $Sheet.UsedRange
        $UsedRange.EntireColumn.AutoFit() | Out-Null
        $SavePath = $FilePath.Replace(".txt", ".xlsx")
        # I found scant documentation, but you need a file format 51 to save a .txt file as .xlsx
        $Workbook.SaveAs($SavePath, 51)
        # Close needs parentheses; without them the method is never called.
        $Workbook.Close()
        $Excel.Quit()
    }
    Else
    {
        Write-Host "File Not Found: $FilePath"
    }
}
$TextFilePath = "C:\ITUtilities\MyTabDelimitedTextFile.txt"
OpenInExcelFormatSaveAsXlsx -FilePath $TextFilePath
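The column-letter step in the function above covers one- and two-letter columns. As a sketch of a more general alternative, the hypothetical helper below (ConvertTo-ColumnLetter is not part of the answer) converts any positive column number to its letters by repeatedly taking the remainder in base 26.

```powershell
# Hypothetical helper: convert a 1-based column number to Excel column letters.
function ConvertTo-ColumnLetter {
    param([int]$Column)

    $letters = ''
    while ($Column -gt 0) {
        # Shift to 0-based so that 26 maps to Z instead of rolling over.
        $rem = ($Column - 1) % 26
        $letters = [string][char](65 + $rem) + $letters
        $Column = [math]::Floor(($Column - 1) / 26)
    }
    return $letters
}

(1, 26, 27, 52, 703 | ForEach-Object { ConvertTo-ColumnLetter -Column $_ }) -join ' '   # A Z AA AZ AAA
```

The result can be combined with a row number, for example (ConvertTo-ColumnLetter -Column $ColMax) + "1", to build the end address of the header range.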
If you don't care about formatting, you can just open the tab delimited .txt files as-is in Excel.
Of course, this is not very good for inserting data into an existing Excel spreadsheet unless you are OK with having the script rewrite the whole sheet each time an insert is made. It will still run much faster than using COM in most cases.
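As a sketch of the tab-delimited approach, the snippet below builds every line in memory with -join and writes the file once, rather than calling Add-Content per row, which typically reopens the file on each call. The sample rows, property names, and output file name are made up for illustration.

```powershell
# Hypothetical sample rows; in a real script these would come from your own data.
$rows = @(
    [PSCustomObject]@{ Store = '123'; GoLiveDate = '2020-01-15' }
    [PSCustomObject]@{ Store = '456'; GoLiveDate = '2020-02-01' }
)
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'TabDelimitedSketch.txt'

# Build the header and data lines in memory, joining fields with a tab.
$lines = New-Object System.Collections.Generic.List[string]
$lines.Add(('Store', 'GoLiveDate') -join "`t")
foreach ($row in $rows) {
    $lines.Add(($row.Store, $row.GoLiveDate) -join "`t")
}

# Write all lines in a single call.
[System.IO.File]::WriteAllLines($path, $lines)
```

The resulting file can be passed to OpenInExcelFormatSaveAsXlsx, or opened in Excel as-is.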
I found this, and Yevgeniy's answer. I had to make a few minor changes to the above function in order for it to work, most notably in the handling of null or empty values in the input array. Here is Yevgeniy's code with a few minor changes:
function insert-OLEDBData {
    PARAM (
        [Parameter(Mandatory=$True,Position=1)]
        [string]$file,
        [Parameter(Mandatory=$True,Position=2)]
        [string]$sheet,
        [Parameter(Mandatory=$True,Position=3)]
        [array]$ocol
    )
    $cs = Switch -regex ($file)
    {
        "xlsb$"
        {"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=`"$File`";Extended Properties=`"Excel 12.0;HDR=YES`";"}
        "xlsx$"
        {"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=`"$File`";Extended Properties=`"Excel 12.0 Xml;HDR=YES`";"}
    }
    $OLEDBCon = New-Object System.Data.OleDb.OleDbConnection($cs)
    $hdr = $oCol | Get-Member -MemberType NoteProperty,Property | ForEach-Object {$_.name}
    $names = '[' + ($hdr -join "],[") + ']'
    $vals = (@("?")*([array]$hdr).length) -join ','
    $sql = "insert into [$sheet`$] ($names) values ($vals)"
    $sqlCmd = New-Object system.Data.OleDb.OleDbCommand($sql)
    $sqlCmd.connection = $oledbcon
    $cpary = @($null)*([array]$hdr).length
    $i=0
    [array]$hdr|%{([array]$cpary)[$i] = $sqlCmd.parameters.add($_,"VarChar",255);$i++}
    $oledbcon.open()
    for ($i=0;$i -lt ([array]$ocol).length;$i++)
    {
        for ($k=0;$k -lt ([array]$hdr).length;$k++)
        {
            IF (([array]$oCol)[$i].(([array]$hdr)[$k]) -notlike "") {
                ([array]$cpary)[$k].value = ([array]$oCol)[$i].(([array]$hdr)[$k])
            } ELSE {
                ([array]$cpary)[$k].value = ""
            }
        }
        $res = $sqlCmd.ExecuteNonQuery()
    }
    $OLEDBCon.close()
}
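The statement-building step in these functions can be tried on its own, without an Excel file or the ACE provider. The sketch below pulls it into a hypothetical helper (New-InsertStatement is not part of either answer) that returns the parameterized INSERT text for a sheet name and a list of column names.

```powershell
# Hypothetical helper: build the parameterized INSERT text used by insert-OLEDBData.
function New-InsertStatement {
    param([string]$Sheet, [string[]]$Columns)

    # Bracket each column name: [Store],[GoLiveDate]
    $names = '[' + ($Columns -join '],[') + ']'
    # One positional "?" placeholder per column.
    $placeholders = (@('?') * $Columns.Length) -join ','
    # The escaped $ after the sheet name is how a worksheet is addressed as a table.
    return "insert into [$Sheet`$] ($names) values ($placeholders)"
}

New-InsertStatement -Sheet 'Sheet1' -Columns 'Store', 'GoLiveDate'
# insert into [Sheet1$] ([Store],[GoLiveDate]) values (?,?)
```

Printing the statement this way is a quick check that the column names taken from the objects match the header row of the target sheet.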
