I want to use PowerShell to find special characters (like Greek letters) in an Excel document and replace them with HTML entities. My script looks like this:
$file = "C:\Path\To\File\test.xls"
$xl = New-Object -ComObject Excel.Application
$xl.Visible = $True
$objWorkbook = $xl.Workbooks.Open($file)
$objWorksheet = $objWorkbook.Worksheets.Item(1)
$objRange = $objWorksheet.UsedRange
$charArray = @(
    ([char]948, "&delta;"),
    ([char]916, "&Delta;")
)
foreach ($char in $charArray){
$FindText = $char[0]
$ReplaceText = $char[1]
if ($objRange.find("$FindText")) {
$objRange.replace("$FindText", $ReplaceText)
} else {write-host "Didn't find $FindText"}
}
The trouble is, the .find() and .replace() methods are not case-sensitive, so [char]948 (δ) matches both the lowercase delta (δ) and uppercase delta (Δ) characters. The result is that all δ and Δ characters in the Excel (.xls) file are replaced with &delta;.
In VBA, Range.Find() has a MatchCase parameter, but PowerShell does not seem to accept it by name. For example, $objRange.find("$FindText", MatchCase:=$True) does not work.
I also tried PowerShell's -cmatch and -creplace operators, which are case-sensitive, but I could not figure out how to get those to work on the Excel range object $objRange:
$objRange -creplace "$FindText", $ReplaceText has no effect on the Excel file.
I can't export or convert the data to .txt or .csv because the special characters don't survive the conversion.
Is there a way to make this work?
Using PowerShell you can use the -creplace operator:
"aAaAaA" -creplace 'a','I'
IAIAIA
As a replacement for Find, you can use the IndexOf method from the String class; it takes a comparisonType parameter:
IndexOf(string value, int startIndex, int count, System.StringComparison comparisonType)
Example:
"Jean Paul".indexOF("paul", 0, [System.StringComparison]::CurrentCulture)
-1
"Jean Paul".indexOF("paul", 0, [System.StringComparison]::CurrentCultureIgnoreCase)
5
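For the Excel range itself, note that the COM Range.Replace method does accept a MatchCase argument; you just have to pass it positionally rather than by name. A minimal sketch against the question's variables (2 = xlPart, 1 = xlByRows; untested here):
$xlPart = 2
$xlByRows = 1
foreach ($char in $charArray) {
    $FindText = $char[0]
    $ReplaceText = $char[1]
    # MatchCase is the fifth positional argument of Range.Replace
    # (What, Replacement, LookAt, SearchOrder, MatchCase, ...)
    $objRange.Replace($FindText, $ReplaceText, $xlPart, $xlByRows, $true)
}
With MatchCase set to $true, [char]948 should only match the lowercase delta.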
Related
I am reading values from different Excel files and composing a new one containing information from all the others. While doing that, Excel seems to automatically change a dot '.' to a comma ','. How do I prevent that?
I am using PowerShell ISE on Win10 and Office 365. I tried reading and writing both 'value2' and 'text', and I tried casting the value2 to a string when writing it. That did not work. The variables in PowerShell hold the correct values as strings; the moment I save the new Excel file, the correct format is gone.
Example: the value is "123.456". I can read it, and the PowerShell variable shows "123.456". I write it to Excel, open the file afterwards, and it reads:
123,456 and interprets it as a number instead of text.
How I read the value
[...]
$tmp += ($worksheet.cells.item($intRow,$col).value2)
How I write the value (I also tried "value" and "text" for both reading and writing):
[...]
elseif($value -eq 6){
$sheet.Cells.item($intRow,$columncounter).value2 = ($tmp[$value]).ToString()
}
[...]
This is how I open the Excel file for writing:
$objExcel=New-Object -ComObject Excel.Application
$objExcel.Visible=$false
$resultbook = $objExcel.Workbooks.Add()
$sheet = $resultbook.ActiveSheet
$sheet.Name = "Data"
This is how I save the Excel file:
$resultbook.SaveAs($name)
$resultbook.close()
Expected: Input == Output, example: 1234.5678 --> 1234.5678
Actual Result: Input != Output, example 1234.5678 --> 1234,5678
It works fine for all other strings, texts, and numbers except those containing dots.
I presume there must be a way to specify the cell format in the target file; however, I did not find any documentation on that.
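One candidate (a sketch based on the write snippet above, not verified against your workbook): set the cell's NumberFormat to text ("@") before assigning the value, so Excel stores the string verbatim instead of parsing it with the current culture's decimal separator.
elseif ($value -eq 6) {
    $cell = $sheet.Cells.Item($intRow, $columncounter)
    $cell.NumberFormat = "@"    # "@" = Text format; Excel won't reinterpret "123.456"
    $cell.Value2 = ($tmp[$value]).ToString()
}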
The PowerShell script is as follows:
$E = New-Object -ComObject Excel.Application
$E.Visible = $false
$E.DisplayAlerts = $false
$wb = $E.Workbooks.Open($args[0])
$wb_name = fix-wbname($wb.Name)
foreach ($ws in $wb.Worksheets)
{
$n = $wb.Name + "_" + $ws.Name + ".csv"
$n = Join-Path -Path $args[1] -ChildPath $n
$ws.SaveAs($n, 6)
}
It works, but Excel does silly things to the text formatting. Dates in YYYY-MM-DD format are changed to M/D/YYYY, and the number 18446744073709500000 is changed to "1.84467E+19".
Is there any way I can do this and have Excel just export the values as they are?
In general, no. Dates in Excel are not stored internally in the format you see on the screen; they are stored in some sort of binary number format, almost certainly the same one used internally by Windows (I haven't researched this). You will have to find out what default conversion takes place when the conversion to CSV happens. This could be culture dependent.
Culture dependencies can be controlled by settings inside of Excel. If the conversions to CSV are being done by PowerShell, you may be able to control culture dependencies by specifying optional parameters.
Sorry this is so vague.
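If the goal is to preserve exactly what is shown on screen, one workaround (a sketch, and much slower than SaveAs because of the per-cell COM calls) is to skip SaveAs entirely and build each CSV from the cells' .Text property, which returns the displayed string:
foreach ($ws in $wb.Worksheets)
{
    $n = Join-Path -Path $args[1] -ChildPath ($wb.Name + "_" + $ws.Name + ".csv")
    $used = $ws.UsedRange
    $lines = for ($r = 1; $r -le $used.Rows.Count; $r++) {
        $cells = for ($c = 1; $c -le $used.Columns.Count; $c++) {
            # .Text is the value as displayed, so dates and large numbers keep their on-screen form
            '"' + $used.Cells.Item($r, $c).Text.Replace('"', '""') + '"'
        }
        $cells -join ","
    }
    $lines | Set-Content -Path $n
}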
I have an Excel file that I receive and want to process it to a CSV using Powershell.
I have to alter it quite specifically so it can be a reliable input for a program that will process the csv info.
I don't know the exact headers, but i know there can be duplicates.
What I do is open the xlsx file with Excel and save it as CSV:
$objExcel = New-Object -ComObject Excel.Application
$objExcel.Visible = $True
$objExcel.DisplayAlerts = $True
$Workbook = $objExcel.Workbooks.open($xlsx1)
$WorkSheet = $WorkBook.sheets.item($sheet)
$xlCSV = 6
$Workbook = $objExcel.Workbooks.open($xlsx2)
$WorkSheet = $WorkBook.sheets.item($sheet)
$WorkBook.SaveAs($csv2,$xlCSV)
Now, the XLSX file will have commas, so first I want to change them to dots.
I tried this, but it's not working:
$objRange = $worksheet.UsedRange
$objRange.Replace ",", "."
It errors out saying: Unexpected token '", "'.
Then, when saving, I want to set the delimiter to a comma, as it uses ";" by default.
With something like:
$WorkBook.SaveAs($csv2,$xlCSV) -delimiter ","
The last problem is the duplicate headers; this prevents PowerShell from using Import-Csv. Here is what I tried; when the file is comma-separated it works:
Get-Content $downloads\BBKS_DIR_AUTO_COMMA.csv -totalcount 1 >$downloads\Headers.txt
But then I need to rename the duplicate names: I can have Regio, Regio, Regio.
I want to change this to Regio, Regio2, Regio3.
My plan was to look up the data in the txt file, search for duplicates, and then add an incremental number.
In the end I need to add a column with incremental numbers, but always with four digits, like: 0001, 0002, 0010, 0020, 0200, 1500. I won't exceed 9999. How can this be done?
If you can help me, even if only partially, I'd be very happy.
Further, I'm running Windows 7 x64, PowerShell 3.0, Excel 2016 (if relevant).
If easier, it's fine to go back to the command prompt for some tasks.
Personally, I wouldn't try to work with Excel sheets via Excel itself and COM - I'd use the excellent ImportExcel module: https://github.com/dfinke/ImportExcel
Then you can import from the sheet straight to a native PowerShell object array, and re-export with Export-Csv -Delimiter.
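For example, the whole round trip can be a one-liner (a sketch reusing $xlsx1 and $csv2 from the question):
Import-Excel -Path $xlsx1 | Export-Csv -Path $csv2 -Delimiter "," -NoTypeInformation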
Edit: to answer follow-ups:
Once you've loaded the module you can do "Get-Module ImportExcel | Select-Object -ExpandProperty ExportedCommands" to see what it makes available.
To import your Excel in the first place, do something like :
$WorkBook = Import-Excel -Path $xlsx1
And if you need to take care of duplicate column names, you can do :
$WorkBook = Import-Excel -Header @("Regio1", "Regio2", "Regio")
Where the array you pass to -Header needs to include every column you want from the workbook.
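If you'd rather not type every header by hand, you can also dedupe them in code. Here is a sketch of the two remaining sub-problems, where $headers is a hypothetical array holding the raw column names read from the first row:
# Rename duplicates: Regio, Regio, Regio -> Regio, Regio2, Regio3
$seen = @{}
$unique = foreach ($h in $headers) {
    if ($seen.ContainsKey($h)) {
        $seen[$h]++
        "$h$($seen[$h])"
    } else {
        $seen[$h] = 1
        $h
    }
}
# Four-digit, zero-padded numbers for the extra column
1..9999 | ForEach-Object { "{0:0000}" -f $_ }    # 0001, 0002, ... 9999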
Following up from this question, I'm trying to replace $B$1 with TEXT($B$1,"0000") in all the formulas I can find across a lot of workbooks. Now that I'm past that .save() problem, I've hit another (which should've been the first, actually): I can't seem to change the .Formula value, no matter what I try.
PS C:\> $Search.Formula = $Search.Formula -replace '\$B\$1','TEXTO($B$1,"0000")'
Exception setting "Formula": "Exception from HRESULT: 0x800A03EC"
At line:1 char:1
+ $Search.Formula = $Search.Formula -replace '\$B\$1','TEXTO($B$1,"0000")'
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], SetValueInvocationException
+ FullyQualifiedErrorId : CatchFromBaseAdapterSetValueTI
Formula is:
=PROCV(("A"&ANO($A6)&"M"&MÊS($A6)&"P"&$B$1);BASE!$A:$P;9;FALSO)
In English, if I remember the correlation correctly:
=VLOOKUP(("A"&YEAR($A6)&"M"&MONTH($A6)&"P"&$B$1);BASE!$A:$P;9;FALSE)
The expected output would be
=VLOOKUP(("A"&YEAR($A6)&"M"&MONTH($A6)&"P"&TEXT($B$1,"0000"));BASE!$A:$P;9;FALSE)
There were a couple of things going on with what you supplied. At first glance, you seem to be using the backslash as an escape character to make the dollar signs literal. The escape character for this in PowerShell is the back-tick, or grave accent (`).
If I were performing this action within Excel, I would probably just Find & Replace every $B$1 on the worksheet with text($B$1, "0000"). That seems powerful enough to take care of the operation without PowerShell's -replace operator. The worksheet method does depend somewhat on $B$1 being available, but since it is also in the replacement, you pretty much need to know what you are replacing beforehand. Some error control in that area may be necessary if this script is left for casual users.
$excel = New-Object -comobject Excel.Application
$FilePath = "c:\temp\example.xlsx"
$workbook = $excel.Workbooks.Open($FilePath)
$excel.Visible = $true
$worksheet = $workbook.worksheets.item("Sheet1")
#set some Find & Replace vars
$what = "`$B`$1"
$with = "text(`$B`$1, `"0000`")"
#use worksheet-wide Find & Replace to change formula
$worksheet.usedrange.replace($what, $with, 2)
#formula(s) should be changed. now Find it and display it
$fnd = $worksheet.usedrange.find($what, $worksheet.range("A1"), -4123, 2)
Write-Output $fnd.formula
$workbook.save()
$workbook.close()
$excel.quit()
I've verified the Range.Replace method by finding and displaying the formula after the operations, and I made more extensive use of the grave escape character rather than swapping back and forth between single and double quotes within quoted strings.
The above code uses the EN-US version I tested with. The actual replacement text for your regional settings would seem to be:
$with = "texto(`$B`$1; `"0000`")"
I wonder if there is any way to speed up reading an Excel file with PowerShell. Many would say I should stop using the Do-Until loop, but the problem is I need it badly, because my Excel sheet can have 2 rows or 5000 rows. I understand that 5000 rows take some time, but 2 rows shouldn't need 90+ seconds.
$Excel = New-Object -ComObject Excel.Application
$Excel.Visible = $true
$Excel.DisplayAlerts = $false
$Path = EXCELFILEPATH
$Workbook = $Excel.Workbooks.open($Path)
$Sheet1 = $Workbook.Worksheets.Item("test")
$URows = @()
Do {$URows += $Sheet1.Cells.Item($Row,1).Text; $row = $row + [int] 1} until (!$Sheet1.Cells.Item($Row,1).Text)
$URows | foreach {
$MyParms = @{};
$SetParms = @{};
And I've got this 30 times in the script too:
If ($Sheet1.Cells.Item($Row,2).Text){$var1 = $Sheet1.Cells.Item($Row,2).Text
$MyParms.Add("PAR1",$var1)
$SetParms.Add("PAR1",$var1)}
}
I have the idea of running the $MyParms stuff concurrently, but I have no idea how. Any suggestions?
Or
Increase the speed of reading, but I have no clue how to achieve that without destroying the "read until nothing is there" behavior.
Or
The speed is normal and I shouldn't complain.
Don't use Excel.Application in the first place if you need speed. You can use an Excel spreadsheet as an ODBC data source - the file is analogous to a database, and each worksheet a table. The speed difference is immense. Here's an intro on using Excel spreadsheets without Excel.
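A sketch of that approach (it assumes the Microsoft ACE OLEDB provider is installed; the file path and sheet name are placeholders):
$connStr = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\path\to\file.xlsx;" +
           "Extended Properties='Excel 12.0 Xml;HDR=YES'"
$conn = New-Object System.Data.OleDb.OleDbConnection($connStr)
$conn.Open()
# A worksheet is addressed as a table named "SheetName$"
$cmd = New-Object System.Data.OleDb.OleDbCommand('SELECT * FROM [test$]', $conn)
$adapter = New-Object System.Data.OleDb.OleDbDataAdapter($cmd)
$table = New-Object System.Data.DataTable
[void]$adapter.Fill($table)    # the whole sheet arrives in one call, no per-cell COM chatter
$conn.Close()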
Appending to an array with the += operator is terribly slow, because it copies all elements from the existing array into a new array on every iteration. Use something like this instead:
$URows = for ($row = 1; $Sheet1.Cells.Item($row, 1).Text; $row++) {
    if ($Sheet1.Cells.Item($row, 2).Text) {
        $MyParms['PAR1'] = $Sheet1.Cells.Item($row, 2).Text
        $SetParms['PAR1'] = $Sheet1.Cells.Item($row, 2).Text
    }
    $Sheet1.Cells.Item($row, 1).Text
}
Your Do loop is basically a counting loop. The canonical form for such loops is
for (init counter; condition; increment counter) {
...
}
so I changed the loop accordingly. Of course you'd achieve the same result like this:
$row = 1
$URows = Do {
...
$row += 1
}
but that would just mean more code without any benefit. This modification doesn't have any performance impact, though.
Relevant in terms of performance are the other two changes:
I moved the code that fills the hashtables inside the first loop, so the code won't loop over the data twice. Using index and assignment operators instead of the Add method for assigning values to the hashtable prevents the code from raising an error when a key already exists.
Instead of appending to an array (which has the above-mentioned performance impact), the code now simply echoes the cell text in the loop, which PowerShell automatically turns into a list. That list is then assigned to the variable $URows.
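To see both points in isolation: index assignment never throws while Add does when the key exists, and the array append is measurably slower. A quick, self-contained comparison (sizes are arbitrary; run it for real numbers on your machine):
# Index assignment vs. Add on an existing key
$h = @{}
$h['PAR1'] = 1
$h['PAR1'] = 2        # fine, overwrites
# $h.Add('PAR1', 3)   # would throw: key 'PAR1' already exists

# Cost of += on an array vs. collecting pipeline output
Measure-Command {
    $a = @()
    1..10000 | ForEach-Object { $a += $_ }    # reallocates and copies the array on every iteration
}
Measure-Command {
    $a = 1..10000 | ForEach-Object { $_ }     # single assignment of the pipeline output
}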