I have this PowerShell script that strips HTML tags, leaving just the text, and I want it to display the word count for that HTML file when the script is executed. My question is, when I execute:
function Html-ToText {
param([System.String] $html)
# remove line breaks, replace with spaces
$html = $html -replace "(`r|`n|`t)", " "
# write-verbose "removed line breaks: `n`n$html`n"
# remove invisible content
@('head', 'style', 'script', 'object', 'embed', 'applet', 'noframes', 'noscript', 'noembed') | % {
$html = $html -replace "<$_[^>]*?>.*?</$_>", ""
}
# write-verbose "removed invisible blocks: `n`n$html`n"
# Condense extra whitespace
$html = $html -replace "( )+", " "
# write-verbose "condensed whitespace: `n`n$html`n"
# Add line breaks
@('div','p','blockquote','h[1-9]') | % { $html = $html -replace "</?$_[^>]*?>.*?</$_>", ("`n" + '$0' )}
# Add line breaks for self-closing tags
@('div','p','blockquote','h[1-9]','br') | % { $html = $html -replace "<$_[^>]*?/>", ('$0' + "`n")}
# write-verbose "added line breaks: `n`n$html`n"
#strip tags
$html = $html -replace "<[^>]*?>", ""
# write-verbose "removed tags: `n`n$html`n"
# replace common entities
@(
@("&bull;", " * "),
@("&lsaquo;", "<"),
@("&rsaquo;", ">"),
@("&(rsquo|lsquo);", "'"),
@("&(quot|ldquo|rdquo);", '"'),
@("&trade;", "(tm)"),
@("&frasl;", "/"),
@("&(quot|#34|#034|#x22);", '"'),
@('&(amp|#38|#038|#x26);', "&"),
@("&(lt|#60|#060|#x3c);", "<"),
@("&(gt|#62|#062|#x3e);", ">"),
@('&(copy|#169);', "(c)"),
@("&(reg|#174);", "(r)"),
@("&nbsp;", " "),
@("&(.{2,6});", "")
) | % { $html = $html -replace $_[0], $_[1] }
# write-verbose "replaced entities: `n`n$html`n"
return $html + $a | Measure-Object -word
}
And then run:
Html-ToText (new-object net.webclient).DownloadString("test.html")
it reports 4 words in the PowerShell output. How do I export that output from the PowerShell window into an Excel spreadsheet with the column Words and the count 4?
The CSV you want just looks like this:
Words
4
It's easy enough to just write that to a text file; Excel will read it. But you're in luck: the output of Measure-Object is already an object with 'Words' as a property and '4' as a value, and you can feed that straight into Export-Csv. Use Select-Object to pick just the property you want:
$x = Html-ToText (new-object net.webclient).DownloadString("test.html")
# drop the Lines/Characters/etc fields, just export words
$x | select-Object Words | Export-Csv out.csv -NoTypeInformation
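To see what ends up in the file without touching the network or the disk, here is a minimal sketch. The sample text is a made-up stand-in for the stripped HTML, and ConvertTo-Csv is used only because it produces the same rows that Export-Csv would write:

```powershell
# Hypothetical stand-in for the text left after the tags are stripped
$text = 'four words right here'

# Measure-Object -Word returns an object that has a Words property;
# Select-Object drops the other (empty) measurement properties
$count = $text | Measure-Object -Word | Select-Object -Property Words

# ConvertTo-Csv yields the header row and the data row as strings
$csv = $count | ConvertTo-Csv -NoTypeInformation
$csv
```

Swapping ConvertTo-Csv for Export-Csv out.csv writes those same two rows to a file that Excel can open.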
I'd be tempted to see if I could use
$x = Invoke-WebRequest http://www.google.com
$x.AllElements.InnerText
to get the words out of the HTML, before I tried stripping the content with replaces.
I figured it out. What I did was add
+ $a | Measure-Object -Word after the $html variable in the script, and then I ran:
Html-ToText (new-object net.webclient).DownloadString("test.html") + select-Object Words | Export-Csv out.csv -NoTypeInformation
and it exported the word count. – josh s
I currently have this loop:
foreach($line in Get-Content .\script2.csv)
{ $firstname = $line.split(';')[0]
$lastname = $line.split(';')[1]
$email = $line.split(';')[2]
$newLine = "$firstname,""$lastname"",""$email"""
$newLine >> newCSV.csv }
I use it to extract data and write it out in the correct format.
What is the correct syntax to start from row 2 instead of processing the whole sheet?
Thanks!
Use Select -Skip $N to skip the first $N items of a collection:
foreach($line in Get-Content .\script2.csv |Select -Skip 1)
{
$firstname = $line.split(';')[0]
$lastname = $line.split(';')[1]
$email = $line.split(';')[2]
$newLine = "$firstname,""$lastname"",""$email"""
$newLine >> newCSV.csv
}
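The same idea can be tried on in-memory data first. This is only a sketch: the sample rows and the variable names are invented for illustration, and each line is split once instead of three times:

```powershell
# Hypothetical sample rows; the first one is the header to skip
$lines = @(
    'firstname;lastname;email',
    'Ann;Lee;ann@example.com',
    'Bob;Ray;bob@example.com'
)

$converted = $lines | Select-Object -Skip 1 | ForEach-Object {
    # One Split call, three variables assigned at once
    $first, $last, $email = $_.Split(';')
    # Doubled quotes inside a double-quoted string produce a literal quote
    "$first,""$last"",""$email"""
}

$converted
```

Piping $converted to Set-Content newCSV.csv would write the file in one go rather than appending line by line with >>.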
If what you want to do is to convert a CSV file that uses the semi-colon ; as delimiter to a new Csv file that uses the comma , as delimiter, and in the process remove the header line from it, you can do:
Import-Csv -Path 'D:\Test\blah.csv' -Delimiter ';' | ConvertTo-Csv -NoTypeInformation |
Out-String -Stream | Select-Object -Skip 1 | Set-Content -Path 'D:\Test\newCSV.csv'
I have long text files containing markup for another piece of software. The text has this form:
*INIT=D:\ws\**randomnonesense*
ROW=THISISROW,D:\ws\morestuff
PALVELU=200,WSTIME70.DLL,N,A
PALVELU=201,WSLDIR70.DLL,N,A
PALVELU=202,WSLDIX70.DLL,N,A
PALVELU=204,WSEXCE32.DLL,N,A
PALVELU=205,WMON.DLL,N,A
PALVELU=206,WSWORD32.DLL,N,A
PALVELU=207,WSLEPT32.DLL,N,A
PALVELU=208,WSCONV70.DLL,N,A
PALVELU=209,WSFTPC70.DLL,N,A
KUVAUS=\\192.168.169.17\adwise$\applic\LIKSA_TURE.A70,D:\ws\%aspno%\%username%
MDBS-KANTA=LIKSAV,%aspno%LIK.DB,5,RTTL,ANSI,111.111.111.11MDBS-
KANTA=LANKA,%aspno%LAN.DB,5,RTTL,ANSI,1000.000.111.11
I am writing a script that needs to replace all of the rows that start with the word PALVELU. Any amount of other content can come before and after those rows. The number of PALVELU rows and their length also differ from file to file. Still, I need to replace all of those rows in every file with another set of rows. Is there a way to do this?
You can use the following method:
Replace(oldStr, newStr)
eg:
Write-Host "replacing a text in string"
$test=" old text to be replaced"
Write-Host "going to replace old"
$test.Replace("old","New")
Write-Host "Text is replaced"
Hope this helps.
You can use [regex]::Split to split the content of your file into lines; once you have all the lines, you can check whether each line contains the word PALVELU using the -like operator.
I use \n in the regex pattern to split the content at every newline:
$rows = [regex]::split("*INIT=D:\ws\**randomnonesense*
ROW=THISISROW,D:\ws\morestuff
PALVELU=200,WSTIME70.DLL,N,A
PALVELU=201,WSLDIR70.DLL,N,A
PALVELU=202,WSLDIX70.DLL,N,A
PALVELU=204,WSEXCE32.DLL,N,A
PALVELU=205,WMON.DLL,N,A
PALVELU=206,WSWORD32.DLL,N,A
PALVELU=207,WSLEPT32.DLL,N,A
PALVELU=208,WSCONV70.DLL,N,A
PALVELU=209,WSFTPC70.DLL,N,A
KUVAUS=\\192.168.169.17\adwise$\applic\LIKSA_TURE.A70,D:\ws\%aspno%\%username%
MDBS-KANTA=LIKSAV,%aspno%LIK.DB,5,RTTL,ANSI,111.111.111.11MDBS-
KANTA=LANKA,%aspno%LAN.DB,5,RTTL,ANSI,1000.000.111.11", "(.*)\n")
$output = [System.Collections.ArrayList]@()
foreach($row in $rows) {
if($row -like '*PALVELU*') {
$output.Add($row.Replace("PALVELU", "TEST")) | Out-Null
} else {
$output.Add($row) | Out-Null
}
}
If the goal is to replace each PALVELU row with a new row, you may do the following:
# Assumes your file is a.txt
$ReplaceThis = '^PALVELU.*$'
$ReplaceWith = '<new row>'
$(switch -regex -file a.txt {
$ReplaceThis { $_ -replace $ReplaceThis,$ReplaceWith }
Default { $_ }
}) | Set-Content a.txt
The code above will result in X number of PALVELU rows being replaced with X number of <new row>.
If the goal is to replace consecutive PALVELU rows with a single <new row> string in files that only contain one set of consecutive PALVELU rows, you may do the following:
$ReplaceThis = '(?sm)^PALVELU.*(?-s)^PALVELU.*$'
$ReplaceWith = '<new row>'
(Get-Content a.txt -Raw) -replace $ReplaceThis,$ReplaceWith |
Set-Content a.txt
If the goal is to replace all PALVELU rows with a single <new row> string at the location of the first PALVELU row and the rows could be anywhere in the file, you may do the following:
$ReplaceThis = '^PALVELU.*$'
$ReplaceWith = '<new row>'
$firstMatch = $false
$(switch -regex -file a.txt {
$ReplaceThis {
if ($firstMatch) {
$_ -replace $ReplaceThis
}
else {
$_ -replace $ReplaceThis,$ReplaceWith
$firstMatch = $true
}
}
default { $_ }
}) -ne '' | Set-Content a.txt
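If the replacement is itself a set of several rows, as the question describes, the last variant can be sketched on an in-memory array. The sample lines and the replacement rows below are invented for illustration; with a real file, $lines would come from Get-Content:

```powershell
# Hypothetical input; with a real file this would be: $lines = Get-Content a.txt
$lines = @(
    'ROW=THISISROW,D:\ws\morestuff',
    'PALVELU=200,WSTIME70.DLL,N,A',
    'PALVELU=201,WSLDIR70.DLL,N,A',
    'KUVAUS=something'
)

# Hypothetical replacement rows
$newRows = @(
    'PALVELU=300,NEW1.DLL,N,A',
    'PALVELU=301,NEW2.DLL,N,A'
)

$inserted = $false
$result = foreach ($line in $lines) {
    if ($line -match '^PALVELU') {
        # Emit the whole replacement set once, at the first PALVELU row,
        # and drop every later PALVELU row
        if (-not $inserted) {
            $newRows
            $inserted = $true
        }
    }
    else {
        $line
    }
}

$result
```

Piping $result to Set-Content writes the rows back; because the input is fully read into $lines first, the same file can be used as the target.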
I'm working within PowerShell to color specific words within a here-string. It's working except for words that have the Return/Newline characters within. How can I compute the length of a word without these characters?
Below is the Function I'm using and test data. I would like the 'is' on the second line to also be colored, but I believe that the Return/Newline is causing the issue with the length mismatch.
I appreciate any and all help that could be provided! Thanks! -JFV
Function ColorSpecificWordsOutput {
param(
[Parameter(Mandatory=$true, Position=0)]
[string]$InputText,
[Parameter(Mandatory=$true, Position=1)]
$KeyColor
)
$keys = $keycolor.keys -join "|"
#Split on spaces, pipe to foreach-object
$InputText.Split(" ") | ForEach-Object {
#If word matches one of the $keys
If ($_ -imatch $keys) {
#Retrieve word as string from $keys
[string]$m = $matches.Values[0].trim()
#If length of word equals the $keys word
If($_.Length -eq $m.Length) {
#Write out the word with the mapped foreground color without a new line
Write-Host "$_ " -ForegroundColor $keyColor.item($m) -NoNewline
}
#Otherwise, just write the word without color
Else { Write-Host "$_ " -NoNewline }
}
#Otherwise, just write the word without color
else {
Write-Host "$_ " -NoNewline
}
}
}
$w = @"
This color is Yellow: test
Is it correct ?
"@
$find = @{
is = "Cyan"
test = "Yellow"
correct = "Green"
}
ColorSpecificWordsOutput -InputText $w -KeyColor $find
Try trimming each word before using it, so that the whitespace is not a factor:
Function ColorSpecificWordsOutput {
param(
[Parameter(Mandatory=$true, Position=0)]
[string]$InputText,
[Parameter(Mandatory=$true, Position=1)]
$KeyColor
)
$keys = $keycolor.keys -join "|"
#Split on spaces, pipe to foreach-object
$InputText.Split(" ") | ForEach-Object {
$word = $_.Trim() # Trim current word
#If word matches one of the $keys
If ($word -imatch $keys) {
#Retrieve word as string from $keys
[string]$m = $matches.Values[0].trim()
#If length of word equals the $keys word
If($word.Length -eq $m.Length) {
#Write out the word with the mapped foreground color without a new line
Write-Host "$word " -ForegroundColor $keyColor.item($m) -NoNewline
}
#Otherwise, just write the word without color
Else { Write-Host "$word " -NoNewline }
}
#Otherwise, just write the word without color
else {
Write-Host "$word " -NoNewline
}
}
}
$w = @"
This color is Yellow: test
Is it correct ?
"@
$find = @{
is = "Cyan"
test = "Yellow"
correct = "Green"
}
ColorSpecificWordsOutput -InputText $w -KeyColor $find
Otherwise, you could perform the trim when doing the length comparison:
Function ColorSpecificWordsOutput {
param(
[Parameter(Mandatory=$true, Position=0)]
[string]$InputText,
[Parameter(Mandatory=$true, Position=1)]
$KeyColor
)
$keys = $keycolor.keys -join "|"
#Split on spaces, pipe to foreach-object
$InputText.Split(" ") | ForEach-Object {
$word = $_
#If word matches one of the $keys
If ($word -imatch $keys) {
#Retrieve word as string from $keys
[string]$m = $matches.Values[0].trim()
#If length of word equals the $keys word
If($word.Trim().Length -eq $m.Length) {#performing trim before comparison
#Write out the word with the mapped foreground color without a new line
Write-Host "$word " -ForegroundColor $keyColor.item($m) -NoNewline
}
#Otherwise, just write the word without color
Else { Write-Host "$word " -NoNewline }
}
#Otherwise, just write the word without color
else {
Write-Host "$word " -NoNewline
}
}
}
$w = @"
This color is Yellow: test
Is it correct ?
"@
$find = @{
is = "Cyan"
test = "Yellow"
correct = "Green"
}
ColorSpecificWordsOutput -InputText $w -KeyColor $find
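One thing to watch: splitting on a single space leaves "test", the line break, and "Is" together in one token, so trimming the ends may not separate them. A different technique, sketched here rather than taken from the answer above, is to split on any whitespace run while capturing it, so that line breaks become their own tokens and are printed back unchanged. The function name Get-ColoredSegments is hypothetical:

```powershell
function Get-ColoredSegments {
    param(
        [Parameter(Mandatory = $true)][string]$InputText,
        [Parameter(Mandatory = $true)][hashtable]$KeyColor
    )
    # Anchored alternation: a token matches only if it is exactly one of the keys
    $escaped = $KeyColor.Keys | ForEach-Object { [regex]::Escape($_) }
    $pattern = '^(' + ($escaped -join '|') + ')$'

    # The capture group makes -split keep the whitespace runs as tokens
    foreach ($token in ($InputText -split '(\s+)')) {
        if ($token -eq '') { continue }
        $color = $null
        if ($token -match $pattern) { $color = $KeyColor[$Matches[1]] }
        [pscustomobject]@{ Text = $token; Color = $color }
    }
}

$w = "This color is Yellow: test`nIs it correct ?"
$find = @{ is = 'Cyan'; test = 'Yellow'; correct = 'Green' }

foreach ($segment in Get-ColoredSegments -InputText $w -KeyColor $find) {
    if ($segment.Color) {
        Write-Host $segment.Text -ForegroundColor $segment.Color -NoNewline
    }
    else {
        Write-Host $segment.Text -NoNewline
    }
}
```

Because the whole token has to equal a key, the length comparison is no longer needed, and "This" is not colored even though it contains "is".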
The files on my computer are named like so:
quant-ph9501001
math9901001
cond-mat0001001
hep-lat0308001
gr-qc0703001
but in the HTTP links the names include a / character:
quant-ph/9501001
math/9901001
cond-mat/0001001
hep-lat/0308001
gr-qc/0703001
I can't rename my files from quant-ph9501001 to quant-ph/9501001 because / is an illegal character in filenames, so my script can't map the local names to the online ones.
My filenames follow this pattern:
letters + 8 digits
letters + '-' + letters + 8 digits
I can change quant-ph9501001 to quant-ph_9501001, but I need the script to treat the missing character in the filename as if it were a / (slash).
So if I have strings like
gr-qc0701001
gr-qc_0701001
they should be read in this form:
quant-ph/9501001
My script doesn't work (no parsing) for gr-qc/0701001 because I can't rename the files using an illegal character. The error is 404:
iwr : The remote server returned an error: (404) Not Found.
If the script worked correctly, PowerShell would return this string:
General Relativity and Quantum Cosmology (gr-qc)
and the filename would be
Spectral Broadening of Radiation from Relativistic Collapsing Objects
My script is:
$list1 = @"
quant-ph9802001
quant-ph9802004
"@
$list2 = @"
quant-ph/9802001
quant-ph/9802004
"@
Write-Output "Adding forward slashes"
$list1 -split "`r`n" | % {
$item = $_.Trim()
$newItem = $item -replace '(.*)(\d{7})', '$1/$2'
Write-Output $("{0} ==> {1}" -f $item, $newItem)
}
Write-Output "Removing forward slashes"
$list2 -split "`r`n" | % {
$item = $_.Trim()
$newItem = $item -replace '(.*)/(\d{7})', '$1$2'
Write-Output $("{0} ==> {1}" -f $item, $newItem)
}
Function Clean-InvalidFileNameChars {
param(
[Parameter(Mandatory=$true,
Position=0,
ValueFromPipeline=$true,
ValueFromPipelineByPropertyName=$true)]
[String]$Name
)
$invalidChars = [IO.Path]::GetInvalidFileNameChars() -join ''
$re = "[{0}]" -f [RegEx]::Escape($invalidChars)
$res=($Name -replace $re)
return $res.Substring(0, [math]::Min(260, $res.Length))
}
Function Clean-InvalidPathChars {
param(
[Parameter(Mandatory=$true,
Position=0,
ValueFromPipeline=$true,
ValueFromPipelineByPropertyName=$true)]
[String]$Name
)
$invalidChars = [IO.Path]::GetInvalidPathChars() -join ''
$re = "[{0}]" -f [RegEx]::Escape($invalidChars)
$res=($Name -replace $re)
return $res.Substring(0, [math]::Min(248, $res.Length))
}
$rootpath="c:\temp2"
$rootpathresult="c:\tempresult"
$template=@'
[3] arXiv:1611.00057 [pdf, ps, other]
Title: {title*:Holomorphy of adjoint $L$ functions for quasisplit A2}
Authors: Joseph Hundley
Comments: 18 pages
Subjects: {subject:Number Theory (math.NT)}
[4] arXiv:1611.00066 [pdf, other]
Title: {title*:Many Haken Heegaard splittings}
Authors: Alessandro Sisto
Comments: 12 pages, 3 figures
Subjects: {subject:Geometric Topology (math.GT)}
[5] arXiv:1611.00067 [pdf, ps, other]
Title: {title*:Subsumed homoclinic connections and infinitely many coexisting attractors in piecewise-linear maps}
Authors: David J.W. Simpson, Christopher P. Tuffley
Subjects: {subject:Dynamical Systems (math.DS)}
[21] arXiv:1611.00114 [pdf, ps, other]
Title: {title*:Faces of highest weight modules and the universal Weyl polyhedron}
Authors: Gurbir Dhillon, Apoorva Khare
Comments: We recall preliminaries and results from the companion paper arXiv:1606.09640
Subjects: {subject:Representation Theory (math.RT)}; Combinatorics (math.CO); Metric Geometry (math.MG)
'@
#extract utils data and clean
$listbook=gci $rootpath -File -filter *.pdf | foreach { New-Object psobject -Property @{file=$_.fullname; books= ((iwr "https://arxiv.org/abs/$($_.BaseName)").ParsedHtml.body.outerText | ConvertFrom-String -TemplateContent $template)}} | select file -ExpandProperty books | select file, @{N="Subject";E={Clean-InvalidPathChars $_.subject}}, @{N="Title";E={Clean-InvalidFileNameChars $_.title}}
#build dirs and copy+rename file
$listbook | %{$newpath="$rootpathresult\$($_.subject)"; New-Item -ItemType Directory -Path "$newpath" -Force; Copy-Item $_.file "$newpath\$($_.title).pdf" -Force}
EDIT: The error is still 404 after Kori Gill's answer:
http://i.imgur.com/ZOZyMad.png
The problem is the difference between the local filenames and the online filenames. I need to add the illegal character to the local filenames temporarily, in memory; otherwise the script doesn't work.
I can't say I totally understand your question, but it sounds like you just need to convert these names to and from a format that has a forward slash. You mention 8 digits, but your examples have 7. You can adjust as needed.
I think something like this will help you...
$list1 = @"
quant-ph9501001
math9901001
cond-mat0001001
hep-lat0308001
gr-qc0703001
"@
$list2 = @"
quant-ph/9501001
math/9901001
cond-mat/0001001
hep-lat/0308001
gr-qc/0703001
"@
Write-Output "Adding forward slashes"
$list1 -split "`r`n" | % {
$item = $_.Trim()
$newItem = $item -replace '(.*)(\d{7})', '$1/$2'
Write-Output $("{0} ==> {1}" -f $item, $newItem)
}
Write-Output "Removing forward slashes"
$list2 -split "`r`n" | % {
$item = $_.Trim()
$newItem = $item -replace '(.*)/(\d{7})', '$1$2'
Write-Output $("{0} ==> {1}" -f $item, $newItem)
}
Outputs:
Adding forward slashes
quant-ph9501001 ==> quant-ph/9501001
math9901001 ==> math/9901001
cond-mat0001001 ==> cond-mat/0001001
hep-lat0308001 ==> hep-lat/0308001
gr-qc0703001 ==> gr-qc/0703001
Removing forward slashes
quant-ph/9501001 ==> quant-ph9501001
math/9901001 ==> math9901001
cond-mat/0001001 ==> cond-mat0001001
hep-lat/0308001 ==> hep-lat0308001
gr-qc/0703001 ==> gr-qc0703001
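The slash never has to exist on disk; it can be added in memory at the moment the URL is built. Here is a minimal sketch, assuming local base names of the form category, optional underscore, 7 digits. The function name ConvertTo-ArxivId is hypothetical:

```powershell
function ConvertTo-ArxivId {
    param([Parameter(Mandatory = $true)][string]$BaseName)
    # Letters and hyphens for the category, an optional underscore,
    # then exactly 7 digits; anything else is returned unchanged
    $BaseName -replace '^([a-z-]+)_?(\d{7})$', '$1/$2'
}

foreach ($name in 'gr-qc0701001', 'gr-qc_0701001', 'math9901001') {
    $id = ConvertTo-ArxivId -BaseName $name
    "{0} ==> https://arxiv.org/abs/{1}" -f $name, $id
}
```

In the question's pipeline, the request could then use "https://arxiv.org/abs/$(ConvertTo-ArxivId $_.BaseName)" in place of the raw base name. A name such as 1611.00057 does not match the pattern and passes through untouched.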
I am currently working on editing one line of a text file. When I try to overwrite the text file, I only get one line back in the text file. I am trying to call the function with
modifyconfig "test" "100"
config.txt:
check=0
test=1
modifyConfig() function:
Function modifyConfig ([string]$key, [int]$value){
$path = "D:\RenameScript\config.txt"
((Get-Content $path) | ForEach-Object {
Write-Host $_
# If '=' is found, check key
if ($_.Contains("=")){
# If key matches, replace old value with new value and break out of loop
$pos = $_.IndexOf("=")
$checkKey = $_.Substring(0, $pos)
if ($checkKey -eq $key){
$oldValue = $_.Substring($pos+1)
Write-Host 'Key: ' $checkKey
Write-Host 'Old Value: ' $oldValue
$_.replace($oldValue,$value)
Write-Host "Result:" $_
}
} else {
# Do nothing
}
}) | Set-Content ($path)
}
The result I receive in my config.txt:
test=100
I am missing "check=0".
What have I missed?
$_.replace($oldValue,$value) in your innermost conditional replaces $oldValue with $value and outputs the modified string, but no code outputs the non-matching lines. Because of that, only the modified line is written back to $path.
Replace the line
# Do nothing
with
$_
and also add an else branch with a $_ to the inner conditional.
Or you could assign $_ to another variable and modify your code like this:
Foreach-Object {
$line = $_
if ($line -like "*=*") {
$arr = $line -split "=", 2
if ($arr[0].Trim() -eq $key) {
$arr[1] = $value
$line = $arr -join "="
}
}
$line
}
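Put together as a complete function, the approach might look like the sketch below. The name Set-ConfigValue and the -Path parameter are assumptions added so it can be tried on any file; the original function hard-codes the path:

```powershell
function Set-ConfigValue {
    param(
        [Parameter(Mandatory = $true)][string]$Path,
        [Parameter(Mandatory = $true)][string]$Key,
        [Parameter(Mandatory = $true)][string]$Value
    )
    # The parentheses make Get-Content read the whole file before
    # Set-Content starts rewriting that same file
    (Get-Content -Path $Path) | ForEach-Object {
        # Split into key and value at the first '=' only
        $parts = $_ -split '=', 2
        if ($parts.Count -eq 2 -and $parts[0].Trim() -eq $Key) {
            "$($parts[0])=$Value"
        }
        else {
            # Every other line is emitted unchanged, so nothing is lost
            $_
        }
    } | Set-Content -Path $Path
}
```

Calling Set-ConfigValue -Path D:\RenameScript\config.txt -Key test -Value 100 should leave check=0 in place and change only the test line.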
Or as a one-liner (not an exact answer to this question, but it fits the question title):
(get-content $influxconf | foreach-object {$_ -replace "# auth-enabled = false" , "auth-enabled = true" }) | Set-Content $influxconf