How to manipulate text in first column of CSV file with script

I have a CSV file with multiple columns of information. I need to remove the opening and closing " around the Employee Name, as well as the comma inside it, as shown below.
Employee Name,Employee #,column3, column4 etc.
"Lastname, Firstname",123,abc,xyz
"Lastname, Firstname",123,abc,xyz
Result:
Employee Name,Employee #,column3, column4 etc.
Lastname Firstname,123,abc,xyz
Lastname Firstname,123,abc,xyz
I tried the following PowerShell script:
(gc C:\pathtocsv.csv) | % {$_ -replace '"', ""} | out-file C:\pathtocsv.csv -Fo -En ascii
This only removes the quotes around Lastname, Firstname; the comma is still present when I open the CSV file in a text editor. I need this format to send the data to another company. Everything else I have tried removes every comma. I am a novice in PowerShell and other languages, and I am sure this is an easy fix. Please help!

PowerShell has a lot of built-in handling for CSV files. Instead of trying to treat it as a text file, you can use the following to remove just the comma you want:
Import-Csv .\a.csv | % {
$_."Employee Name" = ($_."Employee Name" -replace ',','')
$_ #return modified rows
} | Export-Csv .\b.csv -notype -delim ','
This will export everything with double quotes by default, so you may need to go back and run something like:
(gc .\b.csv -raw) -replace '"','' | Out-File .\c.csv
to also remove all the double quotes.
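Both steps can also be tried in memory before touching real files. The following is only a sketch: the sample rows come from the question (with the header shortened to four columns), and the variable names `$csvText` and `$result` are made up for illustration. Stripping every quote at the end is only safe when no remaining field contains a comma.

```powershell
# Sample data from the question, held in a here-string instead of a file.
$csvText = @'
Employee Name,Employee #,column3,column4
"Lastname, Firstname",123,abc,xyz
"Lastname, Firstname",123,abc,xyz
'@

$result = $csvText | ConvertFrom-Csv | ForEach-Object {
    # Remove the comma inside the name field only.
    $_.'Employee Name' = $_.'Employee Name' -replace ',', ''
    $_
} | ConvertTo-Csv -NoTypeInformation |
    ForEach-Object { $_ -replace '"', '' }   # strip the quotes that ConvertTo-Csv adds

$result
```

The last row printed should read `Lastname Firstname,123,abc,xyz`, which is the format the question asks for.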

Warning: quotes are important if the text contains special characters (e.g. comma, quote).
If you really want to strip them, you can process your CSV as a regular text file:
#sample data
@'
"Lastname, Firstname",123,abc,xyz
"Lastname, Firstname",123,abc,xyz
'@ | out-file c:\temp\test.csv
Get-Content c:\temp\test.csv | % {
$match = [Regex]::Match($_,'"([^,]*), ([^"]*)"(.*)')
if ($match.Success) {
$match.Groups[1].Value+' '+$match.Groups[2].Value+$match.Groups[3].Value
} else {
$_ #pass the line through unchanged if it does not match the pattern
}
}

Related

Powershell multiple string replacement using while cycle

I am trying to solve a somewhat weird problem: I need to replace strings within a raw content by strings from the same content that meet a certain matching criteria. The input data look like this:
apple-beta
apple-alpha_orange-beta
apple-alpha_orange-alpha_cherry-beta
apple-alpha_orange-alpha_kiwi-beta
apple-alpha_orange-alpha_mango-beta
abcd-alpha_efgh-beta
abcd-alpha_efgh-alpha_ijkl-beta
abcd-alpha_efgh-alpha_mnop-beta
The replacement should work as follows: look for all "-beta" strings in the content and delete all corresponding "-alpha" strings (e.g. because there is already an "orange-beta", all "orange-alpha" should be deleted; because there is already an "apple-beta", all "apple-alpha" should be deleted, etc.). The result would look like this:
apple-beta
_orange-beta
__cherry-beta
__kiwi-beta
__mango-beta
abcd-alpha_efgh-beta
abcd-alpha__ijkl-beta
abcd-alpha__mnop-beta
I have tried to achieve this with a number of awkward single replacements and temporary file storages as well as with a while-construction that doesn't work at all:
$whileinput = get-content -raw C:\content-input.txt
while ($whileinput -match "\w+-beta") {
$fullval = $whileinput -match "\w+-beta" -replace "-beta","-alpha"
$whileinput = $whileinput -replace '$fullval',''
}
Any help is very appreciated!
Daniel

I would find all your beta items. Then replace the corresponding alpha items.
$data = Get-Content C:\content-input.txt
$betas = ([regex]::Matches($data,'[^_]*?(?=-beta)').Value -ne '' | Foreach-Object {
[regex]::Escape($_)} ) -join '|'
$data -replace "($betas)-alpha"
Explanation:
[regex]::Matches().Value returns only the matched texts.
[^_]*? lazily matches consecutive characters that are not _. (?=-beta) is a positive lookahead for the text -beta but doesn't include the text in the match.
-ne '' is to filter out blank output.
[regex]::Escape() is not necessarily needed in this case. But it is good practice when your text may have special regex characters that you want to match literally.
$betas contains | delimited items because | is the regex OR. Using () to surround the $betas string allows one of those words to be fully matched before matching -alpha in the replacement.
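To see the idea work without a file, here is a small in-memory sketch using some of the question's lines. It is a variation rather than the exact code above: it collects the beta names line by line, and it adds a `(?<![^_])` lookbehind (my own assumption about what is wanted) so that a name only matches at the start of a line or right after an underscore. The variable names are illustrative.

```powershell
# Sample lines taken from the question.
$data = @(
    'apple-beta'
    'apple-alpha_orange-beta'
    'apple-alpha_orange-alpha_cherry-beta'
    'abcd-alpha_efgh-beta'
    'abcd-alpha_efgh-alpha_ijkl-beta'
)

# Collect every name that appears with a -beta suffix.
$betas = foreach ($line in $data) {
    foreach ($m in [regex]::Matches($line, '[^_]+(?=-beta)')) { $m.Value }
}
$betas = $betas | Sort-Object -Unique

# Build one alternation. (?<![^_]) requires the name to begin at the start of
# the line or right after an underscore, so a hypothetical "pineapple-alpha"
# would not be hit by "apple".
$alternation = ($betas | ForEach-Object { [regex]::Escape($_) }) -join '|'
$pattern = "(?<![^_])(?:$alternation)-alpha"

$result = $data -replace $pattern, ''
$result
```

Lines whose alpha name has no matching beta anywhere in the input (such as `abcd-alpha`) are left alone, which matches the expected result in the question.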

Get-Content gets the entire contents of a file into a variable, so if anything in your file matches that pattern, it'll loop infinitely (because the contents of the file always match your pattern).
PowerShell is heavily based around the concept of the "pipeline" which you can use in conjunction with the Foreach-Object cmdlet to iterate over each line in a file.
I'm not quite clear on what you want the regexes to do, but I don't think the ones you have will do what you want. Try this.
Get-Content C:\content-input.txt | Foreach-Object {
    # without -Raw, Get-Content emits one line at a time
    if ($_ -match 'beta$') {
        $_ -replace '\w+-alpha',''
    }
} | Out-File .\path-to-output.txt
$_ is the default "pipeline variable" aka the current item in the iteration - in this case the current line. Now at least your loop is working!

Accelerate Powershell script runtime

I'm using a PowerShell script that converts a specific log format to a tab- or comma-separated (CSV) format. It looks like this:
$filename = "filename.log"
foreach ($line in [System.IO.File]::ReadLines($filename)) {
$x = [regex]::Split( $line , 'regex')
$xx = $x -join ","
$xx >> Results.csv
}
And it works fine, but for a 20MB log file it takes almost 20 min to be converted! Is there a way to accelerate it?
My System: CPU: Corei7 3720QM / RAM: 8GB
Update: The log format is like this:
192.168.1.5:24652 172.16.30.8:80 http://www.example.com "useragent"
I want destination format to be:
192.168.1.5,24652,172.16.30.8,80,http://www.example.com,"useragent"
REGEX: ^([\d\.]+):(\d+)\s+([\d\.]+):(\d+)\s+([^ ]*)\s+(\".*\")$

As Lieven Keersmaekers points out, you can do a single -replace operation to do the work.
Additionally, foreach($thing in $o.GetThings()){} will initially block until GetThings() returns and then store the entire result in memory, which you have no need for. You can avoid this by using the pipeline instead.
Finally, your regex can be simplified so that the engine doesn't have to parse the entire string before splitting, by matching on either : preceded by a digit or whitespace:
Get-Content filename.log |ForEach-Object {
$_ -replace '(?:(?<=\d)\:|\s+)',','
} |Out-File results.csv
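As a quick check, the same replacement can be applied to the sample log line from the question. This is just a sketch; `$line` and `$converted` are illustrative names. Note that the pattern would also split a user agent that contains spaces or a digit followed by a colon, so it only fits logs shaped like the sample.

```powershell
# Sample log line from the question.
$line = '192.168.1.5:24652 172.16.30.8:80 http://www.example.com "useragent"'

# Replace a colon that follows a digit, or any run of whitespace, with a comma.
# The colon in "http:" follows a letter, so it is left alone.
$converted = $line -replace '(?:(?<=\d):|\s+)', ','
$converted
```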

Multi-Line String to Single-Line String conversion in PowerShell

I have a text file that has multiple 'chunks' of text. These chunks have multiple lines and are separated with a blank line, e.g.:
This is an example line
This is an example line
This is an example line

This is another example line
This is another example line
This is another example line
I need these chunks to be in single-line format e.g.
This is an example lineThis is an example lineThis is an example line
This is another example lineThis is another example lineThis is another example line
I have researched this thoroughly and have only found ways of making whole text files single-line. I need a way (preferably in a loop) of making an array of string chunks single-line. Is there any way of achieving this?
EDIT:
I have edited the example content to make it a little clearer.

# create a temp file that looks like your content
# add the A,B,C,etc to each line so we can see them being joined later
"Axxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Bxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Cxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Dxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Exxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Fxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Gxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Hxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Ixxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" | Set-Content -Path "$($env:TEMP)\JoinChunks.txt"
# read the file content as one big chunk of text (rather than an array of lines)
$textChunk = Get-Content -Path "$($env:TEMP)\JoinChunks.txt" -Raw
# split the text into an array of chunks
# the regex "(?:\r*\n){2,}" means 'split wherever there are two or more consecutive line breaks'
# the group is non-capturing so that -split does not return the separators as extra array items
$chunksToJoin = $textChunk -split "(?:\r*\n){2,}"
# remove line breaks within each chunk and output the contents
$chunksToJoin -replace '\r*\n', ''
# one-line equivalent of the above
((Get-Content -Path "$($env:TEMP)\JoinChunks.txt" -Raw) -split "(?:\r*\n){2,}") -replace '\r*\n', ''

A bit of a fudge:
[String] $strText = [System.IO.File]::ReadAllText( "c:\temp\test.txt" );
[String[]] $arrLines = ($strText -split "`r`n`r`n").replace("`r`n", "" );
This relies on the file having Windows CRLFs.

There are several ways to approach a task like that. One is to use a regular expression replacement with a negative lookahead assertion:
(Get-Content 'C:\path\to\input.txt' | Out-String) -replace "`r?`n(?!`r?`n)" |
Set-Content 'C:\path\to\output.txt'
You could also work with a StreamReader and StreamWriter:
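$reader = New-Object IO.StreamReader 'C:\path\to\input.txt'
$writer = New-Object IO.StreamWriter 'C:\path\to\output.txt'
# Peek() returns -1 at the end of the stream
while ($reader.Peek() -ge 0) {
    $line = $reader.ReadLine()
    if ($line.Trim() -ne '') {
        $writer.Write($line)
    } else {
        $writer.WriteLine()
    }
}
# close both streams so buffered output is flushed to disk
$reader.Close()
$writer.Close()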
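Since the question asked for a loop that yields an array of single-line chunks, here is one possible sketch of that. It works on an in-memory array of lines (hypothetical sample data; in practice the lines could come from Get-Content), and the names `$lines`, `$chunks` and `$current` are made up for illustration.

```powershell
# Hypothetical in-memory input with one blank line between the two chunks.
$lines = @(
    'This is an example line'
    'This is an example line'
    ''
    'This is another example line'
    'This is another example line'
)

$chunks = New-Object System.Collections.Generic.List[string]
$current = ''
foreach ($line in $lines) {
    if ($line.Trim() -eq '') {
        # A blank line ends the current chunk; repeated blank lines are ignored.
        if ($current -ne '') { $chunks.Add($current); $current = '' }
    } else {
        $current += $line
    }
}
# Flush the last chunk if the input does not end with a blank line.
if ($current -ne '') { $chunks.Add($current) }

$chunks
```

Each element of `$chunks` is then one joined chunk, so it can be indexed or piped like any other collection.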

Parsing CSV in Powershell from a string?

I'm reading a file with Get-Content and making some modifications to it. After that I have text with the structure of a CSV, which I process further. If I do an Out-File and then Import-CSV, it works fine, but I want to get rid of this overhead and parse the CSV directly from my edited string.
The problem I'm facing is, that my CSV fields are delimited by commas, and are enclosed in quotation marks, BUT, some of the columns in my CSV contain multiline strings, with commas in them, but those commas are not delimiters.
I was trying to do
$mycsvcontent | ConvertFrom-CSV
and
$mycsvcontent | ConvertFrom-CSV -Header "Column1","Column2","etc..."
But because of the "not delim commas", the structure of the CSV is parsed incorrectly.
How can I achieve this?

Commas within fields (enclosed in quotation marks) should not be a problem, if your $mycsvcontent is an appropriate object.
$mycsvcontent = {"a,a","b","c,c"}
$mycsvcontent | ConvertFrom-CSV -Header "Column1","Column2","etc..."
returns
Column1 Column2 etc...
------- ------- ------
a,a b c,c
If it is just an array, try the following:
$mycsvcontent = @("a,a","b","c,c")
$f = '"' + $($mycsvcontent -join '","') + '"'
$f | ConvertFrom-CSV -Header "Column1","Column2","etc..."
The result is the same
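For comparison, here is a sketch that feeds ConvertFrom-Csv a plain string holding a header row and quoted fields, without writing a file first. The sample data and the names `$csvText` and `$rows` are invented for illustration. Only commas inside quoted fields are shown here.

```powershell
# Hypothetical CSV text with commas inside quoted fields.
$csvText = @'
"Name","Note"
"Doe, Jane","likes a, b"
"Roe, Rick","plain"
'@

# @() keeps the result an array even if there is only one row.
$rows = @($csvText | ConvertFrom-Csv)

$rows[0].Name
$rows[0].Note
```

The quoted commas stay inside their fields, so each row has exactly the two properties named in the header.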

PowerShell to remove text from a string

What is the best way to remove all text in a string up to a specific character (in my case "=") and after another character (in my case ","), but keep the text between?
Sample input
=keep this,

Another way to do this is with operator -replace.
$TestString = "test=keep this, but not this."
$NewString = $TestString -replace ".*=" -replace ",.*"
.*= means any number of characters up to and including an equals sign.
,.* means a comma followed by any number of characters.
Since you are basically deleting those two parts of the string, you don't have to specify an empty string with which to replace them. You can use multiple -replaces, but just remember that the order is left-to-right.

$a="some text =keep this,but not this"
$a.split('=')[1].split(',')[0]
returns
keep this

This should do what you want:
C:\PS> if ('=keep this,' -match '=([^,]*)') { $matches[1] }
keep this

This is really old, but I wanted to add my slight variation for anyone else who may stumble across this. Regular expressions are powerful things.
To keep the text which falls between the equal sign and the comma:
-replace "^.*?=(.*?),.*?$",'$1'
This regular expression starts at the beginning of the line, wipes all characters until the first equal sign, captures every character until the next comma, then wipes every character until the end of the line. It then replaces the entire line with the capture group (anything within the parentheses). It will match any line that contains at least one equal sign followed by at least one comma. It is similar to the suggestion by Trix, but unlike that suggestion, this will not match lines which only contain either an equal sign or a comma, it must have both in order.
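A short sketch of that behaviour on a few made-up strings (the sample strings and variable names are illustrative only): the first contains both an equal sign and a later comma, the other two contain only one of them and should pass through unchanged.

```powershell
$pattern = '^.*?=(.*?),.*?$'

$samples = @(
    'test=keep this, but not this.'
    'only=an equals sign'
    'only a comma, here'
)

# -replace applied to an array processes each element in turn.
$results = $samples -replace $pattern, '$1'
$results
```

Because `-replace` returns the input untouched when the pattern does not match, no separate `-match` check is needed to protect the other lines.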

I referenced @benjamin-hubbard's answer above to parse the output of dnscmd for A records and generate a PHP "dictionary" (key-value pairs) of IPs and hostnames. I strung multiple -replace operators together to replace text with nothing or with a tab, to format the data for the PHP file.
$DnsDataClean = $DnsData `
-match "^[a-zA-Z0-9].+\sA\s.+" `
-replace "172\.30\.","`$P." `
-replace "\[.*\] " `
-replace "\s[0-9]+\sA\s","`t"
$DnsDataTable = ( $DnsDataClean | `
ForEach-Object {
$HostName = ($_ -split "\t")[0] ;
$IpAddress = ($_ -split "\t")[1] ;
"`t`"$IpAddress`"`t=>`t'$HostName', `n" ;
} | sort ) + "`t`"`$P.255.255`"`t=>`t'None'"
"<?php
`$P = '10.213';
`$IpHostArr = [`n`n$DnsDataTable`n];
?>" | Out-File -Encoding ASCII -FilePath IpHostLookups.php
Get-Content IpHostLookups.php
