How to remove all characters from multiple filenames after specified characters - Powershell

Forgive me if I explain poorly, I'm very new to PowerShell.
I'm trying to clean up my media files by removing all characters after a specified string from the names of multiple files in a directory and all of its subdirectories.
The filename length will not be consistent, but the file types will. The extension must also be kept, not removed.
So the files would look something like this:
TVshow S01 E01 Title of episode.mp4
LongerTVShow S03 E01 Title of episode.mp4
I want to remove everything after E01, while keeping E01
Result:
TVshow S01 E01.mp4
LongerTVShow S03 E01.mp4
I currently have a few other lines that are cleaning out characters I specify, for example finding periods and replacing them with spaces:
get-childitem -recurse | dir -Filter *.mp4 | Rename-Item -NewName { $_.BaseName.replace('.',' ') + $_.Extension }
That works well, as it will apply to all files in the directory. But you need to specify the character to replace.
I was then just going to use multiple instances of the command for E01, E02, E03 etc., in the same way I remove multiple strings in the code below:
get-childitem -recurse | dir -Filter *.txt | Rename-Item -NewName { $_.BaseName.replace('1080p','').replace('720p','').replace('HD','') + $_.Extension }
I was hoping to use something along the same lines, I've seen suggestions for trim or splitting but I can't seem to figure it out and I haven't been able to find anything.
Thanks for any answers!
Edit
I used the code by AdminOfThings and added that into what I have.
get-childitem -recurse |dir -Filter *.mp4 | Rename-Item -NewName { ($_.BaseName -creplace '(?<=S\d+ E\d+)\D.*') + $_.Extension }
So if anyone needs something like this in future, this will rename any .mp4 files in the directory and all sub directories it's run in. Specifically anything after E01, E02, E03 etc. Resulting in the following:
TvShow S01 E04 title_of_show.mp4
TvShow S08 E03Title_of_show.mp4
into:
TvShow S01 E04.mp4
TvShow S08 E03.mp4
Very specific but someone may find this useful.

You can do the following:
Get-ChildItem -Recurse -Filter *.txt -File |
Rename-Item -NewName { ($_.BaseName -creplace '(?<=S\d+ E\d+)\D.*') + $_.Extension }
Explanation:
The -creplace operator performs a case-sensitive regex match (-replace is case-insensitive) and then a string replacement. If no replacement string is supplied, the matched text is simply replaced with an empty string.
The regex string is (?<=S\d+ E\d+)\D.*.
(?<=) is a positive lookbehind mechanism. This means that the current position in the string must have previous characters that match the lookbehind expression.
S\d+ matches literal s or S and one or more (+) digits (\d). In our case, only S is matched because we are using -creplace. The space after \d+ is a literal space.
E\d+ matches literal E and one or more digits.
\D matches a non-digit character. This is needed so that the lookbehind \d+ won't give back any digits. It allows us to be currently on a non-digit and know that we matched all previous digits.
.* matches any characters greedily until the end of the string.
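To see what the pattern does without touching any files, the replacement can be tried on plain strings first. This is only a minimal sketch: the sample names below are made up for illustration and stand in for $_.BaseName.

```powershell
# Hypothetical base names (no extension), standing in for $_.BaseName.
$names = @(
    'TVshow S01 E01 Title of episode'
    'TvShow S08 E03Title_of_show'
    'TVshow S01 E01'                  # already clean: nothing follows the episode number
)

# Remove everything after the "Sxx Exx" marker.
# Names with no trailing text do not match and are left unchanged.
$trimmed = $names | ForEach-Object { $_ -creplace '(?<=S\d+ E\d+)\D.*' }
$trimmed
```

On real files, adding -WhatIf to Rename-Item gives a similar dry run: it reports the renames it would perform without carrying them out.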

This is long, but how about this:
dir -Filter *.mp4 | Rename-Item -NewName { $_.BaseName.Split(' ')[0] + ' ' + $_.BaseName.Split(' ')[1] + ' ' + $_.BaseName.Split(' ')[2] + $_.Extension }
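The same idea can probably be written a little shorter with the -split operator and an index range. A sketch, assuming (as the line above does) that the show name is a single word, so the first three space-separated tokens are always name, season and episode; the sample name is made up.

```powershell
# Hypothetical base name; assumes the show name itself contains no spaces.
$baseName = 'LongerTVShow S03 E01 Title of episode'

# Keep the first three space-separated tokens and join them back together.
$short = ($baseName -split ' ')[0..2] -join ' '
$short
```

A show name with spaces in it would break this, which is where the regex approach is more robust.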

Related

Powershell multiple string replacement using while cycle

I am trying to solve a somewhat weird problem: I need to replace strings within a raw content by strings from the same content that meet a certain matching criteria. The input data look like this:
apple-beta
apple-alpha_orange-beta
apple-alpha_orange-alpha_cherry-beta
apple-alpha_orange-alpha_kiwi-beta
apple-alpha_orange-alpha_mango-beta
abcd-alpha_efgh-beta
abcd-alpha_efgh-alpha_ijkl-beta
abcd-alpha_efgh-alpha_mnop-beta
The replacement should work as follows: look for all "-beta" strings in the content and delete all corresponding "-alpha" strings (e.g. because there is "orange-beta" already, all "orange-alpha" should be deleted; because there is "apple-beta" already, all "apple-alpha" should be deleted, etc.). The result would look like this:
apple-beta
_orange-beta
__cherry-beta
__kiwi-beta
__mango-beta
abcd-alpha_efgh-beta
abcd-alpha__ijkl-beta
abcd-alpha__mnop-beta
I have tried to achieve this with a number of awkward single replacements and temporary file storages as well as with a while-construction that doesn't work at all:
$whileinput = get-content -raw C:\content-input.txt
while ($whileinput -match "\w+-beta") {
$fullval = $whileinput -match "\w+-beta" -replace "-beta","-alpha"
$whileinput = $whileinput -replace '$fullval',''
}
Any help is very appreciated!
Daniel

I would find all your beta items. Then replace the corresponding alpha items.
$data = Get-Content C:\content-input.txt
$betas = ([regex]::Matches($data,'[^_]*?(?=-beta)').Value -ne '' | Foreach-Object {
[regex]::Escape($_)} ) -join '|'
$data -replace "($betas)-alpha"
Explanation:
[regex]::Matches().Value returns only the matched texts.
[^_]*? lazily matches consecutive characters that are not _. (?=-beta) is a positive lookahead for the text -beta but doesn't include the text in the match.
-ne '' is to filter out blank output.
[regex]::Escape() is not necessarily needed in this case. But it is good practice when your text may have special regex characters that you want to match literally.
$betas contains | delimited items because | is the regex OR. Using () to surround the $betas string allows one of those words to be fully matched before matching -alpha in the replacement.
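The steps above can be tried on inline data instead of a file. This is a hedged variant rather than the exact code from the answer: the sample array is a shortened copy of the question's input, the lines are joined with newlines before matching, and the pattern is tightened to [^_\n]+ so that no empty matches need to be filtered out.

```powershell
# Shortened copy of the question's sample input (assumption: one entry per line).
$data = @(
    'apple-beta'
    'apple-alpha_orange-beta'
    'apple-alpha_orange-alpha_cherry-beta'
    'abcd-alpha_efgh-beta'
    'abcd-alpha_efgh-alpha_ijkl-beta'
)

# Collect every name that appears directly before "-beta".
$names = [regex]::Matches(($data -join "`n"), '[^_\n]+(?=-beta)') |
    ForEach-Object { $_.Value } |
    Sort-Object -Unique

# Build an alternation such as (apple|cherry|...)-alpha, escaping each name.
$pattern = '(' + (($names | ForEach-Object { [regex]::Escape($_) }) -join '|') + ')-alpha'

# Delete the "-alpha" entries whose name also has a "-beta" entry.
$result = $data -replace $pattern
$result
```

Note that abcd-alpha survives, because there is no abcd-beta in the input.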

Get-Content gets the entire contents of a file into a variable, so if anything in your file matches that pattern, it'll loop infinitely (because the contents of the file always match your pattern).
PowerShell is heavily based around the concept of the "pipeline" which you can use in conjunction with the Foreach-Object cmdlet to iterate over each line in a file.
I'm not quite clear on what you want the regexes to do, but I don't think the ones you have will do what you want. Try this.
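$out = @() # start with an empty array so += appends lines instead of concatenating strings
Get-Content C:\content-input.txt | Foreach-Object { # without -Raw, each line is piped separately
if($_ -match 'beta$') {
$out+=$_ -replace '\w+-alpha',''
}
}
$out | Out-File .\path-to-output.txt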
$_ is the default "pipeline variable" aka the current item in the iteration - in this case the current line. Now at least your loop is working!

PowerShell, parsing, problem using a specific string with -match/-like and -split

I have a variable that is filled with several lines of text and I am trying to parse the data from it. Around the middle of the text is a specific string "Reference(s) :" and I need to get everything above this specific string. However, every way I have tried has failed.
I tried making it a delimiter
$Var.split("Reference(s) :")
I tried the below 2 options just to try to capture that one line (because if I can do this, then I know I can pull everything before it).
$Var.split("`n") | Where-Object {$_ -match "Reference(s) :"}
and
$Var.split("`n") | Where-Object {$_ -like "*Reference(s) :*"}
and I've tried some if statements (Where $_ is a single line of text)
If ($_ -like "*Reference(s) :*") {some logic}
I cannot just match "Reference" because that word appears elsewhere in the text....and I am needing this to process several instances of this text.
I think the problem has to do with the parenthesis, the space, and the colon (special characters). I did try preceding each special character with a ` but that did not seem to work.
Anyone have any ideas? There has to be a way to match special characters, I just haven't found it yet.

If $var is truly a single string, you can use -split at your reference point and then retrieve the first split string ([0]). This will retrieve everything from the start of the string until the split point.
($Var -split 'Reference\(s\) :')[0]
Since -split uses regex matching, you must backslash escape regex special characters to match them literally. Here ( and ) are special.
In the future, you can just process your match string using [regex]::Escape('String'), and it will do all the escaping for you.
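As a small illustration of that last point, here is a sketch that lets [regex]::Escape() build the pattern and then splits on it. The sample text is made up for the example.

```powershell
# Hypothetical multi-line input.
$text = "Line1`nLine2`nReference(s) :`nLine3"

# Escape the literal marker so ( and ) are not treated as a regex group.
$pattern = [regex]::Escape('Reference(s) :')

# Everything before the marker, with the trailing newline trimmed off.
$before = ($text -split $pattern)[0].TrimEnd()
$before
```

This avoids having to remember which characters are special in a regex.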
If you want the line just before the reference point, you can convert your string into an array. Then return the line above the matching line.
foreach ($line in ($Var -split '\r?\n')) {
if ($line -match 'Reference\(s\) :') {
$lastLine
} else {
$lastLine = $line
}
}

Apologies, I know this is very similar to AdminOfThings' good answer. I was already testing it, so I figured I'd post.
$text =
@"
Line1
Line2
Line3
Line4
Line5
Reference(s) :
Line5
Line7
Line8
Line9
Line10
"@
($text -Split "Reference\(s\) :")[0]
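This also works, but it's generally recommended to stay with the PowerShell native -split operator. One reason is that the overloads of the .NET Split() method differ between .NET versions: in Windows PowerShell a bare string argument is treated as a set of individual characters, which is why the explicit [String[]] cast is used here: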
$text.Split([String[]]"Reference(s) :", [StringSplitOptions]::None )[0]

Here it is using the where method. You can just use -eq instead of -match if that's the whole line.
get-content file
before1
before2
before3
Reference(s) :
after1
after2
after3
(get-content file).where({$_ -match [regex]::escape('Reference(s) :')},'Until')
before1
before2
before3

How to parse string in powershell

I have a Powershell command that outputs multiple lines.
I want to output only one line that contains the name of a .zip file.
Currently, all lines are returned when substring .zip is found:
$p.Start() | Out-Null
$p.WaitForExit()
$output = $p.StandardOutput.ReadToEnd()
$output += $p.StandardError.ReadToEnd()
foreach($line in $output)
{
if($line.Contains(".zip"))
{
$line
}
}

Since you're using .ReadToEnd(), $output receives a single, multi-line string, not an array of lines.
You must therefore split that string into individual lines yourself, using the -split operator.
You can then apply a string-comparison operator such as -match or -like directly to the array of lines to extract matching lines:
# Sample multi-line string.
$output = @'
line 1
foo.zip
another line
'@
$output -split '\r?\n' -match '\.zip' # -> 'foo.zip'
-split is regex-based, and regex \r?\n matches newlines (line breaks) of either variety (CRLF, as typical on Windows, as well as LF, as typical on Unix-like platforms).
-match is also regex-based, which is why the . in \.zip is \-escaped, given that . is a regex metacharacter (it matches any character other than LF by default).
Note that -match, like PowerShell in general, is case-insensitive by default, so both foo.zip and foo.ZIP would match, for instance;
if you do want case-sensitivity, use -cmatch.
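The difference between the two operators can be seen in a short sketch. The sample output string is invented for the example, and the pattern is anchored with $ here so that only lines ending in .zip qualify.

```powershell
# Hypothetical command output with CRLF line breaks and mixed-case extensions.
$output = "line 1`r`nfoo.zip`r`nBAR.ZIP`r`nzipper"

# Split into lines; \r?\n handles both CRLF and LF.
$lines = $output -split '\r?\n'

# Applied to an array, -match and -cmatch act as filters and return the matching elements.
$anyCase   = $lines -match  '\.zip$'   # case-insensitive
$exactCase = $lines -cmatch '\.zip$'   # case-sensitive
$anyCase
$exactCase
```

Here $anyCase holds both zip names, while $exactCase holds only the lowercase one; "zipper" is excluded by both because of the escaped dot.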
As an aside:
I wonder why you're running your command via a [System.Diagnostics.Process] instance, given that you seem to be invoking synchronously while capturing its standard streams.
PowerShell allows you to do that much more simply by direct invocation, optionally with redirection:
$output = ... 2>&1

How to manipulate text in first column of CSV file with script

Have a CSV file with multiple columns with information. Need to remove the opening and closing " in the Employee Name as well as the , as seen below.
Employee Name,Employee #,column3, column4 etc.
"Lastname, Firstname",123,abc,xyz
"Lastname, Firstname",123,abc,xyz
Result:
Employee Name,Employee #,column3, column4 etc.
Lastname Firstname,123,abc,xyz
Lastname Firstname,123,abc,xyz
Tried using the following Powershell script:
(gc C:\pathtocsv.csv) | % {$_ -replace '"', ""} | out-file C:\pathtocsv.csv -Fo -En ascii
This only removes the " " around Lastname , Firstname but the comma is still present when opening the csv file in a text editor. Need this format to send to data to another company. Everything I have tried removes every comma. Novice in powershell and other languages, I am sure this is an easy fix. Please help!

PowerShell has a lot of built-in handling for CSV files. Instead of treating the file as plain text, you can use the following to remove just the comma you want:
Import-Csv .\a.csv | % {
$_."Employee Name" = ($_."Employee Name" -replace ',','')
$_ #return modified rows
} | Export-Csv .\b.csv -notype -delim ','
By default this exports every field wrapped in double quotes, so you may need to go back and run something like:
(gc .\b.csv -raw) -replace '"','' | Out-File .\c.csv
to also remove all the double quotes.
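The same two steps can be tried entirely in memory with ConvertFrom-Csv and ConvertTo-Csv, which may be handy for checking the result before writing any files. A sketch under assumptions: the sample rows are a shortened version of the question's data, and stripping the quotes is only safe because no remaining field contains a comma.

```powershell
# Shortened sample data from the question (assumption: three columns only).
$csv = @'
Employee Name,Employee #,column3
"Lastname, Firstname",123,abc
'@

# Parse the text into objects and drop the comma inside the name column only.
$rows = @($csv | ConvertFrom-Csv)
foreach ($row in $rows) {
    $row.'Employee Name' = $row.'Employee Name' -replace ',', ''
}

# Serialize back to CSV lines, then strip the double quotes that get added.
$lines = ($rows | ConvertTo-Csv -NoTypeInformation) -replace '"', ''
$lines
```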

Warning: quotes are important if text contains special characters (i.e. comma, quote)
If you really want to strip lines, you can process your csv as regular text file:
#sample data
@'
"Lastname, Firstname",123,abc,xyz
"Lastname, Firstname",123,abc,xyz
'@ | out-file c:\temp\test.csv
Get-Content c:\temp\test.csv | % {
$match = [Regex]::Match($_,'"([^,]*), ([^"]*)"(.*)')
if ($match.Success) {
$match.Groups[1].Value+' '+$match.Groups[2].Value+$match.Groups[3].Value
} else {
$_ #emit the line unchanged if it does not match the pattern
}
}

PowerShell to remove text from a string

What is the best way to remove all text in a string around two specific characters, in my case everything up to an "=" and everything after a ",", while keeping the text between them?
Sample input
=keep this,

Another way to do this is with operator -replace.
$TestString = "test=keep this, but not this."
$NewString = $TestString -replace ".*=" -replace ",.*"
.*= means any number of characters up to and including an equals sign.
,.* means a comma followed by any number of characters.
Since you are basically deleting those two parts of the string, you don't have to specify an empty string with which to replace them. You can use multiple -replaces, but just remember that the order is left-to-right.
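One thing to be aware of is that .* is greedy, so .*= runs up to the last equals sign in the string. The sketch below contrasts that with a lazy, anchored variant; the input string is made up to contain two equals signs.

```powershell
# Hypothetical input with two equals signs.
$s = 'a=b=c,d'

# Greedy: .*= consumes everything up to the LAST equals sign.
$greedy = $s -replace '.*=' -replace ',.*'

# Lazy and anchored: ^.*?= stops at the FIRST equals sign.
$lazy = $s -replace '^.*?=' -replace ',.*'

$greedy   # c
$lazy     # b=c
```

For input with a single equals sign, such as the test string above, both forms give the same result.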

$a="some text =keep this,but not this"
$a.split('=')[1].split(',')[0]
returns
keep this

This should do what you want:
C:\PS> if ('=keep this,' -match '=([^,]*)') { $matches[1] }
keep this

This is really old, but I wanted to add my slight variation for anyone else who may stumble across this. Regular expressions are powerful things.
To keep the text which falls between the equal sign and the comma:
-replace "^.*?=(.*?),.*?$",'$1'
This regular expression starts at the beginning of the line, wipes all characters until the first equal sign, captures every character until the next comma, then wipes every character until the end of the line. It then replaces the entire line with the capture group (anything within the parentheses). It will match any line that contains at least one equal sign followed by at least one comma. It is similar to the suggestion by Trix, but unlike that suggestion, this will not match lines which only contain either an equal sign or a comma, it must have both in order.
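A quick way to check that behaviour is to run the pattern over a few strings at once. This is just a sketch with invented sample lines; note the single quotes around '$1', which stop PowerShell from expanding it as a variable.

```powershell
# Hypothetical sample lines: one with both delimiters, two without.
$samples = @(
    'test=keep this, but not this.'
    'no delimiters here'
    'only=equals'
)

# Lines with an equals sign followed by a comma are reduced to the captured text.
# Lines that do not match the whole pattern pass through unchanged.
$results = $samples -replace '^.*?=(.*?),.*?$', '$1'
$results
```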

I referenced @benjamin-hubbard's answer above to parse the output of dnscmd for A records, and generate a PHP "dictionary"/key-value pairs of IPs and Hostnames. I strung multiple -replace args together to replace text with nothing or tab to format the data for the PHP file.
$DnsDataClean = $DnsData `
-match "^[a-zA-Z0-9].+\sA\s.+" `
-replace "172\.30\.","`$P." `
-replace "\[.*\] " `
-replace "\s[0-9]+\sA\s","`t"
$DnsDataTable = ( $DnsDataClean | `
ForEach-Object {
$HostName = ($_ -split "\t")[0] ;
$IpAddress = ($_ -split "\t")[1] ;
"`t`"$IpAddress`"`t=>`t'$HostName', `n" ;
} | sort ) + "`t`"`$P.255.255`"`t=>`t'None'"
"<?php
`$P = '10.213';
`$IpHostArr = [`n`n$DnsDataTable`n];
?>" | Out-File -Encoding ASCII -FilePath IpHostLookups.php
Get-Content IpHostLookups.php
