Use Powershell to extract remainder of line after certain text - string

I have a text file with headlines from a LexisNexis Search. I would like to extract the headline from each entry, which comes after the string "HEADLINE: " in the file, and append it to another text file using PowerShell. I am using this line:
select-string -path "C:\Users\WGA\Documents\Personal\ANTH_5330\Content_Analysis\Newspaper_Stories,_Combined_Papers2016-04-18_17-59.txt" -Pattern "HEADLINE: " | select line | out-file C:\Users\WGA\Documents\Personal\ANTH_5330\Content_Analysis\Headlines.txt -append
It is sort of working and I am looking to improve the output. I am linking to the two files below (One is the file to be searched, the other is the output):
https://drive.google.com/folderview?id=0Byxg512qAqFgU0JrRTNUbVlkeGs&usp=sharing
I am open to suggestions to improve this output as, ideally, I would like one line per headline only in the output file.

Let's use the regex a little more to get exactly what we want and nothing more. Select-String returns MatchInfo objects that contain much of the information you are looking for, including capture groups. Knowing the object properties certainly helps. I am assuming you have PowerShell 2.0 for this, so it is a little more verbose but works just as well.
$path = "D:\Downloads\Newspaper_Stories,_Combined_Papers2016-04-18_17-59.TXT"
# Output location taken from the question; adjust as needed.
$outputFile = "C:\Users\WGA\Documents\Personal\ANTH_5330\Content_Analysis\Headlines.txt"
Get-Content $path | Out-String | Select-String -Pattern "(?smi)HeadLine: (.*?)`r`n`r`n" -AllMatches |
    Select-Object -ExpandProperty Matches |
    ForEach-Object { $_.Groups[1] } |
    ForEach-Object { $_.Value -replace "`r`n"," " } |
    Set-Content $outputFile
We read in the file as one large string; that is what Out-String is for. We do that because some of your headlines span multiple lines. The pattern finds every occurrence of "HEADLINE: " and grabs everything after the colon and space up to the first blank line (two consecutive newlines). The text we are looking for is inside the capture group (.*?). Next we expand the Matches property to get at the groups. The first ForEach-Object takes the second group, Groups[1], which holds the captured text (Groups[0] is the whole match). The second ForEach-Object replaces the newlines with spaces so that each headline appears on one line in the output.
I noticed that your output file had extra spaces. That is because the default encoding of Out-File is Unicode. Using Set-Content means you won't have to worry about that.
One more thing: if I am wrong and you prefer what you have, you can at least skip the header in your output file by changing the select statement to use -ExpandProperty.
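A minimal sketch of that change, assuming a small sample file created just for the demonstration (the temp file and its contents are made up, not the real LexisNexis export):

```powershell
# Hypothetical sample data standing in for the real export.
$sample = Join-Path ([IO.Path]::GetTempPath()) 'headline-sample.txt'
Set-Content -Path $sample -Value @(
    'HEADLINE: Charter Schools Fall Short In Public Schools Matchup',
    'BYLINE: Someone',
    'HEADLINE: Bills would bypass districts to create charter schools'
)

# "select line" emits objects with a Line property, so Out-File writes a
# "Line" column header; -ExpandProperty emits the bare strings instead.
$headlines = Select-String -Path $sample -Pattern 'HEADLINE: ' |
    Select-Object -ExpandProperty Line

$headlines
Remove-Item $sample
```

The strings still start with "HEADLINE: " and multi-line headlines are still cut at the first line, so this only removes the column header from the output.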
Sample Output
Charter Schools Fall Short In Public Schools Matchup
State's charter schools buck trend Students at the 108 charters in Colorado have scored higher on state assessment tests than their peers in traditional public schools.
Bills would bypass districts to create charter schools
EDITORIAL The reality of charter schools
EDITORIAL Learning more about charters As Colorado and the nation gain more experience with charter schools, we're discovering that results are mixed-- not unlike public schools.
SPEAK OUT;2 studies, 2 views of charter schools
... output truncated.

try this
Get-Content c:\temp\stories.txt | ? {$_.startswith('HEADLINE: ')} | % {$_.substring(10)} | Out-File c:\temp\headlines.txt -enc ascii

Related

Delete rows in a .CSV file containing specific character with Powershell

I receive an automatic weekly export from a system in .csv format. It contains a lot of usernames consisting of the users' initials (e.g. "fl", "nk"). A few of them contain the first and last names, separated by a dot (e.g. firstname.lastname). These are the ones that have to be deleted from the .csv file.
My goal is to write a PowerShell script that deletes all rows containing the character "." (dot) and then saves the result by overwriting the same .csv file.
Since I'm very new to Powershell, I'd highly appreciate a more detailed answer including the potential code. I tried various examples from similar issues, which I found here, but none of them worked and/or I am getting error messages, mostly because my syntax isn't correct.
Additional info. Here is a part of the table.
I tried this code:
Get-Content "D:\file.csv" | Where-Object {$_ -notmatch '\.'} | Set-Content "D:\File.csv"-Force -NoTypeInformation
As Mathias says, it is helpful to see what you have tried so we can help you come to a working result. It is easy to give you something like this:
$csv = Import-Csv -Path C:\Temp\temp.csv -Delimiter ";"
$newCSV = @()
foreach($row in $csv){
    if(!$row.username -or $row.username -notlike "*.*"){
        $newCSV += $row
    }
}
$newCSV | Export-Csv -Path C:\Temp\temp.csv -Delimiter ";" -NoTypeInformation
The above code eliminates rows that have a dot on the username field. It leaves rows with an empty username intact with the 'if(!$row.username' part. But I have no idea whether this is helpful since there is no example CSV file, also there is no way to know what you have tried so far ;)
Note that I always prefer using ";" as the delimiter, because the file will then already be correctly separated when opened in Excel. If the current file uses ',' as a delimiter, you will need to change that when importing the CSV.
You were very close! For this you don't need a loop, you just need to do it using the correct cmdlets:
(Import-Csv -Path 'D:\file.csv' -Delimiter ';') |
Where-Object { $_.Initials -notmatch '\.' } |
Export-Csv -Path 'D:\file.csv' -Delimiter ';' -Force -NoTypeInformation
Get-Content simply reads a text file and returns the lines as string array, whereas Import-Csv parses the structure and creates objects with properties from the header line.
The parentheses around Import-Csv are needed to ensure that the file is read and parsed completely before the results are piped on. Without them, the resulting file may end up completely empty, because you cannot read and overwrite the same file at the same time.
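To see the effect end to end, here is a small sketch against a throwaway file. The temp file, its rows, and the Username column name are all assumptions for illustration; substitute the real header name from your export:

```powershell
# Hypothetical sample file; the Username column name is an assumption.
$file = Join-Path ([IO.Path]::GetTempPath()) 'users-sample.csv'
Set-Content -Path $file -Value @(
    'Username;Department',
    'fl;Sales',
    'firstname.lastname;IT',
    'nk;HR'
)

# The parentheses make Import-Csv finish reading the file before
# Export-Csv opens the same path for writing.
(Import-Csv -Path $file -Delimiter ';') |
    Where-Object { $_.Username -notmatch '\.' } |
    Export-Csv -Path $file -Delimiter ';' -NoTypeInformation

# Read the file back to confirm which rows survived.
$remaining = @(Import-Csv -Path $file -Delimiter ';')
$remaining | Format-Table -AutoSize
Remove-Item $file
```

Only the "fl" and "nk" rows should remain, because the filter tests a single column rather than the whole line of text.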

Powershell: How to get the location of a file, depending on its name?

So my task is to write a PS script, that outputs the location of a database file. The location of the file is:
C:\Program Files\Microsoft\Exchange Server\V15\Mailbox\Mailbox database Name\Mailbox database Name.edb
I figured I can get the name of my Exchange database with
Get-MailboxDatabase | fl Name
which has the output:
Mailbox Database 0161713049
which is the name of the db but there is a bunch of invisible characters before and after the actual name.
So my question is, how could I get rid of these invisible characters? I want to concat a string to make it look like this:
C:\Program Files\Microsoft\Exchange Server\V15\Mailbox\Mailbox Database 0161713049\Mailbox Database 0161713049.edb
I would need this code to work on servers with completely different database names too, so simply removing the unwanted characters from the start with .Remove() may help, but since I don't know for sure the length of the name of the database, I can't remove the characters at the end.
Also I can't get rid of the feeling that there is a much simpler way to get the location of my .edb file.
PowerShell treats almost all output as objects with properties, displayed in a hashtable-like format such as @{Name=MYEXCHDB}. When you just want a property value as a string instead, you must expand it, as @AdminOfThings suggests:
Get-MailboxDatabase | Select-Object -ExpandProperty Name
To concatenate the name into a string:
$myString = "C:\path\to\$(Get-MailboxDatabase | Select-Object -ExpandProperty Name)"
And as @mathias-r-jessen suggests, the path to the database is another property you can get directly:
Get-MailboxDatabase | Select-Object -ExpandProperty EdbFilePath | Select-Object -ExpandProperty PathName
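The difference between selecting and expanding a property can be tried without an Exchange server. The sketch below uses a stand-in object in place of the Get-MailboxDatabase output (an assumption for illustration) and the base folder from the question:

```powershell
# Stand-in object; on an Exchange server this would come from Get-MailboxDatabase.
$db = New-Object PSObject -Property @{ Name = 'Mailbox Database 0161713049' }

# Select-Object Name keeps a wrapper object that has a Name property...
$wrapped = $db | Select-Object Name
# ...while -ExpandProperty returns the plain string value.
$name = $db | Select-Object -ExpandProperty Name

# Base folder taken from the question.
$base = 'C:\Program Files\Microsoft\Exchange Server\V15\Mailbox'
$edbPath = "$base\$name\$name.edb"
$edbPath
```

Because $name is a plain string, there is nothing to trim before or after it when building the path.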

Find and replace a specific string within a specific file type located in wildcard path

Problem:
Update a specific string within numerous configuration files that are found within the subfolders of a partial path using PowerShell.
Expanded Details:
I have multiple configuration files that need a specific string to be updated; however, I do not know the name of these files and must begin my search from a partial path. I must scan each file for the specific string. Then I must replace the old string with the new string, but I must make sure it saves the file with its original name and in the same location it was found. I must also be able to display the results of the script (number of files affected and their names/path). Lastly, this must all be done in PowerShell.
So far I have come up with the following on my own:
$old = "string1"
$new = "string2"
$configs = Get-ChildItem -Path C:\*\foldername\*.config -Recurse
$configs | %{(Get-Content $_) -Replace $old, $new | Set-Content $_.FullName}
When I run this, something seems to happen.
If the files are open, they will tell me that they were modified by another program.
However, nothing seems to have changed.
I have attempted various modifications of the below code as well. To my dismay, it only seems to be opening and saving each file rather than actually making the change I want to happen.
$configFiles = GCI -Path C:\*\Somefolder\*.config -Recurse
foreach ($config in $configFiles) {
    (GC $config.PSPath) | ForEach-Object {
        $_ -Replace "oldString", "newString"
    } | Set-Content $config.PSPath
}
To further exacerbate the issue, all of my attempts to perform a simple search for the specified string seem to be causing me problems as well.
After discussing this with several others, and based on what I have learned via SO, the following code SHOULD return results:
GCI -Path C:\*\Somefolder\*.config -Recurse |
Select-String -Pattern "string" |
Select Name
However, nothing seems to happen. I do not know if I am missing something or if the code itself is wrong...
Some similar questions I have researched and tried:
Replacing a text at specified line number of a file using powershell
Find and replacing strings in multiple files
PowerShell Script to Find and Replace for all Files with a Specific Extension
Powershell to replace text in multiple files stored in many folders
UPDATE:
It is possible that I am being thwarted by special characters such as + and /. For example, my string might be: "s+r/ng"
I have applied the escape character that PowerShell says to use, but it seems this is not helping either.
I will continue my research and keep making modifications. I'll be sure to note anything that gets me to my goal or even a step closer. Thank you all in advance.
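One thing worth checking, offered as a possible cause rather than a confirmed diagnosis: the -replace operator treats its first argument as a regular expression, so a search string such as "s+r/ng" does not match itself literally (the + means "one or more s"). A minimal sketch, using a made-up sample line, of escaping the pattern with [regex]::Escape:

```powershell
$old = 's+r/ng'
$new = 'string2'
$line = 'key="s+r/ng"'   # made-up sample line

# Unescaped: "+" is a regex quantifier, so nothing matches and the line is unchanged.
$unescaped = $line -replace $old, $new

# Escaped: [regex]::Escape turns the search text into a literal pattern.
$escaped = $line -replace [regex]::Escape($old), $new

$unescaped
$escaped
```

The .Replace() string method is another option, since it performs a literal replacement without any regex interpretation.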

Trying to Export a CSV list of users using Active Directory Module for Windows Powershell

So the below is where I'm at so far:
import-module activedirectory
$domain = "ourdomain"
Get-ADUser -Filter {enabled -eq $true} -Properties whenCreated,EmailAddress,CanonicalName |
select-object Name,EmailAddress,CanonicalName,whenCreated | export-csv C:\Data\test.csv
Unfortunately, when I run the above I get dates in two different formats in the CSV, e.g.:
01/01/2017
1/01/2017 8:35:56 PM
The issue this poses is that there isn't really a clean way to sort them. Excel's formatting doesn't change either of these formats to be more like the other, both because one includes the time and the other doesn't, and because the time-inclusive format doesn't use leading zeroes in single-digit numbers, while the time-exclusive format does.
We have an existing script that captures users using the LastLogonTimestamp attribute that does this correctly by changing the bottom line to the following:
select-object Name,EmailAddress,CanonicalName,@{Name="Timestamp"; Expression={[DateTime]::FromFileTime($_.whenCreated).ToString('yyyy-MM-dd_hh:mm:ss')}}
For some reason this expression runs properly when we query the LastLogonTimestamp attribute, but when we run this version querying the whenCreated attribute, we get an entirely blank column underneath the Timestamp header.
I'm not particularly knowledgeable about PowerShell itself, and my colleague who found the original script for LastLogonTimestamp just found it online and adapted it as minimally as possible to make it work for us, so I don't know whether something in this line would work properly with one of these attributes and not the other. It seems strange to me, though, that two date attributes in the same program would be stored in different formats, so I'm not convinced that's it.
In any case, any help anyone can offer to help us get a uniform date format in the output of this script would be greatly appreciated - it needn't have the time included if it's easier to do away with it, though if they're equally easy we may as well keep it.
whenCreated is already a [DateTime]. Notice the difference between the properties when you run something like this:
Get-ADUser TestUser -Properties lastlogon,whenCreated | select lastlogon,whenCreated | fl
(Get-ADUser TestUser -Properties lastlogon).lastlogon | gm
(Get-ADUser TestUser -Properties whenCreated).whenCreated | gm
This means that you don't have to convert to a DateTime before running the toString() method.
select-object @{Name="Timestamp"; Expression={$_.whenCreated.ToString('yyyy-MM-dd_hh:mm:ss')}}
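The two cases can be compared without Active Directory. In this sketch the values are stand-ins for what Get-ADUser returns, and two things are my own substitutions rather than part of the answer above: HH (24-hour clock) instead of hh, and an explicit invariant culture so the separators do not vary by locale:

```powershell
# Stand-in values; in the real script these would come from Get-ADUser.
$whenCreated = New-Object DateTime 2017, 1, 1, 20, 35, 56   # already a DateTime
$fileTime    = $whenCreated.ToFileTime()                    # Int64 file time, the form FromFileTime expects

$culture = [Globalization.CultureInfo]::InvariantCulture
$format  = 'yyyy-MM-dd_HH:mm:ss'

# A file-time integer has to be converted to a DateTime first...
$fromFileTime = [DateTime]::FromFileTime($fileTime).ToString($format, $culture)
# ...whereas a DateTime can be formatted directly.
$direct = $whenCreated.ToString($format, $culture)

$fromFileTime
$direct
```

Both lines print the same timestamp. The blank column in the question is consistent with FromFileTime being handed a DateTime instead of an integer, with the resulting error swallowed inside the calculated property.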

PowerShell Security Log

I am writing a PowerShell Script that counts the number of 4624 EventIDs in a given day, but I am getting lost when I go to group the information by date. Is there anyone who could help me out? My output should have the date and the number of Logins for that day and nothing more.
Here is my Code:
Get-EventLog "Security" -Before ([DateTime]::Now) |
Where -FilterScript {$_.EventID -eq 4624}
Try this:
Get-EventLog Security -Before ([DateTime]::Now) |
Where {$_.EventID -eq 4624} |
Group @{e={$_.TimeGenerated.Date}} |
Sort Count -desc
The Group-Object command allows you to specify an expression for the property to group on. In this case you want to group on the date part of the DateTime. Also note that it is unnecessary to quote arguments unless they contain space or special characters like ;, #, {, $ and (.
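To reduce the output to just the date and the login count, the groups can be projected with Select-Object. This is a sketch under stated assumptions: the records below are made up to stand in for Get-EventLog output, and the Date column name and its yyyy-MM-dd format are my own choices:

```powershell
# Made-up records standing in for Get-EventLog Security output.
$events = @(
    @{ EventID = 4624; TimeGenerated = [datetime]'2016-04-18T08:15:00' },
    @{ EventID = 4624; TimeGenerated = [datetime]'2016-04-18T17:40:00' },
    @{ EventID = 4634; TimeGenerated = [datetime]'2016-04-18T18:00:00' },
    @{ EventID = 4624; TimeGenerated = [datetime]'2016-04-19T09:05:00' }
) | ForEach-Object { New-Object PSObject -Property $_ }

$logins = $events |
    Where-Object { $_.EventID -eq 4624 } |
    # Group on the date part only, ignoring the time of day.
    Group-Object @{ Expression = { $_.TimeGenerated.Date } } |
    # Keep just a formatted date and the number of events in each group.
    Select-Object @{ Name = 'Date'; Expression = { $_.Group[0].TimeGenerated.ToString('yyyy-MM-dd') } }, Count |
    Sort-Object Date

$logins | Format-Table -AutoSize
```

With the real log, replace the $events assignment with the Get-EventLog call from the answer; the rest of the pipeline stays the same.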

Resources