Compare two Excel-files in Powershell


I need help comparing two Excel files in Powershell.
I have an Excel-file which contains 6 000 rows and 4-5 columns with headers:
"Number" "Name" "Mobile data".
Let's call it: $Services
Now, I want to compare that file with other Excel-files. For example:
one file containing 50 rows with header columns: "Number", "Name", etc.
Let's call it $Department
The important thing is that $Services contains additional columns such as "Mobile data",
so my task is to compare the "Number" column from $Services with the "Number" column from each of the other Excel files.
Then, if they match, write the whole row from $Services.
I'm not that familiar with Excel, so I thought this should be possible to do in PowerShell.
I'm a novice in PowerShell, so I only know the basics. I'm not that familiar with pscustomobject and param.
Anyway, what I tried to do was to first declare them in variables with ImportExcel:
$Services = Import-Excel -Path 'C:\Users\*.xlsx'
$Department = Import-Excel -Path 'C:\Users\*.xlsx'
Then I made a foreach statement:
foreach ($Service in $Services) {
if (($Service).Number -like ($Department).Number)
{Write-Output "$Service"}
}
The problem with this is that it is collecting all empty columns from ($Services).Number and writing the output of each row in $Services.
I tried to add a nullorEmpty to $Department, if the .Number is empty, but it didn't make any difference. I also tried to add that if the row is empty in .Number, add "1234", but still it collects all .Number that is empty in $Services.
I also tried:
$Services | ForEach-Object -Process { if (($_).Number -match ($Department).Number) { Write-Output $_ } }
But it didn't match any. When I tried -notmatch, it matched all of them.
I don't know but it seems that I have to convert the files to objects, like the columns to object so each string becomes an object. But right now my head is just spinning and I need some hints on where I can start with this.

I would recommend downloading the Module ImportExcel from the PSGallery.
Import-Excel can easily import your Excel sheet(s) to rows of objects, especially if your sheets are 'clean', i.e., only contain (optional) headers and data rows.
Simply import the cells to PowerShell objects and use Compare-Object to discover differences.
EDIT (after reading the additional questions by poster in the comments):
To compare using specific properties you'll need to add these to the Compare-Object parameters.
Using a trivial "PSCustomObject" to create a simple set of objects to show this idea it might look like this:
$l = 1..4 | ForEach-Object { [pscustomobject]@{a=$_;b=$_+1} }
$r = 1,2,4,5 | ForEach-Object { [pscustomobject]@{a=$_;b=$_+1} }
compare-object $l $r -Property B
B SideIndicator
- -------------
6 =>
4 <=
You may also compare multiple properties this way:
compare-object $l $r -Property A,B
A B SideIndicator
- - -------------
5 6 =>
3 4 <=
FYI: I find myself typing "Get-Command -Syntax SomeCommand" so often every day that I just made a function "Get-Syntax" (which also expands aliases) and then aliased this to simply "syn".
90% of the time once you understand the structure of PowerShell cmdlets (at least well-written ones) there is no need to even look at the full help -- the "syntax" blocks are sufficient.
Until then, type HELP (Get-Help) a lot -- 100+ times per day. :)
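Returning to the matching task in the question: when the goal is only to keep the $Services rows whose Number also appears in $Department, a plain membership test is one possible alternative to Compare-Object. The following is a minimal sketch under assumptions: the sample rows are made up and merely stand in for what Import-Excel would return, and rows with an empty Number are skipped explicitly.

```powershell
# Hypothetical sample data standing in for the two imported worksheets.
$Services = @(
    [pscustomobject]@{ Number = '1001'; Name = 'Anna';  'Mobile data' = '5 GB' }
    [pscustomobject]@{ Number = '1002'; Name = 'Bo';    'Mobile data' = '10 GB' }
    [pscustomobject]@{ Number = $null;  Name = 'Empty'; 'Mobile data' = $null }
)
$Department = @(
    [pscustomobject]@{ Number = '1002'; Name = 'Bo' }
    [pscustomobject]@{ Number = $null;  Name = 'Blank row' }
)

# Collect the non-empty numbers of the department sheet once.
$wanted = $Department.Number | Where-Object { $_ }

# Keep only the $Services rows that have a Number and whose Number is in that list.
$matched = $Services | Where-Object { $_.Number -and ($_.Number -in $wanted) }
$matched
```

The -in operator needs PowerShell 3.0 or later. Unlike -like or -match with an array on the right-hand side, it tests whether a single value is contained in a collection.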

So the solution to my whole problem was to add -PassThru.
My task was to compare the numbers in the two Excel files, select the numbers that are equal, and then take all the properties from one file. So my script became this:
$Compare = Compare-Object $Services $Department -Property Number -IncludeEqual -ExcludeDifferent -PassThru
$Compare | Export-Excel -Path 'C:\Users\*
But I wonder, -PassThru sends all the objects from ReferenceObject, how can I send all the objects from DifferenceObject?
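One way to approach this follow-up, sketched under assumptions: for equal objects, -PassThru emits the object from the reference side, so swapping the two arguments should return the rows from the other file instead. The sample objects and the MobileData and Office property names are hypothetical.

```powershell
# Hypothetical sample data; the property names are placeholders.
$Services = @(
    [pscustomobject]@{ Number = '1001'; Name = 'Anna'; MobileData = '5 GB' }
    [pscustomobject]@{ Number = '1002'; Name = 'Bo';   MobileData = '10 GB' }
)
$Department = @(
    [pscustomobject]@{ Number = '1002'; Name = 'Bo'; Office = 'Oslo' }
)

# Equal rows, taken from $Services (the reference side).
$fromServices = Compare-Object -ReferenceObject $Services -DifferenceObject $Department `
    -Property Number -IncludeEqual -ExcludeDifferent -PassThru

# Same comparison with the sides swapped: equal rows now come from $Department.
$fromDepartment = Compare-Object -ReferenceObject $Department -DifferenceObject $Services `
    -Property Number -IncludeEqual -ExcludeDifferent -PassThru

$fromServices
$fromDepartment
```

Note that -PassThru also adds a SideIndicator note property to the returned objects, which may show up as an extra column when exporting.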

Related

Import multiples data of column from xlsx to a powershell cmd

Input:
I have a file, \users\myself\desktop\test\file.xlsx, containing multiple columns like this:
ColumnA ColumnB ... ColumnQ (for a total of 17 columns)
Each column contains some data.
Output:
I would like to have a command like this:
New-ADUser -Name $(columnAdata) -GivenName "$(columnBdata)" -Surname "$(columnCdata)" -DisplayName "$(columnDdata)" -SamAccountName "$(columnEdata)" ... etc until -blabla "$(ColumnQdata)"
Is it possible to store the column data in variables and insert them into a command?
Thanks a lot.

I would suggest to first change the column headers to be the same as the parameters you intend to use with the New-ADUser cmdlet.
Having matching headers would help greatly in not making mistakes.
Next, save your Excel file as CSV, let's say a file called NewUsers.csv
The code then can be quite simple and easy to maintain:
# import the CSV file using the same separator character as Excel uses on your system
Import-Csv -Path 'X:\NewUsers.csv' -UseCulture | ForEach-Object {
    # use Splatting: create a Hashtable with all properties needed taken from the CSV
    # see: https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_splatting
    $userProperties = @{
        Name           = $_.Name       # as opposed to $_.columnAdata
        GivenName      = $_.GivenName  # as opposed to $_.columnBdata
        Surname        = $_.Surname
        DisplayName    = $_.DisplayName
        SamAccountName = $_.SamAccountName
        # etcetera
    }
    New-ADUser @userProperties
}
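With 17 columns, writing out each hashtable entry by hand gets tedious. If every CSV header exactly matches a parameter name, the hashtable could instead be built from the row's properties. The sketch below illustrates this under assumptions: New-DemoUser is a hypothetical stand-in for New-ADUser so that it runs without the ActiveDirectory module, and the CSV text is made-up sample data.

```powershell
# Hypothetical stand-in for New-ADUser; it only reports what it would do.
function New-DemoUser {
    param([string]$Name, [string]$GivenName, [string]$Surname)
    "Would create '$Name' ($GivenName $Surname)"
}

# Made-up sample data; in practice this would come from Import-Csv.
$csvText = @'
Name,GivenName,Surname
jdoe,John,Doe
asmith,Anna,Smith
'@

$results = $csvText | ConvertFrom-Csv | ForEach-Object {
    # Turn every non-empty column of the current row into a splatting entry.
    $params = @{}
    foreach ($p in $_.PSObject.Properties) {
        if ($p.Value -ne '') { $params[$p.Name] = $p.Value }
    }
    New-DemoUser @params
}
$results
```

The trade-off is that a misspelled header now fails at call time as an unknown parameter, whereas the explicit hashtable makes the mapping visible in the script.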

Convert date format from DD.MM.YYYY to YYYY-MM-DD

I have folders on disk named in the German date format, e.g. "01.05.2019", which I want to convert to the English format "2019-05-01".
I am using PowerShell. Is there a fancy function for doing this? Or should I get all substrings and reorder them?
Currently I only collect the strings:
$OrgDir = "P:\Fotos\Import"
$folders = Get-ChildItem $($OrgDir) -Directory
foreach ($dir in $folders) {
Write-Host "folder: " $dir
}

Use [DateTime]::ParseExact() to avoid the date parser mixing up the month and day:
$OrgDir = "P:\Fotos\Import"
$folders = Get-ChildItem $OrgDir -Directory
foreach ($dir in $folders) {
Write-Host "folder: $([DateTime]::ParseExact($dir.Name, "dd.MM.yyyy", $null).ToString("yyyy-MM-dd"))"
}
The above prints out the converted names. To actually rename the folders, however, I recommend this:
Get-ChildItem $OrgDir -Directory |
ForEach-Object {
$_ | Rename-Item -NewName (
[DateTime]::ParseExact($_.Name, "dd.MM.yyyy", $null).ToString("yyyy-MM-dd")
)
}
This line of PowerShell renames all directories at $OrgDir to the new date format, given that all directories in the folder are named this way.
UPDATE:
As @Matt Johnson pointed out, $null uses your system default culture for ParseExact(string, format, culture) (as well as ToString(format, culture)). This may or may not cause problems based on what culture setting your system currently has.
To ensure these settings do not interfere with this function, use [System.Globalization.CultureInfo]::InvariantCulture for the culture parameters in both ParseExact() and ToString().
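A minimal sketch of that advice, with the invariant culture passed to both calls. The function name Convert-GermanDateName is hypothetical, and TryParseExact is swapped in for ParseExact here so that folder names which do not match dd.MM.yyyy are skipped rather than raising an error.

```powershell
# Hypothetical helper: returns the yyyy-MM-dd form of a dd.MM.yyyy name,
# or nothing if the name does not match that pattern.
function Convert-GermanDateName {
    param([string]$Name)
    $invariant = [System.Globalization.CultureInfo]::InvariantCulture
    $parsed = [datetime]::MinValue
    $ok = [datetime]::TryParseExact($Name, 'dd.MM.yyyy', $invariant, [System.Globalization.DateTimeStyles]::None, [ref]$parsed)
    if ($ok) { $parsed.ToString('yyyy-MM-dd', $invariant) }
}

Convert-GermanDateName '01.05.2019'

# Possible use with the folders from the question (-WhatIf only previews the rename):
# Get-ChildItem $OrgDir -Directory | ForEach-Object {
#     $new = Convert-GermanDateName $_.Name
#     if ($new) { $_ | Rename-Item -NewName $new -WhatIf }
# }
```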

Matt Johnson provides important pointers in his comments:
To write robust code that works independently of what culture is in effect, specify the culture context explicitly, both when:
parsing strings as [datetime] instances
formatting [datetime] instances as strings
Therefore, use the following in your case:
PS> [datetime]::Parse('01.05.2019', [cultureinfo] 'de-DE').
ToString('yyyy-MM-dd', [cultureinfo]::InvariantCulture)
2019-05-01
Since 01.05.2019 is a valid (day-first) short date in German, no custom parsing is needed, only de-DE (German / Germany) as the cultural context.
.ToString('yyyy-MM-dd', [cultureinfo]::InvariantCulture) specifies an explicit output-format string; [cultureinfo]::InvariantCulture - the invariant culture (based on US-English) - as the cultural context ensures that the date is interpreted based on the Gregorian calendar.
Note that in PowerShell casts (e.g., [datetime] '01.05.2019') and string interpolation (e.g., "$(get-date)") always use the invariant culture - whereas calling [datetime]::Parse() and .ToString() without an explicit culture (format-provider) argument uses the current culture.

Finally this did the trick for me:
PS> [datetime]::Parse('01.05.2019', [cultureinfo] 'de-DE').
ToString('yyyy-MM-dd', [cultureinfo]::InvariantCulture)
All other solutions ended in:
Get-Date : Cannot bind parameter 'Date'. Cannot convert value "15.12.2019" to type "System.DateTime". Error: "String was not recognized as a valid DateTime."
Thanks a lot mklement0.

PowerShell on CSV file - looking for string depending on string

I need your help regarding PowerShell programming on CSV file.
I've made some searches but cannot find what I'm looking for (or perhaps I don't know the technical terms). Basically, I have an Excel workbook with large amount of data (more or less 38 columns x 350.000 rows), and there are a couple of formulas that take hours to calculate.
I was first wondering if PowerShell could speed up a bit the calculation compared to Excel. The calculations taking most of my time are in fact not that complex (at least at first glance). My data is more or less constructed like this:
Ref Title
----- --------------------------
A/001 "free_text"
A/002 "free_text A/001 free_text"
... ...
A/005 "free_text A/004 free_text"
A/006 "free_text"
B/001 "free_text"
B/002 "free_text"
C/001 "free_text"
C/002 "free_text"
...
C/050 "free_text C/047 free_text"
... ...
C/103 "free_text"
D/001 "free_text"
D/002 "free_text D/001 free_text"
... ....
Basically the data is as follows:
the Ref field contains unique values, in {letter}/{incremental value} format.
In some rows, the Title field may call up one of the Ref data. For example, in line 2, the Title calls for the A/001 Ref. In the last row, the Title calls for the D/001 Ref, etc.
There is no logic pattern defining when this ref could be called up in a title. This is random.
However, what I'm 100% sure of is the following:
The Ref called in the Title always belongs to the same {letter} block. For example: the string 'C/047' in the Title field can only be found in the block where the Ref {letter} is C.
A row whose Title calls up a Ref is always located after (in a lower row than) the row holding that Ref. In other words, I cannot have a line with the following pattern:
Ref Title
------------ -----------------------------------------
{letter/i} {free_text {letter/j} free_text} with j>i
→ This is not possible.
→ j is always < i
I've used these characteristics in Excel to minimize my lookup arrays. But it still takes an hour to calculate everything.
I've therefore looked into PowerShell, and started to 'play' a bit with the CSV, and looping with the ForEach-Object hoping I would have quicker results. Up to now I basically ended-up looping twice on my CSV file.
$CSV1 = myfile.csv
$CSV2 = myfile.csv
$CSV1 | ForEach-Object {
# find Title
$TitSearch = $_.$Ref
$CSV2 | ForEach-Object {
if ($_.$Title -eq $TitSearch) {
myinstructions
}
}
}
It works but it's really really really long. So I then tried the following instead of using the $CSV2 | ForEach...:
$CSV | where {$_.$Title -eq $TitleSearch} | % $Ref
In either case, it's too long and not efficient at all. Additionally with these 2 solutions, I'm not using above characteristics which could reduce the lookup array and as already stated, it seems I end up looping twice on the CSV file from its beginning up to the end.
Questions:
Is there a leaner way to do this?
Am I wasting my time with PowerShell?
I thought about creating 1 file per Ref {letter} block (1 file for block A, 1 for B, etc.). However, I have about 50.000 blocks to create. Or I could create them one by one, carry out the analysis, put the results in a new file, and delete them. Would that be quicker?
Note: this is for work, to be used by other colleagues, and Excel and PowerShell are really the only software we may use. I know VBA but ok... In the end, I'm curious about how and whether this can be solved in a simple manner using PowerShell.

As far as I can see, your base algorithm does N^2 iterations (~120 billion). There is a standard way to make it efficient: build a hashtable first. A hashtable is a key/value store in which lookup is pretty much instantaneous, so the algorithm's time complexity becomes ~N.
PowerShell has a built-in data type for that. In your case the key would be ref, and the value an array of cell data (assuming your table is something like: ref, title, col1, ..., colN):
$hash = @{}
foreach ($row in $table) { $hash.Add($row.ref, @($row.title, $row.col1, ...)) }
# it will take 350K steps to generate it
# then you can iterate over it again
foreach ($key in $hash.Keys) {
    $key # access current ref
    $rowData = $hash[$key] # access to current row elements (by index)
    $refRowData = $hash[$rowData[$j]] # lookup from other rows, assuming lookup reference is in some column
}
So that's the general idea for solving the time issue. To be honest, I don't believe you need to reinvent the wheel and code it yourself. What you need is a relational database. Since you have Excel, you should have MS Access too. Just import your data there, index ref and title, and then all you need to do is a self join. MS Access sucks, but I'm sure it will handle 350K rows just fine.
Ideally you'd get a database on a corporate MSSQL server (open a ticket, talk to your manager, etc.). It will calculate all that in seconds, and you can then link the output to a spreadsheet as well.
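The hashtable idea above can be sketched end to end for the Ref/Title layout in the question. This is an illustration under assumptions, not the answerer's exact method: the rows are made-up sample data, and the regular expression [A-Z]+/\d+ is only a guess at what a Ref looks like based on the {letter}/{incremental value} description.

```powershell
# Made-up sample rows in the layout described in the question.
$rows = @'
Ref,Title
A/001,free_text
A/002,free_text A/001 free_text
B/001,free_text
B/002,free_text B/001 free_text
'@ | ConvertFrom-Csv

# First pass: index every row by its Ref (N steps).
$byRef = @{}
foreach ($row in $rows) { $byRef[$row.Ref] = $row }

# Second pass: pull Ref-like tokens out of each Title and look them up (N steps).
$links = foreach ($row in $rows) {
    foreach ($m in [regex]::Matches($row.Title, '[A-Z]+/\d+')) {
        if ($byRef.ContainsKey($m.Value)) {
            [pscustomobject]@{
                Ref         = $row.Ref
                RefersTo    = $m.Value
                TargetTitle = $byRef[$m.Value].Title
            }
        }
    }
}
$links
```

Each row is visited twice in total instead of once per other row, which is where the drop from roughly N^2 to roughly N work comes from.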

Split a string containing fixed length columns

I got data like this:
3LLO24MACT01 24MOB_6012010051700000020100510105010 123456
It contains different values for different columns when I import it.
Every column is fixed width:
Col#1 is the ID and just 1 long. Meaning it is "3" here.
Col#2 is 3 in length and here "LLO".
Col#3 is 9 in length and "24MACT01 " (notice that the missing ones gets filled up by blanks).
This goes on for 15 columns or so...
Is there a method to quickly cut it into different elements based on sequence length? I couldn't find any.

This can be done with RegEx matching, and creating an array of custom objects. Something like this:
$AllRecords = Get-Content C:\Path\To\File.txt | Where-Object { $_ -match "^(.)(.{3})(.{9})" } | ForEach-Object {
    [PSCustomObject]@{
        'Col1' = $Matches[1]
        'Col2' = $Matches[2]
        'Col3' = $Matches[3]
    }
}
That will take each line, match by how many characters are specified, and then create an object based off those matches. It collects all objects in an array and could be exported to CSV or whatever. The 'Col1', 'Col2' etc are just generic column headers I suggested due to a lack of better information, and could be anything you wanted.
Edit: Thank you iCodez for showing me, perhaps inadvertently, that you can specify a language for your code samples!

[Regex]::Matches will do this rather easily. All you need to do is specify a Regex pattern that has . followed by the number of characters you want in curly braces. For example, to match a column of three characters, you would write .{3}. You then do this for all 15 columns.
To demonstrate, I will use a string that contains the first three columns of your example data (since I know their sizes):
PS > $data = '3LLO24MACT01 '
PS > $pattern = '(.{1})(.{3})(.{9})'
PS > ([Regex]::Matches($data, $pattern).Groups).Value
3LLO24MACT01
3
LLO
24MACT01
PS >
Note that the first value output will be the text matched by all of the capture groups. If you do not need this, you can remove it with slicing:
$columns = ([Regex]::Matches($data, $pattern).Groups).Value
$columns = $columns[1..$columns.Length]
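As an alternative to a regular expression, the line could also be cut with Substring and a list of column widths, which may be easier to maintain for 15 columns. This is a minimal sketch: Split-FixedWidth is a hypothetical helper name, and only the first three widths (1, 3, 9) come from the question.

```powershell
# Hypothetical helper: cuts a line into pieces of the given widths, left to right.
function Split-FixedWidth {
    param([string]$Line, [int[]]$Widths)
    $pos = 0
    foreach ($w in $Widths) {
        # Stop if the line is shorter than the remaining widths require.
        if ($pos + $w -gt $Line.Length) { break }
        $Line.Substring($pos, $w)
        $pos += $w
    }
}

$line = '3LLO24MACT01 24MOB_60'
$cols = Split-FixedWidth -Line $line -Widths 1, 3, 9
$cols
```

The padding blanks are kept, so the third element is 9 characters long. Call .Trim() on a piece if the padding is not wanted.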

New-PSObjectFromMatches is a helper function for creating PS Objects from regex matches.
The -Debug option can help with the process of writing the regex.

Read values from excel and replace them in another file using Powershell

I need to find a way to read values from an Excel file and then replace all the corresponding values in another file accordingly. Basically, I found some discrepancies in one of the automated tasks we run, and I need to convert some values within the file before I send it to the automated task. I have an Excel file that lists the "wrong" values and their corresponding "correct" values, and I need to know how PowerShell can help me with this.
$docID = $args[0]
$docid
# Read Z ticker file
$Zfile = 'I:\IS\Rishabh\Z tickers Active.xls'
# Find the .rps file imported automatically from schwab trust
$RPSFile = 'L:\Trading\Schwab Trust\Import\CS<%dmmdd-01yy>.RPS'
While (Get-Content $ZFile)
{
$_ -cmatch 'A$','B$' | Set-Variable X-ticker
# End Loop
}
(Get-Content $RPSfile) | ForEach-Object { $_ -replace '%, ' ,'X-ticker' } # End Loop
Set-Content $RPSFile

You don't need to use Powershell. Excel itself has built in mechanisms for doing what you want. For example you could use the LOOKUP function in Excel.
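If the replacement does have to happen in PowerShell, one possible shape is a wrong-to-correct hashtable applied to every line of the target file. The sketch below is only an illustration under assumptions: the ticker values and the comma-separated line layout are invented, and in practice the map would be filled from the spreadsheet (for example after saving it as CSV) and the lines read with Get-Content.

```powershell
# Hypothetical mapping of wrong values to correct values.
$map = @{
    'ZABC' = 'ABC'
    'ZXYZ' = 'XYZ'
}

# Hypothetical file content; normally this would come from Get-Content.
$lines = 'BUY,ZABC,100', 'SELL,ZXYZ,50', 'BUY,MSFT,10'

$fixed = foreach ($line in $lines) {
    foreach ($wrong in $map.Keys) {
        # Escape the search text so it is matched literally, not as a regex.
        $line = $line -replace [regex]::Escape($wrong), $map[$wrong]
    }
    $line
}
$fixed
```

A hashtable does not guarantee the order of its keys, so this only behaves predictably when no wrong value is a substring of another entry.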
