Parsing a table in a Word document using Node.js on Linux

I'm trying to create a Node.js web app hosted on a Linux server. The app must read and parse a table in a Word document.
I've looked around and seen that PowerShell can do this trivially. The problem is that PowerShell is a Microsoft scripting language, and its Mono-based port (Pash) is very unstable and chokes whenever I try to execute something as simple as this:
$wd = New-Object -ComObject Word.Application
$wd.Visible = $true
$doc = $wd.Documents.Open($filename)
# Print the text of the last (bottom-right) cell of every table
$doc.Tables | ForEach-Object {
    $_.Cell($_.Rows.Count, $_.Columns.Count).Range.Text
}
I've looked into other solutions like Docsplit, but it's too generic (i.e., it converts an entire Word document to plain text, which isn't granular enough for my purposes).
Some have suggested the Saaspose API, but it costs a lot of money! I think I can do this myself.
Any ideas?

Here's a Python module that can read and write .docx files:
https://github.com/mikemaccana/python-docx

If you're deploying on a Linux machine, it's probably best to use Docsplit and then parse the output text, or you could try Apache POI.
Another option would be to run the Microsoft Office COM API under Wine, but I'm not sure whether that works.
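Because a .docx file is a ZIP archive of XML parts, the tables can also be read without Word, COM or Wine. Below is a minimal sketch that uses only the Python standard library (not python-docx). It assumes the standard layout, where the document body lives in word/document.xml and tables use the WordprocessingML w:tbl / w:tr / w:tc elements. The function name docx_tables is hypothetical. The same idea should port to Node.js with any ZIP and XML library.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml in .docx files
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_tables(path_or_file):
    """Return every table in a .docx as a list of rows of cell strings.

    path_or_file can be a filename or a binary file-like object.
    """
    with zipfile.ZipFile(path_or_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    tables = []
    # iter() walks the whole tree, so nested tables appear as separate entries
    for tbl in root.iter(W + "tbl"):
        rows = []
        for tr in tbl.findall(W + "tr"):          # direct child rows only
            cells = []
            for tc in tr.findall(W + "tc"):       # direct child cells only
                # A cell can hold several paragraphs, each split into text runs
                paras = [
                    "".join(t.text or "" for t in p.iter(W + "t"))
                    for p in tc.findall(W + "p")
                ]
                cells.append("\n".join(paras))
            rows.append(cells)
        tables.append(rows)
    return tables
```

As a usage example, `[t[-1][-1] for t in docx_tables("report.docx")]` would mirror the PowerShell snippet in the question by taking the last cell of each table ("report.docx" is a placeholder). Merged cells (gridSpan) make some rows shorter, so real documents may need extra handling.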

Related

VBA Shell(), WScript.Shell and Shell.Application. What else can I try?

I'm trying to start a (very) proprietary application at work with VBA. I've used plain old Shell("C:\app\app.exe"), and with WScript.Shell and Shell.Application I've tried ShellEX.Run and ShellEX.ShellExecute. All of these methods work, but when I log in to the application, it freezes. Are these methods doing something in the background that could cause the application to freeze? Are there any other methods I could try?
I am able to call the application easily with a batch file or with a PowerShell script. There is something about VBA that this application doesn't seem to like.
Thanks,
J

PowerShell CSOM: SharePoint Online list permissions

I've been writing a script to connect to SharePoint Online, create a document library and add some AD accounts to its permissions. I've written all the code using snippets I found through many searches, but I'm having an issue with the permissions part.
I keep getting an error when adding the user and permission type to the role definition binding.
The error is:
Collection has not been initialized at line 0 char 0.
Here is my code
$collRdb = new-object Microsoft.SharePoint.Client.RoleDefinitionBindingCollection($ctx)
$collRdb.Add($role)
$collRoleAssign = $web.RoleAssignments;
$ctx.Load($principal)
$collRoleAssign.Add($principal, $collRdb)
$ctx.ExecuteQuery()
The issue occurs when it runs $collRoleAssign.Add($principal, $collRdb), which stops with the error above.
I would really appreciate a hand with this before my PC gets launched out of the window.
Thanks
Mark
EDIT
All of the code is taken from this page:
http://jeffreypaarhuis.com/2012/06/07/scripting-sharepoint-online-with-powershell-using-client-object-model/
The only change is that I'm using the get-principal function instead of the get-group one, but I'm not sure whether that's what caused it. I'm very new to PowerShell.
Thanks
I don't think you can add anything to $collRoleAssign unless it has been loaded first.
You get the error because it has no value yet.
I would have written it like this:
$collRoleAssign = $web.RoleAssignments
$ctx.Load($collRoleAssign)
# I suppose you have already set $principal
$ctx.Load($principal)
# Here I suppose $collRdb is already set and loaded
$collRoleAssign.Add($principal, $collRdb)
$ctx.ExecuteQuery()
By the way, there is a trailing ";" in your code that isn't needed (PowerShell tolerates it, though).
I haven't tried it, but that should help!
Sylvain

Use PowerShell to find SSNs in Word and Excel documents

I am very new to PowerShell and have a little Linux bash scripting experience. I have been looking for a way to get a list of the files on a server that contain Social Security Numbers. I found the commands below in my research, and they performed exactly as I wanted when testing on my home computer, except that they did not return results from my Word and Excel test documents. Is there a way to use a PowerShell command to get results from the various Office documents as well? This server is almost all Word and Excel files, with a few PowerPoints.
PS C:\Users\Stephen> Get-ChildItem -Path C:\Users -Recurse -Exclude *.exe, *.dll | `
Select-String "\d{3}[- ]\d{2}[- ]\d{4}"
Documents\SSN:1:222-33-2345
Documents\SSN:2:111-22-1234
Documents\SSN:3:111 11 1234
PS C:\Users\Stephen> Get-childitem -rec | ?{ findstr.exe /mprc:. $_.FullName } | `
select-string "[0-9]{3}[- ][0-9]{2}[- ][0-9]{4}"
Documents\SSN:1:222-33-2345
Documents\SSN:2:111-22-1234
Documents\SSN:3:111 11 1234
When interacting with MS Office files, the best way is to use COM interfaces to grab the information you need.
If you are new to Powershell, COM will definitely be somewhat of a learning curve for you, as very little "beginner" documentation exists on the internet.
Therefore I strongly advise starting off small:
First focus on opening a single Word doc and reading in the contents into a string for now.
Once you have this ready, focus on extracting the relevant info (PowerShell's -match operator is very helpful).
Once you can work with a single Word doc, locate all files named *.docx in a folder and repeat the process on each of them: foreach ($file in (ls *.docx)) { <# work on $file #> }
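The three steps above can also be sketched without COM. The following is a rough Python illustration that swaps in direct .docx reading (the file is a ZIP archive with the body in word/document.xml) instead of the COM approach recommended here. It only covers .docx, not .xlsx or .pptx, and the function names are hypothetical. The pattern follows the question's, using [- ] as the separator class (a | inside [...] would match a literal pipe).

```python
import re
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Same shape as the question's pattern; [- ] matches a dash or a space
SSN = re.compile(r"\b\d{3}[- ]\d{2}[- ]\d{4}\b")


def docx_text(path):
    """Step 1: read one .docx into a string, one line per paragraph.

    Text runs (w:t) are joined per paragraph, so a number that Word has
    split across several runs is reassembled before matching.
    """
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return "\n".join(
        "".join(t.text or "" for t in p.iter(W + "t"))
        for p in root.iter(W + "p")
    )


def find_ssns(text):
    """Step 2: extract SSN-like strings from the text."""
    return SSN.findall(text)


def scan_folder(folder):
    """Step 3: repeat for every .docx below folder; returns {path: [matches]}."""
    hits = {}
    for path in Path(folder).rglob("*.docx"):
        try:
            found = find_ssns(docx_text(path))
        except (zipfile.BadZipFile, KeyError, ET.ParseError):
            continue  # not a readable .docx (e.g. a ~$ lock file)
        if found:
            hits[str(path)] = found
    return hits
```

For example, scan_folder(r"D:\Shares") (a placeholder path) returns a dict mapping each matching file to the SSN-like strings found in it.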
Here's some reading (admittedly, all of it is about Excel, since I build automated Excel charting tools, but the lessons will be very helpful for automating any Office application):
Powershell and Excel - Introduction
A useful document from a deleted link (link points to the Google Cache for that doc) - http://dev.donet.com/automating-excel-spreadsheets-with-powershell
Introduction to working with "Objects" in PS - CodeProject
If you only need to handle .docx and .xlsx files, you might also consider simply unzipping them and searching through the contents while ignoring any XML tags (that is, allowing zero or more XML elements between any two characters of the number).
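A minimal Python sketch of that unzip-and-search idea, assuming the files are valid .docx/.xlsx ZIP packages. It scans every .xml part with a pattern that tolerates zero or more tags between characters, then strips the tags from each hit. scan_office_zip is a hypothetical name. The matching is approximate: it can produce false positives, and in .xlsx files it only finds SSNs stored as text (typically in xl/sharedStrings.xml), not numbers displayed through a custom number format.

```python
import re
import zipfile

# Zero or more complete XML tags, e.g. "</w:t></w:r><w:r><w:t>"
TAGS = r"(?:<[^>]*>)*"


def _digits(n):
    """n digits with optional tags between each pair of them."""
    return TAGS.join([r"\d"] * n)


# ddd-dd-dddd or "ddd dd dddd", allowing tags anywhere between characters;
# the lookarounds stop it matching inside a longer run of digits.
SSN_IN_XML = re.compile(
    r"(?<!\d)"
    + TAGS.join([_digits(3), "[- ]", _digits(2), "[- ]", _digits(4)])
    + r"(?!\d)"
)


def scan_office_zip(path):
    """Return (part name, SSN with tags stripped) for each hit in a .docx/.xlsx."""
    hits = []
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if not name.endswith(".xml"):
                continue  # skip images and other binary parts
            xml = zf.read(name).decode("utf-8", errors="replace")
            for m in SSN_IN_XML.finditer(xml):
                hits.append((name, re.sub(r"<[^>]*>", "", m.group(0))))
    return hits
```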

How to update the connection string of an Excel file from a script (PowerShell)

We have an Excel file that contains a connection to a database to retrieve data (with a SELECT statement).
We want to update the file's connection string via a script (preferably PowerShell) so that it queries another server instead.
For example:
I have report.xlsx file which connects to server A.
I run update-connection.ps1
And when I open report.xlsx it now connects to server B.
Any idea how we could do that?
Thanks.
It should be fairly easy if you decide (and are allowed) to store the connection details (the server name) in a worksheet. Your VBA code can then build the connection string dynamically from the value of a cell. (I would probably create a named range and use it in the code.)
I don't know PowerShell, but the code could look something like this:
$workbook.Names.Item("Server").RefersToRange.Value2 = "PROD_01"
You can hide the worksheet if you wish, but that is not real security.
You could try automating Excel via PowerShell, as in this article: http://kentfinkle.com/PowershellAndExcel.aspx
If you don't want to automate Excel then you could try using something like ClosedXML in your PowerShell script: http://closedxml.codeplex.com/
You can parse the connectionstring with System.Data.Common.DbConnectionStringBuilder. Check this SO thread:
Powershell regex for connectionStrings?
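DbConnectionStringBuilder is a .NET class with no direct equivalent in Python's standard library. The sketch below is a simplified stand-in that splits a Key=Value;... string, replaces the server entry and rebuilds the string. The function names and the default key names (Data Source, Server) are assumptions. Unlike the .NET builder, it does not handle quoted values containing ";". In an .xlsx package the string itself is typically stored in the connection attribute of a dbPr element in xl/connections.xml.

```python
def parse_connection_string(conn):
    """Split "Key=Value;Key=Value" into a dict, keeping key order and casing.

    Simplified: quoted values that contain ";" are not supported.
    """
    pairs = {}
    for part in conn.split(";"):
        if not part.strip():
            continue  # ignore empty segments, e.g. from a trailing ";"
        key, _, value = part.partition("=")  # split on the first "=" only
        pairs[key.strip()] = value.strip()
    return pairs


def set_server(conn, new_server, server_keys=("Data Source", "Server")):
    """Return conn with the server entry (matched case-insensitively) replaced."""
    wanted = {k.lower() for k in server_keys}
    pairs = parse_connection_string(conn)
    for key in pairs:
        if key.lower() in wanted:
            pairs[key] = new_server
    return ";".join(f"{k}={v}" for k, v in pairs.items())
```

Writing the updated string back into the workbook is a separate step, for example through Excel automation as described above.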

PowerShell: command-line applications stop working after calling a method from a module

I have created a PowerShell module (.psm1) file that includes a few other PowerShell scripts. We use it for SharePoint.
So basically, here's what happens:
I have a deploy script that retrieves the module location from the registry
It loads the module using the Import-Module cmdlet (using -force switch)
This module in turn loads the SharePoint 2010 snap-in and a few other scripts that I created
It runs a deployment script that references functions from the included scripts
It also runs a command line application and sends the output directly to the screen
The script usually works the first time. However, after a few tries the command-line tool stops working and stops sending output to the screen altogether. And if I try to run a command-line tool (not a cmdlet) after running my script, it doesn't work anymore: no output, nothing is done. It's just the same as pressing Enter at a blank prompt. Anything PowerShell-specific, or running GUI applications, still works fine, but running any console application produces no results at all. The only solution is to close PowerShell and open it again. It then usually works once, and I have to close it again. Our users certainly won't be happy about that.
The most notable things about the script:
Script blocks are used extensively (for logging): a script block is sent to a handler that executes it using Invoke-Command and logs the step
It manipulates SharePoint objects
All objects are properly disposed of
No static variables are created or changed
There are a few global variables shared across all scripts
What I have tried:
I stripped my code down to a bare minimum (loading an XML file and restarting a few Windows services), but I'm still getting this intermittently. I have no idea which part of the code could cause it. I would love to post the code, but our company policy forbids it, so my apologies.
Update as per the comment below:
Here's roughly how I use script blocks. I use the function below every time I want to make the user aware of a task I'm executing and what its outcome is.
# Output* are my own logging helpers
function DoTask($someString, $scriptBlock, $param)
{
    try
    {
        OutputTaskDescription $someString
        Invoke-Command -ScriptBlock $scriptBlock -ArgumentList $param
        OutputResultOK
    }
    catch
    {
        OutputResultError $_.ToString()
    }
}
It could then be used like this:
$stringVar = "something"
$spSite = New-SPSite    # parameters omitted
DoTask 'Deploying something' -param $spSite -scriptBlock {
    param($site)
    doSomethingToObject $stringVar
    doSomethingToObject $site.Name
}
It would then output something like:
Deploying Something ------------- OK
Deploying Something ------------- ERROR
Also notice that I pass $spsite in the argument list but use the string directly. I still don't understand how this works, but it seems I can access all primitive-typed variables without passing them as arguments, whereas more complex objects have to be passed as parameters, or else they don't have any value.
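For comparison, the DoTask wrapper pattern (describe the task, run it, report OK or ERROR, with the object passed explicitly the way -ArgumentList passes $param) can be sketched in Python. do_task is a hypothetical name. Python's closure rules differ from PowerShell's script-block scoping, so this only illustrates the pattern and does not reproduce or explain the behavior described above.

```python
def do_task(description, action, *args, width=40):
    """Print a task description, run action(*args), then report OK or ERROR.

    Arguments are passed explicitly, like -ArgumentList passing $param to
    the script block. Returns True on success, False on failure.
    """
    print(f"{description} ".ljust(width, "-"), end=" ", flush=True)
    try:
        action(*args)
    except Exception as exc:  # report any failure instead of propagating it
        print(f"ERROR ({exc})")
        return False
    print("OK")
    return True
```

A call such as do_task("Deploying something", deploy, site) (where deploy and site are placeholders) prints a line like "Deploying something -------------------- OK".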
Update:
After much searching and days of pain, I have found others with the same problem. My code exhibits exactly the same symptoms as described here:
http://connect.microsoft.com/PowerShell/feedback/details/496326/stability-problem-any-application-run-fails-with-lastexitcode-1073741502
I guess there is no solution yet to this problem.
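The exit code in that report, -1073741502, is easier to look up once it is reinterpreted as an unsigned 32-bit value in hex. Here is a small Python helper for that conversion (the name exit_code_to_hex is hypothetical):

```python
def exit_code_to_hex(code):
    """Reinterpret a signed 32-bit exit code (as $LASTEXITCODE shows it)
    as the unsigned hex form used for Windows status codes."""
    return f"0x{code & 0xFFFFFFFF:08X}"
```

exit_code_to_hex(-1073741502) returns "0xC0000142", which Windows defines as STATUS_DLL_INIT_FAILED (a DLL failed to initialize while the process was starting). That would be consistent with console programs exiting immediately without output.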
After a while I noticed that when I've run some very memory-intensive functions, I too get that behavior where everything you try to execute just returns to the prompt. I'd recommend running Set-PSDebug -Trace 2 to see what those functions are actually doing. I fixed my issue by doing this and figuring out how to make my functions more efficient.
