How to process *.doc and *.docx file structure with docx.js? - node.js

I'm working on a project where I need to modify *.doc and *.docx documents. Generating docx files with docx.js, a powerful Node.js tool, is mostly clear to me. What I can't work out is how to read an existing document (a template file), put some dynamic data into the loaded template, and then save the resulting docx file on the server.
Thanks for any clarity on that.

Related

NodeJS - Compile markdown into pdf (from a string in memory) and download it on web page

I'm developing a web page. I want to create an option to generate a .pdf file and allow the user to download it.
Currently I'm using jsPDF but I'm finding it very hard to properly format the document.
I was hoping to find a new way of building it in markdown format, compile it and then download it.
Is there a way that I can do this, in node.js, where say, I have a string in memory (which is the markdown text format), compile that into a pdf and then download it from the page?
I haven't found any package that really does this, if you know, feel free to just let me know which one can achieve this and I'll figure it out.
For this, I recommend building the document as an HTML file first, so you can edit it easily (hardcoded or dynamically),
and then converting the HTML file to a .pdf file.
There are a lot of packages that do this;
have a closer look at this one:
https://www.npmjs.com/package/html-pdf-node

Are excel files stored internally as XML files?

I have come to understand that Excel files (.xlsx) are essentially archives of XML files internally. I even verified this by extracting an .xlsx file on my local machine.
So if that's the case, how exactly are Excel files stored, what is their structure, and how do they work?
I also know they can be parsed with the SAX parser of the Apache POI API.
Please help
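One way to explore this yourself: since an .xlsx file is a ZIP archive, the standard zipfile module can list and read its parts. This is only a minimal sketch; the helper names list_parts and read_part are made up for illustration. In a typical workbook you would expect to see parts such as [Content_Types].xml, xl/workbook.xml, xl/sharedStrings.xml and xl/worksheets/sheet1.xml, though the exact set depends on the file.

```python
import zipfile


def list_parts(xlsx):
    """Return the names of all parts stored inside an .xlsx package.

    `xlsx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(xlsx) as zf:
        return zf.namelist()


def read_part(xlsx, name):
    """Return one part (for example 'xl/workbook.xml') as text."""
    with zipfile.ZipFile(xlsx) as zf:
        return zf.read(name).decode("utf-8")
```

Calling list_parts("book.xlsx") on one of your own files (the file name here is just a placeholder) and then read_part on a few of the returned names shows how the workbook, the sheets and the shared strings are split across separate XML documents.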

How can I use python to edit docx and/or doc file tags on a windows system?

I have a folder with a large number of .doc and .docx files. I would like to develop a Python script to edit the tags of each file so I can find a file in the folder using the tags, which would make my life a little easier.
I am unsure of how to even start and was hoping someone could point me to a library or provide some sample code to help me get started.
I am not sure if the file extension matters, because this seems to be a Windows property (right-click file > Properties > Details > Tags > type in tags), but if the extension matters I can change all the files to .docx.
The python-docx package provides methods to access most of the metadata in a Word file. In particular, the class docx.opc.coreprops.CoreProperties allows you to modify author, category, etc. I didn't see tags mentioned, but if you do some more research I'm sure you can find it.
docx.opc.coreprops.CoreProperties.keywords can be used to update doc file tags.
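If you want to see what that keywords property corresponds to inside the file, here is a rough stdlib-only sketch that swaps python-docx for zipfile and ElementTree. It assumes the package already contains a docProps/core.xml part and writes the tags into its cp:keywords element. The function names set_keywords and get_keywords are invented for this example, and it only applies to .docx; legacy .doc files use a different binary format.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces that commonly appear in docProps/core.xml of an OOXML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
# Keep the usual prefixes when the XML is written back out.
for _prefix, _uri in NS.items():
    ET.register_namespace(_prefix, _uri)

CORE_PART = "docProps/core.xml"


def set_keywords(src, dst, keywords):
    """Copy the .docx at `src` to `dst`, setting cp:keywords to `keywords`.

    If the package has no docProps/core.xml part, it is copied unchanged.
    """
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == CORE_PART:
                root = ET.fromstring(data)
                node = root.find("cp:keywords", NS)
                if node is None:
                    # No keywords element yet, so create one.
                    node = ET.SubElement(root, "{%s}keywords" % NS["cp"])
                node.text = keywords
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            zout.writestr(item, data)


def get_keywords(path):
    """Return the cp:keywords text of a .docx, or None if it is absent."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read(CORE_PART))
    node = root.find("cp:keywords", NS)
    return node.text if node is not None else None
```

Writing to a separate destination file rather than editing in place means a failed run cannot corrupt the original, which matters when looping over a whole folder.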

How could I access the source code of a .one OneNote file?

How could I access the source code of a .one OneNote file?
I've tried to rename the .one file to .zip, as you can with .docx files, in order to access their source code, but .one doesn't seem to work like that.
Also, I've tried to open it with Notepad++, but it isn't in a plain-text format.
I regard this as a programming question because:
I'm using content-editing automation scripts (e.g. RegEx-related find-and-replace scripts). Accessing the source code of .one files would help me apply bulk automated edits to their content using RegEx.
.one files aren't technically source code - they contain the data that describes the pages in a section and their content.
Opening them as text won't show you anything meaningful as they are binary data.
Microsoft has published how this data is structured in .one files in the following documentation. You can use it to parse the binary file and obtain the information you need.
https://msdn.microsoft.com/en-us/library/dd924743(v=office.12).aspx
https://support.office.com/en-us/article/File-format-changes-in-OneNote-2016-for-Windows-a9129622-1755-470b-91e7-b2a461194036
The .one file format is super-complicated, as it has to store images and all revisions, so it's binary and not XML-based like the rest of the Office suite.
That said, if you do want to see the XML structure of the notebook or of specific page content, you can use OMSpy:
https://blogs.msdn.microsoft.com/johnguin/2011/07/28/onenote-spy-omspy-for-onenote-2010/
It works fine for 2016 Desktop.
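To check why the rename-to-.zip trick works for some Office files and not others, you can look at the container type directly. The sketch below is only illustrative (container_kind is a made-up helper): it reports whether a file is a ZIP archive, as .docx and .xlsx are, or an OLE2 compound file, as legacy .doc is. A .one file should fall into neither category, which matches the behaviour described above.

```python
import zipfile

# Signature at the start of OLE2 compound files (legacy .doc, .xls, .ppt).
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def container_kind(path):
    """Classify a file as 'zip', 'ole2' or 'other' by inspecting its bytes."""
    if zipfile.is_zipfile(path):
        return "zip"
    with open(path, "rb") as fh:
        head = fh.read(len(OLE2_MAGIC))
    if head == OLE2_MAGIC:
        return "ole2"
    return "other"
```

Only files reported as "zip" can be unpacked into readable XML parts; anything else needs a format-specific parser or a tool like the one mentioned above.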

How to pull data or files from a website using Spoon/Kettle

We need to pull data from a website using Pentaho Kettle. If anyone has some pointers, please let me know.
The files are in zip format and are available via a link on the web.
Simple. Create a job that downloads the file from the website.
Then create a transformation, called from the job, that loads the zipped files (you can use Text File Input to read zipped text files as they are) and writes them to your DB.
