Package to convert .doc to HTML in Node.js

I am using the mammoth npm package to convert docx to HTML, but it does not work for doc files.
Which package can I use to convert doc files? I have searched for one but have not found anything.
I am using Node.js.

I don't think you will find a package that converts doc files (and if you ever do find one, I doubt it will be very accurate).
DOCX is an open file format (essentially a ZIP archive of XML files), whereas the DOC format is a proprietary binary format, so extracting the information you want will be much harder, if it is possible at all.
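To make the difference between the two formats concrete, here is a minimal sketch that tells them apart by their leading bytes. It is written in Python for brevity (the same byte checks port directly to a Node.js Buffer); the function name `sniff_word_format` and the returned labels are made up for this illustration. It assumes the main DOCX part sits at its conventional location, `word/document.xml`.

```python
import io
import zipfile

# Signature of an OLE2 compound file, the container used by legacy .doc
# (and also by legacy .xls and .ppt, so this alone does not prove "Word").
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header signature of a ZIP archive, the container used by .docx.
ZIP_MAGIC = b"PK\x03\x04"


def sniff_word_format(data: bytes) -> str:
    """Return 'docx', 'ole2' (legacy binary, possibly .doc) or 'unknown'."""
    if data.startswith(OLE2_MAGIC):
        return "ole2"
    if data.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as archive:
                # Conventional location of the main document part in a .docx.
                if "word/document.xml" in archive.namelist():
                    return "docx"
        except zipfile.BadZipFile:
            pass
    return "unknown"
```

A check like this lets you route .docx input to mammoth and handle legacy files separately, instead of trusting the file extension.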

Related

NodeJS - Compile markdown into pdf (from a string in memory) and download it on web page

I'm developing a web page. I want to create an option to generate a .pdf file and allow the user to download it.
Currently I'm using jsPDF but I'm finding it very hard to properly format the document.
I was hoping to find a new way of building it in markdown format, compile it and then download it.
Is there a way to do this in Node.js? Say I have a string in memory (the text in markdown format): can I compile that into a PDF and then download it from the page?
I haven't found any package that really does this, if you know, feel free to just let me know which one can achieve this and I'll figure it out.
For this, I recommend first building your document as an HTML file, so you can edit it easily (hardcoded or dynamically), and then converting that HTML file to a .pdf file.
There are a lot of packages that do this. Have a closer look at this one:
https://www.npmjs.com/package/html-pdf-node
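The "build the HTML dynamically" step can be sketched as follows. This is only an illustration in Python using the standard library (in Node.js the same idea works with template literals); the template, the `render_page` function, and its parameters are hypothetical. The resulting HTML string is what you would then hand to an HTML-to-PDF converter such as the package above.

```python
from html import escape
from string import Template

# Hypothetical page template; $title and $body are filled in at runtime.
PAGE = Template("""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>$title</title>
<style>
  body { font-family: sans-serif; margin: 2cm; }
  h1 { border-bottom: 1px solid #999; }
</style>
</head>
<body>
<h1>$title</h1>
$body
</body>
</html>
""")


def render_page(title: str, paragraphs: list[str]) -> str:
    """Build a complete HTML document from plain-text input."""
    # Escape user-supplied text so it cannot break the markup.
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return PAGE.substitute(title=escape(title), body=body)
```

Keeping the layout in HTML and CSS is what makes formatting easier than positioning text by hand in jsPDF.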

How to access iCloud Notes with pyiCloud

I'm trying to access my iCloud Notes from a Python script using the pyiCloud package, but when I try to list the notes, the Documents folder appears to be empty. Does anyone know how I should do this?
>>> api.files['com~apple~Notes']['Documents'].dir()
It returns:
[]
It sounds like an authentication problem is preventing you from getting access to your Notes through the file storage (the UbiquityService). This issue might give you some more clues.
On another note (!), I found a better way to get my Notes exported, after trying a couple of approaches mentioned around the web. Although there is no in-app way to export Notes in any format other than PDF, I stumbled upon the following two solutions:
1. Export in Markdown (or in other formats in the paid version) via the Bear app. I found this way easier and of higher quality in terms of keeping the formatting, attachments, etc.:
   a. Download the Bear migration Workflow script from here and follow the instructions.
   b. [optional] At this point, you have the HTML files with inline encoded images. Use my script to decode the images and get regular HTML files with the images in an accompanying directory.
   c. Install Bear and import the exported files from Notes.
   d. Export the files as Markdown, HTML, or whatever format you desire from File -> Export Notes within the Bear app. Don't forget to check the "Export attachments" box in the export dialog.
2. Export in HTML (and then convert to Markdown if you want) via the Notes Exporter app. The app gives you HTML files with inline Base64-encoded images, saved with a .txt extension (?!). I personally like this way because the output files closely mimic the original Notes, but the hyperlinks are missing in the exported files (the hyperlink coloring is still kept, though):
   a. Download Notes Exporter from here.
   b. Export Notes to the path you choose.
   c. [optional] Rename the file extensions to .html.
   d. Voilà, now you have your Notes as HTML files with the same formatting and images.
   e. [optional] Decode the inline Base64 images and save the HTML files with the images in a separate adjacent directory, using the script that I wrote for this:
https://gist.github.com/SHi-ON/945ea2272ea4bb29e13bd0305370da90
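The decoding step can be sketched roughly as follows. This is not the linked gist, just a minimal standard-library illustration of the idea; the function name `extract_inline_images`, the `prefix` parameter, and the file naming scheme are assumptions made for this example, and it only handles `src="data:image/...;base64,..."` attributes written with double quotes.

```python
import base64
import re
from pathlib import Path

# Matches src="data:image/<type>;base64,<payload>" in double quotes only.
DATA_URI = re.compile(
    r'src="data:image/(?P<ext>[A-Za-z0-9.+-]+);base64,(?P<data>[^"]+)"'
)


def extract_inline_images(html: str, image_dir: Path, prefix: str = "image") -> str:
    """Write each inline Base64 image to image_dir and return rewritten HTML."""
    image_dir.mkdir(parents=True, exist_ok=True)
    count = 0

    def _save(match: re.Match) -> str:
        nonlocal count
        count += 1
        # "svg+xml" becomes "svg"; other subtypes are used as the extension.
        ext = match.group("ext").split("+")[0]
        target = image_dir / f"{prefix}-{count}.{ext}"
        target.write_bytes(base64.b64decode(match.group("data")))
        # Point the tag at the saved file, relative to the HTML file.
        return f'src="{image_dir.name}/{target.name}"'

    return DATA_URI.sub(_save, html)
```

The returned HTML should be saved next to `image_dir` so that the relative `src` paths resolve.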
Hope this helps to give you an idea!

How can I use python to edit docx and/or doc file tags on a windows system?

I have a folder with a large number of .doc and .docx files. I would like to develop a Python script to edit the tags of each file so I can find a file in the folder using the tags, thus making my life a little easier.
I am unsure of how to even start and was hoping someone could point me to a library or provide some sample code to help me get started.
I am not sure if the file extension matters, because this seems to be a Windows property (right-click file > Properties > Details > Tags > type in tags), but if the extension matters I can change all the files to .docx.
The python-docx package provides methods to access most of the metadata in a Word file. The class docx.opc.coreprops.CoreProperties specifically allows you to modify author, category, etc. I didn't see tags mentioned, but if you do some more research I'm sure you can find it.
docx.opc.coreprops.CoreProperties.keywords can be used to update the tags of a .docx file (python-docx does not read the older .doc format).
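With python-docx installed, the property above is reached through `Document(path).core_properties.keywords`, followed by `document.save(path)`. To show what that property corresponds to inside the file, here is a minimal standard-library sketch that edits the `cp:keywords` element of `docProps/core.xml` directly. The function names are hypothetical, and it assumes the file already contains a core-properties part (files saved by Word normally do). Try it on copies first.

```python
import shutil
import tempfile
import xml.etree.ElementTree as ET
import zipfile

CORE_PART = "docProps/core.xml"
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "dcmitype": "http://purl.org/dc/dcmitype/",
    "xsi": "http://www.w3.org/2001/XMLSchema-instance",
}
# Keep the usual prefixes when the XML is written back out.
for _prefix, _uri in NS.items():
    ET.register_namespace(_prefix, _uri)

KEYWORDS_TAG = f"{{{NS['cp']}}}keywords"


def get_docx_keywords(path: str):
    """Return the keywords (tags) string, or None if it is not set."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read(CORE_PART))
    node = root.find(KEYWORDS_TAG)
    return node.text if node is not None else None


def set_docx_keywords(path: str, keywords: str) -> None:
    """Rewrite the .docx at path with cp:keywords set to the given string."""
    with tempfile.NamedTemporaryFile(suffix=".docx", delete=False) as tmp:
        tmp_path = tmp.name
    with zipfile.ZipFile(path) as src, \
            zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == CORE_PART:
                root = ET.fromstring(data)
                node = root.find(KEYWORDS_TAG)
                if node is None:
                    node = ET.SubElement(root, KEYWORDS_TAG)
                node.text = keywords
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            # Every other part is copied through unchanged.
            dst.writestr(item, data)
    shutil.move(tmp_path, path)
```

For a whole folder, you would loop over the .docx files and call `set_docx_keywords` on each; the .doc files would need to be converted to .docx first.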

GitBook document file conversion

I have created a GitBook account and am working on some documentation, but I am having difficulty converting my work into other formats (PDF, MOBI and EPUB). I have the link to my work, but how do I convert it into one of these formats? Please help me out.
Calibre (https://calibre-ebook.com/) is what you want.
Calibre can convert between e-book formats: it takes input such as PDF, EPUB or MOBI and converts it to other book formats (PDF, AZW3, EPUB, MOBI, even HTML).

Lucene 4.2.0 index pdf

I am using example source code from the Lucene 4.2.0 demo API:
http://lucene.apache.org/core/4_2_0/demo/overview-summary.html
I run IndexFiles.java to create an index from a directory of rtf, pdf, doc, and docx files. I then run SearchFiles.java and encounter several instances where my searches fail, i.e. a search does not return a document that contains the word I searched for.
I suspect it has to do with Lucene 4.2.0 not being able to correctly index non .txt files without additional customization.
Question: Can the IndexFiles.java source code (Lucene 4.2.0) correctly index pdf, doc, docx files as it is written in the provided link? Does anyone have examples or references on how to code that functionality?
Thank You
No, it can't. IndexFiles is a demo, an example for you to learn from, and is not really designed for production use. If you take a look at the code, you'll see it just reads each file as plain text through a FileInputStream (wrapped in an InputStreamReader, wrapped in a BufferedReader). Generally, Lucene does not parse different file formats (except its own index files, of course). How to parse a file to provide meaningful content to Lucene is up to you to define.
Apache Tika might be a good place to look for this functionality. Here is a simple example using Tika with Lucene.
You might also consider using Solr.
