I have a PDF file containing text and tables, and I want to extract the text from a region of interest (ROI).
I used pdfplumber to get the starting and ending coordinates of the region. I then cropped the PDF between these coordinates and tried to extract the text, but it didn't work: the cropped PDF displays only the ROI, yet the PDF stream apparently still holds all the content of the original page, so extraction returns the text of the whole page. I don't want to convert the cropped PDF to an image and run OCR on it because of the risk of inaccuracy. Any help on extracting text using those coordinates is much appreciated. Thanks in advance.
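One possible approach, as a minimal sketch rather than a definitive answer: instead of writing a cropped PDF and re-reading it, crop the page object in pdfplumber and extract from that. `Page.crop(bbox)` returns a view of the page limited to the bounding box `(x0, top, x1, bottom)`, and `extract_text()` on that view should return only the text inside it. The function names, file name, page index, and coordinates below are hypothetical placeholders. `words_in_bbox` is an alternative that filters the word dictionaries returned by `page.extract_words()` by their coordinates.

```python
def words_in_bbox(words, bbox):
    """Keep word dicts (shaped like pdfplumber's page.extract_words() output)
    that lie entirely inside bbox = (x0, top, x1, bottom)."""
    x0, top, x1, bottom = bbox
    return [
        w for w in words
        if w["x0"] >= x0 and w["top"] >= top
        and w["x1"] <= x1 and w["bottom"] <= bottom
    ]


def extract_roi_text(pdf_path, page_index, bbox):
    """Return the text found inside bbox on one page of the PDF."""
    import pdfplumber  # imported here so words_in_bbox works without it

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[page_index]
        # crop() restricts the page's objects to the box before extraction
        return page.crop(bbox).extract_text()


# Hypothetical usage (file name and coordinates are placeholders):
# text = extract_roi_text("input.pdf", 0, (50, 100, 400, 300))
```

If characters on the edge of the region get cut off, `page.within_bbox(bbox)` is a stricter variant that keeps only objects lying fully inside the box.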
I am using a rich text multiline textbox field in SharePoint 2016, where users can add text and images when submitting data from a Nintex form. The data is saved successfully in a SharePoint list.
The issue is that when users export the data to Excel, only the text of the multiline textbox field is exported, not the images. Please suggest whether the images can be included in the exported data as well.
An early response would be highly appreciated!
A picture inserted in a rich text field is stored in a separate location, so exporting the list to an Excel file exports only the text of the rich text field. You could use a Hyperlink or Picture column to bring the picture URL from the list into the Excel file.
I have a PDF that contains some MCQ questions, and the right answer to each is colored and underlined.
I want to extract all the answers from the PDF and put them on the last page.
I use PyPDF2 to export the text from the PDF to a text.txt file.
Now I want to know how to get the colored or underlined words from a PDF with Python,
so that I can put them in a list and do what I want with them.
What can I use to do that?
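One option to explore, sketched under stated assumptions: pdfplumber exposes every character on a page in `page.chars`, and each character dictionary carries a `non_stroking_color` entry (the fill color). The hypothetical helper below joins consecutive characters whose color differs from the body text color. It assumes the body text is encoded as RGB black `(0, 0, 0)`; a PDF that uses grayscale or CMYK would need a different `body_color`. Underlines are not handled here, since PDFs usually draw them as separate line or rectangle objects rather than as a text attribute.

```python
def highlighted_runs(chars, body_color=(0, 0, 0)):
    """Join consecutive characters whose fill colour differs from body_color.

    chars: dicts shaped like pdfplumber's page.chars entries, with at least
    "text" and "non_stroking_color" keys.
    """
    runs, current = [], []
    for ch in chars:
        color = ch.get("non_stroking_color")
        if isinstance(color, (list, tuple)):
            color = tuple(color)  # normalise lists to tuples for comparison
        if color is not None and color != body_color:
            current.append(ch["text"])
        elif current:
            # a body-coloured character ends the current highlighted run
            runs.append("".join(current).strip())
            current = []
    if current:
        runs.append("".join(current).strip())
    return [r for r in runs if r]


def answers_from_pdf(pdf_path, body_color=(0, 0, 0)):
    """Collect highlighted runs from every page of the PDF."""
    import pdfplumber  # imported here so highlighted_runs works without it

    answers = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            answers.extend(highlighted_runs(page.chars, body_color))
    return answers
```

Printing a few entries of `page.chars` first shows how the colors are actually encoded in a given file, which tells you what to pass as `body_color`.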
I am trying to put an SVG file into a PDF. It has some rounded/transformed text (the company name).
The PDF is created, but the rounded text comes out straight instead of rounded in the output. I have tried both DomPDF and TCPDF to create the PDF.
Here is my SVG
I'm using Office 2010 interop and C# 4. How can I convert just the 1st page of a Word document to PDF? This question ("How do I convert Word files to PDF programmatically?") helped me get started, but it only shows how to save the whole document as PDF.
Is there a way:
to save just the 1st page as PDF? (ideal option)
to delete all remaining pages and then save as PDF?
How do I go about doing it?
You can click Save As, change the type to PDF, and then click the Options button above the Save button. There you should be able to choose which pages to convert to PDF.
Use the Save as PDF option to save the entire document as PDF.
To get the first page, you can then use PDFsharp, an open-source C# library for processing PDFs.
Here is an example to split pdf documents.
In case it's helpful for someone: in Word 2016, select Save As and choose PDF as the file format. Once you've done this, a new 'Options' button appears in the same window, to the left of the OK button. Click it and choose the page range that you want to save.
Document.ExportAsFixedFormat is a better fit for this (see MSDN).
Then you simply write something like this:
// Export only page 1 as PDF (From and To are page numbers).
doc.ExportAsFixedFormat(path, WdExportFormat.wdExportFormatPDF,
    Item: WdExportItem.wdExportDocumentWithMarkup,
    CreateBookmarks: WdExportCreateBookmarks.wdExportCreateHeadingBookmarks,
    Range: WdExportRange.wdExportFromTo, From: 1, To: 1);