This is my project topic given by my college. Can somebody please give me an idea of where to start with this topic?
I have seen a lot of topics on pdf vulnerability but the problem is they require knowing a lot of security stuff beforehand. I have less than a week to submit the project.
If somebody could just guide me to where I should start I would be really grateful.
I have already looked up Didier Stevens's site, but it's getting really tough for me to understand since there is so little time.
The most important point about PDF security is that most 'popular' attacks target:
application-related vulnerabilities in the most popular free PDF reading applications: Adobe Reader and Foxit Reader;
humans, to get them to click on a malicious attachment inside the PDF and so initiate the attack.
Check these analysis and parsing utilities and documents:
Didier Stevens's PDF tools, which include make-pdf-javascript.py (a JavaScript injection tool), pdfid.py (which scans a PDF for keywords, including those that signal embedded JavaScript), and others;
PDF Stream Dumper and its source code;
PDFMiner - a PDF parsing library written in Python;
PDF.js - a JavaScript-based PDF renderer that can help you learn PDF structure parsing right from your browser console (widely used by online services such as Dropbox);
Official PDF Format Specification from Adobe for PDF 1.4 and PDF 1.7.
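To get a feel for what a keyword scanner in the spirit of pdfid.py does, here is a minimal Python sketch. It is not pdfid.py's actual code: the function name `count_pdf_keywords` and the keyword subset are assumptions made for illustration, and the scan only looks at the raw bytes, so it will miss names hidden by hex escapes (`#xx`) or inside compressed streams.

```python
import re

# Illustrative subset of PDF names worth flagging during triage.
# This list is an assumption for the sketch, not pdfid.py's full list.
SUSPICIOUS_NAMES = [
    b"/JavaScript",
    b"/JS",
    b"/OpenAction",
    b"/AA",
    b"/Launch",
    b"/EmbeddedFile",
]


def count_pdf_keywords(data: bytes) -> dict:
    """Count occurrences of each suspicious PDF name in raw file bytes."""
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # Require that the name is not followed by a letter or digit,
        # so that a short name does not match the prefix of a longer one.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9])"
        counts[name.decode("ascii")] = len(re.findall(pattern, data))
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python scan.py suspicious.pdf
    with open(sys.argv[1], "rb") as handle:
        for keyword, hits in count_pdf_keywords(handle.read()).items():
            print(f"{keyword:15} {hits}")
```

A non-zero count for /JavaScript, /OpenAction or /Launch does not prove a file is malicious; it only tells you where to look first with the other tools in the list.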
I am trying to parse text from a PDF file using Computer Vision 2.0. I am following the example and have changed the MediaTypeHeaderValue to "application/pdf". I get an error that the content type is not supported. When I change it to "multipart/form-data", I get an error in processing. How do I use Computer Vision to process PDF files?
Kevin,
You are using the legacy "OCR" API, which does not support PDF input. Please use the new OCR technology available as the "Read" API - see the overview for processing PDF documents. Version 3.0 has been in GA since May. Read supports large images and multi-page, mixed-language documents up to 2000 pages long.
Please see the Read REST API QuickStart in C#.
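For readers who are not on C#, the shape of the Read call can be sketched in Python with only the standard library. This is a rough sketch, not the QuickStart's code: the helper name `build_read_request` is made up here, the endpoint and key are placeholders for your own resource, and the v3.0 path and header names are written from my understanding of the REST API, so check them against the current reference before relying on them.

```python
import urllib.request


def build_read_request(endpoint: str, key: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request that submits a PDF to the Read API.

    `endpoint` and `key` are placeholders for your own Computer Vision resource.
    """
    url = endpoint.rstrip("/") + "/vision/v3.0/read/analyze"
    return urllib.request.Request(
        url,
        data=pdf_bytes,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            # Raw bytes of the document, rather than "multipart/form-data".
            "Content-Type": "application/octet-stream",
        },
    )


# Sending it (requires a real endpoint and key, so it is left commented out):
#
#   with open("document.pdf", "rb") as handle:
#       request = build_read_request(ENDPOINT, KEY, handle.read())
#   with urllib.request.urlopen(request) as response:
#       # Read is asynchronous: the response is expected to carry an
#       # Operation-Location header whose URL you poll for the result.
#       result_url = response.headers["Operation-Location"]
```

The key difference from the legacy OCR call is that Read is a two-step operation: you submit the document, then poll the returned URL until the analysis has finished.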
Note that Form Recognizer is great if you want to extract not just text, but layout insights such as tables, check-boxes, and key value pairs from forms, use pre-built models, and build custom models to process your documents. It's now in GA.
Take a look at the Form Recognizer service for extracting data from the PDF.
https://azure.microsoft.com/en-us/services/cognitive-services/form-recognizer/
I want to generate a PDF from a URL
(https://10.1.40.117/print/e71b7c0f-4ed1-4d0d-b868-87418d398a4a).
Please point me to resources for doing this with Node.js.
I use Puppeteer to generate PDFs, and its documentation has many examples. Since it uses Chrom(e|ium), it closely matches my development environment as well, which is nice when building the web pages.
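As a side note, this is not Puppeteer: the sketch below only shows that the same browser can print a URL to PDF from its command line, driven here from Python for illustration. The binary name `chromium` and the helper names are assumptions; adjust the path for your system. The optional certificate switch is there because an internal HTTPS address like the one in the question often uses a self-signed certificate.

```python
import subprocess


def chrome_pdf_command(url, out_path, chrome_bin="chromium", ignore_cert_errors=False):
    """Return the argument list that asks headless Chrome to print `url` to a PDF.

    `chrome_bin` is an assumed binary name; it may be "google-chrome" or a
    full path on your machine.
    """
    cmd = [chrome_bin, "--headless", "--disable-gpu", f"--print-to-pdf={out_path}"]
    if ignore_cert_errors:
        # Only for hosts you trust, e.g. an internal server with a self-signed cert.
        cmd.append("--ignore-certificate-errors")
    cmd.append(url)
    return cmd


def render_pdf(url, out_path, **options):
    """Run the command; raises CalledProcessError if Chrome exits with an error."""
    subprocess.run(chrome_pdf_command(url, out_path, **options), check=True, timeout=120)
```

For anything beyond a one-off conversion (waiting for scripts, custom margins, headers and footers), a browser automation library such as Puppeteer gives you far more control than the bare command line.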
For those who might stumble on this question nowadays:
There is a cool tool called Gotenberg, a Docker-powered stateless API for converting HTML, Markdown and Office documents to PDF. It supports converting URLs via headless Google Chrome.
And I happen to be the author of a JS/TS client for Gotenberg: gotenberg-js-client
I welcome you to use it :)
UPD:
Gotenberg has a new website now: https://gotenberg.dev
I know you can extend Adobe Premiere Pro with some simple JavaScript. The problem with that link (which I got to through the official Adobe website) is that all of the sample code links are outdated (they point to the wrong location of the file, or to lines that aren't correct anymore).
The second paragraph instructs you to install a bunch of things, none of which seem like things you "install". They also mention ExtendScript, and I can't tell whether it is already installed with my Premiere or not (it's not available on Creative Cloud, and the links I found for it on Adobe's website are, again, dead). I keep searching online and finding dead links to tutorials that no longer exist. Really, dead links everywhere.
I'm an experienced developer with a good JS background. I just want to know what I need, plus some simple examples of basic usage to get me started, and maybe working links to a cheat sheet I can use when I'm looking for available functions.
ExtendScript is the name of the old API for automating Premiere and other Adobe apps. It's built in, it can do basically anything that you can do with the GUI, and it's JavaScript-based.
There is an IDE for ExtendScript, the ExtendScript Toolkit (ESTK), which has a debugger and allows you to inspect data etc. It's perplexingly hard to find on the Adobe website; I found it through a DuckDuckGo search here. I installed it through the Creative Cloud desktop manager, though I'm not sure how you do that with the current version.
As far as documentation goes, you're right, it's dead link city. There is a JavaScript Tools Guide included with the ExtendScript Toolkit; on Windows it's in C:\Program Files (x86)\Adobe\Adobe ExtendScript Toolkit CC\SDK\. That covers creating UI elements, but doesn't explain Premiere's object model. AFAIK there is no official documentation for this; you have to use the ESTK data browser to look for yourself.
The CEP extensions are a new development and allow for easier integration with the host. I think you already have all the documentation there is for it. I'd advise that you pester Adobe to make it easier for developers like yourself to create tools for their users.
For anyone else who gets here from a Google search: you can also download the ESTK from this link: https://helpx.adobe.com/download-install/kb/creative-cloud-apps-download.html
Is there any library that can parse and generate a PNG from a Doc, Docx and PDF file?
We're implementing a training system using Node, Sails.js, Express and SQL and would like to generate some PNG image tiles for training modules based on a file upload.
I've done some searching and found some libraries in C# that can do all three, as well as a PDF-only implementation for Node, but can't find anything that does more than that.
A point towards any 3rd party libraries or standard implementations of this method would be great.
Thanks
You can do that sort of stuff with C# (probably only on Windows) because C# comes from the MS stable, the same stable that churns out doc and docx. I am not sure whether the same implementation would work on Linux or Mac (even with Mono).
If you want to achieve this in NodeJS, just create the app in C#, wrap it in a RESTful service and call that service from NodeJS (via Kue or something similar).
Honestly, converting file formats is a compute-intensive process. I wouldn't recommend doing it on the main thread anyway. If you're going to spawn a worker anyway, you might as well do it in C#, where it's perhaps faster.
Not necessarily an exact match for your requirement, but since you mentioned training purpose, I would recommend Watson Developer Cloud - it has document conversion among many other features which may be relevant and useful for your objective as a whole.
Speaking of the current problem, please see Document conversion overview to see how we can convert a PDF into a desired format such as HTML. Then you could actually get the PNG files from the HTML resource bundle.
Hope this helps.
I am writing a JBoss web app with Struts2 and would like to produce reports in PDF and XLS format. How can I do this? Are there popular packages that can do this for me?
Here's a list of PDF libraries for Java. We use iText extensively.
jFreeReport (scroll down on linked page to find) also offers Excel generation, though I have not used that.