I want to extract information from pdf - layout

I have a PDF in which an order number is mentioned on two different pages, and I have to check whether the two order numbers are the same. I have only a basic idea of document layout analysis. Can anyone help me with how to do this? I have to match several different things across a PDF of many pages.
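
Once the text of each page has been extracted, the comparison itself can be done with a regular expression per page. The following is a minimal sketch, not a definitive solution. It assumes the page texts are already available as strings (for example from a PDF text-extraction library) and that the order number follows a label such as "Order No". The pattern and the function names are hypothetical and would need adapting to the real document.

```python
import re

# Assumed label formats: "Order No: A-1001", "Order Number 12345", "Order #12345".
# The captured value must contain at least one digit.
ORDER_PATTERN = re.compile(
    r"order\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9\-]*\d[A-Z0-9\-]*)",
    re.IGNORECASE,
)


def find_order_numbers(page_texts):
    """Map 1-based page number -> list of order numbers found on that page."""
    found = {}
    for page_no, text in enumerate(page_texts, start=1):
        matches = ORDER_PATTERN.findall(text)
        if matches:
            found[page_no] = matches
    return found


def order_numbers_match(page_texts):
    """True when at least two occurrences exist and all of them are identical."""
    numbers = [n for nums in find_order_numbers(page_texts).values() for n in nums]
    return len(numbers) >= 2 and len(set(numbers)) == 1
```

Keeping the page number with each match makes it possible to report where a mismatch occurs, and the same approach extends to other fields by adding one pattern per field.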

Related

Extract from html to excel specific lines

Help greatly appreciated!
I'm trying to automate as much work as possible. I need to copy a lot of product information from my supplier's web page to mine. Doing it manually has been a pain in the ass, so I thought I could ask for your help.
I'm trying to extract ProductNumber, ProductTitle, ProductDescription, ProductWeightFormatted, and the picture URLs from ProductImageContainer. As I need to add some of my own info, I want to get each product in one row with 5 different cells.
I have been thinking and searching for a solution the whole day, but no luck.
Site where I am copying info from
Thank You!
Have you tried using getElementById, followed by innerText or textContent?
In your case it makes sense, as each product has its own div id.
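
If the pages are processed offline rather than in the browser, the same idea (look up an element by id and take its text) can be sketched with Python's standard library. This is only an illustration. It assumes that the field names from the question are element ids, that the markup is reasonably well formed, and that each page holds one product. The class and function names are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser

FIELDS = ["ProductNumber", "ProductTitle", "ProductDescription", "ProductWeightFormatted"]
# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ProductParser(HTMLParser):
    """Collects text inside elements whose id is in FIELDS, plus image URLs."""

    def __init__(self):
        super().__init__()
        self.parts = {name: [] for name in FIELDS}
        self.image_urls = []
        self._stack = []  # id (or None) of every currently open element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "ProductImageContainer" in self._stack and attrs.get("src"):
            self.image_urls.append(attrs["src"])
        if tag not in VOID_TAGS:
            self._stack.append(attrs.get("id"))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attribute the text to the innermost open element that is a wanted field.
        for element_id in reversed(self._stack):
            if element_id in self.parts:
                self.parts[element_id].append(text)
                break


def product_to_row(html):
    """Return one row of 5 cells: the four text fields and the image URLs."""
    parser = ProductParser()
    parser.feed(html)
    parser.close()
    row = [" ".join(parser.parts[name]) for name in FIELDS]
    row.append(";".join(parser.image_urls))
    return row


def rows_to_csv(rows):
    """Serialize rows as CSV text that Excel can open."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()
```

Calling product_to_row once per product page and passing the collected rows to rows_to_csv gives one line per product, with the image URLs joined by semicolons in the fifth cell.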

pdf invoice in node.js

I want to create a PDF invoice that has a dynamic table, i.e. the number of items in the table can be anywhere from 1 to 100.
My problem is that if the table is too large to fit on a single page, I need a new page to be added automatically.
Right now I am using pdfkit, which provides an addPage() function to add a page, but it has some design constraints and sometimes it is very difficult to convert a particular design into a PDF.
I am thinking about phantomjs, but I am not sure how it adds a new page dynamically.
Does anyone have a better solution for this?
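
Whichever renderer is used, the page breaks can be decided before anything is drawn: split the items into page-sized chunks and start a new page for each chunk. A minimal sketch in Python, assuming every row has the same height so that a fixed number of rows fits on a page. The name rows_per_page is a hypothetical parameter, and rows of varying height would need to be measured instead.

```python
def paginate(items, rows_per_page):
    """Split the invoice items into consecutive chunks, one chunk per page."""
    if rows_per_page < 1:
        raise ValueError("rows_per_page must be at least 1")
    return [items[i:i + rows_per_page] for i in range(0, len(items), rows_per_page)]
```

The rendering loop then draws the table header and one chunk per page, adding a page before every chunk except the first.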

How do you Calculate Multiple Pages to a Grand Total in LiveCycle?

I am not a programmer but I have to create an expense form for traveling. It has to be in PDF format (preferably Adobe Acrobat editable). I created the form with Excel and exported it to Adobe but, of course, the formulas do not transfer.
I have 3 pages that are identical for calculating travel expenses with the only difference being there is one extra cell on the first page that calculates totals from each page to a grand total. Sounds easy. All the pages calculate individually with no problem but I cannot reference the totals from the individual pages to the first page where the grand total is.
I am using Adobe LiveCycle because it "simplifies" the programming process for people like me. It has worked great so far, but this is the only problem I am having and the only thing holding me back from using the form.
All three pages are in the same document. When I use FormCalc for the totals within each page, it works great:
topmostSubform.Page1.P1TotalGrand::calculate - (FormCalc, client)
$=P1MileageTotal+P1TransAirTotal+P1CarRenTotal+P1HotelTotal+P1AllowTotal+P1PhoneTotal+P1MIETotal+P1BusMealTotal+P1OtherTotal
If I go to the next page, it looks like this:
topmostSubform.Page2.P1TotalGrand::calculate - (FormCalc, client)
$=P1MileageTotal+P1TransAirTotal+P1CarRenTotal+P1HotelTotal+P1AllowTotal+P1PhoneTotal+P1MIETotal+P1BusMealTotal+P1OtherTotal
I just want to be able to add them together. When I try, it doesn't recognize the 2nd page and I don't know why. The form is pretty basic and I would really appreciate any help. If you need any additional information, I'll be glad to oblige.
I believe you can do it. There should be many ways, and I am a newbie too, so here are two of them.
If you want the grand total of 2 pages, no matter which page you want it on, the formula should look like this:
topmostSubform.Page1.P1GREATTotalGrand::calculate - (FormCalc, client)
$=P1TotalGrand+P2TotalGrand
The trick is to select both fields by pressing and holding the Ctrl key and clicking each field you want, rather than typing the names in. This should solve the problem.
If you want the total to be on page one:
Make P2GreatTotalGrand global data, create the same data field on page one, and add this new field to the one already on page one to get the grand total on page one.

fetch data from ms-access to ms-word

I am looking to create an invoice in either MS Excel or MS Word. This invoice will contain several fields such as invoice no., customer name, product info, quantity, amount, date, customer address, phone no., etc. The invoice should generate a unique invoice number every time I open it. The vendor will then enter the customer's info and the product's info and click a submit button or save the document. The info entered in the invoice should be saved automatically in the MS Access database whenever the submit button is clicked or the document is saved.
Thus, all the customer records will be saved in the MS Access database. Whenever I need to find a particular customer, I should be able to search by invoice no. or any other field that is unique to that invoice. I hope my query is explained clearly. Please let me know the easiest way to do it. I do not have vast knowledge of this subject, so please give me suggestions that a novice can understand.
I think you are starting from the wrong end. Use an Access form to get the data and then run a mailmerge, the easiest way is to output a text file from Access as the data file and use a Word template for the merge.
An autonumber may suit for the invoice number as long as all you need is a unique number. If you need documented sequential numbers, you will have to create them yourself. How you do it will depend on the number of users working at the same time.
I can tell you now, generating Word files with Access is a bit of a pain in the ass. If you really want to do formatting, it gets hard (in my experience).
I ended up generating HTML files in which I could control everything, and opening them as .docs. But if you are really new to this, I suggest you start with some VBA tutorials, which explain how to get records from your database and loop through them to generate output. Then you can start looking at the file writing functions in VBA.
Can't find any tutorials real quick (my girlfriend is getting angry as we speak), but here is a sample:
http://www.access-programmers.co.uk/forums/showthread.php?t=25354
Just look around in fora, look for file generation and looping through records.
Hi, just reading your post. Like Remou, I would strongly suggest you use Access to enter and store the data. It is possible to have a user enter data into a spreadsheet and write the data back to an Access DB, but it is not something I would recommend for a novice. Here is a link to some code showing how it could be done.
Returning to your first question about creating the invoice: have you considered generating the invoices from Access using a report? They can be printed to PDF or exported to various electronic formats. Or is there a specific reason to use Word/Excel? If you are going down the route of using Word to generate the invoice, then use a template as Remou suggested. See this link for some samples, in the section titled Access > Word. I have used the examples as a basis for Access to Word. A number of the examples use a technology called DAO, though, which I understand will not be included in any operating system after Windows 7. Just something to be aware of.
For searching for a record in a database table, this link has one possible solution. The author has also included an example database.

Retrieving a sharepoint list in Infopath only shows first 100 records

I am retrieving a list of values from a SharePoint list, which works well, but my problem is that it only retrieves the first 100 records. There are currently 500 records that should be available.
Scenario: I have two comboboxes on an infopath form:
A List of Locations
A list of areas within the locations
The list of locations filters the list of areas, but because InfoPath seems to retrieve only the first 100 records, most of the locations do not show any areas, as there is nothing to filter.
By design, the query will only return the first page of results from the default view for the list. Change the item limit for the default view in SharePoint, and you'll change the returned values for InfoPath.
EDIT (links from my comments, here for greater readability):
Here are sources describing this fix in MSDN forum (scroll to the bottom), a blog comment that describes the SharePoint setting step-by-step, one with a screen cap of the somewhat counter-intuitive interface, and another describing performance implications on the server side.
Hope this helps.
Just documenting what I have discovered while trying to resolve the problem. I have not been able to change the default view yet, as I don't have permission to. That should change though.
One possible workaround I have found is to export the list to Excel, which contains all the data I was looking for. The file that SharePoint produces is an Excel Query file named something like "export.iqy". You can save the file and open it in Notepad. It will look something like the following:
WEB
1
http://SharepointSite/_vti_bin/owssvr.dll?XMLDATA=1&List={14C4ED2B-3050-4C47-B5F3-6333C3B0FB28}&View={8E6124E0-23F2-4BA2-86E7-96E7F36BAEC8}&RowLimit=0&RootFolder=%2fLists%2fSharepoint%20Sites
Selection={14C4ED2B-3050-4C47-B5F3-6333C3B0FB28}-{8E6124E0-23F2-4BA2-86E7-96E7F36BAEC8}
EditWebPage=
Formatting=None
PreFormattedTextToColumns=True
ConsecutiveDelimitersAsOne=True
SingleBlockTextImport=False
DisableDateRecognition=False
DisableRedirections=False
SharePointApplication=http://SharepointSite/_vti_bin
SharePointListView={8E6124E0-23F2-4BA2-86E7-96E7F36BAEC8}
SharePointListName={14C4ED2B-3050-4C47-B5F3-6333C3B0FB28}
RootFolder=/Lists/My list
You can take the third line, which is:
http://SharepointSite/_vti_bin/owssvr.dll?XMLDATA=1&List={14C4ED2B-3050-4C47-B5F3-6333C3B0FB28}&View={8E6124E0-23F2-4BA2-86E7-96E7F36BAEC8}&RowLimit=0&RootFolder=%2fLists%2fSharepoint%20Sites
And use that to retrieve the complete list. I added a new receive data connection, selected an XML document as the source and entered the above URL.
It is not formatted particularly nicely, but it returns all the data that I was expecting.
I think that Argalatyr's solution is much simpler at this point, but it depends on whether I am able to get the default view changed.
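
For reference, the URL from the exported file can be assembled from its parts rather than copied by hand. A minimal sketch in Python, assuming the site address and the list and view GUIDs are taken from the .iqy file as shown in this post. The function name is hypothetical, and the optional RootFolder parameter is left out.

```python
from urllib.parse import quote, urlencode


def build_owssvr_url(site_url, list_guid, view_guid, row_limit=0):
    """Build the owssvr.dll XML data URL in the same shape as the exported .iqy file."""
    params = {
        "XMLDATA": 1,
        "List": list_guid,
        "View": view_guid,
        "RowLimit": row_limit,  # the exported file uses RowLimit=0
    }
    # Keep the braces around the GUIDs unescaped, as in the exported file.
    query = urlencode(params, safe="{}", quote_via=quote)
    return f"{site_url.rstrip('/')}/_vti_bin/owssvr.dll?{query}"
```

The resulting string can be pasted into the XML data connection in place of the hand-copied third line.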
There is one more workaround that avoids such hardcoding. If you open the Query editor, a ribbon with menu items is available. Open "Home" -> "Select top rows" and enter a really high number (my list has 596 rows, so I entered 20000 as the limit of top rows and got the whole list).
Sorry, I don't have an English version of Excel available, so I cannot add screenshots.

Resources