Automating book citation search - search

I have a list of books, identified by title, in a text file. I want to write a script that uses a web service such as Google Scholar or Amazon to search for the books and return an XML or BibTeX file with citation info for each book.
Which programming tools can I use for this kind of automated search?

Python would be my recommendation.
Get names from the text file, simple file reading
Construct a REST URL request to google's book API
http://books.google.com/books/feeds/volumes?q=Elizabeth+Bennet&start-index=21&max-results=10
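Simple Python code to get data from this URL (you may need an API key; in Python 2, prefer urllib2 with error handling over urllib; in Python 3, both are replaced by urllib.request)
Sample code (Python 3),
import urllib.request
url = 'http://foo.api.request'  # placeholder URL
# Fetch the raw response body as bytes
data = urllib.request.urlopen(url).read()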
See the return schemas for this API (you can use the XML however you like).
See BibTeXML for conversion between the two formats.
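The steps above (read titles, build a request URL per title, emit a citation entry) can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names are hypothetical, the base URL is simply the one from the example request above and may no longer be served, and the BibTeX fields are filled from values you would extract from the response yourself. No network request is made here.

```python
from urllib.parse import urlencode

# Base endpoint copied from the example request above (assumption: still valid).
BASE_URL = "http://books.google.com/books/feeds/volumes"


def read_titles(lines):
    """Return the non-empty, stripped titles from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def build_search_url(title, max_results=10):
    """Build the query URL for one title, mirroring the example's parameters."""
    query = urlencode({"q": title, "max-results": max_results})
    return f"{BASE_URL}?{query}"


def to_bibtex(key, fields):
    """Format one @book entry from a dict mapping field names to values."""
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@book{{{key},\n{body}\n}}"


# Usage: in a real script the lines would come from open("books.txt").
titles = read_titles(["Pride and Prejudice\n", "\n"])
urls = [build_search_url(t) for t in titles]
entry = to_bibtex("austen1813", {"title": "Pride and Prejudice", "year": "1813"})
```

Keeping URL construction and BibTeX formatting in separate functions means the web service can be swapped without touching the output side.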
HTH

I think it could be useful if you specify what kind of script you want to write!
Anyway... you could do the low-level work and write your own HTTP requests for Google and Amazon, or you could just rely on their APIs, for example: http://code.google.com/apis/books/
There is a great project that does something similar to what you want to do; it's called Shelves. It's written for Android but should give you some ideas about how to handle your requests. Instead of downloading citations, it downloads the cover.
http://code.google.com/p/shelves/
Just as a quick side note, saving your books in an XML file could be an option as well. In some cases it makes parsing them easier.

Related

Google Docs: Table of content page numbers

We are currently building an application on the Google Cloud Platform that generates reports in Google Docs. For them, it is really important to have a table of contents ... with page numbers. I know this has been a feature request for a few years and that there are add-ons (Paragraph Styles +, which didn't work for us) that provide this, but we are considering building it ourselves. If anybody has a suggestion on how we could start, it would be a great help!
thanks,
Best bet is to file a feature request on the product forums.
Currently the only way to do that level of manipulation of a doc to provide a custom TOC is to use Apps Script. It provides enough access to the document structure to build and insert a basic table of contents, but I'm not sure there's enough to do paging correctly (unless you force a page break on every page...). There's no method to answer the question "what page is this element on?"
Hacks like writing to a DOCX and converting don't work because TOCs are recognized for what they are and show up without page numbers.
Of course you could write a DOCX or PDF with the TOC as you'd like and upload as a blob rather than as a Google Doc. They can still be viewed in Drive and such.

Export google search to a spreadsheet

Is it possible for me to create a list of google search results from a specific query and export it into excel? For example, I'd like to google orthodontists in Florida and be able to export the business name, phone number and address to an excel spreadsheet. I've done a lot of searching but I can't find any solutions. I'm looking for someone to point me in the right direction. Any help is appreciated, thanks.
An API is an Application Programming Interface: a way for your software to interact with the software on a server. Google has an API called the "Custom Search Engine" which you can use for 100 free queries per day. Other search engines may have more generous free APIs. With a search API you can write code to download text that contains all the relevant data. You can read more about search engine APIs here.
Another way to collect data from Google is to scrape their pages. This means that you use code to download the HTML, and from that HTML you collect the relevant pieces (wikipedia link). With a programming language like Python, many people use the Beautiful Soup library for scraping. With code you can then take the relevant parts of the HTML and put them into a format like CSV that is readable by Excel. With Python there are ways to write to Excel directly, too (link).
Finally, here is a link from 2007 that says with Google Spreadsheets you can import HTML.
Update: here is the MS Excel version.
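The scraping route described above (download HTML, pick out the relevant pieces, write CSV) might look roughly like this. It is only a sketch under stated assumptions: it uses the standard library's html.parser in place of Beautiful Soup so that it runs without extra installs, and the markup, class names and sample listings are all made up for illustration. Real result pages are structured differently and change often.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup standing in for a downloaded page.
SAMPLE_HTML = """
<div class="listing"><span class="name">Smile Orthodontics</span>
<span class="phone">555-0100</span><span class="address">1 Main St, Tampa, FL</span></div>
<div class="listing"><span class="name">Bay Braces</span>
<span class="phone">555-0101</span><span class="address">2 Bay Rd, Miami, FL</span></div>
"""

FIELDS = ("name", "phone", "address")


class ListingParser(HTMLParser):
    """Collect one dict per <div class="listing"> from its <span> children."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # name of the field whose text we are inside, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "div" and cls == "listing":
            self.rows.append({})
        elif tag == "span" and cls in FIELDS and self.rows:
            self._field = cls

    def handle_data(self, data):
        if self._field:
            self.rows[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def listings_to_csv(html):
    """Parse the listings and return them as CSV text that Excel can open."""
    parser = ListingParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()


csv_text = listings_to_csv(SAMPLE_HTML)
```

The csv module quotes the address field automatically because it contains commas, which is the usual source of broken columns when CSV is assembled by hand.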
The following web app https://www.resultstoexcel.com/ allows you to download Google search results to a CSV file, a Microsoft Excel readable format, for free.
If you have any problem viewing the downloaded results correctly in MS Excel, please read the FAQs section where you will find how to open the file using the correct column separators.
Where are the results coming from?
A Google search on the topic retrieves many companies that offer online access to Google Search Results through an Application Programming Interface (API). This web app uses SERPSBOT API.

Markdown to HTML conversion

I'm still in the middle of coding my final year project at university, and I have come across an issue where I need to convert either from HTML to Markdown or vice versa. I have no experience whatsoever of Perl, Python, etc., so I need an easy-to-implement solution; I only have about 6 weeks left to complete this. I'm writing the data from a WMD text box to SQL Server, and I can upload it as either Markdown or HTML, but if that data needs editing it cannot be in HTML, as this would be too confusing for the end user, who is assumed to have little or no computing "know how".
What should I do?
Karmastan's answer is probably the best here. Keeping the raw Markdown in the database is a really good solution as it allows users to upkeep the content in a form with which they're familiar.
However, if you have a bunch of HTML which is already converted, you might want to look at something like Markdownify: The HTML to Markdown converter for PHP.
Edit: based on what you've said below, there are a few things you should keep in mind:
Make sure that the following is set in wmd.js:
wmd_options = {"output": "Markdown"};
This ensures that you're storing Markdown in the database.
Source: How do you store the markdown using WMD in ASP.NET?
When outputting the Markdown to the web, you need to transform it to HTML. To do this, you'll need a library which does Markdown -> HTML conversion. Here are two examples:
Announcing Markdown.NET
Revisied Markdown.NET Library
I'm not a .NET developer, so I can't really help with how these libraries should be used, but hopefully the documentation will make that clear.
If you look at the web site for Markdown, you'll find a Perl script that converts Markdown-syntax documents to HTML. Keep Markdown text in your database and invoke the script whenever you need to display the text. No Perl knowledge required!
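Invoking an external converter at display time, as suggested above, amounts to piping the stored Markdown into the script and reading HTML back. A rough sketch follows, shown in Python only for illustration since the asker's stack is .NET. The function name and the path to Markdown.pl are assumptions, and the demo uses a stand-in filter (it upper-cases its input) so that the sketch runs without Perl installed.

```python
import subprocess
import sys


def render_with_filter(text, command):
    """Pipe text to an external filter's stdin and return what it prints."""
    result = subprocess.run(
        command,
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the filter exits non-zero
    )
    return result.stdout


# Assumed invocation of the Perl script; the path is hypothetical.
MARKDOWN_COMMAND = ["perl", "Markdown.pl"]

# Stand-in filter for the demo: upper-cases whatever it reads from stdin.
DEMO_COMMAND = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

demo_output = render_with_filter("*hello*", DEMO_COMMAND)
```

With Perl available, passing MARKDOWN_COMMAND instead of DEMO_COMMAND would return the converted HTML, while the database keeps only the raw Markdown.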

How do I aggregate data off of a google search

I am trying to aggregate movie times off of google/movies search into a usable format such as json or xml
http://www.google.com/movies?q=movie+times&sc=1&mid=&hl=en&oi=showtimes&ct=change-location&near=new+york
The Google AJAX api does not seem to work for this as you cannot do a movie search.
Does anyone know how this can be done?
Look up the technique called web scraping.
Basically, you have to fetch the results page using some server-side scripting, and then extract data from it to present in a formatted way (JSON, XML, etc.). Regular expressions or a DOM/XML parser could help.
This guy has a PHP script that converts Google results to RSS.
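The extract-and-reformat step described above can be sketched with regular expressions and the json module. This is an illustration only: the markup, class names and film titles are invented, the real showtimes page looks different, and regular expressions are brittle against markup changes, so a proper HTML parser is usually the safer choice for anything long-lived.

```python
import json
import re

# Hypothetical markup standing in for the fetched results page.
SAMPLE_HTML = (
    '<div class="movie"><h2>Example Film</h2>'
    '<span class="time">1:30pm</span><span class="time">4:00pm</span></div>'
    '<div class="movie"><h2>Another Film</h2>'
    '<span class="time">7:15pm</span></div>'
)

# One match per movie block: group 1 is the title, group 2 the rest of the block.
MOVIE_RE = re.compile(r'<div class="movie"><h2>(.*?)</h2>(.*?)</div>', re.S)
TIME_RE = re.compile(r'<span class="time">(.*?)</span>')


def showtimes_to_json(html):
    """Return a JSON array of {"title": ..., "times": [...]} objects."""
    movies = [
        {"title": title, "times": TIME_RE.findall(body)}
        for title, body in MOVIE_RE.findall(html)
    ]
    return json.dumps(movies)


showtimes_json = showtimes_to_json(SAMPLE_HTML)
```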

Creating PDF Invoices - Are there any templating solutions? [closed]

Closed 8 years ago.
Our company is looking to integrate invoices into a new system we are developing.
We require a solution to create a layout of the invoice and then convert to pdf.
We have considered just laying out the invoice in html/css then converting to pdf.
We have also considered using SVG->PDF conversion.
Both of these solutions integrate well into our existing templating language used for our web application.
Historically we have been a Microsoft based business and used Crystal Reports for such a task but we are looking for an open source Linux solution for this project.
Does anyone have any suggestions of an approach or technology we could use for such a task?
Try this... create a blank invoice with Word (or whatever you want) and save it as a PDF.
Then use a PDF library to modify the PDF (insert the text at particular coordinates). We do this in the Microsoft world and it is extremely easy.
The biggest benefit is that we can use our own tools to create and modify the template. If we want to add some static text, we just crank open Word, make the change and save it to a PDF file (that is being used as a template).
For Microsoft, we use iTextSharp which is actually a C# port of the original Java version of iText
Additionally...
You can use Adobe Acrobat to insert fields in the PDF (address, phone, invoice number, line item 1, line item 2, etc...) and then use iText/iTextSharp to populate these fields at run time.
This is, in more detail, what we do... and it is extremely easy.
The normal way is to install (La)TeX (probably already on the linux box) and run pdflatex to get the pdfs. You can also use Apache FOP, if you prefer xslt and xsl-fo.
If the number of invoices to create is low you might want to use open-office (directly or as a toolkit).
If you want high-precision positioning and low-level access, a low-level pdf library (I don't know if iTextSharp works with mono) might be what you want.
I would try out LaTeX first, because it allows you to get results with the least effort.
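A minimal sketch of the LaTeX route: fill a .tex template with invoice values, escaping the characters LaTeX treats specially, then run pdflatex on the result. The template text, the "@" placeholder delimiter and all names are assumptions made for illustration; "@" is chosen because the default "$" of string.Template clashes with LaTeX.

```python
from string import Template

# Characters with special meaning in LaTeX and their escaped forms.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_escape(text):
    """Escape user-supplied text character by character for use in LaTeX."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


class TexTemplate(Template):
    """string.Template variant that uses @name placeholders instead of $name."""

    delimiter = "@"


# Hypothetical invoice template.
INVOICE_TEX = TexTemplate(r"""\documentclass{article}
\begin{document}
Invoice @number for @customer

Total due: @total
\end{document}
""")

tex_source = INVOICE_TEX.substitute(
    number=latex_escape("2024-001"),
    customer=latex_escape("Smith & Sons"),
    total=latex_escape("$120.00"),
)
# Write tex_source to invoice.tex, then run: pdflatex invoice.tex
```

Escaping one character at a time avoids double-escaping the backslashes that earlier replacements would otherwise introduce.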
I've previously produced invoices by templating a PostScript file, and then using Ghostscript's ps2pdf to convert those into PDFs.
We use Reportlab with Python. If you look around there are a load of ready-made forms/invoices/etc.
There are several OSS reporting engines (Jasper Reports, Pentaho and BIRT to name three) that you could use in much the same way as you have historically used Crystal Reports. One of the other posters mentions ReportLab, which is an option if you're using Python or can embed a Python runtime in your application.
Probably the most flexible solution is to create XML files with the invoice data and then use XSLT to transform them into PDF, HTML, whatever...
It depends on your environment. If you have access to Java, you might look at iText (http://www.lowagie.com/iText/), a library that allows you to generate PDF files on the fly.
There are two steps, if I understood correctly:
1) Creation of PDF template with placeholders to populate data programmatically
2) Populating the PDF template programmatically during run time
For #1, OpenOffice allows creation of PDF templates, which can then be populated programmatically. It's good enough for simple invoices that probably don't involve datagrid/table kind of stuff.
For #2, you already have the answers here - iText, iTextSharp.
Hope this helps!
I love wkhtmltopdf http://code.google.com/p/wkhtmltopdf/
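For the HTML-then-convert approach, the templating half can be sketched with the standard library alone; the resulting file is then handed to a converter such as wkhtmltopdf. The template, function name and sample values below are hypothetical, and amounts are plain floats only to keep the sketch short (decimal.Decimal would be the safer choice for real money).

```python
from html import escape
from string import Template

# Hypothetical invoice layout; real templates would carry CSS for print sizing.
INVOICE_TEMPLATE = Template("""<html><body>
<h1>Invoice $number</h1>
<p>Bill to: $customer</p>
<table>
$rows
</table>
<p>Total: $total</p>
</body></html>""")

ROW_TEMPLATE = Template("<tr><td>$description</td><td>$amount</td></tr>")


def render_invoice(number, customer, items):
    """Render an invoice; items is a list of (description, amount) pairs."""
    rows = "\n".join(
        ROW_TEMPLATE.substitute(description=escape(desc), amount=f"{amount:.2f}")
        for desc, amount in items
    )
    total = sum(amount for _, amount in items)
    return INVOICE_TEMPLATE.substitute(
        number=escape(number),
        customer=escape(customer),
        rows=rows,
        total=f"{total:.2f}",
    )


invoice_html = render_invoice(
    "2024-001", "Smith & Sons", [("Consulting", 100.0), ("Hosting", 20.5)]
)
# Save invoice_html to invoice.html, then run: wkhtmltopdf invoice.html invoice.pdf
```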
Not sure what your goal is here, but there is an opensource php-library called fpdf, which also has an extension for taking a pre-made pdf as layout and then populate it with more content, generating a new PDF with that info.
However, I would go for a solution that you can integrate nicely into the platform you're building, but I wouldn't go for an HTML->PDF solution, since in that kind of environment you won't have any clue about what fits on a piece of paper, meaning you won't know when to split the content across two separate templates.
You might also try using XSL:FO. XSL:FO is a documented standard for describing page layout: http://www.w3.org/TR/xsl/#fo-section.
I've had success on two projects creating documents by creating an XML schema that defines the content of the "PDF". I then use the XSD tool (from Microsoft) to generate a class representing this document. I then map my data into that structure, serialize the populated class to XML, and pass it, along with an XSL stylesheet that defines how that data should be mapped into FO, to an FO formatter. For formatters, I have used Alt-Soft's Xml2Pdf with success. There are a few others out there. There are some tools available to help create the XSL to FO stylesheet (e.g. stylusstudio and XmlSpy), but I recommend learning the FO constructs, as the tools seem to produce bloated stylesheets. FO is comparable to HTML (where a P tag is a BLOCK tag in FO), but can be tricky. The nice thing about FO is that some formatters support conversion to other formats, such as Word, HTML, etc.
Other options:
iTextSharp (C# port of iText). Just started reading about this. Open source and free. I don't think there is any "templating" supported with this, but I could be wrong about that.
SQL Server Reporting Services. Assuming your invoice data is in, or can be put in, a format that can be read by reporting services (SQL Server, Web Service, etc), define the layout in SSRS and then publish to reporting server. Use SSRS Web Services or query parameter execution to execute the report and have it output as PDF.
This html-2-pdf site may be a helpful starting point: http://maarten.lippmann.us/?p=101
A site a friend of mine built uses a script to churn HTML pages into printable PDFs, too - http://philambdaupsilon.org. Not sure on the exact details of it, but he is an SO user, and I'll send this question to him, too.
Unfortunately, the best system on the market (at present) is to pass the HTML & CSS to a ColdFusion server and have it return the rendered PDF. So if money isn't a big concern, this is the quickest solution to deploy and will render the best results.
I've tried very hard to get FPDF, TCPDF, the R&OS pdf class, and even CodeIgniter's recommendation to work, but none gave stable output for anything beyond the most basic/bland HTML files.
Honestly, if the ColdFusion solution isn't viable, I'd use html2ps, and then ps2pdf to convert your files into a PDF.
(This is all assuming that you don't want to take the time to design each PDF using the native PDF-creator code in PHP. This is what systems like SugarCRM use. Though it's very functional with stable results, the actual creation of each PDF-generator file is a most painful process.)
We have used Jasper Reports before. It's not what you'd call user-friendly, but it will talk directly to your database.
html2pdf works very well. You can use this to generate both HTML and PDF reports from the same source.
I'm fiddling with Black Sheep Invoices right now, which is great at first, but now I'm having trouble actually getting it to render the PDFs. Lots of installation difficulties; it's probably a lot easier on your own server, but I'm up on a shared host with it. The HTML output and data management portions are well done though, which is something you won't get out of just creating a PostScript template. I was hoping to find a reference to a library that has an active development team though (Black Sheep is not being updated at this time).
If you want browser perfect HTML converted to PDF then try commandlineprint
You'll need to install firefox on a linux distro, disable all firefox alerts and then run it through a virtual display. Check this thread for more details.
It's infuriating to get running well but does give you the best results for HTML to PDF conversion I've seen.
OK, a search of Google Code projects turned up Simple Invoices, which is awesome and well maintained.
I use TROFF for my invoices because of its extremely simple textual encoding. The logic is a few lines of Perl. Keeping it simple.
For a Ruby solution, try Prawn: http://prawn.majesticseacreature.com/
I use open office on the server and then generate the XML for the document (just unzip the document and hack away)
You can use the Dhek template editor to define areas/placeholders on an existing PDF without altering the document, and then populate them to generate the final doc (e.g. with user values from a form): https://github.com/applicius/dhek .

Resources