I have used jsreport to generate a PDF containing Thai-language text.
My whole process is:
1. Get the CSV file from S3.
2. Read the data and convert the CSV to an object.
3. Send the object to jsreport.
I have already done the whole process locally and it works fine.
But when I deploy this project on Lambda (jsreport can be attached to Lambda; more details at this URL: jsreport-aws-lambda) and test it, it does not display the Thai language (and I think maybe other languages too).
At first I thought it was because of the encoding ('base64'); I tried to change it to utf-8, but the file came out corrupted.
I have already set the meta content of the HTML file to
enter code here
but it's still not working.
What can I do to solve this? Please help me.
Thanks.
I've got a question. Is it possible to parse xls files by coordinates? I've searched some npm modules, but most of them only convert to JSON or CSV. So does anybody know how to do that?
Use node-xlsx.
It will help you parse ".xlsx" files, i.e.
const xlsx = require('node-xlsx');
const workSheetsFromFile = xlsx.parse(__dirname + '/myFile.xlsx');
The parsed result is an array of sheet objects whose data is a two-dimensional array, so a cell can be addressed by its row and column indices. Read the documentation for more functionality.
I am new to Python and I'm using this script to get tweets. But the problem is that it is not giving the full text; instead it is giving me the URL of the tweet.
Output:
'
"text": "#Damien85901071 #Loic_23 #EdwinZeTwiter #Christo33332 #lequipedusoir #Cristiano #RealMadrid_FR #realfrance_fr\u2026 ' ShortenURL",
What changes do I need to make in this script to get the full text?
Look into Twitter's tweet_mode=extended option and the places in the Python code where you might need to add that to the script.
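For example, if the script happens to use Tweepy (an assumption here, since the linked script isn't shown), the change might look roughly like this; the keys and screen_name are placeholders:

import tweepy

# Placeholder credentials - replace with your own app's keys.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth)

# tweet_mode="extended" asks Twitter for the untruncated tweet.
for status in api.user_timeline(screen_name="some_user",
                                count=10,
                                tweet_mode="extended"):
    # In extended mode the text is in full_text rather than text.
    print(status.full_text)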
I'm trying to download an excel file which is uploaded on a Sharepoint 2013 site.
My code is as follows:
import requests
from requests_ntlm import HttpNtlmAuth

url = 'https://<sharepoint_site>/<document_name>.xlsx?Web=0'
author = HttpNtlmAuth('<username>', '<password>')
response = requests.get(url, auth=author, verify=False)
print(response.status_code)
print(response.content)
This gives me a long output which is something like:
x00docProps/core.xmlPK\x01\x02-\x00\x14\x00\x06\x00\x08\x00\x00\x00!\x00\x7f\x8bC\xc3\xc1\x00\x00\x00"\x01\x00\x00\x13\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xb8\xb9\x01\x00customXml/item1.xmlPK\x05\x06\x00\x00\x00\x00\x1a\x00\x1a\x00\x12\x07\x00\x00\xd2\xba\x01\x00\x00\x00'
I did something like this before for another site and got XML as output, which was acceptable for me, but I'm not sure how to handle this data.
Any ideas on how to process this into xlsx or XML?
Or maybe another way to download the xlsx? (I tried doing it through the wget library and the Excel file seems to get corrupted.)
Any ideas would be really helpful.
Regards,
Karan
It's a bit late, but I ran into a similar issue... thought it might help someone else.
Try writing the output to a file, or apply some encoding while printing.
Writing to a file (binary mode, since the response body is an xlsx binary):
file = open("./temp.xls", 'wb')
file.write(response.content)
file.close()
or, in text mode, for the decoded string:
file = open("./temp.xls", 'w')
file.write(response.text)
file.close()
Printing with encoding:
print(response.text.encode("utf-8"))
or simply
print(response.content)
(response.content is already bytes, so it does not need encoding.)
Make the appropriate imports, and use 'wb' when writing bytes and 'w' when writing text.
Hope this helps.
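If you then want the data back as rows, one option (an assumption beyond the answer above, using openpyxl, and assuming you saved the response as temp.xlsx since the download really is an xlsx workbook) is:

import openpyxl

# Open the workbook that was written from response.content.
workbook = openpyxl.load_workbook("./temp.xlsx")
sheet = workbook.active

# Print every row as a tuple of cell values.
for row in sheet.iter_rows(values_only=True):
    print(row)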
It seems that the file is encrypted and requests can't handle this.
Maybe the web service provides an API for downloading and secure decoding.
I have a PDF file and I want to verify whether the links in it are proper. Proper in the sense that all URLs specified are linked to web pages and nothing is broken. I am looking for a simple utility or a script which can do it easily!
Example:
$ testlinks my.pdf
There are 2348 links in this pdf.
2322 links are proper.
Remaining broken links and page numbers in which it appears are logged in brokenlinks.txt
I have no idea whether something like that exists, so I googled and searched on Stack Overflow as well, but did not find anything useful yet. So I would like to know if anyone has any idea about it!
Updated: to make the question clear.
You can use pdf-link-checker
pdf-link-checker is a simple tool that parses a PDF document and checks for broken hyperlinks. It does this by sending simple HTTP requests to each link found in a given document.
To install it with pip:
pip install pdf-link-checker
Unfortunately, one dependency (pdfminer) is broken. To fix it:
pip uninstall pdfminer
pip install pdfminer==20110515
I suggest first using the Linux command-line utility 'pdftotext'; you can find the man page here:
pdftotext man page
The utility is part of the Xpdf collection of PDF processing tools, available on most Linux distributions. See http://foolabs.com/xpdf/download.html.
Once installed, you could process the PDF file through pdftotext:
pdftotext file.pdf file.txt
Once processed, a simple Perl script can search the resulting text file for http URLs and retrieve them using LWP::Simple. LWP::Simple->get('http://...') will allow you to validate the URLs with a code snippet such as:
use LWP::Simple;
$content = get("http://www.sn.no/");
die "Couldn't get it!" unless defined $content;
That would accomplish what you want to do, I think. There are plenty of resources on how to write regular expressions to match http URLs, but a very simple one would look like this:
m/http[^\s]+/i
"http followed by one or more not-space characters" - assuming the URLs are property URL encoded.
There are two lines of enquiry with your question.
Are you looking for regex verification that the link contains key information such as http:// and valid TLD codes? If so, I'm sure a regex expert will drop by, or you can have a look at regexlib.com, which contains lots of existing regexes for dealing with URLs.
Or, if you want to verify that a website exists, then I would recommend Python + Requests, as you could script checks to see whether websites exist and don't return error codes.
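A minimal sketch of such a check (url_is_alive is just an illustrative helper name, not an existing API):

import requests

def url_is_alive(url, timeout=10):
    '''Return True if the URL answers without a client or server error code.'''
    try:
        # HEAD is cheap; fall back to GET for servers that reject HEAD.
        response = requests.head(url, allow_redirects=True, timeout=timeout)
        if response.status_code >= 400:
            response = requests.get(url, allow_redirects=True, timeout=timeout)
        return response.status_code < 400
    except requests.RequestException:
        return False

print(url_is_alive("https://www.example.com"))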
It's a task which I'm currently undertaking for pretty much the same purpose at work. We have about 54k links to get processed automatically.
Collect links by:
enumerating links using the API, or dumping as text and linkifying the result, or saving as HTML with PDFMiner.
Make requests to check them:
there is a plethora of options, depending on your needs.
The advice in https://stackoverflow.com/a/42178474/1587329 was the inspiration for writing this simple tool (see gist):
'''loads pdf file in sys.argv[1], extracts URLs, tries to load each URL'''
import urllib
import sys

import PyPDF2

# credits to stackoverflow.com/questions/27744210

def extract_urls(filename):
    '''extracts all urls from filename'''
    PDFFile = open(filename, 'rb')
    PDF = PyPDF2.PdfFileReader(PDFFile)
    pages = PDF.getNumPages()

    key = '/Annots'
    uri = '/URI'
    ank = '/A'

    for page in range(pages):
        pageSliced = PDF.getPage(page)
        pageObject = pageSliced.getObject()
        if pageObject.has_key(key):
            ann = pageObject[key]
            for a in ann:
                u = a.getObject()
                if u[ank].has_key(uri):
                    yield u[ank][uri]

def check_http_url(url):
    urllib.urlopen(url)

if __name__ == "__main__":
    for url in extract_urls(sys.argv[1]):
        check_http_url(url)
Save to filename.py, run as python filename.py pdfname.pdf.
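If you also want broken links logged to brokenlinks.txt, as in the question, a possible variation (just a sketch, under the same Python 2 / PyPDF2 assumptions as the script above, reusing its extract_urls from filename.py) would be:

'''logs URLs from the pdf in sys.argv[1] that fail to load into brokenlinks.txt'''
import urllib
import sys

# extract_urls comes from the script above, saved as filename.py
from filename import extract_urls

def url_is_ok(url):
    '''True if the URL loads and does not report an HTTP error code'''
    try:
        return urllib.urlopen(url).getcode() < 400
    except IOError:
        return False

if __name__ == "__main__":
    with open("brokenlinks.txt", "w") as log:
        for url in extract_urls(sys.argv[1]):
            if not url_is_ok(url):
                log.write(url + "\n")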