I have a Python script that fetches a Google AdWords report. It works as expected on my local machine, but when deployed as an AWS Lambda function I get the following error:
{
"errorMessage": "[Errno 30] Read-only file system: '/home/sbx_user1051'",
"errorType": "OSError",
"stackTrace": [
[
"/var/task/lambda_function.py",
24,
"lambda_handler",
"report_downloader = client.GetReportDownloader(version='v201809')"
],
[
"/var/task/googleads/adwords.py",
370,
"GetReportDownloader",
"return ReportDownloader(self, version, server)"
],
[
"/var/task/googleads/adwords.py",
1213,
"__init__",
"self.proxy_config, self._namespace, self._adwords_client.cache)"
],
[
"/var/task/googleads/common.py",
819,
"__init__",
"transport = _ZeepProxyTransport(timeout, proxy_config, cache)"
],
[
"/var/task/googleads/common.py",
667,
"__init__",
"cache = zeep.cache.SqliteCache()"
],
[
"/var/task/zeep/cache.py",
77,
"__init__",
"self._db_path = path if path else _get_default_cache_path()"
],
[
"/var/task/zeep/cache.py",
155,
"_get_default_cache_path",
"os.makedirs(path)"
],
[
"/var/lang/lib/python3.6/os.py",
210,
"makedirs",
"makedirs(head, mode, exist_ok)"
],
[
"/var/lang/lib/python3.6/os.py",
210,
"makedirs",
"makedirs(head, mode, exist_ok)"
],
[
"/var/lang/lib/python3.6/os.py",
220,
"makedirs",
"mkdir(name, mode)"
]
]
}
I know that in Lambda you can only write files under the /tmp folder, but what confuses me is that my script doesn't write to any file at all. Here is the main structure of my script:
client = adwords.AdWordsClient.LoadFromStorage('tmp/googleads.yaml')
report_downloader = client.GetReportDownloader(version='v201809')
report_query = (adwords.ReportQueryBuilder()
                .Select(str)
                .From('ACCOUNT_PERFORMANCE_REPORT')
                .During('LAST_7_DAYS')
                .Build())
results = report_downloader.DownloadReportAsStringWithAwql(
    report_query, 'TSV', skip_report_header=True, skip_column_header=True,
    skip_report_summary=True, include_zero_impressions=False)
campaigns = results.splitlines()
Please advise how to fix this issue. The environment is Python 3.6.
It looks like the AdWords library (via zeep's SqliteCache) is using a cache and, by default, that cache goes into the home directory of the user running your code. To fix this, set the environment variable XDG_CACHE_HOME to /tmp/.cache. You can set this in the Lambda environment variables.
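If you would rather set it in code than in the Lambda console, a minimal sketch (keeping the same handler structure as above) could look like this; the key point is that the variable must be set before GetReportDownloader builds zeep's cache:

import os

# Point zeep's default SqliteCache at /tmp, the only writable path in Lambda.
# This must run before client.GetReportDownloader() is called.
os.environ['XDG_CACHE_HOME'] = '/tmp/.cache'

from googleads import adwords

def lambda_handler(event, context):
    client = adwords.AdWordsClient.LoadFromStorage('tmp/googleads.yaml')
    report_downloader = client.GetReportDownloader(version='v201809')
    # ... build the AWQL query and download the report as before ...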
Azure Cognitive Services has an OCR demo (westcentralus endpoint) at
https://azure.microsoft.com/en-us/services/cognitive-services/computer-vision/?v=18.05
On a poor-quality test image (which I'm afraid I can't post because it's an identity document), I get OCR results that match the actual text 100%, for three test cases in fact. Remarkable.
However, when I follow the sample at the URL below, with the westeurope endpoint, I get poorer OCR results - some text is missing:
https://learn.microsoft.com/en-us/azure/cognitive-services/Computer-vision/quickstarts/python-print-text
Why is this? More to the point - how do I access the v=18.05 endpoint?
Thanks in advance for any speedy help.
I think I see your point: you are not using the same operation on the two pages you mention.
If you read the paragraph just above the working demo you mention, it says:
Get started with the OCR service in general availability, and discover
below a sneak peek of the new preview OCR engine (through "Recognize
Text" API operation) with even better text recognition results for
English.
And if you have a look at the other documentation you are pointing to (this one), it uses the OCR operation:
vision_base_url = "https://westcentralus.api.cognitive.microsoft.com/vision/v2.0/"
ocr_url = vision_base_url + "ocr"
So if you want to use this new preview version, change the operation to recognizeText.
It is available in the West Europe region (see here), and I ran a quick test: the samples provided on the Azure demo page work with this operation, but not with the other one.
But this time the operation needs two calls (see the sketch below):
One POST call to submit your request (the recognizeText operation), which returns a 202 Accepted answer with an operationId
One GET call to retrieve the results (the textOperations operation), using the operationId from the previous step. For example: https://westeurope.api.cognitive.microsoft.com/vision/v2.0/textOperations/yourOperationId
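A minimal Python sketch of that two-call flow (the subscription key and image URL are placeholders; the POST's Operation-Location response header is the textOperations URL to poll):

import time
import requests

subscription_key = 'YOUR_SUBSCRIPTION_KEY'  # placeholder: your Computer Vision key
base_url = 'https://westeurope.api.cognitive.microsoft.com/vision/v2.0/'
image_url = 'https://example.com/closed-sign.jpg'  # placeholder: any reachable image URL

headers = {'Ocp-Apim-Subscription-Key': subscription_key}

# 1) POST the image to the recognizeText operation; the service answers
#    202 Accepted and puts the polling URL in the Operation-Location header.
post_resp = requests.post(base_url + 'recognizeText',
                          params={'mode': 'Printed'},
                          headers=headers,
                          json={'url': image_url})
post_resp.raise_for_status()
operation_url = post_resp.headers['Operation-Location']

# 2) GET the textOperations result, polling until the status settles.
while True:
    result = requests.get(operation_url, headers=headers).json()
    if result.get('status') not in ('NotStarted', 'Running'):
        break
    time.sleep(1)

print(result)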
Demo:
For the "CLOSED" sign image from the Microsoft demos:
Result with OCR operation:
{
"language": "unk",
"orientation": "NotDetected",
"textAngle": 0.0,
"regions": []
}
Result with RecognizeText:
{
"status": "Succeeded",
"recognitionResult": {
"lines": [{
"boundingBox": [174, 488, 668, 675, 617, 810, 123, 622],
"text": "CLOSED",
"words": [{
"boundingBox": [164, 494, 659, 673, 621, 810, 129, 628],
"text": "CLOSED"
}]
}, {
"boundingBox": [143, 641, 601, 811, 589, 843, 132, 673],
"text": "WHEN ONE DOOR CLOSES, ANOTHER",
"words": [{
"boundingBox": [147, 646, 217, 671, 205, 698, 134, 669],
"text": "WHEN"
}, {
"boundingBox": [230, 675, 281, 694, 269, 724, 218, 703],
"text": "ONE"
}, {
"boundingBox": [291, 697, 359, 722, 348, 754, 279, 727],
"text": "DOOR"
}, {
"boundingBox": [370, 726, 479, 767, 469, 798, 359, 758],
"text": "CLOSES,"
}, {
"boundingBox": [476, 766, 598, 812, 588, 839, 466, 797],
"text": "ANOTHER"
}]
}, {
"boundingBox": [56, 668, 645, 886, 633, 919, 44, 700],
"text": "OPENS.ALL YOU HAVE TO DO IS WALK IN",
"words": [{
"boundingBox": [74, 677, 223, 731, 213, 764, 65, 707],
"text": "OPENS.ALL"
}, {
"boundingBox": [233, 735, 291, 756, 280, 789, 223, 767],
"text": "YOU"
}, {
"boundingBox": [298, 759, 377, 788, 367, 821, 288, 792],
"text": "HAVE"
}, {
"boundingBox": [387, 792, 423, 805, 413, 838, 376, 824],
"text": "TO"
}, {
"boundingBox": [431, 808, 472, 824, 461, 855, 420, 841],
"text": "DO"
}, {
"boundingBox": [479, 826, 510, 838, 499, 869, 468, 858],
"text": "IS"
}, {
"boundingBox": [518, 841, 598, 872, 587, 901, 506, 872],
"text": "WALK"
}, {
"boundingBox": [606, 875, 639, 887, 627, 916, 594, 904],
"text": "IN"
}]
}]
}
}
I am using Diffbot's Article API to scrape articles from any site.
Currently I am getting articles with a single image, but I want to scrape all of the images for a particular article.
Any suggestions will be appreciated.
The Article API should, by default, grab all the images in an article. Here's what I get in the "images" array when I run the Article API on this post:
"images": [
{
"pixelHeight": 106,
"diffbotUri": "image|3|-317133287",
"primary": true,
"pixelWidth": 474,
"url": "http://dab1nmslvvntp.cloudfront.net/wp-content/uploads/2014/09/1410897265phpstormlogo.jpg"
},
{
"pixelHeight": 375,
"diffbotUri": "image|3|-2098856075",
"pixelWidth": 500,
"url": "http://dab1nmslvvntp.cloudfront.net/wp-content/uploads/2014/09/1410897372Spear_point_knife_blade.jpg"
},
{
"pixelHeight": 525,
"diffbotUri": "image|3|-878345903",
"pixelWidth": 700,
"url": "http://dab1nmslvvntp.cloudfront.net/wp-content/uploads/2014/09/1410897486CXM-Framework.jpg"
},
{
"pixelHeight": 375,
"diffbotUri": "image|3|-1729707743",
"pixelWidth": 500,
"url": "http://dab1nmslvvntp.cloudfront.net/wp-content/uploads/2014/09/1410897666Fotolia_57724999_Subscription_Monthly_S.jpg"
},
{
"pixelHeight": 360,
"diffbotUri": "image|3|805836010",
"pixelWidth": 320,
"url": "http://dab1nmslvvntp.cloudfront.net/wp-content/uploads/2014/09/1410897716cordova_bot.png"
}
],
If you're not getting the same results for a URL, you can always define a custom ruleset that'll grab them. I wrote some tutorials on extracting repeated data here, and there are some hints here, too.
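If it helps, here is a minimal Python sketch of fetching an article and listing every image Diffbot returns. The endpoint version, token, and article URL are assumptions; the "images" array mirrors the response shown above:

import requests

DIFFBOT_TOKEN = 'YOUR_DIFFBOT_TOKEN'  # assumption: your own API token
article_url = 'https://www.sitepoint.com/some-post/'  # assumption: any article URL

resp = requests.get(
    'https://api.diffbot.com/v3/article',  # assumption: the v3 Article API endpoint
    params={'token': DIFFBOT_TOKEN, 'url': article_url},
)
resp.raise_for_status()
data = resp.json()

# v3 nests the article under 'objects'; older versions returned the fields at
# the top level, as in the JSON above.
article = data['objects'][0] if 'objects' in data else data
for image in article.get('images', []):
    print(image.get('url'))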
Can you give us the URL of the article that makes the API fail to return all images? Maybe we can solve the problem together by looking at the source of the issue.
I noticed that in the Instagram API when you get an image back, it looks something like this:
http://distilleryimage6.s3.amazonaws.com/a0efb5b418b711e391bd22000a1fbf1d_7.jpg
By changing the _7 to, say, _5 or _6, you can get different sizes. But I have also noticed that some of the other sizes are NOT always available. Does anyone have info on which sizes are available, or can you point me to a resource in the Instagram API documentation that goes into more detail on the image sizes supported via the API? Thanks.
The Instagram API should always return images in three sizes, all of which are square, with side lengths of 150px, 306px, and 612px. Videos are available at 480px-square and 640px-square resolutions.
"images": {
"low_resolution": {
"url": "http://distillery.s3.amazonaws.com/media/2010/07/16/4de37e03aa4b4372843a7eb33fa41cad_6.jpg",
"width": 306,
"height": 306
},
"thumbnail": {
"url": "http://distillery.s3.amazonaws.com/media/2010/07/16/4de37e03aa4b4372843a7eb33fa41cad_5.jpg",
"width": 150,
"height": 150
},
"standard_resolution": {
"url": "http://distillery.s3.amazonaws.com/media/2010/07/16/4de37e03aa4b4372843a7eb33fa41cad_7.jpg",
"width": 612,
"height": 612
}
},
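If you are reading that JSON in Python, picking a size is just a dictionary lookup. A tiny sketch (the media dict is assumed to be one item from the API response, shaped like the JSON above):

def image_url(media, size='standard_resolution'):
    # size is one of 'thumbnail' (150px), 'low_resolution' (306px)
    # or 'standard_resolution' (612px), as shown above.
    return media['images'][size]['url']

# e.g. image_url(media_item, 'low_resolution')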
There is currently no way to retrieve the cover artwork using Spotify's Web API. Are there plans to implement this, or any workarounds?
June 17th 2014:
Today Spotify released a new Web API.
It is now easy to retrieve cover artwork, as all endpoints include an array of images for every item.
Search example:
curl -X GET "https://api.spotify.com/v1/search?q=tania%20bowra&type=artist"
{
"artists" : {
...
"items" : [ {
...
"images" : [ {
"height" : 640,
"url" : "https://d3rt1990lpmkn.cloudfront.net/original/f2798ddab0c7b76dc2d270b65c4f67ddef7f6718",
"width" : 640
}, {
"height" : 300,
"url" : "https://d3rt1990lpmkn.cloudfront.net/original/b414091165ea0f4172089c2fc67bb35aa37cfc55",
"width" : 300
}, {
"height" : 64,
"url" : "https://d3rt1990lpmkn.cloudfront.net/original/8522fc78be4bf4e83fea8e67bb742e7d3dfe21b4",
"width" : 64
...
} ],
...
}
}
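A minimal Python sketch of the same search, pulling each artist's image URLs out of the response (the Bearer token is an assumption: today's Web API requires OAuth, unlike the 2014 example above):

import requests

resp = requests.get(
    'https://api.spotify.com/v1/search',
    params={'q': 'tania bowra', 'type': 'artist'},
    headers={'Authorization': 'Bearer YOUR_ACCESS_TOKEN'},  # assumption: a valid token
)
resp.raise_for_status()

for artist in resp.json()['artists']['items']:
    for image in artist['images']:  # largest first, as in the response above
        print(image['width'], image['height'], image['url'])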
Old Answer:
You can get the URL to the cover art by calling Spotify's oEmbed service:
https://embed.spotify.com/oembed/?url=spotify:track:6bc5scNUVa3h76T9nvpGIH
https://embed.spotify.com/oembed/?url=spotify:album:5NCz8TTIiax2h1XTnImAQ2
https://embed.spotify.com/oembed/?url=spotify:artist:7ae4vgLLhir2MCjyhgbGOQ
With JSONP:
https://embed.spotify.com/oembed/?url=spotify:artist:7ae4vgLLhir2MCjyhgbGOQ&callback=callme
http://open.spotify.com/ URLs work as well:
https://embed.spotify.com/oembed/?url=http://open.spotify.com/track/6bc5scNUVa3h76T9nvpGIH
{
"provider_url": "https:\/\/www.spotify.com",
"version": "1.0",
"thumbnail_width": 300,
"height": 380,
"thumbnail_height": 300,
"title": "Gusgus - Within You",
"width": 300,
"thumbnail_url": "https:\/\/d3rt1990lpmkn.cloudfront.net\/cover\/f15552e72e1fcf02484d94553a7e7cd98049361a",
"provider_name": "Spotify",
"type": "rich",
"html": "<iframe src=\"https:\/\/embed.spotify.com\/?uri=spotify:track:6bc5scNUVa3h76T9nvpGIH\" width=\"300\" height=\"380\" frameborder=\"0\" allowtransparency=\"true\"><\/iframe>"
}
Notice the thumbnail_url:
https://d3rt1990lpmkn.cloudfront.net/cover/f15552e72e1fcf02484d94553a7e7cd98049361a
The /cover/ segment of the URL represents the size of the thumbnail; replace it with a number to get a specific size.
Available sizes: 60, 85, 120, 140, 160, 165, 230, 300, 320, and 640.
e.g. https://d3rt1990lpmkn.cloudfront.net/640/f15552e72e1fcf02484d94553a7e7cd98049361a
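A small Python sketch of the oEmbed approach, deriving a 640px cover by swapping the size segment of thumbnail_url as described above (this reflects the old behaviour and may no longer work):

import requests

oembed = requests.get(
    'https://embed.spotify.com/oembed/',
    params={'url': 'spotify:track:6bc5scNUVa3h76T9nvpGIH'},
).json()

thumb = oembed['thumbnail_url']            # .../cover/<hash>
large = thumb.replace('/cover/', '/640/')  # any of the sizes listed above
print(large)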
There are plans to implement it, as in, we want it to be there, but nobody is working on it. It is mostly a legal problem with the terms of use.
Technically, it is of course possible to figure it out and access the same images that, for instance, open.spotify.com uses by parsing the HTML. That is not allowed, of course, but there is nothing technical that stops access.
(I work at Spotify)
The iTunes and Deezer APIs are also useful:
http://www.apple.com/itunes/affiliates/resources/documentation/itunes-store-web-service-search-api.html
http://developers.deezer.com/api/search