I'm scraping a JS-loaded website using requests. To do so, I open the browser's inspector, go to the Network tab, and look for the XHR calls to find out where the website fetches its data from and how. The process is as follows:
Go to https://www.888sport.es/futbol/#/event/1006276426 in Chrome. Once the page is loaded, you can click on many items, each with a unique ID. After doing so, a pop-up window with information appears. From the XHR calls mentioned above you get a direct link to this information, as follows:
import requests

# ncid is the current date as a millisecond timestamp, and id is the unique id of the clicked node
url = 'https://eu-offering.kambicdn.org/offering/v2018/888es/betoffer/outcome.json?lang=es_ES&market=ES&client_id=2&channel_id=1&ncid=1586874367958&id=2740660278'
headers = {'User-Agent': 'Mozilla/5.0'}  # any browser-like User-Agent works here
response = requests.get(url=url, headers=headers)
The problem is that this isn't user-friendly and requires Python. If I put this last URL into Chrome, I get the information, but as plain text that I can't interact with. Is there any way to get a workable link from the request, so that manually entering it in Chrome loads that pop-up window directly, like a regular website?
You have to parse the response with .json() so you receive a JSON dict, which you can then access by its keys.
import requests
import json

def main(url):
    r = requests.get(url).json()
    print(r.keys())
    hview = json.dumps(r, indent=4)
    print(hview)  # pretty-printed view of the response

main("https://eu-offering.kambicdn.org/offering/v2018/888es/betoffer/outcome.json?lang=es_ES&market=ES&client_id=2&channel_id=1&ncid=1586874367958&id=2740660278")
Output:
dict_keys(['betOffers', 'events', 'prePacks'])
{
"betOffers": [
{
"id": 2210856430,
"closed": "2020-04-17T14:30:00Z",
"criterion": {
"id": 1001159858,
"label": "Final del partido",
"englishLabel": "Full Time",
"order": [],
"occurrenceType": "GOALS",
"lifetime": "FULL_TIME"
},
"betOfferType": {
"id": 2,
"name": "Partido",
"englishName": "Match"
},
"eventId": 1006276426,
"outcomes": [
{
"id": 2740660278,
"label": "1",
"englishLabel": "1",
"odds": 1150,
"participant": "FC Lokomotiv Gomel",
"type": "OT_ONE",
"betOfferId": 2210856430,
"changedDate": "2020-04-14T09:11:55Z",
"participantId": 1003789012,
"oddsFractional": "1/7",
"oddsAmerican": "-670",
"status": "OPEN",
"cashOutStatus": "ENABLED"
},
{
"id": 2740660284,
"label": "X",
"englishLabel": "X",
"odds": 6750,
"type": "OT_CROSS",
"betOfferId": 2210856430,
"changedDate": "2020-04-14T09:11:55Z",
"oddsFractional": "23/4",
"oddsAmerican": "575",
"status": "OPEN",
"cashOutStatus": "ENABLED"
},
{
"id": 2740660286,
"label": "2",
"englishLabel": "2",
"odds": 11000,
"participant": "Khimik Svetlogorsk",
"type": "OT_TWO",
"betOfferId": 2210856430,
"changedDate": "2020-04-14T09:11:55Z",
"participantId": 1001024009,
"oddsFractional": "10/1",
"oddsAmerican": "1000",
"status": "OPEN",
"cashOutStatus": "ENABLED"
}
],
"tags": [
"OFFERED_PREMATCH",
"MAIN"
],
"cashOutStatus": "ENABLED"
}
],
"events": [
{
"id": 1006276426,
"name": "FC Lokomotiv Gomel - Khimik Svetlogorsk",
"nameDelimiter": "-",
"englishName": "FC Lokomotiv Gomel - Khimik Svetlogorsk",
"homeName": "FC Lokomotiv Gomel",
"awayName": "Khimik Svetlogorsk",
"start": "2020-04-17T14:30:00Z",
"group": "1\u00aa Divisi\u00f3n",
"groupId": 2000053499,
"path": [
{
"id": 1000093190,
"name": "F\u00fatbol",
"englishName": "Football",
"termKey": "football"
},
{
"id": 2000051379,
"name": "Bielorrusa",
"englishName": "Belarus",
"termKey": "belarus"
},
{
"id": 2000053499,
"name": "1\u00aa Divisi\u00f3n",
"englishName": "1st Division",
"termKey": "1st_division"
}
],
"nonLiveBoCount": 6,
"sport": "FOOTBALL",
"tags": [
"MATCH"
],
"state": "NOT_STARTED",
"groupSortOrder": 1999999000000000000
}
],
"prePacks": []
}
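Once parsed, the response is just nested dicts and lists, so you can drill into it with plain key access. As a hedged example against the output above (the odds field looks like decimal odds multiplied by 1000, e.g. 1150 -> 1.15, but verify that against the site):

import requests

url = ('https://eu-offering.kambicdn.org/offering/v2018/888es/betoffer/outcome.json'
       '?lang=es_ES&market=ES&client_id=2&channel_id=1&ncid=1586874367958&id=2740660278')
data = requests.get(url).json()

# Each bet offer lists its outcomes; the label and odds live on each outcome.
for offer in data['betOffers']:
    print(offer['criterion']['label'])
    for outcome in offer['outcomes']:
        # 'odds' appears to be decimal odds * 1000 (1150 -> 1.15)
        print('  {}: {:.2f}'.format(outcome['label'], outcome['odds'] / 1000))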
I'm using MongoDB in my project, and in the end I'll import about 20,000 products into the database. I tried to write a script to convert the data from the spreadsheet to JSON and then upload it to MongoDB, but many fields were missing.
I'm trying to figure out how to lay out the spreadsheet so it can hold nested data, but the only resource I could find is this package:
https://www.npmjs.com/package/spread-sheet-to-nested-json
It has one problem, though: the output always contains "title" and "children" keys rather than the actual field names.
This is my product JSON:
[
{
"sku": "ADX112",
"name": {
"en": "Multi-Mat Gallery Frames",
"ar": "لوحة بإطار"
},
"brand": "Dummy brand",
"description": {
"en": "Metal frame in a Black powder-coated finish. Tempered glass. 2 removable, acid-free paper mats included with each frame. Can be hung vertically and horizontally. D-rings included. 5x7 and 8x10 frames include easel backs. Sold individually. Made in China.",
"ar": "إطار اسود. صنع في الصين."
},
"tags": [
"art",
"frame",
"لوحة",
"إطار"
],
"colors": [
"#000000"
],
"dimensions": [
"5x7",
"8x10"
],
"units_in_stock": {
"5x7": 5,
"8x10": 7
},
"thumbnail": "https://via.placeholder.com/150",
"images": [
"https://via.placeholder.com/150",
"https://via.placeholder.com/150"
],
"unit_size": {
"en": [
"individual",
"set of 3"
],
"ar": [
"فردي",
"مجموعة من 3"
]
},
"unit_price": 2000,
"discount": 19,
"category_id": "631f3ca65b2310473b978ab5",
"subCategories_ids": [
"631f3ca65b2310473b978ab5",
"631f3ca65b2310473b978ab5"
],
"featured": false
}
]
How can I lay out a spreadsheet so it can serve as a template for future imports?
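One layout that works for nested documents like this is one column per leaf field, with dot notation in the headers (name.en, name.ar, units_in_stock.5x7, ...) and a separator such as | inside a cell for flat arrays (tags, colors, dimensions, images). Below is only a sketch of that idea, not the npm package above; the CSV file name and the | separator are assumptions:

import csv
import json

def set_nested(doc, dotted_key, value):
    # 'name.en' -> {'name': {'en': value}}, creating intermediate dicts as needed
    keys = dotted_key.split('.')
    for key in keys[:-1]:
        doc = doc.setdefault(key, {})
    doc[keys[-1]] = value

ARRAY_FIELDS = {'tags', 'colors', 'dimensions', 'images', 'subCategories_ids'}

def row_to_product(row):
    product = {}
    for header, cell in row.items():
        if cell == '':
            continue  # skip blanks so optional fields stay absent
        value = cell.split('|') if header in ARRAY_FIELDS else cell
        set_nested(product, header, value)
    return product

# products.csv is a hypothetical export of the spreadsheet with dotted headers
with open('products.csv', newline='', encoding='utf-8') as f:
    products = [row_to_product(row) for row in csv.DictReader(f)]

print(json.dumps(products, ensure_ascii=False, indent=2))

Numeric and boolean fields (unit_price, discount, featured, units_in_stock.*) would still need a type-coercion pass, but the point is that dot-notation headers let one flat spreadsheet row round-trip into the nested document with its real field names.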
I feel like the dsbulk documentation on loading JSON files into Cassandra is really lacking.
Here is part of the JSON file that I'm trying to load:
[
{
"tags": [
"r"
],
"owner": {
"reputation": 23,
"user_id": 12235281,
"user_type": "registered",
"profile_image": "https://www.gravatar.com/avatar/60e28f52215bff12adb9758fc2cf86dd?s=128&d=identicon&r=PG&f=1",
"display_name": "Me28",
"link": "https://stackoverflow.com/users/12235281/me28"
},
"is_answered": false,
"view_count": 3,
"answer_count": 0,
"score": 0,
"last_activity_date": 1589053659,
"creation_date": 1589053659,
"question_id": 61702762,
"link": "https://stackoverflow.com/questions/61702762/merge-dataframes-in-r-with-different-size-and-condition",
"title": "Merge dataframes in R with different size and condition"
},
{
"tags": [
"python",
"location",
"pyautogui"
],
"owner": {
"reputation": 1,
"user_id": 13507535,
"user_type": "registered",
"profile_image": "https://lh3.googleusercontent.com/a-/AOh14GgtdM9KrbH3X5Z33RCtz6xm_TJUSQS_S31deNYUcA=k-s128",
"display_name": "lowhatex",
"link": "https://stackoverflow.com/users/13507535/lowhatex"
},
"is_answered": false,
"view_count": 2,
"answer_count": 0,
"score": 0,
"last_activity_date": 1589053657,
"creation_date": 1589053657,
"question_id": 61702761,
"link": "https://stackoverflow.com/questions/61702761/want-to-get-a-grip-of-this-pyautogui-command",
"title": "Want to get a grip of this pyautogui command"
}
]
This is how I have been trying to load it:
dsbulk load -url ./data_so1.json -k stackoverflow_t -t staging_t -h '182.14.0.1' -header false -u username -p password
This is the closest I've come; it pushes the values into Cassandra row by row, like this:
data
-------------------------------------------------------------------------------------------------------------------------------
"title": "'Microsoft.ACE.OLEDB.12.0' provider is not registered on the local machine giving exception on client"
"profile_image": "https://www.gravatar.com/avatar/05085ede54486bdaebefcf8363e081e2?s=128&d=identicon&r=PG&f=1",
"view_count": 422,
"question_id": 61702768,
"user_id": 12235281,
This just takes the rows as they are, commas included. I've tried the -m option for mapping but didn't really get anywhere with it.
What would be the right way to get these values into their own respective columns?
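The symptom (whole JSON fragments landing in a single text column, commas included) is what you'd expect when the file goes through the default CSV connector. dsbulk also has a JSON connector (-c json), which per my reading of the docs expects one JSON document per line by default, while this file is a single top-level array. One low-tech workaround is to flatten the array to JSON Lines first; a minimal sketch, with the output file name as an assumption:

import json

# Flatten a top-level JSON array into JSON Lines: one document per line,
# the shape dsbulk's JSON connector reads by default.
with open('data_so1.json', encoding='utf-8') as src:
    questions = json.load(src)

with open('data_so1.jsonl', 'w', encoding='utf-8') as dst:
    for q in questions:
        dst.write(json.dumps(q) + '\n')

Then point the loader at it with the JSON connector, e.g. dsbulk load -c json -url ./data_so1.jsonl -k stackoverflow_t -t staging_t -h '182.14.0.1' -u username -p password. Top-level field names are matched to column names, so nested objects like owner would still need a matching UDT column or their own flattening pass.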
I'm using the Microsoft Graph API to interact with SharePoint.
First, I upload a file to SharePoint:
https://graph.microsoft.com/v1.0/sites/abc78c05-a77b-45bf-a1a1-51f09548b497/drive/root:/test1212123.txt:/content
Then I get the response:
{
"#odata.context": "https://graph.microsoft.com/v1.0/$metadata#sites('abc78c05-a77b-45bf-a1a1-51f09548b497')/drive/root/$entity",
"#microsoft.graph.downloadUrl": "https://yeeofficesg.sharepoint.com/sites/GdTest/_layouts/15/download.aspx?UniqueId=b9d25e13-c915-432f-b9fb-f2d36a188d9f&Translate=false&tempauth=eyJ0eXAiOiJKV1QiLCJhbGciOiJub25lIn0.eyJhdWQiOiIwMDAwMDAwMy0wMDAwLTBmZjEtY2UwMC0wMDAwMDAwMDAwMDAveWVlb2ZmaWNlc2cuc2hhcmVwb2ludC5jb21AMzgzMDNhNTQtMjUwMS00MDcwLTlkYjItYzNmNTY2OTc2NGUxIiwiaXNzIjoiMDAwMDAwMDMtMDAwMC0wZmYxLWNlMDAtMDAwMDAwMDAwMDAwIiwibmJmIjoiMTU4NDY4MjQ5OSIsImV4cCI6IjE1ODQ2ODYwOTkiLCJlbmRwb2ludHVybCI6InltcjVvWHhDU0FIaFhhV0tYVnZuVDVjK05ETnZsejhzcC9YeFp3MStQaHc9IiwiZW5kcG9pbnR1cmxMZW5ndGgiOiIxMzUiLCJpc2xvb3BiYWNrIjoiVHJ1ZSIsImNpZCI6IlpUUmhPVFk1WkdFdE5EQXlOQzAwWlRnMExUazFZelF0WkRkalpqRmpOR1UxTm1ZMCIsInZlciI6Imhhc2hlZHByb29mdG9rZW4iLCJzaXRlaWQiOiJZV0pqTnpoak1EVXRZVGMzWWkwME5XSm1MV0V4WVRFdE5URm1NRGsxTkRoaU5EazMiLCJhcHBfZGlzcGxheW5hbWUiOiJIdHRwUmVxdWVzdCBUZXN0IiwibmFtZWlkIjoiNTk3ZDQ4YmMtMDVmMy00MTU4LThhY2MtYWU1Y2M3YTljNmFkQDM4MzAzYTU0LTI1MDEtNDA3MC05ZGIyLWMzZjU2Njk3NjRlMSIsInJvbGVzIjoiYWxsc2l0ZXMud3JpdGUgYWxsZmlsZXMud3JpdGUiLCJ0dCI6IjEiLCJ1c2VQZXJzaXN0ZW50Q29va2llIjpudWxsfQ.aTVxeDdWNkowcWFDK0xYOHUvZGo3K0VVSEd1dU02MFVheEFJbnBWWUJHTT0&ApiVersion=2.0",
"createdDateTime": "2020-03-20T05:34:59Z",
"eTag": "\"{B9D25E13-C915-432F-B9FB-F2D36A188D9F},1\"",
"id": "016REKDTITL3JLSFOJF5B3T67S2NVBRDM7",
"lastModifiedDateTime": "2020-03-20T05:34:59Z",
"name": "test1212123.txt",
"webUrl": "https://yeeofficesg.sharepoint.com/sites/GdTest/Shared%20Documents/test1212123.txt",
"cTag": "\"c:{B9D25E13-C915-432F-B9FB-F2D36A188D9F},1\"",
"size": 12,
"createdBy": {
"application": {
"id": "597d48bc-05f3-4158-8acc-ae5cc7a9c6ad",
"displayName": "HttpRequest Test"
}
},
"lastModifiedBy": {
"application": {
"id": "597d48bc-05f3-4158-8acc-ae5cc7a9c6ad",
"displayName": "HttpRequest Test"
}
},
"parentReference": {
"driveId": "b!BYzHq3unv0WhoVHwlUi0l_EO2rYM2NNCptmOTvJ-EqeM9aeJ-zj_TZktSrctfA1S",
"driveType": "documentLibrary",
"id": "016REKDTN6Y2GOVW7725BZO354PWSELRRZ",
"path": "/drive/root:"
},
"file": {
"mimeType": "text/plain",
"hashes": {
"quickXorHash": "RBBCDGQwAxrUIARAFAEJSgAAAAA="
}
},
"fileSystemInfo": {
"createdDateTime": "2020-03-20T05:34:59Z",
"lastModifiedDateTime": "2020-03-20T05:34:59Z"
}
}
Then I want to update the customized column of this list item:
https://graph.microsoft.com/v1.0/sites/abc78c05-a77b-45bf-a1a1-51f09548b497/lists/89a7f58c-38fb-4dff-992d-4ab72d7c0d52/items/80/fields
For this step I need the item id (in this example: 80), but when I upload the file I can't get the item id directly from the response.
Using this API: https://graph.microsoft.com/v1.0/sites/abc78c05-a77b-45bf-a1a1-51f09548b497/lists/89a7f58c-38fb-4dff-992d-4ab72d7c0d52/items/
I can get the list of items, which includes the item id I need.
Finally, my question is: when I upload a file to SharePoint, how can I get the item id that is needed to update the item?
I ended up extracting the Item GUID from the response, i.e.
"#microsoft.graph.downloadUrl": "https://yeeofficesg.sharepoint.com/sites/GdTest/_layouts/15/download.aspx?UniqueId=b9d25e13-c915-432f-b9fb-f2d36a188d9f&Translate=false&tempauth=....
or
"eTag": ""{B9D25E13-C915-432F-B9FB-F2D36A188D9F},1""
or
"cTag": ""c:{B9D25E13-C915-432F-B9FB-F2D36A188D9F},1""
And then use that in the PATCH call where the item ID is required, i.e. https://graph.microsoft.com/v1.0/sites/abc78c05-a77b-45bf-a1a1-51f09548b497/lists/89a7f58c-38fb-4dff-992d-4ab72d7c0d52/items/B9D25E13-C915-432F-B9FB-F2D36A188D9F/fields
There might be a more elegant way to solve the problem; however, this worked for me.
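For completeness, here's a sketch of that flow with requests (auth is stubbed out; the custom column name is hypothetical):

import re
import requests

SITE = 'abc78c05-a77b-45bf-a1a1-51f09548b497'
LIST = '89a7f58c-38fb-4dff-992d-4ab72d7c0d52'
headers = {'Authorization': 'Bearer <access-token>'}  # token acquired elsewhere

# Step 1: upload the file; the response body is the new driveItem.
upload = requests.put(
    f'https://graph.microsoft.com/v1.0/sites/{SITE}/drive/root:/test1212123.txt:/content',
    headers=headers,
    data=b'file content',
)
item = upload.json()

# Step 2: pull the GUID out of the eTag, e.g. '"{B9D25E13-...},1"' -> 'B9D25E13-...'
guid = re.search(r'\{([0-9A-Fa-f-]+)\}', item['eTag']).group(1)

# Step 3: PATCH the custom column, using the GUID as the list item id.
patch = requests.patch(
    f'https://graph.microsoft.com/v1.0/sites/{SITE}/lists/{LIST}/items/{guid}/fields',
    headers=headers,
    json={'MyCustomColumn': 'some value'},  # hypothetical column name
)
print(patch.status_code)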
I have an index on Algolia where each document looks like this:
{
"title": "sample title",
"slug": "sample slug",
"content": "Head towards Rajinder Da Dhaba for some insanely delicious Kebabs!!",
"Tags": ["fashion", "shoes"],
"created": "2017-03-30T12:10:08.815Z",
"city": "delhi",
"user": {
"_id": "58b6f3ea884fdc682a820dad",
"description": "Roughly, somewhere between insanity and zen. Mostly the guy at the window seat!",
"displayName": "Jon Doe"
},
"type": "Post",
"places": [
{
"name": "Rajinder Da Dhaba",
"slug": "Rajinder-Da-Dhaba-safdarjung-9e9ffe",
"location": {
"_geoloc": [
{
"name": "Safdarjung",
"_id": "59611a2c2094b56a39afcbce",
"coordinates": {
"lng": 77.2030268,
"lat": 28.5685586
}
}
]
}
}
],
"objectID": "58dcf5a0355b590560d6ad68",
}
I want to implement autocomplete on this.
However, the demos in the Algolia dashboard return the complete documents.
I want to match only on user.displayName, places.name, and title,
and return only these fields as autocomplete suggestions instead of the complete matching documents.
I know I could create separate indexes for users and places,
but is this possible with only a single index?
Did you have a look at http://algolia.com/doc/tutorials/search-ui/autocomplete/auto-complete/?
It shows how to build a custom display from an index.
To match only on user.displayName, places.name, and title,
you can configure the searchable attributes in the Algolia dashboard.
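The same configuration can also be done programmatically; here's a sketch with the Python API client (v2-style client; the app id, API key, and index name are placeholders). Restricting searchableAttributes controls what gets matched, and attributesToRetrieve trims what comes back, so a single index can serve lightweight suggestions:

from algoliasearch.search_client import SearchClient

client = SearchClient.create('YourAppID', 'YourAdminAPIKey')  # placeholders
index = client.init_index('posts')  # hypothetical index name

# Match only on these three attributes.
index.set_settings({
    'searchableAttributes': ['user.displayName', 'places.name', 'title'],
})

# At query time, return only the fields needed for suggestions,
# not the complete documents.
res = index.search('raj', {
    'attributesToRetrieve': ['user.displayName', 'places.name', 'title'],
    'hitsPerPage': 5,
})
for hit in res['hits']:
    print(hit.get('title'))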
Hi, I'm working on a project and want to use Flickr for my image gallery. I'm using the photosets.* methods, but whenever I make a request I don't get images, only info.
JSON result:
{
"photoset": {
"id": "77846574839405047",
"primary": "88575847594",
"owner": "998850450#N03",
"ownername": "mr.barde",
"photo": [
{
"id": "16852316982",
"secret": "857fur848c",
"server": "8568",
"farm": 9,
"title": "wallpaper-lenovo-blue-pc-brand",
"isprimary": "1",
"ispublic": 1,
"isfriend": 0,
"isfamily": 0
},
{
"id": "16665875068",
"secret": "857fur848c",
"server": "7619",
"farm": 8,
"title": "white_horses-1280x720",
"isprimary": "0",
"ispublic": 1,
"isfriend": 0,
"isfamily": 0
}
],
"page": 1,
"per_page": "2",
"perpage": "2",
"pages": 3,
"total": "6",
"title": "My First Album"
},
"stat": "ok"
}
I would like to have actual image URLs returned. How can I do this?
Thanks to the comment by @CBroe, I found this in the Flickr API documentation:
You can construct the source URL to a photo once you know its ID, server ID, farm ID and secret, as returned by many API methods.
https://farm{farm-id}.staticflickr.com/{server-id}/{id}_{secret}.jpg
or
https://farm{farm-id}.staticflickr.com/{server-id}/{id}_{secret}_[mstzb].jpg
or
https://farm{farm-id}.staticflickr.com/{server-id}/{id}_{o-secret}_o.(jpg|gif|png)
The final result would then look something like this:
https://farm1.staticflickr.com/2/1418878_1e92283336_m.jpg
Reference: https://www.flickr.com/services/api/misc.urls.html
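In code this is just string formatting over the fields the photosets.getPhotos response already contains; for example, over the response above (_m selects the small 240px size, one of the [mstzb] suffixes):

# Build static image URLs from the photoset response shown above.
photoset = {
    'photo': [
        {'id': '16852316982', 'secret': '857fur848c', 'server': '8568', 'farm': 9},
        {'id': '16665875068', 'secret': '857fur848c', 'server': '7619', 'farm': 8},
    ]
}

for p in photoset['photo']:
    # _m = small (240px on the longest side); use s/t/z/b for other sizes
    url = 'https://farm{farm}.staticflickr.com/{server}/{id}_{secret}_m.jpg'.format(**p)
    print(url)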