Scraping a website into graph.cool from a json file

I'm building a site that scrapes The List, a collection of upcoming concerts in the SF Bay Area, in order to power an application that serves the listings in a modern web GUI.
Right now, I have a web worker that writes its output to disk, with a folder for each stage of the scraping process: grabbing the raw HTML, scraping the HTML, and transforming the scraped results into structured data. The final file is a JSON file containing a list of objects that look like this:
{
"band": "Willie Nelson And Family",
"date": "2018-10-17T00:00:00-07:00",
"numberOfShows": "1 show",
"venue": "Graton Resort, 288 Golf course Dr., Rohnert Park",
"time": "8pm",
"soldOut": false,
"pit": false,
"multiDay": false,
"ages": "21+",
"price": "$250"
}
I want to import this file on a recurring basis into graphcool, where I plan to have three entities:
an artist, which is just the name of the band
a venue, which is the name of the venue and possibly an address
a show, which is one or many artists at a given venue on a given date and time
My question is twofold:
How do I restructure this JSON file in a way that graphcool will like?
and
How do I upload the contents of this file to graphcool on a recurring basis?
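For the first part of the question, the restructuring could be sketched roughly as below. This is a minimal sketch under stated assumptions, not graphcool's required import format: the function name `normalize`, the output keys, and the rule that the text before the first comma is the venue name are all hypothetical choices made for illustration. It does not cover the upload itself.

```python
def normalize(listings):
    """Split flat listing objects into artists, venues, and shows."""
    artists, venues, shows = {}, {}, []
    for item in listings:
        # Assumption: the whole "band" string is treated as one artist name.
        artist = artists.setdefault(item["band"], {"name": item["band"]})
        # Assumption: text before the first comma is the venue name,
        # and whatever follows (if anything) is its address.
        name, _, address = item["venue"].partition(",")
        name, address = name.strip(), address.strip()
        venue = venues.setdefault(name, {"name": name, "address": address or None})
        shows.append({
            "date": item["date"],
            "time": item.get("time"),
            "artists": [artist["name"]],  # a show may list one or many artists
            "venue": venue["name"],
            "soldOut": item.get("soldOut", False),
            "ages": item.get("ages"),
            "price": item.get("price"),
        })
    return {
        "artists": list(artists.values()),
        "venues": list(venues.values()),
        "shows": shows,
    }
```

Keying artists and venues by name means repeated imports of the same file produce the same entity lists, which should make it easier to upsert rather than duplicate records on a recurring schedule.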

Related

Shopware advanced pricing - update rule prices via API

I have set up some rules for customer groups (one per sales channel) and used those rules in the advanced pricing tab for some of my products. Now I would like to change the advanced price of a product for a single rule using an API request. From what I understand, this should be possible using the 'product-price' endpoint with the PATCH method (https://shopware.stoplight.io/docs/admin-api/8fe455ace068a-list-with-basic-information-of-product-price-resources).
I am making a request to the URL https://shop-domain/api/product-price/fda10622ed67472e82d800618b0c36d1 with a payload like the one below:
{
"data":
{
"type": "product_price",
"id": "fda10622ed67472e82d800618b0c36d1",
"attributes": {
"productId": "675b627cb3034444af9904bb41901a32",
"ruleId": "ace571bd8e6f48c88f17a551ed1e2654",
"price": [
{
"currencyId": "b7d2554b0ce847cd82f3ac9bd1c0dfca",
"net": 115.44,
"gross": 234.50,
"linked": false
}
],
"quantityStart": 1
}
}
}
... but the request has no effect and returns no actual response (the response is empty, 204 - No Content). The ultimate goal for me is to be able to set up different prices for different sales channels, such as Amazon or eBay, for given products. Also, the request I present here is for a single update, though it would be better if this could be a bulk request (I've tried /api/_action/sync upserts as well, but I'm not sure how it should be done; the documentation seems to be quite laconic).
The "id" that I'm using here is the product price id that I got using the GET method on the same endpoint, which lists all product prices from advanced pricing (/api/product-price: https://swagger.docs.fos.gg/).
What am I missing here, and how should it be done properly? Should I use customer groups for the rules, or maybe sales channels (if it even makes any difference here)?

Power Apps Best Method for Storing Images in a SharePoint List Column

I have a Power App that will be used to upload a short text message and an image. There will ultimately be several thousand of these images uploaded over the next 12-24 months, so I need a reliable method of storing the images in a SP List and a reliable method of displaying the images within the Power App. My SharePoint list column type for storing images is multi-line text. I am saving the images via Patch(). I am using an Add Media Button, which produces an image attachment that has the name "UploadedImage1".
I have tried two methods for storing the images in a SP list where the column type is text multi-line.
// Has intermittent issues displaying images stored in the SP List
Patch( ShoutoutsData, Defaults( ShoutoutsData ), {
Title: Value(Text(Now(), "[$-en-US]yyyymmddhhmmss")),
Image: UploadedImage1.Image
} )
// Works, but the Image column sometimes stores large image files, and it does not take many image uploads before the SP list starts having issues displaying in a browser (you get the "The browser is not responding, do you want to wait" message)
Patch( ShoutoutsData, Defaults( ShoutoutsData ), {
Title: Value(Text(Now(), "[$-en-US]yyyymmddhhmmss")),
Image: Substitute(JSON(UploadedImage1.Image,JSONFormat.IncludeBinaryData),"""",""),
} )
As noted in the comments above each Patch() formula, there are issues with each method and therefore I need an alternate approach to this. Any recommendations?

When I've done this in the past, I've used Base64 and saved it to a SharePoint list as a string.

This is a Power Apps/Power Automate solution that saves to OneDrive; saving to SharePoint follows a similar approach:
Create your Power Automate Flow first, as this will be needed in Power Apps.
Use Power Apps as the trigger.
Add the Parse JSON step to extract JSON items from the Power Apps trigger. Content should be "Ask in Power Apps", which will then fill in as "triggerBody()['ParseJSON_Content']". The schema should be as follows (copy and paste):
{
"type": "array",
"items": {
"type": "object",
"properties": {
"Name": {
"type": "string"
},
"Pic": {
"type": "string"
}
},
"required": [
"Name",
"Pic"
]
}
}
Select the Create file step (to OneDrive). (If saving to SharePoint, use the create or update item step.) Add the folder path. For File Name, select Parse JSON "Name" in the Dynamic content selection.
Because an array is passed into Power Automate, selecting "Name" from the Parse JSON list automatically wraps the step in an "Apply to each" loop.
For File content, add an expression that applies decodeDataUri to the "Pic" value of the current item.
The decodeDataUri expression decodes the picture from its Base64 data URI into a format suitable for use in OneDrive (or SharePoint).
In Power Apps:
Create your app with an Add media control and a Save Image button.
Create a collection holding the image and a name for it by adding the following to the OnSelect property of AddMediaButton1 (part of the Add media control): ClearCollect(pics, { Pic: UploadedImage1.Image, Name: Rand()&".jpg"}).
Add to the OnSelect property of the SAVE IMAGE button:
a. Set(MyPics, JSON(pics,JSONFormat.IncludeBinaryData)); - creates a variable MyPics by formatting the collection "pics" as JSON with binary data included.
b. Call your Power Automate flow, passing the variable "MyPics" created above into the flow.
When you run your Power Apps application and invoke the Add media control, the collection is created whenever you load or change the image/media. When you save the image, the flow is triggered and a file is saved to OneDrive (or SharePoint) with a random file name and the .jpg extension.
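To make the decoding step in the flow above more concrete, here is a rough Python sketch of what decoding a Base64 data URI amounts to. It is only an illustration, not Power Automate's implementation: it assumes the JSON passed to the flow is an array of objects with "Name" and "Pic" keys (matching the schema above) and that each "Pic" is a string of the form data:<media type>;base64,<payload>. The function names are hypothetical.

```python
import base64
import json


def decode_data_uri(uri):
    """Return (media_type, raw_bytes) for a base64-encoded data URI."""
    header, _, payload = uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(payload)


def decode_pics(my_pics_json):
    """Map each "Name" in the JSON array to the decoded bytes of its "Pic"."""
    return {
        entry["Name"]: decode_data_uri(entry["Pic"])[1]
        for entry in json.loads(my_pics_json)
    }
```

Storing the decoded bytes as a file, and only a link or file name in the list, keeps large Base64 strings out of the SharePoint list column.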

Scraping Dropdown prompts

I'm having some issues trying to get data from a dropdown button, and none of the answers on the site (or at least the ones I found) help me.
The website I'm trying to scrape is Amazon, searching for, for example, 'Nike Shoes'.
When I open a product that falls under 'Nike Shoes', I may get a product like this:
https://www.amazon.com/NIKE-Flex-2017-Running-Shoes/dp/B072LGTJKQ/ref=sr_1_1_sspa?ie=UTF8&qid=1546518735&sr=8-1-spons&keywords=nike+shoes&psc=1
Here the size and the color come with the page, so scraping is simple.
The problem comes when I get this type of product:
https://www.amazon.com/NIKE-Lebron-Soldier-Mid-Top-Basketball/dp/B07KJJ52S4/ref=sr_1_3?ie=UTF8&qid=1546518445&sr=8-3&keywords=nike+shoes
Here I have to select a size, and maybe a color, and the price changes if I select different sizes.
My question is: is there a way to, for example, access every "shoe size" so I can at least check the price for that size?
If the page had some sort of list of the sizes within the source code it wouldn't be that hard, but the page changes when I select the size, and no "list" of shoe sizes appears in the source (the URL doesn't change either).

Most e-commerce websites handle variants by embedding JSON in the HTML and loading the appropriate selection with JavaScript. So once you have scraped the HTML, you most likely have all of the variant data.
In your case, the shoe sizes, their prices, etc. are embedded in the HTML body. If you search the page source for a sufficiently unique variant name, you can find the JSON in the body.
Now you need to:
Identify where the JSON part is:
It is usually somewhere in <script> tags or in a data-<something> attribute of some tag.
Extract the JSON part:
If it's embedded directly in JavaScript, you can extract it with a regex:
import re
script = response.xpath('//script/text()').extract_first()
# capture everything between the first { and the last } on a line
# (greedy, so nested braces stay inside one match)
data = re.findall(r'(\{.+\})', script)
Load the JSON as a dict and walk the tree, e.g.:
import json
d = json.loads(data[0])
d['products'][0]
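As an alternative to the regex, the steps above can be sketched with the standard library's JSON decoder, which stops at the end of the first complete object and so copes with nested braces. This is a minimal sketch under stated assumptions: the marker string `variants =`, the sample HTML, and the `products`/`size`/`price` keys are hypothetical, since the real page structure has to be found by inspecting the source.

```python
import json


def extract_json_after(text, marker):
    """Decode the first JSON object that follows `marker` in `text`."""
    start = text.index(marker) + len(marker)
    brace = text.index("{", start)
    # raw_decode parses one JSON value starting at `brace` and ignores
    # whatever JavaScript follows it.
    obj, _end = json.JSONDecoder().raw_decode(text, brace)
    return obj


# Hypothetical page fragment with variant data embedded in a script tag.
SAMPLE_HTML = (
    '<script>var variants = {"products": ['
    '{"size": "9", "price": "89.99"}, '
    '{"size": "10", "price": "94.50"}]}; init(variants);</script>'
)

variants = extract_json_after(SAMPLE_HTML, "variants =")
prices_by_size = {p["size"]: p["price"] for p in variants["products"]}
```

This only works when the embedded data is valid JSON; a JavaScript object literal with unquoted keys or trailing commas would need a more tolerant parser.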

ExpressJS dynamically generated pages - will search engines create listings based on query parameters?

I'm creating a directory website of local businesses in a geographical area. The hope is that if you search for something like "plumbers in New York City" then you'll see a link to the domain that will show all plumber records in a 10 mile radius from NYC.
Let's say I have routes set up like this:
app.get('/location', function(req, res) {
if (req.query.zip) { // Search by zip code
mongoose.model('cities').find({zipCode: req.query.zip}, function(err, entries) {
if (err) throw err;
res.render('location.ejs', {data: entries});
});
}
});
So, an incoming url like "http://www.example.net/location?zip=10001" would pull records from the 10001 zip code (New York City) and a page will show all entries in that area. It would generate an h1 tag, title, etc with the city associated with the query string zip.
Since the page will dynamically generate based on url parameters, will search engines be able to crawl every possible zip code and create accurate search listings? In other words, are search engines smart enough to have listings like "Companies in New York City" show up from my site based on the above example?

Short answer: no. Search engines may be able to discover new pages by incrementing parameter values, but this is not expected or predictable behaviour.
The easiest solution would be a page linking to all the other pages, which allows search engines to discover them.
Once these pages are indexed, visitors should be able to reach a page with a URL like "http://www.example.net/location?zip=10001" by searching for keywords such as "Companies in New York City".
I would also suggest generating a sitemap with this module and the list of your zip codes.
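To show what such a sitemap contains, here is a small language-agnostic illustration written in Python rather than with the Node module mentioned above. It is a sketch under stated assumptions: the function name `build_sitemap` is hypothetical, and the base URL and zip codes are taken from the example in the question.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, zip_codes):
    """Return sitemap XML with one <url> entry per zip code."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for zip_code in zip_codes:
        url = ET.SubElement(urlset, "url")
        query = urlencode({"zip": zip_code})
        ET.SubElement(url, "loc").text = f"{base_url}?{query}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


sitemap_xml = build_sitemap("http://www.example.net/location", ["10001", "10002"])
```

The resulting file would typically be served at a fixed path and referenced from robots.txt so that crawlers can find every zip code page without guessing query parameters.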

Geo tagging and Google Drive search

I have the following situation:
The pictures that my app stores on Google Drive have GPS position written in the EXIF data. See here.
I'd like to use the position later for additional indexing. Maybe even to get a street address from one of the friendly Google APIs. But it looks to me that there is no obvious integration of the GPS data with Google indexing. If I leave them in the EXIF header, I can't even get to them using Drive API.
So, my only option seems to be to pull the JPEG file, parse the EXIF myself, call the Google Maps API to get the street address (I haven't tried this yet, just assuming there is such a method), and push the address data back to my 'meta', 'description' or 'IndexableText'. Or I can push the GPS coordinates directly to my metadata when storing the JPEG (only 2 signed floats, after all). This effectively duplicates the EXIF info (and I love duplicate data).
So my question is: Am I missing something obvious? Are there any plans to do this "EXIF GPS data" -> "street address" on Google Drive level? Is it already there? Should I do it myself or wait?
Thanks, sean

The Google Drive SDK returns an imageMediaMetadata field for image files that contains basic properties of the image and EXIF information. At the moment, Drive doesn't reverse geocode the geo-coordinates in the EXIF metadata to provide street addresses.
Depending on your scale and performance requirements, you can either extract the coordinates yourself or use the image metadata returned by the API, and then reverse geocode them with the Google Maps APIs. You can use custom file properties to attach an address. Properties are queryable, so you can build a search feature on top of them. A sample property entity looks like the one below. Read the documentation for more details.
{
"key": "address",
"value": "City of Westminster, WC2N 5DN, United Kingdom",
"visibility": "PRIVATE"
}
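The approach above could be sketched as follows. This is a minimal sketch under stated assumptions, not the Drive client library: it works on a file resource already fetched as a plain dict, it assumes the coordinates sit under imageMediaMetadata in a `location` object with `latitude` and `longitude` keys (worth verifying against the documentation), and both function names are hypothetical. The reverse geocoding call itself is left out.

```python
def extract_coordinates(file_resource):
    """Return (latitude, longitude) from a file resource dict, or None."""
    # Assumption: coordinates are nested under imageMediaMetadata.location.
    location = file_resource.get("imageMediaMetadata", {}).get("location")
    if not location or "latitude" not in location or "longitude" not in location:
        return None
    return location["latitude"], location["longitude"]


def address_property(address):
    """Build a private custom property holding a reverse-geocoded address."""
    return {"key": "address", "value": address, "visibility": "PRIVATE"}
```

A caller would pass the coordinates to a geocoding service of its choice and then attach the resulting property to the file, so that later searches can query on the address.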
