Instagram ?__a=1 query: Why do I get login page instead of JSON? - azure

Here is a sample url that returns JSON of the instagram user's data: https://www.instagram.com/therock/?__a=1
And it returns JSON like this:
{
"logging_page_id":"profilePage_232192182",
"show_suggested_profiles":true,
"show_follow_dialog":false,
"graphql":{
"user":{
"biography":"founder",
"blocked_by_viewer":false,
"business_email":null,
"restricted_by_viewer":false,
"country_block":false,
"external_url":"https://projectrock.online/7ad",
"external_url_linkshimmed":"https://l.instagram.com/?u=https%3A%2F%2Fprojectrock.online%2F7ad&e=ATMKh6M0eOgq-_jVoR3-xJ0Q2wwVSenYemMoYM0A0nWrW9Y5P7mDXX1dkk2dDLidhEuV1Wees7Z3teLJqp7vB2k&s=1",
"edge_followed_by":{
"count":199139001
},
"followed_by_viewer":false,
"edge_follow":{
"count":406
},
"follows_viewer":false,
"full_name":"therock",
"has_ar_effects":false
I am working on an ASP.NET Core API and have an endpoint that takes an Instagram handle and parses the JSON. It works fine locally, but when I hit the same endpoint on the Azure-deployed API, I get the login page instead:
<!DOCTYPE html>
<html lang="en" class="no-js not-logged-in client-root">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<title>
Login • Instagram
</title>
<meta name="robots" content="noimageindex, noarchive">
<meta name="apple-mobile-web-app-status-bar-style" content="default">
<meta name="mobile-web-app-capable" content="yes">
<meta name="theme-color" content="#ffffff">
<meta id="viewport" name="viewport" content="width=device-width, initial-scale=1, minimum-scale=1, maximum-scale=1, viewport-fit=cover">
<link rel="manifest" href="/data/manifest.json">
I tried a third-party browser-as-a-service (PhantomJsCloud), but it returns the same login page. I thought it was the CORS policy, but fixing that didn't help; I also tried sending back the cookie that was returned, but to no avail. I am really lost here, and I'd be really thankful if anyone can point out why this is happening. Thank you!

Instagram probably doesn't want you to fetch it like that and has some mechanism to detect that your request is made programmatically. I assume it works when you open the URL in the browser. You can try Cypress or Puppeteer to make it work anyway, or use the official API with tokens etc.
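A minimal sketch of how an endpoint could tell the two outcomes apart before trying to parse anything. The function name is hypothetical, and the two markers are simply strings taken from the login page HTML quoted in the question; Instagram may change them at any time.

```python
import json


def classify_instagram_response(body: str) -> str:
    """Return 'json', 'login_page', or 'unknown' for a response body."""
    try:
        parsed = json.loads(body)
        if isinstance(parsed, dict):
            return "json"
    except ValueError:
        pass  # not JSON, fall through to the HTML checks
    # Assumption: these markers come from the login page quoted in the question.
    if "not-logged-in" in body or "Login • Instagram" in body:
        return "login_page"
    return "unknown"
```

Logging this classification on the deployed API makes it easier to see whether a failure is a parsing bug or Instagram redirecting the request to the login page.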
EDIT:
Okay, I played around a bit and got it to work somehow, but I'm not sure how reliable this is.
First I started with the following: https://codelike.pro/fetch-instagram-posts-from-profile-without-__a-parameter/
After getting the parsed JSON object, I looked up entry_data.ProfilePage[0].graphql.user.edge_owner_to_timeline_media.page_info.end_cursor and used that end_cursor for the following request:
https://www.instagram.com/graphql/query/?query_id=17888483320059182&id=928659671&first=100&after=
Here you need to use the end_cursor as the value of the &after query param. query_id selects the media of the Instagram account, and id is the id of the Instagram account (you can get it from the parsed object).
query_id is some kind of hardcoded value from Instagram; other ids can be found here: https://gist.github.com/Carlos-Henreis/2df27431fa5d7a84b7a5e57ee1bf6ae2#file-query_id-csv
Edit 2:
I realized this only works when your IP is not flagged by Instagram or when you send the cookie of a logged-in session; otherwise, unfortunately, you won't get the ProfilePage but a LoginAndSignupPage instead.
For more info, see here: https://stackoverflow.com/a/57722553/5195852
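The pagination step described in this answer could be sketched roughly as follows. The function names are hypothetical, and reading the account id from `graphql.user.id` is an assumption (the answer only says the id is available in the parsed object). The query_id is the value quoted above and is not guaranteed to stay valid.

```python
from urllib.parse import urlencode

MEDIA_QUERY_ID = "17888483320059182"  # value quoted in the answer; may stop working


def extract_id_and_cursor(parsed: dict) -> tuple:
    """Pull the account id and end_cursor out of the parsed page object."""
    entry = parsed.get("entry_data", {})
    if "ProfilePage" not in entry:
        # Happens when Instagram serves a LoginAndSignupPage instead.
        raise ValueError("no ProfilePage in entry_data: " + ", ".join(entry))
    user = entry["ProfilePage"][0]["graphql"]["user"]
    page_info = user["edge_owner_to_timeline_media"]["page_info"]
    # Assumption: the account id is stored under user["id"].
    return user["id"], page_info["end_cursor"]


def build_media_query_url(user_id: str, end_cursor: str, first: int = 100) -> str:
    """Build the graphql query URL for the next page of media."""
    params = {
        "query_id": MEDIA_QUERY_ID,
        "id": user_id,
        "first": first,
        "after": end_cursor,
    }
    return "https://www.instagram.com/graphql/query/?" + urlencode(params)
```

Raising on a missing ProfilePage keeps the LoginAndSignupPage case from Edit 2 visible instead of failing later with a KeyError.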

Related

Azure AD B2C ignores custom HTML page content

I have a custom HTML file set up for B2C's sign in / sign up user flow that looks like this:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="css/sign_up.css">
<title>My Sign up</title>
</head>
<body>
<div id="api">
</div>
</body>
</html>
I've hosted this in my web app service and placed the url into the Custom Page URI field in the flow. Screenshot here.
However, when I hit "Run User Flow" the default Microsoft selfAsserted page is still loaded. Is there anything that would cause this to happen?
To clarify: I have hit save after entering the URI and the Custom Page column says "Yes" for Local account sign up page.
You should check again and make sure the custom page status is Yes; in your screenshot, the status is No for the custom page.
It turned out to be a CORS issue. Adding https://<resourcegroup>.b2clogin.com to my app service's CORS whitelist resolved the problem.
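A small sketch of the check that matters here: whether the response headers from the app hosting the custom page allow the B2C origin. The function name and the `contoso` origin in the usage note are hypothetical; only the Access-Control-Allow-Origin header itself is standard.

```python
def cors_allows_origin(response_headers: dict, origin: str) -> bool:
    """Return True if the headers allow cross-origin reads from `origin`."""
    allowed = None
    for name, value in response_headers.items():
        # Header names are case-insensitive.
        if name.lower() == "access-control-allow-origin":
            allowed = value.strip()
    return allowed == "*" or allowed == origin
```

For example, `cors_allows_origin(headers, "https://contoso.b2clogin.com")` should return True for the headers served with the custom HTML file once the CORS entry has been added.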

Puppeteer's Click API does not trigger on image map element

I am using Puppeteer to scrape different e-commerce sites. Some of them show a popup when the page is ready. I am trying to close that popup with the click API by targeting its element, but I get the error "Node is either not visible or not an HTMLElement".
I have tried clicking the following selectors:
coords='715,5,798,74'
#monetate_lightbox_mask
body>div>div:nth-child(1)
body>div:nth-child(1):div:nth-child(1)
URLs for scraping:
https://www.hayneedle.com/product/humantouchijoymassageanywherecordlessportablemassager.cfm
https://www.hayneedle.com/product/napoleonfiberglowventedgaslogset.cfm
https://www.hayneedle.com/product/napoleonsquarepropanefirepittable1.cfm
Please suggest.
Regards,
Manjusha
I would personally use the following to wait for and click the close button:
const close_button = await page.waitForSelector( '[id$="ltBoxMap"] > [href="#close"]' );
await close_button.click();
But unfortunately, it appears that the website has implemented bot detection and is displaying a block page instead.
The source of the resulting web page looks like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US"><head profile="http://gmpg.org/xfn/11">
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
<meta name="viewport" content="width=1000">
<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">
<meta http-equiv="cache-control" content="max-age=0">
<meta http-equiv="cache-control" content="no-cache">
<meta http-equiv="expires" content="0">
<meta http-equiv="expires" content="Tue, 01 Jan 1980 1:00:00 GMT">
<meta http-equiv="pragma" content="no-cache">
<title></title>
</head>
<body>
<h1>Access To Website Blocked</h1>
</body></html>
The bot detection service cannot be fooled simply by changing the user agent, so you will need to experiment with some other methods to bypass the service if you would like to scrape the website.
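A scraper can at least detect this block page and stop early instead of waiting for selectors that will never appear. The sketch below is in Python rather than Puppeteer's JavaScript; it assumes the heading text stays exactly as in the source above, and the names are hypothetical.

```python
from html.parser import HTMLParser


class _HeadingCollector(HTMLParser):
    """Collects the text content of every <h1> element."""

    def __init__(self):
        super().__init__()
        self._in_h1 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.headings[-1] += data


def is_block_page(html: str) -> bool:
    """Return True if the page is the 'Access To Website Blocked' page."""
    collector = _HeadingCollector()
    collector.feed(html)
    # Assumption: the block page keeps this exact heading text.
    return any(h.strip() == "Access To Website Blocked" for h in collector.headings)
```

The HTML string would come from the page source, for example what Puppeteer's `page.content()` returns.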

google search result shows html and angularJs code

I have created a website.
The Google search result shows some extra code, which is part of the site's HTML and AngularJS.
I have added the following meta tags, but to no avail:
<meta name="description" content="">
<meta name="robots" content="nosnippnet">
<meta name="googlebot" content="NOODP, nofollow, nosnippnet">
<meta name="keywords" content="">
I have waited 3 weeks for Google to reindex, but nothing changed. Can anyone tell me where I am going wrong?
You've got a spelling mistake.
Change nosnippnet to nosnippet.
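Typos like this are easy to catch automatically. Below is a rough sketch that reads the robots and googlebot meta tags and flags any directive that is not in a known list, suggesting the closest match. The names are hypothetical, and KNOWN_DIRECTIVES is a partial list assembled for illustration, not an authoritative one.

```python
import difflib
from html.parser import HTMLParser

# Assumption: a partial list of robots directives, for illustration only.
KNOWN_DIRECTIVES = [
    "all", "noindex", "nofollow", "none", "noarchive", "nosnippet",
    "notranslate", "noimageindex", "noodp", "unavailable_after",
    "max-snippet", "max-image-preview", "max-video-preview",
]


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots|googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() not in ("robots", "googlebot"):
            return
        for part in (attr.get("content") or "").split(","):
            # Drop any value after ':' (e.g. "max-snippet:50").
            name = part.strip().lower().split(":")[0]
            if name:
                self.directives.append(name)


def find_unknown_directives(html: str) -> dict:
    """Map each unrecognized directive to its closest known spelling(s)."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    unknown = {}
    for directive in parser.directives:
        if directive not in KNOWN_DIRECTIVES:
            unknown[directive] = difflib.get_close_matches(
                directive, KNOWN_DIRECTIVES, n=1
            )
    return unknown
```

Run against the two meta tags from the question, this reports nosnippnet and suggests nosnippet.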

Smooth Streaming Apple URL gives 404

I have followed the walkthroughs to make an IIS Smooth Streaming publishing point support Apple devices, but have run into a problem: the target URL for the <video> tag generates a 404 response.
My isml is as follows:
<?xml version="1.0" encoding="utf-8"?>
<smil xmlns="http://www.w3.org/2001/SMIL20/Language">
<head>
<meta name="title" content="" />
<meta name="module" content="liveSmoothStreaming" />
<meta name="sourceType" content="Push" />
<meta name="publishing" content="Fragments;Streams;Archives" />
<meta name="estimatedTime" content="36000" />
<meta name="lookaheadChunks" content="2" />
<meta name="manifestWindowLength" content="0" />
<meta name="startOnFirstRequest" content="True" />
<meta name="archiveSegmentLength" content="0" />
<meta name="formats" content="m3u8-aapl" />
<meta name="m3u8-aapl-segmentlength" content="10" />
<meta name="m3u8-aapl-maxbitrate" content="1600000" />
<meta name="m3u8-aapl-allowcaching" content="False" />
<meta name="m3u8-aapl-backwardcompatible" content="False" />
<meta name="m3u8-aapl-enableencryption" content="False" />
<meta name="filters" content="" />
</head>
<body>
</body>
</smil>
The html I'm using is:
<!doctype html>
<html>
<head>
<title>Apple streaming IIS test</title>
</head>
<body>
<h1>Live Stream</h1>
<video width="640"
height="360"
src="http://10.1.1.22/video.isml/manifest(format=m3u8-aapl).m3u8"
autoplay="true"
controls="true">
Live
</video>
</body>
</html>
Note that when I type the URL http://10.1.1.22/video.isml/manifest into my browser, I get the correct XML file for Silverlight-based streaming, but adding either (format=m3u8-aapl) or (format=m3u8-aapl).m3u8 (as per these instructions) causes a 404.
Edit: I've tried a few more things with no success, but they might give insight into what's failing:
The URL http://10.1.1.22/video.isml/manifest(foo=bar) gives me the exact same response as /manifest, suitable for Silverlight.
The URL http://10.1.1.22/video.isml/manifest(format=foo) gives me a 404.
The URL http://10.1.1.22/video.isml/manifest.m3u8 gives 400 bad request.
The URL http://10.1.1.22/video.isml/manifest(foo=bar).m3u8 gives me the Silverlight response.
So it seems the extension doesn't mean anything to the server but it can't parse it if parenthesized arguments aren't present. More importantly, it's clear that the server handler is actually running for /manifest(format=m3u8-aapl) but generating a 404 in some kind of sub-request. We can rule out the server not understanding the URL and failing to run the correct handler.
After exploring some related questions, I came across this answer: The stream needs to use h.264 video and AAC audio.
Unfortunately, it's not quite that simple. The free version of Expression Encoder 4 doesn't support h.264 or AAC; they are locked down and advertised as a paid feature. The thing is, Microsoft refuses to sell Expression Encoder anymore but nonetheless haven't made these features free or offered any kind of alternative! All the third-party options they suggest are astronomically priced and geared towards large corporations.
After searching painstakingly for a reasonably priced third-party replacement, I came across a program called Unreal Media Server which both supports h.264/AAC and will output to a Smooth Streaming publishing point. (Installing a DirectShow codec pack like CCCP is also necessary.)
Just when I thought I was finished, I discovered to my horror that attempting to stream made a w3wp.exe process crash. Saying 'yes' to the offer to debug showed me a stack trace with an access violation in mpeg2tssink.dll. Fortunately, the first (and only relevant) Google search result for mpeg2tssink.dll was this question where someone else had the exact same problem. The fix was to grant the IIS_IUSRS account full control of the C:\inetpub\media\archives folder where Smooth Streaming saves its video chunks. Then things started to work.
So in short:
On the computer which provides the video stream, install CCCP, Unreal Media Server, and Unreal Live Server.
On the server, grant IIS_IUSRS (or whatever user App Pools are run under on your version of Windows/IIS) permission to modify the folder where video fragments are kept, probably C:\inetpub\media\archives

rel=image_src isn't changing the thumbnail

I added this code to the head, but when I try to post something about a website inside a Facebook page, the image that I specified is still not showing up as an option. I'm using WordPress as a CMS. Any ideas why?
<meta content="something" name="title">
<meta content="something="og:description">
<link href="thumbnail.jpeg" rel="image_src">
Try using the Facebook Debugger to pinpoint the issue.
Sometimes there is a caching issue, and feeding your URL through this tool forces Facebook to scrape your URL again, refreshing the cached og: tags.
Furthermore, your og: tags should look more like this:
<meta property="og:title" content="The Rock"/>
<meta property="og:type" content="movie"/>
Notice the property attribute, rather than the name attribute that you (possibly) used. The correct syntax is available at this link: http://ogp.me/
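To see which tags a strict Open Graph reader would pick up, something like the following sketch can be run against the page head. It only accepts meta tags that carry a property attribute starting with og:, which follows the ogp.me convention; Facebook's actual scraper may be more lenient, and the names here are hypothetical.

```python
from html.parser import HTMLParser


class _OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..." content="..."> pairs."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        prop = attr.get("property") or ""
        # Tags that use name="og:..." instead of property are ignored.
        if prop.startswith("og:") and attr.get("content") is not None:
            self.properties[prop] = attr["content"]


def extract_open_graph(html: str) -> dict:
    """Return the og: properties found in the given HTML."""
    parser = _OpenGraphParser()
    parser.feed(html)
    return parser.properties
```

If a tag you expect is missing from the result, check that it uses property and not name.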
