In my web server, I'm trying to detect when the Helo app's crawler hits my site, so that I can generate a page on my server (with minimal meta info) and return it instead of the normal JS page.
I'm matching on the ToutiaoSpider user agent for Helo, but it isn't working. Does anyone know which user agent the Helo app uses?
Helo uses only ToutiaoSpider as its user agent.
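A minimal Node.js sketch of the detection step. It treats any request whose User-Agent contains "ToutiaoSpider" as the Helo crawler and returns a bare page with meta tags, and everyone else gets the normal JS app. The Open Graph tags, the helper names (`isHeloCrawler`, `renderMetaPage`, `handler`) and the placeholder values are assumptions for illustration; check which tags Helo actually reads for its previews.

```javascript
// Sketch: answer the Helo crawler (User-Agent contains "ToutiaoSpider")
// with a meta-only page; everyone else gets the normal JS app.

// Case-insensitive substring match, so version suffixes don't matter.
function isHeloCrawler(userAgent) {
  return typeof userAgent === "string" && /toutiaospider/i.test(userAgent);
}

// Escape characters that are special inside HTML attribute values and text.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical helper: minimal page carrying only title/Open Graph tags.
function renderMetaPage({ title, description, image }) {
  return [
    "<!doctype html>",
    "<html><head>",
    `<title>${escapeHtml(title)}</title>`,
    `<meta property="og:title" content="${escapeHtml(title)}">`,
    `<meta property="og:description" content="${escapeHtml(description)}">`,
    `<meta property="og:image" content="${escapeHtml(image)}">`,
    "</head><body></body></html>",
  ].join("\n");
}

// Request handler compatible with Node's http.createServer().
function handler(req, res) {
  res.writeHead(200, { "Content-Type": "text/html; charset=utf-8" });
  if (isHeloCrawler(req.headers["user-agent"])) {
    // Placeholder metadata; look it up per URL in a real app.
    res.end(renderMetaPage({
      title: "My page",
      description: "Short summary",
      image: "https://example.com/preview.png",
    }));
  } else {
    // Normal single-page app shell (placeholder).
    res.end('<!doctype html><div id="app"></div><script src="/app.js"></script>');
  }
}
```

To try it, pass `handler` to `http.createServer(handler).listen(3000)` from Node's built-in `http` module. Matching a case-insensitive substring, rather than comparing the whole header, tolerates version numbers and other tokens in the User-Agent string.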
I'm working on a web server that hosts a website which talks to a Teradata database. Every now and then, when I send a request from the website to the server, nothing happens until I focus the console and press any key. Does anyone know why this happens and how I can fix it?
I'm using XMLHttpRequest on the website and Node.js's built-in http module on the server side. Let me know if you need any information I haven't included here.
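As it turns out, this was an issue with the CMD/PowerShell selection mode (QuickEdit): clicking inside the console window starts a text selection, which pauses console output, so the server stalls until a key is pressed.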
I am implementing Google reCAPTCHA. Google's documentation recommends validating the captcha on the server side. Why do we need to verify it on the server when it has already been verified in the UI by Google's server? Is it acceptable to implement the captcha in the UI alone, with no server-side validation? What are the problems (if any) if it's done in the UI alone?
An example: you're creating a registration form and want to prevent bots from creating accounts on your site. You need to verify the captcha server-side, because in the background the form sends a request that looks something like this:
POST /register HTTP/1.1
Host: www.example.com
Content-Type: application/json

{"username":"example","email":"hey@gmail.de","captcha-token":"123984f729340fmu2q34f9"}
If the request doesn't include the captcha token, or the server doesn't validate it, a bot can simply spam this request without ever loading the frontend page. Keep in mind that bots don't visit your "UI" (frontend page). Verify everything server-side: text length, bad characters, rate limits, and so on.
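A minimal server-side check for Node.js 18+ (which provides a global `fetch`), assuming the client posts the token in the `captcha-token` field shown above. The token is sent together with your secret key to Google's `siteverify` endpoint, and only `success: true` lets the request through. `verifyCaptcha` and `handleRegister` are hypothetical names, and `fetchImpl` is injectable only so the sketch can be exercised without network access.

```javascript
// Sketch: server-side reCAPTCHA check before accepting POST /register.
// Assumes Node 18+ (global fetch) and the "captcha-token" field shown above.
const SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify";

// Ask Google whether the token is valid for our secret key.
async function verifyCaptcha(token, secret, remoteIp, fetchImpl = fetch) {
  // Bots can simply omit the token, so a missing one fails immediately
  // (the error name mirrors Google's "missing-input-response").
  if (!token) return { ok: false, errors: ["missing-input-response"] };
  const body = new URLSearchParams({ secret, response: token });
  if (remoteIp) body.set("remoteip", remoteIp); // optional parameter
  // A URLSearchParams body is sent as application/x-www-form-urlencoded.
  const res = await fetchImpl(SITEVERIFY_URL, { method: "POST", body });
  const data = await res.json();
  return { ok: data.success === true, errors: data["error-codes"] || [] };
}

// Hypothetical handler: takes the parsed JSON body, returns an HTTP status.
async function handleRegister(payload, secret, remoteIp, fetchImpl = fetch) {
  const { ok } = await verifyCaptcha(payload["captcha-token"], secret, remoteIp, fetchImpl);
  if (!ok) return 403;
  // Still validate username/email length, characters, rate limits, etc.
  return 201;
}
```

For reCAPTCHA v3 the siteverify response also carries a `score` and an `action` that you would compare against your own threshold; this sketch only checks `success`.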
I am scraping a website built on WebSphere.
Whenever the user logs in, 4 URLs are hit on the way to the home page.
The 3rd URL contains an encrypted value that looks like this:
L0lDU0NTSUpKZ2tLQ2xFS0NXXXXXXXXXXXXXXXXXXX..XXXXXXXXXvZD1vbkxvYWQ!
The URL looks like this:
http://example.com/escares/wps/myportal/!ut/p/c1/XXXXXXXXXX/dl2/d1/L0lDU0NTSUpKZ2tLQ2xFS0NXXXXXXXXXXXXXXXXXXX..XXXXXXXXXvZD1vbkxvYWQ!
The problem is that this encrypted value is the only part that changes on every login.
Is there an algorithm in WebSphere that generates this kind of URL? Or is there any way I can replicate this encrypted value?
Has anyone crawled or scraped a WebSphere site?
wps/myportal suggests a WebSphere Portal login. The 'encrypted' URI you're seeing is most likely a hash used to maintain the user's login session.
The best way to replicate this is to supply your web scraping program with a username and password to access the portal section of the website so it can POST a login while scraping. The website itself will generate the session info. You will need to instruct your scraping application to follow any dynamic URLs that are generated. Usually this is done by following any URLs in the HTML supplied by the server after logging in.
As an example, Scrapy can be configured to follow any URLs in target pages when scraping:
https://doc.scrapy.org/en/latest/intro/tutorial.html#following-links
Although you are using your own solution to scrape the portal contents for a logged-in user, hopefully the logic and progression illustrated in my examples help steer you toward resolving what appears to be a session/cookie storage issue.
Chris has answered the question, and it helped me, especially this line:
Usually this is done by following any URLs in the HTML supplied by the server after logging in.
I just want to add the Node.js equivalent: the same thing can be achieved in Node.js with the request module, using cheerio to parse the HTML that comes back in the response.
P.S.: In case anyone is wondering where I found that dynamic URL: it was in an HTML form that came back in the response, as that form's action.
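A dependency-free sketch of that step, assuming (as in the P.S.) that the dynamic URL is the `action` of a form in the login response. It uses a regex instead of cheerio to stay self-contained, so it is not a general HTML parser; with cheerio the equivalent lookup is `$("form").attr("action")` after `cheerio.load(html)`. `extractFormAction` and `cookieHeader` are hypothetical helpers; the second turns the `Set-Cookie` values of one response into a `Cookie` header for the next request, so the portal session carries over.

```javascript
// Sketch: find the per-login WebSphere Portal URL in the login response.
// Naive regex, not a full HTML parser: it reads the first <form action="...">.
function extractFormAction(html, pageUrl) {
  const match = /<form\b[^>]*\baction\s*=\s*(["'])(.*?)\1/i.exec(html);
  if (!match) return null;
  // Undo the common &amp; entity so query strings survive.
  const action = match[2].replace(/&amp;/g, "&");
  // Resolve relative actions against the URL of the page that contained the form.
  return new URL(action, pageUrl).href;
}

// Turn Set-Cookie values (array of strings) into one Cookie request header,
// keeping only "name=value" and dropping attributes such as Path or HttpOnly.
function cookieHeader(setCookies) {
  return setCookies
    .map((c) => c.split(";")[0].trim())
    .filter(Boolean)
    .join("; ");
}
```

In recent Node versions, `response.headers.getSetCookie()` on a `fetch` response returns the `Set-Cookie` values as an array that can be passed straight to `cookieHeader`.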
I'm developing a web app using Node.js, with EJS as my view engine.
On my login page, I am trying to integrate LINE Login into my system.
I followed the instructions and tried to open the login URL with the following query string:
access.line.me/dialog/oauth/weblogin?response_type=code&client_id={My Channel ID}&redirect_uri=https://localhost:3000/login&state=login
but the page shows "Can not connect to this site page." I've refreshed many times and it still doesn't work.
I've tried investigating the issue, but I still don't understand why my login page isn't showing up. Did I do something wrong?
Thanks in advance.
I think you can't use a localhost URL as your login redirect URL; it must be hosted and served over HTTPS.
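A sketch of building the login URL from the question with `URL`/`URLSearchParams`, so that `redirect_uri` is percent-encoded rather than pasted in raw. The channel ID and callback URL are placeholders; the callback typically has to match the one registered for the channel, and per the answer above it should be a hosted HTTPS URL. Generating a random `state` instead of the fixed "login" is an added assumption (the usual OAuth CSRF protection), and `randomState`/`buildLineLoginUrl` are hypothetical helper names.

```javascript
// Sketch: build the LINE login URL with proper encoding.
const { randomBytes } = require("node:crypto");

// Random, unguessable state value; verify it when LINE redirects back.
function randomState() {
  return randomBytes(16).toString("hex");
}

// channelId and redirectUri are placeholders for your channel's settings.
function buildLineLoginUrl(channelId, redirectUri, state) {
  const url = new URL("https://access.line.me/dialog/oauth/weblogin");
  url.searchParams.set("response_type", "code");
  url.searchParams.set("client_id", channelId);
  url.searchParams.set("redirect_uri", redirectUri); // percent-encoded for us
  url.searchParams.set("state", state);
  return url.href;
}
```

Store the generated `state` (for example in the session) and compare it with the `state` query parameter when LINE redirects to the callback.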
I want to integrate a Central Authentication Service (CAS) server with Node.js. I am currently using the node grand_master_cas library. It fetches the login page, but when I enter the username/password, the ticket is displayed in the address bar as localhost?ticket=ST-2741-uWij6ecxcLOZyM2nIqfG-cas and the browser displays the following error:
Unable to load the webpage because the server sent no data. Error code: ERR_EMPTY_RESPONSE
Does anyone know how to resolve this problem? If you know of any samples or good posts about this issue, please mention them.
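A sketch of what the app has to do when CAS redirects back with `?ticket=ST-...`, written against the CAS 2.0 protocol rather than grand_master_cas's own API. The app calls the CAS server's `/serviceValidate` endpoint with the same `service` URL it used for login, reads `<cas:user>` from the XML reply, and must then actually send a response. Chrome reports ERR_EMPTY_RESPONSE when the connection closes without any data, so every branch of `handleTicket`, including errors, ends the response. This is not a diagnosis of the library; `casBase`, `serviceUrl` and the helper names are hypothetical, and Node 18+ (global `fetch`) is assumed.

```javascript
// Sketch of CAS 2.0 ticket validation for the ?ticket=ST-... callback.
// casBase and serviceUrl are hypothetical; serviceUrl must be the same
// service URL that was sent to CAS /login. Assumes Node 18+ (global fetch).

function buildServiceValidateUrl(casBase, serviceUrl, ticket) {
  const base = casBase.endsWith("/") ? casBase : casBase + "/";
  const url = new URL("serviceValidate", base);
  url.searchParams.set("service", serviceUrl);
  url.searchParams.set("ticket", ticket);
  return url.href;
}

// Read the user name from a CAS 2.0 XML reply (naive regex, not an XML parser).
function parseCasUser(xml) {
  const m = /<cas:user>\s*([^<]+?)\s*<\/cas:user>/.exec(xml);
  return m ? m[1] : null;
}

// Every branch writes a response; a connection that closes without one
// is what Chrome reports as ERR_EMPTY_RESPONSE.
async function handleTicket(req, res, { casBase, serviceUrl }, fetchImpl = fetch) {
  const ticket = new URL(req.url, serviceUrl).searchParams.get("ticket");
  if (!ticket) {
    res.writeHead(400, { "Content-Type": "text/plain" });
    res.end("Missing CAS ticket");
    return;
  }
  try {
    const reply = await fetchImpl(buildServiceValidateUrl(casBase, serviceUrl, ticket));
    const user = parseCasUser(await reply.text());
    res.writeHead(user ? 200 : 401, { "Content-Type": "text/plain" });
    res.end(user ? `Hello ${user}` : "CAS ticket validation failed");
  } catch (err) {
    // Network/CAS errors still produce a response instead of a dropped connection.
    console.error("CAS validation error:", err.message);
    res.writeHead(502, { "Content-Type": "text/plain" });
    res.end("Could not validate the ticket with CAS");
  }
}
```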