Python 3 web scraping: can't log in to the page? Timestamp? - python-3.x

So I just started learning web scraping with Python 3 and I want to log in to this website: https://dienynas.tamo.lt/Prisijungimas/Login
The form data it requires is:
UserName: username,
Password: password,
IsMobileUser: false,
ReturnUrl: '',
RequireCaptcha: false,
Timestamp: 2020-03-31 14:11:21,
SToken: 17a48bd154307fe36dcadc6359681609f4799034ad5cade3e1b31864f25fe12f
This is my code:
from bs4 import BeautifulSoup
import requests
from lxml import html
from datetime import datetime

data = {'UserName': 'username',
        'Password': 'password',
        'IsMobileUser': 'false',
        'ReturnUrl': '',
        'RequireCaptcha': 'false'
        }
login_url = 'https://dienynas.tamo.lt/Prisijungimas/Login'
url = 'https://dienynas.tamo.lt/Pranesimai'

with requests.Session() as s:
    r = s.get(login_url)
    soup = BeautifulSoup(r.content, "lxml")
    AUTH_TOKEN = soup.select_one("input[name=SToken]")["value"]
    now = datetime.now()
    data['Timestamp'] = f'{now.year}-{now.month}-{now.day} {now.hour}:{now.minute}:{now.second}'
    data["SToken"] = AUTH_TOKEN
    r = s.post(login_url, data=data)
    r = s.get(url)
    print(r.text)
And I can't log in to the page. I think I got the Timestamp wrong? Please help :)
Edit: today I changed my code a little bit because I found out that most of the data I need is in hidden inputs, so:
data = {'UserName': 'username',
        'Password': 'password',
        }
AUTH_TOKEN = soup.find("input",{'name':"SToken"}).get("value")
Timestamp = soup.find("input",{'name':"Timestamp"}).get("value")
IsMobileUser = soup.find("input",{'name':"IsMobileUser"}).get("value")
RequireCaptcha = soup.find("input", {'name': "RequireCaptcha"}).get("value")
ReturnUrl = soup.find("input", {'name': "ReturnUrl"}).get("value")
and added these values to the data dictionary. I also tried to set headers:
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'}
r = s.post(login_url, data=data, headers=headers)
and yeah, nothing works for me. Maybe there is a way to find out why I can't log in?
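One generic way to find out is to inspect the response of the login POST itself before requesting the protected page. A minimal sketch, continuing from the session code above (checking for 'Prisijungimas' in the final URL is only an assumption that a failed login redirects back to the login form):

r = s.post(login_url, data=data, headers=headers)
print(r.status_code)               # HTTP status of the login POST
print(r.url)                       # final URL after any redirects
print([h.url for h in r.history])  # redirect chain, if any
# assumption: ending up back on the login page means the credentials or tokens were rejected
print('login probably failed' if 'Prisijungimas' in r.url else 'possibly logged in')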

I agree with you. It seems you are not sending the correct timestamp.
The website has a hidden input for it, so you can scrape it like the token and send it, or you can generate the same timestamp in the same time zone the website is using:
from bs4 import BeautifulSoup
import requests
from lxml import html
from datetime import datetime
from pytz import timezone

data = {'UserName': 'username',
        'Password': 'password',
        'IsMobileUser': 'false',
        'ReturnUrl': '',
        'RequireCaptcha': 'false'
        }
login_url = 'https://dienynas.tamo.lt/Prisijungimas/Login'
url = 'https://dienynas.tamo.lt/Pranesimai'

with requests.Session() as s:
    r = s.get(login_url)
    soup = BeautifulSoup(r.content, "lxml")
    AUTH_TOKEN = soup.find("input", {'name': "SToken"}).get("value")
    Timestamp = soup.find("input", {'name': "Timestamp"}).get("value")  # 2020-03-31 15:36:37
    now = datetime.now(timezone('Etc/GMT-3'))
    data['Timestamp'] = now.strftime('%Y-%m-%d %H:%M:%S')  # 2020-03-31 15:36:36
    print('Timestamp from website', Timestamp)
    print('Timestamp from python', data['Timestamp'])
    data["SToken"] = AUTH_TOKEN
    r = s.post(login_url, data=data)
    r = s.get(url)
    print(r.text)
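If the generated time still drifts a second or two from the server's clock, a simpler variant (the first suggestion above) is to send back exactly what the scraped hidden input contains; this assumes the Timestamp input is always present on the login page:

# inside the same requests.Session block, after parsing the login page:
data['Timestamp'] = Timestamp   # reuse the value scraped from the hidden input
data['SToken'] = AUTH_TOKEN
r = s.post(login_url, data=data)
print(r.url)                    # check where the POST ends up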

Related

requests_html does not render javascript webpages

Why doesn't my code render the JavaScript pages using requests_html and BeautifulSoup?
from requests_html import HTMLSession
from bs4 import BeautifulSoup

session = HTMLSession()

def track(num):
    url = f'https://www.trackingmore.com/track/en/{num}'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:103.0) Gecko/20100101 Firefox/103.0'}
    r = session.post(url, headers=headers)
    # r.html.render(timeout=20)
    res = []
    soup = BeautifulSoup(r.content, 'lxml')
    st = soup.find('div', class_="track-status uk-flex")
    print(st.text)
    if st != 'Not Found':
        checkpoint = soup.find_all('div', class_="info-checkpoint")
        for i in checkpoint:
            date = i.find('div', class_='info-date').text.strip()
            desc = i.find('div', class_='info-desc').text.strip()
            res.append({
                'Date': date.replace('\xa0', '')[:19],
                'Description': desc.replace('\xa0', '')
            })
        return res
    else:
        return res
The output looks like this; I can't get the values inside each of the JavaScript placeholders:
[{'Date': '{{info.Date}} {{inf', 'Description': '{{info.StatusDescription}}'}, {'Date': '{{info.Date}} {{inf', 'Description': '{{info.StatusDescription}}'}]
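The {{info.Date}} style values are filled in by JavaScript after the page loads, so the raw HTML never contains them. A minimal sketch of actually rendering the page with requests_html (assuming headless Chromium can be downloaded in your environment, and that a plain GET of the tracking URL is acceptable instead of the POST above):

from requests_html import HTMLSession
from bs4 import BeautifulSoup

session = HTMLSession()

def track_rendered(num):
    url = f'https://www.trackingmore.com/track/en/{num}'
    r = session.get(url)
    # run the page's JavaScript in headless Chromium; sleep gives it time to fill the templates
    r.html.render(sleep=2, timeout=20)
    soup = BeautifulSoup(r.html.html, 'lxml')   # parse the rendered HTML
    return [
        {'Date': c.find('div', class_='info-date').text.strip(),
         'Description': c.find('div', class_='info-desc').text.strip()}
        for c in soup.find_all('div', class_='info-checkpoint')
    ]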

WebScraping / Identical sites not working?

I would like to scrape the header element from both of these links.
To me these two sites look absolutely identical - see the pics below.
Why is only the scraping for the second link working and not for the first?
import time
import requests
from bs4 import BeautifulSoup

# not working
link = "https://apps.apple.com/us/app/bingo-story-live-bingo-games/id1179108009?uo=4"
page = requests.get(link)
time.sleep(1)
soup = BeautifulSoup(page.content, "html.parser")
erg = soup.find("header")
print(f"First Link: {erg}")

# working
link = "https://apps.apple.com/us/app/jackpot-boom-casino-slots/id1554995201?uo=4"
page = requests.get(link)
time.sleep(1)
soup = BeautifulSoup(page.content, "html.parser")
erg = soup.find("header")
print(f"Second Link: {len(erg)}")
Working: (screenshot omitted)
Not working: (screenshot omitted)
The page is sometimes loaded by JavaScript, so requests alone won't render it.
You can use a while loop that re-requests the page until the header appears in the soup, and then break:
import requests
from bs4 import BeautifulSoup

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36"
}
link = "https://apps.apple.com/us/app/bingo-story-live-bingo-games/id1179108009?uo=4"

while True:
    # send the browser-like headers with each request
    soup = BeautifulSoup(requests.get(link, headers=headers).content, "html.parser")
    header = soup.find("header")
    if header:
        break

print(header)
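One caveat (my own note, not part of the answer above): an unbounded while True can spin forever if the header never shows up in the plain HTML. Continuing with the same link and headers variables, a capped retry with a short pause is safer:

import time

header = None
for attempt in range(10):        # give up after 10 tries
    soup = BeautifulSoup(requests.get(link, headers=headers).content, "html.parser")
    header = soup.find("header")
    if header:
        break
    time.sleep(1)                # brief pause between retries

print(header if header else "header never appeared")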
Try this to grab whatever fields you wish from those links. Currently it fetches the title. You can modify res.json()['data'][0]['attributes']['name'] to grab any field of your interest. Make sure to put the URLs inside urls_to_scrape.
import json
import requests
from bs4 import BeautifulSoup
from urllib.parse import unquote

urls_to_scrape = {
    'https://apps.apple.com/us/app/bingo-story-live-bingo-games/id1179108009?uo=4',
    'https://apps.apple.com/us/app/jackpot-boom-casino-slots/id1554995201?uo=4'
}
base_url = 'https://apps.apple.com/us/app/bingo-story-live-bingo-games/id1179108009?uo=4'
link = 'https://amp-api.apps.apple.com/v1/catalog/US/apps/{}'
params = {
    'platform': 'web',
    'additionalPlatforms': 'appletv,ipad,iphone,mac',
    'extend': 'customPromotionalText,customScreenshotsByType,description,developerInfo,distributionKind,editorialVideo,fileSizeByDevice,messagesScreenshots,privacy,privacyPolicyText,privacyPolicyUrl,requirementsByDeviceFamily,supportURLForLanguage,versionHistory,websiteUrl',
    'include': 'genres,developer,reviews,merchandised-in-apps,customers-also-bought-apps,developer-other-apps,app-bundles,top-in-apps,related-editorial-items',
    'l': 'en-us',
    'limit[merchandised-in-apps]': '20',
    'omit[resource]': 'autos',
    'sparseLimit[apps:related-editorial-items]': '5'
}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.101 Safari/537.36'
    res = s.get(base_url)
    soup = BeautifulSoup(res.text, "lxml")
    token_raw = soup.select_one("[name='web-experience-app/config/environment']").get("content")
    token = json.loads(unquote(token_raw))['MEDIA_API']['token']
    s.headers['Accept'] = 'application/json'
    s.headers['Referer'] = 'https://apps.apple.com/'
    s.headers['Authorization'] = f'Bearer {token}'
    for url in urls_to_scrape:
        id_ = url.split("/")[-1].strip("id").split("?")[0]
        res = s.get(link.format(id_), params=params)
        title = res.json()['data'][0]['attributes']['name']
        print(title)
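As a usage example, other fields requested through the extend parameter should appear under the same attributes dictionary; the exact shape of each field is an assumption, so guard the lookups:

# drop-in variant of the loop body above (field shapes are assumptions, inspect the JSON first)
attrs = res.json()['data'][0]['attributes']
print(attrs['name'])             # the title, as in the original answer
print(attrs.get('description'))  # requested via extend=...description...
print(attrs.get('websiteUrl'))   # requested via extend=...websiteUrl...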

How to properly do a Facebook mobile site login

I'm trying to develop some code in order to make successful Facebook logins. Now, to simplify as much as possible, I use the mbasic.facebook.com address.
My code is the following (using requests on the latest Python version):
import requests
from bs4 import BeautifulSoup

if __name__ == '__main__':
    base_url = 'https://mbasic.facebook.com'
    with requests.session() as session:
        user_agent = (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/76.0.3809.87 Safari/537.36"
        )
        cookie = 'locale=it_IT;'
        default_headers = {
            'User-Agent': user_agent,
            'Accept-Language': 'it-IT,en;q=0.5',
            'cookie': cookie,
        }
        session.headers.update(default_headers)
        login_form_url = '/login/device-based/regular/login/?refsrc=https%3A%2F%2Fmbasic.facebook.com%2F&lwv=100&ref' \
                         '=dbl '
        r = session.get("https://mbasic.facebook.com/login/")
        page1 = BeautifulSoup(r.text, "lxml")
        form = page1.find('form')
        lsd = page1.find('input', {'name': 'lsd'})['value']
        jazoest = page1.find('input', {'name': 'jazoest'})['value']
        mts = page1.find('input', {'name': 'm_ts'})['value']
        li = page1.find('input', {'name': 'li'})['value']
        try_number = page1.find('input', {'name': 'try_number'})['value']
        unrecognized_tries = page1.find('input', {'name': 'unrecognized_tries'})['value']
        data = {'lsd': lsd, 'jazoest': jazoest, 'm_ts': mts, 'li': li, 'try_number': try_number,
                'unrecognized_tries': unrecognized_tries, 'email': credentials["email"], 'pass': credentials["pass"],
                'login': 'Accedi'}
        r = session.post(base_url + login_form_url, data=data, verify=False)
        # now, I need to complete the second part of the login
        h = open("first_login.html", "w", encoding="utf-8")
        h.write(r.text)
        c = BeautifulSoup(r.text, "lxml")
        form = c.find('a')
        action = form.get('href')
        r = session.get(base_url + action, data=data, verify=False)
        f = open("second_login.html", "w", encoding="utf-8")
        f.write(r.text)
Now, with this code I successfully get my home feed as a logged-in user. However, the problem begins when I try to move, for instance, to a specific Facebook public page, because it returns the page as if I weren't logged in. The same weird thing happens when I try to get a specific post: it doesn't show me any comments like it does in my browser.
I tried to play with session cookies but to no avail.
Help
The solution was to change the user agent to:
Mozilla/5.0 (BB10; Kbd) AppleWebKit/537.35+ (KHTML, like Gecko) Version/10.3.3.3057 Mobile Safari/537.35+
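A sketch of how that plugs into the code above (only the user_agent string changes; everything else stays as in the question):

user_agent = (
    "Mozilla/5.0 (BB10; Kbd) AppleWebKit/537.35+ "
    "(KHTML, like Gecko) Version/10.3.3.3057 Mobile Safari/537.35+"
)
session.headers.update({'User-Agent': user_agent})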

Python 3.6.4, Scraping a website that requires login

Login Address: https://joffice.jeunesseglobal.com/login.asp.
Two pieces of data need to be posted: Username and pw.
Using a cookie to access: https://joffice.jeunesseglobal.com/members/back_office.asp
Can't log in.
# -*- coding: utf8 -*-
import urllib.parse
import urllib.request
import http.cookiejar

url = 'https://joffice.jeunesseglobal.com/members/back_office.asp'
login_url = "https://joffice.jeunesseglobal.com/login.asp"
login_username = "jianghong181818"
login_password = "Js#168168!"
login_data = {
    "Username": login_username,
    "pw": login_password,
}
post_data = urllib.parse.urlencode(login_data).encode('utf-8')
headers = {'User-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'}
req = urllib.request.Request(login_url, headers=headers, data=post_data)
cookie = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookie))
resp = opener.open(req)
print(resp.read().decode('utf-8'))
Use requests
Simple way:
>>> import requests
>>> page = requests.get("https://joffice.jeunesseglobal.com/login.asp", auth=('username', 'password'))
Making requests with HTTP Basic Auth:
>>> from requests.auth import HTTPBasicAuth
>>> requests.get("https://joffice.jeunesseglobal.com/login.asp", auth=HTTPBasicAuth('user', 'pass'))
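HTTP Basic Auth only helps if the site actually uses it. If login.asp is a normal HTML form, a requests.Session sketch along the lines of the question's field names could look like this (Username and pw come from the question; whether the form needs extra hidden fields is an assumption you would have to verify in the page source):

import requests
from bs4 import BeautifulSoup

login_url = "https://joffice.jeunesseglobal.com/login.asp"
back_office_url = "https://joffice.jeunesseglobal.com/members/back_office.asp"

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
    # load the login page first so the session picks up cookies and any hidden inputs
    soup = BeautifulSoup(s.get(login_url).content, "lxml")
    data = {"Username": "username", "pw": "password"}
    # copy over any hidden inputs the form might require (may be none)
    for hidden in soup.select("input[type=hidden]"):
        if hidden.get("name"):
            data.setdefault(hidden["name"], hidden.get("value", ""))
    s.post(login_url, data=data)
    r = s.get(back_office_url)   # should now carry the logged-in session cookies
    print(r.status_code)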

Logging into Twitter using Python 3 and requests

I have a project that I am working on, and the requirements are to log in to a website using a username and password. I have to do it in Python, and then be able to access a part of the site only accessible to people who are logged in. I have tried a few variations of code to do this, and haven't been able to successfully log in yet. Here is my code:
The function to log in to it:
import requests
from lxml import html

def session2(url):
    # r = requests.get(url)
    # ckies = []
    # print("here are the cookies for twitter:\n")
    # for cky in r.cookies:
    #     print(cky.name, cky.value)
    #     ckies.append(cky)
    s = requests.Session()
    session = s.get(url, verify=False)
    print("\nheaders from site\n")
    print(session.headers)
    tree = html.fromstring(session.text)
    # extract the auth token needed to login along with username and password
    auth_token = list(set(tree.xpath("//input[@name='authenticity_token']/@value")))[0]
    uname = "username"
    pword = "password"
    username = 'session[username_or_email]'
    password = 'session[password]'
    # payload = {name of username variable: string you want, name of password variable:
    #            string you want, name of auth token: string gotten from session}
    payload = dict(username=uname, password=pword, authenticity_token=auth_token)
    header = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36'}
    # do post request
    # might have to change headers to be a header for chrome
    response = s.post(
        url,
        data=payload,
        # headers = dict(referer = url)
        headers=header
    )
    print("\nheaders post\n")
    print(response.request.headers)
    session = s.get("http://www.twitter.com/username/followers", verify=False)
    print("\nheaders get\n")
    print(session.headers)
    print("\nhtml doc\n")
    print(session.text)
    return session
Code to call it:
url = "http://www.twitter.com/login"
sessions = session2(url)
The username field on the site looks like this when you inspect it:
<input class="js-username-field email-input js-initial-focus" type="text" name="session[username_or_email]" autocomplete="on" value="" placeholder="Phone, email or username">
and the password section/token section look like this:
<input class="js-password-field" type="password" name="session[password]" placeholder="Password">
<input type="hidden" value="ef25cb09a8c7fe16c54e3df099e206e605b1170a" name="authenticity_token">
I know the auth token changes, which is why I have the function fetch it. When I try to run this, it just goes to the main page rather than the page I need.
One problem, I think, is that when I print out the headers that I send in the post, it says:
{'Accept-Encoding': 'gzip, deflate', 'Connection': 'keep-alive', 'Accept': '*/*', 'User-Agent': 'python-requests/2.9.1'}
which I thought I had changed to Chrome's header, but it doesn't seem to stick.
Also, I know there is a way if I use OAuth, but I'm not allowed to use that; I have to do it by logging in as if I were using a browser.
Can you tell me if there is anything wrong with what I've done, as well as any hints on how to fix it? I've tried other Stack Overflow answers about requests and logging in, but those didn't work either.
EDIT: OK, I printed response.request.headers, and it came out with the right header, I think, so I don't think that is the problem.
Header it prints:
{'Accept': '*/*', 'Content-Type': 'application/x-www-form-urlencoded', 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36', 'Cookie': '_twitter_sess=some huge amount of number/letters; guest_id=v1%3A147509653977967101', 'Connection': 'keep-alive', 'Accept-Encoding': 'gzip, deflate'}
This will log you in:
import requests
from bs4 import BeautifulSoup

username = "uname"
password = "pass"
# login url
post = "https://twitter.com/sessions"
url = "https://twitter.com"

data = {"session[username_or_email]": username,
        "session[password]": password,
        "scribe_log": "",
        "redirect_after_login": "/",
        "remember_me": "1"}

with requests.Session() as s:
    r = s.get(url)
    # get auth token
    soup = BeautifulSoup(r.content, "lxml")
    AUTH_TOKEN = soup.select_one("input[name=authenticity_token]")["value"]
    # update data, post and you are logged in.
    data["authenticity_token"] = AUTH_TOKEN
    r = s.post(post, data=data)
    print(r.content)
You can see that if we run it using my own account, we get my name from my profile:
In [30]: post = "https://twitter.com/sessions"
In [31]: url = "https://twitter.com"
In [32]: data = {"session[username_or_email]": username,
....: "session[password]": password,
....: "scribe_log": "",
....: "redirect_after_login": "/",
....: "remember_me": "1"}
In [33]: with requests.Session() as s:
....: r = s.get(url)
....: soup = BeautifulSoup(r.content, "lxml")
....: AUTH_TOKEN = soup.select_one("input[name=authenticity_token]")["value"]
....: data["authenticity_token"] = AUTH_TOKEN
....: r = s.post(post, data=data)
....: soup = BeautifulSoup(r.content, "lxml")
....: print(soup.select_one("b.fullname"))
....:
<b class="fullname">Padraic Cunningham</b>
Just be aware that each time you log in, you will get the "We noticed a recent login for your account ..." email.
