Beautifulsoup scraping Tripadvisor does not work [closed] - python-3.x

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 11 months ago.
The community is reviewing whether to reopen this question as of 10 months ago.
I'm a beginner with Python and BeautifulSoup for web scraping, and I have an issue scraping the Tripadvisor site for reviews: the code never finishes running, it just hangs forever with no results. Yet my code works on other sites. Please help; here is the code I'm using:
import requests
import urllib.request
from bs4 import BeautifulSoup
r = requests.get('https://www.tripadvisor.fr/Hotel_Review-g295424-d302457-Reviews-Burj_Al_Arab-Dubai_Emirate_of_Dubai.html', auth=('user', 'pass'))
print(r.text)

This returns data:
import requests
# from bs4 import BeautifulSoup
r = requests.get('https://www.tripadvisor.fr/Hotel_Review-g295424-d302457-Reviews-Burj_Al_Arab-Dubai_Emirate_of_Dubai.html')
print(r.text)
The beautifulsoup package is not used yet, since how to use it depends on what you want to do, which is unspecified in the question.
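As a rough illustration of what the parsing step might look like once you have the HTML, here is a minimal sketch that feeds a small inline snippet to BeautifulSoup. The "review" class name here is made up for the example; Tripadvisor's real markup changes often, so inspect the live page to find the actual selectors:

```python
from bs4 import BeautifulSoup

# Stand-in for r.text; the "review" class is hypothetical --
# inspect the real page for the actual class names.
html = """
<div class="review">Great stay, wonderful view.</div>
<div class="review">Service was slow.</div>
"""

soup = BeautifulSoup(html, "html.parser")
# Collect the text of every matching element
reviews = [div.get_text(strip=True)
           for div in soup.find_all("div", class_="review")]
print(reviews)
```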

Related

How to log in to a site using Python 3.7? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 4 years ago.
I want to write code in Python 3.7 that logs in to a site.
For example, given a username and password, the code should perform the login process for that site.
I'm unsure what you mean, whether you're logging into a site or trying to process a login request, but I'm going to assume the former.
One way is to use Selenium. You can inspect the site's elements to find the ids of the username/password inputs, then use the driver to send the credentials to those inputs.
Example:
from selenium import webdriver

username = "SITE_ACCOUNT_USERNAME"
password = "SITE_ACCOUNT_PASSWORD"

driver = webdriver.Chrome("path-to-driver")
driver.get("https://www.facebook.com/")

# Locate the login inputs by their element ids and fill them in
username_box = driver.find_element_by_id("email")
username_box.send_keys(username)
password_box = driver.find_element_by_id("pass")
password_box.send_keys(password)

# The login button's id can vary, so fall back to a second known id
try:
    login_button = driver.find_element_by_id("u_0_8")
    login_button.submit()
except Exception:
    login_button = driver.find_element_by_id("u_0_2")
    login_button.submit()
This logs into Facebook using chromedriver.
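If the site accepts a plain form POST, the same flow can also be done without a browser using requests. This sketch only prepares the request to show what would go over the wire; the endpoint URL and field names are hypothetical, so inspect the real form's action URL and input names in your browser's developer tools first:

```python
import requests

# Hypothetical login endpoint and form field names
login_url = "https://example.com/login"
payload = {"email": "SITE_ACCOUNT_USERNAME", "pass": "SITE_ACCOUNT_PASSWORD"}

session = requests.Session()
# Prepare the POST without sending it, to inspect the outgoing request
prepared = session.prepare_request(
    requests.Request("POST", login_url, data=payload))
print(prepared.method, prepared.url)
print(prepared.body)

# A real login would then be:
# response = session.post(login_url, data=payload)
# and the session keeps the login cookies for later requests.
```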

How can I get all the URLs in the Chrome Network tab using Python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
How can I get all the request URLs shown in the Chrome Network tab?
Right click -> Copy -> Copy all as HAR
Then you can import it like this:
import json
obj = json.loads(
'''
<paste here>
'''
)
You can then get the URLs with urls = [entry['request']['url'] for entry in obj['log']['entries']].
You may need to replace \" with \\" in your text editor for the pasted string to parse correctly.
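Putting the pieces together, here is a runnable version of the same idea using a tiny HAR-shaped document in place of the pasted export (a real HAR file has many more fields, but only log.entries[].request.url matters here):

```python
import json

# A minimal HAR-shaped document standing in for the pasted export
har_text = '''
{
  "log": {
    "entries": [
      {"request": {"url": "https://example.com/a.js"}},
      {"request": {"url": "https://example.com/b.css"}}
    ]
  }
}
'''

obj = json.loads(har_text)
# Pull every request URL out of the HAR log
urls = [entry['request']['url'] for entry in obj['log']['entries']]
print(urls)
```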

Linux OS: Sending email with multiple attachments using Python [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
Could someone please help me with the below requirement? I am using this version of Linux:
Red Hat Enterprise Linux Server release 6.6 (Santiago)
Python version: 2.6.6
I need to send multiple log files to a user every day as attachments.
In my log directory I have multiple files with the *.fix extension. I need to send all of these files to the user as attachments. Could you please let me know the code for it?
FYI, it's a Linux server and I am not going to use Gmail.
I'd appreciate your earliest help. Thanks!
There is a Python package called email that helps you send mails.
Getting the list of *.fix files can be done using glob.
Something like this should do it:
from glob import glob
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart

msg = MIMEMultipart()
# Fill out the needed properties of msg, like From, To, Subject, etc.

# Attach every *.fix file in the log directory as a plain-text part
for filename in glob("*.fix"):
    fp = open(filename)
    part = MIMEText(fp.read())
    fp.close()
    # Give the attachment a filename so mail clients display it properly
    part.add_header('Content-Disposition', 'attachment', filename=filename)
    msg.attach(part)
...
The msg can then be sent using smtplib.
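Here is an end-to-end sketch of the same approach, in Python 3 syntax rather than the asker's 2.6. It creates two sample *.fix files in a temporary directory so it runs anywhere; the addresses are placeholders, and the actual send (commented out) assumes a local MTA on the server, as is typical for such Linux boxes:

```python
import os
import tempfile
from glob import glob
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart

# Create two sample *.fix logs in a temporary directory for the demo
workdir = tempfile.mkdtemp()
for name in ("app.fix", "db.fix"):
    with open(os.path.join(workdir, name), "w") as f:
        f.write("log line from " + name + "\n")

msg = MIMEMultipart()
msg["From"] = "reports@example.com"   # placeholder addresses
msg["To"] = "user@example.com"
msg["Subject"] = "Daily *.fix logs"

# Attach each log file, named so mail clients show it as an attachment
for path in sorted(glob(os.path.join(workdir, "*.fix"))):
    with open(path) as fp:
        part = MIMEText(fp.read())
    part.add_header("Content-Disposition", "attachment",
                    filename=os.path.basename(path))
    msg.attach(part)

attached = [p.get_filename() for p in msg.get_payload()]
print(attached)

# To actually send, assuming a local MTA is running:
# import smtplib
# with smtplib.SMTP("localhost") as s:
#     s.send_message(msg)
```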

(PYTHON) Manipulating certain portions of URL at user's request [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
The download link I want to manipulate is below:
http://hfrnet.ucsd.edu/thredds/ncss/grid/HFR/USWC/6km/hourly/RTV/HFRADAR,_US_West_Coast,_6km_Resolution,_Hourly_RTV_best.ncd?var=u&var=v&north=47.20&west=-126.3600&east=-123.8055&south=37.2500&horizStride=1&time_start=2015-11-01T00%3A00%3A00Z&time_end=2015-11-03T14%3A00%3A00Z&timeStride=1&addLatLon=true&accept=netcdf
I want to turn the highlighted parts of the URL (the region, resolution, bounding box, and time range) into variables, so I can ask the user what coordinates and data set they want. This way I can download different data sets using this script. I would also like to use the same variables to name the downloaded file, e.g. USWC6km20151101-20151103.
I did some research and learned that I can use urllib.parse and urllib2, but when I try experimenting with them, I get "no module named urllib.parse."
I can use webbrowser.open() to download the file, but manipulating the URL is giving me problems.
Thank you!!
Instead of urllib you can use the requests module, which makes downloading content much easier. The part that does the actual work is just 4 lines long.
# first install this module: pip install requests
import requests

# parameters to change
location = {
    'part': 'USWC',
    'part2': '_US_West_Coast',
    'km': '6km',
    'north': '45.0000',
    'west': '-120.0000',
    'east': '-119.5000',
    'south': '44.5000',
    'start': '2016-10-01',
    'end': '2016-10-02'
}

# this is a template for the .format() method to generate links (very naive method)
link_template = "http://hfrnet.ucsd.edu/thredds/ncss/grid/HFR/{part}/{km}/hourly/RTV/\
HFRADAR,{part2},_{km}_Resolution,_Hourly_RTV_best.ncd?var=u&var=v&\
north={north}&west={west}&east={east}&south={south}&horizStride=1&\
time_start={start}T00:00:00Z&time_end={end}T16:00:00Z&timeStride=1&addLatLon=true&accept=netcdf"

# some debug info
link = link_template.format(**location)
file_name = (location['part'] + location['km']
             + location['start'].replace('-', '')
             + '-' + location['end'].replace('-', ''))
print("Link: ", link)
print("Filename: ", file_name)

# try to open the webpage
response = requests.get(link)
if response.ok:
    # open file for writing in binary mode
    with open(file_name, mode='wb') as file_out:
        # write response to file
        file_out.write(response.content)
The next step would probably be running this in a loop over a list of location dicts, or reading the locations from a CSV file.
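On the asker's "no module named urllib.parse" error: that module only exists on Python 3 (on Python 2 the equivalent pieces live in urllib and urlparse). On Python 3 the query string can also be built with urllib.parse.urlencode instead of string formatting, which handles percent-encoding (e.g. the colons in the timestamps) automatically. A sketch, using the same parameter names as the URL template above:

```python
from urllib.parse import urlencode

# A list of pairs is used (not a dict) because "var" repeats
params = [
    ('var', 'u'), ('var', 'v'),
    ('north', '45.0000'), ('west', '-120.0000'),
    ('east', '-119.5000'), ('south', '44.5000'),
    ('horizStride', '1'),
    ('time_start', '2016-10-01T00:00:00Z'),
    ('time_end', '2016-10-02T16:00:00Z'),
    ('timeStride', '1'), ('addLatLon', 'true'), ('accept', 'netcdf'),
]
query = urlencode(params)  # percent-encodes values, e.g. ':' -> '%3A'
print(query)
```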

urlopen is not working in Python 3 [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 7 years ago.
I am trying to fetch data from a url. I have tried the following in Python 2.7:
import urllib2 as ul
response = ul.urlopen("http://in.bookmyshow.com/")
page_content = response.read()
print page_content
This works fine. But when I try it in Python 3.4 it throws an error:
File "C:\Python34\lib\urllib\request.py", line 161, in urlopen
return opener.open(url, data, timeout)
I am using:
import urllib.request
response = urllib.request.urlopen('http://in.bookmyshow.com/')
data = response.read()
print data
It works for me (Python 3.4.3). You need to use print(data) in Python 3.
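One more Python 3 difference worth knowing: urlopen's read() returns bytes, not str, so you usually want to decode before processing. To illustrate without a network call, this sketch uses a data: URL, which urlopen handles the same way as an http URL (supported since Python 3.4):

```python
import urllib.request

# A data: URL so the example runs without network access
response = urllib.request.urlopen(
    "data:text/plain;charset=utf-8,hello%20world")
raw = response.read()        # bytes in Python 3
text = raw.decode("utf-8")   # decode to str before processing
print(text)
```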
As a side note, you may also want to consider requests, which makes it much easier to interact over HTTP(S).
>>> import requests
>>> r = requests.get('http://in.bookmyshow.com/')
>>> r.ok
True
>>> plaintext = r.text
Finally, if you want to get data from such complicated pages (which are intended to be displayed, as opposed to an API), you should have a look at Scrapy, which will make your life easier as well.
