No connection adapters were found for Python3-Requests - python-3.x

I am using Beautiful Soup with the requests package in Python 3 for web scraping. This is my code:
import csv
from datetime import datetime
import requests
from bs4 import BeautifulSoup
quote_page = ['http://10.69.161.179:8080'];
data = []
page = requests.get(quote_page)
soup = BeautifulSoup(page.content,'html.parser')
name_box = soup.find('div', attrs={'class':'caption span10'})
name = name_box.text.strip()  # strip() removes leading and trailing whitespace
print(name);
data.append(name)
with open('sample.csv', 'a') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow([name])
print ("Success");
When I execute the above code I'm getting the following error.
Traceback (most recent call last):
File "first_try.py", line 21, in <module>
page = requests.get(quote_page);
File "C:\Python\lib\site-packages\requests-2.13.0-py3.6.egg\requests\api.py", line 70, in get
return request('get', url, params=params, **kwargs)
File "C:\Python\lib\site-packages\requests-2.13.0-py3.6.egg\requests\api.py", line 56, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Python\lib\site-packages\requests-2.13.0-py3.6.egg\requests\sessions.py", line 488, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python\lib\site-packages\requests-2.13.0-py3.6.egg\requests\sessions.py", line 603, in send
adapter = self.get_adapter(url=request.url)
File "C:\Python\lib\site-packages\requests-2.13.0-py3.6.egg\requests\sessions.py", line 685, in get_adapter
raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for '['http://10.69.161.179:8080/#/main/dashboard/metrics']'
Can anyone help me with this? :(

Because requests.get() only accepts the URL as a string, not a list. You need to take the string out of the list [], for example by iterating over it:
quote_page = ['http://10.69.161.179:8080']
for url in quote_page:
    page = requests.get(url)
    .....
By the way, though the semicolon is harmless in a statement like the following, you should avoid it unless you need it for some reason:
quote_page = ['http://10.69.161.179:8080'];
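Putting it together, a minimal corrected version of the script from the question (a sketch, assuming the same single URL and the same CSV output; the unused datetime import is dropped):
import csv
import requests
from bs4 import BeautifulSoup

quote_page = ['http://10.69.161.179:8080']
data = []
for url in quote_page:
    page = requests.get(url)  # each URL is passed as a string, not a list
    soup = BeautifulSoup(page.content, 'html.parser')
    name_box = soup.find('div', attrs={'class': 'caption span10'})
    name = name_box.text.strip()
    print(name)
    data.append(name)
with open('sample.csv', 'a', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for name in data:
        writer.writerow([name])
print("Success")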

Related

How to get url instead of base64 value from img tags

I'm quite new to the web realm, and in an attempt to make my way into it I started using Beautiful Soup and the requests module today. The program was going well until execution reached line 78, where an error is raised because the src value of the img tags is retrieved in base64 format. So far I'm aware that this could be overcome by just encoding it to ASCII and then decoding it with a base64 decoder, but the thing is that I want it as a URL value. How should I go about it?
NOTES:
(Just in case, one never knows)
python version: 3.8.5
lxml version: 4.6.2
beautifulsoup4 version: 4.9.3
ERROR
Traceback (most recent call last):
File "/home/l0new0lf/Documents/Projects/Catalog Scraping/scrape.py", line 72, in <module>
webdata['images'].append(requests.get(game_tables[j].select_one(IMG_TAG_SELECTOR)['src']).content)
File "/usr/lib/python3/dist-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/usr/lib/python3/dist-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 640, in send
adapter = self.get_adapter(url=request.url)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 731, in get_adapter
raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for
'data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEHAAEALAAAAAABAAEAAAICTAEAOw=='
Any help is welcome, many thanks in advance!!!
CODE
import requests
from bs4 import BeautifulSoup

# CSS SELECTORS FOR LOOKUPS
TITLE_TAG_SELECTOR = 'tr:first-child td.ta14b.t11 div a strong'
IMG_TAG_SELECTOR = 'tr:last-child td:first-child a img'
DESCRIPTION_TAG_SELECTOR = 'tr:last-child td:last-child p'
GENRES_TAG_SELECTOR = 'tr:last-child td:last-child div.mt05 p :last-child'
GAME_SEARCH_RESULTS_TABLE_SELECTOR = 'table.mt1.tablestriped4.froboto_real.blanca'
#CSS CLASS ATTRIBUTE
GAME_TABLE_CLASS = 'table transparente tablasinbordes'
rq = requests.get(
    'https://vandal.elespanol.com/juegos/13/pc/letra/a/inicio/1')
soup = BeautifulSoup(rq.content, 'lxml')
main_table = soup.select_one(GAME_SEARCH_RESULTS_TABLE_SELECTOR)
print('main_table:', main_table)
game_tables = main_table.find_all('table', {'class': GAME_TABLE_CLASS})
print('game_tables', game_tables)
#help(game_tables)
webdata = {
    'titles': [],
    'images': [],
    'descriptions': [],
    'genres': [],
}
for j in range(len(game_tables)):
    webdata['titles'].append(game_tables[j].select_one(TITLE_TAG_SELECTOR).text)
    print(game_tables[j].select_one(IMG_TAG_SELECTOR)['src'])
    webdata['images'].append(requests.get(game_tables[j].select_one(IMG_TAG_SELECTOR)['src']).content)
    webdata['descriptions'].append(game_tables[j].select_one(DESCRIPTION_TAG_SELECTOR))
    webdata['genres'].append(game_tables[j].select_one(GENRES_TAG_SELECTOR))
print(webdata['titles'], webdata['images'], webdata['descriptions'], webdata['genres'], sep='\n')
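This question ends without an answer in this write-up, but one way to handle it (a sketch; the helper name is hypothetical): a data: URI embeds the image bytes in the URL itself, so it can be decoded with the standard base64 module instead of being fetched over HTTP. Note that when the base64 placeholder comes from lazy loading, the real URL, if the page carries one at all, lives in a site-specific attribute, so there may be no URL to recover from src alone.
import base64
import requests

def image_bytes(src):
    # Return raw image bytes whether src is an http(s) URL or a data: URI.
    if src.startswith('data:'):
        # 'data:image/gif;base64,<payload>' carries the bytes inline
        payload = src.split(',', 1)[1]
        return base64.b64decode(payload)
    return requests.get(src).content

# usage inside the loop above:
# webdata['images'].append(image_bytes(game_tables[j].select_one(IMG_TAG_SELECTOR)['src']))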

Request Error when trying to download images from Netflix

Ok, I'm new to Python and I intend to learn a little bit about web scraping, which is even harder for me since I don't know anything about the Web, JS, HTML, etc. My idea was to download some images available in the Netflix catalogue. And that even works fine for the first 5 or 6 images:
import requests, os
from bs4 import BeautifulSoup

url = "https://www.netflix.com/br/browse/genre/839338"
page = requests.get(url)
page.raise_for_status()
soup = BeautifulSoup(page.text)
img_element_list = soup.select('a img')
print(f'Images available (?): {len(img_element_list)}')
quantity = int(input('How many images: '))
for c in range(quantity):
    name = img_element_list[c].get('alt')
    print('Downloading ' + name + ' image...')
    img_response = img_element_list[c].get('src')
    print('SRC: ' + img_response + '\n\n')
    img = requests.get(img_response)
    file = os.path.join('Images', name)
    img_file = open(file + '.jpg', 'wb')
    for chunk in img.iter_content(100000):
        img_file.write(chunk)
    img_file.close()
But in the end, after the fourth or fifth image has been downloaded, the src for the subsequent images looks like this: 'data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw==' and then it raises this error:
Traceback (most recent call last):
File "C:\Users\1513 IRON\PycharmProjects\DownloadNetflixImg.py", line 20, in <module>
img = requests.get(img_response)
File "C:\Users\1513 IRON\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\1513 IRON\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\1513 IRON\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\1513 IRON\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\sessions.py", line 640, in send
adapter = self.get_adapter(url=request.url)
File "C:\Users\1513 IRON\AppData\Local\Programs\Python\Python37\lib\site-packages\requests\sessions.py", line 731, in get_adapter
raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for 'data:image/gif;base64,R0lGODlhAQABAAAAACH5BAEKAAEALAAAAAABAAEAAAICTAEAOw=='
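This question also stops at the traceback here. The data:image/gif;base64,... values are tiny placeholder GIFs injected by lazy loading, and requests only speaks http:// and https://, hence the InvalidSchema. A minimal guard (a sketch, assuming skipping the placeholders is acceptable; the function name is mine):
import os
import requests

def download_images(img_element_list, quantity):
    for c in range(quantity):
        name = img_element_list[c].get('alt')
        src = img_element_list[c].get('src')
        # lazy-loaded entries carry a base64 placeholder instead of a real URL
        if not src.startswith(('http://', 'https://')):
            print('Skipping ' + name + ': placeholder src, not a fetchable URL')
            continue
        img = requests.get(src)
        with open(os.path.join('Images', name) + '.jpg', 'wb') as img_file:
            for chunk in img.iter_content(100000):
                img_file.write(chunk)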

Unable to take input from a text file in Python crawler

I have created a basic crawler in Python, and I want it to take its input from a text file.
I used open()/raw_input(), but there was an error.
When I used the input("") function, it prompted for input and worked fine.
The problem is only with reading from a file.
import re
import urllib.request
url = open('input.txt', 'r')
data = urllib.request.urlopen(url).read()
data1 = data.decode("utf8")
print(data1)
file = open('output.txt', 'w')
file.write(data1)
file.close()
error output below.
Traceback (most recent call last):
File "scrape.py", line 8, in <module>
data = urllib.request.urlopen(url).read()
File "/usr/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.6/urllib/request.py", line 518, in open
protocol = req.type
AttributeError: '_io.TextIOWrapper' object has no attribute 'type'
The open() method returns a file object, not the content of the file as a string. If you want url to contain the content as a string, change the line to:
url = open('input.txt', 'r').read()
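One caveat with that one-liner: read() keeps the trailing newline, and stray whitespace in a URL can break the request. If input.txt may hold several URLs, a slightly more defensive version (a sketch using the same file names as the question) reads line by line and strips each one:
import urllib.request

with open('input.txt', 'r') as f_in, open('output.txt', 'w') as f_out:
    for line in f_in:
        url = line.strip()  # drop the newline and surrounding whitespace
        if not url:         # skip blank lines
            continue
        data = urllib.request.urlopen(url).read()
        f_out.write(data.decode('utf8'))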

Why am I getting the below-mentioned error?

import requests
from selenium import webdriver
chrome_options = webdriver.ChromeOptions()
links = {"profile.default_content_setting_values.notifications": 2}
chrome_options.add_experimental_option("prefs", links)
driver = webdriver.Chrome(chrome_options=chrome_options,
                          executable_path="F:\\automation\\chromedriver.exe")
driver.maximize_window()
driver.get('https://google.com/')
links = driver.find_elements_by_tag_name("a")
for link in links:
    r = requests.head(link.get_attribute('href'))
    print(link.get_attribute('href'), r.status_code, 'front_page')
driver.quit()
I am getting this error:
Traceback (most recent call last):
File "F:/automation/frontpages.py", line 15, in <module>
r = requests.head(link.get_attribute('href'))
File "F:\automation\venv\lib\site-packages\requests\api.py", line 101, in head return request('head', url, **kwargs)
File "F:\automation\venv\lib\site-packages\requests\api.py", line 60, in request return session.request(method=method, url=url, **kwargs)
File "F:\automation\venv\lib\site-packages\requests\sessions.py", line 524, in request resp = self.send(prep, **send_kwargs)
File "F:\automation\venv\lib\site-packages\requests\sessions.py", line 631, in send adapter = self.get_adapter(url=request.url)
File "F:\automation\venv\lib\site-packages\requests\sessions.py", line 722, in get_adapter raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for 'mailto:care#pushengage.com'
I also want to export all the links into an HTML sheet once the test case passes.
Why am I getting this error?
Some of the links are mailto: links, which are not HTTP URLs. You are passing a mailto link (mailto:some@somepage.com) to requests.head(), which is not an HTTP request, so you get this error.
Filter out those invalid links in the find_elements call itself:
links = driver.find_elements_by_xpath("//a[not(contains(@href,'mailto'))][contains(@href,'pushengage.com')]")
for link in links:
    r = requests.head(link.get_attribute('href'))
    print(link.get_attribute('href'), r.status_code, 'front_page')
driver.quit()
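Alternatively (a sketch, not from the original answer), you can keep the simple tag-name lookup and filter on the Python side, which also guards against anchors whose href is missing or uses some other non-HTTP scheme:
links = driver.find_elements_by_tag_name("a")
for link in links:
    href = link.get_attribute('href')
    # requests only understands http/https; skip mailto:, javascript:, None, etc.
    if href and href.startswith(('http://', 'https://')):
        r = requests.head(href)
        print(href, r.status_code, 'front_page')
driver.quit()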

Python Requests.Get - Giving Invalid Schema Error

Another one for you.
Trying to scrape a list of URLs from a CSV file. This is my code:
from bs4 import BeautifulSoup
import requests
import csv
with open('TeamRankingsURLs.csv', newline='') as f_urls, open('TeamRankingsOutput.csv', 'w', newline='') as f_output:
    csv_urls = csv.reader(f_urls)
    csv_output = csv.writer(f_output)
    for line in csv_urls:
        page = requests.get(line[0]).text
        soup = BeautifulSoup(page, 'html.parser')
        results = soup.findAll('div', {'class': 'LineScoreCard__lineScoreColumnElement--1byQk'})
        for r in range(len(results)):
            csv_output.writerow([results[r].text])
...Which gives me the following error:
Traceback (most recent call last):
File "TeamRankingsScraper.py", line 11, in <module>
page = requests.get(line[0]).text
File "C:\Users\windowshopr\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\api.py", line 72, in get
return request('get', url, params=params, **kwargs)
File "C:\Users\windowshopr\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\windowshopr\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\sessions.py", line 512, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\windowshopr\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\sessions.py", line 616, in send
adapter = self.get_adapter(url=request.url)
File "C:\Users\windowshopr\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\sessions.py", line 707, in get_adapter
raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for 'https://www.teamrankings.com/mlb/stat/runs-per-game?date=2018-04-15'
My CSV file is just a list in column A of several urls (ie. https://www...)
(The div class I'm trying to scrape doesn't exist on that page, but that's not where the problem is, at least I don't think so. I just need to update that once I can get it to read from the CSV file.)
Any suggestions? Because this code works on another project, but for some reason I'm having issues with this new URL list. Thanks a lot!
From the traceback: requests.exceptions.InvalidSchema: No connection adapters were found for 'https://www.teamrankings.com/mlb/stat/runs-per-game?date=2018-04-15'
There is a stray (often invisible) character in the URL before the scheme; it should start with https://www.teamrankings.com/mlb/stat/runs-per-game?date=2018-04-15
So first parse the CSV and strip any stray characters before the http/https using a regex. That should solve your problem.
If you want to resolve the problem for this particular URL while reading the CSV, do:
import regex as re  # the third-party regex module; the stdlib re works here too
strin = "https://www.teamrankings.com/mlb/stat/runs-per-game?date=2018-04-15"
re.sub(r'.*http', 'http', strin)
This will give you the correct URL, which requests can handle.
Since you asked for the full fix applied inside the loop, here's what you could do:
from bs4 import BeautifulSoup
import requests
import csv
import regex as re
with open('TeamRankingsURLs.csv', newline='') as f_urls, open('TeamRankingsOutput.csv', 'w', newline='') as f_output:
    csv_urls = csv.reader(f_urls)
    csv_output = csv.writer(f_output)
    for line in csv_urls:
        page = re.sub(r'.*http', 'http', line[0])
        page = requests.get(page).text
        soup = BeautifulSoup(page, 'html.parser')
        results = soup.findAll('div', {'class': 'LineScoreCard__lineScoreColumnElement--1byQk'})
        for r in range(len(results)):
            csv_output.writerow([results[r].text])
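If the stray character turns out to be a UTF-8 byte-order mark, which Excel commonly prepends when saving CSV files (an assumption about where the file came from), a cleaner fix than the regex is to open the file with the utf-8-sig encoding, which strips the BOM on read:
import csv
import requests

# utf-8-sig transparently removes a leading BOM, so line[0] is a clean URL
with open('TeamRankingsURLs.csv', newline='', encoding='utf-8-sig') as f_urls:
    for line in csv.reader(f_urls):
        page = requests.get(line[0]).text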
