I'm using pandas read_csv to download and read a file to a dataframe.
import pandas as pd
df = pd.read_csv('https://some-monitor.com/rest/data', sep=';', thousands='.', decimal=',')
Locally, the script works fine and the data is read to the dataframe. However, when I ssh into a remote server and run the script there, I get the following error:
File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 678, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/lib/python3/dist-packages/pandas/io/parsers.py", line 424, in _read
filepath_or_buffer, encoding, compression)
File "/usr/lib/python3/dist-packages/pandas/io/common.py", line 195, in get_filepath_or_buffer
req = _urlopen(filepath_or_buffer)
File "/usr/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.7/urllib/request.py", line 531, in open
response = meth(req, response)
File "/usr/lib/python3.7/urllib/request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.7/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/usr/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/usr/lib/python3.7/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 413:
Why is this occurring? Why does the script work locally but not on the server? The server's and my local OS are both the same: Ubuntu.
Thanks to @DeadSec's suggestion, the script now runs fine on the server too.
I used pretty-downloader to first download the file, then load it in pandas.
from pretty_downloader import download
download('https://some-monitor.com/rest/data', file_name='my_file.csv')
import pandas as pd
df = pd.read_csv('my_file.csv', sep=';', thousands='.', decimal=',')
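The same download-then-read pattern can be sketched with the standard library alone, in case installing an extra package on the server is not an option. The helper name `download_to_file` and the 30-second timeout are assumptions for illustration. This does not explain the 413 by itself: if the server rejects the request, `urlopen` still raises `HTTPError`.

```python
import shutil
import urllib.request


def download_to_file(url, dest, timeout=30):
    """Stream the response body for url into the file dest and return dest."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        with open(dest, "wb") as out:
            # Copy in chunks so large files are not held in memory at once.
            shutil.copyfileobj(response, out)
    return dest


# Usage, mirroring the snippet above:
# download_to_file('https://some-monitor.com/rest/data', 'my_file.csv')
# df = pd.read_csv('my_file.csv', sep=';', thousands='.', decimal=',')
```

Keeping the download separate from the parsing also makes it easier to see which of the two steps fails.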
Related
I am trying to write a Python program that downloads a YouTube video from a given link using the pytube module, but when I run it, it fails with a long traceback:
Traceback (most recent call last):
File "c:\Users\Sumit\vs code python\projects\proj's\yt.py", line 9, in <module>
yt = YouTube(link)
File "C:\Python39\lib\site-packages\pytube\__main__.py", line 91, in __init__
self.prefetch()
File "C:\Python39\lib\site-packages\pytube\__main__.py", line 181, in prefetch
self.vid_info_raw = request.get(self.vid_info_url)
File "C:\Python39\lib\site-packages\pytube\request.py", line 36, in get
return _execute_request(url).read().decode("utf-8")
File "C:\Python39\lib\site-packages\pytube\request.py", line 24, in _execute_request
return urlopen(request) # nosec
File "C:\Python39\lib\urllib\request.py", line 214, in urlopen
return opener.open(url, data, timeout)
File "C:\Python39\lib\urllib\request.py", line 523, in open
response = meth(req, response)
File "C:\Python39\lib\urllib\request.py", line 632, in http_response
response = self.parent.error(
File "C:\Python39\lib\urllib\request.py", line 555, in error
result = self._call_chain(*args)
File "C:\Python39\lib\urllib\request.py", line 494, in _call_chain
result = func(*args)
File "C:\Python39\lib\urllib\request.py", line 747, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "C:\Python39\lib\urllib\request.py", line 523, in open
response = meth(req, response)
File "C:\Python39\lib\urllib\request.py", line 632, in http_response
response = self.parent.error(
File "C:\Python39\lib\urllib\request.py", line 561, in error
return self._call_chain(*args)
File "C:\Python39\lib\urllib\request.py", line 494, in _call_chain
result = func(*args)
File "C:\Python39\lib\urllib\request.py", line 641, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 410: Gone
I don't know what the problem is or what I should do. Please help.
I am using Python 3.9. The code is given below:
from pytube import YouTube
SAVE_PATH = r"C:\Users\Sumit\vs code python\projects"
link="https://www.youtube.com/watch?v=JfVOs4VSpmA&t=3s"
yt = YouTube(link)
mp4files = yt.filter('mp4')
yt.set_filename('Video')
Avideo = yt.get(mp4files[-1].extension,mp4files[-1].resolution)
try:
    # downloading the video
    Avideo.download(SAVE_PATH)
except:
    print("Some Error!")
print('Task Completed!')
https://pytube.io/en/latest/api.html#pytube.Stream.download
from pytube import YouTube

SAVE_PATH = r"C:\Users\Sumit\vs code python\projects"
link = "https://www.youtube.com/watch?v=JfVOs4VSpmA&t=3s"

yt = YouTube(link)
# The parentheses let the method chain span several lines.
# Recent pytube releases expect the extension to be part of filename.
(
    yt.streams
    .filter(progressive=True, file_extension='mp4')
    .order_by('resolution')
    .desc()
    .first()
    .download(output_path=SAVE_PATH, filename='videoFilename.mp4')
)
I'm trying to download a file from the URL -> https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519
I can manually download the file by accessing the URL via a browser and the file gets automatically saved onto the local machine in the Downloads folder. (The file is in JSON format)
However, I need to achieve this using a Python script. I tried using urllib.request & wget, but in both cases I keep getting the error -
urllib.request.urlretrieve(url, path)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 247, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 531, in open
response = meth(req, response)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Is there a workaround for this that also deals with dynamic changes?
You could try the following script to get the download url and download the json file:
import requests
import re
import urllib.request
rq = requests.get("https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519")
# Raw string, so that \. is a literal dot and not an invalid escape sequence.
t = re.search(r"https://download\.microsoft\.com/download/.*?\.json", rq.text)
a = t.group()
print(a)
path = r"$(Build.sourcesdirectory)\agent.json"
urllib.request.urlretrieve(a, path)
Python 3
import urllib.request, json
with urllib.request.urlopen("https://download.microsoft.com/download/7/1/D/71D86715-5596-4529-9B13-DA13A5DE5B63/ServiceTags_Public_20210329.json") as url:
    data = json.loads(url.read().decode())
    print(data)
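The URL-extraction step of the first script above can be isolated into a small function, which makes it possible to check the regular expression without any network access. This is only a sketch: the function name `find_json_url` is an assumption, and unlike the original it returns None instead of failing when the page contains no matching link.

```python
import re

# Same pattern as in the script above, as a raw string with the dots escaped.
JSON_URL_PATTERN = re.compile(r"https://download\.microsoft\.com/download/.*?\.json")


def find_json_url(html):
    """Return the first download.microsoft.com JSON URL found in html, or None."""
    match = JSON_URL_PATTERN.search(html)
    return match.group() if match else None
```

The returned URL can then be passed to `urllib.request.urlretrieve` as in the script above.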
After a few tries, I get the following response to a Google search query. Does anybody have a solution? Please help.
search(query=self.name,tld='com',lang='en',num=100,stop=100,pause=5):
File "C:\Users\img_cart_project\venv\lib\site-packages\googlesearch\__init__.py", line 305, in search
html = get_page(url, user_agent, verify_ssl)
File "C:\Users\img_cart_project\venv\lib\site-packages\googlesearch\__init__.py", line 174, in get_page
response = urlopen(request)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 640, in http_response
response = self.parent.error(
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 563, in error
result = self._call_chain(*args)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 755, in http_error_302
return self.parent.open(new, timeout=req.timeout)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 640, in http_response
response = self.parent.error(
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 502, in _call_chain
result = func(*args)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.8_3.8.1776.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 429: Too Many Requests
Possible duplicate.
Handle the Error in your code
Assuming that you are using the google Python package and not google-api-python-client, the line you provided works fine for me.
Steps followed
Create a clean python environment (using virtualenv in my case, install with pip): virtualenv google_search_env
Activate the environment: source google_search_env/bin/activate for linux or source google_search_env/scripts/activate for windows.
Install dependencies: pip install beautifulsoup4 google
Run your Python script: python search_client.py
# search_client.py
from urllib.error import HTTPError

from googlesearch import search

try:
    search_results = search(query="test search", tld='com', lang='en', num=5, stop=10, pause=1)
    # Iterate with a plain loop instead of a list comprehension used for its side effects.
    for search_result in search_results:
        print("result: " + search_result)
except HTTPError:
    print("429 HTTP Error.")
    # more code...
except Exception:
    print("There was an issue while fetching results.")
This should print the search results for the "test search" term, or the custom error message.
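Beyond printing a message, a 429 can be handled by waiting and trying again. The following is a minimal sketch of retrying with exponential backoff. The function name `call_with_backoff`, its parameters, and the default delays are assumptions for illustration, not part of the googlesearch package, and nothing guarantees that the server lifts the limit after a few retries.

```python
import time
from urllib.error import HTTPError


def call_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on HTTP 429 wait and retry, doubling the delay each time."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except HTTPError as err:
            # Only rate-limit responses are retried; re-raise on the last attempt.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            retry_after = err.headers.get("Retry-After") if err.headers else None
            # Retry-After may also be an HTTP date; this sketch only handles seconds.
            if retry_after and retry_after.isdigit():
                delay = float(retry_after)
            else:
                delay = base_delay * 2 ** attempt
            sleep(delay)
```

A call such as `call_with_backoff(lambda: list(search(query="test search", num=5, stop=10, pause=1)))` would wrap the whole search, since the requests are only made while the results are consumed.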
>>> import re
>>> import urllib.request
>>> url="https://www.google.com/search?q=googlestock"
>>> print(url)
https://www.google.com/search?q=googlestock
>>> data=urllib.request.urlopen(url).read()
I get an error, although the URL works fine when opened manually. The error is:
File "<pyshell#4>", line 1, in <module>
data=urllib.request.urlopen(url).read()
File "C:\Users\SHARM\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\SHARM\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Users\SHARM\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Users\SHARM\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Users\SHARM\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)
File "C:\Users\SHARM\AppData\Local\Programs\Python\Python37-32\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
If you want to do web scraping from google, you can use the "google" library.
On your command prompt, pip install google (it is literally "pip install google").
Then, just try something like this:
from googlesearch import search
for s in search("googlestock"):
    print(s)
This will print all of the results of the Google search for "googlestock". You can learn more about this library here: https://pypi.org/project/google/
I hope it helps,
BR
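For background on the 403 itself: unless told otherwise, urllib identifies itself with a User-Agent of the form Python-urllib/<version>, and some sites reject that default. A minimal sketch of sending a different header with the standard library follows. The helper name and the header value are assumptions for illustration, and there is no guarantee that Google accepts the request.

```python
import urllib.request


def build_browser_like_request(url, user_agent="Mozilla/5.0"):
    """Return a Request for url that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Usage:
# request = build_browser_like_request("https://www.google.com/search?q=googlestock")
# data = urllib.request.urlopen(request).read()
```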
I have been scraping a small amount of data once a day, and the code worked until 5 days ago. Now I get nothing but a 503 error when I try.
Here's a simple cut of the code causing the issues:
from urllib.request import Request, urlopen
website = "https://bitcointalk.org/index.php?board=1.0"
req = Request(website, headers={'User-Agent': 'Mozilla/5.0'})
data = urlopen(req).read()
print(data)
which gives the error:
Traceback (most recent call last):
File "C:/Users/Clay/Desktop/python/airdrop/1.py", line 6, in <module>
data = urlopen(req).read()
File "C:\Users\Clay\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\Clay\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 532, in open
response = meth(req, response)
File "C:\Users\Clay\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Users\Clay\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 570, in error
return self._call_chain(*args)
File "C:\Users\Clay\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 504, in _call_chain
result = func(*args)
File "C:\Users\Clay\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 503: Service Temporarily Unavailable
Any ideas? I am quite lost: I changed the User-Agent and still have the same issue.
I also have multiple proxy servers that I've tried cycling through and it doesn't work on any of them.
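One way to narrow this down is to look at what the server sends along with the 503, since the status line alone says little. The sketch below collects the status, a few response headers, and the start of the body from an `HTTPError`. The helper name `summarize_http_error` and the choice of headers are assumptions for illustration. It only helps with diagnosis and does not work around whatever is blocking the request.

```python
from urllib.error import HTTPError


def summarize_http_error(err, max_body=200):
    """Collect the parts of an HTTPError that usually explain a rejection."""
    headers = err.headers or {}
    # HTTPError doubles as the response object, so the error page can be read.
    body = err.read(max_body) if err.fp is not None else b""
    return {
        "code": err.code,
        "reason": err.reason,
        "server": headers.get("Server"),
        "retry_after": headers.get("Retry-After"),
        "body_start": body.decode("utf-8", errors="replace"),
    }


# Usage with the request from the question:
# try:
#     data = urlopen(req).read()
# except HTTPError as err:
#     print(summarize_http_error(err))
```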