Download file from website directly into Linux directory - Python - python-3.x

If I manually click the button, the browser starts downloading a CSV file (2 GB) onto my computer, but I want to automate this.
This is the link to download:
Issue: when I use either the requests or the pandas library, it just hangs. I have no idea whether the file is being downloaded or not.
My goal is to:
Know if the file is being downloaded and
Have the CSV downloaded to a specified directory ie.
Can someone provide the code to do this?

Try this...
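import requests
URL = ""
response = requests.get(URL)
response.raise_for_status()  # stop here on an HTTP error instead of saving an error page
with open("/mydirectory/downloaded_file.csv", "wb") as f:
    f.write(response.content)
print('Download Complete')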
Or you could do it this way and have a progress bar ...
import wget
wget.download('')
The output will look like this:
11% [........ ] 73728 / 633847
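For the first goal in the question (knowing whether the download is progressing), one option is to stream the response in chunks and report the byte count as each chunk is written, which also avoids holding the whole 2 GB file in memory. The following is a minimal sketch under stated assumptions, not a definitive implementation: `write_chunks` and `download` are hypothetical helper names, the 1 MiB chunk size is an arbitrary choice, and a percentage can only be shown if the server sends a `Content-Length` header.

```python
def write_chunks(chunks, dest_path, total_bytes=None, report=print):
    """Write an iterable of byte chunks to dest_path, reporting progress per chunk."""
    written = 0
    with open(dest_path, "wb") as fh:
        for chunk in chunks:
            if not chunk:  # skip empty keep-alive chunks
                continue
            fh.write(chunk)
            written += len(chunk)
            if total_bytes:
                report(f"{written * 100 // total_bytes}% {written} / {total_bytes}")
            else:
                report(f"{written} bytes")
    return written


def download(url, dest_path, chunk_size=1024 * 1024):
    """Stream url to dest_path with requests (hypothetical helper)."""
    import requests  # third-party; imported here so write_chunks has no dependencies

    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()
        # Content-Length may be absent; then only the byte count is reported
        total = int(response.headers.get("Content-Length", 0)) or None
        chunks = response.iter_content(chunk_size=chunk_size)
        return write_chunks(chunks, dest_path, total)
```

Calling `download(URL, "/mydirectory/downloaded_file.csv")` would then print one progress line per chunk, so a stalled download is visible as output that stops advancing.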


Download xml file from the server with Python3

I am trying to download an XML file from a public data bank.
I tried to do it with requests:
import requests
response = requests.get(url)
response.encoding = 'utf-8' #or response.apparent_encoding
and wget:
import wget
wget.download(url, './my.xml')
But both approaches produce garbage instead of a correct file (it looks like a broken encoding, but I cannot fix it).
If I download the file via a web browser, I get a correct UTF-8 XML file.
What am I doing wrong in the code?

How to get WKHTMLTOPDF working on Heroku?

I created a website that generates PDFs using pdfkit, and I know how to install wkhtmltopdf and set up the environment variable path on Windows. I managed to deploy my first website on Heroku, but now I get the error "No wkhtmltopdf executable found: "b''" when trying to generate a PDF.
I have no idea how to install and set up wkhtmltopdf on Heroku, because this is the first time I'm dealing with Linux.
I really tried everything before asking, but even following this is not working for me:
Python 3 flask install wkhtmltopdf on heroku
If possible, please guide me step by step through installing and setting this up.
I followed all the resources I could find but couldn't make it work. Every time I get the same error.
I'm using Django version 2. Python version 3.7.
This is what I get if I do heroku stack
Available Stacks
* heroku-18
This is the error I get when generating the PDF:
No wkhtmltopdf executable found: "b''"
If this file exists please check that this process can read it. Otherwise please install wkhtmltopdf -
My website works well on localhost without any problem, so as far as I can tell I have done something wrong when installing wkhtmltopdf.
Thank you
It's non-trivial. If you want to avoid the headache described below, you can just use my service, api2pdf. Otherwise, if you want to work through it, see below.
1) Add this to your requirements.txt to install a special wkhtmltopdf pack for heroku as well as pdfkit.
2) I created a in my flask app. In I have a method:
import os, platform, subprocess
import pdfkit

def _get_pdfkit_config():
    """wkhtmltopdf lives and functions differently depending on Windows or Linux. We
    need to support both since we develop on windows but deploy on Heroku.

    Returns:
        A pdfkit configuration
    """
    if platform.system() == 'Windows':
        return pdfkit.configuration(wkhtmltopdf=os.environ.get('WKHTMLTOPDF_BINARY', 'C:\\Program Files\\wkhtmltopdf\\bin\\wkhtmltopdf.exe'))
    # On Linux (Heroku), ask `which` where the binary lives
    WKHTMLTOPDF_CMD = subprocess.Popen(['which', os.environ.get('WKHTMLTOPDF_BINARY', 'wkhtmltopdf')], stdout=subprocess.PIPE).communicate()[0].strip()
    return pdfkit.configuration(wkhtmltopdf=WKHTMLTOPDF_CMD)
The reason I have the platform check in there is that I develop on a Windows machine and have the local wkhtmltopdf binary on my PC. But when I deploy to Heroku, the app runs in their Linux containers, so I need to detect which platform we're on before running the binary.
3) Then I created two more methods - one to convert a url to pdf and another to convert raw html to pdf.
def make_pdf_from_url(url, options=None):
    """Produces a pdf from a website's url.

    Args:
        url (str): A valid url
        options (dict, optional): for specifying pdf parameters like landscape
            mode and margins

    Returns:
        pdf of the website
    """
    return pdfkit.from_url(url, False, configuration=_get_pdfkit_config(), options=options)


def make_pdf_from_raw_html(html, options=None):
    """Produces a pdf from raw html.

    Args:
        html (str): Valid html
        options (dict, optional): for specifying pdf parameters like landscape
            mode and margins

    Returns:
        pdf of the supplied html
    """
    return pdfkit.from_string(html, False, configuration=_get_pdfkit_config(), options=options)
I use these methods to convert to PDF.
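One detail worth noting about the lookup above: `which` prints nothing when the binary is not on the PATH, so the stripped output is an empty byte string, which would be consistent with the `b''` in the error from the question. A hedged alternative is to look the binary up with the standard library and fail early with a clearer message. This is only a sketch: `find_wkhtmltopdf` is a hypothetical helper name, and it assumes the same `WKHTMLTOPDF_BINARY` environment variable that the answer reads.

```python
import os
import shutil


def find_wkhtmltopdf(env=None):
    """Return the path of the wkhtmltopdf binary, or raise if it cannot be found.

    WKHTMLTOPDF_BINARY is the environment variable used in the answer above;
    find_wkhtmltopdf itself is a hypothetical helper.
    """
    env = os.environ if env is None else env
    name = env.get("WKHTMLTOPDF_BINARY", "wkhtmltopdf")
    # shutil.which returns None when nothing executable is found on PATH
    path = shutil.which(name, path=env.get("PATH"))
    if path is None:
        raise FileNotFoundError(f"{name!r} was not found on PATH; is it installed?")
    return path
```

The result could then be passed along as `pdfkit.configuration(wkhtmltopdf=find_wkhtmltopdf())`, so a missing binary surfaces at startup rather than at the first PDF request.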
Just follow these steps to Deploy Django app(pdfkit) on Heroku:
Step 1: Add the following packages to the requirements.txt file
Step 2: Add below lines in the to add path of binary file
import os, sys, subprocess, platform
import pdfkit

if platform.system() == "Windows":
    pdfkit_config = pdfkit.configuration(wkhtmltopdf=os.environ.get('WKHTMLTOPDF_BINARY', 'C:\\Program Files\\wkhtmltopdf\\bin\\wkhtmltopdf.exe'))
else:
    os.environ['PATH'] += os.pathsep + os.path.dirname(sys.executable)
    WKHTMLTOPDF_CMD = subprocess.Popen(['which', os.environ.get('WKHTMLTOPDF_BINARY', 'wkhtmltopdf')],
        stdout=subprocess.PIPE).communicate()[0].strip()
    pdfkit_config = pdfkit.configuration(wkhtmltopdf=WKHTMLTOPDF_CMD)
Step 3: And then pass pdfkit_config as argument as below
pdf = pdfkit.from_string(html, False, options=options, configuration=pdfkit_config)

Creating a Spark RDD from a file located in Google Drive using Python on Colab.Research.Google

I have successfully run a Python 3 / Spark 2.2.1 program on Google's Colab.Research platform:
!apt-get update
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q
!tar xf spark-2.2.1-bin-hadoop2.7.tgz
!pip install -q findspark
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-2.2.1-bin-hadoop2.7"
import findspark
findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()
This works perfectly when I upload text files from my local computer to the Unix VM using
from google.colab import files
datafile = files.upload()
and read them as follows:
textRDD = spark.read.text('hobbit.txt').rdd
so far so good ..
My problem starts when I try to read a file that is in my Google Drive colab directory.
Following the instructions, I have authenticated the user and created a drive service:
from google.colab import auth
auth.authenticate_user()
from googleapiclient.discovery import build
drive_service = build('drive', 'v3')
after which I have been able to access the file lying in the drive as follows :
file_id = '1RELUMtExjMTSfoWF765Hr8JwNCSL7AgH'
import io
from googleapiclient.http import MediaIoBaseDownload
request = drive_service.files().get_media(fileId=file_id)
downloaded = io.BytesIO()
downloader = MediaIoBaseDownload(downloaded, request)
done = False
while done is False:
    # _ is a placeholder for a progress object that we ignore.
    # (Our file is small, so we skip reporting progress.)
    _, done = downloader.next_chunk()
print('Downloaded file contents are: {}'.format(downloaded.getvalue()))
Downloaded file contents are: b'The king beneath the mountain\r\nThe king of ......
even this works perfectly ..
and gets the data
The king beneath the mountain
The king of carven stone
The lord of silver fountain ...
Where things FINALLY GO WRONG is when I try to grab this data and put it into a Spark RDD:
tRDD = spark.read.text(downloaded.getvalue().decode('utf-8'))
and I get the error ..
AnalysisException: 'Path does not exist: file:/content/The king beneath the mountain\ ....
Evidently, I am not using the correct method / parameters to read the file into spark. I have tried quite a few of the methods described
I would be very grateful if someone can help me figure out how to read this file for subsequent processing.
A complete solution to this problem is available in another StackOverflow question that is available at this URL.
Here is the notebook where this solution is demonstrated.
I have tested it and it works!
It seems that spark.read.text expects a file name, but you gave it the file content instead. You can try either of these:
save it to a file, then give the name
use just downloaded instead of downloaded.getvalue().decode('utf-8')
You can also simplify downloading from Google Drive with pydrive. I gave an example here.
Downloading is just
fid = drive.ListFile({'q':"title='hobbit.txt'"}).GetList()[0]['id']
f = drive.CreateFile({'id': fid})
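The first option in this answer (save the content to a file, then give Spark the file name) can be sketched as follows. This is an illustration under assumptions rather than the answerer's own code: `buffer_to_file` and `rdd_from_buffer` are hypothetical helper names, the buffer is assumed to be an `io.BytesIO` like the `downloaded` object in the question, and `spark` is assumed to be the session created there.

```python
import io


def buffer_to_file(buffer, local_path):
    """Write the full contents of an in-memory buffer to local_path and return the path."""
    with open(local_path, "wb") as fh:
        fh.write(buffer.getvalue())
    return local_path


def rdd_from_buffer(spark, buffer, local_path):
    """Persist the buffer first, then let Spark read it by path (hypothetical helper)."""
    return spark.read.text(buffer_to_file(buffer, local_path)).rdd
```

In the notebook this might be called as `rdd_from_buffer(spark, downloaded, '/content/hobbit.txt')`, where the target path is an arbitrary choice on the Colab VM.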

Is there any way to extract a rar file on cpanel

I have a website script; it is 212 MB and in RAR format. I could not upload it via FileZilla FTP, which gave me a timeout error after some time, and I could not upload it from the cPanel file manager, which also kept showing an error. I then used a PHP script to upload it directly from a link, but now I cannot extract it because it is RAR, not ZIP. I converted the RAR to ZIP and have it on Dropbox and Google Drive, but there is no direct link that I can use with the PHP script. So, is there any way to extract the RAR file from cPanel, with a PHP script, or with some other tweak? I have been working on this for 2 hours and cannot find a way around it.
Create a PHP file and extract the .rar with it. Use the following code:
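// Requires the PECL rar extension; '/dir/extract/to/' is a placeholder target directory
$archive = RarArchive::open('archive.rar');
$entries = $archive->getEntries();
foreach ($entries as $entry) {
    $entry->extract('/dir/extract/to/');
}
$archive->close();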

opkg-cl update 2 download error

I am trying to update using opkg-cl and am getting the following errors. Does anyone know how I should go about troubleshooting this?
[root#wrap /root]$ /etc/opkg/
Downloading /Packages.gz.
Downloading file:///mnt/usb/packages/Packages.gz.
Updated list of available packages in /var/lib/opkg/lists/all-remote-shoppertrak.
Updated list of available packages in /var/lib/opkg/lists/all-remote-base.
Collected errors:
* opkg_download: Failed to download /Packages.gz: URL using bad/illegal format or missing URL.
* copy_file: ///mnt/usb/packages/Packages.gz: No such file or directory.
* file_copy: Failed to copy file ///mnt/usb/packages/Packages.gz to /tmp/opkg-8FAiHb/update-iCH5Eo/all-local.gz.
[root#wrap /root]$ ls /mnt/usb/
[root#wrap /root]$
[root#wrap /root]$
Could you provide more information on your configuration?
Such as:
opkg version
content of your opkg feeds config file (/etc/opkg/*.conf)
At first glance, it looks like you have a local feed configured at file:///mnt/usb/packages/, which is lacking a Packages.gz file.
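A quick way to confirm that diagnosis is to check each local (`file://`) feed for its `Packages.gz` index before running the update. The snippet below is a small sketch assuming a POSIX filesystem: `missing_local_indexes` is a hypothetical helper name, and the feed URLs would have to be copied by hand from the opkg config.

```python
import os
from urllib.parse import urlparse


def missing_local_indexes(feed_urls):
    """Return the expected Packages.gz paths that are absent for file:// feeds."""
    missing = []
    for url in feed_urls:
        parsed = urlparse(url)
        if parsed.scheme != "file":
            continue  # remote feeds are not checked here
        index = os.path.join(parsed.path, "Packages.gz")
        if not os.path.isfile(index):
            missing.append(index)
    return missing
```

For the feed in the log, `missing_local_indexes(["file:///mnt/usb/packages"])` would list `/mnt/usb/packages/Packages.gz` if the USB drive is not mounted or the index was never generated.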
