The download link I want to manipulate is below:
http://hfrnet.ucsd.edu/thredds/ncss/grid/HFR/USWC/6km/hourly/RTV/HFRADAR,_US_West_Coast,_6km_Resolution,_Hourly_RTV_best.ncd?var=u&var=v&north=47.20&west=-126.3600&east=-123.8055&south=37.2500&horizStride=1&time_start=2015-11-01T00%3A00%3A00Z&time_end=2015-11-03T14%3A00%3A00Z&timeStride=1&addLatLon=true&accept=netcdf
I want to turn the highlighted parts of the URL (the region, resolution, bounding-box coordinates, and start/end times) into variables, so I can ask the user which coordinates and data set they want. That way I can download different data sets with the same script. I would also like to use the same variables to name the downloaded file, e.g. USWC6km20151101-20151103.
I did some research and learned that I could use urllib.parse or urllib2, but when I experiment with them I get "No module named urllib.parse."
I can use webbrowser.open() to download the file, but manipulating the URL is giving me problems.
THANK YOU!!
Instead of urllib you can use the requests module, which makes downloading content much easier. The part that does the actual work is only four lines long.
# first install this module: pip install requests
import requests

# parameters to change
location = {
    'part': 'USWC',
    'part2': '_US_West_Coast',
    'km': '6km',
    'north': '45.0000',
    'west': '-120.0000',
    'east': '-119.5000',
    'south': '44.5000',
    'start': '2016-10-01',
    'end': '2016-10-02'
}

# this is a template for the .format() method to generate links (a very naive method)
link_template = "http://hfrnet.ucsd.edu/thredds/ncss/grid/HFR/{part}/{km}/hourly/RTV/\
HFRADAR,{part2},_{km}_Resolution,_Hourly_RTV_best.ncd?var=u&var=v&\
north={north}&west={west}&east={east}&south={south}&horizStride=1&\
time_start={start}T00:00:00Z&time_end={end}T16:00:00Z&timeStride=1&addLatLon=true&accept=netcdf"

# some debug info
link = link_template.format(**location)
file_name = location['part'] + location['km'] + location['start'].replace('-', '') + '-' + location['end'].replace('-', '')
print("Link: ", link)
print("Filename: ", file_name)

# try to download the file
response = requests.get(link)
if response.ok:
    # open the file for writing in binary mode
    with open(file_name, mode='wb') as file_out:
        # write the response body to the file
        file_out.write(response.content)
The next step would probably be running this in a loop over a list of location dicts, or reading the locations from a CSV file.
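Here is a minimal sketch of the CSV variant, reusing requests and link_template from the snippet above. The file name locations.csv and its column layout (one column per key of the location dict) are assumptions for illustration:
import csv
import requests

# each CSV row becomes one location dict; the header row must use the same
# keys as the location dict above (part, part2, km, north, west, east, south, start, end)
with open('locations.csv', newline='') as csv_in:
    for loc in csv.DictReader(csv_in):
        link = link_template.format(**loc)
        file_name = loc['part'] + loc['km'] + loc['start'].replace('-', '') + '-' + loc['end'].replace('-', '')
        response = requests.get(link)
        if response.ok:
            with open(file_name, mode='wb') as file_out:
                file_out.write(response.content)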
I'm basically trying to iterate through a bunch of Excel .xls files and convert them to .xlsx files, and I don't really know where to go from here. I feel like I'm making a mess of the code.
I'm getting the following error: TypeError: listdir: path should be string, bytes, os.PathLike or None, not list
So I messed around a little more with the code and it might be going somewhere. I edited the code below.
file_path = Path.home().joinpath("Desktop", "test")
excel = win32.gencache.EnsureDispatch('Excel.Application')

if __name__ == "__main__":
    while True:
        the_path = (str(file_path) + str("\\"))
        print(the_path)
        os.chdir(the_path)
        xls_files = os.listdir('.')
        print(xls_files)
        for downloadedFile in listdir(xls_files):
            if downloadedFile.endswith('.xls'):
                wb = excel.Workbooks.Open(xls_files)
                pyexcel.save_book_as(downloadedFile, FileFormat = 51)
                downloadedFile.Close()
                downloadedFile.Save()
        excel.Application.Quit()
I don't really know if the code I'm writing makes sense at all.
If anyone could help me figure out whether I'm at least on the right track, that would be great.
Thanks for the help!
pyexcel seems to be the right tool for this task.
Install three packages
pip install pyexcel pyexcel-xls pyexcel-xlsx
Run the following script.
from pathlib import Path
import pyexcel as p

# Find your home path by printing `Path.home()`,
# then add the additional path that points to the folder
# containing all the .xls files.
# We are essentially using an absolute path here, but change
# it to a relative path if necessary.
folder = Path.home().joinpath('Desktop/test')

# Iterate over all the files inside the folder.
# Pick only the .xls files and convert each one to an .xlsx file.
# The converted file keeps the same name,
# e.g. foo.xls is converted to foo.xlsx.
for file in folder.iterdir():
    if file.suffix == '.xls':  # exact match, so already-converted .xlsx files are skipped
        book = p.get_book(file_name=str(file))
        # note: saving by name only writes the new file into the current working directory
        book.save_as(file.stem + '.xlsx')
NOTE:
The script has only been tested on macOS, not Windows. There might be specific tweaks needed for the Windows file system; however, pathlib should be able to handle Windows' idiosyncrasies.
Tweak the script if special needs arise.
I need to make a command line application which has two parameters:
the location of the input file, and
the location of the output file.
The input file is a GTFS (.txt) file.
The output file is a .shp file.
How should I do this?
To get command line parameters:
% python3 your_script.py parameter1 parameter2
where parameter1 and parameter2 are your input and output file names:
import sys

parameters = sys.argv
# parameters now contains 3 strings:
# "your_script.py" in parameters[0]
# "parameter1" in parameters[1]
# "parameter2" in parameters[2]
So you can use the command line arguments as variables. To open the files:
# From your problem statement, it sounds like your filenames don't include
# extensions. If they do, remove the (+ ".txt") part.
in_file = open(parameters[1] + ".txt")
out_file = open(parameters[2] + ".shp", 'w')
For more information about I/O operations in Python, see the official tutorial's section on reading and writing files.
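As an alternative to raw sys.argv, the standard-library argparse module gives you named arguments, --help output, and error messages for free. A minimal sketch; the argument names are just examples:
import argparse

parser = argparse.ArgumentParser(description='Convert a GTFS .txt file to a .shp file')
parser.add_argument('input_file', help='path to the GTFS (.txt) input file')
parser.add_argument('output_file', help='path to the .shp output file')
args = parser.parse_args()

print(args.input_file, args.output_file)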
I'm using Python and am trying to create a new CSV file (i.e. one that doesn't already exist) using the csv module. Does anyone know how to do this?
Thanks in advance,
Max
If you simply want to create a CSV file, you can use the built-in open() function (see the documentation for open for more details):
with open("filename.csv","a+") as f:
f.write(...)
Or, if you want to read an existing CSV file, you can use this:
import csv

with open('filename.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        print(', '.join(row))
# if you want to move the file to a given path
import os
os.rename("filename.csv", "path/to/new/destination/for/filename.csv")
For more, check the csv module docs.
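Since the question asks about the csv module specifically, here is a minimal sketch that creates a brand-new CSV file with csv.writer; the file name and the rows are made up for illustration:
import csv

rows = [
    ['name', 'age'],
    ['Alice', 30],
    ['Bob', 25],
]

# 'w' creates the file if it doesn't exist (and overwrites it if it does);
# newline='' is recommended by the csv docs to avoid blank lines on Windows
with open('new_file.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(rows)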
I am using Python 3.4 and need to extract all the text from a PDF and then use it for text processing.
All the answers I have seen suggest options for Python 2.7.
I need something in Python 3.4.
Bonson
You need to install the PyPDF2 package to work with PDFs in Python. PyPDF2 can extract text and images, and the extracted text is returned as a Python string. To install it, run pip install PyPDF2 from the command line. The module name is case-sensitive when importing, so make sure the 'y' is lowercase and the other letters are uppercase.
import PyPDF2

reader = PyPDF2.PdfReader('my_file.pdf')
print(len(reader.pages))  # gives 56, the number of pages
page = reader.pages[9]    # index 9, i.e. the tenth page
page.extract_text()
The last statement returns all the text available on that page of the 'my_file.pdf' document.
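Since the goal is all the text in the PDF, here is a small sketch extending the same idea (the file name is just an example):
import PyPDF2

reader = PyPDF2.PdfReader('my_file.pdf')
# join the extracted text of every page; the `or ""` guards against pages
# that have no extractable text layer
full_text = "\n".join(page.extract_text() or "" for page in reader.pages)
print(full_text)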
pdfminer.six (https://github.com/pdfminer/pdfminer.six) has also been recommended elsewhere and is intended to support Python 3. I can't personally vouch for it, though, since it failed during installation on macOS. (There's an open issue for that, and it seems to be a recent problem, so there might be a quick fix.)
Complementing @Sarah's answer: PDFMiner is a pretty good choice. I have been using it for quite some time, and so far it has worked well for extracting the text content from a PDF. What I did was create a function that uses the CLI client from pdfminer and saves the output into a variable (which I can use later on somewhere else). The Python version I am using is 3.6, and the function works well and does the required job, so maybe this can work for you:
def pdf_to_text(filepath):
    print('Getting text content for {}...'.format(filepath))
    process = subprocess.Popen(['pdf2txt.py', filepath], stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    stdout, stderr = process.communicate()
    if process.returncode != 0 or stderr:
        raise OSError('Executing the command for {} caused an error:\nCode: {}\nOutput: {}\nError: {}'.format(filepath, process.returncode, stdout, stderr))
    return stdout.decode('utf-8')
You will have to import the subprocess module of course: import subprocess
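For example, a quick usage sketch (the file name here is made up):
import subprocess  # required by pdf_to_text above

text = pdf_to_text('my_file.pdf')
print(text[:500])  # print the first 500 characters as a sanity check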
slate3k is good for extracting text. I've tested it with a few PDF files using Python 3.7.3, and it's a lot more accurate than PyPDF2, for instance. It's a fork of slate, which is a wrapper for PDFMiner. Here's the code I am using:
import slate3k as slate

with open('Sample.pdf', 'rb') as f:
    doc = slate.PDF(f)

doc     # the full document as a list of strings;
        # each element of the list is one page of the document
doc[0]  # the first page of the document
Credit to this comment on GitHub:
https://github.com/mstamy2/PyPDF2/issues/437#issuecomment-400491342
from pdfreader import SimplePDFViewer

pdfFileObj = open('/tmp/Test-test-test.pdf', 'rb')
viewer = SimplePDFViewer(pdfFileObj)
viewer.navigate(1)
viewer.render()
viewer.canvas.strings  # the list of text strings found on page 1
I am trying to fetch data from a URL. I have tried the following in Python 2.7:
import urllib2 as ul
response = ul.urlopen("http://in.bookmyshow.com/")
page_content = response.read()
print page_content
This is working fine. But when I try it in Python 3.4, it throws an error:
File "C:\Python34\lib\urllib\request.py", line 161, in urlopen
return opener.open(url, data, timeout)
I am using:
import urllib.request
response = urllib.request.urlopen('http://in.bookmyshow.com/')
data = response.read()
print data
It works for me (Python 3.4.3); you just need to use print(data), since print is a function in Python 3.
As a side note, you may also want to consider requests, which makes it much easier to interact over HTTP(S).
>>> import requests
>>> r = requests.get('http://in.bookmyshow.com/')
>>> r.ok
True
>>> plaintext = r.text
Finally, if you want to get data from complicated pages like this one (which are intended to be displayed, as opposed to an API), you should have a look at Scrapy, which will make your life easier as well; a minimal spider is sketched below.
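Purely as an illustrative sketch, not a spider tailored to this site; the spider name and the CSS selector are made up:
import scrapy

class ShowsSpider(scrapy.Spider):
    name = 'shows'
    start_urls = ['http://in.bookmyshow.com/']

    def parse(self, response):
        # yield one item per link text found on the page
        for link_text in response.css('a::text').getall():
            yield {'text': link_text.strip()}
You could run it with something like scrapy runspider shows_spider.py -o shows.json.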