How to write scraped data into a csv file via python? - python-3.x

I downloaded historical stock data via the following code.
url = "https://query1.finance.yahoo.com/v7/finance/download/RELIANCE.BO?period1=1577110559&period2=1608732959&interval=1d&events=history&includeAdjustedClose=true"
r = requests.get(url)
Then I tried to write it in a csv file via this code.
open('ril.csv', 'w').write(r.content)
But it raised this error:
TypeError: write() argument must be str, not bytes

Modified code:
url = "https://query1.finance.yahoo.com/v7/finance/download/RELIANCE.BO?period1=1577110559&period2=1608732959&interval=1d&events=history&includeAdjustedClose=true"
r = requests.get(url)
open('ril.csv','wb').write(r.content)
The data is downloaded as bytes (r.content), so the file has to be opened in binary write mode ('wb') as well.
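An equivalent sketch (assuming the payload is UTF-8 text, which a CSV download normally is) that decodes the response and writes it in text mode instead:
import requests
url = "https://query1.finance.yahoo.com/v7/finance/download/RELIANCE.BO?period1=1577110559&period2=1608732959&interval=1d&events=history&includeAdjustedClose=true"
r = requests.get(url)
# r.text is the decoded str form of r.content, so a text-mode file works here
with open('ril.csv', 'w', encoding='utf-8') as f:
    f.write(r.text)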

Related

How to read the data and the associated field name that is in a filled-in PDF form

I am writing a Python script that needs to pull the data filled into a PDF form as part of a larger script. I tried using PyPDF3, but while it can show me the strings in the form, it does not show the filled-in data. I have a form where I have entered the value 'XXX' into a field, and I want the script to return that data along with the name of the field, but I can't seem to read the data. The fillpdfs module is very helpful, but as far as I can tell it can return the field names but not the data.
I have this snippet:
from PyPDF3 import PdfFileWriter, PdfFileReader
# Open the PDF file
pdf_file = open('filename.pdf', 'rb')
pdf_reader = PdfFileReader(pdf_file)
# Extract text data from each page
for page_num in range(pdf_reader.numPages):
    page = pdf_reader.getPage(page_num)
    print('XXX' in page.extractText())  # True if the value appears in the page text
There is a function for PDF forms:
dictionary = pdf_reader.getFormTextFields() # returns a python dictionary
print(dictionary)
Documentation
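A short follow-up sketch for getting both the field name and the filled-in value out of that dictionary (the variable names here are just for illustration; the field names depend on your PDF):
fields = pdf_reader.getFormTextFields()   # {field name: filled-in value}
for field_name, value in fields.items():
    print(field_name, '=', value)         # the field you typed 'XXX' into should show up here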

How do you download an image in the form of bytes?

I don't want the file to be downloaded onto the HDD; I just want the content returned as bytes so that it can later be passed to some other function.
Here is one way:
url = 'https://m.media-amazon.com/images/M/MV5BMTY5MTY3NjgxNF5BMl5BanBnXkFtZTcwMDExMTQyMw##._V1_SX1777_CR0,0,1777,987_AL_.jpg'
import requests
# Return the data as a decoded str (usually not what you want for an image)
output = requests.get(url).text
# Return the raw data as bytes
output = requests.get(url).content
You could also use urllib (urllib2 is Python 2 only).
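A standard-library sketch with urllib.request, reusing the url from above, in case you would rather avoid requests:
from urllib.request import urlopen
with urlopen(url) as resp:
    output = resp.read()  # bytes, equivalent to requests' .content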

How to convert Navigable String to File Object

I am trying to get some data from a website (using the requests and BeautifulSoup modules) and write it to a text file, but every time I try to do so, it says the following:
TypeError: descriptor 'write' requires a 'file' object but received a 'NavigableString'
I tried using the csv library to import the data, but since I couldn't write the data to the CSV line by line, I decided to dump all the output into a text file and then take out the data I require.
file_object = open("name-list.txt", "w")        # Opening the file
name = soup.find(class_='table-responsive')     # Extracting the data
name_list = name.find_all('td')                 # Refining the data
for final in name_list:
    all = final.contents[0]                     # Final result
    file.write(all)                             # This is where the error comes
file.close()
When I use print(all) in the for loop, I get the output I need: multi-line text including the names, ages, genders, etc. of the people from the table on the website. But when I try to write that output to the text file, the error pops up.
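A likely fix, sketched under the assumption that the scraping part above already works: write through the handle you actually opened (file_object, not the built-in name file) and convert the NavigableString to str before writing.
with open("name-list.txt", "w", encoding="utf-8") as file_object:
    for final in name_list:
        text = str(final.contents[0])       # NavigableString -> plain str
        file_object.write(text + "\n")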

Python encoded string is still in binary format

I am trying some website scraping using urllib3 and Beautiful Soup. Python 3 encoding/decoding is tripping me up. This is my code:
r = http.request('GET', 'https://www.************************.jsf')
if r.status == 200:
    page = r.data.decode('utf-8')
    soup = BeautifulSoup(page)
    print(soup.prettify())
    # This prints - [Decode error - output not utf-8]
    print(soup.prettify().encode('utf-8'))
    # This prints the data, but with the binary marker:
    # b'<!DOCTYPE html PUBLIC "-//W3C//D.......
    # ..........................................'
As I had already decoded with r.data.decode('utf-8') before calling Beautiful Soup, why do I need to encode it again, and why does the output still show the b'' marking even after converting it to a string?
The b'xxx' is the representation of a bytes value (a sequence of bytes), which is the natural result of .encode(). The print() function automatically converts an object to its representation if it is not a string.
Try writing the debug info to a file instead; see the sketch below. The print() function can have problems when the console does not support the page's charset/encoding.
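A sketch of that suggestion: write the prettified markup to a file with an explicit encoding instead of printing it, so the console's charset never gets involved.
with open('debug.html', 'w', encoding='utf-8') as f:
    f.write(soup.prettify())  # str goes in; the file object handles the encoding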

Custom filetype in Python 3

How do I start creating my own filetype in Python? I have a design in mind, but how do I pack my data into a file with a specific format?
For example, I would like my file format to be a mix of an archive (like zip, apk, jar, etc., which are basically all archives) with room for packed files, plus a section of the file containing settings and serialized data that will not be accessible to an archive-manager application.
My requirement is to do all of this with CPython's standard modules only, without external modules.
I know this can be long to explain and do, but I can't see how to start this in Python 3.x with CPython.
Try this:
from zipfile import ZipFile
import json
data = json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}])
with ZipFile('foo.filetype', 'w') as myzip:
    myzip.writestr('digest.json', data)
The file is now a zip archive containing a JSON file (which is easy to read back in many languages) for your data. You can add packed files to the archive with myzip.write or myzip.writestr; a small sketch of that follows the read-back example below. You can read the data back with:
with ZipFile('foo.filetype', 'r') as myzip:
    json_data_read = myzip.read('digest.json')
    newdata = json.loads(json_data_read)
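A small sketch of the packed-files part mentioned above, continuing from the ZipFile import (report.txt is just a hypothetical file to pack alongside the settings):
with ZipFile('foo.filetype', 'a') as myzip:
    myzip.write('report.txt')                  # pack an existing file into the archive
    myzip.writestr('notes/readme.txt', 'hi')   # or create a member straight from a string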
Edit: you can append arbitrary data to the file with:
f = open('foo.filetype', 'a')
f.write(data)
f.close()
This still opens in WinRAR, but Python's zipfile can no longer process the archive.
Use this:
import base64
import gzip
import ast

def save(data):
    data = "[{!r}]".format(data).encode()  # repr() so strings and bytes survive literal_eval on load
    data = base64.b64encode(data)
    return gzip.compress(data)

def load(data):
    data = gzip.decompress(data)
    data = base64.b64decode(data)
    return ast.literal_eval(data.decode())[0]
How to use this with a file:
open(filename, "wb").write(save(data)) # save data
data = load(open(filename, "rb").read()) # load data
This might look like something an archive program could open, but it cannot, because the contents are base64 encoded and would have to be decoded first.
You can also store any literal Python value in it (strings, bytes, numbers, lists, dicts, and so on).
Example:
open(filename, "wb").write(save({"foo": "bar"})) # dict
open(filename, "wb").write(save("foo bar")) # string
open(filename, "wb").write(save(b"foo bar")) # bytes
# there's more you can store!
This may not directly answer your question, but I think it may help.
I faced a similar problem... and ended up creating a zip file and then renaming the zip to my custom file format... but it can still be opened with WinRAR.