unicode and str are the same in python3? (jupyter notebook) - python-3.x

According to the official documentation, all str in Python 3 are Unicode, and there is actually no 'unicode' type in Python 3.
But something strange happened when I ran this in a Jupyter notebook:
time = re.findall(r'(\d+/\d+/\d+)', rating_bar.find('span', class_='rating-qualifier').text)[0].split('/')
di['date'] = '/'.join([str.zfill(t,2) for t in time[:2]] + time[2:] )
where rating_bar is a Beautiful Soup node, and the Jupyter notebook gives this error:
<ipython-input-9-a5ac4904b840> in parse_page(html)
25 class_='rating-qualifier').text)[
26 0].split('/')
---> 27 di['date'] = '/'.join([str.zfill(t,2) for t in time[:2]] + time[2:] )
28 di['rating'] = float(
29 rating_bar.find('div', class_='i-stars')['title'].split()[0])
TypeError: descriptor 'zfill' requires a 'str' object but received a 'unicode'
It's weird because there is no 'unicode' type in Python 3, and this code actually runs correctly in my terminal.
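One thing worth noting: the 'unicode' type only exists in Python 2, so the notebook kernel is most likely running a Python 2 interpreter even though the terminal runs Python 3. A minimal workaround sketch, assuming that is the case, is to call zfill() as an instance method, which works on both Python 2 unicode strings and Python 3 str:
# t.zfill(2) dispatches to the method of whatever string type t actually is,
# avoiding the unbound str.zfill descriptor that rejects unicode objects.
di['date'] = '/'.join([t.zfill(2) for t in time[:2]] + time[2:])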

Related

Issues using tkdraw.basic on jupyter VCS

So I tried to execute some code using the tkdraw.basic library in a Jupyter notebook, and this is what I got.
It's just a simple program that is supposed to draw a horizontal line across the window:
import tkdraw.basic as graph

def ligne_horiz(y, larg):
    for x in range(larg):
        graph.plot(y, x)

graph.open_win(120, 160)
ligne_horiz(50, 160)
graph.wait()
Here's the error:
AssertionError Traceback (most recent call last)
Cell In [12], line 7
4 for x in range(larg):
5 graph.plot(y,x)
----> 7 graph.open_win(120,160)
8 ligne_horiz(50,160)
9 graph.wait()
File /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/tkdraw/basic.py:63, in open_win(height, width, zoom)
59 # pylint: disable=global-statement
60 # I really want to use a global in this module, to make those functions
61 # easier to use.
62 global _WINDOW
---> 63 assert not _WINDOW, "ERROR: function open() was called twice!"
64 _WINDOW = tkd.Screen((height, width), zoom, grid=False)
AssertionError: ERROR: function open() was called twice!
The code works perfectly in the terminal or in another .py file, so I think the issue comes from the extension.
I tried to find any similar issue or an update to the library, but apparently nothing is wrong (maybe a parameter in the extension itself?).
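The assertion says open() was called twice, and the _WINDOW global visible in the traceback persists across cell re-executions in a notebook, so re-running the cell after a previous (possibly failed) run trips it. A hedged workaround sketch, assuming that is what happens here, is to reload the module (or restart the kernel) before opening the window again:
import importlib
import tkdraw.basic as graph

# Reloading resets module-level globals such as _WINDOW, so open_win()
# can be called again even after the cell has already run once.
graph = importlib.reload(graph)

def ligne_horiz(y, larg):
    for x in range(larg):
        graph.plot(y, x)

graph.open_win(120, 160)
ligne_horiz(50, 160)
graph.wait()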

Python error upon exif data extraction via Pillow module: invalid continuation byte

I am writing a piece of code to extract exif data from images using Python. I downloaded the Pillow module using pip3 and am using some code I found online:
from PIL import Image
from PIL.ExifTags import TAGS

imagename = "path to file"
image = Image.open(imagename)
exifdata = image.getexif()

for tagid in exifdata:
    tagname = TAGS.get(tagid, tagid)
    data = exifdata.get(tagid)
    if isinstance(data, bytes):
        data = data.decode()
    print(f"{tagname:25}: {data}")
On some images this code works. However, for images I took on my Olympus camera I get the following error:
GPSInfo : 734
Traceback (most recent call last):
File "_pathname redacted_", line 14, in <module>
data = data.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf0 in position 30: invalid continuation byte
When I remove the data = data.decode() part, I get the following:
GPSInfo : 734
PrintImageMatching : b"PrintIM\x000300\x00\x00%\x00\x01\x00\x14\x00\x14\x00\x02\x00\x01\x00\x00\x00\x03\x00\xf0\x00\x00\x00\x07\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\t\x00\x00\x00\x00\x00\n\x00\x00\x00\x00\x00\x0b\x008\x01\x00\x00\x0c\x00\x00\x00\x00\x00\r\x00\x00\x00\x00\x00\x0e\x00P\x01\x00\x00\x10\x00`\x01\x00\x00 \x00\xb4\x01\x00\x00\x00\x01\x03\x00\x00\x00\x01\x01\xff\x00\x00\x00\x02\x01\x83\x00\x00\x00\x03\x01\x83\x00\x00\x00\x04\x01\x83\x00\x00\x00\x05\x01\x83\x00\x00\x00\x06\x01\x83\x00\x00\x00\x07\x01\x80\x80\x80\x00\x10\x01\x83\x00\x00\x00\x00\x02\x00\x00\x00\x00\x07\x02\x00\x00\x00\x00\x08\x02\x00\x00\x00\x00\t\x02\x00\x00\x00\x00\n\x02\x00\x00\x00\x00\x0b\x02\xf8\x01\x00\x00\r\x02\x00\x00\x00\x00 \x02\xd6\x01\x00\x00\x00\x03\x03\x00\x00\x00\x01\x03\xff\x00\x00\x00\x02\x03\x83\x00\x00\x00\x03\x03\x83\x00\x00\x00\x06\x03\x83\x00\x00\x00\x10\x03\x83\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\t\x11\x00\x00\x10'\x00\x00\x0b\x0f\x00\x00\x10'\x00\x00\x97\x05\x00\x00\x10'\x00\x00\xb0\x08\x00\x00\x10'\x00\x00\x01\x1c\x00\x00\x10'\x00\x00^\x02\x00\x00\x10'\x00\x00\x8b\x00\x00\x00\x10'\x00\x00\xcb\x03\x00\x00\x10'\x00\x00\xe5\x1b\x00\x00\x10'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x05\x05\x05\x00\x00\x00##\x80\x80\xc0\xc0\xff\xff\x00\x00##\x80\x80\xc0\xc0\xff\xff\x00\x00##\x80\x80\xc0\xc0\xff\xff\x05\x05\x05\x00\x00\x00##\x80\x80\xc0\xc0\xff\xff\x00\x00##\x80\x80\xc0\xc0\xff\xff\x00\x00##\x80\x80\xc0\xc0\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
ResolutionUnit : 2
ExifOffset : 230
ImageDescription : OLYMPUS DIGITAL CAMERA
Make : OLYMPUS CORPORATION
Model : E-M10MarkII
Software : Version 1.2
Orientation : 1
DateTime : 2020:02:13 15:02:57
YCbCrPositioning : 2
YResolution : 350.0
Copyright :
XResolution : 350.0
Artist :
How should I fix this problem? Should I use a different Python module?
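A minimal sketch for avoiding the crash itself, assuming binary values such as PrintImageMatching only need to be displayed rather than decoded exactly, is to pass an error handler to decode():
if isinstance(data, bytes):
    # 'replace' substitutes undecodable bytes instead of raising
    # UnicodeDecodeError; 'backslashreplace' keeps them visible as escapes.
    data = data.decode(errors="replace")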
I did some digging and figured out the answer to the problem I posted about. I originally postulated that the rest of the metadata was in the byte data:
b"PrintIM\x000300\x00\x00%\x00\x01\x00\x14\x00\x14\x00\x02\x00\x01\x00\x00\x00\x03\x00\xf0\x00\x00\x00\x07\x00\x00\x00\x00\x00\x08\x00\x00\x00\x00\x00\t\x00\x00\x00\x00\x00\n\x00\x00\x00\x00\x00\x0b\x008\x01\x00\x00\x0c\x00\x00\x00\x00\x00\r\x00\x00\x00\x00\x00\x0e\x00P\x01\x00\x00\x10\x00`\x01\x00\x00 \x00\xb4\x01\x00\x00\x00\x01\x03\x00\x00\x00\x01\x01\xff\x00\x00\x00\x02\x01\x83\x00\x00\x00\x03\x01\x83\x00\x00\x00\x04\x01\x83\x00\x00\x00\x05\x01\x83\x00\x00\x00\x06\x01\x83\x00\x00\x00\x07\x01\x80\x80\x80\x00\x10\x01\x83\x00\x00\x00\x00\x02\x00\x00\x00\x00\x07\x02\x00\x00\x00\x00\x08\x02\x00\x00\x00\x00\t\x02\x00\x00\x00\x00\n\x02\x00\x00\x00\x00\x0b\x02\xf8\x01\x00\x00\r\x02\x00\x00\x00\x00 \x02\xd6\x01\x00\x00\x00\x03\x03\x00\x00\x00\x01\x03\xff\x00\x00\x00\x02\x03\x83\x00\x00\x00\x03\x03\x83\x00\x00\x00\x06\x03\x83\x00\x00\x00\x10\x03\x83\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\t\x11\x00\x00\x10'\x00\x00\x0b\x0f\x00\x00\x10'\x00\x00\x97\x05\x00\x00\x10'\x00\x00\xb0\x08\x00\x00\x10'\x00\x00\x01\x1c\x00\x00\x10'\x00\x00^\x02\x00\x00\x10'\x00\x00\x8b\x00\x00\x00\x10'\x00\x00\xcb\x03\x00\x00\x10'\x00\x00\xe5\x1b\x00\x00\x10'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x05\x05\x05\x00\x00\x00##\x80\x80\xc0\xc0\xff\xff\x00\x00##\x80\x80\xc0\xc0\xff\xff\x00\x00##\x80\x80\xc0\xc0\xff\xff\x05\x05\x05\x00\x00\x00##\x80\x80\xc0\xc0\xff\xff\x00\x00##\x80\x80\xc0\xc0\xff\xff\x00\x00##\x80\x80\xc0\xc0\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"
That assumption wasn't correct. Although the above is metadata, it simply isn't the metadata I am looking for (in my case the FocalLength attribute); rather, it appears to be Olympus-specific metadata. The solution to my problem was to find all the metadata. I found a piece of code that worked very well on Stack Overflow: In Python, how do I read the exif data for an image?.
I used the following code by Nicolas Gervais:
import os, sys
from PIL import Image
from PIL.ExifTags import TAGS

for (k, v) in Image.open(sys.argv[1])._getexif().items():
    print('%s = %s' % (TAGS.get(k), v))
I replaced sys.argv[1] with the path name to the image file.
Alternate Solution
As MattDMo mentioned, there are also specific libraries for reading EXIF data in Python. One that I found that looks promising is ExifRead, which can be downloaded by typing the following in the terminal:
pip install ExifRead
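A minimal usage sketch with ExifRead (untested here, based on its documented process_file() interface) would look like this:
import exifread

# Open the image in binary mode; process_file() returns a dict-like mapping
# of tag names (e.g. 'EXIF FocalLength') to their values.
with open("path to file", "rb") as f:
    tags = exifread.process_file(f)

for tag, value in tags.items():
    print(f"{tag}: {value}")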

sqlite3.OperationalError('near "(": syntax error') in Google Colab

I'm observing some odd behavior with the sqlite3 module (version 2.6.0), where ROW_NUMBER() throws an error only in Google Colab (Python 3.6.9), whereas the code works fine in my local Python 3.6.9 and Python 3.9.1 installations. Can you help me debug this further?
Code
import sqlite3, sys

try:
    print('Py.version : ' + (sys.version))
    print('sqlite3.version : ' + (sqlite3.version))
    print('sqlite3.sqlite_version : ' + (sqlite3.sqlite_version) + '\n')

    conn = sqlite3.connect(':memory:')
    conn.execute('''CREATE TABLE team_data(team text, total_goals integer);''')
    conn.commit()
    conn.execute("INSERT INTO team_data VALUES('Real Madrid', 53);")
    conn.execute("INSERT INTO team_data VALUES('Barcelona', 47);")
    conn.commit()

    sql = '''
    SELECT
        team,
        ROW_NUMBER() OVER (
            ORDER BY total_goals
        ) RowNum
    FROM
        team_data
    '''

    print('### DB Output ###')
    cursor = conn.execute(sql)
    for row in cursor:
        print(row)
except Exception as e:
    print('ERROR : ' + str(e))
finally:
    conn.close()
Output
Google Colab (ROW_NUMBER() causes SQL to fail):
Py.version : 3.6.9 (default, Oct 8 2020, 12:12:24) [GCC 8.4.0]
sqlite3.version : 2.6.0
sqlite3.sqlite_version : 3.22.0
### DB Output ###
ERROR : near "(": syntax error
Local Python 3.6.9 (Succeeds):
Py.version : 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 14:00:49) [MSC v.1915 64 bit (AMD64)]
sqlite3.version : 2.6.0
sqlite3.sqlite_version : 3.33.0
### DB Output ###
('Barcelona', 1)
('Real Madrid', 2)
Local Python 3.9.1 (Succeeds):
Py.version : 3.9.1 (default, Dec 11 2020, 09:29:25) [MSC v.1916 64 bit (AMD64)]
sqlite3.version : 2.6.0
sqlite3.sqlite_version : 3.33.0
### DB Output ###
('Barcelona', 1)
('Real Madrid', 2)
Note: The SQL and code above have been simplified for error reproduction purposes only.
The query in question uses a window function, and support for window functions was added in SQLite 3.25. You can check the library version (as opposed to the sqlite3 module version) with sqlite3.sqlite_version or, as forpas shared, with the query select sqlite_version().
You can upgrade your SQLite version in Colab. Use this code:
!add-apt-repository -y ppa:sergey-dryabzhinsky/packages
!apt update
!apt install sqlite3
# MENU: Runtime > Restart runtime
import sqlite3
sqlite3.sqlite_version # '3.33.0'
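If upgrading SQLite in the runtime isn't an option, a hedged alternative sketch is to emulate ROW_NUMBER() with a correlated subquery, which works on SQLite versions older than 3.25 (this assumes the same conn and team_data table as above, and that total_goals values are distinct):
# Count how many rows sort before the current one and add 1 to get its rank.
sql = '''
SELECT
    team,
    (SELECT COUNT(*)
       FROM team_data AS t2
      WHERE t2.total_goals < t1.total_goals) + 1 AS RowNum
FROM team_data AS t1
ORDER BY total_goals
'''
for row in conn.execute(sql):
    print(row)  # ('Barcelona', 1), ('Real Madrid', 2)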

Problems Reading Zip of Shapefiles without loading memory

I've been trying to adapt Andrew Gaidus' shapefile reading routine for my needs. The Jupyter Notebook I'm using acts as if it has partitioned the disk of my MacBook Pro, so I can't read or write to disk. Gaidus has a good procedure for avoiding disk use, but it is written for a prior version of Python.
Here is the code:
dls = "https://github.com/ItsMeLarry/Coursera_Capstone/raw/master/tl_2010_25009_tract00%202.zip"
lynntracts = ZipFile(io.BytesIO(urllib.request.urlopen(dls).read()))
print("Done")

filenames = [y for y in sorted(lynntracts.namelist())
             for ending in ['dbf', 'prj', 'shp', 'shx'] if y.endswith(ending)]
# For some reason, I get 8, instead of 4, filenames. The first 4 start with __MACOSX. I get rid of those.
# The problem I have with the 'TypeError' occurs no matter which set of 4 files I use.
print(filenames[0], 'Example of the 4 files that I remove in the for loop')
for i in range(0, 4):
    del filenames[0]
print(filenames)

dbf, prj, shp, shx = [io.StringIO(ZipFile.read(filename)) for filename in filenames]
r = shapefile.Reader(shp=shp, shx=shx, dbf=dbf)
print(r.numRecords)
Opening with io.BytesIO cured the prior problem of byte/str collision. Now I see the TypeError for ZipFile.read. I get the same error if I use io.BytesIO when calling it. Here is the error output followed by the error info:
Done
__MACOSX/tl_2010_25009_tract00/._tl_2010_25009_tract00.dbf Example of the 4 files that I remove in the for loop
['tl_2010_25009_tract00/tl_2010_25009_tract00.dbf', 'tl_2010_25009_tract00/tl_2010_25009_tract00.prj', 'tl_2010_25009_tract00/tl_2010_25009_tract00.shp', 'tl_2010_25009_tract00/tl_2010_25009_tract00.shx']
TypeError Traceback (most recent call last)
in <module>()
12 del filenames[0]
13 print(filenames)
---> 14 dbf, prj, shp, shx = [io.StringIO(ZipFile.read(filename)) for filename in filenames]
15 r = shapefile.Reader(shp=shp, shx=shx, dbf=dbf)
16 print(r.numRecords)
in <listcomp>(.0)
12 del filenames[0]
13 print(filenames)
---> 14 dbf, prj, shp, shx = [io.StringIO(ZipFile.read(filename)) for filename in filenames]
15 r = shapefile.Reader(shp=shp, shx=shx, dbf=dbf)
16 print(r.numRecords)
TypeError: read() missing 1 required positional argument: 'name'
Clearly, I am a beginner. I've come up empty-handed trying to research this. Where do I go? What do I need to understand here? Thanks.
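A hedged sketch of a fix, assuming the goal is simply to read the shapefile parts out of the in-memory zip: the TypeError comes from calling read() on the ZipFile class instead of the lynntracts instance, and shapefile.Reader expects binary file-like objects, so io.BytesIO (not io.StringIO) should wrap the bytes:
import io
import urllib.request
from zipfile import ZipFile
import shapefile  # pyshp

dls = "https://github.com/ItsMeLarry/Coursera_Capstone/raw/master/tl_2010_25009_tract00%202.zip"
lynntracts = ZipFile(io.BytesIO(urllib.request.urlopen(dls).read()))

# Keep only the real shapefile members, skipping the __MACOSX entries.
filenames = [y for y in sorted(lynntracts.namelist())
             if y.endswith(('.dbf', '.prj', '.shp', '.shx'))
             and not y.startswith('__MACOSX')]

# Call read() on the instance and wrap the raw bytes for binary access.
dbf, prj, shp, shx = [io.BytesIO(lynntracts.read(name)) for name in filenames]
r = shapefile.Reader(shp=shp, shx=shx, dbf=dbf)
print(r.numRecords)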

What became of ggsave (ggplot 0.6.8) in later versions of ggplot python

In version 0.6.8 I used ggsave() from ggplot in Python.
In version 0.11 such a function no longer exists. What should I use to replace it?
This is the code that used to work in the previous version:
import ggplot as gg

plot_data = gg.ggplot(dat, gg.aes('month', 'average_workers')) + gg.geom_line() + \
    gg.scale_y_continuous(breaks=11) + \
    gg.scale_x_discrete(breaks=list_of_years_division, labels=list_of_years) + \
    gg.ggtitle('Evolution of average numbers of workers per firm, monthly\nAgents : %s' %
               title_pop_val + '% of Population') + gg.xlab('Years') + gg.ylab('Units') + gg.theme_bw()

gg.ggsave(plot_data, os.path.join(parameters.output_data_path,
                                  ('temp_general_average_workers%s.png' % parameters.parameters_names)))
plot_data is a ggplot object.
I have tried:
gg.ggplot.save(plot_data, 'path.jpg', 10, 6, 300)
And the error I get is:
TypeError: object of type 'int' has no len()
I ran into the same issue. It looks like the solution is to call 'save()' on the ggplot object:
plot_data.save(filename='path.png')
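Applied to the original code, a minimal sketch (assuming the same plot_data and parameters objects as above, and using only the filename keyword since other save() parameters may differ between ggplot versions) would be:
import os

# Build the output path as in the original ggsave() call, then save the plot.
out_path = os.path.join(parameters.output_data_path,
                        'temp_general_average_workers%s.png' % parameters.parameters_names)
plot_data.save(filename=out_path)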