Extract strings from shapefile attribute using GDAL and Python 3.X

I have a shapefile that consists of two fields/attributes, one being integers, the other being strings.
I can extract the integers into a Python array by first using gdal.RasterizeLayer() to burn the shapefile into a .tiff image as the first band, and then calling my_raster.GetRasterBand(1).ReadAsArray() to read the integers as an array.
However, I would like to extract the string values from the other field/attribute. I tried doing the exact same thing, only changing the attribute name in the gdal.RasterizeLayer() call, but calling GetRasterBand(1).ReadAsArray() only gives me zeros.
Does anyone know whether it is possible to read strings from rasters?
Btw: I'm using the exact same code as here.
Check out the pure Python version -- gdal.RasterizeLayer.
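For reference, a minimal sketch of that pure-Python gdal.RasterizeLayer approach (the file names, pixel size, and field names below are placeholders, not from the question). Note that a raster band can only hold numeric values, so the string field is better read straight from the layer with OGR than burned into a .tiff:

from osgeo import gdal, ogr

# Minimal sketch; file names, pixel size, and field names are placeholders.
src_ds = ogr.Open("my_shapefile.shp")
layer = src_ds.GetLayer()

# Build a target raster covering the layer extent and burn the integer field.
x_min, x_max, y_min, y_max = layer.GetExtent()
pixel = 10.0
cols, rows = int((x_max - x_min) / pixel), int((y_max - y_min) / pixel)
target_ds = gdal.GetDriverByName("GTiff").Create("burned.tif", cols, rows, 1, gdal.GDT_Int32)
target_ds.SetGeoTransform((x_min, pixel, 0, y_max, 0, -pixel))
gdal.RasterizeLayer(target_ds, [1], layer, options=["ATTRIBUTE=my_int_field"])
int_array = target_ds.GetRasterBand(1).ReadAsArray()

# A band cannot store text, so read the string attribute directly instead.
layer.ResetReading()
string_values = [feature.GetField("my_string_field") for feature in layer]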

Related

weird characters in Pandas dataframe - how to standardize to UTF-8?

I'm using Python + Camelot (OCR library) to read a PDF, clean up, and write to Excel or csv. There are some non-standard dashes that print out a weird character.
Using Camelot means I'm not calling "read_csv". It's coming from the PDF. A value that is supposed to be "1-4" prints out as 1–4.
I fixed this using a regular expression but a colleague mentioned I should standardize to UTF-8. I tried to do that for the header like this:
header = df.iloc[0, 1:].str.encode('utf-8')
but then that value becomes b'1\xe2\x80\x934'.
Any advice? The goal is to simply use standard text.
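In case it helps, a minimal sketch of one way to standardize such a value (it assumes the stray character is the en dash U+2013 and uses a placeholder column name). Encoding an already-decoded string to UTF-8 only yields bytes objects like the one above, so the usual fix is to replace the unwanted character while keeping everything as str:

import pandas as pd

# Hypothetical data standing in for the Camelot output; 'range' is a placeholder column.
df = pd.DataFrame({"range": ["1\u20134", "5\u20138"]})

# Replace the en dash (U+2013) with a plain hyphen; the values stay str, not bytes.
df["range"] = df["range"].str.replace("\u2013", "-", regex=False)
print(df["range"].tolist())  # ['1-4', '5-8']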

How can I convert a piece of SVG to MVG via RMagick?

I want to convert a piece of SVG to MVG. Could I do something similar to "convert msvg:pram.svg pram.mvg" using RMagick methods? I don't want to save the output to a file; I want to have it in a variable in Ruby.
In general, converting an image in one format to another format is as simple as making a copy of the image using a different suffix. https://rmagick.github.io/comtasks.html#convert
Use the to_blob method to get an image as a Ruby string:
https://rmagick.github.io/comtasks.html#blob

Python3 - Reading mixed data from a file and convert the read values to float

I have the following data stored in a file (tn.csv).
The content is as follows (Original file is very large but I simplified it):
file_content.csv
I read the data using the following source code:
python_script.py
It nicely produces the following, which is a numpy array of strings.
sample_output.png
Ultimately I want to use the read data as a numpy array of floats in my python script.
All the entries should be float values. Hence, I would like to cast the entries from string to float. As some entries are not numbers but strings of text (they are variables in my script), I cannot perform the type cast directly; Python throws a ValueError.
Note: I define the (vector) variable nv before I read this matrix. Hence, it is well defined. So, the complexity is that some entries of the numpy arrays are variables.
What I would like to have is the following:
If I had typed the contents of tn.csv directly as a numpy array in my python script, I would have been able to use it without any conversion. But the matrix is very big, so I store it in an external file. Now I want to read the file and store it in a numpy array, and the end result should be the same as if I had typed it directly in my script. Could someone help me tackle this issue or point me to relevant links?
This is my first post on Stack Overflow. If my question is not clear, please let me know and I will rewrite it.
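One possible approach, sketched under an assumption the question leaves open: every non-numeric entry in tn.csv is the name of a value already defined in the script (collected here in a hypothetical lookup dictionary; in the post nv is a vector, but a scalar stands in to keep the sketch short). Entries that parse as floats are converted directly, and the rest are substituted from the lookup:

import numpy as np

# Hypothetical lookup for entries that are variable names rather than numbers.
lookup = {"nv": 2.5}

def to_float(entry):
    try:
        return float(entry)
    except ValueError:
        return float(lookup[entry])

raw = np.genfromtxt("tn.csv", delimiter=",", dtype=str)  # numpy array of strings
matrix = np.vectorize(to_float)(raw).astype(float)       # numpy array of floats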

Loading .npz with Python 3.5 always crashes

In this simple tutorial written in Python 2.7, they have a line loading the numpy array.
train_data = np.load(open('../musicnet.npz','rb'))
Then, they get the data by calling different keys
X,Y = train_data['2494']
Everything works well in python 2.7
The data type of train_data is numpy.lib.npyio.NpzFile.
My problem
However, whenever I try to do the same in Python 3.5, most of the lines work fine, except the line X,Y = train_data['2494'], which just freezes forever. I would like to use Python 3.5 because my other projects are written in Python 3.5.
How to rewrite this line so that it runs with Python 3.5?
Error Message
I finally managed to get the error message in terminal
It freezes there because there is a huge amount of output right after the error message; my Jupyter notebook just cannot handle that much information.
Solution
Change the encoding to 'bytes'
train_data = np.load('../musicnet.npz', encoding='bytes')
Then everything works fine.
You first said things crashed, and now you say it freezes when trying to access a specific array. numpy has the same syntax in 3.5 as in 2.7, so you shouldn't have to rewrite anything.
np.load does have a couple of parameters that deal with differences between Py2 and Py3. But I'm not sure these are an issue for you.
fix_imports : bool, optional
Only useful when loading Python 2 generated pickled files on Python 3,
which includes npy/npz files containing object arrays. If `fix_imports`
is True, pickle will try to map the old Python 2 names to the new names
used in Python 3.
encoding : str, optional
What encoding to use when reading Python 2 strings. Only useful when
loading Python 2 generated pickled files in Python 3, which includes
npy/npz files containing object arrays. Values other than 'latin1',
'ASCII', and 'bytes' are not allowed, as they can corrupt numerical
data. Default: 'ASCII'
Try
print(list(train_data.keys()))
This should show the array names that were saved to the zip archive. Do they match the names in the Py2 load? Do they include the '2494' name?
A couple of things are unusual about this line:
X,Y = train_data['2494']
Naming an array in the zip archive by a string number, and unpacking the load into two variables.
Do you know anything about how this was saved with savez? What was saved?
Another question - are you loading this file from the same machine that Py2 worked on? Or has the file been transferred from another machine, and possibly corrupted?
As those parameters indicate, there are differences in the pickle code between Py2 and Py3. If the original save included object dtype arrays, or non-array objects, then they will be pickled and there might be incompatibilities in the pickle versions.
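A minimal sketch of that combination (the path is the one from the question; allow_pickle=True is also needed on newer NumPy releases when the archive contains object arrays):

import numpy as np

# 'latin1' or 'bytes' avoid corrupting numeric data pickled under Python 2;
# allow_pickle=True is required by newer NumPy for object arrays.
with np.load('../musicnet.npz', encoding='latin1', allow_pickle=True) as train_data:
    print(list(train_data.keys()))  # check the stored array names
    X, Y = train_data['2494']       # unpack as in the original code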
Try this,
with np.load('../musicnet.npz') as train_data:
    X, Y = train_data['2494']
There are two ways out, in my point of view:
1. Re-edit your code from
train_data = np.load(open('../musicnet.npz','rb'))
to
train_data = np.load(open('../musicnet.npz','r'))
because the behaviour of the r/rb modes differs between Python 2.7 and 3.5 in your situation.
2. Use the default debugger to pinpoint the significant error. (That usually works, in my experience.)

How to filter a CSV file without Pandas? (Best Substitute for Pandas in Pythonista)

I am trying to do some data analysis in Pythonista 3 (an iOS app for Python); however, because of pandas' C libraries, it does not compile on the iOS device.
Is there any substitute for Pandas?
Would numpy be an option for data of type string?
The data set I have at the moment is the history of messages between my friends and me.
The whole history is in one csv file. Each row has the columns 'day_of_the_week', 'date', 'time_of_message', 'author_of_message', 'message_body'
The goal of the analysis is to produce a report of our chat for the past year.
I want to be able to count the number of messages each friend sent, and to plot a histogram of the hours in which the messages were sent by each friend.
Then, I want to do some word counting individually and as a group.
In Pandas I know how to do that. For example:
df = read_csv("messages.csv")
number_of_messages_friend1 = len(df[df.author_of_message == 'friend1'])
How can I filter a csv file without Pandas?
Since Pythonista does have numpy, you will want to look at recarrays, which are numpy's approach to this type of problem. The following worked out of the box in Pythonista for me:
import numpy as np
df=np.recfromcsv('messages.csv')
len(df[df.author_of_message==b'friend1'])
Depending on your data format, you may find that recfromcsv "just works", since it tries to guess data types, or you might need to customize things a bit. See genfromtxt for a number of options, such as explicitly specifying data types or using converters to turn string dates into datetime objects. recfromcsv is just a convenience wrapper around genfromtxt.
https://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html#
Once in recarray, many of the simple indexing operations work the same as in pandas. Note you may need to do string compares using b-prefixed strings (bytes objects), unless you convert to unicode strings, as shown above.
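Building on that, a short sketch of the counting the question asks about, using only numpy (column names as in the question; whether the comparison needs a b-prefixed string depends on how recfromcsv decoded the file):

import numpy as np

df = np.recfromcsv('messages.csv')

# Count messages per friend (unique authors with their counts).
authors, counts = np.unique(df.author_of_message, return_counts=True)
print(dict(zip(authors, counts)))

# Messages sent by one friend, mirroring the pandas example in the question.
friend1_msgs = df[df.author_of_message == b'friend1']
print(len(friend1_msgs))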
Use the csv module from the standard library to read the messages.
You could store it into a list of collections.namedtuple for easy access.
import csv
messages = []
with open('messages.csv') as csvfile:
    reader = csv.DictReader(csvfile, fieldnames=('day_of_the_week', 'date', 'time_of_message', 'author_of_message', 'message_body'))
    for row in reader:
        messages.append(row)
That gives you all the messages as a list of dictionaries.
Alternatively you could use a normal csv reader combined with a collections.namedtuple to make a list of named tuples, which are slightly easier to access.
import csv
from collections import namedtuple
Msg = namedtuple('Msg', ('day_of_the_week', 'date', 'time_of_message', 'author_of_message', 'message_body'))
messages = []
with open('messages.csv') as csvfile:
    msgreader = csv.reader(csvfile)
    for row in msgreader:
        messages.append(Msg(*row))
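Continuing from the messages list built above, the per-friend counts and the hour histogram described in the question can be done with collections.Counter (a sketch that assumes time_of_message is formatted like 'HH:MM' or 'HH:MM:SS', so the hour is the part before the first colon):

from collections import Counter

# Messages per friend.
per_friend = Counter(m.author_of_message for m in messages)

# Histogram of the hours at which one friend sent messages.
hours_friend1 = Counter(m.time_of_message.split(':')[0]
                        for m in messages
                        if m.author_of_message == 'friend1')
print(per_friend)
print(hours_friend1)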
Pythonista now has competition on iOS: the Pyto app provides Python 3.8 with pandas. https://apps.apple.com/us/app/pyto-python-3-8
