How to convert numpy bytes to float in python3? - python-3.x

My question is similar to this one; I tried using genfromtxt, but it still doesn't work. It reads the file as expected, but not as floats. Code and file excerpt below:
temp = np.genfromtxt('PFRP_12.csv', names=True, skip_header=1, comments="#", delimiter=",", dtype=None)
reads as (b'"0"', b'"0.2241135"', b'"0"', b'"0.01245075"', b'"0"', b'"0"')
"1 _ 1",,,,,
"Time","Force","Stroke","Stress","Strain","Disp."
#"sec","N","mm","MPa","%","mm"
"0","0.2241135","0","0.01245075","0","0"
"0.1","0.2304713","0.0016","0.01280396","0.001066667","0.0016"
"0.2","1.707077","0.004675","0.09483761","0.003116667","0.004675"
I tried different dtypes (None, str, float, bytes), still with no success. Thanks!
Edit: As Evert suggested, I also tried dtype=float, but it reads all of them as NaN: (nan, nan, nan, nan, nan, nan)

Another solution is to use the converters argument:
np.genfromtxt('inp.txt', names=True, skip_header=1, comments="#",
              delimiter=",", dtype=None,
              converters=dict((i, lambda s: float(s.decode().strip('"'))) for i in range(6)))
(you'll need to specify a converter for each column).
Side remark: oddly enough, while dtype="U12" or similar should produce strings instead of bytes (avoiding the .decode() step), this doesn't seem to work and results in empty entries.
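The converter can also be written as a named function. A minimal sketch, assuming the fields look like the excerpt in the question; the name `quoted_float` is hypothetical. Depending on the NumPy version and the `encoding` argument of genfromtxt, converters may receive either bytes or str, so this version accepts both.

```python
def quoted_float(field):
    """Convert a field such as b'"0.2241135"' or '"0.2241135"' to a float."""
    # genfromtxt may pass bytes or str, depending on version and `encoding`.
    if isinstance(field, bytes):
        field = field.decode("latin-1")
    # Drop surrounding whitespace, then the double quotes around the number.
    return float(field.strip().strip('"'))


# Hypothetical usage with the question's file (one converter per column):
# np.genfromtxt('PFRP_12.csv', names=True, skip_header=1, comments="#",
#               delimiter=",", dtype=None,
#               converters={i: quoted_float for i in range(6)})

print(quoted_float(b'"0.2241135"'))   # 0.2241135
print(quoted_float('"0.001066667"'))  # 0.001066667
```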

Here is a fancy, unreadable, functional programming style way of converting your input to the record array you're looking for:
>>> np.core.records.fromarrays(np.asarray([float(y.decode().strip('"')) for x in temp for y in x]).reshape(temp.shape[0], -1).T, names=temp.dtype.names, formats=['f'] * len(temp.dtype.names))
or spread out across a few lines:
>>> np.core.records.fromarrays(
...     np.asarray(
...         [float(y.decode().strip('"')) for x in temp for y in x]
...     ).reshape(temp.shape[0], -1).T,
...     names=temp.dtype.names,
...     formats=['f'] * len(temp.dtype.names))
I wouldn't recommend this solution, but sometimes it's fun to hack something like this together.
The issue with your data is a bit more complicated than it may seem.
That is because the numbers in your CSV file are not really numbers: they are explicitly strings, since they are surrounded by double quotes.
So, there are 3 steps involved in the conversion to float:
- decode the bytes to a Python 3 (unicode) string
- remove (strip) the double quotes from each end of each string
- convert the remaining string to float
This happens inside the double list comprehension, on line 3. It's a double list comprehension, since a rec-array is essentially 2D.
The resulting list, however, is 1D. I turn it back into a numpy array (np.asarray) so I can easily reshape it to 2D: one row per record, then transposed (.T) so that each row holds one field, which is the layout np.core.records.fromarrays expects. That (now plain float) array is then given to np.core.records.fromarrays, with the names taken from the original rec-array and the format of each field set to float ('f', i.e. 32-bit float).
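A different technique from the answers above: Python's standard csv module already understands double-quoted fields, so the quotes disappear during parsing and only the float conversion remains. A minimal sketch, assuming the layout of the excerpt (one title line, one header line, `#` comment lines); `read_quoted_csv` is a hypothetical helper, and its rows can be passed to `np.array(rows)` if an ndarray is needed.

```python
import csv
import io


def read_quoted_csv(lines):
    """Return (header, rows) from CSV text lines whose numbers are quoted."""
    reader = csv.reader(lines)
    next(reader)             # skip the title line '"1 _ 1",,,,,'
    header = next(reader)    # csv strips the quotes: ['Time', 'Force', ...]
    rows = []
    for fields in reader:
        if not fields or fields[0].startswith("#"):
            continue         # skip blank lines and the '#"sec","N",...' units line
        rows.append([float(f) for f in fields])
    return header, rows


sample = '''"1 _ 1",,,,,
"Time","Force","Stroke","Stress","Strain","Disp."
#"sec","N","mm","MPa","%","mm"
"0","0.2241135","0","0.01245075","0","0"
"0.1","0.2304713","0.0016","0.01280396","0.001066667","0.0016"
'''
header, rows = read_quoted_csv(io.StringIO(sample))
print(header)
print(rows[1])

# With the real file:
# with open('PFRP_12.csv', newline='') as f:
#     header, rows = read_quoted_csv(f)
```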

Related

Data Being Read as Strings instead of Floats

A PyTorch program, which I don't fully understand, produced an output and wrote it to weight.txt. I'm trying to do some further calculations based on this output.
I'd like the output to be interpreted as a list of length 3, each entry of which is a list of floats of length 240.
I use this to load in the data
w=open("weight.txt","r")
weight=[]
for number in w:
    weight.append(number)
print(len(weight)) yields 3. So far so good.
But then print(len(weight[0])) yields 6141. That's bad!
On closer inspection, it's because weight[0] is being read character-by-character instead of number-by-number. So for example, print(weight[0][0]) yields - instead of -1.327657848596572876e-01. These numbers are separated by single spaces, which are also being read as characters.
How do I fix this?
Thank you
Edit: I tried making a repair function:
def repair(S):
    numbers=[]
    num=''
    for i in range(len(S)):
        if S[i]!=' ':
            num+=S[i]
        elif S[i]==' ':
            num=float(num)
            numbers.append(num)
            num=''
        elif i==len(S)-1:
            num+=S[i]
            num=float(num)
            numbers.append(num)
    return numbers
Unfortunately, print(repair('123 456')) returns [123.0] instead of the desired [123.0, 456.0].
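The `elif i==len(S)-1` branch in `repair` can never run: the last character is not a space, so the first `if` always handles it, and the number still being built in `num` is never appended. A minimal corrected sketch flushes whatever is left after the loop (and tolerates repeated spaces); `str.split()` does the same job in one call.

```python
def repair(S):
    """Split S on spaces and convert each piece to float."""
    numbers = []
    num = ''
    for ch in S:
        if ch != ' ':
            num += ch                   # still inside a number
        elif num:                       # a space ends the current number
            numbers.append(float(num))
            num = ''
    if num:                             # flush the last number (no trailing space)
        numbers.append(float(num))
    return numbers


print(repair('123 456'))                        # [123.0, 456.0]
# Equivalent built-in approach:
print([float(x) for x in '123 456'.split()])    # [123.0, 456.0]
```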
You haven't told us what your input file looks like, so it's hard to give an exact answer. But, assuming it looks like this:
123 312.8 12
2.5 12.7 32
the following program:
w=open("weight.txt","r")
weight=[]
for line in w:
    for n in line.split():
        weight.append(float(n))
print(weight)
will print:
[123.0, 312.8, 12.0, 2.5, 12.7, 32.0]
which is closer to what you're looking for, I presume?
The crux of the issue is that for number in w in your program simply iterates over the lines: you need a second loop to split each line into its constituents and convert them appropriately.
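The question asks for a list of 3 lists (one per line) rather than one flat list. A minimal sketch that keeps the per-line grouping with a nested list comprehension, assuming each line holds space-separated numbers as in the sample above; `load_rows` is a hypothetical name and `io.StringIO` stands in for the real file.

```python
import io


def load_rows(f):
    """Return one list of floats per non-empty line of f."""
    return [[float(n) for n in line.split()] for line in f if line.strip()]


# Stand-in for open("weight.txt") using the sample input above.
sample = io.StringIO("123 312.8 12\n2.5 12.7 32\n")
weight = load_rows(sample)
print(weight)        # [[123.0, 312.8, 12.0], [2.5, 12.7, 32.0]]

# With the real file:
# with open("weight.txt") as f:
#     weight = load_rows(f)
# If the file matches the question, len(weight) is 3 and len(weight[0]) is 240.
```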

Python 3: display long integers in pandas dataframe

I'm trying to set the value of a column in a pandas data frame to some big numbers with a simple line:
df['Capital'] = 58143898.13876611
and it shows in df as format 5.814380e+07. I want it as 58143898.
What I've tried:
df['Capital'] = int(58143898.13876611)
Similar question: How to print all digits of a large number in python?, but it's outdated because I learned from NameError: global name 'long' is not defined that long is replaced by int in python 3. But it still shows 5.814380e+07.
Yet if I only print the following line, it does show as 58143898.
In [2]: int(58143898.13876611)
Out[2]: 58143898
Please help! Thank you so much in advance :)
You are complaining that pandas represents your number
in floating point rather than integer form.
Coldspeed points out that .astype() can
change a column to some desired type.
To better understand what pandas is doing with your input, look at:
df.dtypes
You want the Capital column to have an integer type.
To avoid the need to monkey about with types later,
you may want to take care to convert inputs to int before constructing the dataframe.
Pandas looks at all values in a column
and chooses a type compatible with the entire set of values.
So e.g. a column containing [1, 2.5, 3]
will force all values to float rather than int.
It's worth noting that missing values can have very noticeable effects on this.
You may want something like:
df2 = df1.dropna()
Certain FP bit patterns are reserved for use as NaN,
but pandas regrettably does not reserve maxint,
nor any other integer value,
to model the notion of missing data.
Accordingly an input like [1, None, 3] will be promoted from int to float.
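A short sketch of the points above, assuming a reasonably recent pandas (the nullable "Int64" dtype, with a capital I, exists since pandas 0.24): a float value or a missing value pushes a column to float64, .astype converts it back, and the nullable integer dtype keeps integers alongside missing values.

```python
import pandas as pd

# A float value makes pandas store the column as float64.
df = pd.DataFrame({"Capital": [58143898.13876611]})
print(df["Capital"].dtype)                      # float64

# Convert the column to a plain integer type; astype truncates toward zero.
df["Capital"] = df["Capital"].astype("int64")
print(df["Capital"].iloc[0])                    # 58143898

# One float or one missing value promotes the whole column to float64.
print(pd.Series([1, 2.5, 3]).dtype)             # float64
print(pd.Series([1, None, 3]).dtype)            # float64

# The nullable "Int64" dtype keeps integers and marks the gap as <NA>.
print(pd.Series([1, None, 3], dtype="Int64").dtype)   # Int64
```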

XOR two strings of different length

So I am trying to XOR two strings together, but am unsure whether I am doing it correctly when the strings have different lengths.
The method I am using is as follows.
def xor_two_str(a,b):
    xored = []
    for i in range(max(len(a), len(b))):
        xored_value = ord(a[i%len(a)]) ^ ord(b[i%len(b)])
        xored.append(hex(xored_value)[2:])
    return ''.join(xored)
I get output like so.
abc XOR abc: 000
abc XOR ab: 002
ab XOR abc: 5a
space XOR space: 0
I know something is wrong and I will eventually want to convert the hex value to ascii so am worried the foundation is wrong. Any help would be greatly appreciated.
Your code looks mostly correct (assuming the goal is to reuse the shorter input by cycling back to the beginning), but your output has a minor problem: it's not fixed width per character, so two pairs of characters whose XOR values are small (< 16) can produce the same output as a single pair whose XOR value is large.
Assuming you're only working with "bytes-like" strings (all inputs have ordinal values below 256), you'll want to zero-pad your hex output to a fixed width of two by changing:
xored.append(hex(xored_value)[2:])
to:
xored.append('{:02x}'.format(xored_value))
which saves a temporary string (hex + slice makes the longer string then slices off the prefix, when format strings can directly produce the result without the prefix) and zero-pads to a width of two.
There are other improvements possible for more Pythonic/performant code, but that should be enough to make your code produce usable results.
Side note: when running your original code, xor_two_str('abc', 'ab') and xor_two_str('ab', 'abc') both produced the same output, 002, which is what you'd expect (XOR is commutative and you cycle the shorter input, so swapping the arguments should produce the same result). Not sure why you think it produced 5a. My fixed code just makes the outputs 000000, 000002, 000002, and 00: padded properly, but otherwise unchanged from your results.
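To make the ambiguity concrete: without padding, the XOR values 0x1, 0x23 and 0x12, 0x3 both become "123". A minimal sketch of the question's function with only the '{:02x}' change applied, checked against the outputs listed above:

```python
def xor_two_str(a, b):
    """XOR two strings, cycling the shorter one; two hex digits per character."""
    xored = []
    for i in range(max(len(a), len(b))):
        xored_value = ord(a[i % len(a)]) ^ ord(b[i % len(b)])
        xored.append('{:02x}'.format(xored_value))  # always two hex digits
    return ''.join(xored)


# Unpadded hex is ambiguous: two different value sequences give the same text.
print(''.join(hex(v)[2:] for v in (0x1, 0x23)))   # 123
print(''.join(hex(v)[2:] for v in (0x12, 0x3)))   # 123

print(xor_two_str('abc', 'abc'))  # 000000
print(xor_two_str('abc', 'ab'))   # 000002
print(xor_two_str('ab', 'abc'))   # 000002
print(xor_two_str(' ', ' '))      # 00
```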
As far as other improvements to make, manually converting character by character, and manually cycling the shorter input via remainder-and-indexing is a surprisingly costly part of this code, relative to the actual work performed. You can do a few things to reduce this overhead, including:
- Convert from str to bytes once, up front, in bulk (runs in roughly one seventh the time of the fastest character-by-character conversion)
- Determine up front which string is shorter, use itertools.cycle to extend it as needed, and use zip to iterate directly over paired byte values rather than indexing at all
Together, this gets you:
from itertools import cycle
def xor_two_str(a,b):
    # Convert to bytes so we iterate by ordinal, determine which is longer
    short, long = sorted((a.encode('latin-1'), b.encode('latin-1')), key=len)
    xored = []
    for x, y in zip(long, cycle(short)):
        xored_value = x ^ y
        xored.append('{:02x}'.format(xored_value))
    return ''.join(xored)
or to make it even more concise/fast, we just make the bytes object without converting to hex (and just for fun, use map+operator.xor to avoid the need for Python level loops entirely, pushing all the work to the C layer in the CPython reference interpreter), then convert to hex str in bulk with the (new in 3.5) bytes.hex method:
from itertools import cycle
from operator import xor
def xor_two_str(a,b):
    short, long = sorted((a.encode('latin-1'), b.encode('latin-1')), key=len)
    xored = bytes(map(xor, long, cycle(short)))
    return xored.hex()
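The question also mentions converting the hex back to ASCII. A minimal sketch, assuming the same latin-1 convention as above and that the key is the shorter input: bytes.fromhex undoes bytes.hex, and because XOR is its own inverse, XOR-ing with the same cycled key recovers the original text. The helper names `xor_bytes` and `unxor_hex` are hypothetical.

```python
from itertools import cycle
from operator import xor


def xor_bytes(data, key):
    """XOR data with key, repeating key as needed (key assumed not longer)."""
    return bytes(map(xor, data, cycle(key)))


def unxor_hex(hex_text, key):
    """Turn a hex string produced by xor_two_str back into text using the key."""
    raw = bytes.fromhex(hex_text)                    # hex str -> bytes
    return xor_bytes(raw, key.encode('latin-1')).decode('latin-1')


secret = xor_bytes(b'hello world', b'key').hex()
print(secret)
print(unxor_hex(secret, 'key'))                      # hello world
```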

How to convert a String to a float in Python 2.7

I am trying to convert decimal geographic coordinates as strings to a float.
The coordinates are stored in a CSV like this: '51213512'. My Python script just reads the coordinates and adds the '.'; if I don't add the decimal point, the rest of my script doesn't work.
I already tried a few things but nothing worked for me. This is what I got so far.
latitude=float(long('51.213512'))
The result is a ValueError:
ValueError: invalid literal for long() with base 10: 'Long'
I'm not sure why you are using long in this example. If you want to convert this value to a float, just use the float function on its own; you seem to be confusing the long and float functions. Don't nest them: long('51.213512') is evaluated first, and long cannot parse a string that contains a decimal point, so the call fails before float is ever reached.
I recommend using the float function on its own, which avoids the confusion:
latitude = float('51.213512')
Get rid of the 'long' and it should work
latitude = float('51.213512')
Edit: Okay, since you're reading the coordinates and manually converting them to decimal strings, all you need is the code I gave originally. The long function converts numbers (truncating floats) or strings of integers to long, not to float:
>>> long(5)
5L
>>> long('5')
5L
>>> long(5.5)
5L
>>> long('5.5')
ValueError: invalid literal for long() with base 10: '5.5'
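A minimal sketch of the conversion without inserting the '.' by hand, assuming every value carries exactly six implied decimal places (as '51213512' to 51.213512 suggests); `to_degrees` and `DECIMALS` are hypothetical. It runs on Python 2.7 and on Python 3, where long no longer exists. As an aside, the literal 'Long' in the question's error message suggests a header cell may be reaching the conversion.

```python
DECIMALS = 6  # assumption: six implied decimal places, e.g. '51213512'


def to_degrees(raw):
    """Convert an implied-decimal string such as '51213512' to 51.213512."""
    # Dividing by a float keeps this true division on Python 2.7 as well.
    return int(raw) / 10.0 ** DECIMALS


print(to_degrees('51213512'))    # 51.213512
print(float('51.213512'))        # 51.213512, the direct route once the '.' is there
```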

Converting lists of digits stored as strings into integers Python 2.7

Among other things, my project requires retrieving distance information from a file, converting the data into integers, then adding them to a 128 x 128 matrix.
I am at an impasse while reading the data from each line.
I retrieve it with:
distances = []
with open(filename, 'r') as f:
    for line in f:
        if line[0].isdigit():
            distances.extend(line.splitlines())
This produces a list of strings, and:
int(distances) # does not work
int(distances[0]) # produces the correct integer when called from the console
However, the spaces foobar the procedure later on.
An example of list:
['966']['966', '1513' 2410'] # the distance list increases with each additional city. The first item is actually the distance of the second city from the first. The second item is the distance of the third city from the first two.
int(distances[0]) #returns 966 in console. A happy integer for the matrix. However:
int(distances[1]) # returns:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '1513 2410'
I have a slight preference for more Pythonic solutions, like list comprehensions and the like, but in reality, any and all help is greatly appreciated.
Thank you for your time.
All the information you get from a file is a string at first. You have to parse the information and convert it to different types and formats in your program.
int(distances) does not work because, as you have observed, distances is a list of strings. You cannot convert an entire list to an integer. (What would be the correct answer?)
int(distances[0]) works because you are converting only the first string to an integer, and the string represents an integer so the conversion works.
int(distances[1]) doesn't work because line.splitlines() only splits on line breaks, not on spaces, so the whole line ends up as a single string, '1513 2410' (which is also why there is no comma between those two values in your printed list). This cannot be converted to an integer because it contains a space.
There are a few different solutions that might work for you, but here are a couple of obvious ones for your use case:
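distances.extend([int(elem) for elem in line.split()])
This will only work if you are certain every element of the list returned by line.split() can undergo this conversion. Alternatively, extend with line.split() inside the loop and convert the whole distances list afterwards, all at once:
distances = [int(d) for d in distances]
or
distances = map(int, distances)
(On Python 2.7 map returns a list; on Python 3 it returns an iterator, so wrap it in list() if you need a list.)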
You should try a few solutions out and implement the one you feel gives you the best combination of working correctly and readability.
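If each line is one row of the distance table, it can help to keep one list per line instead of one flat list. A minimal sketch, assuming lines that start with a digit hold space-separated integers as in the question; `read_distance_rows` is a hypothetical name, and the "Distances" header line in the sample is made up.

```python
import io


def read_distance_rows(f):
    """Return one list of ints per line that starts with a digit."""
    return [[int(word) for word in line.split()]
            for line in f if line[:1].isdigit()]


sample = io.StringIO("Distances\n966\n1513 2410\n")
rows = read_distance_rows(sample)
print(rows)   # [[966], [1513, 2410]]
```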
My guess is you want to split on all whitespace, rather than newlines. If the file's not large, just read it all in:
distances = map(int, open('file').read().split())
If some of the values aren't numeric:
distances = (int(word) for word in open('file').read().split() if word.isdigit())
If the file is very large, use a generator to avoid reading it all at once:
import itertools
with open('file') as dists:
    distances = itertools.chain.from_iterable((int(word) for word in line.split()) for line in dists)
    # distances is lazy: consume it here, before the with block closes the file
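The question's end goal is a 128 x 128 matrix. A minimal sketch of filling a symmetric matrix from the per-line rows, assuming (as the question describes) that row k holds the distances from city k + 1 to cities 0..k, so 127 rows give a 128 x 128 matrix; `build_matrix` is a hypothetical helper.

```python
def build_matrix(rows):
    """Build a symmetric n x n distance matrix from lower-triangular rows.

    Assumes rows[k] holds the distances from city k + 1 to cities 0..k;
    n is len(rows) + 1.
    """
    n = len(rows) + 1
    matrix = [[0] * n for _ in range(n)]   # zero diagonal: distance to itself
    for k, row in enumerate(rows):
        city = k + 1
        if len(row) != city:
            raise ValueError("row %d has %d values, expected %d" % (k, len(row), city))
        for other, dist in enumerate(row):
            matrix[city][other] = dist
            matrix[other][city] = dist     # distances are symmetric
    return matrix


m = build_matrix([[966], [1513, 2410]])
for r in m:
    print(r)
# [0, 966, 1513]
# [966, 0, 2410]
# [1513, 2410, 0]
```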
