Converting lists of digits stored as strings into integers (Python 2.7)

Among other things, my project requires retrieving distance information from a file, converting the data into integers, then adding them to a 128 x 128 matrix.
I am at an impasse while reading the data from each line.
I retrieve it with:
distances = []
with open(filename, 'r') as f:
    for line in f:
        if line[0].isdigit():
            distances.extend(line.splitlines())
This produces a list of strings. While
int(distances) # does not work,
int(distances[0]) # produces the correct integer when called from the console.
However, the spaces mess up the procedure later on.
An example of the list:
['966', '1513 2410'] # the distance list grows with each additional city. The first item is the distance of the second city from the first. The second item holds the distances of the third city from the first two.
int(distances[0]) #returns 966 in console. A happy integer for the matrix. However:
int(distances[1]) # returns:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '1513 2410'
I have a slight preference for more Pythonic solutions, like list comprehensions and the like, but in reality, any and all help is greatly appreciated.
Thank you for your time.

All the information you get from a file is a string at first. You have to parse the information and convert it to different types and formats in your program.
int(distances) does not work because, as you have observed, distances is a list of strings. You cannot convert an entire list to an integer. (What would be the correct answer?)
int(distances[0]) works because you are converting only the first string to an integer, and the string represents an integer so the conversion works.
int(distances[1]) doesn't work because line.splitlines() only splits on line breaks, so a line such as 1513 2410 stays a single string, '1513 2410'. That string cannot be converted to an integer because it contains a space.
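To see where the space comes from, compare str.splitlines(), which the question uses, with str.split() on a single line. The sample line is an assumption modelled on the error message:

```python
# One line as it might appear in the distance file (assumed format:
# space-separated integers ending with a newline).
line = "1513 2410\n"

by_lines = line.splitlines()  # splits only on line breaks
by_space = line.split()       # splits on any run of whitespace

print(by_lines)                         # ['1513 2410']
print(by_space)                         # ['1513', '2410']
print([int(tok) for tok in by_space])   # [1513, 2410]
```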
There are a few different solutions that might work for you, but here are a couple of obvious ones for your use case:
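distances.extend([int(elem) for elem in line.split()])
This replaces distances.extend(line.splitlines()) and will only work if you are certain every element of the list returned by line.split() can undergo this conversion. Alternatively, keep collecting strings (with line.split() instead of line.splitlines()) and convert the whole distances list later, all at once:
distances = [int(d) for d in distances]
or
distances = map(int, distances)  # a list in Python 2.7, an iterator in Python 3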
You should try a few solutions out and implement the one you feel gives you the best combination of working correctly and readability.
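Putting the first suggestion into the question's loop gives something like the sketch below. read_distances is a hypothetical helper name, and an in-memory io.StringIO with made-up contents stands in for the real file:

```python
import io

def read_distances(f):
    """Collect every integer on the lines of `f` that start with a digit.

    `f` can be any iterable of text lines, such as an open file.
    """
    distances = []
    for line in f:
        if line[:1].isdigit():  # same filter as the question; [:1] is safe on ''
            # split() breaks '1513 2410' into '1513' and '2410'
            distances.extend(int(tok) for tok in line.split())
    return distances

# Stand-in for the real file; the contents are illustrative only.
sample = io.StringIO(u"966\n1513 2410\n")
print(read_distances(sample))  # [966, 1513, 2410]
```

With a real file this becomes `with open(filename) as f: distances = read_distances(f)`.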

My guess is you want to split on all whitespace, rather than newlines. If the file's not large, just read it all in:
distances = map(int, open('file').read().split())
If some of the values aren't numeric:
distances = (int(word) for word in open('file').read().split() if word.isdigit())
If the file is very large, use a generator to avoid reading it all at once:
import itertools
with open('file') as dists:
    distances = itertools.chain.from_iterable((int(word) for word in line.split()) for line in dists)
    # distances is lazy: consume it here, before the with block closes the file
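The question's end goal is a 128 x 128 matrix. Assuming the layout described in the question (the first line holds the distance from city 2 to city 1, the next line the distances from city 3 to cities 1 and 2, and so on), a hypothetical helper could fill a symmetric matrix as sketched below. This needs one list per line, e.g. [int(x) for x in line.split()], rather than one flat list:

```python
def build_matrix(rows, n):
    """Fill an n x n symmetric distance matrix from lower-triangular rows.

    Assumes rows[k] lists the distances from city k+1 to cities 0..k
    (0-based indices), which is the layout described in the question.
    """
    matrix = [[0] * n for _ in range(n)]
    for k, row in enumerate(rows):
        i = k + 1                    # the city this row belongs to
        for j, d in enumerate(row):  # j runs over the cities 0..k
            matrix[i][j] = d
            matrix[j][i] = d         # distances are symmetric
    return matrix

rows = [[966], [1513, 2410]]  # one list of ints per line of the file
for r in build_matrix(rows, 3):
    print(r)
# [0, 966, 1513]
# [966, 0, 2410]
# [1513, 2410, 0]
```

For the real data, the call would be build_matrix(rows, 128).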

Related

Python - how to recursively search a variable substring in texts that are elements of a list

Let me explain better what I mean in the title.
Examples of strings to search (strings of variable length, each one an element of a list, which is very large in reality):
STRINGS = ['sftrkpilotndkpilotllptptpyrh', 'ffftapilotdfmmmbtyrtdll', 'gftttepncvjspwqbbqbthpilotou', 'htfrpilotrtubbbfelnxcdcz']
The substring to find, which I know for sure:
- is contained in each element of STRINGS
- is also contained in a SOURCE string
- has a certain fixed LENGTH (5 characters in this example).
SOURCE = ['gfrtewwxadasvpbepilotzxxndffc']
I am trying to write a Python3 program that finds this hidden word of 5 characters that is in SOURCE and at what position(s) it occurs in each element of STRINGS.
I am also trying to store the results in an array or a dictionary (I do not know which is more convenient at the moment).
Moreover, I need to perform other searches of the same type but with different LENGTH values, so this value should be provided by a variable in order to be of more general use.
I know that the first point has already been solved in previous posts, but never (as far as I know) together with the second point, which is the part of the code I was not able to deal with successfully (I do not post my code because I know it is just too far from being fixable).
Any help from this great community is highly appreciated.
-- Maurizio
You can iterate over the source string and, for each substring of the given length, find its positions within each of the other strings (with the re module or, as below, with str.startswith). Then, if at least one occurrence was found in each of the strings, yield the result:
import re

def find(source, strings, length):
    # + 1 so that the last substring of source is also checked
    for i in range(len(source) - length + 1):
        sub = source[i:i+length]
        positions = {}
        for s in strings:
            # Alternative with re (finds non-overlapping matches only):
            # positions[s] = [m.start() for m in re.finditer(re.escape(sub), s)]
            positions[s] = [j for j in range(len(s)) if s.startswith(sub, j)]  # Using built-in functions.
            if not positions[s]:
                break
        else:
            yield sub, positions
And the generator can be used as illustrated in the following example:
import pprint
pprint.pprint(dict(find(
    source='gfrtewwxadasvpbepilotzxxndffc',
    strings=['sftrkpilotndkpilotllptptpyrh',
             'ffftapilotdfmmmbtyrtdll',
             'gftttepncvjspwqbbqbthpilotou',
             'htfrpilotrtubbbfelnxcdcz'],
    length=5
)))
which produces the following output:
{'pilot': {'ffftapilotdfmmmbtyrtdll': [5],
           'gftttepncvjspwqbbqbthpilotou': [21],
           'htfrpilotrtubbbfelnxcdcz': [4],
           'sftrkpilotndkpilotllptptpyrh': [5, 13]}}
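Since STRINGS is very large in reality, a different technique than the answer's (set intersection, not the generator above) may scale better: build the set of all substrings of the given length for each string and intersect them, so only substrings shared by every string are searched for positions. common_substrings is a hypothetical name; this is a sketch, not a drop-in replacement:

```python
from pprint import pprint

def common_substrings(source, strings, length):
    """Map every `length`-long substring of `source` that occurs in all
    of `strings` to {string: [start positions]}."""
    def grams(s):
        # every substring of s with the requested length
        return {s[i:i + length] for i in range(len(s) - length + 1)}

    candidates = grams(source)
    for s in strings:
        candidates &= grams(s)   # keep only substrings also found in s
        if not candidates:
            return {}            # nothing is shared by every string
    return {
        sub: {s: [i for i in range(len(s) - length + 1) if s.startswith(sub, i)]
              for s in strings}
        for sub in candidates
    }

pprint(common_substrings(
    'gfrtewwxadasvpbepilotzxxndffc',
    ['sftrkpilotndkpilotllptptpyrh', 'ffftapilotdfmmmbtyrtdll',
     'gftttepncvjspwqbbqbthpilotou', 'htfrpilotrtubbbfelnxcdcz'],
    5,
))
```

Because LENGTH is a parameter, the same call works for other substring lengths.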

Doubts about string

So, I'm doing an exercise in Python, and I tried going step by step in the terminal to understand what's happening, but I couldn't work it out.
Mainly, I want to understand why the conditional returns only index 0.
Isn't checking 'casino' in 'Casinoville'.lower() the same thing?
Exercise:
Takes a list of documents (each document is a string) and a keyword.
Returns list of the index values into the original list for all documents containing the keyword.
Exercise solution
def word_search(documents, keyword):
    indices = []
    for i, doc in enumerate(documents):
        tokens = doc.split()
        normalized = [token.rstrip('.,').lower() for token in tokens]
        if keyword.lower() in normalized:
            indices.append(i)
    return indices
My solution
def word_search(documents, keyword):
    return [i for i, word in enumerate(documents) if keyword.lower() in word.rstrip('.,').lower()]
Run
>>> doc_list = ["The Learn Python Challenge Casino.", "They bought a car", "Casinoville"]
Expected output
>>> word_search(doc_list, 'casino')
[0]
Actual output
>>> word_search(doc_list, 'casino')
[0, 2]
Let's try to understand the difference.
The reference ("Exercise solution") function can be written as a list comprehension:
def word_search(documents, keyword):
    return [i for i, word in enumerate(documents)
            if keyword.lower() in
            [token.rstrip('.,').lower() for token in word.split()]]
The problem happens with the string "Casinoville" at index 2.
See the output:
print([token.rstrip('.,').lower() for token in doc_list[2].split()])
# ['casinoville']
And here is the crux: the reference solution checks whether a word is in the list. The answer is True only if the whole string matches (this gives the expected output).
However, your solution only checks whether a word contains a substring. In that case, the in test is applied to the string itself, not to a list.
See it:
# On the list :
print('casino' in [token.rstrip('.,').lower() for token in doc_list[2].split()])
# False
# On the string:
print('casino' in [token.rstrip('.,').lower() for token in doc_list[2].split()][0])
# True
As result, in the first case, "Casinoville" isn't included while it is in the second one.
Hope that helps!
The question is "Returns list of the index values into the original list for all documents containing the keyword".
You need to consider whole words only.
In the "Casinoville" case, the word "casino" is not present, since that document only contains the word "Casinoville".
When you use the in operator, the result depends on the type of object on the right hand side. When it's a list (or most other kinds of containers), you get an exact membership test. So 'casino' in ['casino'] is True, but 'casino' in ['casinoville'] is False because the strings are not equal.
When the right-hand side of in is a string, though, it does something different. Rather than looking for an exact match against a single character (which is what strings contain if you think of them as sequences), it does a substring match. So 'casino' in 'casinoville' is True, as would be 'casino' in 'montecasino' or 'casino' in 'foocasinobar' (it's not just prefixes that are checked).
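The two behaviours of in side by side, using the strings from the paragraph above:

```python
# Membership in a list: exact, element-by-element equality
print('casino' in ['casino'])        # True
print('casino' in ['casinoville'])   # False

# Membership in a string: substring search anywhere in the string
print('casino' in 'casinoville')     # True
print('casino' in 'montecasino')     # True
print('casino' in 'foocasinobar')    # True
```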
For your problem, you want exact matches to whole words only. The reference solution uses str.split to separate the words (with no argument, it splits on any run of whitespace). It then cleans up the words a bit (stripping off punctuation marks) and does an in test against the resulting list of strings.
Your code never splits the strings you are passed. So when you do an in test, you're doing a substring match on the whole document, and you'll get false positives when you match part of a larger word.
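A small illustration of such a false positive; the document "They bought a carpet." is made up for this example:

```python
doc = "They bought a carpet."
keyword = "car"

# Substring test on the whole document: matches inside 'carpet'
print(keyword in doc.lower())   # True  (false positive)

# Whole-word test: split first, strip punctuation, then compare
words = [w.rstrip('.,').lower() for w in doc.split()]
print(words)                    # ['they', 'bought', 'a', 'carpet']
print(keyword in words)         # False
```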

Data Being Read as Strings instead of Floats

A PyTorch program, which I don't fully understand, produced an output and wrote it to weight.txt. I'm trying to do some further calculations based on this output.
I'd like the output to be interpreted as a list of length 3, each entry of which is a list of floats of length 240.
I use this to load the data:
w=open("weight.txt","r")
weight=[]
for number in w:
    weight.append(number)
print(len(weight)) yields 3. So far so good.
But then print(len(weight[0])) yields 6141. That's bad!
On closer inspection, it's because weight[0] is being read character-by-character instead of number-by-number. So for example, print(weight[0][0]) yields - instead of -1.327657848596572876e-01. These numbers are separated by single spaces, which are also being read as characters.
How do I fix this?
Thank you
Edit: I tried making a repair function:
def repair(S):
    numbers=[]
    num=''
    for i in range(len(S)):
        if S[i]!=' ':
            num+=S[i]
        elif S[i]==' ':
            num=float(num)
            numbers.append(num)
            num=''
        elif i==len(S)-1:
            num+=S[i]
            num=float(num)
            numbers.append(num)
    return numbers
Unfortunately, print(repair('123 456')) returns [123.0] instead of the desired [123.0, 456.0].
You haven't told us what your input file looks like, so it's hard to give an exact answer. But, assuming it looks like this:
123 312.8 12
2.5 12.7 32
the following program:
w = open("weight.txt", "r")
weight = []
for line in w:
    for n in line.split():
        weight.append(float(n))
print(weight)
will print:
[123.0, 312.8, 12.0, 2.5, 12.7, 32.0]
which is closer to what you're looking for, I presume?
The crux of the issue here is that for number in w in your program simply goes through each line: You have to have another loop to split that line into its constituents and then convert appropriately.
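The question asks for a list of 3 lists (one per line) rather than one flat list. Keeping one inner list per line gives that shape. A minimal sketch, assuming each line of weight.txt holds whitespace-separated numbers; read_rows is a hypothetical name and io.StringIO stands in for the file:

```python
import io

def read_rows(f):
    """Return one list of floats per non-blank line of `f`."""
    return [[float(tok) for tok in line.split()] for line in f if line.strip()]

# Stand-in for weight.txt, using the numbers from the answer's example.
sample = io.StringIO(u"123 312.8 12\n2.5 12.7 32\n")
print(read_rows(sample))  # [[123.0, 312.8, 12.0], [2.5, 12.7, 32.0]]
```

On the real file, `with open("weight.txt") as w: weight = read_rows(w)` should give len(weight) == 3 and len(weight[0]) == 240 if the file has the shape described. float() also accepts exponent notation such as -1.327657848596572876e-01.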

How to convert numpy bytes to float in python3?

My question is similar to this; I tried using genfromtxt, but it still doesn't work. It reads the file as expected, but not as floats. Code and file excerpt below:
temp = np.genfromtxt('PFRP_12.csv', names=True, skip_header=1, comments="#", delimiter=",", dtype=None)
reads as (b'"0"', b'"0.2241135"', b'"0"', b'"0.01245075"', b'"0"', b'"0"')
"1 _ 1",,,,,
"Time","Force","Stroke","Stress","Strain","Disp."
#"sec","N","mm","MPa","%","mm"
"0","0.2241135","0","0.01245075","0","0"
"0.1","0.2304713","0.0016","0.01280396","0.001066667","0.0016"
"0.2","1.707077","0.004675","0.09483761","0.003116667","0.004675"
I tried with different dtypes (none, str, float, byte), still no success. Thanks!
Edit: As Evert suggested, I also tried float, but then it reads them all as nan: (nan, nan, nan, nan, nan, nan)
Another solution is to use the converters argument:
np.genfromtxt('inp.txt', names=True, skip_header=1, comments="#",
delimiter=",", dtype=None,
converters=dict((i, lambda s: float(s.decode().strip('"'))) for i in range(6)))
(you'll need to specify a converter for each column).
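The converter lambda does three things: decode, strip the quotes, convert. The same steps as a named function are easier to test and can be passed in the converters dict in place of the lambda. quoted_bytes_to_float is a hypothetical name, and because whether genfromtxt hands converters bytes or str may depend on the NumPy version and encoding setting, the sketch accepts both:

```python
def quoted_bytes_to_float(raw):
    """Turn a raw field such as b'"0.2241135"' into the float 0.2241135."""
    text = raw.decode() if isinstance(raw, bytes) else raw  # bytes -> str; str passes through
    text = text.strip('"')   # remove the surrounding double quotes
    return float(text)       # str -> float

print(quoted_bytes_to_float(b'"0.2241135"'))  # 0.2241135

# Hypothetical use in the genfromtxt call, one converter per column:
# converters = {i: quoted_bytes_to_float for i in range(6)}
```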
Side remark: oddly enough, while dtype="U12" or similar should produce strings instead of bytes (avoiding the .decode() part), this doesn't seem to work and results in empty entries.
Here is a fancy, unreadable, functional programming style way of converting your input to the record array you're looking for:
>>> np.core.records.fromarrays(np.asarray([float(y.decode().strip('"')) for x in temp for y in x]).reshape(temp.shape[0], -1).T, names=temp.dtype.names, formats=['f'] * len(temp.dtype.names))
or spread out across a few lines:
>>> np.core.records.fromarrays(
...     np.asarray(
...         [float(y.decode().strip('"')) for x in temp for y in x]
...     ).reshape(temp.shape[0], -1).T,
...     names=temp.dtype.names,
...     formats=['f'] * len(temp.dtype.names))
I wouldn't recommend this solution, but sometimes it's fun to hack something like this together.
The issue with your data is a bit more complicated than it may seem.
That is because the numbers in your CSV files really are not numbers: they are explicitly strings, as they have surrounding double quotes.
So, there are 3 steps involved in the conversion to float:
- decode the bytes to Python 3 (unicode) string
- remove (strip) the double quotes from each end of each string
- convert the remaining string to float
This happens inside the double list comprehension, on line 3. It's a double list comprehension, since a rec-array is essentially 2D.
The resulting list, however, is 1D. I turn it back into a numpy array (np.asarray) so I can easily reshape it to something 2D. That (now plain float) array is then given to np.core.records.fromarrays, with the names taken from the original rec-array, and the format of each field set to float.
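A different technique from the answers above, sketched under the assumption that the file always has the layout shown in the excerpt: Python's csv module already strips the double quotes from quoted fields, so each value only needs float(). read_quoted_csv is a hypothetical name, and the sample text reuses the first lines of the excerpt:

```python
import csv
import io

def read_quoted_csv(f):
    """Return (column names, rows of floats) from a file laid out like PFRP_12.csv.

    Assumed layout, taken from the excerpt in the question: line 1 is a title
    row, line 2 holds the quoted column names, lines starting with '#' are
    comments, and every other non-empty row is numeric.
    """
    reader = csv.reader(f)
    next(reader)                  # skip the '"1 _ 1",,,,,' title row
    names = next(reader)          # csv removes the quotes: ['Time', 'Force', ...]
    rows = []
    for row in reader:
        if not row or row[0].startswith('#'):
            continue              # skip blank lines and the units comment
        rows.append([float(x) for x in row])
    return names, rows

# The first lines of the excerpt, used as a stand-in for the real file.
sample = io.StringIO(
    '"1 _ 1",,,,,\n'
    '"Time","Force","Stroke","Stress","Strain","Disp."\n'
    '#"sec","N","mm","MPa","%","mm"\n'
    '"0","0.2241135","0","0.01245075","0","0"\n'
    '"0.1","0.2304713","0.0016","0.01280396","0.001066667","0.0016"\n'
)
names, rows = read_quoted_csv(sample)
print(names)    # ['Time', 'Force', 'Stroke', 'Stress', 'Strain', 'Disp.']
print(rows[1])  # [0.1, 0.2304713, 0.0016, 0.01280396, 0.001066667, 0.0016]
```

For the real file, open it with open('PFRP_12.csv', newline='') and pass the file object; np.array(rows) then gives a 2-D float array whose columns line up with names.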

Python: How can I read a series of strings from a text file as float numbers?

I'm trying to read from a text file line by line a series of strings like these:
11,52.15384615384615,52.84615384615385,14.0,45.15384615384615,39.76923076923077
10,27.09090909090909,54.81818181818182,64.36363636363636,65.54545454545455,21.90909090909091
(The first number is an integer used as an index), and what I would like to get are float numbers such as:
11, 52.15, 52.85, 14.00, 45.15, 39.77
10, 27.09, 54.82, 64.36, 65.55, 21.91
How can I convert these strings to a list of numbers?
It sounds like you are trying to build a list of floats from a text file. You can use a dict to map the index you mention to the list of floats. Open the file, read it line by line, and use split(',') to split each line into a list of strings. Then take the first field as your index, use a list slice to loop over the rest of the strings, convert and round them, and add them to a new list, which you can then assign to your index.
It's easier to read the code probably than it is to explain it.
my_float_dict = dict()
with open('my_float_strings.txt', 'r') as f:
    for line in f:
        string_list = line.split(',')
        index = int(string_list[0])
        line_float_list = []
        for field in string_list[1:]:
            line_float_list.append(round(float(field), 2))
        my_float_dict[index] = line_float_list
print(my_float_dict)
From your example I think this is what you are looking for.
Below, the string s is converted to a float and then rounded to 2 decimal places:
>>> s = '51.843256'
>>> round(float(s), 2)
51.84
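The desired output in the question shows 14.00, which round() cannot produce: it returns a float, and floats do not keep trailing zeros. For display with exactly two decimals, string formatting is the usual tool. A short sketch using values from the question:

```python
x = float('14.0')

print(round(x, 2))          # 14.0   (round returns a float; trailing zeros are not kept)
print('{:.2f}'.format(x))   # 14.00  (formatting controls how many decimals are shown)

row = [52.15384615384615, 14.0, 39.76923076923077]
print(', '.join('{:.2f}'.format(v) for v in row))  # 52.15, 14.00, 39.77
```

Rounding is the right choice when the numbers are used in further calculations; formatting is the right choice when only the printed output matters.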
