Skipping over array elements of certain types - string

I have a csv file that gets read into my code where arrays are generated out of each row of the file. I want to ignore all the array elements with letters in them and only worry about changing the elements containing numbers into floats. How can I change code like this:
myValues = []
data = open(text_file, "r")
for line in data.readlines()[1:]:
    myValues.append([float(f) for f in line.strip('\n').strip('\r').split(',')])
so that the last line knows to only try converting numbers into floats, and to skip the letters entirely?
Put another way, given this list,
list = ['2','z','y','3','4']
what command should be given so the code knows not to try converting letters into floats?

You could use try: except:
for i in list:
    try:
        myVal.append(float(i))
    except:
        pass
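Applied to the original loop, a minimal sketch might look like this (my adaptation, assuming text_file is the path from the question; it catches ValueError specifically so unrelated errors are not silently swallowed):
myValues = []
with open(text_file, "r") as data:
    for line in data.readlines()[1:]:
        row = []
        for f in line.strip().split(','):
            try:
                row.append(float(f))   # keep values that parse as numbers
            except ValueError:
                pass                   # skip entries that contain letters
        myValues.append(row)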


Removing \n from a list

So I'm currently learning about mail merging and was issued a challenge on it. The idea is to open a names file, read the name on the current line, replace it in the letter, and save that letter as a new item.
I figured a good way to do this would be a for loop.
Open file > for loop > append names to list > loop the list and replace etc.
Except when I try to actually append the names to the list, I get this:
['Aang\nZuko\nAppa\nKatara\nSokka\nMomo\nUncle Iroh\nToph']
The code I am using is:
invited_names = []
with open("./Input/Names/invited_names.txt") as names:
    invited_names.append(names.read())

for item in invited_names:
    new_names = [str.strip("\n") for str in invited_names]
    print(new_names)
I have tried replace() on the \n and now .strip(), but I have not been able to remove the \n. Any ideas?
EDIT: not sure if it helps but the .txt file for the names looks like this:
Aang
Zuko
Appa
Katara
Sokka
Momo
Uncle Iroh
Toph
As you can see, read() returns one giant string of everything in your invited_names.txt file. Instead, you can use readlines(), which returns a list containing a string for every line (thanks to codeflush.dev for the comment). Then use the extend() method to add this list to the other list, invited_names.
You are also using a for loop and a list comprehension at the same time, so the same list comprehension runs once per iteration of the loop. You can drop either one; I would keep the list comprehension because it is concise and efficient.
Try this code:
invited_names = []
with open("./Input/Names/invited_names.txt") as names:
    invited_names.extend(names.readlines())  # <--

new_names = [str.strip("\n") for str in invited_names]
print(new_names)
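As a side note (not part of the original answer), splitlines() drops the newline characters as it splits, which makes the separate strip step unnecessary; a minimal sketch, assuming the same file path:
with open("./Input/Names/invited_names.txt") as names:
    invited_names = names.read().splitlines()  # splits on newlines and discards them
print(invited_names)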

F string is adding new line

I am trying to make a name generator. I am using an f-string to concatenate the first and last names, but instead of getting them on one line, the surname ends up on a new line.
print(f"Random Name Generated is:\n{random.choice(firstname_list)}{random.choice(surname_list)}")
This gives the output:
Random Name Generated is:
Yung
heady
Instead of:
Random Name Generated is:
Yung heady
Can someone please explain why so?
The code seems right; the problem is probably newline (\n) characters in the list elements. Check the strings in your lists.
import random

if __name__ == '__main__':
    firstname_list = ["yung1", "yung2", "yung3"]
    surname_list = ["heady1", "heady2", "heady3"]
    firstname_list = [name.replace('\n', '') for name in firstname_list]
    print(f"Random Name Generated is:\n{random.choice(firstname_list)} {random.choice(surname_list)}")
Output:
Random Name Generated is:
yung3 heady2
Since I had pulled these values from a UTF-8 encoded .txt file, readlines() did convert the names to list elements, but each one had a hidden '\xa0\n' in it.
This caused this particular printing problem. Using .strip() removed those characters.
print(f"Random Name Generated is:\n{random.choice(firstname_list).strip()} {random.choice(surname_list).strip()}")

How to make tokenize not treat contractions and their counterparts as the same when comparing two text files?

I am currently working on a data structure that is supposed to compare two text files and make a list of the strings they have in common. My program receives the content of the two files as two strings a and b (one file's content per variable). I then use the tokenize function in a for loop to break each string into sentences. These are stored in a set to avoid duplicate entries, and I remove all duplicate lines within each variable before I compare them. I then compare the two variables to each other and keep only the strings they have in common. I have a bug that occurs in the last part, when they are compared against each other: the program treats contractions and their expanded counterparts as the same when it should not. For example, it reads "Should not" and "Shouldn't" as the same and produces an incorrect answer. I want it not to read contractions and their counterparts as the same.
import nltk

def sentences(a, b):  # the variables store the contents of the files in the form of strings
    a_placeholder = a
    set_a = set()
    a = []
    for punctuation_a in nltk.sent_tokenize(a_placeholder):
        if punctuation_a not in set_a:
            set_a.add(punctuation_a)
            a.append(punctuation_a)

    b_placeholder = b
    set_b = set()
    b = []
    for punctuation_b in nltk.sent_tokenize(b_placeholder):
        if punctuation_b not in set_b:
            set_b.add(punctuation_b)
            b.append(punctuation_b)

    a_new = a
    for punctuation in a_new:
        if punctuation not in set_b:
            set_a.remove(punctuation)
            a.remove(punctuation)
        else:
            pass

    return []
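For reference, here is a minimal sketch of the comparison described above written with plain set operations (my sketch, not the asker's code). Exact string comparison keeps "Should not" and "Shouldn't" distinct, and note that removing items from a list while iterating over it, as the last loop above does, is a common way to skip elements:
import nltk

def common_sentences(a, b):
    # Split each document into sentences; sets remove duplicates automatically.
    sentences_a = set(nltk.sent_tokenize(a))
    sentences_b = set(nltk.sent_tokenize(b))
    # Exact string comparison, so contractions and their expansions stay distinct.
    return sorted(sentences_a & sentences_b)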

Iterate over images with pattern

I have thousands of images which are labeled IMG_####_0, where the first image is IMG_0001_0.png, the 22nd is IMG_0022_0.png, the 100th is IMG_0100_0.png, etc. I want to perform some tasks by iterating over them.
I used fnames = ['IMG_{}_0.png'.format(i) for i in range(150)] to iterate over the first 150 images, but I get this error: FileNotFoundError: [Errno 2] No such file or directory: '/Users/me/images/IMG_0_0.png', which suggests that it is not the correct way to do it. Any ideas about how to capture this pattern while being able to iterate over the specified number of images, i.e. in my case from IMG_0001_0.png to IMG_0150_0.png?
fnames = ['IMG_{0:04d}_0.png'.format(i) for i in range(1, 151)]
print(fnames)

for fn in fnames:
    try:
        with open(fn, "r") as reader:
            # do smth here
            pass
    except (FileNotFoundError, OSError) as err:
        print(err)
Output:
['IMG_0001_0.png', 'IMG_0002_0.png', ..., 'IMG_0149_0.png', 'IMG_0150_0.png']
Documentation: str.format() and the Format Specification Mini-Language.
'{:04d}' # format the given parameter with 0 filled to 4 digits as decimal integer
The other way to do it would be to create a normal string and fill it with 0:
print(str(22).zfill(10))
Output:
0000000022
But for your case, format language makes more sense.
You need to use a format pattern to get the format you're looking for. You don't just want the integer converted to a string, you specifically want it to always be a string with four digits, using leading 0's to fill in any empty space. The best way to do this is:
'IMG_{:04d}_0.png'.format(i)
instead of your current format string. The result looks like this:
In [2]: 'IMG_{:04d}_0.png'.format(3)
Out[2]: 'IMG_0003_0.png'
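As a side note (not in the original answers), the same format specifier works inside an f-string:
fnames = [f'IMG_{i:04d}_0.png' for i in range(1, 151)]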
Generating a list of possible names and trying whether each one exists is a slow and clumsy way to iterate over files.
Take a look at https://docs.python.org/3/library/glob.html
So something like:
from glob import iglob
filenames = iglob("/path/to/folder/IMG_*_0.png")
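One caveat (my addition): glob does not guarantee any particular order, so if the images need to be processed in sequence, sort the matches first; the zero-padded names then sort in numeric order:
from glob import glob

for filename in sorted(glob("/path/to/folder/IMG_*_0.png")):
    # process each image in numeric order
    print(filename)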

I have a single-line list and want to convert it to a multi-dimensional list

I have a text file that I converted into a list, but I want it to be a multi-dimensional list. Is there a way to do this easily?
This is my code:
crimefile = open(fileName, 'r')
yourResult = [line.split(',') for line in crimefile.readlines()]
Your code does create a 2-dimensional list (assuming your file is multiple lines of numbers where each number is separated by a comma). If you want to print out each individual list in yourResult, try this:
for list in yourResult:
    print(list)
To access a certain item in the list, for example the first number on each line, simply replace print(list) with print(list[0]).
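If the goal is numeric data rather than strings, the rows can be converted while reading; a minimal sketch, reusing the crimefile handle and the comma-separated-numbers assumption from the question:
crimefile = open(fileName, 'r')
yourResult = [[float(value) for value in line.strip().split(',')] for line in crimefile]
crimefile.close()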
