Python isalpha giving wrong results - python-3.x

with open("text.txt") as f:
    for line in f:
        line.isalpha()
False
File has only one line and contents are:
"abc"

I think this is because there is a space after the "abc" content

As far as I know, lines read from a file usually end with a newline character (\n), which is why isalpha() returns False.

As the others pointed out, the cause must be some other character in the line: most likely the "\n" that terminates it, or possibly other whitespace.
In brief, you want to remove those characters. Try:
line.strip().isalpha()
Full explanation below.
Load data:
with open("text.txt") as f:
    for line in f:
        line.isalpha()
The output of line is:
>>> line
'abc\n'
And of course the result of isalpha() is False:
>>> print(line.isalpha())
False
However, removing the \n you obtain the correct result:
>>> line.strip()
'abc'
>>> line.strip().isalpha()
True
(To troubleshoot this, echo the line in the interpreter or print repr(line) rather than using a plain print; otherwise you won't see special characters such as '\n'.)
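A minimal sketch of the troubleshooting step above. It uses an in-memory io.StringIO object as a stand-in for text.txt (an assumption, so the snippet runs without a file on disk) and assumes the file holds abc followed by a newline:

```python
import io

# Stand-in for open("text.txt"); assumed contents: abc plus a trailing newline.
fake_file = io.StringIO("abc\n")

results = []
for line in fake_file:
    # repr() makes the otherwise invisible trailing newline visible.
    print(repr(line), line.isalpha(), line.strip().isalpha())
    results.append((line.isalpha(), line.strip().isalpha()))
```

The first check fails because '\n' is not alphabetic; the second passes once strip() has removed it.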

Related

How to modify and print list items in python?

I am a beginner in Python, working on a small piece of logic. I have a text file with HTML links in it, one per line. I have to read each line of the file and print each link with the same prefix and suffix,
so that the output looks like this:
<item>LINK1</item>
<item>LINK2</item>
<item>LINK3</item>
and so on.
I have tried this code, but something is wrong in my approach,
def file_read(fname):
    with open(fname) as f:
        #Content_list is the list that contains the read lines.
        content_list = f.readlines()
        for i in content_list:
            print(str("<item>") + i + str("</item>"))
file_read(r"C:\Users\mandy\Desktop\gd.txt")
In the output, the suffix is not where I expected it. As I am a beginner, can anyone sort this out for me?
<item>www.google.com
</item>
<item>www.bing.com
</item>
I think that when you use readlines(), the end-of-line character is also included in i.
If I understand you correctly, you want to print
<item>www.google.com</item>
Then try strip(), described at
https://www.journaldev.com/23625/python-trim-string-rstrip-lstrip-strip
print(str("<item>") + i.strip() + str("</item>"))
When you use the readlines() method, each returned line keeps the newline character ("\n") from the file at its end.
You could use the .strip() method, which strips whitespace, including newline characters, from the beginning and end of each line; that would format your output correctly.
def file_read(fname):
    with open(fname) as f:
        #Content_list is the list that contains the read lines.
        content_list = f.readlines()
        for i in content_list:
            print(str("<item>") + i.strip() + str("</item>"))
file_read(r"C:\Users\mandy\Desktop\gd.txt")
I assume you wanted to print in the following way:
<item>www.google.com</item>
readlines() leaves an extra '\n' at the end of each line. To avoid that, you can strip the string, and for printing you can use f-strings.
with open(fname) as f:
    lin = f.readlines()
    for i in lin:
        print(f"<item>{i.strip()}</item>")
Another method:
with open('stacksource') as f:
    lin = f.read().splitlines()
    for i in lin:
        print(f"<item>{i}</item>")
Here splitlines() splits the lines and gives a list
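As a rough sketch of the splitlines() approach, the helper below wraps each line in <item> tags. The function name wrap_items and the sample links are made up for illustration, and io.StringIO stands in for the real file:

```python
import io

def wrap_items(f):
    """Return each line of the file-like object f wrapped in <item> tags."""
    # splitlines() drops the line endings, so no strip() is needed here.
    return [f"<item>{line}</item>" for line in f.read().splitlines()]

# Stand-in for the links file; the contents are hypothetical.
sample = io.StringIO("www.google.com\nwww.bing.com\n")
wrapped = wrap_items(sample)
print("\n".join(wrapped))
```

Unlike strip(), splitlines() only removes line boundaries, so leading or trailing spaces inside a line would be kept.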

File reading in Python using different methods

# open file in read mode
f=open(text_file,'r')
# iterate over the file object
for line in f.read():
    print(line)
# close the file
f.close()
The content of the file is "Congratulations you have successfully opened the file"! When I try to run this code, the output comes in the following form:
c (newline) o (newline) n (newline) g.................
...... that is, each character is printed individually on a new line because I used read()! But with readline it gives the answer on a single line! Why is that?
f.read() returns one string with all the characters (the full file content).
Iterating a string iterates it character wise.
Use
for line in f: # no read()
instead to iterate line wise.
f.read() returns the whole file as one string. A for ... in loop iterates over whatever it is given; for a string, it iterates over its characters.
With readline(), it should not print the whole line either. It would read the first line of the file, then print it character by character, just like read(). Is it possible that you used readlines(), which returns the lines as a list?
One more thing: the with statement takes a "closable" object and closes it automatically at the end of the block. And you can iterate over a file object directly. So your code can be improved like this:
with open(text_file, 'r') as f:
    for i in f:
        print(i)
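The difference between these reading methods can be seen side by side in the sketch below. The two-line sample text is made up, and io.StringIO is used as a stand-in for an opened file so the snippet needs nothing on disk:

```python
import io

text = "first line\nsecond line\n"  # hypothetical file contents

# read() returns one string; iterating a string yields single characters.
chars = [c for c in io.StringIO(text).read()]

# Iterating the file object itself yields whole lines (newline included).
lines = [line for line in io.StringIO(text)]

# readline() returns only the first line, as a single string.
first = io.StringIO(text).readline()

# readlines() returns all lines as a list of strings.
all_lines = io.StringIO(text).readlines()

print(chars[:5], lines, repr(first), all_lines, sep="\n")
```

Because each line keeps its '\n', print(i) in a line loop produces a blank line between lines unless the newline is stripped or print is called with end="".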

Removing \n from a list of strings

Using this code...
def read_restaurants(file):
    file = open('restaurants_small.txt', 'r')
    contents_list = file.readlines()
    for line in contents_list:
        line.strip('\n')
    print (contents_list)
    file.close()
read_restaurants('restaurants_small.txt')
I get this result...
['Georgie Porgie\n', '87%\n', '$$$\n', 'Canadian,Pub Food\n', '\n', 'Queen St. Cafe\n', '82%\n', '$\n', 'Malaysian,Thai\n', '\n', 'Dumplings R Us\n', '71%\n', '$\n', 'Chinese\n', '\n', 'Mexican Grill\n', '85%\n', '$$\n', 'Mexican\n', '\n', 'Deep Fried Everything\n', '52%\n', '$\n', 'Pub Food\n']
I want to strip out the \n...I've read through a lot of answers on here that I thought might help, but nothing seems to work specifically with this!
I guess the for...in process needs to be stored as a new list, and I need to return that...just not sure how to do it!
A bit more of a pythonic (and, to my mind, easier to read) approach:
def read_restaurants(filename):
    with open(filename) as fh:
        return [line.rstrip() for line in fh]
Also, since no one has quite clarified this: the reason your original approach doesn't work is that line.strip() returns a modified version of line, but it doesn't alter line:
>>> line = 'hello there\n'
>>> print(repr(line))
'hello there\n'
>>> line.strip()
'hello there'
>>> print(repr(line))
'hello there\n'
So whenever you call stringVar.strip(), you need to do something with the output - build a list, like above, or store it in a variable, or something like that.
You can replace your regular for loop with a list comprehension, and you don't have to pass '\n' as an argument, since the strip() method removes leading and trailing whitespace characters by default:
contents_list = [line.strip() for line in contents_list]
You are right: you will need a new list. Also, probably you want to use rstrip() instead of strip():
def read_restaurants(file_name):
    file = open(file_name, 'r')
    contents_list = file.readlines()
    file.close()
    new_contents_list = [line.rstrip('\n') for line in contents_list]
    return new_contents_list
Then you can do the following:
print(read_restaurants('restaurant.list'))
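The answers above use strip(), rstrip() and rstrip('\n') somewhat interchangeably. The sketch below, with a made-up sample string, shows where they differ and that the original string is never modified:

```python
line = "  Pub Food \n"  # hypothetical line with extra spaces around the text

stripped = line.strip()            # removes whitespace on both sides
right_only = line.rstrip()         # removes trailing whitespace only
newline_only = line.rstrip("\n")   # removes trailing newlines, keeps spaces

print(repr(stripped), repr(right_only), repr(newline_only))
print(repr(line))  # the original string is unchanged
```

Which variant fits depends on whether leading or trailing spaces in the data are meaningful; rstrip('\n') is the most conservative choice.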

Python code to read first 14 characters, uniquefy based on them, and parse duplicates

I have a list of more than 10k strings that look like different versions of this: HN5ML6A02FL4UI_3 (14 letters or digits, then _1 to _6). Some are duplicates except for the _1 to _6 suffix.
I am trying to find a way to list these and remove the entries whose first 14 characters (the part before the _1 to _6) are duplicated.
Example of part of the list:
HN5ML6A02FL4UI_3
HN5ML6A02FL4UI_1
HN5ML6A01BDVDN_6
HN5ML6A01BDVDN_1
HN5ML6A02GVTSV_3
HN5ML6A01CUDA2_1
HN5ML6A01CUDA2_5
HN5ML6A02JPGQ9_5
HN5ML6A02JI8VU_1
HN5ML6A01AJOJU_5
I have tried versions of scripts using regular expressions, such as var n = /\d+/.exec(info)[0];, that were posted in answers to my previous question.
I also used a modified version of the code from: How can I strip the first 14 characters in an list element using python?
More recently I used this script, and I am still not getting the correct output.
import os, re
def trunclist('rhodopsins_play', 'hope4'):
    with open('rhodopsins_play','r') as f:
        newlist=[]
        trunclist=[]
        for line in f:
            if line.strip().split('_')[0] not in trunclist:
                newlist.append(line)
                trunclist.append(line.split('_')[0])
    print newlist, trunclist
    # write newlist to file, with carriage returns
    with open('hope4','w') as out:
        for line in newlist:
            out.write(line)
My inputfile.txt contains more than 10k lines like the list above, where the important part is the characters in front of the '_' (underscore). I want to output a file containing one entry, such as ABCD12356_1, for each unique prefix.
Can someone help?
Thank you for your help
Open Python and run this script, which is similar to the one above. It splits each line at the '_'. This worked on the file:
def trunclist(inputfile, outputfile):
    with open(inputfile,'r') as f:
        newlist=[]
        trunclist=[]
        for line in f:
            # keep a line only if its prefix (text before '_') is new
            if line.strip().split('_')[0] not in trunclist:
                newlist.append(line)
                trunclist.append(line.split('_')[0])
    print(newlist, trunclist)
    # write newlist to file, with carriage returns
    with open(outputfile,'w') as out:
        for line in newlist:
            out.write(line)
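Testing membership in a list gets slower as the list grows, so with more than 10k lines a set may be a better fit for tracking the prefixes already seen. The following is only a sketch of that variation, not the answer's method; the function name unique_by_prefix is hypothetical, and the sample IDs are taken from the question:

```python
def unique_by_prefix(lines):
    """Keep the first line seen for each prefix before the first underscore."""
    seen = set()   # prefixes already encountered
    kept = []      # first line for each new prefix, in input order
    for line in lines:
        entry = line.strip()
        prefix = entry.split("_")[0]
        if prefix not in seen:
            seen.add(prefix)
            kept.append(entry)
    return kept

sample = [
    "HN5ML6A02FL4UI_3\n",
    "HN5ML6A02FL4UI_1\n",
    "HN5ML6A01BDVDN_6\n",
    "HN5ML6A01BDVDN_1\n",
    "HN5ML6A02GVTSV_3\n",
]
kept = unique_by_prefix(sample)
print(kept)
```

An open file object can be passed in place of the sample list, since the function only iterates over its argument.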

How do you delete \n from a line?

I am reading lines from a file and putting them into a list. However, when I read the lines, they read in with a newline (\n). I have tried to remove it with str.strip(), str.rstrip(), str.strip("\n"), str.rstrip("\n"), str.strip("\\n"), and str.rstrip("\\n"), but
none of them have done what I want them to.
Here is the code.
lines=[]
with open(v) as x:
    for line in x:
        if "\n" in line:
            lines.append(line)
for line in lines:
    line.strip()
    if '\n' in line:
        print "I'm a stupid computer."
print lines
This yields precisely this output.
I'm a stupid computer.
I'm a stupid computer.
I'm a stupid computer.
I'm a stupid computer.
I'm a stupid computer.
I'm a stupid computer.
I'm a stupid computer.
I'm a stupid computer.
I'm a stupid computer.
I'm a stupid computer.
['6\n', '1 2\n', '2 3\n', '3 1\n', '10 11\n', '100 10\n', '11 100\n', '1 100\n', '2 3\n', '3 2\n']
I'm not sure what I'm missing.
line.strip() creates a copy of the line without the leading/trailing whitespace. You are not doing anything with the copy, you need to assign it back to the line. You want:
line = line.strip()
You could also just use:
with open(v) as fin:
    lines = [line.strip() for line in fin.readlines()]
You probably don't want to add only the lines that contain a newline. Maybe what you do want is to omit those lines that don't contain anything else:
with open(v) as fin:
    lines = [line.strip() for line in fin.readlines() if line.strip()]
String objects are immutable in Python. line.strip() doesn't change line; it returns a stripped copy. Use line = line.strip() instead, or better yet for your example, just append the stripped version to the list in the first place:
        if "\n" in line:
            lines.append(line.strip())
You need to assign the output of strip() back to the variable:
line = line.strip()
\n is one character, so you can slice it off.
line = line[0:len(line)-1]
or, per @Henry's comment,
line = line[:-1]
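One caveat with slicing: it removes the last character whatever it is, and the last line of a file may have no trailing newline. The sketch below illustrates this with a hypothetical helper, drop_newline, that only removes a newline when one is present:

```python
def drop_newline(line):
    """Remove one trailing newline if present; otherwise return line unchanged."""
    return line[:-1] if line.endswith("\n") else line

last_line = "3 2"          # e.g. the final line of a file with no ending newline
sliced = last_line[:-1]    # unconditional slicing loses the digit 2
safe = drop_newline(last_line)

print(repr(sliced), repr(safe), repr(drop_newline("3 2\n")))
```

line.rstrip("\n") handles the same case and is usually the simpler choice.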
