Removing \n from a list of strings - python-3.x

Using this code...
def read_restaurants(file):
    file = open('restaurants_small.txt', 'r')
    contents_list = file.readlines()
    for line in contents_list:
        line.strip('\n')
    print(contents_list)
    file.close()
read_restaurants('restaurants_small.txt')
I get this result...
['Georgie Porgie\n', '87%\n', '$$$\n', 'Canadian,Pub Food\n', '\n', 'Queen St. Cafe\n', '82%\n', '$\n', 'Malaysian,Thai\n', '\n', 'Dumplings R Us\n', '71%\n', '$\n', 'Chinese\n', '\n', 'Mexican Grill\n', '85%\n', '$$\n', 'Mexican\n', '\n', 'Deep Fried Everything\n', '52%\n', '$\n', 'Pub Food\n']
I want to strip out the \n. I've read through a lot of answers on here that I thought might help, but none of them seem to work for this case.
I guess the results of the for...in loop need to be stored in a new list that I then return; I'm just not sure how to do it.

A bit more of a pythonic (and, to my mind, easier to read) approach:
def read_restaurants(filename):
    with open(filename) as fh:
        return [line.rstrip() for line in fh]
Also, since no one has quite clarified this: the reason your original approach doesn't work is that line.strip() returns a new, stripped copy of line; it doesn't alter line itself (Python strings are immutable):
>>> line = 'hello there\n'
>>> print(repr(line))
'hello there\n'
>>> line.strip()
'hello there'
>>> print(repr(line))
'hello there\n'
So whenever you call stringVar.strip(), you need to do something with the return value: build a list, as above, or assign it to a variable.
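To make this concrete for the loop in the question, here is a minimal sketch that keeps the explicit for loop but collects each stripped line in a new list and returns it. The function name strip_newlines and the sample data are illustrative only. It uses rstrip('\n') so that only the line ending is removed.

```python
def strip_newlines(lines):
    """Return a new list with the trailing newline removed from each line."""
    stripped = []  # the original strings are never modified, so build a new list
    for line in lines:
        stripped.append(line.rstrip('\n'))  # keep the value that rstrip() returns
    return stripped


contents_list = ['Georgie Porgie\n', '87%\n', '$$$\n', 'Canadian,Pub Food\n', '\n']
print(strip_newlines(contents_list))
# ['Georgie Porgie', '87%', '$$$', 'Canadian,Pub Food', '']
```

The blank separator lines in the file come back as empty strings ('').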

You can replace your regular for loop with a list comprehension. You also don't have to pass '\n' as an argument, because the strip() method removes leading and trailing whitespace characters by default:
contents_list = [line.strip() for line in contents_list]
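With no argument, strip() trims whitespace at both ends, so it behaves differently from rstrip('\n') when a line has leading or trailing spaces. Here is a quick comparison on a made-up padded line:

```python
line = '  Queen St. Cafe  \n'  # hypothetical line with padding spaces

print(repr(line.strip()))       # 'Queen St. Cafe'      (both ends, all whitespace)
print(repr(line.rstrip()))      # '  Queen St. Cafe'    (right end, all whitespace)
print(repr(line.rstrip('\n')))  # '  Queen St. Cafe  '  (right end, newlines only)
```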

You are right: you will need a new list. Also, you probably want to use rstrip('\n') instead of strip(), so that only the trailing newline is removed:
def read_restaurants(file_name):
    file = open(file_name, 'r')
    contents_list = file.readlines()
    file.close()
    new_contents_list = [line.rstrip('\n') for line in contents_list]
    return new_contents_list
Then you can do the following:
print(read_restaurants('restaurant.list'))
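Another option, offered here as a sketch rather than taken from the answers above, is to read the whole file and let str.splitlines() drop the line endings. The name read_lines is hypothetical, and io.StringIO stands in for an open file so the example runs without restaurants_small.txt.

```python
import io


def read_lines(fh):
    """Return the lines of an open text file without their line endings."""
    # splitlines() splits on line boundaries and does not keep the '\n'
    return fh.read().splitlines()


sample = io.StringIO('Georgie Porgie\n87%\n$$$\n\nQueen St. Cafe\n')
print(read_lines(sample))
# ['Georgie Porgie', '87%', '$$$', '', 'Queen St. Cafe']
```

With a real file, call it inside a with block, e.g. `with open('restaurants_small.txt') as fh: print(read_lines(fh))`.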

Related

Python isalpha giving wrong results

with open("text.txt") as f:
    for line in f:
        line.isalpha()
False
File has only one line and contents are:
"abc"
I think this is because there is a space after the "abc" content
As far as I know, lines in a file are usually terminated by a newline character \n, which is why isalpha() returns False.
As others have pointed out, it must be caused by some other characters in the file: most likely the "\n" line terminator, or possibly other non-letter characters.
In brief, you want to remove those characters. Try:
line.strip().isalpha()
Full explanation below.
Load data:
with open("text.txt") as f:
    for line in f:
        line.isalpha()
The output of line is:
>>> line
'abc\n'
And of course the result of isalpha() is false:
>>> print(line.isalpha())
False
However, removing the \n you obtain the correct result:
>>> line.strip()
'abc'
>>> line.strip().isalpha()
True
(To troubleshoot this, evaluate line directly in the interactive interpreter instead of printing it, or print repr(line); otherwise you won't see special characters such as '\n'.)
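Here is a small self-contained check of the explanation above. io.StringIO replaces text.txt, and the extra sample lines are made up. They show that strip() handles a trailing space as well as the newline, but a space in the middle of the line still makes isalpha() return False:

```python
import io


def check_lines(fh):
    """Return isalpha() for each line after removing surrounding whitespace."""
    return [line.strip().isalpha() for line in fh]


print([line.isalpha() for line in io.StringIO("abc\n")])  # [False]: '\n' is not a letter
print(check_lines(io.StringIO("abc\nabc \nab c\n")))      # [True, True, False]
```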

How to modify and print list items in python?

I am a beginner in Python, working on a small piece of logic. I have a text file with HTML links in it, one per line. I have to read each line of the file and print each link with the same prefix and suffix,
so that the output looks like this:
<item>LINK1</item>
<item>LINK2</item>
<item>LINK3</item>
and so on.
I have tried this code, but something is wrong with my approach:
def file_read(fname):
    with open(fname) as f:
        #Content_list is the list that contains the read lines.
        content_list = f.readlines()
        for i in content_list:
            print(str("<item>") + i + str("</item>"))
file_read(r"C:\Users\mandy\Desktop\gd.txt")
In the output, the suffix is not where I expected it. As I am a beginner, can anyone sort this out for me?
<item>www.google.com
</item>
<item>www.bing.com
</item>
I think when you use .readlines(), the end-of-line character also ends up in i.
If I understand you correctly and you want to print
<item>www.google.com</item>
then try the following (strip(), rstrip() and lstrip() are explained at https://www.journaldev.com/23625/python-trim-string-rstrip-lstrip-strip):
print(str("<item>") + i.strip() + str("</item>"))
When you use the readlines() method, each line it returns still ends with the newline character from your file ("\n").
You can use the .strip() method, which removes whitespace (spaces and newline characters) from the beginning and end of each line. That gives you the format you want:
def file_read(fname):
    with open(fname) as f:
        #Content_list is the list that contains the read lines.
        content_list = f.readlines()
        for i in content_list:
            print(str("<item>") + i.strip() + str("</item>"))
file_read(r"C:\Users\mandy\Desktop\gd.txt")
I assume you wanted to print in the following way:
<item>www.google.com</item>
When you use readlines(), each line has an extra '\n' at the end. To avoid that, you can strip the string, and for printing you can use f-strings:
with open(fname) as f:
    lin = f.readlines()
    for i in lin:
        print(f"<item>{i.strip()}</item>")
Another method:
with open('stacksource') as f:
    lin = f.read().splitlines()
for i in lin:
    print(f"<item>{i}</item>")
Here splitlines() splits the file contents into lines and returns a list without the line-ending characters.
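This minimal sketch puts the answers together but returns the wrapped links instead of printing them, which makes it easier to test. The function name wrap_links, the tag parameter and the skipping of blank lines are my own additions (without the skip, a trailing empty line would produce <item></item>). io.StringIO stands in for the gd.txt file.

```python
import io


def wrap_links(fh, tag="item"):
    """Return each non-blank line of fh wrapped in <tag>...</tag>."""
    wrapped = []
    for line in fh:
        link = line.strip()  # drop the '\n' that iterating over a file keeps
        if link:  # skip blank lines, e.g. an empty last line in the file
            wrapped.append(f"<{tag}>{link}</{tag}>")
    return wrapped


sample = io.StringIO("www.google.com\nwww.bing.com\n\n")  # stands in for open(fname)
for entry in wrap_links(sample):
    print(entry)
```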

Transform a "multiple line" - function into a "one line" - function

I am trying to transform a function that consists of multiple lines into a function that consists of only one line.
The multiple-line function looks like this:
text = "Here is a tiny example."
def add_text_to_list(text):
    new_list = []
    split_text = text.splitlines() #split text into lines and change type from "str" to "list"
    for line in split_text:
        cleared_line = line.strip() #each line of split_text is getting stripped
        if cleared_line:
            new_list.append(cleared_line)
    return new_list
I fully understand how this function works and what it does, yet I have trouble turning it into a valid one-liner. I also know that I need to come up with a list comprehension. What I'm trying to do is this (in chronological order):
1. split text into lines with text.splitlines()
2. strip each line with line.strip()
3. return the modified text after both of these steps
The best I came up with:
def one_line_version(text):
    return [line.strip() for line in text.splitlines()] #step 1 is missing
I appreciate any kind of help.
Edit: Thanks #Tenfrow!
You forgot the if filter in the list comprehension:
def add_text_to_list(text):
    return [line.strip() for line in text.splitlines() if line.strip()]
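The version above calls line.strip() twice per line. One way around that, offered as a sketch and not taken from the original answer, is to strip inside a generator expression and filter in the outer comprehension. On Python 3.8+ an assignment expression does the same thing. The function names are hypothetical.

```python
def one_line_nested(text):
    # inner generator strips each line once; outer comprehension drops empty strings
    return [s for s in (line.strip() for line in text.splitlines()) if s]


def one_line_walrus(text):
    # requires Python 3.8+: ':=' binds the stripped line so it can be tested and kept
    return [s for line in text.splitlines() if (s := line.strip())]


sample = "  first  \n\n second\n   \nthird"
print(one_line_nested(sample))  # ['first', 'second', 'third']
print(one_line_walrus(sample))  # ['first', 'second', 'third']
```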

Print the complete line that includes a search word (the end of line is a dot, not a line feed)

I have a long text (winter's tale). I want to search for the word 'Luzifer' and then print the complete line that includes the word 'Luzifer'. By complete line I mean everything between two dots.
My script prints 'Luzifer' and all the following words up to the end-of-line dot, but I want the full line.
For example, the text line is:
'Today Luzifer has a bad day. And he is ill'
My script prints: 'Luzifer has a bad day.'
But I need the complete line, including 'Today'.
Is there a function or a way to read backwards?
Here is my script:
#!/usr/bin/python3.6
# coding: utf-8
import re
def suchen(regAusdruck, textdatei):
    f = open(textdatei, 'r', encoding='utf-8')
    rfctext = f.read()
    f.close()
    return re.findall(regAusdruck, rfctext)
pattern1 = r'\bLuzifer\b[^.;:!?]{2,}'
print(suchen(pattern1, "tale.txt"))
One of the most straightforward ways of handling this is to read in your entire text (hopefully it is not too big), split on '.', and then return the strings that contain your search word. For good measure, I think it will be useful to replace the newline characters with a space so that you don't have any strings broken into multiple lines.
def suchen(regAusdruck, textdatei):
    with open(textdatei, 'r', encoding='utf-8') as f:
        entire_text = f.read()
    entire_text = entire_text.replace('\n', ' ') # replace newlines with space
    sentences = entire_text.split('.')
    return [sentence for sentence in sentences if regAusdruck in sentence]
    # Alternatively...
    # return list(filter(lambda x: regAusdruck in x, sentences))
print(suchen('Luzifer', "tale.txt"))
If you really need to use a regular expression (which may be the case for more complicated searches), only the return statement needs to change.
def suchen(regAusdruck, textdatei):
    with open(textdatei, 'r', encoding='utf-8') as f:
        entire_text = f.read()
    entire_text = entire_text.replace('\n', ' ') # replace newlines with space
    sentences = entire_text.split('.')
    # We assume you passed in a compiled regular expression object.
    return [sentence for sentence in sentences if regAusdruck.search(sentence)]
    # Alternatively...
    # return list(filter(regAusdruck.search, sentences))
import re
print(suchen(re.compile(r'\bluzifer\b', flags=re.IGNORECASE), "tale.txt"))
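For the "read backwards" part of the question, a single regular expression can also reach back to the previous dot if the pattern allows any non-dot characters before the word. This sketch assumes, as the question states, that a sentence ends only at '.'. The function name find_sentences is hypothetical.

```python
import re


def find_sentences(word, text):
    """Return each dot-terminated sentence in text that contains word."""
    # [^.]* on both sides lets the match run back to the previous dot and
    # forward to the next one; re.escape() treats word as literal text.
    pattern = r'[^.]*\b' + re.escape(word) + r'\b[^.]*\.?'
    # " ".join(s.split()) collapses newlines and runs of spaces.
    return [" ".join(s.split()) for s in re.findall(pattern, text)]


print(find_sentences('Luzifer', 'Today Luzifer has a bad day. And he is ill'))
# ['Today Luzifer has a bad day.']
```

[^.] also matches newlines, so the text does not need its newlines replaced beforehand. The split/join step only tidies the whitespace in each result.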

How to remove '#' comments from a string?

The problem:
Implement a Python function called stripComments(code) where code is a parameter that takes a string containing the Python code. The function stripComments() returns the code with all comments removed.
I have:
def stripComments(code):
    code = str(code)
    for line in code:
        comments = [word[1:] for word in code.split() if word[0] == '#']
        del(comments)
stripComments(code)
I'm not sure how to tell Python to search through each line of the string and, when it finds a hash sign (#), delete the rest of the line.
Please help. :(
You can achieve this with the re.sub function.
import re
def stripComments(code):
    code = str(code)
    return re.sub(r'(?m)^ *#.*\n?', '', code)
print(stripComments("""#foo bar
bar foo
# buz"""))
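(?m) enables multiline mode, so ^ asserts that we are at the start of a line. <space>*# matches the character # at the start of the line, with or without preceding spaces. .* matches all the following characters except line breaks, and \n? also consumes the line break if there is one. Replacing the matched text with an empty string gives you the string with comment lines deleted. Note that this only removes whole-line comments; a comment that follows code on the same line is kept.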
def remove_comments(filename1, filename2):
    """ Remove all comments beginning with # from filename1 and write
    the result to filename2
    """
    with open(filename1, 'r') as f:
        lines = f.readlines()
    with open(filename2, 'w') as f:
        for line in lines:
            # Keep the Shebang line
            if line[0:2] == "#!":
                f.write(line)
            # Also keep existing empty lines
            elif not line.strip():
                f.write(line)
            # But remove comments from other lines
            # (note: this also cuts at a '#' inside a string literal)
            else:
                line = line.split('#')
                stripped_string = line[0].rstrip()
                # Write the line only if the comment was after the code.
                # Discard lines that only contain comments.
                if stripped_string:
                    f.write(stripped_string)
                    f.write('\n')
For my future reference.
def remove_comments(lines: list[str]) -> list[str]:  # list[str] needs Python 3.9+
    new_lines = []
    for line in lines:
        if line.startswith("#"): # Deal with comment as the first character
            continue
        line = line.split(" #")[0]
        if line.strip() != "":
            new_lines.append(line)
    return new_lines
print(remove_comments("Hello #World!\n\nI have a question # that #".split('\n')))
# Output: ['Hello', 'I have a question']
This implementation has the benefit of not requiring the re module and of being easy to understand. It also removes pre-existing blank lines, which is useful for my use case.
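The split-based answers above also cut at a # that sits inside a string literal, such as "#not a comment". A sketch using the standard library tokenize module avoids that, because tokenize knows Python's syntax. It assumes the input is tokenizable Python: an unclosed bracket or string makes tokenize raise an error. It also drops a shebang line, since that is a comment too. strip_comments is a hypothetical name.

```python
import io
import tokenize


def strip_comments(code):
    """Remove '#' comments from Python source, leaving '#' inside strings alone."""
    # Record where each comment starts: line number (1-based) -> column.
    comment_starts = {}
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if tok.type == tokenize.COMMENT:
            row, col = tok.start
            comment_starts[row] = col

    result = []
    # Split on '\n' so line numbers match the ones tokenize reports.
    for number, line in enumerate(code.split('\n'), start=1):
        if number in comment_starts:
            line = line[:comment_starts[number]].rstrip()
            if not line:
                continue  # the line held only a comment: drop it entirely
        result.append(line)
    return '\n'.join(result)


source = 'x = "#not a comment"  # a comment\n# whole line\ny = 2\n'
print(strip_comments(source))
```

Unlike the previous answer, this keeps pre-existing blank lines and removes only comments.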

Resources