Transform a "multiple line" - function into a "one line" - function - python-3.x

I'm trying to transform a function that consists of multiple lines into a function that consists of only one line.
The multi-line function looks like this:
text = "Here is a tiny example."
def add_text_to_list(text):
    new_list = []
    split_text = text.splitlines()  # split the text into lines; returns a list of str
    for line in split_text:
        cleared_line = line.strip()  # strip each line of split_text
        if cleared_line:  # skip lines that are empty after stripping
            new_list.append(cleared_line)
    return new_list
I 100% understand how this function works and what it does, yet I have trouble turning it into a valid "one-liner". I also know that I need a list comprehension. What I'm trying to do is this (in order):
1. split the text into lines with text.splitlines()
2. strip each of those lines with line.strip()
3. return the modified text after both of these steps
The best I came up with:
def one_line_version(text):
    return [line.strip() for line in text.splitlines()]  # step 1 is missing
I appreciate any kind of help.
Edit: Thanks @Tenfrow!

You forgot the if clause in the list comprehension:
def add_text_to_list(text):
    return [line.strip() for line in text.splitlines() if line.strip()]
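Calling strip() twice per line is harmless, but on Python 3.8 or later (the question only says python-3.x, so this is an assumption) an assignment expression lets you strip each line once. A filter-based variant that works on any Python 3 is included for comparison; both function names are only illustrative:

```python
def add_text_to_list(text):
    # the walrus (:=) binds the stripped line so it can be both tested and kept
    return [stripped for line in text.splitlines() if (stripped := line.strip())]


def add_text_to_list_filter(text):
    # filter(None, ...) drops falsy items, i.e. the empty strings left by blank lines
    return list(filter(None, map(str.strip, text.splitlines())))


sample = "  first line \n\n   \nsecond line\n"
print(add_text_to_list(sample))  # ['first line', 'second line']
```

Both return the same list as the multi-line version; which one reads better is mostly a matter of taste.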

Related

Problem reading text, putting the words into a list, and sorting them properly

Open the file romeo.txt and read it line by line. For each line, split the line into a list of words using the split() method. The program should build a list of words. For each word on each line check to see if the word is already in the list and if not append it to the list. When the program completes, sort and print the resulting words in alphabetical order.
That is the assignment. My problem is that I can't write correct code that gathers the right data; my code always gives me 4 separate lists, one for each row!
This is my code:
fname = input("Enter file name: ")
fh = open(fname)
lst = list()
for line in fh:
    line = line.rstrip()
    line = line.split()
    if line in lst:
        print(true)
    else:
        lst.append(line)
print(lst)
The text (romeo.txt) is below; copy and paste it into a text editor:
But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief
You are not checking the presence of individual words in the list, but rather the presence of the entire list of words in that line.
With some modifications, you can achieve what you are trying to do this way:
fname = input("Enter file name: ")
fh = open(fname)
lst = list()
for line in fh:
    line = line.rstrip()
    words = line.split()
    for word in words:
        if word not in lst:
            lst.append(word)
print(lst)
However, a few things I would like to point out looking at your code:
Why are you using rstrip() instead of strip()?
It is better to write lst = [] than lst = list(): it is shorter, faster, and more Pythonic. A more descriptive name than lst would also help (but avoid list, which would shadow the built-in).
You may want to remove punctuation attached to words, e.g. commas, periods and colons, which split() does not remove.
If you want a loop body to not do anything, use pass. Why are you printing true? Also, in Python, it's True and not true.
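For completeness, the assignment also asks for the words to be sorted, which neither version above does. Here is a sketch that collects the words in a set (swapping in a set instead of the assignment's list membership check) and strips punctuation with string.punctuation. unique_sorted_words is a hypothetical name; it accepts any iterable of lines, such as the open romeo.txt file:

```python
import string
from typing import Iterable, List


def unique_sorted_words(lines: Iterable[str]) -> List[str]:
    words = set()  # a set ignores duplicates, so no membership test is needed
    for line in lines:
        for word in line.split():
            # drop punctuation attached to either end, e.g. "sun," -> "sun"
            word = word.strip(string.punctuation)
            if word:
                words.add(word)
    return sorted(words)


# Usage with the real file (assumes romeo.txt is in the working directory):
# with open("romeo.txt") as fh:
#     print(unique_sorted_words(fh))
```

Note that sorted() is case-sensitive, so capitalized words such as Arise and Who sort before the lowercase ones; pass key=str.lower to sorted() if you want a case-insensitive order.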

How to split strings from .txt file into a list, sorted from A-Z without duplicates?

For instance, the .txt file contains 2 lines of comma-separated names:
John, George, Tom
Mark, James, Tom,
Output should be:
[George, James, John, Mark, Tom]
The following will create the list and store each item as a string.
def test(path):
    filename = path
    with open(filename) as f:
        f = f.read()
    f_list = f.split('\n')
    # drop empty lines (calling remove() while iterating over the same list skips items)
    f_list = [i for i in f_list if i != '']
    res1 = []
    for i in f_list:
        res1.append(i.split(', '))
    res2 = []
    for i in res1:
        res2 += i
    res3 = [i.strip(',') for i in res2]
    # drop duplicates with a set rather than remove()-ing while iterating
    res3 = list(set(res3))
    res3.sort()
    return res3
print(test('location/of/file.txt'))
Output:
['George', 'James', 'John', 'Mark', 'Tom']
Your file opening is fine, although the 'r' is redundant since that's the default. You claim it's not, but it is. Read the documentation.
You have not described what task is so I have no idea what's going on there. I will assume that it is correct.
Rather than populating a list and doing a membership test on every iteration - which is O(n^2) in time - can you think of a different data structure that guarantees uniqueness? Google will be your friend here. Once you discover this data structure, you will not have to perform membership checks at all. You seem to be struggling with this concept; the answer is a set.
The input data format is not rigorously defined. Separators may be commas or commas with trailing spaces, and may appear (or not) at the end of the line. Consider making an appropriate regular expression and using its splitting feature to split individual lines, though normal splitting and stripping may be easier to start.
In the following example code, I've:
ignored task since you've said that that's fine;
separated actual parsing of file content from parsing of in-memory content to demonstrate the function without a file;
used a set comprehension to store unique results of all split lines; and
passed a generator expression that drops empty strings to sorted().
from io import StringIO
from typing import TextIO, List
def parse(f: TextIO) -> List[str]:
    words = {
        word.strip()
        for line in f
        for word in line.split(',')
    }
    return sorted(
        word for word in words if word != ''
    )
def parse_file(filename: str) -> List[str]:
    with open(filename) as f:
        return parse(f)
def test():
    f = StringIO('John, George , Tom\nMark, James, Tom, ')
    words = parse(f)
    assert words == [
        'George', 'James', 'John', 'Mark', 'Tom',
    ]
    f = StringIO(' Han Solo, Boba Fet \n')
    words = parse(f)
    assert words == [
        'Boba Fet', 'Han Solo',
    ]
if __name__ == '__main__':
    test()
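The answer above suggests a regular expression as an alternative to plain splitting and stripping. A sketch of that idea, assuming the only separator is a comma with optional whitespace on either side (parse_lines and SEPARATOR are illustrative names):

```python
import re
from typing import Iterable, List

# a comma plus any surrounding whitespace counts as one separator
SEPARATOR = re.compile(r'\s*,\s*')


def parse_lines(lines: Iterable[str]) -> List[str]:
    names = set()
    for line in lines:
        # strip the line first so leading/trailing spaces and the newline don't survive the split
        names.update(SEPARATOR.split(line.strip()))
    names.discard('')  # a trailing comma (or an empty line) yields an empty field
    return sorted(names)
```

It can be called with an open file or, as in the tests of the answer above, with an io.StringIO.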
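I came up with a very simple solution, in case anyone needs it (x is an open file object):
def read_sorted_unique(x):
    # treat commas as whitespace so that "Tom," and "Tom" compare equal
    lines = x.read().replace(',', ' ').split()
    lines.sort()
    new_list = []
    for word in lines:
        if word not in new_list:
            new_list.append(word)
    return new_list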
with open("text.txt", "r") as fl:
    names = set()
    for line in fl:
        # split on commas and strip the surrounding spaces and newline from each name
        names.update(name.strip() for name in line.split(","))
    names.discard("")  # trailing commas leave empty strings
    print(sorted(names))
I think that you missed a comma after Jim in the first line.
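You can avoid a loop by using the split() method:
content=file.read()
my_list=content.split(",")
To remove duplicates from your list, you can convert it to a set:
my_list=list(set(my_list))
Then you can sort it with sorted().
So the final code:
with open("file.txt", "r") as file:
    content = file.read()
# turn newlines into separators as well, then drop spaces (note: this also removes spaces inside names)
my_list = content.replace("\n", ",").replace(" ", "").split(",")
# a set removes duplicates; subtracting {""} drops the empty strings left by trailing commas
result = sorted(set(my_list) - {""})
You can also pass a key function to sorted() (for example key=str.lower) if you need a different ordering.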

Python: Reading line with 'readline()' function and appending to a list

My file contains these numbers, one per line (shown here as a list):
charge_account = ['4654145', '9658115', '5658845', '5658045', '6181531', '2134874', '5964554']
I am reading the file with a function, appending each line to a list and then returning the list:
import os
os.system('cls')
def fileReader():
    contentList = []
    with open('charge_accounts.txt','r') as f:
        line = f.readline().rstrip('\n')
        while line !="":
            line = f.readline().rstrip(' \n')
            contentList.append(line)
    # print(contentList)
    # print(len(contentList))
    #contentList = contentList[:-1]
    print(contentList)
    return contentList
My question is: when I read all the file content and append it to my list, I get an extra blank string at the end of the list.
output:
['4654145', '9658115', '5658845', '5658045', '6181531', '2134874', '5964554', '']
I have worked around it with slicing (the commented-out line), but I still haven't figured out why I get the '' at the end of the list. I tried filtering it out, but nothing happens. I have checked whether there is an extra line at the end of the file, so what am I doing wrong?
There are a couple of things. You are reading the file line by line in the while loop. This means that after the last line is read, the while condition is still true, so you read one more line (which is empty) and still add it to your list.
But you don't need a while loop: use lines = f.readlines(). It reads the whole file into a list, and you almost have the list you are aiming for. Almost, because you need to strip each element:
def fileReader():
    with open('charge_accounts.txt','r') as f:
        lines = f.readlines()
    return [line.strip() for line in lines]
print(fileReader())
while line != "":
    contentList.append(line)
    line = f.readline().rstrip(' \n')
print(contentList)
I realized I had to append the line from the priming readline() that I did before the loop started: contentList.append(line) had to be the first statement in the while loop. This removes the blank entry at the end of the list, and in hindsight I realize my original loop skipped the first value read by readline().
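The fix described above (append first, then read the next line) can be sketched as a self-contained function. The name read_accounts and the use of io.StringIO in place of charge_accounts.txt are assumptions for illustration. There is one more subtlety in the original loop: because it compares the stripped line with "", a blank line in the middle of the file would also end the loop. readline() returns '' only at end of file (a blank line comes back as '\n'), so this version tests the raw line before stripping it:

```python
from io import StringIO
from typing import List, TextIO


def read_accounts(f: TextIO) -> List[str]:
    content_list = []
    line = f.readline()  # priming read
    # '' means end of file; a blank line is '\n', so test before stripping
    while line != '':
        stripped = line.rstrip('\n')
        if stripped:  # skip blank lines instead of stopping on them
            content_list.append(stripped)
        line = f.readline()  # read last, so the loop test sees the new line
    return content_list


print(read_accounts(StringIO('4654145\n9658115\n\n5658845\n')))
```

In practice, iterating the file directly (for line in f) avoids the priming read altogether.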

python: How to read a file and store each line using map function?

I'm trying to rewrite a program I wrote so that it has no for loops.
The original code reads a file with thousands of lines structured like this:
Ex. 2 lines of a file:
As you can see, the first line starts with LPPD;LEMD and the second line starts with DAAE;LFML. I'm only interested in the first and second elements of each line.
The original code I wrote is:
# Libraries
import sys
from collections import Counter
import collections
from itertools import chain
from collections import defaultdict
import time
# START
# #time=0
start = time.time()
# Defining default program argument
if len(sys.argv)==1:
    fileName = "file.txt"
else:
    fileName = sys.argv[1]
takeOffAirport = []
landingAirport = []
# Reading file
lines = 0 # Counter for file lines
try:
    with open(fileName) as file:
        for line in file:
            words = line.split(';')
            # Relevant data, item1 and item2 from each file line
            origin = words[0]
            destination = words[1]
            # Populating lists
            landingAirport.append(destination)
            takeOffAirport.append(origin)
            lines += 1
except IOError:
    print ("\n\033[0;31mIoError: could not open the file:\033[00m %s" %fileName)
airports_dict = defaultdict(list)
# Merge lists into a dictionary key:value
for key, value in chain(Counter(takeOffAirport).items(),
                        Counter(landingAirport).items()):
    # 'AIRPORT_NAME':[num_takeOffs, num_landings]
    airports_dict[key].append(value)
# Sum key values and add it as another value
for key, value in airports_dict.items():
    # 'AIRPORT_NAME':[num_totalMovements, [num_takeOffs, num_landings]]
    airports_dict[key] = [sum(value), value]
# Sort dictionary by the top 10 total movements
airports_dict = sorted(airports_dict.items(),
                       key=lambda kv: kv[1], reverse=True)[:10]
airports_dict = collections.OrderedDict(airports_dict)
# Print results
print("\nAIRPORT"+ "\t\t#TOTAL_MOVEMENTS"+ "\t#TAKEOFFS"+ "\t#LANDINGS")
for k in airports_dict:
    print(k,"\t\t", airports_dict[k][0],
          "\t\t\t", airports_dict[k][1][0],  # takeoffs
          "\t\t", airports_dict[k][1][1])    # landings
# #time=1
end = time.time()- start
print("\nAlgorithm execution time: %0.5f" % end)
print("Total number of lines read in the file: %u\n" % lines)
airports_dict.clear()
takeOffAirport.clear()
landingAirport.clear()
My goal is to simplify the program using map, reduce and filter. So far I have sorted out the creation of the two independent lists, one with the first element of each file line and another with the second element, by using:
# Creates two independent lists with the first and second element from each line
takeOff_Airport = list(map(lambda sub: (sub[0].split(';')[0]), lines))
landing_Airport = list(map(lambda sub: (sub[0].split(';')[1]), lines))
I was hoping to find a way to open the file through a map() function and achieve exactly the same result as the original code, so that I could pass each list to the maps defined above (takeOff_Airport and landing_Airport).
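Picking up the goal of replacing the loops with map, here is a sketch that extracts both lists from any iterable of lines (such as the open file) and then counts movements with Counter arithmetic instead of the defaultdict/chain merge. The function names and the sample lines are invented, since the question's real example lines are not shown; the only assumption about the data is that each line is semicolon-separated with origin and destination as the first two fields:

```python
from collections import Counter
from io import StringIO
from operator import itemgetter, methodcaller
from typing import Iterable, List, Tuple


def airport_lists(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    # split every line on ';' and keep only the first two fields
    pairs = map(itemgetter(0, 1), map(methodcaller('split', ';'), lines))
    # zip(*...) transposes [(origin, destination), ...] into two tuples;
    # note that it raises ValueError on empty input (nothing to unpack)
    origins, destinations = zip(*pairs)
    return list(origins), list(destinations)


def top_airports(take_offs: List[str], landings: List[str], n: int = 10):
    take_off_counts = Counter(take_offs)
    landing_counts = Counter(landings)
    totals = take_off_counts + landing_counts  # adds the counts key by key
    # most_common() orders by total movements, highest first
    return list(map(
        lambda item: (item[0], item[1],
                      take_off_counts[item[0]], landing_counts[item[0]]),
        totals.most_common(n)))


# invented sample data: airport, airport, then an ignored third field
sample = StringIO('LPPD;LEMD;x\nDAAE;LFML;y\nLEMD;LPPD;z\n')
take_offs, landings = airport_lists(sample)
print(top_airports(take_offs, landings))
```

Each result tuple is (airport, total, takeoffs, landings). Because a Counter returns 0 for a missing key, an airport that appears only as an origin or only as a destination still gets both columns, whereas with the defaultdict(list) version such an airport ends up with a one-element list.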
Suppose we have a file like this:
line 1
line 2
line 3
line 4
and we do this:
open(file_name).read().split('\n')
We get:
['line 1', 'line 2', 'line 3', 'line 4', '']
Is this what you wanted?
Edit 1
I feel this is somewhat redundant, but since map applies a function to each element of an iterable, we have to put our file name in a list, and of course define our function:
def open_read(file_name):
    return open(file_name).read().split('\n')
print(list(map(open_read, ['test.txt'])))
This gets us
>>> [['line 1', 'line 2', 'line 3', 'line 4', '']]
So first off, calling split('\n') on each line is silly; the line is guaranteed to have at most one newline, at the end, and nothing after it, so you'd end up with a bunch of ['all of line', ''] lists. To avoid the empty string, just strip the newline. This won't leave each line wrapped in a list, but frankly, I can't imagine why you'd want a list of one-element lists containing a single string each.
So I'm just going to demonstrate using map+strip to get rid of the newlines, using operator.methodcaller to perform the strip on each line:
from operator import methodcaller
def readFile(fileName):
    try:
        with open(fileName) as file:
            return list(map(methodcaller('strip', '\n'), file))
    except IOError:
        print ("\n\033[0;31mIoError: could not open the file:\033[00m %s" %fileName)
Sadly, since your file is context managed (a good thing, just inconvenient here), you do have to listify the result; map is lazy, and if you didn't listify before the return, the with statement would close the file, and pulling data from the map object would die with an exception.
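A small demonstration of that failure, using io.StringIO in place of a real file so it runs without anything on disk (read_lazily is a hypothetical name). Reading from a closed StringIO raises ValueError, just as reading from a closed real file does:

```python
from io import StringIO
from operator import methodcaller


def read_lazily(f):
    # returns a lazy map object; the with block closes f before anything is read
    with f:
        return map(methodcaller('strip', '\n'), f)


lazy = read_lazily(StringIO('line 1\nline 2\n'))
try:
    list(lazy)  # the first line is only pulled now, after the close
except ValueError as exc:
    print('failed:', exc)
```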
To get around that, you can implement it as a trivial generator function, so the generator context keeps the file open until the generator is exhausted (or explicitly closed, or garbage collected):
def readFile(fileName):
    try:
        with open(fileName) as file:
            yield from map(methodcaller('strip', '\n'), file)
    except IOError:
        print ("\n\033[0;31mIoError: could not open the file:\033[00m %s" %fileName)
yield from will introduce a tiny amount of overhead over directly iterating the map, but not much, and now you don't have to slurp the whole file if you don't want to; the caller can just iterate the result and get a stripped line on each iteration without pulling the whole file into memory. It does have the slight weakness that opening the file will be done lazily, so you won't see the exception (if there is any) until you begin iterating. This can be worked around, but it's not worth the trouble if you don't really need it.
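One way to do that workaround, sketched under the assumption that you want a missing file reported at call time while still reading lazily: open the file eagerly in an ordinary function and hand the open file to an inner generator (read_stripped is a hypothetical name). It lets the exception propagate instead of printing it. A caveat: if the caller never starts iterating, the with block is never entered, so the file is only closed when the file object is garbage collected.

```python
from operator import methodcaller
from typing import Iterator


def read_stripped(file_name: str) -> Iterator[str]:
    # open eagerly: a missing file raises right here, at call time
    file = open(file_name)

    def stripped_lines() -> Iterator[str]:
        # the with block closes the file once iteration finishes
        # (or the started generator is closed)
        with file:
            yield from map(methodcaller('strip', '\n'), file)

    return stripped_lines()
```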
I'd generally recommend the latter implementation as it gives the caller flexibility. If they want a list anyway, they just wrap the call in list and get the list result (with a tiny amount of overhead). If they don't, they can begin processing faster, and have much lower memory demands.
Mind you, this whole function is fairly odd; replacing IOErrors with prints and (implicitly) returning None is hostile to API consumers (they now have to check return values, and can't actually tell what went wrong). In real code, I'd probably just skip the function and insert:
with open(fileName) as file:
    for line in map(methodcaller('strip', '\n'), file):
        ...  # do stuff with line (with newline pre-stripped)
inline in the caller; maybe define strip_newline = methodcaller('strip', '\n') globally to use a friendlier name. It's not much code, I can't imagine this specific behavior is needed in many independent parts of your program, and inlining it removes the concerns about when the file is opened and closed.

How to loop through a text file and find the matching keywords in Python3

I am working on a project to define a search function in Python 3. The goal is to output the keywords from a list and the sentence(s) from adele.txt that contain them.
This is a user defined list, userlist=['looking','for','wanna'],
adele.txt is on the github page, https://github.com/liuyu82910/search
Below is my function. The first loop gets every line of adele.txt in lowercase; the second loop gets each word of userlist in lowercase. My code is not looping correctly: I want to loop over all the lines in the text and compare each one with all the words from the list. What did I do wrong?
def search(list):
    with open('F:/adele.txt','r') as file:
        for line in file:
            newline=line.lower()
            for word in list:
                neword=word.lower()
                if neword in newline:
                    return neword,'->',newline
                else:
                    return False
This is my current result, it stops looping, I only got one result:
Out[122]:
('looking', '->', 'looking for some education\n')
Desired output would be:
'looking', '->', 'looking for some education'
... #there are so many sentences that contain looking
'looking',->'i ain't mr. right but if you're looking for fast love'
...
'for', -> 'looking for some education'
...#there are so many sentences that contain for
'wanna',->'i don't even wanna waste your time'
...
Here:
if neword in newline:
    return neword,'->',newline
else:
    return False
You are returning (either a tuple or False) on the very first iteration. return means "exit the function here and now".
The simple solution is to store all matches in a list (or dict etc) and return that:
# s/list/targets/
def search(targets):
    # let's not do the same thing
    # over and over and over again
    targets = [word.lower() for word in targets]
    results = []
    # s/file/source/
    with open('F:/adele.txt','r') as source:
        for line in source:
            line = line.strip().lower()
            for word in targets:
                if word in line:
                    results.append((word, line))
    # ok done
    return results
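One thing the substring test (word in line) does not handle: 'for' also matches inside 'before' or 'forget'. If whole-word matches are what you want (an assumption about the desired output), a regular expression with \b word boundaries is one option. search_lines is a hypothetical name; it takes any iterable of lines, such as an open file, so it can be tried without adele.txt:

```python
import re
from typing import Iterable, List, Tuple


def search_lines(lines: Iterable[str], targets: Iterable[str]) -> List[Tuple[str, str]]:
    # compile one pattern per keyword; \b restricts matches to whole words
    patterns = [(word.lower(), re.compile(r'\b' + re.escape(word.lower()) + r'\b'))
                for word in targets]
    results = []
    for line in lines:
        line = line.strip().lower()
        for word, pattern in patterns:
            if pattern.search(line):
                results.append((word, line))
    return results


sample = ['Looking for some education\n', 'before you go\n']
print(search_lines(sample, ['looking', 'for', 'wanna']))
```

With the real file you would call it as search_lines(open_file, userlist) inside a with block, exactly as the answer's search() opens adele.txt.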
