How to separate a string into 2 list of lists - python-3.x

I got this string:
\n
\n
N\tO\tHP\tM\tD\tU\tI\tN\tO\n
E\tS\tA\tE\tI\tT\tL\tN\tI\tN\n
N\tP\tN\tN\tN\tG\tAO\tD\tC\n
\n
\n
PERMANENTE
PETTINE
\n
\n
In the original string you cannot actually see the \t and \n characters; I wrote them out here to make the structure clearer.
What I'm trying to do is separate the string into 2 different lists of lists, for example:
lists1 = [[NOHPMDUINO][ESAEITLNIN][NPNNNGAODC]]
lists2 = [[PERMANENTE][PETTINE]]
I tried many methods to solve this, but without success.
At first I removed the newlines at the beginning with the .strip('\n') method, and I tried to use replace, but I don't know how to do it right.
Thank you zsomko and snakecharmerb,
Using zsomko's method and adding strip() to remove the newlines at the beginning, here is the loop that I wrote to divide the lines into 2 variables:
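group1, group2 = [], []
var = True  # True while we are still filling the first group
for line in t:
    if line != ['']:
        if var:
            group1.append(line)
        else:
            group2.append(line)
    else:
        # A blank entry marks the switch to the second group
        var = False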
I hope this will help someone :) If somebody has a better, more efficient solution, I would like to hear it.

First eliminate the tabs and split the string into lines:
lines = [line.replace('\t', '') for line in string.splitlines()]
Then the following would yield the list of lists in the variable groups as expected:
groups = []
group = []
for line in lines:
    if group and not line:
        groups.append(group)
        group = []
    elif line:
        group.append(line)
# Keep the last group if the string does not end with a blank line
if group:
    groups.append(group)

You can break the string into separate lines using its splitlines method - this will give you a list of lines without their terminating newline ('\n') characters.
Then you can loop over the list and replace the tab characters with empty strings using the str.replace method.
>>> for line in s.splitlines():
...     if not line:
...         # Skip empty lines
...         continue
...     cleaned = line.replace('\t', '')
...     print(cleaned)
...
NOHPMDUINO
ESAEITLNIN
NPNNNGAODC
PERMANENTE
PETTINE
Grouping the output in lists of lists is a little trickier. The question doesn't mention the criteria for grouping, so let's assume that lines which are not separated by empty lines should be listed together.
We can use a generator to iterate over the string, group adjacent lines and emit them as lists like this:
>>> def g(s):
...     out = []
...     for line in s.splitlines():
...         if not line:
...             if out:
...                 yield out
...                 out = []
...             continue
...         cleaned = line.replace('\t', '')
...         out.append([cleaned])
...     if out:
...         yield out
...
>>>
The generator collects lines in a list (out) which it yields each time it finds a blank line and the list is not empty; if the list is yielded it is replaced with an empty list. After looping over the lines in the string it yields the list again, if it isn't empty, in case the string didn't end with blank lines.
Looping over the generator returns the lists of lists in turn.
>>> for x in g(s):print(x)
...
[['NOHPMDUINO'], ['ESAEITLNIN'], ['NPNNNGAODC']]
[['PERMANENTE'], ['PETTINE']]
Alternatively, if you want a list of lists of lists, call list on the generator:
>>> lists = list(g(s))
>>> print(lists)
[[['NOHPMDUINO'], ['ESAEITLNIN'], ['NPNNNGAODC']], [['PERMANENTE'], ['PETTINE']]]
If you want to assign the result to named variables, you can unpack the call to list:
>>> group1, group2 = list(g(s))
>>> group1
[['NOHPMDUINO'], ['ESAEITLNIN'], ['NPNNNGAODC']]
>>> group2
[['PERMANENTE'], ['PETTINE']]
Note that to do this you need to know in advance how many lists will be generated.
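As an alternative to the hand-written generator, the same grouping rule (runs of non-blank lines belong together) can be sketched with itertools.groupby from the standard library. The function name group_lines and the shortened sample string are made up for illustration, and unlike g above this version returns each line as a plain string rather than a one-element list.

```python
from itertools import groupby

def group_lines(text):
    """Return groups of consecutive non-blank lines, with tabs removed."""
    cleaned = (line.replace('\t', '') for line in text.splitlines())
    # groupby clusters consecutive items that share a key; bool(line) is
    # False for blank lines, so every run of non-blank lines is one group.
    return [list(run) for has_text, run in groupby(cleaned, key=bool) if has_text]

# Shortened, hypothetical stand-in for the string in the question
sample = '\n\nN\tO\tH\nE\tS\tA\n\n\nPERMANENTE\nPETTINE\n\n'
lists1, lists2 = group_lines(sample)
print(lists1)
print(lists2)
```

As with the unpacking example above, assigning to lists1 and lists2 only works when the string contains exactly two groups.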

Related

How to split strings from .txt file into a list, sorted from A-Z without duplicates?

For instance, the .txt file includes 2 lines, separated by commas:
John, George, Tom
Mark, James, Tom,
Output should be:
[George, James, John, Mark, Tom]
The following will create the list and store each item as a string.
def test(path):
    with open(path) as f:
        lines = f.read().split('\n')
    # Drop empty lines by building a new list; removing items from a
    # list while iterating over it can skip elements
    lines = [line for line in lines if line != '']
    names = []
    for line in lines:
        names += line.split(', ')
    # Strip trailing commas and drop any empty strings that remain
    names = [name.strip(',') for name in names]
    names = [name for name in names if name]
    # A set removes duplicates; sorted() returns an ordered list
    return sorted(set(names))

print(test('location/of/file.txt'))
Output:
['George', 'James', 'John', 'Mark', 'Tom']
Your file opening is fine, although the 'r' is redundant since that's the default. You claim it's not, but it is. Read the documentation.
You have not described what task is so I have no idea what's going on there. I will assume that it is correct.
Rather than populating a list and doing a membership test on every iteration - which is O(n^2) in time - can you think of a different data structure that guarantees uniqueness? Google will be your friend here. Once you discover this data structure, you will not have to perform membership checks at all. You seem to be struggling with this concept; the answer is a set.
The input data format is not rigorously defined. Separators may be commas or commas with trailing spaces, and may appear (or not) at the end of the line. Consider making an appropriate regular expression and using its splitting feature to split individual lines, though normal splitting and stripping may be easier to start.
In the following example code, I've:
ignored task since you've said that that's fine;
separated actual parsing of file content from parsing of in-memory content to demonstrate the function without a file;
used a set comprehension to store unique results of all split lines; and
passed sorted a generator expression that drops empty strings.
from io import StringIO
from typing import TextIO, List

def parse(f: TextIO) -> List[str]:
    words = {
        word.strip()
        for line in f
        for word in line.split(',')
    }
    return sorted(
        word for word in words if word != ''
    )

def parse_file(filename: str) -> List[str]:
    with open(filename) as f:
        return parse(f)

def test():
    f = StringIO('John, George , Tom\nMark, James, Tom, ')
    words = parse(f)
    assert words == [
        'George', 'James', 'John', 'Mark', 'Tom',
    ]
    f = StringIO(' Han Solo, Boba Fet \n')
    words = parse(f)
    assert words == [
        'Boba Fet', 'Han Solo',
    ]

if __name__ == '__main__':
    test()
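The regular-expression splitting suggested in this answer could look roughly like the sketch below. The pattern and the helper names split_names and unique_sorted are assumptions made for illustration, not part of the original answer; the pattern treats a comma plus any surrounding whitespace as one separator.

```python
import re

# A comma together with any whitespace around it counts as one separator.
SEPARATOR = re.compile(r'\s*,\s*')

def split_names(line):
    """Split one line on commas and drop empty fields (e.g. after a trailing comma)."""
    return [name for name in SEPARATOR.split(line.strip()) if name]

def unique_sorted(lines):
    """Collect the names from all lines into a set, then sort them."""
    return sorted({name for line in lines for name in split_names(line)})

print(unique_sorted(['John, George , Tom\n', 'Mark, James, Tom, ']))
```

Because only whitespace next to a comma is consumed, names with inner spaces such as 'Han Solo' stay intact.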
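I came up with a very simple solution, in case anyone needs it (x is the open file object):
def unique_sorted_words(x):
    # Split on whitespace, then strip the commas left attached to each word
    lines = [word.strip(',') for word in x.read().split()]
    lines.sort()
    new_list = []
    for word in lines:
        if word and word not in new_list:
            new_list.append(word)
    return new_list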
with open("text.txt", "r") as fl:
    list_ = set()
    for line in fl:
        for name in line.strip("\n").split(","):
            # Remove the spaces that follow each comma
            name = name.strip()
            if name != '':
                list_.add(name)
print(list_)
I think that you missed a comma after Jim in the first line.
You can avoid the use of a loop by using the split method:
content=file.read()
my_list=content.split(",")
To remove the duplicates from your list, you can convert it to a set:
my_list=list(set(my_list))
Then you can sort it using sorted.
So the final code is:
with open("file.txt", "r") as file:
    content=file.read()
# Treat newlines as separators too, and drop the empty strings that
# trailing or doubled commas leave behind
my_list=[name for name in content.replace("\n", ",").replace(" ", "").split(",") if name]
result=sorted(set(my_list))
You can also pass a key to sorted to customise the ordering.
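To illustrate the remark about a sort key: by default Python orders strings by code point, so uppercase names come before lowercase ones. A minimal sketch, using a made-up list of names, that sorts case-insensitively with key=str.lower:

```python
# Hypothetical sample data with mixed capitalisation
names = ["tom", "George", "john", "Mark", "james", "tom"]

unique_names = set(names)

# Default ordering: uppercase letters sort before lowercase ones
default_order = sorted(unique_names)
# Case-insensitive ordering: compare the lowercased form of each name
by_lowercase = sorted(unique_names, key=str.lower)

print(default_order)
print(by_lowercase)
```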

How to loop through a text file and find the matching keywords in Python3

I am working on a project to define a search function in Python3. The goal is to output the keywords from a list together with the sentence(s) from adele.txt that contain those keywords.
This is a user-defined list: userlist=['looking','for','wanna'].
adele.txt is on the GitHub page, https://github.com/liuyu82910/search
Below is my function. The first loop gets every line of adele.txt in lowercase, and the second loop gets each word of userlist in lowercase. My code is not looping correctly. What I want is to loop over all the lines in the text and compare each one with all the words from the list. What did I do wrong?
def search(list):
    with open('F:/adele.txt','r') as file:
        for line in file:
            newline=line.lower()
            for word in list:
                neword=word.lower()
                if neword in newline:
                    return neword,'->',newline
                else:
                    return False
This is my current result, it stops looping, I only got one result:
Out[122]:
('looking', '->', 'looking for some education\n')
Desired output would be:
'looking', '->', 'looking for some education'
... #there are so many sentences that contain looking
'looking',->'i ain't mr. right but if you're looking for fast love'
...
'for', -> 'looking for some education'
...#there are so many sentences that contain for
'wanna',->'i don't even wanna waste your time'
...
Here:
if neword in newline:
    return neword,'->',newline
else:
    return False
You are returning (either a tuple or False) on the very first iteration. return means "exit the function here and now".
The simple solution is to store all matches in a list (or dict etc) and return that:
# s/list/targets/
def search(targets):
    # let's not do the same thing
    # over and over and over again
    targets = [word.lower() for word in targets]
    results = []
    # s/file/source/
    with open('F:/adele.txt','r') as source:
        for line in source:
            line = line.strip().lower()
            for word in targets:
                if word in line:
                    results.append((word, line))
    # ok done
    return results
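Since return exits the function immediately, another option is a generator: yield hands back one match and then lets the loops continue. The sketch below is only an illustration; the name find_matches and the two sample lines are assumptions, and it takes any iterable of lines so it can be tried without the file. Like the code above, it matches substrings, so 'for' would also match inside 'before'.

```python
def find_matches(lines, targets):
    """Yield a (word, line) pair for every target word found in every line."""
    targets = [word.lower() for word in targets]
    for line in lines:
        line = line.strip().lower()
        for word in targets:
            if word in line:
                # yield returns one result, then the loops resume
                yield word, line

# Hypothetical sample lines standing in for the contents of adele.txt
sample_lines = [
    "Looking for some education\n",
    "I don't even wanna waste your time\n",
]
for word, line in find_matches(sample_lines, ['looking', 'for', 'wanna']):
    print(word, '->', line)
```

To search the real file, an open file object can be passed as lines, since iterating over a file yields its lines.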

How to print multiple lines from a file python

I'm trying to print several lines from a text file in Python. My current code is:
f = open("sample.txt", "r").readlines()[2 ,3]
print(f)
However, I'm getting this error message:
TypeError: list indices must be integers, not tuple
Is there any way of fixing this, or of printing multiple lines from a file without printing them out individually?
You are trying to pass a tuple to the [...] subscription operation; 2 ,3 is a tuple of two elements:
>>> 2 ,3
(2, 3)
You have a few options here:
Use slicing to take a sublist from all the lines. [2:4] slices from the 3rd line and includes the 4th line:
f = open("sample.txt", "r").readlines()[2:4]
Store the lines and print specific indices, one by one:
f = open("sample.txt", "r").readlines()
print(f[2].rstrip())
print(f[3].rstrip())
I used str.rstrip() to remove the newline that's still part of the line before printing.
Use itertools.islice() and use the file object as an iterable; this is the most efficient method as no lines need to be stored in memory for more than just the printing work:
from itertools import islice
with open("sample.txt", "r") as f:
    for line in islice(f, 2, 4):
        print(line.rstrip())
I also used the file object as a context manager to ensure it is closed again properly once the with block is done.
Assign the whole list of lines to a variable, and then print lines 2 and 3 separately.
with open("sample.txt", "r") as fin:
    lines = fin.readlines()
print(lines[2])
print(lines[3])
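If the wanted lines are not adjacent, a slice no longer fits. One possible approach, sketched here with a hypothetical helper pick_lines and an in-memory StringIO standing in for sample.txt, is to walk the file with enumerate and keep only the requested zero-based indices:

```python
from io import StringIO

def pick_lines(source, wanted):
    """Return the lines whose zero-based index is in `wanted`, in file order."""
    wanted = set(wanted)
    return [line.rstrip('\n') for index, line in enumerate(source) if index in wanted]

# In-memory stand-in for the file, so the example runs without sample.txt
sample = StringIO("zero\none\ntwo\nthree\nfour\n")
for line in pick_lines(sample, [2, 3]):
    print(line)
```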

Merge Two wordlists into one file

I have two wordlists, as per examples below:
wordlist 1 :
code1
code2
code3
wordlist 2 :
11
22
23
I want to append every number in wordlist 2, in turn, to each line of wordlist 1.
example of the output :
code111
code122
code123
code211
code222
code223
code311
.
.
Can you please help me with how to do it? Thanks!
You can run two nested for loops to iterate over both lists, and append the concatenated string to a new list.
Here is a little example:
## create lists using square brackets
wordlist1=['code1', ## wrap something in quotes to make it a string
           'code2','code3']
wordlist2=['11','22','23']
## create a new empty list
concatenated_words=[]
## first for loop: one iteration per item in wordlist1
for i in range(len(wordlist1)):
    ## word with index i of wordlist1 (square brackets for indexing)
    word1=wordlist1[i]
    ## second for loop: one iteration per item in wordlist2
    for j in range(len(wordlist2)):
        word2=wordlist2[j]
        ## append concatenated words to the initially empty list
        concatenated_words.append(word1+word2)
## iterate over the list of concatenated words, and print each item
for k in range(len(concatenated_words)):
    print(concatenated_words[k])
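list1 = ["text1","text2","text3","text4"]
list2 = [11,22,33,44]
def iterativeConcatenation(list1, list2):
    result = []
    # Outer loop over list1, inner loop over list2, so each index stays
    # within the list it is used on even when the lengths differ
    for i in range(len(list1)):
        for j in range(len(list2)):
            result = result + [str(list1[i])+str(list2[j])]
    return result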
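The nested loops in the answers above compute the Cartesian product of the two lists. As a compact alternative sketch, itertools.product from the standard library yields the same pairs in the same order (every number for the first word, then every number for the second word, and so on):

```python
from itertools import product

wordlist1 = ['code1', 'code2', 'code3']
wordlist2 = ['11', '22', '23']

# product yields (word, number) pairs; the last argument varies fastest
combined = [word + number for word, number in product(wordlist1, wordlist2)]
print('\n'.join(combined))
```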
Have you figured it out? It depends on whether you want to enter the names of each list yourself, or whether you want the script to, for instance, automatically read the files and then append to or extend a new text file. I am working on a little script at the moment, and here is a very quick and simple way, assuming you want all the text files in the same folder as your .py file:
import os
# hypothetical output name; it does not end in .txt so it is not picked up below
outfile = 'merged_wordlist.out'
#this makes a list with all .txt files in the folder.
list_names = [f for f in os.listdir(os.getcwd()) if f.endswith('.txt')]
for file_name in list_names:
    with open(os.path.join(os.getcwd(), file_name)) as fh:
        words = fh.read().splitlines()
    with open(outfile, 'a') as fh2:
        for word in words:
            fh2.write(word + '\n')

Python read file contents into nested list

I have this file that contains something like this:
OOOOOOXOOOO
OOOOOXOOOOO
OOOOXOOOOOO
XXOOXOOOOOO
XXXXOOOOOOO
OOOOOOOOOOO
And I need to read it into a 2D list so it looks like this:
[[O,O,O,O,O,O,X,O,O,O,O],[O,O,O,O,O,X,O,O,O,O,O],[O,O,O,O,X,O,O,O,O,O,O],[X,X,O,O,X,O,O,O,O,O,O],[X,X,X,X,O,O,O,O,O,O,O],[O,O,O,O,O,O,O,O,O,O,O]]
I have this code:
ins = open(filename, "r" )
data = []
for line in ins:
    number_strings = line.split() # Split the line on runs of whitespace
    numbers = [(n) for n in number_strings]
    data.append(numbers) # Add the "row" to your list.
return data
But it doesn't seem to be working because the O's and X's do not have spaces between them. Any ideas?
Just use data.append(list(line.rstrip())). list accepts a string as its argument and splits it into individual characters.
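Put together, the suggestion might look like the sketch below. The function name read_grid and the small three-line sample are assumptions for illustration; a StringIO object stands in for the open file so the example runs on its own.

```python
from io import StringIO

def read_grid(source):
    """Turn each non-blank line into a list of its characters."""
    # rstrip() removes the trailing newline before list() splits the
    # remaining string into single characters
    return [list(line.rstrip()) for line in source if line.strip()]

# Small in-memory stand-in for the file in the question
sample = StringIO("OOX\nOXO\nXOO\n")
grid = read_grid(sample)
print(grid)
```

With a real file, the same function can be called inside a with open(filename) as ins: block.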
