Merge two wordlists into one file - Linux

I have two wordlists, as per examples below:
wordlist 1 :
code1
code2
code3
wordlist 2 :
11
22
23
I want to take wordlist 2 and append every number to each line of wordlist 1.
Example of the output:
code111
code122
code123
code211
code222
code223
code311
.
.
Can you please help me with how to do it? Thanks!

You can run two nested for loops to iterate over both lists, and append the concatenated string to a new list.
Here is a little example:
## create lists using square brackets
wordlist1 = ['code1',  ## wrap something in quotes to make it a string
             'code2', 'code3']
wordlist2 = ['11', '22', '23']

## create a new empty list
concatenated_words = []

## first for loop: one iteration per item in wordlist1
for i in range(len(wordlist1)):
    ## word with index i of wordlist1 (square brackets for indexing)
    word1 = wordlist1[i]
    ## second for loop: one iteration per item in wordlist2
    for j in range(len(wordlist2)):
        word2 = wordlist2[j]
        ## append concatenated words to the initially empty list
        concatenated_words.append(word1 + word2)

## iterate over the list of concatenated words, and print each item
for k in range(len(concatenated_words)):
    print(concatenated_words[k])
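If you prefer something more compact, the standard library's itertools.product yields the same pairings without manual index bookkeeping; a minimal sketch with the lists from above:

from itertools import product

wordlist1 = ['code1', 'code2', 'code3']
wordlist2 = ['11', '22', '23']

# product() yields every (word1, word2) pair, in the same order
# as the nested loops above
for word1, word2 in product(wordlist1, wordlist2):
    print(word1 + word2)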

list1 = ["text1", "text2", "text3", "text4"]
list2 = [11, 22, 33, 44]

def iterativeConcatenation(list1, list2):
    result = []
    # iterate over list1 in the outer loop so the output is grouped
    # by list1 items; index each list with its own loop variable
    for i in range(len(list1)):
        for j in range(len(list2)):
            result = result + [str(list1[i]) + str(list2[j])]
    return result
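Note that the function is only defined above; to actually see the combinations you still need to call it:

print(iterativeConcatenation(list1, list2))
# ['text111', 'text122', 'text133', 'text144', 'text211', ...]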

Have you figured it out? It depends on whether you want to input the names of each list, or whether you want it to, for instance, automatically read and then append or extend a new text file. I am working on a little script at the moment; here is a very quick and simple way. Let's say you want all the text files in the same folder as your .py file:
import os

# 'merged.txt' is an assumed output name; change it as needed. If it ends
# in .txt and sits in the same folder, exclude it from list_names below.
outfile = 'merged.txt'

# this makes a list with all .txt files in the folder
list_names = [f for f in os.listdir(os.getcwd()) if f.endswith('.txt')]

for file_name in list_names:
    with open(os.getcwd() + "/" + file_name) as fh:
        words = fh.read().splitlines()
    with open(outfile, 'a') as fh2:
        for word in words:
            fh2.write(word + '\n')
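Note that this only appends the files one after the other; for the cross product the question asks about, a file-based sketch could look like this (wordlist1.txt, wordlist2.txt and combined.txt are assumed names):

# assumed filenames; adjust to your actual wordlist files
with open('wordlist1.txt') as f1:
    words1 = f1.read().splitlines()
with open('wordlist2.txt') as f2:
    words2 = f2.read().splitlines()

# write every combination of a wordlist1 line and a wordlist2 line
with open('combined.txt', 'w') as out:
    for w1 in words1:
        for w2 in words2:
            out.write(w1 + w2 + '\n')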

Related

Trying to add to a 2d list, getting IndexError: tuple index out of range

I've written a function that will recursively go through a folder in the directory and add the contents of all .dat files to a two-dimensional list. Each column represents a file with each line on a new row. I'm using for loops to achieve this, but I'm getting IndexError: tuple index out of range when it tries to put this information into the list. I've looked at every way of getting this information into the list: appending, inserting and just assigning, but they all come out with similar errors.
def initialiseitems():
    items = ([], [])
    count = 0
    for root, dirs, files in os.walk("Bundles/Items/", topdown=False):
        for name in files:
            if os.path.splitext(os.path.basename(name))[1] == ".dat":
                if os.path.splitext(os.path.basename(name))[0] != "English":
                    prefile = open(os.path.join(root, name), "r")
                    file = prefile.readlines()
                    for lineNumber in range(0, sum(1 for line in file)):
                        line = file[lineNumber].replace('\n', '')
                        items[count].append(line)
                    count = count + 1
    return items
It should just put them all in the array. It's evident the method of getting this into the list is wrong. What's the best way to do this? Preferably, with no external libraries. Thanks
Edit: Full error
Traceback (most recent call last):
File "C:/Users/Kenzi/PycharmProjects/workshophelper/main.py", line 3, in <module>
items = initialisation.initialiseitems()
File "C:\Users\Kenzi\PycharmProjects\workshophelper\initialisation.py", line 15, in initialiseitems
items[count].append(line)
IndexError: tuple index out of range
You have a logic problem.
You are incrementing count = count+1 for every *.dat file that is not English.dat.
You use count to index into items = ([], []), which has exactly 2 elements.
The third *.dat file will produce a count of 2, but your only valid indexes are 0 and 1 --> IndexError.
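A minimal reproduction of the error:

items = ([], [])      # a tuple holding exactly two lists
items[0].append("a")  # fine: index 0 exists
items[1].append("b")  # fine: index 1 exists
items[2].append("c")  # IndexError: tuple index out of range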
You can simplify your code:
def initialiseitems():
    items = ([], [])
    for root, dirs, files in os.walk("Bundles/Items/", topdown=False):
        for name in files:
            if name.endswith(".dat"):
                if not name.startswith("English"):
                    fullname = os.path.join(root, name)
                    with open(fullname, "r") as f:
                        items[0].append(fullname)
                        items[1].append([x.strip() for x in f.read().splitlines()])
    return items
This returns a tuple (an immutable object that contains 2 lists): the first list contains the filenames,
the second list the file contents:
(["file1", "file2"], [[...content of file 1...], [...content of file 2...]])
Each [...content of file 1...] is a list of lines without the \n at their ends.
You can work with the result like so:
result = initialiseitems()
for filename, filecontent in zip(result[0], result[1]):
    # filename is a string like "path/path/path/filename"
    # filecontent is a list like [line1, line2, line3, ...]
    pass  # process each file here
items = ([], []) was declaring the container as a tuple, not a list. I don't know exactly what a tuple is, but it doesn't work here. I changed it to items = [] instead.
Furthermore, the for loops I had set up to put the files into the array were inefficient. I had already loaded the file into a 1D list by opening it and assigning it to a variable, so appending this 1D variable to items effectively appends a whole column. Also, read Patrick's post: I had limited the structure I was trying to declare to only two columns/rows. Not nice.
The code now looks like:
def initialiseitems():
    items = []
    for root, dirs, files in os.walk("Bundles/Items/", topdown=False):
        for name in files:
            if os.path.splitext(os.path.basename(name))[1] == ".dat":
                if os.path.splitext(os.path.basename(name))[0] != "English":
                    # with-statement also closes the file when done
                    with open(os.path.join(root, name), "r") as prefile:
                        file = prefile.readlines()
                        items.append(file)
    return items
Also, thanks Jack for fixing my grammar.

How to separate a string into 2 lists of lists

I got this string:
\n
\n
N\tO\tHP\tM\tD\tU\tI\tN\tO\n
E\tS\tA\tE\tI\tT\tL\tN\tI\tN\n
N\tP\tN\tN\tN\tG\tAO\tD\tC\n
\n
\n
PERMANENTE
PETTINE
\n
\n
Actually, if you look at the original string you cannot see the \t and \n; I just edited them in for better understanding.
What I'm trying to do is separate it into 2 different lists of lists, for example:
lists1 = [['NOHPMDUINO'], ['ESAEITLNIN'], ['NPNNNGAODC']]
lists2 = [['PERMANENTE'], ['PETTINE']]
I tried many methods to solve this, but without success.
At first I removed the newlines at the beginning with the .strip('\n') method, and I tried to use replace, but I don't know how to make it right.
Thank you zsomko and snakecharmerb.
Using zsomko's method and adding strip() to remove the newlines at the beginning, here is the loop I wrote to divide the lines into 2 variables:
var = True
for line in t:
    if line != ['']:
        if var:
            group1.append(line)
        else:
            group2.append(line)
    else:
        var = False
I hope this will help someone :) If somebody has a better, more efficient solution, I would like to hear it.
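One more compact possibility from the standard library: itertools.groupby can do the splitting in a couple of lines, yielding each group as a flat list of lines. A sketch, assuming s holds the original string from the question:

from itertools import groupby

# strip the tabs and split into lines
lines = [line.replace('\t', '') for line in s.splitlines()]

# bool('') is False, so groupby(..., key=bool) separates runs of
# non-empty lines from runs of empty ones; keep only the former
groups = [list(g) for nonempty, g in groupby(lines, key=bool) if nonempty]
group1, group2 = groups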
First eliminate the tabs and split the string into lines:
lines = [line.replace('\t', '') for line in string.splitlines()]
Then the following would yield the list of lists in the variable groups as expected:
groups = []
group = []
for line in lines:
    if group and not line:
        groups.append(group)
        group = []
    elif line:
        group.append(line)
if group:
    # flush the last group in case the string doesn't end with a blank line
    groups.append(group)
You can break the string into separate lines using its splitlines method - this will give you a list of lines without their terminating newline ('\n') characters.
Then you can loop over the list and replace the tab characters with empty strings using the str.replace method.
>>> for line in s.splitlines():
...     if not line:
...         # Skip empty lines
...         continue
...     cleaned = line.replace('\t', '')
...     print(cleaned)
...
NOHPMDUINO
ESAEITLNIN
NPNNNGAODC
PERMANENTE
PETTINE
Grouping the output in lists of lists is a little trickier. The question doesn't mention the criteria for grouping, so let's assume that lines which are not separated by empty lines should be listed together.
We can use a generator to iterate over the string, group adjacent lines and emit them as lists like this:
>>> def g(s):
...     out = []
...     for line in s.splitlines():
...         if not line:
...             if out:
...                 yield out
...                 out = []
...             continue
...         cleaned = line.replace('\t', '')
...         out.append([cleaned])
...     if out:
...         yield out
...
>>>
>>>
The generator collects lines in a list (out) which it yields each time it finds a blank line and the list is not empty; if the list is yielded it is replaced with an empty list. After looping over the lines in the string it yields the list again, if it isn't empty, in case the string didn't end with blank lines.
Looping over the generator returns the lists of lists in turn.
>>> for x in g(s):print(x)
...
[['NOHPMDUINO'], ['ESAEITLNIN'], ['NPNNNGAODC']]
[['PERMANENTE'], ['PETTINE']]
Alternatively, if you want a list of lists of lists, call list on the generator:
>>> lists = list(g(s))
>>> print(lists)
[[['NOHPMDUINO'], ['ESAEITLNIN'], ['NPNNNGAODC']], [['PERMANENTE'], ['PETTINE']]]
If you want to assign the result to named variables, you can unpack the call to list:
>>> group1, group2 = list(g(s))
>>> group1
[['NOHPMDUINO'], ['ESAEITLNIN'], ['NPNNNGAODC']]
>>> group2
[['PERMANENTE'], ['PETTINE']]
but note that to do this you need to know in advance how many lists will be generated.
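If the number of groups may vary, Python 3's starred unpacking avoids having to know the count in advance:

>>> first, *rest = g(s)  # first group, then a list of any remaining groups
>>> first
[['NOHPMDUINO'], ['ESAEITLNIN'], ['NPNNNGAODC']]
>>> rest
[[['PERMANENTE'], ['PETTINE']]]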

How to Read Multiple Files in a Loop in Python and get count of matching words

I have two text files and 2 lists (FIRST_LIST, SCND_LIST). I want to find, for each file, the count of matching words from FIRST_LIST and SCND_LIST individually.
FIRST_LIST = "accessorizes", "accessorizing", "accessorized", "accessorize"
SCND_LIST = "accessorize", "accessorized", "accessorizes", "accessorizing"
text File1 contains:
This is a very good question, and you have received good answers which describe interesting topics accessorized accessorize.
text File2 contains:
is more applied,using accessorize accessorized,accessorizes,accessorizing
output
File1 first list count=2
File1 second list count=0
File2 first list count=0
File2 second list count=4
I have tried to achieve this with the code below, but I am not able to get the expected output.
Any help is appreciated.
import os
import glob
import re

files = []
for filename in glob.glob("*.txt"):
    files.append(filename)

# remove punctuation
def remove_punctuation(line):
    return re.sub(r'[^\w\s]', '', line)

two_files = []
for filename in files:
    for line in open(filename):
        print(remove_punctuation(line), end='')
        two_files.append(remove_punctuation(line))

FIRST_LIST = "accessorizes", "accessorizing", "accessorized", "accessorize"
SCND_LIST = "accessorize", "accessorized", "accessorizes", "accessorizing"

c = []
for match in FIRST_LIST:
    if any(match in value for value in two_files):
        print(match)
        c.append(match)
print(c)
len(c)

d = []
for match in SCND_LIST:
    if any(match in value for value in two_files):
        print(match)
        d.append(match)
print(d)
len(d)
Using Counter and some list comprehensions is one of many different approaches to solve your problem.
I assume your sample output is wrong, since some words are part of both lists and both files but are not counted. In addition, I added a second line to each sample string to show how this works with multi-line strings, which might be the typical contents of a given file.
io.StringIO objects emulate your files here, but working with real files from your file system works exactly the same, since both provide a file-like object (file-like interface):
import io
from collections import Counter

list_a = ["accessorizes", "accessorizing", "accessorized", "accessorize"]
list_b = ["accessorize", "accessorized", "accessorizes", "accessorizing"]

# added a second line to each string just for the sake of it
file_contents_a = 'This is a very good question, and you have received good answers which describe interesting topics accessorized accessorize.\nThis is the second line in file a'
file_contents_b = 'is more applied,using accessorize accessorized,accessorizes,accessorizing\nThis is the second line in file b'

# using io.StringIO to simulate a file input (--> file-like object);
# you should use `with open(filename) as ...` for real file input
file_like_a = io.StringIO(file_contents_a)
file_like_b = io.StringIO(file_contents_b)

# read file contents and split lines into a list of strings
lines_of_file_a = file_like_a.read().splitlines()
lines_of_file_b = file_like_b.read().splitlines()
# iterate through all lines of each file (for file a here)
for line_number, line in enumerate(lines_of_file_a):
    words = line.replace('.', ' ').replace(',', ' ').split(' ')
    c = Counter(words)
    in_list_a = sum([v for k, v in c.items() if k in list_a])
    in_list_b = sum([v for k, v in c.items() if k in list_b])
    print("Line {}".format(line_number))
    print("- in list a {}".format(in_list_a))
    print("- in list b {}".format(in_list_b))

# iterate through all lines of each file (for file b here)
for line_number, line in enumerate(lines_of_file_b):
    words = line.replace('.', ' ').replace(',', ' ').split(' ')
    c = Counter(words)
    in_list_a = sum([v for k, v in c.items() if k in list_a])
    in_list_b = sum([v for k, v in c.items() if k in list_b])
    print("Line {}".format(line_number))
    print("- in list a {}".format(in_list_a))
    print("- in list b {}".format(in_list_b))

# actually, your two lists are the same
lists_are_equal = sorted(list_a) == sorted(list_b)
print(lists_are_equal)
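If you want a single total per file, as in the question's expected output, you can sum over all lines; a small sketch reusing the names defined above:

def count_matches(lines, word_list):
    # count how many words across all lines appear in word_list
    total = 0
    for line in lines:
        words = line.replace('.', ' ').replace(',', ' ').split()
        total += sum(1 for word in words if word in word_list)
    return total

print("file a, first list count =", count_matches(lines_of_file_a, list_a))
print("file b, second list count =", count_matches(lines_of_file_b, list_b))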

Delta words between two TXT files

I would like to count the delta words between two files.
file_1.txt has the content "One file with some text and words.".
file_2.txt has the content "One file with some text and additional words to be found.".
The diff command on Unix systems gives the following output. difflib can give a similar output.
$ diff file_1.txt file_2.txt
1c1
< One file with some text and words.
---
> One file with some text and additional words to be found.
Is there an easy way to find the words added or removed between two files, or at least between two lines, as git diff --word-diff does?
First of all, you need to read your files into strings with open(), where 'file_1.txt' is the path to your file and 'r' stands for reading mode.
Do the same for the second file. And don't forget to close() your files when you're done!
Then use the split(' ') function to split the strings you have just read into lists of words.
file_1 = open('file_1.txt', 'r')
text_1 = file_1.read().split(' ')
file_1.close()
file_2 = open('file_2.txt', 'r')
text_2 = file_2.read().split(' ')
file_2.close()
Next you need to get the difference between the text_1 and text_2 list variables (objects).
There are many ways to do it.
1)
You can use the Counter class from the collections library.
Pass your lists to the class's constructor, then find the difference by subtraction in straight and reverse order, call the elements() method to get the elements and list() to transform the result to the list type.
from collections import Counter
text_count_1 = Counter(text_1)
text_count_2 = Counter(text_2)
difference = list((text_count_1 - text_count_2).elements()) + list((text_count_2 - text_count_1).elements())
Here is the way to calculate the delta words.
from collections import Counter
text_count_1 = Counter(text_1)
text_count_2 = Counter(text_2)
delta = len(list((text_count_2 - text_count_1).elements())) \
        - len(list((text_count_1 - text_count_2).elements()))
print(delta)
2)
Use the Differ class from the difflib library. Pass both lists to the compare() method of a Differ instance and then iterate over the result with a for loop.
from difflib import Differ

difference = []
for d in Differ().compare(text_1, text_2):
    difference.append(d)
Then you can count the delta words like this.
from difflib import Differ

delta = 0
for d in Differ().compare(text_1, text_2):
    status = d[0]
    if status == "+":
        delta += 1
    elif status == "-":
        delta -= 1
print(delta)
3)
You can write a difference method yourself. For example:
def get_diff(list_1, list_2):
    d = []
    for item in list_1:
        if item not in list_2:
            d.append(item)
    return d

difference = get_diff(text_1, text_2) + get_diff(text_2, text_1)
I think there are other ways to do this, but I will limit it to these three.
Once you have the difference list, you can manage the output however you wish.
..and here is yet another way to do this with dict():
#!/usr/bin/python
import sys

def loadfile(filename):
    h = dict()
    f = open(filename)
    for line in f.readlines():
        words = line.split(' ')
        for word in words:
            h[word.strip()] = 1
    f.close()
    return h

first = loadfile(sys.argv[1])
second = loadfile(sys.argv[2])

print("in both first and second")
for k in first.keys():
    if k and k in second.keys():
        print(k)
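For completeness, plain sets give the added and removed words even more directly, though unlike Counter they ignore duplicate occurrences. A sketch assuming text_1 and text_2 are the word lists read earlier:

added = set(text_2) - set(text_1)    # words that appear only in the second file
removed = set(text_1) - set(text_2)  # words that appear only in the first file
print("added:", added)
print("removed:", removed)
print("delta:", len(added) - len(removed))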

Python read file contents into nested list

I have this file that contains something like this:
OOOOOOXOOOO
OOOOOXOOOOO
OOOOXOOOOOO
XXOOXOOOOOO
XXXXOOOOOOO
OOOOOOOOOOO
And I need to read it into a 2D list so it looks like this:
[[O,O,O,O,O,O,X,O,O,O,O],[O,O,O,O,O,X,O,O,O,O,O],[O,O,O,O,X,O,O,O,O,O,O],[X,X,O,O,X,O,O,O,O,O,O],[X,X,X,X,O,O,O,O,O,O,O],[O,O,O,O,O,O,O,O,O,O,O]]
I have this code:
ins = open(filename, "r")
data = []
for line in ins:
    number_strings = line.split()  # Split the line on runs of whitespace
    numbers = [(n) for n in number_strings]
    data.append(numbers)  # Add the "row" to your list.
return data
But it doesn't seem to be working because the O's and X's do not have spaces between them. Any ideas?
Just use data.append(list(line.rstrip())) instead. list() accepts a string as an argument and splits it into its individual characters.
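Put together, a minimal sketch (grid.txt is an assumed name for the file holding the X/O grid):

with open("grid.txt") as ins:
    # each line becomes a list of its characters, newline stripped
    data = [list(line.rstrip()) for line in ins]

print(data[0])  # ['O', 'O', 'O', 'O', 'O', 'O', 'X', 'O', 'O', 'O', 'O']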
