How to use files to recreate a string from its words and positions

I want to use files to recreate the sentence.
sentence = 'the cat sat on the cat mat'
indivdual_words = ['the', 'cat', 'sat', 'on', 'mat']
positions = [1, 2, 3, 4, 1, 2, 5]
f = open('word_file.txt', 'w+')
f.write(str(indivdual_words))
f.close()
f = open('pos_file.txt', 'w+')
f.write(str(positions))
f.close()
The program should read 1 as 'the', 2 as 'cat', and so on.

Since you're storing everything as strings, you'll end up with file contents that are valid Python expressions. You can use ast.literal_eval to get the actual Python object back out of the string representation.
import ast
with open('word_file.txt') as f:
    data = f.read().strip()
    words = ast.literal_eval(data)
with open('pos_file.txt') as f:
    data = f.read().strip()
    pos = ast.literal_eval(data)
Then just do the opposite of what you did before.
result = " ".join([words[i-1] for i in pos])

Since you're dumping the representation of the lists, the best way is to read them back using ast.literal_eval
import ast
with open('word_file.txt') as f:
    indivdual_words = ast.literal_eval(f.read())
with open('pos_file.txt') as f:
    positions = ast.literal_eval(f.read())
then recreate the sentence using a list comprehension to generate the words in sequence, joined with spaces:
sentence = " ".join([indivdual_words[i-1] for i in positions])
result:
the cat sat on the cat mat
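For reference, the whole round trip can be sketched end to end. This is only a sketch under stated assumptions, not the asker's exact program: the helper names `save_sentence` and `load_sentence` are hypothetical, and a temporary directory is used so the example does not touch real files. The file names follow the question.

```python
import ast
import os
import tempfile


def save_sentence(sentence, word_path, pos_path):
    """Write the unique words and their 1-based positions to two files."""
    words = []
    positions = []
    for word in sentence.split():
        if word not in words:
            words.append(word)
        positions.append(words.index(word) + 1)  # 1-based, as in the question
    with open(word_path, "w") as f:
        f.write(str(words))
    with open(pos_path, "w") as f:
        f.write(str(positions))


def load_sentence(word_path, pos_path):
    """Read both files back and rebuild the sentence."""
    with open(word_path) as f:
        words = ast.literal_eval(f.read())
    with open(pos_path) as f:
        positions = ast.literal_eval(f.read())
    return " ".join(words[i - 1] for i in positions)


# Hypothetical demo in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    word_path = os.path.join(tmp, "word_file.txt")
    pos_path = os.path.join(tmp, "pos_file.txt")
    save_sentence("the cat sat on the cat mat", word_path, pos_path)
    rebuilt = load_sentence(word_path, pos_path)

print(rebuilt)
```

Using `with` closes each file automatically, so no explicit `close()` call is needed.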

After you create the read/write-capable file objects (w for the word file, n for the index file):
1) Iterate through the word file object, appending each word to an initially empty list.
2) Iterate through the index file object, look up each word in the word list by its index, and append that word to the initially empty sentence you are building.
# assumes each file holds one entry per line
word_list = []
for word in w:
    word_list.append(word.strip())
sentence = ''
for index in n:
    # lines are strings, and the positions are 1-based
    word = word_list[int(index) - 1]
    sentence += word
    sentence += ' '
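Iterating over a file object yields one line at a time, so this approach fits files that hold one entry per line rather than the `str(list)` format written in the question. A minimal sketch under that assumption follows; the one-entry-per-line layout and the temporary directory are choices made here for illustration.

```python
import os
import tempfile

indivdual_words = ['the', 'cat', 'sat', 'on', 'mat']
positions = [1, 2, 3, 4, 1, 2, 5]

with tempfile.TemporaryDirectory() as tmp:
    word_path = os.path.join(tmp, "word_file.txt")
    pos_path = os.path.join(tmp, "pos_file.txt")

    # Write one entry per line instead of str(list).
    with open(word_path, "w") as w:
        w.write("\n".join(indivdual_words))
    with open(pos_path, "w") as n:
        n.write("\n".join(str(p) for p in positions))

    # Read the words back, dropping the newline at the end of each line.
    with open(word_path) as w:
        word_list = [line.strip() for line in w]

    # Each line of the position file is text, so convert it to int.
    # Positions are 1-based while list indexes are 0-based.
    with open(pos_path) as n:
        sentence = " ".join(word_list[int(line) - 1] for line in n)

print(sentence)
```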

Related

How to iterate through all keys within dic with same values one by one with sequence

I'm working on a text file that contains a large number of words, and I want to get all the words along with their lengths. For example, first I want all words whose length is 2, then 3, then 4, and so on up to 15.
Word = this , length = 4
hate :4
love :4
that:4
china:5
Great:5
and so on up to 15
I tried to do this with the following code, but I couldn't iterate through all the keys one by one. With this code I can only get the words of length 5, but I want the loop to run from 2 up to 15 in sequence.
text = open(r"C:\Users\israr\Desktop\counter\Bigdata.txt")
d = dict()
for line in text:
    line = line.strip()
    line = line.lower()
    words = line.split(" ")
    for word in words:
        if word not in d:
            d[word] = len(word)
def getKeysByValue(d, valueToFind):
    listOfKeys = list()
    listOfItems = d.items()
    for item in listOfItems:
        if item[1] == valueToFind:
            listOfKeys.append(item[0])
    return listOfKeys
listOfKeys = getKeysByValue(d, 5)
print("Keys with value equal to 5")
#Iterate over the list of keys
for key in listOfKeys:
    print(key)
What I have done is:
Changed the structure of your dictionary:
In your version of the dictionary, each word is a key whose value is the word's length. Like this:
{"hate": 4, "love": 4}
New version:
{4: ["hate", "love"], 5: ["great", "china"]}
Now the keys are integers and the values are lists of words. For instance, if the key is 4, the value is a list of all words of length 4 in the file.
After that, the code populates the dictionary from the data read from the file. If the key is not yet present in the dictionary, it is created; otherwise the word is appended to the list stored under that key.
The keys are sorted and their values are printed, so all words of each length are printed in sequence.
You forgot to close the file in your code. It is good practice to release any resource a program uses once it is done with it, to avoid resource leaks and similar errors. Most of the time this just means closing the resource. Closing the file, for instance, releases it so that other programs can use it.
# 24-Apr-2020
# 03:11 AM (GMT +05)
# TALHA ASGHAR
# Open the file to read data from
myFile = open(r"books.txt")
# create an empty dictionary where we will store words grouped by length
# format of data in dictionary will be:
# {1: [words from file of length 1], 2:[words from file of length 2], ..... so on }
d = dict()
# iterate over all the lines of our file
for line in myFile:
    # get words from the current line
    words = line.lower().strip().split(" ")
    # iterate over each word from the current line
    for word in words:
        # get the length of this word
        length = len(word)
        # if there is no word of this length in the dictionary yet,
        # create a list against this length
        # length is the key, and the value is the list of words with this length
        if length not in d.keys():
            d[length] = [word]
        # if there is already a word of this length append current word to that list
        else:
            d[length].append(word)
for key in sorted(d.keys()):
    print(key, end=":")
    print(d[key])
myFile.close()
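A variant of the same grouping idea, restricted to the 2 to 15 range the question asks about, might look like the sketch below. The function name `group_by_length`, its parameters, and the sample lines are assumptions for illustration. The function accepts any iterable of lines, so an open file object from `with open(...) as f:` could be passed instead of the sample list.

```python
from collections import defaultdict


def group_by_length(lines, min_len=2, max_len=15):
    """Map each word length in [min_len, max_len] to its unique words."""
    groups = defaultdict(list)
    for line in lines:
        # split() with no argument also discards empty strings from blank lines
        for word in line.lower().split():
            length = len(word)
            if min_len <= length <= max_len and word not in groups[length]:
                groups[length].append(word)
    return groups


# Hypothetical sample data standing in for the lines of a text file.
sample_lines = ["I hate that", "love China", "Great love"]
groups = group_by_length(sample_lines)

# Print the words in ascending order of length.
for length in sorted(groups):
    for word in groups[length]:
        print(f"{word}:{length}")
```

A `defaultdict(list)` removes the need for the explicit "is this key present" check, because a missing key starts out as an empty list.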
The first part of your code is correct: dictionary d gives you all the unique words with their respective lengths.
Now you want all the words ordered by length, as shown below:
{'this':4, 'that':4, 'water':5, 'china':5, 'great':5.......till length 15}
To get that ordering, sort the dictionary's items by their values:
import operator
sorted_d = sorted(d.items(), key=operator.itemgetter(1))
sorted_d is a list of (word, length) tuples in ascending order of length:
[('this', 4), ('that', 4), ('water', 5), ('china', 5), ('great', 5), ......., ('abcdefghijklmno', 15), ...]
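Once the items are sorted by length, `itertools.groupby` can collect the words of each length, and a `range(2, 16)` loop then walks the lengths in sequence. This is a sketch with a small hypothetical dictionary standing in for the one built from the file.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical word -> length dictionary, as built in the question.
d = {'china': 5, 'this': 4, 'great': 5, 'that': 4, 'at': 2}

# Sort by length; sorted() is stable, so ties keep their original order.
sorted_d = sorted(d.items(), key=itemgetter(1))

# groupby needs its input sorted by the same key it groups on.
by_length = {
    length: [word for word, _ in pairs]
    for length, pairs in groupby(sorted_d, key=itemgetter(1))
}

# Walk the lengths 2..15 in order, skipping lengths with no words.
for length in range(2, 16):
    for word in by_length.get(length, []):
        print(f"{word}:{length}")
```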

Writing sequences into separate list or array

I'm trying to extract these sequences from a file into separate lists or arrays in Python.
My data looks like:
>gene_FST
AGTGGGTAATG--TGATG...GAAATTTG
>gene_FPY
AGT-GG..ATGAAT---AAATGAAAT--G
I would like to have
seq1 = [AGTGGGTAATG--TGATG...GAAATTTG]
seq2 = [AGT-GG..ATGAAT---AAATGAAAT--G]
My plan is to compare the contents of the lists later.
I would appreciate any advice.
So far, here's what I have done:
f = open(r"C:\Users\Olukayode\Desktop\my_file.txt", 'r')  # the r prefix before the quotes makes the path a raw string
def parse_fasta(lines):
    seq = []
    seq1 = []
    seq2 = []
    head = []
    data = ''
    for line in lines:
        if line.startswith('>'):
            if data:
                seq.append(data)
                data = ''
            head.append(line[1:])
        else:
            data += line.rstrip()
    seq.append(data)
    return seq
h = parse_fasta(f)
print(h)
print(h[0])
print(h[1])
gives:
['AGTGGGTAATG--TGATG...GAAATTTG', 'AGT-GG..ATGAAT---AAATGAAAT--G']
AGTGGGTAATG--TGATG...GAAATTTG
AGT-GG..ATGAAT---AAATGAAAT--G
I think I just figured it out: I can put each string from the list containing both sequences into its own separate list, if that is possible.
If you want to get the exact results you were looking for in your original question, i.e.
seq1 = [AGTGGGTAATG--TGATG...GAAATTTG]
seq2 = [AGT-GG..ATGAAT---AAATGAAAT--G]
you can do it in a variety of ways. Instead of changing anything you already have though, you can just convert your data into a dictionary and print the dictionary items.
# your code block...
h = parse_fasta(f)
sDict = {}
for i in range(len(h)):
    sDict["seq"+str(i+1)] = [h[i]]
for seq, data in sDict.items():
    print(seq, "=", data)
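Since the stated goal is to compare the sequences later, keying each sequence by its header may be more convenient than numbered names. The sketch below is a rewritten parser, not the asker's function: the dictionary return value, the in-memory `sample` lines, and the position-by-position comparison are all assumptions for illustration.

```python
def parse_fasta(lines):
    """Return {header: sequence} for FASTA-style lines."""
    sequences = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            sequences[header] = ""
        elif header is not None:
            # a sequence may span several lines, so keep appending
            sequences[header] += line
    return sequences


# Sample data from the question, used here in place of the file.
sample = [
    ">gene_FST",
    "AGTGGGTAATG--TGATG...GAAATTTG",
    ">gene_FPY",
    "AGT-GG..ATGAAT---AAATGAAAT--G",
]
seqs = parse_fasta(sample)
seq1, seq2 = seqs["gene_FST"], seqs["gene_FPY"]

# Indexes at which the two aligned sequences differ.
mismatches = [i for i, (a, b) in enumerate(zip(seq1, seq2)) if a != b]
print(len(seq1), len(seq2), len(mismatches))
```

An open file object can be passed to `parse_fasta` directly, because iterating over it yields lines just like the list does.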

Return a dictionary with keys that are the first letter of a word and lists of those words?

I want to write a function that takes a list of words and keys and outputs those keys as dictionary keys with any words starting with that letter attached.
How could this be achieved using simple python 3 code?
eg. takes (['apples', 'apple', 'bananna', 'fan'], 'fad')
returns {'a' : ['apple', 'apples'], 'f' : ['fan']}
So far I have tried:
def dictionary(words, char_keys):
    char_keys = remove_duplicates(char_keys)
    ret = {}
    keys_in_dict = []
    words = sorted(words)
    for word in words:
        if word[0] in char_keys and word[0] not in keys_in_dict:
            ret[word[0]] = word
            keys_in_dict.append(word[0])
        elif word[0] in keys_in_dict:
            ret[word[0]] += (word)
    return ret
This gives roughly the right output, but the words for each key end up in a single string rather than a list of strings.
If the input is a list of strings, you can check if the char is in the dict, if yes, append the word, otherwise add a list with the word:
def dictionary(inpt):
    result = {}
    for word in inpt:
        char = word[0]
        if char in result:
            result[char].append(word)
        else:
            result[char] = [word]
    return result
The modern way to do this is to use a collections.defaultdict with list as argument.
from collections import defaultdict
def dictionary(inpt):
    result = defaultdict(list)
    for word in inpt:
        result[word[0]].append(word)
    return result
I'm not sure whether your input list consists only of strings or can also include sub-lists of strings (and I'm not sure why "fad" disappeared in your example). In the latter scenario it would obviously need more effort. For simplicity I assume it contains only strings; here is a piece of code that hopefully points in the right direction:
# input_list is assumed to be (words, keys), e.g. (['apples', 'apple', 'bananna', 'fan'], 'fad')
d = {}
for elem in input_list[0]:
    if elem[0] in input_list[1]:
        lst = d.get(elem[0], [])
        lst.append(elem)
        d[elem[0]] = lst
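Putting the pieces together, a function that takes both the words and the key characters, as the question asks, could be sketched as follows. The name `words_by_first_letter` is hypothetical, and sorting the words first is an assumption made so the lists match the ordering shown in the question's example.

```python
from collections import defaultdict


def words_by_first_letter(words, char_keys):
    """Group words under their first letter, keeping only letters in char_keys."""
    wanted = set(char_keys)  # a set also removes duplicate key characters
    grouped = defaultdict(list)
    for word in sorted(words):
        # skip empty strings, which have no first letter
        if word and word[0] in wanted:
            grouped[word[0]].append(word)
    return dict(grouped)


print(words_by_first_letter(['apples', 'apple', 'bananna', 'fan'], 'fad'))
# prints {'a': ['apple', 'apples'], 'f': ['fan']}
```

The key 'd' does not appear in the result because no word starts with it; a key only exists once a word has been appended under it.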

Creating a list of objects from a dictionary

First of all, I have a function that counts words in a text file, and a program that creates a dictionary based on how many occurrences of each word there are in that text file. The program is:
def counter(AllWords):
    d = {}
    for word in AllWords:
        if word in d.keys():
            d[word] = d[word] + 1
        else:
            d[word] = 1
    return d
f = open("test.txt", "r")
AllWords = []
for word in f.read().split():
    AllWords.append(word.lower())
print(counter(AllWords))
Now, given that dictionary, I want to create a list of objects such that each object has two instance variables: the word (string) and how many times it appears (integer). Any help is appreciated!
What about:
list(d.items())
It will create a list of tuples like:
[('Foo',3),('Bar',2)]
Or you can define your own class:
class WordCount:
    def __init__(self, word, count):
        self.word = word
        self.count = count
and use list comprehension:
[WordCount(*item) for item in d.items()]
So here you create a list of WordCount objects.
Nevertheless, your counter(..) function is not actually necessary: Python already has a Counter:
from collections import Counter
which is "a dictionary with things" so to speak: you can simply construct it like:
from collections import Counter
Counter(AllWords)
No need to reinvent the wheel to count items.
What about a quasi one-liner to do all the heavy lifting, using of course collections.Counter and the mighty str.split ?
import collections
with open("text.txt") as f:
    c = collections.Counter(f.read().split())
Now c contains the pairs (word, number of occurrences of the word).
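The two suggestions can be combined: count with `Counter`, then wrap each pair in a small object. The sketch below uses a dataclass in place of the hand-written class, which is a substitution made here for brevity, and a hypothetical in-memory string instead of a file.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class WordCount:
    word: str
    count: int


# Hypothetical text standing in for the contents of the file.
text = "the cat sat on the mat the end"
counts = Counter(text.lower().split())

# most_common() yields (word, count) pairs, highest count first.
word_counts = [WordCount(word, count) for word, count in counts.most_common()]

print(word_counts[0])
```

The dataclass generates `__init__`, `__repr__`, and `__eq__` automatically, so the objects print readably and can be compared.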

Create a dictionary from a file

I am creating a code that allows the user to input a .txt file of their choice. So, for example, if the text read:
"I am you. You ArE I."
I would like my code to create a dictionary that resembles this:
{I: 2, am: 1, you: 2, are: 1}
Having the words in the file appear as the key, and the number of times as the value. Capitalization should be irrelevant, so are = ARE = ArE = arE = etc...
This is my code so far. Any suggestions/help?
file = input("\n Please select a file")
name = open(file, 'r')
dictionary = {}
with name:
    for line in name:
        (key, val) = line.split()
        dictionary[int(key)] = val
Take a look at the examples in this answer:
Python : List of dict, if exists increment a dict value, if not append a new dict
You can use collections.Counter() to trivially do what you want, but if for some reason you can't use that, you can use a defaultdict or even a simple loop to build the dictionary you want.
Here is code that solves your problem. This will work in Python 3.1 and newer.
from collections import Counter
import string
def filter_punctuation(s):
    return ''.join(ch if ch not in string.punctuation else ' ' for ch in s)
def lower_case_words(f):
    for line in f:
        line = filter_punctuation(line)
        for word in line.split():
            yield word.lower()
def count_key(tup):
    """
    key function to make a count dictionary sort into descending order
    by count, then case-insensitive word order when counts are the same.
    tup must be a tuple in the form: (word, count)
    """
    word, count = tup
    return (-count, word.lower())
dictionary = {}
fname = input("\nPlease enter a file name: ")
with open(fname, "rt") as f:
    dictionary = Counter(lower_case_words(f))
print(sorted(dictionary.items(), key=count_key))
From your example I could see that you wanted punctuation stripped away. Since we are going to split the string on white space, I wrote a function that filters punctuation to white space. That way, if you have a string like hello,world this will be split into the words hello and world when we split on white space.
The function lower_case_words() is a generator, and it reads an input file one line at a time and then yields up one word at a time from each line. This neatly puts our input processing into a tidy "black box" and later we can simply call Counter(lower_case_words(f)) and it does the right thing for us.
Of course you don't have to print the dictionary sorted, but I think it looks better this way. I made the sort order put the highest counts first, and where counts are equal, put the words in alphabetical order.
With your suggested input, this is the resulting output:
[('i', 2), ('you', 2), ('am', 1), ('are', 1)]
Because of the sorting it always prints in the above order.
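The answer also mentions that a simple loop can build the dictionary when Counter is not an option. A minimal sketch of that alternative follows; the function name `count_words` is hypothetical, and it strips punctuation only from the edges of each word, which differs from replacing every punctuation character with a space (so `hello,world` would stay one word here).

```python
import string


def count_words(lines):
    """Count case-insensitive words with a plain dictionary."""
    counts = {}
    for line in lines:
        for raw in line.lower().split():
            # remove leading and trailing punctuation such as "." or ","
            word = raw.strip(string.punctuation)
            if word:
                counts[word] = counts.get(word, 0) + 1
    return counts


# The sample text from the question, passed as a list of lines.
print(count_words(["I am you. You ArE I."]))
# prints {'i': 2, 'am': 1, 'you': 2, 'are': 1}
```

`dict.get(word, 0)` returns 0 for a word not seen yet, which avoids a separate membership check.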

Resources