Convert string from external file to Python dictionary - string

I've looked around but I haven't really found a solution. I have an external TXT file that contains strings separated by colons, and each "item" is on a new line:
item1: value1
item2: value2
etc. you get the idea.
I am creating a script that works with a dictionary, and I need to convert the file to a Python dictionary. I'm still a newbie, so I'm kind of lost.
I have tried this, which I found in the question "String to dictionary":
dict((key, None) for key in string.split(': '))
But I can't figure out how to replace None with something that represents the value of a key. I did try dict((key, value)), but 'value' is not defined.

You can solve it like this:
dict((key, value) for key, value in [string.split(': ')])
Note that this will fail if ': ' occurs more than once in the string.
This piece of code works as follows:
>>> string = 'hello: world'
>>> string.split(': ')
['hello', 'world']
>>> a, b = string.split(': ')
>>> a
'hello'
>>> b
'world'
The values will be strings this way. If they should be integers, for example, use (key, int(value)). This is not failsafe either: int() raises a ValueError for non-integer values.
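If some values are numbers and others are not, one option is to attempt the conversion and keep the original string when it fails. A small sketch, with maybe_int as a hypothetical helper name:

```python
def maybe_int(value):
    """Return value converted to int when possible, otherwise unchanged."""
    try:
        return int(value)
    except ValueError:
        # Not an integer literal: keep the original string
        return value


lines = ['apples: 3', 'colour: red']
print({key: maybe_int(value) for key, value in (line.split(': ', 1) for line in lines)})
# {'apples': 3, 'colour': 'red'}
```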
Remark: As of Python 2.7, which you tagged your question with, you can use dictionary comprehensions. This is a bit cleaner:
{key: value for key, value in [string.split(': ')]}
This will, however, get you a lot of dictionaries of size 1. I think you'll need something like this:
{key: value for line in fileobject for key, value in [line.rstrip('\n').split(': ')]}
You could also use the split directly:
dict(tuple(line.rstrip('\n').split(': ')) for line in fileobject)
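For the whole file, a minimal sketch that also handles the two pitfalls mentioned above: lines read from a file end in "\n", which would otherwise stay in the values, and a value containing ': ' breaks a plain split(': '). parse_key_values is a hypothetical name, and the sketch assumes blank lines can simply be skipped:

```python
def parse_key_values(lines, sep=': '):
    """Build a dict from lines shaped like 'key: value'."""
    result = {}
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # ignore blank lines, e.g. at the end of the file
        # maxsplit=1: only the first separator splits, so values may contain ': '
        key, value = line.split(sep, 1)
        result[key] = value
    return result


print(parse_key_values(['item1: value1\n', 'item2: value2\n', 'note: a: b\n', '\n']))
# {'item1': 'value1', 'item2': 'value2', 'note': 'a: b'}
```

Because a file object yields its lines when iterated, the same function works on an open file, e.g. with open('text.txt') as f: data = parse_key_values(f). A non-blank line without the separator still raises ValueError, which is arguably the right signal for a malformed line.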

I also asked on #Python (FreeNode) for help with this, and there seems to be a neater solution:
fdict = {}
with open('text.txt', 'r') as fileobject:
    for line in fileobject:
        for key, value in [line.split(':')]:
            fdict[key] = value.strip()  # drop the space after ':' and the trailing newline
This seems cleaner and more understandable to me.

Related

How to sort a list of strings in Python ignoring digits in those strings?

I would like to sort a list of strings in Python, but any digits in those strings should be ignored by the sort. An example of such a list is below:
list = ['aaa', '1aaa', 'abc', '2abc', '3abc', 'b2bb', 'b3bb']
I found one topic on Stack Overflow, namely this one, but it did not answer my question.
After more research I found this page, but my implementation does not work:
import re
def numbers_sort(file):
    lines = []
    lines += [line for line in open(file).readlines()]
    print(''.join(sorted(lines, key=lambda key: [x for x in re.sub('^[-+]?[0-9]+$', '')])),end="")
I have also tried using isdigit() as the key of the sorted function, but to no avail.
Thank you for your help.
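The key argument expects a function that converts one string into the value to sort by. Here that is the string with its digits (and any +/- sign in front of them) removed:
import re
def numbers_sort(filename):
    with open(filename) as lines:
        print(''.join(sorted(lines, key=lambda s: re.sub('[-+]?[0-9]+', '', s))), end="")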
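The same key works on an in-memory list. A sketch, assuming only digits (not +/- signs) should be ignored, and that strings which become equal once their digits are removed should be ordered by their full text, so the result does not depend on the input order. sort_ignoring_digits is a hypothetical name, and the list is a shuffled copy of the one in the question:

```python
import re


def sort_ignoring_digits(words):
    """Sort strings as if their digits were not there.

    Ties (strings equal once digits are removed) are broken by the full
    original string, so the result is deterministic.
    """
    return sorted(words, key=lambda s: (re.sub('[0-9]+', '', s), s))


words = ['b3bb', 'abc', '1aaa', 'b2bb', 'aaa', '3abc', '2abc']
print(sort_ignoring_digits(words))
# ['1aaa', 'aaa', '2abc', '3abc', 'abc', 'b2bb', 'b3bb']
```

Without the tie-breaker, Python's sort is stable, so equal keys would simply keep their input order.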

How to assign a number to each value in Python

I am comparatively new to Python and data science, and I am working with a CSV file that looks something like this:
value1, value2
value3
value4...
Thing is, I want to assign a unique number to each of these values in the csv file such that the unique number acts as the key and the item in the CSV acts as the value like in a dictionary.
I tried using pandas, but if possible, I would like to know how to solve this without any libraries.
The desired output should be something like this:
{
    "value1": 1,
    "value2": 2,
    "value3": 3,
    ...and so on
}
I was just about to mention pandas before I saw that you wanted to do it in vanilla Python. Personally I'd use pandas, but here you go:
You can read in lines from a file, split them by delimiter (','), and then get your word tokens.
master_dict = {}
counter = 1
with open("your_csv.csv", "r") as f:
    for line in f:
        words = line.split(',')  # you may or may not want to add a call to .strip() as well
        for word in words:
            master_dict[counter] = word
            counter += 1
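The answer above uses the counter as the key, as the question's prose describes; the desired output in the question, however, maps each value to its number. A minimal sketch of that second direction, assuming values should be stripped of surrounding whitespace, empty fields skipped, and a repeated value should keep the number it got first (number_values is a hypothetical helper name). It uses the standard-library csv module, which also copes with quoted fields that contain commas:

```python
import csv
import io


def number_values(lines):
    """Map each distinct CSV value to a unique number, starting at 1."""
    numbering = {}
    for row in csv.reader(lines):
        for field in row:
            value = field.strip()
            # Skip empty fields and values that already have a number
            if value and value not in numbering:
                numbering[value] = len(numbering) + 1
    return numbering


# In-memory stand-in for the CSV file from the question
sample = io.StringIO("value1, value2\nvalue3\nvalue4\n")
print(number_values(sample))
# {'value1': 1, 'value2': 2, 'value3': 3, 'value4': 4}
```

With the real file, pass the open file object instead, e.g. with open("your_csv.csv", newline="") as f: result = number_values(f).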

Convert a string with components separated by symbols into a nested python dictionary

I'm struggling with a Python question and would appreciate any help. Please be patient; my Python is basic at the moment.
Question:
How do I transform a string structure like this:
text="key1=value1;key2=value2\nkeyA=valueA\n..."
into a Python dictionary like this:
{0:{'key1':'value1', 'key2':'value2'}, 1:{'keyA':'valueA'}}
Note that ';' separates items in the inner dictionary, while '\n' separates items in the outer dictionary.
The keys and values in the inner dictionaries are strings. The keys of the outer dictionary are indexes.
There needs to be another function to transform this Python dictionary back into its original string form.
Where I am now:
I was thinking of writing a loop that does something like this, but I'm struggling to write it:
a[0]["key1"]="value1"
a[0]["key2"]="value2"
a[1]["keyA"]="valueA"
The best I did was to split the string by '\n' like this:
text ='k1=v1;k2=v2\nk3=v3\nk4=v4'
text = text.split("\n")
output: ['k1=v1;k2=v2', 'k3=v3', 'k4=v4']
And looped the elements into the dictionary like this:
dic = {}
for i, x in enumerate(text):
    dic[i] = x
output: {0: 'k1=v1;k2=v2', 1: 'k3=v3', 2: 'k4=v4'}
But how do I get these values within the dictionary into the key, value structure as seen above?
You can use the following dict comprehension:
{i: dict(p.split('=', 1) for p in l.split(';')) for i, l in enumerate(text.split('\n')) if l}
With your sample input:
text="key1=value1;key2=value2\nkeyA=valueA\n"
This returns:
{0: {'key1': 'value1', 'key2': 'value2'}, 1: {'keyA': 'valueA'}}
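The question also asks for a function that turns the dictionary back into the original string. A minimal sketch of both directions, where text_to_dict and dict_to_text are hypothetical names and the serializer assumes every line should end with "\n", as in the sample text above. Unlike the one-liner, it drops empty lines before numbering, so a blank line in the middle of the text does not leave a gap in the indexes:

```python
def text_to_dict(text):
    """Parse lines of ';'-separated 'key=value' pairs into {index: {key: value}}."""
    lines = (line for line in text.split('\n') if line)  # drop empty lines first
    return {
        # maxsplit=1 keeps any further '=' inside the value
        i: dict(pair.split('=', 1) for pair in line.split(';'))
        for i, line in enumerate(lines)
    }


def dict_to_text(data):
    """Inverse of text_to_dict: one line per outer index, each ending in a newline."""
    lines = (
        ';'.join(f'{key}={value}' for key, value in data[i].items())
        for i in sorted(data)
    )
    return ''.join(line + '\n' for line in lines)


text = "key1=value1;key2=value2\nkeyA=valueA\n"
parsed = text_to_dict(text)
print(parsed)  # {0: {'key1': 'value1', 'key2': 'value2'}, 1: {'keyA': 'valueA'}}
print(dict_to_text(parsed) == text)  # True
```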
There may be a cleaner and more precise way to solve your problem, but for now you can manage with this one:
def make_dict(string):
    temp = {}
    string = string.split(';')
    string = [i.split('=') for i in string]
    for a, b in string:
        temp[a] = b
    return temp
text = 'k1=v1;k2=v2\nk3=v3\nk4=v4'
text = text.split('\n')
dic = {}
for i, x in enumerate(text):
    dic[i] = make_dict(x)
>>> print(dic)
{0: {'k1': 'v1', 'k2': 'v2'}, 1: {'k3': 'v3'}, 2: {'k4': 'v4'}}
If you want to reverse the above process, it can be done as follows:
def convert_again(dct):
    fetch_values = list(dct.values())
    change_values_to_list = [list(i.items()) for i in fetch_values]
    # add "=" in between the key-value pairs
    for i in range(len(change_values_to_list)):
        for j in range(len(change_values_to_list[i])):
            change_values_to_list[i][j] = '='.join(change_values_to_list[i][j])
    # Now add ";"
    answer = [';'.join(i) for i in change_values_to_list]
    # Now add "\n" (a real newline, not a backslash followed by "n")
    final_answer = '\n'.join(answer)
    return final_answer
# Driver code
dct = {0: {'k1': 'v1', 'k2': 'v2'}, 1: {'k3': 'v3'}, 2: {'k4': 'v4'}}
print(repr(convert_again(dct)))  # --> 'k1=v1;k2=v2\nk3=v3\nk4=v4'
I've written a solution that can be extended to other examples as well.
I've created a more complicated example. Notice that there is another set separated by ';'. If I can demonstrate it for this example, it should work for others as well.
It is important to note that this only works if text ends with "\n". If it does not, remove the line list1.remove("").
text="key3=value3;key4=value4\nkey1=value1;key2=value2\nkeyA=valueA\nkeyB=valueB\n"
I first split by "\n". Since text ends with "\n", the resulting list contains an empty string "", which I remove:
list1 = text.split("\n")
list1.remove("")
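As an alternative to the list1.remove("") step, str.splitlines() never produces that trailing empty string, so the same code works whether or not text ends with "\n". A short demonstration using the text above (splitlines() also treats other line boundaries such as "\r\n" as line breaks, which is usually what you want for text read from a file):

```python
text = "key3=value3;key4=value4\nkey1=value1;key2=value2\nkeyA=valueA\nkeyB=valueB\n"

# split("\n") leaves an empty string after the final newline...
print(text.split("\n")[-1] == "")  # True

# ...while splitlines() does not, with or without the trailing newline.
list1 = text.splitlines()
print(list1)  # ['key3=value3;key4=value4', 'key1=value1;key2=value2', 'keyA=valueA', 'keyB=valueB']
print(list1 == text.rstrip("\n").splitlines())  # True
```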
Now I create two lists: one for the strings that contain ";" and another for the strings that do NOT:
list2 = []  # Stuff separated by ;
list3 = []  # Stuff not separated by ;
for items in list1:
    if ';' in items:
        list2.append(items)
    else:
        list3.append(items)
I created an empty dictionary that will eventually hold what you want:
result_dict = {} #To store all key, value pairs
list2 now has ['key3=value3;key4=value4', 'key1=value1;key2=value2']
# First store the key, value pairs separated by ";"
for i in range(0, len(list2)):
    a = list2[i].split(";")
    result_dict[i] = dict(s.split('=') for s in a)
list3 has ['keyA=valueA', 'keyB=valueB']
# Now store the key, value pairs not separated by ";", numbered after the entries above
nosemicolon_dict = dict(s.split('=') for s in list3)
offset = len(result_dict)
for j, (key, value) in enumerate(nosemicolon_dict.items()):
    result_dict[offset + j] = {key: value}
For your example, run the same code above with text replaced by your string, and check whether your string ends with "\n". If it does NOT, remove the line list1.remove("").
print(result_dict)
gave me:
{0: {'key1': 'value1', 'key2': 'value2'}, 1: {'keyA': 'valueA'}}

Python trouble debugging I/O: how do I get the correct format?

I am trying to turn a dictionary into a formatted string and then write it to a file, but my formatting seems to be entirely wrong. I'm not sure how to debug it, since each of my test cases uses a different file. I was able to use Python's interactive mode to see what my function actually writes to the file, and man, is it wrong! Can you help me format it correctly?
Given a sorted dictionary, I turn it into a string. I need the function to return it like so:
Dictionary is : {'orange':[1,3],'apple':[2]}
"apple:\t2\norange:\t1,\t3\n"
The format is: every key-value pair of the dictionary should be output as a string that starts with the key, followed by ":", a tab, then the integers from the value list. Every integer should be followed by a "," and a tab, except for the very last one, which should be followed by a newline.
Here is my function that I thought would work:
def format_item(key,value):
    return key+ ":\t"+",\t".join(str(x) for x in value)
def format_dict(d):
    return sorted(format_item(key,value) for key, value in d.items())
def store(d,filename):
    with open(filename, 'w') as f:
        f.write("\n".join(format_dict(d)))
        f.close()
    return None
I now have too many tabs on the last line. How do I edit the last line only out of the for loop?
ex input:
d = {'orange':[1,3],'apple':[2]}
my function gives: ['apple:\t2', 'orange:\t1,\t3']
but should give: "apple:\t2\norange:\t1,\t3\n"
Adding the newline character to the end of the return statement in format_item yields the correct items (store should then join them with "" instead of "\n", otherwise the file gets a blank line between entries):
return key+ ":\t"+",\t".join(str(x) for x in value) + '\n'
In [10]: format_dict(d)
Out[10]: ['apple:\t2\n', 'orange:\t1,\t3\n']
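Putting this together, a sketch of the whole pipeline, assuming it is acceptable for format_dict to return a single string rather than the list in the question. It also sorts by key instead of sorting the formatted lines, and drops the f.close() call, since the with block already closes the file:

```python
def format_item(key, value):
    # key, ":", a tab, then the integers separated by ",\t", ending in a newline
    return key + ":\t" + ",\t".join(str(x) for x in value) + "\n"


def format_dict(d):
    # One formatted line per key, in sorted key order, joined into one string
    return "".join(format_item(key, d[key]) for key in sorted(d))


def store(d, filename):
    # The with block closes the file, so no explicit f.close() is needed
    with open(filename, "w") as f:
        f.write(format_dict(d))


print(repr(format_dict({'orange': [1, 3], 'apple': [2]})))
# 'apple:\t2\norange:\t1,\t3\n'
```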

Create dictionary from Fasta file

I recently picked up a program I started some time ago (sorting and listing of gene code), and since I am a beginner and couldn't find anything about this specific problem online, I need some help.
I want to create a dictionary from a Fasta (fna) file that looks like this
(I uploaded the actual file because the editor distorts everything I copy and paste; I hope that is allowed):
http://www.filedropper.com/firstreads
http://pastebin.com/NNXV09A7 <- smallReads
I know how to make a dictionary manually, but I have no idea how to combine that with reading the dictionary entries from a file.
I appreciate any help.
Edit: a manual dict from the example above:
dict= {'ATTC': 'T', 'CATT': 'C'}
or using the constructor:
dict([('ATTC', 'T'), ('CATT', 'C')])
Edit 2:
Thanks to Byte Commander, I was able to make the function by adding parameters:
def makeSuffixDict (inputfile="smallReads.fna", n=15):
my_dict = {}
with open(inputfile) as file:
for line in file:
word = line.split()[-1]
my_dict[word[:-n]] = word[-n:]
return()
if __name__ == "__main__":
makeSuffixDict
for keys,values in my_dict.items():
print(keys)
print(values)
print('\n')
However, when I try to change the suffix length, I get the same result. How can I make the suffix length variable?
There is also one small issue: the "'': '5'" entry at the beginning. Why is it there, and can I make it not show up?
{'': '5', 'ATTG': 'T', 'GCCC': 'T', 'TTTT': 'T', 'AGTC': 'C'}
Similarly, when I try another file with 30000 reads instead of 5, numbers pop up every now and then, and I have no idea where they come from.
Example:
CAAGATCTAATATGAATTACAGAGAGCTGTTCAGCAAATACTTGTTGCATCAATGGAATTACAGCAGTAACACATATATTGACCTGGAACCAGAATCATGTTCTGAATGCAGAAGTACGTACTTTCTTTTTCTTTCTTGAGAACGCTGGATCTTTTTTAAAATGTTAATTTGCAGTTTGAAGCTGTTTAGGTTAAAAAAAAAATACAAGAAGCAGCAGCAAAAGAGACC : A
2407 : 9
ATTCTTTCATACCATTAAATATTTATTTTTCAAAACTGATCTTAGTAGAGGCCTAGTACTGTCTCATATAAATATAGGATAATATATATAATAAATCCCCTGACATCAGACATTAAGGTTACTCCCAATTACTTATTATCTTTATATATATGTTAAAAATATGTGTGTATAATATGTAAGTAAACAATTTGCATAGTTTATATGTGGTAATATATGGTTAATATATAGG : C
# create an empty dictionary:
my_dict = {}
# open the file:
with open("file.fna") as file:
    # read the file line by line:
    for line in file:
        # split at whitespace and take the last word:
        word = line.split()[-1]
        # add entry to dictionary: all but the last character -> key; last character -> value
        my_dict[word[:-1]] = word[-1]
See this code running on ideone.com (reading from STDIN instead of file though...)
Update:
If you want variable length suffixes, replace the last line above with this, where n is the length of the suffix which becomes the dictionary value:
my_dict[word[:-n]] = word[-n:]
See this code running on ideone.com (reading from STDIN instead of file though...)
Update 2:
Your code as stated in the question has some problems with the indentation. Also, makeSuffixDict in the main block has no parentheses after it, but you need them to call a function. And return() only returns an empty tuple: you need to return the dictionary created in the function to work with it outside.
I also now parse the 3rd whitespace-separated word in each row instead of the last one. Lines with not exactly 3 words are ignored.
Here's my version:
def makeSuffixDict(inputfile="smallReads.fna", n=15):
    my_dict = {}
    with open(inputfile) as file:
        for line in file:
            words = line.split()
            if len(words) == 3:  # <-- replaced "last word" with "3rd word"
                word = words[2]
                my_dict[word[:-n]] = word[-n:]
    return my_dict
if __name__ == "__main__":
    my_dict = makeSuffixDict()
    for key, value in my_dict.items():
        print(key, value)
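A minimal sketch of the same idea as a function that takes any iterable of lines (an open file works too), so it can be tried without the data file. It keeps the answer's 3-word rule and adds two guards. Slicing a word that is not longer than n leaves an empty key: '5'[:-15] is '' and '5'[-15:] is '5', which would produce exactly an '': '5' entry. And n=0 would put the whole read in the value, because word[:-0] is ''. make_suffix_dict and the sample lines are hypothetical; the real file's layout is only known from the links above.

```python
def make_suffix_dict(lines, n=15):
    """Map each read's prefix (all but the last n characters) to its n-character suffix.

    Like the answer above, assumes the read is the 3rd whitespace-separated
    word on a line; all other lines are skipped.
    """
    if n < 1:
        # word[:-0] would be '' and word[-0:] the whole read, so reject n < 1
        raise ValueError("suffix length n must be at least 1")
    suffixes = {}
    for line in lines:
        words = line.split()
        if len(words) != 3:
            continue
        read = words[2]
        if len(read) <= n:
            continue  # too short to leave a non-empty prefix
        suffixes[read[:-n]] = read[-n:]
    return suffixes


# Hypothetical lines in a "name count read" layout
sample = ["r1 5 ATTCT", ">header line", "r2 5 CATTC"]
print(make_suffix_dict(sample, n=1))  # {'ATTC': 'T', 'CATT': 'C'}
print(make_suffix_dict(sample, n=2))  # {'ATT': 'CT', 'CAT': 'TC'}
```

With the real file: with open("smallReads.fna") as f: suffixes = make_suffix_dict(f, n=15).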
