Converting a string to a dictionary from an opened file - python-3.x

A text file contains a dictionary like this:
{
"A":"AB","B":"BA"
}
Here is the code of my Python file:
with open('devices_file') as d:
print (d["A"])
It should print AB.

As @rassar and @Ivrf suggested in the comments, you can use either ast.literal_eval() or json.load() to achieve this. Both snippets below print AB.
Solution with ast.literal_eval():
import ast
with open("devices_file", "r") as d:
    content = d.read()
result = ast.literal_eval(content)
print(result["A"])
Solution with json.load():
import json
with open("devices_file") as d:
    content = json.load(d)
print(content["A"])
Python documentation: ast.literal_eval() and json.load().
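If you can't be sure whether the file will hold strict JSON or a Python-style literal (single quotes, True/None), the two approaches can be combined. The following is a minimal sketch, not part of the answer above: parse_dict and load_dict are hypothetical names, and it assumes the whole file holds a single dictionary literal.

```python
import ast
import json


def parse_dict(text):
    """Parse a dictionary literal from a string.

    json.loads is tried first. If the text is not valid JSON (for example it
    uses single quotes, or Python's True/None), fall back to ast.literal_eval,
    which accepts Python literal syntax but evaluates only literals, never
    arbitrary expressions.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return ast.literal_eval(text)


def load_dict(path):
    """Read the whole file and parse it with parse_dict (hypothetical helper)."""
    with open(path, "r") as f:
        return parse_dict(f.read())
```

With the devices_file from the question, load_dict("devices_file")["A"] should return "AB". Trying JSON first keeps the stricter parser as the default; ast.literal_eval is only reached when the text is not valid JSON.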
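Also: the code snippet in your question has a couple of problems. The line inside the with block must be indented (4 spaces by convention), and d is a file object, so it can't be indexed like d["A"]; you have to read and parse its contents first, as shown above. A space between print and its parentheses is legal in Python 3, although PEP 8 recommends against it.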

Related

I'm looking for a way to extract strings from a text file using specific criteria

I have a text file containing random strings. I want to use specific criteria to extract the strings that match them.
Example text:
B311-SG-1700-ASJND83-ANSDN762
BAKSJD873-JAN-1293
Example criterion:
All the strings that contain characters separated by hyphens this way: XXX-XX-XXXX
Output : 'B311-SG-1700'
I tried writing a function, but I can't figure out how to express criteria for strings or how to apply them.
Based on your comment, here is a Python script that might do what you want (I'm not that familiar with Python).
import re
p = re.compile(r'\b(.{4}-.{2}-.{4})')
results = p.findall('B111-SG-1700-ASJND83-ANSDN762 BAKSJD873-JAN-1293\nB211-SG-1700-ASJND83-ANSDN762 BAKSJD873-JAN-1293 B311-SG-1700-ASJND83-ANSDN762 BAKSJD873-JAN-1293')
print(results)
Output:
['B111-SG-1700', 'B211-SG-1700', 'B311-SG-1700']
You can read a file into a string like this:
with open("file.txt", "r") as text_file:
    data = text_file.read()
Then use findall on that string. Depending on the size of the file, it might require a bit more work (for example, reading it line by line).
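Reading line by line can be sketched as below, assuming the same four-two-four pattern used in the answers. find_codes is a hypothetical helper. Because . does not match a newline, no match can span two lines, so scanning each line separately should give the same results as calling findall on the whole file, while holding only one line in memory at a time.

```python
import re

# Four characters, a hyphen, two characters, a hyphen, four characters,
# starting at a word boundary (the pattern used in the answers).
PATTERN = re.compile(r"\b.{4}-.{2}-.{4}")


def find_codes(lines):
    """Yield every match, scanning one line at a time.

    `lines` can be an open file object or any iterable of strings.
    """
    for line in lines:
        yield from PATTERN.findall(line)
```

Usage, assuming the file.txt from above: inside a with open("file.txt") as text_file: block, list(find_codes(text_file)) collects all matches without reading the whole file at once.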
You can use the re module to extract the pattern from the text:
import re
text = """\
B311-SG-1700-ASJND83-ANSDN762 BAKSJD873-JAN-1293
BAKSJD873-JAN-1293 B312-SG-1700-ASJND83-ANSDN762"""
for m in re.findall(r"\b.{4}-.{2}-.{4}", text):
    print(m)
Prints:
B311-SG-1700
B312-SG-1700
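One caveat with both answers: . matches any character except a newline, including spaces and hyphens, so a string like AB C-DE-FGHI would also match. If each group should contain only ASCII letters and digits (an assumption about your data), a character class tightens the pattern. A sketch, with STRICT, LOOSE and strict_codes as illustrative names:

```python
import re

# The pattern from the answers above, for comparison.
LOOSE = re.compile(r"\b.{4}-.{2}-.{4}")

# Each group may contain only ASCII letters and digits, so spaces or extra
# hyphens can't be swallowed by '.'.
STRICT = re.compile(r"\b[A-Za-z0-9]{4}-[A-Za-z0-9]{2}-[A-Za-z0-9]{4}")


def strict_codes(text):
    """Return all letters/digits-only XXXX-XX-XXXX matches in text."""
    return STRICT.findall(text)
```

On the sample text both patterns find the same two codes; they only differ on input where a group contains punctuation or whitespace.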

Python: Comparing the strings in two files and printing the matches does not work properly

I am trying to compare the strings in the file "formatted_words.txt" with another custom file, "dictionary.txt". In the output, I am trying to print the words from "formatted_words.txt" that are also present in "dictionary.txt".
from itertools import izip
with open("formatted_words.txt") as words_file:
    with open("dictionary.txt") as dict_file:
        all_strings = list(map(str.strip,dict_file))
        for word in words_file:
            for a_string in all_strings:
                if word in a_string:
                    print a_string
However, all the words from "formatted_words.txt" are printed, even though many of them are not in "dictionary.txt". I cannot use any built-in Python dictionary. Any help would be appreciated.
Using sets:
with open('formatted_words.txt') as words_file:
    with open('dictionary.txt') as dict_file:
        all_strings = set(map(str.strip, dict_file))
        words = set(map(str.strip, words_file))
        for word in all_strings.intersection(words):
            print(word)
If no word appears in both files, this prints nothing, because the intersection is empty.
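For comparison, the question's loop tests word in a_string, which checks whether word is a substring of a_string rather than equal to it, and word still carries its trailing newline because it is never stripped. If you also want the matches in the order they appear in formatted_words.txt, a set lookup per line gives exact, whole-line matches. A minimal sketch, assuming one word per line; common_words is a hypothetical name, and it accepts any iterable of lines, such as the two open file objects:

```python
def common_words(words_lines, dict_lines):
    """Return the words (in their original order) that also occur in dict_lines.

    Each line is stripped, so trailing newlines and spaces do not prevent a
    match. Duplicates in words_lines are kept.
    """
    known = {line.strip() for line in dict_lines}
    result = []
    for line in words_lines:
        word = line.strip()
        # Set membership is an exact equality test, not a substring test
        if word and word in known:
            result.append(word)
    return result
```

Called as common_words(words_file, dict_file) inside the two with blocks above, it returns the matching words as a list instead of printing them.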

The output values are all on one line (python3/csv.write)

I write a list of dicts to a CSV file, but the output is all on one line. How can I write each value on a new line?
f = open(os.getcwd() + '/friend1.csv','w+',newline='')
for Member in MemberList:
    f.write(str(Member))
f.close()
Take a look at the writing example in the documentation of the standard library's csv module, and at this question. Either that, or simply append a newline ("\n") after each write: f.write(str(Member) + "\n").

a bytes-like object is required, not 'str': TypeError with a compressed file

I am searching for a substring in a compressed file using the following Python script, and I get "TypeError: a bytes-like object is required, not 'str'". Can anyone help me fix this?
from re import *
import re
import gzip
import sys
import io
import os
seq={}
with open(sys.argv[1],'r') as fh:
    for line1 in fh:
        a=line1.split("\t")
        seq[a[0]]=a[1]
        abcd="AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG"
        print(a[0],"\t",seq[a[0]])
count={}
with gzip.open(sys.argv[2]) as gz_file:
    with io.BufferedReader(gz_file) as f:
        for line in f:
            for b in seq:
                if abcd in line:
                    count[b] +=1
for c in count:
    print(c,"\t",count[c])
fh.close()
gz_file.close()
f.close()
The input files are as follows. The first file contains:
TruSeq2_SE AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG
The second file is a compressed text file. The error is raised on the line "if abcd in line:".
The BufferedReader class gives you byte strings, not text strings, and in Python 3 you can't directly compare the two types.
Since these strings use only a few ASCII characters and are not really text, you can use byte strings throughout your code.
So, whenever you open a file with the built-in open() (gzip.open() already defaults to binary mode), open it in binary mode, i.e. open(sys.argv[1], 'rb') instead of 'r'.
Also prefix your hard-coded string with b so that Python treats it as a bytes object: abcd=b"AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG". This is what fixes the error on if abcd in line, because an in test between a str and a bytes object raises exactly this TypeError. Once the first file is read in binary mode, its separator must be bytes as well: line1.split(b"\t").
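A minimal sketch of the all-bytes approach, assuming the second file is read with gzip.open() in its default binary mode. count_lines_with and count_in_gzip are hypothetical names; the point is only that both sides of the in test are bytes:

```python
import gzip

# The hard-coded sequence as a bytes literal (note the b prefix).
ABCD = b"AGATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG"


def count_lines_with(needle, lines):
    """Count how many lines contain `needle`.

    Lines from a binary file are bytes, so `needle` must be bytes too;
    a str needle raises the TypeError from the question.
    """
    return sum(1 for line in lines if needle in line)


def count_in_gzip(path):
    # gzip.open defaults to binary mode ('rb'): each line is a bytes object
    with gzip.open(path) as f:
        return count_lines_with(ABCD, f)
```

Passing the str "AGATC..." instead of ABCD to count_lines_with reproduces the same TypeError as in the question.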
Alternatively, work with text throughout. This gives you more methods for working with the strings (Python 3's byte strings are somewhat limited) and nicer presentation of the data when printing, and it should not be much slower. In that case, instead of the changes suggested above, add an extra line that decodes each line read from your data file:
with io.BufferedReader(gz_file) as f:
    for line in f:
        line = line.decode("latin1")
        for b in seq:
            ...  # rest of the loop body unchanged
(Besides the error, your program logic seems a bit faulty: your innermost comparison doesn't use a variable string, just the fixed abcd value. I suppose you can fix that once you get rid of the errors.)
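The logic issue can be addressed along these lines. This sketch follows the text route rather than the bytes one, and swaps the explicit line.decode("latin1") for gzip.open(..., "rt", encoding="latin1"), which decodes for you. It compares each line against every sequence read from the first file, and uses collections.Counter because count[b] += 1 on an empty dict raises KeyError. It assumes the first file is tab-separated (name, then sequence), as in your sample; load_sequences, count_sequences and main are hypothetical names.

```python
import collections
import gzip
import sys


def load_sequences(path):
    """Read 'name<TAB>sequence' lines (the assumed layout of the first file)."""
    seq = {}
    with open(path, "r") as fh:
        for line in fh:
            fields = line.split("\t")
            if len(fields) >= 2:
                # strip() drops the trailing newline the question's code kept
                seq[fields[0]] = fields[1].strip()
    return seq


def count_sequences(seq, lines):
    """Count, for each named sequence, how many lines contain it."""
    count = collections.Counter()  # missing keys start at 0, so += 1 is safe
    for line in lines:
        for name, sequence in seq.items():
            if sequence in line:  # each sequence, not one fixed value
                count[name] += 1
    return count


def main(argv):
    seq = load_sequences(argv[1])
    # 'rt' mode decodes to str, like calling line.decode("latin1") yourself
    with gzip.open(argv[2], "rt", encoding="latin1") as f:
        count = count_sequences(seq, f)
    for name in seq:
        print(name, count[name], sep="\t")  # Counter reports 0 for no matches


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv)
```

Run it with the two file paths as arguments, as in the question. Every sequence from the first file is reported, including those with zero matches.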

Python code to read first 14 characters, uniquify based on them, and parse duplicates

I have a list of more than 10k strings that look like variations of this: HN5ML6A02FL4UI_3 (14 letters or digits, followed by _1 to _6). Some of them are duplicates except for the _1 to _6 suffix.
I am trying to find a way to list these and remove the duplicates based on the 14 characters that come before the _1 to _6.
Example of part of the list:
HN5ML6A02FL4UI_3
HN5ML6A02FL4UI_1
HN5ML6A01BDVDN_6
HN5ML6A01BDVDN_1
HN5ML6A02GVTSV_3
HN5ML6A01CUDA2_1
HN5ML6A01CUDA2_5
HN5ML6A02JPGQ9_5
HN5ML6A02JI8VU_1
HN5ML6A01AJOJU_5
I have tried variations of regular-expression scripts that were posted on my previous question, such as var n = /\d+/.exec(info)[0];, and
I also used a modified version of the code from: How can I strip the first 14 characters in an list element using python?
More recently I used this script and I am still not getting the correct output.
import os, re
def trunclist('rhodopsins_play', 'hope4'):
    with open('rhodopsins_play','r') as f:
        newlist=[]
        trunclist=[]
        for line in f:
            if line.strip().split('_')[0] not in trunclist:
                newlist.append(line)
                trunclist.append(line.split('_')[0])
        print newlist, trunclist
    # write newlist to file, with carriage returns
    with open('hope4','w') as out:
        for line in newlist:
            out.write(line)
My inputfile.txt contains more than 10k lines like the list above. The important part is the characters in front of the '_' (underscore); I then want to output a file of the uniquified entries (e.g. ABCD12356_1).
Can someone help?
Thank you for your help
Load this function into Python and run it. It is similar to the script above, but takes the input and output file names as parameters, and it splits each line at the '_'. This worked on the file.
def trunclist(inputfile, outputfile):
    newlist = []   # lines to keep, in their original order
    seen = set()   # prefixes (text before '_') already kept
    with open(inputfile, 'r') as f:
        for line in f:
            prefix = line.strip().split('_')[0]
            if prefix not in seen:
                newlist.append(line)
                seen.add(prefix)
    print(newlist)
    # write the kept lines; each one keeps its original newline
    with open(outputfile, 'w') as out:
        for line in newlist:
            out.write(line)
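Since the title asks about the first 14 characters and about parsing the duplicates, here is a variant that takes the prefix by slicing (line[:14]) rather than splitting on '_', and also records which lines were dropped. It assumes every ID has exactly 14 characters before the underscore, as in the sample; dedupe_by_prefix is a hypothetical name.

```python
def dedupe_by_prefix(lines, prefix_len=14):
    """Keep the first line for each prefix and collect the later ones.

    Returns (kept, duplicates): kept is a list of stripped lines in their
    original order; duplicates maps each repeated prefix to the lines that
    were dropped for it.
    """
    kept = []
    duplicates = {}
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        prefix = line[:prefix_len]
        if prefix in seen:
            duplicates.setdefault(prefix, []).append(line)
        else:
            seen.add(prefix)
            kept.append(line)
    return kept, duplicates
```

To process the file, pass the open file object, e.g. with open("inputfile.txt") as f: kept, dups = dedupe_by_prefix(f), then write "\n".join(kept) followed by a final newline to the output file.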
