How to create nested dictionary from strings - python-3.x

I have a string taken from a program's output, and I need to convert it into a dictionary. I tried using dict() and zip(), but I could not get the result I want.
This is the code I have so far:
string = "Eth1/1 vlan-1 typemode-eth status:access eth1/2 vlan-1 type-eth status:access"
list1=string.split(' ')
print(list1)
['Eth1/1', 'vlan-1', 'typemode-access']
Beyond this I have no idea how to proceed. The result I want is:
{'eth1/1': {'Speed': '10Gb', 'Vlan': 1, 'Type Mode': 'eth', 'status': 'access'}, 'eth1/2': {'Speed': '10Gb', 'Vlan': 1, 'Type Mode': 'eth', 'status': 'access'}}

To get a value out of your result, see the following example and its inline comments.
import re
string = "Eth1/1 vlan-1 typemode-eth status:access eth1/2 vlan-1 type-eth status:access"
# re.finditer yields one match per occurrence of the word 'access' (two here).
ends = [m.end() for m in re.finditer('access', string)]
starts = [0] + ends[:-1]
# Two substrings, one per interface. Cutting after each 'access' keeps this
# flexible if there is a third substring as well.
list1 = [string[s:e].strip() for s, e in zip(starts, ends)]
# Once list1 is parsed, the result should be the same as result2.
result2 = {'eth1/1': {'Speed': '10Gb', 'Vlan': 1, 'Type Mode': 'eth', 'status': 'access'},
           'eth1/2': {'Speed': '10Gb', 'Vlan': 1, 'Type Mode': 'eth', 'status': 'access'}}
# y1 = 'eth1/1' (outer key), z1 = 'Speed' (inner key); y2 = 'eth1/2'.
x = result2['eth1/1']['Speed'] # replace the key at y1 or z1 to fetch another result.
print('Got x : %s' % x) # this prints 'Got x : 10Gb'.
Basically, what you've created is a nested dictionary. Indexing with y1 first selects one of the inner dictionaries; indexing with z1 then returns the value stored under that key inside it. If you change the keys used for x, you get different values back (the values happen to look the same in your example, so try it with different values to see the effect). Enjoy!

Try this code below:
string = "Eth1/1 vlan-1 typemode-eth status:access eth1/2 vlan-1 type-eth status:access eth1/3 vlan-1 type-eth status:access"
strList = string.split(" ")
# Record the position of every token that starts an interface block.
indexPos = []
for data in range(0, len(strList)):
    if strList[data].lower()[0:3] == 'eth':
        print('Found', data)
        indexPos.append(data)
dataDict = dict()
for i in range(0, len(indexPos)):
    stringDict = dict()
    stringDict['Speed'] = '10Gb'
    # Compare integers with !=, not 'is not'.
    if i != len(indexPos) - 1:
        parts = strList[indexPos[i]:indexPos[i+1]]
    else:
        parts = strList[indexPos[i]:]
    # Use a separate loop variable so the outer i is not overwritten.
    for j in range(0, len(parts)):
        if j != 0:
            if j != 3:
                valueSplit = parts[j].split('-')
            else:
                print(j)
                valueSplit = parts[j].split(':')
            stringDict[valueSplit[0]] = valueSplit[1]
    dataDict[parts[0]] = stringDict
I have written this code according to the pattern in your string. Please let me know if it works for you.
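As a more compact sketch of the same idea, the tokens can be grouped by interface name and each remaining token split on its first '-' or ':'. The function name parse_interfaces is hypothetical, and the sketch assumes every group starts with a token beginning with 'eth' (in any case). It keeps the raw field names from the string (for example 'vlan'); it does not add the 'Speed' entry or rename keys as in the question's desired output.

```python
import re

def parse_interfaces(text):
    """Group whitespace-separated tokens into one dict per interface (hypothetical helper)."""
    result = {}
    current = None
    for token in text.split():
        if token.lower().startswith('eth'):
            # A new interface name starts a new inner dictionary.
            current = result.setdefault(token.lower(), {})
        elif current is not None:
            # Split 'vlan-1' or 'status:access' on the first '-' or ':'.
            parts = re.split(r'[-:]', token, maxsplit=1)
            if len(parts) == 2:
                current[parts[0]] = parts[1]
    return result

string = "Eth1/1 vlan-1 typemode-eth status:access eth1/2 vlan-1 type-eth status:access"
print(parse_interfaces(string))
```

Tokens without a separator are silently skipped in this sketch; whether that is the right behaviour depends on the real program output.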

Related

How do I convert a text file into a dictionary

I would like to parse the following text file into a dictionary:
Train A
Travelled 150km
No longer in use
Stored in warehouse

Train B
Travelled 100km
Used by X company
Daily usage
Actively upgrading
The end result dictionary should have Train A and Train B as keys, and the rest of values as list of values:
{
'Train A': ['Travelled 150km', 'No longer in use', 'Stored in warehouse'],
'Train B': ['Travelled 100km', 'Used by X company', 'Daily usage', 'Actively upgrading']
}
I've currently tried
with open('file.txt') as f:
    data = f.read().split('\n')
dict = {}
for i in data:
    key = i[0]
    value = i[1:]
    d[key] = value
print(dict)
I'm not sure where I'm going wrong. I want to split on the \n after Train A, so that Train A is the key and all the other information listed is the value.
Your units are separated by blank lines, so you should first split on two newlines, not one. The following implementation is somewhat inefficient (it splits each block twice), but it works, and you can improve it if you want:
# data is the whole file as one string, e.g. data = f.read()
[{x.split("\n")[0]: x.split("\n")[1:]} for x in data.split("\n\n")]
#[{'Train A': ['Travelled 150km', 'No longer in use',
#              'Stored in warehouse']},
# {'Train B': ['Travelled 100km', 'Used by X company',
#              'Daily usage', 'Actively upgrading']}]
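The double split can be avoided by splitting each block once and collecting everything into a single dictionary, which is the shape the question asks for. This is only a sketch: the sample text is inlined in place of file.txt, and the name parse_blocks is hypothetical.

```python
def parse_blocks(text):
    """Map the first line of each blank-line-separated block to its remaining lines."""
    result = {}
    for block in text.strip().split("\n\n"):
        header, *rest = block.split("\n")  # split each block only once
        result[header] = rest
    return result

# Inlined stand-in for the contents of file.txt.
sample = (
    "Train A\nTravelled 150km\nNo longer in use\nStored in warehouse\n"
    "\n"
    "Train B\nTravelled 100km\nUsed by X company\nDaily usage\nActively upgrading\n"
)
trains = parse_blocks(sample)
print(trains)
```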
You are close. You need to split the file contents on empty lines first ('\n\n'), then continue with your idea.
with open('file.txt') as f:
    data = f.read().split('\n\n') # <=== this is what's missing
print(data)
d = {}
for i in data:
    i = i.split('\n')
    key = i[0]
    value = i[1:]
    d[key] = value
print(d)

How can I set up a new column in python with a value based on the return of a function?

I am doing some text mining in python and want to set up a new column with the value 1 if the return of my search function is true and 0 if it's false.
I have tried various if statements, but cannot get anything to work.
A simplified version of what I'm doing is below:
import pandas as pd
import nltk
nltk.download('punkt')
df = pd.DataFrame (
    {
        'student number' : [1,2,3,4,5],
        'answer' : [ 'Yes, she is correct.', 'Yes', 'no', 'north east', 'No its North East']
        # I know there's an apostrophe missing
    }
)
print(df)
# change all text to lower case
df['answer'] = df['answer'].str.lower()
# split the answer into individual words
df['text'] = df['answer'].apply(nltk.word_tokenize)
# Check if given words appear together in a list of sentence
def check(sentence, words):
    res = []
    for substring in sentence:
        k = [ w for w in words if w in substring ]
        if (len(k) == len(words) ):
            res.append(substring)
    return res
# Driver code
sentence = df['text']
words = ['no','north','east']
print(check(sentence, words))
This is what you want I think:
df['New'] = df['answer'].isin(words)*1
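This one works for me:
df['NEW'] = 0 # create the column first so there is something to assign into
for i in range(0, len(df)):
    # set <= set is a subset test: are all the words among the tokens?
    if set(words) <= set(df.text[i]):
        df.loc[i, 'NEW'] = 1 # .loc avoids chained assignment
    else:
        df.loc[i, 'NEW'] = 0
You don't need the function if you use this method.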
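The subset test used in that loop can be tried without pandas. A minimal sketch, assuming the answers are already tokenized into lists of lowercase words (hard-coded here rather than produced by nltk); the names tokenized, required and flags are illustrative only.

```python
words = ['no', 'north', 'east']
# Hand-written stand-in for the tokenized 'text' column.
tokenized = [
    ['yes', ',', 'she', 'is', 'correct', '.'],
    ['yes'],
    ['no'],
    ['north', 'east'],
    ['no', 'its', 'north', 'east'],
]
required = set(words)
# set <= set is a subset test: True only when every required word is present.
flags = [1 if required <= set(tokens) else 0 for tokens in tokenized]
print(flags)  # [0, 0, 0, 0, 1]
```

Only the last answer contains all three words, so only it is flagged with 1.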

I want to find the common characters among n strings inside a single multidimensional list using Python

def bibek():
    test_list=[[]]
    x=int(input("Enter the length of String elements using enter -: "))
    for i in range(0,x):
        a=str(input())
        a=list(a)
        test_list.append(a)
    del(test_list[0])
    def filt(b):
        d=['b','i','b']
        if b in d:
            return True
        else:
            return False
    for t in test_list:
        x=filter(filt,t)
        for i in x:
            print(i)
bibek()
Suppose test_list=[['b','i','b'],['s','i','b'],['r','i','b']].
The output should be ib, since i and b are common to all of the lists.
One option is to use set and its methods:
test_list=[['b','i','b'],['s','i','b'],['r','i','b']]
common = set(test_list[0])
for item in test_list[1:]:
    common.intersection_update(item)
print(common) # {'i', 'b'}
UPDATE: now that you have clarified your question, I would do this:
from difflib import SequenceMatcher
test_list=[['b','i','b','b'],['s','i','b','b'],['r','i','b','b']]
# convert the list to simple strings
strgs = [''.join(item) for item in test_list]
common = strgs[0]
for item in strgs[1:]:
    sm = SequenceMatcher(isjunk=None, a=item, b=common)
    match = sm.find_longest_match(0, len(item), 0, len(common))
    common = common[match.b:match.b+match.size]
print(common) # 'ibb'
The trick here is to use difflib.SequenceMatcher to get the longest common substring.
One more update after a further clarification of your question, this time using collections.Counter:
from collections import Counter
strgs='app','bapp','sardipp', 'ppa'
common = Counter(strgs[0])
print(common)
for item in strgs[1:]:
    c = Counter(item)
    for key, number in common.items():
        common[key] = min(number, c.get(key, 0))
print(common) # Counter({'p': 2, 'a': 1})
print(sorted(common.elements())) # ['a', 'p', 'p']
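The per-key minimum computed by that last loop is also what Counter's & operator does, so the loop can be folded into a single reduce call. A sketch using only the standard library and the same sample strings:

```python
from collections import Counter
from functools import reduce
from operator import and_

strgs = ['app', 'bapp', 'sardipp', 'ppa']
# Counter & Counter keeps each key with the minimum of the two counts.
common = reduce(and_, map(Counter, strgs))
print(sorted(common.elements()))  # ['a', 'p', 'p']
```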

Python 3.x replace for loop with something faster

I am trying to produce a vector that represents the match of a string and a list's elements. I have made a function in python3.x:
def vector_build (docs, var):
    vector = []
    features = docs.split(' ')
    for ngram in var:
        if ngram in features:
            vector.append(docs.count(ngram))
        else:
            vector.append(0)
    return vector
It works fine:
vector_build ('hi my name is peter',['hi', 'name', 'are', 'is'])
Out: [1, 1, 0, 1]
But this function does not scale to large inputs. When its string parameter 'docs' is larger than 190 KB, it takes longer than it should. So I am trying to replace the for loop with the map function, like this:
var = ['hi', 'name', 'are', 'is']
doc = 'hi my name is peter'
features = doc.split(' ')
vector = list(map(var,if ngram in var in features: vector.append(doc.count(ngram))))
But this returns the following error:
SyntaxError: invalid syntax
Is there a way to replace that for loop with map, lambda, itertools in order to make the execution faster?
You can use a list comprehension for this task. Looking the features up in a set rather than a list should also speed the function up somewhat.
var = ['hi', 'name', 'are', 'is']
doc = 'hi my name is peter'
features = doc.split(' ')
features_set = set(features) #faster lookups
vector = [doc.count(ngram) if ngram in features_set else 0 for ngram in var]
print(vector)
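Note that doc.count(ngram) scans the whole string once per entry and counts substring matches, so 'is' would also be counted inside a word such as 'this'. If whole-token counts are what is wanted, one alternative is to count all tokens in a single pass with collections.Counter. This is a sketch under that assumption, not a drop-in replacement: it only handles entries that are single space-separated tokens, and its results differ from the original whenever an entry occurs inside a longer word.

```python
from collections import Counter

def vector_build(doc, var):
    """For each entry of var, count how often it occurs as a whole token in doc."""
    counts = Counter(doc.split(' '))  # one pass over the document
    return [counts[ngram] for ngram in var]  # missing tokens count as 0

print(vector_build('hi my name is peter', ['hi', 'name', 'are', 'is']))  # [1, 1, 0, 1]
```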

Assigning multiple values to dictionary keys from a file in Python 3

I'm fairly new to Python but I haven't found the answer to this particular problem.
I am writing a simple recommendation program and I need to have a dictionary where cuisine is a key and name of a restaurant is a value. There are a few instances where I have to split a string of a few cuisine names and make sure all other restaurants (values) which have the same cuisine get assigned to the same cuisine (key). Here's a part of a file:
Georgie Porgie
87%
$$$
Canadian, Pub Food

Queen St. Cafe
82%
$
Malaysian, Thai

Mexican Grill
85%
$$
Mexican

Deep Fried Everything
52%
$
Pub Food
so it's just the first and the last one with the same cuisine but there are more later in the file.
And here is my code:
def new(file):
    file = "/.../Restaurants.txt"
    d = {}
    key = []
    with open(file) as file:
        lines = file.readlines()
    for i in range(len(lines)):
        if i % 5 == 0:
            if "," not in lines[i + 3]:
                d[lines[i + 3].strip()] = [lines[i].strip()]
            else:
                key += (lines[i + 3].strip().split(', '))
                for j in key:
                    if j not in d:
                        d[j] = [lines[i].strip()]
                    else:
                        d[j].append(lines[i].strip())
    return d
It gets all the keys and values printed but it doesn't assign two values to the same key where it should. Also, with this last 'else' statement, the second restaurant is assigned to the wrong key as a second value. This should not happen. I would appreciate any comments or help.
In the case where there is only one category, you don't check whether the key is already in the dictionary. You should handle this the same way as in the case of multiple categories, and then it works fine.
I don't know why you take file as an argument when you then overwrite it.
Additionally, you should build 'key' afresh for each restaurant, rather than using += (which adds to the existing 'key').
When you check whether j is in the dictionary, a clean way is to check whether j is in the keys (d.keys()).
def new(file):
    file = "/.../Restaurants.txt"
    d = {}
    key = []
    with open(file) as file:
        lines = file.readlines()
    for i in range(len(lines)):
        if i % 5 == 0:
            if "," not in lines[i + 3]:
                # strip() before the lookup so the key matches the one stored below
                if lines[i + 3].strip() not in d.keys():
                    d[lines[i + 3].strip()] = [lines[i].strip()]
                else:
                    d[lines[i + 3].strip()].append(lines[i].strip())
            else:
                key = (lines[i + 3].strip().split(', '))
                for j in key:
                    if j not in d.keys():
                        d[j] = [lines[i].strip()]
                    else:
                        d[j].append(lines[i].strip())
    return d
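Since a single cuisine is just a one-element split, the two branches can be merged, and dict.setdefault can take care of the "is the key already there" check. The sketch below works on an in-memory list of lines instead of the file, assumes each record is exactly five lines (name, rating, price, cuisines, blank line) as in the sample, and uses the hypothetical name group_by_cuisine.

```python
def group_by_cuisine(lines):
    """Map each cuisine to the list of restaurant names that serve it."""
    d = {}
    # Assumed layout: 5 lines per record (name, rating, price, cuisines, blank).
    for i in range(0, len(lines), 5):
        name = lines[i].strip()
        for cuisine in lines[i + 3].strip().split(', '):
            # setdefault creates the list on first use, then returns it.
            d.setdefault(cuisine, []).append(name)
    return d

# Inlined stand-in for the contents of Restaurants.txt.
sample = """Georgie Porgie
87%
$$$
Canadian, Pub Food

Queen St. Cafe
82%
$
Malaysian, Thai

Mexican Grill
85%
$$
Mexican

Deep Fried Everything
52%
$
Pub Food
""".splitlines()
by_cuisine = group_by_cuisine(sample)
print(by_cuisine)
```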
Normally, I find that if you use names for the dictionary keys, you may have an easier time handling them later.
In the example below, I return a series of dictionaries, one for each restaurant. I also wrap the functionality of processing the values in a method called add_value(), to keep the code more readable.
In my example, I'm using codecs to decode the value. Although not necessary, depending on the characters you are dealing with it may be useful. I'm also using itertools to read the file lines with an iterator. Again, not necessary depending on the case, but might be useful if you are dealing with really big files.
import copy, itertools, codecs
class RestaurantListParser(object):
    file_name = "restaurants.txt"
    base_item = {
        "_type": "undefined",
        "_fields": {
            "name": "undefined",
            "nationality": "undefined",
            "rating": "undefined",
            "pricing": "undefined",
        }
    }
    def add_value(self, formatted_item, field_name, field_value):
        if isinstance(field_value, basestring):
            # handle encoding, strip, process the values as you need.
            field_value = codecs.encode(field_value, 'utf-8').strip()
            formatted_item["_fields"][field_name] = field_value
        else:
            print 'Error parsing field "%s", with value: %s' % (field_name, field_value)
    def generator(self, file_name):
        with open(file_name) as file:
            while True:
                lines = tuple(itertools.islice(file, 5))
                if not lines: break
                # Initialize our dictionary for this item
                formatted_item = copy.deepcopy(self.base_item)
                if "," not in lines[3]:
                    formatted_item['_type'] = lines[3].strip()
                else:
                    formatted_item['_type'] = lines[3].split(',')[1].strip()
                    self.add_value(formatted_item, 'nationality', lines[3].split(',')[0])
                self.add_value(formatted_item, 'name', lines[0])
                self.add_value(formatted_item, 'rating', lines[1])
                self.add_value(formatted_item, 'pricing', lines[2])
                yield formatted_item
    def split_by_type(self):
        d = {}
        for restaurant in self.generator(self.file_name):
            if restaurant['_type'] not in d:
                d[restaurant['_type']] = [restaurant['_fields']]
            else:
                d[restaurant['_type']] += [restaurant['_fields']]
        return d
Then, if you run:
p = RestaurantListParser()
print p.split_by_type()
You should get:
{
'Mexican': [{
'name': 'Mexican Grill',
'nationality': 'undefined',
'pricing': '$$',
'rating': '85%'
}],
'Pub Food': [{
'name': 'Georgie Porgie',
'nationality': 'Canadian',
'pricing': '$$$',
'rating': '87%'
}, {
'name': 'Deep Fried Everything',
'nationality': 'undefined',
'pricing': '$',
'rating': '52%'
}],
'Thai': [{
'name': 'Queen St. Cafe',
'nationality': 'Malaysian',
'pricing': '$',
'rating': '82%'
}]
}
Your solution is simple, so it's ok. I'd just like to mention a couple of ideas that come to mind when I think about this kind of problem.
Here's another take, using defaultdict and split to simplify things.
from collections import defaultdict
record_keys = ['name', 'rating', 'price', 'cuisine']
def load(file):
    with open(file) as file:
        data = file.read()
    restaurants = []
    # chop up input on each blank line (2 newlines in a row)
    for record in data.split("\n\n"):
        fields = record.split("\n")
        # build a dictionary by zipping together the fixed set
        # of field names and the values from this particular record
        restaurant = dict(zip(record_keys, fields))
        # split chops the cuisine field apart on commas, then _.strip()
        # removes any leading/trailing whitespace on each type of cuisine
        restaurant['cuisine'] = [_.strip() for _ in restaurant['cuisine'].split(",")]
        restaurants.append(restaurant)
    return restaurants
def build_index(database, key, value):
    index = defaultdict(set)
    for record in database:
        for v in record.get(key, []):
            # defaultdict creates a new set if the key is not present yet,
            # otherwise the value is added to the existing set
            index[v].add(record[value])
    return index
restaurant_db = load('/var/tmp/r')
print(restaurant_db)
by_type = build_index(restaurant_db, 'cuisine', 'name')
print(by_type)

Resources