Modifying a nested dictionary - python-3.x

I'm using a regex to scan a log file for each username and whether their action caused an error (ERROR for an error, INFO for a successful action).
I am using a nested dictionary to keep track of actions, with the username as the outer key and, as its value, a nested dictionary counting how many INFO and ERROR lines that user generated.
#!/usr/bin/env python3
import re
users = {}
with open('logfile.txt') as f:
    for line in f:
        # searches for users and whether their message was INFO or ERROR
        regex_user = r"(INFO|ERROR): .* \((.+)\)$"
        user = re.search(regex_user, line)
        if user is None:
            continue
        name = user[2]
        cat = user[1]
        try:
            # Method 1?
            users[name][cat] = users[name].get(cat, 0) + 1
            # Method 2?
            users[name][cat] = users.get(name, {}).get(cat, 0) + 1
        except KeyError:
            print("Where are my keys?")
I am wondering which of the two methods (if any) correctly modifies the dictionary to increase the count of the respective nested key.
Output should look like:
{'John': {'INFO': 22, 'ERROR': 3}}
if the log contained 22 lines of INFO and 3 lines of ERROR for user John.

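Both of your methods raise KeyError the first time a user is seen, because users[name] is looked up before any inner dictionary has been stored for that name. You probably want to solve this with a nested collections.defaultdict or, alternatively, with setdefault(), though the latter is not as clean in my opinion. A collections.defaultdict lets you reference a dictionary key that does not yet exist, which is the crux of your issue.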
Try something like:
import collections
import json
import random
users = collections.defaultdict(lambda: collections.defaultdict(int))
for i in range(100):
    user = random.choice(["John", "Jane", "Sanjay"])
    cat = random.choice(["INFO", "ERROR"])
    users[user][cat] += 1
print(json.dumps(users, indent=2))
That will print out something like:
{
  "Jane": {
    "INFO": 16,
    "ERROR": 19
  },
  "John": {
    "ERROR": 21,
    "INFO": 10
  },
  "Sanjay": {
    "INFO": 18,
    "ERROR": 16
  }
}
though each run will be different.
As was noted by @Ryukashin, a defaultdict does not print() as nicely as a dict, so casting back is possible if it is really needed for some reason. The easiest (though not the most efficient) way to do that might be:
users = json.loads(json.dumps(users))
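For comparison, here is a minimal sketch of the setdefault() alternative mentioned above, applied to the regex from the question. The sample log lines and the names SAMPLE_LOG, LOG_PATTERN and count_by_user are made up for illustration; the real log format may differ.

```python
import re

# Hypothetical sample lines standing in for logfile.txt.
SAMPLE_LOG = [
    "Jan 31 00:09:39 host app: INFO: Created ticket [#4217] (John)",
    "Jan 31 00:16:25 host app: ERROR: Ticket doesn't exist (John)",
    "Jan 31 00:21:30 host app: INFO: Closed ticket [#1754] (Jane)",
    "Jan 31 00:44:34 host app: a line without a category",
]

# Same pattern as in the question, compiled once outside the loop.
LOG_PATTERN = re.compile(r"(INFO|ERROR): .* \((.+)\)$")


def count_by_user(lines):
    """Return {username: {category: count}} as a plain nested dict."""
    users = {}
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue
        cat, name = match[1], match[2]
        # setdefault stores an empty inner dict the first time a name is
        # seen, so the lookup on the next line cannot raise KeyError.
        per_user = users.setdefault(name, {})
        per_user[cat] = per_user.get(cat, 0) + 1
    return users


counts = count_by_user(SAMPLE_LOG)
print(counts)
```

Because the result is already a plain dict, it prints in the format shown in the question without any conversion step.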

Related

Getting the data updated with pymongo while pprint does not work

I have the same issue as in the question "How to get the data updated with pymongo", but pprint() does not solve my problem.
Environment: Python, MongoDB, Pymongo
I'm getting "x" as the return value of my update, just like "results" in the linked question, so I'm wondering why the same solution doesn't work for me. My x is <pymongo.results.UpdateResult object at 0x00000216021EA080>:
print(x)
print(type(x))
<pymongo.results.UpdateResult object at 0x00000216021EA080>
<class 'pymongo.results.UpdateResult'>
Here is the code:
import pymongo
import datetime
import json
from pprint import pprint
def init_db(ip, db, coll):
    myclient = pymongo.MongoClient('mongodb://' + ip + '/')
    mydb = myclient[db]
    mycol = mydb[coll]
    return mydb, mycol
def change_db_data(myquery_json, one_or_many_bool, newvalues_json):
    if one_or_many_bool == True:
        x = mycol.products.update_many(myquery_json, newvalues_json)
    else:
        x = mycol.products.update_one(myquery_json, newvalues_json)
    return x
ip_input = input("Enter the ip: ")
exist_DB_name = input("Enter exist DB name: ")
exist_coll_name = input("Enter exist collection name: ")
mydb, mycol = init_db(ip_input, exist_DB_name, exist_coll_name)
myquery_str = input("Enter ur query: ")
update_one_or_many = input("U are update one or many values? (ex:1 for many , 0 for one): ")
newvalues_str = input("Enter new values: ")
one_or_many_bool = bool(int(update_one_or_many))
myquery_json = json.loads(myquery_str)
newvalues_json = json.loads(newvalues_str)
x = change_db_data(myquery_json, one_or_many_bool, newvalues_json)
print(x)
print(type(x))
pprint(x)
pprint(type(x))
The output
Enter the ip: localhost:27017
Enter exist DB name: (practice_10_14)-0002
Enter exist collection name: collection_new02cp
Enter ur query: { "name": { "$regex": "^R" }, "Age" : { "$gt": 50 } }
U are update one or many values? (ex:1 for many , 0 for one): 1
Enter new values: { "$set": { "name": "Old Mr.R" }}
<pymongo.results.UpdateResult object at 0x00000216021EA080>
<class 'pymongo.results.UpdateResult'>
<pymongo.results.UpdateResult object at 0x00000216021EA080>
<class 'pymongo.results.UpdateResult'>
My print(x) and pprint(x) print the same value, so pprint() doesn't help.
I want it to print out the modified data.
The pprint() method is not the solution in the question that you linked. Both that question and the answer use pprint(), so that method is not the problem itself. Rather, the problem in that question is that they were attempting to pprint() the wrong thing. That's the same thing that is happening here.
The answer to that question was to print something other than the object returned directly from the method. Currently, PyMongo returns an UpdateResult object for those update methods. Python doesn't know how to print this object directly:
>>> print(x)
<pymongo.results.UpdateResult object at 0x7fe842511df0>
But you can print its properties:
>>> print(x.modified_count)
1
It also includes a raw_result property which might be what you are interested in:
>>> print(x.raw_result)
{'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}
This all works the exact same with pprint():
>>> pprint(x)
<pymongo.results.UpdateResult object at 0x7fe842511df0>
>>> pprint(x.modified_count)
1
>>> pprint(x.raw_result)
{'n': 1, 'nModified': 1, 'ok': 1.0, 'updatedExisting': True}
Finally, I think you may have copied the code from the other question too directly. Both of your update statements have the following:
... mycol.products.update ...
But products is the name of the collection from the other question. In that other question they are performing db.products.update where they are starting with the database (db) itself. But in your code you have already established the collection object as a variable (mycol = mydb[coll]). And you are attempting to use mycol when doing the update. So your updates should probably drop the reference to products and be something like:
... mycol.update ...
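To make the point about printing properties concrete, here is a small sketch that gathers the UpdateResult properties into a plain dict, which print() and pprint() can both display. The helper name summarize_update is made up, and a SimpleNamespace with invented values stands in for the real result so the snippet runs without a MongoDB server.

```python
from pprint import pprint
from types import SimpleNamespace


def summarize_update(result):
    """Collect the properties of an UpdateResult into a plain dict."""
    return {
        "acknowledged": result.acknowledged,
        "matched_count": result.matched_count,
        "modified_count": result.modified_count,
        "upserted_id": result.upserted_id,
        "raw_result": result.raw_result,
    }


# Stand-in for the object returned by update_one()/update_many().
# The values below are made up for illustration.
fake_result = SimpleNamespace(
    acknowledged=True,
    matched_count=1,
    modified_count=1,
    upserted_id=None,
    raw_result={"n": 1, "nModified": 1, "ok": 1.0, "updatedExisting": True},
)

summary = summarize_update(fake_result)
pprint(summary)
```

With a real connection you would pass the x returned by the update call instead of fake_result. Note that the result only carries counts and status; to see the updated documents themselves, a separate find() query would presumably be needed afterwards.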

How to get the amount of subfield nodes in a xml file

I am trying to extract data from an xml file. I get the xml file by accessing a previously generated url to the api of the xml provider. Normally the datafields I need are only present once, but sometimes, the datafield node is present multiple times.
This is the code I use: (it's only a part of the code, so indenting might be a bit off)
from urllib.request import urlopen
import pandas as pd
import xml.etree.ElementTree as ET
with urlopen(str(row)) as response:
    doc = ET.parse(response)
    root = doc.getroot()
namespaces = {
    "zs": "http://www.loc.gov/zing/srw/",
    "": "http://www.loc.gov/MARC21/slim",
}
datafield_nodes_path = "./zs:records/zs:record/zs:recordData/record/datafield"  # XPath
datafield_attribute_filters = [  # which fields to extract
    {
        "tag": "100",  # author
        "ind1": "1",
        "ind2": " ",
    }]
no_aut = True
for datafield_node in root.iterfind(datafield_nodes_path, namespaces=namespaces):
    if any(datafield_node.get(k) != v for attr_dict in datafield_attribute_filters for k,v in attr_dict.items()):
        continue
    for subfield_node in datafield_node.iterfind("./subfield[@code='a']", namespaces=namespaces):
        clean_aut.append(subfield_node.text)  # this gets the author name
        no_aut = False
if no_aut: clean_aut.append(None)
This works fine for 80% of the URLs I access, but the remaining 20% are either broken or have multiple subfield_nodes for the datafield_attribute_filter I'm searching.
Here's an example URL of multiple occurrences: example link
When this URL gets loaded into urlopen I get the Author nine times instead of once.
Is there a way to count the number of occurrences and, if the datafield_node is present more than once, to take only the first occurring datafield_node?
I have tried using findall from ET but got no usable results.
Any help is appreciated.
Although it is not how I wanted to solve it, this did the trick:
append_author = 0
no_aut = True
for datafield_node in root.iterfind(datafield_nodes_path, namespaces=namespaces):
    if any(datafield_node.get(k) != v for attr_dict in datafield_attribute_filters for k,v in attr_dict.items()):
        continue
    if append_author == 0:
        for subfield_node in datafield_node.iterfind("./subfield[@code='a']", namespaces=namespaces):
            clean_aut.append(subfield_node.text)  # this gets the author name
            no_aut = False
            append_author += 1
As soon as the first field gets appended, the others get skipped.
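An alternative sketch that also answers the counting part of the question: collect the matching datafield nodes into a list, take its length, and return the first subfield found. The XML below is a made-up, trimmed-down response, and the "marc" prefix, AUTHOR_FILTER and first_author are names chosen for this example (the question uses an empty prefix for the MARC namespace).

```python
import xml.etree.ElementTree as ET

# Hypothetical response: one record whose author field (tag 100) appears twice.
SAMPLE_XML = """\
<zs:searchRetrieveResponse xmlns:zs="http://www.loc.gov/zing/srw/">
  <zs:records>
    <zs:record>
      <zs:recordData>
        <record xmlns="http://www.loc.gov/MARC21/slim">
          <datafield tag="100" ind1="1" ind2=" ">
            <subfield code="a">First, Author</subfield>
          </datafield>
          <datafield tag="100" ind1="1" ind2=" ">
            <subfield code="a">Second, Author</subfield>
          </datafield>
        </record>
      </zs:recordData>
    </zs:record>
  </zs:records>
</zs:searchRetrieveResponse>
"""

NAMESPACES = {
    "zs": "http://www.loc.gov/zing/srw/",
    "marc": "http://www.loc.gov/MARC21/slim",
}
DATAFIELD_PATH = "./zs:records/zs:record/zs:recordData/marc:record/marc:datafield"
AUTHOR_FILTER = {"tag": "100", "ind1": "1", "ind2": " "}


def first_author(root):
    """Return (number of matching datafields, text of the first author)."""
    candidates = [
        node
        for node in root.iterfind(DATAFIELD_PATH, NAMESPACES)
        if all(node.get(key) == value for key, value in AUTHOR_FILTER.items())
    ]
    for datafield in candidates:
        # find() returns only the first matching child, or None.
        subfield = datafield.find("./marc:subfield[@code='a']", NAMESPACES)
        if subfield is not None:
            return len(candidates), subfield.text
    return len(candidates), None


root = ET.fromstring(SAMPLE_XML)
count, author = first_author(root)
print(count, author)
```

In the original loop you would append author to clean_aut once per URL; when no author is found the function already returns None, which matches the no_aut case.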

Ksql python library reading response of query error

I'm trying to read from ksql in python with this script:
from ksql import KSQLAPI
client = KSQLAPI('http://0.0.0.0:8088',)
columns = ['id INT','name VARCHAR']
client.create_stream(table_name='students', columns_type= columns, topic='students')
query = client.query("SELECT name FROM students WHERE id = 1 ")
for student in query:
    print(student)
As a response from the library, I was expecting a sequence of objects, as the documentation says, printed by the generator. Instead, it returns me a string representing pieces of an array of objects, this:
[{"header":{"queryId":"transient_STUDENTS_5788262560238090205","schema":"`NAME` STRING"}},
{"row":{"columns":["Alex"]}},
]
It then throws a RuntimeError and a StopIteration
RuntimeError: generator raised StopIteration
So I handled the generator like this:
import json
query = client.query("SELECT name FROM students WHERE id = 1 ")
next(query)  # skip the header chunk
for student in query:
    if student.strip() == ']': break
    student = student.strip()[:-1]  # drop the trailing comma
    student = json.loads(student)
    print(student)
The question is: is there another way to run the query with the library and get another type of response? If not, what is the correct way to handle this generator response?
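One possible way to tidy up the workaround above, offered only as a sketch: wrap the chunk cleanup in a small generator that strips the array punctuation, parses each object, and stops quietly when the underlying generator ends. The function name iter_ksql_messages and the fake_query stand-in are made up; the stand-in only mimics the chunks and the RuntimeError shown in the question, and nothing here is taken from the ksql library's documentation.

```python
import json


def iter_ksql_messages(chunks):
    """Yield one parsed JSON object per non-empty chunk of the response."""
    iterator = iter(chunks)
    while True:
        try:
            chunk = next(iterator)
        except (StopIteration, RuntimeError):
            # Normal exhaustion, or the "generator raised StopIteration"
            # RuntimeError described above. Catching RuntimeError this
            # broadly can hide unrelated errors, so treat it as a stopgap.
            return
        # Remove the "[", "," and "]" that wrap the individual objects.
        text = chunk.strip().lstrip("[ \t\r\n").rstrip(",] \t\r\n")
        if text:
            yield json.loads(text)


def fake_query():
    """Stand-in for client.query(...), mimicking the output in the question."""
    yield '[{"header":{"queryId":"transient_STUDENTS_5788262560238090205","schema":"`NAME` STRING"}},\n'
    yield '{"row":{"columns":["Alex"]}},\n'
    yield ']'
    raise RuntimeError("generator raised StopIteration")


messages = list(iter_ksql_messages(fake_query()))
rows = [m["row"]["columns"] for m in messages if "row" in m]
print(rows)
```

With the real client you would pass client.query(...) instead of fake_query(). Filtering on the "row" key replaces the manual next(query) call that skipped the header.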

Generating dictionaries using a for loop

I am trying to build a Player class that keeps track of NFL player data. To hold weekly stats I thought I would use a dictionary (weekly_stats) made up of dictionaries that represent weeks 1-17, i.e. weekly_stats = {'week1': {'pass_attempts': 0, # more stats etc.}, 'week2': {'pass_attempts': 0}, # etc. on to 'week17'}. There are a lot of stats and I may add more later, so instead of copying and pasting that 17 times and manually incrementing the dictionary key, I tried to use a for loop:
class Player:
    def __init__(self):
        self.weekly_stats = {}
        for i in range(1, 17):
            self.weekly_stats['week'+str(i)]: {  # dict keys will be 'week1', 'week2', etc
                'pass_attempts' : 0,
                'completions' : 0,
                #etc,
            }
I am trying to use i to say self.weekly_stats['week1'] , self.weekly_stats['week2'], etc through each iteration.
When I create a Player object this code seems to run as an object is created and no error is thrown. However, when I try to access this weekly_stats dict:
print(players['tom_brady'].weekly_stats['week2']['pass_attempts'])
it returns KeyError: 'week2'. It seems like the dictionary keys are not being created? Can anyone help with this?
You should use the method 'update' to add a key to a dict
class Player:
    def __init__(self):
        self.weekly_stats = {}
        for i in range(1, 17):
            self.weekly_stats.update({"week" + str(i): {
                "pass_attempts": 0,
                "completions": 0,
            }})
This article has more info on dict manipulation: https://www.geeksforgeeks.org/python-add-new-keys-to-a-dictionary/
I believe the problem is due to the fact that you are trying to access a key passing_attempts where the dictionary has a key pass_attempts instead.
The following is a suggested solution:
NUM_WEEKS = 16
class Player:
    def __init__(self):
        self.weekly_stats = {
            f'week{i+1}': {
                "pass_attempts": 0,
                "completions": 0
            } for i in range(NUM_WEEKS)
        }
players = {'tom_brady': Player()}
print(players['tom_brady'].weekly_stats['week2']['pass_attempts'])
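It may also help to see why the loop in the question runs without error yet stores nothing: a line of the form target: value (colon instead of equals sign) is a variable annotation, not an assignment. The sketch below contrasts the two; the function names are made up for the example. It also uses range(1, 18), since range excludes its end value and range(1, 17) stops at week16.

```python
def build_with_colon():
    stats = {}
    for i in range(1, 18):
        # Colon instead of "=": this is a variable annotation (a type hint),
        # not an assignment, so nothing is stored in the dict.
        stats['week' + str(i)]: {'pass_attempts': 0, 'completions': 0}
    return stats


def build_with_equals():
    stats = {}
    for i in range(1, 18):  # the end is exclusive, so this yields week1..week17
        stats['week' + str(i)] = {'pass_attempts': 0, 'completions': 0}
    return stats


broken = build_with_colon()
fixed = build_with_equals()
print(len(broken))
print(len(fixed), fixed['week17']['pass_attempts'])
```

Plain assignment with = is enough here; update() and the dict comprehension shown in the answers above achieve the same result.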

Python ldap3 print entries based on variable

I am new to Python, stupid question ahead.
I need to compare the entries from a MySQL database with LDAP. I created a dictionary to hold the corresponding attribute names, but when I loop through the dictionary and pass each value to the ldap3 entry to show the result, the variable name is taken literally.
for x in values_dict:
    value2=values_dict[x]
    try:
        ldap_conn.entries[0].value2
    except Exception as error:
        print(error)
    else:
        print(value2)
attribute 'value2' not found
If I replace value2 with 'sn' or any of the other attributes I have pulled it works fine. I have also played around with exec() but this returns nothing.
for x in values_dict:
    value2=values_dict[x]
    test1='ldap_conn.entries[0].{}'.format(value2)
    try:
        result1=exec(test1)
    except Exception as error:
        print(error)
    else:
        print(result1)
None
Any ideas?
EDIT 1 : As requested here are the values for values_dict. As stated previously the loop does parse these correctly, ldap does return the attributes, but when I try to use a variable to lookup the attributes from entries the variable is taken literally.
values_dict = {
    "First_Name": "givenname",
    "Middle_Name": "middlename",
    "Last_Name": "sn",
    "Preferred_Name": "extensionattribute2",
    "Work_Location": "physicaldeliveryofficename",
    "Work_City": "l",
    "Work_State": "st",
    "Work_Postal": "postalcode",
    "Work_Email": "mail"
}
The syntax somevariable.someattr, which you are using here:
ldap_conn.entries[0].value2
always means "access an attribute named someattr of somevariable". The name after the dot is always taken literally; it is never looked up as a variable. If you need to access an attribute dynamically, use the getattr function:
getattr(ldap_conn.entries[0], value2)
You're not currently assigning that result anywhere, so you probably want something like:
result1 = getattr(ldap_conn.entries[0], value2)
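Put together with the loop from the question, this could look like the sketch below. A SimpleNamespace with made-up values stands in for ldap_conn.entries[0] so the snippet runs without an LDAP server, and values_dict is shortened. As a side note, exec() always returns None, which is why the second attempt in the question printed None.

```python
from types import SimpleNamespace

# Shortened version of values_dict from the question.
values_dict = {
    "First_Name": "givenname",
    "Last_Name": "sn",
    "Work_Email": "mail",
}

# Stand-in for ldap_conn.entries[0]; the attribute values are invented.
entry = SimpleNamespace(givenname="John", sn="Smith")

ldap_values = {}
for column, attr_name in values_dict.items():
    # The third argument is returned when the attribute is missing on this
    # stand-in object; a real ldap3 entry may signal a missing attribute
    # differently, which is not verified here.
    ldap_values[column] = getattr(entry, attr_name, None)

print(ldap_values)
```

With a real connection, entry would be ldap_conn.entries[0], and keeping the try/except from the question around the getattr call is a reasonable safeguard for attributes that were not returned.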
