Pytest changes the behaviour of a recursive generator function - python-3.x

This is my code, where 'schema' and 'config' are heavily nested dictionaries. The function works as intended when run on its own:
def find_config_difference(schema, config, key_name=None):
    for k in schema:
        try:
            if "default" in schema[k] and k in config:
                if schema[k]["default"] != config[k]:
                    out_of_date_default.append(f"key {k} in {key_name} has out of date default value.")
                    schema[k] = schema[k]["default"]
        except TypeError:
            pass
        finally:
            if k in config:
                if isinstance((schema[k] and config[k]), dict):
                    find_config_difference(schema[k], config[k], key_name=k)
            else:
                yield f"{k} in {key_name} is missing/extra in config"

output = list(find_config_difference(schema, config))
This is my pytest code. When run through pytest, the function is not called recursively and the 'for' loop only goes through the outermost keys of 'schema':
import unittest
from unittest.mock import patch
from config_compare import *
import ast

config = ast.literal_eval(open('config.json', 'r').read())
schema = ast.literal_eval(open('config_schema.json', 'r').read())

class Test_config_compare(unittest.TestCase):
    def test_find_config_difference(self):
        missing = list(find_config_difference(schema, config, key_name=None))
        length = len(missing)
        print(length)
Here is a section of 'schema', which is very similar to 'config'. The loop only iterates through its top-level keys ($schema, additionalProperties, definitions, button_content):
schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "additionalProperties": false,
    "definitions": {
        "color_hex": {"pattern": "^#([A-Fa-f0-9]{6})$", "type": "string"},
        "palette": {
            "additionalProperties": false,
            "properties": {
                "background": {"$ref": "#/definitions/color_hex", "default": "#FFFFFF"},
                "primary": {
                    "rules": {"testTR": 12, "tier": 2},
                },
            },
        },
    },
    "button_content": {
        "additionalProperties": false,
        "properties": {
            "accessibilityLabel": {"type": "string"},
        },
        "required": ["accessibilityLabel", "value"],
        "type": "object",
    },
}
When I remove the yield and just collect the strings it works fine, but I don't want to give up the generator and store the strings in a variable. So, is there a way to work around it?

Your recursive function cannot possibly work. It's a generator function, but when you make the recursive call, you're not iterating over the returned generator object. You probably want to yield from the recursive call:
finally:
    if k in config:
        if isinstance((schema[k] and config[k]), dict):
            yield from find_config_difference(schema[k], config[k], key_name=k)
If you want to be selective about which values yielded by the recursion get yielded, you could write your own for loop iterating on the recursive result with whatever logic you want in it, yielding or not as you desire.
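For instance, a minimal sketch of that selective variant (the filter condition is made up purely for illustration):

    for message in find_config_difference(schema[k], config[k], key_name=k):
        # re-yield only the messages you care about
        if "missing/extra" in message:
            yield message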
All that said, the fact that your function is both a generator that yields values, and it has side effects (such as updating out_of_date_default and some of the schema[k] values) seems like a dubious design. You should probably make your function only do one of those things (either modify things in place, or yield new values, not both).
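Putting both points together, a corrected version might look roughly like this. It is only a sketch: it drops the out_of_date_default side effect and the in-place schema rewrite, and it checks that both values are dicts (the original isinstance((schema[k] and config[k]), dict) only inspects one of the two values):

    def find_config_difference(schema, config, key_name=None):
        for k in schema:
            if k not in config:
                yield f"{k} in {key_name} is missing/extra in config"
                continue
            sub_schema, sub_config = schema[k], config[k]
            if isinstance(sub_schema, dict) and isinstance(sub_config, dict):
                if "default" in sub_schema and sub_schema["default"] != sub_config:
                    yield f"key {k} in {key_name} has out of date default value."
                # consume the recursion so nested keys are actually visited
                yield from find_config_difference(sub_schema, sub_config, key_name=k)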

Related

How to mock Athena query results values with Moto3 for a specific table?

I am using pytest and moto3 to test some code similar to this:
response = athena_client.start_query_execution(
    QueryString='SELECT * FROM xyz',
    QueryExecutionContext={'Database': myDb},
    ResultConfiguration={'OutputLocation': someLocation},
    WorkGroup=myWG
)
execution_id = response['QueryExecutionId']
if response['QueryExecution']['Status']['State'] == 'SUCCEEDED':
    response = athena_client.get_query_results(
        QueryExecutionId=execution_id
    )
    results = response['ResultSet']['Rows']
    ...etc
In my test I need the values from results = response['ResultSet']['Rows'] to be controlled by the test. I am using some code like this:
backend = athena_backends[DEFAULT_ACCOUNT_ID]["us-east-1"]
rows = [{"Data": [{"VarCharValue": "xyz"}]}, {"Data": [{"VarCharValue": ...}, etc]}]
column_info = [
    {
        "CatalogName": "string",
        "SchemaName": "string",
        "TableName": "xyz",
        "Name": "string",
        "Label": "string",
        "Type": "string",
        "Precision": 123,
        "Scale": 123,
        "Nullable": "NOT_NULL",
        "CaseSensitive": True,
    }
]
results = QueryResults(rows=rows, column_info=column_info)
backend.query_results[NEEDED_QUERY_EXECUTION_ID] = results
but that is not working, as I guess NEEDED_QUERY_EXECUTION_ID is not known in advance by the test. How can I control it?
UPDATE
Based on a suggestion I tried to use:
results = QueryResults(rows=rows, column_info=column_info)
d = defaultdict(lambda: results.to_dict())
backend.query_results = d
to force a return of values, but it does not seem to work, because moto3's models.AthenaBackend.get_query_results has this code:
results = (
    self.query_results[exec_id]
    if exec_id in self.query_results
    else QueryResults(rows=[], column_info=[])
)
return results
which will fail, as the if condition won't be satisfied.
Extending the defaultdict solution, you could create a custom dictionary that claims to contain all execution_ids and always returns the same object:
class QueryDict(dict):
    def __contains__(self, item):
        return True

    def __getitem__(self, item):
        rows = [{"Data": [{"VarCharValue": "xyz"}]}, {"Data": [{"VarCharValue": "..."}]}]
        column_info = [
            {
                "CatalogName": "string",
                "SchemaName": "string",
                "TableName": "xyz",
                "Name": "string",
                "Label": "string",
                "Type": "string",
                "Precision": 123,
                "Scale": 123,
                "Nullable": "NOT_NULL",
                "CaseSensitive": True,
            }
        ]
        return QueryResults(rows=rows, column_info=column_info)
backend = athena_backends[DEFAULT_ACCOUNT_ID]["us-east-1"]
backend.query_results = QueryDict()
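For completeness, here is a rough sketch of how that might be wired into a test. The import paths and the mock_athena decorator are assumptions based on typical moto usage (recent moto versions use mock_aws instead), and the query itself is just the one from the question:

    import boto3
    from moto import mock_athena
    from moto.athena.models import athena_backends, QueryResults
    from moto.core import DEFAULT_ACCOUNT_ID

    @mock_athena
    def test_athena_results_are_stubbed():
        # Replace the backend's result store with the always-matching dict
        backend = athena_backends[DEFAULT_ACCOUNT_ID]["us-east-1"]
        backend.query_results = QueryDict()

        client = boto3.client("athena", region_name="us-east-1")
        execution_id = client.start_query_execution(
            QueryString="SELECT * FROM xyz",
            QueryExecutionContext={"Database": "mydb"},
            ResultConfiguration={"OutputLocation": "s3://some-bucket/results/"},
        )["QueryExecutionId"]

        rows = client.get_query_results(QueryExecutionId=execution_id)["ResultSet"]["Rows"]
        assert rows[0]["Data"][0]["VarCharValue"] == "xyz"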
An alternative solution to using custom dictionaries would be to seed Moto.
Seeding Moto ensures that it will always generate the same 'random' identifiers, which means you always know what the value of NEEDED_QUERY_EXECUTION_ID is going to be.
backend = athena_backends[DEFAULT_ACCOUNT_ID]["us-east-1"]
rows = [{"Data": [{"VarCharValue": "xyz"}]}, {"Data": [{"VarCharValue": "..."}]}]
column_info = [...]
results = QueryResults(rows=rows, column_info=column_info)
backend.query_results["bdd640fb-0667-4ad1-9c80-317fa3b1799d"] = results
import requests
requests.post("http://motoapi.amazonaws.com/moto-api/seed?a=42")
# Test - the execution id will always be the same because we just seeded Moto
execution_id = athena_client.start_query_execution(...)
Documentation on seeding Moto can be found here: http://docs.getmoto.org/en/latest/docs/configuration/recorder/index.html#deterministic-identifiers
(It only talks about seeding Moto in the context of recording/replaying requests, but the functionality can be used on its own.)

Groovy: How do I iterate through a map to create a new map with values based on a specific condition

I am in no way an expert with groovy so please don't hold that against me.
I have JSON that looks like this:
{
    "metrics": [
        {
            "name": "metric_a",
            "help": "This tracks your A stuff.",
            "type": "GAUGE",
            "labels": [
                "pool"
            ],
            "unit": "",
            "aggregates": [],
            "meta": [
                {
                    "category": "CAT A",
                    "deployment": "environment-a"
                }
            ],
            "additional_notes": "Some stuff (potentially)"
        },
        ...
    ]
    ...
}
I'm using it as a source for automated documentation of all the metrics. So, I'm iterating through it in various ways to get the information I need. So far so good, I'm most of the way there. The problem is this all needs to be organized per the deployment environment. Meaning, multiple metrics will share the same value for deployment.
My thought was I could create a map with deployment as the key and the metric name for any metric that has a matching deployment as the value. Once I have that map, it should be easy for me to organize things the way they should be. I can't figure out how to do that. The result is all the metric names are added which is expected since I'm not doing anything to filter them out. I was thinking that groupBy would make sense here but I can't figure out how to use it effectively and frankly I'm not sure it will solve my problem by itself. Here is my code so far:
parentChild = [:]
children = []
metrics.each { metric ->
    def metricName = metric.name
    def depName = metric.meta.findResult{ it.deployment }
    children.add(metricName)
    parentChild.put(depName, children)
}
What is the best way to create a new map where the values for each key are based off a specific condition?
EDIT: The desired result would be that each key in the resulting map is a unique deployment value from all the metrics (as a string). Each value would be the names of the metrics that contain that deployment (as an array).
[environment-a:
[metric_a,metric_b,metric_c,...],
environment-b:
[metric_d,metric_e,metric_f,...]
...]
I would use a combo of withDefault() to pre-fill each map-entry value with a fresh TreeSet-instance (sorted no-duplicates set) and standard inject().
I reduced your sample data to the bare minimum and added some new nodes:
import groovy.json.*
String input = '''\\
{
  "metrics": [
    {
      "name": "metric_a",
      "meta": [
        {
          "deployment": "environment-a"
        }
      ]
    },
    {
      "name": "metric_b",
      "meta": [
        {
          "deployment": "environment-a"
        }
      ]
    },
    {
      "name": "metric_c",
      "meta": [
        {
          "deployment": "environment-a"
        },
        {
          "deployment": "environment-b"
        }
      ]
    },
    {
      "name": "metric_d",
      "meta": [
        {
          "deployment": "environment-b"
        }
      ]
    }
  ]
}'''
def json = new JsonSlurper().parseText input
def groupedByDeployment = json.metrics.inject( [:].withDefault{ new TreeSet() } ){ res, metric ->
    metric.meta.each{ res[ it.deployment ] << metric.name }
    res
}
assert groupedByDeployment.toString() == '[environment-a:[metric_a, metric_b, metric_c], environment-b:[metric_c, metric_d]]'
If your metrics.meta array is supposed to have a single value, you can simplify the code by replacing the line:
metric.meta.each{ res[ it.deployment ] << metric.name }
with
res[ metric.meta.first().deployment ] << metric.name

umongo, pymongo, python 3, how do I load data from reference field(s)

I'm trying to understand how and why it's so hard to load my referenced data in umongo/pymongo.
@instance.register
class MyEntity(Document):
    account = fields.ReferenceField('Account', required=True)
    date = fields.DateTimeField(
        default=lambda: datetime.utcnow(),
        allow_none=False
    )
    positions = fields.ListField(fields.ReferenceField('Position'))
    targets = fields.ListField(fields.ReferenceField('Target'))

    class Meta:
        collection = db.myentity
When I retrieve this with:
def find_all(self):
    items = self._repo.find_all(
        {
            'user_id': self._user_id
        }
    )
    return items
and then dump it like so:
from bson.json_util import dumps

all_items = []
for item in items:
    all_items.append(item.dump())
return dumps(all_items)
I get the following JSON object:
[
    {
        "account": "5e990db75f22b6b45d3ce814",
        "positions": [
            "5e9a594373e07613b358bdbb",
            "5e9a594373e07613b358bdbe",
            "5e9a594373e07613b358bdc1"
        ],
        "date": "2020-04-18T01:34:59.919000+00:00",
        "id": "5e9a594373e07613b358bdcb",
        "targets": [
            "5e9a594373e07613b358bdc4",
            "5e9a594373e07613b358bdc7",
            "5e9a594373e07613b358bdca"
        ]
    }
]
and without dump:
<object Document models.myentity.schema.MyEntity({
    'targets': <object umongo.data_objects.List([
        <object umongo.frameworks.pymongo.PyMongoReference(document=Target, pk=ObjectId('5e9a594373e07613b358bdc4'))>,
        <object umongo.frameworks.pymongo.PyMongoReference(document=Target, pk=ObjectId('5e9a594373e07613b358bdc7'))>,
        <object umongo.frameworks.pymongo.PyMongoReference(document=Target, pk=ObjectId('5e9a594373e07613b358bdca'))>
    ])>,
    'id': ObjectId('5e9a594373e07613b358bdcb'),
    'positions': <object umongo.data_objects.List([
        <object umongo.frameworks.pymongo.PyMongoReference(document=Position, pk=ObjectId('5e9a594373e07613b358bdbb'))>,
        <object umongo.frameworks.pymongo.PyMongoReference(document=Position, pk=ObjectId('5e9a594373e07613b358bdbe'))>,
        <object umongo.frameworks.pymongo.PyMongoReference(document=Position, pk=ObjectId('5e9a594373e07613b358bdc1'))>
    ])>,
    'date': datetime.datetime(2020, 4, 18, 1, 34, 59, 919000),
    'account': <object umongo.frameworks.pymongo.PyMongoReference(document=Account, pk=ObjectId('5e990db75f22b6b45d3ce814'))>
})>
I'm really struggling with how to dereference this. I'd like all loaded reference fields, if I specify them in the umongo schema, to be dereferenced recursively. Is this not in the umongo API?
For example, what if there's a reference field in 'target' as well? I understand this can be expensive on the DB, but is there some way to specify this on the schema definition itself, i.e. in the Meta class, that I always want the full, dereferenced object for a particular field?
I'm finding very little documentation or commentary on this: it's not even mentioned in the umongo docs, and the solutions I've found for other ODMs (like mongoengine) painfully write out recursive, manual functions per field / per query. This suggests there's a reason this is not a popular question. Might it be an anti-pattern? If so, why?
I'm not that new to Mongo, but I am new to Python / Mongo. I feel like I'm missing something fundamental here.
EDIT: right after posting, I found this issue:
https://github.com/Scille/umongo/issues/42
which provides a way forward.
Is this still the best approach? I'm still trying to understand why this is treated like an edge case.
EDIT 2: progress
class MyEntity(Document):
    account = fields.ReferenceField('Account', required=True, dump=lambda: 'fetch_account')
    date = fields.DateTimeField(
        default=lambda: datetime.utcnow(),
        allow_none=False
    )
    # trade = fields.DictField()
    positions = fields.ListField(fields.ReferenceField('Position'))
    targets = fields.ListField(fields.ReferenceField('Target'))

    class Meta:
        collection = db.trade

    @property
    def fetch_account(self):
        return self.account.fetch()
So with the newly defined property, I can do:
items = MyEntityService().find_all()
allItems = []
for item in items:
    account = item.fetch_account
    log(account.dump())
    allItems.append(item.dump())
When I dump account, all is good. But I don't want to explicitly/manually have to do this. It still means I have to recursively unpack and then repack each referenced doc, and any child references, each time I make a query. It also means the schema SOT is no longer contained just in the umongo class, i.e. if a field changes, I'll have to refactor every query that uses that field.
I'm still looking for a way to decorate/flag this on the schema itself.
e.g.
account = fields.ReferenceField('Account', required=True, dump=lambda: 'fetch_account')
The dump=lambda: 'fetch_account' part I just made up; it doesn't do anything, but that's more or less the pattern I'm going for. I'm not sure if this is possible (or even smart: pointers to why I'm totally wrong in my approach are welcome)...
EDIT 3:
This is where I've landed:
@property
def fetch_account(self):
    return self.account.fetch().dump()

@property
def fetch_targets(self):
    targets_list = []
    for target in self.targets:
        doc = target.fetch().dump()
        targets_list.append(doc)
    return targets_list

@property
def fetch_positions(self):
    positions_list = []
    for position in self.positions:
        doc = position.fetch().dump()
        positions_list.append(doc)
    return positions_list
and then to access:
allItems = []
for item in items:
    account = item.fetch_account
    positions = item.fetch_positions
    targets = item.fetch_targets
    item = item.dump()
    item['account'] = account
    item['positions'] = positions
    item['targets'] = targets
    # del item['targets']
    allItems.append(item)
I could clean it up / abstract it some, but I don't see how I could really reduce the general verbosity at this point. It does seem to give me the result I'm looking for though:
[
    {
        "date": "2020-04-18T01:34:59.919000+00:00",
        "targets": [
            {
                "con_id": 331641614,
                "value": 106,
                "date": "2020-04-18T01:34:59.834000+00:00",
                "account": "5e990db75f22b6b45d3ce814",
                "id": "5e9a594373e07613b358bdc4"
            },
            {
                "con_id": 303019419,
                "value": 0,
                "date": "2020-04-18T01:34:59.867000+00:00",
                "account": "5e990db75f22b6b45d3ce814",
                "id": "5e9a594373e07613b358bdc7"
            },
            {
                "con_id": 15547841,
                "value": 9,
                "date": "2020-04-18T01:34:59.912000+00:00",
                "account": "5e990db75f22b6b45d3ce814",
                "id": "5e9a594373e07613b358bdca"
            }
        ],
        "account": {
            "user_name": "hello",
            "account_type": "LIVE",
            "id": "5e990db75f22b6b45d3ce814",
            "user_id": "U3621607"
        },
        "positions": [
            {
                "con_id": 331641614,
                "value": 104,
                "date": "2020-04-18T01:34:59.728000+00:00",
                "account": "5e990db75f22b6b45d3ce814",
                "id": "5e9a594373e07613b358bdbb"
            },
            {
                "con_id": 303019419,
                "value": 0,
                "date": "2020-04-18T01:34:59.764000+00:00",
                "account": "5e990db75f22b6b45d3ce814",
                "id": "5e9a594373e07613b358bdbe"
            },
            {
                "con_id": 15547841,
                "value": 8,
                "date": "2020-04-18T01:34:59.797000+00:00",
                "account": "5e990db75f22b6b45d3ce814",
                "id": "5e9a594373e07613b358bdc1"
            }
        ],
        "id": "5e9a594373e07613b358bdcb"
    }
]
It seems like this is a design choice in umongo.
In Mongoid for example (the Ruby ODM for MongoDB), when an object is referenced it is fetched from the database automatically through associations as needed.
As an aside, in an ODM the features of "define a field structure" and "seamlessly access data through application objects" are quite separate. For example, my experience with Hibernate in Java suggests it is similar to what you are discovering with umongo - once the data is loaded, it provides a way of accessing the data using application-defined field structure with types etc., but it doesn't really help with loading the data from application domain transparently.
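As a rough way to consolidate the per-field properties from EDIT 3, a dereferencing helper could be factored out. This is only a sketch built on the fetch()/dump() calls already shown above; the helper name and the hard-coded field list are mine:

    def dump_with_references(doc):
        """Dump a umongo document, replacing its reference fields with the
        dumped referenced documents (one extra fetch per reference)."""
        data = doc.dump()
        data['account'] = doc.account.fetch().dump()
        data['positions'] = [ref.fetch().dump() for ref in doc.positions]
        data['targets'] = [ref.fetch().dump() for ref in doc.targets]
        return data

    # usage, mirroring the loop from EDIT 3
    allItems = [dump_with_references(item) for item in MyEntityService().find_all()]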

SQLAlchemy / jsonpatch - how to make patch paths case-insensitive?

I've been trying to find some documentation for jsonpatch==1.16 on how to make PATCH paths case-insensitive. The problem is that:
PATCH /users/123
[
{"op": "add", "path": "/firstname", "value": "Spammer"}
]
Seems to mandate that the DB (MySQL / MariaDB) column is also exactly firstname and not for example Firstname or FirstName. When I change the path in the JSON to /FirstName, which is what the DB column is, then the patch works just fine. But I'm not sure if you are supposed to use CamelCase in the JSON in this case? It seems a bit non-standard.
How can I make jsonpatch at least case-insensitive? Or alternatively, is there some way to insert some mapping in the middle, for example like this:
def users_mapping(self, path):
    select = {
        "/firstname": "FirstName",
        "/lastname": "last_name",  # Just an example
    }
    return select.get(path, None)
Using Python 3.5, SQLAlchemy 1.1.13 and Flask-SQLAlchemy 2.2
Well, the answer is: yes, you can add mapping. Here's my implementation with some annotations:
The endpoint handler (eg. PATCH /news/123):
def patch(self, news_id):
    """Change an existing News item partially using an instruction-based JSON,
    as defined by: https://tools.ietf.org/html/rfc6902
    """
    news_item = News.query.get_or_404(news_id)
    self.patch_item(news_item, request.get_json())
    db.session.commit()
    # asdict() comes from dictalchemy method make_class_dictable(news)
    return make_response(jsonify(news_item.asdict()), 200)
The method it calls:
# news = the db.Model for News, from SQLAlchemy
# patchdata = the JSON from the request, like this:
# [{"op": "add", "path": "/title", "value": "Example"}]
def patch_item(self, news, patchdata, **kwargs):
    # Map the values to DB column names
    mapped_patchdata = []
    for p in patchdata:
        # Replace eg. /title with /Title
        p = self.patch_mapping(p)
        mapped_patchdata.append(p)
    # This follows the normal JsonPatch procedure
    data = news.asdict(exclude_pk=True, **kwargs)
    # The only difference is that I pass the mapped version of the list
    patch = JsonPatch(mapped_patchdata)
    data = patch.apply(data)
    news.fromdict(data)
And the mapping implementation:
def patch_mapping(self, patch):
    """This is used to map a patch "path" or "from" to a custom value.
    Useful for when the patch path/from is not the same as the DB column name.
    Eg.
    PATCH /news/123
    [{ "op": "move", "from": "/title", "path": "/author" }]
    If the News column is "Title", having "/title" would fail to patch
    because the case does not match. So the mapping converts this:
    { "op": "move", "from": "/title", "path": "/author" }
    To this:
    { "op": "move", "from": "/Title", "path": "/Author" }
    """
    # You can define arbitrary column names here.
    # As long as the DB column is identical, the patch will work just fine.
    mapping = {
        "/title": "/Title",
        "/contents": "/Contents",
        "/author": "/Author"
    }
    # deepcopy comes from the standard library: from copy import deepcopy
    mutable = deepcopy(patch)
    for prop in patch:
        if prop == "path" or prop == "from":
            # Fall back to the original value so unmapped paths pass through unchanged
            mutable[prop] = mapping.get(patch[prop], patch[prop])
    return mutable
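If you would rather not maintain the mapping by hand, the same idea can be made case-insensitive by deriving the mapping from the model's own column names via SQLAlchemy's table metadata. This is only a sketch (the helper names are mine, and it assumes every patch path corresponds 1:1 to a column):

    def build_path_mapping(model):
        # e.g. {"/firstname": "/FirstName", "/lastname": "/LastName"}
        return {
            "/" + name.lower(): "/" + name
            for name in model.__table__.columns.keys()
        }

    def normalize_patch(patchdata, model):
        mapping = build_path_mapping(model)
        normalized = []
        for op in patchdata:
            op = dict(op)
            for key in ("path", "from"):
                if key in op:
                    # Unknown paths are left untouched and will fail later as usual
                    op[key] = mapping.get(op[key].lower(), op[key])
            normalized.append(op)
        return normalized

    # usage inside patch_item(), instead of the manual patch_mapping() loop:
    # mapped_patchdata = normalize_patch(patchdata, News)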

Get the JSON representation of your neo4j objects

I want to get data from this array of JSON objects:
[
    {
        "outgoing_relationships": "http://myserver:7474/db/data/node/4/relationships/out",
        "data": {
            "family": "3",
            "batch": "/var/www/utils/batches/d32740d8-b4ad-49c7-8ec8-0d54fcb7d239.resync",
            "name": "rahul",
            "command": "add",
            "type": "document"
        },
        "traverse": "http://myserver:7474/db/data/node/4/traverse/{returnType}",
        "all_typed_relationships": "http://myserver:7474/db/data/node/4/relationships/all/{-list|&|types}",
        "property": "http://myserver:7474/db/data/node/4/properties/{key}",
        "self": "http://myserver:7474/db/data/node/4",
        "properties": "http://myserver:7474/db/data/node/4/properties",
        "outgoing_typed_relationships": "http://myserver:7474/db/data/node/4/relationships/out/{-list|&|types}",
        "incoming_relationships": "http://myserver:7474/db/data/node/4/relationships/in",
        "extensions": {},
        "create_relationship": "http://myserver:7474/db/data/node/4/relationships",
        "paged_traverse": "http://myserver:7474/db/data/node/4/paged/traverse/{returnType}{?pageSize,leaseTime}",
        "all_relationships": "http://myserver:7474/db/data/node/4/relationships/all",
        "incoming_typed_relationships": "http://myserver:7474/db/data/node/4/relationships/in/{-list|&|types}"
    }
]
What I tried is:
def messages=[];
for (i in families) {
    messages?.add(i);
}
How can I get families.data.name into the messages array?
Here is what I tried:
def messages=[];
for (i in families) {
    def map = new groovy.json.JsonSlurper().parseText(i);
    def msg = map*.data.name;
    messages?.add(i);
}
return messages;
and get this error:
javax.script.ScriptException: groovy.lang.MissingMethodException: No signature of method: groovy.json.JsonSlurper.parseText() is applicable for argument types: (com.tinkerpop.blueprints.pgm.impls.neo4j.Neo4jVertex) values: [v[4]]\nPossible solutions: parseText(java.lang.String), parse(java.io.Reader)
Or use Groovy's native JSON parsing:
def families = new groovy.json.JsonSlurper().parseText( jsonAsString )
def messages = families*.data.name
Since you edited the question to give us the information we needed, you can try:
def messages=[];
families.each { i ->
    def map = new groovy.json.JsonSlurper().parseText( i.toString() )
    messages.addAll( map*.data.name )
}
messages
Though it should be said that the toString() method in com.tinkerpop.blueprints.pgm.impls.neo4j.Neo4jVertex makes no guarantees to be valid JSON... You should probably be using the getProperty( name ) function of Neo4jVertex rather than relying on a side-effect of toString()
What are you doing to generate the first bit of text (which you state is JSON but make no mention of how it's created)?
Use JSON-lib.
GJson.enhanceClasses()
def families = json_string as JSONArray
def messages = families.collect {it.data.name}
If you are using Groovy 1.8, you don't need JSON-lib anymore as a JsonSlurper is included in the GDK.
import groovy.json.JsonSlurper
def families = new JsonSlurper().parseText(json_string)
def messages = families.collect { it.data.name }
