I'm trying to flatten this JSON into a pandas DataFrame, but it's getting the better of me.
[{
'contact': {
'id': 101,
'email': 'email1#address.com',
},
'marketingPreference': [{
'marketingId': 1093,
'isOptedIn': True,
'dateModifed': '2022-05-10T14:29:24Z'
}]
},
{
'contact': {
'id': 102,
'email': 'email2#address.com',
},
'marketingPreference': [{
'marketingId': 1093,
'isOptedIn': True,
'dateModifed': '2022-05-10T14:29:24Z'
}]
}
]
I am looking for the columns to be: Id, Email, MarketingId, IsOptedIn, DateModifed.
Even though marketingPreference is an array, there is only ever one JSON object inside it.
You can use pd.json_normalize:
import pandas as pd
# data is the list of dicts from the question
df = pd.json_normalize(data, record_path='marketingPreference', meta=[['contact', 'id'], ['contact', 'email']])
print(df)
   marketingId  isOptedIn           dateModifed  contact.id       contact.email
0         1093       True  2022-05-10T14:29:24Z         101  email1#address.com
1         1093       True  2022-05-10T14:29:24Z         102  email2#address.com
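Since marketingPreference always holds exactly one object, you can also flatten the records in plain Python and use exactly the column names you asked for. This is a minimal sketch, assuming the input has the shape shown in the question (only one record is repeated here). The helper name flatten is hypothetical, and indexing with [0] relies on the "only ever one object" guarantee.

```python
# Sample input in the same shape as the question's data (one record shown).
data = [
    {
        'contact': {'id': 101, 'email': 'email1#address.com'},
        'marketingPreference': [{
            'marketingId': 1093,
            'isOptedIn': True,
            'dateModifed': '2022-05-10T14:29:24Z',
        }],
    },
]


def flatten(records):
    """Return one flat dict per record, keyed by the requested column names.

    Assumes every record has exactly one marketingPreference entry;
    an empty list would raise IndexError.
    """
    rows = []
    for record in records:
        pref = record['marketingPreference'][0]
        rows.append({
            'Id': record['contact']['id'],
            'Email': record['contact']['email'],
            'MarketingId': pref['marketingId'],
            'IsOptedIn': pref['isOptedIn'],
            'DateModifed': pref['dateModifed'],
        })
    return rows


rows = flatten(data)
print(rows)
```

pd.DataFrame(rows) then has the columns Id, Email, MarketingId, IsOptedIn, DateModifed in that order. If you prefer to keep json_normalize, df.rename(columns={...}) on its output gets you the same names.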
Hello All and thank you in advance for your help :)
Can someone help me understand how to take the code below, which displays data for a specified playlist, and make it show only the artist and track names? I have been going through the API documentation for several hours and can't make heads or tails of it. Right now it prints a whole bunch of data in a jumbled mess. Note that I put dummy values in the client_id and secret parts of this code.
from spotipy.oauth2 import SpotifyClientCredentials
import spotipy
import json
PlaylistExample = '37i9dQZEVXbMDoHDwVN2tF'
cid = '123'
secret = 'xyz'
auth_manager = SpotifyClientCredentials(client_id=cid, client_secret=secret)
sp = spotipy.Spotify(auth_manager=auth_manager)
playlist_id = 'spotify:user:spotifycharts:playlist:37i9dQZEVXbJiZcmkrIHGU'
results = sp.playlist(playlist_id)
print(json.dumps(results, indent=4))
Would something like this be useful?
print("Song - Artist - Album\n")
for item in results['tracks']['items']:
    print(
        item['track']['name'] + ' - ' +
        item['track']['artists'][0]['name'] + ' - ' +
        item['track']['album']['name']
    )
Your output will look similar to this:
Song - Artist - Album
ONLY - ZHU - ONLY
Bad - 2012 Remaster - Michael Jackson - Bad 25th Anniversary
Orion - Rodrigo y Gabriela - Rodrigo y Gabriela
Shape of You - Ed Sheeran - ÷ (Deluxe)
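item['track']['artists'][0] keeps only the first artist of each track. A minimal sketch that joins all artist names instead, assuming the same response shape the loop above relies on (tracks, then items, then track). The helper name track_lines is hypothetical, the sample data is hand-written (the second track and its artists are made up), and skipping items whose 'track' is empty is a defensive assumption rather than something the original answer needed.

```python
def track_lines(results):
    """Yield 'Song - Artist(s) - Album' strings from a playlist response dict."""
    for item in results['tracks']['items']:
        track = item.get('track')
        if not track:  # defensive: skip items that carry no track data
            continue
        artists = ', '.join(artist['name'] for artist in track['artists'])
        yield f"{track['name']} - {artists} - {track['album']['name']}"


# Hand-written sample in the shape returned by sp.playlist(...).
sample = {
    'tracks': {
        'items': [
            {'track': {'name': 'Orion',
                       'artists': [{'name': 'Rodrigo y Gabriela'}],
                       'album': {'name': 'Rodrigo y Gabriela'}}},
            {'track': {'name': 'Song A',  # hypothetical multi-artist track
                       'artists': [{'name': 'Artist 1'}, {'name': 'Artist 2'}],
                       'album': {'name': 'Album A'}}},
            {'track': None},
        ]
    }
}

print("Song - Artist - Album\n")
for line in track_lines(sample):
    print(line)
```

With a live client, pass the real response instead: track_lines(sp.playlist(playlist_id)).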
Alternatively, you could build your own structure based on the one returned by Spotify, keeping only what you need:
result_dict = {
'tracks': {
'items': [],
'limit': 100,
'next': None,
'offset': 0,
'previous': None,
'total': 16
},
'type': 'playlist',
'uri': '<playlist_uri>'
}
And the track structure that goes inside 'items' above:
track_dict = {
'track': {
'album': {
'name': item['track']['album']['name'],
},
'artists': [{
'name': item['track']['artists'][0]['name'],
}],
'name': item['track']['name'],
}
}
Then iterate and insert one by one:
for item in results['tracks']['items']:
    track_dict = {
        'track': {
            'album': {
                'name': item['track']['album']['name'],
            },
            'artists': [{
                'name': item['track']['artists'][0]['name'],
            }],
            'name': item['track']['name'],
        }
    }
    # Append the track dict structure to your results dict structure
    result_dict['tracks']['items'].append(track_dict)
Printing result_dict then gives:
{
'tracks': {
'items': [{
'track': {
'album': {
'name': 'ONLY'
},
'artists': [{
'name': 'ZHU'
}],
'name': 'ONLY'
}
}, {
'track': {
'album': {
'name': 'Bad 25th Anniversary'
},
'artists': [{
'name': 'Michael Jackson'
}],
'name': 'Bad - 2012 Remaster'
}
}, {
'track': {
'album': {
'name': 'Rodrigo y Gabriela'
},
'artists': [{
'name': 'Rodrigo y Gabriela'
}],
'name': 'Orion'
}
}, {
'track': {
'album': {
'name': '÷ (Deluxe)'
},
'artists': [{
'name': 'Ed Sheeran'
}],
'name': 'Shape of You'
}
}],
'limit': 100,
'next': None,
'offset': 0,
'previous': None,
'total': 4
},
'type': 'playlist',
'uri': '<playlist_uri>'
}
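Rather than hard-coding limit, total and the other paging values, the trimmed structure can copy them from the response. A sketch assuming results has the shape used above; slim_playlist and PAGING_KEYS are hypothetical names, and the sample is hand-written with one extra field ('popularity') to show that it gets dropped.

```python
PAGING_KEYS = ('limit', 'next', 'offset', 'previous', 'total')


def slim_playlist(results):
    """Return a trimmed copy of a playlist response that keeps only names."""
    items = [
        {
            'track': {
                'album': {'name': item['track']['album']['name']},
                'artists': [{'name': item['track']['artists'][0]['name']}],
                'name': item['track']['name'],
            }
        }
        for item in results['tracks']['items']
    ]
    tracks = {'items': items}
    # Copy paging metadata from the response instead of hard-coding it.
    tracks.update({key: results['tracks'][key] for key in PAGING_KEYS})
    return {'tracks': tracks, 'type': results['type'], 'uri': results['uri']}


# Hand-written sample in the shape of sp.playlist(...) output.
sample = {
    'tracks': {
        'items': [{'track': {
            'album': {'name': 'ONLY'},
            'artists': [{'name': 'ZHU'}],
            'name': 'ONLY',
            'popularity': 50,  # extra field, not copied
        }}],
        'limit': 100, 'next': None, 'offset': 0, 'previous': None, 'total': 1,
    },
    'type': 'playlist',
    'uri': '<playlist_uri>',
}
print(slim_playlist(sample))
```

A list comprehension replaces the explicit append loop; the output has the same layout as result_dict, but 'total' always matches the response.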
I am trying to update an existing array element by adding a new field to it.
...
{
"_id": "f08b466a-163b-4d9e-98f5-d900ef0f1a26",
"firstName": "foo",
"result": [
{
"_id":"957ee97d-d461-4d6c-8a80-57351bdc29f7",
"subjectName":"Mathematics",
"marks": 60
},
{
"_id":"0591d9a0-fd0f-4876-9bd3-dec4d5ab452e",
"subjectName":"Science",
"marks": 70
},
{
"_id":"21f42104-791b-4522-81ce-f7ae1b30d075",
"subjectName":"Social science",
"marks": 50
}
]
},
...
Now I want to add a new field, "isFavorite": true, to the Science subject, like this:
{
"_id": "f08b466a-163b-4d9e-98f5-d900ef0f1a26",
"firstName": "foo",
"result": [
{
"_id":"957ee97d-d461-4d6c-8a80-57351bdc29f7",
"subjectName":"Mathematics",
"marks": 60
},
{
"_id":"0591d9a0-fd0f-4876-9bd3-dec4d5ab452e",
"subjectName":"Science",
"marks": 70,
"isFavorite": true #-----------------New field----------
},
{
"_id":"21f42104-791b-4522-81ce-f7ae1b30d075",
"subjectName":"Social science",
"marks": 50
}
]
},
...
What I have tried so far:
from pymongo import MongoClient
...
collection = mongoInstance["student"]
student = collection.find_one({"_id": "f08b466a-163b-4d9e-98f5-d900ef0f1a26"})
for result in student["result"]:
    if result["_id"] == "0591d9a0-fd0f-4876-9bd3-dec4d5ab452e":
        result["isFavorite"] = True
        break
collection.update_one({"_id": "f08b466a-163b-4d9e-98f5-d900ef0f1a26"}, {"$set": student })
This works, but I believe there must be a simpler way: find the student document by its _id and add the new field to the array item matching item._id.
I'm looking for an elegant MongoDB query to find and update a specific array element.
@Alex Blex was on the right lines regarding the positional operator; the PyMongo syntax is very similar:
db.mycollection.update_one({'_id': 'f08b466a-163b-4d9e-98f5-d900ef0f1a26',
'result._id': '0591d9a0-fd0f-4876-9bd3-dec4d5ab452e'},
{'$set': {'result.$.isFavorite': True}})
Full example using the sample data provided:
from pymongo import MongoClient
import pprint
db = MongoClient()['mydatabase']
db.mycollection.insert_one({
'_id': 'f08b466a-163b-4d9e-98f5-d900ef0f1a26',
'firstName': 'foo',
'result': [
{
'_id': '957ee97d-d461-4d6c-8a80-57351bdc29f7',
'subjectName': 'Mathematics',
'marks': 60
},
{
'_id': '0591d9a0-fd0f-4876-9bd3-dec4d5ab452e',
'subjectName': 'Science',
'marks': 70
},
{
'_id': '21f42104-791b-4522-81ce-f7ae1b30d075',
'subjectName': 'Social science',
'marks': 50
}
]
})
db.mycollection.update_one({'_id': 'f08b466a-163b-4d9e-98f5-d900ef0f1a26',
'result._id': '0591d9a0-fd0f-4876-9bd3-dec4d5ab452e'},
{'$set': {'result.$.isFavorite': True}})
pprint.pprint(list(db.mycollection.find())[0])
result:
{'_id': 'f08b466a-163b-4d9e-98f5-d900ef0f1a26',
'firstName': 'foo',
'result': [{'_id': '957ee97d-d461-4d6c-8a80-57351bdc29f7',
'marks': 60,
'subjectName': 'Mathematics'},
{'_id': '0591d9a0-fd0f-4876-9bd3-dec4d5ab452e',
'isFavorite': True,
'marks': 70,
'subjectName': 'Science'},
{'_id': '21f42104-791b-4522-81ce-f7ae1b30d075',
'marks': 50,
'subjectName': 'Social science'}]}
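The positional $ operator updates the first array element matched by the query. MongoDB 3.6+ also supports filtered positional updates with arrayFilters, which PyMongo exposes as the array_filters argument of update_one; this is an alternative technique, not the one in the answer above. A sketch that only builds the three documents so it runs without a server; favorite_update and the identifier elem are illustrative names.

```python
def favorite_update(student_id, subject_id):
    """Build (query, update, array_filters) that mark one subject as favorite."""
    query = {'_id': student_id}
    # $[elem] targets every array element matched by the 'elem' filter below.
    update = {'$set': {'result.$[elem].isFavorite': True}}
    array_filters = [{'elem._id': subject_id}]
    return query, update, array_filters


query, update, array_filters = favorite_update(
    'f08b466a-163b-4d9e-98f5-d900ef0f1a26',
    '0591d9a0-fd0f-4876-9bd3-dec4d5ab452e',
)
print(query, update, array_filters, sep='\n')

# With a live connection (not executed here):
# db.mycollection.update_one(query, update, array_filters=array_filters)
```

The query no longer needs 'result._id', because the element is selected by the array filter instead.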
I am using the code below, but only the ACL for 'owner-account' is applied; the one for 'child-account' isn't. How do I fix this? I suspect it's more of a question about dictionaries. Any help is appreciated.
import json
import boto3
import logging
def lambda_handler(event, context):
    s3 = boto3.resource('s3')
    object_acl = s3.ObjectAcl('bucket_name', 'bucket_key')
    response = object_acl.put(
        AccessControlPolicy={
            'Grants': [
                {
                    'Grantee': {
                        'ID': 'child-account',
                        'Type': 'CanonicalUser'
                    },
                    'Grantee': {
                        'ID': 'owner-account',
                        'Type': 'CanonicalUser'
                    },
                    'Permission': 'FULL_CONTROL'
                },
            ],
            'Owner': {
                'ID': 'ssm-service-internal-account'
            }
        })
    print(response)
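The dictionary structure is wrong: both grantees sit in one grant dict under the same 'Grantee' key, and a Python dict keeps only the last value for a repeated key, so only 'owner-account' is sent. Give each grantee its own grant, like this: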
response = object_acl.put(
    AccessControlPolicy={
        'Grants': [
            {
                'Grantee': {
                    'ID': 'child-account',
                    'Type': 'CanonicalUser'
                },
                'Permission': 'FULL_CONTROL'
            },
            {
                'Grantee': {
                    'ID': 'owner-account',
                    'Type': 'CanonicalUser'
                },
                'Permission': 'FULL_CONTROL'
            }
        ],
        'Owner': {
            'ID': 'ssm-service-internal-account'
        }
    })
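Building the grants from a list of account IDs avoids copying the grant dict by hand. A minimal sketch; make_grants is a hypothetical helper, and the IDs are the placeholders from the question. The first lines show the duplicate-key behavior in isolation.

```python
# A dict literal with a repeated key keeps only the last value.
broken_grant = {
    'Grantee': {'ID': 'child-account', 'Type': 'CanonicalUser'},
    'Grantee': {'ID': 'owner-account', 'Type': 'CanonicalUser'},
    'Permission': 'FULL_CONTROL',
}
print(broken_grant['Grantee']['ID'])  # owner-account


def make_grants(canonical_ids, permission='FULL_CONTROL'):
    """Build one grant dict per canonical user ID."""
    return [
        {
            'Grantee': {'ID': canonical_id, 'Type': 'CanonicalUser'},
            'Permission': permission,
        }
        for canonical_id in canonical_ids
    ]


policy = {
    'Grants': make_grants(['child-account', 'owner-account']),
    'Owner': {'ID': 'ssm-service-internal-account'},
}
print(len(policy['Grants']))  # 2
# object_acl.put(AccessControlPolicy=policy)  # as in the question
```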
This does not seem to apply validation to the collection: no exceptions are thrown, and documents can have attributes of the wrong type. Perhaps I am doing it in the wrong part of the codebase? Right now it is in __init__.py.
__init__.py
db = database(client, settings.mongo_db_name)
from api.models import Company
validation_level = 'strict'
if 'companies' not in db.collection_names():
    db.create_collection('companies', validator=Company.validator, validationLevel=validation_level)
else:
    db.command({
        'collMod': 'companies',
        'validator': Company.validator,
        'validationLevel': validation_level,
    })
Company Model:
from api import db
class Company(Model):
    collection = db.companies
    validator = {
        '$jsonSchema': {
            'bsonType': 'object',
            'required': ['name', 'description'],
            'properties': {
                'logo': {
                    'bsonType': 'string',
                },
                'name': {
                    'bsonType': 'string',
                    'description': 'name of company is required',
                    'minLength': 4,
                },
                'description': {
                    'bsonType': 'string',
                    'description': 'description of company is required',
                    'minLength': 4,
                },
                'website': {
                    'bsonType': 'string',
                },
                'request_delete': {
                    'bsonType': 'bool',
                },
                'deleted': {
                    'bsonType': 'bool',
                },
            }
        }
    }
As discussed here, I also tried this without success:
db.command(OrderedDict([
('collMod', 'companies'),
('validator', Company.validator),
('validationLevel', validation_level),
]))
If this were successful, would I see validation rules when running the following command?
pprint(db.command('collstats', 'companies'))
Update
I added OrderedDict to both command arguments and the validator. This works... when I run specific tests. It does not work with
python -m unittest discover
I'm using Python 3.6.8, PyMongo 3.8.0, and MongoDB 3.6.3
Usually, it's better to validate a document BEFORE inserting it into the database. That way, schema errors are caught in your own code instead of relying only on the server-side validator.
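A minimal sketch of checking a document in application code before inserting it, assuming the rules in Company.validator above (name and description required, string fields, minLength 4, boolean flags). validate_company is a hypothetical helper that covers only these rules, not the full $jsonSchema language.

```python
REQUIRED = ('name', 'description')
STRING_FIELDS = ('logo', 'name', 'description', 'website')
BOOL_FIELDS = ('request_delete', 'deleted')
MIN_LENGTH = {'name': 4, 'description': 4}


def validate_company(doc):
    """Return a list of error messages; an empty list means the doc passes."""
    errors = []
    for field in REQUIRED:
        if field not in doc:
            errors.append(f'{field} is required')
    for field in STRING_FIELDS:
        if field in doc and not isinstance(doc[field], str):
            errors.append(f'{field} must be a string')
    for field, length in MIN_LENGTH.items():
        value = doc.get(field)
        if isinstance(value, str) and len(value) < length:
            errors.append(f'{field} must be at least {length} characters')
    for field in BOOL_FIELDS:
        if field in doc and not isinstance(doc[field], bool):
            errors.append(f'{field} must be a boolean')
    return errors


print(validate_company({'name': 'Acme Corp', 'description': 'Widgets'}))  # []
print(validate_company({'name': 'Ac', 'deleted': 'no'}))
```

Only call insert_one on Company.collection when the returned list is empty; the server-side validator can stay in place as a second line of defense.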
I am trying to get the Used Disk Space (Percent) for my EC2 instance from CloudWatch with a Lambda function, but it returns no value.
When I try to specify the filesystem and mount path, I get this error:
Parameter validation failed:\nUnknown parameter in MetricDataQueries[0].MetricStat.Metric.Dimensions[0]: \"Filesystem\", must be one of: Name, Value",
"errorType": "ParamValidationError"
Here is the full code.
import boto3
import datetime
def lambda_handler(event, context):
    client = boto3.client('cloudwatch')
    response = client.get_metric_data(
        MetricDataQueries=[
            {
                'Id': 'd1',
                'MetricStat': {
                    'Metric': {
                        'Namespace': 'cloudwatch',
                        'MetricName': 'DiskSpaceUtilization',
                        'Dimensions': [
                            {
                                'Name': 'InstanceId',
                                'Value': '*****************',
                                'Filesystem': '/****/****'
                            },
                        ]
                    },
                    'Period': 300,
                    'Stat': 'Maximum',
                    'Unit': 'Percent'
                },
                'ReturnData': True
            },
        ],
        StartTime=datetime.datetime.utcnow() - datetime.timedelta(seconds=600),
        EndTime=datetime.datetime.utcnow(),
        ScanBy='TimestampDescending',
        MaxDatapoints=60
    )
    return response
I expect output like DiskSpaceUtilization - x%.
But currently the output is:
"MetricDataResults": [
{
"Id": "d1",
"Label": "DiskSpaceUtilization",
"Timestamps": [],
"Values": [],
"StatusCode": "Complete"
}
],
Filesystem is a separate dimension. Change this:
'Dimensions': [
{
'Name': 'InstanceId',
'Value': '*****************',
'Filesystem': '/****/****'
},
]
to this:
'Dimensions': [
{
'Name': 'InstanceId',
'Value': '*****************'
},
{
'Name': 'Filesystem',
'Value': '/****/****'
}
]
and see what you get then (there could be other issues after you fix this one).
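CloudWatch returns data only when the query names the same namespace, metric name, and full set of dimensions the metric was published with, so the Namespace value and any extra dimension such as a mount path are likely worth checking too. A minimal sketch of two hypothetical helpers: to_dimensions builds the Name/Value list from a plain dict, and latest_percent formats the newest value as DiskSpaceUtilization - x%. The dimension values are placeholders and the sample response is hand-written in the shape shown in the question.

```python
import datetime


def to_dimensions(mapping):
    """Convert {'InstanceId': ..., 'Filesystem': ...} into Name/Value dicts."""
    return [{'Name': name, 'Value': value} for name, value in mapping.items()]


def latest_percent(response, label='DiskSpaceUtilization'):
    """Format the newest datapoint of the first query result, or report none."""
    values = response['MetricDataResults'][0]['Values']
    if not values:
        return f'{label} - no data'
    # With ScanBy='TimestampDescending', the first value is the most recent.
    return f'{label} - {values[0]:.1f}%'


dimensions = to_dimensions({
    'InstanceId': '<instance-id>',   # placeholder
    'Filesystem': '<filesystem>',    # placeholder
})
print(dimensions)

# Hand-written sample in the shape get_metric_data returns.
sample = {'MetricDataResults': [{
    'Id': 'd1',
    'Label': 'DiskSpaceUtilization',
    'Timestamps': [datetime.datetime(2022, 1, 1, tzinfo=datetime.timezone.utc)],
    'Values': [42.5],
    'StatusCode': 'Complete',
}]}
print(latest_percent(sample))  # DiskSpaceUtilization - 42.5%
```

In the Lambda, pass to_dimensions(...) as the 'Dimensions' value and return latest_percent(response) instead of the raw response.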