I have this CSV:
color,property,type,id
red,house,building,02
I'm trying to convert a CSV to a dictionary with the following structure:
{
    "0": {"val1": 1, "val2": 2, "val3": 3, ..., "valn": n},
    "1": {"val1": 45, "val2": 7, "val3": None, ..., "valn": 68},
}
where val1, val2 and so on are the column header names, and "0" and "1" are the row numbers.
So, given CSV content like this:
color,property,type,id
red,house,building,02
blue,department,flat,04
we should get:
{
    "0": {"color": "red", "property": "house", "type": "building", "id": "02"},
    "1": {"color": "blue", "property": "department", "type": "flat", "id": "04"},
}
How can I achieve this result without using any library? I'd like to implement it from scratch, without the csv module or the like.
Thank you.
Try this approach:
inp = """color,property,type,id
red,house,building,02
blue,department,flat,04
cyan,,flat,10
"""

lines = inp.split('\n')
# The first line holds the column names; strip whitespace around each.
colnames = [name.strip() for name in lines[0].split(',')]
lines = lines[1:]

res = {}
# lines[:-1] drops the empty string left by the trailing newline.
for i, line in enumerate(lines[:-1]):
    # str(i) matches the string row keys "0", "1", ... asked for above.
    res[str(i)] = {
        colname: val if val != '' else None
        for colname, val in zip(colnames, (x.strip() for x in line.split(',')))
    }
print(res)
However, for additional features like type deduction the code becomes more complex: you can follow answers to this question, or see the sketch below.
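For instance, a minimal type-deduction helper might look like this (a sketch; the empty-string-to-None rule and the int-before-float coercion order are my assumptions, not something the question specified):
def deduce(val):
    """Coerce a raw CSV field: empty -> None, then try int, then float, else keep str."""
    if val == '':
        return None
    for cast in (int, float):
        try:
            return cast(val)
        except ValueError:
            pass
    return val

# Hypothetical usage inside the loop above:
# res[str(i)] = {colname: deduce(val) for colname, val in
#                zip(colnames, (x.strip() for x in line.split(',')))}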
I am learning Python, and in the code below I am trying to replace all values where "id": null, appears with "id": na,...
There may not always be an id in every block; if c does not have that pattern in it, then it is skipped. Thank you :)
file
{
"objects": [
{
"version": "1",
"id": null,
"date": 1,
...,
},
{
"version": "1",
"id": 1xx1,
"date": 1,
...,
},
desired
{
"objects": [
{
"version": "1",
"id": na,
"date": 1,
...,
},
{
"version": "1",
"id": 1xx1,
"date": 1,
...,
},
python3
import json

with open('file.json', encoding='utf8') as in_file:
    data = json.load(in_file)

for c in data['objects']:  # read all elements in objects into c
    if c = "\"id\""\: null,:  # check to see if element in c is "id": null,
        data[c] = "\"id\""\: na,  # replace c with
In Python you want None (the NoneType singleton), not null, i.e. if not obj['id']:. Since None is falsy, you can simply use if not:
import json

with open('file.json', encoding='utf8') as in_file:
    data = json.load(in_file)

for obj in data['objects']:
    # Note: "not obj['id']" also matches 0, "" and [], not only null/None.
    if not obj['id']:
        obj['id'] = 'na'

print(data)
"id": an is invalid as Json and in a Python dictionary I think you want to either initialise an as a variable or use a string "id": "an"
Or more concisely use comprehensions:
import json

with open('file.json', encoding='utf8') as in_file:
    print(json.dumps({'objects': [{k: 'na' if k == 'id' and not v else v
                                   for k, v in entry.items()}
                                  for entry in json.load(in_file)['objects']]}))
As @Mohamed Fathallah suggests, use json.dump() to write the data to a file or json.dumps() to display it as a JSON-formatted string.
The error is in the following code:
if c = "\"id\""\: null,: # check to see if element in c is "id": null,
data[c] = "\"id\""\: na, # replace c with
As @Dan-Dev mentioned, the null in C#, C++, etc. is None in Python. You can use this code, and read the comments as side information to understand more.
import json

with open('file.json', 'r') as file:
    data = json.load(file)

for obj in data['objects']:
    if obj.get('id') is None:
        obj['id'] = 'na'

# if you want to write back
with open('data.json', 'w') as file:
    json.dump(data, file, indent=2)
I have a dataframe:
import pandas as pd

data = pd.DataFrame([
    {"id": 1, "user_id": 999, "phone_number": "61412308310", "email": "can@gmail.com"},
    {"id": 2, "user_id": 129, "phone_number": "61477708777", "email": "acdc@gmail.com"},
    {"id": 3, "user_id": 213, "phone_number": "61488908495", "email": "adel99@gmail.com"},
    {"id": 4, "user_id": 145, "phone_number": "61477708777", "email": "austr@gmail.com"},
    {"id": 5, "user_id": 214, "phone_number": "61421445777", "email": "austr@gmail.com"},
    {"id": 6, "user_id": 214, "phone_number": "61421445326", "email": "jango@gmail.com"},
])
There is a function that finds related rows. This function works if the value is of type list. How can I make it work with an int (or float)?
For example, I want to find all related rows based on the value of the user_id column = 129. Since this is an int, the function fails at the line data[data.isin(value).any(axis=1)], because isin expects a list-like, not a scalar. If I instead search for related rows based on the phone_number column = ['61477708777'], the function works correctly.
Function:
from typing import Union

import numpy as np
import pandas as pd

def get_related_values(data: pd.DataFrame, value: Union[int, list]) -> pd.DataFrame:
    related_values = data[data.isin(value).any(axis=1)]
    if set(np.array(value)) != set(np.array(related_values).reshape(-1)):
        return get_related_values(data, np.array(related_values).reshape(-1))
    else:
        return related_values
Wrap the searched value in a list:
def get_related_values(data, value):
    related_values = data[data.isin(value).any(axis=1)]
    if set(np.array(value)) != set(np.array(related_values).reshape(-1)):
        return get_related_values(data, np.array(related_values).reshape(-1))
    else:
        return related_values

get_related_values(data, [214])
NB. I also fixed the recursive function by replacing find_related_values with get_related_values.
Output:
   id  user_id phone_number            email
1   2      129  61477708777   acdc@gmail.com
3   4      145  61477708777  austr@gmail.com
4   5      214  61421445777  austr@gmail.com
5   6      214  61421445326  jango@gmail.com
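Alternatively, you could normalise scalar inputs inside the function itself, so callers can pass an int, float, or list interchangeably. A sketch of that variant (not part of the original answer):
import numpy as np
import pandas as pd

def get_related_values(data: pd.DataFrame, value) -> pd.DataFrame:
    # Accept a bare int/float/str by wrapping it in a list first.
    if not isinstance(value, (list, np.ndarray)):
        value = [value]
    related_values = data[data.isin(value).any(axis=1)]
    if set(np.array(value)) != set(np.array(related_values).reshape(-1)):
        return get_related_values(data, np.array(related_values).reshape(-1))
    return related_values

get_related_values(data, 214)  # now works without the explicit list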
I have a json file with a list of dicts. I want to modify its content by adding a "rank" key to every dict, with the index as the value.
Note that the json file is malformed, so I also need to remove the extra '[]'.
file.json
[
{
"sample1": 1,
"sample2": "value"
}
[]
{
"sampleb": "123",
"some": "some"
}
...............
]
code
""" open the files"""
with open("list1.json", "r") as f:
data = f.read()
data = data.replace("][", ",")
data = json.loads(data)
for v in data:
for i, c in v.items():
c["rank"] = i + 1
""" put back to file"""
with open("list1.json", "w") as file:
file.write(data)
So what I am trying to achieve is something like
[
{
"rank": 1
"sample1": 1,
"sample2": "value"
},
{
"rank": 2,
"sampleb": "123",
"some": "some"
}
...............
]
But I got this error:
c["rank"] = i
TypeError: 'str' object does not support item assignment
Printing the index (print(i)) shows:
0,
1,
.....
0,
1,
........
0,
1
But it should be
0,
1,
2,
3,
4
5
...
100
Any ideas?
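For what it's worth, here is a minimal sketch of a fix (assuming, as in the question, that the "][" replacement yields valid JSON for the real file). The TypeError occurs because for i, c in v.items() binds c to each dict value (often a string) and i to the key, and that per-dict iteration is likely also why the printed index keeps restarting. Enumerating the outer list gives each dict plus one running index:
import json

with open("list1.json", "r") as f:
    data = f.read()

data = data.replace("][", ",")  # same cleanup as in the question
data = json.loads(data)

# enumerate yields one running index per dict in the outer list.
for i, d in enumerate(data):
    d["rank"] = i + 1

with open("list1.json", "w") as f:
    json.dump(data, f, indent=2)  # file.write(data) would fail: data is a list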
I have a nested dictionary which comprises multiple lists and dictionaries. The "stations" key contains the values which I want to convert to a CSV file, and I am only after certain values. A snippet of the dictionary is below:
data = { "brands": {...},
"fueltypes": {...},
"stations": {"items": [
{
"brandid": "",
"stationid": "",
"brand": "Shell",
"code": "2126",
"name": "Cumnock General Store",
"address": "31 Obley St, CUMNOCK NSW 2867",
"location": {
"latitude": -32.928744,
"longitude": 148.755153
},
"state": "NSW"
},
{
"brandid": "",
"stationid": "",
"brand": "Shell",
"code": "2200",
"name": "Tea Tree Cafe",
"address": "160 Mount Darragh Rd, SOUTH PAMBULA NSW 2549",
"location": {
"latitude": -36.944277,
"longitude": 149.845399
},
"state": "NSW"
}....]}}
In order to obtain certain values in the "stations" key, I created blank lists for each of those values and appended accordingly. After that I used the zip function to combine the lists and converted the result to a CSV. The code that I have used is below:
import csv

Station_Code = []
Station_Name = []
Latitude = []
Longitude = []
Address = []
Brand = []

for k, v in data["stations"].items():
    for item in range(len(v)):
        Station_Code.append(v[item]["code"])
        Station_Name.append(v[item]["name"])
        Latitude.append(v[item]["location"]["latitude"])
        Longitude.append(v[item]["location"]["longitude"])
        Address.append(v[item]["address"])
        Brand.append(v[item]["brand"])
        #print(f'{v[item]["code"]} - {v[item]["name"]} - {v[item]["location"]["latitude"]}')

rows = zip(Station_Code, Station_Name, Latitude, Longitude, Address, Brand)

# newline='' avoids blank rows in the CSV on Windows
with open("Exported_File.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)
Is there any other alternate/short ways of extracting this information?
If you're using pandas, there's a fairly easy way to do this.
import pandas as pd
# Convert dict to a pandas DataFrame
df = pd.DataFrame(data["stations"]["items"])
# 'location' is a dict, so we need to extract the 'latitude' and 'longitude'.
df['latitude'] = df['location'].apply(lambda x: x['latitude'])
df['longitude'] = df['location'].apply(lambda x: x['longitude'])
# Select subset of columns for final csv
df = df[['code', 'name', 'latitude', 'longitude', 'address', 'brand']]
df.to_csv('exported-file.csv', index=False, header=False)
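On a reasonably recent pandas you can also let pd.json_normalize flatten the nested location dict in one call (a sketch; note the flattened column names become location.latitude and location.longitude):
import pandas as pd

df = pd.json_normalize(data["stations"]["items"])
df = df[['code', 'name', 'location.latitude', 'location.longitude', 'address', 'brand']]
df.to_csv('exported-file.csv', index=False, header=False)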
How can you query db.collection by dates when the dates are stored as strings? Since this database is large and growing, a for loop to convert each datetime does not make sense as a long-term solution.
I am creating a pipeline to query a collection for any given dates, but every query I try results in an empty list [].
date format: "ts": "2018-09-26T21:02:19+00:00"
I am looking for a solution that avoids reformatting the datetime key in a for loop, because the database is growing, and that would take longer than running a non-datetime query, converting to pandas, then converting to datetime later downstream in the script.
I've tried several attempts from various SO posts and they produce empty results:
1.
n = db.collection.find({'ts':{'$lt':datetime.now(), '$gt':datetime.now() - timedelta(hours=10000)}})
print(n)
[]
2.
start = datetime(2019, 2, 2, 6, 35, 6, 764)
end = datetime(2019, 2, 20, 6, 55, 3, 381)
doc = db.collection.find({'ts': {'$gte': start, '$lt': end}})
print(doc)
[]
However, I am beginning to think it is how my date is formatted in the ts key. Here is an example of a document:
{
"_id": {
"$oid": "5babf3dab512dd0165efd36c"
},
"d": [
{
"d": [
17317,
16556,
9680,
55982,
45948
],
"h": 74.65,
"ts": "2018-09-26T21:02:19+00:00",
"p": [
61,
76,
137,
152,
122
],
"si": "9829563c95d0155f",
"t": 24.82,
"ti": "0000000000000000"
},
{
"d": [
17821,
17488,
9199,
56447,
44089
],
"h": 80.09,
"ts": "2018-09-26T21:02:19+00:00",
"p": [
61,
76,
137,
152,
122
],
"si": "a42fbc88a44a316f",
"t": 25.1,
"ti": "0000000000000000"
}
],
"gi": "GW-P1007"}
Am I missing something here? Is this a formatting problem?
You can convert the strings to datetime and compare them like this:
from datetime import datetime, timedelta

q = list(db.collection.find())
result = []
end = datetime.now()
start = end - timedelta(hours=10000)

for i in q:
    for j in i["d"]:
        # "%H:%M:%S" is explicit; "%X" would be locale-dependent.
        time = datetime.strptime(j["ts"], "%Y-%m-%dT%H:%M:%S+00:00")
        if start <= time <= end:
            result.append(i)  # or append just the matching sub-document j
As I see in your data, you need to loop over the "d" array inside each document, but this is how you can convert and compare the dates.
Alternatively, you can convert the datetime to a string and use find the way you wanted. Because the stored ts values are zero-padded ISO 8601 strings, their lexicographic order matches chronological order, so string comparison works. Do this:
a = datetime.now()
now = a.strftime("%Y-%m-%dT%H:%M:%S+00:00")
And now you can use the find method.
For querying inside the array:
db.collection.find({"d": {"$elemMatch": {"ts": {"$lt": end, "$gt": start}}}})
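Putting the two ideas together, a minimal end-to-end sketch with pymongo (assuming db is an existing pymongo Database handle; the 10000-hour window mirrors the first attempt above):
from datetime import datetime, timedelta

# Format the boundaries exactly like the stored strings, so that
# lexicographic comparison lines up with chronological order.
# utcnow() is used because the stored timestamps are +00:00.
fmt = "%Y-%m-%dT%H:%M:%S+00:00"
end = datetime.utcnow().strftime(fmt)
start = (datetime.utcnow() - timedelta(hours=10000)).strftime(fmt)

docs = db.collection.find(
    {"d": {"$elemMatch": {"ts": {"$gt": start, "$lt": end}}}}
)
for doc in docs:
    print(doc["gi"])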