Convert CSV to dictionary without using libraries - python-3.x

I have this CSV:
color,property,type,id
red,house,building,02
I'm trying to convert a CSV to a dictionary with the following structure:
{
"0": {"val1": 1, "val2": 2, "val3": 3, ..., "valn": n},
"1": {"val1": 45, "val2": 7, "val3": None, ..., "valn": 68},
}
Here val1, val2, and so on are the column header names, and "0" and "1" are the row numbers.
So for CSV content like this:
color,property,type,id
red,house,building,02
blue,department,flat,04
the result should be:
{
"0": {"color": "red", "property": "house", "type": "building", ..., "valn": n},
"1": {"color": "blue", "property": "farm", "type": "area", ..., "valn": n},
}
How can I achieve this result without using any library? I'd like to implement it from scratch, without the csv module or the like.
Thank you.

Try this approach:
inp = """color,property,type,id
red,house,building,02
blue,department,flat,04
cyan,,flat,10
"""
lines = inp.split('\n')
colnames = list(map(lambda x: x.strip(), lines[0].split(',')))
lines = lines[1:]
res = {}
for i, line in enumerate(lines[:-1]):
res[i] = {
colname: val if val != '' else None
for colname, val in zip(colnames, map(lambda x: x.strip(), line.split(',')))
}
print(res)
However, for additional features like type deduction, the code becomes more complex; you can follow the answers to this question.
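For illustration, here is a minimal sketch of what such type deduction might look like (the deduce helper below is hypothetical, not part of the answer above):
def deduce(val):
    """Interpret a CSV field as int, then float, else keep the string."""
    if val == '':
        return None
    for cast in (int, float):
        try:
            return cast(val)
        except ValueError:
            pass
    return val

# e.g. deduce('02') -> 2, deduce('-3.5') -> -3.5, deduce('red') -> 'red'
# In the loop above you would then write: colname: deduce(val)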

Related

find and replace element in file using python 3.7

I am learning Python, and in the code below I am trying to replace all values where "id": null, with "id": na,.
There may not always be an id in every block; if c does not have that pattern in it, then that block is skipped. Thank you :)
file
{
    "objects": [
        {
            "version": "1",
            "id": null,
            "date": 1,
            ...,
        },
        {
            "version": "1",
            "id": 1xx1,
            "date": 1,
            ...,
        },
desired
{
    "objects": [
        {
            "version": "1",
            "id": na,
            "date": 1,
            ...,
        },
        {
            "version": "1",
            "id": 1xx1,
            "date": 1,
            ...,
        },
python3
import json
with open('file.json', encoding='utf8') as in_file:
    data = json.load(in_file)
for c in data['objects']: # read all elements in object into c
    if c = "\"id\""\: null,: # check to see if element in c is "id": null,
        data[c] = "\"id\""\: na, # replace c with
You want to use None in Python for a NoneType, not null, i.e. if not obj['id']:. Since None is falsy, you can just use if not:
import json

with open('file.json', encoding='utf8') as in_file:
    data = json.load(in_file)

for obj in data['objects']:
    if not obj['id']:
        obj['id'] = 'na'

print(data)
"id": an is invalid as Json and in a Python dictionary I think you want to either initialise an as a variable or use a string "id": "an"
Or more concisely use comprehensions:
import json

with open('file.json', encoding='utf8') as in_file:
    print(json.dumps({
        'objects': [
            {k: 'na' if k == 'id' and not v else v for k, v in entry.items()}
            for entry in json.load(in_file)['objects']
        ]
    }))
As @Mohamed Fathallah suggests, use json.dump() to write the data to a file, or json.dumps() to display it as a JSON-formatted string.
The error is in the following code:
    if c = "\"id\""\: null,: # check to see if element in c is "id": null,
        data[c] = "\"id\""\: na, # replace c with
As @Dan-Dev mentioned, the null of C#, C++, etc. is None in Python. You can use this code, and read my code as side information to understand more.
import json

with open('file.json', 'r') as file:
    data = json.load(file)

for obj in data['objects']:
    if obj.get('id') is None:
        obj['id'] = 'na'

# if you want to write back
with open('data.json', 'w') as file:
    json.dump(data, file, indent=2)

get related values for int

I have a dataframe:
data = pd.DataFrame([
    {"id": 1, "user_id": 999, "phone_number": "61412308310", "email": "can@gmail.com"},
    {"id": 2, "user_id": 129, "phone_number": "61477708777", "email": "acdc@gmail.com"},
    {"id": 3, "user_id": 213, "phone_number": "61488908495", "email": "adel99@gmail.com"},
    {"id": 4, "user_id": 145, "phone_number": "61477708777", "email": "austr@gmail.com"},
    {"id": 5, "user_id": 214, "phone_number": "61421445777", "email": "austr@gmail.com"},
    {"id": 6, "user_id": 214, "phone_number": "61421445326", "email": "jango@gmail.com"},
])
There is a function that finds related rows. This function works if the value is of type list. How can I make it work with an int (or float)?
For example, I want to find all related rows based on the value of the user_id column = 129. Since this is an int, the function will not work because of the line data[data.isin(value).any(axis=1)]. If I try to find related rows based on the value of the phone_number column = ['61477708777'], the function works correctly.
Function:
from typing import Union

import numpy as np
import pandas as pd

def get_related_values(data: pd.DataFrame, value: Union[int, list]) -> pd.DataFrame:
    related_values = data[data.isin(value).any(axis=1)]
    if set(np.array(value)) != set(np.array(related_values).reshape(-1)):
        return get_related_values(data, np.array(related_values).reshape(-1))
    else:
        return related_values
wrap the searched value in a list:
def get_related_values(data, value):
    related_values = data[data.isin(value).any(axis=1)]
    if set(np.array(value)) != set(np.array(related_values).reshape(-1)):
        return get_related_values(data, np.array(related_values).reshape(-1))
    else:
        return related_values

get_related_values(data, [214])
NB. I also fixed the recursive function by replacing find_related_values with get_related_values.
Output:
   id  user_id  phone_number            email
1   2      129   61477708777   acdc@gmail.com
3   4      145   61477708777  austr@gmail.com
4   5      214   61421445777  austr@gmail.com
5   6      214   61421445326  jango@gmail.com
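If you want the function itself to accept a bare int or float, one option (a sketch, not part of the original answer) is to normalise the argument at the top of the function so DataFrame.isin always receives a list-like:
def get_related_values(data, value):
    # Wrap scalar values so data.isin() always gets a list-like.
    if not isinstance(value, (list, np.ndarray)):
        value = [value]
    related_values = data[data.isin(value).any(axis=1)]
    if set(np.array(value)) != set(np.array(related_values).reshape(-1)):
        return get_related_values(data, np.array(related_values).reshape(-1))
    else:
        return related_values

get_related_values(data, 129)  # now works with a plain int as well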

Add key value with index as value in every dict in a list

I have a JSON file with a list of dicts. I want to modify its content by adding a key:value pair to every dict, with the index as the value.
Note that the JSON file is malformed, so I need to remove the extra '[]'.
file.json
[
    {
        "sample1": 1,
        "sample2": "value"
    }
    []
    {
        "sampleb": "123",
        "some": "some"
    }
    ...............
]
code
""" open the files"""
with open("list1.json", "r") as f:
data = f.read()
data = data.replace("][", ",")
data = json.loads(data)
for v in data:
for i, c in v.items():
c["rank"] = i + 1
""" put back to file"""
with open("list1.json", "w") as file:
file.write(data)
So what I am trying to achieve is something like
[
    {
        "rank": 1,
        "sample1": 1,
        "sample2": "value"
    },
    {
        "rank": 2,
        "sampleb": "123",
        "some": "some"
    }
    ...............
]
But I got this error:
c["rank"] = i
TypeError: 'str' object does not support item assignment
Printing the index with print(i) shows:
0,
1,
.....
0,
1,
........
0,
1
But it should be
0,
1,
2,
3,
4
5
...
100
Any ideas?
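The thread ends here without an answer; as a hedged sketch, here is one way to get the desired result. The key fix is to enumerate over the list itself (each element is a dict, whereas v.items() yields string keys, which is what caused the TypeError), and to serialise with json.dump when writing back. The regex repair is a guess at the malformed file's exact shape:
import json
import re

with open("list1.json", "r") as f:
    raw = f.read()

# Heuristic repair: drop the stray '[]' between objects and insert the
# missing commas (adjust the pattern to the real file).
data = json.loads(re.sub(r"\}\s*(?:\[\]\s*)?\{", "},{", raw))

# enumerate over the list itself: each element d is a dict, so item
# assignment works, and the index keeps increasing across the whole list
# instead of restarting at 0.
for i, d in enumerate(data):
    d["rank"] = i + 1

with open("list1.json", "w") as f:
    json.dump(data, f, indent=2)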

Is there a shorter way of exporting python nested dictionary to CSV?

I have a nested dictionary which comprises multiple lists and dictionaries. The "stations" key contains the values which I want to convert to a CSV file; I am only after certain values. A snippet of the dictionary is below:
data = { "brands": {...},
"fueltypes": {...},
"stations": {"items": [
{
"brandid": "",
"stationid": "",
"brand": "Shell",
"code": "2126",
"name": "Cumnock General Store",
"address": "31 Obley St, CUMNOCK NSW 2867",
"location": {
"latitude": -32.928744,
"longitude": 148.755153
},
"state": "NSW"
},
{
"brandid": "",
"stationid": "",
"brand": "Shell",
"code": "2200",
"name": "Tea Tree Cafe",
"address": "160 Mount Darragh Rd, SOUTH PAMBULA NSW 2549",
"location": {
"latitude": -36.944277,
"longitude": 149.845399
},
"state": "NSW"
}....]}}
In order to obtain certain values in the "stations" key, I created blank lists for each of those values and appended accordingly. After that I used the zip function to combine the lists and converted the result to a CSV. The code I used is below:
import csv

Station_Code = []
Station_Name = []
Latitude = []
Longitude = []
Address = []
Brand = []

for k, v in data["stations"].items():
    for item in range(len(v)):
        Station_Code.append(v[item]["code"])
        Station_Name.append(v[item]["name"])
        Latitude.append(v[item]["location"]["latitude"])
        Longitude.append(v[item]["location"]["longitude"])
        Address.append(v[item]["address"])
        Brand.append(v[item]["brand"])
        #print(f'{v[item]["code"]} - {v[item]["name"]} - {v[item]["location"]["latitude"]}')

rows = zip(Station_Code, Station_Name, Latitude, Longitude, Address, Brand)

# newline="" avoids blank rows on Windows
with open("Exported_File.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)
Are there any alternative/shorter ways of extracting this information?
If you're using pandas, there's a fairly easy way to do this.
import pandas as pd
# Convert dict to a pandas DataFrame
df = pd.DataFrame(data["stations"]["items"])
# 'location' is a dict, so we need to extract the 'latitude' and 'longitude'.
df['latitude'] = df['location'].apply(lambda x: x['latitude'])
df['longitude'] = df['location'].apply(lambda x: x['longitude'])
# Select subset of columns for final csv
df = df[['code', 'name', 'latitude', 'longitude', 'address', 'brand']]
df.to_csv('exported-file.csv', index=False, header=False)
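A variation on the same idea (not part of the original answer): pandas' json_normalize flattens the nested 'location' dict into dotted column names, which avoids the two apply calls:
import pandas as pd

# Nested dicts become columns like 'location.latitude' and 'location.longitude'.
df = pd.json_normalize(data["stations"]["items"])
df = df[['code', 'name', 'location.latitude', 'location.longitude', 'address', 'brand']]
df.to_csv('exported-file.csv', index=False, header=False)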

(Python) Query by dates that are stored as strings in mongoDB collection

How can you query db.collection by dates when the dates are stored as strings? Since this database is large and growing, a for loop to convert each datetime does not make sense as a long-term solution.
I am creating a pipeline to query a collection for any given dates, but every query I try results in an empty list [].
Date format: "ts": "2018-09-26T21:02:19+00:00"
I am looking for a solution that avoids reformatting the datetime key in a for loop, because the database is growing and it would take longer than running a non-datetime query, converting to pandas, and then converting to datetime later downstream in the script.
I've tried several approaches from various SO posts and they all produce empty results:
1.
n = db.collection.find({'ts':{'$lt':datetime.now(), '$gt':datetime.now() - timedelta(hours=10000)}})
print(n)
[]
2.
start = datetime(2019, 2, 2, 6, 35, 6, 764)
end = datetime(2019, 2, 20, 6, 55, 3, 381)
doc = db.collection.find({'ts': {'$gte': start, '$lt': end}})
print(doc)
[]
However, I am beginning to think the problem is how my date is formatted in the ts key. Here is an example document:
{
    "_id": {"$oid": "5babf3dab512dd0165efd36c"},
    "d": [
        {
            "d": [17317, 16556, 9680, 55982, 45948],
            "h": 74.65,
            "ts": "2018-09-26T21:02:19+00:00",
            "p": [61, 76, 137, 152, 122],
            "si": "9829563c95d0155f",
            "t": 24.82,
            "ti": "0000000000000000"
        },
        {
            "d": [17821, 17488, 9199, 56447, 44089],
            "h": 80.09,
            "ts": "2018-09-26T21:02:19+00:00",
            "p": [61, 76, 137, 152, 122],
            "si": "a42fbc88a44a316f",
            "t": 25.1,
            "ti": "0000000000000000"
        }
    ],
    "gi": "GW-P1007"
}
Am i missing something here? Is this a formatting problem?
You can convert the strings to datetime objects and compare them like this:
from datetime import datetime
from datetime import timedelta

q = list(db.collection.find())
result = []
for i in q:
    for j in i["d"]:
        # %X is the locale's time representation, typically %H:%M:%S
        time = datetime.strptime(j["ts"], "%Y-%m-%dT%X+00:00")
        end = datetime.now()
        start = end - timedelta(hours=10000)
        if start <= time <= end:
            result.append(i)  # or append all documents
As I see in your data, you need to loop over "d" inside each document; the code above shows how to convert and compare the dates.
Alternatively, you can convert a datetime to a string and use find directly. Because ISO-8601 timestamps with a fixed offset sort lexicographically in chronological order, comparing the strings gives the same result as comparing the dates:
a = datetime.now()
now = a.strftime("%Y-%m-%dT%X+00:00")
And now you can use the find method. To query inside the "d" array:
db.collection.find({"d": {"$elemMatch": {"ts": {"$lt": end, "$gt": start}}}})
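Putting the two pieces together, a minimal sketch (assuming a pymongo db handle and the ts format shown in the example document above):
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%S+00:00"  # matches the stored "ts" strings

end = datetime.now().strftime(FMT)
start = (datetime.now() - timedelta(hours=10000)).strftime(FMT)

# String comparison is safe here: ISO-8601 timestamps with a fixed
# offset sort lexicographically in chronological order.
for doc in db.collection.find({"d": {"$elemMatch": {"ts": {"$gt": start, "$lt": end}}}}):
    print(doc["gi"])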
