I have a dataframe:
import pandas as pd

data = pd.DataFrame([
    {"id": 1, "user_id": 999, "phone_number": "61412308310", "email": "can@gmail.com"},
    {"id": 2, "user_id": 129, "phone_number": "61477708777", "email": "acdc@gmail.com"},
    {"id": 3, "user_id": 213, "phone_number": "61488908495", "email": "adel99@gmail.com"},
    {"id": 4, "user_id": 145, "phone_number": "61477708777", "email": "austr@gmail.com"},
    {"id": 5, "user_id": 214, "phone_number": "61421445777", "email": "austr@gmail.com"},
    {"id": 6, "user_id": 214, "phone_number": "61421445326", "email": "jango@gmail.com"},
])
There is a function that finds related rows. This function works if the value is of type list. How can I make it work with an int (or float)?
For example, I want to find all related rows based on the value of the user_id column = 129. Since this is an int, the function will not work, because of the line data[data.isin(value).any(axis=1)]. If I instead search for related rows based on the value of the phone_number column = ['61477708777'], the function works correctly.
Function:
from typing import Union

import numpy as np

def get_related_values(data: pd.DataFrame, value: Union[int, list]) -> pd.DataFrame:
    related_values = data[data.isin(value).any(axis=1)]
    if set(np.array(value)) != set(np.array(related_values).reshape(-1)):
        return find_related_values(data, np.array(related_values).reshape(-1))
    else:
        return related_values
Wrap the searched value in a list:
def get_related_values(data, value):
    related_values = data[data.isin(value).any(axis=1)]
    if set(np.array(value)) != set(np.array(related_values).reshape(-1)):
        return get_related_values(data, np.array(related_values).reshape(-1))
    else:
        return related_values

get_related_values(data, [214])
NB. I also fixed the recursive function by replacing find_related_values with get_related_values.
Output:
   id  user_id phone_number            email
1   2      129  61477708777   acdc@gmail.com
3   4      145  61477708777  austr@gmail.com
4   5      214  61421445777  austr@gmail.com
5   6      214  61421445326  jango@gmail.com
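If you would rather keep passing a bare int or float, here is a minimal sketch of the same idea that normalizes the argument up front (it assumes np.isscalar is a good-enough scalar test for the values you pass in):

import numpy as np
import pandas as pd

def get_related_values(data: pd.DataFrame, value) -> pd.DataFrame:
    # normalize a bare scalar (int, float, str) into a one-element list
    if np.isscalar(value):
        value = [value]
    related_values = data[data.isin(value).any(axis=1)]
    # recurse until the set of matched values stops growing
    if set(np.array(value)) != set(np.array(related_values).reshape(-1)):
        return get_related_values(data, np.array(related_values).reshape(-1))
    return related_values

get_related_values(data, 129)  # a bare int now works too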
I have an array of JSON objects that arrived from a source I can't control, and sometimes there are values like this:
[{"name": "HDD", "brand": "Samsung", "price": "$100"},
<NULL>,
{"name": "Mouse", "brand": "Logitech", "price": "$10"}]
Is there any way to handle it in Python? I'm getting a syntax error on reading the value.
I tried to put it this way:
try:
    products = sorted(products, key=lambda k: k['price'], reverse=True)
except SyntaxError:
    print("Error")
but no luck.
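For what it's worth, a hedged sketch of one way to handle this, assuming the payload arrives as raw text (the variable raw below is hypothetical) and that <NULL> appears literally in it: substitute the JSON literal null before parsing, then drop the None entries. A try/except around sorted() cannot catch this, because the failure happens when the text is parsed (json.loads raises json.JSONDecodeError, a ValueError, not SyntaxError), not when the list is sorted.

import json

# hypothetical: the raw payload as received from the source
raw = '''[{"name": "HDD", "brand": "Samsung", "price": "$100"},
<NULL>,
{"name": "Mouse", "brand": "Logitech", "price": "$10"}]'''

# <NULL> is not valid JSON, so substitute the JSON literal null before
# parsing, then filter out the resulting None entries
cleaned = raw.replace("<NULL>", "null")
products = [p for p in json.loads(cleaned) if p is not None]
products = sorted(products, key=lambda k: k['price'], reverse=True)
print(products)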
I am learning Python, and in the code below I am trying to replace all values where "id": null, with "id": na,...
There may not always be an id in every block; if c does not have that pattern in it, then it is skipped. Thank you :)
file
{
  "objects": [
    {
      "version": "1",
      "id": null,
      "date": 1,
      ...,
    },
    {
      "version": "1",
      "id": 1xx1,
      "date": 1,
      ...,
    },
desired
{
  "objects": [
    {
      "version": "1",
      "id": na,
      "date": 1,
      ...,
    },
    {
      "version": "1",
      "id": 1xx1,
      "date": 1,
      ...,
    },
python3
import json
with open('file.json', encoding='utf8') as in_file:
    data = json.load(in_file)

for c in data['objects']:  # read all elements in object into c
    if c = "\"id\""\: null,:  # check to see if element in c is "id": null,
        data[c] = "\"id\""\: na,  # replace c with
In Python the null value is None (of type NoneType), not null. None is falsy, so you can just test it with if not obj['id']:.
import json

with open('file.json', encoding='utf8') as in_file:
    data = json.load(in_file)

for obj in data['objects']:
    # None is falsy; use obj.get('id') instead if some blocks have no id key
    if not obj['id']:
        obj['id'] = 'na'

print(data)
"id": an is invalid as Json and in a Python dictionary I think you want to either initialise an as a variable or use a string "id": "an"
Or more concisely use comprehensions:
import json

with open('file.json', encoding='utf8') as in_file:
    print(json.dumps({
        'objects': [
            {k: 'na' if k == 'id' and not v else v for k, v in entry.items()}
            for entry in json.load(in_file)['objects']
        ]
    }))
As @Mohamed Fathallah suggests, use json.dump() to write the data to a file, or json.dumps() to display it as a JSON-formatted string.
The error is in the following code:
if c = "\"id\""\: null,:  # check to see if element in c is "id": null,
    data[c] = "\"id\""\: na,  # replace c with
As @Dan-Dev mentioned, the null of C#, C++, etc. is None in Python. You can use the following code, and read its comments as side information to understand more.
import json

with open('file.json', 'r') as file:
    data = json.load(file)

for obj in data['objects']:
    if obj.get('id') is None:
        obj['id'] = 'na'

# if you want to write the result back to a file
with open('data.json', 'w') as file:
    json.dump(data, file, indent=2)
I have this CSV:
color,property,type,id
red,house,building,02
I'm trying to convert a CSV to a dictionary with the following structure:
{
"0": {"val1": 1, "val2": 2, "val3": 3, ..., "valn": n},
"1": {"val1": 45, "val2": 7, "val3": None, ..., "valn": 68},
}
where val1, val2, and so on are the header names of the columns, and "0" and "1" are the row numbers.
So we should have:
CSV content is like this:
color,property,type,id
red,house,building,02
blue,department,flat,04
{
"0": {"color": "red", "property": "house", "type": "building", "id": "02"},
"1": {"color": "blue", "property": "department", "type": "flat", "id": "04"},
}
How can I achieve this result without using any library? I'd like to implement it from scratch and not use the csv library or the like.
Thank you.
Try this approach:
inp = """color,property,type,id
red,house,building,02
blue,department,flat,04
cyan,,flat,10
"""
lines = inp.split('\n')
colnames = list(map(lambda x: x.strip(), lines[0].split(',')))
lines = lines[1:]
res = {}
for i, line in enumerate(lines[:-1]):
res[i] = {
colname: val if val != '' else None
for colname, val in zip(colnames, map(lambda x: x.strip(), line.split(',')))
}
print(res)
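For the sample input above this prints:

{0: {'color': 'red', 'property': 'house', 'type': 'building', 'id': '02'}, 1: {'color': 'blue', 'property': 'department', 'type': 'flat', 'id': '04'}, 2: {'color': 'cyan', 'property': None, 'type': 'flat', 'id': '10'}}

Note the keys come out as ints; write res[str(i)] = ... instead if you need the string keys "0" and "1" shown in the question.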
However, for additional features like type deduction the code gets more complex; you can follow the answers to this question.
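As a rough illustration of that extra complexity, here is a small sketch of per-cell type deduction (a convert helper of my own naming, which tries int, then float, then falls back to the raw string; empty cells stay None):

def convert(val):
    # try progressively looser numeric types, else keep the string
    if val == '':
        return None
    for cast in (int, float):
        try:
            return cast(val)
        except ValueError:
            pass
    return val

print(convert('02'), convert('3.5'), convert('flat'), repr(convert('')))
# 2 3.5 flat None

You would then use convert(val) in the dict comprehension in place of the val if val != '' else None expression.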
I got an assignment to import a CSV file with some fields, and I need to create a new CSV file with different fields that contains the original fields (in a different order).
original csv:
full name,Position,Phone,Email,LinkedIn,Source,Comment
I tried to look it up online, and this is as far as I got:
import csv

with open("mobileTL.csv", 'r') as csv_file:
    reader = csv.reader(csv_file)
    newcsvdict = {"First name": [], "Middle name": [], "Last name": [], "Email": [], "Creation date": [], "Status": [],
                  "Position": [], "ID/SSN": [], "Source": [], "Source type": [], "Availability": [], "Salary expectations": [],
                  "Phone": [], "Mobile": [], "Street Address": [], "City": [], "State": [], "Country": [], "Zip": [],
                  "LinkedIn URL": [], "Resume file name": [], "Migration ID": [], "Comment": [], "Comment2": []}
    next(reader)  # skip the header row
    for row in reader:
        first = ""
        last = ""
        if row[0] != "":
            first = row[0].split()[0]
            last = row[0].split()[1]
        newcsvdict["First name"].append(first)
        newcsvdict["Last name"].append(last)
        newcsvdict["Phone"].append(row[2])
        newcsvdict["Position"].append(row[1])
        newcsvdict["Email"].append(row[3])
        newcsvdict["Source"].append(row[5])
        newcsvdict["Comment"].append(row[6])
        newcsvdict["LinkedIn URL"].append(row[4])

with open('new.csv', 'w') as csv_file:
    w = csv.DictWriter(csv_file, newcsvdict.keys())
    w.writeheader()
    w.writerows(newcsvdict)
It does create a new file but for some reason only the header is written.
First, the reason it's only writing the header is that you'll get an error:
Traceback (most recent call last):
File "test.py", line 29, in <module>
w.writerows(newcsvdict)
...
wrong_fields = rowdict.keys() - self.fieldnames
AttributeError: 'str' object has no attribute 'keys'
You need to learn not to ignore error messages. The cause of that problem is that you were using writerows (note plural rows, which expects an iterable of rows) instead of writerow (note singular row, which expects just one row). To use writerows, you need to pass a list of dicts like this:
w.writerows([newcsvdict, newcsvdict, newcsvdict])
You should be using writerow, since you seem to only have 1 row, newcsvdict. Though, when I went ahead and did that, the output does not seem to be what you need:
First name,Middle name,Last name,Email,Creation date,Status,Position,ID/SSN,Source,Source type,Availability,Salary expectations,Phone,Mobile,Street Adress,City,State,Country,Zip,LinkedIn URL,Resume file name,Migration ID,Comment,Comment2
"['aaa', 'bbb', 'ccc']",[],"['AAA', 'BBB', 'CCC']","['aaa#email.com', 'bbb#email.com', 'ccc#email.com']",[],[],"['Pos1', 'Pos2', 'Pos3']",[],"['aaa', 'bbb', 'ccc']",[],[],[],"['123', '456', '789']",[],[],[],[],[],[],"['aaa', 'bbb', 'ccc']",[],[],"['aaa', 'bbb', 'ccc']",[]
That looks weird, because you created a dict with a list for each value (ex. "First name": []). Maybe that's what you want... but my understanding of your requirement is that you want the new CSV to have the same number of rows, just with different columns.
For that, it does not make sense to store the values as lists. One solution is to read one row, create a dict for it, writerow it, and then repeat those steps for all the rows. You can also use DictReader to easily access the values of the old CSV as a dict.
with open("new.csv", "w") as new_file:
new_row = dict.fromkeys([
"First name", "Middle name", "Last name", "Email",
"Creation date", "Status", "Position", "ID/SSN",
"Source", "Source type", "Availability", "Salary expectations",
"Phone", "Mobile", "Street Adress", "City",
"State", "Country", "Zip", "LinkedIn URL",
"Resume file name", "Migration ID", "Comment", "Comment2"
])
writer = csv.DictWriter(new_file, fieldnames=new_row.keys())
writer.writeheader()
with open("old.csv", 'r') as old_file:
old_csv = csv.DictReader(old_file)
for row in old_csv:
first = ""
last = ""
if row["full name"] != "":
first, last = row["full name"].split()
new_row["First name"] = first
new_row["Last name"] = last
new_row["Phone"] = row["Phone"]
new_row["Position"] = row["Position"]
new_row["Email"] = row["Email"]
new_row["Source"] = row["Source"]
new_row["Comment"] = row["Comment"]
new_row["LinkedIn URL"] = row["LinkedIn"]
writer.writerow(new_row)
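One small aside: the csv module documentation recommends opening files with newline='' (for example open("new.csv", "w", newline="")) when reading or writing CSV files; without it you can get spurious blank lines between rows on Windows.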
How can you query db.collection by dates when the dates are stored as strings? Since this database is large and growing, a for loop to convert each datetime does not make sense as a long-term solution.
I am creating a pipeline to query a collection for any given dates, but every query I try results in an empty list [].
date format: "ts": "2018-09-26T21:02:19+00:00"
I am looking for a solution that avoids reformatting the datetime key in a for loop, because the database is growing and that would take longer than running a non-datetime query, converting to pandas, and then converting to datetime later downstream in the script.
I've tried several attempts from various SO posts and they produce empty results:
1.
n = db.collection.find({'ts':{'$lt':datetime.now(), '$gt':datetime.now() - timedelta(hours=10000)}})
print(n)
[]
2.
start = datetime(2019, 2, 2, 6, 35, 6, 764)
end = datetime(2019, 2, 20, 6, 55, 3, 381)
doc = db.collection.find({'ts': {'$gte': start, '$lt': end}})
print(doc)
[]
However, I am beginning to think the problem is how my date is formatted in the ts key. Here is an example of a document:
{
  "_id": {
    "$oid": "5babf3dab512dd0165efd36c"
  },
  "d": [
    {
      "d": [17317, 16556, 9680, 55982, 45948],
      "h": 74.65,
      "ts": "2018-09-26T21:02:19+00:00",
      "p": [61, 76, 137, 152, 122],
      "si": "9829563c95d0155f",
      "t": 24.82,
      "ti": "0000000000000000"
    },
    {
      "d": [17821, 17488, 9199, 56447, 44089],
      "h": 80.09,
      "ts": "2018-09-26T21:02:19+00:00",
      "p": [61, 76, 137, 152, 122],
      "si": "a42fbc88a44a316f",
      "t": 25.1,
      "ti": "0000000000000000"
    }
  ],
  "gi": "GW-P1007"
}
Am I missing something here? Is this a formatting problem?
You can convert the strings to datetime and compare them like this:
from datetime import datetime
from datetime import timedelta

end = datetime.now()
start = end - timedelta(hours=10000)

result = []
for doc in db.collection.find():
    for sub in doc["d"]:
        # %X is the locale's time representation (HH:MM:SS in the C locale)
        ts = datetime.strptime(sub["ts"], "%Y-%m-%dT%X+00:00")
        if start <= ts <= end:
            result.append(doc)  # or append just the matching sub-document
As I see in your data, you need to loop over the "d" array inside each document, but this is how you can convert and compare the dates.
Alternatively, you can convert the datetime to a string and use find as you originally wanted. Because this zero-padded ISO-8601 format sorts lexicographically in chronological order, $gt/$lt comparisons work directly on the strings:
a = datetime.now()
now = a.strftime("%Y-%m-%dT%X+00:00")
And now you can use the find method.
To query inside the d array:
db.collection.find({"d": {"$elemMatch": {"ts": {"$lt": end, "$gt": start}}}})
where start and end are strings formatted as shown above.