I have two dictionaries that look like this:
subjects = {'aaa' : 1,
'bbb' : 1,
'ccc': 1}
objects = {'aaa' : 1,
'bbb' : 1,
'ccc': 1}
I want to write them to a CSV file where each row holds the string, the number of times it appears as a subject, and the number of times it appears as an object.
For these two dictionaries, I want the CSV file to look like this:
aaa,1,1
bbb,1,1
ccc,1,1
You can try pandas; it's really useful for this kind of task.
>>> import pandas as pd
>>> subjects = {'aaa' : 1, 'bbb' : 1, 'ccc': 1}
>>> objects = {'aaa' : 1, 'bbb' : 1, 'ccc': 1}
>>> df1 = pd.DataFrame([subjects]).T
>>> df2 = pd.DataFrame([objects]).T
>>> pd.concat([df1,df2],axis=1).to_csv('./out.csv', header=False)
out.csv then contains:
aaa,1,1
bbb,1,1
ccc,1,1
Or you can do the same without pandas:
subjects = {'aaa' : 1, 'bbb' : 1, 'ccc': 1}
objects = {'aaa' : 1, 'bbb' : 1, 'ccc': 1}
with open('./out.csv','w') as f:
    for k in subjects:
        f.write(f'{k},{subjects[k]},{objects[k]}\n')
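The loop above assumes every key in subjects also exists in objects; otherwise objects[k] raises KeyError, and keys that only appear in objects are skipped. Here is a minimal sketch with the standard csv module that takes the union of the keys and writes 0 for a missing count. The zero default, the write_counts helper, and the extra 'ddd' key are assumptions for illustration.

```python
import csv
import io

subjects = {'aaa': 1, 'bbb': 1, 'ccc': 1}
objects = {'aaa': 1, 'bbb': 1, 'ccc': 1, 'ddd': 2}  # 'ddd' is a hypothetical extra key

def write_counts(subjects, objects, f):
    """Write string,subject_count,object_count rows; a missing count is written as 0 (assumption)."""
    writer = csv.writer(f)
    # Union of keys in first-seen order, so strings found in only one dict are kept
    for key in dict.fromkeys([*subjects, *objects]):
        writer.writerow([key, subjects.get(key, 0), objects.get(key, 0)])

# In-memory buffer for demonstration; for a real file use
# with open('./out.csv', 'w', newline='') as f: write_counts(subjects, objects, f)
buf = io.StringIO()
write_counts(subjects, objects, buf)
print(buf.getvalue())
```

When you open a real file, pass newline='' as shown in the comment. The csv module writes its own line endings, and this stops them from being doubled on Windows.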
I have a CSV file with a single column. Each cell in that column holds a string containing several values, and I want to split that string into multiple columns.
Here is some example data:
import pandas as pd
df = pd.DataFrame({'column1': ["{'A': 2, 'B': 3, 'c': 2}", "{'A': 3, 'B': 5, 'c': 10}"]})
print(df)
column1
0 {'A': 2, 'B': 3, 'c': 2}
1 {'A': 3, 'B': 5, 'c': 10}
I want this output:
df = pd.DataFrame({'A': [2, 3], 'B': [3, 5], 'c': [2, 10]})
Try this:
pd.DataFrame([*df['column1'].apply(eval)])
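eval executes whatever code is in the cell, which is risky for data read from a file. Below is a sketch of the same idea using ast.literal_eval, which only accepts Python literals such as dicts, lists, numbers, and strings. The two example strings are assumed to match the question's data.

```python
import ast
import pandas as pd

# Column of dictionary-like strings, as they would be read from the CSV file
df = pd.DataFrame({'column1': ["{'A': 2, 'B': 3, 'c': 2}",
                               "{'A': 3, 'B': 5, 'c': 10}"]})

# literal_eval parses literals only, so it will not run arbitrary code the way eval can
parsed = df['column1'].apply(ast.literal_eval)

# Each dict becomes one row; its keys become the column names
out = pd.DataFrame(parsed.tolist())
print(out)
```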
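First, convert the string that looks like a dictionary into an actual dictionary. The string uses single quotes, so it is not valid JSON and json.loads would reject it. ast.literal_eval parses it as a Python literal instead:
import ast
my_dict = ast.literal_eval(column1)  # column1: one cell's string, e.g. "{'A': 2, 'B': 3, 'c': 2}"
# Gives you {'A': 2, 'B': 3, 'c': 2}
Then convert that dictionary to a DataFrame:
pd.DataFrame([my_dict])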
I am wondering about some code.
I have a dictionary, for example:
{'#abc_1' : 4, '#joly_55' : 3, '#ttt_13' : 5, ... , '#ddd_49': 500,
'#ccc_3' : 12, '#juju_7' : 50, '#ttt_13' : 7}
I have a data frame like this:
index name_list
0 ['#abc_1', '#joly_55', ... , '#ddd_49']
1 ['#ccc_3', '#juju_7', ... , '#ttt_13']
2
3
...
The problem is that the map method I tried didn't work.
How can I modify my DataFrame so that it looks like this:
index name_list magazine_map
0 ['#abc_1', '#joly_55', ... ,'#ddd_49'] [4, 3, ... , 500]
1 ['#ccc_3', '#juju_7', ... , '#ttt_13'] [12, 50, ... , 7]
...
What code do I need to generate the above output?
You can do this using apply:
import pandas as pd
# note: '#ttt_13' appears twice; Python keeps the last value (7)
d = {'#abc_1' : 4, '#joly_55' : 3, '#ttt_13' : 5, '#ddd_49': 500, '#ccc_3' : 12, '#juju_7' : 50, '#ttt_13' : 7}
df3 = pd.DataFrame({'name_list': [['#abc_1', '#joly_55', '#ttt_13', '#ddd_49'], ['#ccc_3', '#juju_7']]})
df3['magazine_map'] = df3['name_list'].apply(lambda x: [d[y] for y in x])
df3
Output:
name_list magazine_map
0 [#abc_1, #joly_55, #ttt_13, #ddd_49] [4, 3, 7, 500]
1 [#ccc_3, #juju_7] [12, 50]
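The list comprehension d[y] raises KeyError as soon as a name is missing from the dictionary. The sketch below is a variant using d.get(y), which yields None for unknown names instead. '#unknown_1' is a hypothetical name added to show that case. The duplicate '#ttt_13' key is collapsed to the value Python actually keeps, 7.

```python
import pandas as pd

d = {'#abc_1': 4, '#joly_55': 3, '#ttt_13': 7, '#ddd_49': 500,
     '#ccc_3': 12, '#juju_7': 50}
df3 = pd.DataFrame({'name_list': [['#abc_1', '#joly_55', '#ttt_13', '#ddd_49'],
                                  ['#ccc_3', '#juju_7', '#unknown_1']]})

# apply runs the lambda once per cell, i.e. once per list of names;
# d.get(y) returns None instead of raising KeyError for names missing from d
df3['magazine_map'] = df3['name_list'].apply(lambda names: [d.get(y) for y in names])
print(df3)
```

If you would rather use a placeholder such as 0, pass it as the second argument: d.get(y, 0).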
I am trying to find the differences between two Excel files that have the same number of rows. I first want to sort both workbooks on two columns and then output a third file with the differences, but I'm having trouble exporting the difference file properly.
Any help is highly appreciated! Thanks in advance!
import pandas as pd
df1 = pd.DataFrame({
'ID' : ['3', '3', '55','55', '66', '66'],
'date' : [20180102, 20180103, 20180104, 20180105, 20180106, 20180107],
'age': [0, 1, 9, 4, 2, 3],
})
df2 = pd.DataFrame({
'ID' : ['3', '55', '3','66', '55', '66'],
'date' : [20180103, 20180104, 20180102, 20180106, 20180105, 20180107],
'age': [0, 1, 9, 9, 8, 7],
})
df3 = df1.sort_values(by= ['ID', 'date'] , ascending=False)
df4 = df2.sort_values(by= ['ID', 'date'] , ascending=False)
dfDiff = df3.copy()
for row in range(dfDiff.shape[0]):
    for col in range(dfDiff.shape[1]):
        value_old = df3.iloc[row,col]
        value_new = df4.iloc[row,col]
        if value_old == value_new:
            dfDiff.iloc[row,col] = df4.iloc[row,col]
        else:
            dfDiff.iloc[row,col] = ('{}->{}').format(value_old,value_new)
writer = pd.ExcelWriter('diff', engine='xlsxwriter')
dfDiff.to_excel(writer, sheet_name='DIFF', index= False)
workbook = writer.book
worksheet = writer.sheets['DIFF']
worksheet.hide_gridlines(2)
writer.save()
I think you are only missing the .xlsx extension at the end of your file path:
import pandas as pd
df1 = pd.DataFrame({
'ID' : ['3', '3', '55','55', '66', '66'],
'date' : [20180102, 20180103, 20180104, 20180105, 20180106, 20180107],
'age': [0, 1, 9, 4, 2, 3],
})
df2 = pd.DataFrame({
'ID' : ['3', '55', '3','66', '55', '66'],
'date' : [20180103, 20180104, 20180102, 20180106, 20180105, 20180107],
'age': [0, 1, 9, 9, 8, 7],
})
df3 = df1.sort_values(by= ['ID', 'date'] , ascending=False)
df4 = df2.sort_values(by= ['ID', 'date'] , ascending=False)
# object dtype lets integer cells hold 'old->new' strings
# (recent pandas versions warn when strings are set into an integer column)
dfDiff = df3.astype(object)
for row in range(dfDiff.shape[0]):
    for col in range(dfDiff.shape[1]):
        value_old = df3.iloc[row,col]
        value_new = df4.iloc[row,col]
        if value_old == value_new:
            dfDiff.iloc[row,col] = df4.iloc[row,col]
        else:
            dfDiff.iloc[row,col] = ('{}->{}').format(value_old,value_new)
# added `.xlsx` to the path here
writer = pd.ExcelWriter('diff.xlsx', engine='xlsxwriter')
dfDiff.to_excel(writer, sheet_name='DIFF', index=False)
workbook = writer.book
worksheet = writer.sheets['DIFF']
worksheet.hide_gridlines(2)
# writer.save() was removed in pandas 2.0; close() writes the file
writer.close()
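The cell-by-cell loop can also be written without explicit loops. This is a sketch of a vectorized alternative, not the answer's method. It assumes both frames have the same columns and the same number of rows, and it turns every cell into a string so unchanged and changed cells share one type. reset_index(drop=True) makes the sorted rows line up by position.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': ['3', '3', '55', '55', '66', '66'],
                    'date': [20180102, 20180103, 20180104, 20180105, 20180106, 20180107],
                    'age': [0, 1, 9, 4, 2, 3]})
df2 = pd.DataFrame({'ID': ['3', '55', '3', '66', '55', '66'],
                    'date': [20180103, 20180104, 20180102, 20180106, 20180105, 20180107],
                    'age': [0, 1, 9, 9, 8, 7]})

# Sort both frames identically, then renumber rows 0..n-1 so they align by position
a = df1.sort_values(by=['ID', 'date'], ascending=False).reset_index(drop=True)
b = df2.sort_values(by=['ID', 'date'], ascending=False).reset_index(drop=True)

a_str = a.astype(str)
b_str = b.astype(str)
same = a_str == b_str  # element-wise comparison of identically labeled frames

# Keep equal cells; replace differing cells with 'old->new'
dfDiff = a_str.where(same, a_str + '->' + b_str)
print(dfDiff)
```

The result can then be written with the same ExcelWriter code as above. pandas 1.1 and later also provide DataFrame.compare, which returns only the differing cells side by side if you don't need the 'old->new' layout.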
I have data with two key columns, Product and Category, plus a Value for each pair. Here is an example of the data:
import pandas as pd
df = pd.DataFrame({'Product': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
'Category': ['Text', 'Text2', 'Text3', 'Text4', 'Text', 'Text2', 'Text3', 'Text4'],
'Value': [80, 10, 5, 5, 5, 3, 2, 0]})
I would like to visualize this data in a diagram:
Here the "Total" is the total value of the entire data frame, "A" & "B" boxes are the total value for each product, and then the values for each product & category are in the right-most box.
I'm not very familiar with the visualization packages available in Python. Is there a package that does this type of visualization?
You can use graphviz, but you need to build the blocks/nodes yourself.
Example:
from graphviz import Graph
g = Graph()
g.attr(rankdir='RL')
T = df['Value'].sum()
g.node('1', 'Total = ' + str(T), shape='square')
A = df.loc[df.Product == 'A', ['Category', 'Value']].to_string(index=False)
g.node('2', A, shape='square')
B = df.loc[df.Product == 'B', ['Category', 'Value']].to_string(index=False)
g.node('3', B, shape='square')
g.edges(['21', '31'])
g.render(view=True)
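The snippet above puts each product's category table in one box, but the diagram in the question also has a total box per product. Below is a sketch that writes Graphviz DOT source directly as a string, so it runs without the graphviz Python package. It builds Total, then product totals, then one leaf per category. The plain-tuple data layout, the node IDs, and the build_dot helper are assumptions for illustration. The IDs must stay alphanumeric or underscore, so category names with spaces would need different IDs.

```python
from collections import defaultdict

# Same data as the question's DataFrame, as (Product, Category, Value) tuples
rows = [('A', 'Text', 80), ('A', 'Text2', 10), ('A', 'Text3', 5), ('A', 'Text4', 5),
        ('B', 'Text', 5), ('B', 'Text2', 3), ('B', 'Text3', 2), ('B', 'Text4', 0)]

def build_dot(rows):
    """Return DOT source for a Total <- product <- category tree (hypothetical helper)."""
    per_product = defaultdict(list)
    for product, category, value in rows:
        per_product[product].append((category, value))
    total = sum(value for _, _, value in rows)

    # rankdir=RL puts the first node of each edge to the right of the second,
    # so leaves end up right-most and Total left-most
    lines = ['graph {', '  rankdir=RL;', '  node [shape=box];',
             f'  total [label="Total = {total}"];']
    for product, items in per_product.items():
        p_id = f'p_{product}'
        p_total = sum(v for _, v in items)
        lines.append(f'  {p_id} [label="{product} = {p_total}"];')
        lines.append(f'  {p_id} -- total;')
        # One leaf node per (product, category) pair
        for category, value in items:
            leaf_id = f'{p_id}_{category}'
            lines.append(f'  {leaf_id} [label="{category} = {value}"];')
            lines.append(f'  {leaf_id} -- {p_id};')
    lines.append('}')
    return '\n'.join(lines)

dot = build_dot(rows)
print(dot)
```

Save the printed text to a file such as tree.dot and render it with the Graphviz command-line tool, for example `dot -Tpng tree.dot -o tree.png`.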
I'm fairly new to Python and I don't know how to retrieve a value from an inner dictionary.
This is the value I have in my variable:
variable = {'hosts': 1, 'key':'abc', 'result': {'data':[{'licenses': 2, 'id':'john'},{'licenses': 1, 'id':'mike'}]}, 'version': 2}
What I want to do is assign a new variable the number of licenses 'mike' has, for example.
Sorry for such a newbie and apparently simple question, but I've only been using Python for a couple of days and need this working ASAP. I've searched the oracle (Google) and Stack Overflow but haven't been able to find an answer...
PS: Using python3
Working through it step by step, starting with:
>>> from pprint import pprint
>>> pprint(variable)
{'hosts': 1,
'key': 'abc',
'result': {'data': [{'id': 'john', 'licenses': 2},
{'id': 'mike', 'licenses': 1}]},
'version': 2}
First, let's get to the result dict:
>>> result = variable['result']
>>> pprint(result)
{'data': [{'id': 'john', 'licenses': 2}, {'id': 'mike', 'licenses': 1}]}
and then to its data key:
>>> data = result['data']
>>> pprint(data)
[{'id': 'john', 'licenses': 2}, {'id': 'mike', 'licenses': 1}]
Now, we have to scan that for the 'mike' dict:
>>> for item in data:
...     if item['id'] == 'mike':
...         print(item['licenses'])
...         break
...
1
You could shorten that to:
>>> for item in variable['result']['data']:
...     if item['id'] == 'mike':
...         print(item['licenses'])
...         break
...
1
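If you want the value in a variable rather than printed, the same scan can be written as a single expression with next() and a generator expression. This is a sketch; returning None when no entry has id 'mike' is an assumed choice of default.

```python
variable = {'hosts': 1, 'key': 'abc',
            'result': {'data': [{'licenses': 2, 'id': 'john'},
                                {'licenses': 1, 'id': 'mike'}]},
            'version': 2}

# next() returns the first value the generator yields,
# or the default (None) if no entry in the list has id 'mike'
mike_licenses = next(
    (item['licenses'] for item in variable['result']['data'] if item['id'] == 'mike'),
    None,
)
print(mike_licenses)
```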
But it would be much better to rearrange your data structure like this:
variable = {
'hosts': 1,
'version': 2,
'licenses': {
'john': 2,
'mike': 1,
}
}
Then you could just do:
>>> variable['licenses']['mike']
1
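If the data keeps arriving in the original list-of-records shape, you can build that id-to-licenses mapping yourself with a dict comprehension. This is a minimal sketch that assumes each id appears only once in the list; a repeated id would keep its last value.

```python
variable = {'hosts': 1, 'key': 'abc',
            'result': {'data': [{'licenses': 2, 'id': 'john'},
                                {'licenses': 1, 'id': 'mike'}]},
            'version': 2}

# Re-key the records by 'id' once; every later lookup is a plain dict access
licenses = {item['id']: item['licenses'] for item in variable['result']['data']}
print(licenses['mike'])
```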
You can use nested references as follows:
variable['result']['data'][1]['licenses'] += 1
variable['result'] returns:
{'data':[{'licenses': 2, 'id':'john'},{'licenses': 1, 'id':'mike'}]}
variable['result']['data'] returns:
[{'licenses': 2, 'id':'john'},{'licenses': 1, 'id':'mike'}]
variable['result']['data'][1] returns:
{'licenses': 1, 'id':'mike'}
variable['result']['data'][1]['licenses'] returns:
1
which we then increment using +=1
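The index [1] only works while 'mike' stays in the second position. This sketch selects the entry by id instead. Because each item in the loop is the same dict object stored in the list, incrementing it updates variable in place.

```python
variable = {'hosts': 1, 'key': 'abc',
            'result': {'data': [{'licenses': 2, 'id': 'john'},
                                {'licenses': 1, 'id': 'mike'}]},
            'version': 2}

for item in variable['result']['data']:
    if item['id'] == 'mike':
        # item refers to the dict inside the list, so this modifies variable itself
        item['licenses'] += 1
        break

print(variable['result']['data'])
```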