I have two datasets, df1 and df2.
My goal is to create one Excel file per fruit name; inside each file I want two sheets, the first with customer details and the second with vendor details.
import pandas as pd

df1 = pd.DataFrame({
    "Fruit": ["apple", "orange", "banana", "apple", "orange"],
    "customerName": ["John", "Sam", "David", "Rebeca", "Sydney"],
    "customerID": [877, 546, 767, 887, 890],
    "PurchasePrice": [1, 2, 5, 6, 4]})
df2 = pd.DataFrame({
    "Fruit": ["apple", "orange", "banana", "apple", "orange"],
    "VenderName": ["share", "cami", "sniff", "tom", "Adam"],
    # keep the leading zeros as strings: integer literals like 0091
    # are a SyntaxError in Python 3
    "VenderID": ["0091", "0092", "0094", "0097", "0076"]})
I know how to do a groupby with one dataset and generate a file per group:
grouped = df1.groupby("Fruit")
# run this to generate separate Excel files
for fruit, group in grouped:
    group.to_excel(excel_writer=f"{fruit}.xlsx", sheet_name="customer", index=False)
Could you please help me solve this issue?
Use ExcelWriter:
from pandas import ExcelWriter

fruits = set(df1["Fruit"].unique().tolist() + df2["Fruit"].unique().tolist())

for fruit in fruits:
    sheets = {
        "Customer": df1.loc[df1["Fruit"].eq(fruit)],
        "Vendor": df2.loc[df2["Fruit"].eq(fruit)]
    }
    with ExcelWriter(f"{fruit}_.xlsx") as writer:
        for sh_name, table in sheets.items():
            table.to_excel(writer, sheet_name=sh_name, index=False)
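To sanity-check one of the generated files, pd.read_excel with sheet_name=None reads every sheet back into a dict of DataFrames (the file name apple_.xlsx just follows the naming pattern above):

import pandas as pd

sheets = pd.read_excel("apple_.xlsx", sheet_name=None)  # {sheet name: DataFrame}
print(sheets["Customer"])
print(sheets["Vendor"])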
I'm using Python 3.8. I have two lists, with each element being a dict ...
>>> existing_dicts = [{"id": 1}, {"id": 2}]
>>> cur_dicts = [{"id": 2}]
I want to find the dicts that were originally in "existing_dicts" but are no longer in "cur_dicts". So in the above example,
{"id": 1}
is my desired result since it is in "existing_dicts" but not in "cur_dicts". I tried the below to find the difference ...
>>> deleted_dicts = list(set(existing_dicts) - set(cur_dicts))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'
What's a better way to do this?
The set approach does not work because your dictionaries are elements of lists, and dicts are unhashable, so a list of dictionaries cannot be turned into a set.
Instead, you can use a list comprehension where you check whether each element of existing_dicts is in cur_dicts:
deleted_dicts = [x for x in existing_dicts if not (x in cur_dicts)]
If a dictionary is not in cur_dicts, it is added to deleted_dicts. This relies on the fact that dictionaries can be compared for equality with the == operator.
Full example, extended with duplicate entries and larger dictionaries:
existing_dicts = [{"id": 1}, {"id": 2}, {"id": 2}, {"id": 2, "id2" : 3}, {"id": 1, "id2": 2}]
cur_dicts = [{"id": 2}, {"id": 1, "id2": 2}]
deleted_dicts = [x for x in existing_dicts if not (x in cur_dicts)]
print(deleted_dicts)
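This prints the entries that appear only in existing_dicts:

[{'id': 1}, {'id': 2, 'id2': 3}]

If all the dictionary values are hashable, you can also recover the set-difference approach by converting each dict to a frozenset of its items first; note this sketch drops duplicates and ordering:

existing_keys = {frozenset(d.items()) for d in existing_dicts}
cur_keys = {frozenset(d.items()) for d in cur_dicts}
deleted_dicts = [dict(items) for items in existing_keys - cur_keys]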
How can I read a Numpy array from a string? Take a string like:
"[[ 0.5544 0.4456], [ 0.8811 0.1189]]"
and convert it to an array:
a = from_string("[[ 0.5544 0.4456], [ 0.8811 0.1189]]")
where a becomes the object: np.array([[0.5544, 0.4456], [0.8811, 0.1189]]).
I'm looking for a very simple interface. A way to convert 2D arrays (of floats) to a string and then a way to read them back to reconstruct the array:
arr_to_string(array([[0.5544, 0.4456], [0.8811, 0.1189]])) should return "[[ 0.5544 0.4456], [ 0.8811 0.1189]]".
string_to_arr("[[ 0.5544 0.4456], [ 0.8811 0.1189]]") should return the object array([[0.5544, 0.4456], [0.8811, 0.1189]]).
Ideally arr_to_string would have a precision parameter that controlled the precision of floating points converted to strings, so that you wouldn't get entries like 0.4444444999999999999999999.
There's nothing I can find in the NumPy docs that does this both ways. np.save lets you make a string but then there's no way to load it back in (np.load only works for files).
The challenge is to save not only the data buffer, but also the shape and dtype. np.fromstring reads the data buffer, but as a 1d array; you have to get the dtype and shape from elsewhere.
In [184]: a=np.arange(12).reshape(3,4)
In [185]: np.fromstring(a.tostring(),int)
Out[185]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
In [186]: np.fromstring(a.tostring(),a.dtype).reshape(a.shape)
Out[186]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
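On current NumPy versions, tostring and fromstring are deprecated for binary data; the equivalent round trip uses tobytes and frombuffer:

np.frombuffer(a.tobytes(), dtype=a.dtype).reshape(a.shape)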
A time-honored mechanism for saving Python objects is pickle, and numpy arrays are pickle-compliant:
In [169]: import pickle
In [170]: a=np.arange(12).reshape(3,4)
In [171]: s=pickle.dumps(a*2)
In [172]: s
Out[172]: "cnumpy.core.multiarray\n_reconstruct\np0\n(cnumpy\nndarray\np1\n(I0\ntp2\nS'b'\np3\ntp4\nRp5\n(I1\n(I3\nI4\ntp6\ncnumpy\ndtype\np7\n(S'i4'\np8\nI0\nI1\ntp9\nRp10\n(I3\nS'<'\np11\nNNNI-1\nI-1\nI0\ntp12\nbI00\nS'\\x00\\x00\\x00\\x00\\x02\\x00\\x00\\x00\\x04\\x00\\x00\\x00\\x06\\x00\\x00\\x00\\x08\\x00\\x00\\x00\\n\\x00\\x00\\x00\\x0c\\x00\\x00\\x00\\x0e\\x00\\x00\\x00\\x10\\x00\\x00\\x00\\x12\\x00\\x00\\x00\\x14\\x00\\x00\\x00\\x16\\x00\\x00\\x00'\np13\ntp14\nb."
In [173]: pickle.loads(s)
Out[173]:
array([[ 0, 2, 4, 6],
[ 8, 10, 12, 14],
[16, 18, 20, 22]])
There's a numpy function that can read the pickle string:
In [181]: np.loads(s)
Out[181]:
array([[ 0, 2, 4, 6],
[ 8, 10, 12, 14],
[16, 18, 20, 22]])
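(np.loads was simply an alias for pickle.loads and has been removed in recent NumPy releases; pickle.loads(s) shown above does the same job.)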
You mentioned np.save to a string, but that you can't use np.load. A way around that is to step further into the code, and use np.lib.npyio.format.
In [174]: import StringIO
In [175]: S=StringIO.StringIO() # a file like string buffer
In [176]: np.lib.npyio.format.write_array(S,a*3.3)
In [177]: S.seek(0) # rewind the string
In [178]: np.lib.npyio.format.read_array(S)
Out[178]:
array([[ 0. , 3.3, 6.6, 9.9],
[ 13.2, 16.5, 19.8, 23.1],
[ 26.4, 29.7, 33. , 36.3]])
The saved string has a header with dtype and shape info:
In [179]: S.seek(0)
In [180]: S.readlines()
Out[180]:
["\x93NUMPY\x01\x00F\x00{'descr': '<f8', 'fortran_order': False, 'shape': (3, 4), } \n",
'\x00\x00\x00\x00\x00\x00\x00\x00ffffff\n',
'#ffffff\x1a#\xcc\xcc\xcc\xcc\xcc\xcc##ffffff*#\x00\x00\x00\x00\x00\x800#\xcc\xcc\xcc\xcc\xcc\xcc3#\x99\x99\x99\x99\x99\x197#ffffff:#33333\xb3=#\x00\x00\x00\x00\x00\x80##fffff&B#']
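Those In/Out snippets are from Python 2; on Python 3 the buffer must be binary, so a minimal equivalent uses io.BytesIO (np.lib.format exposes the same write_array/read_array pair):

import io
import numpy as np

a = np.arange(12).reshape(3, 4)
buf = io.BytesIO()                       # a binary file-like buffer
np.lib.format.write_array(buf, a * 3.3)  # writes the .npy header plus data
buf.seek(0)                              # rewind before reading back
b = np.lib.format.read_array(buf)
assert (b == a * 3.3).all()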
If you want a human readable string, you might try json.
In [196]: import json
In [197]: js=json.dumps(a.tolist())
In [198]: js
Out[198]: '[[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]'
In [199]: np.array(json.loads(js))
Out[199]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
Going to/from the list representation of the array is the most obvious use of json. Someone may have written a more elaborate json representation of arrays.
You could also go the csv format route - there have been lots of questions about reading/writing csv arrays.
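For the csv route, np.savetxt and np.loadtxt accept file-like objects, so an in-memory buffer works; a minimal sketch:

import io
import numpy as np

a = np.array([[0.5544, 0.4456], [0.8811, 0.1189]])
buf = io.StringIO()
np.savetxt(buf, a, fmt="%.4f", delimiter=",")   # fmt controls the precision
s = buf.getvalue()                              # '0.5544,0.4456\n0.8811,0.1189\n'
b = np.loadtxt(io.StringIO(s), delimiter=",")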
'[[ 0.5544 0.4456], [ 0.8811 0.1189]]'
is a poor string representation for this purpose. It does look a lot like the str() of an array, but with , instead of \n. There isn't a clean way of parsing the nested [], and the missing delimiter between numbers is a pain. If it consistently used , then json could convert it to a list.
np.matrix accepts a MATLAB like string:
In [207]: np.matrix(' 0.5544, 0.4456;0.8811, 0.1189')
Out[207]:
matrix([[ 0.5544, 0.4456],
[ 0.8811, 0.1189]])
In [208]: str(np.matrix(' 0.5544, 0.4456;0.8811, 0.1189'))
Out[208]: '[[ 0.5544 0.4456]\n [ 0.8811 0.1189]]'
Forward to string:
import numpy as np

def array2str(arr, precision=None):
    s = np.array_str(arr, precision=precision)
    return s.replace('\n', ',')
Backward to array:
import re
import ast
import numpy as np

def str2array(s):
    # Remove space after [
    s = re.sub(r'\[ +', '[', s.strip())
    # Replace commas and spaces
    s = re.sub(r'[,\s]+', ', ', s)
    return np.array(ast.literal_eval(s))
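A quick round trip with these two helpers (precision=4 keeps the floats short):

a = np.array([[0.5544, 0.4456], [0.8811, 0.1189]])
s = array2str(a, precision=4)   # roughly '[[0.5544 0.4456], [0.8811 0.1189]]'
b = str2array(s)                # array([[0.5544, 0.4456], [0.8811, 0.1189]])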
If you use repr() to convert the array to a string, the commas are already there and the conversion back is trivial.
I'm not sure there's an easy way to do this if you don't have commas between the numbers in your inner lists, but if you do, then you can use ast.literal_eval:
import ast
import numpy as np
s = '[[ 0.5544, 0.4456], [ 0.8811, 0.1189]]'
np.array(ast.literal_eval(s))
array([[ 0.5544, 0.4456],
[ 0.8811, 0.1189]])
EDIT: I haven't tested it very much, but you could use re to insert commas where you need them:
import re
s1 = '[[ 0.5544 0.4456], [ 0.8811 -0.1189]]'
# Replace spaces between numbers with commas:
s2 = re.sub(r'(\d) +(-|\d)', r'\1,\2', s1)
s2
'[[ 0.5544,0.4456], [ 0.8811,-0.1189]]'
and then hand on to ast.literal_eval:
np.array(ast.literal_eval(s2))
array([[ 0.5544, 0.4456],
[ 0.8811, -0.1189]])
(you need to be careful to match spaces between digits but also spaces between a digit and a minus sign).
In my case I found the following command helpful for dumping:
string = str(array.tolist())
And for reloading:
array = np.array( eval(string) )
This should work for numpy arrays of any dimensionality, but note that eval() executes arbitrary code, so only use it on strings you trust.
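A safer variant of the reload step swaps eval for ast.literal_eval, which only accepts Python literals:

import ast
import numpy as np

array = np.array(ast.literal_eval(string))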
numpy.fromstring() allows you to easily create 1D arrays from a string. Here's a simple function to create a 2D numpy array from a string:
import numpy as np
def str2np(strArray):
    lItems = []
    width = None
    for line in strArray.split("\n"):
        lParts = line.split()
        n = len(lParts)
        if n == 0:
            continue
        if width is None:
            width = n
        else:
            assert n == width, "invalid array spec"
        # use a name that does not shadow the built-in str
        lItems.append([float(part) for part in lParts])
    return np.array(lItems)
Usage:
X = str2np("""
-2 2
-1 3
0 1
1 1
2 -1
""")
print(f"X = {X}")
Output:
X = [[-2. 2.]
[-1. 3.]
[ 0. 1.]
[ 1. 1.]
[ 2. -1.]]
I am using the CoxPH implementation from the lifelines package in Python. Currently, the results are a tabular view of coefficients and related stats, which can be seen with print_summary(). Here is an example:
import pandas as pd
from lifelines import CoxPHFitter

df = pd.DataFrame({'duration': [4, 6, 5, 5, 4, 6],
                   'event': [0, 0, 0, 1, 1, 1],
                   'cat': [0, 1, 0, 1, 0, 1]})
cph = CoxPHFitter()
cph.fit(df, duration_col='duration', event_col='event', show_progress=True)
cph.print_summary()
(table of results from print_summary())
How can I get only the concordance index, as a dataframe or list? cph.summary
returns a dataframe of the main results, i.e. p-values and coefficients, but it does not include the concordance index and other surrounding information.
You can access the c-index with cph.concordance_index_, and you could put this into a list or dataframe if you wish.
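For example, a minimal sketch (assuming cph has been fit as in the question, and pandas imported as pd):

c_index = cph.concordance_index_
as_list = [c_index]
as_df = pd.DataFrame({"concordance_index": [c_index]})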
You can also compute the concordance index for a Cox model using a small script available at this link. The code is given below:
from lifelines.utils import concordance_index
# assumes a duration column 'T' and an event column 'E'
cph = CoxPHFitter().fit(df, 'T', 'E')
# negate the partial hazards: a higher predicted hazard should mean a shorter survival time
Cindex = concordance_index(df['T'], -cph.predict_partial_hazard(df), df['E'])
This code gives the C-index value, which also matches cph.concordance_index_.
I have a problem when adding values inside a nested dictionary using the same keys: every entry always shows the same values. I want the values to be updated even though the keys are the same. This algorithm is the basis of the Artificial Fish Swarm Algorithm.
# example >> fish_template = {0: {'weight': 3.1, 'visual': 2, 'step': 1}, 1: {'weight': 3, 'visual': 4, 'step': 2}}
fish = {}
fish_value = {}
weight = [3.1, 3, 4.1, 10]
visual = [2, 4, 10, 3]
step = [1, 2, 5, 1.5]
len_fish = 4
for i in range(0, len_fish):
    for w, v, s in zip(weight, visual, step):
        fish_value["weight"] = w
        fish_value["visual"] = v
        fish_value["step"] = s
        fish[i] = fish_value
print("show fish", fish)
I expect the result to be like fish_template, but it isn't: the inner 'weight', 'visual', and 'step' values are identical for the keys 0, 1, 2, and 3. Any solution?
The issue is with fish[i]: you keep assigning the same dict object, fish_value, to every key. Python does not allocate new memory for the same variable name, so all your dict keys point to the same object, fish_value, which gets overwritten on each iteration; all your dict values end up as the last state of fish_value. To overcome this, you can do the following:
fish = {}
weight = [3.1, 3, 4.1, 10]
visual = [2, 4, 10, 3]
step = [1, 2, 5, 1.5]
len_fish = 4

for i in range(0, len_fish):
    fish[i] = {"weight": weight[i], "visual": visual[i], "step": step[i]}

print("show fish", fish)
As #Error mentioned, the for loop can be replaced by this one-liner:
fish = dict((i, {"weight": weight[i], "visual": visual[i], "step": step[i]}) for i in range(len_fish))
Not sure I fully understand what you're trying to do here, but the problem is the last line of your inner for loop. You're looping over i in the main loop, and then the inner loop sets fish[i] multiple times. As a result, all your fish values will look identical.
Because of aliasing, the line fish[i] = fish_value is bad practice: fish_value gets overwritten each time you loop, and fish[i] = fish_value just stores a reference to that same object (not a copy) in fish[i], which is not what you want.
But really you can avoid the loop entirely with a dict comprehension, as sketched below.
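A minimal dict-comprehension version, using the same weight, visual, and step lists as in the question:

fish = {i: {"weight": w, "visual": v, "step": s}
        for i, (w, v, s) in enumerate(zip(weight, visual, step))}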
Anyway, better coding practice is to declare your own Fish class with members weight, visual, step, as below. Note how:
- we use the zip() function to combine the separate w, v, s lists into a list of tuples.
- the syntax *wvs then unpacks each tuple into three separate values ('weight', 'visual', 'step'). This is called tuple unpacking; it saves you needing another loop, or indexing.
- a custom __repr__() method (with optional ASCII art) makes each object user-legible. (Strictly we should be overriding __str__ rather than __repr__, but this works.)
Code:
class Fish():
    def __init__(self, weight=None, visual=None, step=None):
        self.weight = weight
        self.visual = visual
        self.step = step

    def __repr__(self):
        """Custom fishy __repr__ method, with ASCII picture"""
        return f'<º)))< [ Weight: {self.weight}, visual: {self.visual}, step: {self.step} ]'
# define whatever other methods you need on 'Fish' object...
# Now create several Fish'es...
swarm = [ Fish(*wvs) for wvs in zip([3.1, 3, 4.1, 10], [2, 4, 10, 3], [1, 2, 5, 1.5]) ]
# zip() combines the lists into a list of tuples; `*wvs` unpacks each tuple into three separate values ('weight', 'visual', 'step')
# See what we created...
>>> swarm
[<º)))< [ Weight: 3.1, visual: 2, step: 1 ], <º)))< [ Weight: 3, visual: 4, step: 2 ], <º)))< [ Weight: 4.1, visual: 10, step: 5 ], <º)))< [ Weight: 10, visual: 3, step: 1.5 ]]
# ... or for prettier output...
>>> for f in swarm: print(f)
<º)))< [ Weight: 3.1, visual: 2, step: 1 ]
<º)))< [ Weight: 3, visual: 4, step: 2 ]
<º)))< [ Weight: 4.1, visual: 10, step: 5 ]
<º)))< [ Weight: 10, visual: 3, step: 1.5 ]