String reformatting with levels, I am stuck - string

I have the string below, which needs to be reformatted:
"itemStockDetailsMap_506540 = {\"506540_Navy\":{\"24DUMMY\":{\"count\":0.0,\"type\":2},\"18DUMMY\":{\"count\":0.0,\"type\":2},\"16DUMMY\":{\"count\":0.0,\"type\":2},\"8DUMMY\":{\"count\":0.0,\"type\":2},\"20DUMMY\":{\"count\":0.0,\"type\":2},\"10DUMMY\":{\"count\":0.0,\"type\":2},\"12DUMMY\":{\"count\":0.0,\"type\":2},\"22DUMMY\":{\"count\":0.0,\"type\":2},\"14DUMMY\":{\"count\":0.0,\"type\":2}},
\"506540_Mocha\":{\"20DUMMY\":{\"count\":0.0,\"type\":2},\"22DUMMY\":{\"count\":0.0,\"type\":2},\"10DUMMY\":{\"count\":0.0,\"type\":2},\"8DUMMY\":{\"count\":0.0,\"type\":2},\"12DUMMY\":{\"count\":0.0,\"type\":2},\"14DUMMY\":{\"count\":0.0,\"type\":2},\"16DUMMY\":{\"count\":0.0,\"type\":2},\"24DUMMY\":{\"count\":0.0,\"type\":2},\"18DUMMY\":{\"count\":0.0,\"type\":2}}
,\"506540_Grey\":{\"18DUMMY\":{\"count\":0.0,\"type\":2},\"12DUMMY\":{\"count\":0.0,\"type\":2},\"10DUMMY\":{\"count\":0.0,\"type\":2},\"20DUMMY\":{\"count\":0.0,\"type\":2},\"14DUMMY\":{\"count\":0.0,\"type\":2},\"22DUMMY\":{\"count\":0.0,\"type\":2},\"24DUMMY\":{\"count\":0.0,\"type\":2},\"16DUMMY\":{\"count\":0.0,\"type\":2},\"8DUMMY\":{\"count\":0.0,\"type\":2}}}"
I want to represent it as:
colour size count
Navy 18 0.0
Navy 8 0.0
......
Grey 10 0.0
........
Please let me know if there are any cool tricks to get this reformatted.
Many thanks,

Let's say your string is in variable a:
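library(rjson)
# Drop the "itemStockDetailsMap_506540 = " prefix, keeping only the JSON object
json <- sub("^[^{]*", "", a)
x <- fromJSON(json)
# One small data frame per colour; sizes are matched by name, because
# each colour lists its sizes in a different order
res <- do.call(rbind, lapply(names(x), function(key) {
  sizes <- x[[key]]
  data.frame(
    colour = sub("^\\d+_", "", key),
    size   = as.numeric(sub("\\D+$", "", names(sizes))),
    count  = unname(vapply(sizes, function(s) s$count, numeric(1)))
  )
}))
head(res, 4)
#   colour size count
# 1   Navy   24     0
# 2   Navy   18     0
# 3   Navy   16     0
# 4   Navy    8     0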

Try the following:
install.packages("rjson")
# 'yourData' is the JSON part of the string in the OP (without the 'itemStockDetailsMap_506540 = ' prefix)
do.call(rbind, lapply(rjson::fromJSON(yourData), function(xx) do.call(rbind, xx)))
count type
24DUMMY 0 2
18DUMMY 0 2
16DUMMY 0 2
8DUMMY 0 2
20DUMMY 0 2
10DUMMY 0 2
12DUMMY 0 2
22DUMMY 0 2
14DUMMY 0 2
20DUMMY 0 2
22DUMMY 0 2
10DUMMY 0 2
8DUMMY 0 2
12DUMMY 0 2
14DUMMY 0 2
16DUMMY 0 2
24DUMMY 0 2
18DUMMY 0 2
18DUMMY 0 2
12DUMMY 0 2
10DUMMY 0 2
20DUMMY 0 2
14DUMMY 0 2
22DUMMY 0 2
24DUMMY 0 2
16DUMMY 0 2
8DUMMY 0 2
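For comparison, here is a minimal Python sketch of the same reshaping with the standard json module. It assumes, as both answers do, that everything from the first "{" onward is valid JSON; the sample string is a shortened copy of the one in the question, and the function name stock_rows is hypothetical.

```python
import json
import re

# Shortened copy of the string from the question (same structure, fewer sizes)
raw = (
    'itemStockDetailsMap_506540 = {'
    '"506540_Navy":{"24DUMMY":{"count":0.0,"type":2},"18DUMMY":{"count":0.0,"type":2}},'
    '"506540_Grey":{"10DUMMY":{"count":0.0,"type":2}}}'
)


def stock_rows(s):
    """Return (colour, size, count) tuples from an 'itemStockDetailsMap_... = {...}' string."""
    # Everything from the first "{" onward is plain JSON
    data = json.loads(s[s.index("{"):])
    rows = []
    for key, sizes in data.items():
        colour = key.split("_", 1)[1]  # "506540_Navy" -> "Navy"
        for size_key, info in sizes.items():
            size = int(re.match(r"\d+", size_key).group())  # "24DUMMY" -> 24
            rows.append((colour, size, info["count"]))
    return rows


print("colour size count")
for colour, size, count in stock_rows(raw):
    print(colour, size, count)
```

Because each size is looked up by its own key, the differing size order between colours does not matter.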

Related

Count number of non-zero columns in a given set of columns of a data frame - pandas

I have a df as shown below:
df:
Id Jan20 Feb20 Mar20 Apr20 May20 Jun20 Jul20 Aug20 Sep20 Oct20 Nov20 Dec20 Amount
1 20 0 0 12 1 3 1 0 0 2 2 0 100
2 0 0 2 1 0 2 0 0 1 0 0 0 500
3 1 2 1 2 3 1 1 2 2 3 1 1 300
From the above, I would like to calculate an Activeness value, which is the number of non-zero values among the month columns listed below:
'Jan20', 'Feb20', 'Mar20', 'Apr20', 'May20', 'Jun20', 'Jul20',
'Aug20', 'Sep20', 'Oct20', 'Nov20', 'Dec20'
Expected Output:
Id Jan20 Feb20 Mar20 Apr20 May20 Jun20 Jul20 Aug20 Sep20 Oct20 Nov20 Dec20 Amount Activeness
1 20 0 0 12 1 3 1 0 0 2 2 0 100 7
2 0 0 2 1 0 2 0 0 1 0 0 0 500 4
3 1 2 1 2 3 1 1 2 2 3 1 1 300 12
I tried the code below:
df['Activeness'] = pd.Series(index=df.index, data=np.count_nonzero(df[['Jan20', 'Feb20',
'Mar20', 'Apr20', 'May20', 'Jun20', 'Jul20',
'Aug20', 'Sep20', 'Oct20', 'Nov20', 'Dec20']], axis=1))
This works well, but I would like to know whether there is a faster method.
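You can try selecting the month columns by name and counting the non-zero entries in each row (note that filter(like='20') keeps every column whose name contains "20"):
df['Activeness'] = df.filter(like='20').ne(0).sum(axis=1)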
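A slightly stricter variant of the answer above, offered as a sketch: it selects the month columns with an anchored regular expression (three-letter month plus two-digit year, an assumption based on the column names in the question), so columns such as 'Id' or 'Amount' cannot be picked up by accident. Both this and np.count_nonzero are vectorized, so any speed difference should be measured rather than assumed.

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Jan20": [20, 0, 1], "Feb20": [0, 0, 2], "Mar20": [0, 2, 1], "Apr20": [12, 1, 2],
    "May20": [1, 0, 3], "Jun20": [3, 2, 1], "Jul20": [1, 0, 1], "Aug20": [0, 0, 2],
    "Sep20": [0, 1, 2], "Oct20": [2, 0, 3], "Nov20": [2, 0, 1], "Dec20": [0, 0, 1],
    "Amount": [100, 500, 300],
})

# Month columns only: e.g. "Jan20", but not "Id" or "Amount"
months = df.filter(regex=r"^[A-Z][a-z]{2}\d{2}$")

# Boolean frame of non-zero cells, summed across each row
df["Activeness"] = months.ne(0).sum(axis=1)
print(df[["Id", "Amount", "Activeness"]])
```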

Writing Function on Data Frame in Pandas

I have data in Excel with two columns, 'Peak Value' and 'Label'. I want to fill the 'Label' column based on the 'Peak Value' column.
The input looks like this:
Peak Value 0 0 0 88 0 0 88 0 0 88 0
Label 0 0 0 0 0 0 0 0 0 0 0
Whenever the value in 'Peak Value' is greater than zero, 'Label' should become 1 and all the zeros below it should be replaced by 1. At the next value greater than zero, the label should increment to 2 and the zeros that follow should be replaced by 2.
So, the output will look like this:
Peak Value 0 0 0 88 0 0 88 0 0 88 0
Label 0 0 0 1 1 1 2 2 2 3 3
and so on....
I tried writing a function, but I am only able to set 1 when the value in 'Peak Value' is greater than 0:
def funct(row):
    if row['Peak Value'] > 0:
        val = 1
    else:
        val = 0
    return val

df['Label'] = df.apply(funct, axis=1)
Maybe you could try using cumsum on a boolean mask: every row where 'Peak Value' is greater than zero adds 1 to the running total, and the zeros below it keep that total, so no forward-filling is needed:
df['Labels'] = (df['Peak Value'] > 0).cumsum()
Output:
Peak Value Labels
0 0 0
1 0 0
2 0 0
3 88 1
4 0 1
5 0 1
6 88 2
7 0 2
8 0 2
9 88 3
10 0 3
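A small self-contained check of the cumsum idea, as a sketch: it rebuilds the 'Peak Value' column from the question and also runs a hypothetical series with unequal peak values, to show that the label depends only on whether a value is greater than zero, not on the peak value itself.

```python
import pandas as pd

# Peak values from the question
df = pd.DataFrame({"Peak Value": [0, 0, 0, 88, 0, 0, 88, 0, 0, 88, 0]})

# True where a new peak starts; the running sum of True values is the label
df["Label"] = (df["Peak Value"] > 0).cumsum()
print(df["Label"].tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3]

# Hypothetical data with different peak heights: labels still increase by one per peak
other = pd.Series([0, 5, 0, 90, 0, 88, 0])
print((other > 0).cumsum().tolist())  # [0, 1, 1, 2, 2, 3, 3]
```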

Python3.x, Pandas: creating a list of y values depending on the x values

I have two data sets with different x values. They look like the following:
import pandas as pd
data1 = pd.read_csv('Data1.csv')
data2 = pd.read_csv('Data2.csv')
print(data1)
data1_x data1_y1 data1_y2 data1_y3
-347.2498 0 2 8
-237.528509 0 3 7
-127.807218 0 0 6
-18.085927 11 5 0
print(data2)
data2_x data2_y1 data2_y2 data2_y3
-394.798507 2 0 0
-285.265994 1 0 0
-175.733482 0 0 1
-66.200969 4 0 0
I am creating a new x that includes all the x values with the following code:
from functools import reduce
import numpy as np
new_x = reduce(np.union1d, (data1.iloc[:, 0], data2.iloc[:, 0]))
print(new_x)
array([-394.799,-347.25,-285.266,-237.529,-175.733,-127.807,-66.201,-18.0859])
Now I am trying to create new y lists for each data set that keep the original y values where the corresponding x value is present and are blank where it is not.
For instance, print(New_data2) would look something like this.
New_x_data2 New_y1_data2 New_y2_data2 New_y3_data2
-394.799 2 0 0
-347.25
-285.266 1 0 0
-237.529
-175.733 0 0 1
-127.807
-66.201 4 0 0
-18.0859
In particular, I am stuck on how to get the new y values. Any ideas?
import pandas as pd
from re import sub
repl = lambda x: sub(r"data\d_(\w+)", r"New_\1_data2", x)
# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames the same way
pd.concat([data1.rename(columns=repl), data2.rename(columns=repl)]).sort_values('New_x_data2')
Out[1024]:
New_x_data2 New_y1_data2 New_y2_data2 New_y3_data2
0 -394.798507 2 0 0
0 -347.249800 0 2 8
1 -285.265994 1 0 0
1 -237.528509 0 3 7
2 -175.733482 0 0 1
2 -127.807218 0 0 6
3 -66.200969 4 0 0
3 -18.085927 11 5 0
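The answer above stacks the rows of both data sets rather than leaving gaps. If the goal is one table per data set with blanks where an x value is missing, a sketch using set_index plus reindex on the union of x values might look like this (the helper name align is hypothetical; "blank" becomes NaN, which also turns the y columns into floats):

```python
import numpy as np
import pandas as pd

# Data from the question
data1 = pd.DataFrame({
    "data1_x": [-347.2498, -237.528509, -127.807218, -18.085927],
    "data1_y1": [0, 0, 0, 11],
    "data1_y2": [2, 3, 0, 5],
    "data1_y3": [8, 7, 6, 0],
})
data2 = pd.DataFrame({
    "data2_x": [-394.798507, -285.265994, -175.733482, -66.200969],
    "data2_y1": [2, 1, 0, 4],
    "data2_y2": [0, 0, 0, 0],
    "data2_y3": [0, 0, 1, 0],
})

# Sorted union of all x values from both data sets
new_x = np.union1d(data1["data1_x"], data2["data2_x"])


def align(df, x_col, new_x):
    # Index by x, then reindex on the union: x values missing from df become NaN rows
    return df.set_index(x_col).reindex(new_x).rename_axis(x_col).reset_index()


new_data1 = align(data1, "data1_x", new_x)
new_data2 = align(data2, "data2_x", new_x)
print(new_data2)
```

The match is exact because new_x is built from the very same float values; x values that only agree after rounding would need rounding before the reindex. For display only, new_data2.fillna("") shows real blanks.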

How to iterate through 'nested' dataframes without 'for' loops in pandas (python)?

I'm trying to check the Euclidean distance between each point in one dataframe and a set of scattered points in another dataframe, to see whether an input point is farther than a threshold distance from my checking points.
I have this working with nested for loops, but it is painfully slow (~7 minutes for 40k input rows, each checked against ~180 other rows, plus some overhead operations).
Here is what I'm attempting in vectorized form: for every pair of points (a, b) from df1, if the distance to ANY point (d, e) from df2 is > threshold, write "yes" into df1.c, next to the input point.
However, I'm getting unexpected behavior. With the given data, every row of df1 has at least one point in df2 farther than 1 away, but only row 0 of df1.c gets 'yes'.
Thanks for any ideas - the problem is probably in the 'df1.loc...' line:
import numpy as np
from pandas import DataFrame
inp1 = [{'a':1, 'b':2, 'c':0}, {'a':1,'b':3,'c':0}, {'a':0,'b':3,'c':0}]
df1 = DataFrame(inp1)
inp2 = [{'d':2, 'e':0}, {'d':0,'e':3}, {'d':0,'e':4}]
df2 = DataFrame(inp2)
threshold = 1
df1.loc[np.sqrt((df1.a - df2.d) ** 2 + (df1.b - df2.e) ** 2) > threshold, 'c'] = "yes"
print(df1)
print(df2)
a b c
0 1 2 yes
1 1 3 0
2 0 3 0
d e
0 2 0
1 0 3
2 0 4
Here is an idea to get you started:
Source DFs:
In [170]: df1
Out[170]:
c x y
0 0 1 2
1 0 1 3
2 0 0 3
In [171]: df2
Out[171]:
x y
0 2 0
1 0 3
2 0 4
Helper DF with cartesian product:
In [172]: x = df1[['x','y']] \
   .reset_index() \
   .assign(k=0).merge(df2.assign(k=0).reset_index(),
                      on='k', suffixes=('1', '2')) \
   .drop(columns='k')
In [173]: x
Out[173]:
index1 x1 y1 index2 x2 y2
0 0 1 2 0 2 0
1 0 1 2 1 0 3
2 0 1 2 2 0 4
3 1 1 3 0 2 0
4 1 1 3 1 0 3
5 1 1 3 2 0 4
6 2 0 3 0 2 0
7 2 0 3 1 0 3
8 2 0 3 2 0 4
Now we can calculate the distance:
In [169]: x.eval("D=sqrt((x1 - x2)**2 + (y1 - y2)**2)", inplace=False)
Out[169]:
index1 x1 y1 index2 x2 y2 D
0 0 1 2 0 2 0 2.236068
1 0 1 2 1 0 3 1.414214
2 0 1 2 2 0 4 2.236068
3 1 1 3 0 2 0 3.162278
4 1 1 3 1 0 3 1.000000
5 1 1 3 2 0 4 1.414214
6 2 0 3 0 2 0 3.605551
7 2 0 3 1 0 3 0.000000
8 2 0 3 2 0 4 1.000000
or filter:
In [175]: x.query("sqrt((x1 - x2)**2 + (y1 - y2)**2) > @threshold")
Out[175]:
index1 x1 y1 index2 x2 y2
0 0 1 2 0 2 0
1 0 1 2 1 0 3
2 0 1 2 2 0 4
3 1 1 3 0 2 0
5 1 1 3 2 0 4
6 2 0 3 0 2 0
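Try using the SciPy implementations; they are surprisingly fast:
scipy.spatial.distance.pdist (distances within one set of points; scipy.spatial.distance.cdist computes distances between two different sets, which is what this question needs)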
https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html
or
scipy.spatial.distance_matrix
https://docs.scipy.org/doc/scipy-0.19.1/reference/generated/scipy.spatial.distance_matrix.html
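Along the lines of the SciPy suggestion, here is a NumPy-only sketch of the full "any distance above threshold" check without loops. It builds the complete n1 × n2 distance matrix by broadcasting (scipy.spatial.distance.cdist(p, q) would return the same matrix); for 40k × 180 points each such float64 array is roughly 58 MB, so very large inputs may need to be processed in chunks.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame([{'a': 1, 'b': 2, 'c': 0}, {'a': 1, 'b': 3, 'c': 0}, {'a': 0, 'b': 3, 'c': 0}])
df2 = pd.DataFrame([{'d': 2, 'e': 0}, {'d': 0, 'e': 3}, {'d': 0, 'e': 4}])
threshold = 1

p = df1[['a', 'b']].to_numpy(dtype=float)  # shape (n1, 2)
q = df2[['d', 'e']].to_numpy(dtype=float)  # shape (n2, 2)

# Broadcasting (n1, 1) against (1, n2) gives every df1/df2 pair at once: shape (n1, n2)
dist = np.hypot(p[:, 0, None] - q[None, :, 0], p[:, 1, None] - q[None, :, 1])

# True for each df1 row where ANY df2 point is farther than the threshold
far = (dist > threshold).any(axis=1)

df1['c'] = df1['c'].astype(object)  # allow mixing 0 and "yes" without a dtype warning
df1.loc[far, 'c'] = "yes"
print(df1)
```

With the question's data every row of df1 ends up with "yes", matching the distance table in the answer above.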

Is there a way to make one integer increase when a second integer increases by a set amount in python 3?

I'm trying to create a simple game in Python 3 and build in an EXP system: for example, for every 50 experience points, your health (which is already an integer) increases by one. Is there a command for this?
(I'm coding this on repl.it if that matters)
I've never shunned guessing. :)
Let me suppose that you are incrementing a variable called experience_points and that, once for every 50 times you increment it, you want to increment a variable called health by one.
experience_points += 1
if experience_points % 50 == 0:
    health += 1
This bit of code shows how this might work. Notice how health goes up by one for every 50 times that experience_points goes up by one.
Welcome to the modulus operator!
>>> experience_points = 0
>>> health = 0
>>> while True:
...     # do something in the game
...     experience_points += 1
...     if experience_points % 50 == 0:
...         health += 1
...     print(experience_points, health, '<--', end='')
...     if experience_points > 160:
...         break
...
1 0 <--2 0 <--3 0 <--4 0 <--5 0 <--6 0 <--7 0 <--8 0 <--9 0 <--10 0 <--11 0 <--12 0 <--13 0 <--14 0 <--15 0 <--16 0 <--17 0 <--18 0 <--19 0 <--20 0 <--21 0 <--22 0 <--23 0 <--24 0 <--25 0 <--26 0 <--27 0 <--28 0 <--29 0 <--30 0 <--31 0 <--32 0 <--33 0 <--34 0 <--35 0 <--36 0 <--37 0 <--38 0 <--39 0 <--40 0 <--41 0 <--42 0 <--43 0 <--44 0 <--45 0 <--46 0 <--47 0 <--48 0 <--49 0 <--50 1 <--51 1 <--52 1 <--53 1 <--54 1 <--55 1 <--56 1 <--57 1 <--58 1 <--59 1 <--60 1 <--61 1 <--62 1 <--63 1 <--64 1 <--65 1 <--66 1 <--67 1 <--68 1 <--69 1 <--70 1 <--71 1 <--72 1 <--73 1 <--74 1 <--75 1 <--76 1 <--77 1 <--78 1 <--79 1 <--80 1 <--81 1 <--82 1 <--83 1 <--84 1 <--85 1 <--86 1 <--87 1 <--88 1 <--89 1 <--90 1 <--91 1 <--92 1 <--93 1 <--94 1 <--95 1 <--96 1 <--97 1 <--98 1 <--99 1 <--100 2 <--101 2 <--102 2 <--103 2 <--104 2 <--105 2 <--106 2 <--107 2 <--108 2 <--109 2 <--110 2 <--111 2 <--112 2 <--113 2 <--114 2 <--115 2 <--116 2 <--117 2 <--118 2 <--119 2 <--120 2 <--121 2 <--122 2 <--123 2 <--124 2 <--125 2 <--126 2 <--127 2 <--128 2 <--129 2 <--130 2 <--131 2 <--132 2 <--133 2 <--134 2 <--135 2 <--136 2 <--137 2 <--138 2 <--139 2 <--140 2 <--141 2 <--142 2 <--143 2 <--144 2 <--145 2 <--146 2 <--147 2 <--148 2 <--149 2 <--150 3 <--151 3 <--152 3 <--153 3 <--154 3 <--155 3 <--156 3 <--157 3 <--158 3 <--159 3 <--160 3 <--161 3 <--
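The modulo check works when experience goes up by exactly 1 each time. If a single reward can add several points (say 30), the total can jump past a multiple of 50 and the bonus is silently skipped. A sketch that instead derives health from the total with floor division avoids this; XP_PER_HEALTH, BASE_HEALTH and health_for are hypothetical names, not part of the question.

```python
XP_PER_HEALTH = 50  # hypothetical: experience needed per extra health point
BASE_HEALTH = 10    # hypothetical: starting health


def health_for(experience_points):
    # Floor division counts how many full 50-point steps have been earned so far
    return BASE_HEALTH + experience_points // XP_PER_HEALTH


xp = 0
for gained in (30, 30, 45, 60):  # hypothetical rewards; none of the totals is a multiple of 50
    xp += gained
    print(xp, health_for(xp))
```

This prints 30 10, 60 11, 105 12 and 165 13, whereas the `% 50 == 0` check would never fire for these totals.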
