Difference between pivot_table in Pandas and Excel - excel

When I create a pivot table from data in Pandas (Python), I get a different result than when I create it in Excel. I think this has something to do with characters in the data. Does anyone know the difference between a pivot table in Pandas and one in Excel?
I've made this example. I have the Excel file 'funds_steven' with the following data in one column (column name = Steven_Funds):
Steven_Funds
0 100
1 -58
2 89
3 24
4 -89
5 76
6 -4
7 -180
8 767
9 0
10 0
11 56
12 32
13 0
14 0
15 12
How can I read this in and calculate the sum of the values?
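A minimal pandas sketch of one way to do this. It builds the column inline so that it runs as-is; with the real workbook you would load it with pd.read_excel instead (the file name and extension in the comment are assumptions). The pd.to_numeric step is only a precaution in case some cells were read as text, which is one possible reason for totals that differ between tools.

```python
import pandas as pd

# With the real file (name/extension assumed): df = pd.read_excel("funds_steven.xlsx")
df = pd.DataFrame({"Steven_Funds": [100, -58, 89, 24, -89, 76, -4, -180,
                                    767, 0, 0, 56, 32, 0, 0, 12]})

# Coerce to numbers; cells that cannot be parsed become NaN and are skipped by sum()
values = pd.to_numeric(df["Steven_Funds"], errors="coerce")
total = values.sum()
print(total)  # 825
```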

Related

Cumulative sum in excel with certain criteria

I have typed out a formula and dragged it down a column in my Excel table. I think I'm fairly close and would love some feedback on it.
I want the cumulative sum from the first cell, $J$3, down to the cell in the current row (J53, for example), and it should include only the cells that meet these conditions: COUNTIF($B$3:B53,B53)*COUNTIF(AC53,1).
I know the SUMIF() below isn't correct, but this was as close as I could get:
=IF((COUNTIF($B$3:B53,B53)*COUNTIF(AC53,1)),(SUMIF($J$3:J53,J53)),0)
As shown in the table below
Projectid(B)  successornot(AC)  production(J)  result I want
1             1                 20             20
1             1                 40             60
1             1                 10             70
2             0                 20             0
2             0                 400            0
3             1                 20             20
4             0                 1              0
5             0                 24             0
6             0                 50             0
7             1                 10             10
7             1                 40             50
7             1                 20             70
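Give this a try (the formula assumes Projectid, successornot and production are in columns A, B and C, starting in row 2):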
=IF(B2=0,0,SUMIFS($C$2:$C2,$A$2:$A2,A2,$B$2:$B2,">0"))
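For comparison, the same running total can be sketched in pandas. This is not part of the Excel answer, only an illustration of the logic: accumulate production within each project, counting only rows whose flag is greater than 0, and return 0 for rows whose flag is 0. The column names are shortened versions of the question's headers.

```python
import pandas as pd

df = pd.DataFrame({
    "Projectid":    [1, 1, 1, 2, 2, 3, 4, 5, 6, 7, 7, 7],
    "successornot": [1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1],
    "production":   [20, 40, 10, 20, 400, 20, 1, 24, 50, 10, 40, 20],
})

# Keep production only where the flag is set, then accumulate within each project
flagged = df["production"].where(df["successornot"] > 0, 0)
running = flagged.groupby(df["Projectid"]).cumsum()

# Rows with flag 0 get 0, mirroring the outer IF of the formula
df["result"] = running.where(df["successornot"] > 0, 0)
print(df)
```

The result column matches the "result I want" column from the question.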

Row subtraction in lambda pandas dataframe

I have a dataframe with multiple columns. One of the columns is the cumulative revenue. If the year has not ended, the revenue stays constant for the rest of the period because the upcoming daily revenue is 0.
The dataframe looks like this
Now I want to create a new column in which the previous row is subtracted from each row. If the result is 0, the new column should hold 0 for that row; otherwise it should hold the row's original value. The new dataframe should look like this:
My idea was to do this with apply and a lambda. This is the thinking:
df['2017new'] = df['2017'].apply(lambda x: 0 if row - lastrow == 0 else x)
But I do not know how to write the row - lastrow part of the code. How can I do this? Thanks in advance!
By using np.where:
import numpy as np
df2['New'] = np.where(df2['2017'].diff().eq(0), 0, df2['2017'])
df2
Out[190]:
2016 2017 New
0 10 21 21
1 15 34 34
2 70 40 40
3 90 53 53
4 93 53 0
5 99 53 0
We can shift the data and fill in the values based on a condition using np.where, i.e.
df['new'] = np.where(df['2017'] - df['2017'].shift(1) == 0, 0, df['2017'])
or with df.where, i.e.
df['new'] = df['2017'].where(df['2017'] - df['2017'].shift(1) != 0, 0)
2016 2017 new
0 10 21 21
1 15 34 34
2 70 40 40
3 90 53 53
4 93 53 0
5 99 53 0
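A self-contained version of the approach above, for anyone who wants to run it directly. The frame is rebuilt from the printed output in the answers. Note that diff() yields NaN on the first row, so the comparison with 0 is False there and the original value is kept.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"2016": [10, 15, 70, 90, 93, 99],
                   "2017": [21, 34, 40, 53, 53, 53]})

# diff() is NaN on the first row, so eq(0) is False there and the value is kept
df["new"] = np.where(df["2017"].diff().eq(0), 0, df["2017"])
print(df)
```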

Spotfire Add several columns with a custom expression

I would like to add several columns to the Y axis of a bar chart using a custom expression. I have several columns whose names begin with "HB" or "PASS".
Their number and their names change every time I refresh the table, but HB or PASS always remains in the column name.
I tried to use this expression:
Sum($map("[$csearch([pvtable],"PASS*")]",","))/Count([SUBLOT_ID])
or
$map("[$csearch([pvtable],"PASS*")]",","))
If only one column has PASS or HB in its name, it works, but not if several columns have these keywords in their names.
Here is an example of my data. The values are percentages.
LOT_ID SUBLOT_ID WL_PART_CNT PASS_HB1 PASS_HB2 HB5 HB10 HB13 HB25
Q640123 01 3841 86 11 0.25 0.5 0.25 2
Q640123 05 3841 96 3 0 1 0 0
Q640123 10 3841 80 12 0 2 4 2
Q640123 16 3841 40 50 1 1 4 4
Q640123 22 3841 85 5 9 0.5 0.5 0
Q640345 01 3841 86 11 0.25 0.5 0.25 2
Q640345 05 3841 96 3 1 0 0 0
Q640345 10 3841 80 12 0 2 4 2
Q640345 16 3841 40 50 1 1 4 4
Q640345 22 3841 85 5 9 0.5 0.5 0
I want to put LOT_ID on X and all the PASS columns together on Y. I don't want to color my bar chart, but I would like a result like this: one bar chart with all the PASS columns and another with all the HB columns.
This bar chart represents HB.
Thank you for your help. Regards, Laurent
You shouldn't need the $map function, only the $csearch
Sum($csearch([pvtable],"PASS*")) /Count([SUBLOT_ID])
EDIT
After looking at your test data, you will need to map the values.
$map("sum([$csearch([pvtable],"PASS*")])","+"),$map("sum([$csearch([pvtable],"HB*")])","+")
Then, on your X-AXIS you will need: <[LOT_ID] NEST [Axis.Default.Names]>
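To make the intent of the mapped expression concrete, here is a rough pandas analogue. It is not Spotfire code: it only illustrates the idea of picking every column whose name starts with a prefix, summing each one per LOT_ID, and adding those sums together. The helper name sum_by_prefix is hypothetical, and the rows are the sample data from the question with the SUBLOT_ID and WL_PART_CNT columns left out.

```python
import pandas as pd

cols = ["LOT_ID", "PASS_HB1", "PASS_HB2", "HB5", "HB10", "HB13", "HB25"]
rows = [
    ["Q640123", 86, 11, 0.25, 0.5, 0.25, 2],
    ["Q640123", 96, 3, 0, 1, 0, 0],
    ["Q640123", 80, 12, 0, 2, 4, 2],
    ["Q640123", 40, 50, 1, 1, 4, 4],
    ["Q640123", 85, 5, 9, 0.5, 0.5, 0],
    ["Q640345", 86, 11, 0.25, 0.5, 0.25, 2],
    ["Q640345", 96, 3, 1, 0, 0, 0],
    ["Q640345", 80, 12, 0, 2, 4, 2],
    ["Q640345", 40, 50, 1, 1, 4, 4],
    ["Q640345", 85, 5, 9, 0.5, 0.5, 0],
]
df = pd.DataFrame(rows, columns=cols)


def sum_by_prefix(frame, prefix):
    """Hypothetical helper: per-lot total over all columns starting with prefix."""
    # Pick the matching columns by name, however many there are after a refresh
    picked = [c for c in frame.columns if c.startswith(prefix)]
    # Sum each picked column per lot, then add the column sums together
    return frame.groupby("LOT_ID")[picked].sum().sum(axis=1)


pass_total = sum_by_prefix(df, "PASS")
hb_total = sum_by_prefix(df, "HB")
print(pass_total)
print(hb_total)
```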

Find total no of links to and from node based on data in csv

I have a csv with the following info
Src Rx LinkId Weight
===================================
2 1 4000 10
2 1 4056 15
3 1 4100 10
3 1 4156 15
28 1 10650 8
113 2 15051 205
113 3 15058 205
1 4 3952 9
1 4 3951 5
1 4 3950 34
2 4 4052 9
47 4 18672 44
47 4 18670 38
69 4 4701 11
69 4 4700 21
70 4 4801 11
The linkId is unique. Each row represents the link between two devices. For example, source 2 and rx 1 means that a link goes from 2 to 1.
I intend to compute, for each device, the total weight of all links originating from it and the total weight of all links coming into it, like so:
Device Out weight In weight
=============================
2 25 205
1 48 58
and so on.
I would like to know whether this is possible in Excel and, if so, how.
A pivot table may be the best solution here: if you select this table and insert a pivot table, I think it will give you your answer.
Alternatively, you can make one column for incoming and one for outgoing weight, use a formula such as =SUMIF(Src, 1, Weight) for each device, and then use the totals at the bottom of each column.
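Outside Excel, the same two totals can be sketched with pandas: group by Src for the outgoing weight and by Rx for the incoming weight, then join the two results on the device id. This is an illustration using only the rows shown in the question. With just those rows, device 2's outgoing total comes to 34 rather than the 25 in the sample output, while device 1 matches (48 out, 58 in).

```python
import pandas as pd

links = pd.DataFrame({
    "Src":    [2, 2, 3, 3, 28, 113, 113, 1, 1, 1, 2, 47, 47, 69, 69, 70],
    "Rx":     [1, 1, 1, 1, 1, 2, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4],
    "Weight": [10, 15, 10, 15, 8, 205, 205, 9, 5, 34, 9, 44, 38, 11, 21, 11],
})

# Total weight per originating device and per receiving device
out_w = links.groupby("Src")["Weight"].sum().rename("Out weight")
in_w = links.groupby("Rx")["Weight"].sum().rename("In weight")

# Outer-join on the device id; devices missing on one side get 0
summary = pd.concat([out_w, in_w], axis=1).fillna(0).astype(int)
summary.index.name = "Device"
print(summary)
```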

Table transformation in Excel

I have a spreadsheet with data in the following format:
CarID Day DistanceTraveled
Ford1 1 10
Ford1 2 12
Nissan1 1 13
Ford1 3 41
Nissan1 2 20
Nissan1 3 10
...
And so on. There are a few hundred records in this format, covering a few dozen cars.
I have to transform it into the following format:
Day Ford1 Nissan1
1 10 13
2 12 20
3 41 10
Is there a fast, automatic way to achieve this in Excel?
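In Excel itself, a pivot table with Day as rows, CarID as columns and DistanceTraveled as values produces this layout. For readers who can take the data outside Excel, the same long-to-wide reshape can be sketched in pandas; this is an illustration only, and it assumes each (Day, CarID) pair occurs exactly once, as in the sample.

```python
import pandas as pd

df = pd.DataFrame({
    "CarID": ["Ford1", "Ford1", "Nissan1", "Ford1", "Nissan1", "Nissan1"],
    "Day": [1, 2, 1, 3, 2, 3],
    "DistanceTraveled": [10, 12, 13, 41, 20, 10],
})

# One row per Day, one column per CarID; pivot raises on duplicate (Day, CarID) pairs
wide = df.pivot(index="Day", columns="CarID", values="DistanceTraveled")
print(wide)
```

If duplicate pairs can occur, pivot_table with an aggregation such as sum would be the safer choice.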
Just for the sake of an answer:
