Pandas Pivot table prevent filling empty values with 0 - python-3.x

I have a pandas DataFrame column (Coded Sentiment (NUM)) that holds numerical values, but not every row contains a value. In the bottom half of the DataFrame below, some rows are blank.
ContentID Coded Sentiment (NUM)
0 48a799ca7254c59f56daa3aa429f0e250ba294656ab1a6... 0
1 13674042c5f8e452abaddeec1d1509525f4a3cdfb9f3fb...
2 43f821e7431e024ee6b3fe2403847a888f148ffb737f42... -1
3 7f9e89d6c2b5b705ff3d1667410f6d21730424e5d79c52... 0
4 7f9e89d6c2b5b705ff3d1667410f6d21730424e5d79c52... 1
.. ... ...
313 58e18ae5a381450c6f24c5f72c2c71f49e795723d1310f... 1
314 19fd002ffbaab001a0aa2e8c2373aa5a932b6a40510830...
320 b3846c295d5cfe430a8c3faf4078ae7bccb50f1ec7e35e...
321 b3846c295d5cfe430a8c3faf4078ae7bccb50f1ec7e35e...
322 475bae274bcad23ce3947e4c910b20cfae4aad9aea24a8... 1
I need to create a pivot table that sums the numbers for each unique content ID:
program_data_sheet_sentiment_fix = program_data_sheet.pivot_table(index='ContentID', values='Coded Sentiment (NUM)', aggfunc= np.sum)
Here is where my problem arises: when I create the pivot table, any content ID whose rows have no number in the sentiment column gets a 0 in the pivot table.
What I need is for the pivot table not to turn blank values into 0. The value should stay blank if no numerical values are associated with that content ID. Blank and 0 mean different things in my data process, so it is important not to substitute 0 for blanks.
Ideally, for the content ID found at row indexes 320 and 321, the sum would simply be blank, since there is nothing in the sentiment column.
Hope this makes sense.
Can someone point me in the right direction?
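One possible approach, sketched here under assumptions: pandas' grouped `sum` accepts a `min_count` argument, and with `min_count=1` a group with no valid values sums to NaN instead of 0. The example uses `groupby` rather than `pivot_table`, since `pivot_table` with its default `dropna=True` can drop rows whose aggregated result is entirely NaN. The short IDs and the small frame are made-up stand-ins for the real data, and the `pd.to_numeric` step assumes the blanks might have been read in as non-numeric text.

```python
import numpy as np
import pandas as pd

col = "Coded Sentiment (NUM)"

# Hypothetical miniature of the data: id_b and id_e have only blank (NaN) rows.
program_data_sheet = pd.DataFrame({
    "ContentID": ["id_a", "id_b", "id_c", "id_d", "id_d", "id_e", "id_e"],
    col: [0, np.nan, -1, 0, 1, np.nan, np.nan],
})

# Guard (assumption): coerce anything non-numeric, such as stray text, to NaN.
program_data_sheet[col] = pd.to_numeric(program_data_sheet[col], errors="coerce")

# min_count=1 means "need at least one non-NaN value to produce a sum";
# groups with none yield NaN rather than 0.
summed = program_data_sheet.groupby("ContentID")[col].sum(min_count=1)
print(summed)
```

Here `id_a` keeps its genuine 0, while `id_b` and `id_e` stay NaN, so the two cases remain distinguishable.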

Related

Excel - assign values based on the first unique item

I have an Excel question that I cannot answer. Here is my table:
ID Key  Count Unique  Available  Text    Results
1       0                        Text-1  Dupe-Y
2       1             Y          Text-1  Y
3       0                        Text-1  Dupe-Y
4       0                        Text-1  Dupe-Y
5       1             N          Text-2  N
6       1             Y          Text-3  Y
7       0                        Text-2  Dupe-N
8       0             Duplicate  Text-2  Dupe-N
9       0             Duplicate  Text-2  Dupe-N
10      0             Y          Text-2  Dupe-N
ID Key is just a unique key.
Count Unique marks the first time each value in the Text column appears. Available can be Y, N, or Duplicate, and Text is the main column I need to analyze. Results works as follows: for the first appearance of each Text value (Count Unique = 1), if there is a value in Available, then that is the value I need; if Count Unique is 0, the result is either Dupe-Y or Dupe-N, depending on the value in Available.
I tried a formula like this one but got stuck after some initial progress: =IF(B2=0,"",IFERROR(IF(COUNTIF(D:D,D2)>1,IF(COUNTIF($D:$D,D2)=1,"",C2),1),1))
Note that the Results column is the one I need to populate, using a formula that is not affected by how the data is sorted.
I guess you already have all those values and just need a formula for the Results column.
My formula will work only if the data is sorted as in your example. If the sorting changes, the formula will fail.
My formula is:
=IF(B2=1;D2;"Dupe-"&RIGHT(G1;1))
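For comparison, the same rule can be sketched outside Excel in a way that does not depend on row order. This is a pandas illustration, not the formula the answer uses. It assumes, reading from the sample table, that every Text value has exactly one row with Count Unique = 1, and that the Dupe- suffix comes from that row's Available value. The helper name `add_results` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "ID Key": range(1, 11),
    "Count Unique": [0, 1, 0, 0, 1, 1, 0, 0, 0, 0],
    "Available": ["", "Y", "", "", "N", "Y", "", "Duplicate", "Duplicate", "Y"],
    "Text": ["Text-1", "Text-1", "Text-1", "Text-1", "Text-2",
             "Text-3", "Text-2", "Text-2", "Text-2", "Text-2"],
})

def add_results(table: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill Results without relying on sort order."""
    is_first = table["Count Unique"] == 1
    # Map each Text value to the Available value of its Count Unique = 1 row.
    lookup = table.loc[is_first].set_index("Text")["Available"]
    base = table["Text"].map(lookup)
    out = table.copy()
    # Flagged rows keep the value itself; all other rows get a "Dupe-" prefix.
    out["Results"] = base.where(is_first, "Dupe-" + base)
    return out

result = add_results(df)
print(result)
```

Because the lookup is keyed on Text rather than on the previous row, shuffling the rows before calling `add_results` gives the same Results per ID Key.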

Multiple Calculated Columns with Filters within Single Pivot Table

I am working to create a Pivot Table in Excel that has multiple calculated fields that take into account whether a condition is true, but cannot figure out how to create this within a single Pivot Table.
Behind the scenes, I have a set of data with a column that can have these values: A,B,C,D,F,L, or R.
I have another column that is a dollar value, and another for Quantity.
I am trying to calculate the price per unit when the Category is A, B, or C, and also the price per unit when the Category is D.
I can create multiple Pivot Tables and tie them to the same slicer, but there is a problem: I need to display the spending in each of the two categories for each location, and when I create two separate Pivot Tables, locations that have no spending in one of the categories are excluded. In the case below, location 2 has no spending in Category D, so it does not show up in the second Pivot Table.
Here is a portion of my data set. The whole data set is over 100,000 rows and will change over time, so I need a long-term solution.
Location Category Volume Quantity
1 A $120.32 6000
3 A $30.08 1300
3 A $60.16 2600
1 B $39.91 1000
2 B $318.50 13000
2 C $196.00 8000
1 D $220.50 8100
3 D $171.50 6300
3 D $35.90 1000
3 D $53.85 1500
2 F $416.50 0
1 L $24.50 0
2 L $30.08 0
1 R $55.13 0
2 R $55.13 0
3 R $110.26 0
Thanks in advance for your help and let me know if I need to clarify anything!
Alright, a little more digging yielded the answer.
A slicer should be included that has all locations selected, then it should be tied to all Pivot Tables in use.
For all Pivot Tables, go to Field Settings, then click on the Layout and Print Tab. Check the box 'Show items with no data'.
Then go to Pivot Table Options and select the values you would like to display for blanks and errors.
I found the solution here.
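To make the goal concrete, here is a rough pandas sketch of the same calculation outside Excel: price per unit for categories A/B/C and for category D, with every location kept even when it has no data in one group. It uses the sample rows from the question and treats the Volume column as the dollar value; the group labels `ABC` and `D` are names chosen for this illustration.

```python
import pandas as pd

rows = [
    (1, "A", 120.32, 6000), (3, "A", 30.08, 1300), (3, "A", 60.16, 2600),
    (1, "B", 39.91, 1000), (2, "B", 318.50, 13000), (2, "C", 196.00, 8000),
    (1, "D", 220.50, 8100), (3, "D", 171.50, 6300), (3, "D", 35.90, 1000),
    (3, "D", 53.85, 1500), (2, "F", 416.50, 0), (1, "L", 24.50, 0),
    (2, "L", 30.08, 0), (1, "R", 55.13, 0), (2, "R", 55.13, 0),
    (3, "R", 110.26, 0),
]
df = pd.DataFrame(rows, columns=["Location", "Category", "Volume", "Quantity"])

groups = {"ABC": ["A", "B", "C"], "D": ["D"]}  # illustrative labels
all_locations = sorted(df["Location"].unique())

result = pd.DataFrame(index=all_locations)
for name, cats in groups.items():
    sums = (df[df["Category"].isin(cats)]
            .groupby("Location")[["Volume", "Quantity"]].sum())
    # reindex keeps locations with no rows in this group (they become NaN),
    # which plays the role of 'Show items with no data'.
    result[name] = (sums["Volume"] / sums["Quantity"]).reindex(all_locations)

print(result)
```

Location 2 appears in the output with NaN under `D` rather than being dropped.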

How do you group data in columns?

I have numeric data under fifty samples that are mostly similar. I want to count identical columns and give statistics on the same. There are too many rows to select them (37,888). Data looks like:
Sample 1 Sample 2 Sample 3 ........ Sample 50
4 4 0
4 4 0
4 4 ...
0 0
0 0
0 0
0 0
... ...
up to thousands of rows for each sample.
There is a column for date/time as well; it would be nice if I could include that in the grouping.
This snippet stands in for many rows. Samples 1 and 2 are identical and hence should be grouped together. Sample 3 would form another group, and so on.
While I'm not sure what "There are too many rows to select them" means in this context (there is no limit on the number of rows or items that can be selected and included in a formula), this looks like a job for array formulas.
If you want to determine (for instance) whether columns C and D are equal, from rows 1 through 37888, you can use this formula:
=AND(C1:C37888=D1:D37888)
To make Excel treat this as an array formula, you need to press CTRL-SHIFT-ENTER (Windows) or CMD-ENTER (Mac) after typing the formula. The "AND" function will return TRUE if and only if all corresponding entries are equal: C1=D1, C2=D2, C3=D3, ..., C37888=D37888. It returns FALSE if any corresponding entries disagree.
Exactly what you do next will depend on the nature of the statistics that you want to compute for each group, but this formula will at least help you figure out which columns belong in the same group together.
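If the data can be loaded into pandas, the pairwise comparison generalizes to grouping all identical columns at once. The following is a minimal sketch under assumptions: the three-column frame is invented to resemble the snippet in the question, and the columns are assumed to contain no missing values, since NaN entries would not compare equal.

```python
import pandas as pd

# Invented stand-in for the 50-sample sheet.
df = pd.DataFrame({
    "Sample 1": [4, 4, 4, 0, 0],
    "Sample 2": [4, 4, 4, 0, 0],
    "Sample 3": [0, 0, 1, 0, 0],
})

# Pairwise check, the counterpart of =AND(C1:C37888=D1:D37888).
same_1_2 = df["Sample 1"].equals(df["Sample 2"])

# Group all columns by their full contents: identical columns share a key.
groups = {}
for name in df.columns:
    groups.setdefault(tuple(df[name]), []).append(name)

identical_groups = list(groups.values())
print(same_1_2, identical_groups)
```

Each inner list in `identical_groups` holds the names of columns that match row for row, so group sizes and per-group statistics can be computed from there.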

How to plot multiple grouped data in one excel scatter plot with lines

I am having difficulty plotting grouped data (by index) in one graph (scatter plot with lines) in Excel, and I would greatly appreciate your help.
My data are in three columns:
The first column is the index of the data or the group (i.e. a unique number for every set of data)
the second column is the time
and the third column is the data
Group, Time, Data
1 1 12
1 3 12
1 4 28
1 8 56
1 12 37
1 24 40
1 48 34
2 0 7
2 1 14
2 4 6
2 8 63
2 12 4
2 24 35
2 48 3
and so on.
I want to plot the data vs. time for each index, i.e. each data group on its own, but on the same graph.
Until now, I have always done it manually by adding each data set separately to the graph. But I think there should be a smarter and easier way to do it, especially since I sometimes have a lot of data (the index number can reach 70 or 80).
Thanks a lot in advance.
You can create a pivot table on all your data. Use 'Group' as column headers and 'Time' as row headers. The resulting pivot table will have all time points from all groups as rows and your groups as columns. Each column, of course, has entries only at the time points that are included in its group; the other cells are empty. If you select the data range of this pivot table without the column headers, you can create charts from it, because the chart omits empty cells.
Update
That is the result pivot table of your test data. The sorted data are in the red frame. (Forget the total results)
A way to do this in Excel 365 is:
Select the data
Go to Data -> From Table/Range to open the Power Query editor
Select the columns with grouped data
Select Transform -> Pivot Column
Select the column with the values corresponding to the grouped data
Under Advanced Options change the value aggregation to Don't aggregate
Click OK, then Home -> Close and Load
This should give you the data formatted in such a way that you can select it and create a chart as normal.
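The reshaping that both answers describe (one row per time point, one column per group, empty cells where a group has no measurement) can be sketched in pandas as well. This is an illustration using the sample rows from the question, not part of either Excel procedure.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": [1] * 7 + [2] * 7,
    "Time": [1, 3, 4, 8, 12, 24, 48, 0, 1, 4, 8, 12, 24, 48],
    "Data": [12, 12, 28, 56, 37, 40, 34, 7, 14, 6, 63, 4, 35, 3],
})

# One row per Time, one column per Group; no aggregation is applied, and
# time points missing from a group become NaN (the "empty cells").
wide = df.pivot(index="Time", columns="Group", values="Data")
print(wide)
```

Each column of `wide`, with its NaN entries dropped, can then be drawn as its own line on a shared chart, so the number of groups no longer has to be handled by hand.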

SSAS - Data Warehouse structure and the Unknown value

I have a table that shows summed monthly values grouped by different analysis codes
TableId Month Value Analysis1ID Analysis2ID
1 1 100 1 NULL
2 1 50 NULL 3
3 1 50 2 NULL
4 1 50 3 NULL
I have set the above as a fact table (also have a dimension for the analysis values).
As you can see the table has a new row for each unique ID for the analysis column.
We are then analysing the data in Excel, simply summing the Value column and grouping by Analysis1ID and Month.
This gives us:
AnalysisID1 1 = 100
AnalysisID1 2 = 50
AnalysisID1 3 = 50
Unknown = 50
Total = 250
This all looks OK apart from Unknown, which is the summed total of the NULL rows.
I have tried excluding the NULL value in the dimension by setting the UnknownMember to "Hidden".
This does work, but it does not exclude the amount from the total. How can I exclude it from the total value?
I am guessing that the table structure is not correct for this data, but I'm unsure how else to structure it.
Any help or guidance would be appreciated
I would not have NULL values in dimension members; in the past I've always used an Unallocated member with an ID of -1.
You could then use Cube Security to filter out the Unknown or Unallocated members.
I would Filter that row out using Excel. Right-click on the cell labelled 'Unknown' and you can choose Filter / Hide Selected Items.
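The effect of the Unallocated-member idea on the totals can be illustrated with a small pandas sketch. This is not SSAS configuration, only a model of the arithmetic, using the fact rows from the question; the constant name `UNALLOCATED` is chosen for this example.

```python
import pandas as pd

fact = pd.DataFrame({
    "TableId": [1, 2, 3, 4],
    "Month": [1, 1, 1, 1],
    "Value": [100, 50, 50, 50],
    "Analysis1ID": [1, None, 2, 3],
    "Analysis2ID": [None, 3, None, None],
})

UNALLOCATED = -1  # illustrative surrogate key replacing NULL

# Replace NULL keys with the explicit Unallocated member.
fact["Analysis1ID"] = fact["Analysis1ID"].fillna(UNALLOCATED).astype(int)

by_code = fact.groupby("Analysis1ID")["Value"].sum()

# Filtering out the Unallocated member removes its amount from the total too.
allocated_total = by_code.drop(UNALLOCATED).sum()
print(by_code, allocated_total)
```

With the member filtered out, the total is 200 rather than 250, which is the behaviour the question asks for.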
