Which feature values correlate with a negative metric - statistics

I have a data set with categorical values (like region, device type etc.) and a numeric metric value. Something like this (except my data set has many more unique values and categorical columns).
Region     Device  Metric Value
US         Laptop  5
US         PC      10
Japan      Laptop  5
Japan      PC      10
Australia  Laptop  5
Australia  PC      1
What is the right statistical model to figure out which feature values are affecting the negative metric value (i.e. Australia and PC would rank high and Laptop would rank medium)?
Thank you in advance.

Related

how to deal with out of range values in dataset (RSSI values)

I have a dataset for indoor localization. It contains about 520 columns, one per wireless access point, each holding the RSSI value for that access point. Each row holds the values from one scan of the signals a device can capture, and a device can capture at most about 20 access points per scan. A captured signal ranges from 0 dBm (the device is near the access point) down to -100 dBm (the device is far from the access point but can still capture the signal). The remaining access points, which are out of range of the scan, have been filled with a default value of +100. About 500 columns in each row hold this value (100 dBm), and which columns they are changes with the location. The question is: how should I deal with them?
One option is to impute (replace) the out-of-range values with a more reasonable value. There are several approaches you could take to do this:
Replacing the out-of-range values with the mean or median of the in-range values
Using linear interpolation to estimate the missing values based on the surrounding values
The choice will depend on the goal of your machine learning model and what you want to achieve.
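As a rough illustration of the first approach, here is a minimal Python sketch that replaces the +100 placeholder with the median of the in-range readings. It assumes the median is taken per access-point column, and that a column with no in-range reading at all falls back to -100 dBm; both choices, the function name `impute_not_detected`, and the tiny sample scans are assumptions made for the example, not part of the dataset described above.

```python
from statistics import median

NOT_DETECTED = 100  # placeholder the dataset uses for out-of-range access points


def impute_not_detected(rows, sentinel=NOT_DETECTED, fallback=-100):
    """Replace sentinel readings with the median in-range RSSI of the same column."""
    n_cols = len(rows[0])
    fills = []
    for c in range(n_cols):
        in_range = [row[c] for row in rows if row[c] != sentinel]
        # Assumption: a column that was never detected gets the weakest
        # detectable signal (-100 dBm) because there is nothing to average.
        fills.append(median(in_range) if in_range else fallback)
    return [
        [fills[c] if value == sentinel else value for c, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical scans: 3 rows (scans) x 3 columns (access points).
scans = [
    [-60, 100, -85],
    [-70, -90, 100],
    [100, -80, 100],
]
imputed = impute_not_detected(scans)
```

Whether a column-wise median is appropriate depends on the model; for some localization models, mapping the placeholder to a value just below the weakest real signal may preserve the "not detected" information better than an average does.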

Visualize Cumulative column data in Google Studio

I have data in CSV as follows
State  Cumulative Number of cases  Date
India  0                           12-Feb-2021
India  2                           13-Feb-2021
India  5                           14-Feb-2021
India  8                           15-Feb-2021
India  17                          16-Feb-2021
India  39                          17-Feb-2021
I want to un-accumulate 'Number of cases' to visualize it as a line chart in Google Studio. The X-axis will be 'Date' and the Y-axis will be 'Number of cases'.
When I currently create a line chart with 'SUM' as the metric, it just adds up the cumulative numbers, which is not correct. For example, at 14-Feb-2021 the point gets plotted as 7, which is the sum of 0+2+5. Ideally I want the incremental value to be plotted. Hence for 14-Feb-2021, the point should be the difference between 13-Feb and 14-Feb, which is 3.
Can anyone please share how this can be done? Thanks in advance.
One way that it can be achieved is by:
Chart: Time Series
Dimension: Date
Breakdown Dimension: State
Metric: Cumulative Number of cases; Running Calculation: Running Delta
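To make the intended result concrete, here is a minimal Python sketch of the same "difference from the previous point" calculation on the sample data from the question. It is only an illustration of the arithmetic, not of how Data Studio implements it; the function name `running_delta` is made up for the example, and leaving the first date as `None` (it has no previous value) is an assumption.

```python
cumulative = [
    ("12-Feb-2021", 0),
    ("13-Feb-2021", 2),
    ("14-Feb-2021", 5),
    ("15-Feb-2021", 8),
    ("16-Feb-2021", 17),
    ("17-Feb-2021", 39),
]


def running_delta(series):
    """Return (date, increment) pairs, where increment = current total - previous total."""
    deltas = []
    previous = None
    for date, total in series:
        # Assumption: the first date has no predecessor, so its increment is None.
        deltas.append((date, None if previous is None else total - previous))
        previous = total
    return deltas


daily = running_delta(cumulative)
# 14-Feb-2021 -> 3, matching the value the question asks for
```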

Gephi - how to generate weight for edge thickness

I have a small amount of csv data counting connections between different countries, with only three cols, e.g.:
I can use this (about 100 rows) to create a nice network visualization in Gephi, where nodes can be sized by degree.
However, ideally I'd like the edges to be weighted in size/thickness based on the number of connections. E.g. in the image above UK and USA are connected twice, so their edge would be twice as thick as the edge connecting Greece and Nepal.
Is there any way to generate these weighting values automatically, either in Gephi or in Excel?
The one problem is that the countries are not in a standard order between source and target (e.g. USA and UK in the example above are in different orders, UK coming in the source column for one connection and USA coming first in the source column for the other connection).
Basically, I just need a way to auto count the node connections to make a value for each edge popularity/occurrence. Thanks.
Well, I did this with two helper columns using MATCH():
Edited based on the comments to use COUNTIF(), which counts multiple occurrences:
=COUNTIF(F$3:F$12,"="&CONCATENATE(B3,C3))+COUNTIF(F$3:F$12,"="&CONCATENATE(C3,B3))
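If preprocessing the CSV outside Excel is an option, the same order-independent count can be sketched in a few lines of Python. This is only an illustration: the sample pairs are taken from the countries mentioned in the question, the function name `edge_weights` is made up, and the `Source`/`Target`/`Weight` column names are an assumption about the layout you would import into Gephi.

```python
from collections import Counter

# Hypothetical rows: (source, target) pairs in whatever order they appear in the CSV.
edges = [
    ("UK", "USA"),
    ("USA", "UK"),
    ("Greece", "Nepal"),
]


def edge_weights(pairs):
    """Count connections per country pair, treating (A, B) and (B, A) as the same edge."""
    # Sorting each pair puts it in one canonical order before counting.
    return Counter(tuple(sorted(pair)) for pair in pairs)


weights = edge_weights(edges)

# One row per undirected edge, ready to be written out as Source,Target,Weight.
weighted_rows = [(a, b, n) for (a, b), n in sorted(weights.items())]
```

Sorting each pair plays the same role as the two COUNTIF terms in the formula above: both orderings of a pair end up in the same bucket.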

Pivot Table and Formulas

I have the following Pivot Table:
Product     Type    Calculated Field
TV          Small
Computer    Big
HeadPhones  Medium
In the Calculated Field column, I want to calculate the number of TVs that are small in the raw data from which the pivot table is created, based on the two parameters/lookup values: TV and Small.
I tried creating a calculated field using IF and SUM, but it's not working.
Is that possible? Any tips on how?
You are looking to summarize the data, not calculate it. You already put in your parameters, TV and Small. You just need to add another column that would be the summary. For example, if your raw data is like this:
Order number  Product   Type
101           TV        Small
102           TV        Big
103           TV        Big
104           Computer  Medium
105           TV        Small
106           TV        Small
Then you can use the Order number field as your summary field, and change the setting to Count. Your resulting PivotTable would be:
Product   Type    Count of Order number
TV        Small   3
TV        Big     2
Computer  Medium  1
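The count that the PivotTable produces is just a group-and-count over (Product, Type). As a minimal sketch of that logic outside Excel, using the sample rows above (the variable names are made up for the example):

```python
from collections import Counter

# Raw data rows: (order number, product, type)
orders = [
    (101, "TV", "Small"),
    (102, "TV", "Big"),
    (103, "TV", "Big"),
    (104, "Computer", "Medium"),
    (105, "TV", "Small"),
    (106, "TV", "Small"),
]

# Count how many order numbers fall into each (product, type) combination.
counts = Counter((product, type_) for _order, product, type_ in orders)
```

Here `counts[("TV", "Small")]` is 3, the same value the PivotTable shows in its "Count of Order number" column.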

Spotfire Hierarchy Percent Aggregation similar to Excel pivot group by

First let me say that I am extremely new to Spotfire (1 day).
Anyway, I have some data that looks like the following:
Region  Location  Animals  Cats  % of Cats
West    A         10       7     .7
East    A         15       10    .66
East    B         10       9     .9
West    B         5        1     .2
East    A         10       6     .6
West    B         15       5     .33
When I make a new visualization, I want everything to be grouped at the Region level and then drill down to the Location. What's happening is that it sums the percentages instead of calculating the percentage on the totals for each region.
Example:
Region % of Cats
West 123%
East 216%
Instead of this, I was expecting it to look more like a pivot table in Excel, which can automatically calculate the percentages at each level of the hierarchy.
The correct output should look like this:
Region % of Cats
West 43.33%
East 71.43%
That is, it takes the sum of all calls converted to sales and divides it by the sum of all calls to determine the converted percentage by region.
I've tried changing every aggregation method, rewriting the calculated column, etc., and all it ends up doing is summing up all the percentages, making the total exceed 100%.
Hopefully I explained this properly.
Thanks for the help
You need a bit of custom expression here. Instead of trying to aggregate "% of Cats" (it simply doesn't hold enough information), go to "Edit expression" (at the bottom of the form where you tried adding aggregations) and enter:
Sum([Cats]) / Sum([Animals])
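To see why the ratio of sums works where summing the row percentages does not, here is a minimal Python sketch of the same calculation on the sample rows from the question. It is only an illustration of the arithmetic, not Spotfire code, and the function name `pct_cats_by_region` is made up for the example.

```python
from collections import defaultdict

# Sample rows: (region, location, animals, cats)
rows = [
    ("West", "A", 10, 7),
    ("East", "A", 15, 10),
    ("East", "B", 10, 9),
    ("West", "B", 5, 1),
    ("East", "A", 10, 6),
    ("West", "B", 15, 5),
]


def pct_cats_by_region(data):
    """Return 100 * sum(cats) / sum(animals) for each region."""
    totals = defaultdict(lambda: [0, 0])  # region -> [cats, animals]
    for region, _location, animals, cats in data:
        totals[region][0] += cats
        totals[region][1] += animals
    return {region: 100 * cats / animals for region, (cats, animals) in totals.items()}


pct = pct_cats_by_region(rows)
```

With the sample rows this gives about 43.33% for West (13 cats out of 30 animals) and about 71.43% for East (25 out of 35). Grouping by (region, location) instead of region alone gives the drill-down level with the same expression.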
