Gephi - how to generate weight for edge thickness - excel

I have a small amount of CSV data counting connections between different countries, with only three columns, e.g.:
I can use this (about 100 rows) to create a nice network visualization in Gephi where node sizes can be resized on degree.
However, ideally I'd like the edges to be weighted in size/thickness based on how many connections there are. E.g. in the image above, UK and USA are connected twice, so their edge would be twice as thick as the edge connecting Greece and Nepal.
Is there any way to generate these weighting values automatically, either in Gephi or in Excel?
The one problem is that the countries are not in a standard order between source and target (e.g. USA and UK in the example above are in different orders, UK coming in the source column for one connection and USA coming first in the source column for the other connection).
Basically, I just need a way to auto count the node connections to make a value for each edge popularity/occurrence. Thanks.

Well, did this with two helper columns using match():
So, edited based on the comments and using countif() to count multiple occurrences:
=COUNTIF(F$3:F$12,"="&CONCATENATE(B3,C3))+COUNTIF(F$3:F$12,"="&CONCATENATE(C3,B3))
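For comparison, the same order-independent count can be sketched outside Excel. This is a minimal Python sketch, not the answerer's method: the sample rows are made up, and the function name `edge_weights` is hypothetical. It sorts each pair so that (USA, UK) and (UK, USA) count as the same edge, then prints one line per edge with its count, which could be saved as an edge list with a Weight column.

```python
from collections import Counter

def edge_weights(rows):
    """Count how often each unordered pair of nodes occurs."""
    counts = Counter()
    for source, target in rows:
        # Sort the pair so (USA, UK) and (UK, USA) share one key.
        counts[tuple(sorted((source, target)))] += 1
    return counts

# Made-up sample rows in the spirit of the question.
rows = [("UK", "USA"), ("USA", "UK"), ("Greece", "Nepal")]
weights = edge_weights(rows)

print("Source,Target,Weight")
for (a, b), w in sorted(weights.items()):
    print(f"{a},{b},{w}")
```

Unlike the COUNTIF formula, which repeats the count on every row, this yields one row per distinct edge.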

Related

How to visualize these values grouped by rows in Power bi desktop

This is a table consisting of a statistical summary of machining parts manufactured and measured.
The PIDs are different classes of parts, so a PID 123456 can have 100s of parts under it. Each machining part has 4 attributes: pitch diameter, POW diameter, major diameter and minor diameter. Unfortunately, the report is generated such that the data is in rows and not in adjacent columns, which would have been easier to visualize later.
How can I parse/group these row values so that I can store the part info in an object that has the PID and the other measurements, with the date manufactured, for each part? I want to be able to use this sorted information in visualizations later. I would like to differentiate between the parts by the time/date they were manufactured.
For PID 123456, I have 2 parts and each part has 4 properties. So for example, how could I draw a chart for the upper and lower values for the major diameter or minor diameter of different parts (under the same PID)? Thank you.

excel PowerPivot Auto Calculated Measures & Columns

After looking at a few similarish questions I figured I needed something more specific so asking here. I will start by explaining the situation:
The Setup
I have a Store which sells Cakes, Cookies and Wine. I have the weekly sales data of each product sorta like this:
Product ID | Product Name    | Quantity | Value | Week Ending
1          | Gingerbread     | 2        | £4    | 13/01/22
2          | Chocolate chip  | 5        | £25   | 13/01/22
3          | Red Wine Bottle | 1        | £10   | 13/01/22
4          | Sponge Cake     | 3        | £9    | 13/01/22
Currently every week's data is stored within the same table, with me using a Week filter to show only the week I'm interested in.
Using this Data I created PivotTables that shows the sales of each category, with the ability to drill down to show the specific products. Table looks something like this:
Category | Quantity | Value
Cakes    | 2        | £4
Cookies  | 7        | £29
Wine     | 1        | £10
The issue
I now want to stick in a new calculated column that shows the Value as a %. E.g. the total value for the previous table was £43, so Cookies is about 67%. If I drill down, it would show the Chocolate Chip record as 80% and Gingerbread as 20%.
I imagine doing this would be easier if each individual week's data was on a different table, but I got a lot of weeks and I also want to do tables showing the sales for over a period of time. Plus I don't know of a way to merge the "value" and "quantity" columns, etc instead of having 1 for each week being shown.
Any advice would be appreciated.
Create an extra column in the source table (prior to filtering) entitled "perc", calculated as the corresponding value for each row divided by the total value across all rows (see pic. / eqn. for first row below):
=E2/$E$6
No calculated fields required: just include perc as the measure of interest in your pivot table, with the value setting as 'sum':
The reason this works is the common denominator, which allows one to sum ratios on a 1:1 basis.
Devising a calculated field using the standard 'fields, items & sets' functionality for ordinary pivot tables would not be feasible / possible as far as I am aware. You would need to move into the realm of power pivots and data models, which is not too complicated (readily accessible directly from the field list per below); however, I see this as an unnecessary complication for the task at hand.
Side notes:
Using table names in your functions is sometimes more convenient when entering, although it may appear tricky at first when reviewing. The first eqn above becomes:
=[@Value]/Table1[[#Totals],[Value]]
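The common-denominator argument can be checked with a small worked example. This is only a Python sketch under stated assumptions: the rows reuse the product values from the question, the category assignments and the name `share_by_category` are hypothetical, and all rows are assumed to belong to a single week. With several weeks in one table, the denominator would have to be that week's total rather than the total across all rows.

```python
from collections import defaultdict

# Hypothetical rows: (category, product, value in pounds) for one week.
sales = [
    ("Cookies", "Gingerbread", 4),
    ("Cookies", "Chocolate chip", 25),
    ("Wine", "Red Wine Bottle", 10),
    ("Cakes", "Sponge Cake", 9),
]

def share_by_category(rows):
    """Sum the per-row share of the grand total within each category."""
    total = sum(value for _, _, value in rows)
    shares = defaultdict(float)
    for category, _, value in rows:
        # value / total plays the role of the helper "perc" column.
        shares[category] += value / total
    return dict(shares)

shares = share_by_category(sales)
for category, share in shares.items():
    print(f"{category}: {share:.1%}")
```

Because every row is divided by the same total, adding the row-level shares gives the same result as dividing the category total by the grand total.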

Creating independent variable Stata in Panel Data model

Data and description of variables
Picture 1 and Sample unbalanced paneldata
Picture 1 shows a balanced panel data that I have created using an unbalanced one provided as a sample in the same image, where I had multiple products (ID) for different amount of years (YEAR). For each product, there were a different number of Shops offering the given product (ID). So as stated, this is a balanced set created by sorting out for the same years, same products (ID), and same shops (marked by the orange area in the sample unbalanced paneldata). This is an important assumption that might affect the perception of the issue stated below. The following is therefore a description of the table shown in Picture 1:
Years indicates the periods in which a given product (ID) is observed
Shop 1, Shop 2, Shop 3 indicate different prices for a given product (ID) by different firms
The minimum and second minimum values show which shops have the lowest and second lowest price for a given year and product (ID). This is needed to calculate the Price difference, which is (Second minimum value - Minimum value) / (Minimum value)
An example of this, is given for row 5 (Year 01.01.1995 - ID 101) where Price difference would be (3999-3790)/3790 = 5,51% (In Picture 1)
Issue
In my balanced panel data (Picture 1), I want to run a fixed effect regression in Stata using the xtreg command, where the dependent variable is the Price difference and the numbers of shops selling a product are the independent variables. This is so I can say how the Price difference is affected when there is one shop selling, when there are two shops selling, and when there are three shops selling.
Another problem: is my assumption of creating a balanced panel valid at all? Is it correct to create a balanced panel from the unbalanced panel data, or must I use the unbalanced panel to create such a variable?
So my main issue is how to create independent variables that measure the number of shops offering products. To clarify what I mean, I have included an example of a fixed effect regression that may explain the structure I am looking for, in Picture 2 below:
NOTE (In picture 2 expected cell mean to the right is the same as Price difference in Picture 1, and is used as dependent variable. They are regressed on number of firms/shops as independent variables, and these I have an issue creating)
Picture 2
What I have tried
I have tried using dummy variables on shops, but they ended up getting dropped. The dataset provided in Picture 1 is a balanced data set as mentioned, which (I assume) is needed to run a fixed effect regression on panel data.
End remark
I asked this question earlier in a much more imprecise manner, and I apologise for any inconvenience. The problem, I think, might be that I have set it up wrong in Excel, hence the dummies are dropped, or something of that nature. It might also be that I have to use the unbalanced set in order to create this independent variable, so the problem may be that I am attempting to use a balanced set instead of the unbalanced one.
In your unbalanced sample (as we discussed in the comments, the balanced sample will not make sense), we first need to create a variable for the number of shops offering each ID. Let us say we have the same data as in the top portion of your Picture 1:
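egen number_of_firms = rownonmiss(Shop*) // count the non-missing shop prices in each row
xtset ID year // to use xtreg, we must tell Stata the data are panel
xtreg Price_difference i.number_of_firms, fe // fe requests fixed effects; xtreg defaults to random effects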
The xtreg is the regression shown in your Picture 2.
If you want the number of firms variable to be formatted a bit more like Picture 2, you can do something like this:
qui levelsof number_of_firms, local(num)
foreach n in `num' {
    local lab_def `lab_def' `n' "`n' Firms"
}
label def num_firms `lab_def'
label values number_of_firms num_firms
label var number_of_firms "Number of Firms"
And then run the regression and the output will be formatted with the number of firms labels.
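To make the two derived variables concrete, here is a minimal Python sketch of what they compute for a single row. It is an illustration only, not a replacement for the Stata code: the function names and the sample row are hypothetical, and a missing price is assumed to be represented as None or NaN. The price difference follows the definition in the question, (second minimum - minimum) / minimum.

```python
import math

def _present(prices):
    """Keep only the non-missing prices (None and NaN count as missing)."""
    return [p for p in prices if p is not None and not math.isnan(p)]

def number_of_firms(prices):
    """Count the shops with a non-missing price in this row."""
    return len(_present(prices))

def price_difference(prices):
    """(second minimum - minimum) / minimum, or None with fewer than two prices."""
    present = sorted(_present(prices))
    if len(present) < 2:
        return None
    return (present[1] - present[0]) / present[0]

# Hypothetical row: prices at Shop 1, Shop 2, Shop 3 (Shop 3 missing).
row = [3999.0, 3790.0, None]
print(number_of_firms(row))
print(f"{price_difference(row):.2%}")
```

With the prices 3999 and 3790 from the question's example, this reproduces the 5.51% price difference, and a row with only one shop has no defined price difference at all.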

making big data set smaller in excel

I made a little test machine that accidentally created a 'big' data set:
6 columns with +/- 550.000 rows.
The end result I am looking for is a graph with 6 lines: the horizontal axis showing the 1 - 550.000 measurements and the vertical axis the values in the rows (capped at 200 or so). The data is a resistance measurement that should be between 0 - 30 or very big (broken); the software writes 'inf' in these cases.
My skill is limited to excel, so what have I done until now:
Imported in Excel. The measurements are valuable between 0 - 30 and inf is not good for a graph, so I capped the values: =IF(cell>200,200,cell).
Now making a graph is a time-consuming exercise and Excel does not like this; the result is not good.
So I would like to take the average value of 60 measurements to reduce the rows to below 10.000. So =AVERAGE(H1:H60)
But I cannot get this to work.
Questions:
How do I reduce this data set and get a good graph?
Should I switch to other software that is more applicable?
FYI: I already changed the software of the testing device to take the average value of a bunch of measurements the next time... But I cannot repeat this test.
Download link of data set comma separated file 17MB
I think you are on the right track, however my guess is that you only want to get an average every 60 rows and are unsure how to do this.
Using MOD(Number, Divisor) inside an if statement will let you specify that the average should be calculated only once in every x number of cells.
Assuming you'll have one row above your data table for headers, you are looking for something along the lines of:
=IF(MOD(ROW(A61),60) = 1,AVERAGE(H2:H61),"")
Once you have this you can filter your average column to non-blank values and use this to create your graph.
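If switching tools is an option, the same reduction can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the function name `block_means` and the sample readings are made up, each column is assumed to have been read into a list of floats (Python's `float("inf")` parses the 'inf' text directly), and the cap of 200 and block size of 60 come from the question.

```python
def block_means(values, block=60, cap=200.0):
    """Cap each reading at `cap`, then average consecutive blocks of `block` rows."""
    capped = [min(v, cap) for v in values]  # min(inf, 200.0) is 200.0
    means = []
    for start in range(0, len(capped), block):
        chunk = capped[start:start + block]  # the last chunk may be shorter
        means.append(sum(chunk) / len(chunk))
    return means

# Made-up readings, parsed from text the way the device writes them.
readings = [float(x) for x in ["12.5", "inf", "14.0", "13.5"]]
print(block_means(readings, block=2))
```

With blocks of 60, roughly 550.000 rows shrink to about 9.167 averaged points per column, which is below the 10.000 target.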

How to generate random numbers within a normal distribution using Excel

I want to use the RAND() function in Excel to generate a random number between 0 and 1.
However, I would like 80% of the values to fall between 0 and 0.2, 90% of the values to fall between 0 and 0.3, 95% of the values to fall between 0 and 0.5, etc.
This reminds me that I took an applied statistics course once upon a time, but not of what was actually in the course...
What is the best way to go about achieving this result using an Excel formula? Alternatively, what is this kind of statistical calculation called, or are there any other pointers that I can Google around for?
=================
Use case:
I have a single column of meter readings, which I would like to duplicate 7 times (one column for each new month). Each column has 55 000 rows. While the meter readings need to vary for each month, when taken as a time series, each meter number should have 7 realistic readings.
The aim is to produce realistic data to turn into heat maps (i.e. flag outlying meter readings)
I don't think that there is a formula which would fit exactly to your requirements. I would use a very straightforward solution:
Generate 80% of data using =RANDBETWEEN(0,20)/100
Generate 10% of data using =RANDBETWEEN(20,30)/100
Generate 5% of data using =RANDBETWEEN(30,50)/100
and so on
You can easily change the precision of generated data by modifying the parameters, for example: =RANDBETWEEN(0,2000)/10000 will generate data with up to 4 digits after decimal point.
UPDATE
Use a normal distribution for the use case, for example:
=NORMINV(RAND(), 20, 5)
where 20 is a mean value and 5 is a standard deviation.
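Both approaches can be sketched in Python for comparison. This is a hedged sketch rather than a definitive implementation: the function names are hypothetical, the first three bands come from the answer above, and the last band (the remaining 5% between 0.5 and 1) is an assumption, since the question only says "etc.". The normal draw mirrors NORMINV(RAND(), 20, 5) by passing a uniform random number through the inverse CDF.

```python
import random
from statistics import NormalDist

# (probability, low, high). The final band is an assumption covering the
# remaining 5% of values.
BANDS = [(0.80, 0.0, 0.2), (0.10, 0.2, 0.3), (0.05, 0.3, 0.5), (0.05, 0.5, 1.0)]

def sample_banded(rng):
    """Pick a band by its probability, then draw uniformly inside it."""
    _, low, high = rng.choices(BANDS, weights=[b[0] for b in BANDS])[0]
    return rng.uniform(low, high)

def sample_normal(rng, mean=20.0, sd=5.0):
    """Inverse-CDF draw, the same idea as NORMINV(RAND(), mean, sd)."""
    # inv_cdf is undefined at exactly 0, so clamp the uniform draw.
    u = max(rng.random(), 1e-12)
    return NormalDist(mean, sd).inv_cdf(u)

rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [sample_banded(rng) for _ in range(10_000)]
share_low = sum(d <= 0.2 for d in draws) / len(draws)
print(f"share of draws at or below 0.2: {share_low:.3f}")
print(f"one normal draw: {sample_normal(rng):.2f}")
```

The banded sampler gives a stepped distribution, while the normal draw gives a smooth bell shape, which may look more realistic for meter readings.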
