How to match numeric columns in one table to similar columns in another table - statistics

I am currently working on a project with the following use case:
there are two data frames, one from the DB and the other from a user-uploaded file. The user-uploaded file has unnamed columns.
DB DATA FRAME
|customer_id | name | age | days_as_customer | revenue | payment |
|------------|------|-----|------------------|---------|---------|
| 00001 | x1 | 25 | 40 | 2000 |monthly |
| 00002 | x2 | 51 | 200 | 10000 | yearly |
USER UPLOADED FILE
|customer_id | C1 | C2 | C3 | C4 | c5 |
|------------|------|-----|------------------|---------|-----------|
| 00011 | x5 | 45 | 1 | 8000 |quarterly |
| 00022 | x6 | 33 | 20 | 1000 |half-yearly|
I need to match the columns of the user-uploaded file to the DB columns, i.e.:
output
{
age : C2,
days_as_customer : C3,
revenue : C4
}
Previously I tried the KS test, quantiles, skewness, and measures of central tendency, but the results were not as expected, because most of the time the uploaded file has central tendencies that are completely different from those of the DB data frame.
Can anyone suggest techniques to solve this?

Related

Insert column in another table for two already imported table in Spotfire

My question is, I'm sure, quite easy.
I have two tables, each with a column named ID.
The first one has all the IDs.
The second one has only a few.
I want to work on the missing IDs.
So I want to add a flag to my first table indicating whether each ID is present in the second table.
I'm using Spotfire 11.4. In a previous version, it was easy to "add column" using a table from the analysis. Now I can't find that option, and I have to import the same table again to do the merge.
Do you have any tips?
Table I
| ID | Country |
| -- | ------- |
| A1 | France |
| A2 | Germany |
| A3 | U.K. |
| A7 | U.S.A. |
Table II
| ID |
| -- |
| A2 |
| A7 |
Expected Table
| ID | Country | flag |
| -- | ------- | ---- |
| A1 | France | 0 |
| A2 | Germany | 1 |
| A3 | U.K. | 0 |
| A7 | U.S.A. | 1 |
OK, the option is hidden under "Other" when inserting columns (or rows) in the data canvas.
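Outside Spotfire, the same flag is a plain membership test. A minimal Python sketch, assuming the two tables have already been loaded into lists (the variable names are illustrative and are not part of any Spotfire API):

```python
# Table I rows as (ID, Country) and Table II as a list of IDs,
# using the sample data from the question.
table_1 = [("A1", "France"), ("A2", "Germany"), ("A3", "U.K."), ("A7", "U.S.A.")]
table_2 = ["A2", "A7"]

# A set makes each membership check cheap.
ids_in_table_2 = set(table_2)

# Append flag = 1 when the ID also appears in Table II, else 0.
flagged = [(id_, country, int(id_ in ids_in_table_2)) for id_, country in table_1]

for row in flagged:
    print(row)
```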

Excel, Multi-lookup/Match formula

I would like to use a formula in Sheet B to obtain the following transformation.
I want to look up stack, over, flow, super, user and the associated IDs and put them into the Sheet B format. The formula would be copied horizontally across many 'Names' and then down.
Current, sheet A:
+-------------+-------+-------+
| Position_ID | Name | Value |
+-------------+-------+-------+
| 5963650267 | stack | 10 |
| 5963650267 | over | 20 |
| 5963650267 | flow | 30 |
| 5963650267 | super | 40 |
| 5963650267 | user | 50 |
| 5963650268 | stack | 90 |
| 5963650268 | over | 110 |
| 5963650268 | flow | 80 |
| 5963650268 | super | 70 |
| 5963650268 | user | 20 |
+-------------+-------+-------+
Expected, Sheet B (headers and position IDs are already prepopulated):
+-------------+-------+------+------+-------+------+
| Position_ID | stack | over | flow | super | user |
+-------------+-------+------+------+-------+------+
| 5963650267 | 10 | 20 | 30 | 40 | 50 |
| 5963650268 | 90 | 110 | 80 | 70 | 20 |
+-------------+-------+------+------+-------+------+
Assuming the data in Sheet A is located at A1:C11 (adjust as required), enter this array formula in Sheet B at B2, then copy it to all required cells (i.e. C2:F2 and B3:F3):
=INDEX('Sheet A'!$C$1:$C$11,
MATCH(CONCATENATE($A2,"|",B$1),
CONCATENATE('Sheet A'!$A$1:$A$11,"|",'Sheet A'!$B$1:$B$11),0))
An array formula must be entered by pressing CTRL + SHIFT + ENTER together.
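The INDEX/MATCH formula above looks up a value by a composite "Position_ID|Name" key. As a rough illustration of the same idea outside Excel, here is a minimal Python sketch using the sample data from the question; the variable names are hypothetical:

```python
# Sheet A rows as (Position_ID, Name, Value), sample data from the question.
sheet_a = [
    (5963650267, "stack", 10), (5963650267, "over", 20), (5963650267, "flow", 30),
    (5963650267, "super", 40), (5963650267, "user", 50),
    (5963650268, "stack", 90), (5963650268, "over", 110), (5963650268, "flow", 80),
    (5963650268, "super", 70), (5963650268, "user", 20),
]
names = ["stack", "over", "flow", "super", "user"]

# Composite-key lookup, the counterpart of MATCH on "Position_ID|Name".
# If a pair repeats, the dict keeps the last value, whereas MATCH returns the first.
lookup = {(pos_id, name): value for pos_id, name, value in sheet_a}

# Position IDs in first-seen order (dicts preserve insertion order).
position_ids = list(dict.fromkeys(pos_id for pos_id, _, _ in sheet_a))

# One row per Position_ID, one column per name; None marks a missing pair.
sheet_b = [
    [pos_id] + [lookup.get((pos_id, name)) for name in names]
    for pos_id in position_ids
]

for row in sheet_b:
    print(row)
```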
Apologies for the formatting, but if you add the VLOOKUPs below to the empty grid of position IDs by name on Sheet B, it should give you the grid you're looking for.
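Sheet A, with an added ID&Name helper column:
+----------+-------------+-------+-------+
| ID&Name  | Position_ID | Name  | Value |
+----------+-------------+-------+-------+
| =C2&D2   | 1           | stack | 10    |
| =C3&D3   | 1           | over  | 20    |
| =C4&D4   | 1           | flow  | 30    |
| =C5&D5   | 1           | super | 40    |
| =C6&D6   | 1           | user  | 50    |
| =C7&D7   | 2           | stack | 90    |
| =C8&D8   | 2           | over  | 110   |
| =C9&D9   | 2           | flow  | 80    |
| =C10&D10 | 2           | super | 70    |
| =C11&D11 | 2           | user  | 20    |
+----------+-------------+-------+-------+
Sheet B, formulas:
+---+-----------------------------------------+-----------------------------------------+-----------------------------------------+-----------------------------------------+-----------------------------------------+
|   | stack                                   | over                                    | flow                                    | super                                   | user                                    |
+---+-----------------------------------------+-----------------------------------------+-----------------------------------------+-----------------------------------------+-----------------------------------------+
| 1 | =VLOOKUP($A14&B$13,$B$2:$E$11,4,FALSE) | =VLOOKUP($A14&C$13,$B$2:$E$11,4,FALSE) | =VLOOKUP($A14&D$13,$B$2:$E$11,4,FALSE) | =VLOOKUP($A14&E$13,$B$2:$E$11,4,FALSE) | =VLOOKUP($A14&F$13,$B$2:$E$11,4,FALSE) |
| 2 | =VLOOKUP($A15&B$13,$B$2:$E$11,4,FALSE) | =VLOOKUP($A15&C$13,$B$2:$E$11,4,FALSE) | =VLOOKUP($A15&D$13,$B$2:$E$11,4,FALSE) | =VLOOKUP($A15&E$13,$B$2:$E$11,4,FALSE) | =VLOOKUP($A15&F$13,$B$2:$E$11,4,FALSE) |
+---+-----------------------------------------+-----------------------------------------+-----------------------------------------+-----------------------------------------+-----------------------------------------+
Sheet B, results:
+---+-------+------+------+-------+------+
|   | stack | over | flow | super | user |
+---+-------+------+------+-------+------+
| 1 | 10    | 20   | 30   | 40    | 50   |
| 2 | 90    | 110  | 80   | 70    | 20   |
+---+-------+------+------+-------+------+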

Getting Summation from a Table, with matching values from another Table in Excel

I have 2 tables created in Excel, which are identical in structure and the column and row names.
The only difference is that the first table has data (for the effort in work days) in it while the second is a reference table stating which milestone each cell belongs to. A sample of these tables is:
TBL1:
| | App1 | App2 | App3 |
| T1 | 32 | 12 | 48 |
| T2 | 40 | 16 | 30 |
| T3 | 56 | 18 | 36 |
TBL2:
| | App1 | App2 | App3 |
| T1 | 1 | 2 | 3 |
| T2 | 2 | 1 | 2 |
| T3 | 1 | 1 | 1 |
I want to collate these values so that I get SUM of 1, 2 and 3
| | Days Summation |
| 1 | =32+56+16+18+36 |
| 2 | =40+12+30 |
| 3 | =48 |
So basically, want to find:
IF(COL_VAL_IN_TBL2=1) THEN SUM ALL VALUES IN TBL1 CORRESPONDING TO THE ROW-COL IN RESPECTIVE
Is it possible to get a formula which I can use to do this without using something like a Pivot Table?
You can use sumif() to do this:
Here it's just looking at table2 values and comparing them to your 1, 2, or 3 and then summing the corresponding cells from your table1
SUMIF will do the trick if I understand correctly:
Put 1 in A1, 2 in A2, and so on. Then enter =SUMIF(TBL2Range,A1,TBL1Range) in B1 and copy it down, where TBL2Range and TBL1Range are the addresses of the data cells in TBL2 and TBL1 (use absolute references so they do not shift when copied).
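To check what SUMIF should return for the sample tables, the same cell-by-cell pairing can be written out in a few lines of Python. This is only a sketch of the logic, not an Excel feature; the list names are illustrative:

```python
from collections import defaultdict

# TBL1: effort in work days (rows T1..T3, columns App1..App3).
tbl1 = [
    [32, 12, 48],
    [40, 16, 30],
    [56, 18, 36],
]
# TBL2: milestone number for the cell in the same position.
tbl2 = [
    [1, 2, 3],
    [2, 1, 2],
    [1, 1, 1],
]

# Walk both tables in parallel and add each effort to its milestone's total.
totals = defaultdict(int)
for effort_row, milestone_row in zip(tbl1, tbl2):
    for effort, milestone in zip(effort_row, milestone_row):
        totals[milestone] += effort

for milestone in sorted(totals):
    print(milestone, totals[milestone])
```

For the sample data this gives 158, 82 and 48 days for milestones 1, 2 and 3, matching the sums listed in the question.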

Count text occurrences in a column in Excel

I have the following list in Excel:
+-------+----------+
| am | ipiresia |
+-------+----------+
| 50470 | 29 |
| 50470 | 43 |
| 50433 | 29 |
| 6417 | 51 |
| 6417 | 52 |
| 6417 | 53 |
| 4960 | 25 |
| 4960 | 26 |
| 5567 | 89 |
| 6716 | 88 |
+-------+----------+
I want to add a column, say 'num', that counts the occurrences of each value in column 'am', increasing by one each time the value occurs again, as follows:
+-------+----------+-----+
| am | ipiresia | num |
+-------+----------+-----+
| 50470 | 29 | 1 |
| 50470 | 43 | 2 |
| 50433 | 29 | 1 |
| 6417 | 51 | 1 |
| 6417 | 52 | 2 |
| 6417 | 53 | 3 |
| 4960 | 25 | 1 |
| 4960 | 26 | 2 |
| 5567 | 89 | 1 |
| 6716 | 88 | 1 |
+-------+----------+-----+
Is it possible to get this automatically with a formula in Excel?
Yes.
My example:
(Assume your three-column table starts at Excel's origin, A1, without header lines.)
Fill C1 with the value 1,
then enter a formula in C2,
as simple as this:
=if($A2=$A1;$C1+1;1)
Then drag C2 down by its fill handle (at the bottom-right corner of the cell) as far as you want. In most cases, double-clicking the fill handle also makes Excel fill the column down to the end of your prefilled table. Note that this formula only works when equal values of 'am' sit in adjacent rows, and that the semicolon argument separator depends on your locale (use commas if your Excel expects them).
If you need help with AutoFill, press F1 in Excel and the help will explain it in detail.
Assuming the sample table starts at A1 (with headers) the following formula will provide the expected results even if the list is not sorted.
=COUNTIF($A$1:$A2,A2)
Enter the formula at cell C2 then paste it down to the last cell of the data (or use AutoFill)
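The two formulas behave differently on unsorted data: COUNTIF counts every earlier occurrence, while the IF version only compares with the row directly above. A small Python sketch of both, using the sample 'am' column (variable names are illustrative):

```python
from collections import Counter

# Column 'am' from the question.
am = [50470, 50470, 50433, 6417, 6417, 6417, 4960, 4960, 5567, 6716]

# Running count over the whole column so far, like =COUNTIF($A$1:$A2,A2).
seen = Counter()
num = []
for value in am:
    seen[value] += 1
    num.append(seen[value])

# Count of consecutive repeats only, like =IF($A2=$A1;$C1+1;1).
num_consecutive = []
for i, value in enumerate(am):
    if i > 0 and value == am[i - 1]:
        num_consecutive.append(num_consecutive[-1] + 1)
    else:
        num_consecutive.append(1)

print(num)
print(num_consecutive)
```

On this sample both lists are identical because equal values are adjacent; they would diverge if a value reappeared later in the column.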

Access Database Design (multilevel) with Exporting Issue for Excel

Hi, I am currently working on a project that contains individual information for each month, and I want to build a table or two to hold the information (I don't want to create a table for each month). A simple illustration:
Jan
weight height
student a
student b
Feb
weight height
student a
student b
student c
What I want is to export the data to Excel in the form above. The weight and height columns are fixed, but I want the data grouped by month so that its organization is clearer.
May I ask how to design the database so that the above-mentioned requirement can be met? Thanks.
Here are the tables you'll need to store the information:
students
id unsigned int(P)
name varchar(50)
+----+------+
| id | name |
+----+------+
| 1 | John |
| 2 | Mary |
| 3 | Tina |
| .. | .... |
+----+------+
In the measurements table the Primary Key is formed by the student_id, year and month. The student_id is also a foreign key to the students table.
measurements
student_id unsigned int(F students.id)-\
year unsigned int ----------------(P)
month unsigned int ---------------/
height unsigned int
weight unsigned int
+------------+------+-------+--------+--------+
| student_id | year | month | height | weight |
+------------+------+-------+--------+--------+
| 1 | 2013 | 11 | 70 | 200 |
| 2 | 2013 | 11 | 65 | 130 |
| 1 | 2013 | 12 | 70 | 192 |
| 2 | 2013 | 12 | 65 | 126 |
| 3 | 2013 | 12 | 68 | 140 |
| .......... | .... | ..... | ...... | ...... |
+------------+------+-------+--------+--------+
And then a query to extract the information:
SELECT name, height, weight, year, month
FROM students s
LEFT JOIN measurements m ON s.id = m.student_id
ORDER BY year, month, name
Which will give you:
+------+--------+--------+------+-------+
| name | height | weight | year | month |
+------+--------+--------+------+-------+
| John | 70 | 200 | 2013 | 11 |
| Mary | 65 | 130 | 2013 | 11 |
| John | 70 | 192 | 2013 | 12 |
| Mary | 65 | 126 | 2013 | 12 |
| Tina | 68 | 140 | 2013 | 12 |
+------+--------+--------+------+-------+
Which is the data you want, sorted in the way you want. Any further formatting of the data is up to your application.
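As one possible way to try this design out and do the "further formatting" step, here is a sketch using Python's built-in sqlite3 module instead of Access. The SQLite column types and the grouping code are assumptions for illustration; the query is the one given above:

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE measurements (
        student_id INTEGER REFERENCES students(id),
        year INTEGER, month INTEGER, height INTEGER, weight INTEGER,
        PRIMARY KEY (student_id, year, month)
    );
    INSERT INTO students VALUES (1, 'John'), (2, 'Mary'), (3, 'Tina');
    INSERT INTO measurements VALUES
        (1, 2013, 11, 70, 200), (2, 2013, 11, 65, 130),
        (1, 2013, 12, 70, 192), (2, 2013, 12, 65, 126), (3, 2013, 12, 68, 140);
""")

rows = conn.execute("""
    SELECT name, height, weight, year, month
    FROM students s
    LEFT JOIN measurements m ON s.id = m.student_id
    ORDER BY year, month, name
""").fetchall()
conn.close()

# Rows are already ordered by year and month, so groupby can cluster them
# into one block per month, with weight and height as the fixed columns.
by_month = {
    key: [(name, weight, height) for name, height, weight, _, _ in group]
    for key, group in groupby(rows, key=lambda r: (r[3], r[4]))
}

for (year, month), block in by_month.items():
    print(year, month)
    for name, weight, height in block:
        print("   ", name, weight, height)
```

Each month's block can then be written to its own range or sheet by whatever export tool is in use.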
