Separating values that are combined in one string - excel

I would like to solve this either in Excel or in SPSS:
I have categorical data (each number representing a medical diagnosis) that are combined into single cells. In other words, a row (patient) has multiple diagnoses. However, I would like to know the frequencies of each diagnosis. What is the best way to go about this? (See picture for reference)

For SPSS:
First just creating some sample data to demonstrate on:
data list free/e_cerv_dis_state (a20).
begin data
"{1/2/3/6}" "{1/2/4}" "{2/4/5}" "{1/5/6}" "{4}" "{4/5/6}" "{1/2/3/4/5/6}"
end data.
Now the following code will create a separate variable for each possible diagnosis, and will put a 1 in it if the diagnosis exists in the original variable.
do repeat vr=diag1 to diag9/vl=1 to 9.
compute vr=char.index(e_cerv_dis_state, string(vl, f1) ) > 0.
end repeat.
freq diag1 to diag6.
Note this will only work for up to 9 diagnoses. If you have more than that the solution will have to be adapted to multiple digits.

Assuming that the number of columns is fairly regular, I would suggest using text to columns, and then using COUNTIF on the cells if they are the value wanted. However there is a more robust and reproducible solution that would involve using SQL. If you download the free version of SQL Express here: https://www.microsoft.com/en-gb/sql-server/sql-server-downloads
Then you can import your table of data, here's how to do that: How to import an Excel file into SQL Server?
Then you could use the more friendly SQL database to get the answers you want. For example you can use a select statement that would say:
SELECT count(e_cerv_dis_state)
WHERE e_cerv_dis_state = '6'
It would also be possible to use a CASE WHEN statement to add-in the names of the diagnoses.

Related

Alternatives to interpolate three dimensional data

I have a table that shows me a chemical concentration value based on temperature, pH and
ammonia. The way the I measure these variables, the ammonia level are always one of these six values (on top of the table), so it works as a categorical variable.
I need a way to interpolate on this table, based on these 3 variables. I tried using a combination of INDEX and MATCH, but I was not able to achieve what I wanted. Then I thought of "dividing" the table in intervals to "reduce" one variable and use an IF function to select which interval to interpolate based on the third variable (I was thinking pH or Ammonia), but I can't figure out a way to change intervals dynamically like this.
Can anyone think of an alternative to accomplish what I'm trying to do? If possible I would like to avoid using VBA, but if there is no other way I have no problem using it.
Thank you for the help!
I'm attaching an example of the table below.
Assuming that PH is in Column A:
=INDEX(A:H;MATCH(6,8;A:A;0)+MATCH(25;B:B;0)-2;MATCH(2;2:2,0))
Where the -2 needs to be changed to the number of rows BEFORE the first 22 in Temp.
This also assumes that the pattern of 22;25;28 in Temp is the same for every pH

Generate a multicolumn table using docxtpl

I have a series of data (in 2-dimensional list 'CombinedTable') I need to use to populate a table in an MS Word template. The table has 7 columns so I attempted the following using docxtpl module:
context = {
'tpl_modules1': CombinedTable[0]
'tpl_modules2': CombinedTable[2]
'tpl_modules3': CombinedTable[4]
'tpl_modules4': CombinedTable[6]
'tpl_modules5': CombinedTable[8]
'tpl_modules6': CombinedTable[10]
'tpl_modules7': CombinedTable[12]
}
tpl.render(context)
tpl.save(FilePath + FileName)
Not the most elegant solution I know but am just trying to get this working- unfortunately using this code with the following template results in tpl_modules7 data being written in to all columns, rather than just the 7th.
Does anyone have advice for how to resolve this? I attempted to create a for loop through the columns as well as rows but was unsuccessful in writing anything to the doc (was saved as a blank & empty doc).
The CombinedTable variable is a list of 12 lists (one for each column in template, although only 7 contain data). Each of these 12 lists contains another list with cell data whose length is equal to the number of rows to be written to the table in that column. This means that the number of rows that are written to varies for each column.
EDIT: Looking more closely at the docs, it states that I cannot use %tr multiple times in the same row. I assume I will then have to use a loop through %tc and %tr (which I tried & couldn't get working). Any advice on how to implement this? Especially on the side of the word document. Thanks!
I was able to resolve this satisfactorily for my requirements, however my solution may not suit all. I simply set up 7 different tables in a document with 7 columns and adjusted margins/borders to suit the dimensions I required for the tables. Each of the 7 tables had identical docxtpl syntax as image in my question with the small buffer columns between them being replaced by columns in the word document.

Combine two data ranges into one range (Google Drive Excel)

Hi there I am looking to combine two data ranges/arrays into one in order to feed them into excel FREQUENCY function.
Example:
First data range - B5:F50
Second data range - J5:N50
Bins data range - I5:I16
Function definition - FREQUENCY(data_array; bins_array)
Basically I am lazy and I don't want to reshuffle my excel script to spit out both datasets side by side so that I can reference them using something like B5:K50 range. Is there any way I can combine both datasets into data_array using some kind of formula? Maybe to end up with something along the line of =FREQUENCY((B5:F50,J5:N50); I5:I16) ?
BTW: Either of
=FREQUENCY(B5:F50; I5:I16)
=FREQUENCY(J5:N50; I5:I16)
work just file on their own for me.
Update
Actual formula definition FREQUENCY(data, classes)
2013 MS Excel (unrelated)
In MS Excel FREQUENCY function accepts a "union" as the first argument, i.e. a list of references separated by commas and enclosed in parentheses e.g.
=FREQUENCY((B5:F50,J5:N50),I5:I16)
Note: the "bins array" can also be a union if required
In "Google sheets" I don't think the same thing is possible - there may be a clever workaround, but I'm not aware of it
The Using Arrays page has some details that worked for me:
https://support.google.com/docs/answer/6208276?hl=en
It says:
"You can join multiple ranges into one continuous range using this same punctuation, which works the same way as VMERGE. For example, to combine values from A1-A10 with the values from D1-D10, you can use the following formula to create a range in a continuous column: ={A1:A10; D1:D10}"
I have two named ranges, so I was able to use {namedRange1;namedRange2} and it gave me one continuous range.

Display, sort and filter numbers with multiple decimal in excel 2007

I'm using excel 2007.
I've a list of tasks (200-500) that I need to group in different category/section etc (multiple filters). Whole data is in excel table so I can apply Excel's build-in table filters to display exact data that I need.
However it is always difficult to apply multiple filter to display expected data, specially as I need to do it very frequently. To make things simple I'm planning to number each record like
a.b.c.d.e.f
Where a, b, c, d, e, f are simple numbers. List looks like:
1
1.1
1.2
1.2.1
1.2.1.1
1.2.2
1.3
& so on.
Problem is, Excel take it as number with single decimal but as soon as I add second decimal, excel treat it as text, which is obvious in general behavior.
However, as special case, I need excel treat both as number or text. Number is preferable as I want to sort them, which might be difficult as a text.
To make the things little more complex, while filtering in table, I require if I can add some formula to filter results like 1.* should display all numbers starts with 1.
Is it possible with excel's default behavior, without VBA?
If no, is it possible with VBA? If yes, any clue is appreciated. I don't need whole program as I can write basic VBA program, just a clue how it can be done?
I sort mine by adding a helper column that adds a letter to the front and sort on that. E.g. 1 becomes f1, 1.1 becomes f1.1 etc. Then all are sorted as text.
You can use the formula ="f" & A1.
My sample:
Then the data sorted:
And the filter:
If I were to try this without VBA, my first step would be to use the sort to columns function on the data tab.
Next make sure all empty spaces in your data are filled with zeros.
Then sort the data by column
as long as you left your original data in the same row as the sorted data (I didn't in the images posted to focus on the process), your items should now be in order.

PLSQL to modify VARCHAR2 column data

I am working on an app that involves evaluating modifications made to vehicles, and does some number crunching from figures stored in an Oracle 10g database. Unfortunately, I only have a text data in the database, yet I need to work with numbers and not text. I would like to know if anyone could help me with understanding how to perform string operations on VARCHAR2 column data in an Oracle 10g database with PLSQL:
For example: I need to take a VARCHAR2 column named TOP_SPEED in a table named CARS, parse the text data in its column to break it up into two new values, and insert these new values into two new NUMBER type columns in the CARS table, TOP_SPEED_KMH and TOP_SPEED_MPH.
The data in the TOP_SPEED column is as such: eg. "153 km/h (94.62 mph)"
I want to save the value of 153.00 into the TOP_SPEED_KMH column, and the 94.62 value into TOP_SPEED_MPH column.
I think what I have to do in a query/script is this:
select the text data in TOP_SPEED into a local text variable
modify the local text variable and save the new values into two number variables
write back the two number variables to the corresponding TOP_SPEED_KMH and TOP_SPEED_MPH columns
Could someone please confirm that I am on the right track? I would also really appreciate any example code if anyone has the time.
Cheers
I think it's a better idea to just have the top_speed_kmh column, and get rid of the mph one. As the number of kms in a mile never changes, you can simply multiply by 0.6 to convert to miles. So you can do the same update statement as N West suggested without the mph column:
UPDATE CARS SET TOP_SPEED_KMH = TO_NUMBER(SUBSTR(1, (INSTR(UPPER(TOP_SPEED), "KM/H") -1)));
And whenever you need the mph speed, just do
Select top_speed_kmh*0.6 as top_speed_mph from cars;
For the parsing bit, you would probably use either REGEXP_SUBSTR or INSTR with SUBSTR
Then use TO_NUMBER to convert to number
You can either create a PL/SQL function for each parsing, returning the number value, and run an UPDATE query on the fields, or you could create a PL/SQL procedure with a cursor looping over all the data that is to be updated.
Here are som links for some of the built-ins:
http://psoug.org/reference/substr_instr.html
http://download.oracle.com/docs/cd/B14117_01/server.101/b10759/functions116.htm
You probably don't even need to do this with PL/SQL at all.
As long as the data in the column is consistent "99.99 km/h (99.99 m/h)" you could do this directly with SQL:
UPDATE CARS
SET TOP_SPEED_KMH = TO_NUMBER(SUBSTR(1, (INSTR(UPPER(TOP_SPEED), "KM/H") - 1))),
TOP_SPEED_MPH = <similar substr/instr combination to pull the 99.99 mph out of code>;
Set-operations are typically much faster than procedural operations.
I am working on an app that involves
evaluating modifications made to
vehicles, and does some number
crunching from figures stored in an
Oracle 10g database. Unfortunately, I
only have a text data in the database,
yet I need to work with numbers and
not text
Sounds like you should have some number columns to store these parsed out values. Instead of always calling some parsing routine (be it regexp or substr or a custom function), pass through all the data in the table(s) ONCE and populate the new number fields. You should also modify the ETL process to populate the new number fields moving forward.
If you need numbers and can parse them out, do it once (hopefully in a staging area or off hours at least) and then have the numbers you want. Now you're free to do arithmetic and everything else you'd expect from real numbers ;)
with s as
(select '153 km/h (94.62 mph)' ts from dual)
select
ts,
to_number(substr(ts, 1, instr(ts, ' ') -1)) speed_km,
to_number(substr(regexp_substr(ts, '\([0-9]+'), 2)) speed_mph
from s
Thanks everyone, it was nice to be able to use everyone's input to get the answer below:
UPDATE CARS
SET
CAR_TOP_SPEED_KPH =
to_number(substr(CAR_TOP_SPEED, 1, instr(UPPER(CAR_TOP_SPEED), ' KM/H') -1)),
CAR_TOP_SPEED_MPH =
to_number(substr(regexp_substr(CAR_TOP_SPEED, '\([0-9]+'), 2));

Resources