In SAS, I need a PROC TABULATE where labels are repeated so that they are easier to find in Excel using INDEX-MATCH. Here is an example with sashelp.cars.
The first PROC TABULATE has the advantage of repeating labels, which INDEX-MATCH needs. Its flaw is that SAS only outputs the non-missing values.
data cars;
set sashelp.cars;
run;
proc sort data=cars;
by make;
run;
This doesn't give all labels. I would like a table with 3 continents by column (Europe, Asia, USA) and every car type (Sedan, SUV, Wagon, Sports...).
PROC TABULATE DATA = cars;
option missing=0;
by make;
CLASS make type Type Origin / mlf MISSING ;
TABLE (
(type*make)
), (Origin='') / printmiss nocellmerge ; RUN;
So, in order to have all 3 continents by column, and every type of car (Sedan, SUV, Wagon, Sports...), I use CLASSDATA, as suggested:
Data level;
set cars;
keep make type Type Origin;
Run;
PROC TABULATE DATA = cars MISSING classdata=level;
option missing=0;
by make;
CLASS make type Type Origin / mlf MISSING ;
TABLE (
(make*type)
), (Origin='') / printmiss nocellmerge ;
RUN;
But this gives a humongous table, and non-repeating labels. Is there a midway solution with:
all the columns (3 continents) like in the last table
only the concerned MAKEs, that is the first 6 rows for Acura
repeated labels like in the first PROC TABULATE
Thank you very much,
I advise not exporting the listing of PROC TABULATE to Excel
PROC TABULATE does not repeat values in the first column for each value in the second, because the output is meant for human reading. It is not the right tool for writing data to Excel for further lookup.
I advise not using MATCH but SUMIFS
MATCH is a great function in excel, but is not a good choice for your application, because
it gives an error when it does not find what you look for, and that is why you need all labels in your output
it only supports one criterion, so you need at least 3 of them
it returns a position, so you still need an index function.
Therefore, I advise writing a simple create table
PROC sql;
create table TO_EXPORT as
select REGION, MACTIV, DATE, count(*) as cnt
from data
group by REGION, MACTIV, DATE;
proc export data=TO_EXPORT outfile="&myFolder\&myWorkbook..xlsx" dbms=xlsx replace;
RUN;
You will have your data in Excel in a more data-oriented format.
To retrieve the data, I advise the following type of Excel formula
=sumifs($D:$D,$A:$A,"13-*",$B:$B,$C:$C,"apr2020")
It adds up all counts whose row has, in the columns to the left, the criteria you are looking for.
Because at most one row will meet these criteria, it actually just looks up the count you are looking for.
If that count does not exist, it will just return zero.
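The long-format idea can be sketched in Python (not SAS or Excel; the rows and the lookup name are made up for illustration): count the rows per key combination, then look a combination up with a default of zero, which is how SUMIFS behaves when no row matches.

```python
from collections import Counter

# Hypothetical (make, type, origin) rows standing in for the exported data.
rows = [
    ("Acura", "SUV", "Asia"),
    ("Acura", "Sedan", "Asia"),
    ("Acura", "Sedan", "Asia"),
    ("Audi", "Sedan", "Europe"),
]

# Equivalent of SELECT ..., COUNT(*) ... GROUP BY: one entry per combination.
counts = Counter(rows)

def lookup(make, car_type, origin):
    """Return the count for one combination, or 0 if it never occurs."""
    return counts.get((make, car_type, origin), 0)

print(lookup("Acura", "Sedan", "Asia"))  # combination present in the data
print(lookup("Acura", "Sedan", "USA"))   # absent combination gives 0
```

Because an absent combination simply yields zero, the export does not need a row for every possible label.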
Disclaimer:
I did not test this code, so if it does not work, leave a comment and I will.
I am trying to output a DataFrame to a text file (for a feed). A few specific columns are automatically converted to float instead of int as intended.
How can I specify that those columns use int as their dtype?
I tried to output the whole DataFrame as strings, and that did not work.
The columns I would like to specify are named [CID1] and [CID2].
data = pd.read_sql(sql,conn)
data = data.astype(str)
data.to_csv('data_feed_feed.txt', sep ='\t',index=True)
Based on the code you provided, you turn all of your data to strings just before export.
Thus, you either need to turn some cols back to desired type, such as:
data["CID1"] = data["CID1"].astype(int)
or not convert them in the first place.
It is not clear from what you provided why you'd have issues with ints being converted to floats.
this post provides heaps of info:
stackoverflow.com/a/28648923/9249533
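One common cause, which is only an assumption here since the question does not show the data, is a missing value in an integer column: pandas then stores the column as float64. A minimal sketch using the nullable Int64 dtype, with a made-up frame in place of the pd.read_sql result:

```python
import io
import pandas as pd

# Made-up stand-in for pd.read_sql(sql, conn); None mimics a SQL NULL.
data = pd.DataFrame({
    "CID1": [101, None, 103],
    "CID2": [7, 8, 9],
    "name": ["a", "b", "c"],
})

# A numeric column with a missing value is stored as float64.
print(data["CID1"].dtype)

# The nullable integer dtype keeps whole numbers and allows missing cells.
for col in ["CID1", "CID2"]:
    data[col] = data[col].astype("Int64")

# Write to an in-memory buffer here; a file path works the same way.
buffer = io.StringIO()
data.to_csv(buffer, sep="\t", index=False)
feed_text = buffer.getvalue()
print(feed_text)
```

With this dtype the missing cell is written as an empty field and the other values are written without a decimal point.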
My attempt:
# Compute the mean, median and variance for the variables sph, acous and dur. Compare their level of variability.
sad_mean = dat_songs[['spch', 'acous', 'dur']].mean()
sad_mode = dat_songs[['spch', 'acous', 'dur']].mode()
sad_median = dat_songs[['spch', 'acous', 'dur']].median()
sad_mmm = pd.DataFrame({'mean':sad_mean, 'median':sad_median, 'mode':sad_mode})
sad_mmm
Which outputs this
First of all, the median column is not right at all, and I want to know how to fix that too.
Secondly, I feel like I have seen a quicker or shorter way to do this with a simple pandas function.
My data head for reference
Simply try dat_songs.describe(). It reports count, mean, std, min, the quartiles (the 50% row is the median) and max for every numeric column.
For selected columns:
dat_songs[['spch', 'acous', 'dur']].describe()
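If the variance is needed as well (describe() reports the standard deviation, not the variance), agg can compute the named statistics in one call. A minimal sketch, with made-up numbers standing in for dat_songs since the real data is not shown:

```python
import pandas as pd

# Made-up values; only the column names come from the question.
dat_songs = pd.DataFrame({
    "spch": [0.03, 0.05, 0.04, 0.30],
    "acous": [0.10, 0.60, 0.20, 0.90],
    "dur": [200.0, 215.0, 180.0, 240.0],
})

# One row per statistic, one column per variable.
stats = dat_songs[["spch", "acous", "dur"]].agg(["mean", "median", "var"])
print(stats)

# Transposed: one row per variable, the layout attempted in the question.
print(stats.T)
```

Note that mode() returns a DataFrame rather than a Series, because a column can have several modes, so it does not slot into a dict of Series the way mean() and median() do.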
Let's assume that I have Custom Table named Possible URL target parameters with code name xyz.PossibleTargets with 2 columns:
Explanation and Value.
How can I feed a drop-down field on a page type with this data, so that Value (from the table) is the option value and Explanation is the name shown in the drop-down?
What I already tried and it is not working:
Generate value;name pairs divided by newline and place it as List of options:
z = ""; foreach (x in CMSContext.Current.GlobalObjects.CustomTables["xyz.PossibleTargets"].Items) {z += x.GetValue("Value"); z +=";"; z += x.GetValue("Explanation"); z += "\n" }; return z;
The validator does not allow this trick.
Set option Macro expression and provide enumerable object:
CMSContext.Current.GlobalObjects.CustomTables["xyz.PossibleTargets"].Items
In Item transformation: {%Explanation%} and in Value column {%TargetValue%}.
This does not work either.
Dropdown configuration
How do I do this correctly? The documentation and the hints on the fields are not helpful.
Kentico v11.0.26
I think you should do it without marking the field as a macro. Just type the macro there. Take a look at the screenshot.
No need to use a macro, use straight SQL, using a macro only complicates what appears to be a simple dropdown list.
SELECT '' AS TargetValue, '-- select one --' AS Explanation
UNION
SELECT TargetValue, Explanation
FROM xyz_PossibleTargets -- make sure to use the correct table name
ORDER BY Explanation
This should populate exactly what you're looking for without the complication of a macro.
I am trying to get some variables and numbers out from an Excel table using Matlab.
The variables below named "diffZ_trial1-4" should be calculated as the difference between two columns (between "start" and "finish"). However, I get the error:
Undefined operator '-' for input arguments of type 'cell'.
I have read somewhere that this could be because I get {} output instead of [], and that I may need to use cell2mat or convert the output somehow. But I must have done that wrongly, as it did not work!
Question: How can I calculate the difference between two columns below?
clear all, close all
[num,txt,raw] = xlsread('test.xlsx');
start = find(strcmp(raw,'HNO'));
finish = find(strcmp(raw,'End Trial: '));
%%% TIMELINE EACH TRIAL
time_trial1 = raw(start(1):finish(1),8);
time_trial2 = raw(start(2):finish(2),8);
time_trial3 = raw(start(3):finish(3),8);
time_trial4 = raw(start(4):finish(4),8);
%%%MOVEMENT EACH TRIAL
diffZ_trial1 = raw(start(1):finish(1),17)-raw(start(1):finish(1),11);
diffZ_trial2 = raw(start(2):finish(2),17)-raw(start(2):finish(2),11);
diffZ_trial3 = raw(start(3):finish(3),17)-raw(start(3):finish(3),11);
diffZ_trial4 = raw(start(4):finish(4),17)-raw(start(4):finish(4),11);
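You are right, raw is a cell array that contains data of all types, including text (http://uk.mathworks.com/help/matlab/ref/xlsread.html#outputarg_raw), and the '-' operator is not defined for cells. You should use num, which is a numeric matrix. Note that num omits leading text-only rows and columns, so row and column indices found in raw may be offset in num.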
Alternatively, if you have an updated version of Matlab, you can try readtable (https://uk.mathworks.com/help/matlab/ref/readtable.html), which I think is more flexible. It creates a table from an excel file, containing both text and numbers.
I am having trouble figuring out how to extract specific text within a string. My dataset has been pulled from de-identified electronic health records, and contains a list of every medication that our patients have been prescribed. I am, however, only concerned with a specific list of medications, which I have in another table. Within each cell is the name of the medication, dose, and form (Tabs, Caps, etc.) [See image]. Much of this information is not important for my analysis though, and I only need to extract the medication names that match my list. It might also be useful to extract the first word from each string, as it is (in most cases) the name of the medication.
I have examined a number of different methods of pulling substrings, but haven't quite found something that meets my needs. Any help would be greatly appreciated.
Thanks.
Data DRUGS;
infile datalines flowover;
length drug1-drug69 $20;
array drug[69];
input (drug1-drug69)($);
datalines;
AMITRIPTYLINE
AMOXAPINE
BUPROPION
CITALOPRAM
CLOMIPRAMINE
DESIPRAMINE
DOXEPIN
ESCITALOPRAM
FLUOXETINE
FLUVOXAMINE
IMIPRAMINE
ISOCARBOXAZID
MAPROTILINE
MIRTAZAPINE
NEFAZODONE
NORTRIPTYLINE
PAROXETINE
PHENELZINE
PROTRIPTYLINE
SERTRALINE
TRANYLCYPROMINE
TRAZODONE
TRIMIPRAMINE
VENLAFAXINE
AMITRIP
ELEVIL
ENDEP
LEVATE
ADISEN
AMOLIFE
AMOXAN
AMOXAPINE
DEFANYL
OXAMINE
OXCAP
WELLBUTRIN
BUPROBAN
APLENZIN
BUDEPRION
ZYBAN
CELEXA
ANAFRANIL
NORPRAMIN
SILENOR
PRUDOXIN
ZONALON
LEXAPRO
PROZAC
SARAFEM
LUVOX
TOFRANIL
TOFRANIL-PM
MARPLAN
LUDIOMIL
REMERON
REMERONSOLTAB
PAMELOR
PAXIL
PEXEVA
BRISDELLE
NARDIL
VIVACTIL
ZOLOFT
PARNATE
OLEPTRO
SURMONTIL
EFFEXOR
DESVENLAFAXINE
PRISTIQ
;;;;
run;
Data DM4_;
if _n_=1 then set DRUGS;
array drug[69];
set DM4;
do _i = 1 to countw(Description,' ().,');
_med = scan(Description,_i,' ().,');
_whichmed = whichc(_med, of drug[*]);
if _whichmed > 0 then leave;
end;
run;
Data DM_Meds (drop = drug1-drug69 _i _med _whichmed);
Set DM4_;
IF _whichmed > 0 then anti = _med;
else anti = ' ';
run;
This is a fairly common problem with a bunch of possible solutions depending on your needs.
The simplest answer is to create an array, assuming you have a smallish number of medicines. This isn't necessarily the fastest solution, but it would work fairly well and is simple to construct. Just get your drug list into a dataset, transpose it to horizontal (one row with lots of meds), then load it up this way. You iterate over the words in the name of the medicine and see if any of them are in the medicine list - if they are, then bingo, you have your drug! In real use of course drop the drug: variables afterwards.
This works a bit better than the inverse (searching each drug to see if it's in the medicine name) since usually there are more words in the drug list than in the medicine name. The hash solution might be faster, if you're comfortable with hashes (load the drug list into a hash table then use find() to do the same as what whichc is doing here).
data have;
input #1 medname $50.;
datalines;
PROVIGIL OR
ENSURE HIGH PROTEIN OR LIQD
BENADRYL 25 MG OR CAPS
ECOTRIN LOW STRENGTH 81 MG OR TBEC
SPIRONOLACTONE 25 MG PO TABS
NORVASC 5 MG OR TABS
FLUOXETINE HCL 25MG
IBUPROFEN 200MG
NEFAZODONE TABS OR CAPS 20MG
PAXIL (PAROXETINE HCL) 25MG
;;;;
run;
data drugs;
infile datalines flowover;
length drug1-drug19 $20;
array drug[19];
input (drug1-drug19) ($);
datalines;
AMITRIPTYLINE
AMOXAPINE
BUPROPION
CITALOPRAM
CLOMIPRAMINE
DESIPRAMINE
DOXEPIN
ESCITALOPRAM
FLUOXETINE
FLUVOXAMINE
IMIPRAMINE
ISOCARBOXAZID
MAPROTILINE
MIRTAZAPINE
NEFAZODONE
NORTRIPTYLINE
PAROXETINE
PHENELZINE
PROTRIPTYLINE
;;;;
run;
data want;
if _n_ = 1 then set drugs;
array drug[19];
set have;
do _i = 1 to countw(medname,' ().,');
_medword = scan(medname,_i,' ().,');
_whichmed = whichc(_medword, of drug[*]);
if _whichmed > 0 then leave;
end;
run;
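The same word-by-word lookup, with a hash-based set in place of the array, can be sketched in Python (an illustration only, not SAS; the function name and the shortened drug list are assumptions):

```python
import re

# Shortened, illustrative drug list; a set gives hash-table lookups.
DRUGS = {"FLUOXETINE", "NEFAZODONE", "PAROXETINE"}

def first_listed_drug(medname):
    """Return the first word of medname that is in DRUGS, else an empty string."""
    # Split on the same delimiters as the SAS scan() call: space ( ) . ,
    # Upper-casing is an addition here; the SAS step compares as-is.
    for word in re.split(r"[ ().,]+", medname.upper()):
        if word in DRUGS:
            return word
    return ""

for name in ["FLUOXETINE HCL 25MG", "PAXIL (PAROXETINE HCL) 25MG", "IBUPROFEN 200MG"]:
    print(name, "->", first_listed_drug(name))
```

Each word costs one set lookup, so the run time depends on the number of words in the description rather than on the length of the drug list.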
This should be an easy task for PROC SQL.
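Let's say you have patient information in table A and drug names in table B (long format, not the wide format you gave). The code below joins A to B where the description in A contains a drug name from B and writes the result to table C. Because it is a LEFT JOIN, rows of A without a match are kept with a missing drug; use an INNER JOIN to keep only the matching rows.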
PROC SQL;
CREATE TABLE C AS SELECT DISTINCT *
FROM A LEFT JOIN B
ON UPCASE(A.description) CONTAINS UPCASE(B.drug);
QUIT;