SAS Renaming Columns - statistics

beginner SAS user here. Trying to rename column variables from NHANES data, but the code that I am using is registering wrong. The column names are long and drawn out so its been nearly impossible for me to try to recode them into a simpler format. Example and code down below, any assistance is greatly appreciated! For example, I'm trying to get Respondent sequence number to be renamed as ID, but SAS is having issues with the spaces between the original name if that makes sense.
data NHANES.Combined;
set NHANES.Combined;
rename Respondent sequence number = ID; run;
Image of Data Table

You have to use the names in the RENAME statement, not the labels you are looking at in the VIEWTABLE window. If you actually have a name with spaces in it (which you will not with NHANES data) then use a name literal in the code so SAS can parse out what parts of the command line represent the variable names.
rename 'non standard name'n = standard_name ;
Run PROC CONTENTS on your dataset to see the variable names and their attributes (TYPE, LENGTH, FORMAT, LABEL).

Related

SPSS converting a string into a numeric variable issue

I have a string variable with lots of parentheses and other punctuation e.g. _LSC Debt licensed work. How can I easily convert it to a numeric variable when I already have a specified code list for it? i.e. I don't want it to automatically recode everything because it uses the wrong values against the labels.
Create a dataset with two variables: a string holding the current messy name and a numeric variable holding the new code. Then, with both the original dataset and the lookup one sorted by the string, do MATCH FILES specifying a table match (or use Data > Merge Files > Add Variables).
You can prepare a separate file which includes two variables:
- one contains each of the possible values in the original string variable to be recoded (make sure the name and width are the same as your original variable)
- the second contains the new values you want to recode to.
when you set this up, match the files like this:
get file="filepath\Your_Value_Table.sav".
sort cases by YourOriginalVarName.
dataset name ValTab.
get file="filepath\Your_Original_File.sav".
sort cases by YourOriginalVarName.
match files /file=* /table=ValTab /by YourOriginalVarName.
exe.
At this point your original file will contain a new variable that has the codes you wanted.
In general I agree with the solution provided by others. However, I would like to suggest an extra step, which could make your look-up file (see the answer of eli-k and JKP) a bit better.
The point is that your string variable with lots of parentheses and other punctuation probably also has different ways to write the same thing.
For example:
_LSC Debt licensed work
LSC Debt licensed work
_LSC Debt Licensed Work
etc.
You could create a lookup-table with three variables: the unique values of the original string variable, a cleaned-up version of that variable, and finally the numeric value you want to attach.
The advantage of the cleaned-up version is that you can identify more easily the same value although it is written differently.
You could clean up using several functions:
string CleanedUpVersion (A40).
compute CleanedUpVersion = REPLACE(RTIM(LTRIM(UPCASE(YourOriginalVarName))),'_','').
execute.
In this basic example we convert to capital letters, delete leading and trailing blanks and remove the underscore by replacing it by nothing.
Overall this could help to avoid giving different numbers to unique values in your original variable that mean the same thing, while you would like them to have the same number.

How to read mixed string and number data from csv in matlab and manipulate

I'm looking to write a script for MATLAB that will import data from a csv file which has a first row containing string headers and the data in each of those columns is either string, date or numeric.
I want to then be able to filter the data in MATLAB according to instances of a particular string and number combination.
Any help appreciated!
Cheers!
I would recommend you to start with reading MATLAB documentation.
[num,txt,raw] = xlsread('myExample.xlsx')
Reads numeric, text and combined data, so, if your data is combined, then you need the cell array raw. After that, you do whatever you want with your cell array (Additional information is not provided since OP did not provide any specific information about the way the data would be filtered)
Try using readtable function in MATLAB.
It correctly imports csv file with header and mixed data type.
xlsread was imported by mixed csv file very incorrectly repeating the some rows while maintaining the same total rows.
I got this after searching for a long time:
MATLAB Central Question/Answer

SAS: Match single word within string values of a single variable then replace entire string value with a blank

I'm working in SAS 9.2, in an existing dataset. I need a simple way to match a single word within string values of a single variable, and then replace entire string value with a blank. I don't have experience with SQL, macros, etc. and I'm hoping for a way to do this (even if the code is less efficient" that will be clear to a novice.
Specifically, I need to remove the entire string containing the word "growth" in a variable "pathogen." Sample values include "No growth during two days", "no growth," "growth did not occur," etc. I cannot enter all possible strings since I don't yet know how they will vary (we have only entered a few observations so far).
TRANSWD and TRANSLATE will not work as they will not allow me to replace an entire phrase when the target word is only a part of the string.
Other methods I've looked at (for example, a SESUG paper using PRX at http://analytics.ncsu.edu/sesug/2007/CC06.pdf) appear to remove all instances of the target string in every variable in the dataset, instead of just in the variable of interest.
Obviously I could subset the dataset to a single variable before I perform one of these actions and then merge back, but I'm hoping for something less complicated. Although I will certainly give something more complicated a shot if someone can provide me with sample code to adapt (and it would be greatly appreciated).
Thanks in advance--Kim
Could you be a little more clear on who the data set is constructed? I think mjsqu's solution will work if your variable pathogen is stored sentence by sentence. If not then I would say your best bet is to parse the blocks into sentences and then apply mjsqu's solution.
DATA dataset1;
format Ref best1.
pathogen $40.;
input Ref pathogen $40. ;
datalines;
1 No growth during two days
2 no growth,
3 growth did not occur,
4 does not have the word
;
RUN;
DATA dataout;
SET dataset1;
IF index(lowcase(pathogen),"growth") THEN pathogen="";
RUN;

Best way to import numeric and non-numeric data (string) from an excel file into MATLAB?

I want to know the best way of importing both number and non-numeric data (which is string in the present case) from an excel file into MATLAB? By best (or better) way, I mean all the data together in a variable (or data structure).
First, I tried uiopen(filename) function which opens a wizard and from there, I can import the data into a MATLAB variable. However, problem here is that it replaces all the non-numeric data with zeros which is not required. I later on, found that this function calls another function, named xlsread(filename), which is another way (actual way) of importing excel file.
Second (last) way that I tried (which seems to be better) is to use function called importdata(filename) which imports both numeric and non-numeric data into separate structure variables.
However, I am wondering if there exists some other way(s) to import everything into a single variable or data structure?
xlsread is the correct way to import data from Excel spreadsheets,both numeric and non-numeric data. Check the documentation:
[num,txt,raw] = xlsread(___) additionally returns the text fields in
cell array txt, and the unprocessed data (numbers and text) in cell
array raw using any of the input arguments in the previous syntaxes.
If xlRange is specified, leading blank rows and columns in the
worksheet that precede rows and columns with data are returned in raw.

Importing data with invalid characters in numeric columns

DATA STRUCTURE: I have a data set that can be read as an Excel or CSV file. It has the following variable types: dates, time, numeric variables, and what should be numeric variables that incorrectly have characters attached to a number - e.g. -0.011* and 0.023954029324) (the parentheses at the end is in the cell) - due to an error in the program that wrote the file. There are also blank lines between every record, and it is not realistic to delete all of these as I have hundreds of files to manage.
DATA ISSUE: We've determined that some values are correct up to the character (i.e. -0.011 is correct as long as the asterisk is removed), while other values, such as 0.023954029324), are incorrect altogether and should be made missing. Please don't comment on this issue as it is out of my control and at this point all I can do is manage the data until the error is fixed and character values stop being written into the files.
PROBLEM WITH SAS:
1) If I use PROC IMPORT with an Excel file, SAS uses the first eight lines (20 for a CSV file) to determine if a variable is numeric or character. If the asterisk of parenthesis doesn't occur within the first 20 lines, SAS says the variable is numeric, then makes any later cells w/ character values missing. This is NOT okay in the case of the asterisks, because I want to maintain the numeric portion of the value and remove the asterisk in a later data step. Importing Excel files with PROC IMPORT does not allow the GUESSINGROWS option (as it does w/ CSV files, see below). Edit: Also, the MIXED=YES option does NOT work (see comments below - still need to change number of rows SAS uses, which, to me, means this option does...what?).
2) If I use PROC IMPORT with a CSV file, I can specify GUESSINGROWS=32767 and I get really excited because it then determines the variables with the asterisks are character and maintains the asterisks. However, it very strangely no longer determines the variables with parentheses as character (as it would have done when importing an Excel file as long as the parenthesis was in the first 20 lines), but instead removes the character and additionally rounds the value to the nearest whole number (0.1435980234 becomes 0, 1.82149023843 becomes 2, etc.). This is way too coarse of rounding - I need to maintain the decimal places. And, on top of that, the parentheses are now gone so I can't make the appropriate cells missing. I do not know if there is a way to make SAS not round and/or maintain the parentheses. To me, this is inconsistent behavior - why is the asterisk but not a parenthesis considered a character in this case? Also, when I read in the Excel file w/ PROC IMPORT (as described in (1)), it can cope w/ the parentheses (if they appear in the first 20 lines) - another inconsistency.
3) If I use INFILE, well - I get an error w/ every variable I try to read in - this procedure is way too sensitive and unstable for how varying the data are (and I have to code a work-around for the blank data lines).
ULTIMATE GOAL (note this code will be run automatically within a macro, if that matters):
1) Read date variable as a date
2) Read time variable as time
3) Be able to identify a variable w/ characters present in any cell of that variable (even after 20 lines) as a character variable and maintain the values in the cells (i.e. don't round/delete character). This can be by a priori telling SAS to let a certain set of variables be character (I will change them to numeric after I get rid of characters/make cells missing), or by SAS identifying variables w/ characters on its own.
SAS actually by default uses the first 8 rows. That is defined in a registry setting, TYPEGUESSROWS - which is normally stored in HKLM\Software\Microsoft\Office\14.0\Access Connectivity Engine\Engines\Excel\TypeGuessRows\ (or insert-your-office-version-there). Change that value to FFFF (hex)/65536 (decimal) or some other large number, or zero to search the maximum number of rows (a bit over 16000 - exact number is hard to find).
For a CSV file, you can write a data step import to control the formats of each variable. The easiest way to see this is to run the PROC IMPORT, then check your log; the log will contain the complete code used to read in the file in a data step. Then just modify the informats as needed. You say you have too much trouble with Infile method, so perhaps this won't work for you, but typically you can work around any inconsistencies - and if your files are THAT inconsistent it sounds like you'll be doing a ton of manual work anyway. This gives you the options to read in date/time variables correctly as well.
You also can use PROC IMPORT/CSV to the log, writing the log out to a file, then read THAT in and generate new import code on your own - or even off a proc contents of the generated file, making known modifications.
Not sure what you're asking about with date/time as you don't mention issues with it in the first part of your question.
One additional option you have is to clean out the characters before it's read in (from the CSV). This is pretty simple, if it's truly just numerics and commas (and decimals and negative signs):
data mydata;
infile myfile /*options*/;
input ##;
length infileline $32767; *or your longest reasonable line;
infileline = compress(_infile_,'.-','kd');
run;
data _null_;
set mydata;
file myfile /*options*/ /*or a new file if you prefer */;
put #1 infileline $32767.; *or your longest reasonable line;
run;
Then read that new file in using proc import. I am splitting it into two datasteps so you can see it, but you could combine them into one for ease of running - look up "updating a file in place" in SAS documentation. You could also accomplish this cleaning using OS specific tools; on Unix for example a short awk script could easily remove the misbehaving characters.

Resources