Read data with large strings with SAS - string

I want to read a .csv file with large strings with SAS. This is my file tmp.csv in comma separated values format
1,1005725,[(B42.ND761).B437]1-8-1-1-1-3-3-3-2-2/RT0658,5S3563A/RT0658,,,5S3563A,RT0658
2,09VL101,20347 PL6 O94 E98-1-0/K9616LM,19058/K9616LM,19058,,19058,K9616LM
3,09VL102,20351 PL6-1-0/K9616LM 19060/K9616LM,,19060,,19060,K9616LM
4,09VL103,20347 PL6 O94 E98-2-0/K9962LM,AID19058A/K9962LM,19058,,AID19058A,K9962LM
5,09VL105,,V4649A/F0001LM,,,V4649A,F0001LM
I've used this code, but it hasn't worked.
DATA datos;
INFILE "C:\Users\UserName\Documents\tmp.csv" DLM="," DSD MISSOVER;
INPUT Num Code :$7. Pedigree : $44. LineCode : $17. FemaleCode $5. MaleCode $ NFemale $9. NMale $7. ;
RUN;
This should be the result
Correct Data

I think Joe has the right idea - your variable lengths are messed up. I was able to produce the desired result using your code but with some renaming and resizing of your variables.
DATA datos;
INFILE "C:\Users\UserName\Documents\tmp.csv" DLM="," DSD MISSOVER;
INPUT a:$1. b:$7. c:$44. d:$17. e:$5. f:$9. g:$7.;
RUN;

I know it seems like you are saving typing by putting the informats in the INPUT statement, but I think it is much easier to define the variables first and then write the INPUT statement, especially when reading from a delimited file. If you define the variables in the same order that you want to read them, you can even just use a variable list in the INPUT statement.
DATA datos;
INFILE "C:\Users\UserName\Documents\tmp.csv" DSD TRUNCOVER;
LENGTH Num 8 Code $7 Pedigree $44 LineCode $17 FemaleCode $5 MaleCode $8 NFemale $9 NMale $7 ;
INPUT Num -- NMale ;
RUN;
Also, it is generally better to use the TRUNCOVER option instead of MISSOVER on the INFILE statement. Most of the time you do not want SAS to set the value to missing when you ask it to read 7 characters and only 3 are available on the line; you would prefer to have SAS use the 3 characters that are available. It won't make a difference on delimited input, but if you use formatted input without the : modifier you can miss data.
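For readers who want to sanity-check the file outside SAS, here is a minimal Python sketch (not part of the original answers) that parses two rows copied from the question and shows that every row carries eight fields, with consecutive commas becoming empty values, much as DSD treats them as missing. The names SAMPLE, FIELDS and read_rows are illustrative assumptions.

```python
import csv
import io

# Two rows copied from the question's tmp.csv.
SAMPLE = (
    "1,1005725,[(B42.ND761).B437]1-8-1-1-1-3-3-3-2-2/RT0658,5S3563A/RT0658,,,5S3563A,RT0658\n"
    "5,09VL105,,V4649A/F0001LM,,,V4649A,F0001LM\n"
)

# Variable names taken from the question's INPUT statement.
FIELDS = ["Num", "Code", "Pedigree", "LineCode",
          "FemaleCode", "MaleCode", "NFemale", "NMale"]


def read_rows(text):
    """Parse comma-separated text; empty fields come back as empty strings."""
    reader = csv.reader(io.StringIO(text))
    return [dict(zip(FIELDS, row)) for row in reader]


rows = read_rows(SAMPLE)
for row in rows:
    print(row)
```

Counting the fields this way makes it easy to see that the INPUT statement needs one variable per comma-separated position, including the empty ones.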

Related

How do I get rid of leading/trailing spaces in SAS search terms?

I have had to look up hundreds (if not thousands) of free-text answers on Google, making notes in Excel along the way and inserting SAS code around the answers as a last step.
The output looks like this:
This output contains an unnecessary number of blank spaces, which seems to confuse SAS's search to the point where the observations can't be properly located.
It works if I manually erase the superfluous spaces, but that will probably take hours. Is there an automated fix for this, either in SAS or in Excel?
I tried using the STRIP function, to no avail:
else if R_res_ort_txt=strip(" arild ") and R_kom_lan=strip(" skåne ") then R_kommun=strip(" Höganäs " );
If you want to generate a string like:
if R_res_ort_txt="arild" and R_kom_lan="skåne" then R_kommun="Höganäs";
from three variables, let's call them A B C, then just use code like:
string=catx(' ','if R_res_ort_txt=',quote(trim(A))
,'and R_kom_lan=',quote(trim(B))
,'then R_kommun=',quote(trim(C)),';') ;
Or, if you are just writing that string to a file, use this PUT statement syntax:
put 'if R_res_ort_txt=' A :$quote. 'and R_kom_lan=' B :$quote.
'then R_kommun=' C :$quote. ';' ;
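The same idea (trim each value, then quote it) can be sketched outside SAS as well. The Python below is an illustration only, not the answerer's method; sas_quote and make_rule are hypothetical helper names, and doubling embedded quotes is an assumption about how the generated SAS literal should be escaped.

```python
def sas_quote(value):
    """Strip surrounding blanks, double any embedded quotes, wrap in double quotes."""
    return '"' + value.strip().replace('"', '""') + '"'


def make_rule(res_ort, kom_lan, kommun):
    """Build one SAS IF/THEN statement from three free-text cells."""
    return (
        f"if R_res_ort_txt={sas_quote(res_ort)} "
        f"and R_kom_lan={sas_quote(kom_lan)} "
        f"then R_kommun={sas_quote(kommun)};"
    )


# Values padded with blanks, as in the question.
rule = make_rule(" arild ", " skåne ", " Höganäs ")
print(rule)
```

Because the blanks are removed before the quotes are added, the generated comparison no longer contains the padding that broke the matching.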
A saner solution would be to continue using the free-text answers as data and perform your matching criteria for transformations with a left join.
proc import out=answers datafile='my-free-text-answers.xlsx';
data have;
attrib R_res_ort_txt R_kom_lan length=$100;
input R_res_ort_txt ...;
datalines4;
... whatever all those transforms will be performed on...
;;;;
proc sql;
create table want as
select
have.* ,
answers.R_kommun_answer as R_kommun
from
have
left join
answers
on
have.R_res_ort_txt = answers.res_ort_answer
& have.R_kom_lan = answers.kom_lan_answer
;
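To make the lookup-table idea concrete, here is a small Python sketch of a left join keyed on normalized text. It is an illustration under assumptions: the sample rows are invented, normalize is a hypothetical helper, and lower-casing the keys is an extra choice that the SQL above does not make.

```python
def normalize(text):
    """Collapse runs of whitespace, trim the ends, and lower-case the text."""
    return " ".join(text.split()).lower()


# Hypothetical lookup rows, padded with blanks like the spreadsheet notes.
answers = [
    {"res_ort_answer": " arild ", "kom_lan_answer": " skåne ",
     "R_kommun_answer": " Höganäs "},
]

# Index the answers by their normalized key pair.
lookup = {
    (normalize(a["res_ort_answer"]), normalize(a["kom_lan_answer"])):
        a["R_kommun_answer"].strip()
    for a in answers
}


def left_join(have_rows):
    """Keep every input row; R_kommun is None when no answer matches."""
    joined = []
    for row in have_rows:
        key = (normalize(row["R_res_ort_txt"]), normalize(row["R_kom_lan"]))
        joined.append({**row, "R_kommun": lookup.get(key)})
    return joined


have = [
    {"R_res_ort_txt": "Arild", "R_kom_lan": "Skåne"},
    {"R_res_ort_txt": "okänd", "R_kom_lan": "okänd"},
]
want = left_join(have)
print(want)
```

Keeping the answers as data means a new mapping is one more row in the lookup table rather than one more hand-written IF statement.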
I solved this by adding quotes in Excel using the Flash Fill function:
https://www.youtube.com/watch?v=nE65QeDoepc

Converting a string to float in Perl

I'm new to Perl. I am reading a CSV file using Perl. The first column of the CSV is time (which is a float). I've read the CSV and displayed the contents of the CSV successfully. Further, I wish to use the CSV data for some computations. I need the time column as an array (or any data structure). On reading the time column and storing it in an array, it is stored as a string. I wish to have a numeric array for arithmetic computations.
I've tried adding 0, multiplying by 1, and using sprintf before storing the value in the array, but I'm encountering errors.
use v5.30.0;
use strict;
use warnings;
my $file = $ARGV[0] or die;
open(my $data, '<',$file) or die;
my @timeArray;
while(my $line = <$data>){
  chomp $line;
  my @words = split ",",$line;
  #my $temp=$words[1]*1;
  my $temp=sprintf "%.6f",$words[1];
  push @timeArray,$temp;
}
Error:
Argument ""67.891947295"" isn't numeric in multiplication (*) at 3.pl line 12, <$data> line 19556.
and
Argument ""67.840034174"" isn't numeric in sprintf at 3.pl line 13, <$data> line 19555.
Also, why is the argument wrapped in a second pair of double quotes ("" "")?
It's a good idea to handle data like that with the proper module, because there are several important details that you didn't take care of. Examples:
The column values may be enclosed in quotes
The first row may contain the header names of each column
The last record in the file may or may not have an ending line break
Etc.
Read the RFC 4180 document for more information.
There are lots of modules that can parse the CSV format, for example Text::CSV. It's very easy to install, and when you use it, your string-to-double problem will disappear.
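The same point holds in other languages: a real CSV parser removes the enclosing quotes, after which numeric conversion just works. The Python sketch below illustrates that with the standard csv module. The two-column layout and the header row are assumptions (the question does not show the file), while the two time values come from the error messages above.

```python
import csv
import io

# Hypothetical file contents: quoted fields, a header row, no final newline.
SAMPLE = '"id","time"\n"a","67.840034174"\n"b","67.891947295"'


def read_times(handle, column=1, has_header=True):
    """Return one column of a CSV stream as a list of floats."""
    reader = csv.reader(handle)  # strips the enclosing double quotes
    if has_header:
        next(reader, None)       # skip the header row if present
    return [float(row[column]) for row in reader if row]


times = read_times(io.StringIO(SAMPLE))
print(times)
```

The doubled quotes in the Perl warning appear because the field itself still contains literal quote characters, which is exactly what the parser strips.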

Extract string after numerics in SAS

I need to extract the string after the numbers. The problem is that the number of digits at the front of the string is inconsistent. What I need is something similar to Flash Fill in Excel, but I'll be doing it for 100K+ rows, so Excel might not be able to handle the data. For example:
12345678aaa@mail.com
12345bbb@mail.com
123456789ccc@mail.com
I want to create another variable with the extracted string, such as the following:
aaa@mail.com
bbb@mail.com
ccc@mail.com
Is this possible?
Thank you in advance!
You can use regular expression substitution (PRXCHANGE), or a careful use of the VERIFY function.
Example:
data have;
input email $char25.; datalines;
12345678aaa@mail.com
12345bbb@mail.com
123456789ccc@mail.com
1234567890123456789012345
;
data want;
set have;
mail1 = prxchange('s/^\d+//',-1,email);
if email in: ('0','1','2','3','4','5','6','7','8','9') then
mail2 = substr(email||' ',verify (email||' ', '0123456789'));
run;
The example above should be OK,
but assuming that some email addresses could contain numbers, 123abc001@mail.com for instance, my code below should help:
data have;
input email $char25.; datalines;
12345678abc01@mail.com
12345bcde@mail.com
123456789cdefg1@mail.com
;
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_HAVE_0003 AS
SELECT t1.email,
/* want */
(substrn(t1.email,INDEXC( t1.email, SUBSTRN(COMPRESS(t1.email, 'abcdefghijklmnopqrstuvwxyz', 'k'), 1, 1))))
AS want
FROM WORK.HAVE t1;
QUIT;
First, we use the COMPRESS function to keep only the letters;
Then SUBSTRN takes the first of those letters, i.e. the first letter appearing in the email address;
After that, INDEXC returns the position of that letter in the original string;
Finally, SUBSTRN again keeps the rest of the email, starting from the position found in the previous step.
final look:
[1]: https://i.stack.imgur.com/hFftg.png
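For comparison, the "remove only the leading digits" rule from the answers above can be sketched in a few lines of Python. This is an illustration rather than a replacement for the SAS code; strip_leading_digits is a hypothetical name, and the samples reuse addresses from the examples.

```python
import re

# Anchored at the start, so digits later in the address are left alone.
LEADING_DIGITS = re.compile(r"^\d+")


def strip_leading_digits(value):
    """Drop the run of digits at the start of the string, if any."""
    return LEADING_DIGITS.sub("", value.strip())


samples = [
    "12345678aaa@mail.com",
    "12345bbb@mail.com",
    "123456789cdefg1@mail.com",   # digit inside the local part is kept
    "1234567890123456789012345",  # all digits: nothing is left
]
results = [strip_leading_digits(s) for s in samples]
print(results)
```

Anchoring the pattern with ^ is what keeps addresses such as cdefg1@mail.com intact.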

How to input string in a Matlab function

I want to write a function that loads a text file and plots its content with time. I have 20 text files so I want to be able to choose from them.
My current, non-working code:
TextFile is a generic variable
text123.txt is the actual name of one of the files I want to load
function []= PlotText(TextFile)
text(1,:)=load('text123.txt') ;
t=0:10;
plot(t,text)
end
I appreciate any help!!
Use importdata instead of load, with the appropriate delimiter. I assume you used Tab.
filename = 'num.txt';
delimiterIn = '\t';
text = importdata(filename,delimiterIn)
t=1:10;
plot(t,text);
Firstly, you can also use dlmread if your file contains only numeric data separated by the same symbol (called a delimiter) such as a comma (,), semicolon (;), space ( ), or tab ( ). This would look like:
function []= PlotText(TextFile)
text(1,:)=dlmread('text123.txt');
t=0:10;
plot(t,text)
end
Keep in mind that your code is written in a way that expects the contents of text123.txt to have 11 values in a single row. Also, if you are using multiple files, then I suggest having the file name be another input to the function:
function []= PlotText(TextFile,filename)
text(1,:)=load(filename) ;
t=0:10;
plot(t,text)
end

How can I import data from text files into Excel?

I have multiple folders, and there are multiple txt files inside each of them. I need to extract data (just a single value: Value ---> 554) from one particular txt file in each folder (individual_values.txt).
No 100 Value 555 level match 0.443 top level 0.443 bottom 4343
There will be many folders containing a txt file with the same name but a different value. Can all these values be copied to Excel, one below the other?
I have to extract a value from the txt file I mentioned above. It is a text file with the same name located inside different folders. All I want to do is extract this value from every such text file and paste it into Excel or a txt file, one value per row.
E.g., the line above is from one text file; here I have to get the value 555, and similarly the other values from the other files:
555
666
666
776
Yes.
(you might want to clarify your question )
Your question isn't very clear, I imagine you want to know how this can be done.
You probably need to write a script that traverses the folders, reads the individual files, parses them for the value you want, and generates a Comma Separated Values (CSV) file. CSV files can easily be imported to Excel.
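A minimal Python sketch of such a script is shown below. It is one possible approach under stated assumptions: every target file is named individual_values.txt and contains the word Value followed by an integer, as in the sample line. The function names collect_values and write_column are hypothetical.

```python
import csv
import os
import re

# Matches "Value 555" and captures the number.
VALUE_PATTERN = re.compile(r"\bValue\s+(\d+)", re.IGNORECASE)


def collect_values(root, target_name="individual_values.txt"):
    """Walk every folder under root and pull the Value out of each target file."""
    values = []
    for folder, subdirs, files in os.walk(root):
        subdirs.sort()  # visit sub-folders in a stable, alphabetical order
        if target_name in files:
            path = os.path.join(folder, target_name)
            with open(path, encoding="utf-8") as handle:
                match = VALUE_PATTERN.search(handle.read())
            if match:
                values.append(int(match.group(1)))
    return values


def write_column(values, out_path):
    """Write one value per row to a CSV file that Excel can open."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for value in values:
            writer.writerow([value])
```

A call such as write_column(collect_values(top_folder), "values.csv") would then produce a single-column file, with top_folder standing in for wherever the folders live.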
There are two or three basic methods you can use to get data into an Excel spreadsheet.
You can use OLE wrappers to manipulate Excel.
You can write the file in a binary form
You can use Excel's import methods to take delimited text in as a spreadsheet.
I chose the last approach because 1) it is the simplest, and 2) your problem, as stated, does not require anything more complex. The solution below outputs a tab-delimited text file that Excel can easily open.
In Perl:
use IO::File;
my @field_names = split m|/|, 'No/Value/level match/top level/bottom';
#' # <-- catch runaway quote
my $input = IO::File->new( '<data.txt' );
die 'Could not open data.txt for input!' unless $input;
my @data_rows;
while ( my $line = <$input> ) {
  # Capture name/value pairs such as "Value 555" into a hash.
  my %fields = $line =~ /(level match|top level|bottom|Value|No)\s+(\d+\S*)/g;
  push @data_rows, \%fields if exists $fields{Value};
}
$input->close();
my $tab_file = IO::File->new( '>data.tab' );
die 'Could not open data.tab for output!' unless $tab_file;
$tab_file->print( join( "\t", @field_names ), "\n" );
foreach my $data_ref ( @data_rows ) {
  # Hash slice: emit the values in the same order as the header.
  $tab_file->print( join( "\t", @$data_ref{@field_names} ), "\n" );
}
$tab_file->close();
NOTE: Excel's text processing is really quite neat. Try opening the text below (replacing the \t with actual tabs) -- or even copying and pasting it:
1\t2\t3\t=SUM(A1:C1)
I chose C# because I thought it would be fun to use a recursive lambda. This will create the CSV file containing the matches for the regex pattern.
string root_path = @"c:\Temp\test";
string match_filename = "test.txt";
Func<string,string,StringBuilder, StringBuilder> getdata = null;
getdata = (path,filename,content) => {
Directory.GetFiles(path)
.Where(f=>
Path.GetFileName(f)
.Equals(filename,StringComparison.OrdinalIgnoreCase))
.Select(f=>File.ReadAllText(f))
.Select(c=> Regex.Match(c, @"value[\s\t]*(\d+)",
RegexOptions.IgnoreCase))
.Where(m=>m.Success)
.Select(m=>m.Groups[1].Value)
.ToList()
.ForEach(m=>content.AppendLine(m));
Directory.GetDirectories(path)
.ToList()
.ForEach(d=>getdata(d,filename,content));
return content;
};
File.WriteAllText(
Path.Combine(root_path, "data.csv"),
getdata(root_path, match_filename, new StringBuilder()).ToString());
No.
just making sure you have a 50/50 chance of getting the right answer
(assuming it was a question answerable by Yes and No) hehehe
File_not_found
Gotta have all three binary states for the response.
