Read certain lines of data from text files in SAS - text

I have a text file c:\test.txt with a lot of information, and I need only a few details from it. Here is how my data looks:
an army of ants
bchskkkk/kk/kl
id: intyst#abc.com
subject: this is an email
to xyz
kkdkdlkadkadk;kd;
jjdjsjdlasjdaljdljd
<st> This is my actual content
klfjaakjalkjflajflajefljalkfj
daklkajflkjalfkjaljflkajfkl
kkdlkal;dka;ldk <st>
In the above, the rows starting with id and subject, and the rows between the opening and closing <st> markers, are what I need in my dataset.
This is what I have tried:
filename data 'c:\test.txt';
data want;
infile data lrecl=1000 missover;
input #3 $ id 3-25 #4 $ sub1 10-25 #8 $ cmc 4-55
run;
The above doesn't serve my purpose. I have some 10k lines in each file in the above format, and the text between the <st> markers can span more than 10 rows.
Is there any better way of solving this?
Thank you

You cannot grab the data using a single input statement.
You will need to do 'landmark' detection in order to discover and extract the data portions you want, all the while retaining the parts you find as you scan the file.
Presuming id: always marks the start of a new data set row, and the <st> block is always present as the final part of a row, the following could work (not tested):
data messages;
  length id $50;
  length subject $100;
  length message $1000;
  * retained values survive from the id: line until the <st> block closes;
  retain id subject message id_line_number;
  infile data lrecl=1000 _infile_=line;
  input;
  if line =: "id:" then do;
    id_line_number = _n_;
    id = substr(line,length("id:")+1);
    subject = "";
    message = "";
  end;
  if subject = "" and line =: "subject:" then do;
    subject = substr(line,length("subject:")+1);
  end;
  if message = "" then do;
    if line =: "<st>" then do;
      * initial line in <st> block;
      message = line;
    end;
  end;
  else do;
    * accumulate lines within <st> block;
    message = trim(message) || ' ' || line;
  end;
  * termination of <st> block, triggers a complete record and output to data set;
  if length(message)>4 and substr(message,length(message)-3) = "<st>" then do;
    message = substr(message,5,length(message)-8);
    output;
    message = "";
  end;
run;
Some additional coding would be needed if the subject can be wrapped and is continued in subsequent adjacent lines as indented content
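For readers who want to try the landmark logic outside SAS, here is a minimal Python sketch of the same state machine. It is not a translation of the data step above; the function name parse_messages and the dictionary keys are made up for illustration, and it assumes the <st> marker opens a block at the start of a line and closes it at the end of a line, as in the sample data.

```python
MARK = "<st>"


def parse_messages(lines):
    """Return one dict per id:/subject:/<st>...<st> group found in `lines`."""
    records = []
    rec_id = subject = ""
    block = None  # None while outside an <st> block, else the collected lines
    for raw in lines:
        line = raw.strip()
        if block is None:
            if line.startswith("id:"):
                # A new id: line starts a new record and clears the old subject.
                rec_id, subject = line[len("id:"):].strip(), ""
            elif line.startswith("subject:"):
                subject = line[len("subject:"):].strip()
            if not line.startswith(MARK):
                continue
            # Opening marker found: start collecting, minus the marker itself.
            block, line = [], line[len(MARK):]
        closing = line.endswith(MARK)
        if closing:
            line = line[:-len(MARK)]
        block.append(line.strip())
        if closing:
            # Closing marker completes the record, like OUTPUT in the data step.
            message = " ".join(part for part in block if part)
            records.append({"id": rec_id, "subject": subject, "message": message})
            block = None
    return records
```

Calling parse_messages(open(path)) on a file shaped like the sample should yield one dictionary per id: group, with the block lines joined by single spaces.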

Related

Batch Bookmark & Extraction of Bookmarked lines into MULTIPLE new file in EmEditor

I would like to ask a follow-up to this question: How to search and bookmark multiple strings at once in EmEditor?
@Yutaka gave this response: https://i.stack.imgur.com/HLYoo.png and that prompts my follow-up question.
How can the lines for each bookmarked string be saved or opened separately in EmEditor when using "extract bookmark lines to new file"?
I would appreciate your response.
I expect the result for each bookmarked string to open in a different file in EmEditor.
Batch Extract
The Batch Extract button in the Batch Find dialog will output all matched lines to a file, with labels indicating the entry that the lines matched with.
Example output:
===== a =====
a
apple
===== b =====
b
Macro
This macro finds the given array of strings with the extract option, and it saves each result to separate files. Make sure to change the marked lines.
var filePath = document.FullName;
var strings = ["a", "b", "e"]; // Change these
for (var i = 0; i < strings.length; i++) {
editor.OpenFile(filePath);
document.selection.Find(strings[i], eeFindSelectAll | eeFindExtract, 0);
document.Save("%HOMEPATH%\\Downloads\\" + strings[i] + ".txt"); // Change path
}
I tested it on this file. Make sure to save it before running the macro.
a
b
c
d
apple

PyParsing: using SkipTo(), labeled data and possibly a Forward()

I am trying to parse an input file given the following format.
file = "Begin 'big section header'
#... section contents ...
sub 1: value
sub 2: value
....
Begin 'interior section header'
....
End 'interior section header'
End 'big section header'"
to return a list that greedily grabs everything between the labeled section header value
['section header', ['section contents']]
my current attempt looks like this
import pyparsing as pp
begin = pp.Keyword('Begin')
header = pp.Word(pp.alphanums+'_')
end = pp.Keyword('End')
content = begin.suppress() + header + pp.SkipTo(end + header)
content.searchString(file).asList()
returns
['section header', ['section contents terminated at the first end and generic header found']]
I suspect my grammar needs to be changed to some form of
begin = pp.Keyword('Begin')
header = pp.Word(pp.alphanums+'_')
placeholder = pp.Forward()
end = pp.Keyword('End')
placeholder << begin.suppress() + header
content = placeholder + pp.SkipTo(end + header)
but I can't for the life of me figure out the correct assignment to the Forward object that doesn't give me what I already have.
Even easier than Forward in this case would be to use matchPreviousLiteral:
content = begin.suppress() + header + pp.SkipTo(end + pp.matchPreviousLiteral(header))
You are matching any end, but what you want is the end that matches the previous begin.
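The "End must repeat the header that Begin introduced" idea can also be seen with a standard-library regex backreference. The sketch below is not pyparsing and does not replace the grammar above; it assumes headers are single-quoted as in the sample input, and find_sections is a made-up helper name.

```python
import re

# \1 forces the End header to be the same text that followed Begin.
SECTION = re.compile(r"Begin\s+'([^']*)'(.*?)End\s+'\1'", re.DOTALL)


def find_sections(text):
    """Return [header, [contents]] for each outermost Begin/End pair."""
    return [[m.group(1), [m.group(2).strip()]] for m in SECTION.finditer(text)]


sample = """Begin 'big section header'
sub 1: value
Begin 'interior section header'
....
End 'interior section header'
End 'big section header'"""

sections = find_sections(sample)
```

Here the interior End line is skipped because its header differs from the one captured after the outer Begin, so the outer section keeps its whole body. Sections nested under an identical header name would still confuse this pattern.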

Split flat file into multiple files

I need to create a package that splits a huge flat file into multiple flat files.
I have a flat file with 20 million rows, and I need to split it so that each output file has 55k rows.
Example with 55 rows per file: if there are 111 rows in total, I would have to create 3 files.
file1.txt will have rows 1-55
file2.txt will have rows 56-110
file3.txt will have 1 row (row 111)
What options do I have?
I am using Visual Studio 2012 for this project.
You could try something like this. It's pretty rudimentary, and I am sure someone will point out that it is not the most efficient approach, but it's a solid option. Note that you will need to add some try/catch error handling.
// requires: using System.IO; using System.Text;
int recper = 55000; // number of records per file (55k in the question)
int reccount = 0;
int filecount = 1;
string filename = "testfilename";
string networkDirectory = @"c:\fakepath\";
string fileToRead = @"c:\fakepath\textfile.txt";
using (StreamReader reader = new StreamReader(fileToRead, Encoding.Default, true))
{
    while (reader.Peek() >= 0)
    {
        // append the current line to the current output file
        using (StreamWriter writer = new StreamWriter(Path.Combine(networkDirectory, filename + filecount + ".txt"), true, Encoding.Default))
        {
            writer.WriteLine(reader.ReadLine());
        }
        reccount++;
        // checks on each iteration of the while loop to see if the
        // current record count matches the number of records per file;
        // if so, reset reccount and increment filecount to change the file name
        if (reccount == recper)
        {
            reccount = 0;
            filecount++;
        }
    }
}
Another way you can do this is in a dataflow:
First use your method of choice to add a "Row Number" column to your data flow (unless there is already one in your flat file output, in which case skip this step and use that):
https://www.google.com/search?sourceid=navclient&aq=&oq=add+rownumber+column+to+ssis+dataflow&ie=UTF-8&rlz=1T4GGNI_enUS551US551&q=add+rownumber+column+to+ssis+dataflow&gs_l=hp....0.0.0.6218...........0._Qm62-0x_YQ
Then add a Conditional Split transformation to your dataflow (a Multicast would send every row to every output), and use the row number to split the stream and send it to different destinations:
Rows 1 - 55,000 -> File1
Rows 55,001 - 110,000 -> File2
etc.
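The row-number arithmetic behind both answers is small enough to sketch outside SSIS. The following Python is an illustration only, not part of either answer: file_number and split_file are hypothetical helpers, UTF-8 text is assumed, and the output names follow the file1.txt, file2.txt pattern from the question.

```python
from itertools import islice
from pathlib import Path


def file_number(row, rows_per_file=55_000):
    """Map a 1-based row number to the 1-based output file number."""
    return (row - 1) // rows_per_file + 1


def split_file(source, out_dir, rows_per_file=55_000, stem="file"):
    """Stream `source` into <stem>1.txt, <stem>2.txt, ... and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    created = []
    with open(source, "r", encoding="utf-8", newline="") as src:
        while True:
            # Pull at most rows_per_file lines; only one chunk is held in memory.
            chunk = list(islice(src, rows_per_file))
            if not chunk:
                break
            target = out_dir / f"{stem}{len(created) + 1}.txt"
            with open(target, "w", encoding="utf-8", newline="") as dst:
                dst.writelines(chunk)
            created.append(target)
    return created
```

With 55 rows per file, row 55 maps to file 1, row 56 to file 2, and row 111 to file 3, which matches the example in the question.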

My Excel file of CSV data from Google Apps Script doesn't mirror my Google Spreadsheet

I'm trying to write a script that passes information from a Google Spreadsheet, compiles it into a CSV file and emails that file.
My problem: the CSV file, when opened in Excel, looks very different from my Google Spreadsheet (dead link).
This is what my Excel file looks like, pasted into another Google Spreadsheet.
The code I am using is below:
function myFunction() {
//get active sheet, the last row where data has been entered, define the range and use that range to get the values (called data)
var sheet = SpreadsheetApp.getActiveSheet();
var lastRow=sheet.getLastRow();
var range = sheet.getRange(1,1,lastRow,91);
var data = range.getValues();
//define a string called csv
var csv = "";
//run for loop through the data and join the values together separated by a comma
for (var i = 0; i < data.length; ++i) {
csv += data[i].join(",") + "\r\n";
}
var csvFiles = [{fileName:"example.csv", content:csv}];
MailApp.sendEmail(Session.getUser().getEmail(), "New Journey Information", "", {attachments: csvFiles});
}
You need to ensure that individual cells' data is atomic. For instance, what you see as a time on the sheet contains a Date object when read by your script, then when that's written to a CSV it may be converted to a date & time string with commas, depending on your locale. (Jan 4, 2013 14:34 for example.) To be safe with punctuation that may be interpreted as delimiters by Excel, you should enclose each element with quotes.
There is no need to patch your code by hand, as this problem has been solved in one of the examples provided in the Docslist Tutorial. An easy solution is to replace your code with that example.
Change the first bit of the provided saveAsCSV() as follows, and it will operate either with user input or by passing a filename as a parameter.
function saveAsCSV(filename) {
// Prompts the user for the file name, if filename parameter not provided
var fileName = filename || Browser.inputBox("Save CSV file as (e.g. myCSVFile):");
...
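To see why quoting every element protects values that contain commas, here is a small Python sketch using the standard csv module. It only illustrates the quoting rule; it is not Apps Script and not the tutorial's saveAsCSV(), and rows_to_csv is a made-up name.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows with every field quoted and CRLF line endings."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\r\n")
    writer.writerows(rows)
    return buf.getvalue()


# The date contains a comma; unquoted, it would split into two columns.
text = rows_to_csv([["Jan 4, 2013 14:34", 'say "hi"', 3]])
```

Embedded double quotes are doubled inside the quoted field, which is the convention Excel expects when it reads the file back.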

Formatting strings in a text field in Crystal Reports XI

Good morning!
I'm hoping someone can help me with what is probably a simple question, but I just can't get things to work the way I would like. I am currently working with a report that has a "catch-all" text field. In this field the user can input anything they want and, on occasion, will input information in a column format (ex. 1.1). This field allows carriage returns, which are used to create "rows" in this field. The problem is that the user wants the "columns" in this field to line up on the report without having to count spaces between the columns when the information is entered (ex. 1.2). A further complication is that even when this type of information is entered, there is no set protocol or formatting guideline. There may be multiple "subtitles", rows, rows separated by subtitles, etc. The carriage return (Chr(10)) at the end of each line (or beginning of each new line) is the only thing that can be relied on consistently.
I am currently trying to separate each individual row, format each as desired, and put it back together like so:
Dim output As String
Dim sections as String
Dim returnCount as Int
Dim leftText as String
Dim rightText as String
Dim sectionTogether as String
Dim totalText as String
Dim textLength as Int
output = {table.textfield}
sections = ExtractString(output, Chr(10), Chr(10))
If Instr(sections," ") > 0 Then
leftText = Left(sections, Instr(sections, " "))
textLength = Length(Left(sections, Instr(sections, " "))
rightText = Right(sections, Instr(sections, " "))
Replace(sections," "," ")
sectionTogether = rightText + (Space(20) - (textLength - 3)) + leftText
totalText = totalText + sectionTogether
formula = totalText
else
formula = output
This is the gist of what I'm trying to do. A couple of notes:
1) I know I am missing a loop of some kind to format every section, but I don't know how to set that up in Crystal.
2) I have VB programming experience, but I am new to Crystal and its limited tools, so I feel hamstrung and am having trouble finding the methods and tools I would use in Visual Studio.
3) My syntax may also be off in a few places because I am still learning how to set this up, and I REALLY miss a debugger.
I hope someone can help me, I have been researching for over a week and it feels like I'm just beating my head against a wall.
Thank you in advance.
The output examples
ex. 1.1
"Your current charges are:
Jan 12.89
Feb 117.44
Mar 15.02
Apr 4.17"
ex. 1.2
"Your current charges are:
Jan 12.89
Feb 117.44
Mar 15.02
Apr 4.17"
The first thing you're going to want to do is split up your rows via split(), like this:
local stringvar array wallOText := split({table.textfield},chr(10));
This will give you an array of strings where each array entry is a "row". Now you can loop over your rows with the following:
for i := 1 to ubound(wallOText) step 1 do <some stuff to wallOText[i]>
Now, getting the columns right-justified is a little trickier, but here's some code to get you started. You can adjust the column widths to whatever you may need (in this case, 20 spaces). Also note, you have to use a fixed-width font.
local stringvar output;
local stringvar array row;
local numbervar i;
local numbervar j;
local stringvar array wallOText := split({@some text},chr(10));
local stringvar elem;
for i := 1 to ubound(wallOText) do //loop over lines
(row := split(wallOText[i]," ");
for j := 1 to ubound(row) do //loop over words
(elem := trim(row[j]); //get current element and cut white space
if not(elem="") then
output := output + space(20-len(elem)) + elem); //build output string
output := output + chr(10));
output
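The padding logic in the formula above can be checked quickly in Python before porting it to Crystal. This is a sketch of the same idea (split into rows, split each row into words, right-justify each word in a fixed-width cell), not Crystal syntax; right_justify is a hypothetical name and the default width of 20 simply mirrors the formula.

```python
def right_justify(text, width=20):
    """Right-justify every whitespace-separated word in a fixed-width cell."""
    out_lines = []
    for line in text.split("\n"):
        # split() with no argument drops the empty strings left by repeated spaces
        words = line.split()
        out_lines.append("".join(word.rjust(width) for word in words))
    return "\n".join(out_lines)


aligned = right_justify("Jan 12.89\nFeb 117.44", width=8)
```

As with the Crystal version, the columns only line up visually in a fixed-width font, and a subtitle line such as "Your current charges are:" is padded word by word, so it may need separate handling.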

Resources