SAS - Execute Find and Replace in an Excel File via DDE

I am trying to write a SAS program to do a find and replace in an Excel file via DDE. Specifically, I am trying to search the header row for spaces inside the strings (i.e., " ") and replace them with nothing (i.e., "").
For example, if I have a cell that contains "Test Name" I want to do a find and replace to make it "TestName".
This is what I have:
options noxwait noxsync;
/* Open Excel */
x '"C:\Program Files (x86)\Microsoft Office\Office14\excel.exe"';
filename cmds dde 'excel|system';
data _null_;
x=sleep(5);
run;
/* Open File and Manipulate*/
data _null_;
file cmds;
put '[open("C:\filename.xls")]';
put '[select("R1")]';
put '[formula.replace(" ","",1,,false,false)]';
run;
/*Close File*/
data _null_;
file cmds;
put '[FILE-CLOSE("C:\filename.xls")]';
put '[QUIT()]';
run;
The find and replace function is not working. I get the following in my log after it reads that statement:
NOTE: The file CMDS is:
DDE Session,
SESSION=excel|system,RECFM=V,LRECL=256
ERROR: DDE session not ready.
FATAL: Unrecoverable I/O error detected in the execution of the DATA step program.
Aborted during the EXECUTION phase.
NOTE: 2 records were written to the file CMDS.
The minimum record length was 21.
The maximum record length was 70.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
real time 0.63 seconds
cpu time 0.00 seconds
Any suggestions?
Also, does anyone know what the parameters in the formula.replace statement are? I just know the first and second are what you want to find and what you want to replace it with. I am struggling to find any documentation.

http://www.lexjansen.com/wuss/2007/ApplicationsDevelopment/APP_SmithC_ImportingExcelFiles.pdf
If you enter the path or filename incorrectly, Excel will fail to open your workbook. You will get an error message in the SAS log that looks like the following:
NOTE: The file DDECMD is:
DDE Session,
SESSION=excel|system,RECFM=V,LRECL=256
ERROR: DDE session not ready.
FATAL: Unrecoverable I/O error detected in the execution of the data step program.
Aborted during the EXECUTION phase.
NOTE: 0 records were written to the file DDECMD.
NOTE: The SAS System stopped processing this step because of errors.

My suggestion is to write an Excel (VBA) macro to do this, then call that macro. That gives you a far superior IDE to write your code in, with contextual clues and such, and it also gives you a lot more control. You can call macros from separate workbooks, so even if this is something you have to do repeatedly (I assume so), you can put it in a 'macro' template workbook that you open separately from the automated file.

Related

Unable to copy file from SFTP in Azure Data Factory when using wildcard(*) in the filename

I am unable to copy csv files from an SFTP connection to blob storage when using the wildcard(*) in the filename.
More specifically, I receive csv files in the SFTP on a daily basis, named in the format "ddMMyyyyxxxxxx.csv", where "xxxxxx" is the timestamp. For example, my csv file for the 13th of March is "13032019083647.csv", while the one for the 14th of March is "14032019083556.csv". The timestamp is different every day, so I want to copy the file regardless of whatever string sits between the date and the file extension.
In the "File" subfield of the "File path" on the "Connection" tab of my dataset, I enter "13032019*.csv", as instructed by the help icon next to the field.
When I do so, my Debug run fails with:
{"errorCode": "2200", "message":
"ErrorCode=UserErrorInvalidCopyBehaviorBlobNameNotAllowedWithPreserveOrFlattenHierarchy,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Cannot
adopt copy behavior PreserveHierarchy when copying from folder to a
single file.,Source=Microsoft.DataTransfer.ClientLibrary}
I receive a similar error no matter which type of copy behaviour I choose. I have also tried experimenting with the fileFilter parameter (even though ADF warns that the same behaviour can be achieved with the fileName option), but I still end up getting the same error.
For further clarification, I am attaching the Code segment that ADF produces for this configuration:
I should also mention, that when using the full fileName in the corresponding field, namely the value: "13032019083647.csv", copying works normally.
Any help would be greatly appreciated!
My guess is that the wildcard might match two files.
In such cases we need to use a Get Metadata activity, a Filter activity and a ForEach activity to copy these files.
1. Get Metadata activity: Point this activity's dataset at the location of the files and pass the child items on to the next activity.
2. Filter activity: Use a filter to keep only the files you need.
3. ForEach activity: In the ForEach activity, take the items from the previous activity and add a copy activity inside the loop.
In the copy activity, the source dataset's file name should be @item().name.
I hope this will solve your issue.
What worked for me was the following: I kept the same wildcard pattern for the input file, but I set "Copy behaviour" to "Merge Files". Since, as mentioned, only one file satisfies the pattern, only one file was created as output. I am aware that this is a sort of "dirty" solution, but it did the trick for me.
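As a rough illustration of why the wildcard behaves like a folder-level selection rather than a single file name, here is a minimal local sketch in Python. It is not ADF code: it only mimics the filtering step with the standard-library fnmatch function. The first two file names come from the question; the third is a hypothetical second file for the same date, added to show that the pattern can match more than one name.

```python
from fnmatch import fnmatch

# The first two names are from the question; the third is a made-up
# second file for the 13th, used only to show a multi-file match.
files = ["13032019083647.csv", "14032019083556.csv", "13032019120000.csv"]
pattern = "13032019*.csv"

# Keep every name that the wildcard pattern matches.
matches = [name for name in files if fnmatch(name, pattern)]
print(matches)
```

Because the result is a list, the copy has to handle "zero, one or many" files, which is what the Filter plus ForEach arrangement and the "Merge Files" behaviour both do in their own way.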

matlab error when using xlsread in a loop

I'm trying to read multiple csv files in a loop and then perform some analysis on all of them.
I'm using MATLAB R2015b and Excel 2016.
The problem is that at the second call to xlsread I get the following error:
>>xlsread('R:\Experiments\ResoFreq_vis_BEH\TapFlick_vis_BEH\Data\s01_rr\1_fingerTapping_s01_rr.csv')
Error using xlsread (line 251)
No explanation, no message, nothing.
After some debugging I found that it fails at the following command:
Excel.workbooks.Open(filename, 0, readOnly);
in the openExcelWorkbook.m file, which is somewhere down the call stack of xlsread.
I found very few people with the same problem, and their solution was to force the EXCEL32 process to close using the following code:
[~, computer] = system('hostname');
[~, user] = system('whoami');
[~, alltask] = system(['tasklist /S ', computer, ' /U ', user]);
excelPID = regexp(alltask, 'EXCEL.EXE\s*(\d+)\s', 'tokens')
for i = 1 : length(excelPID)
    killPID = cell2mat(excelPID{i});
    system(['taskkill /f /pid ', killPID]);
end
However, this does not work for me.
After some more digging I tried to manually open the csv I'm trying to read while debugging. That is, after stopping at the breakpoint at the Excel.workbooks.Open call, I used:
actxserver('Excel.Application')
ans.Workbooks.Open(filename)
which gave me the following error:
Error using Interface.000208DB_0000_0000_C000_000000000046/Open
This interface is associated with Workbooks when looking at the Excel process through the MATLAB inspector.
That is all the information I've managed to find related to my problem.
The only thing that works for me at the moment is running xlsread, manually closing the Excel process from the Task Manager, and running it again until I have all my data, and only then analyzing it. That is not feasible considering the number of files I need to load.
I cannot use csvread because my files have mixed types, and every other function I've tried does not read the csv files properly.
(I have a field which looks like this "[,...,]", and that field keeps getting interpreted as multiple rows in every function except xlsread.)
So I feel like I have no option but to fix xlsread somehow.
I would gladly provide any more information that is necessary to solve this.
Thanks.
You should use csvread instead of xlsread, because xlsread only reads .xls and .xlsx files.

Excel in SSIS: How to import a column that may have more than 255 characters when DT_NTEXT causes failures?

OK, so my latest project requires loading an Excel 2007 spreadsheet into a SQL Server table. I'm working in SSIS 2008R2. Based on some stuff I found on the internet, I opened the Excel source in Advanced editor and changed the datatype of the long column to DT_NTEXT, so that it wouldn't truncate it. Then I made the database column VARCHAR(MAX). This runs correctly in debug mode on my laptop.
Then I deployed it to the development server and attempted to load the same test file. It failed with the following error messages:
Error: Code: 0xC0208265
Source: Main Data Flow Task Get Main Data [1]
Description: Failed to retrieve long data for column "DESCR".
End Error
Error: Code: 0xC020901C
Source: Main Data Flow Task Get Main Data [1]
Description: There was an error with output column "DESCR" (72) on output "Excel Source Output" (9). The column status returned was: "DBSTATUS_UNAVAILABLE".
End Error
Error: Code: 0xC0209029
Source: Main Data Flow Task Get Main Data [1]
Description: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "output column "DESCR" (72)" failed because error code 0xC0209071 occurred, and the error row disposition on "output column "DESCR" (72)" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.
End Error
Searching for information about the error, I found about a million sites offering the same three suggested solutions:
Add 'IMEX=1' to the extended properties of the connection string.
It was already there.
Change the TypeGuessRows key in the registry.
This was set to zero on the server, which I understand to mean that it should look at the entire file. Nevertheless, I changed it to 8 to match my laptop. The same error occurred when I ran it again. Then I changed it to 1,763, which is more than the number of rows in the spreadsheet. It still gave the same error. So, I put it back to zero. (There's a 1,900-character value in the first row of my test file, so it shouldn't really matter how many it checks, in this case.)
Change the datatype to DT_WSTR(4000) in the source.
The column is supposed to have up to 10,000 characters, so I'm not sure this would be a good idea even if it worked. However, I tried it anyway. This time it gave me a truncation error. I changed the truncation error disposition to "ignore failure" and it loaded the data, but truncated the value to 255 characters. I have verified that the length is 4000 and doesn't get changed when I save the file, but it's still truncating at 255 characters.
I have no idea what else to look at. Any help would be appreciated.
UPDATE 1/29: The package, without any changes, works correctly when running on the pre-production server. It still fails when running on the development server. Both servers have the same version of SSIS (including minor version numbers) as well as the same versions of Windows, Access and Excel. I do not know how to explain this, nor do I know how to tell if it would work in production.
I created a new package with similar non-functional requirements (Excel 2007 file, SSIS 2008, SQL Server 2008 R2, VARCHAR(MAX) target column) and it worked just fine after deployment into the database server. My package:
Metadata at the Excel Source component's output (checked using Advanced Editor): DT_NTEXT
Derived Column component between source and destination to cast to non-unicode from unicode using (DT_TEXT,1252)
Metadata at the OLE DB Destination component's input (checked using Advanced Editor): DT_TEXT
Target Column data type: VARCHAR(MAX)
I do not explicitly use the extended property IMEX in the connection
Executed by right-clicking on the package at the database server, and loaded a file with a few thousand characters per record into the table without truncation. Hope this helps
I have faced this issue while importing an Excel file with a field containing more than 255 characters. I solved the issue using Python.
Simply import the Excel file into a pandas DataFrame and calculate the length of the string value in each row.
Then sort the DataFrame by that length in descending order. This lets SSIS allocate the maximum space for that field, because the driver only scans the first few rows (TypeGuessRows, 8 by default) to decide how much storage to allocate:
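import pandas as pd

# f is the path of the source workbook, defined earlier in the script
df = pd.read_excel(f, sheet_name=0, skiprows=1)
df = df.drop(df.columns[[0]], axis=1)  # drop the first column
# Put the longest descriptions first so the driver samples them
df['length'] = df['Item Description'].str.len()
df = df.sort_values('length', ascending=False)
# The context manager saves the file on exit (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('Clean/Cleaned_' + f[5:]) as writer:
    df.to_excel(writer, sheet_name='Billing', index=False)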

Proc Data sets argument error- Error 22-322 expecting a name

I'm not sure how to use proc datasets statement. Here is the error and code attached as a picture.
I just don't know what it wants when it says error 22-322 expecting a name. A simple example or solution would be great thanks.
There are several issues with your syntax:
Proc datasets expects a library name, but you've given it a dataset name. Try using library = work;.
In conjunction with the above, you need to add the line modify passengers; before the format statement so that proc datasets knows which dataset to modify. Otherwise, it will run without errors, but it won't apply the format.
You need a quit; after the run; when using proc datasets, as mentioned in your log output. This is because a proc datasets call can contain multiple run; groups, so you need to indicate that you've got to the last one.
You also have the option to put the format statement somewhere else, which would avoid having to use proc datasets at all:
The data step where you're creating work.passengers, or
The proc print where you're viewing it, if you don't want to apply the format permanently.

Collecting return code and stdout string from running SAS program in Linux KornShell script

Some developers and I are using KornShell (ksh) to run SAS programs in a Linux environment. The script invokes a SAS command line and I wish to collect the stdout from the SAS execution (a string defined and written by SAS) as well as the Linux return code (0/1).
My Code (collects stdout into envar, but return_code is always 0 because the envar assignment was successful):
envar=$(./sas XXXX/filename.sas -log $LOG_FILE)
return_code=$?
Is there a way to collect both the return code and the std out without having to submit this command twice?
SAS does not write anything to STDOUT when it is run as a non-interactive process. The log file contains the record of statements executed and step statistics; "printed" output (such as from proc print) is written to a "listing" file. By default, that file will be created using the name of your source file appended with ".lst" (in your case, filename.lst).
You are providing a file to accept the log output using the -log system option. The related option to define the listing file is the -print option. Of course, if the program does not create any listing output, such an option isn't needed.
And as you've discovered, the value returned by $? is the execution return code from SAS. Any non-zero value will indicate some sort of error occurred during program execution.
If you want to influence the return code, you can use the ABORT data step statement in your SAS program. That will immediately halt the SAS program and set the return code to something meaningful to you. For example, suppose you want to terminate further processing if a particular PROC SQL step fails:
data _null_;
rc = symgetn('SQLRC');
put rc=;
if rc > 0 then ABORT RETURN 10;
run;
This would set the return code to 10 and you could use your outer script to send an email to the appropriate person. Such a custom return code value must be greater than 6 and less than 976; other values are reserved for SAS. Here is the SAS doc link.
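On the original question of getting both values from one invocation: in POSIX shells, including ksh, the exit status of a plain assignment from a command substitution is the exit status of the substituted command, so the two lines in the question should already capture both. The following is a minimal sketch that checks this from Python. It assumes an sh binary is on the PATH, and the child command is a stand-in that prints a line and exits with 10 (mirroring ABORT RETURN 10); it is not a real SAS invocation.

```python
import subprocess

# Stand-in for the SAS call: the inner sh prints one line and exits with 10.
# (Hypothetical command; the real script would run ./sas here.)
script = '''
envar=$(sh -c 'echo "message from child"; exit 10')
return_code=$?
echo "$envar|$return_code"
'''

# Run the same two-line pattern as in the question and read back both values.
result = subprocess.run(["sh", "-c", script], capture_output=True, text=True)
captured, return_code = result.stdout.strip().rsplit("|", 1)
print(captured, return_code)
```

If return_code still comes back as 0 in the real script, that suggests the SAS run itself is ending with status 0, not that the assignment is masking it.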
