Oracle Table to SAS Dataset - Linux

I am facing a problem in converting a large Oracle table to a SAS dataset. I did this earlier and the method worked. However, this time it is giving me the following error messages:
SAS code:
option compress = yes;
libname sasdata ".";
libname myora oracle user=scott password=tiger path=XYZDATA ;
data sasdata.expt_tabl;
set myora.expt_tabl;
run;
Log file:
You are running SAS 9. Some SAS 8 files will be automatically converted
by the V9 engine; others are incompatible. Please see
http://support.sas.com/rnd/migration/planning/platform/64bit.html
PROC MIGRATE will preserve current SAS file attributes and is
recommended for converting all your SAS libraries from any
SAS 8 release to SAS 9. For details and examples, please see
http://support.sas.com/rnd/migration/index.html
This message is contained in the SAS news file, and is presented upon
initialization. Edit the file "news" in the "misc/base" directory to
display site-specific news and information in the program log.
The command line option "-nonews" will prevent this display.
NOTE: SAS initialization used:
real time 1.63 seconds
cpu time 0.03 seconds
1 option compress = yes;
2 libname sasdata ".";
NOTE: Libref SASDATA was successfully assigned as follows:
Engine: V9
Physical Name: /******/dibyendu
3 libname myora oracle user=scott password=XXXXXXXXXX path=XYZDATA ;
NOTE: Libref MYORA was successfully assigned as follows:
Engine: ORACLE
Physical Name: XYZDATA
4 data sasdata.expt_tabl;
5 set myora.expt_tabl;
6 run;
NOTE: There were 6422133 observations read from the data set MYORA.EXPT_TABL.DATA.
ERROR: Expecting page 1, got page -1 instead.
ERROR: Page validation error while reading SASDATA.EXPT_TABL.DATA.
ERROR: Expecting page 1, got page -1 instead.
ERROR: Page validation error while reading SASDATA.EXPT_TABL.DATA.
ERROR: File SASDATA.EXPT_TABL.DATA is damaged. I/O processing did not complete.
NOTE: The data set SASDATA.EXPT_TABL.DATA has 6422133 observations and 49 variables.
ERROR: Expecting page 1, got page -1 instead.
ERROR: Page validation error while reading SASDATA.EXPT_TABL.DATA
ERROR: Expecting page 1, got page -1 instead.
ERROR: Page validation error while reading SASDATA.EXPT_TABL.DATA.
ERROR: Expecting page 1, got page -1 instead.
2 The SAS System 21:40 Monday, April 1, 2013
ERROR: Page validation error while reading SASDATA.EXPT_TABL.DATA.
ERROR: Expecting page 1, got page -1 instead.
ERROR: Page validation error while reading SASDATA.EXPT_TABL.DATA.
NOTE: Compressing data set SASDATA.EXPT_TABL.DATA decreased size by 78.88 percent.
Compressed is 37681 pages; un-compressed would require 178393 pages.
ERROR: File SASDATA.EXPT_TABL.DATA is damaged. I/O processing did not complete.
NOTE: SAS set option OBS=0 and will continue to check statements. This might cause NOTE: No observations in data set.
NOTE: DATA statement used (Total process time):
real time 8:55.98
cpu time 1:39.33
7
ERROR: Errors printed on pages 1,2.
NOTE: SAS Institute Inc., SAS Campus Drive, Cary, NC USA 27513-2414
NOTE: The SAS System used:
real time 8:58.67
cpu time 1:39.40
This is running on a Red Hat Linux server.
Any suggestions would be appreciated.
Thanks and regards,

This sounds like a space issue on your server. How large is the file system in your default directory (from your libname sasdata '.'; statement)? Use the data set option obs=1 on your Oracle table reference to create a new SAS dataset with one row and inspect the variables.
data sasdata.dummy_test;
set myora.expt_tabl(obs=1);
run;
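A quick PROC CONTENTS on that one-row dataset will then list each variable's type and length, which should reveal whether any character columns came across extremely wide:
proc contents data=sasdata.dummy_test; /* the one-row test dataset created above */
run;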
Perhaps there are extremely large VARCHAR or BLOB columns that are consuming too much space. Remember that SAS does not have a VARCHAR type, so character variables are stored at their full declared length.
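If a long character column does turn out to be the culprit, one possible workaround is sketched below; it assumes SAS/ACCESS to Oracle and its DBMAX_TEXT= data set option for long character types (e.g. CLOB, LONG), and the 500-byte cap is only an illustrative value:
data sasdata.expt_tabl;
    /* cap long character columns at 500 bytes - illustrative value only */
    set myora.expt_tabl(dbmax_text=500);
run;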

Though I am not totally sure, I believe the main issue was that I was initially trying to create/write the dataset in a directory that was restricted in some way. This was indirectly causing trouble, since the dataset that was created was defective. When I created it elsewhere, it was fine.
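In other words, the fix was just to point the output libref at a location with enough free space and write permission; a minimal sketch, where /bigdata/dibyendu stands in for such a directory (hypothetical path):
/* hypothetical writable directory with enough free space */
libname sasdata '/bigdata/dibyendu';
data sasdata.expt_tabl;
    set myora.expt_tabl;
run;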
Thanks and regards,
Dibyendu

Related

Load Azure Blob CSV in SSIS

I am trying to load a Blob CSV through SSIS into a Flat File destination.
You can see in my source editor that the columns are appearing (so it is reading my CSV blob file).
When I run the SSIS package, it returns these errors:
[Azure Blob Source] Error: The remote server returned an error: (400) Bad Request.
Warning: SSIS Warning Code DTS_W_MAXIMUMERRORCOUNTREACHED. The Execution method succeeded, but the number of errors raised (2) reached the maximum allowed (1); resulting in failure. This occurs when the number of errors reaches the number specified in MaximumErrorCount. Change the MaximumErrorCount or fix the errors.
What am I doing wrong here?

Google Colaboratory : OSError: [Errno 5] Input/output error

I am using Google Colaboratory and mounting Google Drive. When I access a csv file, it gives me the following error:
OSError: [Errno 5] Input/output error.
This did not happen before.
How can I access the csv file as I used to?
I have tried this, but it did not work:
Input/output error while using google colab with google drive
This happened after running the following code:
for segment_id in tqdm(range(segment_num)):
    with h5py.File(os.path.join(INPUT_PATH, "train.h5"), "r") as f:
        train_answers.append(f['time_to_failure'][segment_id*segment_interval + SEGMENT_LENGTH])
The tqdm bar progressed to 37%, and then gave the following error.
OSError: Unable to open file (file read failed: time = Thu May 2 14:14:09 2019
, filename = './drive/My Drive/Kaggle/LANL-Earthquake-Prediction/input/train.h5', file descriptor = 74, errno = 5, error message = 'Input/output error', buf = 0x7ffc31926d00, total read size = 8, bytes this sub-read = 8, bytes actually read = 18446744073709551615, offset = 0)
Since then, large files like train.csv (9 GB) on Google Drive cannot be read from Google Colaboratory. It gives the following error:
OSError: [Errno 5] Input/output error
Does anyone have the same problem?
Does anyone know how to solve this?
There are quotas set by Google which are not necessarily shown while using Colab. I have run into the same problem. Basically, once the limit is passed you get the [Errno 5] Input/output error regardless of the file or the operation you were doing.
The problem seems to have been solved since I asked to increase the storage quota (limited to 1 TB total per week).
You can access the quota page by visiting this page and clicking on quota:
https://cloud.google.com/docs/quota
If you don't ask to increase the quota, you might have to wait 7-14 days until your usage is reset to 0 and you can use the full quota again.
I hope this helps!
I've encountered the same error (during overly intensive testing of transfer learning). According to Google, the reason may be too many I/O operations on small files, or shared and more intensively used resources - every reason related to the use of Google Drive. Usually the quota should be refreshed after about one day.
You may also try another solution (for impatient users like me) - copy your resources (in my case a zipped folder data containing folders train and validation with images) as a zip file to your Google Drive and then unzip it directly into the Colab VM with:
!unzip -qq '/content/grive/My Drive/CNN/Datafiles/data.zip'
You can then access the data from folder /content/data/... (and say Goodbye to the I/O Error ;) )

Spreadsheet::WriteExcel set_optimization() generates unlink errors

We had been using Spreadsheet::WriteExcel for a long time and it was working like a charm.
A few years ago we migrated to Excel::Writer::XLSX, which uses 5 times more memory than WriteExcel, as stated in the documentation.
Thanks to XLSX, users are now able to generate larger Excel files.
A few weeks ago we started to face memory usage issues where about 84% of the server memory was needed.
The same documentation states that $workbook->set_optimization() should solve the problem. The given performance figures are promising.
We tried to use $workbook->set_optimization() on a sample file but this did not work. It generates an unlink error.
If set_optimization() is removed, the Excel file is generated properly.
The example below is provided by the author in this thread:
#!/usr/bin/perl -w
use strict;
use Excel::Writer::XLSX;
my $workbook = Excel::Writer::XLSX->new('test.xlsx');
$workbook->set_optimization();
my $worksheet = $workbook->add_worksheet();
my @header_values = ( 1, 2, 3, 'foo', 'bar', 6, 7 );
my $header_cnt = 0;
for my $header_cell (@header_values) {
    $worksheet->write(0, $header_cnt, $header_cell);
    $header_cnt++;
}
$workbook->close();
Error unlinking file /opt/.../rKhGTRYWSJ using unlink0 at /usr/local/share/perl5/Excel/Writer/XLSX/Worksheet.pm line 204
(in cleanup) Error unlinking file /opt/.../iGr8Qo8VBD using unlink0 at /usr/local/share/perl5/Excel/Writer/XLSX/Worksheet.pm line 204
We are running:
Excel-Writer-XLSX 0.70
perl v5.10.1
Red Hat Enterprise Linux Server release 6.8 (Santiago)
Any help would be appreciated.
Excel::Writer::XLSX is used to write large amounts of data to XLSX files; to handle large data and reduce memory usage, the set_optimization() method is used.
In an XLSX file, a worksheet can have a maximum of 1,048,576 rows and 16,384 columns; if the row count exceeds that limit, a new sheet can be created in the same workbook, and in that way large amounts of data can be handled.
Refer to "Write_largeData_XLSX.pl" at https://github.com/AarthiRT/Excel_Writer_XLSX for more details.

How to resolve VSAM file status error code 93?

When I try to access a VSAM sequential dataset (which is also open in CICS) from batch, I use EXTEND mode to open the file and append some data to it.
Earlier it was working fine. All of a sudden it stopped working, and I am getting file status 93, which means "Resource not available".
OPEN EXTEND <filename>
For KSDS datasets I have used EXCI (External CICS Interface) calls to access them from batch even though they were open online.
But I do not know how to do the same for ESDS.
Could someone help me resolve this error?

ParseExceptions when using HQL file on HDInsight

I'm following this tutorial http://azure.microsoft.com/en-us/documentation/articles/hdinsight-use-hive/ but have become stuck when changing the source of the query to use a file.
It all works happily when using New-AzureHDInsightHiveJobDefinition -Query $queryString, but when I try New-AzureHDInsightHiveJobDefinition -File "/example.hql", with example.hql stored in the "root" of the blob container, I get ExitCode 40000 and the following in standarderror:
Logging initialized using configuration in file:/C:/apps/dist/hive-0.11.0.1.3.7.1-01293/conf/hive-log4j.properties
FAILED: ParseException line 1:0 character 'Ã?' not supported here
line 1:1 character '»' not supported here
line 1:2 character '¿' not supported here
Even when I deliberately misspell the hql filename, the above error is still generated along with the expected file-not-found error, so it's not the content of the hql that's causing the error.
I have not been able to find the hive-log4j.properties in the blob store to see if it's corrupt. I have torn down the HDInsight cluster, deleted the associated blob store, and started again, but ended up with the same result.
Would really appreciate some help!
I am able to induce a similar error by putting a UTF-8 or Unicode encoded .hql file into blob storage and attempting to run it. Try saving your example.hql file as 'ANSI' in Notepad (open it, then Save As; the encoding option is at the bottom of the dialog), then copy it to blob storage and try again.
If the file is not found by Start-AzureHDInsightJob, then that cmdlet errors out and does not return a new AzureHDInsightJob object. If you had a previous instance of the result saved, then the subsequent Wait-AzureHDInsightJob and Get-AzureHDInsightJobOutput would be referring to a previous run, giving the illusion of the same error for the not-found case. That error should definitely indicate a problem reading a UTF-8 or Unicode file when one is not expected.
