SAS and Oracle: unwanted string-to-CLOB conversion

We have some SAS programs that used to run fine on our old Oracle database.
We've migrated to a new Oracle DB on Amazon RDS (I don't know if that's relevant) and a new SAS server instance (from 9.1 to 9.3, also not sure if relevant).
When running our programs, we constantly face an issue where string columns, when uploaded to Oracle with a proc sql or data step, are converted (seemingly at random) to CLOB or LOB.
Our strings do not exceed the maximum length Oracle allows for the varchar type; they are actually pretty short. Still, the data gets uploaded as CLOB, and that affects our whole process for reading the data.
We have found this workaround, but I'm not a fan of it:
data oracledb.new_data;
length REGION $ 50;
format REGION $char50.;
set old_data;
run;
The fact is that many, many string columns get randomly converted to CLOB.
Do you know how to solve this issue? Does it come from the Oracle side (I doubt it) or from the SAS side (and if so, what has changed)?
Thanks for your help. I hope I have provided enough information.

OK, I found the error.
If you don't specify the output variable lengths while transforming the data, the SAS datasets can automatically inherit very long character formats.
Here is the fix:
http://support.sas.com/kb/24/804.html
This reduces each character variable's length to the minimum needed without losing data. When the data is uploaded to Oracle, the column types are adapted accordingly.
NB: I just reduced the size of my dataset by 80%. I recommend this macro to everyone.
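To double-check the result on the Oracle side, you can query the data dictionary and confirm the columns now come through as VARCHAR2 rather than CLOB. A minimal check, assuming the uploaded table is called NEW_DATA (substitute your own table name):

SELECT column_name, data_type, data_length
FROM user_tab_columns
WHERE table_name = 'NEW_DATA';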
Follow-up question (please answer in comments):
This changes only the format, not the informat. Is that an issue, and can it affect performance?
Thanks!

Related

How does Cassandra store variable data types like text

My assumption is that Cassandra stores fixed-length data in a column family, e.g. a column family with id (bigint), age (int), description (text), picture (blob). Now description and picture have no length limit. How does Cassandra store these? Does it externalize them through an ID -> location lookup?
For example, it looks like relational databases use a pointer to the actual location of large text values. See how it is done
Also, it looks like MySQL recommends char over varchar for better performance, I guess simply because there is no need for an "id lookup". See: mysql char vs varchar
Cassandra stores individual cells (column values) in its on-disk files ("sstables") as a 32-bit length followed by the data bytes. So string values do not need to have a fixed size, nor are they stored as pointers to other locations - the complete string appears as-is inside the data file.
The 32-bit length limit means that each "text" or "blob" value is limited to 2GB in length, but in practice you shouldn't use anything even close to that - the Cassandra documentation suggests you shouldn't go beyond about 1MB. There are several problems with having very large values:
Because values are not stored as pointers to some other storage, but rather inline in the sstable files, these large strings get copied around every time sstable files get rewritten, namely during compaction. It would be more efficient to keep the huge string on disk in a separate file and just copy around pointers to it - but Cassandra doesn't do this.
The Cassandra Query Language (CQL) does not have any mechanism for storing or retrieving a partial cell. So if you have a 2GB string, you have to retrieve it in its entirety - there is no way to "page" through it, nor a way to write it incrementally.
In Scylla, large cells will result in large latency spikes because Scylla will handle the very large cell atomically and not context-switch to do other work. In Cassandra this problem will be less pronounced but will still likely cause problems (the thread stuck on the large cell will monopolize the CPU until preempted by the operating system).
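For reference, here is a minimal CQL sketch of the kind of table described in the question (the names are illustrative); the text and blob cells are stored inline in the sstables exactly as described above:

CREATE TABLE media (
    id bigint PRIMARY KEY,   -- fixed-size value
    age int,                 -- fixed-size value
    description text,        -- variable-length, written as a length prefix plus the bytes
    picture blob             -- same storage format; keep values well under ~1MB
);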

Error- "The size (12000) given to the type 'VarChar' exceeds the maximum allowed (8000)" in Azure dataware house

I am trying to execute T-SQL in Azure Data Warehouse, and it doesn't allow me to use a data type larger than varchar(8000). Can anyone please suggest an alternative?
(The same issue happened on table creation as well; it doesn't support BLOB or LOB data types even with bulk loading or PolyBase loading, so I ended up loading trimmed data.)
You can try VARCHAR(MAX), which supports up to 2GB, but the page size in SQL is still limited to 8,000 bytes, so I'm not sure whether that will help or not. And PolyBase is limited to 1MB per row. Here is another useful SO entry.
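A minimal sketch of the VARCHAR(MAX) route in a dedicated SQL pool, assuming a hypothetical staging table (support for the MAX types has varied across Azure SQL Data Warehouse/Synapse versions, so check the current documentation for your instance):

CREATE TABLE dbo.staging_notes
(
    id   INT NOT NULL,
    note VARCHAR(MAX) NULL        -- holds the values that overflow VARCHAR(8000)
)
WITH
(
    DISTRIBUTION = ROUND_ROBIN,   -- simple distribution for a staging table
    HEAP                          -- heap storage; wide text columns and columnstore don't mix well
);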

Azure Data Factory copy data is slow

Source database: PostgreSQL hosted on Azure VM D16s_v3
Destination database: SQL Server developer edition hosted on Azure VM D4s_v3
Source database is around 1TB in size
Destination database is empty with existing schema identical to source database
Throughput is only 1 MB/s, and nothing helps (I've selected the maximum DIU). The SQL Server side doesn't have any keys or indexes at this point.
Batch size is 10000
See screenshot:
I got nailed by something similar when using ADF to copy data from an on-premises Oracle source to an Azure SQL Database sink. The same exact job performed via SSIS was something like 5 times faster. We began to suspect that something was amiss with data types, because the problem disappeared if we cast all of our high-precision Oracle NUMBER columns to less precision, or to something like integer.
It got so bad that we opened a case with Microsoft about it, and our worst fears were confirmed.
The Azure Data Factory runtime decimal type has a maximum precision of 28. If a decimal/numeric value from the source has a higher precision, ADF will first cast it to a string. The performance of the string casting code is abysmal.
Check whether your source has any high-precision numeric data, or, if you have not explicitly defined a schema, see whether you're perhaps accidentally using string.
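As a concrete illustration, you can push the precision reduction into the copy activity's source query so ADF never sees the over-precise values. This is only a sketch; the table and column names are made up, and numeric(28, 10) is just one choice that stays within ADF's 28-digit limit:

SELECT
    id,
    CAST(amount AS numeric(28, 10)) AS amount,   -- keep precision at or below 28
    created_at
FROM public.orders;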
Increase the batch size to 1000000.
If you are using the TableName option, then you should have that table selected in the Dataset dropdown box. If you are extracting with a SQL query, then check the Dataset connection, click on edit, and remove the table name.
I hit the same issue. If you select the query option and also provide a table name in the dataset, you are confusing Azure Data Factory and making it ambiguous which option it should use.

Is there an accurate way to get all of the stored procedure column dependencies in Sybase ASE?

I am currently working within a Sybase ASE 15.7 server and need a dependable way to get all of the stored procedures which are dependent upon a particular column. The Sybase system procedure sp_depends is notoriously unreliable for this. I was wondering if anyone out there had a more accurate way to discover these dependencies.
Apparently, the IDs of the columns are supposed to be stored in a bitmap in the varbinary column sysdepends.columns. However, I have not yet found a bitmask which has been effective in decoding these column IDs.
Thanks!
A tedious solution could be to parse the stored procedure code in the system table syscomments to retrieve the referenced tables and columns.
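A rough sketch of that text-search approach ('my_column' is a placeholder for the column you are tracing); expect false positives from comments and similarly named columns, and note that a reference can be missed if the text happens to be split across syscomments rows:

SELECT DISTINCT o.name
FROM sysobjects o
JOIN syscomments c ON c.id = o.id
WHERE o.type = 'P'
  AND c.text LIKE '%my_column%'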
A partial solution might be to run sp_recompile on all relevant tables, then watch master..monCachedProcedures to see changes in the CompileDate. Note that the CompileDate will only change once the stored proc has been executed after the sp_recompile (it actually gets compiled on first execution).
This would at least give you an idea of stored procedures that are in use, that are dependent on the specified table.
Not exactly elegant...
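If you want to try that route, a rough sketch follows (it assumes the MDA monitoring tables are enabled and that you have mon_role; 'my_table' is a placeholder):

exec sp_recompile my_table
-- After the procedures have been executed at least once following the
-- sp_recompile, fresh CompileDate values show up among the cached plans:
SELECT ObjectName, DBName, CompileDate
FROM master..monCachedProcedures
ORDER BY CompileDate DESC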

base64 encoding of Image Links - MySQL Fatal errors

Recently I've been working on migrating a very large client database from a shared host to a VPS (the main databases alone are around 5GB in size). While I've managed to migrate the majority of the tables successfully via the shell (after raising the packet cap to 2GB), there is one table in particular that has been giving me problems.
The issue is that I have about 20k rows in the middle of a table which will not upload either in phpMyAdmin or via the shell, because they contain base64-encoded images that keep causing MySQL to throw "max packet reached" errors even though the cap is supposedly at 2GB.
I tried reaching out to my server admin and his support team and they are also stumped about how to go about processing this data so any assistance is greatly appreciated.
Also, the table in question is only 160MB or so uncompressed, and the segment in question (split from the others) is only ~60MB, so I don't think the packet cap is the issue.
Update: I just ran the query to show packet sizes, and MySQL showed 1048576 under the max_allowed_packet value (i.e. 1MB, so the 2GB setting apparently never took effect).
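For what it's worth, 1048576 bytes is the 1MB default, which suggests the larger value was set only for the client or in a config file the server isn't reading. A quick way to check and raise it on the server, assuming you have the privileges to set global variables (note the server-side maximum is 1GB, not 2GB, and the mysql client has its own max_allowed_packet setting that must be raised too):

SHOW VARIABLES LIKE 'max_allowed_packet';
-- Applies to new connections only; also persist it in my.cnf under [mysqld].
SET GLOBAL max_allowed_packet = 1073741824;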
Update 2: When I run SHOW WARNINGS in MySQL, I keep getting errors about rows not having data in all columns; however, when I replaced the empty strings with NULL the errors persisted, so I am still confused.
ionCube seems to be enabled on the original client server, so I'm not sure whether that is also causing errors.
