Excel in SSIS: How to import a column that may have more than 255 characters when DT_NTEXT causes failures? - excel

OK, so my latest project requires loading an Excel 2007 spreadsheet into a SQL Server table. I'm working in SSIS 2008R2. Based on some stuff I found on the internet, I opened the Excel source in Advanced editor and changed the datatype of the long column to DT_NTEXT, so that it wouldn't truncate it. Then I made the database column VARCHAR(MAX). This runs correctly in debug mode on my laptop.
Then I deployed it to the development server and attempted to load the same test file. It failed with the following error messages:
Error: Code: 0xC0208265
Source: Main Data Flow Task Get Main Data [1]
Description: Failed to retrieve long data for column "DESCR".
End Error
Error: Code: 0xC020901C
Source: Main Data Flow Task Get Main Data [1]
Description: There was an error with output column "DESCR" (72) on output "Excel Source Output" (9). The column status returned was: "DBSTATUS_UNAVAILABLE".
End Error
Error: Code: 0xC0209029
Source: Main Data Flow Task Get Main Data [1]
Description: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "output column "DESCR" (72)" failed because error code 0xC0209071 occurred, and the error row disposition on "output column "DESCR" (72)" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.
End Error
Searching for information about the error, I found about a million sites offering the same three suggested solutions:
Add 'IMEX=1' to the extended properties of the connection string.
It was already there.
Change the TypeGuessRows key in the registry.
This was set to zero on the server, which I understand to mean that it should look at the entire file. Nevertheless, I changed it to 8 to match my laptop. The same error occurred when I ran it again. Then I changed it to 1,763, which is more than the number of rows in the spreadsheet. It still gave the same error. So, I put it back to zero. (There's a 1,900-character value in the first row of my test file, so it shouldn't really matter how many it checks, in this case.)
Change the datatype to DT_WSTR(4000) in the source.
The column is supposed to have up to 10,000 characters, so I'm not sure this would be a good idea even if it worked. However, I tried it anyway. This time it gave me a truncation error. I changed the truncation error disposition to "ignore failure" and it loaded the data, but truncated the value to 255 characters. I have verified that the length is 4000 and doesn't get changed when I save the file, but it's still truncating at 255 characters.
I have no idea what else to look at. Any help would be appreciated.
UPDATE 1/29: The package, without any changes, works correctly when running on the pre-production server. It still fails when running on the development server. Both servers have the same version of SSIS (including minor version numbers) as well as the same versions of Windows, Access and Excel. I do not know how to explain this, nor do I know how to tell if it would work in production.

I created a new package with similar non-functional requirements (Excel 2007 file, SSIS 2008, SQL Server 2008 R2, VARCHAR(MAX) target column) and it worked just fine after deployment into the database server. My package:
Metadata at the Excel Source component's output (checked using Advanced Editor): DT_NTEXT
Derived Column component between source and destination to cast to non-unicode from unicode using (DT_TEXT,1252)
Metadata at the OLE DB Destination component's input (checked using Advanced Editor): DT_TEXT
Target Column data type: VARCHAR(MAX)
I do not explicitly use the extended property IMEX in the connection
I executed it by right-clicking on the package on the database server, and it loaded a file with a few thousand characters per record into the table without truncation. Hope this helps.

I ran into this issue while importing an Excel file with a field containing more than 255 characters, and solved it with Python.
Import the sheet into a pandas DataFrame and calculate the length of the string value in each row.
Then sort the DataFrame by that length in descending order, so the longest values come first. The Excel driver infers a column's type and size from only the first few rows (as far as I know, 8 by default, controlled by the TypeGuessRows registry key), so this lets SSIS allocate the maximum space for that field:
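import pandas as pd

# f is the path of the source workbook; the first row of the sheet is skipped
df = pd.read_excel(f, sheet_name=0, skiprows=1)
# Drop the first column
df = df.drop(df.columns[[0]], axis=1)
# Length of each description, used only as the sort key
df['length'] = df['Item Description'].str.len()
df.sort_values('length', ascending=False, inplace=True)
# f[5:] drops the first five characters of the input path.
# The context manager saves the file on exit (writer.save() was removed in pandas 2.0).
with pd.ExcelWriter('Clean/Cleaned_' + f[5:]) as writer:
    df.to_excel(writer, sheet_name='Billing', index=False)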

Related

Can't receive text from PostgreSQL into Excel via ODBC due to character coding problem (UTF-8 vs Win1250)

My base scenario is that I want to make an excel report with data from a PostgreSQL DB.
I get them via ODBC, making a simple linked table with PowerQuery.
For the DSN I choose (None), then I write the connection string and the SQL statement. Generally it works fine, but with one column it doesn't. I receive the following error message:
ODBC: ERROR [22P05] ERROR: character with byte sequence 0xc2 0xb2 in encoding "UTF8" has no equivalent in encoding "WIN1250";Error while executing the query
So the cause is clear: the source is in UTF-8 and contains characters that have no equivalent in Win1250.
What I am looking for is a general solution, on either the DB side or the Excel side.
The SQL statement is a simple SELECT * FROM [view], so I can apply any replacement or conversion to the column in order to handle it. I can replace the view with a function if that is better.
But I would prefer an Excel-side solution, if you can suggest one.
There are some constraints, though. A workflow like "export the data as text first, convert it to Win1250, then import it into Excel" won't fit. I need something that lives in the Excel file itself, so that it still works without further modification if I move the file to another PC.
Thanks for all the help!
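To see which character the error message is complaining about, the two bytes it quotes can be decoded directly. This is only an illustrative sketch in Python, not part of the Excel or PostgreSQL setup; the helper name fits_in is made up for the example, and Python's cp1250 codec is used as a stand-in for the WIN1250 encoding named in the error.

```python
# The two bytes quoted in the ODBC error message.
raw = b"\xc2\xb2"

# Decoded as UTF-8 they form a single character: U+00B2, superscript two.
char = raw.decode("utf-8")
print(repr(char), hex(ord(char)))

def fits_in(text, encoding):
    """Return True if every character of text can be encoded in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# cp1250 stands in for WIN1250 here: it has no slot for U+00B2,
# so the conversion the server attempts cannot succeed.
print("cp1250:", fits_in(char, "cp1250"))
print("utf-8: ", fits_in(char, "utf-8"))
```

So any fix has to either remove or replace that character in the query result, or make the client side request an encoding that can represent it.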

ADF copy activity failed while extracting data from DB2- Issue found for few records having special characters

I am doing a full extract from a table ABC. In the copy activity, I have given the query:
select * from ABC
However, the copy fails for a few rows that contain special characters (Japanese and Korean):
Error code 2200
Failure type User configuration issue
Details Failure happened on 'Source' side. ErrorCode=DB2DriverRunFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error thrown from driver. Sql code: '-343',Source=Microsoft.DataTransfer.ClientLibrary.Db2Connector,''Type=Microsoft.HostIntegration.DrdaClient.DrdaException,Message=HISMPCB0001 In BasePrimitiveConverter an exception has occurred. Exception description: Output buffer is smaller than required size 12 SQLSTATE=HY000 SQLCODE=-343,Source=Microsoft.HostIntegration.Drda.Requester,'
The character which is causing the issue is '轎ᆃ '
The error description says that an exception occurred in BasePrimitiveConverter, and the exception description indicates that the output buffer is smaller than the required size. So please try converting the column to a suitable type such as GRAPHIC in DB2. The following link explains more:
https://bytes.com/topic/db2/answers/488983-storing-some-japanese-data
Based on these links, the error might be due to the datatype of the source column or to the encoding used. Try the different encoding options available in your source dataset. Here is a similar problem, with a different source, where special characters could not be retrieved:
https://learn.microsoft.com/en-us/answers/questions/467456/failure-happened-source-side-in-copy-activity-for.html

How Do I resolve "Illuminate\Queue\InvalidPayloadException: Unable to JSON encode payload. Error code: 5"

Trying out the queue system for a better user upload experience with Laravel-Excel.
The queue connection in .env was changed from 'sync' to 'database' and the migrations were run. All the necessary use statements are in place, yet the error above persists.
The exact error happens here:
Illuminate\Queue\Queue.php:97
$payload = json_encode($this->createPayloadArray($job, $queue, $data));
if (JSON_ERROR_NONE !== json_last_error()) {
throw new InvalidPayloadException(
If I drop ShouldQueue, the file imports perfectly in-session (large file so long wait period for user.)
I've read many Stack Overflow, GitHub, etc. comments on this, but I don't have the technical skills to deep-dive and fix my particular situation (most of them speak of UTF-8, but I don't know if that's the issue here; I changed the Excel save format to UTF-8 but it didn't fix it).
Ps. Whilst running the migration, I got the error:
SQLSTATE[42000]: Syntax error or access violation: 1071 Specified key was too long; max key length is 767 bytes (SQL: alter table `jobs` add index `jobs_queue_index`(`queue`))
I bypassed it by dropping the 'add index', so my jobs table is not indexed on queue, but I don't feel this is the cause.
One thing you can do when looking into json_encode() errors is use the json_last_error_msg() function, which will give you a bit more of a readable error message.
In your case you're getting a '5' back, which is the JSON_ERROR_UTF8 error code. The error message back for this is a slightly more informative one:
'Malformed UTF-8 characters, possibly incorrectly encoded'
So we know it's encountering non-UTF-8 characters, even though you're saving the file specifically with UTF-8 encoding. At first glance you might think you need to convert the encoding yourself in code (like this answer), but in this case, I don't think that'll help. For Laravel-Excel, this seems to be a limitation of trying to queue-read .xls files - from the Laravel-Excel docs:
You currently cannot queue xls imports. PhpSpreadsheet's Xls reader contains some non-utf8 characters, which makes it impossible to queue.
In this case you might be stuck with a slow, non-queueable option, or need to convert your spreadsheet into a queueable format e.g. .csv.
The key length error on running the migration is unrelated. It has been around for a while and is a side-effect of using an older version of MySQL/MariaDB. Check out this answer and the Laravel documentation around index lengths - you need to add this to your AppServiceProvider::boot() method:
Schema::defaultStringLength(191);

Regular Expressions and SQL Server Error Logs - All false results

Ok, I have done my searching and I have tried many things. I think it is time to put my question here:
I have been working on taking in other users' SQL Server error logs, parsing the rows into columns, then bulk inserting the data 1000 rows at a time. I troubleshoot SQL Server for other people, so sp_readerrorlog is no use: it only shows me my local instance. Finding root cause involves 4 sets of logs (SQL Server, Application Event, System Event, and get-clusterlog outputs) and matching up timestamps. A fast load into SQL Server, along with the ability to pull the exact timeframe needed, will shorten my time spent staring at log files.
I am currently bottlenecked in testing the rows with a regular expression, which does work if I feed it data myself:
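import re

def sqlrowmatch(row):
    # Match a SQL Server error log timestamp such as 2018-10-13 22:40:09.41
    # (the dot before the hundredths is escaped so it matches only a literal '.')
    pattern = re.compile(r'\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d\.\d\d')
    if pattern.search(row):
        return True
    else:
        return False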
Given any string containing a timestamp like the above (1111-11-11 11:11:11.11), it returns True. The idea is that a match in a SQL Server error log marks the start of a separate entry. This will allow memory graphs, deadlock graphs, and dumps to each be grouped in one entry as opposed to being split over several lines.
However, if I point it at one of the SQL error logs, there seem to be extra characters. This is giving re.match and re.search a hard time finding a match. If I pass any line to sqlrowmatch(), it reports False for all rows.
ÿþ <-- these appear to be the first 2 characters of the first line. re.search doesn't find the pattern anywhere in the different elements.
False is also what is returned when I call the function inside a 'with open' statement:
with open(file, 'r') as sqllog:
    for line in sqllog:
        print(sqlrowmatch(line))
The first line should always return True when passed to sqlrowmatch():
2018-10-13 22:40:09.41 Server Microsoft SQL Server 2016 (SP2-CU2-GDR) (KB4458621) - 13.0.5201.2 (X64)
So I am lost and my current project is at a halt. Perhaps some seasoned insight from this group can get me going again.
TIA
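Interestingly enough, I found my answer here: Opening huge text file, unicode issue
The file should be opened with encoding='utf-16'. The leading ÿþ is the UTF-16 byte order mark (bytes FF FE) as it appears when the file is read with a single-byte encoding.
It now matches appropriately.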
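A minimal sketch of why the encoding matters, using a throwaway sample file rather than a real error log. The helper name count_matches and the sample line are illustrative, and latin-1 stands in for whatever single-byte default encoding open() picked on the original machine. Read that way, every character of the UTF-16 text is followed by a NUL, so the digits are never adjacent and the pattern cannot match.

```python
import codecs
import os
import re
import tempfile

TIMESTAMP = re.compile(r'\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.\d{2}')

def count_matches(path, encoding):
    """Count the lines in a file that contain an error-log timestamp."""
    with open(path, 'r', encoding=encoding) as log:
        return sum(1 for line in log if TIMESTAMP.search(line))

# Write one sample line as UTF-16 LE with a byte order mark
# (assumed to mirror how the error log in the question is stored).
sample = "2018-10-13 22:40:09.41 Server Microsoft SQL Server 2016\n"
with tempfile.NamedTemporaryFile('wb', suffix='.log', delete=False) as tmp:
    tmp.write(codecs.BOM_UTF16_LE + sample.encode('utf-16-le'))
    sample_path = tmp.name

# The first two raw bytes are the BOM that shows up as "ÿþ".
with open(sample_path, 'rb') as raw:
    first_two_bytes = raw.read(2)

wrong = count_matches(sample_path, 'latin-1')  # single-byte read: NULs split the digits
right = count_matches(sample_path, 'utf-16')   # BOM is consumed, text decodes cleanly
os.remove(sample_path)

print(first_two_bytes)
print("matches with latin-1:", wrong)
print("matches with utf-16:", right)
```

Passing encoding='utf-16' rather than 'utf-16-le' lets Python read the byte order from the BOM and strip it, so the first line starts directly with the timestamp.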

VBF syntax for SSRS expression cannot figure out proper construct

I'm looking for help with what should be a very basic function. I am trying to get a SUM of a specific value, but I cannot seem to get the syntax correct.
Here is what I have
=Sum(Fields!PriorYearSalesDollars.Value - Sum(Fields!PriorYearCost.Value
+Sum(Fields!PriorYearFrtCost.Value)))
However, I get an error when trying to sum. Is there another way to test this? Each time I modify the expression I have to save the report, upload it to the report server, and test again. If I use the preview function in Visual Studio, it throws a generic error on the whole report. When running from the report server, just this specific column shows #Error.
This is the syntax that works. I first had to change the column format to number, because I had accidentally set it to currency. I'm not sure why currency didn't work, but this is correct:
=Sum(Fields!PriorYearSalesDollars.Value) - (Sum(Fields!PriorYearCost.Value) + Sum(Fields!PriorYearFrtCost.Value))
