Python3: Format SOME MySQL warnings and write ALL to file

We have a script that handles data import. Now that most of the data is properly sanitized we want to focus on fine-tuning the MySQL backend. This backend is in some cases too rigidly defined (i.e. strings are longer than the VARCHAR columns allow, ...). Since we get new data on a weekly basis we want to log the warnings weekly, so that we can use that log to check the data source and, if necessary, modify the backend.
For this the import script needs to be modified slightly:
Non-1366 MySQL warnings need to be suppressed and pretty-printed (OK)
All MySQL warnings need to be written to a log file (OK)
Hide the default warning notice on import, because this floods the terminal with 1366 warnings. (NOT OK)
The code I have now is (this is only a small part of a larger script):
into_file_operation = "LOAD DATA LOCAL INFILE '%s/%s.csv' INTO TABLE %s FIELDS TERMINATED BY ',' ENCLOSED BY '\"' ESCAPED BY '' LINES TERMINATED BY '\\r';" % (folder, name, name)
#warnings.filterwarnings("ignore")
cursor.execute(into_file_operation)
conn.commit()
warnings = conn.show_warnings()
for w in warnings:
    if w[1] != 1366:
        pprint(w, width=100, depth=2)    # pretty-print non-1366 warnings
    else:
        print(str(w[1]), end='\r')       # this can be turned into pass later
    errorfile.write(str(w) + "\n")       # write ALL warnings to the log file
    errorfile.flush()
errorfile.write("\n")
errorfile.write("********* DONE TABLE **********")
errorfile.write("\n")
errorfile.flush()
conn.close()    # note the parentheses; close only after fetching the warnings
This meets the first two requirements, yet it still outputs the very long warnings to the console, which we want to get rid of:
C:\Users\me\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pymysql\cursors.py:166: Warning: (1366, "Incorrect integer value: '' for column 'y1' at row 126") result = self._query(query)
Warnings like 1265 may still be shown, but need to be pretty-printed. With the current code these get printed twice: first the long raw warning, then the formatted one from pprint.
I think my next step would be to do something with stdout, yet whatever I tried, I kept ending up with an empty database. I've also tried to use the filter with the ignore parameter, yet then I don't get any warnings at all, which defeats the purpose of having the log.
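One possible approach for the third requirement, sketched below and untested against your exact setup: suppress the Python-level warning that pymysql raises (that is what floods the terminal) while still collecting the server-side warnings with conn.show_warnings() for the log. SHOW WARNINGS is a query against the server, so silencing Python's warnings module does not affect it. The sketch assumes the same conn, cursor, errorfile and into_file_operation objects as above.
import warnings as pywarnings   # aliased so it does not clash with the 'warnings' variable above
from pprint import pprint

with pywarnings.catch_warnings():
    pywarnings.simplefilter("ignore")        # hide pymysql's console notice for this block only
    cursor.execute(into_file_operation)
conn.commit()

for w in conn.show_warnings():               # server-side warnings are still available
    if w[1] != 1366:
        pprint(w, width=100, depth=2)        # pretty-print only the non-1366 warnings
    errorfile.write(str(w) + "\n")           # but log every warning, including 1366
errorfile.flush()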

Related

How do I resolve "Illuminate\Queue\InvalidPayloadException: Unable to JSON encode payload. Error code: 5"?

Trying out the queue system for a better user upload experience with Laravel-Excel.
.env was changed from 'sync' to 'database' and the migrations were run. All the necessary use statements are in place, yet the error above persists.
The exact error happens here:
Illuminate\Queue\Queue.php:97
$payload = json_encode($this->createPayloadArray($job, $queue, $data));
if (JSON_ERROR_NONE !== json_last_error()) {
throw new InvalidPayloadException(
If I drop ShouldQueue, the file imports perfectly in-session (it's a large file, so there's a long wait for the user).
I've read many Stack Overflow, GitHub, etc. comments on this, but I don't have the technical skills to deep-dive and fix my particular situation (most of them mention UTF-8, but I don't know if that's an issue here; I changed the Excel save format to UTF-8 but it didn't fix it).
P.S. Whilst running the migration, I got the error:
SQLSTATE[42000]: Syntax error or access violation: 1071 Specified key was too long; max key length is 767 bytes (SQL: alter table `jobs` add index `jobs_queue_index`(`queue`))
I bypassed it by dropping the 'add index' part, so my jobs table is not indexed on queue, but I don't feel this is the cause.
One thing you can do when looking into json_encode() errors is to use the json_last_error_msg() function, which will give you a bit more readable error message.
In your case you're getting a '5' back, which is the JSON_ERROR_UTF8 error code. The error message for this is slightly more informative:
'Malformed UTF-8 characters, possibly incorrectly encoded'
So we know it's encountering non-UTF-8 characters, even though you're saving the file specifically with UTF-8 encoding. At first glance you might think you need to convert the encoding yourself in code (like this answer), but in this case, I don't think that'll help. For Laravel-Excel, this seems to be a limitation of trying to queue-read .xls files - from the Laravel-Excel docs:
You currently cannot queue xls imports. PhpSpreadsheet's Xls reader contains some non-utf8 characters, which makes it impossible to queue.
In this case you might be stuck with a slow, non-queueable option, or need to convert your spreadsheet into a queueable format e.g. .csv.
The key length error on running the migration is unrelated. It has been around for a while and is a side-effect of using an older version of MySQL/MariaDB. Check out this answer and the Laravel documentation around index lengths - you need to add this to your AppServiceProvider::boot() method:
Schema::defaultStringLength(191);

Attempting to append all content to a file, but only the last iteration ends up in the text document

I'm trying to create a file and append all the content being calculated to that file, but when I run the script only the very last iteration is written to the file and nothing else.
My code is on Pastebin; it's too long, and I feel like you would have to see exactly how the iteration is happening.
To summarize it: go through an array of model numbers; if the model number matches, call the function that calculates that MAC_ADDRESS; when done calculating, store all the content in the file.
I have tried two possible routes and both have failed, giving the same result. There is no error in the code (it runs), but it just doesn't store the content in the file properly: there should be 97 different APs and it's storing only 1.
The difference between the first and second attempt:
Attempt 1) I open/create the file at the beginning of the script and close it at the very end.
Attempt 2) I open/create the file and close it per iteration.
First Attempt:
https://pastebin.com/jCpLGMCK
#Beginning of code
File = open("All_Possibilities.txt", "a+")
#End of code
File.close()
Second Attempt:
https://pastebin.com/cVrXQaAT
#Per function
File = open("All_Possibilities.txt", "a+")
#per function
File.close()
If I'm not supposed to reference other websites, please let me know and I'll just paste the code in this post.
Rather than close(), please use with:
with open('All_Possibilities.txt', 'a') as file_out:
    file_out.write('some text\n')
The documentation explains that you don't need + to append writes to a file.
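As a minimal sketch of the "open once, write every iteration" pattern (the model list and the calc_mac() helper below are hypothetical stand-ins, since the real code is only on Pastebin):
import random

def calc_mac(model):
    # hypothetical stand-in for the real MAC-calculating function
    return ':'.join(f'{random.randint(0, 255):02x}' for _ in range(6))

models = ['AP-315', 'AP-325', 'AP-345']   # hypothetical model numbers

with open('All_Possibilities.txt', 'a') as file_out:       # opened once, before the loop
    for model in models:
        file_out.write(f'{model} {calc_mac(model)}\n')      # written inside the loop, every iteration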
You may want to add some debugging console print() statements, or use a debugger like pdb, to verify that the write() statement actually ran, and that the variable you were writing actually contained the text you thought it did.
You have several loops that could be a one-liner using readlines().
Please do this:
$ pip install flake8
$ flake8 *.py
That is, please run the flake8 lint utility against your source code, and follow the advice that it offers you.
In particular, it would be much better to name your identifier file than to name it File. The initial capital letter means something to humans reading your code -- it is used when naming classes, rather than local variables. Good luck!

Regular Expressions and SQL Server Error Logs - All false results

Ok, I have done my searching and I have tried many things. I think it is time to put my question here:
I have been working on taking in other users' SQL Server error logs, parsing out the rows into columns, then bulk inserting the data 1000 rows at a time. I troubleshoot SQL Server for other people, so sp_readerrorlog will only show me my local instance. Finding root cause involves four sets of logs (SQL Server, Application Event, System Event, and get-clusterlog outputs) and matching up timestamps. A fast load into SQL Server, along with the ability to pull the exact timeframe needed, will shorten my time spent staring at log files.
I am currently bottlenecked in testing the rows with a regular expression, which does work if I feed it data myself:
import re

def sqlrowmatch(row):
    # matches a leading timestamp such as 2018-10-13 22:40:09.41
    pattern = re.compile(r'\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d\.\d\d')
    if pattern.search(row):
        return True
    else:
        return False
Any string that matches the pattern above (e.g. 1111-11-11 11:11:11.11) will return True. The idea is that, in a SQL Server error log, a line that matches marks a separate entry. This will allow memory graphs, deadlock graphs, and dumps to all be grouped into one entry as opposed to being split over several lines.
However, if I point it at one of the SQL Server error logs, there seem to be extra characters. This gives re.match and re.search a hard time finding a match. If I feed any line into this function, sqlrowmatch(), it reports back False for all rows.
ÿþ <-- these appear to be the first 2 characters of the first line. re.search just doesn't find a match anywhere in the different elements.
False is what is returned if I call the function inside the 'with open' statement:
with open(file, 'r') as sqllog:
    for line in sqllog:
        print(sqlrowmatch(line))
The first line should always be True if sqlrowmatch() is used:
2018-10-13 22:40:09.41 Server Microsoft SQL Server 2016 (SP2-CU2-GDR) (KB4458621) - 13.0.5201.2 (X64)
So I am lost and my current project is at a halt. Perhaps some seasoned insight from this group can get me going again.
TIA
Interestingly enough, I found my answer here: Opening huge text file, unicode issue.
open() should be done with encoding='utf-16'.
It now matches appropriately.
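A minimal self-contained sketch of that fix, assuming the log really is UTF-16 (as the ÿþ byte order mark suggests; 'ERRORLOG' is a placeholder path):
import re

def sqlrowmatch(row):
    # a leading timestamp marks the start of a new error-log entry
    return bool(re.search(r'\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d\.\d\d', row))

# the ÿþ prefix is the UTF-16 little-endian byte order mark, so decode as UTF-16
with open('ERRORLOG', 'r', encoding='utf-16') as sqllog:
    for line in sqllog:
        print(sqlrowmatch(line))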

Fortran error check on formatted read

In my code I am attempting to read in output files that may or may not have a formatted integer in the first line of the file. To aid backwards compatibility I am attempting to be able to read in both examples as shown below.
head -n 3 infile_new
22
8
98677.966601475651 -35846.869655806520 3523978.2959464169
or
head -n 3 infile_old
8
98677.966601475651 -35846.869655806520 3523978.2959464169
101205.49395364164 -36765.047712555031 3614241.1159234559
The format of the top line of infile_new is '(i5)' and so I can accommodate this in my code with a standard read statement of
read(iunit, '(I5)' ) n
This works fine, but if I attempt to read infile_old with it, I get an error, as expected. I have attempted to get around this by using the following:
read(iunit, '(I5)' , iostat=ios, err=110) n
110 if (ios == 0) then
   print*, 'error in file, setting n'
   naBuffer = na
   !rewind(iunit) #not sure whether to rewind or close/open to reset file position
   close(iunit)
   open (iunit, file=fname, status='unknown')
else
   print*, "Something very wrong in particle_inout"
end if
The problem here is that when reading in either the old or new file the code ends up in the error loop. I've not been able to find much documentation on using the read statement in this way, but cannot determine what is going wrong.
My one theory was my use of ios==0 in the if statement, but figured since I shouldn't have an error when reading the new file it shouldn't matter. It would be great to know if anyone knows a way to catch such errors.
From what you've shown us, after the code executes the read statement it executes the statement labelled 110. Then, if there wasn't an error and iostat==0 the true branch of the if construct is executed.
So, if there is an error in the read the code jumps to that statement; if there isn't, it simply falls through to the same statement. The code doesn't magically know not to execute the code starting at label 110 if there isn't an error in the read statement. Personally I've never used both iostat and err in the same read statement, and here I think it's tripping you up.
Try changing the read statement to
read(iunit, '(I5)' , iostat=ios) n
You'd then need to re-work your if construct a bit, since iostat==0 is not an error condition.
Incidentally, to read a line which is known to contain only one integer I wouldn't use an explicit format, I'd just use
read(iunit, * , iostat=ios) n
and let the run-time worry about how big the integer is and where to find it.
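A minimal sketch of that reworked construct, using the question's own variables (whether to rewind or close/reopen is still the question's open point, and the fallback assignment is kept as written there):
read(iunit, *, iostat=ios) n
if (ios /= 0) then
   ! a non-zero iostat means the first record held no integer: old-format file
   print*, 'error in file, setting n'
   naBuffer = na          ! same fallback as in the question
   rewind(iunit)          ! reposition so the first record can be re-read as data
end if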

Excel in SSIS: How to import a column that may have more than 255 characters when DT_NTEXT causes failures?

OK, so my latest project requires loading an Excel 2007 spreadsheet into a SQL Server table. I'm working in SSIS 2008R2. Based on some stuff I found on the internet, I opened the Excel source in Advanced editor and changed the datatype of the long column to DT_NTEXT, so that it wouldn't truncate it. Then I made the database column VARCHAR(MAX). This runs correctly in debug mode on my laptop.
Then I deployed it to the development server and attempted to load the same test file. It failed with the following error messages:
Error: Code: 0xC0208265
Source: Main Data Flow Task Get Main Data [1]
Description: Failed to retrieve long data for column "DESCR".
End Error
Error: Code: 0xC020901C
Source: Main Data Flow Task Get Main Data [1]
Description: There was an error with output column "DESCR" (72) on output "Excel Source Output" (9). The column status returned was: "DBSTATUS_UNAVAILABLE".
End Error
Error: Code: 0xC0209029
Source: Main Data Flow Task Get Main Data [1]
Description: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "output column "DESCR" (72)" failed because error code 0xC0209071 occurred, and the error row disposition on "output column "DESCR" (72)" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.
End Error
Searching for information about the error, I found about a million sites offering the same three suggested solutions:
Add 'IMEX=1' to the extended properties of the connection string.
It was already there.
Change the TypeGuessRows key in the registry.
This was set to zero on the server, which I understand to mean that it should look at the entire file. Nevertheless, I changed it to 8 to match my laptop. The same error occurred when I ran it again. Then I changed it to 1,763, which is more than the number of rows in the spreadsheet. It still gave the same error. So, I put it back to zero. (There's a 1,900-character value in the first row of my test file, so it shouldn't really matter how many it checks, in this case.)
Change the datatype to DT_WSTR(4000) in the source.
The column is supposed to have up to 10,000 characters, so I'm not sure this would be a good idea even if it worked. However, I tried it anyway. This time it gave me a truncation error. I changed the truncation error disposition to "ignore failure" and it loaded the data, but truncated the value to 255 characters. I have verified that the length is 4000 and doesn't get changed when I save the file, but it's still truncating at 255 characters.
I have no idea what else to look at. Any help would be appreciated.
UPDATE 1/29: The package, without any changes, works correctly when running on the pre-production server. It still fails when running on the development server. Both servers have the same version of SSIS (including minor version numbers) as well as the same versions of Windows, Access and Excel. I do not know how to explain this, nor do I know how to tell if it would work in production.
I created a new package with similar non-functional requirements (Excel 2007 file, SSIS 2008, SQL Server 2008 R2, VARCHAR(MAX) target column) and it worked just fine after deployment into the database server. My package:
Metadata at the Excel Source component's output (checked using Advanced Editor): DT_NTEXT
Derived Column component between source and destination to cast to non-unicode from unicode using (DT_TEXT,1252)
Metadata at the OLE DB Destination component's input (checked using Advanced Editor): DT_TEXT
Target Column data type: VARCHAR(MAX)
I do not explicitly use the extended property IMEX in the connection
I executed it by right-clicking on the package at the database server, and it loaded a file with a few thousand characters per record into the table without truncation. Hope this helps.
I have faced this issue while importing an Excel file with a field containing more than 255 characters. I solved it using Python.
Simply import the Excel file into a pandas DataFrame and then calculate the length of each of those string values per row.
Then sort the DataFrame by that length in descending order. This will enable SSIS to allocate maximum space for that field, as it scans the first 3 rows to allocate storage:
import pandas as pd
from pandas import ExcelWriter

df = pd.read_excel(f, sheet_name=0, skiprows=1)
df = df.drop(df.columns[[0]], axis=1)                       # drop the first (unused) column
df['length'] = df['Item Description'].str.len()             # length of the long text field per row
df.sort_values('length', ascending=False, inplace=True)     # longest values first
writer = ExcelWriter('Clean/Cleaned_' + f[5:])
df.to_excel(writer, sheet_name='Billing', index=False)
writer.save()
