Snowpipe doesn't load files after the error has been rectified

I am using Snowpipe to load files from an S3 bucket. It worked well for 2 files.
Then, to check how Snowpipe behaves when an error occurs during file loading, I intentionally changed the file format (changed the delimiter to '|' although the file is comma-separated CSV) so that the COPY command would not work, and uploaded a 3rd CSV file to S3. It was not loaded due to the file format error. Everything was as expected up to this point.
Later I recreated the file format with the correct delimiter, ',', but since the notification for the 3rd file had already been sent, it was not loaded into the table. I then uploaded a 4th CSV file and it loaded successfully. So my question is: how do I take care of loading the 3rd file, whose event notification was generated while the file format was wrong?
Let me know if any more details are required.
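
One way to handle this (a minimal sketch, assuming a pipe named my_pipe and the snowflake-connector-python package; connection details are hypothetical placeholders): Snowflake's ALTER PIPE ... REFRESH asks Snowpipe to re-scan the stage and queue files that were staged but never loaded (per the Snowflake docs, files staged within the last 7 days), which should pick up the skipped 3rd file.

import snowflake.connector

# Hypothetical connection details.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="...",
    warehouse="my_wh",
    database="my_db",
    schema="my_schema",
)
try:
    # Re-queue already-staged files that Snowpipe skipped.
    conn.cursor().execute("ALTER PIPE my_pipe REFRESH")
finally:
    conn.close()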

Related

Not able to read .xlsb or .xlsx files (large files - 150 MB) from a shared drive using Python

I am facing a problem where, when I try to read the file directly from a shared drive, it throws an invalid path error. Trying to explain the situation below:
The data files, in the form of .xlsx and .xlsb, are copied to SharePoint, which works as the source.
I used the 'Open in Explorer' function from SharePoint and got the drive address.
I mapped the path after opening it in Explorer to my network drive, and added it as the P: drive.
Now I am using this path to read the file directly using pandas read_excel.
It throws an invalid path OSError 22.
Issues:
When I read the .xlsx file which is smaller in size (15 MB), it works well.
Trying to read another Excel file 150 MB in size, I get the invalid path error.
The same happens when reading .xlsb binary files.
I already tried forward and back slashes; same error.
I used open() to read the file and got the same invalid path error.
Though if I download the same file locally, it works without any issue; I can easily read the files with the same code.
Any suggestions? (A minimal repro sketch follows.)
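
For reference, a minimal sketch of the reads being attempted (all paths are hypothetical placeholders; note that .xlsb workbooks need the pyxlsb engine in pandas):

import pandas as pd

# Via the mapped drive letter (hypothetical path):
df_small = pd.read_excel(r"P:\reports\small_file.xlsx", engine="openpyxl")

# The same read via the full UNC path instead of the drive letter, which can
# behave differently for mapped SharePoint/WebDAV shares (hypothetical path):
df_large = pd.read_excel(r"\\server\share\reports\large_file.xlsx", engine="openpyxl")

# .xlsb binary workbooks need a dedicated engine:
df_binary = pd.read_excel(r"P:\reports\large_file.xlsb", engine="pyxlsb")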

How to handle umlauts in a Logic App for export to CSV

I created a Logic App to export some data to a *.csv file.
The data to be exported contains German umlauts.
I read all the needed values into variables, which are then concatenated and added to an array.
Finally I get an array of semicolon-separated strings with the values in it.
This result is then added to an email as a file attachment.
All the values are handled correctly in the Logic App and are correct in the *.csv file, but as soon as I open the CSV with Excel, the umlauts are no longer shown correctly.
Is there a way to explicitly create a file with the correct encoding within the Logic App and add the file to the email instead of the ExportString?
Or can I somehow encode the content of the ExportString variable?
Any hints?
I have reproduced this in my environment and followed the steps below to get the correct output in the CSV file (screenshots omitted):
I sent the data into a CSV table and then created a file in a file share.
When I opened my file share and downloaded the content from there, I got the same wrong output you did.
Then I opened Azure Storage Explorer and downloaded the file from there instead. When I open the downloaded file in Notepad, I get the correct output; try it this way.
And when I save it as hello.csv and keep the encoding UTF-8 with BOM, I get the correct output in Excel as well.
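
The underlying issue is that Excel only auto-detects UTF-8 when the file starts with a byte order mark; without it, Excel assumes a legacy ANSI code page and mangles the umlauts. As a plain illustration of the encoding fix (a Python sketch, not Logic App code; the utf-8-sig codec simply prepends the BOM):

# Sample data with German umlauts (made-up values).
csv_content = "Name;Stadt\nMüller;Köln\nÖzil;Düsseldorf\n"

# "utf-8-sig" writes the UTF-8 byte order mark, which Excel uses to
# detect the encoding when it opens the file.
with open("hello.csv", "w", encoding="utf-8-sig") as f:
    f.write(csv_content)

Inside the Logic App itself, a commonly suggested equivalent (an assumption, untested here) is to prepend the BOM to the string, e.g. concat(decodeUriComponent('%EF%BB%BF'), variables('ExportString')).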

Line 2 ERROR The file NuxeoCSV-USERDOC.pdf does not exist

When I try to import a file with an attachment using the Nuxeo CSV Import addon, I get this issue:
Line 2 ERROR The file NuxeoCSV-USERDOC.pdf does not exist
This is the CSV file:
name,"type","dc:title","dc:description","file:content","dc:nature","dc:source"
nuxeo-csv-userdoc,"File","Nuxeo CSV User documentation","This is the user guide for Nuxeo CSV","NuxeoCSV-USERDOC.pdf","procedure","http://doc.nuxeo.com"
Nuxeo-csv-sample-3,"File","Nuxeo CSV Sample","This a second file imported with Nuxeo CSV","Nuxeo-csv-sample-3.odt","article","http://doc.nuxeo.com"
The documentation asks for some changes in the conf file, but I don't understand the last line. How am I supposed to add the path, and how do I add nuxeo.csv.blobs.folder? Just by pasting it in?
Configuration :
The Nuxeo CSV addon enables users to create file documents and upload their main attachment at the same time. This requires configuring where the server will take the attachments from. This is done by adding the parameter nuxeo.csv.blobs.folder in the server's nuxeo.conf and giving it a value that is a local path to a folder that can be accessed by the server.
Thanks in advance.
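
For illustration, the nuxeo.conf entry the documentation describes would look something like this (the folder path is a hypothetical example; it must be a local directory the Nuxeo server can access, and the attachments named in the CSV, such as NuxeoCSV-USERDOC.pdf, have to be placed in that folder before the import):

nuxeo.csv.blobs.folder=/opt/nuxeo/csv-blobs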

What .xlsx file format is this?

Using an existing SSIS package, I was trying to import .xlsx files we received from a client. I received the error message:
External table is not in the expected format
These files will open in Excel.
When I use Excel (currently Excel 2010) to Save As... the file without making any changes:
The new file imports just fine.
The new file is 330% the size of the original file.
When changing .xlsx to .zip and investigating the contents with WinZip:
The original file only has 4 .xml files and a _rels folder (with 2 .rels files).
The new file has the expected .xlsx contents.
Does anyone know what kind of file this could be?
It would be nice to develop my SSIS package to work with these original files, without having to open and re-save each one. There are only 12 files, so if there are no other options, opening/saving each file is not that big of a deal...and I could automate it with VBA going forward.
Thanks for any help anyone can provide,
CTB
There are many Excel file formats.
The file you are trying to import may be in another Excel format with the extension changed to .xlsx (it could have been edited by someone else), or it could have been created with a different Excel version.
There is a third-party application called TrIDNet File Identifier, a utility designed to identify file types from their binary signatures. You can use it to determine the real format of the file.
Also, a simple search on "External table is not in the expected format" shows this error is thrown when the definition (or version) of the Excel files supported in the connection string differs from the file selected. Check the connection string used in the Excel connection manager; it might help to identify the version of the file.
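
If it helps, you can also peek inside the container yourself (a minimal sketch; the path is a hypothetical placeholder). A genuine .xlsx is a ZIP archive containing [Content_Types].xml and an xl/ folder, so a file missing those was likely produced by something else and merely renamed:

import zipfile

# Hypothetical path to one of the files received from the client.
path = "original.xlsx"

if zipfile.is_zipfile(path):
    with zipfile.ZipFile(path) as z:
        names = z.namelist()
        print(names)
        # A real .xlsx contains [Content_Types].xml and entries under xl/.
        looks_like_xlsx = "[Content_Types].xml" in names and any(
            n.startswith("xl/") for n in names
        )
        print("Looks like a genuine .xlsx:", looks_like_xlsx)
else:
    print("Not a ZIP container at all (so not a real .xlsx).")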

Saving an image to bytes and uploading to S3 with boto3 returns a Content-MD5 mismatch

I'm trying to pull an image from S3, quantize/manipulate it, and then store it back into S3 without saving anything to disk (entirely in memory). I was able to do it once, but upon returning to the code and trying it again it did not work. The code is as follows:
import boto3
import io
from PIL import Image

client = boto3.client('s3', aws_access_key_id='',
                      aws_secret_access_key='')
cur_image = client.get_object(Bucket='mybucket', Key='2016-03-19 19.15.40.jpg')['Body'].read()
loaded_image = Image.open(io.BytesIO(cur_image))
quantized_image = loaded_image.quantize(colors=50)
saved_quantized_image = io.BytesIO()
quantized_image.save(saved_quantized_image, 'PNG')
client.put_object(ACL='public-read', Body=saved_quantized_image, Key='testimage.png', Bucket='mybucket')
The error I received is:
botocore.exceptions.ClientError: An error occurred (BadDigest) when calling the PutObject operation: The Content-MD5 you specified did not match what we received.
It works fine if I just pull an image, and then put it right back without manipulating it. I'm not quite sure what's going on here.
I had this same problem, and the solution was to seek to the beginning of the saved in-memory file:
from io import BytesIO

out_img = BytesIO()
image.save(out_img, img_type)
out_img.seek(0)  # Without this line it fails
self.bucket.put_object(Bucket=self.bucket_name,
                       Key=key,
                       Body=out_img)
The file may need to be saved and reloaded before you send it off to S3. The file pointer also needs to be at position 0 (call seek(0) first).
My problem was sending a file after reading out the first few bytes of it; reopening the file cleanly did the trick.
I found this question while getting the same error trying to upload files -- two scripts clashed, one creating the file, the other uploading it. My answer was to create the file under a dot-prefixed name like ".filename" and rename it once writing finished:
os.rename(filename, filename.replace(".filename", "filename"))
The upload script then needs to ignore dot-files. This ensured the file was done being created before upload; a sketch of the pattern follows.
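For illustration, a minimal sketch of that create-then-rename handshake (file names are hypothetical):

import os

# Writer: build the file under a dot-prefixed temporary name, then rename it
# once writing is complete. On POSIX, a rename within the same filesystem is
# atomic, so the uploader never sees a half-written file.
with open(".data.csv", "w") as f:
    f.write("col1,col2\n1,2\n")
os.rename(".data.csv", "data.csv")

# Uploader: skip dot-files, which may still be in progress.
ready = [name for name in os.listdir(".") if not name.startswith(".")]
print(ready)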
To anyone else facing similar errors: this usually happens when the content of the file gets modified during the upload, possibly because the file is being modified by another process/thread.
A classic example would be two scripts modifying the same file at the same time, which throws the BadDigest error due to the change in the MD5 of the content. In the example below, the data file is being uploaded to S3; if another process overwrites it while it is being uploaded, you will end up with this exception:
random_uuid=$(uuidgen)
cat data
aws s3api put-object --acl bucket-owner-full-control --bucket "$s3_bucket" --key "$random_uuid" --body data
