Upload large file using Azure Java SDK with more than 50k blocks

I'm trying to upload a 230 GB file into an Azure block blob with the following code:
private void uploadFile(FileObject srcFile, FileObject destFile) throws Exception {
    try {
        BlobClient destBlobClient = blobContainerClient.getBlobClient("destFilename");
        long blockSize = 4 * 1024 * 1024; // 4 MB
        ParallelTransferOptions opts = new ParallelTransferOptions()
                .setBlockSizeLong(blockSize)
                .setMaxConcurrency(5);
        BlobRequestConditions requestConditions = new BlobRequestConditions();
        try (BlobOutputStream bos = destBlobClient.getBlockBlobClient().getBlobOutputStream(
                opts, null, null, null, requestConditions);
             InputStream is = srcFile.getContent().getInputStream()) {
            byte[] buffer = new byte[(int) blockSize];
            for (int len; (len = is.read(buffer)) != -1; ) {
                bos.write(buffer, 0, len);
            }
        }
    } finally {
        destFile.close();
        srcFile.close();
    }
}
Since I am explicitly setting a 4 MB block size, I assumed that each write would be committed as a single block in Azure, but that does not appear to be the case.
For the 230 GB file above, the write operation executed 58,880 times and the file was uploaded successfully.
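For reference, 230 GiB / 4 MiB works out to exactly 58,880 writes, which is above the documented limit of 50,000 committed blocks per block blob, so I would have expected the upload to fail if every write became its own block. Below is a minimal sketch of the block-size math I had in mind; blockSizeFor is my own hypothetical helper, not part of the SDK:
// Pick a block size that keeps the block count under Azure's
// 50,000-committed-blocks-per-blob limit.
static long blockSizeFor(long fileSize) {
    final long MAX_BLOCKS = 50_000;
    final long MIN_BLOCK = 4L * 1024 * 1024;               // keep at least my current 4 MiB
    long needed = (fileSize + MAX_BLOCKS - 1) / MAX_BLOCKS; // ceil(fileSize / 50,000)
    return Math.max(needed, MIN_BLOCK);
}
// 230 GiB / 4 MiB = 58,880 blocks; blockSizeFor(230 GiB) returns ~4.7 MiB,
// which keeps the count at no more than 50,000 blocks.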
Can someone please explain how blocks are split internally in Azure and help me understand this better?
Thanks in advance

Related

Amazon DynamoDB: Invalid UpdateExpression: Expression size has exceeded the maximum allowed size

I am trying to update an item in AWS DynamoDB using Node.js, db.updateItem(query).
I am getting the following error:
Invalid UpdateExpression: Expression size has exceeded the maximum allowed size
On reading a few posts, I realised that DynamoDB limits item size to 400 KB, and that might be the problem here. But if that is the problem, why did it allow the item to be inserted in the first place?
I am not sure what exactly the issue is. Any help would be appreciated.
Please let me know if I missed any required information.
You are probably hitting the expression parameter limits. Please refer to:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Limits.html#limits-expression-parameters
If you are getting this exception:
software.amazon.awssdk.services.dynamodb.model.DynamoDbException: Item size has exceeded the maximum allowed size
it is due to the AWS DynamoDB limits mentioned here.
In my case, I compressed the record using gzip and stored the binary zipped data, then uncompressed it back after reading the record.
Please see the sample code below for compressing and decompressing (I am using the enhanced DynamoDB client library):
public CompletableFuture<Boolean> storeItem(MyBeanClass object) {
    CompletableFuture<Boolean> completableFuture = CompletableFuture.supplyAsync(() -> false);
    try {
        byte[] serialized = objectMapper.writeValueAsString(object.getLargeData()).getBytes(UTF_8);
        if (serialized.length >= 10000) { // large record, gzip it
            try (ByteArrayOutputStream bos = new ByteArrayOutputStream(serialized.length);
                 GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
                gzip.write(serialized);
                gzip.close(); // finish the gzip stream before reading the bytes out
                MyBeanClass newObject = new MyBeanClass();
                newObject.setPrimaryId(object.getPrimaryId());
                newObject.setGzData(SdkBytes.fromByteArray(bos.toByteArray()));
                completableFuture = enhancedDynamoDbTable.putItem(newObject)
                        .thenApply(res -> true)
                        .exceptionally(th -> {
                            th.printStackTrace();
                            return false;
                        });
            }
        } else { // no compression required
            completableFuture = enhancedDynamoDbTable.putItem(object).thenApply(res -> true)
                    .exceptionally(th -> {
                        th.printStackTrace();
                        return false;
                    });
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return completableFuture;
}
To fetch the record and unzip it:
public CompletableFuture<MyBeanClass> getItem(String id) {
    return enhancedDynamoDbTable
            .getItem(Key.builder().partitionValue(id).build())
            .thenApply(record -> {
                if (record.getGzData() != null) {
                    try (ByteArrayInputStream arrayInputStream = new ByteArrayInputStream(record.getGzData().asByteArray());
                         GZIPInputStream inputStream = new GZIPInputStream(arrayInputStream);
                         ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream()) {
                        byte[] buffer = new byte[1024];
                        int length;
                        while ((length = inputStream.read(buffer)) != -1) {
                            byteArrayOutputStream.write(buffer, 0, length);
                        }
                        record = objectMapper.readValue(byteArrayOutputStream.toString(UTF_8), MyBeanClass.class);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
                return record;
            });
}
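For completeness, here is a hypothetical round trip through the two helpers above (MyBeanClass and its getPrimaryId accessor are assumptions based on the setters used earlier):
// Hypothetical usage, written as a method on the same class as storeItem/getItem.
public void roundTrip(MyBeanClass item) {
    storeItem(item).join(); // gzips transparently when the serialized payload is large
    MyBeanClass fetched = getItem(item.getPrimaryId()).join(); // unzipped on read if needed
    System.out.println(fetched != null ? "round trip ok" : "not found");
}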
Hope that helps.

C#: WPD - Downloading a Picture with meta tags

I am using the Portable Device API to automatically get photos from a connected smartphone. I have it all transferring correctly. The code that I use is the standard DownloadFile() routine:
public PortableDownloadInfo DownloadFile(PortableDeviceFile file, string saveToPath)
{
    IPortableDeviceContent content;
    _device.Content(out content);
    IPortableDeviceResources resources;
    content.Transfer(out resources);
    PortableDeviceApiLib.IStream wpdStream;
    uint optimalTransferSize = 0;
    var property = new _tagpropertykey
    {
        fmtid = new Guid(0xE81E79BE, 0x34F0, 0x41BF, 0xB5, 0x3F, 0xF1, 0xA0, 0x6A, 0xE8, 0x78, 0x42),
        pid = 0
    };
    resources.GetStream(file.Id, ref property, 0, ref optimalTransferSize, out wpdStream);
    System.Runtime.InteropServices.ComTypes.IStream sourceStream =
        // ReSharper disable once SuspiciousTypeConversion.Global
        (System.Runtime.InteropServices.ComTypes.IStream)wpdStream;
    var filename = Path.GetFileName(file.Name);
    if (string.IsNullOrEmpty(filename))
        return null;
    FileStream targetStream = new FileStream(Path.Combine(saveToPath, filename),
        FileMode.Create, FileAccess.Write);
    try
    {
        unsafe
        {
            var buffer = new byte[1024];
            int bytesRead;
            do
            {
                sourceStream.Read(buffer, 1024, new IntPtr(&bytesRead));
                targetStream.Write(buffer, 0, 1024);
            } while (bytesRead > 0);
            targetStream.Close();
        }
    }
    finally
    {
        Marshal.ReleaseComObject(sourceStream);
        Marshal.ReleaseComObject(wpdStream);
    }
    return pdi; // pdi is built elsewhere in the full routine (omitted here)
}
There are two problems with this standard code:
1) When the images are saved to the Windows machine, there is no EXIF information. This information is what I need - how do I preserve it?
2) The saved files are very bloated. For example, the source JPEG is 1,045,807 bytes, whilst the downloaded file is 3,942,840 bytes! It is similar for all of the other files. I would have thought that the code inside the unsafe{} section would output it byte for byte? Is there a better way to transfer the data? (A safe way?)
Sorry about this - it works fine. It is something else that is causing these issues.

Resources getting deleted when trying to add at run time

When I am trying to add resources to another file at run time, some of the earlier resources are getting deleted. Please find the source code below:
void CResourceIncludeSampleDlg::OnBnClickedButton1()
{
    CString strInputFile = _T("C:\\SampleData\\FileToInsert.zip"); // This file is 100 MB
    HANDLE hFile = CreateFile(strInputFile, GENERIC_READ, 0, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    DWORD FileSize = GetFileSize(hFile, NULL);
    BYTE *pBuffer = new BYTE[FileSize];
    DWORD dwBytesRead;
    ReadFile(hFile, pBuffer, FileSize, &dwBytesRead, NULL);
    for (int iIndex = 1; iIndex <= 4; iIndex++)
    {
        InsertResource(FileSize, iIndex, pBuffer);
    }
    CloseHandle(hFile);
}

void CResourceIncludeSampleDlg::InsertResource(DWORD FileSize, int iIndex, BYTE *pBuffer)
{
    CString strOutputFile = _T("C:\\SampleData\\ResourceIncludeSample_Source.exe");
    int iResourceID = 300 + iIndex;
    HANDLE hResource = BeginUpdateResource(strOutputFile, FALSE);
    if (INVALID_HANDLE_VALUE != hResource)
    {
        if (UpdateResource(hResource, _T("VIDEOBIN"), MAKEINTRESOURCE(iResourceID), MAKELANGID(LANG_ENGLISH, SUBLANG_ENGLISH_US),
                (LPVOID)pBuffer, FileSize) == TRUE)
        {
            EndUpdateResource(hResource, FALSE);
        }
    }
}
After execution completes, I am expecting 301, 302, 303 and 304 to be added under the "VIDEOBIN" category. But only 2 (sometimes 3) resources are present; one resource is always deleted.
Could you please let me know what could be wrong, or any fix for the same?
Any help or sample source code is greatly appreciated.
Thanks and Regards,
YKK Reddy
You need to delete[] pBuffer after closing the file. Also, it should be RT_RCDATA instead of _T("VIDEOBIN"), although the custom resource name may not be the cause of this particular problem.

Regarding CloudBlockBlob.PutBlock and CloudBlockBlob.PutBlockList

I am aware that we can use CloudBlockBlob.PutBlock and CloudBlockBlob.PutBlockList to upload in chunks, but these methods do not have a lease id parameter.
Could I instead form the HttpWebRequest with the "x-ms-lease-id" header myself and attach it to the PutBlock and PutBlockList calls?
Hi Gaurav, I could not fit a big comment into your response, hence adding it here.
I tried BlobRequest.PutBlock and BlobRequest.PutBlockList with the following code:
for (int idxThread = 0; idxThread < numThreads; idxThread++)
{
    tasks.Add(Task.Factory.StartNew(() =>
    {
        KeyValuePair<int, int> blockIdAndLength;
        while (true)
        {
            lock (queue)
            {
                if (queue.Count == 0)
                    break;
                blockIdAndLength = queue.Dequeue();
            }
            byte[] buff = new byte[blockIdAndLength.Value];
            // Copy this chunk out of the input byte array.
            Array.Copy(buffer, blockIdAndLength.Key * (long)blockIdAndLength.Value, buff, 0, blockIdAndLength.Value);
            // Upload block.
            string blockName = Convert.ToBase64String(BitConverter.GetBytes(blockIdAndLength.Key));
            //string blockIdString = Convert.ToBase64String(ASCIIEncoding.ASCII.GetBytes(string.Format("BlockId{0}", blockIdAndLength.Key.ToString("0000000"))));
            // For small files like 100 KB this works fine; for large files like 10 MB,
            // it ends up uploading only 2-3 MB.
            // Is there any better way to implement uploading in chunks with leasing?
            string url = blob.Uri.ToString();
            if (blob.ServiceClient.Credentials.NeedsTransformUri)
            {
                url = blob.ServiceClient.Credentials.TransformUri(url);
            }
            var req = BlobRequest.Put(new Uri(url), 90, new BlobProperties(), BlobType.BlockBlob, leaseId, 0);
            using (Stream writer = req.GetRequestStream())
            {
                writer.Write(buff, 0, buff.Length);
            }
            blob.ServiceClient.Credentials.SignRequest(req);
            req.GetResponse().Close();
        }
    }));
}
// Wait for all threads to complete uploading data.
Task.WaitAll(tasks.ToArray());
This does not work for multiple chunks. Could you please provide your inputs?
I don't think you can. However, take a look at the BlobRequest class in the Microsoft.WindowsAzure.StorageClient.Protocol namespace. It has PutBlock and PutBlockList functions which allow you to specify a lease id.
Hope this helps.
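For anyone on the modern Azure SDK for Java (azure-storage-blob v12) rather than the old StorageClient library above, the equivalent manual chunked upload under a lease looks roughly like this - a minimal sketch, assuming a containerClient and an already-acquired leaseId:
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.models.BlobRequestConditions;
import com.azure.storage.blob.specialized.BlockBlobClient;
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// Sketch: stage blocks under an existing lease, then commit the block list.
static void uploadChunksWithLease(BlobContainerClient containerClient,
                                  String blobName, List<byte[]> chunks, String leaseId) {
    BlockBlobClient blockClient = containerClient.getBlobClient(blobName).getBlockBlobClient();
    List<String> blockIds = new ArrayList<>();
    int blockNo = 0;
    for (byte[] chunk : chunks) {
        // Block IDs must be Base64 strings of equal length before encoding.
        String blockId = Base64.getEncoder()
                .encodeToString(String.format("%07d", blockNo++).getBytes());
        // stageBlockWithResponse takes the lease id directly.
        blockClient.stageBlockWithResponse(blockId, new ByteArrayInputStream(chunk),
                chunk.length, null, leaseId, null, null);
        blockIds.add(blockId);
    }
    // Committing honours the lease via BlobRequestConditions.
    blockClient.commitBlockListWithResponse(blockIds, null, null, null,
            new BlobRequestConditions().setLeaseId(leaseId), null, null);
}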

MonoTouch - WebRequest memory leak and crash?

I've got a MonoTouch app that does an HTTP POST with a 3.5MB file, and it is very unstable on the primary platforms that I test on (iPhone 3G with OS 3.1.2 and iPhone 4 with OS 4.2.1). I'll describe what I'm doing here and maybe someone can tell me if I'm doing something wrong.
In order to rule out the rest of my app, I've whittled this down to a tiny sample app. The app is an iPhone OpenGL project and it does only this:
1) At startup, allocate 6 MB of memory in 30 KB chunks. This simulates my app's memory usage.
2) Read a 3.5 MB file into memory.
3) Create a thread to post the data. (Make a WebRequest object, use GetRequestStream(), and write the 3.5 MB of data in.)
4) When the main thread detects that the posting thread is done, go to step 2 and repeat.
Also, each frame, I allocate 0-100 KB to simulate the app doing something. I don't keep any references to this data, so it should be getting garbage collected.
iPhone 3G Result: The app gets through 6 to 8 uploads and then the OS kills it. There is no crash log, but there is a LowMemory log showing that the app was jettisoned.
iPhone 4 Result: It gets an Mprotect error around the 11th upload.
A few data points:
Instruments does NOT show the memory increasing as the app continues to upload.
Instruments doesn't show any significant leaks (maybe 1 kilobyte total).
It doesn't matter whether I write the post data in 64k chunks or all at once with one Stream.Write() call.
It doesn't matter whether I wait for a response (HttpWebRequest.HaveResponse) or not before starting the next upload.
It doesn't matter if the POST data is even valid. I've tried using valid POST data and I've tried sending 3MB of zeros.
If the app is not allocating any data each frame, then it takes longer to run out of memory (but as mentioned before, the memory that I'm allocating each frame is not referenced after the frame it was allocated on, so it should be scooped up by the GC).
If nobody has any ideas, I'll file a bug with Novell, but I wanted to see if I'm doing something wrong here first.
If anyone wants the full sample app, I can provide it, but I've pasted the contents of my EAGLView.cs below.
using System;
using System.Net;
using System.Threading;
using System.Collections.Generic;
using System.IO;
using OpenTK.Platform.iPhoneOS;
using MonoTouch.CoreAnimation;
using OpenTK;
using OpenTK.Graphics.ES11;
using MonoTouch.Foundation;
using MonoTouch.ObjCRuntime;
using MonoTouch.OpenGLES;

namespace CrashTest
{
    public partial class EAGLView : iPhoneOSGameView
    {
        [Export("layerClass")]
        static Class LayerClass()
        {
            return iPhoneOSGameView.GetLayerClass();
        }

        [Export("initWithCoder:")]
        public EAGLView(NSCoder coder) : base(coder)
        {
            LayerRetainsBacking = false;
            LayerColorFormat = EAGLColorFormat.RGBA8;
            ContextRenderingApi = EAGLRenderingAPI.OpenGLES1;
        }

        protected override void ConfigureLayer(CAEAGLLayer eaglLayer)
        {
            eaglLayer.Opaque = true;
        }

        protected override void OnRenderFrame(FrameEventArgs e)
        {
            SimulateAppAllocations();
            UpdatePost();

            base.OnRenderFrame(e);
            float[] squareVertices = { -0.5f, -0.5f, 0.5f, -0.5f, -0.5f, 0.5f, 0.5f, 0.5f };
            byte[] squareColors = { 255, 255, 0, 255, 0, 255, 255, 255, 0, 0,
                0, 0, 255, 0, 255, 255 };
            MakeCurrent();
            GL.Viewport(0, 0, Size.Width, Size.Height);
            GL.MatrixMode(All.Projection);
            GL.LoadIdentity();
            GL.Ortho(-1.0f, 1.0f, -1.5f, 1.5f, -1.0f, 1.0f);
            GL.MatrixMode(All.Modelview);
            GL.Rotate(3.0f, 0.0f, 0.0f, 1.0f);
            GL.ClearColor(0.5f, 0.5f, 0.5f, 1.0f);
            GL.Clear((uint)All.ColorBufferBit);
            GL.VertexPointer(2, All.Float, 0, squareVertices);
            GL.EnableClientState(All.VertexArray);
            GL.ColorPointer(4, All.UnsignedByte, 0, squareColors);
            GL.EnableClientState(All.ColorArray);
            GL.DrawArrays(All.TriangleStrip, 0, 4);
            SwapBuffers();
        }

        AsyncHttpPost m_Post;
        int m_nPosts = 1;

        byte[] LoadPostData()
        {
            // Just return 3MB of zeros. It doesn't matter whether this is valid POST data or not.
            return new byte[1024 * 1024 * 3];
        }

        void UpdatePost()
        {
            if (m_Post == null || m_Post.PostStatus != AsyncHttpPostStatus.InProgress)
            {
                System.Console.WriteLine(string.Format("Starting post {0}", m_nPosts++));
                byte[] postData = LoadPostData();
                m_Post = new AsyncHttpPost(
                    "https://api-video.facebook.com/restserver.php",
                    "multipart/form-data; boundary=" + "8cdbcdf18ab6640",
                    postData);
            }
        }

        Random m_Random = new Random(0);
        List<byte[]> m_Allocations;
        List<byte[]> m_InitialAllocations;

        void SimulateAppAllocations()
        {
            // First time through, allocate a bunch of data that the app would allocate.
            if (m_InitialAllocations == null)
            {
                m_InitialAllocations = new List<byte[]>();
                int nInitialBytes = 6 * 1024 * 1024;
                int nBlockSize = 30000;
                for (int nCurBytes = 0; nCurBytes < nInitialBytes; nCurBytes += nBlockSize)
                {
                    m_InitialAllocations.Add(new byte[nBlockSize]);
                }
            }
            m_Allocations = new List<byte[]>();
            for (int i = 0; i < 10; i++)
            {
                int nAllocationSize = m_Random.Next(10000) + 10;
                m_Allocations.Add(new byte[nAllocationSize]);
            }
        }
    }

    public enum AsyncHttpPostStatus
    {
        InProgress,
        Success,
        Fail
    }

    public class AsyncHttpPost
    {
        public AsyncHttpPost(string sURL, string sContentType, byte[] postData)
        {
            m_PostData = postData;
            m_PostStatus = AsyncHttpPostStatus.InProgress;
            m_sContentType = sContentType;
            m_sURL = sURL;
            //UploadThread();
            m_UploadThread = new Thread(new ThreadStart(UploadThread));
            m_UploadThread.Start();
        }

        void UploadThread()
        {
            using (MonoTouch.Foundation.NSAutoreleasePool pool = new MonoTouch.Foundation.NSAutoreleasePool())
            {
                try
                {
                    HttpWebRequest request = WebRequest.Create(m_sURL) as HttpWebRequest;
                    request.Method = "POST";
                    request.ContentType = m_sContentType;
                    request.ContentLength = m_PostData.Length;
                    // Write the post data.
                    using (Stream stream = request.GetRequestStream())
                    {
                        stream.Write(m_PostData, 0, m_PostData.Length);
                        stream.Close();
                    }
                    System.Console.WriteLine("Finished!");
                    // We're done with the data now. Let it be garbage collected.
                    m_PostData = null;
                    // Finished!
                    m_PostStatus = AsyncHttpPostStatus.Success;
                }
                catch (System.Exception e)
                {
                    System.Console.WriteLine("Error in AsyncHttpPost.UploadThread:\n" + e.Message);
                    m_PostStatus = AsyncHttpPostStatus.Fail;
                }
            }
        }

        public AsyncHttpPostStatus PostStatus
        {
            get
            {
                return m_PostStatus;
            }
        }

        Thread m_UploadThread;
        // Queued to be handled in the main thread.
        byte[] m_PostData;
        AsyncHttpPostStatus m_PostStatus;
        string m_sContentType;
        string m_sURL;
    }
}
I think you should read in your file 1 KB (or some arbitrary size) at a time and write it to the web request.
Code similar to this:
byte[] buffer = new byte[1024];
int bytesRead = 0;
using (FileStream fileStream = File.OpenRead("YourFile.txt"))
{
    while ((bytesRead = fileStream.Read(buffer, 0, buffer.Length)) != 0)
    {
        httpPostStream.Write(buffer, 0, bytesRead);
    }
}
This is off the top of my head, but I think it's right.
This way you don't have an extra 3 MB floating around in memory when you don't really need it. I think tricks like this are even more important on iDevices (or other devices) than on the desktop.
Test the buffer size too; a larger buffer will get you better speeds up to a point (I remember 8 KB being pretty good).
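For comparison, a minimal sketch of the same chunked-streaming idea in Java (the URL and file name are placeholders; setChunkedStreamingMode stops HttpURLConnection from buffering the whole request body in memory):
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch: stream a file into an HTTP POST without holding the whole body in memory.
static void streamingPost(String url, File file) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
    conn.setDoOutput(true);
    conn.setRequestMethod("POST");
    conn.setChunkedStreamingMode(8 * 1024); // send in 8 KB chunks instead of buffering
    try (InputStream in = new FileInputStream(file);
         OutputStream out = conn.getOutputStream()) {
        byte[] buffer = new byte[8 * 1024];
        for (int n; (n = in.read(buffer)) != -1; ) {
            out.write(buffer, 0, n);
        }
    }
    System.out.println(conn.getResponseCode()); // read the response so the request completes
    conn.disconnect();
}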
