Azure blob first write

This official example for writing blob blocks has a step where it checks which blocks have not been committed:
fmt.Println("Get uncommitted blocks list...")
list, err := b.GetBlockList(storage.BlockListTypeUncommitted, nil)
if err != nil {
return fmt.Errorf("get block list failed: %v", err)
}
uncommittedBlocksList := make([]storage.Block, len(list.UncommittedBlocks))
for i := range list.UncommittedBlocks {
uncommittedBlocksList[i].ID = list.UncommittedBlocks[i].Name
uncommittedBlocksList[i].Status = storage.BlockStatusUncommitted
}
If I'm creating a blob (with multiple blocks) that definitely doesn't yet exist, is there any problem with skipping that code?
The code would be something like:
b := cnt.GetBlobReference(blockBlobName)
err := b.CreateBlockBlob(nil)

blockID := "00000"
data := randomData(1984)
err = b.PutBlock(blockID, data, nil)

blockID2 := "00001"
data2 := randomData(6542)
err = b.PutBlock(blockID2, data2, nil)

var uncommittedBlocksList []storage.Block
uncommittedBlocksList = append(uncommittedBlocksList,
    storage.Block{
        ID:     "00000",
        Status: storage.BlockStatusUncommitted,
    },
    storage.Block{
        ID:     "00001",
        Status: storage.BlockStatusUncommitted,
    },
)
err = b.PutBlockList(uncommittedBlocksList, nil)

If I'm creating a blob (with multiple blocks) that definitely doesn't yet exist, is there any problem with skipping that code?
Absolutely not. You can certainly skip the code that fetches the uncommitted block list. Fetching the uncommitted list is useful when a previous upload failed partway through and you want to resume from the last failed block. By skipping this code, you are essentially telling Azure Storage to discard any other uncommitted blocks and use only the blocks specified in the block list to create the blob.
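To make the resume scenario concrete, here is a minimal sketch using the same storage SDK calls as above (dataForBlock is a hypothetical helper that regenerates a block's payload; the block IDs are the ones from the question):

// Resume sketch: find which blocks already reached the uncommitted list,
// re-upload only the missing ones, then commit the full list.
list, err := b.GetBlockList(storage.BlockListTypeUncommitted, nil)
if err != nil {
    return err
}
uploaded := make(map[string]bool)
for _, blk := range list.UncommittedBlocks {
    uploaded[blk.Name] = true
}

var blocks []storage.Block
for _, id := range []string{"00000", "00001"} {
    if !uploaded[id] {
        // dataForBlock is hypothetical; it stands in for re-reading the source data
        if err := b.PutBlock(id, dataForBlock(id), nil); err != nil {
            return err
        }
    }
    blocks = append(blocks, storage.Block{ID: id, Status: storage.BlockStatusUncommitted})
}
return b.PutBlockList(blocks, nil)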


Is it possible to do append blob restore using multiple threads?

Which version of the SDK was used?
v0.11.0
Which platform are you using? (ex: Windows, Linux, Debian)
Windows
What problem was encountered?
[Approach]
Acquire a lease before the goroutines start, then call
AppendBlock(ctx, bytes.NewReader(rangeData), azblob.AppendBlobAccessConditions{}, nil)
concurrently inside goroutines. We are using azblob.AppendPositionAccessConditions{IfAppendPositionEqual: subRangeSize} in the AppendBlock call.
It works well without goroutines but fails when using them:
===== RESPONSE ERROR (ServiceCode=AppendPositionConditionNotMet) =====
Description=The append position condition specified was not met.
FourMegaByteAsBytes := common.FourMegaByteAsBytes
var strLeaseID string = ""
var respAcquireLease *azblob.BlobAcquireLeaseResponse
subRangeSize := int64(0)

// Restore data to the append blob
for currpos := int64(0); currpos < SourceBlobLength; {
    subRangeSize = int64(math.Min(float64(SourceBlobLength-currpos), float64(FourMegaByteAsBytes)))
    rangeData := make([]byte, subRangeSize)
    if len(strLeaseID) == 0 {
        // Acquire the lease for the restore blob
        respAcquireLease, err = blobURL.AcquireLease(ctx, "", -1, azblob.ModifiedAccessConditions{})
        if err != nil {
            _, err = blobURL.AppendBlock(ctx, bytes.NewReader(rangeData),
                azblob.AppendBlobAccessConditions{}, nil)
        } else {
            strLeaseID = respAcquireLease.LeaseID()
            _, err1 := blobURL.AppendBlock(ctx, bytes.NewReader(rangeData),
                azblob.AppendBlobAccessConditions{
                    azblob.ModifiedAccessConditions{},
                    azblob.LeaseAccessConditions{LeaseID: strLeaseID},
                    azblob.AppendPositionAccessConditions{},
                }, nil)
            if err1 != nil {
                log.Fatal(err1)
                return
            }
        }
    } else {
        _, err = blobURL.AppendBlock(ctx, bytes.NewReader(rangeData),
            azblob.AppendBlobAccessConditions{
                azblob.ModifiedAccessConditions{},
                azblob.LeaseAccessConditions{LeaseID: strLeaseID},
                azblob.AppendPositionAccessConditions{},
            }, nil)
    }
    currpos += subRangeSize
}
Have you found a mitigation/solution?
No
Appending to a blob requires that you hold a lease. Therefore, only the client (i.e. the thread) that holds the lease can write to the blob.
So the answer to your question is no, it is not possible to append from multiple threads at the same time.
There are two possible workarounds:
1. Have all your threads write to a queue, and have a single process read from the queue and write to the blob (see the sketch below).
2. Program each thread to wait for the lease to become available. Note that the minimum duration of a lease is 15 seconds.
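For illustration, here is a minimal sketch of the first workaround, assuming the blobURL, ctx, and azblob API from the question (restoreViaQueue and its parameters are hypothetical):

// Hypothetical sketch: producers enqueue chunks; a single writer goroutine
// performs every AppendBlock call, so appends never race on the position.
func restoreViaQueue(ctx context.Context, blobURL azblob.AppendBlobURL, ranges [][]byte) error {
    chunks := make(chan []byte, 16)
    errc := make(chan error, 1)

    go func() {
        var werr error
        for chunk := range chunks {
            if werr != nil {
                continue // drain remaining chunks after a failure
            }
            _, err := blobURL.AppendBlock(ctx, bytes.NewReader(chunk),
                azblob.AppendBlobAccessConditions{}, nil)
            if err != nil {
                werr = err
            }
        }
        errc <- werr
    }()

    // Chunks must be enqueued in blob order; a simple ordered loop stands
    // in here for whatever concurrent producers prepare the data.
    for _, r := range ranges {
        chunks <- r
    }
    close(chunks)
    return <-errc
}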

Using Go linter with security issue

We use the following lib:
import "crypto/sha1"
While running golangci-lint we got the following errors:
G505: Blocklisted import crypto/sha1: weak cryptographic primitive (gosec) for "crypto/sha1"
G401: Use of weak cryptographic primitive (gosec)
sha := sha1.New()
Is there something I can do without excluding them? I'm not sure I understand these issues. If they weren't security-related, it would be a simple task to exclude them...
Update
What we are doing is:
fdrContent, err := ioutil.ReadFile(filepath.Join(path))
// gets the hashcode of the FDR file
h := sha1.New()
code, err := h.Write(fdrContent)
return code, err
I use h.Write in my own gtarsum project as in here:
h := sha256.New()
for {
    buf := make([]byte, 1024*1024)
    bytesRead, err := tr.Read(buf)
    if err != nil {
        if err != io.EOF {
            panic(err)
        }
    }
    if bytesRead > 0 {
        _, err := h.Write(buf[:bytesRead])
All you have to do, if there is no obvious performance issue, is to switch to sha256.
No more warning.
The issue is SHA-1 collisions, which I have documented here, from the shattered.io project.
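Applied to the snippet from the question's update, a minimal SHA-256 version might look like this sketch (hashFDR is a hypothetical wrapper; note the original snippet returned h.Write's byte count, which is the number of bytes hashed, not the digest):

import (
    "crypto/sha256"
    "encoding/hex"
    "io/ioutil"
    "path/filepath"
)

// hashFDR returns the hex-encoded SHA-256 digest of the FDR file.
func hashFDR(path string) (string, error) {
    fdrContent, err := ioutil.ReadFile(filepath.Join(path))
    if err != nil {
        return "", err
    }
    h := sha256.New()
    h.Write(fdrContent) // hash.Hash.Write never returns an error
    return hex.EncodeToString(h.Sum(nil)), nil
}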

While downloading a file from Azure Blob Storage using Golang, getting "curl: Empty reply from server", but the file is downloaded in the background

I am trying to download a file from Azure Blob Storage using an HTTP request. I am able to download the file, but on the terminal curl returns "Empty reply from server". I tried increasing the timeout, but it didn't fix it. I referred to other questions related to this curl response, but they didn't help. For small files this code works flawlessly, but for big files, say 75 MB, it does not.
containerURL := azblob.NewContainerURL(*URL, pipeline)
blobURL := containerURL.NewBlockBlobURL(splitArray[1])
ctx := context.Background()
downloadResponse, err := blobURL.Download(ctx, 0, azblob.CountToEnd, azblob.BlobAccessConditions{}, false)
if err != nil {
    ...
}
bodyStream := downloadResponse.Body(azblob.RetryReaderOptions{MaxRetryRequests: 20})

// read the body into a buffer
downloadedData := bytes.Buffer{}
_, err = downloadedData.ReadFrom(bodyStream)

file, err := os.OpenFile(
    "/tmp/"+fileName,
    os.O_RDWR|os.O_TRUNC|os.O_CREATE,
    0777,
)
file.Write(downloadedData.Bytes())
file.Close()

filePath := "/tmp/" + fileName
file, err = os.Open(filePath)
return middleware.ResponderFunc(func(w http.ResponseWriter, r runtime.Producer) {
    fn := filepath.Base(filePath)
    w.Header().Set(CONTENTTYPE, "application/octet-stream")
    w.Header().Set("Content-Disposition", fmt.Sprintf("attachment; filename=%q", fn))
    io.Copy(w, file)
    defer os.Remove(filePath)
    file.Close()
})
I am thinking of implementing the above logic using goroutines. Is there even a need for goroutines here?
Any constructive feedback will be helpful.
After analyzing packets in Wireshark, I found the connection was being closed from my side due to a timeout, as I am using go-swagger. I increased the timeout in configure.go. go-swagger provides built-in hooks for handling these scenarios, such as TLS and timeouts. Below is the code for reference.
// As soon as the server is initialized but not run yet, this function will be called.
// If you need to modify a config, store the server instance to stop it individually later, this is the place.
// This function can be called multiple times, depending on the number of serving schemes.
// The scheme value will be set accordingly: "http", "https" or "unix".
func configureServer(s *http.Server, scheme, addr string) {
    s.WriteTimeout = time.Minute * 5
}

Golang processing images via multipart and streaming to Azure

In the process of learning golang, I'm trying to write a web app with multiple image upload functionality.
I'm using Azure Blob Storage to store images, but I am having trouble streaming the images from the multipart request to Blob Storage.
Here's the handler I've written so far:
func (imgc *ImageController) UploadInstanceImageHandler(w http.ResponseWriter, r *http.Request, p httprouter.Params) {
    reader, err := r.MultipartReader()
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    for {
        part, partErr := reader.NextPart()

        // No more parts to process
        if partErr == io.EOF {
            break
        }

        // If part.FileName() is empty, skip this iteration.
        if part.FileName() == "" {
            continue
        }

        // Check file type
        if part.Header["Content-Type"][0] != "image/jpeg" {
            fmt.Printf("\nNot image/jpeg!")
            break
        }

        var read uint64
        fileName := uuid.NewV4().String() + ".jpg"
        buffer := make([]byte, 100000000)

        // Get size
        for {
            cBytes, err := part.Read(buffer)
            if err == io.EOF {
                fmt.Printf("\nLast buffer read!")
                break
            }
            read = read + uint64(cBytes)
        }

        stream := bytes.NewReader(buffer[0:read])
        err = imgc.blobClient.CreateBlockBlobFromReader(imgc.imageContainer, fileName, read, stream, nil)
        if err != nil {
            fmt.Println(err)
            break
        }
    }
    w.WriteHeader(http.StatusOK)
}
In the course of my research, I've read about using r.FormFile and ParseMultipartForm, but decided to try learning how to use MultipartReader.
I was able to upload an image to the golang backend and save the file to my machine using MultiPartReader.
At the moment, I'm able to upload files to Azure but they end up being corrupted. The file sizes seem on point but clearly something is not working.
Am I misunderstanding how to create an io.Reader for CreateBlockBlobFromReader?
Any help is much appreciated!
As @Mark said, you can use ioutil.ReadAll to read the content into a byte array, with code like below.
import (
"bytes"
"io/ioutil"
)
partBytes, _ := ioutil.ReadAll(part)
size := uint64(len(partBytes))
blob := bytes.NewReader(partBytes)
err := blobClient.CreateBlockBlobFromReader(container, fileName, size, blob, nil)
According to the godoc for CreateBlockBlobFromReader:
The API rejects requests with size > 64 MiB (but this limit is not checked by the SDK). To write a larger blob, use CreateBlockBlob, PutBlock, and PutBlockList.
So if the size is larger than 64 MiB, the code should be like below.
import "encoding/base64"
const BLOB_LENGTH_LIMITS uint64 = 64 * 1024 * 1024
partBytes, _ := ioutil.ReadAll(part)
size := uint64(len(partBytes))
if size <= BLOB_LENGTH_LIMITS {
blob := bytes.NewReader(partBytes)
err := blobClient.CreateBlockBlobFromReader(container, fileName, size, blob, nil)
} else {
// Create an empty blob
blobClient.CreateBlockBlob(container, fileName)
// Create a block list, and upload each block
length := size / BLOB_LENGTH_LIMITS
if length%limits != 0 {
length = length + 1
}
blocks := make([]Block, length)
for i := uint64(0); i < length; i++ {
start := i * BLOB_LENGTH_LIMITS
end := (i+1) * BLOB_LENGTH_LIMITS
if end > size {
end = size
}
chunk := partBytes[start: end]
blockId := base64.StdEncoding.EncodeToString(chunk)
block := Block{blockId, storage.BlockStatusCommitted}
blocks[i] = block
err = blobClient.PutBlock(container, fileName, blockID, chunk)
if err != nil {
.......
}
}
err = blobClient.PutBlockList(container, fileName, blocks)
if err != nil {
.......
}
}
Hope it helps.
A Reader can return both io.EOF and a valid count of final bytes read; it looks like those final bytes (cBytes) are never added to the read total. Also be careful: if part.Read(buffer) returns an error other than io.EOF, the read loop will never exit. Consider ioutil.ReadAll instead (see the sketch after these notes).
CreateBlockBlobFromReader takes a Reader, and part is a Reader, so you may be able to pass the part in directly.
You may also want to consider that Azure block size limits might be smaller than the image; see Azure blobs.
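A minimal sketch of the ioutil.ReadAll suggestion, reusing part, fileName, w, and the blob client from the question's handler:

// ReadAll handles io.EOF correctly, including any final bytes that
// arrive together with the EOF, so no manual read loop is needed.
partBytes, err := ioutil.ReadAll(part)
if err != nil {
    http.Error(w, err.Error(), http.StatusInternalServerError)
    return
}
stream := bytes.NewReader(partBytes)
err = imgc.blobClient.CreateBlockBlobFromReader(imgc.imageContainer, fileName, uint64(len(partBytes)), stream, nil)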

Reading from a Reader multiple times

I'm building a simple caching proxy that intercepts HTTP requests, grabs the content in response.Body, then writes it back to the client. The problem is, as soon as I read from response.Body, the write back to the client contains an empty body (everything else, like the headers, are written as expected).
Here's the current code:
func requestHandler(w http.ResponseWriter, r *http.Request) {
    client := &http.Client{}
    r.RequestURI = ""
    response, err := client.Do(r)
    defer response.Body.Close()
    if err != nil {
        log.Fatal(err)
    }
    content, _ := ioutil.ReadAll(response.Body)
    cachePage(response.Request.URL.String(), content)
    response.Write(w)
}
If I remove the content, _ and cachePage lines, it works fine. With the lines included, requests return an empty body. Any idea how I can get just the Body of the http.Response and still write out the response in full to the http.ResponseWriter?
As in my comment, you could implement io.ReadCloser.
As per Dewy Broto (thanks!), you can do this much more simply with:
content, _ := ioutil.ReadAll(response.Body)
response.Body = ioutil.NopCloser(bytes.NewReader(content))
response.Write(w)
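(The ioutil.NopCloser wrapper is needed because response.Body must be an io.ReadCloser, while bytes.NewReader returns a plain io.Reader.)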
As you have discovered, you can only read once from a response's Body.
Go has a reverse proxy that will facilitate what you are trying to do. Check out httputil.ReverseProxy and httputil.DumpResponse
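For example, a minimal caching proxy built on ReverseProxy might look like this sketch (cachePage is the question's own function; the Director body is an assumption that depends on how requests reach the proxy):

// Sketch: ReverseProxy forwards the request and lets ModifyResponse
// observe the body before it is written back to the client.
proxy := &httputil.ReverseProxy{
    Director: func(req *http.Request) {
        // For a forward-style cache the request URL is already absolute;
        // a normal reverse proxy would rewrite req.URL.Scheme/Host here.
    },
    ModifyResponse: func(resp *http.Response) error {
        body, err := ioutil.ReadAll(resp.Body)
        if err != nil {
            return err
        }
        resp.Body.Close()
        cachePage(resp.Request.URL.String(), body)
        // Restore the body so the proxy can still send it to the client.
        resp.Body = ioutil.NopCloser(bytes.NewReader(body))
        return nil
    },
}
http.Handle("/", proxy)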
You do not need to read from the response a second time. You already have the data in hand and can write it directly to the response writer.
The call
response.Write(w)
writes the response in wire format to the server's response body. This is not what you want for a proxy. You need to copy the headers, status and body to the server response individually.
I have noted other issues in the code comments below.
I recommend using the standard library's ReverseProxy or copying it and modifying it to meet your needs.
func requestHandler(w http.ResponseWriter, r *http.Request) {
    // No need to make a client, use the default
    // client := &http.Client{}
    r.RequestURI = ""
    response, err := http.DefaultClient.Do(r)
    // response can be nil, close after the error check
    // defer response.Body.Close()
    if err != nil {
        log.Fatal(err)
    }
    defer response.Body.Close()

    // Check errors! Always.
    // content, _ := ioutil.ReadAll(response.Body)
    content, err := ioutil.ReadAll(response.Body)
    if err != nil {
        // handle error
    }
    cachePage(response.Request.URL.String(), content)

    // The Write method writes the response in wire format to w.
    // Because the server handles the wire format, you need to
    // copy the individual pieces.
    // response.Write(w)

    // Copy headers
    for k, v := range response.Header {
        w.Header()[k] = v
    }
    // Copy status code
    w.WriteHeader(response.StatusCode)
    // Write the response body.
    w.Write(content)
}
