Is it possible to do append blob restore using multiple threads? - azure

Which version of the SDK was used?
v0.11.0
Which platform are you using? (ex: Windows, Linux, Debian)
Windows
What problem was encountered?
[Approach]
We acquire a lease before the goroutines start, then call AppendBlock(ctx, bytes.NewReader(rangeData), azblob.AppendBlobAccessConditions{}, nil) concurrently inside goroutines.
We pass azblob.AppendPositionAccessConditions{IfAppendPositionEqual: subRangeSize} in the AppendBlock call.
It works well without goroutines but fails when running concurrently:
===== RESPONSE ERROR (ServiceCode=AppendPositionConditionNotMet) =====
Description=The append position condition specified was not met.
FourMegaByteAsBytes := common.FourMegaByteAsBytes
var strLeaseID string = ""
var respAcquireLease *azblob.BlobAcquireLeaseResponse
subRangeSize := int64(0)

// Restore data to the append blob in 4 MiB sub-ranges.
for currpos := int64(0); currpos < SourceBlobLength; {
    subRangeSize = int64(math.Min(float64(SourceBlobLength-currpos), float64(FourMegaByteAsBytes)))
    rangeData := make([]byte, subRangeSize)

    if len(strLeaseID) == 0 {
        // Acquire the lease for the restore blob.
        respAcquireLease, err = blobURL.AcquireLease(ctx, "", -1, azblob.ModifiedAccessConditions{})
        if err != nil {
            _, err = blobURL.AppendBlock(ctx, bytes.NewReader(rangeData),
                azblob.AppendBlobAccessConditions{}, nil)
        } else {
            strLeaseID = respAcquireLease.LeaseID()
            _, err1 := blobURL.AppendBlock(ctx, bytes.NewReader(rangeData),
                azblob.AppendBlobAccessConditions{
                    ModifiedAccessConditions:       azblob.ModifiedAccessConditions{},
                    LeaseAccessConditions:          azblob.LeaseAccessConditions{LeaseID: strLeaseID},
                    AppendPositionAccessConditions: azblob.AppendPositionAccessConditions{},
                }, nil)
            if err1 != nil {
                log.Fatal(err1)
                return
            }
        }
    } else {
        _, err = blobURL.AppendBlock(ctx, bytes.NewReader(rangeData),
            azblob.AppendBlobAccessConditions{
                ModifiedAccessConditions:       azblob.ModifiedAccessConditions{},
                LeaseAccessConditions:          azblob.LeaseAccessConditions{LeaseID: strLeaseID},
                AppendPositionAccessConditions: azblob.AppendPositionAccessConditions{},
            }, nil)
    }
    currpos += subRangeSize
}
Have you found a mitigation/solution?
No

Appending to a blob requires that you have a lease. Therefore, only the client (i.e. thread) that holds the lease can write to the blob.
So the answer to your question is no: it is not possible to append from multiple threads at the same time.
There are two possible workarounds:
Have all your threads write to a queue, and let a single process read from the queue and write to the blob (see the sketch below).
Program the thread to wait for the lease to become available. Note that the minimum duration of a lease is 15 seconds.
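A minimal sketch of workaround 1, assuming the blobURL, ctx, strLeaseID, SourceBlobLength and 4 MiB chunking from your snippet (fetchRange is an illustrative placeholder, not an SDK call): the sub-ranges are read concurrently, but a single goroutine performs every AppendBlock call, so the lease and the append position are never contended.
// Fetch sub-ranges concurrently into indexed slots.
numChunks := int((SourceBlobLength + FourMegaByteAsBytes - 1) / FourMegaByteAsBytes)
chunks := make([][]byte, numChunks)

var wg sync.WaitGroup
for i := 0; i < numChunks; i++ {
    wg.Add(1)
    go func(i int) {
        defer wg.Done()
        chunks[i] = fetchRange(i) // hypothetical helper: read the bytes of sub-range i from the source
    }(i)
}
wg.Wait()

// Single writer: appends must stay in order, so only this loop calls AppendBlock.
for _, chunk := range chunks {
    _, err := blobURL.AppendBlock(ctx, bytes.NewReader(chunk),
        azblob.AppendBlobAccessConditions{
            LeaseAccessConditions: azblob.LeaseAccessConditions{LeaseID: strLeaseID},
        }, nil)
    if err != nil {
        log.Fatal(err)
    }
}
This keeps every fetched chunk in memory at once; for very large blobs you would bound the number of in-flight chunks, but the key point is that only one goroutine ever appends.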

Related

How to make work concurrent while sending data on a stream in golang?

I have a golang gRPC server which has a streaming endpoint. Earlier I was doing all the work sequentially and sending on the stream, but then I realized I can make the work concurrent and then send on the stream. From the grpc-go docs I understood that I can make the work concurrent, but I can't make sending on the stream concurrent, so I ended up with the code below, which does the job.
Below is the code I have in my streaming endpoint, which sends data back to the client in a streaming way. This does all the work concurrently.
// get "allCids" from lot of files and load in memory.
allCids := .....
var data = allCids.([]int64)
out := make(chan *custPbV1.CustomerResponse, len(data))
wg := &sync.WaitGroup{}
wg.Add(len(data))
go func() {
wg.Wait()
close(out)
}()
for _, cid := range data {
go func (id int64) {
defer wg.Done()
pd := repo.GetCustomerData(strconv.FormatInt(cid, 10))
if !pd.IsCorrect {
return
}
resources := us.helperCom.GenerateResourceString(pd)
val, err := us.GenerateInfo(clientId, resources, cfg)
if err != nil {
return
}
out <- val
}(cid)
}
for val := range out {
if err := stream.Send(val); err != nil {
log.Printf("send error %v", err)
}
}
Now the problem I have is that the size of the data slice can be around a million, so I don't want to spawn a million goroutines to do the job. How do I handle that scenario here? If I use 100 instead of len(data), will that work for me, or do I need to slice data into 100 sub-arrays as well? I am just confused about the best way to deal with this problem.
I recently started with golang so pardon me if there are any mistakes in my above code while making it concurrent.
Please check this pseudo code
func main() {
    works := make(chan int64, 100)
    errChan := make(chan error, 100)
    out := make(chan *custPbV1.CustomerResponse, 100)

    // spawn a fixed number of workers
    var workerWg sync.WaitGroup
    for i := 0; i < 100; i++ {
        workerWg.Add(1)
        go worker(&workerWg, works, errChan, out)
    }

    // feed the input
    go func() {
        for _, cid := range data {
            // this blocks if all the workers are busy and there is no space left in the channel.
            works <- cid
        }
        close(works)
    }()

    var analyzeResults sync.WaitGroup
    analyzeResults.Add(2)

    // process errors
    go func() {
        for err := range errChan {
            log.Printf("error %v", err)
        }
        analyzeResults.Done()
    }()

    // process output
    go func() {
        for val := range out {
            if err := stream.Send(val); err != nil {
                log.Printf("send error %v", err)
            }
        }
        analyzeResults.Done()
    }()

    workerWg.Wait()
    close(out)
    close(errChan)
    analyzeResults.Wait()
}

func worker(job *sync.WaitGroup, works chan int64, errChan chan error, out chan *custPbV1.CustomerResponse) {
    defer job.Done()
    // An idle worker takes the next job from this channel.
    for cid := range works {
        pd := repo.GetCustomerData(strconv.FormatInt(cid, 10))
        if !pd.IsCorrect {
            errChan <- fmt.Errorf("pd for cid %d is incorrect", cid)
            // We must not return here: that would shrink the worker pool, and if every
            // worker did this there might be no workers left to do the remaining jobs.
            continue
        }
        resources := us.helperCom.GenerateResourceString(pd)
        val, err := us.GenerateInfo(clientId, resources, cfg)
        if err != nil {
            errChan <- fmt.Errorf("got error: %v", err)
            continue
        }
        out <- val
    }
}
Explanation:
This is a worker pool implementation, where we spawn a fixed number of goroutines (100 workers here) to do the same job (GetCustomerData() and GenerateInfo() here) but with different input data (cid here). 100 workers does not mean the work is parallel, only concurrent (that depends on GOMAXPROCS). If one worker is waiting for an I/O result (basically some blocking operation), that goroutine is context-switched out and another worker goroutine gets a chance to execute. Increasing the number of goroutines (workers) may not give much more performance, though, and can lead to contention on the channel, as more workers end up waiting for input jobs on it.
The benefit over splitting the 1 million items into subslices is this: say we have 1000 jobs and 100 workers, and each worker is assigned jobs 1-10, 11-20, and so on. What if the first 10 jobs take more time than the others? Then the first worker is overloaded while the other workers finish their tasks and sit idle even though there are pending tasks. The pool avoids this situation, because an idle worker simply takes the next job, so no worker is more overloaded than the others.
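If you would rather not wire the channels yourself, the same bounded concurrency can also be sketched with golang.org/x/sync/errgroup and its SetLimit method. This is only an alternative illustration under the same assumptions as the pseudo-code above (data, repo, us, stream, clientId and cfg come from your code); the worker pool above is equally valid.
// uses golang.org/x/sync/errgroup
g := new(errgroup.Group)
g.SetLimit(100) // at most 100 goroutines run GetCustomerData/GenerateInfo at once

out := make(chan *custPbV1.CustomerResponse, 100)

// Sender: the only goroutine that touches the stream.
done := make(chan struct{})
go func() {
    defer close(done)
    for val := range out {
        if err := stream.Send(val); err != nil {
            log.Printf("send error %v", err)
        }
    }
}()

for _, cid := range data {
    cid := cid // capture the loop variable (needed before Go 1.22)
    g.Go(func() error {
        pd := repo.GetCustomerData(strconv.FormatInt(cid, 10))
        if !pd.IsCorrect {
            return nil // or report via a channel, as in the pseudo-code
        }
        resources := us.helperCom.GenerateResourceString(pd)
        val, err := us.GenerateInfo(clientId, resources, cfg)
        if err != nil {
            return nil
        }
        out <- val
        return nil
    })
}

_ = g.Wait()
close(out)
<-done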

While downloading a file from Azure Blob Storage using Golang, curl returns "Empty reply from server", but the file is downloaded in the background

I am trying to download a file from Azure Blob Storage using an HTTP request. I am able to download the file, but in the terminal curl returns "Empty reply from server". I tried to increase the timeout, but it didn't fix it. I referred to other questions related to this response from curl, but they didn't help. For small files this code works flawlessly, but for big files, say 75 MB, it does not.
containerURL := azblob.NewContainerURL(*URL, pipeline)
blobURL := containerURL.NewBlockBlobURL(splitArray[1])
ctx := context.Background()

downloadResponse, err := blobURL.Download(ctx, 0, azblob.CountToEnd, azblob.BlobAccessConditions{}, false)
if err != nil {
    .
    .
    .
}

bodyStream := downloadResponse.Body(azblob.RetryReaderOptions{MaxRetryRequests: 20})

// read the body into a buffer
downloadedData := bytes.Buffer{}
_, err = downloadedData.ReadFrom(bodyStream)

file, err := os.OpenFile(
    "/tmp/"+fileName,
    os.O_RDWR|os.O_TRUNC|os.O_CREATE,
    0777,
)
file.Write(downloadedData.Bytes())
file.Close()

filePath := "/tmp/" + fileName
file, err = os.Open(filePath)

return middleware.ResponderFunc(func(w http.ResponseWriter, r runtime.Producer) {
    fn := filepath.Base(filePath)
    w.Header().Set(CONTENTTYPE, "application/octet-stream")
    w.Header().Set("Content-Disposition", fmt.Sprintf("attachment; filename=%q", fn))
    io.Copy(w, file)
    file.Close()
    os.Remove(filePath)
})
I am thinking of implementing the above logic using goroutines. Is there even a need to use goroutines?
Any constructive feedback would be helpful.
After analyzing the packets in Wireshark, I found the connection was being closed from my side due to a timeout, since I am using go-swagger. I increased the timeout in configure.go. GoSwagger provides built-in hooks for handling these scenarios, such as TLS and timeouts. Below is the code for reference.
// As soon as server is initialized but not run yet, this function will be called.
// If you need to modify a config, store server instance to stop it individually later, this is the place.
// This function can be called multiple times, depending on the number of serving schemes.
// scheme value will be set accordingly: "http", "https" or "unix"
func configureServer(s *http.Server, scheme, addr string) {
    s.WriteTimeout = time.Minute * 5
}
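Unrelated to the timeout fix, one piece of constructive feedback on the handler itself: the blob body can be streamed straight to the response writer instead of being buffered in memory and staged in /tmp. A rough sketch reusing downloadResponse, fileName and CONTENTTYPE from the question (error handling elided):
return middleware.ResponderFunc(func(w http.ResponseWriter, r runtime.Producer) {
    w.Header().Set(CONTENTTYPE, "application/octet-stream")
    w.Header().Set("Content-Disposition", fmt.Sprintf("attachment; filename=%q", fileName))

    // Stream the blob body directly to the client instead of staging it in /tmp.
    bodyStream := downloadResponse.Body(azblob.RetryReaderOptions{MaxRetryRequests: 20})
    defer bodyStream.Close()
    if _, err := io.Copy(w, bodyStream); err != nil {
        log.Printf("copy error: %v", err)
    }
})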

Azure blob first write

This official example for writing blob blocks has a step where it checks which blocks have not been committed:
fmt.Println("Get uncommitted blocks list...")
list, err := b.GetBlockList(storage.BlockListTypeUncommitted, nil)
if err != nil {
return fmt.Errorf("get block list failed: %v", err)
}
uncommittedBlocksList := make([]storage.Block, len(list.UncommittedBlocks))
for i := range list.UncommittedBlocks {
uncommittedBlocksList[i].ID = list.UncommittedBlocks[i].Name
uncommittedBlocksList[i].Status = storage.BlockStatusUncommitted
}
If I'm creating a blob (with multiple blocks) that definitely doesn't yet exist, is there any problem with skipping that code?
The code would be something like:
b := cnt.GetBlobReference(blockBlobName)
err := b.CreateBlockBlob(nil)

blockID := "00000"
data := randomData(1984)
err = b.PutBlock(blockID, data, nil)

blockID2 := "00001"
data2 := randomData(6542)
err = b.PutBlock(blockID2, data2, nil)

var uncommittedBlocksList []storage.Block
uncommittedBlocksList = append(uncommittedBlocksList,
    storage.Block{
        ID:     "00000",
        Status: storage.BlockStatusUncommitted,
    },
    storage.Block{
        ID:     "00001",
        Status: storage.BlockStatusUncommitted,
    },
)
err = b.PutBlockList(uncommittedBlocksList, nil)
If I'm creating a blob (with multiple blocks) that definitely doesn't yet exist, is there any problem with skipping that code?
Absolutely not. You can certainly skip the code that fetches the uncommitted block list. Fetching the uncommitted list is useful when a previous upload failed partway through and you want to resume it from the last failed block. By skipping this code, you are essentially telling Azure Storage to discard any other uncommitted blocks and use only the blocks specified in your block list to create the blob.
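For completeness, a rough sketch of that resume scenario, reusing the GetBlockList call from the official example; allBlockIDs and blockData are hypothetical placeholders for the complete, ordered list of block IDs and their bytes:
// Find out which blocks of a previous, interrupted upload already reached the service.
list, err := b.GetBlockList(storage.BlockListTypeUncommitted, nil)
if err != nil {
    return fmt.Errorf("get block list failed: %v", err)
}
already := make(map[string]bool, len(list.UncommittedBlocks))
for _, blk := range list.UncommittedBlocks {
    already[blk.Name] = true
}

// Re-upload only the missing blocks, then commit the full, ordered list.
var blocks []storage.Block
for _, id := range allBlockIDs {
    if !already[id] {
        if err := b.PutBlock(id, blockData[id], nil); err != nil {
            return err
        }
    }
    blocks = append(blocks, storage.Block{ID: id, Status: storage.BlockStatusUncommitted})
}
return b.PutBlockList(blocks, nil)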

Read media keys from Go program

I am writing a cross-platform distributed media player for use on my own network.
The current version has three/four parts:
A NAS holding the audio files.
A metadata server holding information about the files.
An HTML/JS client that allows manipulation of the metadata server and queues media for the:
A player daemon.
My problem lies with part 4. The player has no UI, nor does it need one. It will be controlled via network commands from the client and by listening to the media keys on its current host.
The player daemon needs to work on both Windows and Linux, but I can't seem to figure out a way (any way) to read these keys on either OS. Most of the ways I know to read the keyboard will not read these keys at all.
With the help of several commenters, I now have it all figured out.
The Linux version is as follows:
package main

import (
    "bytes"
    "encoding/binary"
    "fmt"
    "os"
    "os/exec"
    "syscall"
)

// parses through the /proc/bus/input/devices file for keyboard devices.
// Copied from `github.com/gearmover/keylogger` with trivial modification.
func dumpDevices() ([]string, error) {
    cmd := exec.Command("/bin/sh", "-c", "/bin/grep -E 'Handlers|EV=' /proc/bus/input/devices | /bin/grep -B1 'EV=120013' | /bin/grep -Eo 'event[0-9]+'")
    output, err := cmd.Output()
    if err != nil {
        return nil, err
    }
    buf := bytes.NewBuffer(output)
    var devices []string
    for line, err := buf.ReadString('\n'); err == nil; {
        devices = append(devices, "/dev/input/"+line[:len(line)-1])
        line, err = buf.ReadString('\n')
    }
    return devices, nil
}

// Using MS names, just because I don't feel like looking up the Linux versions.
var keys = map[uint16]string{
    0xa3: "VK_MEDIA_NEXT_TRACK",
    0xa5: "VK_MEDIA_PREV_TRACK",
    0xa6: "VK_MEDIA_STOP",
    0xa4: "VK_MEDIA_PLAY_PAUSE",
}

// Most of the code here comes from `github.com/gearmover/keylogger`.
func main() {
    // drop privileges when executing other programs
    syscall.Setgid(65534)
    syscall.Setuid(65534)

    // dump our keyboard devices from /proc/bus/input/devices
    devices, err := dumpDevices()
    if err != nil {
        fmt.Println(err)
    }
    if len(devices) == 0 {
        fmt.Println("No input devices found")
        return
    }

    // bring back our root privs
    syscall.Setgid(0)
    syscall.Setuid(0)

    // Open the first keyboard device.
    input, err := os.OpenFile(devices[0], os.O_RDONLY, 0600)
    if err != nil {
        fmt.Println(err)
        return
    }
    defer input.Close()

    // Log media keys
    var buffer = make([]byte, 24)
    for {
        // read the input events as they come in
        n, err := input.Read(buffer)
        if err != nil {
            return
        }
        if n != 24 {
            fmt.Println("Weird Input Event Size: ", n)
            continue
        }

        // parse the input event according to the <linux/input.h> header struct
        binary.LittleEndian.Uint64(buffer[0:8]) // Time stamp stuff I could care less about
        binary.LittleEndian.Uint64(buffer[8:16])
        etype := binary.LittleEndian.Uint16(buffer[16:18])        // Event Type. Always 1 for keyboard events
        code := binary.LittleEndian.Uint16(buffer[18:20])         // Key scan code
        value := int32(binary.LittleEndian.Uint32(buffer[20:24])) // press(1), release(0), or repeat(2)

        if etype == 1 && value == 1 && keys[code] != "" {
            // In a real application I would send a message here.
            fmt.Println(keys[code])
        }
    }
}
And the Windows version:
package main

import (
    "fmt"
    "syscall"
    "time"
)

var user32 = syscall.NewLazyDLL("user32.dll")
var procGAKS = user32.NewProc("GetAsyncKeyState")

// Key codes from MSDN
var keys = [4]uint{
    0xb0, // VK_MEDIA_NEXT_TRACK
    0xb1, // VK_MEDIA_PREV_TRACK
    0xb2, // VK_MEDIA_STOP
    0xb3, // VK_MEDIA_PLAY_PAUSE
}

var names = [4]string{
    "VK_MEDIA_NEXT_TRACK",
    "VK_MEDIA_PREV_TRACK",
    "VK_MEDIA_STOP",
    "VK_MEDIA_PLAY_PAUSE",
}

func main() {
    fmt.Println("Running...")

    // Since I don't want to trigger dozens of times for each key I need to track state.
    // I could check the bits of GAKS' return value, but that is not reliable.
    down := [4]bool{false, false, false, false}
    for {
        time.Sleep(1 * time.Millisecond)
        for i, key := range keys {
            // val is not a simple boolean!
            // 0 means "not pressed" (also certain errors)
            // If LSB is set the key was just pressed (this may not be reliable)
            // If MSB is set the key is currently down.
            val, _, _ := procGAKS.Call(uintptr(key))

            // Turn a press into a transition and track key state.
            goingdown := false
            if int(val) != 0 && !down[i] {
                goingdown = true
                down[i] = true
            }
            if int(val) == 0 && down[i] {
                down[i] = false
            }
            if goingdown {
                // In a real application I would send a message here.
                fmt.Println(names[i])
            }
        }
    }
}
The only "issue" is that the Linux version must be run as root. For me this is not a problem. If running as root is a problem I think there is a way that involves X11...

Golang processing images via multipart and streaming to Azure

In the process of learning golang, I'm trying to write a web app with multiple image upload functionality.
I'm using Azure Blob Storage to store images, but I am having trouble streaming the images from the multipart request to Blob Storage.
Here's the handler I've written so far:
func (imgc *ImageController) UploadInstanceImageHandler(w http.ResponseWriter, r *http.Request, p httprouter.Params) {
    reader, err := r.MultipartReader()
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    for {
        part, partErr := reader.NextPart()

        // No more parts to process
        if partErr == io.EOF {
            break
        }

        // if part.FileName() is empty, skip this iteration.
        if part.FileName() == "" {
            continue
        }

        // Check file type
        if part.Header["Content-Type"][0] != "image/jpeg" {
            fmt.Printf("\nNot image/jpeg!")
            break
        }

        var read uint64
        fileName := uuid.NewV4().String() + ".jpg"
        buffer := make([]byte, 100000000)

        // Get Size
        for {
            cBytes, err := part.Read(buffer)
            if err == io.EOF {
                fmt.Printf("\nLast buffer read!")
                break
            }
            read = read + uint64(cBytes)
        }

        stream := bytes.NewReader(buffer[0:read])
        err = imgc.blobClient.CreateBlockBlobFromReader(imgc.imageContainer, fileName, read, stream, nil)
        if err != nil {
            fmt.Println(err)
            break
        }
    }
    w.WriteHeader(http.StatusOK)
}
In the process of my research, I've read about using r.FormFile and ParseMultipartForm, but decided on trying to learn how to use MultipartReader.
I was able to upload an image to the golang backend and save the file to my machine using MultipartReader.
At the moment, I'm able to upload files to Azure, but they end up being corrupted. The file sizes seem on point, but clearly something is not working.
Am I misunderstanding how to create an io.Reader for CreateBlockBlobFromReader?
Any help is much appreciated!
As @Mark said, you can use ioutil.ReadAll to read the content into a byte array; the code would look like the below.
import (
    "bytes"
    "io/ioutil"
)

partBytes, _ := ioutil.ReadAll(part)
size := uint64(len(partBytes))
blob := bytes.NewReader(partBytes)
err := blobClient.CreateBlockBlobFromReader(container, fileName, size, blob, nil)
According to the godoc for CreateBlockBlobFromReader:
The API rejects requests with size > 64 MiB (but this limit is not checked by the SDK). To write a larger blob, use CreateBlockBlob, PutBlock, and PutBlockList.
So if the size is larger than 64 MiB, the code should be like below.
import "encoding/base64"
const BLOB_LENGTH_LIMITS uint64 = 64 * 1024 * 1024
partBytes, _ := ioutil.ReadAll(part)
size := uint64(len(partBytes))
if size <= BLOB_LENGTH_LIMITS {
blob := bytes.NewReader(partBytes)
err := blobClient.CreateBlockBlobFromReader(container, fileName, size, blob, nil)
} else {
// Create an empty blob
blobClient.CreateBlockBlob(container, fileName)
// Create a block list, and upload each block
length := size / BLOB_LENGTH_LIMITS
if length%limits != 0 {
length = length + 1
}
blocks := make([]Block, length)
for i := uint64(0); i < length; i++ {
start := i * BLOB_LENGTH_LIMITS
end := (i+1) * BLOB_LENGTH_LIMITS
if end > size {
end = size
}
chunk := partBytes[start: end]
blockId := base64.StdEncoding.EncodeToString(chunk)
block := Block{blockId, storage.BlockStatusCommitted}
blocks[i] = block
err = blobClient.PutBlock(container, fileName, blockID, chunk)
if err != nil {
.......
}
}
err = blobClient.PutBlockList(container, fileName, blocks)
if err != nil {
.......
}
}
Hope it helps.
A Reader can return both an io.EOF and a valid final byte count, and it looks like those final bytes (cBytes) are not added to the read total. Also be careful: if part.Read(buffer) returns an error other than io.EOF, the read loop might not exit. Consider ioutil.ReadAll instead.
CreateBlockBlobFromReader takes a Reader, and part is a Reader, so you may be able to pass the part in directly.
You may also want to consider that Azure's block size limits might be smaller than the image; see Azure blobs.
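To make the first point concrete, here is a rough sketch of the read loop from the question with the EOF handling fixed (same part and buffer as in the question; ioutil.ReadAll remains the simpler option):
var read uint64
for {
    cBytes, err := part.Read(buffer[read:]) // keep appending after what was already read
    read += uint64(cBytes)                  // count the final bytes returned alongside io.EOF too
    if err == io.EOF {
        break
    }
    if err != nil {
        // a non-EOF error would otherwise loop forever
        fmt.Println(err)
        break
    }
}
stream := bytes.NewReader(buffer[0:read])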
