First requests not sending at the start of the goroutine func - multithreading

I am running goroutines in my code. Say I set my thread count to 50: it will not run the first 49 requests, but it will run the 50th request and continue with the rest. I am not really sure how to describe the issue, and it gives no errors. This has only happened while using fasthttp; it works fine with net/http. Could it be an issue with fasthttp? (This is not my whole code, just the area where I think the issue is occurring.)
threads := 50
var Lock sync.Mutex
semaphore := make(chan bool, threads)
for len(userArray) != 0 {
	semaphore <- true
	go func() {
		Lock.Lock()
		var values []byte
		defer func() { <-semaphore }()
		fmt.Println(len(userArray))
		if len(userArray) == 0 {
			return
		}
		values, _ = json.Marshal(userArray[0])
		currentArray := userArray[0]
		userArray = userArray[1:]
		client := &fasthttp.Client{
			Dial: fasthttpproxy.FasthttpHTTPDialerTimeout(proxy, time.Second*5),
		}
		time.Sleep(1 * time.Nanosecond)
		Lock.Unlock()
This is the output I get (the numbers are the number of requests left):
200
199
198
197
196
195
194
193
192
191
190
189
188
187
186
185
184
183
182
181
180
179
178
177
176
175
174
173
172
171
170
169
168
167
166
165
164
163
162
161
160
159
158
157
156
155
154
153
152
151
(10 lines of output from req 151)
150
(10 lines of output from req 150)
cont.
Sorry if my explanation is confusing; I honestly don't know how to explain this error.

I think the problem is with the scoping of the variables. To represent the queueing, I'd have a pool of parallel worker goroutines that all pull from the same channel, and then wait for them using a WaitGroup.
The exact code might need to be adapted since I don't have a Go compiler at hand, but the idea is like this:
threads := 50
queueSize := 100 // trying to add more into the queue will block
jobQueue := make(chan MyItemType, queueSize)
var wg sync.WaitGroup

func processQueue(jobQueue <-chan MyItemType) {
	defer wg.Done()
	for item := range jobQueue {
		values, _ := json.Marshal(item) // doesn't seem to be used?
		_ = values
		client := &fasthttp.Client{
			Dial: fasthttpproxy.FasthttpHTTPDialerTimeout(proxy, time.Second*5),
		}
		_ = client
	}
}

for i := 0; i < threads; i++ {
	wg.Add(1)
	go processQueue(jobQueue)
}

// ... send the items into jobQueue here ...

close(jobQueue)
wg.Wait()
Now you can put items into jobQueue (before closing it) and they will be processed by one of these workers.

Related

Can't explain this Node clustering behavior

I'm learning about threads and how they interact with Node's native cluster module. I saw some behavior I can't explain that I'd like some help understanding.
My code:
process.env.UV_THREADPOOL_SIZE = 1;
const cluster = require('cluster');

if (cluster.isMaster) {
  cluster.fork();
} else {
  const crypto = require('crypto');
  const express = require('express');
  const app = express();

  app.get('/', (req, res) => {
    crypto.pbkdf2('a', 'b', 100000, 512, 'sha512', () => {
      res.send('Hi there');
    });
  });

  app.listen(3000);
}
I benchmarked this code with one request using Apache Bench.
ab -c 1 -n 1 localhost:3000/ yielded these connection times:
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0    0.0      0      0
Processing:   605  605    0.0    605    605
Waiting:      605  605    0.0    605    605
Total:        605  605    0.0    605    605
So far so good. I then ran ab -c 2 -n 2 localhost:3000/ (doubling the number of calls from the baseline). I expected the total time to double, since I limited the libuv thread pool to one thread per child process and I only started one child process. But nothing really changed. Here are those results:
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0    0.1      0      0
Processing:   608  610    3.2    612    612
Waiting:      607  610    3.2    612    612
Total:        608  610    3.3    612    612
For extra info: when I further increase the number of calls with ab -c 3 -n 3 localhost:3000/, I start to see a slowdown.
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0    0.0      0      0
Processing:   599  814  352.5    922   1221
Waiting:      599  814  352.5    922   1221
Total:        599  815  352.5    922   1221
I'm running all this on a quad-core Mac using Node v14.13.1.
tl;dr: How did my benchmark not use up all my threads? I forked one child process with one thread in its libuv pool, so the single call in my baseline should have been all it could handle without taking longer. And yet the second test (the one that doubled the number of calls) took the same amount of time as the baseline.

What is the idiomatic way to include data in error messages?

I’ve encountered different ways to incorporate variables into error messages in Go. In the following example, which way is the idiomatic one? Is there a better alternative?
Which is safer when things start to break? For example, when very little memory is left, the option that allocates fewer bytes would be preferable.
Which is faster, in case we need to generate a lot of errors?
The full runnable code can be seen in the Go Play Space or in the official Go Playground.
func f() error {
	return SepError("Sepuled " + strconv.Itoa(n) + " sepulcas " + strconv.Itoa(t) +
		" times each")
}

func g() error {
	return SepError(strings.Join([]string{
		"Sepuled", strconv.Itoa(n), "sepulcas", strconv.Itoa(t), "times each"}, " "))
}

func h() error {
	return SepError(fmt.Sprintf("Sepuled %d sepulcas %d times each", n, t))
}
Unless you have very little memory, or are going to be generating a HUGE number of these errors, I wouldn't worry about it. As far as idiomatic Go goes, I would opt for the h() option because it is the easiest to read.
The nice thing here is that allocations, memory used, and speed can all be measured with some simple benchmarks:
func BenchmarkF(b *testing.B) {
	for i := 0; i < b.N; i++ {
		f()
	}
}

func BenchmarkG(b *testing.B) {
	for i := 0; i < b.N; i++ {
		g()
	}
}

func BenchmarkH(b *testing.B) {
	for i := 0; i < b.N; i++ {
		h()
	}
}
Output of `go test -bench . -benchmem`:
BenchmarkF-8   10000000   169 ns/op    72 B/op   4 allocs/op
BenchmarkG-8   10000000   204 ns/op   120 B/op   5 allocs/op
BenchmarkH-8    5000000   237 ns/op    80 B/op   4 allocs/op
As you can see, f() is the fastest, uses the least memory, and is tied for the fewest allocations. Even so, in my opinion that edge is not worth the cost in readability.

Permission denied while accessing /proc/<pid>/exe

I am having trouble accessing a file in the /proc filesystem.
Once started, my process writes to a log file. The process stopped, and when I checked the log file to see where it had encountered the problem, I found "permission denied".
The process goes to the /proc directory, fetches the PID via getPID(), and calls open() with O_RDONLY to read /proc/<pid>/exe.
But the open() fails with the error "Permission denied".
I did some research and found that the kernel enforces some restrictions when accessing certain files in /proc, but I have 20 processes all accessing the same /proc/<pid>/exe, and only one is facing this problem.
CHAR fn[100];
CHAR args[500];
CHAR ProgName[50];
CHAR *arr[6];
CHAR *buf;
CHAR ProcessId[10];
static int count_try = 0;

memset(fn, 0, 100);
memset(ProcessId, 0, 10);
sprintf(ProcessId, "%d", Pid);
strcpy(fn, "/proc/");
strcat(fn, ProcessId);
//strcat(fn, "/elf_prpsinfo");
strcat(fn, "/exe");

if ((psp = open(fn, O_RDONLY)) == -1)
{
    perror("GetProgName:ps open::");
    exit(ERROR);
}

Can a thread that is spinning trying to get a spinlock be preempted?

When a thread on Linux is spinning trying to get a spinlock, is there any chance that thread can be preempted?
EDIT:
I just want to make sure of something. Consider a UP system where no interrupt handler accesses this spinlock. If the thread spinning on the spinlock could be preempted, then I think the critical section the spinlock protects could safely sleep, since the thread holding the spinlock could later be scheduled back onto the CPU.
No, it cannot be preempted: see the code (taken from the Linux sources) http://lxr.free-electrons.com/source/include/linux/spinlock_api_smp.h?v=2.6.32#L241
241 static inline unsigned long __spin_lock_irqsave(spinlock_t *lock)
242 {
243 unsigned long flags;
244
245 local_irq_save(flags);
246 preempt_disable();
247 spin_acquire(&lock->dep_map, 0, 0, _RET_IP_);
248 /*
249 * On lockdep we dont want the hand-coded irq-enable of
250 * _raw_spin_lock_flags() code, because lockdep assumes
251 * that interrupts are not re-enabled during lock-acquire:
252 */
253 #ifdef CONFIG_LOCKDEP
254 LOCK_CONTENDED(lock, _raw_spin_trylock, _raw_spin_lock);
255 #else
256 _raw_spin_lock_flags(lock, &flags);
257 #endif
258 return flags;
259 }
260
[...]
349 static inline void __spin_unlock(spinlock_t *lock)
350 {
351 spin_release(&lock->dep_map, 1, _RET_IP_);
352 _raw_spin_unlock(lock);
353 preempt_enable();
354 }
see lines 246 and 353
By the way, it is generally a bad idea to sleep while holding a lock (spinlock or not).

Why doesn't the page fault handler fill the PTE entry for vmalloc on the ARM Linux kernel?

When a page fault happens in the range VMALLOC_START~VMALLOC_END, why does do_translation_fault not fill the page table entry (PTE), filling only the PGD, PUD and PMD?
Corresponding source code #do_translation_fault in arch/arm/mm/fault.c:
static int __kprobes
do_translation_fault(unsigned long addr, unsigned int fsr,
		     struct pt_regs *regs)
{
	unsigned int index;
	pgd_t *pgd, *pgd_k;
	pud_t *pud, *pud_k;
	pmd_t *pmd, *pmd_k;

	if (addr < TASK_SIZE)
		return do_page_fault(addr, fsr, regs);

	if (user_mode(regs))
		goto bad_area;

	index = pgd_index(addr);

	/*
	 * FIXME: CP15 C1 is write only on ARMv3 architectures.
	 */
	pgd = cpu_get_pgd() + index;
	pgd_k = init_mm.pgd + index;

	if (pgd_none(*pgd_k))
		goto bad_area;
	if (!pgd_present(*pgd))
		set_pgd(pgd, *pgd_k);

	pud = pud_offset(pgd, addr);
	pud_k = pud_offset(pgd_k, addr);

	if (pud_none(*pud_k))
		goto bad_area;
	if (!pud_present(*pud))
		set_pud(pud, *pud_k);

	pmd = pmd_offset(pud, addr);
	pmd_k = pmd_offset(pud_k, addr);

#ifdef CONFIG_ARM_LPAE
	/*
	 * Only one hardware entry per PMD with LPAE.
	 */
	index = 0;
#else
	/*
	 * On ARM one Linux PGD entry contains two hardware entries (see page
	 * tables layout in pgtable.h). We normally guarantee that we always
	 * fill both L1 entries. But create_mapping() doesn't follow the rule.
	 * It can create individual L1 entries, so here we have to call
	 * pmd_none() check for the entry really corresponded to address, not
	 * for the first of pair.
	 */
	index = (addr >> SECTION_SHIFT) & 1;
#endif
	if (pmd_none(pmd_k[index]))
		goto bad_area;

	copy_pmd(pmd, pmd_k);
	return 0;

bad_area:
	do_bad_area(addr, fsr, regs);
	return 0;
}
This range is reserved for kernel memory allocated with vmalloc.
Kernel memory is often accessed while IRQs (soft or hard) are disabled, and in that context page faults cannot be handled.
The vmalloc function therefore creates the mapping in advance, so there will be no fault on access.
If a fault does occur in that range, the access was to memory that was never allocated (or was already freed), so it cannot be handled.
