Multithreaded program segfaults with OpenSSL and OpenMP

I am using OpenSSL in a multithreaded C program and running into problems, so I wrote a small program to narrow down the cause. All functions other than main were copied from https://github.com/plenluno/openssl/blob/master/openssl/crypto/threads/mttest.c
My program is as follows.
#include<stdio.h>
#include<stdlib.h>
#include<stdarg.h>
#include <strings.h>
#include <string.h>
#include <math.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include<omp.h>
#include <openssl/bn.h>
#include <openssl/dh.h>
#include <openssl/crypto.h>
#include <pthread.h>
#include <openssl/lhash.h>
#include <openssl/buffer.h>
#include <openssl/x509.h>
#include <openssl/ssl.h>
#include <openssl/err.h>
static pthread_mutex_t *lock_cs;
static long *lock_count;
void pthreads_locking_callback(int mode, int type, char *file,
int line)
{
#if 0
fprintf(stderr,"thread=%4d mode=%s lock=%s %s:%d\n",
CRYPTO_thread_id(),
(mode&CRYPTO_LOCK)?"l":"u",
(type&CRYPTO_READ)?"r":"w",file,line);
#endif
#if 0
if (CRYPTO_LOCK_SSL_CERT == type)
fprintf(stderr,"(t,m,f,l) %ld %d %s %d\n",
CRYPTO_thread_id(),
mode,file,line);
#endif
if (mode & CRYPTO_LOCK)
{
pthread_mutex_lock(&(lock_cs[type]));
lock_count[type]++;
}
else
{
pthread_mutex_unlock(&(lock_cs[type]));
}
}
unsigned long pthreads_thread_id(void)
{
unsigned long ret;
ret=(unsigned long)pthread_self();
return(ret);
}
void CRYPTO_thread_setup(void)
{
int i;
lock_cs=OPENSSL_malloc(CRYPTO_num_locks() *
sizeof(pthread_mutex_t));
lock_count=OPENSSL_malloc(CRYPTO_num_locks() * sizeof(long));
for (i=0; i<CRYPTO_num_locks(); i++)
{
lock_count[i]=0;
pthread_mutex_init(&(lock_cs[i]),NULL);
}
CRYPTO_set_id_callback((unsigned long (*)())pthreads_thread_id);
CRYPTO_set_locking_callback((void (*)())pthreads_locking_callback);
}
void thread_cleanup(void)
{
int i;
CRYPTO_set_locking_callback(NULL);
for (i=0; i<CRYPTO_num_locks(); i++)
{
pthread_mutex_destroy(&(lock_cs[i]));
}
OPENSSL_free(lock_cs);
OPENSSL_free(lock_count);
}
int main(){
BN_CTX *ctx;
ctx = BN_CTX_new();
omp_set_num_threads(158);
#pragma omp parallel
{
int ID = omp_get_thread_num();
BIGNUM *b,*e,*r,*m;
b = BN_new();
e = BN_new();
r = BN_new();
m = BN_new();
BN_set_word(b, 9);
BN_set_word(e, 3);
BN_set_word(m, 5);
BN_mod_exp(r,b,e,m,ctx);
char* result = BN_bn2dec(r);
printf("\n thread = %d result = %s", ID, result); fflush(stdout);
}
thread_cleanup();
}
I get the following error and backtrace, which tells me that BN_mod_exp(r,b,e,m,ctx) is the problem.
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffa9f69700 (LWP 151994)]
0x00007ffff7a97bb6 in BN_CTX_end () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
(gdb) bt
#0 0x00007ffff7a97bb6 in BN_CTX_end () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#1 0x00007ffff7a940cd in BN_div () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#2 0x00007ffff7aa3dff in BN_MONT_CTX_set () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#3 0x00007ffff7a963e5 in BN_mod_exp_mont_word () from /lib/x86_64-linux-gnu/libcrypto.so.1.0.0
#4 0x0000000000400fef in main._omp_fn.0 () at debuggingsession.c:106
#5 0x00007ffff77f734a in ?? () from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#6 0x00007ffff75d9184 in start_thread (arg=0x7fffa9f69700) at pthread_create.c:312
#7 0x00007ffff730603d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) frame 4
#4 0x0000000000400fef in main._omp_fn.0 () at debuggingsession.c:106
106 BN_mod_exp(r,b,e,m,ctx);
(gdb) print r
$1 = (BIGNUM *) 0x7fff8c000900
(gdb) x 0x7fff8c000900
0x7fff8c000900: 0x00000000
(gdb) print b
$2 = (BIGNUM *) 0x7fff8c0008c0
(gdb) x 0x7fff8c0008c0
0x7fff8c0008c0: 0x8c000940
(gdb) x 0x8c000940
0x8c000940: Cannot access memory at address 0x8c000940
(gdb) print b
$3 = (BIGNUM *) 0x7fff8c0008c0
(gdb) print e
$4 = (BIGNUM *) 0x7fff8c0008e0
(gdb) print m
$5 = (BIGNUM *) 0x7fff8c000920
(gdb) x
Update: I am using OpenSSL and OpenMP in a larger program; the code above was just for debugging. In the larger program, I set up multiple threads, and each one is supposed to write to its own file (not a shared file). According to https://en.wikibooks.org/wiki/OpenSSL/Initialization, the thread callbacks have to be set up before the initialization functions are called. I assume this means that CRYPTO_thread_setup() is called first, followed by the OpenSSL library initialization functions. Where does CRYPTO_thread_setup() have to be called from? Should it go immediately after the first #pragma omp parallel and before the opening brace? When I put it there, along with the library initialization functions, I still get segmentation faults, this time related to calling fclose() within the threads. Any ideas on why this could be happening?

Related

I'm trying to create a string with n characters by allocating memory with malloc, but I have a problem

#define _CRT_SECURE_NO_WARNINGS
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
int n;
printf("Length? ");
scanf("%d", &n);
getchar();
char* str = (char*)malloc(sizeof(char) * (n+1));
fgets(str,sizeof(str),stdin);
for (int i = 0; i < n; i++)
printf("%c\n", str[i]);
free(str);
}
The program produces this output:
Length? 5
abcde
a
b
c
?
(I wanted to upload the result image, but I got rejected since I didn't have 10 reputations)
I can't figure out why 'd' and 'e' won't be showing in the results.
What is the problem with my code??
(Welcome to Stack Overflow :) (update #1)
str is a pointer to char, not a character array, so sizeof(str) is the size of the pointer itself (typically 8 bytes on 64-bit and 4 bytes on 32-bit machines), no matter how much space you have allocated.
Demo (compilation succeeds only if X in static_assert(X) holds):
#include <assert.h>
#include <stdlib.h>
int main(void){
// Pointer to char
char *str=(char*)malloc(1024);
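#if defined _WIN64 || defined __x86_64__ || defined __LP64__
static_assert(sizeof(str)==8, "pointer is 8 bytes on a 64-bit target");
#else
static_assert(sizeof(str)==4, "pointer is 4 bytes on a 32-bit target");
#endif
free(str);
// Character array
char arr[1024];
static_assert(sizeof(arr)==1024, "array size is the full 1024 bytes");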
return 0;
}
fgets(char *str, int num, FILE *stream) reads at most (num-1) characters, stopping early at a newline or end of file, and then appends a terminating null byte.
So instead of fgets(str,sizeof(str),stdin), use fgets(str,n+1,stdin).
Fixed version:
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
int main(void){
int n=0;
printf("Length? ");
scanf("%d",&n);
getchar();
char *str=(char*)calloc((n+1),sizeof(char));
static_assert(
sizeof(str)==sizeof(char*) && (
sizeof(str)==4 || // 32-bit machine
sizeof(str)==8 // 64-bit machine
),
"str is a pointer, so sizeof(str) is the pointer size"
);
fgets(str,n+1,stdin);
for(int i=0;i<n;++i)
printf("%c\n",str[i]);
free(str);
str=NULL;
}
Length? 5
abcde
a
b
c
d
e
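One way to avoid the sizeof(pointer) trap altogether is to keep the capacity next to the allocation and pass it explicitly. The following is only a minimal sketch; read_line is a hypothetical helper name that does not appear in the question, and it assumes the caller wants at most n characters with any trailing newline removed.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read at most n characters of one line from `in`
 * into a freshly allocated buffer. The capacity is computed from n,
 * because sizeof on the pointer would only give the pointer size. */
char *read_line(FILE *in, size_t n)
{
    size_t cap = n + 1;               /* n characters plus the '\0' */
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    if (fgets(buf, (int)cap, in) == NULL) {
        free(buf);                    /* EOF or read error: nothing to return */
        return NULL;
    }
    buf[strcspn(buf, "\n")] = '\0';   /* drop the trailing newline, if any */
    return buf;
}
```

The caller owns the returned buffer and must free() it. If the line holds exactly n characters, its newline stays in the stream for the next read.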

Correctly allocating a stack for a thread created with clone

I want to create a thread without the CLONE_FILES flag. I tried calling clone directly, but I ran into a strange problem, which I think is related to incorrect memory allocation.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <iostream>
#include <sys/mman.h>
#include <pthread.h>
#include <unistd.h>
const int clone_flags = (CLONE_VM | CLONE_FS | CLONE_SYSVSEM
| CLONE_SIGHAND | CLONE_THREAD
| CLONE_SETTLS | CLONE_PARENT_SETTID
| CLONE_CHILD_CLEARTID
| 0);
static int cloneThread(void* arg)
{
long arg2 = (long)arg + (long)arg;
long* arg2_ptr = &arg2;
return 0;
}
int main(int argc, char** argv)
{
const int STACK_SIZE = 0x800000;
void* stackaddr = malloc(STACK_SIZE);
void* stacktop = (char*)stackaddr + STACK_SIZE; // assuming stack going downwards
clone(cloneThread, stacktop, clone_flags, (void*)1);
sleep(1); // wait for cloneThread running before exit
}
As you can see, I am using malloc to allocate the stack. lldb shows that the program crashes at the beginning of cloneThread, but if I remove long* arg2_ptr = &arg2;, the program exits successfully.
I also read the source code of pthread_create.c and allocatestack.c and, guided by strace, replaced malloc with the following:
void* stackaddr = mmap(NULL, STACK_SIZE, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0);
mprotect(stackaddr, STACK_SIZE, PROT_READ|PROT_WRITE);
But it behaves the same as with malloc. So how should I use clone?
OS: Ubuntu 18.04 LTS, g++ 7.3.0
When you supply the CLONE_SETTLS, CLONE_PARENT_SETTID and CLONE_CHILD_CLEARTID flags you must provide the newtls, ptid and ctid arguments to clone() respectively.
If all you want is a normal thread with a separate FD table though, just use pthread_create() and call unshare(CLONE_FILES) as the first operation in the new thread.
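The pthread_create() plus unshare(CLONE_FILES) suggestion could look roughly like the sketch below. The names fd_probe, private_fd_thread and fd_survives_thread_close are made up for illustration. The thread closes a descriptor after unsharing, and the caller then checks that its own copy is still open. It assumes Linux with glibc; unshare() may also be refused in restricted environments (for example under a seccomp filter), in which case the helper reports -1.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE               /* needed for unshare() and CLONE_FILES */
#endif
#include <fcntl.h>
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

/* Hypothetical helper type: carries the descriptor to close and reports
 * whether unshare() succeeded in the new thread. */
struct fd_probe {
    int fd;
    int unshare_rc;
};

static void *private_fd_thread(void *arg)
{
    struct fd_probe *probe = arg;
    /* First operation in the thread: detach from the shared FD table. */
    probe->unshare_rc = unshare(CLONE_FILES);
    if (probe->unshare_rc == 0)
        close(probe->fd);         /* affects only this thread's private copy */
    return NULL;
}

/* Returns 1 if fd is still open in the caller after the thread closed its
 * own copy, 0 if it was closed, and -1 if the thread or unshare() failed. */
int fd_survives_thread_close(int fd)
{
    struct fd_probe probe = { fd, -1 };
    pthread_t thread;

    if (pthread_create(&thread, NULL, private_fd_thread, &probe) != 0)
        return -1;
    pthread_join(thread, NULL);
    if (probe.unshare_rc != 0)
        return -1;
    return fcntl(fd, F_GETFD) != -1;
}
```

Because the thread is created by pthread_create(), its stack and TLS are set up by the library, so none of the newtls, ptid or ctid bookkeeping has to be done by hand.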

Ptrace reset a breakpoint

I am having trouble resetting a process after I have hit a breakpoint with Ptrace. I am essentially wrapping this code in python.
I am running this on 64 bit Ubuntu.
I understand the concept of resetting the data at the location and decrementing the instruction pointer, but after I get the trap signal and do that, my process is not finishing.
Code snippet:
# Continue to bp
res = libc.ptrace(PTRACE_CONT,pid,0,0)
libc.wait(byref(wait_status))
if _wifstopped(wait_status):
    print('Breakpoint hit. Signal: %s' % (strsignal(_wstopsig(wait_status))))
else:
    print('Error process failed to stop')
    exit(1)
# Reset Instruction pointer
data = get_registers(pid)
print_rip(data)
data.rip -= 1
res = set_registers(pid,data)
# Verify rip
print_rip(get_registers(pid))
# Reset Instruction
out = set_text(pid,c_ulonglong(addr),c_ulonglong(initial_data))
if out != 0:
    print_errno()
print_text(c_ulonglong(addr),c_ulonglong(get_text(c_void_p(addr))))
And I run a PTRACE_DETACH right after returning from this code.
When I run this, it hits the breakpoint and the parent process returns successfully, but the child does not resume and finish its code.
If I comment out the call to the breakpoint function it just attaches ptrace to the process and then detaches it, and the program runs fine.
The program itself is just a small c program that prints 10 times to a file.
Full code is in this paste
Is there an error anyone sees with my breakpoint code?
I ended up writing a C program that was as exact a duplicate of the python code as possible:
#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>
#include <syscall.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/reg.h>
#include <sys/user.h>
#include <unistd.h>
#include <errno.h>
#include <time.h>
void set_unset_bp(pid_t pid){
int wait_status;
struct user_regs_struct regs;
unsigned long long addr = 0x0000000000400710;
unsigned long long data = ptrace(PTRACE_PEEKTEXT,pid,(void *)addr,0);
printf("Orig data: 0x%016llx\n",data);
unsigned long long trap = (data & 0xFFFFFFFFFFFFFF00) | 0xCC;
ptrace(PTRACE_POKETEXT,pid,(void *)addr,(void *)trap);
ptrace(PTRACE_CONT,pid,0,0);
wait(&wait_status);
if(WIFSTOPPED(wait_status)){
printf("Signal received: %s\n",strsignal(WSTOPSIG(wait_status)));
}else{
perror("wait");
}
ptrace(PTRACE_POKETEXT,pid,(void *)addr,(void *)data);
ptrace(PTRACE_GETREGS,pid,0,&regs);
regs.rip -=1;
ptrace(PTRACE_SETREGS,pid,0,&regs);
data = ptrace(PTRACE_PEEKTEXT,pid,(void *)addr,0);
printf("Data after resetting bp data: 0x%016llx\n",data);
ptrace(PTRACE_CONT,pid,0,0);
}
int main(void){
//Fork child process
extern int errno;
int pid = fork();
if(pid ==0){//Child
ptrace(PTRACE_TRACEME,0,0,0);
int out = execl("/home/chris/workspace/eliben-debugger/print","/home/chris/workspace/eliben-debugger/print",(char *)NULL);
if(out != 0){
printf("Error Value is: %s\n", strerror(errno));
}
}else{ //Parent
wait(0);
printf("Got stop signal, we just execv'd\n");
set_unset_bp(pid);
printf("Finished setting and unsetting\n");
wait(0);
printf("Got signal, detaching\n");
ptrace(PTRACE_DETACH,pid,0,0);
wait(0);
printf("Parent exiting after waiting for child to finish\n");
}
exit(0);
}
After comparing the output to my Python output, I noticed that according to Python my original data was 0xfffffffffffe4be8 and 0x00000000fffe4be8.
This led me to believe that my return data was being truncated to a 32-bit value.
I changed my get and set methods to something like this, setting the return type to a void pointer:
def get_text(addr):
    restype = libc.ptrace.restype
    libc.ptrace.restype = c_void_p
    out = libc.ptrace(PTRACE_PEEKTEXT,pid,addr, 0)
    libc.ptrace.restype = restype
    return out
def set_text(pid,addr,data):
    return libc.ptrace(PTRACE_POKETEXT,pid,addr,data)
Can't tell you how it works yet, but I was able to get the child process executing successfully after the trap.
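A likely explanation is that ctypes assumes a foreign function returns a C int unless restype says otherwise, so the 64-bit word from PTRACE_PEEKTEXT gets narrowed to 32 bits and sign-extended when it is widened again. The C sketch below only models that narrowing. through_c_int is a hypothetical name, the input word in the usage note is made up for illustration, and the out-of-range cast to int32_t is implementation-defined in C (GCC and Clang wrap it).

```c
#include <stdint.h>

/* Models what happens when a 64-bit word is returned through a function
 * whose declared return type is a 32-bit int, then widened back to
 * 64 bits. This is a model of the effect, not the ctypes implementation. */
uint64_t through_c_int(uint64_t word)
{
    int32_t narrowed = (int32_t)(uint32_t)word;  /* upper 32 bits are lost */
    return (uint64_t)(int64_t)narrowed;          /* sign-extended on widening */
}
```

For example, a made-up word such as 0x00007f12fffe4be8 comes back as 0xfffffffffffe4be8, because bit 31 of the low half is set. Writing that value back with PTRACE_POKETEXT would corrupt the upper four bytes of the instruction stream, which would be consistent with the child never resuming properly.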

Why are thread IDs not printed in order?

I tried to create 10 threads and print each thread's index. My code is shown below. Why are the values repeated instead of appearing in order?
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include "util.h"
#include <errno.h>
#include <unistd.h>
#include <signal.h>
#include <time.h>
pthread_mutex_t request_buf_lock = PTHREAD_MUTEX_INITIALIZER;
void * worker(void *arg)
{
int thread_id = *(int*)arg;
// int requests_handled = 0;
//requests_handled = requests_handled + 1;
printf("%d\n",thread_id);
return NULL;
}
int main(int argc, char** argv)
{
pthread_t dispatchers[100];
pthread_t workers[100];
int i;
int * thread_id = malloc(sizeof(int));
for (i = 0; i < 10; i++) {
*thread_id = i;
pthread_create(&workers[i], NULL, worker, (void*)thread_id);
}
for (i = 0; i < 10; i++) {
pthread_join(workers[i], NULL);
}
return 0;
}
And the output result is:
4
5
5
6
6
6
7
8
9
9
But I expected it as:
0
1
2
3
4
5
6
7
8
9
Does anyone have any ideas or advice?
All 10 threads execute in parallel, and they all share a single int object, the one created by the call to malloc.
By the time your first thread executes its printf call, the value of *thread_id has been set to 4. Your second and third threads execute their printf calls when *thread_id has been set to 5. And so on.
If you allocate a separate int object for each thread (either by moving the malloc call inside the loop or just by declaring an array of ints), you'll get a unique thread id in each thread. But they're still likely to be printed in arbitrary order, since there's no synchronization among the threads.
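A minimal sketch of the array-of-ints variant follows. NUM_WORKERS, ids, reported and run_workers are names chosen for this example, not taken from the question. Each thread receives a pointer to its own slot, so the value cannot change underneath it, and each thread writes only to its own entry in reported.

```c
#include <pthread.h>
#include <stdio.h>

#define NUM_WORKERS 10

static int ids[NUM_WORKERS];       /* one int per thread, never reused */
static int reported[NUM_WORKERS];  /* reported[k] is set by the thread with id k */

static void *worker(void *arg)
{
    int thread_id = *(int *)arg;   /* reads this thread's private slot */
    printf("%d\n", thread_id);
    reported[thread_id] = 1;       /* distinct ids, so no two threads share a slot */
    return NULL;
}

/* Starts NUM_WORKERS threads and returns how many distinct ids were
 * reported, or -1 if a thread could not be created. */
int run_workers(void)
{
    pthread_t workers[NUM_WORKERS];
    int created = 0, distinct = 0, i;

    for (i = 0; i < NUM_WORKERS; i++) {
        ids[i] = i;
        reported[i] = 0;
        if (pthread_create(&workers[i], NULL, worker, &ids[i]) != 0)
            break;
        created++;
    }
    for (i = 0; i < created; i++)
        pthread_join(workers[i], NULL);
    if (created != NUM_WORKERS)
        return -1;
    for (i = 0; i < NUM_WORKERS; i++)
        distinct += reported[i];
    return distinct;
}
```

Every id from 0 to 9 is now printed exactly once, but the order of the lines still depends on how the threads are scheduled.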

Why is "ls" not colored after forkpty()?

Why is the output of ls executed here not colored?
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <pty.h>
#include <sys/wait.h>
int main(int argc, char **argv ) {
termios termp; winsize winp;
int amaster; char name[128];
if (forkpty(&amaster, name, &termp, &winp) == 0) {
system("ls"); // "ls --color" will work here!
return 0;
}
wait(0);
char buf[128]; int size;
while (1) {
size = read(amaster, buf, 127);
if (size <= 0) break;
buf[size] = 0;
printf("%s", buf);
}
return 0;
}
According to the man page (and the ls.c source that I am inspecting), the output should be colored if isatty() returns true, and after forkpty() it must be true. Besides, ls DOES produce columnized output in this example, which means it believes its output is a tty.
Of course, I do not want just ls to output color; I want an arbitrary program to believe it has a real color-enabled tty behind it.
I just wrote a simple test:
#include <stdio.h>
#include <unistd.h>
int main() {
printf("%i%i%i%i%i\n", isatty(0), isatty(1), isatty(2), isatty(3), isatty(4));
}
and called it in the child branch of forkpty. It displays 11100, which means ls should be colored!
OK, it seems the fact that ls produces no color output has nothing to do with forkpty(); it is simply not color-enabled by default. But now (maybe that's another question), why is it not colored if it just checks isatty()?
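As far as I can tell, the isatty() check only matters once color is requested in "auto" mode. GNU ls leaves color off when no --color option is given, and interactive shells usually get color through an alias such as ls --color=auto, which is not applied when system("ls") runs the command through sh -c. The sketch below is a simplified model of that decision, not the actual ls.c code; color_mode and should_color are names invented here, and a real implementation may consult more than isatty() (for example the TERM variable).

```c
#include <unistd.h>

/* Simplified model of a --color=WHEN option. */
enum color_mode { COLOR_NEVER, COLOR_AUTO, COLOR_ALWAYS };

/* Returns 1 if output written to fd should be colored, 0 otherwise. */
int should_color(enum color_mode mode, int fd)
{
    switch (mode) {
    case COLOR_ALWAYS:
        return 1;                     /* color regardless of the destination */
    case COLOR_AUTO:
        return isatty(fd) ? 1 : 0;    /* color only when fd is a terminal */
    default:
        return 0;                     /* COLOR_NEVER: the no-option default */
    }
}
```

Under this model, the child of forkpty() would get color with COLOR_AUTO because its stdout is the pty slave, but not with the default COLOR_NEVER, which matches "ls --color" working while plain "ls" does not.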

Resources