I am learning to program in MPI and I came across this question. Let's say I have a .txt file with 100,000 rows/lines; how do I chunk them for processing by 4 processors? I.e., I want processor 0 to take care of lines 0-25000, processor 1 to take care of lines 25001-50000, and so on. I did some searching and came across MPI_File_seek, but I am not sure whether it works on .txt files and supports fscanf afterwards.

Text isn't a great format for parallel processing exactly because you don't know ahead of time where (say) line 25001 begins. So these sorts of problems are often dealt with ahead of time through some preprocessing step, either building an index or partitioning the file into the appropriate number of chunks for each process to read.
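As a rough illustration of the index-building option, a preprocessing pass could record the byte offset at which every line starts, so that each process can later jump straight to its first line. This is a minimal serial sketch in plain C; build_line_index and first_line_of are hypothetical helper names, not part of MPI.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: scan a text stream once and return a malloc'd array
 * holding the byte offset at which each line starts. *nlines receives the
 * number of lines. Returns NULL on allocation failure. The caller frees. */
long *build_line_index(FILE *fp, size_t *nlines) {
    assert(fp != NULL && nlines != NULL);
    size_t cap = 1024, n = 0;
    long *offsets = malloc(cap * sizeof *offsets);
    if (!offsets) return NULL;

    long pos = 0;
    int c, at_line_start = 1;
    rewind(fp);
    while ((c = fgetc(fp)) != EOF) {
        if (at_line_start) {
            if (n == cap) {                 /* grow the index when full */
                cap *= 2;
                long *tmp = realloc(offsets, cap * sizeof *offsets);
                if (!tmp) { free(offsets); return NULL; }
                offsets = tmp;
            }
            offsets[n++] = pos;             /* this byte begins a new line */
            at_line_start = 0;
        }
        if (c == '\n') at_line_start = 1;
        pos++;
    }
    *nlines = n;
    return offsets;
}

/* Index of the first line owned by `rank` when nlines lines are split
 * as evenly as possible over nprocs ranks. */
size_t first_line_of(size_t nlines, int rank, int nprocs) {
    return nlines * (size_t)rank / (size_t)nprocs;
}
```

With such an index, rank r would handle lines first_line_of(n, r, p) up to (but not including) first_line_of(n, r + 1, p), and could start reading at offsets[first_line_of(n, r, p)], for example by passing that value as the offset to MPI_File_read_at.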
If you really want to do it through MPI, I'd suggest using MPI-IO to read overlapping chunks of the text file onto the various processors, where the overlap is much longer than you expect your longest line to be, and then have the processors agree on where to start; e.g., you could say that the first (or last) newline in the overlap region shared by processes N and N+1 is where process N leaves off and N+1 starts.
To follow this up with some code,
#include <stdio.h>
#include <mpi.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>

void parprocess(MPI_File *in, MPI_File *out, const int rank, const int size, const int overlap) {
    MPI_Offset globalstart;
    int mysize;
    char *chunk;

    /* read in relevant chunk of file into "chunk",
     * which starts at location in the file globalstart
     * and has size mysize
     */
    {
        MPI_Offset globalend;
        MPI_Offset filesize;

        /* figure out who reads what */
        MPI_File_get_size(*in, &filesize);
        filesize--;  /* get rid of text file eof */
        mysize = filesize/size;
        globalstart = rank * mysize;
        globalend   = globalstart + mysize - 1;
        if (rank == size-1) globalend = filesize-1;

        /* add overlap to the end of everyone's chunk except last proc... */
        if (rank != size-1)
            globalend += overlap;

        mysize = globalend - globalstart + 1;

        /* allocate memory */
        chunk = malloc( (mysize + 1)*sizeof(char));

        /* everyone reads in their part */
        MPI_File_read_at_all(*in, globalstart, chunk, mysize, MPI_CHAR, MPI_STATUS_IGNORE);
        chunk[mysize] = '\0';
    }

    /*
     * everyone calculate what their start and end *really* are by going
     * from the first newline after start to the first newline after the
     * overlap region starts (eg, after end - overlap + 1)
     */
    int locstart=0, locend=mysize-1;
    if (rank != 0) {
        while(chunk[locstart] != '\n') locstart++;
        locstart++;          /* start just past that newline */
    }
    if (rank != size-1) {
        locend -= overlap;   /* back up to where the overlap region begins */
        while(chunk[locend] != '\n') locend++;
    }
    mysize = locend-locstart+1;

    /* "Process" our chunk by replacing non-space characters with '1' for
     * rank 0, '2' for rank 1, etc...
     */
    for (int i=locstart; i<=locend; i++) {
        char c = chunk[i];
        chunk[i] = ( isspace(c) ? c : '1' + (char)rank );
    }

    /* output the processed file */
    MPI_File_write_at_all(*out, (MPI_Offset)(globalstart+(MPI_Offset)locstart), &(chunk[locstart]), mysize, MPI_CHAR, MPI_STATUS_IGNORE);

    free(chunk);
}

int main(int argc, char **argv) {
    MPI_File in, out;
    int rank, size;
    int ierr;
    const int overlap = 100;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (argc != 3) {
        if (rank == 0) fprintf(stderr, "Usage: %s infilename outfilename\n", argv[0]);
        MPI_Finalize();
        exit(1);
    }

    ierr = MPI_File_open(MPI_COMM_WORLD, argv[1], MPI_MODE_RDONLY, MPI_INFO_NULL, &in);
    if (ierr) {
        if (rank == 0) fprintf(stderr, "%s: Couldn't open file %s\n", argv[0], argv[1]);
        MPI_Finalize();
        exit(2);
    }

    ierr = MPI_File_open(MPI_COMM_WORLD, argv[2], MPI_MODE_CREATE|MPI_MODE_WRONLY, MPI_INFO_NULL, &out);
    if (ierr) {
        if (rank == 0) fprintf(stderr, "%s: Couldn't open output file %s\n", argv[0], argv[2]);
        MPI_Finalize();
        exit(3);
    }

    parprocess(&in, &out, rank, size, overlap);

    MPI_File_close(&in);
    MPI_File_close(&out);

    MPI_Finalize();
    return 0;
}
Running this on a narrow version of the text of the question, we get
$ mpirun -n 3 ./textio foo.in foo.out
$ paste foo.in foo.out
Hi guys I am learning to 11 1111 1 11 11111111 11
program in MPI and I came 1111111 11 111 111 1 1111
across this question. Lets 111111 1111 111111111 1111
say I have a .txt file with 111 1 1111 1 1111 1111 1111
100,000 rows/lines, how do 1111111 11111111111 111 11
I chunk them for processing 1 11111 1111 111 1111111111
by 4 processors? i.e. I want 22 2 22222222222 2222 2 2222
to let processor 0 take care 22 222 222222222 2 2222 2222
of the processing for lines 22 222 2222222222 222 22222
0-25000, processor 1 to take 22222222 222222222 2 22 2222
care of 25001-50000 and so 2222 22 22222222222 222 22
on. I did some searching and 333 3 333 3333 333333333 333
did came across MPI_File_seek 333 3333 333333 3333333333333
but I am not sure can it work 333 3 33 333 3333 333 33 3333
on .txt and supports fscanf 33 3333 333 33333333 333333
afterwards. 33333333333


What is the right inotify event mask configuration for monitoring writes to a file?

I am trying to use inotify to monitor a file for any reads or writes.
A typical use case: the user runs one of the following commands and I get the appropriate notification.
echo "xyz" > /tmp/the_file # This should give me a write notification
cat /tmp/the_file # This should give me an access notification.
I have tried the following masks :
mask = (IN_MODIFY|IN_ACCESS); //gets correct notification for reads , for writes i get 2 notifications
mask = (IN_CLOSE_WRITE|IN_ACCESS); // gets correct notification for reads , no notification for writes.
What would be the right mask value to get a single notification for every read and write?
I am using the test application from this blog as reference :
How to Use inotify API in C Language
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/inotify.h>

#define MAX_EVENTS 1024
#define LEN_NAME 256
#define EVENT_SIZE (sizeof (struct inotify_event))
/* BUF_LEN is used below but its definition is missing from the post; this value is an assumption */
#define BUF_LEN (MAX_EVENTS * (EVENT_SIZE + LEN_NAME))

int fd,wd;

void sig_handler(int sig){
    inotify_rm_watch(fd, wd);
    close(fd);
    exit(0);
}

int main(int argc, char **argv){
    char *path_to_be_watched;
    int i=0,length;
    char buffer[BUF_LEN];
    struct inotify_event *event = (struct inotify_event *) &buffer[i];

    signal(SIGINT, sig_handler);
    path_to_be_watched = argv[1];
    fd = inotify_init();
    fcntl(fd, F_SETFL, O_NONBLOCK);
    wd = inotify_add_watch(fd, path_to_be_watched, IN_MODIFY | IN_ACCESS);
    if (wd == -1) {
        printf("Could not watch : %s\n", path_to_be_watched);
    } else {
        printf("Watching : %s\n", path_to_be_watched);
    }
    while (1) {
        i = 0;
        length = read(fd,buffer,BUF_LEN);
        while (i < length) {
            event = (struct inotify_event *) &buffer[i];
            if ( event->mask & IN_MODIFY ) {
                printf( "The file %s was modified.\n", event->name );
                /* This gets printed twice when I run 'echo "123" > /tmp/this_file' */
            }
            else if ( event->mask & IN_ACCESS ) {
                printf( "The file %s was accessed.\n", event->name );
            }
            i += EVENT_SIZE + event->len;
        }
    }
}
Snippet of strace for echo "hello" > ./a.txt
openat(AT_FDCWD, "./a.txt", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
fcntl64(1, F_GETFD) = 0
fcntl64(1, F_DUPFD, 10) = 10
fcntl64(1, F_GETFD) = 0
fcntl64(10, F_SETFD, FD_CLOEXEC) = 0
dup2(3, 1) = 1
close(3) = 0
write(1, "hello\n", 6) = 6
dup2(10, 1) = 1
fcntl64(10, F_GETFD) = 0x1 (flags FD_CLOEXEC)
close(10) = 0

Pthreads program is slower than the serial program - Linux

Thank you for being generous with your time and helping me in this matter. I am trying to calculate the sum of the squared numbers using pthread. However, it seems that it is even slower than the serial implementation. Moreover, when I increase the number of threads the program becomes even slower. I made sure that each thread is running on a different core (I have 6 cores assigned to the virtual machine)
This is the serial program:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>
#include <time.h>

int main(int argc, char *argv[]) {
    struct timeval start, end;
    gettimeofday(&start, NULL); //start time of calculation
    int n = atoi(argv[1]);
    long int sum = 0;
    for (int i = 1; i < n; i++){
        sum += (i * i);
    }
    gettimeofday(&end, NULL); //end time of calculation
    printf("The sum of squares in [1,%d): %ld | Time Taken: %ld mirco seconds \n",n,sum,
           ((end.tv_sec * 1000000 + end.tv_usec) - (start.tv_sec * 1000000 + start.tv_usec)));
    return 0;
}
This is the Pthreads program:
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/time.h>
#include <time.h>

void *Sum(void *param);

// structure for thread arguments
struct thread_args {
    int tid;
    int a; //start
    int b; //end
    long int result; // partial results
};

int main(int argc, char *argv[])
{
    struct timeval start, end;
    gettimeofday(&start, NULL); //start time of calculation
    int numthreads;
    int number;
    double totalSum=0;
    if(argc < 3 ){
        printf("Usage: ./sum_pthreads <numthreads> <number> ");
        return 1;
    }
    numthreads = atoi(argv[1]);
    number = atoi(argv[2]);
    pthread_t tid[numthreads];
    struct thread_args targs[numthreads];
    printf("I am Process | range: [%d,%d)\n",1,number);
    printf("Running Threads...\n\n");
    for(int i=0; i<numthreads;i++ ){
        //Setting up the args
        targs[i].tid = i;
        targs[i].a = (number)*(targs[i].tid)/(numthreads);
        targs[i].b = (number)*(targs[i].tid+1)/(numthreads);
        if(i == numthreads-1 ){
            targs[i].b = number;
        }
        pthread_create(&tid[i],NULL,Sum, &targs[i]);
    }
    for(int i=0; i< numthreads; i++){
        pthread_join(tid[i], NULL);
    }
    printf("Threads Exited!\n");
    printf("Process collecting information...\n");
    for(int i=0; i<numthreads;i++ ){
        totalSum += targs[i].result;
    }
    gettimeofday(&end, NULL); //end time of calculation
    printf("Total Sum is: %.2f | Taken Time: %ld mirco seconds \n",totalSum,
           ((end.tv_sec * 1000000 + end.tv_usec) - (start.tv_sec * 1000000 + start.tv_usec)));
    return 0;
}

void *Sum(void *param) {
    int start = (*(( struct thread_args*) param)).a;
    int end = (*((struct thread_args*) param)).b;
    int id = (*((struct thread_args*)param)).tid;
    long int sum =0;
    printf("I am thread %d | range: [%d,%d)\n",id,start,end);
    for (int i = start; i < end; i++){
        sum += (i * i);
    }
    (*((struct thread_args*)param)).result = sum;
    printf("I am thread %d | Sum: %ld\n\n", id ,(*((struct thread_args*)param)).result );
    return NULL;
}
hamza@hamza:~/Desktop/lab4$ ./sum_serial 10
The sum of squares in [1,10): 285 | Time Taken: 7 mirco seconds
hamza@hamza:~/Desktop/lab4$ ./sol 2 10
I am Process | range: [1,10)
Running Threads...
I am thread 0 | range: [0,5)
I am thread 0 | Sum: 30
I am thread 1 | range: [5,10)
I am thread 1 | Sum: 255
Threads Exited!
Process collecting information...
Total Sum is: 285.00 | Taken Time: 670 mirco seconds
hamza@hamza:~/Desktop/lab4$ ./sol 3 10
I am Process | range: [1,10)
Running Threads...
I am thread 0 | range: [0,3)
I am thread 0 | Sum: 5
I am thread 1 | range: [3,6)
I am thread 1 | Sum: 50
I am thread 2 | range: [6,10)
I am thread 2 | Sum: 230
Threads Exited!
Process collecting information...
Total Sum is: 285.00 | Taken Time: 775 mirco seconds
The two programs do very different things. For example, the threaded program produces much more text output and creates a bunch of threads. You're comparing very short runs (less than a thousandth of a second) so the overhead of those additional things is significant.
You have to test with much longer runs such that the cost of producing additional output and creating and synchronizing threads is lost.
To use an analogy, one person can tighten three screws faster than three people can because of the overhead of getting a tool to each person, deciding who will tighten which screw, and so on. But if you have 500 screws to tighten, then three people will get it done faster.
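To make the comparison fairer along the lines described above, one option is to time only the computation (no printing inside the timed region) and to run it on a range large enough that thread creation and joining become negligible. The following is a minimal sketch under those assumptions; sum_squares_threaded and time_us are hypothetical names, and the accumulation is done in 64-bit arithmetic so that i*i does not overflow an int for large ranges.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime */
#endif
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* One thread's share of the work: the half-open range [a, b). */
struct range { long a, b; unsigned long long sum; };

static void *sum_range(void *arg) {
    struct range *r = arg;
    unsigned long long s = 0;
    for (long i = r->a; i < r->b; i++)
        s += (unsigned long long)i * (unsigned long long)i;
    r->sum = s;                    /* no printf inside the timed work */
    return NULL;
}

/* Sum of i*i for i in [1, n), split across nthreads threads. */
unsigned long long sum_squares_threaded(long n, int nthreads) {
    assert(n >= 1 && nthreads >= 1);
    pthread_t *tid = malloc((size_t)nthreads * sizeof *tid);
    struct range *r = malloc((size_t)nthreads * sizeof *r);
    assert(tid != NULL && r != NULL);

    for (int t = 0; t < nthreads; t++) {
        r[t].a = 1 + (n - 1) * t / nthreads;
        r[t].b = 1 + (n - 1) * (t + 1) / nthreads;
        r[t].sum = 0;
        int rc = pthread_create(&tid[t], NULL, sum_range, &r[t]);
        assert(rc == 0);
        (void)rc;
    }

    unsigned long long total = 0;
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tid[t], NULL);    /* wait, then collect the partial sum */
        total += r[t].sum;
    }
    free(tid);
    free(r);
    return total;
}

/* Wall-clock microseconds for one threaded run; the result goes to *result. */
long long time_us(long n, int nthreads, unsigned long long *result) {
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *result = sum_squares_threaded(n, nthreads);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) * 1000000LL + (t1.tv_nsec - t0.tv_nsec) / 1000;
}
```

Calling time_us with the same large n (say, hundreds of millions) and 1, 2, 4 or 6 threads gives numbers that can be compared directly, because each run does identical work and the only difference is how it is split.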

Why does cat call read() twice when once was enough?

I am new to Linux kernel modules. I am learning about char driver modules from a web course. I have a very simple module that creates /dev/chardevexample, and I have a question to check my understanding:
When I do echo "hello4" > /dev/chardevexample, I see the write execute exactly once as expected. However, when I do cat /dev/chardevexample, I see the read executed two times.
I see this both in my code and in the course material. All the data was returned in the first read(), so why does cat call it again?
All the things I did so far are as follows:
insmod chardev.ko to load my module
echo "hello4" > /dev/chardevexample. This is the write and I see it happening exactly once in dmesg
cat /dev/chardevexample. This is the read, and dmesg shows it happening twice.
I ran strace cat /dev/chardevexample, and I do see read being called twice, with a write in between:
read(3, "hello4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072) = 4096
write(1, "hello4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096hello4) = 4096
read(3, "", 131072) = 0
dmesg after read (cat command)
[909836.517402] DEBUG-device_read: To User hello4 and bytes_to_do 4096 ppos 0 # Read #1
[909836.517428] DEBUG-device_read: Data send to app hello4, nbytes=4096 # Read #1
[909836.519086] DEBUG-device_read: To User and bytes_to_do 0 ppos 4096 # Read #2
[909836.519093] DEBUG-device_read: Data send to app hello4, nbytes=0 # Read #2
Code snippet for read, write and file_operations is attached. Any guidance would help. I searched extensively and couldn't understand. Hence the post.
/**
 * @brief Write to device from userspace to kernel space
 * @returns Number of bytes written
 */
static ssize_t device_write(struct file *file, //!< File pointer
                            const char *buf,   //!< from for copy_from_user. Takes 'buf' from user space and writes to
                                               //!< kernel space in 'buffer'. Happens on fwrite or write
                            size_t lbuf,       //!< length of buffer
                            loff_t *ppos)      //!< position to write to
{
    int nbytes = lbuf - copy_from_user(
        buffer + *ppos, /* to */
        buf,            /* from */
        lbuf);          /* how many bytes */
    *ppos += nbytes;
    buffer[strcspn(buffer, "\n")] = 0; // Remove End of line character
    pr_info("Recieved data \"%s\" from apps, nbytes=%d\n", buffer, nbytes);
    return nbytes;
}

/**
 * @brief Read from device - from kernel space to user space
 * @returns Number of bytes read
 */
static ssize_t device_read(struct file *file, //!< File pointer
                           char *buf,         //!< for copy_to_user. buf is 'to' from buffer
                           size_t lbuf,       //!< Length of buffer
                           loff_t *ppos)      //!< Position
{
    int nbytes;
    int maxbytes;
    int bytes_to_do;
    maxbytes = PAGE_SIZE - *ppos;
    if(maxbytes >lbuf)
        bytes_to_do = lbuf;
    else
        bytes_to_do = maxbytes;
    buffer[strcspn(buffer, "\n")] = 0; // Remove End of line character
    printk("DEBUG-device_read: To User %s and bytes_to_do %d ppos %lld\n", buffer + *ppos, bytes_to_do, *ppos);
    nbytes = bytes_to_do - copy_to_user(
        buf,            /* to */
        buffer + *ppos, /* from */
        bytes_to_do);   /* how many bytes*/
    *ppos += nbytes;
    pr_info("DEBUG-device_read: Data send to app %s, nbytes=%d\n", buffer, nbytes);
    return nbytes;
}

/* Every Device is like a file - this is device file operation */
static struct file_operations device_fops = {
    .owner = THIS_MODULE,
    .write = device_write,
    .open = device_open,
    .read = device_read,
};
The Unix convention for indicating end-of-file is to have read return 0 bytes.
In this case, cat asks for 131072 bytes and only receives 4096. This is normal and not to be interpreted as having reached the end of the file. For example, it happens when you read from the keyboard but the user only inputs a small amount of data.
Because cat has not yet seen EOF (i.e. read did not return 0), it continues to issue read calls until it does. This means that if there's any data, you will always see a minimum of two read calls: one (or more) for the data, and one final one that returns 0.
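The read-until-zero convention described above can be sketched from the user-space side as follows. This is not cat's actual source, just a minimal loop with the same shape; drain_fd is a hypothetical name, and it counts how many read calls were needed.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Read fd until read() returns 0 (end-of-file), roughly as cat does.
 * Returns the total number of bytes read, or -1 on error.
 * *ncalls receives the number of read() calls that were made. */
long drain_fd(int fd, int *ncalls) {
    assert(ncalls != NULL);
    char buf[4096];
    long total = 0;
    *ncalls = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        (*ncalls)++;
        if (n < 0) {
            if (errno == EINTR) continue;   /* interrupted, try again */
            return -1;
        }
        if (n == 0) break;      /* only a zero-byte read means end-of-file */
        total += n;             /* a short read is not EOF, so ask again */
    }
    return total;
}
```

For a source that delivers all of its data in one read and then reports end-of-file, this loop makes exactly two read calls: one that returns the data and one that returns 0, matching the two device_read invocations seen in dmesg.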

Get the volume serial number of a UDF CD/DVD disc?

I'm writing a program on Linux that computes the volume serial number (xxxx-xxxx) that Windows 7 shows for a CD. My program correctly determines the volume serial number on discs with the ISO9660 and Joliet filesystems. But how do I determine the volume serial number of a disc with a UDF filesystem? Can someone tell me?
P.S. In case it is not clear, I'm talking about a serial number of this kind.
#include <QCoreApplication>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/cdrom.h>
#include <string.h>
#include <szi/szimac.h>
#include <qfile.h>
#include <iostream>
#include <QDir>
#include <unistd.h>
#define SEC_SIZE 2048
#define VD_N 16
#define VD_TYPE_SUPP 2
#define VD_TYPE_END 255
#define ESC_IDX 88
#define ESC_LEN 3
#define ESC_UCS2L1 "%/#"
#define ESC_UCS2L2 "%/C"
#define ESC_UCS2L3 "%/E"
using namespace std;
int cdid(unsigned char pvd[SEC_SIZE])
{
    unsigned char part[4] = {0};
    int i;
    for(i = 0; i < SEC_SIZE; i += 4)
    {
        part[3] += pvd[i + 0];
        part[2] += pvd[i + 1];
        part[1] += pvd[i + 2];
        part[0] += pvd[i + 3];
    }
    return (part[3] << 24) + (part[2] << 16) + (part[1] << 8) + part[0];
}
int main(int argc, char *argv[])
FILE *in;
unsigned char buf[SEC_SIZE];
struct cdrom_multisession msinfo;
long session_start;
int id;
QString home=QString(getenv("HOME"))+QString("/chteniestorm");
QFile file(home);
in = fopen(ustr.toLocal8Bit().data(), "rb");
if(in == NULL)
if (
return 0;
/* Get session info */
msinfo.addr_format = CDROM_LBA;
if(ioctl(fileno(in), CDROMMULTISESSION, &msinfo) != 0)
fprintf(stderr, "WARNING: Can't get multisession info\n");
session_start = 0;
session_start = msinfo.addr.lba;
fseek(in, 0, SEEK_SET); //to the begining
/* Seek to primary volume descriptor */
if(fseek(in, (session_start + VD_N) * SEC_SIZE, SEEK_SET) != 0)
if (
return 0;
/* Read descriptor */
if(fread(buf, 1, SEC_SIZE, in) != SEC_SIZE)
if (
return 0;
/* Caclculate disc id */
id = cdid(buf);
/* Search for Joliet extension */
while(buf[0] != VD_TYPE_END)
/* Read descriptor */
if(fread(buf, 1, SEC_SIZE, in) != SEC_SIZE)
return 0;
if(buf[0] == VD_TYPE_SUPP
&& (memcmp(buf + ESC_IDX, ESC_UCS2L1, ESC_LEN) == 0
|| memcmp(buf + ESC_IDX, ESC_UCS2L2, ESC_LEN) == 0
|| memcmp(buf + ESC_IDX, ESC_UCS2L3, ESC_LEN) == 0)
/* Joliet found */
id = cdid(buf);
It looks like this question has been asked in several places [1], [2], [3], [4] but has not been answered anywhere yet, so I will answer it here.
In some of those posts people decoded the serial number generation algorithm. It is just a checksum, which you have already found and put into your cdid() function. The same checksum algorithm is used for both ISO9660 and UDF filesystems on Windows. You have already figured out which ISO9660 structures the checksum is calculated from.
So your question remains only for the UDF filesystem. For UDF on Windows, the checksum is calculated from the 512-byte File Set Descriptor (FSD) structure. I suggest reading the OSTA UDF specification to learn how to locate the FSD on a UDF disc.
Basically, for plain UDF that does not use a Virtual Allocation Table (VAT), Sparing Table or Metadata Partition, the location of the FSD is stored in the Logical Volume Descriptor (LVD) structure, in the field LogicalVolumeContentsUse (it is of type long_ad). The LVD is stored in the Volume Descriptor Sequence (VDS). The VDS's location is stored in the Anchor Volume Descriptor Pointer (AVDP), in the field MainVolumeDescriptorSequenceExtent. The AVDP itself is located at sector 256 of the medium. Optical media have a sector size of 2048 bytes and common hard disks 512 bytes.
For UDF with VAT (e.g. on CD-R/DVD-R/BD-R), Sparing Table (e.g. on CD-RW/DVD-RW) or Metadata Partition (e.g. on Blu-ray), it is much more complicated. You need to look into the Virtual, Sparable or Metadata Partition to figure out how to translate the logical location of the FSD to a physical location on the medium.
The udftools project, starting with version 2.0, includes a new tool, udfinfo, which provides various information about a UDF filesystem. It also shows the Windows-specific Volume Serial Number from your question under the winserialnum key. Note that udfinfo cannot yet read the FSD from a UDF filesystem with VAT or Metadata.
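As a small sketch of the last step only: once the 512 bytes of the FSD have been located and read (locating them is the hard part and is not shown), the checksum from cdid() can be applied to that buffer. The helper names below (volume_checksum, avdp_offset, format_serial) are hypothetical, and the XXXX-XXXX formatting assumes the usual high-word, low-word hexadecimal display of a 32-bit serial.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define FSD_SIZE    512   /* File Set Descriptor length used for the checksum */
#define AVDP_SECTOR 256   /* sector holding the Anchor Volume Descriptor Pointer */

/* Same byte-wise checksum as cdid() in the question, but over an arbitrary
 * length (which must be a multiple of 4) and returned as an unsigned value. */
uint32_t volume_checksum(const unsigned char *buf, size_t len) {
    assert(buf != NULL && len % 4 == 0);
    unsigned char part[4] = {0};
    for (size_t i = 0; i < len; i += 4) {
        part[3] += buf[i + 0];     /* each part wraps around modulo 256 */
        part[2] += buf[i + 1];
        part[1] += buf[i + 2];
        part[0] += buf[i + 3];
    }
    return ((uint32_t)part[3] << 24) | ((uint32_t)part[2] << 16) |
           ((uint32_t)part[1] << 8)  |  (uint32_t)part[0];
}

/* Byte offset of the AVDP for a given sector size
 * (2048 on optical media, 512 on common hard disks). */
long long avdp_offset(unsigned sector_size) {
    return (long long)AVDP_SECTOR * sector_size;
}

/* Format a 32-bit serial as XXXX-XXXX; out must hold at least 10 bytes. */
void format_serial(uint32_t id, char out[10]) {
    snprintf(out, 10, "%04X-%04X",
             (unsigned)(id >> 16), (unsigned)(id & 0xFFFFu));
}
```

For a plain UDF disc the calling code would seek to avdp_offset(2048), follow the AVDP to the VDS and the LVD to find the FSD, read FSD_SIZE bytes, and pass them to volume_checksum.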

MPI program does nothing - running on linux

I've written this MPI C program on linux. The master is supposed to send tasks to the slaves and receive data from the slaves (and if there're more tasks to give them to the finished slaves).
After all the tasks are completed, it's supposed to print a solution.
It prints nothing and I can't figure out why. It isn't stuck, it just finishes after a second and doesn't print anything.
I've tried debugging by placing a printf in different places in the code.
The only place in the code that printed something was before the MPI_Recv in the master section, and it printed a few times (less than the number of processes).
Here's the full code:
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#define NUMS_TO_CHECK 2000
#define RANGE_MIN -5
#define RANGE_MAX 5
#define PI 3.1415
#define MAX_ITER 100000
double func(double x);
int main (int argc, char *argv[])
int numProcs, procId;
int errorCode= MPI_ERR_COMM;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
MPI_Comm_rank(MPI_COMM_WORLD, &procId);
MPI_Status status;
int i;
double recieve=0;
int countPositives=0;
double arr[NUMS_TO_CHECK];
double difference= (RANGE_MAX - RANGE_MIN) / NUMS_TO_CHECK;
int counter = NUMS_TO_CHECK-1; //from end to start...
//Initiallizing the array.
for(i=0; i<NUMS_TO_CHECK; i++){
//Send tasks to all procs
for(i=1; i<numProcs; i++){
MPI_Send(&arr[counter], 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD);
MPI_Recv(&recieve, 1, MPI_DOUBLE, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
MPI_Send(&arr[counter], 1, MPI_DOUBLE, status.MPI_SOURCE, 0, MPI_COMM_WORLD);
printf("Number of positives: %d", countPositives);
MPI_Send(&recieve, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
double func(double x)
{
    int i;
    double value = x;
    int limit = rand() % 3 + 1;
    for(i = 0; i < limit * MAX_ITER; i++)
        value = sin(exp(sin(exp(sin(exp(value))))) - PI / 2) - 0.5;
    return value;
}
I think your slaves need to read data in a while loop. They only do 1 receive and 1 send. Whereas the master starts at 2000. That may be by design, so I may be wrong.
In principle, your code looks almost fine. Only two things are missing here:
The most obvious one is a loop of some sort on the slaves' side, to receive their instructions from the master and then send back their work; and
Less obvious but just as essential: a means for the master to tell the slaves when the work is done. It could be a special value that is sent and tested by the slaves, which leads them to exit the recv + work + send loop upon reception, or a different tag that you test. In the latter case, you'd have to use MPI_ANY_TAG for the receive call on the slaves' side.
With this in mind, I'm sure you can make your code work.
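The shape of that worker loop, using the "different tag" variant, can be sketched without MPI so that it can be tried on a single machine. Everything here is a stand-in: the mailbox struct plays the role of the master, TAG_WORK and TAG_STOP are hypothetical tag values, and work() is a placeholder for func(). In the real program mailbox_recv would be an MPI_Recv from rank 0 with MPI_ANY_TAG (reading the tag from status.MPI_TAG) and mailbox_send would be an MPI_Send back to rank 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags; with MPI these would be the tag argument of MPI_Send. */
enum { TAG_WORK = 0, TAG_STOP = 1 };

/* What a worker sees after a receive: the payload and the tag it came with. */
struct message { double value; int tag; };

/* Stand-in for the master: hands out tasks, then a stop message. */
struct mailbox {
    const double *tasks; int ntasks; int next;
    double results[64];  int nresults;
};

static struct message mailbox_recv(struct mailbox *mb) {
    struct message m = { 0.0, TAG_STOP };   /* nothing left: tell worker to stop */
    if (mb->next < mb->ntasks) {
        m.value = mb->tasks[mb->next++];
        m.tag = TAG_WORK;
    }
    return m;
}

static void mailbox_send(struct mailbox *mb, double result) {
    assert(mb->nresults < 64);
    mb->results[mb->nresults++] = result;
}

static double work(double x) { return x * x; }   /* placeholder for func() */

/* Worker side: recv + work + send, repeated until the stop tag arrives.
 * Returns the number of tasks processed. */
int worker_loop(struct mailbox *mb) {
    int done = 0;
    for (;;) {
        struct message m = mailbox_recv(mb);
        if (m.tag == TAG_STOP) break;            /* master says we are finished */
        mailbox_send(mb, work(m.value));
        done++;
    }
    return done;
}
```

On the master side the matching change would be to send one TAG_STOP message to every worker once the task counter runs out, instead of simply returning.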
