How to read N lines from a file using threads.
Suppose a file contains N lines, and content is added to it dynamically.
I need to read the lines that contain any of the words "ERROR", "Shutdown", "Recovery", or "failed".
I want to use a thread here because, on the first pass, it should read lines 1-100 and check each one; if a line contains one of those words (Error, Shutdown, Recovery, failed), it should be stored in a string.
When the thread runs again 2 minutes later, it should continue from line 101 (lines 101-200); it should not read again from the first line of the file.
It should keep reading the file this way every 2 minutes. Please provide code for what I have described.
You are confused about what a thread is; it has nothing to do with file read offsets. What you need is a java.io.RandomAccessFile. The first time, read until EOF and store the count of bytes read. Each subsequent time, call skipBytes() to skip over that count before you start reading. And so on...
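The same offset-tracking idea, sketched here in Python for brevity (in Java, RandomAccessFile.seek() or skipBytes() plays the role of seek() below; the keyword list and the 2-minute interval come from the question):

    import time

    KEYWORDS = ("ERROR", "Shutdown", "Recovery", "failed")

    def watch_log(path, interval=120):
        matches = []              # lines that contained one of the keywords
        offset = 0                # bytes already processed on earlier passes
        while True:
            with open(path, "r") as f:
                f.seek(offset)                    # skip everything already read
                for line in iter(f.readline, ""):
                    if any(word in line for word in KEYWORDS):
                        matches.append(line.rstrip("\n"))
                offset = f.tell()                 # remember the new position
            time.sleep(interval)                  # wait 2 minutes, then re-check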
So, I am in a situation where one process is continuously (every few seconds) writing data to a file (not appending). The data is in the form of JSON. Another process has to read this file at regular intervals, and it could happen that the reading process reads the file while the writing process is writing to it.
A solution to this problem that I can think of is for the writer process to also write a corresponding checksum file. The reader process would then read both the file and its checksum file. If the calculated checksum doesn't match, the reader would repeat the process until it does. That way it would know that it has read the correct data.
Or maybe a better solution is to read the file twice after a certain time period (much less than the writing interval of the writing process), and see if the read data matches.
The third way could be to write some magic data at the end of the file, so that the reading process knows it has read the whole file once it has encountered that magic data at the end.
What do you think? Are these solutions viable, or are there better methods to achieve this?
Create an entirely new file each time, and rename() the new file once it has been completely written:
If newpath already exists, it will be atomically replaced, so that there is no point at which another process attempting to access newpath will find it missing. ...
Some copy of the file will always be there, and it will always be complete and correct:
So, instead of
writeDataFile( "/path/to/data/file.json" );
and then trying to figure out what to do in the reader process(es), you simply do
writeDataFile( "/path/to/data/file.json.new" );
rename( "/path/to/data/file.json.new", "/path/to/data/file.json" );
No locking is necessary, nor any reading of the file and computing checksums and hoping it's correct.
The only issue is that any reader process has to open() the file each time it needs to read the latest copy - it can't keep an open file descriptor on the file and try to read new contents, because the rename() call unlinks the original file and replaces it with an entirely new one.
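A minimal Python sketch of this write-then-rename pattern, using the paths from the example above (os.replace() performs the atomic rename on POSIX systems):

    import json
    import os

    DATA_PATH = "/path/to/data/file.json"

    def write_data_file(data):
        tmp_path = DATA_PATH + ".new"
        with open(tmp_path, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())         # ensure the bytes hit the disk first
        os.replace(tmp_path, DATA_PATH)  # atomic: readers see old or new, never partial

    def read_data_file():
        # must re-open each time: the rename swaps in a brand-new file (inode)
        with open(DATA_PATH) as f:
            return json.load(f)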
If you want to guarantee that the reader always gets all the data, consider using a named pipe.
mkfifo ./jsonoutput
Then have one program write to, and the other program read from, this file ./jsonoutput.
So long as the writer is regularly closing and reopening the file after writing each JSON, the reader will get an EOF and process the input.
However, if that isn't the case, the reader will just keep reading and the writer will just keep writing. If the programs aren't designed to handle streams of data like that, they might never process the data, and the programs will hang.
If that's the case then you could write a program that reads from one named pipe until it gets a complete JSON and then flushes it through a second named pipe to the final program.
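A rough Python sketch of the writer side of this setup (the FIFO path is the one created above; the payload and the 5-second interval are placeholders). Closing the pipe after each JSON document is what gives the reader its EOF:

    import json
    import os
    import time

    FIFO_PATH = "./jsonoutput"

    if not os.path.exists(FIFO_PATH):
        os.mkfifo(FIFO_PATH)                # same as the mkfifo shell command above

    while True:
        data = {"timestamp": time.time()}   # placeholder payload
        # open() blocks until a reader opens the other end of the pipe
        with open(FIFO_PATH, "w") as fifo:
            json.dump(data, fifo)
        # closing delivers EOF, so the reader knows one document is complete
        time.sleep(5)

The reader side is symmetric: open the FIFO, json.load() until EOF, close, and repeat.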
I'm trying to understand the significance of using num_threads > 1 in tf.train.shuffle_batch connected to tf.WholeFileReader reading image files (each file contains a single data sample). Will setting num_threads > 1 make any difference in such a case compared to num_threads = 1? What are the mechanics of the file and batch queues in such a case?
A short answer: it will probably make the execution faster. Here is some authoritative explanation from the guide:
Use a single reader via the tf.train.shuffle_batch with num_threads bigger than 1. This will make it read from a single file at the same time (but faster than with 1 thread), instead of N files at once. This can be important:
- If you have more reading threads than input files, to avoid the risk that you will have two threads reading the same example from the same file near each other.
- Or if reading N files in parallel causes too many disk seeks.
How many threads do you need? The tf.train.shuffle_batch* functions add a summary to the graph that indicates how full the example queue is. If you have enough reading threads, that summary will stay above zero.
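For concreteness, here is a rough sketch of such a pipeline (the file pattern, image size, and batch parameters are made up for illustration). With num_threads=4, four threads run the enqueue op that feeds the batch queue, and each of them repeatedly invokes the same single reader:

    import tensorflow as tf

    # queue of file names to read
    filename_queue = tf.train.string_input_producer(
        tf.train.match_filenames_once("./images/*.jpg"))

    # a single reader: each read() dequeues one file name and returns its contents
    reader = tf.WholeFileReader()
    key, value = reader.read(filename_queue)

    image = tf.image.decode_jpeg(value, channels=3)
    image = tf.image.resize_images(image, [64, 64])  # static shape needed for batching

    # num_threads=4 runs four enqueue threads, each calling the reader above,
    # so file reading and decoding overlap even though there is only one reader op
    batch = tf.train.shuffle_batch(
        [image], batch_size=32, capacity=2000,
        min_after_dequeue=1000, num_threads=4)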
According to the docs for fs:
Note that it is unsafe to use fs.write multiple times on the same file without waiting for the callback. For this scenario, fs.createWriteStream is strongly recommended.
I am downloading a file in chunks (4 chunks downloading at a time concurrently). I know the full size of the file beforehand (I use truncate after opening the file to allocate the space upfront) and also the size and ultimate location in the file (byte offset from beginning of file) of each chunk. Once a chunk is finished downloading, I call fs.write to put that chunk of data into the file at its proper place. Each call to fs.write includes the position where the data should be written. I am not using the internal pointer at all. No two chunks will overlap.
I assume that the docs indicate that calling fs.write multiple times without waiting for the callback is unsafe because you can't know where the internal pointer is. Since I'm not using that, is there any problem with my doing this?
No, it's not safe. Simply because you don't know whether the first call to write succeeded by the time you execute the second call.
Imagine the second call succeeds, but the first and third fail, while the fifth and sixth succeed as well.
And the chaos is perfect.
Plus, Node.js has a different execution model than other interpreters: you have no guarantee of when specific code parts will be executed, or in which order.
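Whatever the runtime, the safe version of this pattern is to track the completion of every positional write and only treat the file as done when all of them have succeeded. A Python sketch of that idea (the chunk list and worker count are hypothetical; os.pwrite writes at an absolute offset without touching any shared file pointer):

    import os
    from concurrent.futures import ThreadPoolExecutor

    def write_chunk(fd, offset, data):
        written = os.pwrite(fd, data, offset)   # positional write, no seek involved
        if written != len(data):
            raise IOError("short write at offset %d" % offset)

    def assemble(path, chunks):
        # chunks: list of (offset, bytes) pairs covering non-overlapping ranges
        total = max(off + len(data) for off, data in chunks)
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        try:
            os.ftruncate(fd, total)             # allocate the full size up front
            with ThreadPoolExecutor(max_workers=4) as pool:
                futures = [pool.submit(write_chunk, fd, off, data)
                           for off, data in chunks]
                for fut in futures:
                    fut.result()                # re-raises if any write failed
        finally:
            os.close(fd)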
How can I prevent a queue (Beanstalk) from blocking other users if the first job takes time?
For example, if my first user uploads a file that takes 10 hours to be processed, how can I avoid making the other users wait 10 hours before their files start?
The details: one user uploads a file with n lines. This file is split into small chunks of 1,000 lines each and added to a queue. Four workers process the queue simultaneously (which means 4,000 lines at a time).
If the first user uploads a file containing 100,000 lines and the second user uploads a file of 4,000 lines, the second user will have to wait until the workers have finished processing the first file.
Is there a way to avoid forcing the second user to wait ?
The only solution that comes to my mind is to limit the number of lines to a certain amount (one that won't take too long to process) and, if the file is larger, to create a dedicated instance for that specific user.
How would you do it?
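One way to attack this kind of head-of-line blocking, beyond the ideas above, is to interleave chunks from different users' files instead of enqueueing each file's chunks back to back, so the workers alternate between jobs. A queue-agnostic Python sketch of that round-robin interleaving (enqueue_chunk is a hypothetical stand-in for the real Beanstalk put):

    from itertools import zip_longest

    def enqueue_chunk(user_id, chunk):
        # hypothetical stand-in for pushing one 1,000-line job to Beanstalk
        print("enqueue", user_id, len(chunk), "lines")

    def interleave_jobs(files):
        # files: dict mapping user_id -> list of 1,000-line chunks
        # round-robin: one chunk per user per round, so a 100,000-line file
        # cannot monopolize the workers ahead of a 4,000-line file
        for round_of_chunks in zip_longest(*files.values()):
            for user_id, chunk in zip(files.keys(), round_of_chunks):
                if chunk is not None:
                    enqueue_chunk(user_id, chunk)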
I have created a worker thread.
One thread prints the natural numbers into a .txt file that it creates, and my intention is for a second thread to open the same file and print the even numbers into it.
I am able to print to different files by creating a new .txt file in the other thread.
But I need the same file (the one created by the first thread) to be opened so the even numbers can be printed into it.
Please help me out.
There are a couple of ways I can think of to do this:
Use a critical section around the file open/write/close sections in each of the two threads (I think you'll probably need to close the file after each write before you release the critical section).
Use a third thread to do all the file writing, and just pass messages to it from the other two threads telling it what to write to the file (see the sketch below).
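A minimal Python sketch of that second approach (the file name, message format, and range of numbers are made up; the point is that only one thread ever touches the file, so no locking is needed):

    import threading
    import queue

    messages = queue.Queue()
    DONE = object()                 # sentinel telling the writer to stop

    def writer_thread(path):
        # the only thread that opens or writes the file
        with open(path, "w") as f:
            while True:
                line = messages.get()
                if line is DONE:
                    break
                f.write(line + "\n")
                f.flush()

    def naturals(n):
        for i in range(1, n + 1):
            messages.put("natural: %d" % i)

    def evens(n):
        for i in range(2, n + 1, 2):
            messages.put("even: %d" % i)

    w = threading.Thread(target=writer_thread, args=("numbers.txt",))
    w.start()
    producers = [threading.Thread(target=naturals, args=(10,)),
                 threading.Thread(target=evens, args=(10,))]
    for t in producers: t.start()
    for t in producers: t.join()
    messages.put(DONE)              # all producers done, stop the writer
    w.join()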