Read nth line in Node.js without reading entire file - node.js

I'm trying to use Node.js to get a specific line for a binary search in a 48 Million line file, but I don't want to read the entire file to memory. Is there some function that will let me read, say, line 30 million? I'm looking for something like Python's linecache module.
Update for how this is different: I would like to not read the entire file to memory. The question this is identified as a duplicate of reads the entire file to memory.

You should use readline module from Node’s standard library. I deal with 30-40 million rows files in my project and this works great.
If you want to do that in a less verbose manner and don’t mind to use third party dependency use nthline package:
const nthline = require('nthline')
, filePath = '/path/to/100-million-rows-file'
, rowNumber = 42
nthline(rowNumber, filePath)
.then(line => console.log(line))

According to the documentation, you can use fs.createReadStream(path[, options]), where:
options can include start and end values to read a range of bytes from the file instead of the entire file.
Unfortunately, you have to approximate the desired position/line, but it seems to be no seek like function in node js.
EDIT
The above solution works well with lines that have fixed length.
New line character is nothing more than a character like all the others, so looking for new lines is like looking for lines that start with the character a.
Because of that, if you have lines with variable length, the only viable approach is to load them one at a time in memory and discard those in which you are not interested.

Related

Reading values from a file and outputting each number, largest/smallest numbers, sum, and average of numbers from the file

The issue that I am having is that I am able to read the information from the files, but when I try to convert them from a string to an integer, I get an error. I also have issues where the min/max prints as the entire file's contents.
I have tried using if/then statements as well as using different variables for each line in the file.
file=input("Which file do you want to get the data from?")
f=open('data3.txt','r')
sent='-999'
line=f.readline().rstrip('\n')
while len(line)>0:
lines=f.read().strip('\n')
value=int(lines)
if value>value:
max=value
print(max)
else:
min=value
print(min)
total=sum(lines)
print(total)
I expect the code to find the min/max of the numbers in the file as well as the sum and average of the numbers in the file. The results from the file being processed in the code, then have to be written to a different file. My results have consisted in various errors reading that Python is unable to convert from a str to an int as well as printing the entire file's contents instead of the expected results.
does the following work?
lines = list(open('fileToRead.txt'))
intLines = [int(i) for i in lines]
maxValue = max(intLines)
minvalue = min(intLines)
sumValue = sum(intLines)
print("MaxValue : {0}".format( maxValue))
print("MinValue : {0}".format(minvalue))
print("Sum : {0}".format(sumValue))
print("Avergae : {0}".format(sumValue/len(intLines)))
and this is how my filesToRead.txt is formulated (just a simple one, in fact)
10
20
30
40
5
1
I am reading file contents into a list. Then I create a new list (it can be joined with the previous step as part of some code refactoring) which has all the list of ints.Once when I have the list of ints, its easier to calculate max and min on it.
Note that some of the variables are not named properly. Also reading the whole file in one go (like what I have done here) might be a bad idea if the file is too large. In that case, you should never ever read the whole file in one go. In this case , you need to read it line by line, parse the ints and add them to a list of ints. Once when you are done reading the file, close the file. You can then start your calculations based on the list of ints that you have now obtained.
Please let me know if this resolves your query.
Thanks

Attempting to append all content into file, last iteration is the only one filling text document

I'm trying to Create a file and append all the content being calculated into that file, but when I run the script the very last iteration is written inside the file and nothing else.
My code is on pastebin, it's too long, and I feel like you would have to see exactly how the iteration is happening.
Try to summarize it, Go through an array of model numbers, if the model number matches call the function that calculates that MAC_ADDRESS, when done calculating store all the content inside a the file.
I have tried two possible routes and both have failed, giving the same result. There is no error in the code (it runs) but it just doesn't store the content into the file properly there should be 97 different APs and it's storing only 1.
The difference between the first and second attempt,
1 attempt) I open/create file in the beginning of the script and close at the very end.
2 attempt) I open/create file and close per-iteration.
First Attempt:
https://pastebin.com/jCpLGMCK
#Beginning of code
File = open("All_Possibilities.txt", "a+")
#End of code
File.close()
Second Attempt:
https://pastebin.com/cVrXQaAT
#Per function
File = open("All_Possibilities.txt", "a+")
#per function
File.close()
If I'm not suppose to reference other websites, please let me know and I'll just paste the code in his post.
Rather than close(), please use with:
with open('All_Possibilities.txt', 'a') as file_out:
file_out.write('some text\n')
The documentation explains that you don't need + to append writes to a file.
You may want to add some debugging console print() statements, or use a debugger like pdb, to verify that the write() statement actually ran, and that the variable you were writing actually contained the text you thought it did.
You have several loops that could be a one-liner using readlines().
Please do this:
$ pip install flake8
$ flake8 *.py
That is, please run the flake8 lint utility against your source code,
and follow the advice that it offers you.
In particular, it would be much better to name your identifier file than to name it File.
The initial capital letter means something to humans reading your code -- it is
used when naming classes, rather than local variables. Good luck!

Python3 - How to write a number to a file using a variable and sum it with the current number in the file

Suppose I have a file named test.txt and it currently has the number 6 inside of it. I want to use a variable such as x=4 then write to the file and add the two numbers together and save the result in the file.
var1 = 4.0
f=open(test.txt)
balancedata = f.read()
newbalance = float(balancedata) + float(var1)
f.write(newbalance)
print(newbalance)
f.close()
It's probably simpler than you're trying to make it:
variable = 4.0
with open('test.txt') as input_handle:
balance = float(input_handle.read()) + variable
with open('test.txt', 'w') as output_handle:
print(balance, file=output_handle)
Make sure 'test.txt' exists before you run this code and has a number in it, e.g. 0.0 -- you can also modify the code to deal with creating the file in the first place if it's not already there.
Files only read and write strings (or bytes for files opened in binary mode). You need to convert your float to a string before you can write it to your file.
Probably str(newbalance) is what you want, though you could customize how it appears using format if you want. For instance, you could round the number to two decimal places using format(newbalance, '.2f').
Also note that you can't write to a file opened only for reading, so you probably need to either use mode 'r+' (which allows both reading and writing) combined with a f.seek(0) call (and maybe f.truncate() if the length of the new numeric string might be shorter than the old length), or close the file and reopen it in 'w' mode (which will truncate the file for you).

How to read a file in Node JS into a buffer from end to start (backwards)?

I'm looking for a performance-oriented way to read a file into a buffer backwards (from end to beginning).
The zip file format has a crucial end of central directory record at the end of the file (it could be n bytes back, there is a signature I need to find to know I have got it, so I can't just read the last 22 bytes of the file since there is an optional 64K comment in there).
I couldn't find any discussion on Stack Overflow or using Google on how to accomplish this.
Check out this module: https://github.com/bnoordhuis/node-buffertools
You could use the reverse function given by the module, which creates a new buffer in memory of equal length and loops through the original buffer from the end, appending each element to the front of the new buffer.
You would be better off simply using a loop with the starting index as buffer.length - 1 and decrementing until you get the data you want.

Start loop at specific line of text file in groovy

I am using groovy and I am trying to have a text file be altered at specific line, without looping through all of the previous lines. Is there a way to state the line of a text file that you want to wish to alter?
For instance
Text file is:
1
2
3
4
5
6
I would like to say
Line(3) = p
and have it change the text file to:
1
2
p
4
5
6
I DO NOT want to have to do a loop to iterate through the lines to change the value, aka I do not want to use a .eachline {line ->...} method.
Thank you in advance, I really appreciate it!
I dont think you can skip lines and traverse like this. You could do the skip by using the Random Access File in java, but instead of lines you should be specifying the number of bytes.
Try using readLines() on file text. It will store all your lines in a list. To change content at line n, change content at n-1 index on list and then join on list items.
Something like this will do
//We can call this the DefaultFileHandler
lineNumberToModify = 3
textToInsert = "p"
line( lineNumberToModify, textToInsert )
def line(num , text){
list = file.readLines()
list[num - 1] = text
file.setText(list.join("\n"))
}
EDIT: For extremely large files, it is better that you have a custom implementation. May be something on the lines of what Tim Yates had suggested in the comment on your question.
The above readLines() can easily process upto 100000 lines of text within less than a sec. So you can do something like this:
if(file size < 10 MB)
use DefaultFileHandler()
else
use CustomFileHandler()
//CustomFileHandler
- Split the large file into buckets of acceptable size.
- Ex: Bucket 1(1-100000 lines), Bucket 2(100000-200000 lines), etc.
- if (lineNumberToModify falls in bucket range)
insert into line in the bucket
There is no hard and fast rule to define how you implement your CustomFileHandler as it completely depends on the use case scenario. If you need to do the above operation multiple times on the same file, you can choose to do the complete bucket split first, store them in memory and use the buckets for the following operations. Or if it is a one time operation, you can avoid manipulating all the buckets first but deal with only what you need and process the others later on on-demand basis.
And even within the buckets you can define your own intelligence to speed up your job. Say if you want to insert into 99999 line of a bucket with 1-100000 lines, you can exploit groovy's methods and closures to their fullest,
file.readLines().reverse()[1] = "some text"

Resources