Reading all but last few lines of a data file - visual-c++

I can easily skip the header of a data file using getline, but then when I parse through the data file and get to the footer of the file, I end up stuck in a loop because the program is trying to parse columns of data that no longer exist. Is there an easy way to stop reading when there is no longer data in the line? It looks like there is a blank line followed by some footer information, but I cannot guarantee that all of my data files will look like that (i.e. I need something pretty generic).

Looking at your existing code (edit your question and put it there, not in a comment), I see you have nested loops. But what you really want is one loop with two reasons to exit.
while ((q < 16) && (liness >> temp)) { ... }

Read the line into a string, and parse it only if you see \n at the end.
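A minimal sketch combining both suggestions (the file name, the one-line header, and the 16-column layout are assumptions for illustration): read each line with getline, try to parse the expected number of values out of it, and stop as soon as a line no longer yields them.
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

int main() {
    std::ifstream in("data.txt");                    // file name is an assumption
    std::string line;
    std::getline(in, line);                          // skip a one-line header (assumption)

    std::vector<std::vector<double>> rows;
    while (std::getline(in, line)) {
        std::istringstream liness(line);
        std::vector<double> row;
        double temp;
        while (row.size() < 16 && (liness >> temp))  // 16 columns, as in the loop above
            row.push_back(temp);
        if (row.size() < 16)                         // blank line or footer: stop reading data
            break;
        rows.push_back(row);
    }
    return 0;
}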


Is there a way to compare the format in which a line of a text file was written in Fortran?

I'm developing a Fortran program that must obtain some data from a text file and generate another text file using specific data from the first one.
The input file has many lines written in several specific formats that I know. Although I know the formats, the lines in this file appear in a "random" order.
It would be much easier to generate the output file if I could compare the format in which each line was written, then I would know exactly what data I can get from that line of the input file to use it in the output file.
What I need is something like the following: knowing that the format of the line read and stored in the LINHA variable is described in the FORMATO variable, do something like:

IF (FORMATO == '(1X,I5,3F8.1,2(5A,1X))') THEN
    READ (LINHA, '(6X,F8.1)') my_variable
END IF
Because there might be another format such as
'(6A, 2F8.1, F8.6,2 (6A))'
in which, if I use the same READ statement, I will still read an F8.1 value into my_variable, but it will not be the correct one.
A (not so elegant) workaround that I can think of is to read the entire line using the advance='no' option of read() and parse each character of the line separately. While doing so, you may count white spaces or other specific characters that you know of and then identify the different formats from there.
It would be helpful if you could give more specifications of the nature of the task.
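A rough sketch of that character-by-character idea (the unit number 10 and the 165-character maximum record length are assumptions for illustration):
INTEGER :: ios, i, nblanks
CHARACTER(LEN=1) :: ch
nblanks = 0
DO i = 1, 165
   READ(10,'(A1)',ADVANCE='NO',IOSTAT=ios) ch   ! read one character of the current record
   IF (ios /= 0) EXIT                           ! end of record (or file) reached
   IF (ch == ' ') nblanks = nblanks + 1         ! count blanks to help guess the format
END DO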
The best option is to read each line without a format, keeping it in a character variable. Then read that variable as an internal file with the required format, using IOSTAT to check whether the format is the correct one.
INTEGER, PARAMETER :: max_size = 80
INTEGER :: ios
CHARACTER(LEN=max_size) :: line
READ(*,'(A)') line
READ(line,'(1X,I5,3F8.1,2(5A,1X))',IOSTAT=ios) var1, var2, ...
Problem solved using a mixture of some of the suggestions posted.
I read each line of the input file into an internal variable (RLINFILE) using the format '(A165)'. After that, I read the contents of that string into several dummy variables, using the format I knew belonged to the lines I wanted information from (reading all the fields of the line in the desired format and getting IOSTAT = 0 guarantees that this is the correct line). So, if the read succeeds (IOSTAT = 0), the line I just read is the right one for the information I wanted, and I store the contents of the dummy variables that represent the values of interest. In the code, the solution looked something like this:
OPEN(UNIT=LU1,FILE=RlinName,STATUS='OLD')
ilin = 0
formato = '(14X,A,1X,F7.1,1X,F7.1,5X,A,1X,A,1X,A,5X,A,I5,1X,A,I3,3F8.1,A,A,A,1X,A,2(1X,F8.2),1X,A,1X,A)'
DO WHILE (.TRUE.)
   READ(LU1,'(A165)',END=300) RLINFILE
   READ(RLINFILE,formato,IOSTAT=linhaok) dum2_a1,dum2_f1,dum2_f2,dum2_a2,dum2_a3,dum2_a4,dum2_a5,dum2_i1,dum2_a6,dum2_i2,dum2_f3,dum2_f4,dum2_f5,dum2_a7,dum2_a8,dum2_a9,dum2_a10,dum2_f6,dum2_f7,dum2_a11,dum2_a12
   IF(linhaok.EQ.0) THEN
      ilin = ilin+1
      rlin_lshu(ilin) = dum2_a4
      rlin_nbpa(ilin) = dum2_i1
      rlin_ncir(ilin) = dum2_i2
      rlin_ppij(ilin) = dum2_f3
      rlin_pqij(ilin) = dum2_f4
      rlin_tapn(ilin) = dum2_a7
   END IF
END DO
300 CLOSE(UNIT=LU1)
The description of the problem you are trying to solve is a bit vague to me, but the simplest solution that comes to mind, given that description, is to modify the original code that generates the input data file so that it writes the Fortran READ format used just before each data line in the input file. This way, you can read the format as a string and use it in the subsequent data I/O in your second code.
If you describe the specific task you're trying to accomplish in more detail, perhaps more experienced Fortranners could help.

Having some issues with Perl Splitting and Merging Functions

First and foremost, I'm not familiar with Perl at all. I've been studying C++ primarily for the last half year. I'm in a class now that is teaching Linux commands, and we have short topics on languages used in Linux, including Perl, which is totally throwing me for a loop (no pun intended). I have a text file that contains a bunch of random numbers separated by spaces and tabs, maybe even newlines, that gets read into the program via a filehandle. I'm supposed to write two lines of code that split the lines of numbers and merge them into one array, inside of a foreach loop. I'm not looking for an answer, just a nudge in the right direction. I've been trying different things for multiple hours and feel totally silly I can't get it; I'm totally lost with the syntax. It's just a bit odd not working inside a compiler and being out of my comfort zone away from C++. I really appreciate it. I've included a few photos. Basically, the code we are writing is just to store the numbers, and the rest of the program will determine the smallest number and the sum of all numbers. Mine is currently incorrect because I'm not sure what to do. The output photo displays all the numbers being read in from the text file, so you can see them.
Several things to fix here. First of all, please don't post screenshots of your sample data or code, as it makes it impossible to copy and paste to test your code or data. Post your code/data by indenting it with four spaces and a newline preceding the code block.
Add use strict; in your script. This should be lesson 0 in your class. After that add my to all variable declarations.
To populate @all_numbers with the contents of each line's numbers, without using push, you can use something like this:
foreach my $line (@output_lines)
{
    my @numbers = split /\s/, $line;
    @all_numbers = (@all_numbers, @numbers);
}
You say you're "not looking for an answer," so here's your nudge:
You're almost there. You split each line well (using split /\s/) and store the numeric values in @all_numbers. However, notice that each time around the loop, you replace (using the assignment, @all_numbers = ...) the whole contents of @all_numbers with the numbers you found in the current line. Effectively, you're throwing away everything you've stored from the previous lines.
Instead, you want to add to @all_numbers, not replace it. Have a look at the push() function for how to do this.
NB: Your split() call is fine, but it's more customary to use split(' ', $line) in this case. (See split(): you can use a single space, ' ', instead of the pattern /\s/, when you want to split on any whitespace.)
It sounds like you need to store all of the split elements in one array, so you are looking for the push function.
foreach my $line (@input_lines)
{
    push(@all_numbers, split(/\s/, $line));
}
Your problem is that in every iteration the split values overwrite the contents of the array instead of being appended to it. For example,
@array = qw(one two three);
@array = qw(five four seven);
print "@array";
The output is five four seven, not one two three five four seven, because the array is reinitialized with the new values. If you want to append the new values to the array, at the front or at the back, use unshift or push,
for example
@array = qw(one two three);
push(@array, qw(five four seven));
Another way:
my @all_numbers = map { split ' ', $_ } @output_lines;
See http://perldoc.perl.org/functions/map.html

Comparing strings in python 2.7

This is my code:
for films in filmlist:
    with codecs.open('peliculas.txt', encoding='utf8', mode='r') as lfile:
        filmsDone = lfile.read()
    filmsDoneList = filmsDone.split(',')
    if films not in filmsDoneList:
        with codecs.open('peliculas.txt', encoding='utf8', mode='a+') as lfile:
            lfile.write(films.strip() + ',')
It will never recognize the last item of the list.
I have printed filmsDoneList and the last item in PyCharm looks like this: u'X Men.Primera Generacion'. I have printed films and it looks like this: X Men.Primera Generacion'
So I have no idea where the problem is. Thanks in advance.
@Rafa, for you to better understand what I meant in the comments, I had to write an entire answer so that I could include some code.
Let's say the peliculas.txt file contains one film title per line.
You can import such file in python according the following 3 commands:
fileIN=open('peliculas.txt','r')
filmsDoneList=fileIN.readlines()
fileIN.close()
So you basically open the file, import each line thanks to readlines() and then close the file, because its contents are now available in filmsDoneList. The latter is a list with one entry per line of the file, each entry still carrying its trailing newline. Obviously this list is quite long, but you get the point.
You can now get rid of that annoying newline tag '\r\n' by means of the following loop:
for id in range(len(filmsDoneList)):
filmsDoneList[id]=filmsDoneList[id].strip()
and now filmsDoneList holds the bare titles with no trailing '\r\n'. Much better now, innit?
Now, let's say you want to add the following films:
newFilms=['The Exorcist','Back to the Future','Aliens','Back to the Future']
To make your code more robust, I have added Back to the Future twice. Basically you can get rid of duplicates in newFilms by means of the set() function. This converts newFilms into a set with duplicates removed, and we then convert it back to a list with this command:
newFilms=list(set(newFilms))
and now newFilms contains each title exactly once.
Now that everything has been sorted, it's time to check if items in newFilms already are in filmsDoneList which, recall, is the contents of peliculas.txt.
Reopen peliculas.txt as follows:
fileOUT=open('peliculas.txt','a')
the 'a' flag means "append", so basically everything you write will be added to the file without removing anything from it.
And the main loop goes:
for film in newFilms:
    if film in filmsDoneList:
        pass
    else:
        fileOUT.write(film + '\n')
the pass means "do nothing". The write command also appends the newline character to the movie title: this keeps the previous format of one title per line. At the end of this loop you might as well close fileOUT with fileOUT.close().
In the resulting peliculas.txt, Back to the Future was not appended at the bottom because it already was in the file, whereas The Exorcist and Aliens, which were not, have been appended at the end.
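Putting the pieces together, here is a condensed sketch of the whole approach (assuming one title per line in peliculas.txt and that newFilms already holds the titles you want to add; plain open() is used as above, so swap in codecs.open if you need the utf8 handling from your original code):
fileIN = open('peliculas.txt', 'r')
filmsDoneList = [line.strip() for line in fileIN.readlines()]   # read and clean existing titles
fileIN.close()

newFilms = list(set(newFilms))                                  # drop duplicates from the new titles

fileOUT = open('peliculas.txt', 'a')
for film in newFilms:
    if film not in filmsDoneList:
        fileOUT.write(film + '\n')                              # keep the one-title-per-line format
fileOUT.close()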
If your file has titles separated by commas, this approach is still valid. However you must add
filmsDoneList=filmsDoneList[0].split(',')
after the first for loop. Also in the write function (in the last for loop) you might want to replace the newline value with a comma.
This approach is cleaner, I reckon it will also fix the problem you've been having, and it avoids continuously opening and closing the file in a loop. Hope this helps!

need guidance with basic function creation in MATLAB

I have to write a MATLAB function with the following description:
function counts = letterStatistics(filename, allowedChar, N)
This function is supposed to open a text file specified by filename and read its entire contents. The contents will be parsed such that any character that isn’t in allowedChar is removed. Finally it will return a count of all N-symbol combinations in the parsed text. This function should be stored in a file named “letterStatistics.m”, and I made a list of commands and notes on how the function should be organized, according to my professor's lecture notes:
1. Begin the function by setting the default value of N to 1 in case:
   a. The user specifies a 0 or negative value of N.
   b. The user doesn’t pass the argument N into the function, i.e., counts = letterStatistics(filename, allowedChar)
2. Using the fopen function, open the file filename for reading in text mode.
3. Using the function fscanf, read in all the contents of the opened file into a string variable.
4. I know there exists a MATLAB function to turn all letters in a string to lower case. Since my analysis will disregard case, I have to use this function on the string of text.
5. Parse this string variable as follows (use logical indexing or regular expressions – do not use for loops):
   a. We want to remove all newline characters without this occurring:
      e.g.
      In my younger and more vulnerable years my father gave me some advice that I've been turning over in my mind ever since.
      In my younger and more vulnerableyears my father gave me some advicethat I’ve been turning over in my mindever since.
      Replace all newline characters (special character \n) with a single space: ' '.
   b. We will treat hyphenated words as two separate words, hence do the same for hyphens '-'.
   c. Remove any character that is not in allowedChar. Hint: use regexprep with an empty string '' as an argument for replace.
   d. Any sequence of two or more blank spaces should be replaced by a single blank space.
6. Use the provided permsRep function to create a matrix of all possible N-symbol combinations of the symbols in allowedChar.
7. Using the strfind function, count all the N-symbol combinations in the parsed text into an array counts. Do not loop through each character in your parsed text as you would in a C program.
8. Close the opened file using fclose.
HERE IS MY QUESTION: as you can see, I have made this list of what the function is, what it should do, and which commands to use (fclose etc.). The trouble is that I'm aware closing the file involves fclose, but other than that I'm not sure how to execute #8. The same goes for the whole function creation. I have a vague idea of how to create a function using these commands, but I'm unable to produce the actual code. How should I begin? Any guidance/hints would seriously be appreciated, because I'm having programmer's block and am unable to start!
I think that you are new to MATLAB, so the documentation may be complicated. The root of the problem, I guess, is the basic understanding of file I/O (input/output). The thing is that when you open a file using fopen, MATLAB returns an identifier for that file, generally called a file ID. When you call fclose, you want MATLAB to understand which file you want to close, so what you have to do is call fclose with the correct file ID.
fid = fopen('test.txt','w');   % open (or create) test.txt for writing
fprintf(fid,'This is a test.\n');
fclose(fid);
fid = 0; % Optional; this makes it clear that the file is no longer open,
         % but it is not necessary since MATLAB will report an invalid file ID anyway
Regarding the function creation the syntax is something like this:
function out = myFcn(x,y)
z = x*y;
fprintf('z=%.0f\n',z); % Print value of z in the command window
out = z>0;
This function multiplies two numbers, prints the product, and returns true if the product is positive (false otherwise). It may not be the best example of such a test, but it works as an illustration of the syntax, I guess.
Please comment if this is not what you want to know.
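Since the sticking point seems to be getting started at all, here is a bare skeleton (a sketch of steps 1-3 and 8 only, not the full assignment; the counting part is deliberately left out):
function counts = letterStatistics(filename, allowedChar, N)
    if nargin < 3 || N <= 0
        N = 1;                      % default when N is missing, zero or negative
    end
    fid = fopen(filename, 'rt');    % step 2: open for reading in text mode
    txt = lower(fscanf(fid, '%c')); % steps 3-4: read everything, lower-case it
    fclose(fid);                    % step 8: close the file
    % steps 5-7: clean up txt with regexprep / logical indexing, build the
    % N-symbol combinations with permsRep, and count them with strfind
    counts = [];
end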

How to make this Groovy string search code more efficient?

I'm using the following Groovy code to search a file for a string, an account number. The file I'm reading is about 30MB and contains 80,000-120,000 lines. Is there a more efficient way to find a record in a file that contains the given AcctNum? I'm a novice, so I don't know which area to investigate, the toList() or the for-loop. Thanks!
AcctNum = 1234567890
if (testfile.exists())
{
    lines = testfile.readLines()
    words = lines.toList()
    for (word in words)
    {
        if (word.contains(AcctNum)) { done = true; match = 'YES'; break }
        chunks += 1
        if (done) { break }
    }
}
Sad to say, I don't even have Groovy installed on my current laptop - but I wouldn't expect you to have to call toList() at all. I'd also hope you could express the condition in a closure, but I'll have to refer to Groovy in Action to check...
Having said that, do you really need it split into lines? Could you just read the whole thing using getText() and then just use a single call to contains()?
EDIT: Okay, if you need to find the actual line containing the record, you do need to call readLines() but I don't think you need to call toList() afterwards. You should be able to just use:
for (line in lines)
{
    if (line.contains(AcctNum))
    {
        // Grab the results you need here
        break;
    }
}
When you say efficient you usually have to decide which direction you mean: whether it should run quickly, or use as few resources (memory, ...) as possible. Often both lie on opposite sites and you have to pick a trade-off.
If you want the search to be memory-friendly, I'd suggest reading the file line by line instead of reading it all at once, which I suspect readLines does (I could be wrong there, but in other languages something like readLines reads the whole file into an array of strings).
If you want it to run quickly I'd suggest, as already mentioned, reading in the whole file at once and looking for the given pattern. Instead of just checking with contains you could use indexOf to get the position and then read the record as needed from that position.
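A minimal sketch of the line-by-line idea (assuming AcctNum is held as a String, since contains() expects text, and reusing the testfile variable from the question):
def acctNum = '1234567890'
def record = testfile.withReader { reader ->
    def line
    while ((line = reader.readLine()) != null) {
        if (line.contains(acctNum)) {
            return line          // found the record, stop reading
        }
    }
    return null                  // account number not found
}
def match = record != null ? 'YES' : 'NO'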
I should have explained it better: if I find a record with the AcctNum, I extract other information from the record... so I thought I needed to split the file into multiple lines.
If you control the format of the file you are reading, the solution is to add an index.
In fact, this is how databases are able to locate records so quickly.
But for 30MB of data, I think a modern computer with a decent hard drive should do the trick, instead of overcomplicating the program.
