gathering two files to another file - io
I have two files, file1.txt and file2.txt. I would like to collect them into file3.txt after checking the lines of both files.
Example:
file1.txt
line 1 T
line 2 F
line 3 T
line 4 T
line 5 F
line 6 F
file2.txt
line 1 T
line 2 T
line 3 F
line 4 T
line 5 F
line 6 T
file3.txt
file1
********************
number of line = 6
number of true = 3
number of false = 3
********************
line 1 T
line 3 T
line 4 T
file2
********************
number of line = 6
number of true = 4
number of false = 2
********************
line 1 T
line 2 T
line 4 T
line 6 T
For each file, the resulting file should contain a header showing the number of lines, the number of lines that were true (T) and the number of lines that were false (F). Next only the lines that were true are printed.
Any help?
Here's a rough sketch to get you started using SWI-Prolog and library(pio).
We take lines//1, as defined by @mat in his answer to the related question "Read a file line by line in Prolog". Using tpartition/4 and prefix_of_t/3 we then write:
?- set_prolog_flag(double_quotes , codes),
set_prolog_flag(toplevel_print_anon, false).
true.
?- phrase_from_file(lines(_Ls), 'file1.txt'),
maplist(reverse, _Ls, _Rs),
tpartition(prefix_of_t("T"), _Rs, _Ts0, _Fs),
maplist(reverse, _Ts0, _Ts),
forall(member(X,_Ts), format('~s~n',[X])).
line 1 T
line 3 T
line 4 T
true.
You can do this as follows. We first define a predicate merge/2 whose first argument is a list of filenames to read (here [file1.txt,file2.txt]) and whose second argument is the name of the file to write to (here file3.txt). We define merge/2 as:
merge(Inp,Outp) :-
open(Outp,write,Outs),
mergeS(Inp,Outs).
So we open a file with the name Outp and get the corresponding stream Outs. We then call mergeS/2.
There are two cases for mergeS/2:
all input files have been processed, so we can stop processing and close the stream:
mergeS([],OutS) :-
close(OutS).
there is still at least one file we need to process:
mergeS([H|T],OutS) :-
open(H,read,InS),
atom_chars(FileN,H),
process(FileN,InS,OutS),
close(InS),
mergeS(T,OutS).
The core of this predicate is evidently process/3, but to make things more convenient we already do the file handling in mergeS/2.
Next our process/3 predicate reads:
process(Header,InS,OutS) :-
get_lines(InS,Lin,NL,NT,NF),
write(OutS,Header),nl(OutS),
write(OutS,'********************'),nl(OutS),
write(OutS,'number of line = '),write(OutS,NL),nl(OutS),
write(OutS,'number of true = '),write(OutS,NT),nl(OutS),
write(OutS,'number of false = '),write(OutS,NF),nl(OutS),
write(OutS,'********************'),nl(OutS),
print_lines(OutS,Lin),
nl(OutS).
We first gather the content of the file with get_lines/5. This predicate simultaneously calculates statistics such as the number of lines, the number of trues and the number of falses. Next we use a number of write/2 and nl/1 calls to write the statistics to the output file, and then use a predicate print_lines/2 to write the content of the file to file3.txt.
get_lines and get_line
get_lines/5 uses three accumulators to calculate the statistics. It initializes the three accumulators to 0 and then calls get_lines/8:
get_lines(Ins,Lin,NL,NT,NF) :-
get_lines(Ins,Lin,0,0,0,NL,NT,NF).
get_lines/8 is a recursive predicate that processes one line at a time: it determines whether the line is T or F, updates the accumulators, and then parses the next line. It is however possible that we are given an empty file. To make our approach more robust, we therefore start with the base case:
get_lines(InS,[],NL,NT,NF,NL,NT,NF) :-
at_end_of_stream(InS),
!.
and the recursive case:
get_lines(InS,[H|T],NL0,NT0,NF0,NL,NT,NF) :-
get_line(InS,HCs),
inspect(HCs,NL0,NT0,NF0,NL1,NT1,NF1),
atom_chars(H,HCs),
get_lines(InS,T,NL1,NT1,NF1,NL,NT,NF).
get_line/2 simply reads the next line of the file and returns it as a list of characters. It stops at '\n', which is consumed but not included in the result.
get_line(InS,[]) :-
at_end_of_stream(InS),
!.
get_line(InS,[H|T]) :-
get_char(InS,H),
H \= '\n',
!,
get_line(InS,T).
get_line(_,[]).
inspect/7
Now we still have to inspect our line with inspect/7. We again use three accumulators (as you could already guess from the implementation of get_lines/8). First we fetch the last character of the line with last/2. If no such character exists (an empty line may have been introduced somewhere), the true and false counts do not change and only the line count is incremented. Otherwise we inspect the last character with inspect_last/5:
inspect(L,NL0,NT0,NF0,NL,NT,NF) :-
last(L,LL),
!,
NL is NL0+1,
inspect_last(LL,NT0,NF0,NT,NF).
inspect(_,NL0,NT,NF,NL1,NT,NF) :-
!,
NL1 is NL0+1.
inspect_last/5 determines whether the last character is a T, an F or something else, and updates the accumulators accordingly:
inspect_last('T',NT0,NF,NT1,NF) :-
!,
NT1 is NT0+1.
inspect_last('F',NT,NF0,NT,NF1) :-
!,
NF1 is NF0+1.
inspect_last(_,NT,NF,NT,NF).
print_lines/2:
Finally we still need to print our file to the output. This is done using print_lines/2, which is rather straightforward:
print_lines(_,[]) :-
!.
print_lines(OutS,[H|T]) :-
write(OutS,H),
nl(OutS),
print_lines(OutS,T).
Full code
The full code:
merge(Inp,Outp) :-
open(Outp,write,Outs),
mergeS(Inp,Outs).
mergeS([],OutS) :-
close(OutS).
mergeS([H|T],OutS) :-
open(H,read,InS),
atom_chars(FileN,H),
process(FileN,InS,OutS),
close(InS),
mergeS(T,OutS).
process(Header,InS,OutS) :-
get_lines(InS,Lin,NL,NT,NF),
write(OutS,Header),nl(OutS),
write(OutS,'********************'),nl(OutS),
write(OutS,'number of line = '),write(OutS,NL),nl(OutS),
write(OutS,'number of true = '),write(OutS,NT),nl(OutS),
write(OutS,'number of false = '),write(OutS,NF),nl(OutS),
write(OutS,'********************'),nl(OutS),
print_lines(OutS,Lin),
nl(OutS).
get_lines(Ins,Lin,NL,NT,NF) :-
get_lines(Ins,Lin,0,0,0,NL,NT,NF).
get_lines(InS,[],NL,NT,NF,NL,NT,NF) :-
at_end_of_stream(InS),
!.
get_lines(InS,[H|T],NL0,NT0,NF0,NL,NT,NF) :-
get_line(InS,HCs),
inspect(HCs,NL0,NT0,NF0,NL1,NT1,NF1),
atom_chars(H,HCs),
get_lines(InS,T,NL1,NT1,NF1,NL,NT,NF).
get_line(InS,[]) :-
at_end_of_stream(InS),
!.
get_line(InS,[H|T]) :-
get_char(InS,H),
H \= '\n',
!,
get_line(InS,T).
get_line(_,[]).
inspect(L,NL0,NT0,NF0,NL,NT,NF) :-
last(L,LL),
!,
NL is NL0+1,
inspect_last(LL,NT0,NF0,NT,NF).
inspect(_,NL0,NT,NF,NL1,NT,NF) :-
!,
NL1 is NL0+1.
inspect_last('T',NT0,NF,NT1,NF) :-
!,
NT1 is NT0+1.
inspect_last('F',NT,NF0,NT,NF1) :-
!,
NF1 is NF0+1.
inspect_last(_,NT,NF,NT,NF).
print_lines(_,[]) :-
!.
print_lines(OutS,[H|T]) :-
write(OutS,H),
nl(OutS),
print_lines(OutS,T).
If one now queries:
?- merge(["file1.txt","file2.txt"],"file3.txt").
true.
a file named file3.txt is constructed with content:
file1.txt
********************
number of line = 6
number of true = 3
number of false = 3
********************
line 1 T
line 2 F
line 3 T
line 4 T
line 5 F
line 6 F
file2.txt
********************
number of line = 6
number of true = 4
number of false = 2
********************
line 1 T
line 2 T
line 3 F
line 4 T
line 5 F
line 6 T
Which is more or less what you want. Please comment if further errors occur.
EDIT
Somehow I missed that you wanted to filter the lines and only show the ones that are true (T). You can do so by simply modifying get_lines/8:
get_lines(InS,[],NL,NT,NF,NL,NT,NF) :-
at_end_of_stream(InS),
!.
get_lines(InS,Res,NL0,NT0,NF0,NL,NT,NF) :-
get_line(InS,HCs),
inspect(HCs,NL0,NT0,NF0,NL1,NT1,NF1),
atom_chars(H,HCs),
( NT1 > NT0
-> Res =[H|T]
; Res = T
),
get_lines(InS,T,NL1,NT1,NF1,NL,NT,NF).
(The changed part is the if-then-else that decides whether H is added to Res.)
This is not the most elegant way to do it, but it is probably the shortest fix.
Now the file on my machine shows:
file1.txt
********************
number of line = 6
number of true = 3
number of false = 3
********************
line 1 T
line 3 T
line 4 T
file2.txt
********************
number of line = 6
number of true = 4
number of false = 2
********************
line 1 T
line 2 T
line 4 T
line 6 T
Related
How to skip N central lines when reading file?
I have an input file.txt like this:
3
2
A
4
7
B
1
9
5
2
0
I'm trying to read the file and:
when A is found, print the line that is 2 lines below
when B is found, print the line that is 4 lines below
My current code and current output are below:
with open('file.txt') as f:
    for line in f:
        if 'A' in line:  ### Skip 2 lines!
            f.readline()  ### Skipping one line
            line = f.readline()  ### Locate on the line I want
            print(line)
        if 'B' in line:  ## Skip 4 lines
            f.readline()  ### Skipping one line
            f.readline()  ### Skipping two lines
            f.readline()  ### Skipping three lines
            line = f.readline()  ### Locate on the line I want
            print(line)
'4\n'
7
'1\n'
'9\n'
'5\n'
2
>>>
It prints the values I want, but it also prints 4\n, 1\n, and so on. Besides that, I need to write several f.readline() calls, which is not practical. Is there a better way to do this? My expected output is:
7
2
Here is a much simpler version:
lines = open("file.txt", "r").read().splitlines()
#print(str(lines))
for i in range(len(lines)):
    if 'A' in lines[i]:
        print(lines[i+2])  # show 2 lines down
    elif 'B' in lines[i]:
        print(lines[i+4])  # show 4 lines down
This reads the entire file into a list in which each element is one line of the file. It then goes through the list and looks directly at the index 2 further on (for A) or 4 further on (for B) whenever it finds the line it is looking for. Note that it raises an IndexError if a marker sits too close to the end of the file.
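The index-based idea above can also be sketched with a bounds check, so that a marker near the end of the file is skipped instead of raising. This is only an illustration: the function name lines_after_markers and the offsets dictionary are hypothetical, and the sample list stands in for the lines of file.txt.

```python
def lines_after_markers(lines, offsets):
    """Collect, for each marker line, the line `offset` positions below it.

    `offsets` maps a marker substring to how many lines to jump ahead
    (hypothetical helper, not part of the original answer).
    """
    found = []
    for i, line in enumerate(lines):
        for marker, offset in offsets.items():
            target = i + offset
            # Only keep the target if it still lies inside the file.
            if marker in line and target < len(lines):
                found.append(lines[target])
    return found


sample = ["3", "2", "A", "4", "7", "B", "1", "9", "5", "2", "0"]
print(lines_after_markers(sample, {"A": 2, "B": 4}))  # ['7', '2']
```

Unlike the readline-based loop in the question, every line is checked for a marker here, including lines that were jumped over.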
If you don't like the repeated readline calls, wrap them in a function so the rest of the code is very clean:
def skip_ahead(it, elems):
    assert elems >= 1, "can only skip positive integer number of elements"
    for i in range(elems):
        value = next(it)
    return value

with open('file.txt') as f:
    for line in f:
        if 'A' in line:
            line = skip_ahead(f, 2)
            print(line)
        if 'B' in line:
            line = skip_ahead(f, 4)
            print(line)
As for the extra output: when the code you have provided is run in a standard Python interpreter, only the print statements cause output, so there are no extra lines like '1\n'. This is a feature of some contexts, like the IPython shell, when an expression is found in a statement context. In this case f.readline() is alone on its own line, so it is detected as possibly having a value that might be interesting. To suppress this you can frequently just write _ = <expr>.
Seeing if a character in line a is also in line b
I am trying to write Python code to see if a character in line a is also in line b, but not in the same position. I do not know where to get started. For example:
line a = 1365
line b = 3487
How would I write code that tells me that 3 is in line a and in line b, but not in the same position?
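One possible starting point, as a minimal sketch: compare each character of line a with the positions at which the same character occurs in line b. The function name shared_misplaced is hypothetical, and the sketch assumes both lines are plain strings and that a character counts as soon as it occurs in b at any index other than its index in a.

```python
def shared_misplaced(a, b):
    """Return the characters of a that also occur in b at a different index."""
    result = []
    for i, ch in enumerate(a):
        # All indices at which this character appears in b.
        positions_in_b = [j for j, other in enumerate(b) if other == ch]
        if any(j != i for j in positions_in_b) and ch not in result:
            result.append(ch)
    return result


print(shared_misplaced("1365", "3487"))  # ['3']
```

Here 3 sits at index 1 in line a and at index 0 in line b, so it is reported. The other characters of line a do not occur in line b at all.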
Set in python replacing number 10 from a file with 0,1
I have a file with the line: 1 2 3 4 5 10. When I add this line to a set in Python, I get {1,2,3,4,5,0} instead of {1,2,3,4,5,10}. How do I code this so that I get the 10 inside the set instead of it being recognized as a 1 and a 0?
EDIT: This was the code I wrote:
states = set()
line = open("filepath", "r").readlines()[0]
states.add(line)
print(states)
Input file content:
1 2 3 4 5 10
A set cannot contain the same element twice. When the line is processed character by character, the 10 is split into a 1, which is already in the set, and a 0, which is added as a new element. Split the line on spaces first to fix it (this assumes there are no newline characters; if there are, just use the strip method):
line = open("filepath", "r").readlines()[0]
line = line.split(' ')  # Split by space
number_set = set(line)  # line is a list after splitting
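The difference between iterating over a string and iterating over its tokens can be sketched as follows. The helper name parse_states is hypothetical, and the sketch assumes every token on the line is an integer.

```python
def parse_states(line):
    """Split a line on any whitespace and convert each token to an int."""
    # str.split() with no argument also discards the trailing newline.
    return {int(token) for token in line.split()}


# Iterating over a string yields single characters, so "10" becomes "1" and "0".
print(set("10"))

# Splitting first keeps 10 as one element.
print(parse_states("1 2 3 4 5 10\n"))
```

Converting to int is optional. Without it the set holds the strings '1', '2', ..., '10'.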
Python 3.x - don't count carriage returns with len
I'm writing the following code as part of my practice:
input_file = open('/home/me/01vshort.txt', 'r')
file_content = input_file.read()
input_file.close()
file_length_question = input("Count all characters (y/n)? ")
if file_length_question in ('y', 'Y', 'yes', 'Yes', 'YES'):
    print("\n")
    print(file_content, ("\n"), len(file_content) - file_content.count(" "))
It's counting carriage returns in the output, so for the following file (01vshort.txt), I get the following terminal output:
Count all characters (y/n)? y
0
0 0
1 1 1
 9
...or...
Count all characters (y/n)? y
0
00
111
 9
In both cases, the answer should be 6, as there are 6 characters, but I'm getting 9 as the result. I've made sure the code is omitting whitespace, and have tested this with my input file by deliberately adding whitespace and running the code with and without the part:
- file_content.count(" ")
Can anyone explain why the result is 9 and not 6? Perhaps it isn't carriage returns at all? I'm also curious as to why the result of 9 is indented by 1 whitespace.
The input file simply contains the following (with a blank line at the end of the file, line numbers indicated in the example):
1. 0
2. 0 0
3. 1 1 1
4.
...or...
1. 0
2. 00
3. 111
4.
Thanks.
If you want to ignore all whitespace characters, including tabs and newlines:
print(sum(not c.isspace() for c in file_content))
will give you the 6 you expect.
Alternatively, you can take advantage of the fact that the .split() method with no argument splits a string on any run of whitespace. So split it into non-space chunks and then join them all back together again without the whitespace characters:
print(len(''.join(file_content.split())))
You're getting 9 because the content of the file is read as:
file_content = "0\n0 0\n1 1 1\n"
and you're only subtracting the spaces (file_content.count(" ")), not the newlines. To count only the visible characters, you could either:
read the file line by line, or
use a regexp to match whitespace.
As for the indenting of the 9: print separates its arguments with a single space by default, so the count is preceded by the space that follows the "\n" argument.
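The arithmetic behind the 9 can be checked with a small breakdown. This is a sketch only: the function name count_breakdown is hypothetical, and the sample string is the file content assumed in the answer above.

```python
import re


def count_breakdown(text):
    """Return (total, spaces, newlines, visible) character counts for text."""
    total = len(text)
    spaces = text.count(" ")
    newlines = text.count("\n")
    # Remove every whitespace character, then count what is left.
    visible = len(re.sub(r"\s", "", text))
    return total, spaces, newlines, visible


sample = "0\n0 0\n1 1 1\n"
total, spaces, newlines, visible = count_breakdown(sample)
print(total - spaces)  # 9, what the code in the question computes
print(visible)         # 6, the count the question expects
```

The string holds 12 characters in total: 6 digits, 3 spaces and 3 newlines. Subtracting only the spaces leaves the 3 newlines in the count.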
Pulling a list of lines out of a string
Beginning
Line 2
Line 3
Line 4
Line 5
Line 6
End
I'm trying to pull off line 2 through line 6. Can't do it to save my soul. a is the saved string I'm searching through.
b = re.findall(r'Beginning(.*?)End', a)
Doesn't give me a thing, just a blank b. I know it's because of the newlines, but how do I go about detecting the newlines and moving past them? I've tried MULTILINE and DOTALL, not knowing exactly how I'm supposed to use them. Nothing changed. How do I go about getting lines 2 through 6 into b?
To add to this, the pattern occurs multiple times in the same file, so I need to perform this search-and-pull technique repeatedly. I have no other easy way of doing this, since the information in lines 2-6 needs to be looked through further to pull off data that will be put into a csv file. Since some of the data contains hours and some of it doesn't (aka Unavailable), I need to be able to pull off and differentiate between the two occurrences.
string = """Beginning
Line 2
Line 3
Line 4
Line 5
Line 6
End
"""
lines = string.splitlines()
answer = []
flag = False
for line in lines:
    line = line.strip()
    if not line: continue
    if line == "Beginning":
        flag = True
        continue
    if line == "End": flag = False
    if not flag: continue
    answer.append(line)
Output:
In [209]: answer
Out[209]: ['Line 2', 'Line 3', 'Line 4', 'Line 5', 'Line 6']
You could make a function that takes a multi-line string, a starting line, and an ending line (both 1-based and inclusive):
def Function(string, starting_line, ending_line):
    # If string isn't multi-line, return a message explaining that
    if "\n" not in string:
        return "The string given isn't a multiline string!"
    # If the range is reversed, return a message explaining that
    if ending_line < starting_line:
        return "ending_line is less than starting_line!"
    lines = string.splitlines()  # One list element per line
    return "\n".join(lines[starting_line - 1:ending_line])

print(Function(a, 2, 6))
With a holding the text from the question, this code should print:
Line 2
Line 3
Line 4
Line 5
Line 6
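Since the question asks about MULTILINE and DOTALL and needs to handle several blocks in the same text, here is a minimal regex sketch. The names BLOCK_RE and extract_blocks are hypothetical, and it assumes that Beginning and End each sit alone on their own line.

```python
import re

# DOTALL lets "." match newlines; MULTILINE lets ^ and $ match at line boundaries.
BLOCK_RE = re.compile(r"^Beginning$(.*?)^End$", re.DOTALL | re.MULTILINE)


def extract_blocks(text):
    """Return one list of lines for every Beginning ... End block in text."""
    return [body.strip().splitlines() for body in BLOCK_RE.findall(text)]


a = (
    "Beginning\nLine 2\nLine 3\nLine 4\nLine 5\nLine 6\nEnd\n"
    "something else\n"
    "Beginning\nUnavailable\nEnd\n"
)
for block in extract_blocks(a):
    print(block)
```

Each block comes back as its own list, so entries such as Unavailable can be told apart from entries with hours before anything is written to the csv file.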