Find similarities between two text files - text

I need to find a solution for the following:
File A:
a
b
c
File B:
a
b
c
d
I need a script or tool that tells me whether the contents of File A are present in File B.
Thanks for all responses and help.
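One possible approach, sketched in Python under the assumption that "present" means every line of File A occurs somewhere in File B, in any order. The function name missing_lines and the file names are hypothetical, chosen only for illustration.

```python
def missing_lines(path_a, path_b):
    """Return the lines of file A that do not occur anywhere in file B."""
    # Assumption: file B is small enough for its distinct lines to fit in memory.
    with open(path_b, encoding="utf-8") as fb:
        b_lines = {line.rstrip("\n") for line in fb}
    with open(path_a, encoding="utf-8") as fa:
        return [line.rstrip("\n") for line in fa
                if line.rstrip("\n") not in b_lines]


if __name__ == "__main__":
    # Hypothetical file names; replace with the real paths.
    missing = missing_lines("fileA.txt", "fileB.txt")
    if missing:
        print("Lines of File A not found in File B:", missing)
    else:
        print("Contents of File A are present in File B")
```

An empty result means every line of File A was found in File B. If the order of lines matters, a diff-style comparison would be needed instead.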

Related

Export results of the command tree in csv file with columns

I need to save the results of our directory structure, only folders, from the linux server and export the results in a csv file with columns.
What I tried and works best is tree -d /path/folder/ -L 3 > file.csv
I tried to combine it with column, but my knowledge is limited.
Best would be if I can list the first level of a directory in column A, second level in column B and the last one in column C.
Assuming you only want to list the directories that are exactly three levels deep:
find . -mindepth 3 -maxdepth 3 -type d | sed 's:^\./::;s:/:,:g' > file.csv
That said, I generally don't see much benefit in translating a directory structure into a CSV file. It doesn't seem very useful.

How do I compare two spreadsheets to identify missing line items and add them?

I am trying to compare two .xlsx files. What I am looking to do is basically the following:
Does any cell in column B of file1 exist in column B of file2?
If yes, continue.
Else, add the row to file2
The structure of the files is different, so I would need to organize the information being added to file2 to match the format, also, but I think I would be able to do that myself once I know how to do the transfer.
The files are basically a vulnerability export from ACAS and a POA&M. I want to add any existing vulnerabilities from the export that are not already represented on the POA&M.

Select only first line from files under a directory in pyspark

I want to collect the first line of every file under a directory using PySpark. I tried using
file=sc.wholeTextFiles("Location").map(lambda x: x[0]).collect()
but this gives me the list of files under the directory. I want something like the following. Let's say I have two files:
file1.csv    file2.csv
x,y,z        q,r,s
1,2,3        4,5,6
a,b,c        d,e,f
I want to collect the first lines of the files, {x,y,z} and {q,r,s}. How can I get only the first line from multiple files under a directory?
You can do something like the following:
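def read_firstline(filename):
    # Runs on the executors, so each path must be readable from every worker node.
    with open(filename, 'rb') as f:
        return f.readline()

# files is a list of filenames
# Use map, not flatMap: flatMap would flatten each returned line into its individual elements.
rdd_of_firstlines = sc.parallelize(files).map(read_firstline)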

Python: how to open a file and loop through word for word and compare to a list

I have a file that is all strings and I want to loop through the file and check its contents against another file. Both files are too big to place in the code so I have to open each file with open method and then turn each into a loop that iterates over the file word for word (in each file) and compare every word for every word in other file. Any ideas how to do this?
If the files are both sorted, or if you can produce sorted versions of the files, then this is relatively easy. Your simplest approach (conceptually speaking) would be to take one word from file A, call it a, and then read a word from file B, calling it b. Either b is alphabetically prior to a, or it is after a, or they are the same. If they are the same, add the word to a list you're maintaining. If b is prior to a, read b from file B until b >= a. If equal, collect that word. If a < b, obviously, read a from A until a >= b, and collect if equal.
Since file size is a problem, you might need to write your collected words out to a results file to avoid running out of memory. I'll let you worry about that detail.
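The sorted-file walk described above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes both inputs are sorted in the same order that Python uses for string comparison, and the names common_sorted and words are hypothetical.

```python
from typing import Iterable, Iterator


def words(path: str) -> Iterator[str]:
    """Lazily yield whitespace-separated words from a file (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield from line.split()


def common_sorted(a_words: Iterable[str], b_words: Iterable[str]) -> Iterator[str]:
    """Yield words present in both inputs; both must be sorted ascending."""
    a_iter, b_iter = iter(a_words), iter(b_words)
    a = next(a_iter, None)
    b = next(b_iter, None)
    while a is not None and b is not None:
        if a == b:
            yield a                      # same word in both: collect it
            a = next(a_iter, None)
            b = next(b_iter, None)
        elif a < b:
            a = next(a_iter, None)       # a is behind: advance in file A
        else:
            b = next(b_iter, None)       # b is behind: advance in file B
```

Because it is a generator, the matches can be written straight to a results file, for example with out.writelines(w + "\n" for w in common_sorted(words("a.txt"), words("b.txt"))), so only one word from each file is held in memory at a time.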
If they are not sorted and you can't sort them, then it's a harder problem. The naive approach would be to take a word from A, and then scan through B looking for that word. Since you say the files are large, this is not an attractive option. You could probably do better than this by reading in chunks from A and B and working with set intersections, but this is a little more complex.
Putting it as simply as I can, I would read in a reasonably sized chunk of file A and convert it to a set of words, call that a1. I would then read similarly sized chunks of B as sets b1, b2, ... bn. The union of the intersections of (a1, b1), (a1, b2), ..., (a1, bn) is the set of words appearing in both a1 and B. Then repeat for chunks a2, a3, ... an.
I hope this makes sense. If you haven't played with sets, it might not, but then I guess there's a cool thing for you to learn about.
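A rough sketch of the chunked set-intersection idea, assuming words are separated by whitespace. The names word_chunks and common_words and the default chunk size are hypothetical choices for illustration.

```python
from itertools import islice


def word_chunks(path, chunk_size):
    """Yield sets holding at most chunk_size words read from the file at path."""
    def words():
        with open(path, encoding="utf-8") as f:
            for line in f:
                yield from line.split()

    it = words()
    while True:
        chunk = set(islice(it, chunk_size))
        if not chunk:        # no words left in the file
            return
        yield chunk


def common_words(path_a, path_b, chunk_size=100_000):
    """Yield words of file A that also appear in file B, one A-chunk at a time."""
    for a_chunk in word_chunks(path_a, chunk_size):
        found = set()
        # File B is re-read from the start for every chunk of A.
        for b_chunk in word_chunks(path_b, chunk_size):
            found |= a_chunk & b_chunk
        yield from found
```

A word that occurs in more than one chunk of A can be yielded more than once, so collect the results in a set, or deduplicate the output afterwards, if that matters.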
I found the answer. Reading a file advances a file pointer. The problem is that in a nested loop the pointer of the inner file is not moved back to the start when Python goes on to the next iteration of the outer loop.
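One reading of this is that the inner file is exhausted after the first pass of the outer loop, so later passes see nothing. A small sketch of that behaviour, using in-memory io.StringIO objects in place of real files (the function name matching_lines is hypothetical):

```python
import io


def matching_lines(outer, inner):
    """Return the lines of outer that also occur in inner."""
    matches = []
    for a_line in outer:
        inner.seek(0)  # rewind; without this, inner is empty after the first pass
        for b_line in inner:
            if a_line == b_line:
                matches.append(a_line.strip())
    return matches


outer = io.StringIO("a\nb\n")
inner = io.StringIO("b\nc\n")
print(matching_lines(outer, inner))  # prints ['b']
```

The same seek(0) call works on a file object returned by open.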

Create output in an excel file consisting of data rows

I am a beginner in NetLogo. I find the NetLogo manual not always as explicit as I feel it should be, as exemplified by the following task. My feeling is that this task ought to be relatively simple, but so far I have not been able to accomplish it. I have searched this forum for hints, but perhaps my problem is so simple that nobody has yet asked a corresponding question.
The task is: write data into columns of an Excel file such that for each tick six data points form one row across six columns (say columns A, B, C, D, E, F of the Excel file). I cannot find the command that ensures that, after a data point has been entered into its column, the next column is selected for the next data point.
The problem already shows up with the column headings, for which I give the commands in the setup procedure as below:
...
if (file-exists? "TO_test.csv") [carefully [file-delete "TO_test.csv"] [print error-message]]
file-open "TO_test.csv"
file-type "number,"
file-type "name,"
file-type "age,"
file-type "height,"
file-type "income,"
file-type "status,"
file-close
....
The output in the Excel file is then
number, name,age,height,income,status,
all in one column. I use the 'type' command because that ensures that entries are made on the same line. I added the ‘,’ to each string because I believe I picked up somewhere that this causes the shift to the next column (which, however, it does not). If I use 'file-print' instead of 'file-type' I get entries on successive lines instead of in columns, because here the ‘,’ causes a ‘return’ command.
What I would like to get is the following (where the first line shows the given headings of the Excel file):
A B C D E F
number name age height income status
