find lines from one file in another efficiently

I have two files referencing some objects, the first file contains a label and a corresponding id value on each line as follows:
label : 123456789
anotherlabel : 987654321
yetanotherlabel : 567891234
The second file contains a subset of records from file one that meet certain criteria, but it only lists the ID. It's a flat one-column list, as follows:
987654321
123456789
I want to make a third file that will contain one column listing the labels from the first file that correspond to the IDs from the second file. So in this example it would be:
anotherlabel
label
These files are fairly big so I'm looking for an efficient solution. How should I go about this?
Thanks!

You can load file 2 into a hash table (if it fits in memory), then iterate over file 1 and parse each line. If the line's ID is in the hash table, print the corresponding label.
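A minimal Python sketch of that approach, assuming file 1 lines look exactly like `label : 123456789` and file 2 holds one ID per line. A Python `set` serves as the hash table, and file 1 is streamed so only file 2's IDs are held in memory. The function names and file paths are hypothetical.

```python
def labels_for_ids(file1_lines, file2_lines):
    """Yield the labels from file 1 whose ID appears in file 2."""
    # Load file 2's IDs into a set (a hash table), skipping blank lines.
    wanted = {line.strip() for line in file2_lines if line.strip()}
    for line in file1_lines:
        # Split on the last ":" so "label : 123456789" -> ("label ", ":", " 123456789").
        label, sep, ident = line.rpartition(":")
        if sep and ident.strip() in wanted:
            yield label.strip()


def write_matching_labels(path1, path2, out_path):
    """Stream file 1 once and write one matching label per line to out_path."""
    with open(path1) as f1, open(path2) as f2, open(out_path, "w") as out:
        for label in labels_for_ids(f1, f2):
            out.write(label + "\n")
```

With the sample data from the question this yields `label`, then `anotherlabel`, i.e. the labels come out in file 1's order. If the output must follow file 2's order instead, the roles can be swapped: load file 1 into a `dict` mapping ID to label and stream file 2, at the cost of holding the larger file in memory.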

Related

Automatically update the file path of Vlookup to the newest workbook added to a folder

Hello, each month we receive a series of monthly returns from different accounts, which go into a designated folder based on the account name. Each return has the new month's returns appended to all the previous monthly returns. I am running a VLOOKUP function in my workbook based on the specific return I am looking for. Is it possible to change the source of the VLOOKUP function so it takes the data from the most recently added file in the folder? That way it will contain the most recent return data along with all the previous returns.
Thanks
There are many ways to do that. The first step should be to connect to the designated folder with Power Query's Folder connector. You should then see a table listing the files in that folder.
Option 1: If the file name contains the month
If the file name contains the month, you can use it to extract this information. For example, you could:
extract the first 7 characters and parse them as a date
sort the dates in descending order, so the latest file is on top
use Keep Rows (Keep Top Rows) to get rid of the rest of the files
with the last file remaining, expand the content
Option 2: Use file properties
When you connect to a folder you can see the field "Date created". Use this the same way as explained in option 1.
Option 3: Remove duplicates
If for whatever reason the two options above are not possible, depending on your data you can:
combine all the files, which will produce duplicate rows
remove the duplicates
This third option might not work if your dataset can legitimately contain two records that look the same (all columns in the row have the same value).
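Power Query does this through its own UI steps, but the selection logic of Option 1 is easy to see in a short Python sketch. It assumes, hypothetically, that each file name starts with a "YYYY-MM" month such as "2023-11 returns.xlsx"; the function name is also illustrative.

```python
from datetime import datetime


def latest_file(file_names):
    """Return the file name whose first 7 characters encode the most recent month.

    Assumes (hypothetically) a "YYYY-MM" prefix; adjust the format string
    to match the real naming scheme.
    """
    # Parse the 7-character prefix into a date, then take the maximum,
    # which is equivalent to sorting descending and keeping the top row.
    return max(file_names, key=lambda name: datetime.strptime(name[:7], "%Y-%m"))
```

Parsing the prefix rather than comparing raw strings means a file with a malformed name raises an error instead of silently sorting in the wrong place.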

How can I add comments at the top of a data file that I have created using the savetxt function in Python 3.0?

Using the savetxt function, I created a data file named 'output.dat' to which two arrays were written as two different columns. So the file output.dat contains 2 columns of data. Now I want to add headings at the top of each column to remind me what the file contains when I refer back to it later. Say I want to put the heading 'Time' at the top of the first column and 'Voltage' at the top of the second. How can I do this?
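`numpy.savetxt` accepts a `header` string that is written as the first line of the file, prefixed by its `comments` argument ("# " by default). A minimal sketch, assuming the two arrays are 1-D and of equal length; the sample values are made up for illustration:

```python
import numpy as np

# Hypothetical sample data standing in for the two arrays in the question.
time = np.array([0.0, 0.1, 0.2])
voltage = np.array([1.5, 1.7, 1.6])

# column_stack places each 1-D array in its own column.
data = np.column_stack((time, voltage))

# The header becomes the first line, prefixed with "# " (the default comments string).
np.savetxt("output.dat", data, fmt="%.6f", delimiter="\t", header="Time\tVoltage")
```

Because the header line starts with "#", `np.loadtxt` skips it by default when the file is read back. For a file that has already been written, the simplest route is to call `savetxt` again with the header.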

Excel VBA Textfile to 2d array

I am new to excel vba. I want to read a textfile that contains text like this:
John Smith Engineer Chicago
Bob Alice Doctor New York
Jane Smith Teacher St. Louis
I want to convert this into a 2D array so that if I do print(3,3), it returns 'Teacher'.
I am able to read the entire file contents into one string but am having difficulty converting it into a 2D array like the one above. Please advise on how to proceed. Thanks
Unless the text file has some specific structure to it, you're going to struggle a bit. Things that might make it easier are:
Does the text file contain line breaks at the end of each line?
Are all the names in [FirstName][LastName] format as per your example, or might some have more/fewer words?
Does the occupation always come directly after the name?
Are there a (very) limited number of occupations?
As mentioned by NautMeg, you have to make some assumptions about the data based on the provided template.
However, we can assume that:
a space is the delimiter
the final column is City, which can contain a space
there are 4 columns:
First Name
Last Name
Profession
City/Location
Using this information:
While Not EOF(my_file)
    Line Input #my_file, text_line
    ' text_line contains the current line
    i = i + 1
    ' i is the line number
Wend
This is how we retrieve each line.
Split ( Expression, [Delimiter], [Limit], [Compare] )
This gives you each space-separated item on the line. Items at indexes < 3 (0-based) are unique columns of data, and you can handle them however you want.
For indexes >= 3, join the items back together into one string:
Join( SourceArray, [Delimiter] )
You'll likely want to make the delimiter in this case a single space, since Split removes the spaces.
That will allow you to parse the data as is.
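The split-then-join idea can be sketched in Python to show the logic, under the same assumptions (single-space delimiter, four columns, only the city may contain spaces). `parse_people` is a hypothetical name; the VBA version would call `Split` and `Join` at the same points.

```python
def parse_people(lines):
    """Turn lines like "Jane Smith Teacher St. Louis" into 4-column rows."""
    table = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        parts = line.split(" ")
        # Indexes 0-2 are first name, last name and profession; everything
        # from index 3 onward is joined back with a space to form the city.
        table.append(parts[:3] + [" ".join(parts[3:])])
    return table
```

Python indexes from 0, so the question's print(3,3) corresponds to `rows[2][2]`, which is "Teacher". Passing a limit does the split and join in one step: `line.split(" ", 3)` in Python, or `Split(text_line, " ", 4)` in VBA, leaves the rest of the line in the last element.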
However, for future reference if you can control the export of the text file, you should try exporting as a CSV file.
Good luck

compare 2 string with multiple results

There are 2 files containing strings. The strings in the first file appear inside strings in the second file. For example, the first file contains the elements table, apple and house, and the second file contains tableleg, tabletop, applerings, applecake, and so on. Searching the second file for table should return tableleg and tabletop. Is there a function for this, or is Excel VBA needed?
I tried it with =VLOOKUP("*"&C3&"*";'File2'!A:A;1;0). That works, but unfortunately it only returns one result per element.
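The matching itself is a plain substring test, which a short Python sketch makes explicit, assuming both lists are already read into memory (`find_matches` is a hypothetical name). It mirrors the `*term*` wildcard, but keeps every match instead of only the first one VLOOKUP returns.

```python
def find_matches(terms, candidates):
    """Map each term from file 1 to every entry in file 2 that contains it.

    Uses "term in candidate", like the "*term*" wildcard; switch to
    candidate.startswith(term) if only prefixes should count.
    """
    return {term: [c for c in candidates if term in c] for term in terms}
```

Terms with no match, such as house in the example, map to an empty list.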

Creation of a Structure of Cells - MATLAB

I have a set of data, as can be seen from the attached snapshot. As can be observed, it is a set of repetitive data. I am trying to write code that creates a main structure "RoadXML" with all the subsequent text in the cell as structure elements.
For example, "RoadXML.Network.SubNetworks.SubNetwork.RoadNetwork.Grounds.Ground" should produce a structure RoadXML that has a field "Network", which in turn is a structure. Likewise, "Network" should have "SubNetworks" as a field, which is a structure, and so on. Furthermore, the rest of the data should be appended to the main structure under its respective fields. In the end only one structure should remain, holding all the data in the Excel sheet as its structure elements.
The problem is that when repetitive sets of elements are present in the Excel sheet, as can be seen from the screenshot, only the last set of data remains, overwriting the data that has already been stored. That is to say (with reference to the attached screenshot), data from rows 30 to 34 overwrites all the data from rows 15 to 29 that has already been stored.
UPDATE
To be a bit clearer about my problem: during the iteration from row 15 to 19, my code stores the data from the first column as structures in exactly the format shown in the snapshot, i.e. RoadXML is a structure that has Network, which in turn is a structure that has SubNetworks, which in turn is a structure that has SubNetwork, and so on until the last parameter. In the end, Ground is a structure inside Grounds.
Since A15 and A20 contain the same data, once row 20 is encountered the code should convert Ground, which was earlier a structure, into a cell and create a 1x1 structure in the cell with "Attributes" (a structure) as a field. Once "Attributes" has been created, "granulosity", "grip", "name" and "type" should be appended to "Attributes" with their corresponding values from column B.
For every new line in the excel file:
Check if part of the new field path already exists.
For example, after line 2 "RoadXML.Network.SubNetworks.SubNetwork.RoadNetwork.Grounds.Ground.Attributes.granulosity" is created.
On line 3, the path till "Attributes" already exists. Only the field "grip" is missing.
Read here about checking if a certain field exists in a nested structure.
If part of the path exists, start adding fields where the existing path ends.
If the full path, including the last field, already exists, decide what you want to happen: overwrite it, or make that field a vector?
Example run:
Line 15: Ground doesn't exist, so it is created:
RoadXML....Ground{1,1}.Text=""
Lines 16-19: Ground exists, add the attributes
RoadXML....Ground{1,1}.Attributes.X=Y
Line 20: Ground contains 1 cell, so add a new one:
RoadXML....Ground{1,2}.Text=""
And so on.
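The per-line procedure above can be sketched in Python, with dicts standing in for structs and lists standing in for cell arrays. It assumes each Excel row is a (path, value) pair from columns A and B; `build_tree` and the sample values are hypothetical. Following the example run, every field holds a list of elements: a row whose full path already exists starts a new element (like Ground{1,2}) instead of overwriting, and deeper rows go into the most recent element.

```python
def build_tree(rows):
    """Build nested fields from (dotted_path, value) rows.

    Every field holds a list of elements (standing in for a MATLAB cell),
    so repeated blocks are appended instead of overwriting earlier ones.
    """
    root = {}
    for path, value in rows:
        *parents, last = path.split(".")
        node = root
        # Walk the existing path; create missing fields as a one-element list
        # and always descend into the most recent element (like Ground{end}).
        for key in parents:
            node = node.setdefault(key, [{}])[-1]
        # If the full path already exists, this appends a new element
        # (Ground{1,2}, ...); otherwise it creates the field's first element.
        node.setdefault(last, []).append({"Text": value})
    return root
```

In MATLAB the same walk would use `isfield` to test each level and index the cell with `end` for the current element. The sketch picks the "make that field a vector" option for repeated full paths; replacing the append with an assignment would give the overwrite behaviour instead.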
