Creation of a Structure of Cells - MATLAB - excel

I have a set of data, shown in the attached snapshot, that contains repetitive entries. I am trying to write code that creates a main structure "RoadXML" with all the subsequent text in each cell as nested structure elements.
For example, "RoadXML.Network.SubNetworks.SubNetwork.RoadNetwork.Grounds.Ground" should produce a structure RoadXML which has a struct element "Network", which in turn is a structure. Likewise, "Network" should have "SubNetworks" as an element, which is also a structure, and so on. Furthermore, the rest of the data should be appended to the main structure under its respective fields. In the end, only one structure should remain, with all the data in the Excel sheet as its structure elements.
The problem is that when repeated sets of elements are present in the Excel sheet, as in the screenshot, only the last set of data remains, overwriting the data that has already been stored. That is, with reference to the attached screenshot, the data from rows 30 to 34 overwrites all the data from rows 15 to 29.
UPDATE
To be clearer about my problem: during the iteration from row 15 to 19, my code stores the data from the first column as structures in exactly the format shown in the snapshot. That is, RoadXML is a structure which has Network, which in turn is a structure which has SubNetworks, which in turn is a structure which has SubNetwork, and so on down to the last parameter. In the end, we have Ground as a structure inside Grounds.
Since A15 and A20 contain the same data, once row 20 is encountered the code should convert Ground, which was previously a structure, into a cell, and create a 1x1 structure in the cell with "Attributes" (a structure) as a field. Once "Attributes" has been created, "granulosity", "grip", "name" and "type" should be added to "Attributes" with their corresponding values from column B.

For every new line in the excel file:
Check if part of the new field path already exists.
For example, after line 2 "RoadXML.Network.SubNetworks.SubNetwork.RoadNetwork.Grounds.Ground.Attributes.granulosity" is created.
On line 3, the path till "Attributes" already exists. Only the field "grip" is missing.
To check whether a field exists in a nested structure, test the path one level at a time (for example with isfield).
If part of the path exists, start adding fields where the existing path ends.
If the full path, including the last field, already exists, choose what you want to happen. Overwrite it? Make that field a vector?
Example run:
Line 15: Ground doesn't exist, so it is created:
RoadXML....Ground{1,1}.Text=""
Lines 16-19: Ground exists, add the attributes
RoadXML....Ground{1,1}.Attributes.X=Y
Line 20: Ground contains 1 cell, so add a new one:
RoadXML....Ground{1,2}.Text=""
And so on..
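The same idea can be sketched outside MATLAB. The block below is a Python analogue, not MATLAB code: nested dicts stand in for structs and a list stands in for the cell array. The function name add_row and the sample values are made up for illustration, and it assumes that an element row (empty column B) always comes before its own attribute rows, as in rows 15 to 19 of the sheet.

```python
def add_row(root, path, value=None):
    """Insert one spreadsheet row (dotted path, optional value) into root."""
    keys = path.split(".")
    node = root
    for key in keys[:-1]:
        child = node.setdefault(key, {})
        # A repeated element is stored as a list; descend into its newest entry.
        node = child[-1] if isinstance(child, list) else child
    last = keys[-1]
    if value is not None:
        node[last] = value  # attribute row: store the value from column B
        return
    existing = node.get(last)
    if existing is None:
        node[last] = {}  # first occurrence of this element
    elif isinstance(existing, list):
        existing.append({})  # third or later occurrence
    else:
        node[last] = [existing, {}]  # second occurrence: struct becomes a list


# Hypothetical rows with a shortened path, standing in for the spreadsheet.
rows = [
    ("RoadXML.Grounds.Ground", None),
    ("RoadXML.Grounds.Ground.Attributes.grip", "0.8"),
    ("RoadXML.Grounds.Ground", None),
    ("RoadXML.Grounds.Ground.Attributes.grip", "0.5"),
]
road = {}
for path, value in rows:
    add_row(road, path, value)
```

Because attribute rows always descend into the newest list entry, the second set of attributes lands in the second Ground instead of overwriting the first.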

Related

Automatically update the file path of Vlookup to the newest workbook added to a folder

Hello, each month we receive a series of monthly returns from different accounts, which go into a designated folder based on the account name. Each return has the new month's returns appended to all the previous monthly returns. I am running a VLOOKUP function in my workbook based on the specific return I am looking for. Is it possible to change the source of the VLOOKUP function so that it takes the data from the most recently added file in the folder? That way it would contain the most recent return data along with all the previous returns.
Thanks
There are many ways to do that. The first step should be to connect to the designated folder. You should see then something like this:
Option 1: If the file contains the month
If your file contains the month you can use it to extract this information. Following the example above you could:
extract the first 7 characters and parse them as a date.
sort the date in descending order, so the latest file will be on top
use keep rows to get rid of the rest of files
with the last file remaining, expand the content
Option 2: Use file properties
When you connect to a folder you can see the field "Date created". Use this the same way as explained in option 1.
Option 3: Remove duplicates
If for whatever reason the two options above are not possible, depending on your data you can:
join all files which will lead to duplicates
filter duplicates
This third option might not work if your dataset can contain two legitimate records that look the same (all columns in the row have the same value), because one of them would be removed as a duplicate.
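For anyone doing the same selection in a script rather than in the query editor, here is a rough Python sketch of option 1. It assumes the first 7 characters of each file name are a year and month in the form YYYY-MM, which is only a guess at the naming convention; the function names are made up for illustration.

```python
from datetime import datetime
from pathlib import Path


def month_of(name):
    """Parse the first 7 characters of a file name as YYYY-MM, or return None."""
    try:
        return datetime.strptime(name[:7], "%Y-%m")
    except ValueError:
        return None  # the name does not follow the assumed convention


def pick_latest(names):
    """Return the name with the most recent YYYY-MM prefix, or None."""
    dated = [(month_of(n), n) for n in names if month_of(n) is not None]
    if not dated:
        return None
    return max(dated)[1]  # sort by date, keep only the top entry


def latest_in_folder(folder):
    """Apply pick_latest to the files in a folder and return the full path."""
    names = [p.name for p in Path(folder).iterdir() if p.is_file()]
    latest = pick_latest(names)
    return Path(folder) / latest if latest else None
```

For option 2, the same structure works with a file timestamp as the sort key instead of the parsed name.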

find lines from one file in another efficiently

I have two files referencing some objects, the first file contains a label and a corresponding id value on each line as follows:
label : 123456789
anotherlabel : 987654321
yetanotherlabel : 567891234
The second file contains a subset of records from file one that meet certain criteria, but it only lists the ID. It's a flat one column list as follows
987654321
123456789
I want to make a third file that will contain one column listing the labels from the first file that correspond to the ids from the second file. So in this example it would be
anotherlabel
label
These files are fairly big so I'm looking for an efficient solution. How should I go about this?
Thanks!
You can load file 2 into a hash table (if it fits into memory), then iterate over file 1 and parse each line. If the ID on a line is in the hash table, print the corresponding label.
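A minimal Python sketch of that approach, assuming every line of file 1 has the form "label : id" and that the IDs from file 2 fit in memory. The function name matching_labels is made up for illustration. It accepts any iterables of lines, so open file objects can be passed directly.

```python
def matching_labels(label_lines, id_lines):
    """Yield labels from file 1 whose ID appears in file 2."""
    # Build the hash table (a set) from the one-column ID file.
    wanted = {line.strip() for line in id_lines if line.strip()}
    for line in label_lines:
        # Split on the last colon so that labels may themselves contain colons.
        label, sep, ident = line.rpartition(":")
        if sep and ident.strip() in wanted:
            yield label.strip()
```

Each file is read once and each lookup is a constant-time set membership test. Note that the labels come out in the order of file 1, not file 2; if the order of file 2 matters, build a dictionary from ID to label out of file 1 instead and iterate over file 2.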

How to manually control data schema interpretation

When I export public weather data from https://www1.ncdc.noaa.gov/pub/data/uscrn/products/subhourly01/2017/CRNS0101-05-2017-TX_Austin_33_NW.txt, as soon as solar radiation > 9, all of my data for the remaining columns gets lumped into a single column, as shown below. I have tried uploading as txt and csv and the problem still exists in excel, sheets, and dataprep.
Why is this happening?
Is there a programmatic way to fix this so that the data populates as intended, with 1 value per column?
It is likely because the initial data structure is not detected correctly. This can happen if the first rows of your dataset have a different structure than the remaining rows.
To solve this problem in Dataprep, you can indicate how the dataset should be structured by following these steps:
Go to the flow view
Right click on the dataset and choose "remove structure..."
Open the recipe
Insert a split row step:
splitrows col: column1 on: '\n'
Split the column using a whitespace regex (e.g., /\s+/)
splitpatterns col: column1 type: on on: /\s+/ limit: 22
(you can copy and paste each of the commands above into the search input when you create a new step)
Here is what you should get:
Note: it is also possible to prevent the initial structure detection when importing a dataset. See https://cloud.google.com/dataprep/docs/html/Remove-Initial-Structure_136154971
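If you would rather fix the file before uploading it, the same two steps (split into rows, then split each row on runs of whitespace) can be sketched in Python. This is not Dataprep code; the function name split_rows is made up, and the limit parameter is assumed to cap the number of splits per row in the same way as the limit in the recipe step.

```python
import re


def split_rows(text, limit=22):
    """Split raw text into rows, then split each row on runs of whitespace."""
    rows = []
    for line in text.split("\n"):
        line = line.strip()
        if line:  # skip blank lines, such as the one after a trailing newline
            rows.append(re.split(r"\s+", line, maxsplit=limit))
    return rows
```

Because the split is on one or more whitespace characters, columns padded with a varying number of spaces still end up with one value per column.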

Finding if a value is located in a range of cells in a corresponding column, to a row with the same value as a given cell in a given row

Given that column A contains directory paths, and column B contains actions on these paths; I'd like a resultant cell to contain a specific string if a specific string is located in Column B (per column A).
Granted I can sort the column, I believe I can use OFFSET, but would like to know how to locate an ending index of cells so I can search the range returned by OFFSET.
For example:
Path Operation
/share/Admins Accessed
/share/Admins Removed
/share/Admins Added
/share/Admins Changed
/share/Network Accessed
/shared/Projects Accessed
In this case, I want to search Path for a unique value (in this case /share/Admins, /share/Network, and /shared/Projects), and given this range in Path, I'd like to search the corresponding Operation, and if any Operation that matches Removed, Added, Changed exists, I'd like the cell value to be WRITE; and if those values aren't found, "READ".
In this case, I would expect the column (with header Result) to read:
Path Operation Result
/share/Admins Accessed WRITE
/share/Admins Removed WRITE
/share/Admins Added WRITE
/share/Admins Changed WRITE
/share/Network Accessed READ
/shared/Projects Accessed READ
Pardon the SEO: I am using this to compare Varonis DatAdvantage reports 01.a.01 (user access logs) with 04.j.01 (effective user permissions report). Unfortunately, DatAdvantage doesn't feel the need to write reports that correlate their user activity records and file system permissions records.
You can use Sumproduct. See screenshot.
=IF(SUMPRODUCT((A:A=A1)*((B:B="Removed")+(B:B="Added")+(B:B="Changed"))),"WRITE","READ")
Or in shorter form:
=IF(SUMPRODUCT((A:A=A1)*(B:B={"Removed","Added","Changed"})),"WRITE","READ")
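For reference, the logic of the formula can be sketched in Python: a path is marked WRITE if any of its rows has one of the three write operations, otherwise READ. This only illustrates what the SUMPRODUCT test computes; the names WRITE_OPS and classify are made up.

```python
WRITE_OPS = {"Removed", "Added", "Changed"}


def classify(rows):
    """Return (path, operation, result) for each (path, operation) row."""
    # Paths that have at least one write operation anywhere in the data.
    write_paths = {path for path, op in rows if op in WRITE_OPS}
    return [
        (path, op, "WRITE" if path in write_paths else "READ")
        for path, op in rows
    ]
```

As with the formula, the rows do not need to be sorted, because each path is checked against the whole column.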

Append record with binary value

I have a file with FB length=80. I want to append fixed value numeric 1 at position 81, if value at position 80='Y'
This appended value is supposed to be S9(9) BINARY when viewed from a copybook.
The appended field will be used in SUM FIELDS in a separate step.
How do I code the SORT SYSIN card ?
OPTION COPY
INREC IFTHEN=(WHEN=(80,1,CH,EQ,C'Y'),OVERLAY=(81:+1,TO=BI,LENGTH=4))
There is no need for this to be separate from your step containing the SUM. Obviously you would not use OPTION COPY in that case.
If you are SUMming records other than those with Y in column 80, you'll need an IFTHEN=(WHEN=INIT to set everything to zero first.
Since this is a Mainframe task, you'd have got an earlier response if you'd used that Tag.

Resources