Limiting DICOM tags - python-3.x

I am trying to limit the DICOM tags that are retained, using
for key in keys:
    if key.upper() not in {'0028|0010', '0028|0011'}:
        image_slice.EraseMetaData(key)
in Python 3.6, where image_slice is of type SimpleITK.SimpleITK.Image.
I then use
image_slice.GetMetaDataKeys()
to see which tags remain, and they are indeed only the tags I selected. I then save the image with
writer.SetFileName(outputDir+os.path.basename(sliceFileNames[i]))
writer.Execute(image_slice)
where outputDir is the output directory name and os.path.basename(sliceFileNames[i]) is the DICOM image name. However, when I open the image with Weasis or MIPAV, I notice that there are many more tags than were in image_slice. For example, there is
(0002,0001) [OB] FileMetaInformationVersion: binary data
(0002,0002) [UI] MediaStorageSOPClassUID:
(0002,0003) [UI] MediaStorageSOPInstanceUID:
(0008,0020) [DA] StudyDate: (which is set to the date the file was created)
I was wondering how and where these additional tags were added.

The group 2 tags you are seeing are metadata tags that are always written when the dataset is written. Unlike "regular" tags, which start at group 8, these group 2 tags do not belong to the dataset itself; they contain information about the encoding/writing of the dataset, such as the transfer syntax. More information can be found in the DICOM standard, part 10. They will be recreated on saving a dataset to a file; otherwise the DICOM file would not be valid.
About the rest of the tags I can only guess, but they are probably written by the software because they are mandatory DICOM tags that were missing from the dataset. StudyDate is certainly a mandatory tag, so adding it if it is missing is correct if the data is treated as derived data (which it usually is if you are manipulating it with ITK). I suspect the other tags you didn't mention are also mandatory.
Someone with more SimpleITK knowledge can probably add more specific information.
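For reference, a minimal self-contained sketch of the erase-and-write flow from the question, with illustrative file names and whitelist; note that the group 0002 file meta tags will still be regenerated by the writer on save:
import SimpleITK as sitk

image_slice = sitk.ReadImage('slice.dcm')   # hypothetical input file
keep = {'0028|0010', '0028|0011'}           # Rows, Columns
for key in list(image_slice.GetMetaDataKeys()):
    if key not in keep:
        image_slice.EraseMetaData(key)

writer = sitk.ImageFileWriter()
writer.KeepOriginalImageUIDOn()  # keep the original UIDs instead of generating new ones
writer.SetFileName('slice_stripped.dcm')
writer.Execute(image_slice)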

Related

Azure Media Services: Provide custom file names for the asset files

I'm encoding a video file using the built-in adaptive streaming transform. Once the file is successfully processed, an asset container is created with the encoded output files.
Is it possible to provide custom file names at the time a job is created? It seems that the default behavior is to take a certain number of characters from the original file name and prepend them to the output file names. If possible, I'd like to configure this behavior.
P.S. I'm using the .NET SDK.
You can create a custom transform to name the output files differently. On https://learn.microsoft.com/en-us/rest/api/media/transforms/createorupdate#definitions, search for the Mp4Format section; in it you can specify the filenamePattern with macros like {Bitrate} and {Codec}.
See https://learn.microsoft.com/en-us/azure/media-services/latest/custom-preset-cli-howto for an example custom transform and the process by which to create it in Media Services.
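For illustration, here is a hedged sketch of such a preset using the azure-mgmt-media Python SDK models (the question uses the .NET SDK, but the preset shape is the same); the bitrate and the pattern are made-up values:
from azure.mgmt.media.models import (
    AacAudio, H264Layer, H264Video, Mp4Format, StandardEncoderPreset,
)

preset = StandardEncoderPreset(
    codecs=[
        H264Video(layers=[H264Layer(bitrate=1000000)]),
        AacAudio(),
    ],
    formats=[
        # the macros expand per output, yielding something like
        # "Video-MyClip-1000000-H264.mp4" instead of the default name
        Mp4Format(filename_pattern='Video-{Basename}-{Bitrate}-{Codec}{Extension}'),
    ],
)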
I use the macros in my jobs and they work OK. I have a process that takes three videos (an intro section, the actual content, and an outro section) and encodes them as one single video. The issue I have with the macros is that they use the file name of the first video in the inputs, so the output ends up using the file name of the intro video, which is a generic name. They need a way to give us a little more control.
I suppose I could copy/rename the intro video to the desired name before encoding and it would pick that up, but that seems like overkill.
The macros are good, but I think they could use some enhancements.

How to load only changed portion of YAML file in Ruamel

I am using the ruamel.yaml library to load and process a YAML file.
The YAML file can get updated after I have called
yaml.load(yaml_file_path)
So, I need to call load() on the same YAML file multiple times.
Is there a way, or an optimization parameter to pass to the loader, to load only the new entries in the YAML file?
There is no such facility currently built into ruamel.yaml.
If a file consists of multiple YAML documents, you can optimize the loading by splitting the file on the document marker (---). This is fairly trivial, and you can then load a single document from start to finish.
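As a rough sketch of that approach, assuming documents are separated by a --- marker at the start of a line, that no scalar contains such a marker, and caching on each document's raw text (the helper names here are made up):
import hashlib
from ruamel.yaml import YAML

yaml = YAML(typ='safe')  # much faster than the default round-trip loader
_cache = {}              # digest of a document's raw text -> parsed data

def load_documents(path):
    with open(path) as f:
        raw = f.read()
    docs = []
    for chunk in raw.split('\n---\n'):  # naive split on the document marker
        key = hashlib.sha1(chunk.encode()).hexdigest()
        if key not in _cache:           # reparse only documents whose text changed
            _cache[key] = yaml.load(chunk)
        docs.append(_cache[key])
    return docs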
If you only want to reload parts of a document, things get more difficult. If there are anchors and aliases involved, there is no easy way to do this, as you may need a (non-updated) anchor definition from one part for an alias in an updated part. If there are no such aliases, and you know the structure of your file and have a way to determine what got updated, you can do partial loads and update your data structure. You would need to do some parsing of the YAML document yourself, but if you only use a subset of YAML's possibilities, this is often possible.
E.g. if you know that you only have simple scalar keys in the root-level mapping of a YAML document, you can parse the document and extract the non-indented strings that are followed by the value indicator. Any such string that is not in your "old" data structure is a new key, and its value should be parsed (i.e. the YAML document content until the next non-indented string).
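A hedged sketch of that scan, assuming block style with plain, unquoted root-level keys and no anchors, aliases, or flow mappings (the regex and function name are my own):
import re

# matches "key:" or "key: value" at column 0, i.e. a root-level plain scalar key
ROOT_KEY = re.compile(r'^([^\s#:][^:]*):(\s|$)')

def root_level_keys(text):
    return [m.group(1).rstrip() for line in text.splitlines()
            if (m := ROOT_KEY.match(line))]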
The above is far less trivial to do for any added data that is not added at the root level (whether mapping or sequence).
Since there is no indication within the YAML specification of the complexity of a YAML document (i.e. whether it includes anchors, aliases, tags, etc.), none of this is easy to build into ruamel.yaml itself.
Without specific information on the format of your YAML document and on what can get updated, specific implementation details cannot be given. I assume, however, that you will not update and write out the loaded data; if that is so, make sure to use
yaml = YAML(typ='safe')
when possible, as this will give you much faster loading times than the default round-trip loader provides.

Editing xmp tags pyexiv2 modify_xmp does not replace xmp tags correctly

I am trying to write a script that will loop through a large number of images and write new xmp tags based on certain criteria.
I am using pyexiv2 to read and modify the 'Xmp.dc.subject' tag.
I am able to assign a new set of tags to the image, and using img.read_xmp() to check my results shows that the new set of tags has replaced the old set, as expected.
However, when I check the properties in windows explorer or another photo manager, the old tags remain in addition to the new set of tags.
See my code below:
from pyexiv2 import Image

path = some_path
img = Image(path)
tags = img.read_xmp()
tags.get('Xmp.dc.subject')  # outputs the list of tags: ['old_tag1', 'old_tag2', 'old_tag3']
newtags = ['new_tag1', 'new_tag2']
dict1 = {'Xmp.dc.subject': newtags}
img.modify_xmp(dict1)
img.close()
Now, when I open the file's properties in Explorer or check in a photo manager, the tags on the file are ['old_tag1', 'old_tag2', 'old_tag3', 'new_tag1', 'new_tag2']. The expected behaviour, as stated in the pyexiv2 tutorial, is that the new list of tags replaces the old tags.
I have tried using py3exiv2, but I am having problems with that library due to an error referencing Microsoft Visual Studio.
Is there a way to achieve my outcome ideally using pyexiv2, or alternatively using any other method?
I found the solution to this problem.
Windows Explorer (and Adobe Bridge, and I'm guessing other software too) displays both XMP tags and IPTC tags.
So if you modify only the XMP tags, Explorer (or other software) will show the new XMP tags as well as the old IPTC tags.
The solution is to use modify_xmp() and modify_iptc() to change both sets of tags.
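A minimal sketch of that fix, assuming 'Iptc.Application2.Keywords' is the IPTC field your viewers read (the file path is illustrative):
from pyexiv2 import Image

new_tags = ['new_tag1', 'new_tag2']
img = Image('photo.jpg')                       # illustrative path
img.modify_xmp({'Xmp.dc.subject': new_tags})   # replaces the XMP keyword list
img.modify_iptc({'Iptc.Application2.Keywords': new_tags})  # and the IPTC one
img.close()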

Mass Classification Attribute Removal

I was wondering if anyone else has gone through a massive classification restructuring process and knows of a way to efficiently remove attributes from a classification, preferably via MXLoader or the import/export functionality built into Maximo. I know this can be achieved by going into every individual classification that needs specific attributes removed; I'm just hoping someone might know a better way to accomplish it.
Essentially, what I am hoping to accomplish is the following, without going directly through the UI for each classification.
Pre-update:
Classification A
    Attribute 1
    Attribute 2
    Attribute 3
    Attribute 4
    Attribute 5
Post-update:
Classification A
    Attribute 1
    Attribute 2
    Attribute 5
I have tried exporting the .csv file using the object structure we built (supporting CLASSSTRUCTURE, CLASSSPEC, and CLASSUSEWITH), removing an unwanted attribute, and importing the file back using the sync function, but I was not successful and am not 100% sure where to go from there, if the feat is indeed possible via these means.
Thanks ahead of time for any potential help/support.
To do this with a data load, you'll need to load XML, not flat/CSV. The XML file will need to be loaded via the External Systems > Enterprise Services tab, after you've built an Enterprise Service for your Object Structure and associated that Enterprise Service with an External System. Your top-level tag for a given classification will need to have the attribute action="Change", and the child tag for the attribute to remove will need the attribute action="Delete". With a little string concatenation in your spreadsheet tool, you should be able to easily upcycle your CSV into the necessary XML, along the lines of the skeleton below.
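A hedged skeleton of such a payload: the root and set tag names follow Maximo's Sync<ObjectStructure> convention but depend on your actual Object Structure name (MYCLASSSTRUCT here is made up), and the key values are placeholders:
<SyncMYCLASSSTRUCT xmlns="http://www.ibm.com/maximo">
  <MYCLASSSTRUCTSet>
    <!-- top-level tag for the classification being changed -->
    <CLASSSTRUCTURE action="Change">
      <CLASSSTRUCTUREID>1234</CLASSSTRUCTUREID>
      <!-- child tags for the attributes to remove -->
      <CLASSSPEC action="Delete">
        <ASSETATTRID>ATTRIBUTE3</ASSETATTRID>
      </CLASSSPEC>
      <CLASSSPEC action="Delete">
        <ASSETATTRID>ATTRIBUTE4</ASSETATTRID>
      </CLASSSPEC>
    </CLASSSTRUCTURE>
  </MYCLASSSTRUCTSet>
</SyncMYCLASSSTRUCT>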

Using a list for a feature in an ML model

I want to run a machine learning algorithm on some data, so I'm exporting the data into a file first.
But one of the features of the text I'm classifying is a list of tags,
and each text can have multiple tags, e.g. ["mystery", "thriller"].
Is it recommended that, when I write to my CSV file for exporting the data, I write that entire list as one of the features of my data (the "tags" feature)?
Or is it better to make a separate feature for each tag? The only problem then is that most examples will have only one tag, so the other feature columns for those will be blank.
So it seems like writing this list of tags as one feature makes the most sense, but then, when parsing it for training, would I still treat every element of that list as its own feature or not?
If you do it as a single feature, just make sure to use a delimiter to separate the tags that won't occur in any of the tags and also isn't a comma (as that will mess with the CSV format); something like | would probably do fine. When you go to build your models and read in that list of tags, you can then split it on that delimiter. In Java this would look like:
String[] tagList = inputString.split("\\|");  // split() takes a regex, so | must be escaped
I'm sure most languages will have a similar method to do this.
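For a sketch of the full round trip in Python (the file name and sample rows are made up; MultiLabelBinarizer from scikit-learn is one common way to turn the split tags back into per-tag binary features):
import csv
from sklearn.preprocessing import MultiLabelBinarizer

rows = [('some text', ['mystery', 'thriller']),
        ('other text', ['romance'])]

# export: join each tag list into a single '|'-delimited "tags" column
with open('data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for text, tags in rows:
        writer.writerow([text, '|'.join(tags)])

# import: split the column back into lists, then multi-hot encode them
with open('data.csv', newline='') as f:
    tag_lists = [row[1].split('|') for row in csv.reader(f)]

X_tags = MultiLabelBinarizer().fit_transform(tag_lists)  # one binary column per tag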
