TensorFlow example for text classification - how to evaluate your own text?

Does anyone have full steps and an example for the TensorFlow text classification example - passing in your own text files and getting them evaluated against the existing model that comes with the examples, using train.py as documented?
Also, if I wanted to train on a different input set of, say, 1000 text files of my own samples, how would I then use that model for new text files? I know there is documentation, but it is terse for someone who is not familiar with the text classification process.
I was able to run the image example against my own images, as that only required swapping out one image .jpg file name for my new image file, but for text it seems to be more involved.
Thanks

Here is an example: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/skflow/text_classification.py
You can set the flag test_with_fake_data to use the fake data in text_train.csv (training samples) and text_test.csv (testing samples) here. Next, you can modify these two files to include whatever data you'd like to have. You will need to do some preprocessing if your existing text files are in a different format.
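If your samples are plain .txt files, a small script can reshape them. Below is a minimal preprocessing sketch (not part of the example itself) that writes one CSV row per file; the neg/pos folder layout and the label-first column order are assumptions, so check the fake-data CSVs for the exact format before relying on it.

import csv
import glob
import os

# Hypothetical layout: one folder per class, one sample per .txt file.
with open("text_train.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for label, folder in enumerate(["neg", "pos"]):
        for path in sorted(glob.glob(os.path.join(folder, "*.txt"))):
            with open(path) as f:
                text = f.read().replace("\n", " ").strip()
            writer.writerow([label, text])  # label column first, then the raw text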

You need to load the vocabulary file saved during training and process your new text with that. See the eval.py file here
Change the data parameters to point at your input text and proceed.
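As a sketch of that vocabulary step (TF 1.x contrib API; the run directory path is a placeholder based on the linked project's layout, so adjust it to your own checkpoint):

import numpy as np
from tensorflow.contrib import learn

# Restore the vocabulary that was fitted during training.
vocab_path = "runs/1513337861/vocab"  # hypothetical run directory
vocab_processor = learn.preprocessing.VocabularyProcessor.restore(vocab_path)

# Map raw strings into the same word-id space the model was trained on.
raw_examples = ["a new sentence to classify", "another example"]
x_eval = np.array(list(vocab_processor.transform(raw_examples)))
# x_eval can now be fed to the input placeholder of the restored graph.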

Related

Can't use custom 3D model for visualization

I'm using Modelica.Mechanics.MultiBody.Visualizers.FixedShape to render custom 3D models. I tried using .dxf and .stl files in ASCII format, but neither works.
For .dxf I get no error but the model doesn't show up.
For .stl I get an error saying there is no plugin for that file type.
OpenModelica supports adding an external shape using the format shapeType=modelica://<Modelica-name>/<relative-path-file-name>. Preferably, create a Resources folder at the same level under the library folder and add the .dxf files there. Hope this helps!

Python Flask image manipulation library

I googled it, but the number of libraries is overwhelming.
I'm looking for an image manipulation library, written in Python, that I can use from Flask.
I need to solve this simple set of operations:
Upload the image.
Resize / scale the image (maintaining the proportions).
Save the new image in a specific path with a specific name.
Remove the original image.
Also, I notice that many promising projects have been unchanged for the last 2 years...
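For what it's worth, Pillow (the actively maintained PIL fork) covers all four operations and drops into Flask easily. A minimal sketch, with the route, form field name and paths as placeholders:

import os
from flask import Flask, request
from PIL import Image
from werkzeug.utils import secure_filename

app = Flask(__name__)
UPLOAD_DIR = "/tmp/uploads"
OUTPUT_DIR = "/tmp/resized"
os.makedirs(UPLOAD_DIR, exist_ok=True)
os.makedirs(OUTPUT_DIR, exist_ok=True)

@app.route("/upload", methods=["POST"])
def upload():
    f = request.files["image"]                 # 1. upload the image
    name = secure_filename(f.filename)
    src = os.path.join(UPLOAD_DIR, name)
    f.save(src)
    img = Image.open(src)
    img.thumbnail((300, 300))                  # 2. resize, keeping proportions
    dst = os.path.join(OUTPUT_DIR, "small_" + name)
    img.save(dst)                              # 3. save under a specific path/name
    os.remove(src)                             # 4. remove the original
    return "saved " + dst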

How to prepare test data for textsum?

I have been able to successfully run the pre-trained model of TextSum (TensorFlow 1.2.1). The output consists of summaries of CNN & Dailymail articles (which are chunked into bin format prior to testing).
I have also been able to create the aforementioned bin-format test data for the CNN/Dailymail articles & vocab file (per instructions here). However, I am not able to create my own test data to check how good the summary is. I have tried modifying the make_datafiles.py code to remove hard-coded values. I am able to create tokenized files, but the next step seems to be failing. It would be great if someone could help me understand what the url_lists are used for. Per the GitHub readme:
"For each of the url lists all_train.txt, all_val.txt and all_test.txt, the corresponding tokenized stories are read from file, lowercased and written to serialized binary files train.bin, val.bin and test.bin. These will be placed in the newly-created finished_files directory."
How is a URL such as http://web.archive.org/web/20150401100102id_/http://www.cnn.com/2015/04/01/europe/france-germanwings-plane-crash-main/ being mapped to the corresponding story in my data folder? If someone has had success with this, please do let me know how to go about this. Thanks in advance!
Update: I was able to figure out how to use my own data to create bin files for testing (and avoid using url_lists altogether).
This will be helpful - https://github.com/dondon2475848/make_datafiles_for_pgn
Will update the answer once I figure out how to fix ROUGE scoring for this.
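For reference, a hedged sketch of the write step the linked script performs: each example is a serialized tf.Example (with article and abstract features) prefixed by its 8-byte length. This mirrors that repo's logic as I understand it, so verify against the actual make_datafiles code:

import struct
from tensorflow.core.example import example_pb2

# Tokenized, lowercased input; <s>/</s> mark abstract sentence boundaries.
article = b"article text , already tokenized and lowercased ."
abstract = b"<s> one-sentence summary . </s>"

tf_example = example_pb2.Example()
tf_example.features.feature["article"].bytes_list.value.extend([article])
tf_example.features.feature["abstract"].bytes_list.value.extend([abstract])

serialized = tf_example.SerializeToString()
with open("test.bin", "wb") as writer:
    writer.write(struct.pack("q", len(serialized)))  # 8-byte length prefix
    writer.write(serialized)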

OFFIS DICOM scope toolkit Structured report link to image

Does anybody know how to create a structured report using the DICOMscope toolkit via the console (Ubuntu 16.04), with a link to a related image?
The thing is that I have an image of some kind of trauma, and I have to connect it with a report which is in a text file. The final file should be in .dcm format, containing the annotation and a link to the image. I have to use the DICOMscope program.
Maybe others refrain from answering because your question needs a very long answer. I cannot provide step-by-step instructions, but here are a few hints.
The way I would go (assuming that your image is available in DICOM format) is to:
obtain a sample structured report. I think that the "simple" Basic Text SR is what you want to go for. You can find some samples here.
convert the SR to an XML file using dsr2xml
edit the contents in XML. Do not forget to include your image reference in (0040,a730) Content Sequence -> (0008,1199) Referenced SOP Sequence
convert the XML back to DICOM SR using xml2dsr
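If the XML round trip turns out to be cumbersome, here is an alternative sketch using pydicom instead of the DCMTK tools. It builds a deliberately minimal (not fully standards-complete) Basic Text SR carrying the report text and a reference to the image; all file names are placeholders:

import datetime
import pydicom
from pydicom.dataset import Dataset, FileDataset, FileMetaDataset
from pydicom.uid import ExplicitVRLittleEndian, generate_uid

image = pydicom.dcmread("trauma_image.dcm")  # the image to link to

meta = FileMetaDataset()
meta.MediaStorageSOPClassUID = "1.2.840.10008.5.1.4.1.1.88.11"  # Basic Text SR
meta.MediaStorageSOPInstanceUID = generate_uid()
meta.TransferSyntaxUID = ExplicitVRLittleEndian

ds = FileDataset("report.dcm", {}, file_meta=meta, preamble=b"\x00" * 128)
ds.is_little_endian = True
ds.is_implicit_VR = False
ds.SOPClassUID = meta.MediaStorageSOPClassUID
ds.SOPInstanceUID = meta.MediaStorageSOPInstanceUID
ds.Modality = "SR"
ds.StudyInstanceUID = image.StudyInstanceUID   # stay in the same study
ds.SeriesInstanceUID = generate_uid()
ds.ContentDate = datetime.date.today().strftime("%Y%m%d")

ds.ValueType = "CONTAINER"                     # root content item
ds.ContinuityOfContent = "SEPARATE"

text_item = Dataset()                          # the report text
text_item.RelationshipType = "CONTAINS"
text_item.ValueType = "TEXT"
text_item.TextValue = open("report.txt").read()

image_item = Dataset()                         # (0040,A730) item of type IMAGE
image_item.RelationshipType = "CONTAINS"
image_item.ValueType = "IMAGE"
ref = Dataset()                                # (0008,1199) Referenced SOP Sequence
ref.ReferencedSOPClassUID = image.SOPClassUID
ref.ReferencedSOPInstanceUID = image.SOPInstanceUID
image_item.ReferencedSOPSequence = [ref]

ds.ContentSequence = [text_item, image_item]
ds.save_as("report.dcm")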
By the way: From your question, I did not really understand why you want to use a structured report, as you wrote that your report is plain text. Instead of digging into the complex structure of SR, you may want to consider exporting the report to an Encapsulated PDF document which can reference images as well.

Given a PDF, how to extract the images *and their locations on the page* from the command line?

I have a PDF which includes text and images. I want to extract the images from the PDF using the Linux command line. I can use pdfimages to extract the images, but I also want to find the location on each page where each image appears. pdfimages can tell me what page each image is on (from the filename), but that's all it gives me. Is there any other FLOSS tool that can do this?
Well, I think the PDF must contain the information for placing the images, so this should be possible. Failing a tool that reads it directly, a solution could be, for example:
Convert each pdf page to an image with pdftoppm
Extract the images from each page with pdfimages
Convert the images to a single 8-bit grey-scale channel (for faster analysis) with cvCvtColor
Object detection with matchTemplate
Step 1 may look similar to this. Step 2 may look like:
for i in {1..100}; do pdfimages -f $i -l $i file.pdf page$i; done
Step 3: here* is a simple example.
In Step 4 you should not have problems tuning the matching, because the image will be an exact match: matchTemplate(imageToSearch, pdfPageImg, outputMap, 'CV_TM_SQDIFF')
(* - link removed as it now appears to be pointing towards a ransomware site)
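As a sketch of Steps 3 and 4 with OpenCV's Python bindings (the file names are hypothetical outputs of Steps 1 and 2; note that pdfimages extracts at the image's native resolution, so you may need to render the page at a matching DPI or rescale the template first):

import cv2

# Step 3: load both as single-channel grayscale for faster matching.
page = cv2.imread("page1.png", cv2.IMREAD_GRAYSCALE)         # rendered page (Step 1)
template = cv2.imread("page1-000.png", cv2.IMREAD_GRAYSCALE)  # extracted image (Step 2)

# Step 4: with TM_SQDIFF, lower values mean a better match.
result = cv2.matchTemplate(page, template, cv2.TM_SQDIFF)
min_val, _, min_loc, _ = cv2.minMaxLoc(result)
x, y = min_loc  # top-left corner of the best match on the rendered page
h, w = template.shape
print("image at (%d, %d), size %dx%d" % (x, y, w, h))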
There's an -xml switch for the pdftohtml command which will give image position, dimension and source information.
pdftohtml -xml file.pdf
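The resulting XML can then be walked for <image> elements, which carry top/left/width/height and src attributes. A short parsing sketch (assuming the output file is named file.xml):

import xml.etree.ElementTree as ET

root = ET.parse("file.xml").getroot()
for page in root.iter("page"):
    for img in page.iter("image"):
        print(page.get("number"), img.get("src"),
              "left=%s top=%s %sx%s" % (img.get("left"), img.get("top"),
                                        img.get("width"), img.get("height")))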
There is no guarantee in PDF that a reused image is not stored as a separate image each time. There is very little image metadata in a PDF file beyond the page location and the image's actual size on the page. I wrote an article explaining how images are stored inside a PDF at http://www.jpedal.org/PDFblog/2010/09/understanding-the-pdf-file-format-images/
