How to prepare test data for textsum? - python-3.x

I have been able to successfully run the pre-trained model of TextSum (TensorFlow 1.2.1). The output consists of summaries of CNN & Dailymail articles (which are chunked into bin format prior to testing).
I have also been able to create the aforementioned bin-format test data for CNN/Dailymail articles & the vocab file (per the instructions here). However, I am not able to create my own test data to check how good the summary is. I have tried modifying the make_datafiles.py code to remove hard-coded values. I am able to create tokenized files, but the next step seems to be failing. It would be great if someone could help me understand what url_lists is being used for. Per the GitHub readme -
"For each of the url lists all_train.txt, all_val.txt and all_test.txt, the corresponding tokenized stories are read from file, lowercased and written to serialized binary files train.bin, val.bin and test.bin. These will be placed in the newly-created finished_files directory."
How is a URL such as http://web.archive.org/web/20150401100102id_/http://www.cnn.com/2015/04/01/europe/france-germanwings-plane-crash-main/ being mapped to the corresponding story in my data folder? If someone has had success with this, please do let me know how to go about this. Thanks in advance!

Update: I was able to figure out how to use my own data to create bin files for testing (and avoid using url_lists altogether).
This will be helpful - https://github.com/dondon2475848/make_datafiles_for_pgn
I will update the answer once I figure out how to fix ROUGE scoring for this.
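
For reference, in the original make_datafiles.py the URL-to-story mapping is just a hash: the SHA-1 hex digest of each URL is used as the .story file name, which is how the url_lists are matched against the data folder. The bin format itself is a length-prefixed serialized tf.Example with an "article" and an "abstract" feature. Below is a minimal sketch adapted from that script's logic; the output path and the example strings are hypothetical, and the text is assumed to be already tokenized and lowercased, with abstract sentences wrapped in <s>...</s> tags.

    # Minimal sketch: write one article/abstract pair to a textsum-style .bin file.
    import struct
    from tensorflow.core.example import example_pb2

    article = "my tokenized , lowercased article text ."   # hypothetical
    abstract = "<s> my one-sentence summary . </s>"        # hypothetical

    with open("finished_files/test.bin", "wb") as writer:
        tf_example = example_pb2.Example()
        tf_example.features.feature["article"].bytes_list.value.extend([article.encode()])
        tf_example.features.feature["abstract"].bytes_list.value.extend([abstract.encode()])
        tf_example_str = tf_example.SerializeToString()
        str_len = len(tf_example_str)
        writer.write(struct.pack("q", str_len))                    # 8-byte length prefix
        writer.write(struct.pack("%ds" % str_len, tf_example_str))

Repeat the loop body for every article to pack multiple examples into one file.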

Revit API - Finding the path of nested links

I am trying to find the file paths of nested links and have run into a problem.
I am interested in the 2nd level, meaning finding the file path of a link inside one of the links in the file that I currently have open.
My problem occurs when the reference type is Overlay; if the reference type is Attachment, I don't have a problem.
I am also assuming the files are in the BIM 360 cloud, but I believe I would have the same problem if the files were local.
I am able to get the RevitLinkInstance and RevitLinkType object for those links.
However, if I try to use GetLinkDocument() on the RevitLinkInstance I get null, and trying to get the InSessionPath using GetExternalResourceReferences() on the RevitLinkType gets me "Autodesk Docs://" and stops there without the real path.
I will appreciate any help, including letting me know whether this is at all possible.
Thank you,
To my knowledge it is impossible to get the link within a link through one Document. This is because with Overlay, the nested link is not stored inside the link file. That is also why it works with Attachment: Revit then stores the nested link in that file, so it becomes part of that Revit file.
What you could do instead is search for the file path of the first link. Then use that file path in the OpenAndActivateDocument(string filepath) method. Repeat the first step to get all RevitLinkInstances of that document. You can get the needed information from each link, store it in a variable, and close the link document.
A rough example of this approach is sketched below.
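Here is a rough Python sketch of the workaround (for pyRevit or RevitPythonShell, where __revit__ is the UIApplication); it must run inside Revit, error handling is omitted, and it assumes locally accessible file paths. Cloud (BIM 360) models would need the cloud-path APIs instead.

    # Rough sketch: walk the first-level links, open each one, and collect
    # the paths of its own (second-level) links.
    from Autodesk.Revit.DB import FilteredElementCollector, RevitLinkInstance

    uiapp = __revit__                        # UIApplication in pyRevit
    host_doc = uiapp.ActiveUIDocument.Document

    # Step 1: remember the file paths of the loaded first-level links.
    first_level_paths = []
    for link in FilteredElementCollector(host_doc).OfClass(RevitLinkInstance):
        link_doc = link.GetLinkDocument()    # None for unloaded links
        if link_doc is not None:
            first_level_paths.append(link_doc.PathName)

    # Step 2: open each first-level link as a regular document and repeat.
    nested_paths = []
    for path in first_level_paths:
        uidoc = uiapp.OpenAndActivateDocument(path)
        for link in FilteredElementCollector(uidoc.Document).OfClass(RevitLinkInstance):
            nested = link.GetLinkDocument()
            if nested is not None:
                nested_paths.append(nested.PathName)
        # Reactivate the host before closing; Revit cannot close the
        # currently active document.
        uiapp.OpenAndActivateDocument(host_doc.PathName)
        uidoc.Document.Close(False)

    print(nested_paths)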

Insert Hyperlink in Access Database (pyodbc)

Here is the situation: I'm 'having fun' using Microsoft Access for the first time, for small personal project/tool ideas.
I don't know anything about VBA yet, and unless I can't do without it, I don't plan to learn it this time (there is already a lot else to cover).
So I tried to use Python to automate filling the main table. I found the pyodbc package and succeeded in connecting to my database and reading and writing some data.
However, I wanted to experiment a little further, and one of the fields could contain hyperlinks (this could be handled somewhere else in another script later, but I am curious about the functionality anyway)...
But I couldn't figure out how to insert hyperlink data into the table. I only get the display text set, but not the target.
Is this feasible using pyodbc or am I on the wrong track?
Thanks in advance!
Emmanuel
The hyperlink field in MS Access consists of three parts, separated by #:
display text # filename or target # location within the document
So, for example, the data for a field can look like this:
StackOverflow#http://www.stackoverflow.com#
See the docs: https://learn.microsoft.com/en-us/office/vba/api/access.application.hyperlinkpart
and samples here also: http://allenbrowne.com/casu-09.html
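
For illustration, a minimal pyodbc sketch; the connection string, the table name "Links", and the column name "Website" are hypothetical, and it is worth verifying that the Access ODBC driver stores the string as a true hyperlink rather than plain text (the three-part # format is what Access itself expects).

    # Minimal sketch: insert a three-part hyperlink value via pyodbc.
    import pyodbc

    conn = pyodbc.connect(
        r"DRIVER={Microsoft Access Driver (*.mdb, *.accdb)};"
        r"DBQ=C:\path\to\database.accdb;"
    )
    cur = conn.cursor()

    # display text # target # optional location within the document
    link_value = "StackOverflow#http://www.stackoverflow.com#"
    cur.execute("INSERT INTO Links (Website) VALUES (?)", link_value)
    conn.commit()
    conn.close()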

Shared Documentation in GitLab over several repos

We have several microservices for which we have artifacts in GitLab (for example Helm charts, values files, ...).
For better documentation we have a deployment.png that shows the deployment path: where we get the images from, how we import our Helm charts, and how to access the OpenShift cluster from the jump host.
This diagram should be included in every microservice repo so that everybody who has to deal with the microservices sees the diagram.
Now I don't want to have duplicated content, and I don't want to check in and maintain the deployment.png and the text below it in every microservice repo.
Is there a good solution for this use case?
We thought about having an extra documentation repo and pulling the relevant README with the image into the respective microservice READMEs in each microservice repo via a link...
Any idea or best practice?
If I'm understanding correctly, you want a way to reference the same image and text files from multiple documents. All you need to know is the URL of your image; then it can be referenced/embedded.
For example, ![deployment img](gitlab.com/.../...)
A statement such as this will embed a file or image in a markdown file. If we remove the ! from the front, it simply links to the file location as a hyperlink.
The same strategy goes for the text file.

A Study on the Modification of PDF in nodejs

Project Environment
The environment we are currently developing on is Windows 10, with Node.js 10.16.0 and the Express web framework. The actual deployment environment is a Linux Ubuntu server; the rest is the same.
What technology do you want to implement?
The technology that I want to implement: take the information that a user entered when joining the membership, for example name, age, address, and phone number, and automatically put it into the input text boxes so that the user only needs to fill in the remaining information in the PDF. (The PDF is on some of the webpages.)
Once all the information is entered, the PDF is saved and the document is sent to another vendor, which is the end of the flow.
Current Problems
We spent about four days looking at PDFs, and we tried to create PDFs by implementing the outline, structure, and code just as described on this site: https://web.archive.org/web/20141010035745/http://gnupdf.org/Introduction_to_PDF
However, most PDFs are not that simple; their streams seem to be compressed with FlateDecode. So I also looked at "Data extraction from /Filter /FlateDecode PDF stream in PHP" and tried to decompress them using QPDF.
After decompressing one, I thought it would be easy to find the difference by entering "Kim" in the first-name field and comparing against the PDF without it.
However, there is far too much difference even though only three characters were added... And the PDF structure itself is too difficult and complex to proceed with.
Note : https://www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf (PDF official document in English)
Is there a way to solve the problem now?
It sounds like you want to create a PDF from scratch, and possibly extract data from it, and you are finding this a more difficult prospect than you first imagined.
Check out my answer here on why PDF creation and reading is non-trivial and why you should reach for a tool to help you do this:
https://stackoverflow.com/a/53357682/1669243

TensorFlow example for text classification - how to evaluate your own text?

Does anyone have full steps and an example for the TensorFlow text classification example: passing in your own text files and getting them evaluated against the existing model that comes with the examples, using train.py as documented?
Also, what if I wanted to train on a different input set of, say, 1000 of my own text files, and then use that model for new text files? I know there is documentation, but it is terse for someone who is not familiar with the text classification process.
I was able to run the image example against my own images, as that only required swapping out one image .jpg file name for my new image file, but for text it seems to be more involved.
Thanks
Here is an example: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/skflow/text_classification.py
You can set the flag test_with_fake_data to use the fake data in text_train.csv (training samples) and text_test.csv (testing samples) here. Next, you can modify these two files to include whatever data you'd like to have. You will need to do some preprocessing if your existing text files are in a different format.
You need to load the vocabulary file saved during training and process your new text with that. See the eval.py file here
Change the data parameters with your input text and proceed.
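
As a minimal sketch of that step, assuming the vocabulary was saved during training with tf.contrib.learn's VocabularyProcessor (TensorFlow 1.x, as in the eval.py mentioned above); the paths and strings are hypothetical:

    # Minimal sketch: restore the training vocabulary and map new text
    # to the id sequences the model expects.
    import numpy as np
    from tensorflow.contrib import learn

    vocab_path = "runs/1234567890/vocab"    # hypothetical path from training
    vocab_processor = learn.preprocessing.VocabularyProcessor.restore(vocab_path)

    raw_texts = ["this is a new document to classify",
                 "another unseen example"]
    x_test = np.array(list(vocab_processor.transform(raw_texts)))

    # x_test can now be fed to the restored model's input placeholder.
    print(x_test.shape)     # (2, max_document_length)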
