Extracting a URL from a hotspot - lotus-notes

I've got a collection of URL hotspots in a Notes document. I'd like to extract the URLs associated with the hotspots. Any hints on how to approach this?
thanks
clem

I recommend using the DXL export features and then walking the resulting XML to find the URLs. Try it, and if you have issues, come back and ask questions about the specific piece of code.
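As a minimal sketch of that approach (not from the answer above), assuming a Java agent with a Session and the target Document in hand; the <urllink href="..."> element is how DXL typically represents URL hotspots in rich text, but verify the element and attribute names against your own export:

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import lotus.domino.DxlExporter;
import lotus.domino.Session;

public class HotspotUrls {
    public static void print(Session session, lotus.domino.Document doc) throws Exception {
        DxlExporter exporter = session.createDxlExporter();
        String dxl = exporter.exportDxl(doc); // the whole document as XML

        org.w3c.dom.Document xml = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(dxl)));

        // URL hotspots normally come out as <urllink href="..."> - an assumption,
        // so check one exported document first
        NodeList links = xml.getElementsByTagName("urllink");
        for (int i = 0; i < links.getLength(); i++) {
            System.out.println(((Element) links.item(i)).getAttribute("href"));
        }
    }
}

The same can be done from LotusScript with NotesDXLExporter.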

Related

neo4j: fulltext index on external documents and attachments

This is the situation. Don't ask why, it is the way it is.
I created a simple fulltext index over the document contents by grabbing the content from the document server and adding the plain, unformatted content as a new property (raw_content) to each Document node in the Neo4j DB.
Then I created a fulltext index like:
CALL db.index.fulltext.createNodeIndex('content', ['Document'], ['title', 'teaser', 'raw_content'])
So far so good. Search works very well.
Now I want to index the attachments as well. I've got attachment URLs for each document, which I can look up by the doc ID.
So, before I slide into an antipattern, I'd like to ask the community how to go about this. I've got two ways in mind:
1. Similar to the way I index the raw_content: is there a way to make Lucene fetch and parse the URLs that I give to it?
2. A batch job does all the parsing and adds the content to new fields like "attachment01_content", ... (a sketch of this follows below).
Solution 1 would be preferred, but I did not find any documentation on this.
Solution 2 is ugly, especially because Lucene can handle PDF, DOC, and so on.
Any ideas on how to solve this?
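For what it's worth, a minimal sketch of what the batch in solution 2 could look like, assuming Apache Tika for text extraction and the official Neo4j Java driver; the attachment URL scheme, credentials, doc ID, and property name are all placeholders:

import java.io.InputStream;
import java.net.URL;
import org.apache.tika.Tika;
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;
import org.neo4j.driver.Values;

public class AttachmentBatch {
    public static void main(String[] args) throws Exception {
        Tika tika = new Tika(); // detects PDF, DOC, ... and extracts plain text
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                     AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {
            String docid = "12345"; // placeholder doc ID
            String attachmentUrl = "https://document-server/attachments/" + docid; // placeholder
            try (InputStream in = new URL(attachmentUrl).openStream()) {
                String text = tika.parseToString(in);
                // store the extracted text on the Document node, like raw_content
                session.run("MATCH (d:Document {docid: $docid}) SET d.attachment01_content = $text",
                        Values.parameters("docid", docid, "text", text));
            }
        }
    }
}

After the batch runs, the new property still has to be listed in the fulltext index, e.g. by recreating the index with 'attachment01_content' added to the field list.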

How do I go about changing the contents of a page based on the URL? Express.js, Node.js

Okay, I know this question was a bit confusing, so let me break it down a bit further. For example, let's say I have the URL https://example.com. I have an open GET endpoint at https://example.com/user/* that returns a specific user's information based on the contents of the "*". Let's say a specific user is at https://example.com/user/12345. On an HTML page, I would like to show that user's profile contents and some of their hobbies. Again, this is theoretical.
I have explored various solutions such as Handlebars.js, which can dynamically change values based on the server request. However, this solution does not always work. Take a search engine, for example, at https://mysearchengine.com/search?query=dogs. Here, we have a search query for dogs. How do I render all of the results to an HTML document without using a dynamic content module like Handlebars?
This question was particularly difficult to ask, so please do not mark this as "not enough information". I would be more than happy to clarify any questions you may have about the nature of my query. Thank you so much in advance,
Flight Dude.
Just wanted to let y'all know I found my answer: EJS. Thanks!

How to parse a document using crawler4j

I wanted to parse all the documents containing some text I enter as a "query", using crawler4j in Eclipse.
Any ideas?
Not really a "direct" answer, but I have also played with crawling these last few days. I looked first at crawler4j, then stumbled on jsoup. I did not play much with the crawler, but jsoup turns out to be quite an easy tool for parsing, hence my suggestion. I guess the crawler is good if you really need to crawl a part of the web, but jsoup really shines as a parser: it is similar to jQuery in terms of selecting nodes and so on. So perhaps use the crawler to collect documents first, then parse them using jsoup. Here's a quick example:
// import org.jsoup.Jsoup; org.jsoup.nodes.Document; org.jsoup.select.Elements;
Document doc = Jsoup.connect("http://example.com")
        .userAgent("Mozilla") // some servers reject Java's default user agent
        .timeout(5000)        // connect/read timeout in milliseconds
        .get();               // fetches and parses; throws IOException on failure
Elements els = doc.select("li"); // CSS-style selector, as in jQuery
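And if you do combine the two as suggested (crawler4j collects pages, jsoup parses them), here is a rough sketch of a crawler4j visitor; the seed filter and query string are placeholders, and the overrides shown are from the crawler4j 4.x API:

import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.parser.HtmlParseData;
import edu.uci.ics.crawler4j.url.WebURL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class QueryCrawler extends WebCrawler {
    private static final String QUERY = "dogs"; // placeholder search text

    @Override
    public boolean shouldVisit(Page referringPage, WebURL url) {
        return url.getURL().startsWith("https://example.com/"); // stay on one site
    }

    @Override
    public void visit(Page page) {
        if (page.getParseData() instanceof HtmlParseData) {
            String html = ((HtmlParseData) page.getParseData()).getHtml();
            Document doc = Jsoup.parse(html);        // hand the crawled HTML to jsoup
            if (doc.body().text().contains(QUERY)) { // plain-text match on the body
                System.out.println("Match: " + page.getWebURL().getURL());
            }
        }
    }
}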

List Schema - URL Syntax

I ran across this a couple of months ago and unfortunately did not save the link anywhere.
Basically, there is a URL syntax that extracts a SharePoint list's basic schema and exports it to the browser in XML format. It gives the basic information for the fields and views of the list.
Resolution:
http://blogs.msdn.com/b/kaevans/archive/2009/05/01/getting-xml-data-from-a-sharepoint-list-the-easy-way.aspx
You just have to put the right combination of words into a search to get the result you need.
For everyone else's sake:
http://<PATH TO SITE>/_vti_bin/owssvr.dll?Cmd=ExportList&List={GUID}
The list GUID can be found by going to List Settings and pulling the GUID out of the URL.
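If you want to pull that XML programmatically instead of through the browser, here is a minimal Java sketch; the site path and GUID are placeholders, and on most farms you would also have to deal with NTLM/Windows authentication:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ListSchemaFetch {
    public static void main(String[] args) throws Exception {
        String site = "http://yourserver/sites/yoursite";               // placeholder path to site
        String guid = "{00000000-0000-0000-0000-000000000000}";         // placeholder list GUID
        String url = site + "/_vti_bin/owssvr.dll?Cmd=ExportList&List="
                + URLEncoder.encode(guid, StandardCharsets.UTF_8);      // braces must be escaped

        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // raw XML schema of the list
            }
        }
    }
}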
You might want to take a look at this: http://www.dotnetmafia.com/blogs/dotnettipoftheday/archive/2010/01/21/introduction-to-querying-lists-with-rest-and-listdata-svc-in-sharepoint-2010.aspx
Not sure if it's what you are after..

How to get a description of a URL

I have a list of URLs and am trying to collect their "descriptions." By description I mean what comes up if you Google the link. For example, http://stackoverflow.com shows the description as:
A language-independent collaboratively edited question and answer site for programmers. Questions and answers displayed by user votes and tags.
This is the data I'm trying to accumulate for the URLs I have.
I tried parsing the URLs' meta descriptions; however, most of them lack one (yet Google and other search engines manage to get a description somehow).
Any ideas? Should I just "google" each link and scrape the data? I have a feeling Google wouldn't like this...
Thanks guys.
Different search engines have different algorithms for getting the description out of the page if/when it lacks the description meta tag. Some ignore the tag even if it's there.
If you want the description Google has, the most accurate way to get it would be to scrape it. Otherwise, you could write your own or look around on the web for code that does it.
These are called snippets.
Google use proprietary (and possibly patented) methods to garner this information, so there is no simple answer.
As you suggest, they will use meta-description information if it is there. (How to set the meta-information to help Google.)
They will also honour requests from the page authors to NOT include snippets. (How to prevent Google from displaying snippets) You should probably respect this too (as well as robots.txt, of course.)
You may have some luck with existing auto-summary packages, such as OTS.
You may want to check AboutUs.org (i.e. http://www.aboutus.org/StackOverflow.com).
But, there's little chance that the site will have an aboutus page and not have a meta description.
Some info that might explain how Google does this:
Webmasters/Site owners Help
Adding a URL to Google
I am not familiar with Google APIs, but perhaps there is an official way to get such information.
Interesting: some sources are better than others. For audiotuts.com, Google has a worse description than AboutUs.com.
Google:
Nov 18th in General by Joel Falconer · Recently, an AUDIOTUTS reader asked me about creative process. While this is a topic that can’t be made into a ...
AboutUs.com:
AUDIOTUTS is a blog/tutorial site for musicians, producers and audio junkies! It is the sister site of the popular PSDTUTS, VECTORTUTS and NETTUTS.
I hate problems like these... they should be trivial but they aren't!
If you can assume English content, you can first look for Meta Description, and if that doesn't work, you can look for the first two or three sentence-like word sequences.
A product I worked on looked for the first P or DIV that contained more than one sequence of > n "words" delimited by periods. It would use the two or three sentence-like sequences, up to x total words, as a summary paragraph. It wasn't 100% accurate, but good enough for the average case. The number of words was adjusted a few times to eliminate things like navigation elements.
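For illustration, a rough sketch of that kind of heuristic using jsoup; the thresholds here (five words per "sentence", two sentence-like runs, a 200-character cap) are invented and would need the same tuning described above:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SnippetExtractor {
    // Meta description first; otherwise the first <p> or <div> that looks like
    // it contains real sentences.
    public static String describe(String url) throws Exception {
        Document doc = Jsoup.connect(url).userAgent("Mozilla").get();

        String meta = doc.select("meta[name=description]").attr("content");
        if (!meta.isEmpty()) {
            return meta;
        }
        for (Element el : doc.select("p, div")) {
            String text = el.ownText();
            // "Sentence-like": a period-delimited run of at least five words
            String[] sentences = text.split("\\.\\s+");
            int sentenceLike = 0;
            for (String s : sentences) {
                if (s.trim().split("\\s+").length >= 5) sentenceLike++;
            }
            if (sentenceLike >= 2) { // needs more than one such run, as described above
                return text.length() > 200 ? text.substring(0, 200) + "..." : text;
            }
        }
        return "";
    }
}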
