Search for a file in IPFS by its filename? - p2p

Is there a way to search for files in IPFS by their filenames? If, for example, I want to search for a subject in IPFS, and I assume the subject appears in the filename (like some-subject.pdf), is there a way to retrieve the hashes of the files whose names contain the word "subject"?

No such thing exists in IPFS, and it's not something that could be implemented in the future, because IPFS addresses each file by its hash. If you knew about the existence of a file on IPFS, you'd have to download it (by addressing it with its hash) to learn its name.
Think about IPFS the same way you'd think about HTTP, but decentralized. It takes Google and its PageRank algorithm to search things on an HTTP-based web, and it'd probably be the same with IPFS.
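To make the addressing model concrete, here is a minimal Python sketch (the requests library and the placeholder CID are my own assumptions, not part of the original answer) showing that the only thing you can hand to a gateway is a hash, never a filename:

import requests

# A CID (content hash) is the only possible lookup key; this one is a placeholder.
cid = "QmYourContentIdHere"
resp = requests.get(f"https://ipfs.io/ipfs/{cid}", timeout=30)
resp.raise_for_status()

# Only after downloading the object can you inspect its contents or learn a name;
# neither the gateway nor the protocol offers "files whose name contains 'subject'".
with open("some-subject.pdf", "wb") as f:
    f.write(resp.content)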

There is no API to query directly, as bcolin wrote in their answer, but someone is building a search engine that tries to index documents found on IPFS.
The IPFSearch project is still in early beta at the time of writing, but it can be found here:
/ipns/ipfsearch.xyz (untested by me)
or /ipns/QmSE8g9k5JS1vJ7y5znhSZikmybvdsm3yDj7sbKjPRqsJW (untested by me)
or, since the DHT is still super slow, through the gateway: https://ipfsearch.xyz
The announcement was made on discuss.ipfs.io.
The project's GitHub repository is here.
Good luck!

Related

Implementing a docs search for multiple docs sites

We have many different documentation sites and I would like to search a keyword across all of these sites. How can I do that?
I already thought about implementing a simple web scraper, but this seems like a very ugly solution.
An alternative may be to use Elasticsearch and somehow point it to the different doc repos.
Are there better suggestions?
Algolia is the absolute best solution that I can think of. There are also Typesense and Meilisearch, of course.
Algolia is meant specifically for situations like yours, so it even comes with a crawler.
https://www.algolia.com/products/search-and-discovery/crawler/
https://www.algolia.com/
https://typesense.org/
https://www.meilisearch.com/
Here's a fun page comparing them (probably a little biased in Typesense's favor)
https://typesense.org/typesense-vs-algolia-vs-elasticsearch-vs-meilisearch/
Here are some example sites that use Algolia Search
https://developers.cloudflare.com/
https://getbootstrap.com/docs/5.1/getting-started/introduction/
https://reactjs.org/
https://hn.algolia.com/
If you personally are just trying to search for a keyword, as long as the sites are indexed by Google, you can always search with the format site:{domain} "keyword"
You can check out Meilisearch for your use case. Meilisearch is an open-source search engine written in Rust.
Meilisearch comes with a document scraper tool ( https://github.com/meilisearch/docs-scraper ) that can scrape content and then also index it.
When using it, you define exactly which content should be scraped in the tool's configuration file, and then run the tool with Docker. Once the index is built, you can query it from your own code, as in the sketch below.
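A minimal sketch using the official Python client, assuming a local Meilisearch instance, a master key, and an index named "docs" (all placeholders):

import meilisearch

# Host, API key and index name are placeholders for illustration.
client = meilisearch.Client("http://127.0.0.1:7700", "masterKey")
results = client.index("docs").search("keyword")

for hit in results["hits"]:
    # docs-scraper records typically carry the page URL and a heading hierarchy.
    print(hit.get("url"), hit.get("hierarchy_lvl1"))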

How to download an original image or video with the baseUrl of the Google Photos API?

I want to use the Google Photos REST API to download original photos or videos from Google Photos, and I have found no way to achieve this with the "baseUrl".
I have checked the following pages, but there is no definitive answer:
https://issuetracker.google.com/issues/112096115
https://issuetracker.google.com/issues/80149160
Is there indeed a way to get the original photos and videos, or will there be one?
The addition of '=d' will not give you the original file! I tested it. The quality and resolution of the image seem to match the original, but some information like EXIF metadata (geo location) is missing. As a result, the file size is also smaller than the original. This makes it unusable for backup synchronization, where I want the original file.
Actually, I expect Google to give me automated access to my own original data. It looks like that is currently not the case.
I'm afraid there are currently only two options for getting the original photos:
Manual download from Google Photos
Manual download via Google Takeout
Very disappointing!
So I just read through the issue tracker answers you provided, and I noticed that one reply was to add '=d' to the baseUrl.
So example: GET https://lh3.googleusercontent.com/lr/AGb3...HG2n=d
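For what it's worth, a minimal Python sketch of that download (the access token and media item ID are placeholders, and, as the other answer notes, the result may lack EXIF metadata):

import requests

ACCESS_TOKEN = "ya29.placeholder"        # OAuth 2.0 token with a photoslibrary scope
MEDIA_ITEM_ID = "media-item-id-here"     # placeholder

item = requests.get(
    f"https://photoslibrary.googleapis.com/v1/mediaItems/{MEDIA_ITEM_ID}",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
).json()

# baseUrl is short-lived; appending "=d" requests the downloadable photo bytes
# ("=dv" is the equivalent for the video bytes of a video item).
download = requests.get(item["baseUrl"] + "=d")
with open(item["filename"], "wb") as f:
    f.write(download.content)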

How to search newsgroups in gnus

I have gnus working for multiple email addresses with searching via
(nnir-search-engine imap)
I have newsgroup reading set up and working fine too; however, I have never been able to get searching in newsgroups working, even though I have
(setq gnus-select-method '(nntp "news.gmane.org"
                                (nnir-search-engine gmane)))
With the latter, with my cursor on a gmane newsgroup, I expect to be able to do G G, enter a search, and have it return a list of hits as it does with imap search. Instead, I get the message
Contacting host: search.gmane.org:80
open-network-stream: search.gmane.org/80 nodename nor servname provided, or not known
in my mini and Messages buffers.
Any idea what is going on and how to rectify this?
One thought I had was that perhaps I needed to use gnus-agent and an agent category to allow me to download messages via J s (all of which I did set up, but I haven't fully understood where it is saving, etc.).
Everything else works great in gnus, I just want to search newsgroups too in gnus.
p.s. I have downloaded Unison, which is quite nice and free now, and it can do what I need, but I still hope to do it in gnus.
The gmane search engine does not work because gmane has undergone some changes: gmane search has been broken for the last two years (?) or so, ever since Lars decided that he was not going to continue with gmane. Although the people who took over brought the nntp service back, search is still missing.
There are other search engines, however: the gnus manual lists swish++, swish-e, namazu, notmuch and hyrex (obsolete). I have no idea how well each works; I do know that they require configuration (imap search and gmane search, before it broke, worked right out of the box).
The doc has very few details on the rest, but it does describe how to set up namazu: it requires that you create and maintain index files, presumably indexing a set of local files. The doc's emphasis is on indexing local email, but presumably it would work similarly for downloaded local news articles.

Google Docs: Table of contents page numbers

We are currently building an application on the Google Cloud Platform which generates reports in Google Docs. For these reports, it is really important to have a table of contents ... with page numbers. I know this has been a feature request for a few years, and there are add-ons (Paragraph Styles +, which didn't work for us) that provide this, but we are considering building it ourselves. If anybody has a suggestion on how we could start with this, it would be a great help!
Thanks,
Best bet is to file a feature request on the product forums.
Currently the only way to do that level of manipulation of a doc to provide a custom TOC is to use Apps Script. It provides access to the document structure, sufficient to build and insert a basic table of contents, but I'm not sure there's enough to do paging correctly (unless you force a page break on every page...). There's no method to answer the question "what page is this element on?"
Hacks like writing to a DOCX and converting don't work, because TOCs are recognized for what they are and show up without page numbers.
Of course, you could write a DOCX or PDF with the TOC as you'd like and upload it as a blob rather than as a Google Doc. It can still be viewed in Drive and such; a sketch of that upload is below.
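A minimal sketch of that last option with the Drive v3 Python client (file names are placeholders, and credentials are assumed to be configured already):

from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload

drive = build("drive", "v3")  # assumes default credentials are already set up

# Upload the pre-built report as a plain blob. Because no Google-Docs mimeType is
# requested in the file metadata, Drive stores the file as-is and the custom TOC
# (including its page numbers) is preserved.
media = MediaFileUpload("report.pdf", mimetype="application/pdf")
drive.files().create(body={"name": "report.pdf"}, media_body=media, fields="id").execute()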

SharePoint files MD5 hash

We have a client who requires that all image files within SharePoint are stored in a manner that it can be shown they are a bit for bit copy of the originally uploaded file. Obviously, hashing the file would be able to show that when the file is retrieved.
What I haven't been able to find is any reference to someone implementing this functionality on a SharePoint image library. I've found numerous articles around implementing this generically in C#, but ideally I'd like to be able to do it on a standard SharePoint document/image library.
Does anyone have any suggestions as how best to go about doing this? Workflow comes to mind, but what do people think? Also, as a side to this, does anyone know whether or not SharePoint will store a bit for bit copy that will verify when we compare the checksum?
You can implement an event handler which computes your file hash on upload and stores it in a metadata text field. It's a simple solution to your problem.
Why not use a Records Center site? They are designed for this sort of thing: verifiable archiving and storage.
I would add a "text" column to the image library and then implement the hashing logic in an event receiver. You will need two handlers - ItemAdded and ItemUpdated.
The code will look something like this:
public override void ItemAdded(Microsoft.SharePoint.SPItemEventProperties properties)
{
    base.ItemAdded(properties);
    this.DisableEventFiring();
    // Store the hash in the library's custom text column so it can be verified later.
    properties.ListItem["myCustomField"] = this.CalculateHash(properties.ListItem.File);
    properties.ListItem.SystemUpdate();
    this.EnableEventFiring();
}

// Minimal MD5-based sketch of the hash helper referenced above.
private string CalculateHash(Microsoft.SharePoint.SPFile file)
{
    using (var md5 = System.Security.Cryptography.MD5.Create())
        return Convert.ToBase64String(md5.ComputeHash(file.OpenBinary()));
}
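On the retrieval side, verification is then just a matter of recomputing the hash of the downloaded copy and comparing it with the stored column value. A small sketch in Python, assuming the receiver above stored the Base64-encoded MD5 (file and column names are placeholders):

import base64
import hashlib

def md5_base64(path):
    # Same digest the event receiver stored: MD5 over the raw bytes, Base64-encoded.
    with open(path, "rb") as f:
        return base64.b64encode(hashlib.md5(f.read()).digest()).decode("ascii")

stored_hash = "..."  # value read from the library's "myCustomField" column
print(md5_base64("downloaded-image.jpg") == stored_hash)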
