Find a file with all given strings on Linux

Find a file with all given strings on Linux - linux

Similar to this question How do I find all files containing specific text on Linux? but I want all the files that contain multiple given strings (these strings not necessarily next to each other or on the same line, just in the same file).
My use case is I am looking at a UI and want to modify the file which controls this particular screen. The codebase though is huge and it is difficult to locate this file. All I have to go on is some of the hardcoded strings on this screen which I would like to do the search on. The strings are quite generic though such as 'Done', 'Close', 'View Details'... Doing a search on any of these strings individually, using the answer from the linked question above, brings back too many results but I think doing the search on all of them together will filter it down enough to find the file.

Related

Is there any reason to reference UI element text in the strings.xml resource file rather than hard-coding in Android Studio?

It seems like it's simply more straightforward to hard-code the text values. In an event that these values should be changed it seems like it would be more logical to search for the relevant UI element in each activity's xml layout file rather than look through the entire strings.xml. Of course if you have certain UI elements across multiple activities that all share the same text then this might be an exception (like a back button for instance), but generally there doesn't seem to be much advantage to storing these in the strings.xml. Am I missing something?

I will give you two reasons;
1 - Avoid duplication: all of your strings in one place. also, you can use string value many times. when you want to change it, there is one place to do the change. that makes it easier to maintain.
2 - Multi-language support: if you want to translate your strings to another language you must have all the strings in Strings.xml
let me know if you need more clarifications.

How to search for files faster in Sublime Text 3

Right now I do ⌘t then scroll through autocomplete, or start typing the name (but half the time it doesn't find it).
Sublime doesn't find a file in many cases. For example, I typically have all my files called index.<ext> nested inside some folder. So I might have:
my/long/directory/structure/index.js
my/long/directory/structure2/index.js
my/long/directory/structure3/index.js
my/long/directory/structure.../index.js
my/long/directory/structuren/index.js
my/long/directory/index.js
my/long/directory2/index.js
my/long/directory.../index.js
my/long/directoryn/index.js
my/long/index.js
my/index.js
...
But in sublime you have to search for an exact path. I can't search this:
my directory index
And get results for directory, directory2, directory..., directoryn, I just get empty results because there is not my/directory. I can't remember the full folder path most of the time, so it takes a lot of effort to do so and I end up just navigating in the sidebar to find the file which takes some time.
Wondering if there is a better/faster way of doing this. Basically searching for a file by snippets/keywords of the complete path. So m dir would return my/long/directory, etc.

The first thing to note is that you do not have to search for an exact path; anywhere that Sublime provides you a list of items to select from and a text entry, fuzzy matching is in play. In your example searching just for idx will narrow down the list to all items that have those characters in that order, even if they're not adjacent to each other.
The entries show you visually how they're matching up, and there's a fairly sophisticated system behind the scenes that decides which characters make the best matches (relative to some hidden scoring algorithm):
In addition to this you can use multiple space separated terms to filter down the list. Each term is applied to the list of items resulting from the prior term, so they don't need to be provided in the same order as they appear in the file names.
This helps with searches where you know generally the name of the file, and from there can further drill down on segments of the path or other terms that will help narrow things down:
Something to note here is that as seen in these images, the folder structure is my/long/directory/structure, but the names of the files as seen in the panel don't include the my/ at the start.
In cases where your project contains only one top level folder, that folder isn't presented in the names of the files. Presumably this is because it's common to every file and thus not going to be a useful filter. As such trying to use my in the search field will return no matches unless one of the files has an m and a y somewhere in their filenames.
This isn't the case if there are multiple top level folders; in that case Sublime will include the root folder in the names of the files presented because now it's required to be able to distinguish between files in the different folders:
In addition to this, note that for any given filter text you enter in a panel, Sublime remembers the full text of the item that you selected while that filter text was being used, and uses that in it's scoring to prioritize the matches the next time you search in the same panel. The next time you search with the same term, Sublime will automatically pre-select the item that you picked last time under the theory that you probably want it again.
The search terms and their matches are saved in the session file and in your project's sublime-workspace files, so as you move from window to window and project to project you're essentially training Sublime how to find the files that you want.
My advice would be to try and flip your thinking a little bit. In my opinion the power of the fuzzy matching algorithm works best when you try to find files in a more organic way than trying to replicate the path entirely.
Instead, I would throw a few characters from the name of the file that I'm trying to find first, and then add another term that filters on some part of the path that will disambiguate things more; a term of idx s1 in this example immediately finds the two index.js files that are contained in structure1 folders, for example.
In a more real world example the names of the folders might contain the names of the components that they're a part of or something else that is providing a logical structure to the code, so you might do idx con to pull the index.js from the controller folder or idx mod to find the one in the model folder, and so on.
Regarding a better/faster way to do this I don't think there is one, at least in the general case. Sublime inherently knows every file that's in your project as a part of indexing all of the files to power other features such as Goto Symbol and it uses file watchers to detect changes to the structure of the open folders.
Anything else, including a third party plugin or package, would need to first do a redundant file scan to accumulate the list of files and would also have to replicate the file watching that Sublime is already doing in order to know when things change.

Delimiting quick-open path with fullstops in Sublime Text 3?

I'm making the move to ST3, and I'm having some trouble. I'd like to be able to delimit the quick-open filepath (⌘ + T) with periods instead of slashes or spaces. However, I can't find the setting to do that.
For example:
component.biz_site_promotions.presentation
should be able to open the file that
component biz_site_promotions presentation
would.
Any help would be greatly appreciated!

There is no setting in Sublime that changes the way this works; the search term is always used to directly match the text in the list items (except for space characters).
Note however that the Goto Anything panel uses fuzzy matching on the text that you're entering, so in many cases trying to enter an entire file name is more time consuming anyway.
As an example, to find the file you're mentioning, you could try entering the text cbspp, which in this case is the first letters of all of the parts of the file name in question.
As you add to the search term, the file list immediately filters down to text that matches what you entered; first only filenames that contain a C, then only filenames that contain a C that is followed somewhere after by a B, and so on.
Depending on the complexity and number of files that you have in your project, you may need to add in a few extra characters to dial in better (e.g. comb_s_pp). Usually this search method will either end you up at the exact file you want, or filter the list so much that the file that you want will be easier to find and select.
Additionally, when you select an item and there was more than one possible match, Sublime remembers which item you selected for that particular search term and brings it to the top of the search results next time you do it, under the assumption that you want the same thing again.
As you use Sublime more (and with different projects) you will quickly get a handle on what partial search terms work the best for you.
In addition to finding files, you can do other things with that panel as well, such as jumping to a specific line and/or column or searching inside the file for a search term and jumping directly to it. This applies not only to the current file but also the one that you're about to open.
For more complete details, there is a page in the Unofficial Documentation that covers File Navigation with Goto Anything
As an extra aside, starting with Sublime Text build 3154, the fuzzy searching algorithm handles spaces differently than previous builds.
Historically, spaces in the search term are essentially ignored and the entire input is treated as one search term to be matched character by character.
Starting in build 3154, spaces are handled by splitting up a single search term into multiple search terms, which are applied one after the other.
This allows multiple search terms to hit out of order. For example, index doc in build 3154 will find doc/index.html, but it won't find it in previous versions because the terms aren't in the right order.
As such, assuming you're not currently using such a build (as of right now it's a development build, so only licensed users have access to it), moving forward if you continue to search the way you're searching in your question, you might start getting more results than you expected.

After using pdftotext: find page of string from txt

I am currently coding in python and managed to use pdftotext in order to extract the text from a pdf.
That particular text file is split up in a list of strings. By using regular expression I am able to find specific words I am interested in. The reason why I divide the text into a list is that I want to measure the distance between two specific words and by distance I mean the number of words in between the two words.
However after finding the position of the words I would like to be able to refer back to the initial pdf. In detail, I am interested in the page and maybe even line (if pdf supports this kind of structure) where these words are located.
One idea I have is to do this process for each page of the pdf, so when I find these words I know on what page this was. But this has the big disadvantage that sometimes page breaks are not necessarily natural. Meaning, I would lose the ability to find the words if they are unfortunately separated by a page break.
Do you have any idea how to do this in a more sophisticated manner?

You'll need a more sophisticated library than the one you're using. The Datalogics PDF Java Toolkit has several classes that can extract text from a PDF file. The one you use depends on what you want to do with the text after extraction. The ReadingOrderTextExtractor will create a list of lists that will allow you to extract the text and examine the content of paragraphs, sentences within those paragraphs, and words within that sentence. You'll not only be able to tell the distance between the words but whether they are in the same sentence or paragraph. One you've found a Word object, you can then find both it's location on the page, allowing for highlighting, and the page number it's on.

Efficient Way of Recording Page Numbers from a Search of a PDF

I have a list of ~1200 queries (part numbers) that are specified somewhere inside of a 100 page PDF. Pretty much what I need to do is take record of what pages each of the queries appear on, in the PDF. I can't think of a clever way of doing this. It should take me 5-20 hours to do this search by search, so if someone can give me a good idea before the 5 hour mark that would be great!

Assumed you can determine what a "query" is in your context programatically from the plain text (for example, by using regular expressions):
You could split your PDF into different files (1 file per page) using pdftk
http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/
Then convert those files to text with a pdf-to-text utility like this one:
http://www.fileguru.com/PDF-To-TXT-Converter/download
or this one
http://www.pdf2text.com/
And finally write yourself a simple script using your favorite programming language to determine which of those files contains a "query" (whatever that looks like).

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string