adding many dictionaries to aspell - linux

I have a tex document spanning several files that I want to check with aspell.
The command I use is:
cat "$f" | aspell list --extra-dicts="./names.spl" --mode=tex -l en | sort -u
for every file name f.
Some files that concern pronunciation contain "words" like aj and oo, which aspell counts as spelling mistakes. I want to filter them out without putting them into the names.spl dictionary (first, because they are not names; second, because they shouldn't be ignored in other files).
The aspell documentation states that the --extra-dicts argument can take a list, but I can't seem to delimit it properly. I tried commas, colons and plain spaces to no avail: the value is either treated as one long file path, or the extra names get separated from the --extra-dicts option entirely.
I also tried to use the option twice, but the second time just overrides the first.
Am I missing something trivial about how lists are provided as command line arguments in the terminal?

According to the texinfo manual (info aspell), aspell uses a list option format that is different from other GNU programs, in which the base option name is prefixed with add- or rem- to respectively add or remove items from a list:
4.1.1.3 List options
To add a value to the list, prefix the option name with an 'add-' and
then specify the value to add. For example, to add the URL filter use
'--add-filter url'. To remove a value from a list option, prefix the
option name with a 'rem-' and then specify the value to remove. For
example, to remove the URL filter use '--rem-filter url'. To remove
all items from a list prefix the option name with a 'clear-' without
specify any value. For example, to remove all filters use
'--clear-filter'.
Following this pattern for the --extra-dicts option, you would add multiple extra dictionaries as
--add-extra-dicts dict1 --add-extra-dicts dict2
The documentation for Aspell 0.60.7-20110707 also describes a more direct, delimited list format (possibly a newer addition) that uses another prefix, lset-:
A list option can also be set directly, in which case it will be
set to a single value. To directly set a list option to multiple
values prefix the option name with a 'lset-' and separate each value
with a ':'. For example, to use the URL and TeX filter use
'--lset-filter url:tex'.
Following this format, your option would become
--lset-extra-dicts dict1:dict2
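Applied to the per-file loop from the question, a minimal sketch could look like this. It assumes a second word list, called ./pronunciation.spl here (a hypothetical name), that should only be used for files whose names contain "pronunciation" (also an assumption about your file naming). Each extra dictionary gets its own --add-extra-dicts option, and the option list is built by a separate function so it can be checked without aspell installed:

```shell
#!/usr/bin/env bash
# Build the aspell dictionary options for one file, one option per line.
# Hypothetical names: ./names.spl applies to every file,
# ./pronunciation.spl only to files whose name contains "pronunciation".
dict_args_for() {
  local file=$1
  local -a args=(--add-extra-dicts ./names.spl)
  case $file in
    *pronunciation*) args+=(--add-extra-dicts ./pronunciation.spl) ;;
  esac
  printf '%s\n' "${args[@]}"
}

# Run aspell once per file with that file's dictionaries,
# then print the unique misspelled words across all files.
check_files() {
  local f
  local -a args
  for f in "$@"; do
    mapfile -t args < <(dict_args_for "$f")
    aspell list --mode=tex -l en "${args[@]}" < "$f"
  done | sort -u
}
```

Usage would be something like check_files *.tex. Keeping the pronunciation list separate means aj and oo are still reported in every other file.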


Linux: finding the position of the last '/' in a string only

I have this string:
/sandbox/US_MARKETING/COMMON_DATA/BAU/FILES/2020/08/dnb_mi_081420.gz
Without knowing how many '/' characters it contains, I want to read just the file name into a variable.
I want to be able to do a search where I start at the last '/' in the line and find the filename 'dnb_mi_081420.gz'.
I basically want to say: "Find the last '/' in the string, then read the substring from there to the end and store it."
So I know it's going to look like this:
filename=substr(<position of the last'/'>,<position of first character in last string>)
So I guess what I'm looking for is how to find the index of the last '/'.
Does anyone know what that is?
I also tried using basename, but unfortunately I'm doing this through 'hdfs dfs' in a Hadoop shell, so some common Linux commands such as basename aren't available there. I'm basically going to have to store the whole string in a variable and operate on that variable's value.
In bash, parameter expansion can be used:
${parameter##word}
The word is expanded to produce a pattern and matched according to the rules described below (see Pattern Matching). If the pattern matches the beginning of the expanded value of parameter, then the result of the expansion is the expanded value of parameter with the shortest matching pattern (the ‘#’ case) or the longest matching pattern (the ‘##’ case) deleted
Example:
$ s="/sandbox/US_MARKETING/COMMON_DATA/BAU/FILES/2020/08/dnb_mi_081420.gz" && echo "${s##*/}"
dnb_mi_081420.gz
$
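The question also asked for the position of the last '/' itself. As a sketch using only bash parameter expansion, that index is the length of everything before the last '/', which ${s%/*} (remove the shortest suffix matching '/*') gives directly:

```shell
#!/usr/bin/env bash
# Find the index of the last '/' and the text after it,
# using only bash parameter expansion (no basename needed).
s="/sandbox/US_MARKETING/COMMON_DATA/BAU/FILES/2020/08/dnb_mi_081420.gz"

dir=${s%/*}          # everything before the last '/'
idx=${#dir}          # its length = 0-based index of the last '/'
filename=${s:idx+1}  # substring starting just after the last '/'

printf 'index=%s filename=%s\n' "$idx" "$filename"
```

If s contains no '/' at all, ${s%/*} leaves it unchanged, so idx equals the string length and filename comes out empty, whereas ${s##*/} returns the whole string; check for that case if it can occur.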
You can use the -stat subcommand, which prints information and stats about a file in a specified format. Since you only want the file name, the format is simply "%n":
hdfs dfs -stat "%n" /path/to/file
This may be more expensive than a solution based on raw indices, but should not create a meaningful or noticeable hit to performance.

In CMake, how do I turn the multi-line output of a command into a list?

I want to do something like this:
execute_process(
COMMAND bash -c "git --git-dir ${CMAKE_SOURCE_DIR}/.git ls-files"
OUTPUT_VARIABLE TRACKED_FILES)
add_custom_target(all_file_project SOURCES ${TRACKED_FILES})
And the command itself seems to work as expected but the generated variable "TRACKED_FILES" contains only one logical entry (one multi line string) rather than a list of files.
Can I somehow turn a string containing multiple lines separated by a newline ("\n") into a list in CMake?
One option (as the title of my question suggests) is to actively split the string manually rather than interpreting a variable as list in the first place:
string(REPLACE "\n" ";" TRACKED_FILES_LIST "${TRACKED_FILES}")
This works for me but it would be very nice to have something more abstract and less platform specific (e.g. I don't know whether this works on all OSes including Windows)
Something like execute_process(COMMAND find -type f OUTPUT_LIST_VARIABLE MY_LIST)
Or at least set(MY_LIST FROM_MULTILINE MY_MULTILINE_STRING)

Delete text with GREP in TextWrangler

I have the following source code from the Wikipedia page listing games. I need to grab the name of each game from the source; it is in the title attribute, as follows:
<td><i>007: Quantum of Solace</i><sup id="cite_ref-4" class="reference"><span>[</span>4<span>]</span></sup></td>
As you can see above, in the title attribute there's a string. I need to use GREP to search through every single line for when that occurs, and remove everything excluding:
title="Game name"
I have the following (in TextWrangler) which returns every single occurrence:
title="(.*)"
How can I now make it remove everything surrounding that, while keeping either the string alone or title="string"?
I use a multi-step method to process these kind of files.
First, get only one HTML tag per line. GREP works line by line, so this minimises the need for complicated patterns. I usually replace all > with >\n
Then develop a pattern for each occurrence of the item you want, in this case 'title=".*?"', and put it in parentheses (). Then add some filler around it so the pattern covers the whole line: .*?(title=".*?").*
Replace everything that matches .*?(title=".*?").* with \1
Finally, make smart use of TextWrangler's "Process Lines Containing" function to filter out any remaining rubbish.
Notes
\1 refers to the text matched by the first pair of parentheses. You can also reorder text using several groups, e.g. replace (.*?), (.*) with \2, \1 to swap two columns.
Learn how lazy regular expressions work; the ? in these patterns is very powerful. A lazy .*? matches as little as possible, so it stops at the next occurrence of the following part of the pattern rather than the last one on the line.
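POSIX sed and grep -E have no lazy quantifier, so outside TextWrangler a common substitute is a negated character class: [^"]* cannot run past the closing quote. A minimal sketch with sed -E, assuming there is one title attribute of interest per line (the leading greedy .* means the last title on a line wins):

```shell
#!/usr/bin/env bash
# Print only the title="..." attribute from each input line that has one.
# [^"]* stands in for the lazy .*? : it stops at the first closing quote.
# -n plus the p flag: lines without a title attribute are not printed.
extract_titles() {
  sed -nE 's/.*(title="[^"]*").*/\1/p'
}
```

Usage: extract_titles < games.html, where games.html is a placeholder for the saved page source.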
I've figured this problem out, it was quite simple. Instead of retrieving the content in the title attribute, I'd retrieve the page name.
To make sure I only matched the lines containing the content, I used the following pattern to search the code:
(.*)/wiki/(.*)"
Replace with \2
After that, I simply removed any remaining HTML code:
<(.*)
Replace with ''
Finally, I removed the remaining content after the page name:
"(.*)
Replace with ''
A bit of cleaning up the spacing and I have a list for all game names.
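Outside TextWrangler, the same three replacements can be chained in a single sed command. This is a sketch rather than the exact procedure above: it assumes sed -E, and it only processes lines containing /wiki/ instead of cleaning up the other lines by hand afterwards:

```shell
#!/usr/bin/env bash
# Chain the three replacements: keep what follows /wiki/ (greedy up to the
# last quote on the line), cut from the first '<', then cut from the first '"'.
# Only lines containing /wiki/ are processed and printed.
wiki_page_names() {
  sed -nE '/\/wiki\//{
    s#(.*)/wiki/(.*)"#\2#
    s/<.*//
    s/".*//
    p
  }'
}
```

Page names keep their URL underscores (e.g. Foo_Bar); adding y/_/ / before the p would turn them into spaces.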

Find required files by pattern and then change the pattern on Linux

I need to find all *.xml files on Linux that match a pattern. The script should print each file name to the screen and then change the pattern in the file that was just found.
For instance, I start the script with arguments for the keyword and the value, i.e.:
script.sh keyword "another word"
The script should find all files containing keyword and make the following changes in them:
<keyword></keyword> should be the same <keyword></keyword>
<keyword>some word</keyword> should be like this <keyword>some word, another word</keyword>
In other words if initially value in keyword node was empty, then I don't need to change it and if it contains some value then I need to extend it with the value I will specify.
What is best way to do this on Linux? Using find, grep, sed?
Performance is also important, since there are thousands of files.
Thank you.
A combination of find, grep and sed should do this, and these tools are fast for plain text processing, so there may be no need for a full XML parser. If you could give an example or rephrase your question, I might be able to provide more help.
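A minimal sketch with find, grep and sed, assuming GNU tools (sed -i without a backup suffix, grep -Z, xargs -r), that the keyword is a plain tag name, and that each <keyword>...</keyword> node sits on one line with no nested tags. It prints each file it edits and appends ", value" only to non-empty nodes:

```shell
#!/usr/bin/env bash
# append_to_keyword KEYWORD VALUE [DIR]
# For every *.xml under DIR with a non-empty <KEYWORD>...</KEYWORD> node,
# print the file name and append ", VALUE" to each such node in place.
append_to_keyword() {
  local keyword=$1 value=$2 dir=${3:-.}
  # Escape characters that are special in a sed replacement: \ & and the | delimiter.
  local escaped
  escaped=$(printf '%s' "$value" | sed 's/[\\&|]/\\&/g')
  # grep -l pre-filters: only files containing <KEYWORD> followed by a
  # non-'<' character (i.e. a non-empty node) are rewritten by sed.
  find "$dir" -type f -name '*.xml' -print0 |
    xargs -0 -r grep -lZ -- "<$keyword>[^<]" |
    while IFS= read -r -d '' file; do
      printf '%s\n' "$file"
      # [^<][^<]* requires at least one character, so empty nodes are left alone.
      sed -i "s|<$keyword>\([^<][^<]*\)</$keyword>|<$keyword>\1, $escaped</$keyword>|g" "$file"
    done
}

# Usage: script.sh keyword "another word" [dir]
if [[ ${BASH_SOURCE[0]} == "$0" && $# -ge 2 ]]; then
  append_to_keyword "$@"
fi
```

Letting xargs batch the grep calls avoids starting one grep per file, which matters with thousands of files; sed then only touches the files that actually need changing.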

in vim, how to combine ^] and arge, to follow a tag and also add it to the arg list?

I can get the word under the cursor (e.g. with <cword>) and use it to open a file and add it to the arg list. For example, with the cursor over a Java class name, to open that file at line 45:
:arge +45 mydirhere/<cword>.java
But I don't know how to pass it to the tag mechanism so that it returns the file name (and line number), which could then be passed to :arge.
So I guess my question is specifically: "how do you call the tag mechanism?" I'm expecting something like:
String getFileAndLineforTag(String tag)
You can use the taglist() function. From :help taglist() (in Vim 7.1):
taglist({expr}) *taglist()*
Returns a list of tags matching the regular expression {expr}.
Each list item is a dictionary with at least the following
entries:
name Name of the tag.
filename Name of the file where the tag is
defined. It is either relative to the
current directory or a full path.
cmd Ex command used to locate the tag in
the file.
kind Type of the tag. The value for this
entry depends on the language specific
kind values. Only available when
using a tags file generated by
Exuberant ctags or hdrtag.
static A file specific tag. Refer to
|static-tag| for more information.
More entries may be present, depending on the content of the
tags file: access, implementation, inherits and signature.
Refer to the ctags documentation for information about these
fields. For C code the fields "struct", "class" and "enum"
may appear, they give the name of the entity the tag is
contained in.
The ex-command 'cmd' can be either an ex search pattern, a
line number or a line number followed by a byte number.
If there are no matching tags, then an empty list is returned.
To get an exact tag match, the anchors '^' and '$' should be
used in {expr}. Refer to |tag-regexp| for more information
about the tag search regular expression pattern.
Refer to |'tags'| for information about how the tags file is
located by Vim. Refer to |tags-file-format| for the format of
the tags file generated by the different ctags tools.
When you define a custom command you can specify -complete=tag or
-complete=tag_listfiles. If you need to do something more elaborate you can use -complete=custom,{func} or -complete=customlist,{func}. See :help :command-completion for more on this.
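Here's some code that does it, although it doesn't go to the right line, based on Laurence Gonsalves's answer (thanks!). The '^' and '$' anchors make taglist() return only exact matches, as the help text above recommends, and fnameescape() protects file names containing spaces:
:exe "arge " . fnameescape(taglist('^' . expand("<cword>") . '$')[0].filename)
A confusing part was how to combine the worlds of functions and ordinary Ex commands; that is done with :exe (execute) and string concatenation (.).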
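If you need the same lookup outside Vim, for example from a script, here is a sketch that reads the tags file directly. It is an alternative to taglist(), not what Vim does internally, and it assumes the standard tab-separated layout described under |tags-file-format| and a file named tags in the current directory by default:

```shell
#!/usr/bin/env bash
# tag_lookup TAG [TAGSFILE]
# Print "file<TAB>address" for the first exact match of TAG, roughly
# taglist('^TAG$')[0].filename and .cmd. Exit status 1 if there is no match.
# Assumed line layout: name<TAB>file<TAB>address[;"<TAB>extension fields]
tag_lookup() {
  local tag=$1 tagsfile=${2:-tags}
  awk -F'\t' -v tag="$tag" '
    $1 == tag {
      addr = $3
      sub(/;"$/, "", addr)   # drop the ;" that introduces extension fields
      print $2 "\t" addr
      found = 1
      exit
    }
    END { exit !found }
  ' "$tagsfile"
}
```

When the address is a plain line number, it can be passed on as +N, e.g. to vim or to :arge.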
