Vim, word frequency function and French accents - vim

I have recently discovered the Vim Tip n° 1531 (Word frequency statistics for a file).
As suggested I put the following code in my .vimrc
function! WordFrequency() range
  let all = split(join(getline(a:firstline, a:lastline)), '\A\+')
  let frequencies = {}
  for word in all
    let frequencies[word] = get(frequencies, word, 0) + 1
  endfor
  new
  setlocal buftype=nofile bufhidden=hide noswapfile tabstop=20
  for [key,value] in items(frequencies)
    call append('$', key."\t".value)
  endfor
  sort i
endfunction
command! -range=% WordFrequency <line1>,<line2>call WordFrequency()
It works fine except for accented characters and other French specifics (Latin small ligatures such as æ or œ, etc.).
What should I add to this function to make it suit my needs?
Thanks in advance

For 8-bit characters you can try to change the split pattern from \A\+ to
[^[:alpha:]]\+.

The pattern \A\+ matches any number of consecutive non-alphabetic characters which, unfortunately, includes multibyte characters like our beloved çàéô and friends.
That means that your text is split at spaces AND at multibyte characters.
With \A\+, the phrase
Rendez-vous après l'apéritif.
gives:
ap 1
apr 1
l 1
Rendez 1
ritif 1
s 1
vous 1
If you are sure your text doesn't include fancy spaces, you could replace this pattern with \s\+, which matches whitespace only, but it's probably too liberal.
With this pattern, \s\+, the same phrase gives:
après 1
l'apéritif. 1
Rendez-vous 1
which, I think, is closer to what you want.
Some customization may be necessary to exclude punctuation.

function! WordFrequency() range
  " Whitespace and all punctuation characters except dash and single quote
  let wordSeparators = '[[:blank:],.;:!?%#*+^#&/~_|=<>\[\](){}]\+'
  let all = split(join(getline(a:firstline, a:lastline)), wordSeparators)
  "...
endfunction
If all punctuation characters should be word separators, the expression shortens to
let wordSeparators = '[[:blank:][:punct:]]\+'
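As a rough illustration of the counting step on its own, here is a minimal sketch as a standalone function. The name CountWordsSketch is made up for this example, and the sketch assumes 'encoding' is utf-8 and that every ASCII punctuation character, including the dash and the apostrophe, should separate words.

```vim
" Hypothetical helper (not part of the original tip): count the words in a
" string, treating blanks and punctuation as separators.
function! CountWordsSketch(text) abort
  let l:frequencies = {}
  for l:word in split(a:text, '[[:blank:][:punct:]]\+')
    let l:frequencies[l:word] = get(l:frequencies, l:word, 0) + 1
  endfor
  return l:frequencies
endfunction

" Example call on the phrase used above.
echo CountWordsSketch("Rendez-vous après l'apéritif.")
```

Inside WordFrequency() the same pattern would simply take the place of '\A\+' in the split() call.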

Related

In vim, how to split a word and flip the halves? FooBar => BarFoo

I sometimes write a multi-word identifier in one order, then decide the other order makes more sense. Sometimes there is a separator character, sometimes there is case boundary, and sometimes the separation is positional. For example:
$foobar becomes $barfoo
$FooBar becomes $BarFoo
$foo_bar becomes $bar_foo
How would I accomplish this in vim? I want to put my cursor on the word, hit a key combo that cuts the first half, then appends it to the end of the current word. Something like cw, but also yanking into the cut buffer and then appending to the current word (e.g. ea).
Nothing general and obvious comes to mind. This is more a novelty question than one of daily practical use, but preference is given to shortest answer with fewest plugins. (Hmm, like code golf for vim.)
You can use this function, it swaps any word of the form FooBar, foo_bar, or fooBar:
function! SwapWord()
  " Swap the word under the cursor, ex:
  " 'foo_bar' --> 'bar_foo',
  " 'FooBar' --> 'BarFoo',
  " 'fooBar' --> 'barFoo' (keeps case style)
  let save_cursor = getcurpos()
  let word = expand("<cword>")
  let match_ = match(word, '_')
  if match_ != -1
    let repl = strpart(word, match_ + 1) . '_' . strpart(word, 0, match_)
  else
    let matchU = match(word, '\u', 1)
    if matchU != -1
      let was_lower = (match(word, '^\l') != -1)
      if was_lower
        let word = substitute(word, '^.', '\U\0', '')
      endif
      let repl = strpart(word, matchU) . strpart(word, 0, matchU)
      if was_lower
        let repl = substitute(repl, '^.', '\L\0', '')
      endif
    else
      return
    endif
  endif
  silent exe "normal ciw\<c-r>=repl\<cr>"
  call setpos('.', save_cursor)
endf
Mapping example:
noremap <silent> gs :call SwapWord()<cr>
Are you talking about a single instance, globally across a file, or generically?
I would tend to just do a global search and replace, e.g.:
:1,$:s/$foobar/$barfoo/g
(for all lines, change $foobar to $barfoo, every instance on each line)
EDIT (single occurrence with cursor on the 'f'):
3xep
3xep (had some ~ in there before the re-edit of the question)
4xea_[ESC]px
Best I got for now. :)
nnoremap <Leader>s dwbP
Using Leader, s should now work.
dw : cut until the end of the word from cursor position
b : move cursor at the beginning of the word
P : paste the previously cut part at the front
It won't work for your last example though; you have to add another mapping to deal with _ .
(If you don't know what Leader is, see :help mapleader)

Vim (vimscript) get exact character under the cursor

I am getting the character under the cursor in vimscript the following way:
getline('.')[col('.')-1]
It works exactly as it should, but there is something I dislike. Let [] denote the cursor. When there is a bracket immediately to the left of the cursor, like so:
}[] , ][] , )[] or {[] the expression actually returns the bracket. What do I have to set so that it always returns the character exactly under the cursor, or at least ignores a bracket to its left?
Note: I suspect that it might have to do with the brackets highlight, though I am not sure.
Note2: for the situation to occur there has to be a matching bracket.
Though I cannot reproduce the problem you're describing, there's another problem with your code: Because of the string indexing (and this is one of the uglier sides of Vimscript), it only works with single-byte characters, but will fail to capture chars like Ä or 𠔻 (depending on the encoding used). This is a better way of capturing the character under the cursor:
:echo matchstr(getline('.'), '\%' . col('.') . 'c.')
Edit: Since about Vim 7.4.1742, Vim has new strgetchar() and strcharpart() functions that work with character indexes, not byte addressing. This is helpful in many circumstances, but not here, because you still can only get the byte-index position of the cursor (or the screen column with virtcol(), but that's not the same as character index).
nr2char(strgetchar(getline('.')[col('.') - 1:], 0))
or
strcharpart(getline('.')[col('.') - 1:], 0, 1)
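The difference between byte indexing and character matching can be seen on a plain string. The sketch below assumes 'encoding' is utf-8. CharAtByteSketch is a hypothetical name; it relies on the optional {start} argument of matchstr(), which is a byte offset.

```vim
" Hypothetical helper: return the whole character that starts at byte offset
" a:byte_idx (0-based) of a:str. Assumes the offset falls on a character
" boundary, which is the case for col('.') - 1.
function! CharAtByteSketch(str, byte_idx) abort
  return matchstr(a:str, '.', a:byte_idx)
endfunction

" Byte length versus character count of the same string.
echo strlen('aÄb')
echo strchars('aÄb')
" Character that starts at byte offset 1.
echo CharAtByteSketch('aÄb', 1)
```

For the cursor position in a buffer, the call would be CharAtByteSketch(getline('.'), col('.') - 1).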
Another way to get the character under the cursor that handles both ASCII and non-ASCII characters is the following:
function! CharAtIdx(str, idx) abort
  " Get char at idx from str. Note that this is based on character index
  " instead of the byte index.
  return strcharpart(a:str, a:idx, 1)
endfunction
function! CursorCharIdx() abort
  " A more concise way to get character index under cursor.
  let cursor_byte_idx = col('.')
  if cursor_byte_idx == 1
    return 0
  endif
  let pre_cursor_text = getline('.')[:col('.')-2]
  return strchars(pre_cursor_text)
endfunction
Then if you want to get char under cursor, use the following command:
let cur_char_idx = CursorCharIdx()
let cur_char = CharAtIdx(getline('.'), cur_char_idx)
See also this post on how to get pre-cursor char.

vim scripting - count number of matches on a line

I'm trying to count the number of regex matches in a line, and I need to use the result in a vim function. For example, count the number of open braces.
function! NumberOfMatchesExample(lnum)
  let line_text = getline(a:lnum)
  " This next line is wrong and is the part I'm looking for help with
  let match_list = matchlist(line_text, '{')
  return len(match_list)
endfunction
So I'd like to find a way in a vim function to capture into a variable the number of regex matches of a line.
There are plenty of examples of how to do this and show the result on the status bar, see
:h count-items, but I need to capture the number into a variable for use in a function.
The split() function splits a string on a regular expression. You can use it to split the line in question, and then subtract 1 from the number of resulting pieces to obtain the match count.
let nmatches = len(split(getline(a:lnum), '{', 1)) - 1
See :h split().
For the special case of counting a single ASCII character like {, I'd simply substitute() away all other characters, and use the length:
:let cnt = len(substitute(line_text, '[^{]', '', 'g'))
You can use a hack with substitute() with side effects:
function CountFigureBrackets(lnum)
  let line=getline(a:lnum)
  let d={'num': 0}
  call substitute(line, '{', '\=extend(d, {"num": d.num+1}).num', 'g')
  return d.num
endfunction
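A more explicit alternative, if the tricks above feel too clever, is to walk the string with matchend(). This is only a sketch: CountMatchesSketch is a hypothetical name, and the loop assumes the pattern never matches the empty string (it stops at the first empty match rather than loop forever).

```vim
" Hypothetical helper: count non-overlapping matches of a:pat in a:text.
function! CountMatchesSketch(text, pat) abort
  let l:n = 0
  let l:start = 0
  while 1
    " Byte index just past the next match at or after l:start, or -1.
    let l:end = matchend(a:text, a:pat, l:start)
    if l:end <= l:start
      " No match left, or an empty match that would not advance: stop.
      break
    endif
    let l:n += 1
    let l:start = l:end
  endwhile
  return l:n
endfunction

" Example: count the open braces on a sample string.
echo CountMatchesSketch('if (a) { b = { c }', '{')
```

Inside a function working on a buffer line, the call would be CountMatchesSketch(getline(a:lnum), '{').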

How to count characters while typing?

I often use Vim to write comments in newspapers or blog sites.
Often there is a maximum number of characters allowed.
How do I create a counter (e.g. in the statusbar) to see the number of characters I have typed (including whitespace) while typing?
The 'statusline' setting allows evaluation of expressions with the %{...} special item.
So if we can come up with an expression that returns the number of characters (not bytes!) in the current buffer we can incorporate it in our statusline to solve the problem.
This command does it:
:set statusline+=\ %{strwidth(join(getline(1,'$'),'\ '))}
For text with CJK characters strwidth() is not good enough, since it returns a display cell count, not a character count. If double-width characters are part of the requirement, use this improved version instead:
:set statusline+=\ %{strlen(substitute(join(getline(1,'$'),'.'),'.','.','g'))}
But be aware that the expression is evaluated on every single change to the buffer.
See :h 'statusline'.
Sunday afternoon bonus – The character position under the cursor can also be packed into a single expression. Not for the faint of heart:
:set statusline+=\ %{strlen(substitute(join(add(getline(1,line('.')-1),strpart(getline('.'),0,col('.')-1)),'.'),'.','.','g'))+1}
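Where strchars() is available, the counting could also be moved into a small function, which keeps the statusline expression readable. This is a sketch under assumptions: CharCountSketch is a hypothetical name, each line break is counted as one character, and strchars() by default counts composing characters separately.

```vim
" Hypothetical helper: number of characters (not bytes, not display cells)
" in a list of lines, counting one character per line break.
function! CharCountSketch(lines) abort
  return strchars(join(a:lines, "\n"))
endfunction

" Show the count for the current buffer in the status line.
set statusline+=\ %{CharCountSketch(getline(1,'$'))}\ chars
```

Like the one-liners above, this walks the whole buffer each time the status line is evaluated, so it may be slow on very large files.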
By combining glts's answer with this post and a bit of fiddling, I came up with the following for myself. Put it in your ~/.vimrc file. The word and character counts are recalculated only after the cursor has been idle for one second; change that delay by modifying set updatetime=1000:
let g:word_count = "<unknown>"
let g:char_count = "<unknown>"
function WordCount()
  return g:word_count
endfunction
function CharCount()
  return g:char_count
endfunction
function UpdateWordCount()
  let lnum = 1
  let n = 0
  while lnum <= line('$')
    let n = n + len(split(getline(lnum)))
    let lnum = lnum + 1
  endwhile
  let g:word_count = n
  let g:char_count = strlen(substitute(join(getline(1,'$'),'.'),'.','.','g'))
endfunction
" Update the count when the cursor is idle in normal or insert mode.
" Update when idle for 1000 msec (default is 4000 msec).
set updatetime=1000
augroup WordCounter
  au! CursorHold,CursorHoldI * call UpdateWordCount()
augroup END
" Set statusline, shown here a piece at a time
highlight User1 ctermbg=green guibg=green ctermfg=black guifg=black
set statusline=%1* " Switch to User1 color highlight
set statusline+=%<%F " file name, cut if needed at start
set statusline+=%M " modified flag
set statusline+=%y " file type
set statusline+=%= " separator from left to right justified
set statusline+=\ %{WordCount()}\ words,
set statusline+=\ %{CharCount()}\ chars,
set statusline+=\ %l/%L\ lines,\ %P " percentage through the file

Indenting fold text

When you unfold nested levels of your code, the fold text of the nested folds is not indented. It begins at the start of the line with + instead of being indented.
Do you know how to change it?
If you want the fold text to be indented at the same level as the first line of the fold, you need to prepend the indent level to the foldtext:
function! MyFoldText()
  let indent_level = indent(v:foldstart)
  let indent = repeat(' ',indent_level)
  ...
  ...
  return indent . txt
endfunction
Here I am assuming that the string txt is your existing foldtext, so all you need to do is add it to the end of indent.
But I am not sure if that is what you want to achieve.
EDIT:
Now I have seen your picture, I'm not sure if this is what you want. You could try stripping the leading whitespace before appending to the +. So the foldtext you want will be something like indent . '+' . txt.
Maybe.
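To make the idea concrete, here is one possible sketch of an indented fold text. The function names and the exact layout ("+ text: N lines") are assumptions made for illustration, not the asker's actual foldtext.

```vim
" Hypothetical helper: build the fold text from the first line of the fold,
" its indent width in columns, and the number of folded lines.
function! IndentedFoldTextFor(first_line, indent_cols, nlines) abort
  let l:indent = repeat(' ', a:indent_cols)
  " Strip the original leading whitespace so the '+' lands at the indent.
  let l:text = substitute(a:first_line, '^\s*', '', '')
  return l:indent . '+ ' . l:text . ': ' . a:nlines . ' lines '
endfunction

" Hypothetical wrapper that feeds in the fold variables.
function! IndentedFoldTextSketch() abort
  let l:nlines = v:foldend - v:foldstart + 1
  return IndentedFoldTextFor(getline(v:foldstart), indent(v:foldstart), l:nlines)
endfunction

set foldtext=IndentedFoldTextSketch()
```

Splitting the string building from the v:foldstart lookup is only a convenience: the helper can be tried from the command line with :echo, without any fold being open.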
Aha
You might want to comment out this function in your .vimrc:
set foldtext=MyFoldText()
set fillchars=fold:_
This is what makes your fold text appear non-default, by using this function:
function! MyFoldText()
  " setting fold text
  let nl = v:foldend - v:foldstart + 1
  let comment = substitute(getline(v:foldstart),"^ *\" *","",1)
  let linetext = substitute(getline(v:foldstart+1),"^ *","",1)
  let txt = '+ ' . comment . ': ' . nl . ' ' . v:foldstart . ' '
  return txt
endfunction
As it happens, I quite like that function, but of course, de gustibus...

Resources