I use AI-powered summarization from https://github.com/huggingface/transformers/tree/master/examples/summarization, which gives state-of-the-art results.
Should I train it myself to get summary output longer than what the original Hugging Face GitHub training script uses?
python run_summarization.py \
--documents_dir $DATA_PATH \
--summaries_output_dir $SUMMARIES_PATH \ # optional
--no_cuda false \
--batch_size 4 \
--min_length 50 \
--max_length 200 \
--beam_size 5 \
--alpha 0.95 \
--block_trigram true \
--compute_rouge true
When I run inference with
--min_length 500 \
--max_length 600 \
I get good output for the first 200 tokens, but the rest of the text is
. . . [unused7] [unused7] [unused7] [unused8] [unused4] [unused7] [unused7] [unused4] [unused7] [unused8]. [unused4] [unused7] . [unused4] [unused8] [unused4] [unused8]. [unused4] [unused4] [unused8] [unused4] . . [unused4] [unused6] [unused4] [unused7] [unused6] [unused4] [unused8] [unused5] [unused4] [unused7] [unused4] [unused4] [unused7]. [unused4] [unused6]. [unused4] [unused4] [unused4] [unused8] [unused4] [unused7] [unused4] [unused8] [unused6] [unused4] [unused4] [unused4]. [unused4]. [unused5] [unused4] [unused8] [unused7] [unused4] [unused7] [unused9] [unused4] [unused7] [unused4] [unused7] [unused5] [unused4] [unused5] [unused4] [unused6] [unused4]. . . [unused5]. [unused4] [unused4] [unused4] [unused6] [unused5] [unused4] [unused4] [unused6] [unused4] [unused6] [unused4] [unused4] [unused5] [unused4]. [unused5] [unused4] . [unused4] [unused4] [unused8] [unused8] [unused4] [unused7] [unused4] [unused8] [unused4] [unused7] [unused4] [unused8] [unused4] [unused8] [unused4] [unused6]
The short answer is: Yes, probably.
To explain this in a bit more detail, we have to look at the paper behind the implementation: in Table 1, you can clearly see that most of their generated headlines are much shorter than what you are trying to generate. While that alone might not be an indicator that you couldn't generate anything longer, we can go even deeper and look at the meaning of the [unusedX] tokens, as described by BERT dev Jacob Devlin:
Since [the [unusedX] tokens] were not used they are effectively randomly initialized.
Further, the summarization paper describes:
Position embeddings in the original BERT model have a maximum length
of 512; we overcome this limitation by adding more position
embeddings that are initialized randomly and fine-tuned with other
parameters in the encoder.
This is a strong indicator that past a certain length, they are likely falling back to the default initialization, which is unfortunately random. The question is whether you can still salvage the previous pre-training, and simply fine-tune to your objective, or whether it is better to just start from scratch.
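To make the quoted passage concrete, here is a toy sketch in plain Python of what "adding more position embeddings that are initialized randomly" amounts to. It is not the paper's code or the Hugging Face implementation; the function name, the extended length of 1024, the embedding size of 4, and the standard deviation of 0.02 are all assumptions chosen for illustration.

```python
import random

def extend_position_embeddings(pretrained, new_length, dim, seed=0):
    """Copy the pretrained rows and append randomly initialized rows.

    Hypothetical helper for illustration only: rows beyond len(pretrained)
    carry no learned information until they are fine-tuned.
    """
    rng = random.Random(seed)
    extended = [row[:] for row in pretrained]  # keep the pretrained rows as-is
    while len(extended) < new_length:
        # Assumed initialization: small Gaussian noise (std 0.02 is a guess)
        extended.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return extended

dim = 4
# Stand-in for a pretrained table with 512 positions
pretrained = [[float(i)] * dim for i in range(512)]
extended = extend_position_embeddings(pretrained, 1024, dim)
```

Positions 0 to 511 keep their pretrained values, while everything from position 512 onward starts out as noise and is only as good as the fine-tuning it later receives.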
I would like to keep only the first pattern (the "Important#") on each line of the text.
Input:
Im port ant1 RandomJunk
Imp ortan t4 Lorum ipsum
Imp ort ant5 dolor sit amet, conse
I mport ant3 Aliquam vel nibh diam
Impo rtant 1 dignissim vel nisi vitae
Imp orta nt9 dui ut posuere rhoncu
Output:
Im port ant1
Imp ortan t4
Imp ort ant5
I mport ant3
Impo rtant 1
Imp orta nt9
I am new to Vim and don't really understand :%s or :s.
Please help.
Assuming that you need to delete everything after the 12th character on each line,
you can use Vim's search and replace:
Overall command structure:
:s/searchThis/replaceWithThis
This is a command to do the job:
:%s/\(.\{12}\).*/\1
Explanation:
% - apply the search and replace to every line in the file
\(.\{12}\).* - search part
\(.\{12\}\).* is a regex; (, ), {, } are escaped with \ so that Vim's regex engine treats them as special characters (the backslash before } is optional in Vim). Without escaping it looks like this: (.{12}).*.
(.{12}) - capturing group, matches the first 12 characters of the line, whatever they are
.* - any character, zero or more times
\1 - replace part; it means keep only the first captured group after matching.
In our case that is the first 12 characters of the line.
More thorough explanation on search and replace in vim wiki
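If you want to check the capture-group idea outside Vim, the same pattern can be sketched with Python's re module, where parentheses and braces are special without a backslash. The function name keep_first_12 is made up for this example, and the ^ anchor is added here only to make the intent explicit.

```python
import re

def keep_first_12(line: str) -> str:
    # (.{12}) captures the first 12 characters; .* matches the rest of the line.
    # \1 in the replacement keeps only the captured group.
    # Lines shorter than 12 characters do not match and are returned unchanged.
    return re.sub(r"^(.{12}).*", r"\1", line)

lines = [
    "Im port ant1 RandomJunk",
    "Imp ortan t4 Lorum ipsum",
    "Impo rtant 1 dignissim vel nisi vitae",
]
trimmed = [keep_first_12(line) for line in lines]
```

Note that this relies on the "Important#" part always being exactly 12 characters long, which holds for the sample input in the question.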
Use the Jedi powers (regex) of Vim:
:%s/\d \zs.*//
Read more at --> :h \zs
:%norm 3whD
% ................ in the whole file
norm ............. in normal mode
3w ............... jump 3 words
h ................ move one char to the left
D ................ delete the rest of the line
I want to achieve one effect in my Tcl/Tk tool: in the text widget, depending on a specific condition, I want to highlight the background color of some lines, while the other lines stay normal and transparent. Is it possible?
I have tried some options like -highlightbackground, -insertbackground and so on, but none of them does this.
If this is impossible, how can I change the color of a specific line's text? That would also be a workaround.
I want to highlight the background color of some lines, while the other lines stay normal and transparent. Is it possible?
Yes. You do it by setting a tag on the text concerned. You can then configure the text with that tag to look how you want it to, which can include changing the font, the foreground colour and the background colour. Tags are set either when you insert the text, or using the tag add method. Tags are configured with the tag configure method.
# Make a text widget and put some text in it
pack [text .t -height 10 -width 40]
.t insert 1.0 "This is an example of tagging."
# Set some tags; they have no style as yet
.t tag add foo 1.5 1.10
.t tag add bar 1.15 1.20
# Configure the tags so that we can see them
.t tag configure foo -font {Times 16 {bold italic}}
.t tag configure bar -foreground yellow -background blue
Note that the selection is actually a special tag, sel. You can configure it however you want, but the text widget's class bindings control where it is applied to in response to user actions.
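The example above tags character ranges; for the whole-line case in the question, the same tag mechanism can be sketched through Python's tkinter binding to Tk, whose tag_add and tag_configure correspond to tag add and tag configure. The helper name matching_line_ranges, the tag name warn, and the "starts with error" condition are assumptions made up for this sketch.

```python
def matching_line_ranges(lines, should_highlight):
    """Return (start, end) Tk text indices for each line that satisfies the condition."""
    ranges = []
    for number, line in enumerate(lines, start=1):  # Tk text lines are 1-based
        if should_highlight(line):
            # From column 0 of this line to column 0 of the next one, so the
            # tag also covers the line's newline, not just its characters.
            ranges.append((f"{number}.0", f"{number + 1}.0"))
    return ranges

def demo():
    # Needs a display; imported here so the helper above works without Tk.
    import tkinter as tk
    root = tk.Tk()
    text = tk.Text(root, height=10, width=40)
    text.pack()
    lines = ["ok: started", "error: disk full", "ok: retrying", "error: gave up"]
    text.insert("1.0", "\n".join(lines))
    text.tag_configure("warn", background="yellow")
    for start, end in matching_line_ranges(lines, lambda s: s.startswith("error")):
        text.tag_add("warn", start, end)
    root.mainloop()
```

Calling demo() should show the two "error" lines with a yellow background. In Tcl the equivalent calls would be .t tag add warn 2.0 3.0 and .t tag configure warn -background yellow.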
You can use the Tk widget demo to help you, and more specifically the Search tool. Here I took the most essential parts of it with some edits to simplify it:
package require Tk
# proc to highlight
proc textSearch {w string tag} {
    # Remove the tag from the whole text
    $w tag remove $tag 1.0 end
    # If string empty, do nothing
    if {$string eq ""} {return}
    # Current position of 'cursor' at first line, before any character
    set cur 1.0
    # Search through the text; for each match, apply the tag
    while 1 {
        set cur [$w search -count length -- $string $cur end]
        if {$cur eq ""} {break}
        $w tag add $tag $cur "$cur + $length char"
        set cur [$w index "$cur + $length char"]
    }
    # For all the tagged text, apply the below settings
    $w tag configure $tag -background blue -foreground white
}
# Window set ups
text .text -yscrollcommand ".scroll set" -setgrid true
scrollbar .scroll -command ".text yview"
frame .string
label .string.label -text "Search string:" -width 13 -anchor w
entry .string.entry -width 40 -textvariable searchString
button .string.button -text "Highlight" \
-command "textSearch .text \$searchString search"
pack .string.label .string.entry -side left
pack .string.button -side left -pady 5 -padx 10
bind .string.entry <Return> "textSearch .text \$searchString search"
pack .string -side top -fill x
pack .scroll -side right -fill y
pack .text -expand yes -fill both
.text insert 1.0 \
{Lorem ipsum dolor sit amet, consectetur adipiscing elit. Fusce aliquet, neque at sagittis vulputate, felis orci posuere sapien, a tempus purus diam id tellus. Quisque volutpat pretium iaculis. Mauris nibh ex, volutpat id ligula sit amet, ullamcorper lobortis orci. Aliquam et erat ac velit auctor bibendum. Aliquam erat volutpat. Maecenas fermentum diam sed convallis fermentum. Maecenas ultricies nisi mauris, ac lacinia lacus sollicitudin eget. Mauris eget euismod nisi, sed suscipit est.}
I am using a function that I got from Vimcasts to preserve the cursor position when executing a command in Vim:
" A command to preserve last search and cursor position after running another
" command. See: http://vimcasts.org/episodes/tidying-whitespace/
function! Preserve(command)
  " Preparation: save last search, and cursor position.
  let _s=@/
  let l = line(".")
  let c = col(".")
  " Do the business:
  execute a:command
  " Clean up: restore previous search history, and cursor position
  let @/=_s
  call cursor(l, c)
endfunction
" Strip trailing whitespace
nmap <Leader>$ :call Preserve("%s/\\s\\+$//e")<CR>
It works pretty well for the strip trailing whitespace mapping I've shown here, but not when I'm calling an external command like this:
" Reformat a plain text document to use hard wrapping and uniform spacing
" Note: This uses the BSD `fmt` program. The GNU coreutils version takes
" different options.
nmap <Leader>f :call Preserve("%!fmt -s -78")<CR>
vnoremap <Leader>f :call Preserve("'<,'>!fmt -s -78")<CR>
The first mapping works fine, but the second one exhibits a strange looping behavior. For example, if I have a text file like this:
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut laboret dolore magna aliqua. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
Sed ut perspiciatis unde omnis iste natus error sit. Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?
At vero eos et accusamus et iusto odio dignissimos ducimus qui blanditiis praesentium voluptatum deleniti, similique sunt in culpa qui officia deserunt mollitia animi, id est laborum et dolorum fuga. Itaque reiciendis voluptatibus maiores alias consequatur aut perferendis doloribus asperiores repellat.
When I select those lines in visual mode to filter them, the command appears to run five times. Here's what I'm seeing in the output:
5 lines filtered
5 lines filtered
5 lines filtered
5 lines filtered
5 lines filtered
Press ENTER or type command to continue
If the file has 10 lines, then they are filtered 10 times. It still filters the area correctly, but I'm confused as to why it's looping. I think it has something to do with the Preserve function because running the command outside of preserve doesn't exhibit the looping.
Note: I think this is the appropriate place for this question, but the closing of the Vi/Vim proposal leaves me wondering where I should really be posting a question like this. Please let me know if there's a more appropriate forum for it.
When you call a function on a multi-line visual selection, that function is called for each line in the selection. Since your visual selection covers 5 lines the Preserve() function and the command you passed to it are called 5 times.
The solution is simple: add the range argument to the function definition:
function! Preserve(command) range
With that argument, the function is called only once and you can let it or the underlying command deal with the visual range itself.
See :help func-range.
Another – slightly dirtier – solution could be to modify your mappings to remove the range before calling the function so that it is called only once:
map <key> :<C-u>call Function(args)<CR>
See :help c_ctrl-u.
I am trying to create an autocommand that will create boilerplate comments and code for new Java source files. As a simple start, I have added the following two lines (only a new line after the first line below in the actual file) to my .vim/ftplugin/java.vim:
autocmd BufNewFile *.java
\ exe "normal O/*\r" . expand('%:t') . "\t" . strftime("%B %d %Y") .
"\r/\r\rpublic class " . expand('%:t:r') . " {\r\t\<Esc>i"
With the last part, \t\<Esc>i, I am trying to insert a tab and switch to insert mode automatically. I can't make the switch to insert mode work and have tried different permutations of two or more of \<Esc>, \<Insert>, "insert", i and \t. What am I missing?
I am using Vim 7.2 on Linux.
You could use the :startinsert command. Just execute it after the :normal command:
autocmd! BufNewFile *.java
\ exe "normal O/*\r" . expand('%:t') . "\t" . strftime("%B %d %Y") .
\ "\r/\r\rpublic class " . expand('%:t:r') . " {\r\t" |
\ startinsert!
Here's some more information on that: http://vimdoc.sourceforge.net/htmldoc/insert.html#:startinsert.
I've been attempting to follow the instructions on the Vim wiki to get the matchit plugin working with ColdFusion (*.cfm) files containing both ColdFusion and HTML tags running on MacVim.
I've got the syntax file for ColdFusion (cf.vim) installed in $HOME/.vim/syntax/cf.vim, the latest version of matchit installed in .vim/plugin/matchit.vim, and I've added the following block to the end of matchit.vim:
au FileType html,jsp,php,cf if !exists("b:match_words") |
I also added the following line to the end of my $HOME/.vimrc file:
filetype plugin on
Finally I added the suggested block to the end of cf.vim:
" Only do this when not done yet for this buffer
if exists("b:did_ftplugin")
finish
endif
" Don't load another plugin for this buffer
let b:did_ftplugin = 1
if exists("loaded_matchit")
let b:match_words = '<cfif\>.\{-}>\|<cfif\>.\{-}$:'
\ . '<cfelseif\>.\{-}>\|<cfelseif\>.\{-}$:'
\ . '<cfelse\>.\{-}>\|<cfelse\>.\{-}$:'
\ . '<\/cfif>,'
\ . '<cfloop\>.\{-}>\|<cfloop\>.\{-}$:'
\ . '<\/cfloop\>.\{-}>,'
\ . '<cfoutput\>.\{-}>\|<cfoutput\>.\{-}$:'
\ . '<\/cfoutput\>.\{-}>,'
\ . '<cftimer\>.\{-}>\|<cftimer\>.\{-}$:'
\ . '<\/cftimer\>.\{-}>,'
\ . '<!---:--->,'
\ . '<cfquery\>.\{-}>\|<cfquery\>.\{-}$:<\/cfquery\>.\{-}>,'
\ . '<cfscript>:<\/cfscript>'
" Since we are counting things outside of comments only,
" It is important we account comments accurately or match_words
" will be wrong and therefore useless
syntax sync fromstart
endif " exists("loaded_matchit")
However when I press the % key to jump to the matching tag it only half works, based on the file extension. If the file has a .cfm extension I can jump from <cfif> to </cfif> but not <body> to </body> for example. The situation is reversed if the extension is .html.
However, looking at the code for cf.vim, it appears that it should work with ColdFusion and HTML tags mixed in the same file:
" Inherit syntax rules from the standard HTML syntax file
if version < 600
source <sfile>:p:h/html.vim
else
runtime! syntax/html.vim
endif
On a related note I added:
let b:match_ignorecase = 1
to $HOME/.vimrc to disable case sensitivity as stated in the documentation, but it still only works with cfif and not CFIF for example.
I did something similar for the Django template language. I just added the HTML expressions to the b:match_words list, e.g. (note the first three non-Django-looking expressions):
if exists("loaded_matchit")
  let b:match_ignorecase = 1
  let b:match_skip = 's:Comment'
  let b:match_words = '<:>,' .
  \ '<\@<=[ou]l\>[^>]*\%(>\|$\):<\@<=li\>:<\@<=/[ou]l>,' .
  \ '<\@<=dl\>[^>]*\%(>\|$\):<\@<=d[td]\>:<\@<=/dl>,' .
  \ '<\@<=\([^/][^ \t>]*\)[^>]*\%(>\|$\):<\@<=/\1>,' .
  \ '{% *if .*%}:{% *else *%}:{% *endif *%},' .
  \ '{% *ifequal .*%}:{% *else *%}:{% *endifequal *%},' .
  \ '{% *ifnotequal .*%}:{% *else *%}:{% *endifnotequal *%},' .
  \ '{% *ifchanged .*%}:{% *else *%}:{% *endifchanged *%},' .
  \ '{% *for .*%}:{% *endfor *%},' .
  \ '{% *with .*%}:{% *endwith *%},' .
  \ '{% *comment .*%}:{% *endcomment *%},' .
  \ '{% *block .*%}:{% *endblock *%},' .
  \ '{% *filter .*%}:{% *endfilter *%},' .
  \ '{% *spaceless .*%}:{% *endspaceless *%}'
endif
Those three expressions cover all of HTML/XML, so obviously whoever came up with them knows a lot more about Vim regex than I do.
I'd suggest submitting your code to the vim.org cf.vim maintainer if there is no matchit support in the syntax files for ColdFusion.