Print all pairs of values in column of tsv file via bash - linux

I want to write a small shell script to make a labeling task easier for myself. Alas, I'm still lacking the skills to do so.
I've got a news.tsv file which looks like this:
id foo bar text
1 a b lorem
2 c d ipsum
...
50 e f muspi
Where the actual entries in the text column are lengthy news articles. I want to print 2 of these texts at once, until all possible pairs in the column have been printed.
Searching for a solution, I found that awk might be the right tool for the task. I know how to print two specific entries in the text column, e.g.,
awk -F '\t' 'NR==2 {print $4} NR==3 {print $4}' news.tsv
will print lorem and ipsum. For getting all pairs, I think I'll need a nested for-loop, but I fail at implementing it with awk.
My spaghetti-try looks like this:
awk -F '\t' '{for (i=0; i<50; i++){for (j=i+1; j<50; j++) if(i!=j){NR==i {print $4} NR==j {print $4}}}}' news.tsv
I'm open to other tools as well.

Is this what you want?
awk 'NR>1 {print $4; next; print $4}'

Here's an attempt based on your own script ... assuming that there are no embedded TABs, ever.
$ cat awkward
NR>1{
a[NR-1]=$4
}
END{
for(i=1;i<=50;i++){
for(j=1;j<=50;j++){
if(i!=j){
print a[i],a[j]
}
}
}
}
Invoked like so:
$ awk -f awkward test.tsv
Lorem ipsum
Lorem dolor
Lorem sit
Lorem amet,
Lorem consectetur
Lorem adipiscing
Lorem elit,
Lorem sed
Lorem do
Lorem eiusmod
.
.
[over 2000 lines stripped]
.
.
fugiat dolor
fugiat in
fugiat reprehenderit
fugiat in
fugiat voluptate
fugiat velit
fugiat esse
fugiat cillum
fugiat dolore
fugiat eu

If the total amount of text is not too large, it is efficient to
store all the texts in an array. Otherwise you would need to read the input file
multiple times (once per combination).
Then how about:
awk -F '\t' '
NR>1 {texts[++n]=$4}
END {
for (i=1; i<n; i++) {
for (j=i+1; j<=n; j++) {
print texts[i] " " texts[j]
}
}
}' news.tsv
Hope this helps.
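Since the question is open to other tools, here is a minimal Python sketch of the same idea: read the text column into a list, then let itertools.combinations produce each unordered pair once. The function name text_pairs and the three-row sample are made up for illustration; quoting is disabled on the assumption that the articles may contain quote characters but never embedded tabs.

```python
import csv
import io
from itertools import combinations


def text_pairs(tsv_file, column="text"):
    """Yield every unordered pair of values from one column of a TSV file."""
    # QUOTE_NONE treats quote characters as ordinary text, like awk -F '\t' does.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    texts = [row[column] for row in reader]
    # combinations() yields each pair exactly once, in input order.
    yield from combinations(texts, 2)


# Hypothetical three-row sample standing in for news.tsv.
sample = io.StringIO(
    "id\tfoo\tbar\ttext\n"
    "1\ta\tb\tlorem\n"
    "2\tc\td\tipsum\n"
    "3\te\tf\tmuspi\n"
)
pairs = list(text_pairs(sample))
for first, second in pairs:
    # Print the two texts of a pair on separate lines, then a blank line.
    print(first, second, sep="\n", end="\n\n")
```

For the real file, the sample could be replaced with open("news.tsv", newline="", encoding="utf-8"); the encoding is an assumption.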


Chapter titles in the Header of a document generated by Asciidoctor-PDF

I need to add the chapter title to the page header of PDF files generated with Asciidoctor-PDF.
Here is the set of Properties I'm using at the beginning of my doc:
= Book title
:notitle:
:toc: left
:toclevels: 8
:sectnums:
:sectnumlevels: 8
:source-highlighter: coderay
:icons: font
:chapter-label:
:header_recto_content_center: '{section-title}'
Is there any property I am missing or which conflicts with the Header generation?
A few theme-related attributes exist. But header is not one of them unfortunately.
So to style the header of a PDF one must resort to a custom style in YAML format.
Style
For this example the file is named style.yml and placed in our working directory.
extends: default #1
header:
  height: 15mm #2
  recto: &header #3
    center-content: '-- {section-or-chapter-title} --' #4
  verso: *header #5
#1: Extends the default asciidoctor-pdf theme with your customized code.
#2: Defines the height of the header; without it the content will not show. This is also explained here.
#3: Creates a YAML anchor to the whole content of recto.
#4: The use of {section-or-chapter-title} is explained and shown at the end of the post.
#5: References the content from &header. In plain terms, this makes verso behave the same as recto.
Document
The content of the adoc file is as shown below. The file is aptly named book.adoc and placed in the same directory as style.yml.
= Book title
:notitle:
:toc: left
:toclevels: 8
:sectnums:
:sectnumlevels: 8
:source-highlighter: coderay
:icons: font
:chapter-label:
== Chapter One
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy
eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam
voluptua.
== Chapter Two
=== Section One
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy
eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam
voluptua.
Conversion
PDF conversion is done using the pdf-stylesdir and pdf-style attributes.
asciidoctor-pdf -a pdf-stylesdir=. -a pdf-style=style.yml -d book book.adoc
Result
Chapter one then looks as shown in the screenshot. As it has no section heading, the chapter title is used to populate the header.
Chapter two, which has both a chapter and a section defined, displays the section title in the header.

Using grep to search for a word in a very long string

I have a huge file containing a single long string. I need to search for a specific word in that file. Of course I cannot use gedit or similar software because they choke. So, a solution could be grep. The problem is that it prints the full string to the shell if the word matches, so I cannot see where the word is located or observe the neighboring words.
Is there any particular option to pass in order to stop/pause the grep shell stream (e.g., a certain number of chars after the match) as soon as it finds my word?
Use the -o option to "Show only the part of a matching line that matches PATTERN."
Example:
% cat lorem
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
% grep -oE '.{20}fugiat.{20}' lorem
se cillum dolore eu fugiat nulla pariatur. Exc
Edit: @tripleee suggested the -E part, where .{20} gives 20 characters of padding on either side of the match.
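If grep is not at hand, the same padded-match idea can be sketched in Python. This is only an illustration: find_with_context is a made-up helper, and it assumes the whole string fits in memory. Unlike the .{20} pattern above, it uses .{0,20}, so a word closer than 20 characters to either end of the string is still reported.

```python
import re


def find_with_context(text, word, width=20):
    """Return each non-overlapping occurrence of word, padded with up to
    `width` characters of context on either side."""
    pattern = re.compile(
        r".{0,%d}%s.{0,%d}" % (width, re.escape(word), width),
        re.DOTALL,  # let the padding span newlines, if there are any
    )
    return [match.group(0) for match in pattern.finditer(text)]


# Excerpt of the lorem file from the example above.
sample = "velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint"
for hit in find_with_context(sample, "fugiat"):
    print(hit)
```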
Use the -m NUM, --max-count=NUM option:
$ grep -m 1 [pattern] [/path/to/file]
Stop reading a file after NUM matching lines. If the input is standard input from a regular file, and NUM matching lines are output, grep ensures that the standard input is positioned to just after the last matching line before exiting, regardless of the presence of trailing context lines. This enables a calling process to resume a search. When grep stops after NUM matching lines, it outputs any trailing context lines. When the -c or --count option is also used, grep does not output a count greater than NUM. When the -v or --invert-match option is also used, grep stops after outputting NUM non-matching lines.

Quickly send long text in AutoHotkey

I've been trying to figure out how to insert/expand long text faster. The current keystroke method I'm using is quite time consuming and therefore something I would rather avoid.
Right now I am using the following method:
::abc::all bad cats
Or for longer text:
::li::
Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
However the second method is a little slow.
Any suggestions for how I can avoid this slow expansion method? Perhaps by using the clipboard to copy and paste from the AHK script?
Try this:
::li::
ClipSaved := ClipboardAll ; save the entire clipboard to the variable ClipSaved
clipboard := "" ; empty the clipboard (start off empty to allow ClipWait to detect when the text has arrived)
clipboard = ; copy this text:
(
Lorem ipsum dolor ...
line2
..
)
ClipWait, 2 ; wait max. 2 seconds for the clipboard to contain data.
if (!ErrorLevel) ; If NOT ErrorLevel, ClipWait found data on the clipboard
Send, ^v ; paste the text
Sleep, 300
clipboard := ClipSaved ; restore original clipboard
ClipSaved = ; Free the memory in case the clipboard was very large.
return
https://autohotkey.com/docs/misc/Clipboard.htm
Another way to send long text quickly with AutoHotkey is to
store all the text in the Windows Registry first.
You can then read it from the registry into the clipboard and paste it, and it is done.
You can try this code:
Example1.ahk:
; [^ = Ctrl] [+ = Shift] [! = Alt] [# = Win]
#SingleInstance Force
;Variant 1
;put any clipboard text into variable a1
;send ^c
;a1=%clipboard%
;Variant 2
;or put any Variable to variable a1
;a1 := "Any Text"
;Variant 3
;or put any File text to variable a1
FileRead, a1, C:\My File1.txt
FileRead, a2, C:\My File2.txt
;Write the variable's To Registry - KeyHintText,value1 to value2
RegWrite, REG_SZ, HKEY_CURRENT_USER, software\KeyHintText,value1,%a1%
RegWrite, REG_SZ, HKEY_CURRENT_USER, software\KeyHintText,value2,%a2%
:*:abc::
clipboard := a1
send ^v
return
:*:def::
clipboard := a2
send ^v
return
Note - :*:abc:: fires as soon as you type abc (no ending character such as Space needed) - ::abc:: fires only after you type abc followed by an ending character such as Space
If you use Windows Registry then the pros are :
1 - If the Value1 Text In the Registry is CHANGED, you use it with the same Hotkey :*:abc::
:*:abc::
RegRead, clipboard, HKEY_CURRENT_USER,software\KeyHintText,value1
send ^v
return
2 - If you restart the computer, the text is still there, because registry values persist across restarts.
After that, this code alone is enough:
example2.ahk
; [^ = Ctrl] [+ = Shift] [! = Alt] [# = Win]
#SingleInstance Force
;read the Variable's From Windows Registry
RegRead, a1, HKEY_CURRENT_USER,software\KeyHintText,value1
RegRead, a2, HKEY_CURRENT_USER,software\KeyHintText,value2
:*:abc::
clipboard := a1
send ^v
return
:*:def::
clipboard := a2
send ^v
return
Tip: I use it with the Buttoncommander software. You can then create a set of clickable pictures (toolbars) on the Windows desktop, for example replacing the abc text with pictures. If you click or tap these images, they execute the AutoHotkey commands natively.
::li::
text =
(
Line1
Line2
...
)
; IfWinActive, ahk_group textEditors ; create a group in the auto execute section
SendInput, %text% ; SendInput is faster and more reliable
return
or
::li::
; IfWinActive, ahk_group textEditors
SendInput,
(
Line1
Line2
...
)
return

VIM: Insert a line number, with a space after

I need to insert the line number before each line of text using Vim, and there has to be a space after the line number. For example, if this was TestFile:
Lorem ipsum dolor sit amet, consectetuer adipiscing elit.
Morbi nunc enim, vehicula eget, ultricies vel, nonummy in, turpis.
It should look like this
1 Lorem ipsum dolor sit amet, consectetuer adipiscing elit.
2 Morbi nunc enim, vehicula eget, ultricies vel, nonummy in, turpis.
I have been using the command :%s/^/\line('.')/ with a number of variations, but I cannot figure out how to get the space at the end.
Any ideas?
You were very close!
This substitution will do the job by concatenating the string ' ' to the line number:
%s!^!\=line('.').' '!
This is probably easiest with an external tool:
:%!nl -ba -w1 -s' '
You can use a macro. First make sure you have a 0 before the first line and have your cursor placed on it:
0 Lorem ipsum dolor sit amet, consectetuer adipiscing elit.
Morbi nunc enim, vehicula eget, ultricies vel, nonummy in, turpis.
foo
bar
etc...
Then perform this key sequence to store the right macro in register a: qaywjP0<C-A>q.
Now press @a to execute the macro. Prefix it with a count to execute it multiple times.
Type :help q to find out more about recording macros.
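For comparison outside Vim, the numbering that these answers produce (line number, one space, original text) can be sketched in a few lines of Python. The helper name number_lines is made up for illustration.

```python
def number_lines(lines, start=1, sep=" "):
    """Prefix each line with its number followed by a separator."""
    return [f"{number}{sep}{line}" for number, line in enumerate(lines, start)]


text = [
    "Lorem ipsum dolor sit amet, consectetuer adipiscing elit.",
    "Morbi nunc enim, vehicula eget, ultricies vel, nonummy in, turpis.",
]
for numbered in number_lines(text):
    print(numbered)
```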

What do the f and t commands do in Vim?

What do f and t commands do in vim and exactly how they work?
Your first stop with questions like these should be vim's internal help, :h f and :h t. However, in this case, those entries are a bit cryptic without an example. Suppose we had this line (^ = cursor position):
The quick brown fox jumps over the lazy dog.
^
These commands find characters on a line. So fb would place the cursor here:
The quick brown fox jumps over the lazy dog.
          ^
t is like f but places the cursor on the preceding character. So tb would give you:
The quick brown fox jumps over the lazy dog.
         ^
You can remember these commands as find and till. Also, you can prepend the commands with a number to move to the nth occurrence of that character. For example, 3fb would move to the third b to the right of the cursor. My example sentence only has one b though, so the cursor wouldn't move at all.
Just to add to Michael Kristofik's answer, no description of f or t is complete without also mentioning ;.
From this Vim cheat sheet:
; "Repeat latest f, t, F or T [count] times."
So, to continue @MichaelKristofik's theme:
The quick brown fox jumps over the lazy dog.
^
type fo to go to the first 'o':
The quick brown fox jumps over the lazy dog.
            ^
and then ; to go to the next one:
The quick brown fox jumps over the lazy dog.
                 ^
I find f and t very useful in combination with d and c. For example, ct: will let you replace everything from your cursor up to the next colon, but not delete the colon. You can remember it as "change to colon".
fx jumps to the next x on the line.
tx jumps to the character just before the next x on the line.
You can use Fx and Tx to reach the previous x.
You can use 2fx to jump to the second x on the line.
So f, F, t and T are useful when you want to go quickly to the next set of parentheses (f() or delete everything from the cursor to, but excluding, the previous = (dT=) and so on…
See :h motion.txt. It will blow your mind.
Since LondonRob mentioned ;, I guess a description of the comma , command is in order. It is used very much in conjunction with these commands (when the search overshoots).
After performing a search with f, F, t or T, one could use , to repeat the search in the opposite direction.
Let's say we are at the start of this sentence, and we would like to change the elot to elit.
Lorem ipsum dolor sit amet consectetur adipiscing elot sed do eiusmod tempor.
^
I know I have to replace an o, so I perform an fo (find o) immediately. The cursor is stuck at some early o in the line! Hit ; to repeat the search in the same direction. Type type type... I should have just done it five times, but let's say I overshoot and type ; six times instead. I end up here:
Lorem ipsum dolor sit amet consectetur adipiscing elot sed do eiusmod tempor.
 ^
Lorem ipsum dolor sit amet consectetur adipiscing elot sed do eiusmod tempor.
             ^
Lorem ipsum dolor sit amet consectetur adipiscing elot sed do eiusmod tempor.
               ^
Lorem ipsum dolor sit amet consectetur adipiscing elot sed do eiusmod tempor.
                            ^
Lorem ipsum dolor sit amet consectetur adipiscing elot sed do eiusmod tempor.
                                                    ^
Lorem ipsum dolor sit amet consectetur adipiscing elot sed do eiusmod tempor.
                                                            ^
Lorem ipsum dolor sit amet consectetur adipiscing elot sed do eiusmod tempor.
                                                                   ^
Now, one could just perform a , twice to repeat this search in the other direction. The cursor will reach the o in elot.
Lorem ipsum dolor sit amet, consectetur adipiscing elot, sed do eiusmod tempor.
                                                              ^
Lorem ipsum dolor sit amet, consectetur adipiscing elot, sed do eiusmod tempor.
                                                     ^
ri to finish the replacement.
As with most movement commands, , also takes a count: [count],.
From the manual:
Repeat latest f, t, F or T in opposite direction [count] times.
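To check the cursor positions in the walkthroughs above, here is a rough Python model of the motions. It is a simplified sketch, not Vim's implementation: find_char and till_char are made-up names, the cursor is a 0-based column, and ; and , are modelled simply as repeating the f search forward or backward.

```python
def find_char(line, cursor, ch, count=1, backward=False):
    """Model f (forward) and F (backward): return the column of the count-th
    occurrence of ch after/before cursor, or cursor if there are too few."""
    pos = cursor
    for _ in range(count):
        if backward:
            pos = line.rfind(ch, 0, pos)  # search strictly left of pos
        else:
            pos = line.find(ch, pos + 1)  # search strictly right of pos
        if pos == -1:
            return cursor  # the motion fails and the cursor stays put
    return pos


def till_char(line, cursor, ch, count=1):
    """Model t: stop one column before the character f would land on."""
    target = find_char(line, cursor, ch, count)
    return cursor if target == cursor else target - 1


fox = "The quick brown fox jumps over the lazy dog."
print(find_char(fox, 0, "b"), till_char(fox, 0, "b"))

line = "Lorem ipsum dolor sit amet consectetur adipiscing elot sed do eiusmod tempor."
pos = find_char(line, 0, "o")              # fo
for _ in range(6):                         # ; pressed six times (overshoot)
    pos = find_char(line, pos, "o")
overshoot = pos
pos = find_char(line, pos, "o", count=2, backward=True)  # 2, steps back twice
print(overshoot, pos, line[pos - 2:pos + 2])
```

In this model the cursor ends on the o of elot after the two backward repeats, which matches the walkthrough.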
