How to reverse each word in a text file with Linux commands without changing the order of words

There are lots of questions indicating how to reverse each word in a sentence, and I could readily do this in Python or JavaScript, for example, but how can I do it with Linux commands? It looks like tac might be an option, but it seems like that would reverse the lines as well as the words, rather than just the words. What other tools can do this? I literally have no idea. I know rev and tac and awk all seem like contenders...
So I'd like to go from:
cat dog sleep
pillow green blue
to:
tac god peels
wollip neerg eulb
Slight follow-up:
From this reference it looks like I could use awk to break each field up into an array of single characters and then write a for loop to manually reverse each word that way. This is quite awkward. Surely there's a better/more succinct way to do this?
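For reference, the manual awk version described above would look something like this (a sketch: it reverses each field in place with substr() and reprints the line):
awk '{ for (i = 1; i <= NF; i++) { w = ""; for (j = length($i); j > 0; j--) w = w substr($i, j, 1); $i = w }; print }' file.txt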

Try this on for size:
sed -e 's/\s\+/ /g' -e 's/ /\n/g' < file.txt | rev | tr '\n' ' ' ; echo
It collapses all the space and counts punctuation as part of "words", but it looks like it (at least mostly) works. Hooray for sh!
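A variant that also preserves the original line structure, assuming rev (from util-linux) and awk are available: rev reverses every character on each line, which reverses both the words and their order, and awk then prints the fields in reverse order to restore the word order:
rev file.txt | awk '{ for (i = NF; i > 1; i--) printf "%s ", $i; print $1 }'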

Related

I am trying to replace some text in gvim, for example, going from:
"word" -nothing
to:
word" - nothing
I tried
:%s/^.*\"/
But what I get is: -nothing
Well, I am new to scripting, so I would like to know if it can be done in some other way, using gvim or awk or sed.
In vim... Check for \(word + quote + space + hyphen\) as first reference, followed directly by another \(word\) as second reference... replace by first reference + space + second reference... Make sure the find/replace can happen multiple times on a line with g suffix.
:%s/\(\w" -\)\(\w\)/\1 \2/g
Note that I left out the leading quote... I suppose it is possible you might have spaces in the quoted text - and I think this form might be better for you. Now in sed, that is the really cool thing about the relationship between *nix tools - they all use a similar (or the same) regular expression pattern language. So, the exact same pattern as above can be done in sed (using : as the delimiter for clarity).
sed 's:\(\w" -\)\(\w\):\1 \2:g'
Awk doesn't do back references; so, not to say it can't be done, but it is not so convenient.
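One caveat to that: while sub() and gsub() replacements can't refer to capture groups, GNU awk's gensub() can, so in gawk the same substitution would look something like this (file name assumed):
gawk '{ print gensub(/(\w" -)(\w)/, "\\1 \\2", "g") }' Input_file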
Could you please try the following and let me know if this helps you.
awk '{sub(/^"/,"");sub(/-/,"- ")} 1' Input_file
2nd solution, with sed:
sed 's/^"//;s/-/- /' Input_file
Since you also tagged grep: GNU grep has the -P switch for PCRE (Perl-compatible regular expressions), which supports \K: keep the stuff left of the \K, but don't include it in the match. So:
$ echo \"word\" | grep -oP "\"\Kword\""
word"
If I understand your question correctly, you want to replace the first " in each line with an empty string. In sed it is just:
sed 's/"//'
Without the g flag it will replace only the first occurrence in each line.
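For example, with the sample line from the question:
$ echo '"word" -nothing' | sed 's/"//'
word" -nothing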
EDIT:
It works the same way in Vim (unless you have the 'gdefault' option set), so in Vim you can do:
:%s/"//
Try this:
:%s/\"\(.*\)\"/\1\"/gc

Embedding quotation marks in command string generated by AWK?

I need to match all instances of strings in one file, with a master list in another. However, if my string is abc I want only that, not abcdef, abc1234 and so on.
So, a word boundary for the regex? Right now, I'm using a simple awk one liner:
cat results_file | sort -k 1 | awk -F" " '{ print $1" /home/owner/file_2_search" }' |
xargs -L 1 /bin/grep -i
However, to force a word boundary, I'd need to grep string\b and the quotes (single or double) seem to be required.
In awk, \b is a special character, you need \\b ... And the quoted quotes ... (arg) ... Or am I missing something and overdoing this?
This is a Linux box, so presumably gawk. I have gone over the quoting rules for awk and realize this has got to be simple, but I am not seeing it.
Had meant to post as an answer, not a comment. Will try to pose a more readable question, but confess to having second thoughts about doing this as a one-liner in the first place -- may be best to follow an alternate method. Appreciate the willingness to help.
--Joe
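For what it's worth, one way to get the quoting right is to have awk print the escaped, quoted pattern itself, so that xargs strips the quotes and grep receives a single \bstring\b argument. A sketch, assuming GNU grep's \b and xargs's default quote handling (the path is the one from the question):
sort -k1,1 results_file |
awk '{ printf "\"\\b%s\\b\" /home/owner/file_2_search\n", $1 }' |
xargs -L 1 grep -i
Alternatively, grep -w (match whole words only) with -f pattern_file may avoid the quoting dance entirely, e.g. grep -iwf <(awk '{print $1}' results_file) /home/owner/file_2_search in bash.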

What are the differences among grep, awk & sed? [duplicate]

I am confused about the differences between grep, awk and sed in terms of their role in Unix/Linux system administration and text processing.
Short definition:
grep: search for specific terms in a file
#usage
$ cat file.txt
Every line containing "This"
Every line containing "This"
Every line containing "That"
Every line containing "This"
Every line containing "This"
$ grep This file.txt
Every line containing "This"
Every line containing "This"
Every line containing "This"
Every line containing "This"
Now awk and sed are completely different from grep.
awk and sed are text processors. Not only do they have the ability to find what you are looking for in text, they have the ability to remove, add and modify the text as well (and much more).
awk is mostly used for data extraction and reporting. sed is a stream editor.
Each one of them has its own functionality and specialties.
Example
Sed
$ sed -i 's/cat/dog/' file.txt
# this will replace the first occurrence of 'cat' on each line with 'dog'; add the g flag ('s/cat/dog/g') to replace every occurrence
Awk
$ awk '{print $2}' file.txt
# this will print the second column of file.txt
Basic awk usage:
Compute sum/average/max/min/etc. what ever you may need.
$ cat file.txt
A 10
B 20
C 60
$ awk '{sum += $2; count++} END {print "Average:", sum/count}' file.txt
Average: 30
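In the same spirit, a max over the second column might look like this (a sketch using the same file.txt):
$ awk 'NR == 1 || $2 > max { max = $2 } END { print "Max:", max }' file.txt
Max: 60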
I recommend that you read this book: Sed & Awk: 2nd Ed.
It will help you become a proficient sed/awk user in any unix-like environment.
Grep is useful if you want to quickly search for lines that match in a file. It can also return some other simple information like matching line numbers, match count, and file name lists.
Awk is an entire programming language built around reading CSV-style files, processing the records, and optionally printing out a result data set. It can do many things but it is not the easiest tool to use for simple tasks.
Sed is useful when you want to make changes to a file based on regular expressions. It allows you to easily match parts of lines, make modifications, and print out the results. It's less expressive than awk, but that makes it somewhat easier to use for simple tasks. It has many more complicated operators you can use (I think it's even Turing complete), but in general you won't use those features.
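To make the grep part concrete, the simple extras mentioned above look like this (using the file.txt from earlier):
$ grep -c This file.txt
4
$ grep -n That file.txt
3:Every line containing "That"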
I just want to mention one thing: there are many tools that can do text processing, e.g.
sort, cut, split, join, paste, comm, uniq, column, rev, tac, tr, nl, pr, head, tail.....
They are very handy, but you have to learn their options etc.
A lazy way (not the best way) to learn text processing might be to learn only grep, sed and awk. With these three tools, you can solve almost 99% of text processing problems and don't need to memorize the different commands and options above. :)
And once you've learned and used the three, you'll know the difference. Actually, the difference here means which tool is good at solving what kind of problem.
An even lazier way might be to learn a scripting language (Python, Perl or Ruby) and do all text processing with it.

Sorting on the last field of a line

What is the simplest way to sort a list of lines, sorting on the last field of each line? Each line may have a variable number of fields.
Something like
sort -k -1
is what I want, but sort(1) does not take negative numbers to select fields from the end instead of the start.
I'd also like to be able to choose the field delimiter.
Edit: To add some specificity to the question: The list I want to sort is a list of pathnames. The pathnames may be of arbitrary depth hence the variable number of fields. I want to sort on the filename component.
This additional information may change how one manipulates the line to extract the last field (basename(1) may be used), but does not change sorting requirements.
e.g.
/a/b/c/10-foo
/a/b/c/20-bar
/a/b/c/50-baz
/a/d/30-bob
/a/e/f/g/h/01-do-this-first
/a/e/f/g/h/99-local
I want this list sorted on the filenames, which all start with numbers indicating the order the files should be read.
I've added my answer below which is how I am currently doing it. I had hoped there was a simpler way - maybe a different sort utility - perhaps without needing to manipulate the data.
awk '{print $NF,$0}' file | sort | cut -f2- -d' '
Basically, this command does the following:
Repeat the last field at the beginning of each line, separated with a space (the default OFS)
Sort; ties between duplicated filenames are resolved using the full path ($0)
Cut off the repeated first field; f2- means from the second field to the last
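Note that with awk's default field separator, $NF on a pathname like /a/b/c/10-foo is the whole line (there are no spaces in it), so for the pathname example in the question the field separator needs to be /. A sketch, assuming the paths themselves contain no spaces:
awk -F'/' '{print $NF" "$0}' file | sort | cut -f2- -d' '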
Here's a Perl command line (note that your shell may require you to escape the $s):
perl -e "print sort {(split '/', $a)[-1] <=> (split '/', $b)[-1]} <>"
Just pipe the list into it or, if the list is in a file, put the filename at the end of the command line.
Note that this script does not actually change the data, so you don't have to be careful about what delimiter you use.
Here's sample output:
>perl -e "print sort {(split '/', $a)[-1] <=> (split '/', $b)[-1]} <>" files.txt
/a/e/f/g/h/01-do-this-first
/a/b/c/10-foo
/a/b/c/20-bar
/a/d/30-bob
/a/b/c/50-baz
/a/e/f/g/h/99-local
Something like this:
awk '{print $NF"|"$0}' file | sort -t"|" -k1 | awk -F"|" '{print $NF }'
A one-liner in perl for reversing the order of the fields in a line:
perl -lne 'print join " ", reverse split / /'
You could use it once, pipe the output to sort, then pipe it back and you'd achieve what you want. You can change / / to / +/ so it squeezes spaces. And you're of course free to use whatever regular expression you want to split the lines.
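The full round trip described above would look something like this (a sketch, assuming space-delimited fields):
perl -lne 'print join " ", reverse split / /' file | sort | perl -lne 'print join " ", reverse split / /'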
I think the only solution would be to use awk:
Put the last field to the front using awk.
Sort lines.
Put the first field to the end again.
Replace the last delimiter on the line with another delimiter that does not otherwise appear in the list, sort on the second field using that other delimiter as the sort(1) delimiter, and then revert the delimiter change.
delim=/
new_delim=" "
cat "$list" \
| sed "s|\(.*\)$delim|\1$new_delim|" \
| sort -t"$new_delim" -k 2,2 \
| sed "s|$new_delim|$delim|"
The problem is knowing what delimiter to use that does not appear in the list. You can make multiple passes over the list and then grep for a succession of potential delimiters, but it's all rather nasty - particularly when the concept of "sort on the last field of a line" is so simply expressed, yet the solution is not.
Edit: One safe delimiter to use for $new_delim is NUL, since that cannot appear in filenames, but I don't know how to put a NUL character into a Bourne/POSIX shell script (not bash), or whether sort and sed will handle it properly.
#!/usr/bin/ruby
f = ARGF.read                                    # read all input (named files or stdin)
lines = f.lines                                  # split into an array of lines
broken = lines.map {|l| l.split(/:/) }           # split each line on the ":" delimiter
sorted = broken.sort {|a, b| a[-1] <=> b[-1] }   # compare lines by their last field
fixed = sorted.map {|s| s.join(":") }            # re-join the fields of each line
puts fixed
If all the answers involve perl or awk, might as well solve the whole thing in the scripting language. (Incidentally, I tried in Perl first and quickly remembered that I dislike Perl's lists-of-lists. I'd love to see a Perl guru's version.)
I want this list sorted on the filenames, which all start with numbers
indicating the order the files should be read.
find . | sed 's#.*/##' | sort
The sed strips everything up to the last slash; the filenames are what's left, and you sort on those. (Note that this discards the directory part of each path, so the output is sorted filenames rather than sorted full paths.)
Here is a Python one-liner version; note that it assumes the sort field is an integer, and you can change that as needed.
python3 -c 'import sys; list(map(sys.stdout.write, sorted(sys.stdin, key=lambda x: int(x.rsplit(" ", 1)[-1]))))' < file.txt
cat "$list" \
| sed "s#\(.*\)/#\1"$'\x7F'"#" \
| sort -t$'\x7F' -k2,2 \
| sed "s#"$'\x7F'"#/#"
Still way worse than simple negative field indexes for sort(1) but using the DEL character as delimiter shouldn’t cause any problem in this case.
I also like how symmetrical it is.
sort allows you to specify the delimiter with the -t option, if I remember correctly. To compute the index of the last field, you can count the number of delimiters in a line and add one. For instance, something like this (assuming the ":" delimiter):
d=`head -1 FILE | tr -cd : | wc -c`
d=`expr $d + 1`
($d now contains the last field index).
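You could then feed that into sort. A sketch, with the caveat that this only works when every line has the same number of fields (the question allows a variable number):
sort -t: -k"$d,$d" FILE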

truncate output in BASH

How do I truncate output in BASH?
For example, if I "du file.name" how do I just get the numeric value and nothing more?
Later addition:
All solutions work perfectly. I chose to accept the most enlightening "cut" answer because I prefer the simplest approach, in bash scripts that others are supposed to be able to read.
If you know what the delimiters are, then cut is your friend:
du | cut -f1
Cut defaults to tab delimiters, so in this case you are selecting the first field.
You can change delimiters: cut -d ' ' would use a space as a delimiter. (from Tomalak)
You can also select individual character positions or ranges:
ls | cut -c1-2
I'd recommend cut, as others have said. But another alternative that is sometimes useful because it allows any whitespace as separators, is to use awk:
du file.name | awk '{print $1}'
du | cut -f 1
If you just want the number of bytes of a single file, you can ask stat for it directly (GNU stat shown; on BSD/macOS the equivalent is stat -f%z):
SIZE=$(stat -c%s file.name)
That gives you a different number than du (bytes rather than disk blocks), but I'm not sure how exactly you're using this.
This has the advantage of not having to run du and parse its output; the shell captures the size of the file directly.
It's hard to answer questions like this in a vacuum, because we don't know how you're going to use the data. Knowing that might suggest an entirely different answer.
