truncate output in BASH - linux

How do I truncate output in BASH?
For example, if I "du file.name" how do I just get the numeric value and nothing more?
later addition:
all solutions work perfectly. I chose to accept the most enlightning "cut" answer because I prefer the simplest approach in bash files others are supposed to be able to read.

If you know what the delimiters are then cut is your friend
du | cut -f1
Cut defaults to tab delimiters so in this case you are selecting the first field.
You can change delimiters: cut -d ' ' would use a space as a delimiter. (from Tomalak)
You can also select individual character positions or ranges:
ls | cut -c1-2

I'd recommend cut, as others have said. But another alternative that is sometimes useful because it allows any whitespace as separators, is to use awk:
du file.name | awk '{print $1}'

du | cut -f 1

If you just want the number of bytes of a single file, use the -s operator.
SIZE=-s file.name
That gives you a different number than du, but I'm not sure how exactly you're using this.
This has the advantage of not having to run du, and having bash get the size of the file directly.
It's hard to answer questions like this in a vacuum, because we don't know how you're going to use the data. Knowing that might suggest an entirely different answer.

Related

Embedding quotation marks in command string generated by AWK?

I need to match all instances of strings in one file, with a master list in another. However, if my string is abc I want only that, not abcdef, abc1234 and so on.
So, a word boundary for the regex? Right now, I'm using a simple awk one liner:
cat results_file| sort -k 1| awk -F" " '{ print $1" /home/owner/file_2_search"}'|
xargs -L 1 /bin/grep -i
However, to force a word boundary, I'd need to grep string\b and the quotes (single or double) seem to be required.
In awk, \b is a special character, you need \\b ... And the quoted quotes ... (arg) ... Or am I missing something and overdoing this?
This is a Linux box, so presumably gawk. I have gone over quoting rules for awk, and realize this has got to be simple (and not complex ... but), but am not seeing it.
Had meant to post as an answer, not a comment. Will try to pose a more readable question, but confess to having second thoughts about doing this as a one-liner in the first place -- may be best to follow an alternate method. Appreciate the willingness to help.
--Joe

grep with variable numbers of whitespaces

I want to grep for a string which I know exists in my files. However, the source it comes from managed to change the number of whitespaces so that the content per se is identical in the string, but the length differs. => ordinary grep does not find it. Is there a way to adjust for it?
I dont's see a system behind the additional whitespace effect
Here's the original string
4FD0-A tr|A5ZLA0|A5ZLA0_9BACE Bacterial
and here's the modified string
4FD0-A tr|A5ZLA0|A5ZLA0_9BACE Bacterial
I believe that egrep would be your friend here. Try the following command:
egrep '4FD0-A\s+tr[|]A5ZLA0[|]A5ZLA0_9BACE\s+Bacterial' filename
I used a rather simple pattern for my example. Feel free to change it to suit.
You can use the "one or more" + operator. For example
grep '^4FD0-A\s\+tr|' myfile
The \s searches for any whitespace. If you want to limit it to only spaces just use a single space in place of the \s
You can use all kind of utilities for deleting spaces.
Try this:
cat file |tr -s '\ ' | grep '4FD0-A tr|A5ZLA0|A5ZLA0_9BACE Bacterial'

Sorting on the last field of a line

What is the simplest way to sort a list of lines, sorting on the last field of each line? Each line may have a variable number of fields.
Something like
sort -k -1
is what I want, but sort(1) does not take negative numbers to select fields from the end instead of the start.
I'd also like to be able to choose the field delimiter too.
Edit: To add some specificity to the question: The list I want to sort is a list of pathnames. The pathnames may be of arbitrary depth hence the variable number of fields. I want to sort on the filename component.
This additional information may change how one manipulates the line to extract the last field (basename(1) may be used), but does not change sorting requirements.
e.g.
/a/b/c/10-foo
/a/b/c/20-bar
/a/b/c/50-baz
/a/d/30-bob
/a/e/f/g/h/01-do-this-first
/a/e/f/g/h/99-local
I want this list sorted on the filenames, which all start with numbers indicating the order the files should be read.
I've added my answer below which is how I am currently doing it. I had hoped there was a simpler way - maybe a different sort utility - perhaps without needing to manipulate the data.
awk '{print $NF,$0}' file | sort | cut -f2- -d' '
Basically, this command does:
Repeat the last field at the beginning, separated with a whitespace (default OFS)
Sort, resolve the duplicated filenames using the full path ($0) for sorting
Cut the repeated first field, f2- means from the second field to the last
Here's a Perl command line (note that your shell may require you to escape the $s):
perl -e "print sort {(split '/', $a)[-1] <=> (split '/', $b)[-1]} <>"
Just pipe the list into it or, if the list is in a file, put the filename at the end of the command line.
Note that this script does not actually change the data, so you don't have to be careful about what delimeter you use.
Here's sample output:
>perl -e "print sort {(split '/', $a)[-1] <=> (split '/', $b)[-1]} " files.txt
/a/e/f/g/h/01-do-this-first
/a/b/c/10-foo
/a/b/c/20-bar
/a/d/30-bob
/a/b/c/50-baz
/a/e/f/g/h/99-local
something like this
awk '{print $NF"|"$0}' file | sort -t"|" -k1 | awk -F"|" '{print $NF }'
A one-liner in perl for reversing the order of the fields in a line:
perl -lne 'print join " ", reverse split / /'
You could use it once, pipe the output to sort, then pipe it back and you'd achieve what you want. You can change / / to / +/ so it squeezes spaces. And you're of course free to use whatever regular expression you want to split the lines.
I think the only solution would be to use awk:
Put the last field to the front using awk.
Sort lines.
Put the first field to the end again.
Replace the last delimiter on the line with another delimiter that does not otherwise appear in the list, sort on the second field using that other delimiter as the sort(1) delimiter, and then revert the delimiter change.
delim=/
new_delim=" "
cat $list \
| sed "s|\(.*\)$delim|\1$new_delim|" \
| sort -t"$new_delim" -k 2,2 \
| sed "s|$new_delim|$delim|"
The problem is knowing what delimiter to use that does not appear in the list. You can make multiple passes over the list and then grep for a succession of potential delimiters, but it's all rather nasty - particularly when the concept of "sort on the last field of a line" is so simply expressed, yet the solution is not.
Edit: One safe delimiter to use for $new_delim is NUL since that cannot appear in filenames, but I don't know how to put a NUL character into a bourne/POSIX shell script (not bash) and whether sort and sed will properly handle it.
#!/usr/bin/ruby
f = ARGF.read
lines = f.lines
broken = lines.map {|l| l.split(/:/) }
sorted = broken.sort {|a, b|
a[-1] <=> b[-1]
}
fixed = sorted.map {|s| s.join(":") }
puts fixed
If all the answers involve perl or awk, might as well solve the whole thing in the scripting language. (Incidentally, I tried in Perl first and quickly remembered that I dislike Perl's lists-of-lists. I'd love to see a Perl guru's version.)
I want this list sorted on the filenames, which all start with numbers
indicating the order the files should be read.
find . | sed 's#.*/##' | sort
the sed replaces all parts of the list of results that ends in slashes. the filenames are whats left, and you sort on that.
Here is a python oneliner version, note that it assumes the field is integer, you can change that as needed.
echo file.txt | python3 -c 'import sys; list(map(sys.stdout.write, sorted(sys.stdin, key=lambda x: int(x.rsplit(" ", 1)[-1]))))'
| sed "s#(.*)/#\1"\\$'\x7F'\# \
| sort -t\\$'\x7F' -k2,2 \
| sed s\#\\$'\x7F'"#/#"
Still way worse than simple negative field indexes for sort(1) but using the DEL character as delimiter shouldn’t cause any problem in this case.
I also like how symmetrical it is.
sort allows you to specify the delimiter with the -t option, if I remember it well. To compute the last field, you can do something like counting the number of delimiters in a line and sum one. For instance something like this (assuming the ":" delimiter):
d=`head -1 FILE | tr -cd : | wc -c`
d=`expr $d + 1`
($d now contains the last field index).

more efficent shell text manipulation

I am using this command:
cut -d: -f2
To sort and reedit text, Is there a more efficient way to do this without using sed or awk?
I would also like to know how I would append a period to the end of each field
At the moment the output is like $x['s'] and I would like it to be $x['s'] .
Just using standard unix tools
edit: I just wanted to know if it was possible without sed or awk, otherwise how would you do it with awk?
Short answer: not really
Longer answer: cut is intended for slicing up lines of text, it does that well. If you need a more complicated behavior, you'll need a text manipulation language. You have rejected the old time answers, so I'll recommend perl.
Any particular reason you don't want to use sed or awk?
edit: I just wanted to know if it was possible without sed or awk, otherwise how would you do it with awk?
awk -F: '{print $2"."}'

What is the easiest way to join 12 columns?

I have 12 columns separated by a tab. How can I join them side-by-side?
[Added] You can also tell me other methods as AWK: the faster the better.
Since you asked specifically about awk (there are tools better suited to the job), the following is a first-cut solution:
awk '{print $1$2$3$4$5$6$7$8$9$10$11$12}'
A more complicated and configurable solution, where you could change the number of columns used for output, would be:
awk -v lim=12 '{for(x=1;x<lim;x++){printf "%s",$x};print ""}'
Other possibilities, if you're not restricted to awk, are:
tr -d '\011' # to combine ALL columns on the line.
cut --output-delimiter='' -f1-12 # more general (1-12 or 3-7 or 1-6,9).
Based on your edit and comments, I suggest cut is the best tool for the job. Use "man cut", "info cut" or "cut --help" for more details (this depends on your platform).
If you are just using awk to concatenate the columns I would use 'tr' and delete tab
cat file1 | tr -d '\011'> file2
Try this:
{
print $1$2$3$4$5$6$7$8$9$(10)$(11)$(12)
}
I'm not an awk genius so I don't know if there's some sort of looping construct you can use.
Well, it depends on your editor/command of choice. But generally, it boild down to replacing the character with nothing.
For example, in vim: ":%s/\t//g"
You did not mention what tool you would like to use but any text editor would be able to replace the tab to an empty character, I guess that would work, that's what I usually do.

Resources