How to get comma/tab separated output of HDFS Quota? - linux

I am using the command below to retrieve the HDFS quota, but I don't want the fancy output. Instead I need the output stored in a comma- or tab-separated format; by default it is not tab-separated. Can anyone suggest something?
Command:
hdfs dfs -count -q -h -v /path/to/directory
Output is like this:
none inf 250 G 114.9 G 518 2.8 K 45.0 G /new/directory/X
Expected Output:
none,inf,250 G,114.9 G,518,2.8 K,45.0 G,/new/directory/X

How about using sed? The key thing is to identify a unique string to treat as the field separator in the hdfs output. That could be tab, since you said the fields are tab-separated; but the sample output you posted uses spaces.
Once you decide on a unique string, use sed to search for it and replace it with a comma. It looks like two or more consecutive spaces separate fields in the hdfs output in all cases except the start of the line and the path. Perhaps you can accept a leading comma and do a second pass of sed for the path.
This Stack Overflow question covers sed replacing consecutive spaces.
hdfs dfs -count -q -h -v /path/to/directory | sed -e "s/[[:space:]]\{2,\}/,/g" | sed -e "s/[[:space:]]\//,\//g"
The solution is even simpler if they are tabs.
hdfs dfs -count -q -h -v /path/to/directory | sed -e $'s/\t/,/g'
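Going back to the space-separated case: if you want to avoid the leading comma mentioned above, the two passes can be combined into one (a sketch assuming GNU sed's -E; it strips leading spaces, turns runs of two or more spaces into commas, then handles the single space before the path):
hdfs dfs -count -q -h -v /path/to/directory | sed -E 's/^ +//; s/ {2,}/,/g; s/ \//,\//'
On output shaped like your sample, this should give:
none,inf,250 G,114.9 G,518,2.8 K,45.0 G,/new/directory/X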

Related

How to extract particular string from a file (linux)

I have the following file
vol-12345678 gp2
vol-89dfg58g
VOLUMES 2016-03-17T22:03:08.374Z False 100 16 snap-7073d522 in-use vol-4568gds4 gp2
ATTACHMENTS 2016-03-17T22:03:08.000Z True /dev/sda1 i-181ed33c attached vol-7ea1c83f
etc.
etc.
I want to extract all instances of 'vol-********' and output them to a file (without the other contents), resulting in a file of:
vol-12345678
vol-34556767
vol-34534sdf
...
This is a relatively small file, so I could do it manually, but I have another file with 200+ cases. Any idea how to do this using grep, sed, or awk? Thanks!
This should do it:
grep -o 'vol-[[:alnum:]]*' input.data | sort -u > output.data
UPDATE
Command:
sed -n 's/.*\b\(vol-[[:alnum:]]*\).*/\1/p' test2
Output:
vol-12345678
vol-89dfg58g
vol-4568gds4
vol-7ea1c83f
Flags:
n : Suppress automatic printing of pattern space.
p : Print out the pattern space.
Pattern:
Look for 'vol-' followed by alphanumeric characters.
Replace the whole line with the captured group \1 and print it.
More details: see the sed documentation.
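Since the question also asks about awk, here is a sketch of a field-based alternative (it prints every whitespace-separated token starting with vol-, reading the same test2 file):
awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^vol-/) print $i }' test2
Matching whole tokens avoids picking up vol- substrings embedded inside longer tokens, which grep -o would also report.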

Unix/Linux and MQ scripts explanation

echo "DISPLAY QL($Queue) CURDEPTH" \
| runmqsc Queue_Managr \
| grep 'CURDEPTH(' \
| sed 's/.*CURDEPTH//' \
| tr -d '()'
Can anyone explain how this script works? This command displays the current depth value of a particular queue on a particular queue manager.
I understand echo "DISPLAY QL($Queue) CURDEPTH" | runmqsc Queue_Managr - this displays the queue name and CURDEPTH(value).
But I don't understand grep 'CURDEPTH(' | sed 's/.*CURDEPTH//' | tr -d '()'. How does this part work?
It's a pipeline. It contains five stages, separated by the pipe character |. The output of one stage is used as the input to the next stage.
echo "DISPLAY blatti blatti" - this just outputs some text.
runmqsc Queue_Managr - Uses the text as input to the runmqsc-command, which does some MQ magic and outputs data.
grep 'CURDEPTH(' - Grep is a standard unix utility. It filters its input. In this case, only lines containing the text CURDEPTH( are allowed through to the next stage.
sed 's/.*CURDEPTH//' - Sed is another standard utility. It's short for "stream editor", and allows you to edit the input as it passes through. In this case, the expression 's/.*CURDEPTH//' means to delete everything from the start of each line, up to and including the text CURDEPTH. (Remember, only lines containing that text were passed through from the previous stage.)
tr -d '()' - Finally, another standard utility, tr, which also allows editing the text that flows through from input to output. -d '()' means delete the characters ( and ) from the text.
The output from the final stage is shown in the terminal (if you ran your script in a terminal).
It's a fairly common way of building scripts in a unix shell. Generate the input data somehow, push it to a command, and massage the output data through a couple of stages each doing its little bit.
Long dissertations can be (and probably have been) written about all of grep, sed and tr. Look them up if you're interested.
CURDEPTH(3) DEFBIND(OPEN)
Notice that there are two attribute-value pairs in this output. We need to handle only the appropriate pair.
We might be tempted to use the "cut" command to do simple trimming of the first pair to get the value.
However, the output from runmqsc for queues that have very long names (such as 48 characters) shows CURDEPTH as the 2nd pair (as shown below). Thus, a simple use of "cut" is no longer possible:
CRTIME(09.08.08) CURDEPTH(3)
The use of the "sed" (stream editor) can help us to get the value. Notice that the parenthesis are included.
$ echo "DISPLAY QL($QNAME) CURDEPTH" | runmqsc $QMNAME | grep 'CURDEPTH(' | sed 's/.*CURDEPTH//'
(3)
Notice that the answer is: (3)
Finally, it is necessary to remove the open and close parenthesis. This can be done using "tr" as follows:
$ echo "DISPLAY QL($QNAME) CURDEPTH" | runmqsc $QMNAME | grep 'CURDEPTH(' | sed 's/.*CURDEPTH//' | tr -d '()'
3
Notice that the answer is: 3
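As an aside, the grep/sed/tr stages can be collapsed into a single sed command that captures only the digits inside CURDEPTH(...), wherever that pair appears on the line (a sketch reusing the same $QNAME and $QMNAME variables):
$ echo "DISPLAY QL($QNAME) CURDEPTH" | runmqsc $QMNAME | sed -n 's/.*CURDEPTH(\([0-9]*\)).*/\1/p'
3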

How do I use basic grep commands in Unix?

I need to use the grep command to display all the lines that contain 2-6 'x's.
I also need to know how to display all lines with 3 consecutive 'x's.
I have tried grep x{2,6} example.txt, but I keep getting an error saying that x6 is not found in the directory. My example file contains 7 lines, each with one more 'x' than the last.
The Bash shell uses Brace Expansion to expand:
grep x{2,6} example.txt
into:
grep x2 x6 example.txt
Unless you have a file called x6 in your directory, you will get an error from grep telling you it can't open it.
Rule 1: enclose regular expressions passed to grep in quotes - single quotes whenever possible.
Hence, use:
grep 'x{2,6}' example.txt
This deals with getting a regex to grep. Now we need to consider what it means. By default, this means look for the characters x, {, 2, ,, 6, } on a single line. Adding the -E option uses extended regular expressions, and the command looks for anything from 2 to 6 consecutive x's on a single line in the file:
grep -E 'x{2,6}' example.txt
However, it might be worth noting that this is pretty much the same as selecting 'xx' unless you have colouration on, or are selecting 'only' the matched text (the GNU grep extension -o option).
These are all for 2-6 adjacent x's, which is roughly what your proposed regex wanted.
You ask about three adjacent x's:
grep 'xxx' example.txt
The single quotes aren't 100% necessary, but they do no harm and remind you to use them for the regex in general.
Now we face the dilemma that you probably meant "between 2 and 6 x's on a single line, not necessarily adjacent, and not 0 or 1, nor 7 or more".
Rule 2: describe your required result precisely
Imprecise requirements lead to incorrect, or unintended, results. Meeting that requirement needs a more complex regex:
grep -E '^([^x]*x){2,6}[^x]*$' example.txt
That looks for 2-6 occurrences of zero or more non-x's followed by an x at the start of the line, followed by zero or more non-x's up to the end of line.
I need to display all the lines that contain 2-6 'x's using grep
grep -P '^(?:[^x]*x[^x]*){2,6}$' file
I also need to know how to display all lines with 3 consecutive 'x's
grep -P 'xxx' file
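To see the difference between the two readings, you can rebuild the seven-line sample file (one more x per line) and run both commands (a sketch assuming bash and GNU seq):
for i in 1 2 3 4 5 6 7; do printf 'x%.0s' $(seq "$i"); echo; done > example.txt
grep -E '^([^x]*x){2,6}[^x]*$' example.txt
grep 'xxx' example.txt
The first prints the lines with 2 to 6 x's in total (xx through xxxxxx); the second prints the lines with at least three consecutive x's (xxx through xxxxxxx).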

Linux sort -Help Wanted

I've been stuck on a problem for a few days. Here it is; maybe you've got bigger brains than me!
I've got a bunch of CSV files and I want them concatenated into a single .csv file, numerically sorted. OK, the first problem I encountered is with the ID name (I want to sort only by ID).
e.g.
sort -f *.csv > output.csv - This would work if I had standard ids like id001, id002, id010, id100,
but my ids are like id1, id2, id10, id100, and this makes my sort inaccurate.
OK:
sort -t, -V *.csv > output.csv - This works perfectly on my test machine (sort --version: GNU coreutils 8.5.0), but my live machine at work has sort version 5.3.0 (which doesn't implement -V) and I cannot update it!
I feel so noob and unlucky.
If you have a better idea please bring it on.
my csv file looks like
cn41 AQ34070YTW CDEAQ34070YTW 9C:B6:54:08:A3:C6 9C:B6:54:08:A3:C4
cn42 AQ34070YTY CDEAQ34070YTY 9C:B6:54:08:A4:22 9C:B6:54:08:A4:20
cn43 AQ34070YV1 CDEAQ34070YV1 9C:B6:54:08:9F:0E 9C:B6:54:08:9F:0C
cn44 AQ34070YV3 CDEAQ34070YV3 9C:B6:54:08:A3:7A 9C:B6:54:08:A3:78
cn45 AQ34070YW7 CDEAQ34070YW7 9C:B6:54:08:25:22 9C:B6:54:08:25:20
This is actually a copy/paste from a CSV. So let's say this is my first CSV, and the other one looks like:
cn201 AQ34070YTW CDEAQ34070YTW 9C:B6:54:08:A3:C6 9C:B6:54:08:A3:C4
cn202 AQ34070YTY CDEAQ34070YTY 9C:B6:54:08:A4:22 9C:B6:54:08:A4:20
cn203 AQ34070YV1 CDEAQ34070YV1 9C:B6:54:08:9F:0E 9C:B6:54:08:9F:0C
cn204 AQ34070YV3 CDEAQ34070YV3 9C:B6:54:08:A3:7A 9C:B6:54:08:A3:78
cn205 AQ34070YW7 CDEAQ34070YW7 9C:B6:54:08:25:22 9C:B6:54:08:25:20
Looking forward to reading your answers!
Regards
You can use -kX.Y for column X starting at character Y, together with -n for numeric sorting:
sort -t, -k2.3 -n *csv
Given your sample file, it produces:
$ sort -t, -k2.3 -n file
,id1,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id2,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id10,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id40,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id101,aaaaaa,bbbbbbbbbb,cccccccccccc,ddddddd
,id201,aaaaaaaaa,bbbbbbbbbb,ccccccccccc,ddddddd
Update
For your given input, I would do:
$ cat *csv | sort -k1.3 -n
cn41 AQ34070YTW CDEAQ34070YTW 9C:B6:54:08:A3:C6 9C:B6:54:08:A3:C4
cn42 AQ34070YTY CDEAQ34070YTY 9C:B6:54:08:A4:22 9C:B6:54:08:A4:20
cn43 AQ34070YV1 CDEAQ34070YV1 9C:B6:54:08:9F:0E 9C:B6:54:08:9F:0C
cn44 AQ34070YV3 CDEAQ34070YV3 9C:B6:54:08:A3:7A 9C:B6:54:08:A3:78
cn45 AQ34070YW7 CDEAQ34070YW7 9C:B6:54:08:25:22 9C:B6:54:08:25:20
cn201 AQ34070YTW CDEAQ34070YTW 9C:B6:54:08:A3:C6 9C:B6:54:08:A3:C4
cn202 AQ34070YTY CDEAQ34070YTY 9C:B6:54:08:A4:22 9C:B6:54:08:A4:20
cn203 AQ34070YV1 CDEAQ34070YV1 9C:B6:54:08:9F:0E 9C:B6:54:08:9F:0C
cn204 AQ34070YV3 CDEAQ34070YV3 9C:B6:54:08:A3:7A 9C:B6:54:08:A3:78
cn205 AQ34070YW7 CDEAQ34070YW7 9C:B6:54:08:25:22 9C:B6:54:08:25:20
If your CSV format is fixed, you can use the shell equivalent of the decorate-sort-undecorate pattern:
cat *.csv | sed 's/^,id//' | sort -n | sed 's/^/,id/' >output.csv
The -n option is present even in ancient versions of sort.
UPDATE: the updated input contains a number with a different prefix, and at a different position in the line. Here is a version that handles both kinds of input, as well as other inputs that have a number somewhere in the line, sorting by the first number:
cat *.csv | sed 's/^\([^0-9]*\)\([0-9][0-9]*\)/\2 \1\2/' \
| sort -n \
| sed 's/^[^ ]* //' > output.csv
You could try the -g option:
sort -t, -k 2.3 -g fileName
-t separator
-k key/column
-g general numeric sort
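As a quick check against the comma-style sample from earlier (a hypothetical three-line input), -k2.3 starts reading at the digits and -g orders them numerically, just as -n would:
$ printf ',id10,a\n,id2,a\n,id100,a\n' | sort -t, -k2.3 -g
,id2,a
,id10,a
,id100,a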

Convert string to hexadecimal on command line

I'm trying to convert "Hello" to 48 65 6c 6c 6f in hexadecimal as efficiently as possible using the command line.
I've tried looking at printf and google, but I can't get anywhere.
Any help greatly appreciated.
Many thanks in advance,
echo -n "Hello" | od -A n -t x1
Explanation:
The echo program will provide the string to the next command.
The -n flag tells echo not to append a newline after "Hello".
The od program is the "octal dump" program. (We will be providing a flag to tell it to dump it in hexadecimal instead of octal.)
The -A n flag is short for --address-radix=n, with n being short for "none". Without this part, the command would output an ugly numerical address prefix on the left side. This is useful for large dumps, but for a short string it is unnecessary.
The -t x1 flag is short for --format=x1, with the x being short for "hexadecimal" and the 1 meaning 1 byte.
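For reference, the expected output - the ASCII byte values of H, e, l, l, o (od prints a leading space before the first byte):
$ echo -n "Hello" | od -A n -t x1
 48 65 6c 6c 6f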
If you want to do this and remove the spaces you need:
echo -n "Hello" | od -A n -t x1 | sed 's/ *//g'
The first two commands in the pipeline are well explained by #TMS in his answer, as edited by #James. The last command differs from #TMS's comment in that it is both correct and has been tested. The explanation is:
sed is a stream editor.
s is the substitute command.
/ opens a regular expression - any character may be used; / is conventional, but inconvenient for processing, say, XML or path names.
/ (or the alternate character you chose) closes the regular expression and opens the substitution string.
In / */ the * matches any sequence of the previous character (in this case, a space).
/ (or the alternate character you chose) closes the substitution string.
In this case, the substitution string // is empty, i.e. the match is deleted.
g is the option to do this substitution globally on each line instead of just once per line.
The quotes keep the command parser from getting confused - the whole sequence is passed to sed as the first argument, namely, a sed script.
#TMS's brainchild (sed 's/^ *//') only strips spaces from the beginning of each line (^ matches the beginning of the line - the 'pattern space' in sed-speak).
If you additionally want to remove newlines, the easiest way is to append
| tr -d '\n'
to the pipeline. It functions as follows:
| feeds the previously processed stream to this command's standard input.
tr is the translate command.
-d specifies deleting the matched characters.
Quotes list your match characters - in this case just newline (\n).
Translate only matches single characters, not sequences.
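Putting it together, the full pipeline with both cleanups looks like this:
$ echo -n "Hello" | od -A n -t x1 | sed 's/ *//g' | tr -d '\n'
48656c6c6f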
sed is uniquely awkward when dealing with newlines. This is because sed is one of the oldest unix commands - it was created before people really knew what they were doing. Pervasive legacy software keeps it from being fixed. I know this because I was born before unix was born.
The historical origin of the problem was the idea that a newline was a line separator, not part of the line. It was therefore stripped by line processing utilities and reinserted by output utilities. The trouble is, this makes assumptions about the structure of user data and imposes unnatural restrictions in many settings. sed's inability to easily remove newlines is one of the most common examples of that malformed ideology causing grief.
It is possible to remove newlines with sed - it is just that all solutions I know about make sed process the whole file at once, which chokes for very large files, defeating the purpose of a stream editor. Any solution that retains line processing, if it is possible, would be an unreadable rat's nest of multiple pipes.
If you insist on using sed try:
sed -z 's/\n//g'
-z tells sed to use nulls as line separators.
Internally, a string in C is terminated with a null. The -z option is also a result of legacy, provided as a convenience for C programmers who might like to use a temporary file filled with C-strings and uncluttered by newlines. They can then easily read and process one string at a time. Again, the early assumptions about use cases impose artificial restrictions on user data.
If you omit the g option, this command removes only the first newline. With the -z option sed interprets the entire file as one line (unless there are stray nulls embedded in the file), terminated by a null and so this also chokes on large files.
You might think
sed 's/^/\x00/' | sed -z 's/\n//' | sed 's/\x00//'
might work. The first command puts a null at the front of each line on a line by line basis, resulting in \n\x00 ending every line. The second command removes one newline from each line, now delimited by nulls - there will be only one newline by virtue of the first command. All that is left are the spurious nulls. So far so good. The broken idea here is that the pipe will feed the last command on a line by line basis, since that is how the stream was built. Actually, the last command, as written, will only remove one null since now the entire file has no newlines and is therefore one line.
A simple pipe implementation uses an intermediate temporary file: all input is processed and fed to the file. The next command may be running in another thread, concurrently reading that file, but it just sees the stream as a whole (albeit incomplete) and has no awareness of the chunk boundaries that fed the file. Even if the pipe is a memory buffer, the next command sees the stream as a whole. The defect is inextricably baked into sed.
To make this approach work, you need a g option on the last command, so again, it chokes on large files.
The bottom line is this: don't use sed to process newlines.
echo hello | hexdump -v -e '/1 "%02X "'
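Note that without -n on echo the trailing newline (0A) is included in the dump; the format string tells hexdump to print each byte (/1) as two uppercase hex digits followed by a space:
$ echo hello | hexdump -v -e '/1 "%02X "'
68 65 6C 6C 6F 0A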
Playing around with this further:
A working solution is to remove the "*" - it is unnecessary both for the original requirement of simply removing spaces, and when substituting an actual character is desired, as follows:
echo -n "Hello" | od -A n -t x1 | sed 's/ /%/g'
%48%65%6c%6c%6f
So I consider this an improvement on the answer to the original question, since the command now does exactly what is required, not just apparently.
Combining the answers from TMS and i-always-rtfm-and-stfw, the following works under Windows using gnu-utils versions of the programs 'od', 'sed', and 'tr':
echo "Hello"| tr -d '\42' | tr -d '\n' | tr -d '\r' | od -v -A n -tx1 | sed "s/ //g"
or in a CMD file as:
#echo "%1"| tr -d '\42' | tr -d '\n' | tr -d '\r' | od -v -A n -tx1 | sed "s/ //g"
A limitation of my solution is that it removes all double quotes (").
"tr -d '\42'" removes the quote marks that the Windows 'echo' includes.
"tr -d '\r'" removes the carriage return, which Windows appends along with '\n'.
The pipe (|) character must follow immediately after the string, or the Windows echo will add a space after the string.
There is no '-n' switch to the Windows echo command.
