When I use the cut command, I get an unexpected result - linux

Here is my script:
table_nm=$1
hive_db=$(echo $table_nm | cut -d'.' -f1)
hive_tb=$(echo $table_nm | cut -d'.' -f2)
At first, I got the right result:
$echo "dev.dmf_bird_cost_detail" | cut -d'.' -f1
dev #correct
$echo "dev.dmf_bird_cost_detail" | cut -d'.' -f2
dmf_bird_cost_detail #correct
However, something goes wrong: if $table_nm does not contain the delimiter, I get this result:
$echo "dmf_bird_cost_detail" | cut -d'.' -f1
dmf_bird_cost_detail
$echo "dmf_bird_cost_detail" | cut -d'.' -f2
dmf_bird_cost_detail
$echo "dmf_bird_cost_detail" | cut -d'.' -f3
dmf_bird_cost_detail
This is not the result I expected; I hoped it would be empty. I ran some tests and found that if the delimiter does not appear in the string, "cut" returns the original value. Is that true?
Finally, I know "awk" would solve my problem, but I would like to know why "cut" behaves this way.
Thank you guys so much!

From POSIX cut specification:
-f list
[...] Lines with no field delimiters shall be passed through intact, unless -s is specified. [...]
why "cut" has the above result?
My guess is that the first implementations of cut behaved this way (possibly as a bug), that the behavior was preserved, and that POSIX standardized the existing behavior while adding the -s option. You can browse https://minnie.tuhs.org/cgi-bin/utree.pl for some old versions of cut.
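If an empty result is what you want when the delimiter is missing, the -s option quoted above does that. A minimal sketch follows; the helper name field_or_empty is made up for illustration and is not part of the original script.

```shell
#!/bin/sh
# field_or_empty is a hypothetical helper name.
# Usage: field_or_empty STRING FIELD_NUMBER
# -s makes cut suppress lines that contain no delimiter at all.
field_or_empty() {
    printf '%s\n' "$1" | cut -s -d'.' -f"$2"
}

hive_db=$(field_or_empty "dev.dmf_bird_cost_detail" 1)
hive_tb=$(field_or_empty "dev.dmf_bird_cost_detail" 2)
no_dot=$(field_or_empty "dmf_bird_cost_detail" 1)

# Prints: db=[dev] tb=[dmf_bird_cost_detail] no_dot=[]
printf 'db=[%s] tb=[%s] no_dot=[%s]\n' "$hive_db" "$hive_tb" "$no_dot"
```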

The proper solution is probably to use a parameter expansion anyway.
hive_db=${table_nm%.*}
hive_tb=${table_nm#"$hive_db".}
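If you expect more than one dot, you need some additional processing to extract the second field. Note that if table_nm contains no dot at all, neither pattern matches and both variables end up holding the whole string.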
Because this uses shell built-ins, it is a lot more efficient than spawning two processes for each field you want to extract (and even then you should use proper quoting).
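Here is a rough sketch of one way to cover both the no-dot case and names with several dots using only parameter expansion. The helper name split_table is hypothetical, and the choices to split at the first dot and to leave hive_db empty when there is no dot are assumptions, not something the question specifies.

```shell
#!/bin/sh
# split_table is a hypothetical helper; it sets hive_db and hive_tb.
# Assumption: a name without a dot has no database part.
split_table() {
    case $1 in
        *.*)
            hive_db=${1%%.*}   # longest suffix removed: text before the first dot
            hive_tb=${1#*.}    # shortest prefix removed: text after the first dot
            ;;
        *)
            hive_db=
            hive_tb=$1
            ;;
    esac
}

split_table "dev.dmf_bird_cost_detail"
printf 'db=[%s] tb=[%s]\n' "$hive_db" "$hive_tb"

split_table "dmf_bird_cost_detail"
printf 'db=[%s] tb=[%s]\n' "$hive_db" "$hive_tb"
```

No external process is started here, so the efficiency argument above still applies.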

Related

Bash: Flip strings to the other side of the delimiter

Basically, I have a file formatted like
ABC:123
And I would like to flip the strings around the delimiter, so it would look like this
123:ABC
I would prefer to do this with bash/linux tools.
Thanks for any help!
That's reasonably easy with internal bash commands, assuming two fields, as per the following transcript:
pax:~$ x='abc:123'
pax:~$ echo "${x#*:}:${x%:*}"
123:abc
The first substitution ${x#*:} removes everything from the start up to the colon. The second, ${x%:*}, removes everything from the colon to the end.
Then you just re-join them with the colon in-between.
It doesn't matter for your particular data but % and # use the shortest possible pattern. The %% and ## variants will give you the longest possible pattern (greedy).
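To see the difference between the shortest and longest variants, here is a small sketch using a made-up value with two colons (the variable names are only illustrative):

```shell
#!/bin/sh
# Sample value with two colons so the shortest and longest matches differ.
x='a:b:c'

shortest_prefix=${x#*:}    # strips "a:"    -> b:c
longest_prefix=${x##*:}    # strips "a:b:"  -> c
shortest_suffix=${x%:*}    # strips ":c"    -> a:b
longest_suffix=${x%%:*}    # strips ":b:c"  -> a

printf '%s\n' "$shortest_prefix" "$longest_prefix" "$shortest_suffix" "$longest_suffix"
```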
As an aside, this is ideal if you're doing it for one string at a time, since you don't need to start an external process to do the work for you. But if you're processing an entire file, there are better ways to do it, such as with awk:
pax:~$ printf "abc:123\ndef:456\nghi:789\n" | awk -F: '{print $2 FS $1}'
123:abc
456:def
789:ghi
#!/bin/sh -x
# printf replaces echo -e, which is not portable across sh implementations
var1=$(printf '%s\n' 'ABC:123' | cut -d':' -f1)
var2=$(printf '%s\n' 'ABC:123' | cut -d':' -f2)
printf '%s:%s\n' "$var2" "$var1"
I use cut to split the string into two parts, and store both of those parts as variables.
From there, it's possible to use echo to re-arrange the variables as you see fit.
Using sed.
sed -E 's/(.*):(.*)/\2:\1/' file.txt
Using paste and cut with process substitution.
paste -d: <(cut -d : -f2 file.txt) <(cut -d : -f1 file.txt)
A pure shell loop, which is the slowest option on large files.
while IFS=: read -r left right; do printf '%s:%s\n' "$right" "$left"; done < file.txt

Extract all unique URL from log using sed

Can you help me write a regexp that is correct from the point of view of sed syntax? So far, every regexp I have written is rejected by the terminal as invalid.
If your log syntax is uniform, use this command
cut -f4 -d\" < logfile | sort -u
If you want to ignore the query string when checking uniqueness, use this
cut -f4 -d\" < logfile | cut -f1 -d\? | sort -u
Explanation
The first cut takes the 4th field (-f4) using " as the separator (-d\"). The second cut does the same with ? as the separator and keeps only the first field, which drops the query string.
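The question's log format is not shown, so the following is only a sketch under an assumption: lines follow the common "combined" access-log layout, where the fourth double-quote-delimited field is a URL. The function name unique_urls and the sample lines are invented for illustration.

```shell
#!/bin/sh
# Assumption: the 4th field, counted with " as the delimiter, holds the URL.
# unique_urls is a hypothetical wrapper that reads log lines from stdin.
unique_urls() {
    cut -d'"' -f4 | cut -d'?' -f1 | sort -u
}

# Invented sample lines; two of them differ only in the query string.
printf '%s\n' \
    '10.0.0.1 - - [01/Jan/2024:00:00:01 +0000] "GET /a HTTP/1.1" 200 12 "http://example.com/x?id=1" "curl"' \
    '10.0.0.2 - - [01/Jan/2024:00:00:02 +0000] "GET /b HTTP/1.1" 200 34 "http://example.com/x?id=2" "curl"' \
    '10.0.0.3 - - [01/Jan/2024:00:00:03 +0000] "GET /c HTTP/1.1" 200 56 "http://example.com/y" "curl"' \
    | unique_urls
```

The third sample line has no ? in its URL; cut passes such a line through unchanged, which is what is wanted here.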

how to use sort, cut, and unique commands in pipe

I was wondering how do you use the cut, sort, and uniq commands in a pipeline and give a command line that indicates how many users are using each of the shells mentioned in /etc/passwd?
I'm not sure if this is right, but:
cut -f1 -d':' /etc/passwd | sort -n | uniq
?
Summarizing the answers excruciatingly hidden in comments:
You were close, only
as tripleee noticed, the shell is in the seventh field
as shellter noticed, since the shells are not numbers, -n is useless
as shellter noticed, for the counting, there's uniq -c
That gives
cut -f7 -d: /etc/passwd | sort | uniq -c
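To try the pipeline without touching the real /etc/passwd, here is a small sketch on invented passwd-style lines. The names sample_passwd and count_shells are hypothetical, and the trailing sort -rn (most common shell first) is an addition of this sketch, not part of the answer above.

```shell
#!/bin/sh
# sample_passwd prints three made-up entries in /etc/passwd format.
sample_passwd() {
    printf '%s\n' \
        'root:x:0:0:root:/root:/bin/bash' \
        'alice:x:1000:1000::/home/alice:/bin/bash' \
        'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin'
}

# count_shells reads passwd-style lines from stdin and prints one line per
# shell, prefixed by the number of users that have it.
count_shells() {
    cut -d: -f7 | sort | uniq -c | sort -rn
}

sample_passwd | count_shells
```

With the sample data, /bin/bash is counted twice and /usr/sbin/nologin once. The sort before uniq -c is required because uniq only merges adjacent identical lines.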

How to grep only the content that contains x and y?

I have 2 million lines of content and all lines look like this:
--username:orderID:email:country
I already added a -- prefix to all usernames.
What I need now is to get ONLY the usernames from the file. I think it's possible with grep by matching text that starts with "--" and ends with ":", but I have absolutely no idea how.
So output should be:
username
Thank you all for the help.
THIS WORKED:
cut -d: -f1
Even without adding the prefix, you should be able to get the usernames with cut:
cut -d: -f1
-d says what the delimiter is, -f says which field(s) to return.
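If the -- prefix has already been added, the first field still contains it. A minimal sketch that also strips it is shown below; the function name usernames and the sample lines are made up, and removing the prefix with sed is just one possible choice.

```shell
#!/bin/sh
# usernames is a hypothetical helper: it reads lines shaped like
# --username:orderID:email:country from stdin and prints bare usernames.
usernames() {
    cut -d: -f1 | sed 's/^--//'
}

# Invented sample input.
printf '%s\n' \
    '--bob:17:bob@example.com:US' \
    '--carol:18:carol@example.com:DE' \
    | usernames
```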
Try this:
cat YOUR_FILE | sed "s/:/\n/g" | grep "\-\-"

Unix cut except last two tokens

I'm trying to parse file names in specific directory. Filenames are of format:
token1_token2_token3_token(N-1)_token(N).sh
I need to split the name on the delimiter '_' and take the string without the last two tokens. In the above example, the output should be token1_token2_token3.
The number of tokens is not fixed. I've tried to do it with -f#- option of cut command, but did not find any solution. Any ideas?
With cut:
$ echo t1_t2_t3_tn1_tn2.sh | rev | cut -d_ -f3- | rev
t1_t2_t3
rev reverses each line.
The 3- in -f3- means from the 3rd field to the end of the line (which is the beginning of the line through the third-to-last field in the unreversed text).
You may use POSIX defined parameter substitution:
$ name="t1_t2_t3_tn1_tn2.sh"
$ name=${name%_*_*}
$ echo "$name"
t1_t2_t3
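The same substitution can be repeated in a loop when the number of trailing tokens to drop varies. This is a sketch only; drop_last_tokens is a hypothetical helper name, and because POSIX sh has no local variables, name and n are ordinary globals here.

```shell
#!/bin/sh
# drop_last_tokens is a hypothetical helper.
# Usage: drop_last_tokens NAME COUNT
# Removes the last COUNT '_'-separated tokens using only ${var%pattern}.
drop_last_tokens() {
    name=$1
    n=$2
    while [ "$n" -gt 0 ]; do
        name=${name%_*}   # shortest suffix starting at the last '_'
        n=$((n - 1))
    done
    printf '%s\n' "$name"
}

# Prints: t1_t2_t3
drop_last_tokens "t1_t2_t3_tn1_tn2.sh" 2
```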
It cannot be done with cut alone; however, you can use sed:
sed -r 's/(_[^_]+){2}$//g'
Just a different way to write ysth's answer (note that --complement is a GNU cut extension):
echo "t1_t2_t3_tn1_tn2.sh" |rev| cut -d"_" -f1,2 --complement | rev
