cut command in bash terminating on quotation marks - linux

So I am trying to read in a file that has a bunch of lines, each with an email address followed by a nickname. I am trying to extract this nickname, which is surrounded by parentheses, like below:
email@somewhere.com (Tom)
My thought was just to use cut to get at the word Tom, but this is foiled when I end up with something like the following:
email2@somewhereElse.com ("Bob")
Because Bob has quotes around it, the cut command fails as follows:
cut: <file>: Illegal byte sequence
Does anyone know of a better way of doing this, or a way to solve this problem?

Reset your locale to C (raw uninterpreted byte sequence) to avoid Illegal byte sequence errors.
locale charmap
LC_ALL=C cut ... | LC_ALL=C sort ...
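As a rough sketch of this idea: the sample line below is an assumption for illustration, with the nickname wrapped in Windows-1252 curly quotes (bytes 0x93 and 0x94), which are not valid UTF-8. Forcing the C locale makes cut and tr treat every byte as an opaque character.

```shell
# Hypothetical sample line; \223 and \224 are octal escapes for the
# bytes 0x93 and 0x94, which are invalid when read as UTF-8.
line=$(printf 'email2@somewhereElse.com (\223Bob\224)')

# In the C locale, cut and tr operate on raw bytes, so nothing is rejected.
# tr -cd deletes everything except ASCII letters, digits and the newline.
nickname=$(printf '%s\n' "$line" | LC_ALL=C cut -d'(' -f2 | LC_ALL=C tr -cd '[:alnum:]\n')

printf '%s\n' "$nickname"
```

Setting LC_ALL=C only on the commands in the pipeline leaves the rest of the session's locale untouched.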

I think that
grep -o '(.*)' emailFile
should do it. "Go through all lines in the file. Look for a sequence that starts with open parens, then any characters until close parens. Echo the bit that matches the string to stdout."
This preserves the quotes around the nickname, as well as the parentheses. If you don't want those, you can strip them:
grep -o '(.*)' emailFile | sed 's/[(")]//g'
("replace any of the characters between square brackets with nothing, everywhere")
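A small end-to-end sketch of the two commands above, using a temporary file with made-up sample lines (the file contents and the variable names are assumptions, not the asker's real data):

```shell
# Hypothetical sample file: one plain and one quoted nickname.
emailFile=$(mktemp)
printf '%s\n' 'email@somewhere.com (Tom)' 'email2@somewhereElse.com ("Bob")' > "$emailFile"

# grep -o prints only the matched "(...)" part of each line;
# sed then deletes every parenthesis and double quote from it.
nicknames=$(grep -o '(.*)' "$emailFile" | sed 's/[(")]//g')

printf '%s\n' "$nicknames"
rm -f "$emailFile"
```

Note that (.*) is greedy, so a line with several parenthesized groups would be matched from the first opening parenthesis to the last closing one.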

perl -lne '$_=~/[^\(]*\(([^)]*)\)/g;print $1'

Related

Lab for my Alt OS class and I'm unsure of this Linux Command

I'm currently doing a lab for my Alt OS class and the professor gives multiple commands that you have to explain their function for. The one I'm stuck on is
find /home/ -user bob | xargs -d "\n" chown bill:bill
I understand that we are finding any items within bob's home folder and piping that to xargs which is delimiting something. I'm just unsure what the "\n" portion is doing. At the end, I understand we are taking whatever those results are and changing permissions to bill.
From man xargs:
--delimiter=delim, -d delim
Input items are terminated by the specified character. The specified delimiter may be a single character, a C-style character escape such
as \n, or an octal or hexadecimal escape code. Octal and hexadecimal escape codes are understood as for the printf command. Multibyte
characters are not supported. When processing the input, quotes and backslash are not special; every character in the input is taken literally.
The -d option disables any end-of-file string, which is treated like any other argument. You can use this option when the input
consists of simply newline-separated items, although it is almost always better to design your program to use --null where this is possible.
The \n escape sequence in C means a newline. -d '\n' is typically used with xargs to delimit items by newlines, that is, to read one item per line. It also makes a significant difference to quote handling:
$ echo "quote'not terminated" | xargs
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
vs
$ echo "quote'not terminated" | xargs -d'\n'
quote'not terminated
The cppreference page on escape sequences lists the C escape sequences.
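To make the delimiter behaviour concrete without touching file ownership, here is a minimal sketch that replaces chown with printf. It assumes GNU xargs, since -d is a GNU extension; the item strings are invented for illustration.

```shell
# With -d '\n', each input line becomes exactly one argument, even when
# it contains spaces or a lone quote. -n1 runs printf once per item.
items=$(printf '%s\n' "bob's file" 'two words' | xargs -d '\n' -n1 printf '<%s>\n')
printf '%s\n' "$items"

# NUL-delimited variant, which the man page recommends where possible.
items0=$(printf '%s\0' "bob's file" 'two words' | xargs -0 -n1 printf '<%s>\n')
printf '%s\n' "$items0"
```

In the original command the same pairing would be find ... -print0 on the producing side and xargs -0 on the consuming side.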

How to correctly detect and replace apostrophe (') with sed?

I have a directory with many files whose names contain special characters and spaces. I want to perform an operation on all these files, so I'm trying to store all the filenames in a list.txt and then run the command with this list.
The special characters in my list are & []'.
So basically I want to use sed to replace each occurrence with \ + the character in question.
E.g.: filename .txt => filename\ .txt etc.
The thing is I have trouble handling apostrophes.
Here is my command as of now :
ls | sed 's/\ /\\ /g' | sed 's/\&/\\&/g' | sed "s/\'/\\'/g" | sed 's/\[/\\[/g' | sed 's/\]/\\]/g'
At first I had issues with, I believe, the apostrophes in the string command in conflict with the apostrophes surrounding the string. So I used double quotes instead, but it still doesn't work.
I've tried all these and nothing worked :
sed "s/\'/\\'/g" (escaping the apostrophe)
sed "s/'/\'/g" (escaping nothing)
sed "s/'/\\'/g" (escaping the backslash)
sed 's/"'"/\"'"/g' (double quoting single quote)
As a disclaimer, I must say I'm completely new to sed. I just ran my first sed command today, so maybe I'm doing something wrong that I haven't realized.
PS: I've seen these threads, but no answer worked for me:
https://unix.stackexchange.com/questions/157076/how-to-remove-the-apostrophe-and-delete-the-space
How to replace to apostrophe ' inside a file using SED
This may do:
cat file
avbadf
test&rr
more [ yes
this ]
and'df
sed -r 's/(\x27|&|\[|\])/\\\1/g' file
avbadf
test\&rr
more \[ yes
this \]
and\'df
\x27 is equal to single quote '
\x22 is equal to double quote "
Whoops, I found the answer to my question. Here is the working input :
sed "s/'/\\\'/g"
This will effectively replace any ' with \'.
However I'm having trouble understanding exactly what's happening here.
So if I understand correctly, we are escaping the backslash and the apostrophe in the replacement string. Now, if somebody could answer some of these questions, I would be grateful:
Why don't we need to escape the first quote (the one in the pattern to find) ?
Why do we have to escape the backslash whereas for the other characters, there's no need ?
Why do we need to escape the second quote (the one in the replacement string) ?
I think all of your sed matches actually need that replacement pattern. This one seems to work for all examples:
ls | sed "s/\ /\\\ /g" | sed "s/\&/\\\&/g" | sed "s/\[/\\\[/g" | sed "s/\]/\\\]/g" | sed "s/'/\\\'/g"
So the form is s/regex/replacement/flags, and the regex and the replacement have different sets of special characters.
The only one that's different is s/'/\\\'/g. The apostrophe itself is not special to sed on either side; what matters is that the command is unescaped twice. Inside double quotes the shell turns \\ into \ and leaves \' alone, so sed receives s/'/\\'/g, where \\ in the replacement is a literal backslash followed by a plain '. With only two backslashes sed receives s/'/\'/g, and \' in the replacement is just an escaped apostrophe, so nothing changes.
For example, \5 is special in the replacement expression (it is a backreference), so to replace:
filename5.txt -> filename\5.txt
You would also need, as with apostrophe:
sed "s/5/\\\5/g"
In short, count the backslashes after the shell has processed the double quotes, not before.
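One way to see what is going on is to print the sed script exactly as sed receives it, after the shell has processed the double quotes. This is only a sketch; the variable names are made up for illustration.

```shell
# Inside double quotes the shell collapses \\ into \ and keeps \' as is,
# so three backslashes in the source become two in the script sed sees.
script=$(printf '%s' "s/'/\\\'/g")
printf '%s\n' "$script"

# sed then reads \\ in the replacement as one literal backslash,
# followed by an ordinary apostrophe.
escaped=$(printf '%s\n' "and'df" | sed "s/'/\\\'/g")
printf '%s\n' "$escaped"
```

printf '%s\n' is used instead of echo here because some shells' echo interprets backslashes itself, which would hide the very thing being inspected.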
Please try the following:
sed 's/[][ &'\'']/\\&/g' file
Using the same example as @Jotne, the result will be:
avbadf
test\&rr
more\ \[\ yes
this\ \]
and\'df
[How it works]
The regex part in the sed s command above just defines a character
class of & []', which should be escaped with a backslash.
The right square bracket ] does not need escaping when put
immediately after the left square bracket [.
The confusing part is the handling of the single quote.
We cannot put a single quote within single quotes, even if we escape it.
The workaround is as follows: say we have an assignment str='aaabbb'.
To put a single quote between "aaa" and "bbb", we can write
str='aaa'\''bbb'.
It may look puzzling but it just concatenates the three sequences;
1) to close the single-quoted string as 'aaa'.
2) to put a single quote with an escaping backslash as \'.
3) to restart the single-quoted string as 'bbb'.
Hope this helps.
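The quoting trick described above can be checked in isolation before using it inside sed. A minimal sketch, with sample strings borrowed from the example file:

```shell
# Three pieces glued together: 'aaa' + \' + 'bbb'
str='aaa'\''bbb'
printf '%s\n' "$str"

# The same trick places a single quote inside the bracket expression,
# so sed receives the script s/[][ &']/\\&/g
result=$(printf '%s\n' "and'df" 'more [ yes' | sed 's/[][ &'\'']/\\&/g')
printf '%s\n' "$result"
```

In the replacement, \\ is a literal backslash and & stands for whatever character was matched, which is why one expression covers every character in the class.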

how to remove first two words of a strings output

I want to remove the first two words that come up in my output string. This string is also within another string.
What I have:
for servers in `ls /data/field`
do
string=`cat /data/field/$servers/time`
This sends this text:
00:00 down server
I would like to remove "00:00 down" so that it only displays "server".
I have tried using cut -d ' ' -f2- $string which ends up just removing directories that the command searches.
Any ideas?
Please do things properly:
for servers in /data/field/*; do
string=$(cut -d" " -f3- "$servers/time")
echo "$string"
done
As of 2014, backticks are discouraged in favor of the $( ) form.
Don't parse ls output; use a glob instead, as I do with /data/field/*. The glob already expands to the full path, so only /time needs to be appended.
Check http://mywiki.wooledge.org/BashFAQ for various subjects.
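To try the loop without a real /data/field tree, the sketch below builds a throwaway directory with the same layout. The directory names and file contents are invented for illustration.

```shell
# Hypothetical stand-in for /data/field with two server directories.
field=$(mktemp -d)
mkdir "$field/alpha" "$field/beta"
echo '00:00 down alpha-host' > "$field/alpha/time"
echo '12:30 up beta-host'    > "$field/beta/time"

summary=''
for dir in "$field"/*; do
    # The glob yields the full path, so only /time is appended.
    string=$(cut -d' ' -f3- "$dir/time")
    summary="$summary$string;"
    echo "$string"
done
rm -rf "$field"
```

Quoting "$dir/time" keeps the loop working when a directory name contains spaces.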
Use the -d option to set the delimiter to a space:
$ echo 00:00 down server | cut -d" " -f3-
server
Note: use field number 3, since the count starts from 1, not 0.
From man page
-d, --delimiter=DELIM
use DELIM instead of TAB for field delimiter
N- from N'th byte, character or field, to end of line
More Tests
$ echo 00:00 down server hello world| cut -d" " -f3-
server hello world
The for loop is capable of iterating through the files using globbing. So I would write something like
for servers in /data/field/*
do
string=`cut -d" " -f3- "$servers/time"`
...
...
You can use sed as well:
sed 's/^[^ ]* [^ ]* //'
For the examples given, I prefer cut. But for the general problem expressed by the question, the answers above have minor shortcomings: they break when you don't know how many spaces are between the words (cut), or whether the line starts with a space or not (cut, sed), or they cannot easily be used in a pipeline (shell for-loop). Here's a perl example that is fast, efficient, and not too hard to remember:
| perl -pe 's/^\s*(\S+\s+){2}//'
Perl's -p makes it operate like sed. That is, it reads input one line at a time, like -n, and after doing its work, prints the line again. The -e introduces the script given on the command line. The script is simply a one-line substitute s/// expression: it replaces whatever matches the regular expression on the left-hand side with the string on the right-hand side. In this case, the right-hand side is empty, so we're just cutting out the text matched on the left-hand side.
The regular expression syntax, particular to Perl (and to Perl-style regex engines such as PCRE and those in Python, Ruby, and JavaScript), uses \s to match whitespace and \S to match non-whitespace. So the combination \S+\s+ matches a word followed by its whitespace. We group that sub-expression together with (...) and then tell perl to match exactly 2 of those in a row with the {m,n} quantifier, where n is optional and m is 2. The leading \s* trims leading whitespace.
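For systems without perl, the same pattern can be approximated with sed -E and POSIX character classes. This is a translation of the idea rather than the answer's own command, and the function name strip_two_words is made up for illustration.

```shell
# Drop the first two whitespace-separated words, whatever the spacing.
# [[:space:]] plays the role of \s and [^[:space:]] the role of \S.
strip_two_words() {
    sed -E 's/^[[:space:]]*([^[:space:]]+[[:space:]]+){2}//'
}

out=$(printf '  00:00    down  server hello world\n' | strip_two_words)
printf '%s\n' "$out"
```

Because it reads standard input, the function drops into a pipeline in the same position as the perl one-liner.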

find words in two quotes unix

I would like to display the last word on each of these lines. I tried searching for the word "value", for example, but got no result, so I thought of looking for the words between quotes; however, my file contains other quoted words that I don't need. What I actually want is to display the values of the select tag in my HTML file.
grep '*' hosts.html | awk '{print $NF}'
For example:
value='www.visit-tunisia.com'>www.visit-tunisia.com
value='www.watania1.tn'>www.watania1.tn
value='www.watania2.tn'>www.watania2.tn
I would like to get:
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
You need to set the field separator to >, which you do with the -F option:
$ awk -F'>' '{print $NF}' hosts.html
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
Note: I'm not sure what you are trying to achieve by grep '*' hosts.html?
Interpreting the comment liberally, you have input lines which might contain:
value='www.visit-tunisia.com'>www.visit-tunisia.com
value='www.watania1.tn'>www.watania1.tn
value='www.watania2.tn'>www.watania2.tn
and you would like the names which are repeated on a line as the output:
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
This can be done using sed and capturing parentheses.
sed -n -e "s/.*'\([^']*\)'.*\1.*/\1/p"
The -n says "don't print unless I say to do so". The s///p command prints if the substitute works. The pattern looks for a stream of 'anything' (.*), a single quote, captures what's inside up to the next single quote ('\([^']*\)') followed by any text, the captured text (the first \1), and anything. The replacement text is what was captured (the second \1).
Example:
$ cat data
www and wotnot
value='www.visit-tunisia.com'>www.visit-tunisia.com
blah
value='www.watania1.tn'>www.watania1.tn
hooplah
value='www.watania2.tn'>www.watania2.tn
if 'nothing' is required, nothing will be done.
$ sed -n -e "s/.*'\([^']*\)'.*\1.*/\1/p" data
www.visit-tunisia.com
www.watania1.tn
www.watania2.tn
nothing
$
Clearly, you can refine the [^']* part of the match if you want to. I used double quotes around the expression since the pattern matches on single quotes. Life is trickier if you need to allow both single and double quotes; at that point, I'd put the script into a file and run sed -f script data to make life easier.
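As a sketch of the script-file approach mentioned above, the following assumes one possible script that accepts either quote character around the value. The pattern, file names, and sample lines are illustrative assumptions, not something from the original answer.

```shell
# Keeping the sed script in a file avoids fighting the shell's quoting
# when the pattern itself contains both ' and ".
script=$(mktemp)
cat > "$script" <<'EOF'
s/.*['"]\([^'"]*\)['"].*\1.*/\1/p
EOF

hosts=$(printf '%s\n' \
    "value='www.watania1.tn'>www.watania1.tn" \
    'value="www.watania2.tn">www.watania2.tn' \
    'blah' | sed -n -f "$script")

printf '%s\n' "$hosts"
rm -f "$script"
```

The quoted heredoc delimiter ('EOF') stops the shell from interpreting anything inside the script body.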
sed 's/.*>\(.*\)/\1/g' your_file

Removing a portion of a string that has forward slashes in it

I'm stumped with how to remove a portion of a string that has forward slashes and question marks in it.
Example: /diag/PeerManager/list?deviceid=RXMWANT8WFYJNF7K6DXXXJLJVN
and I need the output to be RXMWANT8WFYJNF7K6DXXXJLJVN
I've tried tr and sed but tr removes some of the characters I need in the output. sed is giving me trouble because of the forward slashes.
What's a quick method to remove the /diag/PeerManager/list?deviceid= portion of my string?
thanks!
echo "/diag/PeerManager/list?deviceid=RXMWANT8WFYJNF7K6DXXXJLJVN" | sed -n 's:/[a-zA-Z]*/[a-zA-Z]*/[a-zA-Z]*?[a-zA-Z]*=::p'
This should do the trick. I chose the colon as the delimiter as it will not cause any issues with the forward slash. This makes a lot of assumptions about the input: specifically, that it contains three forward slashes, each followed by a run of lowercase and uppercase letters, then a question mark, and then another series of letters ending in an equals sign. It removes those items and prints the remaining characters (your device ID).
This worked for me:
sed 's/.*deviceid=\([^&]*\).*/\1/'
Example:
$ echo '/diag/PeerManager/list?deviceid=RXMWANT8WFYJNF7K6DXXXJLJVN' | sed 's/.*deviceid=\([^&]*\).*/\1/'
RXMWANT8WFYJNF7K6DXXXJLJVN
This is not the most robust solution, but if you have a fixed set of input that will never change, it's probably good enough.
One way using awk, if there is only a single occurrence of an = on each line:
awk -F= '{ print $2 }' file.txt
Results:
RXMWANT8WFYJNF7K6DXXXJLJVN
Use Equals Sign as Field Delimiter
If you know that your GET query string will always have only one parameter (in this case, deviceid) then you can just use the equals sign as a field delimiter with the standard cut utility. For example:
$ echo '/diag/PeerManager/list?deviceid=RXMWANT8WFYJNF7K6DXXXJLJVN' |
cut -d= -f2-
RXMWANT8WFYJNF7K6DXXXJLJVN
How about:
$ echo '/diag/PeerManager/list?deviceid=RXMWANT8WFYJNF7K6DXXXJLJVN' | sed 's/^.*=//'
RXMWANT8WFYJNF7K6DXXXJLJVN
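If the string is already in a shell variable, plain parameter expansion avoids the external tools altogether. This is a different technique from the sed, awk, and cut answers above, shown as a sketch; the variable names are assumptions.

```shell
url='/diag/PeerManager/list?deviceid=RXMWANT8WFYJNF7K6DXXXJLJVN'

# ${var##pattern} removes the longest prefix matching the pattern,
# here everything up to and including the last "=".
deviceid=${url##*=}

printf '%s\n' "$deviceid"
```

Like sed 's/^.*=//', this keeps only what follows the last equals sign, so it carries the same single-parameter assumption.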

Resources