Using cut with unprintable delimiters

Is it possible to use cut and have unprintable characters be the delimiter? For example I'd like to have the "^A" characters (also represented as \001) be the delimiter.

If you're using Bash,
cut -d $'\001' ...
works (see Bash Reference Manual # 3.1.2.4 ANSI-C Quoting).
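Other options that do not rely on $'...' quoting (note that echo -e is itself a shell extension, so these are not strictly POSIX either):
cut -d "`echo -e '\001'`" ...
FS=`echo -e '\001'`
cut -d "$FS" ...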
or inserting the control character directly using ^V as mentioned by Alnitak and etlerant -- on the shell command line, and in editors such as vi, this means "don't treat the next thing I type specially".
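A minimal sketch of a more portable variant, assuming only a POSIX shell: printf interprets the octal escape \001 in its format string, so neither $'...' nor echo -e is needed. The sample record and the variable names are made up for illustration.

```shell
# Build the delimiter with printf, which handles octal escapes itself.
SOH=$(printf '\001')

# Hypothetical sample record: three fields separated by ^A.
record=$(printf 'abc\001def\001ghi')

# Quote the delimiter so the shell passes it to cut as a single argument.
second=$(printf '%s\n' "$record" | cut -d "$SOH" -f2)
echo "$second"
```

This should print def, matching the ^V ^A example below.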

Yes, it's perfectly possible.
If typing in a shell, press ^V and then ^A to insert the ^A verbatim in the current line rather than have it treated as the normal 'go to start of line' command:
% cat -v foo
abc^Adef^Aghi
% cut -d^A -f2 foo
def

If, for example, your unprintable delimiter is a tab (\t) and you want the second through last tab-separated fields of each line, you can use this:
cut -d $'\t' -f2- tablimited.csv

CTRL-V CTRL-A ?

Related

A good way to use sed to find and replace characters with 2 delimiters

I am trying to find and replace items using bash. I was able to use sed to strip out some of the characters, but I think I might be using it in the wrong manner.
I am basically trying to remove the characters after ";" and before ",", including the "," itself.
sed -e 's/\(;\).*\(,\)/\1\2/'
That is what I used to replace it with nothing. However, it ends up replacing everything in the middle so my output came out like this:
cmd2="BMC,./socflash_x64 if=B600G3_BMC_V0207.ima;,reboot -f"
This is the original text of what I need to replace
cmd2="BMC,./socflash_x64 if=B600G3_BMC_V0207.ima;X,sleep 120;after_BMC,./run-after-bmc-update.sh;hba_fw,./hba_fw.sh;X,sleep 5;DB,2;X,reboot -f"
Is there any way to make it look like this output?
./socflash_x64 if=B600G3_BMC_V0207.ima;sleep 120;./run-after-bmc-update.sh;./hba_fw.sh;sleep 5;reboot -f
If there is any way to make this happen other than bash, I am fine with any language.

Non-greedy search can (mostly) be simulated in programs that don't support it by replacing match-any (dot .) with a negated character class.
Your original command is
sed -e 's/\(;\).*\(,\)/\1\2/'
You want to match everything in between the semi-colon and the comma, but not another comma (non-greedy). Replace .* with [^,]*
sed -e 's/\(;\)[^,]*\(,\)/\1\2/'
You may also want to exclude semi-colons themselves, making the expression
sed -e 's/\(;\)[^,;]*\(,\)/\1\2/'
Note this would treat a string like "asdf;zxcv;1234,qwer" differently, since one would match ;zxcv;1234, and the other would match only ;1234,
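The difference between the three expressions can be checked with a short sketch. The first string is the one from the note above; the second, with two commas, is a hypothetical input added here to show where the greedy .* differs from [^,]*.

```shell
s1='asdf;zxcv;1234,qwer'   # string from the note above
s2='a;b,c;d,e'             # hypothetical input with two commas

# Greedy original: .* runs from the first ';' to the LAST ',' on the line.
greedy=$(printf '%s\n' "$s2" | sed -e 's/\(;\).*\(,\)/\1\2/')

# [^,]* cannot cross a comma, so the match ends at the first ',' after ';'.
no_comma=$(printf '%s\n' "$s2" | sed -e 's/\(;\)[^,]*\(,\)/\1\2/')

# On s1, [^,]* still swallows the inner ';', while [^,;]* may not,
# so the match has to start at the last ';' before the comma.
with_semi=$(printf '%s\n' "$s1" | sed -e 's/\(;\)[^,]*\(,\)/\1\2/')
without_semi=$(printf '%s\n' "$s1" | sed -e 's/\(;\)[^,;]*\(,\)/\1\2/')

printf '%s\n' "$greedy" "$no_comma" "$with_semi" "$without_semi"
```

Without the g flag each command replaces only the first match on the line; add g to process every ;...,  pair.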

In perl:
perl -pe 's/;.*?,/;/g;' -pe 's/^[^,]*,//' foo.txt
will output:
./socflash_x64 if=B600G3_BMC_V0207.ima;sleep 120;./run-after-bmc-update.sh;./hba_fw.sh;sleep 5;2;reboot -f
The .*? is a non-greedy match up to the next comma. The second substitution removes everything from the beginning of the line through the first comma.

Something like:
echo "$cmd2" | tr ';' '\n' | cut -d',' -f2- | tr '\n' ';' ; echo
result is:
./socflash_x64 if=B600G3_BMC_V0207.ima;sleep 120;./run-after-bmc-update.sh;./hba_fw.sh;sleep 5;2;reboot -f;
However, I think your requirements are a bit more complex, because 'DB,2' seems to be a special case. After the first tr command, insert a grep or grep -v to include or exclude such cases.
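A sketch of that suggestion, under the assumption that entries labelled DB are the ones to drop (the question's desired output omits the 2). Using paste instead of the final tr is a substitution made here so that no trailing ';' is produced.

```shell
cmd2='BMC,./socflash_x64 if=B600G3_BMC_V0207.ima;X,sleep 120;after_BMC,./run-after-bmc-update.sh;hba_fw,./hba_fw.sh;X,sleep 5;DB,2;X,reboot -f'

# 1. tr:      split into one "label,command" entry per line
# 2. grep -v: drop DB entries (assumption about what the asker wants)
# 3. cut:     keep everything after the first comma
# 4. paste:   rejoin the lines with ';' and no trailing separator
result=$(printf '%s\n' "$cmd2" |
  tr ';' '\n' |
  grep -v '^DB,' |
  cut -d',' -f2- |
  paste -sd';' -)

echo "$result"
```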

Shell Script get text between 2 special characters

I have read a few things out there but can't seem to work out this particular problem. I am writing a shell script. I am reading a file to a variable using
LOCAL_CONFIG=`cat local-config.php`
Which has lines like this
define( 'DB_USER', 'abcxyz' );
define( 'DB_PASSWORD', 'qwerty' );
How can I get the abcxyz and the qwerty parts of this??
Thanks in advance

Using awk
$ awk -F"'" '/^define\(/ {print $4}' local-config.php
abcxyz
qwerty
Explanation:
-F"'"
This defines the field separator as the single quote.
/^define\(/
This selects the lines that start with define(
print $4
For those selected lines, this prints the fourth field.
Using sed
$ sed -rn "/^define\(/ {s/([^']*'){3}//; s/'.*//; p;}" local-config.php
abcxyz
qwerty
-rn
This turns on extended regex syntax and turns off automatic printing.
/^define\(/
This selects the lines that start with define(
{
This starts a group. Commands in this group are executed only for the selected lines.
s/([^']*'){3}//
This removes all text up through and including the third quote.
s/'.*//
This removes all text after the next remaining quote.
p
This prints the line.
}
This ends the group.

Use grep with the -P option to enable Perl-compatible regular expressions (a GNU grep feature).
$ grep -oP "\bdefine\( *'[^']*' *, *'\K[^']*(?=' *\);)" file
abcxyz
qwerty
\K discards everything matched up to that point, so only the text matched after it is printed.

The cut command does this more simply.
Command:
cut -d "'" -f4 local-config.php
output:
abcxyz
qwerty
Explanation:
With ' as the delimiter, the value we want is the fourth field (-f4) of each line.
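If the goal is to pull out one specific setting rather than every value, the same quote-as-separator idea can be wrapped in a small function. This is only a sketch: get_define is a hypothetical helper name, and the temporary file stands in for local-config.php.

```shell
# get_define FILE KEY: print the value from a line like define( 'KEY', 'value' );
# With ' as the field separator, KEY is field 2 and the value is field 4.
get_define() {
  awk -F"'" -v key="$2" '$2 == key { print $4 }' "$1"
}

# Hypothetical sample file in the format shown in the question.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
define( 'DB_USER', 'abcxyz' );
define( 'DB_PASSWORD', 'qwerty' );
EOF

db_user=$(get_define "$cfg" DB_USER)
db_password=$(get_define "$cfg" DB_PASSWORD)
echo "$db_user $db_password"
rm -f "$cfg"
```

Reading the file directly also avoids holding its whole contents in a shell variable such as LOCAL_CONFIG.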

How to remove OCTAL character using Linux?

I have a large file that I need to edit in Linux.
The file has data fields enclosed in double quotes ( "" ). But when I open the file in Notepad++ I see an SOH character between the quoted fields (i.e. "filed1"SOH"field2"SOHSOH"field3"SOH"field4")
And when I open the same file in vim I see the double quotes followed by ^A character. (ie. "filed1"^A"field2"^A^A"field3"^A"field4")
Then when I execute this command in the command line
cat filename.txt | od -c | more
I see that the character is shown as 001 (ie. "filed1"001"field2"001001"field3"001"field4")
I have tried the following via vim
:s%/\\001//g
I also tried this command
sed -e s/\001//g filename.text > filename_new.txt
sed -e s/\\001//g filename.text > filename_new.txt
I need to remove those characters from that file.
How can I do that?

Your attempts at escaping the SOH character with \001 were close.
GNU sed has an extension to specify a decimal value with \d001 (there are also octal and hexadecimal variants):
$ sed -i -e 's/\d001//g' file.txt
In Vim, the regular expression atom looks slightly different: \%d001. Alternatively, you can enter the character directly on the :%s command line with Ctrl+V followed by 001; cf. :help i_CTRL-V_digit.

Use echo -e to get a literal \001 character into your sed command:
$ sed -i -e "$(echo -e 's/\001//g')" file.txt
(-i is a GNU sed extension to request in-place editing.)

Just keep it simple with awk instead of fussing with quoting issues:
mawk NF=NF FS='\1' OFS= filename.txt
"filed1""field2""field3""field4"
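None of the answers above use tr, but it is another option worth sketching: tr interprets octal escapes itself, so plain single quotes are enough and no sed extension is involved. The sample line below is reconstructed from the question.

```shell
# Hypothetical sample line with SOH (octal 001) between the quoted fields.
line=$(printf '"filed1"\001"field2"\001\001"field3"\001"field4"')

# Delete every SOH byte from the stream.
cleaned=$(printf '%s\n' "$line" | tr -d '\001')
echo "$cleaned"
```

For a real file the equivalent would be tr -d '\001' < filename.txt > filename_new.txt, writing to a different file than the one being read.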

What is the proper way to insert tab in sed?

What is the proper way to insert tab in sed? I'm inserting a header line into a stream using sed. I could probably do a replacement of some character afterward to put in tab using regular expression, but is there a better way to do it?
For example, let's say I have:
some_command | sed '1itextTABtext'
I would like the first line to look like this (text is separated by a tab character):
text text
I have tried substituting TAB in the command above with "\t", "\x09", " " (tab itself). I have tried it with and without double quotes and I can't get sed to insert tab in between the text.
I am trying to do this in SLES 9.

Assuming bash (and maybe other shells will work too):
some_command | sed $'1itext\ttext'
Bash will process escapes, such as \t, inside $' ' before passing it as an arg to sed.

You can simply use the sed i command correctly:
some_command | sed '1i\
text text2'
where, as I hope it is obvious, there is a tab between 'text' and 'text2'. On MacOS X (10.7.2), and therefore probably on other BSD-based platforms, I was able to use:
some_command | sed '1i\
text\ttext2'
and sed translated the \t into a tab.
If sed won't interpret \t and inserting tabs at the command line is a problem, create a shell script with an editor and run that script.
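Another way to avoid typing a literal tab is to let printf produce it and splice it into the sed script. This is a sketch that assumes a POSIX shell; TAB and the sample rows are illustrative names and data.

```shell
# printf turns \t into a real tab character in any POSIX shell.
TAB=$(printf '\t')

# Inside double quotes, \\ becomes a single backslash, so sed receives
# "1i\" followed by a newline and the text line, with a real tab in it.
with_header=$(printf 'row1\nrow2\n' | sed "1i\\
text${TAB}text2")

printf '%s\n' "$with_header"
```

Because sed only ever sees an actual tab, this does not depend on whether a given sed understands \t.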

As most answers say, a literal tab character is probably the best option.
info sed notes that \t is not portable:
...
'\CHAR'
Matches CHAR, where CHAR is one of '$', '*', '.', '[', '\', or '^'.
Note that the only C-like backslash sequences that you can
portably assume to be interpreted are '\n' and '\\'; in particular
'\t' is not portable, and matches a 't' under most implementations
of 'sed', rather than a tab character.
...

Sed can do this, but it's awkward:
% printf "1\t2\n3\t4\n" | sed '1i\\
foo bar\\
'
foo bar
1 2
3 4
$
(The double backslashes are there because I'm using tcsh as my shell; if you use bash, use single backslashes.)
The space between foo and bar is a tab, which I typed by preceding it with Ctrl-V. You'll also need to precede the newlines inside your single quotes with Ctrl-V.
It would probably be simpler/clearer to do this with awk:
$ printf "1\t2\n3\t4\n" | awk 'BEGIN{printf("foo\tbar\n");} {print;}'

Escape the tab character:
sed -i '/<setup>/ a \\tmy newly added line' <file_name>
NOTE: there are two backslashes above. The first one escapes the second, and the second begins the actual tab escape (\t).

To illustrate that \t is not portable in sed's BRE syntax, Git 2.13 (Q2 2017) got rid of it.
See commit fba275d (01 Apr 2017) by Junio C Hamano (gitster).
(Merged by Junio C Hamano -- gitster -- in commit 3c833ca, 17 Apr 2017)
contrib/git-resurrect.sh: do not write \t for HT in sed scripts
Just like we did in 0d1d6e5 ("t/t7003: replace \t with literal tab
in sed expression", 2010-08-12, Git 1.7.2.2), avoid writing "\t" for HT in sed scripts, which is not portable.
- sed -ne 's~^\([^ ]*\) .*\tcheckout: moving from '"$1"' .*~\1~p'
+ sed -ne 's~^\([^ ]*\) .* checkout: moving from '"$1"' .*~\1~p'
(in the + line, the gap before "checkout" is a literal tab)

I found an alternate way to insert a tab by using substitution.
some_command | sed '1s/^/text\ttext\n/'
I still do not know of a way to do it using the insert method.

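This command replaces old with new in file.txt (the empty '' after -i is the BSD/macOS form; GNU sed takes -i without an argument):
sed -i '' 's/old/new/' file.txt
This command adds a tab before new (the $'...' quoting requires bash or a compatible shell):
sed -i '' $'s/old/\tnew/' file.txt
This command replaces the entire line containing old: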
sed -i '' 's/.*old.*/new/' file.txt

How do I remove newlines from a text file?

I have the following data, and I need to put it all into one line.
I have this:
22791
;
14336
;
22821
;
34653
;
21491
;
25522
;
33238
;
I need this:
22791;14336;22821;34653;21491;25522;33238;
EDIT
None of these commands is working perfectly.
Most of them let the data look like this:
22791
;14336
;22821
;34653
;21491
;25522

tr --delete '\n' < yourfile.txt
tr -d '\n' < yourfile.txt
Edit:
If none of the commands posted here are working, then you have something other than a newline separating your fields. Possibly you have DOS/Windows line endings in the file (although I would expect the Perl solutions to work even in that case)?
Try:
tr -d "\n\r" < yourfile.txt
If that doesn't work then you're going to have to inspect your file more closely (e.g. in a hex editor) to find out what characters are actually in there that you want to remove.
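The symptom described in the question's edit can be reproduced with a short sketch. The temporary file and its contents are made up here and assume DOS line endings.

```shell
# Hypothetical sample with CRLF line endings.
sample=$(mktemp)
printf '22791\r\n;\r\n14336\r\n;\r\n' > "$sample"

# Deleting only \n leaves the carriage returns in the stream; count them.
cr_left=$(tr -d '\n' < "$sample" | tr -cd '\r' | wc -c)

# Deleting both \r and \n gives the single line that was asked for.
joined=$(tr -d '\r\n' < "$sample")

echo "carriage returns left after tr -d '\\n': $cr_left"
echo "$joined"
rm -f "$sample"
```

The leftover carriage returns are what make a terminal redraw each value at the start of the line, so the output looks as if the newlines were still there.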

tr -d '\n' < file.txt
Or
awk '{ printf "%s", $0 }' file.txt
Or
sed ':a;N;$!ba;s/\n//g' file.txt
This page here has a bunch of other methods to remove newlines.
edited to remove feline abuse :)

perl -p -i -e 's/\R//g;' filename
This should do the job.

paste -sd "" file.txt

Expanding on a previous answer, this removes all newlines and saves the result to a new file (thanks to @tripleee):
tr -d '\n' < yourfile.txt > yourfile2.txt
Which is better than a "useless cat" (see comments):
cat file.txt | tr -d '\n' > file2.txt
This is also useful for getting rid of a newline at the end of a file, e.g. one created by echo blah > file.txt.
Note that the destination filename is different. This is important: otherwise you'll wipe out the original content!

You can edit the file in vim:
$ vim inputfile
:%s/\n//g

Use
head -n 1 filename | od -c
to figure out what the offending character is.
Then use
tr -d '\n' <filename
for LF, or
tr -d '\r\n' <filename
for CRLF.

Use sed with POSIX classes
This will remove all lines containing only whitespace (spaces & tabs)
sed '/^[[:space:]]*$/d'
Just take whatever you are working with and pipe it to that
Example
cat filename | sed '/^[[:space:]]*$/d'

Using man 1 ed:
# cf. http://wiki.bash-hackers.org/doku.php?id=howto:edit-ed
ed -s file <<< $'1,$j\n,p' # print to stdout
ed -s file <<< $'1,$j\nwq' # in-place edit

xargs consumes newlines as well (but adds a final trailing newline):
xargs < file.txt | tr -d ' '

Nerd fact: you can use the octal ASCII code for the newline instead.
tr -d '\012' < filename.extension
(Edited cause i didn't see the friggin' answer that had same solution, only difference was that mine had ASCII)

Using the gedit text editor (3.18.3)
Click Search
Click Find and Replace...
Enter \n\s into Find field
Leave Replace with blank (nothing)
Check Regular expression box
Click the Find button
Note: this doesn't exactly address the OP's original, 7-year-old problem, but it should help new Linux users (like me) who find their way here from the search engines with similar "how do I get my text all on one line" questions.

I had the same case today. It is super easy in vim or nvim: you can use gJ to join lines. For your use case, just do
99gJ
This will join 99 lines. Adjust the number 99 according to how many lines you need to join. To join just one line with the next, gJ on its own is enough.

$ perl -0777 -pe 's/\n+//g' input >output
$ perl -0777 -pe 'tr/\n//d' input >output

If the data is in file.txt, then:
echo $(<file.txt) | tr -d ' '
The '$(<file.txt)' reads the file and gives the contents as a series of words which 'echo' then echoes with a space between them. The 'tr' command then deletes any spaces:
22791;14336;22821;34653;21491;25522;33238;

Assuming you only want to keep the digits and the semicolons, and that there are no major encoding issues, the following should do the trick, though it will also remove the very last newline:
$ tr -cd ";0-9" < file.txt
You can easily modify the above to include other characters, e.g. if you want to retain decimal points, commas, etc.

I usually hit this use case when I'm copying a code snippet from a file and want to paste it into a console without adding unnecessary newlines. I ended up creating a bash alias
(I called it oneline, if you are curious):
xsel -b -o | tr -d '\n' | tr -s ' ' | xsel -b -i
xsel -b -o reads my clipboard
tr -d '\n' removes newlines
tr -s ' ' squeezes repeated spaces into one
xsel -b -i pushes the result back to my clipboard
After running oneline, I paste the new contents of the clipboard into a console or wherever I need them.

I would do it with awk, e.g.
awk '/[0-9]+/ { a = a $0 ";" } END { print a }' file.txt
(a disadvantage is that a is "accumulated" in memory).
EDIT
Forgot about printf! So also
awk '/[0-9]+/ { printf "%s;", $0 }' file.txt
or, likely better, the awk command already given in the other answer.

You are missing the most obvious and fastest answer, especially when you need to do this in a GUI in order to fix some weird word wrap.
Open gedit.
Press Ctrl + H, enter \n in the Find textbox, leave Replace with empty, tick the Regular expression checkbox, and voila.

To also remove the trailing newline at the end of the file
python -c "s=open('filename','r').read();open('filename', 'w').write(s.replace('\n',''))"

The fastest way I found:
Open vim from your command line:
vim inputfile
Press ":" and enter the following command to remove all newlines:
:%s/\n//g
Enter this to also remove spaces, in case some of the characters were spaces:
:%s/ //g
Make sure to save by writing to the file with
:w
The same format can be used to remove any other characters. You can use a website like
https://apps.timwhitlock.info/unicode/inspect
to figure out which invisible characters are in your text.
