Linux command to view raw file, including its delimiters

I'm trying to view the content of a file including its delimiters in terminal.
For example:
hello\t\tworld\n
hello\t\tworld\t\tagain\n
instead of just:
hello world
hello world again
I did this once a while ago using either sed or awk... I think... but I can't seem to remember how now.
Thanks for the help.

vi can show you this if you open the file in it and type :set list.
e.g.
$ cat test.txt
hello world
hello world again
In vi, the ^I characters are tabs and each $ marks the end of a line (the newline).
Also, as the comment states, cat -A will give you the same output:
$ cat -A test.txt
hello^I^Iworld$
hello^I^Iworld^I^Iagain$

You can use the od command:
od -t a input_file | awk '{$1=""}1' |
awk 'BEGIN{RS="[ \t\n]+";ORS="";
d["sp"]=" "; d["nl"]="\\n\n"; d["ht"]="\\t"; d["cr"] = "\\r";
}length($0)>1{$0=d[$0]}1'
with input_file
hello world
hello world again
you get,
hello\t\tworld\n
hello\t\tworld again\n

This might work for you (GNU sed):
sed -n l0 file
This will show tabs as \t, and the end of each line (the newline) will be marked with $.
If you wish to see newlines as \n then slurp the whole file:
sed -n '1h;1!H;$!d;x;l0' file

This sed should probably work; you can modify it to match and display whatever it is you want to see (e.g. whitespace). The tricky part is matching newlines: since sed uses newlines to delimit the stream, the newline has already been consumed before you get to it. So matching along the lines of sed 's/\n/\\n/' will fail; instead, you can imply the newline by matching the end of the line and tacking on the \\n.
sed 's/$/\\n/;s/\t/\\t/g;s/ \{4\}/\\t/g' file
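A quick check with made-up input (this assumes GNU sed, since \t in a sed pattern is a GNU extension):
$ printf 'hello\tworld\n' | sed 's/\n/\\n/'
hello   world
$ printf 'hello\tworld\n' | sed 's/$/\\n/;s/\t/\\t/g'
hello\tworld\n
The first command leaves the line untouched because the trailing newline is never in the pattern space; the second makes the tab and the implied newline visible.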

Related

Separate a text file with sed

I have the following sample file:
evtlog.161202.002609.debugevtlog.161201.162408.debugevtlog.161202.011046.debugevtlog.161202.002809.debugevtlog.161201.160035.debugevtlog.161201.155140.debugevtlog.161201.232156.debugevtlog.161201.145017.debugevtlog.161201.154816.debug
I want to separate the string and add a newline after matching "debug" like this:
evtlog.161202.002609.debug
evtlog.161201.162408.debug
So far I tried almost everything with sed, but it doesn't seem to do what I want.
sed 's/debug/{G}' latest_evtlogs.out
sed '/debug/i "SAD"' latest_evtlogs.out
etc...
sed 's/debug/\n/g' latest_evtlogs.out doesn't work when I add it as a pipe in the script, but it does when I run it manually.
Here's how I generate the file:
printf $(ls -l $EVTLOG_PATH/evtlog|tail -n 10|awk '{printf $8 , "%s\n\n"}'|sed 's/debug/\n/g') >> latest_evtlogs.out
Initially I wanted to just add newline with awk, but it doesn't work either.
Any ideas why I can't separate the string with a newline ?
I'm using:
Distributor ID: Debian
Description: Debian GNU/Linux 5.0.10 (lenny)
Release: 5.0.10
Codename: lenny
Just add a new line after debug:
sed 's/debug/&\n/g' file
Note & prints back the matched text, so it is a way to print "debug" back.
This returns:
evtlog.161202.002609.debug
evtlog.161201.162408.debug
evtlog.161202.011046.debug
evtlog.161202.002809.debug
evtlog.161201.160035.debug
evtlog.161201.155140.debug
evtlog.161201.232156.debug
evtlog.161201.145017.debug
evtlog.161201.154816.debug
The problem is that you are using the output of sed in a command substitution. Because the expansion is unquoted, the shell performs word splitting on that output, so each newline-separated line becomes a separate argument to printf. printf then interprets the first line as the format argument and ignores the rest, since there are no printf placeholders in that format.
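A quick illustration of that word splitting (using a made-up two-line string standing in for the sed output):
$ out=$'evtlog.1.debug\nevtlog.2.debug'
$ printf $out; echo     # unquoted: split into two arguments; the first becomes the format
evtlog.1.debug
$ printf '%s\n' "$out"  # quoted: the newlines survive
evtlog.1.debug
evtlog.2.debug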
It should work if you drop the outer printf $() from your command and just redirect the output from your pipeline to your file:
ls -l $EVTLOG_PATH/evtlog|tail -n 10|awk '{printf $8 , "%s\n\n"}'|sed 's/debug/\n/g' >> latest_evtlogs.out
Maybe Perl is "happier" than sed on your system:
perl -pe 's/debug/&\n/g' < YourLogFile
G (get) appends whatever is in the hold buffer to the pattern space (usually just the current line read from the input file), so that cannot be used here.
i (insert) writes the specified text to standard output before the current line, so that cannot be used either.
What you want is to replace every debug with debug^J, where ^J is a newline. Depending on the sed version, you can either do:
sed 's/debug/&\n/g' input_file
But \n in the replacement is, as far as I know, not strictly specified by POSIX sed. One can, however, use the shell's ANSI-C quoting:
sed 's/debug/&'$'\n''/g' input_file
Or a multi-line string:
sed 's/debug/&\
/g' input_file
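A quick check of the ANSI-C quoted variant on made-up input (this works with GNU sed; some seds require the newline in the replacement to be preceded by a backslash):
$ printf 'one.debugtwo.debugend\n' | sed 's/debug/&'$'\n''/g'
one.debug
two.debug
end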
Thank you all for the answers. I finally did it like this:
echo $(ls -l $EVTLOG_PATH/evtlog|tail -n 10|awk '{printf $8 , "%s\n\n"}'|sed 's/debug/&\n/g') > temp.out
sed 's/ /\n/g' /share/sqa/dumps/5314577631/checks/temp.out > latest_evtlogs.out
It's not at all elegant, but it finally works.

Remove \n and keep space in Linux

I have a file with \n hidden at the end of each line:
input:
s3741206\n
s2561284\n
s4411364\n
s2516482\n
s2071534\n
s2074633\n
s7856856\n
s11957134\n
s682333\n
s9378200\n
s1862626\n
I want to remove the trailing \n.
Desired output:
s3741206
s2561284
s4411364
s2516482
s2071534
s2074633
s7856856
s11957134
s682333
s9378200
s1862626
However, I tried this:
tr -d '\n' < file1 > file2
but then everything runs together, with no spaces or newlines:
s3741206s2561284s4411364s2516482s2071534s2074633s7856856s11957134s682333s9378200s1862626
I also tried sed $'s/\n//g' -i file1, but that doesn't work on macOS.
Thank you.
This is a possible solution using sed:
sed 's/\\n/ /g'
With awk:
awk '{sub(/\\n/,"")} 1' < file1 > file2
What you are describing so far in your question+comments doesn't make sense. How can you have a multi-line file with a hidden newline character at the end of each line? What you show as your input file:
s3741206\n
s2561284\n
s4411364\n
etc.
where each "\n" above is, according to your comment, a single newline character, is impossible. If those "\n"s were newline characters then your file would simply look like:
s3741206
s2561284
s4411364
etc.
There are really only two possibilities I can think of:
1. You are wrongly interpreting what you are seeing in your input file and/or using the wrong terminology, and you actually DO have \r\n at the end of every line. Run cat -v file to see the \rs as ^Ms, and run dos2unix or similar (e.g. sed 's/\r$//' file) to remove the \rs - you do not want to remove the \ns or you will no longer have a POSIX text file, and POSIX tools will exhibit undefined behavior when run on it (see the sketch after this list). If that doesn't work for you, then copy/paste the output of cat -v file into your question so we can see for sure what is in your file.
2. It's also entirely possible that your file is a perfectly fine POSIX text file as-is and you are incorrectly assuming you will have a problem for some reason, so also include in your question a description of the actual problem you are having, an example of the command you are executing on that input file, the output you are getting, and the output you expected to get.
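A minimal sketch of possibility 1, using a fabricated file with DOS line endings (sed 's/\r$//' assumes GNU sed, where \r is recognized):
$ printf 's3741206\r\ns2561284\r\n' > dosfile
$ cat -v dosfile
s3741206^M
s2561284^M
$ sed 's/\r$//' dosfile | cat -v
s3741206
s2561284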
You could use bash-native string substitution:
$ cat /tmp/newline
s3741206\n
s2561284\n
s4411364\n
s2516482\n
s2071534\n
s2074633\n
s7856856\n
s11957134\n
s682333\n
s9378200\n
s1862626\n
$ for LINE in $(cat /tmp/newline); do echo "${LINE%\\n}"; done
s3741206
s2561284
s4411364
s2516482
s2071534
s2074633
s7856856
s11957134
s682333
s9378200
s1862626
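The same ${LINE%\\n} substitution can also be done in a while read loop, which avoids word splitting and globbing on the file contents (a sketch, not part of the original answer):
while IFS= read -r LINE; do echo "${LINE%\\n}"; done < /tmp/newline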

Filter out only matched values from a text file in each line

I have a file "test.txt" with the lines below, plus a bunch of extra stuff after the "version":
soainfra_metrics{metric_group="sca_composite",partition="test",is_active="true",state="on",is_default="true",composite="test123"} map:stats version:1.0
soainfra_metrics{metric_group="sca_composite",partition="gello",is_active="true",state="on",is_default="true",composite="test234"} map:stats version:1.8
soainfra_metrics{metric_group="sca_composite",partition="bolo",is_active="true",state="on",is_default="true",composite="3415"} map:stats version:3.1
soainfra_metrics{metric_group="sca_composite",partition="solo",is_active="true",state="on",is_default="true",composite="hji"} map:stats version:1.1
I tried:
egrep -r 'partition|is_active|state|is_default|composite' test.txt
It displays every line, but I need only the specific fields shown below, ignoring the rest of the data and lines.
In a nutshell, I want to display only these fields from each line, not the rest:
partition="test",is_active="true",state="on",is_default="true",composite="test123"
partition="gello",is_active="true",state="on",is_default="true",composite="test234"
partition="bolo",is_active="true",state="on",is_default="true",composite="3415"
partition="solo",is_active="true",state="on",is_default="true",composite="hji"
If your version of grep supports Perl-style regular expressions, then I'd use this:
grep -oP '.*?,\K[^}]+' file
It matches (non-greedily) everything up to the first comma (\K discards what has been matched so far from the output) and then prints everything up to the }.
Alternatively, using awk:
awk -F'}' '{ sub(/[^,]+,/, ""); print $1 }' file
This sets the field separator to } so the part you're interested in is the first field. It then uses sub to remove the part up to the first comma.
For completeness, you could also use sed:
sed 's/[^,]*,\([^}]*\).*/\1/' file
This captures the part after the first , up to the } and replaces the content of the line with it.
After the grep to pick out the lines you want, use sed to edit the lines:
sed 's/.*\(partition[^}]*\)} map.*/\1/'
This means: whenever you see anything (.*), followed by partition and any number of non-} characters, then } map and anything else, grab the part from partition up to but not including the brace (the \(...\) group) as group 1. The replacement text is just group 1, \1.
Use a pipe | to connect the output of egrep to the input of sed:
egrep ... | sed ...
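Put together with the patterns from the question, that pipeline might look like this (a sketch, assuming the grep from the question selects the lines you want):
egrep 'partition|is_active|state|is_default|composite' test.txt | sed 's/.*\(partition[^}]*\)} map.*/\1/'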
As far as I understood, your file might have more lines you don't want to see, so I would use:
sed -n 's/.*\(partition.*\)}.*/\1/p' file
We use -n together with the p flag to show only the lines where a substitution was made. The substitution captures the part of the line you need and replaces the whole line with it.
This might work for you (GNU sed):
sed -r 's/(partition|is_active|state|is_default|composite)="[^"]*"/\n&\n/g;s/[^\n]*\n([^\n]*)\n[^\n]*/\1,/g;s/,$//' file
Treat the problem as if it were a "decomposed club sandwich". Identify the fillings, remove the bread and tidy up.

How can I remove the last character of a file in unix?

Say I have some arbitrary multi-line text file:
sometext
moretext
lastline
How can I remove only the last character (the e, not the newline or null) of the file without making the text file invalid?
A simpler approach (outputs to stdout, doesn't update the input file):
sed '$ s/.$//' somefile
$ is a Sed address that matches the last input line only, thus causing the following function call (s/.$//) to be executed on the last line only.
s/.$// replaces the last character on the (in this case last) line with an empty string; i.e., effectively removes the last char. (before the newline) on the line.
. matches any character on the line, and following it with $ anchors the match to the end of the line; note how the use of $ in this regular expression is conceptually related, but technically distinct from the previous use of $ as a Sed address.
Example with stdin input (assumes Bash, Ksh, or Zsh):
$ sed '$ s/.$//' <<< $'line one\nline two'
line one
line tw
To update the input file too (do not use if the input file is a symlink):
sed -i '$ s/.$//' somefile
Note:
On macOS, you'd have to use -i '' instead of just -i; for an overview of the pitfalls associated with -i, see the bottom half of this answer.
If you need to process very large input files and/or performance / disk usage are a concern and you're using GNU utilities (Linux), see ImHere's helpful answer.
truncate
truncate -s-1 file
Removes one (-1) byte from the end of the file, in place - just as >> appends to the same file.
The problem with this approach is that it doesn't retain a trailing newline if it existed.
The solution is:
if [ -n "$(tail -c1 file)" ]    # true if the file does not end with a newline
then
    truncate -s-1 file          # remove one character, as the question requests
else
    truncate -s-2 file          # remove the final newline and the character before it
    echo "" >> file             # add the trailing newline back
fi
This works because tail takes the last byte (not char).
It takes almost no time even with big files.
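A quick check of how the tail -c1 test behaves (made-up file names; the command substitution strips a trailing newline, so the result is empty exactly when the file ends with one):
$ printf 'lastline\n' > with_nl
$ printf 'lastline' > without_nl
$ [ -n "$(tail -c1 with_nl)" ]; echo $?
1
$ [ -n "$(tail -c1 without_nl)" ]; echo $?
0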
Why not sed
The problem with a sed solution like sed '$ s/.$//' file is that it has to read and rewrite the whole file (which takes a long time with large files), and you need a temporary file of the same size as the original:
sed '$ s/.$//' file > tempfile
rm file; mv tempfile file
And then move the tempfile to replace the file.
Here's another using ex, which I find not as cryptic as the sed solution:
printf '%s\n' '$' 's/.$//' wq | ex somefile
The $ goes to the last line, the s deletes the last character, and wq is the well known (to vi users) write+quit.
After a whole bunch of playing around with different strategies (and avoiding sed -i or perl), the best way I found to do this was with:
sed '$! { P; D; }; s/.$//' somefile
For every line except the last ($!), P prints the pattern space and D deletes it and starts the next cycle, so those lines pass through unchanged; only the last line reaches s/.$//.
If the goal is to remove the last character in the last line, this awk should do:
awk '{a[NR]=$0} END {for (i=1;i<NR;i++) print a[i];sub(/.$/,"",a[NR]);print a[NR]}' file
sometext
moretext
lastlin
It stores all the lines in an array, then prints them back out, removing the last character from the last line.
Just a remark: sed -i replaces the file (it writes a new file and renames it over the original). So if you are tailing the file, you'll get a "No such file or directory" warning until you reissue the tail command.
EDITED ANSWER
I put your text into a test file on my Desktop, saved as "old_file.txt":
sometext
moretext
lastline
Afterwards I wrote a small script to take the old file and eliminate the last character of the last line:
#!/bin/bash
no_of_new_line_characters=`wc '/root/Desktop/old_file.txt'|cut -d ' ' -f2`
let "no_of_lines=no_of_new_line_characters+1"
sed -n 1,"$no_of_new_line_characters"p '/root/Desktop/old_file.txt' > '/root/Desktop/my_new_file'
sed -n "$no_of_lines","$no_of_lines"p '/root/Desktop/old_file.txt'|sed 's/.$//g' >> '/root/Desktop/my_new_file'
Opening the new file I created showed the following output:
sometext
moretext
lastlin
I apologize for my previous answer (wasn't reading carefully)
sed 's/.$//' filename | tee newFilename
This should do your job.
A couple perl solutions, for comparison/reference:
(echo 1a; echo 2b) | perl -e '$_=join("",<>); s/.$//; print'
(echo 1a; echo 2b) | perl -e 'while(<>){ if(eof) {s/.$//}; print }'
I find the first, read-whole-file-into-memory approach generally quite useful (less so for this particular problem). You can then write regexes that span multiple lines, for example to combine every 3 lines of a certain format into 1 summary line.
For this problem, truncate would be faster and the sed version is shorter to type. Note that truncate requires a file to operate on, not a stream. Normally I find sed to lack the power of perl and I much prefer the extended-regex / perl-regex syntax. But this problem has a nice sed solution.
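As an aside, a rough sketch of that slurp-and-combine idea on fabricated three-line records (the input and the joined format are made up for illustration):
$ printf 'a\n1\nx\nb\n2\ny\n' | perl -e '$_=join("",<>); s/(.*)\n(.*)\n(.*)\n/$1 $2 $3\n/g; print'
a 1 x
b 2 y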

Replace whitespace with a comma in a text file in Linux

I need to edit a few text files (an output from sar) and convert them into CSV files.
I need to change every whitespace character (it may be a tab between the numbers in the output) to a comma, using sed or awk in a simple shell script on Linux.
Can anyone help me? None of the commands I tried changed the file at all; I tried gsub.
tr ' ' ',' <input >output
This substitutes each space with a comma. If you need to, you can also make a pass with the -s (squeeze repeats) flag, which replaces each input sequence of a repeated character listed in SET1 (the blank space or tab) with a single occurrence of that character.
Using squeeze repeats first to collapse runs of tabs, and then substituting them:
tr -s '\t' <input | tr '\t' ',' >output
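For example, with fabricated sar-like input, runs of spaces can also be collapsed straight into single commas in one pass (here -s squeezes the commas produced by the translation):
$ printf '12:00:01  all   1.23   0.00\n' | tr -s ' ' ','
12:00:01,all,1.23,0.00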
Try something like:
sed -E 's/[[:space:]]+/,/g' orig.txt > modified.txt
The character class [[:space:]] will match all whitespace (spaces, tabs, etc.). If you just want to replace a single character, e.g. just space, use that only.
EDIT: Actually [[:space:]] includes carriage return, so this may not do what you want. The following will replace tabs and spaces:
sed -E 's/[[:blank:]]+/,/g' orig.txt > modified.txt
as will
sed -E 's/[\t ]+/,/g' orig.txt > modified.txt
In all of this, you need to be careful that the items in your file that are separated by whitespace don't contain their own whitespace that you want to keep, eg. two words.
Without looking at your input file, this is only a guess:
awk '{$1=$1}1' OFS=","
Redirect the output to another file and rename it as needed.
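A quick check on made-up whitespace-separated input ($1=$1 forces awk to rebuild the record with the output field separator):
$ printf '12:00:01  all   1.23\n' | awk '{$1=$1}1' OFS=","
12:00:01,all,1.23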
What about something like this:
cat texte.txt | sed -e 's/\s/,/g' > texte-new.txt
(Yes, with some useless catting and piping; I could also use < to read from the file directly, I suppose -- I used cat first to output the content of the file, and only afterwards added sed to my command line.)
EDIT: as #ghostdog74 pointed out in a comment, there's definitely no need for the cat/pipe; you can give the name of the file to sed:
sed -e 's/\s/,/g' texte.txt > texte-new.txt
If "texte.txt" looks like this:
$ cat texte.txt
this is a text
in which I want to replace
spaces by commas
You'll get a "texte-new.txt" that'll look like this:
$ cat texte-new.txt
this,is,a,text
in,which,I,want,to,replace
spaces,by,commas
I wouldn't just replace the old file with the new one (it could be done with sed -i, if I remember correctly; and as #ghostdog74 said, that option can create a backup on the fly): keeping the original might be wise, as a safety measure (even if it means renaming it to something like "texte-backup.txt").
This command should work:
sed "s/\s/,/g" < infile.txt > outfile.txt
Note that you have to redirect the output to a new file. The input file is not changed in place.
sed can do this:
sed 's/[\t ]/,/g' input.file
That will print the result to the console, while
sed -i 's/[\t ]/,/g' input.file
will edit the file in-place
Here's a Perl script which will edit the files in-place:
perl -i.bak -lpe 's/\s+/,/g' files*
Consecutive whitespace is converted to a single comma.
Each input file is moved to .bak
These command-line options are used:
-i.bak edit in-place and make .bak copies
-p loop around every line of the input file, automatically print the line
-l removes newlines before processing, and adds them back in afterwards
-e execute the perl code
If you want to replace an arbitrary sequence of blank characters (tab, space) with one comma, use the following:
sed -E 's/[\t ]+/,/g' input_file > output_file
or
sed -r 's/[[:blank:]]+/,/g' input_file > output_file
If some of your input lines include leading space characters which are redundant and don't need to be converted to commas, then first you need to get rid of them, and then convert the remaining blank characters to commas. For such a case, use the following:
sed -E 's/^ +//' input_file | sed -E 's/[\t ]+/,/g' > output_file
This worked for me.
sed -e 's/\s\+/,/g' input.txt >> output.csv
