Remove colour code from linux files [duplicate] - linux

This question already has answers here:
Can I programmatically "burn in" ANSI control codes to a file using unix utils?
Closed 6 years ago.
I have an output file from a testing script (which I cannot alter). The output looks great in the terminal thanks to the embedded colour codes, which display it in nice colours.
However, when I open the file in vim, I see the following:
^[[1m0024^[[0m, ^[[36munknown.10^[[0m --> ^[[32mUNKNOWN^[[0m
I would rather the file contained:
0024, unknown.10 --> UNKNOWN
There are a couple of similar questions on Stack Overflow, but so far I have not found a solution that works for me.
Any help would be greatly appreciated!
Many thanks!
Additional info:
I don't want to conceal the colour characters, I would like to remove them from the file.
The output goes into an evidence file, which is then pushed to a Git repository for the team to review. The colour codes make it difficult to read in the Git UI :(

To remove colour control characters, you can use the following sed command:
sed 's/\x1b\[[^\x1b]*m//g' file
As indicated here, a colour code is composed of <Esc>[FormatCodem.
The escape character is \x1b in hexadecimal (sometimes written as \e or \033).
The command looks for the sequence escape followed by a square bracket (\x1b\[) up to the character m and, if found, deletes it.
Everything between these two characters is allowed except the escape character itself ([^\x1b]*), which keeps the regex short. Note that * is greedy, so if the coloured text itself contains an m before the next escape (e.g. <Esc>[1mhello mom<Esc>[0m), that text is deleted as well.
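A minimal sketch of a stricter variant, assuming GNU sed (for the \x1b escape): the parameters between <Esc>[ and m are restricted to digits and semicolons ([0-9;]*), so ordinary text after a colour code is never swallowed. The helper name strip_colours is hypothetical.

```shell
# strip_colours (hypothetical helper name): remove SGR colour codes of the
# form <Esc>[<digits and semicolons>m from stdin or from the given files.
# Assumes GNU sed, which understands the \x1b escape inside a regex.
strip_colours() {
  sed 's/\x1b\[[0-9;]*m//g' "$@"
}

# Usage (file names are illustrative):
#   strip_colours output.txt > output-clean.txt
```

Because [0-9;]* cannot match letters, a code such as <Esc>[1m followed by "hello mom" loses only the code itself.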

If you can't stop the tool from producing them, you can still remove them afterwards with the following sed command:
sed -r 's/\^\[\[[0-9]{1,2}m//g'
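Example:
$ echo '^[[1m0024^[[0m, ^[[36munknown.10^[[0m --> ^[[32mUNKNOWN^[[0m' | sed -r 's/\^\[\[[0-9]{1,2}m//g'
0024, unknown.10 --> UNKNOWN
Note that this pattern matches the two literal characters ^ and [, as in the text pasted into the question. In the real file, the ^[ that vim displays is a single escape byte (0x1B), so there the pattern needs \x1b instead of \^\[ (GNU sed): sed -r 's/\x1b\[[0-9]{1,2}m//g'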

Related

Using SED to replace capture group with regex pattern

I need some help with a sed command that I thought would solve an issue I have. I basically have long text files that look something like this:
>TRINITY_DN112253_co_g1_i2 Len=3873 path=[38000:0-183]
ACTCACGCCCACATAAT
The ACT text blocks continue on, and then there are more blocks of text that follow the same pattern, except that the numbers in the text after the > differ slightly. I want to replace only this header part (the part following the >) with everything up to the very last "_". The sed command I thought seemed logical is the following:
sed -i 's/>.*/TRINITY.*_/'
However, sed is literally changing each header to TRINITY.*_ rather than capturing the block I thought it would. Any help is appreciated!
Also, just to make things clear, I expected my sed command to convert the top header block into this:
>TRINITY_DN112253_co_g1_
ACTCACGCCCACATAAT
This might help:
sed '/^>/s/[^_]*$//' file
Output:
>TRINITY_DN112253_co_g1_
ACTCACGCCCACATAAT
See: The Stack Overflow Regular Expressions FAQ
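Since the question title asks about capture groups, here is a sketch of the same edit written with one, assuming GNU sed for -E. In the replacement side of s, text is literal apart from & and \1 to \9, which is why the asker's TRINITY.*_ appeared verbatim; the fix is to capture the part to keep and write it back with \1. The helper name trim_header is hypothetical.

```shell
# trim_header (hypothetical helper name): on header lines starting with '>',
# keep everything up to and including the last '_' using a capture group.
# The greedy (.*_) runs to the LAST underscore on the line; \1 writes it back
# and the trailing .* (the rest of the line) is dropped.
trim_header() {
  sed -E '/^>/s/^(.*_).*/\1/' "$@"
}

# Usage (file name is illustrative):
#   trim_header sequences.fasta > trimmed.fasta
```

Header lines without any underscore are left unchanged, because the pattern then fails to match.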

How to strip binary characters from a file?

I've got a file that contains lines that look like this in vim:
^[[0;32msalt-2016.3.2-1.el6.noarch^[[0;0m^M
which look like this in more:
salt-2016.3.2-1.el6.noarch
I would like to produce a copy of this file that only contains the displayed characters as more shows them. I tried piping it through dos2unix but it refuses to do anything, complaining that "dos2unix: Binary symbol 0x1B found at line 2".
I could probably achieve what I want with some sed statements, but is there a Linux/Unix utility that takes the output of more or cat and produces a file containing only the whitespace and text as displayed?
There's something called ansifilter which does exactly this. I tested it out on my file and it works.
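If ansifilter is not available, a plain-sed fallback can handle this particular file. This is a swapped-in technique, not ansifilter: a minimal sketch assuming GNU sed (for the \x1b and \r escapes) and assuming the only debris is SGR colour codes such as <Esc>[0;32m plus the trailing carriage return that vim shows as ^M. The helper name strip_display_codes is hypothetical.

```shell
# strip_display_codes (hypothetical helper name): drop SGR colour codes such
# as <Esc>[0;32m and <Esc>[0;0m, then a trailing carriage return (vim's ^M).
# Assumes GNU sed; other control sequences are NOT handled here.
strip_display_codes() {
  sed 's/\x1b\[[0-9;]*m//g; s/\r$//' "$@"
}

# Usage (file names are illustrative):
#   strip_display_codes packages.log > packages.txt
```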

Linux: Replace first string in file with contents of other file containing quotes and slashes.

I have spent all day trying to find a proper solution, but I have not managed to. My problem:
I have an XML file containing multiple tags with the same name.
Example:
<TASK INSTANCE />
<WORKFLOWLINK CONDITION=""/>
<WORKFLOWLINK CONDITION=""/>
I want to add the contents of another XML file before the first <WORKFLOWLINK. The issue I've run into is that this file is full of double quotes and slashes. I've tried replacing and escaping them, but to no avail.
My attempts mainly ended up as something like:
sed -e "0,/<WORKFLOWLINK/ /<WORKFLOWLINK/{ r ${filename}" -e "}" ${sourcefile}
If this isn't clear enough I'll get the exact data so you can see.
For the fun of sed:
sed -e "0,/<WORKFLOWLINK/{/<WORKFLOWLINK/{r ${sourcefile}" -e"}}"
The trick is to start a new "pattern/command" pair after your first address range condition 0,/<WORKFLOWLINK/.
sed does not understand two addresses in a row; a command must follow the first address. The additional pair of curly braces {} provides that.
Apart from the brain exercise of doing it in sed, @EdMorton is right in recommending an XML processor. His request for an MCVE is also appropriate. I had to do some guessing to see what you want, and I hope I guessed right.
The MCVE should at least have included:
the error message or problem description defining your problem
the initialisation of your environment variables
some sample input; not the original data
You surely would have had an answer earlier and (in case mine does not satisfy you) probably a better one by now.
So, before your next question, please take the tour: https://stackoverflow.com/tour
GNU sed version 4.2.1
GNU bash, version 3.1.17(1)-release (i686-pc-msys)
Everyone,
Thank you for thinking with me, even if I apparently broke some rules.
I have figured out a solution; granted, it is not as pretty as it could be, but for a one-time action it is good enough.
I have moved from a single command to a combination of first detecting the location I want to put my data:
sed -e "0,/<WORKFLOWLINK/ s/<WORKFLOWLINK/##MARKER##\n\t<WORKFLOWLINK/" which puts the marker string in the desired location.
After this I replace the marker with the contents of the other file. I had already managed to get the individual statements working while trying to do it all in a single statement, so I just execute them separately:
sed -e "/##MARKER##/{r ${sourcefile}" -e 'd}'
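A one-pass alternative, sketched with a different tool swapped in for sed: awk reads the insert file at run time with getline, so its quotes and slashes never become part of the program and need no escaping. It prints the insert file before the first matching line (sed's r command, by contrast, queues the file after the matching line). The function name and argument order are hypothetical, and the pattern is an awk regular expression.

```shell
# insert_before_first PATTERN INSERT_FILE TARGET_FILE (hypothetical helper)
# Prints TARGET_FILE to stdout with the contents of INSERT_FILE placed
# immediately before the first line matching PATTERN (an awk regex).
# Note: awk -v interprets backslash escapes in its values, so avoid
# backslashes in PATTERN and in the file name.
insert_before_first() {
  awk -v pat="$1" -v f="$2" '
    !done && $0 ~ pat {
      # Copy the insert file line by line; its contents are data, not code.
      while ((getline line < f) > 0) print line
      close(f)
      done = 1
    }
    { print }
  ' "$3"
}

# Usage (variable names follow the question):
#   insert_before_first '<WORKFLOWLINK' "${filename}" "${sourcefile}" > merged.xml
```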

How to edit this file using grep or using cat or using vim or using another tool?

My elder brother is studying Statistics and is now writing his thesis in LaTeX. Almost all of the content has been written. He used 5 digits after the decimal point (e.g. 5.55534) for every value in his calculations, but at the last minute his instructor told him to change them to 3 digits after the point (e.g. 5.555), which has put him in trouble. Finding and correcting these values manually is not easy, so he asked me to help.
I believe there is an easy solution that I don't know of. A portion of the thesis looks like this:
&se($\hat\beta_1$)&0.35581&0.35573&0.35573\\
&mse($\hat\beta_1$)&.12945&.12947&.12947\\
\addlinespace
&$\hat\beta_2$&0.03329&0.03331&0.03331 \\
&se($\hat\beta_2$)&0.01593&0.01592&0.01591\\
&mse($\hat\beta_2$)&.000265&.000264&.000264 \\
\midrule
{n=100} & $\hat\beta_1$&-.52006&-.52001&-.51946\\
&se($\hat\beta_1$)&.22819&.22814&.22795\\
&mse($\hat\beta_1$)&.05247&.05244&.05234\\
\addlinespace
&$\hat\beta_2$&0.03134&0.03134&0.03133 \\
&se($\hat\beta_2$)&0.00979&0.00979&0.00979\\
&mse($\hat\beta_2$)&.000098&.000098&.000098
I want:
&se($\hat\beta_1$)&0.355&0.355&0.355\\
&mse($\hat\beta_1$)&.129&.129&.129\\
...
Note: don't be put off by the syntax (it is LaTeX).
If anybody has a solution or suggestion, please share it. Thank you.
In sed:
$ sed 's/\(\.[0-9]\{3\}\)[0-9]*/\1/g' file
&se($\hat\beta_1$)&0.355&0.355&0.355\\
&mse($\hat\beta_1$)&.129&.129&.129\\
i.e. wherever a period is followed by at least 3 digits, keep the period and the first three digits and drop the rest. Note that this truncates rather than rounds: 0.35581 becomes 0.355, not 0.356.
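The same substitution can be written in extended regex syntax (-E), where the group and the repetition count need no backslashes. A minimal sketch assuming GNU sed; the helper name trim_decimals is hypothetical.

```shell
# trim_decimals (hypothetical helper name): keep at most three digits after
# every decimal point. Equivalent to the BRE command above, written in ERE.
# It truncates (0.35581 -> 0.355); it does not round.
trim_decimals() {
  sed -E 's/(\.[0-9]{3})[0-9]*/\1/g' "$@"
}

# Usage (file name is illustrative):
#   trim_decimals thesis.tex > thesis-3dp.tex
```

To edit the file in place while keeping a backup, GNU sed's -i.bak option can be used instead of redirecting the output.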
Here is the command in vim:
:%s/\.\d\{3}\zs\d\+//g
Explanation:
: entering command-mode
% is the range of all lines of the file
s substitution command
\.\d\{3}\zs\d\+ pattern you would like to change
\. literal point (.)
\d\{3} match 3 consecutive digits
\zs start substitution from here
\d\+ one or more digits
g Replace all occurrences in the line
As for grep and cat, they have nothing to do with replacing text: grep searches files and cat prints their contents.
What you are looking for is substitution, and many Linux commands can do that, mainly sed, perl, awk, ex, etc.

sort: string comparison failed Invalid or incomplete multibyte or wide character

I'm trying to use the following command on a text file:
$ sort <m.txt | uniq -c | sort -nr >m.dict
However I get the following error message:
sort: string comparison failed: Invalid or incomplete multibyte or wide character
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were ‘enwedig\r’ and ‘mwy\r’.
I'm using Cygwin on Windows 7 and was having trouble earlier editing m.txt to put each word within the file on a new line. Please see:
Using AWK to place each word in a text file on a new line
I'm not sure whether I'm getting these errors because of this, or because m.txt contains characters from the Welsh alphabet (when I was working with Welsh text in Python, I was required to change the encoding to 'Latin-1').
I tried following the error message's advice and setting LC_ALL='C', but this has not helped. Can anyone explain the errors I'm receiving and advise how I might solve this problem?
UPDATE:
When I tried dos2unix, it reported invalid characters on certain lines. It turns out these were not Welsh characters but other strange characters (arrows etc.). I went through my text file removing these characters until dos2unix ran without error. However, after running dos2unix all the text was concatenated (no spaces, newlines or anything), whereas each word in the file should have been on a separate line. I then used unix2dos and the text file was back to normal. How can I put each word on its own line and use the sort command without it giving me errors about '\r' characters?
I know it's an old question, but just running export LC_ALL='C' does the trick, as suggested by the message "sort: Set LC_ALL='C' to work around the problem."
It looks like a Windows line-ending problem (\r\n versus \n): both strings in the error message end in \r. You can convert m.txt to Unix line endings with
dos2unix m.txt
and then rerun your command.
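A minimal sketch combining both answers without modifying m.txt: tr -d '\r' deletes the carriage returns left by Windows line endings, and LC_ALL=C is set only for the sort commands so that sort compares raw bytes. In the C locale, Welsh accented letters sort by byte value rather than alphabetically. The function name word_counts is hypothetical.

```shell
# word_counts (hypothetical helper name): count word frequencies from stdin,
# one word per line, most frequent first.
# tr -d '\r' removes every carriage return (from \r\n line endings);
# LC_ALL=C makes sort compare bytes, so invalid multibyte input cannot
# make the comparison fail.
word_counts() {
  tr -d '\r' | LC_ALL=C sort | uniq -c | LC_ALL=C sort -nr
}

# Usage, mirroring the question's pipeline:
#   word_counts < m.txt > m.dict
```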
