What sed script can replace a range of hex characters with another

I need to replace some non-text characters in some automatically generated files with spaces.
Although they are text files, some characters are added after processing, and they cannot be edited as text any more.
Is there a sed command to do that?

Depending on your platform and sed version, you may or may not be able to do something like s/[\000-\037]/ /g; but the portable and simple alternative is this:
tr '\000-\037' ' ' <input >output
(All character codes are "binary"; I have assumed you mean control characters, but if you mean 8-bit characters \200-\377 or something else altogether, it's obviously trivial to adjust the range.)
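Note that the range \000-\037 also covers tab (\011) and newline (\012), so the command above turns line breaks into spaces too. A minimal sketch, assuming you want to keep tabs and newlines, simply leaves those two codes out of the range (the function name and the sample input are made up for illustration):

```shell
# Replace control characters with spaces, but keep tab (\011) and newline (\012).
# strip_controls is a hypothetical helper name, not part of any tool.
strip_controls() {
    tr '\000-\010\013-\037' ' '
}

# Sample input: "a", the control byte \001, "b", a tab, "c", a newline.
printf 'a\001b\tc\n' | strip_controls
```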

Related

How to echo/print actual file contents on a unix system

I would like to see the actual file contents without it being formatted to print. For example, to show:
\n0.032,170\n0.034,290
Instead of:
0.032,170
0.34,290
Is there a command to echo the file's actual data in bash? I've tried using head, cat, more, etc. but all those seem to echo the "print-formatted" text. For example:
$ cat example.csv
0.032,170
0.34,290
How can I print the actual characters within the file?
This reads as if you misunderstand what the "actual characters in the file" are. You will not find the characters \ and n in that file, only a line feed, which is a single specific character. So utilities like cat do in fact output exactly the characters in the file.
Putting it the other way around: if you really had those two characters literally in the file, then a utility like cat would actually output them. I just checked that, just to be sure.
You can easily check that yourself by opening the file in a hex editor. There you will see the byte 0A (decimal 10), which is the line feed character. You will not see the pair of characters \ and n anywhere in that file.
Many programming languages and also shell environments use escape sequences like \n in string definitions and identify those as control characters which would not be typable otherwise. So maybe that is where your impression comes from that your files should contain those two characters.
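If no hex editor is at hand, od can show the same thing from the shell. A small sketch, using a made-up two-line sample rather than your real file:

```shell
# Dump each byte of the input as a character; od -c prints a line feed as \n.
# show_bytes is a hypothetical helper name used only for this example.
show_bytes() {
    od -c
}

# Two-line sample piped in place of a real file such as example.csv.
printf '0.032,170\n0.34,290\n' | show_bytes
```

Here od prints \n only as a readable label for the single byte 0A; the file itself still holds one line feed character.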
To display newlines as \n, you might try:
awk 1 ORS='\\n' input-file
This is not the "actual characters in the file", as \n is merely a conventional method of displaying a newline, but this does seem to be what you want.

Linux Sed command replace after special character

How can I use the sed command in Linux to replace the value in a key-value pair? I want to replace the characters that occur after “:”.
For example
App.log.level: “xyz”
It sounds like you just want something like sed 's/:.*$/: YOURTEXTHERE/' where the general format is sed 's/REPLACE_THIS/WITH_THIS/g'
The /:.*$/ bit means I want to replace all text from a colon to the end of the line. The : YOURTEXTHERE is what you're replacing with. (I'm putting the colon back in and putting the extra text.) Since I'm only doing one replacement per line, I don't need the g at the end (although it wouldn't hurt anything.)
A real example:
>> echo App.log.level: \"xyz\" | sed 's/:.*$/: YOURTEXTHERE/'
App.log.level: YOURTEXTHERE
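If the file holds several keys, the pattern above rewrites every line that contains a colon. A hedged sketch that limits the substitution to one key, assuming the key starts the line (the key name comes from the question; the second line and the new value "debug" are invented):

```shell
# Replace the value of a single key only; other lines pass through unchanged.
# The dots are escaped so they match literal dots rather than any character.
# set_log_level is a hypothetical helper name.
set_log_level() {
    sed 's/^App\.log\.level:.*$/App.log.level: "debug"/'
}

printf 'App.log.level: "xyz"\nApp.name: "demo"\n' | set_log_level
```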

using tr to strip characters but keep line breaks

I am trying to format some text that was converted from UTF-16 to ASCII, the output looks like this:
C^#H^#M^#M^#2^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#^#
T^#h^#e^#m^#e^# ^#M^#a^#n^#a^#g^#e^#r^# ^#f^#o^#r^# ^#3^#D^#S^#^#^#^#^#^#^#^#^#^#^#^#^#^#
The only text I want out of that is:
CHMM2
Theme Manager for 3DS
So there is a line break "\n" at the end of each line and when I use
tr -cs 'a-zA-Z0-9' 'newtext' infile.txt > outfile.txt
It is stripping the new line as well so all the text ends up in one big string on one line.
Can anyone assist with figuring out how to strip out only the ^#'s and keeping spaces and new lines?
The ^#s are most certainly null characters, \0s, so:
tr -d '\0'
Will get rid of them.
But this is not really the correct solution. You should simply use the iconv command to convert from UTF-16 to UTF-8 (see its man page for more information). That is, of course, what you're really trying to accomplish here, and this is the correct way to do it.
This is an XY problem. Your problem is not deleting the null characters. Your real problem is how to convert from UTF-16 to either UTF-8, or maybe US-ASCII (and I chose UTF-8, as the conservative answer).
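A minimal sketch of that conversion, assuming the input is little-endian UTF-16 without a byte order mark (which the C^#H^# pattern suggests, but check your file). The runs of ^#^# padding are genuine NUL characters, so they survive the conversion and are deleted afterwards. The file names in the comment are the ones from the question:

```shell
# Convert UTF-16LE to UTF-8, then drop the NUL characters used as padding.
# utf16_to_text is a hypothetical helper name; UTF-16LE is an assumption.
utf16_to_text() {
    iconv -f UTF-16LE -t UTF-8 | tr -d '\000'
}

# Typical use:  utf16_to_text < infile.txt > outfile.txt
# Self-contained demo: "CH", two NUL padding characters, then a newline.
printf 'C\000H\000\000\000\000\000\n\000' | utf16_to_text
```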

How to tell sed "do not remove some characters"?

I have a text file containing Arabic characters and some other characters (punctuation marks, numbers, English characters, ... ).
How can I tell sed to remove all the characters in the file, except Arabic ones? In short I can say that we typically tell sed to remove/replace some specific characters and print others, but now I am looking for a way to tell sed just print my desired characters, and remove all other characters.
With GNU sed, you should be able to specify characters by their hex code. You can use those in a character class:
sed 's/[\x00-\x7F]//g' # hex notation
sed 's/[\o000-\o177]//g' # octal notation
You should also be able to achieve the same effect with the tr command:
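tr -d '\000-\177'
Both methods assume UTF-8 encoding of your input file. Every byte of a multi-byte character has its highest bit set, so you can simply strip everything that's a standard ASCII (7-bit) character. Note that the tr range also includes the newline character (\012), so tr joins all lines into one, whereas sed works line by line and leaves the newlines alone.
To remove everything except some well-defined characters, use a negated character class: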
sed 's/[^characters you want to keep]//g'
Using a pattern like [^…]\+ might improve the performance of the regex.
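As a sketch of the tr route that keeps line breaks, assuming UTF-8 input: the range below skips \012 (newline) so each line stays separate. LC_ALL=C is added here so that tr works on single bytes; the sample text is invented:

```shell
# Delete every ASCII byte except newline (\012); bytes with the high bit set,
# which make up all multi-byte UTF-8 characters, are left alone.
# keep_non_ascii is a hypothetical helper name.
keep_non_ascii() {
    LC_ALL=C tr -d '\000-\011\013-\177'
}

# \330\247 and \330\250 are the UTF-8 bytes of two Arabic letters (alef, beh).
printf 'abc \330\247 1\nxyz \330\250 2\n' | keep_non_ascii
```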

Replacing comma on specific lines only

I have a dataset that is comma separated. But I have a little problem with its format. I want everything to be in the form x,x,x
Below is a sample of my dataset:
995970,16779453
995971,16828069
995972,
995973,16828069
995974,16827226
As you can see, most of my dataset is in the proper format but I have those commas on single id#'s also (my data is in form id#, connection#). How would I go about removing the commas on those single id#'s? I can't seem to figure it out just using a text editor. Any suggestions?
Edit: can I use some sort of regex expression to only remove it from those ids that have a specified length?
Edit2: Ok I figured it out using some regex, thanks for all the help!
In vi one would do something like
:%s/,$//
This means
: (enter a line mode command)
% (try the command on every line)
s (substitute)
,$ (match a comma at the end of a line)
(empty replacement text)
Sometimes you need something like /, *$/ to match a comma followed by 0 or more trailing spaces. You can get vi on Windows in various ways; one way is to install Cygwin.
You can select regular expression mode in Notepad++ and do find and replace using the following regex ,$. Leave the replace field blank.
With the sed command:
sed 's/, *$//' < FILE
or in place (requires GNU sed):
sed -i 's/, *$//' FILE
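A quick check on a few lines from the sample data, as a sketch: anchoring the pattern with $ restricts the substitution to a comma at the end of a line, so complete id,connection pairs are left untouched.

```shell
# Remove a trailing comma (plus any spaces after it) from each line.
# strip_trailing_comma is a hypothetical helper name.
strip_trailing_comma() {
    sed 's/, *$//'
}

printf '995971,16828069\n995972,\n995973,16828069\n' | strip_trailing_comma
```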
