Remove diamond question marks from binary file on bash - linux

I am dynamically writing to a file the input of a serial port, like so:
sudo cu -s 19200 -l /dev/ttyUSB0 > serialContent.json
But when I open it, it shows me a lot of diamond question marks:
������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������*#*1##*1*0*702442501#9##*1*0*702442501#9##
What I want to get is only this portion: *#*1##*1*0*702442501#9##*1*0*702442501#9##
When I open the file with vim I get a lot of ^# characters.
I tried to replace the characters using sed 's|[^#]||g' serialContent.json > serialContent2.json and sed 's|[�]||g' serialContent.json > serialContent2.json with no luck.
This is what I get with this command:
$ file -bi serialContent.json
application/octet-stream; charset=binary
What can I do to remove those marks? Thanks!

This is the replacement character shown when you have non-printable data.
To remove all non-printable characters, you can pipe it through tr -cd '[:print:]':
sudo cu -s 19200 -l /dev/ttyUSB0 | tr -cd '[:print:]' > serialContent.json
What's considered printable depends on your locale. You may want to export LC_ALL=C first to ensure consistent results across machines.
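One thing to watch: [:print:] does not include the newline character, so the command above also joins all lines together. A minimal sketch of a variant that keeps newlines by adding \n to the set; the 0xFF junk bytes are made up for illustration and the real serial data may differ:

```shell
# Simulated capture: three 0xFF bytes (an assumption for illustration),
# followed by the payload and a newline.
cleaned=$(printf '\377\377\377*#*1##*1*0*702442501#9##\n' | LC_ALL=C tr -cd '[:print:]\n')

# Only the printable payload is left; the newline survives tr and is
# stripped here only by the command substitution.
printf '%s\n' "$cleaned"
```

LC_ALL=C is set just for the tr call, so every byte above 0x7F is treated as non-printable regardless of the machine's locale.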

Related

Eliminate multiple space from a file and modify the original file

Specify the command that removes multiple spaces from a text file, leaving a single space in their place. Extra requirements : Original file to be modified.
Managed to pull out those 3 commands:
awk '{$2=$2};1' filename.txt
tr -s '[:space:]' < filename.txt > filename.new && mv filename.new filename.txt
sed -i 's/\s\+/ /g' filename.txt
Not sure if using a 'temporary file' is the best way to do the trick. Is there a more efficient way to solve the problem? It doesn't matter if it is tr / sed / awk or anything else, you can post all of them.
Example input:
I'm just giving spaces
Output :
I'm just giving spaces
Edit: Still looking for more answers
I'd use ed over the non-standard sed -i (And non-portable RE in your example) if you want to alter the original file:
printf "%s\n" '1,$s/[[:space:]]\{2,\}/ /g' w | ed -s filename.txt
or with perl:
perl -pi -e 's/\s{2,}/ /g' filename.txt
The {2,} regular expression construct (\{2,\} for POSIX Basic Regular Expressions like sed and ed use) matches 2 or more of the previous token.
Both of these match any whitespace characters, not just space, because that's how your examples work. If the goal is to only compress multiple spaces, not spaces + tabs, switch out the [[:space:]] and \s for just a single space.
(Anything that modifies a file "in place", be it ed, sed -i, perl -i, or a regular editor, has a good chance that it's going to be using a temporary file under the hood, by the way. They just handle it for you so you don't have to do it manually like with your tr example.)
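To make the "temporary file under the hood" point concrete, here is a rough sketch of doing the same thing by hand with tr. The sample text and the use of mktemp are assumptions for illustration (mktemp is widely available but not required by POSIX):

```shell
# Create a scratch file with runs of spaces (sample content is made up).
f=$(mktemp)
printf 'I am    just   giving  spaces\n' > "$f"

# Squeeze runs of spaces into a temporary file, then replace the original
# only if tr succeeded.
tmp=$(mktemp)
tr -s ' ' < "$f" > "$tmp" && mv "$tmp" "$f"

squeezed=$(cat "$f")
printf '%s\n' "$squeezed"
rm -f "$f"
```

This squeezes spaces only; use '[:space:]' or '[:blank:]' in place of ' ' if tabs should be squeezed as well.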

grep for a line in a file then remove the line

$ cat example.txt
Yields:
example
test
example
I want to remove 'test' string from this file.
$ grep -v test example.txt > example.txt
$ cat example.txt
$
The below works, but I have a feeling there is a better way!
$ grep -v test example.txt > example.txt.tmp;mv example.txt.tmp example.txt
$ cat example.txt
example
example
Worth noting that this is going to be on a file with over 10,000 lines.
Cheers
You could use sed,
sed -i '/test/d' example.txt
-i saves the changes to the file itself, so you don't need to use a redirection operator.
-i[SUFFIX], --in-place[=SUFFIX]
edit files in place (makes backup if SUFFIX supplied)
You're doing it the right way but use an && before the mv to make sure the grep succeeded or you'll zap your original file:
grep -F -v test example.txt > example.txt.tmp && mv example.txt.tmp example.txt
I also added the -F options since you said you want to remove a string, not a regexp.
You COULD use sed -i but then you need to worry about figuring out and/or escaping sed delimiters and sed does not support searching for strings so you'd need to try to escape every possible combination of regexp characters in your search string to try to make sed treat them as literal chars (a process you CANNOT automate due to the position-sensitive nature of regexp chars) and all it'd save you is manually naming your tmp file since sed uses one internally anyway.
Oh, one other option - you could use GNU awk 4.* with "inplace editing". It also uses a tmp file internally like sed does but it does support string operations so you don't need to try to escape RE metacharacters and it doesn't have delimiters as part of the syntax to worry about:
awk -i inplace -v rmv="test" '!index($0,rmv)' example.txt
Any grep/sed/awk solution will run in the blink of an eye on a 10,000 line file.
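A small sketch of why -F matters, using made-up file contents: the string a.b contains a regex metacharacter, and without -F the dot would also match the x in axb.

```shell
# Scratch file with one literal match and one regex-only match (sample data).
f=$(mktemp)
printf 'a.b\naxb\nkeep\n' > "$f"

# -F treats a.b as a fixed string, so only the first line is removed.
# && makes sure the original is replaced only when grep succeeded.
grep -F -v 'a.b' "$f" > "$f.tmp" && mv "$f.tmp" "$f"

remaining=$(cat "$f")
printf '%s\n' "$remaining"
rm -f "$f" "$f.tmp"
```

Note that grep exits with status 1 when it selects no lines, so if every line contains the string, the mv is skipped and the original file is left untouched.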

clean letters and characters in files leaving only numbers using bash

I am reading files and I am doing something like:
cat file | sed s/\ //g |awk '$0 !~ /[^0-9]/'
With this line I want to strip out anything that is not a number.
But I have a problem: when the file is not sorted the command works fine, but with a sorted file it does not work and the output is empty.
Who can help me?
grep -o '[0-9]+' does not work either, because:
I have a file like:
311435ll3e
kk13322;.
erre433
The output is:
311435
3
13322
433
The 3 ends up on a line of its own; the output that I need is:
3114353
13322
433
As a general rule, there is no reason to have both awk and sed appearing in the same pipe, due to a large overlap of capability, and frequently the same is true of awk/grep/sed combinations.
If you just want to suppress the non-digit characters within lines of characters, use (eg) sed -e 's/[^0-9]//g' file, or if you want to do it in place with no backup, sed -i -e 's/[^0-9]//g' file, or in place with backup to a .bak file, sed -i.bak -e 's/[^0-9]//g' file.
To suppress blank lines, you can append |egrep -v '^$' after the sed, but it's more efficient to just use sed's d command to delete the pattern space and start next cycle if the pattern space is empty. For example,
sed -e 's/[^0-9]//g; /^$/d' file
does a d if the line is empty after substitution.
The form suggested in 1_CR's comment,
sed -e 's/[^0-9]//g' -e '/./!d'
is an alternative. That form tests if the line has at least one character in it, and if so does not do a d.
If you want to suppress everything in the file that's not digits, use tr -cd 0-9 < file. This suppresses line feeds also.
Note, the form tr -cd [0-9] < file or tr -cd '[0-9]' < file is not correct; it will fail to suppress ] and [ characters because tr will regard them as part of SET1.
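As a quick check, the sed form can be run against the sample input from the question. The extra ;; line is added here only to show the empty-line deletion:

```shell
# Strip every non-digit, then delete lines that ended up empty.
digits=$(printf '311435ll3e\nkk13322;.\nerre433\n;;\n' | sed -e 's/[^0-9]//g' -e '/^$/d')
printf '%s\n' "$digits"
```

Each input line stays on its own line, so the stray 3 from the first line is joined to 311435 rather than printed separately.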

Remove blank lines with grep

I tried grep -v '^$' in Linux and that didn't work. This file came from a Windows file system.
Try the following:
grep -v -e '^$' foo.txt
The -e option explicitly marks the next argument as the pattern.
The single quotes around ^$ make it work in C shell. Other shells will be happy with either single or double quotes.
UPDATE: This works for me for a file with blank lines or "all white space" lines (such as Windows lines with \r\n style line endings), whereas the above only removes blank lines from files with Unix style line endings:
grep -v -e '^[[:space:]]*$' foo.txt
Keep it simple.
grep . filename.txt
Use:
$ dos2unix file
$ grep -v "^$" file
Or just simply awk:
awk 'NF' file
If you don't have dos2unix, then you can use tools like tr:
tr -d '\r' < "$file" > t ; mv t "$file"
grep -v "^[[:space:]]*$"
The -v makes it print lines that do not completely match
===Each part explained===
^ match start of line
[[:space:]] match whitespace- spaces, tabs, carriage returns, etc.
* previous match (whitespace) may exist from 0 to infinite times
$ match end of line
Running the code-
$ echo "
> hello
>
> ok" |
> grep -v "^[[:space:]]*$"
hello
ok
To understand more about how/why this works, I recommend reading up on regular expressions. http://www.regular-expressions.info/tutorial.html
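The difference between the two patterns can be seen on a small made-up sample that mixes Windows line endings, a whitespace-only line and a truly empty line. This is only a sketch; it counts the surviving lines with grep -c:

```shell
# Five lines: "hello\r", "\r", three spaces, an empty line, "ok".
sample='hello\r\n\r\n   \n\nok\n'

# '^$' only matches the truly empty line, so four lines survive.
plain_count=$(printf "$sample" | grep -c -v '^$')

# '^[[:space:]]*$' also matches the lone \r and the spaces-only line.
space_count=$(printf "$sample" | grep -c -v '^[[:space:]]*$')

printf '%s %s\n' "$plain_count" "$space_count"
```

The surviving hello line still ends in \r; grep only selects lines, it does not convert the line endings.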
If you have sequences of multiple blank lines in a row, and would like only one blank line per sequence, try
grep -v "unwantedThing" foo.txt | cat -s
cat -s suppresses repeated empty output lines.
Your output would go from
match1



match2
to
match1

match2
The three blank lines in the original output would be compressed or "squeezed" into one blank line.
The same as the previous answers:
grep -v -e '^$' foo.txt
Here, -e tells grep that the next argument is the pattern (extended regular expressions are enabled by -E, not -e). '^$' means that there isn't any character between ^ (start of line) and $ (end of line). '^' and '$' are regex anchors.
So the command grep -v will print all the lines that do not match this pattern (No characters between ^ and $).
This way, empty blank lines are eliminated.
I prefer using egrep, though in my test with a genuine file with blank line your approach worked fine (though without quotation marks in my test). This worked too:
egrep -v "^(\r?\n)?$" filename.txt
Do lines in the file have whitespace characters?
If so then
grep "\S" file.txt
Otherwise
grep . file.txt
Answer obtained from:
https://serverfault.com/a/688789
This code removes blank lines and lines that start with "#"
grep -v "^#" file.txt | grep -v '^[[:space:]]*$'
awk 'NF' file-with-blank-lines > file-with-no-blank-lines
It's true that grep -v -e '^$' can work, however it does not remove blank lines that have 1 or more spaces in them. I found that the easiest and simplest answer for removing blank lines is to use awk. The following is slightly modified from the awk answers above:
awk 'NF' foo.txt
But since this question is for using grep I'm going to answer the following:
grep -v '^ *$' foo.txt
Note: the blank space between the ^ and *.
Or you can use the \s to represent blank space like this:
grep -v '^\s*$' foo.txt
I tried hard, but this seems to work (assuming \r is biting you here):
printf "\r" | egrep -xv "[[:space:]]*"
Using Perl:
perl -ne 'print if /\S/'
\S means match non-blank characters.
egrep -v "^\s\s+"
egrep already uses extended regular expressions, and \s is whitespace.
The + repeats the preceding pattern one or more times.
The ^ anchors the match at the start of the line.
Use:
grep pattern filename.txt | uniq
Here is another way of removing the white lines and lines starting with the # sign. I think this is quite useful to read configuration files.
[root@localhost ~]# cat /etc/sudoers | egrep -v '^(#|$)'
Defaults requiretty
Defaults !visiblepw
Defaults always_set_home
Defaults env_reset
Defaults env_keep = "COLORS DISPLAY HOSTNAME HISTSIZE INPUTRC KDEDIR
LS_COLORS"
root ALL=(ALL) ALL
%wheel ALL=(ALL) ALL
stack ALL=(ALL) NOPASSWD: ALL
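A hedged variant of the same idea that also drops whitespace-only lines and indented comments, written with grep -E (the egrep name is deprecated in recent GNU grep). The sample config text is made up:

```shell
# Remove comment lines, empty lines and whitespace-only lines.
active=$(printf '# comment\n\nDefaults env_reset\n   \nroot ALL=(ALL) ALL\n' |
    grep -E -v '^[[:space:]]*(#|$)')
printf '%s\n' "$active"
```

The leading [[:space:]]* lets the pattern match a # or the end of line after any amount of indentation.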
Read lines from file exclude EMPTY Lines
grep -v '^$' folderlist.txt
folderlist.txt
folder1/test
folder2
folder3
folder4/backup
folder5/backup
Results will be:
folder1/test
folder2
folder3
folder4/backup
folder5/backup

Replace whitespaces with tabs in linux

How do I replace whitespaces with tabs in linux in a given text file?
Use the unexpand(1) program
UNEXPAND(1) User Commands UNEXPAND(1)
NAME
unexpand - convert spaces to tabs
SYNOPSIS
unexpand [OPTION]... [FILE]...
DESCRIPTION
Convert blanks in each FILE to tabs, writing to standard output. With
no FILE, or when FILE is -, read standard input.
Mandatory arguments to long options are mandatory for short options
too.
-a, --all
convert all blanks, instead of just initial blanks
--first-only
convert only leading sequences of blanks (overrides -a)
-t, --tabs=N
have tabs N characters apart instead of 8 (enables -a)
-t, --tabs=LIST
use comma separated LIST of tab positions (enables -a)
--help display this help and exit
--version
output version information and exit
. . .
STANDARDS
The expand and unexpand utilities conform to IEEE Std 1003.1-2001
(``POSIX.1'').
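I think you can try with awk
awk -v OFS="\t" '{$1=$1}1' file1
or sed if you prefer (\t in the replacement is a GNU sed extension)
sed -E 's/[[:blank:]]+/\t/g' thefile.txt > the_modified_copy.txt
or even tr
tr -s ' ' < thefile.txt | tr ' ' '\t' > the_modified_copy.txt
or a simplified version of the tr solution suggested by Sam Bisbee (write to a new file, since redirecting back onto the input truncates it before tr reads it)
tr ' ' '\t' < someFile > someFile.new && mv someFile.new someFile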
Using Perl:
perl -p -i -e 's/ /\t/g' file.txt
A better tr command:
tr '[:blank:]' '\t'
This will clean up the output of, say, unzip -l, for further processing with grep, cut, etc.
e.g.,
unzip -l some-jars-and-textfiles.zip | tr '[:blank:]' '\t' | cut -f 5 | grep jar
Example command for converting each .js file under the current dir to tabs (only leading spaces are converted):
find . -name "*.js" -exec bash -c 'unexpand -t 4 --first-only "$0" > /tmp/totabbuff && mv /tmp/totabbuff "$0"' {} \;
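To see what unexpand does to a single line before running it over a whole tree, here is a small sketch. It assumes GNU coreutils, since --first-only is a GNU option; the sample line is made up:

```shell
# Four leading spaces become one tab with -t 4; the two spaces between
# x and y are left alone because of --first-only.
converted=$(printf '    x  y\n' | unexpand -t 4 --first-only)

# od -c makes the tab visible as \t.
printf '%s\n' "$converted" | od -c
```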
Download and run the following script to recursively convert soft tabs to hard tabs in plain text files.
Place and execute the script from inside the folder which contains the plain text files.
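#!/bin/bash
# List text files outside .git (grep -Iq . succeeds only for non-binary, non-empty files)
find . -type f -and -not -path './.git/*' -exec grep -Iq . {} \; -and -print | while IFS= read -r file; do
    echo "Converting... $file"
    # Write to a temporary file first so the original survives if unexpand fails
    unexpand --first-only -t 4 "$file" > "$file.tmp" && mv "$file.tmp" "$file"
done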
Using sed:
T=$(printf "\t")
sed "s/[[:blank:]]\+/$T/g"
or
sed "s/[[:space:]]\+/$T/g"
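The \+ operator is a GNU extension in basic regular expressions. A sketch of a variant that should also work with other sed implementations, written here as "one blank followed by zero or more blanks" (a swapped-in form, not the answer's original command; the sample input is made up):

```shell
# A literal tab, produced the same way as in the answer above.
T=$(printf '\t')

# [[:blank:]][[:blank:]]* matches one or more spaces or tabs without \+.
out=$(printf 'a   b \t c\n' | sed "s/[[:blank:]][[:blank:]]*/$T/g")
printf '%s\n' "$out" | od -c
```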
You can also use astyle. I found it quite useful and it has several options too:
Tab and Bracket Options:
If no indentation option is set, the default option of 4 spaces will be used. Equivalent to -s4 --indent=spaces=4. If no brackets option is set, the
brackets will not be changed.
--indent=spaces, --indent=spaces=#, -s, -s#
Indent using # spaces per indent. Between 1 to 20. Not specifying # will result in a default of 4 spaces per indent.
--indent=tab, --indent=tab=#, -t, -t#
Indent using tab characters, assuming that each tab is # spaces long. Between 1 and 20. Not specifying # will result in a default assumption of
4 spaces per tab.
This will squeeze each run of a repeated blank into a single one (a run of spaces becomes one space, a run of tabs becomes one tab).
tr -s '[:blank:]'
This will replace each run of consecutive spaces or tabs with a single tab.
tr -s '[:blank:]' '\t'
If you are talking about replacing all consecutive spaces on a line with a tab then tr -s '[:blank:]' '\t'.
[root@sysresccd /run/archiso/img_dev]# sfdisk -l -q -o Device,Start /dev/sda
Device Start
/dev/sda1 2048
/dev/sda2 411648
/dev/sda3 2508800
/dev/sda4 10639360
/dev/sda5 75307008
/dev/sda6 96278528
/dev/sda7 115809778
[root@sysresccd /run/archiso/img_dev]# sfdisk -l -q -o Device,Start /dev/sda | tr -s '[:blank:]' '\t'
Device Start
/dev/sda1 2048
/dev/sda2 411648
/dev/sda3 2508800
/dev/sda4 10639360
/dev/sda5 75307008
/dev/sda6 96278528
/dev/sda7 115809778
If you are talking about replacing all whitespace (e.g. space, tab, newline, etc.) then tr -s '[:space:]' '\t'.
[root@sysresccd /run/archiso/img_dev]# sfdisk -l -q -o Device,Start /dev/sda | tr -s '[:space:]' '\t'
Device Start /dev/sda1 2048 /dev/sda2 411648 /dev/sda3 2508800 /dev/sda4 10639360 /dev/sda5 75307008 /dev/sda6 96278528 /dev/sda7 115809778
If you are talking about fixing a tab-damaged file then use expand and unexpand as mentioned in other answers.
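Because the terminal output above looks the same before and after the conversion, it can help to make the tabs visible. A minimal sketch on one made-up line in the style of the sfdisk output:

```shell
# The run of spaces between the two columns becomes a single tab.
tabbed=$(printf '/dev/sda1      2048\n' | tr -s '[:blank:]' '\t')

# od -c shows the tab as \t.
printf '%s\n' "$tabbed" | od -c
```

With two sets, tr translates first and then squeezes repeats of the characters in the second set, which is why the whole run collapses to one tab.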
sed 's/[[:blank:]]\+/\t/g' original.out > fixed_file.out
This will, for example, reduce a run of tabs or spaces to one single tab.
You can also collapse multiple spaces/tabs into one space:
sed 's/[[:blank:]]\+/ /g' original.out > fixed_file.out
