Different characters take more/less data? - text

I am working on a personal project and I'm wondering if certain characters take up more data in a text file than others. I need to choose a character to separate items in my file, but if a 0 uses fewer bytes than a ! or something, it would be best to use that. I know all characters have an ASCII value, but would a lower ASCII value mean the character can be stored in fewer bytes?
This might be an incredibly stupid question, but I don't see any information on the topic online so I came here to check.
Thanks!

Whether one character takes up more space than another depends on the character encoding you are using. Some encodings are variable-width [1], and UTF-8 is one of them. In UTF-8, the standard (7-bit) ASCII characters are all 1 byte wide, whereas characters outside that range, including the so-called extended ASCII characters, take multiple bytes: up to 4 under the current definition (RFC 3629), although the original design allowed up to 6 [2].
In your example, of '0' and '!': both are standard ASCII and therefore both are 1 byte in width in UTF-8.
References:
[1] Variable Width Encoding (Wikipedia)
[2] UTF-8 Description (Wikipedia)

You can also test the sizes of different characters by putting them in a file and checking the file size with ls -l or stat -f "%N %z" (that is the BSD/macOS form of stat; with GNU coreutils on Linux the equivalent is stat -c "%n %s"):
test $cat a
0
test $cat b
!
test $cat c
ક
test $cat d
æ
test $stat -f "%N %z" *
a 2
b 2
c 4
d 3
test $ls -l
total 32
-rw-r--r-- 1 spundun wheel 2 Jun 2 14:10 a
-rw-r--r-- 1 spundun wheel 2 Jun 2 14:10 b
-rw-r--r-- 1 spundun wheel 4 Jun 2 14:11 c
-rw-r--r-- 1 spundun wheel 3 Jun 2 14:13 d
test $
Each file has one extra byte. It is not an end-of-file marker (files do not store one); it is the trailing newline that was written after the character. So the sizes of the characters are 1, 1, 3, 2 bytes respectively.
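A quick way to check a single character without creating a file is to pipe it to wc -c, which counts bytes. This is only a sketch: it assumes the script and terminal use UTF-8, and byte_len is a name made up for the example.

```shell
# byte_len: hypothetical helper that prints how many bytes a string occupies.
byte_len() {
    # printf '%s' writes the string with no trailing newline;
    # wc -c counts bytes, not characters. The $(( )) wrapper drops the
    # leading padding that some wc implementations print.
    echo $(( $(printf '%s' "$1" | wc -c) ))
}

for ch in '0' '!' 'æ' 'ક'; do
    printf '%s takes %s byte(s)\n' "$ch" "$(byte_len "$ch")"
done
```

Because printf '%s' adds no newline, the numbers reported are the sizes of the characters themselves rather than size plus one.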

Related

Sort command is printing in unordered sequence in linux terminal

I have a file named temp that contains the following data:
0.9
1
2
3
10
4
5
6
When I run sort temp in the terminal, I get this output:
0.9
1
10
2
3
4
5
But my expected answer is
0.9
1
2
3
4
5
10
Can anyone help me with this?
See man sort:
-n, --numeric-sort
compare according to string numerical value
Thus, use
sort -n temp
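By default, sort compares lines as text, character by character (in the current locale's collation order), which is why 10 ends up between 1 and 2.
If you want to order numerically, use -n: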
sort -n file
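The difference can be reproduced without a file. A small sketch using the values from the question; the variable name data is just for illustration:

```shell
# The sample data from the question, one value per line.
data=$(printf '%s\n' 0.9 1 2 3 10 4 5 6)

# Default: lines are compared as strings, so "10" lands before "2".
# LC_ALL=C pins the collation order so the result is predictable.
printf '%s\n' "$data" | LC_ALL=C sort

# -n: lines are compared by numeric value, so 10 comes last.
printf '%s\n' "$data" | sort -n
```

The first command prints 10 right after 1; the second prints it at the end, after 6.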

Run script on specific file in all subdirs

I've written a script (foo) which makes a simple sed replacement on text in the input file. I have a directory (a) containing a large number of subdirectories (a/b1, a/b2 etc) which all have the same subdirs (c, etc) and contain a file with the same name (d). So the rough structure is:
a/
-b1/
--c/
---d
-b2/
--c/
---d
-b3/
--c/
---d
I want to run my script on every file (d) in the tree. Unfortunately the following doesn't work:
sudo sh foo a/*/c/d
How do I use wildcards in a bash command like this? Do I have to use find with specific -maxdepth and -mindepth values, or is there a more elegant solution?
The wildcard expansion in your example should work, and no find should be needed. I assume a b and c are just some generic file names to simplify the question. Do any of your folders/files contain spaces?
If you do:
ls -l a/*/c/d
are you getting the files you need listed? If so, then the problem is how your script file handles its arguments ($* or "$@"). Mind sharing it with us?
As you can see, wildcard expansion works
$ ls -l a/*/c/d
-rw-r--r-- 1 user wheel 0 15 Apr 08:05 a/b1/c/d
-rw-r--r-- 1 user wheel 0 15 Apr 08:05 a/b2/c/d
-rw-r--r-- 1 user wheel 0 15 Apr 08:05 a/b3/c/d
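If the listing looks right, a likely culprit is a script that only reads its first argument. Here is a minimal sketch of a script body that handles every file it is given. The real foo was not posted, so the sed expression s/old/new/ and the function name apply_to_all are placeholders:

```shell
#!/bin/sh
# apply_to_all: hypothetical stand-in for the body of foo.
# Runs a placeholder sed replacement on every file passed as an argument.
apply_to_all() {
    for f in "$@"; do          # "$@" keeps each path intact, spaces included
        # Write to a temporary file first, then replace the original.
        sed 's/old/new/' "$f" > "$f.tmp" && mv -- "$f.tmp" "$f"
    done
}

# Inside foo this would be called as: apply_to_all "$@"
# and foo itself would be invoked as: sh foo a/*/c/d
```

Looping over "$@" rather than an unquoted $* means file names containing spaces still arrive as single arguments.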

Bash script to rename a heap of folders

I have a directory that looks a little like this:
drw-r--r-- 1 root root 0 Jan 24 17:26 -=1=-directoryname
drw-r--r-- 1 root root 0 Jan 24 17:26 -=2=-directoryname
drw-r--r-- 1 root root 0 Jan 24 17:26 -=3=-directoryname
drw-r--r-- 1 root root 0 Jan 24 17:26 -=4=-directoryname
drw-r--r-- 1 root root 0 Jan 24 17:26 -=5=-directoryname
I am trying to write a script to change these folders from
-=1=- Folder#1
to strip off the "-=1=-" section, but alas I am having no luck.
Can anyone help me find a solution to this?
So far my script below has failed me.
#!/bin/bash
for i in {1..250}
do
rename "-=$i=-" ""*
i=i+1
done
I have used the 1..250 because there are 250 folders.
Given the number, you can manufacture the names and use the mv command:
#!/bin/bash
for i in {1..250}
do
mv -- "-=$i=- Folder#$i" "Folder#$i"   # -- ends option parsing, since the old names begin with "-"
done
With the Perl-based rename command (sometimes called prename), you could use:
rename 's/-=\d+=- //' -- -=*=-*Folder#*
or, given the revised question (the information after the pattern isn't fixed):
rename 's/-=\d+=- //' -- -=*=-*
This worked! Can you please explain how it worked? What's the \d+ for?
The \d is Perl regex notation for a digit 0..9. The + modifier indicates 'one or more'. So, the regex part of s/-=\d+=- // looks for a minus, an equals, one or more digits, an equals, a minus and a space. The replace part converts all of the matched material into an empty string. It's all surrounded by single quotes so the shell leaves it alone; without the quotes, the backslash and the space would both need protecting.
I'm not sure how you'd use the C-based rename command for this job; it is much less powerful than the Perl-based version.
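If the Perl-based rename is not installed, plain shell parameter expansion can do a similar job. This is a sketch, assuming every name starts with "-=N=-" optionally followed by one space; strip_prefix is a made-up helper name:

```shell
# strip_prefix: hypothetical helper; renames entries inside directory $1.
strip_prefix() {
    (
        cd -- "$1" || exit 1
        for old in ./-=*=-*; do
            [ -e "$old" ] || continue      # glob matched nothing
            new=${old#./-=*=-}             # drop the shortest "./-=...=-" prefix
            new=${new# }                   # drop one leading space, if present
            [ -n "$new" ] || continue      # nothing left to use as a name
            mv -- "$old" "./$new"          # "./" and "--" guard against leading "-"
        done
    )
}

# Example: strip_prefix /path/to/parent
```

The loop runs in a subshell, so the cd does not affect the calling shell. It does not check whether the new name already exists, so try it on a copy first.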

Add blank line before a certain phrase in a text file in Linux?

I'm using Kali Linux, trying to sort out some output from Nmap. Basically, I ran a scan with Nmap and need to extract specific pieces of information from it. I've got it to show everything I need using the following command:
cat discovery.txt | grep 'Nmap scan report for\|Service Info: OS:\|OS CPE:\|OS guesses:\|OS matches\|OS details'
Essentially, each section of information I need will start with "Nmap scan report for [IP ADDRESS]"
I'd like to add to my command to have it create a blank line before every appearance of the word "Nmap", to clearly separate each chunk of information.
Is there any command I can use to do this?
sed '/Nmap/i
' file
That's a literal newline after the i
A demo: add a newline before each line ending with a "0" or a "5"
seq 19 | sed '/0$\|5$/i
'
1
2
3
4

5
6
7
8
9

10
11
12
13
14

15
16
17
18
19
Sure, you can use Perl.
perl -pe 's/^Nmap/\nNmap/'
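Another option that avoids the literal newline is the classic sed hold-space idiom. A sketch, with blank_before as a made-up wrapper name:

```shell
# blank_before: hypothetical helper; reads stdin and prints a blank line
# before every line matching the basic regular expression in $1.
# The regex must not contain "/", since that is used as the delimiter.
blank_before() {
    # x swaps the line with the (empty) hold space, p prints that empty line,
    # and the second x swaps the original line back before it is printed.
    sed "/$1/{x;p;x;}"
}

seq 19 | blank_before '[05]$'
```

For the Nmap case this could be appended to the existing pipeline as blank_before '^Nmap', or written inline as sed '/^Nmap/{x;p;x;}'.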

where my '[]' is?, bash program

This is a bash script on Linux.
I wrote this:
msg=`date '+%m-%d %H:%M'`" alipay recharge [$sum] in past 15 mins"
echo $msg >> $MonitorLog
Mostly it works, but sometimes the result looks like this:
07-15 09:01 card recharge 0 in past 30 mins
My message has been changed: it shows a bare 0. If $sum is 0, it should be:
07-15 09:01 card recharge [0] in past 30 mins
I don't know where my '[]' went. Can you help me? Thanks a lot.
You are hitting shell globbing. See the output below.
$ ls -l
total 4
-rw-r--r-- 1 root root 0 Jul 14 21:40 5
$ sum=10
$ msg=`date '+%m-%d %H:%M'`" alipay recharge [$sum] in past 15 mins"
$ echo $msg
07-14 21:41 alipay recharge [10] in past 15 mins
$ sum=5
$ msg=`date '+%m-%d %H:%M'`" alipay recharge [$sum] in past 15 mins"
$ echo $msg
07-14 21:41 alipay recharge 5 in past 15 mins
$ echo "$msg"
07-14 21:41 alipay recharge [5] in past 15 mins
#Etan Reisinger's answer contains the crucial pointer:
Shell expansions are inadvertently applied to $msg, because it is unquoted.
tl;dr:
Double-quote your variable references to protect them from interpretation by the shell:
echo "$msg" >> "$MonitorLog" # due to double-quoting, contents of $msg used as is
Generally, the only reason NOT to double-quote a variable reference is the express intent to have the shell interpret the value (apply expansions to it) - see below.
In the case at hand, here's what happens if you do not double-quote $msg:
After splitting the value of $msg into words by whitespace (word splitting), pathname expansion is applied to each:
That is, each word that looks like a glob (a filename pattern) is matched against filenames, either in the directory named in the word or, if the word has no path component, in the current directory. If matches are found, the word is replaced by the matching filenames.
A word such as [0] happens to be a valid glob ([...] encloses a set of matching characters; in this case, the set is made up of only 1 char., 0), and if a file named 0 happens to be present in the current directory, [0] is replaced by that matching filename, 0 - effectively making the [] disappear - this is what happened in the OP's case.
(See man bash, section Pathname Expansion, for what constitutes valid globs.)
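The effect is easy to reproduce in a scratch directory. A small sketch; glob_demo is a made-up function name, and the message text is taken from the question:

```shell
# glob_demo: hypothetical demo, run in a throwaway directory.
glob_demo() {
    (
        dir=$(mktemp -d) && cd "$dir" || exit 1
        touch 0                       # a file named "0", as in the OP's case
        msg='card recharge [0] in past 30 mins'
        echo $msg                     # unquoted: [0] is a glob and matches file "0"
        echo "$msg"                   # quoted: the text is printed unchanged
        cd / && rm -r "$dir"          # clean up
    )
}

glob_demo
```

The first echo prints the message with a bare 0, the second keeps the brackets. Without the file named 0, both lines would be identical, which is why the problem only shows up sometimes.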

Resources