Bash script: select lines starting with a number in a specific range - linux

I have a file containing lines like:
121<some letters> random text ...
1234<some letters> random numbers etc...
Each line starts with a number followed by some letters. I'm looking for a way to select only the lines that start with a number in a specific interval, for example [0-9999]. I'm having difficulty selecting these lines because the number of digits can vary.
I tried using grep, but I can't seem to find the correct way to write the regex.

awk '($1+0)>10 && ($1+0)<50' file
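would print lines whose first field starts with a number from 11 to 49 inclusive. Adding 0 makes awk convert $1 to a number, and that conversion uses the field's leading numeric part, so "121abc" becomes 121.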
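For the range in the question, [0-9999], the same conversion works with >= and <=. One catch: a line that does not start with a digit converts to 0 and would slip into any range that contains 0, so this sketch only considers lines that begin with a digit. The helper name select_range is hypothetical:

```shell
# select_range LOW HIGH [FILE...]   (hypothetical helper name)
# Prints the lines whose leading number n satisfies LOW <= n <= HIGH.
select_range() {
  low=$1 high=$2
  shift 2
  # /^[0-9]/ skips lines without a leading digit; "$1 + 0" turns a
  # first field such as "121abc" into the number 121.
  awk -v lo="$low" -v hi="$high" \
    '/^[0-9]/ { n = $1 + 0; if (n >= lo + 0 && n <= hi + 0) print }' "$@"
}

printf '%s\n' '121abc random text' '1234xyz random numbers' \
  '12345q too big' 'abc no number' | select_range 0 9999
```

With this sample input only the first two lines are printed.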

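Through grep (the number is followed directly by letters, and \b never matches between a digit and a letter, so the pattern requires a non-digit or the end of the line after the number instead):
grep -E '^([1-9][0-9]{0,3}|0)([^0-9]|$)' file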
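A quick way to check that a pattern really selects 0 to 9999 is to run it over a few sample lines, including edge cases such as 10000 and a leading zero. This sketch wraps the pattern ^([1-9][0-9]{0,3}|0)([^0-9]|$) in a function (the name is hypothetical); ([^0-9]|$) is used rather than \b because there is no word boundary between a digit and a following letter:

```shell
# starts_0_to_9999 [FILE...]   (hypothetical helper name)
# 1-9999 without leading zeros, or a lone 0, then a non-digit or end of line.
starts_0_to_9999() {
  grep -E '^([1-9][0-9]{0,3}|0)([^0-9]|$)' "$@"
}

printf '%s\n' '0abc zero' '121abc text' '9999zz max' '10000a too big' \
  '05ab leading zero' 'abc no number' | starts_0_to_9999
```

This prints the first three sample lines; 10000, the leading zero and the line without a number are rejected.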

Related

linux shell script delimiter

How to change delimiter from current comma (,) to semicolon (;) inside .txt file using linux command?
Here is my ME_1384_DataWarehouse_*.txt file:
Data Warehouse,ME_1384,Budget for HW/SVC,13/05/2022,10,9999,13/05/2022,27,08,27,08
Data Warehouse,ME_1384,Budget for HW/SVC,09/05/2022,10,9999,09/05/2022,45,58,45,58
Data Warehouse,ME_1384,Budget for HW/SVC,25/05/2022,10,9999,25/05/2022,7,54,7,54
Data Warehouse,ME_1384,Budget for HW/SVC,25/05/2022,10,9999,25/05/2022,7,54,7,54
It is very important that the values of the last two columns are numbers with two decimal places, written with a decimal comma; in the first row, for example, each of them is "27,08".
That is probably the main reason why the delimiter couldn't be changed properly.
I tried:
sed 's/,/;/g' ME_1384_DataWarehouse_*.txt
and every comma was changed, including the decimal commas in the last two columns.
Is there anyone who can help me out with this issue?
With sed you can replace the nth occurrence of a certain lookup string. Example:
$ sed 's/,/;/4' file
will replace the 4th comma with a semicolon.
So, if you know each line has 11 comma-separated pieces (10 commas), and the 8th and 10th commas are the decimal ones, you can do
$ sed 's/,/;/g;s/;/,/10;s/;/,/8' file
Example:
$ seq 1 11 | paste -sd, | sed 's/,/;/g;s/;/,/10;s/;/,/8'
1;2;3;4;5;6;7;8,9;10,11
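Applied to a line from the question, the same sed script can be wrapped in a small function (the name is hypothetical). It assumes every line has exactly 11 comma-separated pieces, so the decimal commas are always the 8th and the 10th:

```shell
# to_semicolons [FILE...]   (hypothetical helper name)
# Assumes exactly 10 commas per line; the 8th and 10th are decimal commas.
to_semicolons() {
  # Turn every comma into ';', then restore the 10th ';' before the 8th:
  # restoring the 8th first would make the old 10th ';' only the 9th.
  sed 's/,/;/g;s/;/,/10;s/;/,/8' "$@"
}

printf '%s\n' 'Data Warehouse,ME_1384,Budget for HW/SVC,13/05/2022,10,9999,13/05/2022,27,08,27,08' |
  to_semicolons
```

This prints Data Warehouse;ME_1384;Budget for HW/SVC;13/05/2022;10;9999;13/05/2022;27,08;27,08. The order of the two restoring substitutions is the important design detail.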
Your question is somewhat unclear, but if you are trying to say "don't change the last comma, or the third-to-last one", a solution to that might be
perl -pi~ -e 's/,(?![^,]+(?:,[^,]+,[^,]+)?$)/;/g' ME_1384_DataWarehouse_*.txt
Perl on its own does not loop over the input lines; the -p option makes it read the input one line at a time, like sed, and print every line (there is also -n to simulate the behavior of sed -n). The -i~ option modifies the file in place but saves the original as a backup with a tilde appended to its file name. The regex uses a negative lookahead (?!...) to exempt the two decimal commas (the last comma and the third-to-last one) from the replacement. Lookaheads are a newer regex feature that older tools like sed don't support.
Once you are satisfied with the solution, you can remove the ~ after -i to disable the generation of backups.
You can do this with awk:
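awk -F, 'BEGIN {OFS=";"} {a=$(NF-3); b=$(NF-2); c=$(NF-1); d=$NF; NF-=4; printf "%s;%s,%s;%s,%s\n", $0, a, b, c, d}' input_file
This should work with most awk versions (do not count on the Solaris standard awk).
The idea is to store the last four fields (the two decimal numbers, each split at its comma) in variables, decrease the number of fields so that $0 is rebuilt with the new ";" delimiter, and then print the shortened record followed by the two numbers with their decimal commas restored.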
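Assigning to NF to rebuild $0 is the part that not every awk handles reliably (hence the Solaris warning). A sketch that avoids it by joining the fields in a loop, assuming each line ends with the two decimal numbers, so the decimal commas are the third-to-last and the last comma; the function name is hypothetical:

```shell
# join_keep_decimals [FILE...]   (hypothetical helper name)
join_keep_decimals() {
  awk -F, '{
    out = $1
    for (i = 2; i <= NF; i++) {
      # Fields NF-2 and NF are the fractional parts of the two decimal
      # numbers, so they are joined with ","; all others with ";".
      sep = (i == NF - 2 || i == NF) ? "," : ";"
      out = out sep $i
    }
    print out
  }' "$@"
}

printf '%s\n' 'Data Warehouse,ME_1384,Budget for HW/SVC,13/05/2022,10,9999,13/05/2022,27,08,27,08' |
  join_keep_decimals
```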

Insert characters before a line that contains numbers

Given a text file with several lines (for example, a file with three sentences on three lines).
For the lines that contain numbers, I need to add the current time in front of them.
I have more or less figured out how to insert the current time:
sed "s/^/$(date +%T) /" text.txt
I saw another solution, but it doesn't suit me because it uses an if statement.
How can I also check each line for digits and insert the time before it, with one command?
Is it possible without an if statement?
You can use a regex address so that the substitution only runs on lines containing at least one digit:
sed "/[0-9]/s/^/$(date +%T) /" text.txt
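Note that $(date +%T) is expanded by the shell once, before sed starts, so every matching line gets the same time. A sketch that makes this explicit by passing the timestamp as an argument (the function name is hypothetical, and it assumes the stamp contains no /, & or backslash, which holds for date +%T):

```shell
# prefix_digit_lines STAMP [FILE...]   (hypothetical helper name)
# Prepends "STAMP " to every line that contains at least one digit.
prefix_digit_lines() {
  stamp=$1
  shift
  # The address /[0-9]/ limits the s command to lines with a digit;
  # all other lines are printed unchanged.
  sed "/[0-9]/s/^/$stamp /" "$@"
}

# The time is taken once, here, so all matching lines share it.
printf '%s\n' 'No digits here.' 'There are 3 apples.' 'Still none.' |
  prefix_digit_lines "$(date +%T)"
```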

Using sed to add character in a line which contains TWO specific Strings

I want to do something like this:
sed "/^[^+]/ s/\(.*$1|$2.*$\)/+\ \1/" -i file
where two specific string parameters are checked in a file, and on the lines where BOTH parameters ($1 and $2) occur, a + is added at the beginning of the line if there is no + there already.
I have tried different variations so far, and they either modify every line that contains just one of the two strings or end in errors.
I'd be thankful for any clarification regarding slash and backslash escaping (and single vs. double quotes); I guess that's where my problem lies.
Edit: desired outcome (a folder contains a bunch of text files, one of which has the following two lines):
sudo bash MyScript.sh 01234567 Wanted
Before:
Some Random Text And A Number 01234567 and i'm Wanted.
Another Random Text with Diff Number 09812387 and i'm still Wanted.
Expected:
+ Some Random Text And A Number 01234567 and i'm Wanted.
Another Random Text with Diff Number 09812387 and i'm still Wanted.
For an input file that looks as follows:
$ cat infile
Some Random Text And A Number 01234567 and i'm Wanted.
Another Random Text with Diff Number 09812387 and i'm still Wanted.
and setting $1 and $2 to 01234567 and Wanted (in a script, these are just the first two positional parameters and don't have to be set):
$ set -- 01234567 Wanted
the following command would work:
$ sed '/^+/b; /'"$1"'/!b; /'"$2"'/s/^/+ /' infile
+ Some Random Text And A Number 01234567 and i'm Wanted.
Another Random Text with Diff Number 09812387 and i'm still Wanted.
This is how it works:
sed '
  # Skip if line starts with "+"
  /^+/b
  # Skip if line does not contain first parameter
  /'"$1"'/!b
  # Prepend "+ " if second parameter is matched
  /'"$2"'/s/^/+ /
' infile
b is the "branch" command; when used on its own (as opposed to with a label to jump to), it skips all remaining commands for the current line: sed jumps to the end of the script, prints the line as usual and moves on to the next one.
The first two commands skip lines that start with + or that don't contain the first parameter, so by the time the s command is reached, we already know that the current line doesn't start with + and contains the first parameter. If it also contains the second parameter, we prepend "+ ".
For quoting, I have single quoted the whole command except for where the parameters are included:
'single quoted'"$parameter"'single quoted'
so I don't have to escape anything unusual. This assumes that the variables in the double-quoted parts don't contain characters that are special to sed, such as regex metacharacters or /.
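If the parameters may contain regex metacharacters (a . or / for example), one option is to swap sed for awk and use its index() function, which does plain substring matching. A sketch under that assumption; the function name is hypothetical, it prints to standard output rather than editing in place, and the strings are passed through the environment because awk -v would interpret backslash escapes in the values:

```shell
# mark_if_both STRING1 STRING2 [FILE...]   (hypothetical helper name)
# Prepends "+ " to lines containing both strings, unless they start with "+".
mark_if_both() {
  s1=$1 s2=$2
  shift 2
  # index() returns 0 when the substring is absent; no regex is involved.
  S1=$s1 S2=$s2 awk '
    substr($0, 1, 1) != "+" && index($0, ENVIRON["S1"]) && index($0, ENVIRON["S2"]) {
      $0 = "+ " $0
    }
    { print }
  ' "$@"
}

printf '%s\n' "Some Random Text And A Number 01234567 and i'm Wanted." \
  "Another Random Text with Diff Number 09812387 and i'm still Wanted." |
  mark_if_both 01234567 Wanted
```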

Using grep to find a word containing only one time a letter

I have a text file containing the following words:
White
Pinkman
Goodman
White
Pinkman
Fring
I want to use grep to obtain the words that contain the letter "n" only one time.
I used the following command: grep -E 'n{1}'
but it still gives me the words that contain "n" more than one time.
Thanks for your help.
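n{1} does not work because grep prints a line as soon as any part of it matches, and a single n is found inside Pinkman just as well as inside Fring. Anchoring the pattern to the whole line and forbidding other n characters around the one n fixes that. A minimal sketch (the function name is hypothetical):

```shell
# exactly_one_n [FILE...]   (hypothetical helper name)
# ^ and $ anchor the whole line: any non-n characters, one n, any non-n characters.
exactly_one_n() {
  grep '^[^n]*n[^n]*$' "$@"
}

printf '%s\n' White Pinkman Goodman White Pinkman Fring | exactly_one_n
```

This prints Goodman and Fring: Pinkman contains two n's and White none. The match is case-sensitive, so an uppercase N would not count.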

How to delete double lines in bash

Given a long text file like this one (that we will call file.txt):
EDITED
1 AA
2 ab
3 azd
4 ab
5 AA
6 aslmdkfj
7 AA
How can I delete, in bash, the lines that appear at least twice in the same file? What I mean is that I want this result:
1 AA
2 ab
3 azd
6 aslmdkfj
I do not want any line to appear twice in a given text file. Could you show me the command, please?
Assuming whitespace is significant, the typical solution is:
awk '!x[$0]++' file.txt
(e.g., the line "ab " is not considered the same as "ab". If you want to treat whitespace differently, it is probably simplest to pre-process the data.)
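One way to do that pre-processing inside awk itself is to build the key from a trimmed copy of the line, so the first occurrence is still printed unchanged. A sketch that ignores trailing spaces and tabs when comparing (the function name is hypothetical):

```shell
# dedup_ignore_trailing_ws [FILE...]   (hypothetical helper name)
dedup_ignore_trailing_ws() {
  awk '{
    key = $0
    sub(/[ \t]+$/, "", key)   # compare without trailing blanks
  }
  !seen[key]++' "$@"
}

printf 'ab\nab \nAA\nab\t\n' | dedup_ignore_trailing_ws
```

This prints ab and AA.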
--EDIT--
Given the modified question, which I'll interpret as only wanting to check uniqueness after a given column, try something like:
awk '!x[ substr( $0, 2 )]++' file.txt
This compares only characters 2 through the end of the line, ignoring the first character (the single-digit line number). This is a typical awk idiom: we are simply building an array named x (one-letter variable names are a terrible idea in a script, but are reasonable for a one-liner on the command line) which holds the number of times a given string has been seen. x[key]++ returns the count before incrementing it, so !x[key]++ is true only the first time a key is seen, and that is when the line is printed. In the first case, the key is the entire input line contained in $0; in the second case, it is only the substring consisting of the 2nd character and everything after it.
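substr($0, 2) skips exactly one character, which is enough here only because the line numbers are single digits; from line 10 on, the second digit would become part of the key. A sketch that strips the whole leading number (and the blanks after it) from the key instead, assuming every line starts with such a number (the function name is hypothetical):

```shell
# dedup_after_number [FILE...]   (hypothetical helper name)
dedup_after_number() {
  awk '{
    key = $0
    sub(/^[0-9]+[ \t]*/, "", key)   # drop the leading number from the key
  }
  !seen[key]++' "$@"
}

printf '%s\n' '1 AA' '2 ab' '3 azd' '4 ab' '5 AA' '6 aslmdkfj' '7 AA' '10 azd' '11 new' |
  dedup_after_number
```

This prints 1 AA, 2 ab, 3 azd, 6 aslmdkfj and 11 new; 10 azd is dropped as a repeat of 3 azd.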
Try this simple script:
cat file.txt | sort | uniq
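cat will output the contents of the file,
sort will put duplicate entries next to each other,
uniq will remove adjacent duplicate entries.
Note that the output is sorted, so the original line order is not preserved, and that whole lines are compared, so numbered lines such as "1 AA" and "5 AA" are not treated as duplicates.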
Hope this helps!
The uniq command will do what you want.
But make sure the file is sorted first; it only detects duplicates on consecutive lines.
Like this:
sort file.txt | uniq
