grep, awk and sed alternatives for python? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 2 years ago.
os.system(r"grep -R 'Webpage\|Thumbnail' tmp | awk -F ' ' '{print $2}' | sed '1~2s/\(.*\)/]\[img]\1\[\/img]\[\/URL]/g ; N;s/\(.*\)\n\(.*\)/\2\1/ ; s/^/\[URL=/' | tr -d '[:space:]' > ./" + t + ".files/bbcode.txt")
What it's doing: grep lines containing the keywords in tmp > awk splits at the space delimiter to get everything after the keywords > sed adds "[img]" at the start and "[/img][/URL]" at the end of every line > sed adds "[URL=" to the start and "]" at the end of every second line > moves each odd line to the beginning of the following even line > removes all whitespace and merges everything into one big line.
Can someone please point me in the right direction to do this in Python?

Here is a simple stab at a Python replacement.
grep -R will recursively search regular files in the destination directory. This can be replaced with os.walk('tmp'). Remember that the third result from os.walk is just the file names; you have to glue back the directory in front of each one.
Fields are generally numbered starting with 1 in the Unix command-line tools, while Python's indexing is zero-based. So the second field from the line is line.split(' ')[1], not line.split(' ')[2].
Without access to your files, I had to guess what the sed script is really receiving as input. I'm assuming the matches alternate, with a "Thumbnail" line first and a "Webpage" line second in each pair.
Tangentially, piping Awk to sed and tr is basically useless; Awk can do everything those two tools do all by itself. (A nontrivial sed script can be hard to reimplement in Awk, but this is not an example of that; 1~2 is a GNU sed extension, so the script was never very portable to begin with, and the logic would be a lot easier to read and understand in Awk.) Conversely, splitting on a single space is overkill for Awk; cut -d ' ' -f2 would be a more economical and succinct way to do that.
import os

with open(t + ".files/bbcode.txt", "w") as bbcode:
    for root, dirs, files in os.walk('tmp'):
        for file in files:
            with open(os.path.join(root, file)) as lines:
                idx = 0
                for line in lines:
                    if 'Webpage' in line or 'Thumbnail' in line:
                        idx += 1
                        # second whitespace-separated field, without the newline
                        field = line.split(' ')[1].rstrip('\n')
                        if idx % 2 == 1:
                            # first of each pair: remember the thumbnail
                            thumb = field
                            continue
                        bbcode.write(
                            '[URL=%s][img]%s[/img][/URL]' % (field, thumb))
The decision to collect all output on a single long line is dubious; could you perhaps be persuaded to add a final \n to the write format string?
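As an aside on the Awk point above: here is a rough, untested sketch of everything after the grep as a single Awk script, under the same odd/even pairing assumption (output path shortened here for readability):

grep -R 'Webpage\|Thumbnail' tmp |
awk 'NR % 2 { thumb = $2; next }   # odd lines: remember the thumbnail
     { printf "[URL=%s][img]%s[/img][/URL]", $2, thumb }' > bbcode.txt

Like the original tr -d '[:space:]', the printf without a trailing \n leaves everything on one long line.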

Related

How can I concatenate string in loop in linux bash? [duplicate]

This question already has answers here:
Are shell scripts sensitive to encoding and line endings?
(14 answers)
Closed last year.
I am new to coding in bash on Linux and I have the following problem.
I'm trying to concatenate strings in a loop to create a path. I have a text file in which I stored some strings to use in the loop. I wrote this example just to show you the problem:
for bio in `cat /data/giordano/species_ranges/prova_bio.txt` # list of strings: "bio_01", "bio_02"...
do
echo /data/giordano/species_range/$bio.tif # concatenation
done
The result I expect would be:
/data/giordano/species_range/bio_01.tif
/data/giordano/species_range/bio_02.tif
/data/giordano/species_range/bio_03.tif
But what actually came out was:
.tifa/giordano/species_range/bio_01
.tifa/giordano/species_range/bio_02
.tifa/giordano/species_range/bio_03
/data/giordano/species_range/bio_04.tif
I really don't understand what kind of problem it is...
I suggest that awk would be simpler for this task. We use tr to remove the CR (carriage return) line endings:
~/tests/bash $ tr -d "\r" < data/giordano/species_range/proverbio.txt | awk '{ print "/data/giordano/species_range/" $0 ".tif"
> }'
/data/giordano/species_range/bio_1.tif
/data/giordano/species_range/bio_2.tif
/data/giordano/species_range/bio_3.tif
/data/giordano/species_range/bio_4.tif
Thank you to Charles Duffy for the improvements.
You probably have Windows line endings in your file, which add a carriage return (\r) at the end of each line. When printed, it makes the cursor go back to the beginning of the line, which is why .tif appears to overwrite the start of your output. You can remove the \r characters from your file by piping through tr. Extend your first line like this:
for bio in `cat /data/giordano/species_ranges/prova_bio.txt | tr -d '\r'`
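More generally, the usual bash idiom for iterating over lines is a while read loop rather than for over a command substitution, which word-splits on all whitespace and expands globs. A sketch that also strips a trailing CR per line (paths taken from the question; untested against your data):

while IFS= read -r bio; do
    bio=${bio%$'\r'}    # drop a trailing carriage return, if any
    echo "/data/giordano/species_range/$bio.tif"
done < /data/giordano/species_ranges/prova_bio.txt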

How to copy a file with line number in linux? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 2 years ago.
I am a noob, and not good at Linux.
I want to copy specific lines (20~29) of a file into a new file, keeping their original line numbers.
So I need a new file starting with 20, just like this:
20 ~~~
21 ~~~
22 ~~~
.
.
.
29 ~~~
I need your help. Thanks.
You can use the following command:
awk 'NR >= 20 && NR <= 29 { print NR,$0}' filename
This will read the file filename and print the line number (NR) followed by the line ($0). The conditions NR >= 20 and NR <= 29 select the range of lines you want to extract.
If you want to save the result, you can redirect the output to a file like so:
awk 'NR >= 20 && NR <= 29 { print NR,$0}' filename > new_file
It will put the line numbers and the lines inside a new file called new_file.
awk is a powerful program which can perform many operations on a text file or stream. I recommend learning how to use this tool if you want to achieve complicated things easily on Linux.
It can be done in many ways.
Using the head and tail commands:
head -n 29 your_file.txt | tail -n 10 > new_file.txt
Using the sed command, a stream editor that can perform lots of functions on a file, like searching, find-and-replace, insertion, or deletion:
sed -n '20,29p' your_file.txt > new_file.txt
Using the awk command, which is designed for selecting and manipulating data in a file or stream:
awk 'NR>=20 && NR<=29' your_file.txt > new_file.txt
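Note that these three one-liners select the lines but drop the original numbering that the question asks for. One way to keep it (a sketch combining the same tools) is to number every line first with cat -n, then select:

cat -n your_file.txt | sed -n '20,29p' > new_file.txt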

I have a requirement of searching a pattern from a file and displaying the pattern only in the screen,not the whole line .How can I do it in linux? [duplicate]

This question already has answers here:
Can grep show only words that match search pattern?
(15 answers)
Closed 5 years ago.
I have a requirement to search for a pattern like x=<followed by any values> in a file and display only the pattern, i.e. x=<followed by any values>, on the screen, not the whole line. How can I do it in Linux?
I have 3 answers, from simple (but with caveats) to complex (but foolproof):
1) If your pattern never appears more than once per line, you could do this (assuming your shell is a Bourne-style shell such as bash):
PATTERN="x="
sed "s/.*\($PATTERN\).*/\1/g" your_file | grep "$PATTERN"
2) If your pattern can appear more than once per line, it's a bit harder. One easy but hacky way to do this is to use a special character that will not appear on any line that has your pattern, e.g. "#":
PATTERN="x="
SPECIAL="#"
grep "$PATTERN" your_file | sed "s/$PATTERN/$SPECIAL/g" \
| sed "s/[^$SPECIAL]//g" | sed "s/$SPECIAL/$PATTERN/g"
(This won't separate the matches: you'll see x=x=x= if a source line contained "x=" three times. That is easy to fix by adding a space in the last sed.)
3) Something that always works no matter what:
PATTERN="x="
awk "NF>1{for(i=1;i<NF;i++) printf FS; print \"\"}" \
FS="$PATTERN" your_file

Remove lines in text file which contain fewer than 4 pipes [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 2 years ago.
Improve this question
I have a text file with data separated by 4 pipe (|) delimiters.
There are some problem lines in the file. These lines contain fewer than 4 pipes.
The data in the problem rows is not needed and I want to run a command on the file which deletes any line which contains fewer than four pipes. I would also like to know how many lines were deleted afterwards so if this could be printed on the screen once the command is applied that would be ideal.
Sample data:
865|Blue Moon Club|Havana Project|34d|879
899|Soya Plates|Dimsby|78a|699
657|Sherlock
900|Forestry Commission|Eden Project|68d|864
Desired output:
865|Blue Moon Club|Havana Project|34d|879
899|Soya Plates|Dimsby|78a|699
900|Forestry Commission|Eden Project|68d|864
I have tried awk '|>=3' file.txt, which didn't work. There is a lot of information out there about awk, some of which I found, but its sheer volume makes it difficult to pin down exactly what I want to do.
To eliminate the lines:
grep '|.*|.*|.*|' file > newfile
To count the number of bad lines:
grep -cv '|.*|.*|.*|' file
That doesn't do the edit in place; you could do that with sed but it is often safer to do edits like this to a newfile, in order to avoid losing data if you make a mistake.
The first grep pattern matches any line with at least four pipe symbols. (By default, grep uses "Basic" regular expressions, in which the alternation operator must be written \|, so a bare | is just an ordinary character.)
The second invocation counts (-c) the number of non-matching (-v) lines.
Here's a simple sed solution:
sed -n -i.bak '/|.*|.*|.*|/p' file
The -n option turns off automatic printing, so the command only prints the lines which match the pattern. (Again, by default, sed uses basic regexes.) The -i.bak option does the edit in place, creating a backup of the original with the name file.bak.
If you wanted to select lines with exactly four pipes, you could use awk:
awk -F'|' 'NF==5' file > newfile
which will set the field separator to a pipe symbol and then select the lines with exactly five fields, which are the lines with exactly four pipes.
A useful tool to count lines is wc:
wc -l file
will tell you how many lines are in file; if you count lines in both file and newfile, the difference will obviously be the number of deletions. You could do that computation in awk, too, but it's a bit wordier:
awk -F'|' 'NF==5{print;next}{del+=1}END{print del+0 >>"/dev/stderr"}' file > newfile
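With the four sample lines above, this writes the three complete rows to newfile and prints 1, the number of dropped lines, on stderr.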
This will do:
sed -i.bak '/\([^|]*|\)\{4\}/!d' file
Or (as Cyrus's comment)
sed -i.bak -E '/(\|[^\|]*){4}/!d' file
Or
sed -n '/^[^|]*|[^|]*|[^|]*|[^|]*|[^|]*$/p' file > newfile
Or
sed -e '/^[^|]*|[^|]*|[^|]*|[^|]*$/d' \
    -e '/^[^|]*|[^|]*|[^|]*$/d' \
    -e '/^[^|]*|[^|]*$/d' \
    -e '/^[^|]*$/d' \
    -i.bak file
These won't give you the line count, though. To get it, run grep -cv '^[^|]*|[^|]*|[^|]*|[^|]*|[^|]*$' file on the original file (counting non-matching lines, as rici showed), or compare the line counts before and after with wc -l file.
Explanation:
The first two sed commands loosely match 4 pipes (not fewer, but possibly more), and the third matches exactly 4 pipes (no more, no fewer).
The fourth sed matches lines with exactly 3, 2, 1, or 0 pipes (|), deletes them in place, and keeps a backup of the original as file.bak.
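Since -i.bak keeps the original around, the number of deleted lines also falls out of a quick arithmetic comparison (a sketch):

echo $(( $(wc -l < file.bak) - $(wc -l < file) ))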

How to use Linux to read a file line by line and replace all the spaces into ','? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I am a beginner. I'd like to use the Linux shell to turn the following file
1 2 2
2 3 4
4 5 2
4 2 1
....
into
1,2,2
2,3,4
4,5,2
4,2,1
Thank you very much!
Are you looking for something like this:
sed -e "s/ /,/g" < a.txt
or maybe easier, like this:
tr ' ' ',' <input >output
or in Vim you can use the substitute command on the whole file:
:%s/ /,/g
The question asks "line by line". In bash:
while IFS= read -r line; do echo "$line" | sed 's/ /,/g'; done < file
This reads file line by line into line, prints (echo) each line, and pipes (|) it to sed, which changes the spaces into commas. (IFS= and -r stop read from trimming whitespace and mangling backslashes, and quoting "$line" preserves the spacing that sed is about to replace.) You can add > newfile at the end (but > file won't work) if you need to store the result in a file.
But if you don't need anything other than changing characters in the file, processing the whole file at once is easier and probably quicker:
sed -i 's/ /,/g' file
(option -i is for modifying the file directly, as opposed to printing the modifications to stdout).
Read more about sed to understand its syntax, you'll need it eventually.
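For completeness, the same transformation in awk (a sketch; note that it also squeezes runs of whitespace into a single comma, which may or may not be what you want):

awk -v OFS=',' '{ $1 = $1; print }' file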
