How to read lines and create files removing https:// - linux

I want a bash script to read a file with n lines and do the following:
1. Pass the line as input to a Python script (that part is done, no issues)
2. Create a file and redirect the Python output into it
for line in $(cat file.txt)
do
touch $line-links
python file.py $line > $line-links
done
The problem is that file.txt consists of links in the form https://www.example.com, and when I execute the script, it throws an error:
touch: https://www.example.com-links.txt: No such file or directory
I realized I have to remove the https:// portion, but how do I create the file with https:// stripped from the line?

You can use a simple variable expansion.
See:
for line in $(cat file.txt)
do
    url=${line##*//}                          # strip everything up to and including //
    touch "${url}-links"
    python file.py "$line" > "${url}-links"
done
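To see what that expansion does, you can test it directly in the shell (the URL is just an illustrative value):
line='https://www.example.com'
echo "${line##*//}"   # ##*// strips the longest prefix ending in //, printing: www.example.com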

You can use this Python script. Save it as update_line.py and run python update_line.py. It will remove every https:// in the file:
fpath = 'file.txt'
with open(fpath, 'r+') as f:
    lines = f.readlines()
    f.seek(0)
    f.truncate()
    for line in lines:
        line = line.replace('https://', '')
        f.write(line)

You can use bash pattern substitution, ${var/pattern/replacement}:
for line in $(cat file.txt)
do
    touch "${line/https:\/\//}-links"
    python file.py "$line" > "${line/https:\/\//}-links"
done

You could use the sed command to accomplish the task:
for line in $(cat file.txt)
do
    # use sed to strip the https:// (or http://) prefix
    variable=$(echo "$line" | sed 's/https\?:\/\///')
    touch "$variable-links"
    python file.py "$line" > "$variable-links"
done

You can use a while read loop and parameter expansion.
while IFS= read -r url; do
    url=${url#*'//'}                          # strip everything up to and including //
    python file.py "$url" > "${url}-links"    # the redirection itself creates the file
done < file.txt

I figured out the solution in a different way:
for line in $(cat file.txt)
do
    domain=$(echo "$line" | cut -d"/" -f3 | grep "\S")
    touch "$domain"
    python file.py "$line" > "$domain"
done
grep "\S" is used to skip blank lines; the result is a file named domain.com created for each link.

There are many options, this is one of them :)
for line in $(cat file.txt)
do
    domain=$( echo "$line" | awk -F '//' '{print $2}' )
    touch "$domain-links"
    python file.py "$line" > "$domain-links"
done


pipe then hyphen (stdin) as an alternative to for loop

I wrote a few sed and awk commands to extract a set of IDs that are associated with file names. I would like to run a set of commands using these file names from id.txt:
cat id.txt
14235.gz
41231.gz
41234.gz
I usually write for loops as follows:
for i in $(cat id.txt);
do
command <options> $i
done
I thought I could also do cat id.txt | command <options> -
Is there a way to pipe the output of cat, awk, sed, etc, line by line into a command?
Use a while read loop; see Don't read lines with for:
while IFS= read -r line_in_text_file; do
    echo "$line_in_text_file"
done < id.txt
Commands don't usually get their filename arguments on standard input. Using - as an argument means to read the file contents from standard input instead of a named file, it doesn't mean to get the filename from stdin.
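For example, with wc standing in for the command (a minimal illustration, not your actual script):
cat id.txt | wc -l -   # counts the 3 lines of id.txt itself; the file names are treated as data
wc -l 14235.gz         # whereas a file name argument makes wc open that file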
You can use command substitution to use the contents of the file as all the filename arguments to the command:
command <options> $(cat id.txt)
or you can use xargs
xargs command <options> < id.txt
Is there a way to pipe the output of cat, awk, sed, etc, line by line into a command?
Compound commands can be placed in a pipeline; the syntax is not very strict. The usual:
awk 'some awk script' |
while IFS= read -r line; do
    echo "$line"
done |
sed 'some sed script'
I avoid reading input line by line using while read - it's very slow. It's much faster to use awk scripts and other commands, as sketched below.
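For instance, the pass-through while read stage above could be replaced by a single awk stage with the same effect:
awk 'some awk script' |
awk '{ print }' |   # same output as the while read/echo loop, but much faster
sed 'some sed script'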
Command groups can be used too:
awk 'some awk script' |
{ # or '(', but there is no need for a subshell
    echo "header1,header2"
    # read (and thereby remove) the first line
    IFS= read -r first_line
    # pass the rest through, ignoring the last line
    sed '$d'
} |
sed 'some sed script'
Remember that the commands in a pipe run in subshells, so variable changes inside them will not affect the parent shell. Bash has a process substitution extension that lets you run the while loop inside the parent shell:
var=1
while IFS= read -r line; do
    if [[ "$line" == 2 ]]; then
        var=2
    fi
done < <(
    seq 10 |
    sed '$d'
)
echo "$var" # will output 2
xargs can do this
cat id.txt | xargs command
From xargs help
$ xargs --help
Usage: xargs [OPTION]... COMMAND [INITIAL-ARGS]...
Run COMMAND with arguments INITIAL-ARGS and more arguments read from input.
Mandatory and optional arguments to long options are also
mandatory or optional for the corresponding short option.
-0, --null items are separated by a null, not whitespace;
disables quote and backslash processing and
logical EOF processing
-a, --arg-file=FILE read arguments from FILE, not standard input
-d, --delimiter=CHARACTER items in input stream are separated by CHARACTER,
not by whitespace; disables quote and backslash
...
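For example, to run the command once per ID rather than once with all of them (a sketch; <options> stands for your real options):
xargs -n1 command <options> < id.txt    # one invocation of command per argument
xargs -a id.txt -n1 command <options>   # same, reading the arguments from the file directly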

Unwanted line break using echo and cat

I'm trying to add a line at the beginning of a file, using
echo 'time/F:x1:x2' | cat - file.txt>newfile.txt
But this produces line breaks at each line in the new file (except for after the added 'time/F:x1:x2' line). Any ideas on how to avoid this?
Use -n to disable the trailing newline:
echo -n 'time/F:x1:x2' | cat - file.txt > newfile.txt
There are other ways, too:
sed '1s|^|time/F:x1:x2|' file.txt > newfile.txt
How about
{ echo 'time/F:x1:x2'; cat file.txt; } >newfile.txt
or
sed '1i\
time/F:x1:x2' file.txt > newfile.txt
Actually you don't even need the echo and pipe if you're using bash. Just use a herestring (which, unlike echo -n, still supplies the trailing newline):
<<< 'time/F:x1:x2' cat - file.txt > newfile.txt

Bash script to remove the last three characters in a file name

For example, the file is this:
NBDG6_CDRCCN_4004_-TTNBDG6_CCN_51-140108-1433-802580.00.Blk32768Blk.CCN:00
I want to rename this file to:
NBDG6_CDRCCN_4004_-TTNBDG6_CCN_51-140108-1433-802580.00.Blk32768Blk.CCN
Using ${parameter%word} (Remove matching suffix pattern):
$ echo "$fn"
NBDG6_CDRCCN_4004_-TTNBDG6_CCN_51-140108-1433-802580.00.Blk32768Blk.CCN:00
$ echo "${fn%:*}"
NBDG6_CDRCCN_4004_-TTNBDG6_CCN_51-140108-1433-802580.00.Blk32768Blk.CCN
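Note that a single % removes the shortest matching suffix; %% would remove the longest, which matters if the name contains more than one colon (toy value shown for illustration):
$ fn='a:b:c'
$ echo "${fn%:*}"
a:b
$ echo "${fn%%:*}"
a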
Using cut
$ echo $fn
NBDG6_CDRCCN_4004_-TTNBDG6_CCN_51-140108-1433-802580.00.Blk32768Blk.CCN:00
$ echo $fn | cut -d: -f1
NBDG6_CDRCCN_4004_-TTNBDG6_CCN_51-140108-1433-802580.00.Blk32768Blk.CCN
Using awk
echo $fn | awk -F : '{print $1}'
more ways...
According to the link here:
This should work:
awk '{old=$0;gsub(/...$/,"",$0);system("mv \""old"\" "$0)}'
provided the file name is given as input.
For eg:
ls -1 NBDG6_CDRCCN_4004_-TTNBDG6_CCN_51-140108-1433-802580.00.Blk32768Blk.CCN:00|nawk '{old=$0;gsub(/...$/,"",$0);system("mv \""old"\" "$0)}'
Rename file using bash string manipulations:
# Filename needs to be in a variable
file=NBDG6_CDRCCN_4004_-TTNBDG6_CCN_51-140108-1433-802580.00.Blk32768Blk.CCN:00
# Rename file
mv "$file" "${file%???}"
This removes the last three characters from filename.
Using just bash:
fn='NBDG6_CDRCCN_4004_-TTNBDG6_CCN_51-140108-1433-802580.00.Blk32768Blk.CCN:00'
mv "$fn" "${fn::-3}"
if you have Ruby
echo NBDG6_CD* | ruby -e 'f=gets.chomp;File.rename(f, f[0..-4])'

save the output of a bash file

I have some files in a folder, and I need the first line of each file:
transaction1.csv
transaction2.csv
transaction3.csv
transaction4.csv
and I have the following code:
#All folders that begin with the word transaction
folder='"transaction*"'
ls `echo $folder |sed s/"\""/\/g` >testFiles
# The number of lines of testFiles that is the number of transaction files
num=`cat testFiles | wc -l`
for i in `seq 1 $num`
do
#The first transaction file
b=`cat testFiles | head -1`
#The first line of the first transaction file
cat `echo $b` | sed -n 1p
#remove the first line of the testFiles
sed -i '1d' testFiles
done
This code works; the problem is that I need to save the first line of each file in a file,
and if I change the line to:
cat `echo $b` | sed -n 1p > salida
it doesn't work =(
In bash:
for file in *.csv; do head -1 "$file" >> salida; done
As Adam mentioned in the comments, this has the overhead of reopening the output file on each iteration of the loop. If you need better performance and reliability, use the following:
for file in *.csv; do head -1 "$file" ; done > salida
head -qn1 *.csv
head -n1 will print the first line of each file, and -q will suppress the header when more than one file is given on the command-line.
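To see what -q suppresses (the file contents here are invented for illustration):
$ head -n1 transaction1.csv transaction2.csv
==> transaction1.csv <==
id,amount
==> transaction2.csv <==
id,amount
$ head -qn1 *.csv > salida   # no headers, just the first line of each file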
=== Edit ===
If the files are not raw text (for example, if they're compressed with "bzip2" as mentioned in your comment) and you need to do some nontrivial preprocessing on each file, you're probably best off going with a for loop. For example:
for f in *.csv.bz2 ; do
    bzcat "$f" | head -n1
done > salida
(Another option would be to bunzip2 the files and then head them in two steps, such as bunzip2 *.csv.bz2 && head -qn1 *.csv > salida; however, this will of course change the files in place by decompressing them, which is probably undesirable.)
this awk one-liner should do what you want:
awk 'FNR==1{print > "output"}' *.csv
FNR resets to 1 at the start of each input file, so the first line of each CSV is written to the file named output.
Using sed:
for f in *.csv; do sed -n "1p" "$f"; done >salida

How can I prepend a string to the beginning of each line in a file?

I have the following bash code which loops through a text file, line by line. I'm trying to prefix the word 'prefix' to each line, but instead am getting this error:
rob@laptop:~/Desktop$ ./appendToFile.sh stusers.txt kp
stusers.txt
kp
./appendToFile.sh: line 11: /bin/sed: Argument list too long
115000_210org@house.com,passw0rd
This is the bash script:
#!/bin/bash
file=$1
string=$2
echo "$file"
echo "$string"
for line in `cat $file`
do
sed -e 's/^/prefix/' $line
echo "$line"
done < $file
What am i doing wrong here?
Update:
Performing head on file dumps all the lines onto a single line of the terminal, probably related?
rob@laptop:~/Desktop$ head stusers.txt
rob@laptop:~/Desktop$ ouse.com,passw0rd
a one-line awk command should do the trick also:
awk '{print "prefix" $0}' file
Concerning your original error:
./appendToFile.sh: line 11: /bin/sed: Argument list too long
The problem is with this line of code:
sed -e 's/^/prefix/' $line
$line in this context is the file name that sed is running against. To correct your code you should fix this line as such:
echo $line | sed -e 's/^/prefix/'
(Also note that your original code should not have the < $file at the end.)
William Pursell addresses this issue correctly in both of his suggestions.
However, I believe you have correctly identified that there is an issue with your original text file. dos2unix will not correct this issue, as it only strips the carriage returns Windows sticks on the end of lines. (However, if you are attempting to read a Linux file in Windows, you would get a mammoth line with no returns.)
Assuming that it is not an issue with the end of line characters in your text file, William Pursell's, Andy Lester's, or nullrevolution's answers will work.
A variation on the while read... suggestion:
while read -r line; do echo "PREFIX " $line; done < $file
This could be run directly from the shell (no need for a batch / script file):
while read -r line; do echo "kp" $line; done < stusers.txt
The entire loop can be replaced by a single sed command that operates on the entire file:
sed -e 's/^/prefix/' $file
A Perl way to do it would be:
perl -p -e 's/^/prefix/' filename
or
perl -p -e '$_ = "prefix $_"' filename
In either case, that reads from filename and prints the prefixed lines to STDOUT.
If you add a -i flag, then Perl will modify the file in place. You can also specify multiple filenames and Perl will magically do all of them.
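For example (other.txt is just a placeholder for a second file):
perl -pi -e 's/^/prefix /' stusers.txt other.txt   # edits both files in place
perl -pi.bak -e 's/^/prefix /' stusers.txt         # same, but keeps a stusers.txt.bak backup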
Instead of the for loop, it is more appropriate to use while read...:
while read -r line; do
    echo "$line" | sed -e 's/^/prefix/'
done < $file
But you would be much better off with the simpler:
sed -e 's/^/prefix/' $file
Use sed. Just change the word prefix.
sed -e 's/^/prefix/' file.ext
If you want to save the output in another file
sed -e 's/^/prefix/' file.ext > file_new.ext
You don't need sed, just concatenate the strings in the echo command
while IFS= read -r line; do
    echo "prefix$line"
done < filename
Your loop iterates over each word in the file, not over each line:
for line in `cat file`; ...
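You can see the difference with a line that contains a space (demo.txt is just an illustrative file):
printf 'one line\n' > demo.txt
for w in `cat demo.txt`; do echo "[$w]"; done           # prints [one] then [line]: two iterations
while IFS= read -r l; do echo "[$l]"; done < demo.txt   # prints [one line]: a single iteration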
sed -i '1a\
Your Text' file1 file2 file3
A solution without sed/awk and while loops:
xargs -n1 printf "$prefix%s\n" < "$file"
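One caveat: xargs splits on any whitespace and interprets quotes by default, so if lines may contain spaces, GNU xargs can be told to split on newlines only:
xargs -d '\n' -n1 printf "$prefix%s\n" < "$file"   # one printf per line, spaces preserved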
