Linux command to copy file contents one word per line into a new file created from the same command - linux

I need to copy the file contents one word per line into a new file created from the same command.
This is the file:
Never
gonna
give you up
Never
gonna
let you down
Never
gonna
turn around and desert you
Command already tried:
cat a6q4-input.txt >> a6q4-pt2.txt
This copies the file to newly created file a6q4-pt2.txt in exact format however does not changes the layout of the word content to one word per line.

Perl one-liner:
perl -lne 'print for split' a6q4-input.txt > a6q4-pt2.txt
For each line of the files given on the command line (-n), split it up into words based on whitespace with split, and print each one on its own line with print. -l adds a newline after each print (And strips them from each line when read).

You can use tr to replace spaces with newlines:
tr ' ' '\n' < input.txt > output.txt

Use xargs:
cat in_file | xargs -n1 > out_file
Or use a Perl one-liner:
perl -lane 'print for #F' in_file > out_file
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array #F on whitespace or on the regex specified in -F option.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

Related

Delete everything after pattern including pattern

I have a text file like
some
important
content
goes here
---from here--
some
unwanted content
I am trying to delete all lines after ---from here-- including ---from here--. That is, the desired output is
some
important
content
goes here
I tried sed '1,/---from here--/!d' input.txt but it's not removing the ---from here-- part. If I use sed '/---from here--.*/d' input.txt, it's only removing ---from here-- text.
How can I remove lines after a pattern including that pattern?
EDIT
I can achieve it by doing the first operation and pipe its output to second, like sed '1,/---from here--/!d' input.txt | sed '/---from here--.*/d' > outputput.txt.
Is there a single step solution?
Another approach with sed:
sed '/---from here--/,$d' file
The d(delete) command is applied to all lines from first line containing ---from here-- up to the end of file($)
Another awk approach:
awk '/---from here--/{exit}1' file
If you have GNU awk 4.1.0+, you can add -i inplace to change the file in-place.
Otherwise appened | tee file to change the file in-place.
I'm not positive, but I believe this will work:
sed -n '/---from here--/q; p' file
The q command tells sed to quit processing input lines after matching a given line.
Could you please try following(in case you are ok with awk).
awk '/--from here--/{found_from=1} !found_from{print}' Input_file
You can try Perl
perl -ne ' $x++ if /---from here--/; print if !$x '
using your inputs..
$ cat johnykutty.txt
some
important
content
goes here
---from here--
some
unwanted content
$ perl -ne ' $x++ if /---from here--/; print if !$x ' johnykutty.txt
some
important
content
goes here
$

Linux: Append variable to end of line using line number as variable

I am new to shell scripting. I am using ksh.
I have this particular line in my script which I use to append text in a variable q to the end of a particular line given by the variable a
containing the line number .
sed -i ''$a's#$#'"$q"'#' test.txt
Now the variable q can contain a large amount of text, with all sorts of special characters, such as !##$%^&*()_+:"<>.,/;'[]= etc etc, no exceptions. For now, I use a couple of sed commands in my script to remove any ' and " in this text (sed "s/'/ /g" | sed 's/"/ /g'), but still when I execute the above command I get the following error
sed: -e expression #1, char 168: unterminated `s' command
Any sed, awk, perl, suggestions are very much appreciated
The difficulty here is to quote (escape) the substitution separator characters # in the sed command:
sed -i ''$a's#$#'"$q"'#' test.txt
For example, if q contains # it will not work. The # will terminate the replacement pattern prematurely. Example: q='a#b', a=2, and the command expands to
sed -i 2s#$#a#b# test.txt
which will not append a#b to the end of line 2, but rather a#.
This can be solved by escaping the # characters in q:
sed -i 2s#$#a\#b# test.txt
However, this escaping could be cumbersome to do in shell.
Another approach is to use another level of indirection. Here is an example of using a Perl one-liner. First q is passed to the script in quoted form. Then, within the script the variable assigned to a new internal variable $q. Using this approach there is no need to escape the substitution separator characters:
perl -pi -E 'BEGIN {$q = shift; $a = shift} s/$/$q/ if $. == $a' "$q" "$a" test.txt
Do not bother trying to sanitize the string. Just put it in a file, and use sed's r command to read it in:
echo "$q" > tmpfile
sed -i -e ${a}rtmpfile test.txt
Ah, but that creates an extra newline that you don't want. You can remove it with:
sed -e ${a}rtmpfile test.txt | awk 'NR=='$a'{printf $0; next}1' > output
Another approach is to use the patch utility if present in your system.
patch test.txt <<-EOF
${a}c
$(sed "${a}q;d" test.txt)$q
.
EOF
${a}c will be replaced with the line number followed by c which means the operation is a change in line ${a}.
The second line is the replacement of the change. This is the concatenated value of the original text and the added text.
The sole . means execute the commands.

How can I remove the last character of a file in unix?

Say I have some arbitrary multi-line text file:
sometext
moretext
lastline
How can I remove only the last character (the e, not the newline or null) of the file without making the text file invalid?
A simpler approach (outputs to stdout, doesn't update the input file):
sed '$ s/.$//' somefile
$ is a Sed address that matches the last input line only, thus causing the following function call (s/.$//) to be executed on the last line only.
s/.$// replaces the last character on the (in this case last) line with an empty string; i.e., effectively removes the last char. (before the newline) on the line.
. matches any character on the line, and following it with $ anchors the match to the end of the line; note how the use of $ in this regular expression is conceptually related, but technically distinct from the previous use of $ as a Sed address.
Example with stdin input (assumes Bash, Ksh, or Zsh):
$ sed '$ s/.$//' <<< $'line one\nline two'
line one
line tw
To update the input file too (do not use if the input file is a symlink):
sed -i '$ s/.$//' somefile
Note:
On macOS, you'd have to use -i '' instead of just -i; for an overview of the pitfalls associated with -i, see the bottom half of this answer.
If you need to process very large input files and/or performance / disk usage are a concern and you're using GNU utilities (Linux), see ImHere's helpful answer.
truncate
truncate -s-1 file
Removes one (-1) character from the end of the same file. Exactly as a >> will append to the same file.
The problem with this approach is that it doesn't retain a trailing newline if it existed.
The solution is:
if [ -n "$(tail -c1 file)" ] # if the file has not a trailing new line.
then
truncate -s-1 file # remove one char as the question request.
else
truncate -s-2 file # remove the last two characters
echo "" >> file # add the trailing new line back
fi
This works because tail takes the last byte (not char).
It takes almost no time even with big files.
Why not sed
The problem with a sed solution like sed '$ s/.$//' file is that it reads the whole file first (taking a long time with large files), then you need a temporary file (of the same size as the original):
sed '$ s/.$//' file > tempfile
rm file; mv tempfile file
And then move the tempfile to replace the file.
Here's another using ex, which I find not as cryptic as the sed solution:
printf '%s\n' '$' 's/.$//' wq | ex somefile
The $ goes to the last line, the s deletes the last character, and wq is the well known (to vi users) write+quit.
After a whole bunch of playing around with different strategies (and avoiding sed -i or perl), the best way i found to do this was with:
sed '$! { P; D; }; s/.$//' somefile
If the goal is to remove the last character in the last line, this awk should do:
awk '{a[NR]=$0} END {for (i=1;i<NR;i++) print a[i];sub(/.$/,"",a[NR]);print a[NR]}' file
sometext
moretext
lastlin
It store all data into an array, then print it out and change last line.
Just a remark: sed will temporarily remove the file.
So if you are tailing the file, you'll get a "No such file or directory" warning until you reissue the tail command.
EDITED ANSWER
I created a script and put your text inside on my Desktop. this test file is saved as "old_file.txt"
sometext
moretext
lastline
Afterwards I wrote a small script to take the old file and eliminate the last character in the last line
#!/bin/bash
no_of_new_line_characters=`wc '/root/Desktop/old_file.txt'|cut -d ' ' -f2`
let "no_of_lines=no_of_new_line_characters+1"
sed -n 1,"$no_of_new_line_characters"p '/root/Desktop/old_file.txt' > '/root/Desktop/my_new_file'
sed -n "$no_of_lines","$no_of_lines"p '/root/Desktop/old_file.txt'|sed 's/.$//g' >> '/root/Desktop/my_new_file'
opening the new_file I created, showed the output as follows:
sometext
moretext
lastlin
I apologize for my previous answer (wasn't reading carefully)
sed 's/.$//' filename | tee newFilename
This should do your job.
A couple perl solutions, for comparison/reference:
(echo 1a; echo 2b) | perl -e '$_=join("",<>); s/.$//; print'
(echo 1a; echo 2b) | perl -e 'while(<>){ if(eof) {s/.$//}; print }'
I find the first read-whole-file-into-memory approach can be generally quite useful (less so for this particular problem). You can now do regex's which span multiple lines, for example to combine every 3 lines of a certain format into 1 summary line.
For this problem, truncate would be faster and the sed version is shorter to type. Note that truncate requires a file to operate on, not a stream. Normally I find sed to lack the power of perl and I much prefer the extended-regex / perl-regex syntax. But this problem has a nice sed solution.

linux command for finding a substring and moving it to the end of line

I need to read a file line by line in Linux, find a substring in each line, remove it and place it at the end of that line.
Example:
Line in the original file:
a,b,c,substring,d,e,f
Line in the output file:
a,b,c,d,e,f,substring
How do I do it with the Linux command? Thanks!
sed '/substring/{ s///; s/$/substring/;} '
will handle a fixed substring. Note that if substring begins with a ,, this handles your example case well. If the substring is not fixed but may be a general regular expression:
sed 's/\(substring\)\(.*\)/\2\1'
If you are looking for general csv parsing, you should rephrase the question. (It will be difficult to apply this solution to find a fixed string at the start of a line if you are thinking of the input as comma separated fields.)
I always prefer to use perl's command line to do such regex tasks - perl is powerful enough to cover awk and sed in most of my usages, and both available in windows and linux, it is just easy and handy to me, so the solution in perl would be like:
perl -ne "s/^(.*?)(?:(?<comma>,)(?<substr>substring)|(?<substr>substring)(?<comma>,))(?<right>.*)$/$1$+{right}$+{comma}$+{substr}/; print" input.txt > output.txt
or a simpler one:
perl -lpe "if(s/(,substring|substring,)//){ s/$/,substring/ }" input.txt > output.txt
input.txt
substring,a,b,c,d,e,f
a,b,c,substring,d,e,f
a,b,c,d,e,f,substring
substring,a
a,substring
substring
a
output.txt
a,b,c,d,e,f,substring
a,b,c,d,e,f,substring
a,b,c,d,e,f,substring
a,substring
a,substring
substring
a
You can edit based on your actual input:
If there are any space between words and commas
If you are using tab as separator
Some explanation of the command line:
use perl's -n -e options: -n means process the input line by line in a loop; -e means one line program in the command line
use perl's -l -p options: -l means process multilines; -p means always print
The one line program is just a regex replacement and a print
(?:pattern) means group but don't capture the match
(?<comma>) is a named group, you then need to use $+{comma} hash to access it

Add a prefix string to beginning of each line

I have a file as below:
line1
line2
line3
And I want to get:
prefixline1
prefixline2
prefixline3
I could write a Ruby script, but it is better if I do not need to.
prefix will contain /. It is a path, /opt/workdir/ for example.
# If you want to edit the file in-place
sed -i -e 's/^/prefix/' file
# If you want to create a new file
sed -e 's/^/prefix/' file > file.new
If prefix contains /, you can use any other character not in prefix, or
escape the /, so the sed command becomes
's#^#/opt/workdir#'
# or
's/^/\/opt\/workdir/'
awk '$0="prefix"$0' file > new_file
In awk the default action is '{print $0}' (i.e. print the whole line), so the above is equivalent to:
awk '{print "prefix"$0}' file > new_file
With Perl (in place replacement):
perl -pi 's/^/prefix/' file
You can use Vim in Ex mode:
ex -sc '%s/^/prefix/|x' file
% select all lines
s replace
x save and close
If your prefix is a bit complicated, just put it in a variable:
prefix=path/to/file/
Then, you pass that variable and let awk deal with it:
awk -v prefix="$prefix" '{print prefix $0}' input_file.txt
Here is a hightly readable oneliner solution using the ts command from moreutils
$ cat file | ts prefix | tr -d ' '
And how it's derived step by step:
# Step 0. create the file
$ cat file
line1
line2
line3
# Step 1. add prefix to the beginning of each line
$ cat file | ts prefix
prefix line1
prefix line2
prefix line3
# Step 2. remove spaces in the middle
$ cat file | ts prefix | tr -d ' '
prefixline1
prefixline2
prefixline3
If you have Perl:
perl -pe 's/^/PREFIX/' input.file
Using & (the whole part of the input that was matched by the pattern”):
cat in.txt | sed -e "s/.*/prefix&/" > out.txt
OR using back references:
cat in.txt | sed -e "s/\(.*\)/prefix\1/" > out.txt
Using the shell:
#!/bin/bash
prefix="something"
file="file"
while read -r line
do
echo "${prefix}$line"
done <$file > newfile
mv newfile $file
While I don't think pierr had this concern, I needed a solution that would not delay output from the live "tail" of a file, since I wanted to monitor several alert logs simultaneously, prefixing each line with the name of its respective log.
Unfortunately, sed, cut, etc. introduced too much buffering and kept me from seeing the most current lines. Steven Penny's suggestion to use the -s option of nl was intriguing, and testing proved that it did not introduce the unwanted buffering that concerned me.
There were a couple of problems with using nl, though, related to the desire to strip out the unwanted line numbers (even if you don't care about the aesthetics of it, there may be cases where using the extra columns would be undesirable). First, using "cut" to strip out the numbers re-introduces the buffering problem, so it wrecks the solution. Second, using "-w1" doesn't help, since this does NOT restrict the line number to a single column - it just gets wider as more digits are needed.
It isn't pretty if you want to capture this elsewhere, but since that's exactly what I didn't need to do (everything was being written to log files already, I just wanted to watch several at once in real time), the best way to lose the line numbers and have only my prefix was to start the -s string with a carriage return (CR or ^M or Ctrl-M). So for example:
#!/bin/ksh
# Monitor the widget, framas, and dweezil
# log files until the operator hits <enter>
# to end monitoring.
PGRP=$$
for LOGFILE in widget framas dweezil
do
(
tail -f $LOGFILE 2>&1 |
nl -s"^M${LOGFILE}> "
) &
sleep 1
done
read KILLEM
kill -- -${PGRP}
Using ed:
ed infile <<'EOE'
,s/^/prefix/
wq
EOE
This substitutes, for each line (,), the beginning of the line (^) with prefix. wq saves and exits.
If the replacement string contains a slash, we can use a different delimiter for s instead:
ed infile <<'EOE'
,s#^#/opt/workdir/#
wq
EOE
I've quoted the here-doc delimiter EOE ("end of ed") to prevent parameter expansion. In this example, it would work unquoted as well, but it's good practice to prevent surprises if you ever have a $ in your ed script.
Here's a wrapped up example using the sed approach from this answer:
$ cat /path/to/some/file | prefix_lines "WOW: "
WOW: some text
WOW: another line
WOW: more text
prefix_lines
function show_help()
{
IT=$(CAT <<EOF
Usage: PREFIX {FILE}
e.g.
cat /path/to/file | prefix_lines "WOW: "
WOW: some text
WOW: another line
WOW: more text
)
echo "$IT"
exit
}
# Require a prefix
if [ -z "$1" ]
then
show_help
fi
# Check if input is from stdin or a file
FILE=$2
if [ -z "$2" ]
then
# If no stdin exists
if [ -t 0 ]; then
show_help
fi
FILE=/dev/stdin
fi
# Now prefix the output
PREFIX=$1
sed -e "s/^/$PREFIX/" $FILE
You can also achieve this using the backreference technique
sed -i.bak 's/\(.*\)/prefix\1/' foo.txt
You can also use with awk like this
awk '{print "prefix"$0}' foo.txt > tmp && mv tmp foo.txt
Using Pythonize (pz):
pz '"preix"+s' <filename
Simple solution using a for loop on the command line with bash:
for i in $(cat yourfile.txt); do echo "prefix$i"; done
Save the output to a file:
for i in $(cat yourfile.txt); do echo "prefix$i"; done > yourfilewithprefixes.txt
You can do it using AWK
echo example| awk '{print "prefix"$0}'
or
awk '{print "prefix"$0}' file.txt > output.txt
For suffix: awk '{print $0"suffix"}'
For prefix and suffix: awk '{print "prefix"$0"suffix"}'
For people on BSD/OSX systems there's utility called lam, short for laminate. lam -s prefix file will do what you want. I use it in pipelines, eg:
find -type f -exec lam -s "{}: " "{}" \; | fzf
...which will find all files, exec lam on each of them, giving each file a prefix of its own filename. (And pump the output to fzf for searching.)
If you need to prepend a text at the beginning of each line that has a certain string, try following. In the following example, I am adding # at the beginning of each line that has the word "rock" in it.
sed -i -e 's/^.*rock.*/#&/' file_name
SETLOCAL ENABLEDELAYEDEXPANSION
YourPrefix=blabla
YourPath=C:\path
for /f "tokens=*" %%a in (!YourPath!\longfile.csv) do (echo !YourPrefix!%%a) >> !YourPath!\Archive\output.csv

Resources