Split one file into multiple files based on delimiter

Split one file into multiple files based on delimiter - linux

I have one file with -| as delimiter after each section...need to create separate files for each section using unix.
example of input file
wertretr
ewretrtret
1212132323
000232
-|
ereteertetet
232434234
erewesdfsfsfs
0234342343
-|
jdhg3875jdfsgfd
sjdhfdbfjds
347674657435
-|
Expected result in File 1
wertretr
ewretrtret
1212132323
000232
-|
Expected result in File 2
ereteertetet
232434234
erewesdfsfsfs
0234342343
-|
Expected result in File 3
jdhg3875jdfsgfd
sjdhfdbfjds
347674657435
-|

A one liner, no programming. (except the regexp etc.)
csplit --digits=2 --quiet --prefix=outfile infile "/-|/+1" "{*}"
tested on:
csplit (GNU coreutils) 8.30
Notes about usage on Apple Mac
"For OS X users, note that the version of csplit that comes with the OS doesn't work. You'll want the version in coreutils (installable via Homebrew), which is called gcsplit." — #Danial
"Just to add, you can get the version for OS X to work (at least with High Sierra). You just need to tweak the args a bit csplit -k -f=outfile infile "/-\|/+1" "{3}". Features that don't seem to work are the "{*}", I had to be specific on the number of separators, and needed to add -k to avoid it deleting all outfiles if it can't find a final separator. Also if you want --digits, you need to use -n instead." — #Pebbl

awk '{f="file" NR; print $0 " -|"> f}' RS='-\\|' input-file
Explanation (edited):
RS is the record separator, and this solution uses a gnu awk extension which allows it to be more than one character. NR is the record number.
The print statement prints a record followed by " -|" into a file that contains the record number in its name.

Debian has csplit, but I don't know if that's common to all/most/other distributions. If not, though, it shouldn't be too hard to track down the source and compile it...

I solved a slightly different problem, where the file contains a line with the name where the text that follows should go. This perl code does the trick for me:
#!/path/to/perl -w
#comment the line below for UNIX systems
use Win32::Clipboard;
# Get command line flags
#print ($#ARGV, "\n");
if($#ARGV == 0) {
print STDERR "usage: ncsplit.pl --mff -- filename.txt [...] \n\nNote that no space is allowed between the '--' and the related parameter.\n\nThe mff is found on a line followed by a filename. All of the contents of filename.txt are written to that file until another mff is found.\n";
exit;
}
# this package sets the ARGV count variable to -1;
use Getopt::Long;
my $mff = "";
GetOptions('mff' => \$mff);
# set a default $mff variable
if ($mff eq "") {$mff = "-#-"};
print ("using file switch=", $mff, "\n\n");
while($_ = shift #ARGV) {
if(-f "$_") {
push #filelist, $_;
}
}
# Could be more than one file name on the command line,
# but this version throws away the subsequent ones.
$readfile = $filelist[0];
open SOURCEFILE, "<$readfile" or die "File not found...\n\n";
#print SOURCEFILE;
while (<SOURCEFILE>) {
/^$mff (.*$)/o;
$outname = $1;
# print $outname;
# print "right is: $1 \n";
if (/^$mff /) {
open OUTFILE, ">$outname" ;
print "opened $outname\n";
}
else {print OUTFILE "$_"};
}

The following command works for me. Hope it helps.
awk 'BEGIN{file = 0; filename = "output_" file ".txt"}
/-|/ {getline; file ++; filename = "output_" file ".txt"}
{print $0 > filename}' input

You can also use awk. I'm not very familiar with awk, but the following did seem to work for me. It generated part1.txt, part2.txt, part3.txt, and part4.txt. Do note, that the last partn.txt file that this generates is empty. I'm not sure how fix that, but I'm sure it could be done with a little tweaking. Any suggestions anyone?
awk_pattern file:
BEGIN{ fn = "part1.txt"; n = 1 }
{
print > fn
if (substr($0,1,2) == "-|") {
close (fn)
n++
fn = "part" n ".txt"
}
}
bash command:
awk -f awk_pattern input.file

Here's a Python 3 script that splits a file into multiple files based on a filename provided by the delimiters. Example input file:
# Ignored
######## FILTER BEGIN foo.conf
This goes in foo.conf.
######## FILTER END
# Ignored
######## FILTER BEGIN bar.conf
This goes in bar.conf.
######## FILTER END
Here's the script:
#!/usr/bin/env python3
import os
import argparse
# global settings
start_delimiter = '######## FILTER BEGIN'
end_delimiter = '######## FILTER END'
# parse command line arguments
parser = argparse.ArgumentParser()
parser.add_argument("-i", "--input-file", required=True, help="input filename")
parser.add_argument("-o", "--output-dir", required=True, help="output directory")
args = parser.parse_args()
# read the input file
with open(args.input_file, 'r') as input_file:
input_data = input_file.read()
# iterate through the input data by line
input_lines = input_data.splitlines()
while input_lines:
# discard lines until the next start delimiter
while input_lines and not input_lines[0].startswith(start_delimiter):
input_lines.pop(0)
# corner case: no delimiter found and no more lines left
if not input_lines:
break
# extract the output filename from the start delimiter
output_filename = input_lines.pop(0).replace(start_delimiter, "").strip()
output_path = os.path.join(args.output_dir, output_filename)
# open the output file
print("extracting file: {0}".format(output_path))
with open(output_path, 'w') as output_file:
# while we have lines left and they don't match the end delimiter
while input_lines and not input_lines[0].startswith(end_delimiter):
output_file.write("{0}\n".format(input_lines.pop(0)))
# remove end delimiter if present
if not input_lines:
input_lines.pop(0)
Finally here's how you run it:
$ python3 script.py -i input-file.txt -o ./output-folder/

Use csplit if you have it.
If you don't, but you have Python... don't use Perl.
Lazy reading of the file
Your file may be too large to hold in memory all at once - reading line by line may be preferable. Assume the input file is named "samplein":
$ python3 -c "from itertools import count
with open('samplein') as file:
for i in count():
firstline = next(file, None)
if firstline is None:
break
with open(f'out{i}', 'w') as out:
out.write(firstline)
for line in file:
out.write(line)
if line == '-|\n':
break"

cat file| ( I=0; echo -n "">file0; while read line; do echo $line >> file$I; if [ "$line" == '-|' ]; then I=$[I+1]; echo -n "" > file$I; fi; done )
and the formated version:
#!/bin/bash
cat FILE | (
I=0;
echo -n"">file0;
while read line;
do
echo $line >> file$I;
if [ "$line" == '-|' ];
then I=$[I+1];
echo -n "" > file$I;
fi;
done;
)

This is the sort of problem I wrote context-split for:
http://stromberg.dnsalias.org/~strombrg/context-split.html
$ ./context-split -h
usage:
./context-split [-s separator] [-n name] [-z length]
-s specifies what regex should separate output files
-n specifies how output files are named (default: numeric
-z specifies how long numbered filenames (if any) should be
-i include line containing separator in output files
operations are always performed on stdin

Here is a perl code that will do the thing
#!/usr/bin/perl
open(FI,"file.txt") or die "Input file not found";
$cur=0;
open(FO,">res.$cur.txt") or die "Cannot open output file $cur";
while(<FI>)
{
print FO $_;
if(/^-\|/)
{
close(FO);
$cur++;
open(FO,">res.$cur.txt") or die "Cannot open output file $cur"
}
}
close(FO);

Try this python script:
import os
import argparse
delimiter = '-|'
parser = argparse.ArgumentParser()
parser.add_argument("-i", "--input-file", required=True, help="input txt")
parser.add_argument("-o", "--output-dir", required=True, help="output directory")
args = parser.parse_args()
counter = 1;
output_filename = 'part-'+str(counter)
with open(args.input_file, 'r') as input_file:
for line in input_file.read().split('\n'):
if delimiter in line:
counter = counter+1
output_filename = 'part-'+str(counter)
print('Section '+str(counter)+' Started')
else:
#skips empty lines (change the condition if you want empty lines too)
if line.strip() :
output_path = os.path.join(args.output_dir, output_filename+'.txt')
with open(output_path, 'a') as output_file:
output_file.write("{0}\n".format(line))
ex:
python split.py -i ./to-split.txt -o ./output-dir

Related

How to remove values from text file from different position

I have a file containing different values:
30,-4,098511E-02
30,05,-4,098511E-02
41,9,15,54288
I need to remove values from this file but from different position, for example:
30
30,05
41,9
I tried to do it with sed to remove the last value but my problem is when I encounter the 41,9,15,54288 it does not work. Any idea if there is a way to do it?
I tried this
echo "30,-4,098511E-02" | sed 's/,.*/,/'

Using sed
$ sed -E 's/(([0-9]+,?){1,2}),[0-9-].*/\1/' input_file
30
30,05
41,9

I would do it using perl, like this:
#!/usr/bin/perl
use strict;
use warnings;
# my $inputPath = '/Users/myuser/Desktop/inputs/a.txt';
# my $outputPath = '/Users/myuser/Desktop/outputs/a_result.txt';
if ($inputPath eq "") {
print "Enter the full path of your input file: ";
$inputPath = <STDIN>;
chomp $inputPath;
}
if ($outputPath eq "") {
print "Enter the full path of your input file: ";
$outputPath = <STDIN>;
chomp $outputPath;
}
open my $info, $inputPath or die "Could not open $inputPath: $!";
open FH, '>', $outputPath or die "Could not open $outputPath : $!";
while( my $line = <$info>) {
chomp $line;
# print "line read: $line\n";
# 30,05,-4,098511E-02
# [0-9]: begins with a digit
# 3
# [0-9]+: begins with two digits
# 30
# [0-9]+: begins with two digits and a comma
# [0-9]+?: begins with two digits and has or has not a comma
# [0-9]+?: begins with two digits and has or has not a comma
# 30,
# {1,2}: one or two times
# 30,05,
# [0-9-]: anything that is a digit, or a dash
# 30,05,-
# [0-9-].: anything that is a digit, or a dash and any character after that
# 30,05,-4
# *: Matches anything in the place of the *, or a "greedy" match (e.g. ab*c returns abc, abbcc, abcdc)
# 30,05,-4,098511E-02
if ($line =~ m{((([0-9]+,?){1,2}),[0-9-].*)}) {
# print "becomes: $1\n";
print FH "$1\n"; # Print to the file
} else {
print "not found!\n";
}
}
close $info;
I wrote the explanations of my regex in the comments of my code.

How do I move a line of text in a file to the top if it contains a word (need for multiple files)

I have a file structure as follows:
folder\spkgbuild
folder2\spkgbuild
folder3\spkgbuild
each spkgbuild has the following:
name=packagename
version=1.0
release=1
# depends : packages
# description : blah blah
build()
{
}
I want to move # description to the top of the file and # depends under it, like this:
# description : blah blah
# depends : packages
name=packagename
version=1.0
release=1
build()
{
}
Any idea how?

And a far from robust Linux one-liner:
find -type f -name spkgbuild -printf "%h\n" | xargs -L 1 bash -c 'cd $0 ; cat <(grep -xE "^# des.*" spkgbuild) <(grep -xE "^# dep.*" spkgbuild) <(grep -xvE "^# de.*" spkgbuild) > spkgbuild.1;mv spkgbuild.1 spkgbuild'
Explanation:
Find all files named spkgbuild
Pass the directory each lives in to xargs, cd into each direcotry containing the file respectively, use three greps to rearrange the lines, write the output to a temporary file (spkgbuild.1), mv that file to spkgbuild when done.

there it is, i feel like a should be paied for this though
filesToModify = ["add your path here", "and here", "and so on"]
linesToMove = ["# description :", "# depends :"] # you can add more
for fName in filesToModify:
content = ""
with open(fName, "r") as fl:
lines = []
for line in fl:
moved = False
for start in linesToMove:
if line.startswith(start):
lines.insert(0, line + ("\n" if not line.endswith("\n") else ""))
moved = True
break
if not moved:
lines.append(line)
content = "".join(lines)
with open(fName, "w") as fl:
fl.write(content)
then run i like python script.py.

How to format lines in a file by using shell language?

The purpose of the program is to make comments in the file begin in the same column.
if a line begins with ; then it doesn't change
if a line begins with code then ; the program should insert space before ; so it will start in the same column with the farthest ;
for example:
Before:
; Also change "-f elf " for "-f elf64" in build command.
;
section .data ; section for initialized data
str: db 'Hello world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
After：
; Also change "-f elf " for "-f elf64" in build command. # These two line don't change
; # because they start with ;
section .data ; section for initialized data
str: db 'Hello world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
I am a beginner in Linux and shell, so far I have got
echo "Enter the filename"
read name
cat $name | while read line;
do ....
Our teacher told us that we should use two while loop;
Record the longest length before; in the first loop and do the changes in the second while loop.
for now I don't know how to use awk or sed to find the longest length before;
Any ideas?

Here is the solution, assuming that comments in your file begin with the first semi-colon (;) that is not inside a string:
$ cat tst.awk
BEGIN{ ARGV[ARGC] = ARGV[ARGC-1]; ARGC++ }
{
nostrings = ""
tail = $0
while ( match(tail,/'[^']*'/) ) {
nostrings = nostrings substr(tail,1,RSTART-1) sprintf("%*s",RLENGTH,"")
tail = substr(tail,RSTART+RLENGTH)
}
nostrings = nostrings tail
cur = index(nostrings,";")
}
NR==FNR { max = (cur > max ? cur : max); next }
cur > 1 { $0 = sprintf("%-*s%s", max-1, substr($0,1,cur-1), substr($0,cur)) }
{ print }
.
$ awk -f tst.awk file
; Also change "-f elf " for "-f elf64" in build command.
;
section .data ; section for initialized data
str: db 'Hello; world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
and below is how you get to it from a naive starting point (I added a semi-colon inside your Hello World! string for testing - make sure to verify all suggested solutions using that).
Note that the above DOES contain 2 loops on the input as your teacher suggests, but you do not need to manually write them as awk provides the loops for you each time it reads the file. If your input file contains tabs or similar then you need to remove them in advance, e.g. by using pr -e -t.
Here is how you get to the above:
If you cannot have semi-colons in other contexts than as the start of comments then all you need is:
$ cat tst.awk
{ cur = index($0,";") }
NR==FNR { max = (cur > max ? cur : max); next }
cur > 1 { $0 = sprintf("%-*s%s", max-1, substr($0,1,cur-1), substr($0,cur)) }
{ print }
which you'd execute as awk -f tst.awk file file (yes, specify your input file twice).
If your code can contain semi-colons in contexts that are not the start of a comment, e.g. in the middle of a string, then you need to tell us how we can identify semi-colons in comment-start vs other contexts but if it can ONLY appear between singe quotes in strings, e.g. the ; inside 'Hello; World!' below:
$ cat file
; Also change "-f elf " for "-f elf64" in build command.
;
section .data ; section for initialized data
str: db 'Hello; world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
then this is all you need to replace every string with a series of blank chars before finding the first semi-colon (which is then presumably the start of a comment):
$ cat tst.awk
{
nostrings = ""
tail = $0
while ( match(tail,/'[^']*'/) ) {
nostrings = nostrings substr(tail,1,RSTART-1) sprintf("%*s",RLENGTH,"")
tail = substr(tail,RSTART+RLENGTH)
}
nostrings = nostrings tail
cur = index(nostrings,";")
}
...the rest as before...
and finally if you don't want to specify the file name twice on the command line, just duplicate it's name in the ARGV[] array by adding this line at the top:
BEGIN{ ARGV[ARGC] = ARGV[ARGC-1]; ARGC++ }

There are a few printf tricks that make this a manageable project. Take a look at the following. The script formats the assembly file with the assembly code beginning at column 0 to code_width - 1 with the comments following at column code_width lined up after the code. The script is fairly well commented so you should be able to follow along.
The usage is:
bash nameofscript.sh input_file [code_width (default 46char)]
or if you make nameofscript.sh executable, then simply:
./nameofscript.sh input_file [code_width (default 46char)]
NOTE: this script requires Bash, if not run on bash, you may experience inconsistent results. If you have multiple embedded ; in each line, the first will be considered the beginning of a comment. Let me know if you have questions.
#!/bin/bash
## basic function to trim (or stip) the leading & trailing whitespace from a variable
# passed to the fuction. Usage: VAR=$(trimws $VAR)
function trimws {
[ -z "$1" ] && return 1
local strln="${#1}"
[ "$strln" -lt 2 ] && return 1
local trimstr=$1
trimstr="${trimstr#"${trimstr%%[![:space:]]*}"}" # remove leading whitespace characters
trimstr="${trimstr%"${trimstr##*[![:space:]]}"}" # remove trailing whitespace characters
printf "%s" "$trimstr"
return 0
}
afn="$1" # input assembly filename
cwidth=${2:--46} # code field width (- is left justified)
[ "${cwidth:0:1}" = '-' ] || cwidth=-${cwidth} # make sure first char is '-'
[ -r "$afn" ] || { # validate input file is readable
printf "error: file not found: '%s'. Usage: %s <filename> [code_width (46 ch)]\n" "$afn" "${0//\//}"
exit 1
}
## loop through file splitting on ';'
while IFS=$';\n' read -r code comment || [ -n "$comment" ]; do
[ -n "$code" ] || { # if no '$code' comment only line
if [ -n "$comment" ]; then
printf ";%s\n" "$comment" # output the line unchanged
else
printf "\n" # it was a blank line to begin with
fi
continue # read next line
}
code=$(trimws "$code") # trim leading and trailing whitespace
comment=$(trimws "$comment") # same
printf "%*s ; %s\n" "$cwidth" "$code" "$comment" # output new format
done <"$afn"
exit 0
input:
$ cat dat/asmfile.txt
; Also change "-f elf " for "-f elf64" in build command.
;
section .data ; section for initialized data
str: db 'Hello world!', 0Ah ; message string with new-line char
; at the end (10 decimal)
output:
$ bash fmtasmcmt.sh
; Also change "-f elf " for "-f elf64" in build command.
;
section .data ; section for initialized data
str: db 'Hello world!', 0Ah ; message string with new-line char
; at the end (10 decimal)

So yeah, use a while loop to find the longest length, given your input in the local file input:
length=0
length2=0
while IFS= read -r -- i; do
(( ${#i} > length2 )) && length2=${#i}
i=${i/\;*/}
(( ${#i} > length )) && length=${#i}
done < ./input
(( length++ )); (( length2++ ))
In your next while loop, detect whether the line starts with ; using [[ ${i:0:1} = ';' ]] and output it, or format the output with awk using the length you determined: awk -F\; -v len=$length '{ printf "%-"len"s %-40s\n", $1, $2}'. Check here (http://www.unix.com/shell-programming-scripting/117543-formatting-output-columns.html) for more info on column formatting.
Edit: In case you didn't figure it out, the second loop looks like:
while IFS= read -r -- i; do
# echo the original if the line starts with ';'
[[ ${i:0:1} = ';' ]] && echo "$i" && continue
# column formatting with awk
(echo "$i" | grep -q ';') && echo "$i" | awk -v len=$length -v len2=$length2 -F\; '{printf "%-"len"s %-"len2"s\n",$1,";"$2}' || echo "$i"
done < ./input
That will give you what you want for the output.

I think I'm going to use this example for my personal formatting!
#!/usr/bin/perl -s -0
use strict;
our ($com); # command line option
$com = ";" unless defined $com ;
my $max=0;
$_= <>; # slurp file
while( /\n(.+?)$com/g ){
$max=length($1) if length($1) > $max }
s/\n(.+?)$com/sprintf("\n%-$max"."s$com",$1)/ge;
print $_; # print file
usage: align_coms input (after chmod+install)
Options: -com=... to redefine comments (default = ; )
and you can try align_coms -com=# align_coms to align this scripts perl comments :)
Edit 1:
Please see the (wise) comment of #EdMorton about problems when the input has strings (or similar) containing comment starters.
Edit 2: The following version can deal with 'alo; word' "alo; word". It is still
not safe -- real languages have always some extra detail (ex '...\'...', multiline comments) but it is a little bit more robust...
#!/usr/bin/perl -s -0
use strict;
our ($com); # command line option
$com = ";" unless defined $com ;
my $nc=qr{ # no comment regex
( '[^'\n]*' # '....'
| "[^"\n]*" # "...."
| . # common chars
)+?
}x;
my $max=0;
$_= <>; # slurp file
while( /\n($nc)$com/g ){
$max=length($1) if length($1) > $max }
s/\n($nc)$com/sprintf("\n%-$max"."s$com",$1)/ge;
print $_; # print file

Replacing values of a file based on a conf file

I have a conf file which has the format of variable="value" where values may have special characters as well. An example line is:
LINE_D="(L#'id' == 'log') AND L#'id' IS NULL"
I have another file F which should replace values based on this conf file. For example, if there is line in F
PRINT '$LINE_D'
it should be replaced by
PRINT '(L#'id' == 'log') AND L#'id' IS NULL'
How can I a program in shell script which takes conf and F and generate the values in F replaced.
Thanks

Your definition of what's required leaves lots of gaps, so you'll probably need to tweak this script. It is a cut-down version of a more complex script originally designed to process makefiles. That means there is probably material you could remove from here without causing trouble, though I've gotten rid of most of the extraneous processing.
#!usr/bin/env perl
#
# Note: this script can take input from stdin or from one or more files.
# For example, either of the following will work:
# cat config file | setmacro
# setmacro file
use strict;
use warnings;
use Getopt::Std;
# Usage:
# -b -- omit blank lines
# -c -- omit comments
# -d -- debug mode (verbose)
# -e -- omit the environment
my %opt;
my %MACROS;
my $input_line;
die "Usage: $0 [-bcde] [file ...]" unless getopts('bcde', \%opt);
# Copy environment into hash for MAKE macros
%MACROS = %ENV unless $opt{e};
my $rx_macro = qr/\${?([A-Za-z]\w*)}?/; # Matches $PQR} but ideally shouldn't
# For each line in each file specified on the command line (or stdin by default)
while ($input_line = <>)
{
chomp $input_line;
do_line($input_line);
}
# Expand macros in given value
sub macro_expand
{
my($value) = #_;
print "-->> macro_expand: $value\n" if $opt{d};
while ($value =~ $rx_macro)
{
print "Found macro = $1\n" if $opt{d};
my($env) = $MACROS{$1};
$env = "" unless defined $env;
$value = $` . $env . $';
}
print "<<-- macro_expand: $value\n" if $opt{d};
return($value);
}
# routine to recognize macros
sub do_line
{
my($line) = #_;
if ($line =~ /^\s*$/o)
{
# Blank line
print "$line\n" unless $opt{b};
}
elsif ($line =~ /^\s*#/o)
{
# Comment line
print "$line\n" unless $opt{c};
}
elsif ($line =~ /^\s*([A-Za-z]\w*)\s*=\s*(.*)\s*$/o)
{
# Macro definition
print "Macro: $line\n" if $opt{d};
my $lhs = $1;
my $rhs = $2;
$rhs = $1 if $rhs =~ m/^"(.*)"$/;
$MACROS{$lhs} = ${rhs};
print "##M: $lhs = <<$MACROS{$lhs}>>\n" if $opt{d};
}
else
{
print "Expand: $line\n" if $opt{d};
$line = macro_expand($line);
print "$line\n";
}
}
Given a configuration file, cfg, containing:
LINE_D="(L#'id' == 'log') AND L#'id' IS NULL"
and another file, F, containing:
PRINT '$LINE_D'
PRINT '${LINE_D}'
the output of perl setmacro.pl cfg F is:
PRINT '(L#'id' == 'log') AND L#'id' IS NULL'
PRINT '(L#'id' == 'log') AND L#'id' IS NULL'
This matches the required output, but gives me the heebie-jeebies with its multiple single quotes. However, the customer is always right!
(I think I got rid of the residual Perl 4-isms; the base script still had a few remnants left over, and some comments about how Perl 5.001 handles things differently. It does use $` and $' which is generally not a good idea. However it works, so fixing that is an exercise for the reader. The regex variable is not now necessary; it was when it was also recognizing make macro notations — $(macro) as well as ${macro}.)

grep lines before and after in aix/ksh shell

I want to extract lines before and after a matched pattern.
eg: if the file contents are as follows
absbasdakjkglksagjgj
sajlkgsgjlskjlasj
hello
lkgjkdsfjlkjsgklks
klgdsgklsdgkldskgdsg
I need find hello and display line before and after 'hello'
the output should be
sajlkgsgjlskjlasj
hello
lkgjkdsfjlkjsgklks
This is possible with GNU but i need a method that works in AIX / KSH SHELL WHERE NO GNU IS INSTALLED.

sed -n '/hello/{x;G;N;p;};h' filename

I've found it is generally less frustrating to build the GNU coreutils once, and benefit from many more features http://www.gnu.org/software/coreutils/

Since you'll have Perl on the machine, you could use the following code, but you'd probably do better to install the GNU utilities. This has options -b n1 for lines before and -f n1 for lines following the match. It works with PCRE matches (so if you want case-insensitive matching, add an i after the regex instead using a -i option. I haven't implemented -v or -l; I didn't need those.
#!/usr/bin/env perl
#
# #(#)$Id: sgrep.pl,v 1.7 2013/01/28 02:07:18 jleffler Exp $
#
# Perl-based SGREP (special grep) command
#
# Print lines around the line that matches (by default, 3 before and 3 after).
# By default, include file names if more than one file to search.
#
# Options:
# -b n1 Print n1 lines before match
# -f n2 Print n2 lines following match
# -n Print line numbers
# -h Do not print file names
# -H Do print file names
use warnings;
use strict;
use constant debug => 0;
use Getopt::Std;
my(%opts);
sub usage
{
print STDERR "Usage: $0 [-hnH] [-b n1] [-f n2] pattern [file ...]\n";
exit 1;
}
usage unless getopts('hnf:b:H', \%opts);
usage unless #ARGV >= 1;
if ($opts{h} && $opts{H})
{
print STDERR "$0: mutually exclusive options -h and -H specified\n";
exit 1;
}
my $op = shift;
print "# regex = $op\n" if debug;
# print file names if -h omitted and more than one argument
$opts{F} = (defined $opts{H} || (!defined $opts{h} and scalar #ARGV > 1)) ? 1 : 0;
$opts{n} = 0 unless defined $opts{n};
my $before = (defined $opts{b}) ? $opts{b} + 0 : 3;
my $after = (defined $opts{f}) ? $opts{f} + 0 : 3;
print "# before = $before; after = $after\n" if debug;
my #lines = (); # Accumulated lines
my $tail = 0; # Line number of last line in list
my $tbp_1 = 0; # First line to be printed
my $tbp_2 = 0; # Last line to be printed
# Print lines from #lines in the range $tbp_1 .. $tbp_2,
# leaving $leave lines in the array for future use.
sub print_leaving
{
my ($leave) = #_;
while (scalar(#lines) > $leave)
{
my $line = shift #lines;
my $curr = $tail - scalar(#lines);
if ($tbp_1 <= $curr && $curr <= $tbp_2)
{
print "$ARGV:" if $opts{F};
print "$curr:" if $opts{n};
print $line;
}
}
}
# General logic:
# Accumulate each line at end of #lines.
# ** If current line matches, record range that needs printing
# ** When the line array contains enough lines, pop line off front and,
# if it needs printing, print it.
# At end of file, empty line array, printing requisite accumulated lines.
while (<>)
{
# Add this line to the accumulated lines
push #lines, $_;
$tail = $.;
printf "# array: N = %d, last = $tail: %s", scalar(#lines), $_ if debug > 1;
if (m/$op/o)
{
# This line matches - set range to be printed
my $lo = $. - $before;
$tbp_1 = $lo if ($lo > $tbp_2);
$tbp_2 = $. + $after;
print "# $. MATCH: print range $tbp_1 .. $tbp_2\n" if debug;
}
# Print out any accumulated lines that need printing
# Leave $before lines in array.
print_leaving($before);
}
continue
{
if (eof)
{
# Print out any accumulated lines that need printing
print_leaving(0);
# Reset for next file
close ARGV;
$tbp_1 = 0;
$tbp_2 = 0;
$tail = 0;
#lines = ();
}
}

I had a situation where I was stuck with a slow telnet session on a tablet, believe it or not, and I couldn't write a Perl script very easily with that keyboard. I came up with this hacky maneuver that worked in a pinch for me with AIX's limited grep. This won't work well if your grep returns hundreds of lines, but if you just need one line and one or two above/below it, this could do it. First I ran this:
cat -n filename |grep criteria
By including the -n flag, I see the line number of the data I'm seeking, like this:
2543 my crucial data
Since cat gives the line number 2 spaces before and 1 space after, I could grep for the line number right before it like this:
cat -n filename |grep " 2542 "
I ran this a couple of times to give me lines 2542 and 2544 that bookended line 2543. Like I said, it's definitely fallable, like if you have reams of data that might have " 2542 " all over the place, but just to grab a couple of quick lines, it worked well.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Split one file into multiple files based on delimiter - linux

Debian has csplit, but I don't know if that's common to all/most/other distributions. If not, though, it shouldn't be too hard to track down the source and compile it...

The following command works for me. Hope it helps. awk 'BEGIN{file = 0; filename = "output_" file ".txt"} /-|/ {getline; file ++; filename = "output_" file ".txt"} {print $0 > filename}' input

Related

How to remove values from text file from different position

How do I move a line of text in a file to the top if it contains a word (need for multiple files)

How to format lines in a file by using shell language?

Replacing values of a file based on a conf file

grep lines before and after in aix/ksh shell

Categories

Resources