Convert Row to Column in shell - linux

I need to convert the layout below in multiple files. The text need not be the same, but it will be in the same format and length.
File 1:
XXXxx81511
XXX is Present
abcdefg
07/09/2014
YES
1
XXX
XXX-XXXX
File 2:
XXXxx81511
XXX is Present
abcdefg
07/09/2014
YES
1
XXX
XXX-XXXX
TO
XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXXXXX-XXXX
XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXXXXX-XXXX
Basically converting row to column and appending to a new file while adding commas to separate them.
I am trying cat filename | tr '\n' ',' but the results from all files end up on the same line, like this:
XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXXXXX-XXXX,XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXXXXX-XXXX

Use:
paste -sd, file1 file2 ... fileN
# e.g.
paste -sd, file*
prints
XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXX,XXX-XXXX
XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXX,XXX-XXXX
and if you need the empty line after each one
paste -sd, file* | sed G
prints
XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXX,XXX-XXXX
XXXxx81511,XXX is Present,abcdefg,07/09/2014,YES,1,XXX,XXX-XXXX
Short perl variant:
perl -pe 'eof||s|$/|,|' files....
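As a quick sanity check of the paste approach with throwaway files (the temp-file names are purely illustrative):

```shell
# Two sample files, three lines each
tmp=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmp/f1"
printf 'x\ny\nz\n' > "$tmp/f2"

# -s joins the lines of each file into a single row; -d, uses a comma separator
paste -sd, "$tmp/f1" "$tmp/f2"
# a,b,c
# x,y,z

rm -rf "$tmp"
```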

You need to insert an echo after tr. Use a script like this:
for f in file1 file2; do
tr '\n' ',' < "$f"; echo
done > files.output
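Note that each output line ends with a trailing comma, because the file's final newline is translated too (this is exactly what paste -s avoids). A quick check with throwaway files:

```shell
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/file1"
printf 'c\nd\n' > "$tmp/file2"

# tr turns every newline (including the last one) into a comma;
# echo then terminates each file's output line
for f in "$tmp"/file1 "$tmp"/file2; do
    tr '\n' ',' < "$f"; echo
done
# a,b,
# c,d,

rm -rf "$tmp"
```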

Use a for loop:
for f in file*; do sed ':a;N;$!ba;s/\n/,/g' < "$f"; done
The sed code was taken from sed: How can I replace a newline (\n)?. tr '\n' ',' didn't work on my limited test setup.

perl -ne 'chomp; print $_ . (($. % 8) ? "," : "\n")' f*
where:
-n reads the file line by line but doesn't print each line
-e executes the code from the command line
8 number of lines in each file
f* glob for files (replace with something that will select all
your files). If you need a specific order, you will probably need
something more complicated here.
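An awk rendering of the same idea, assuming (as above) that every file is exactly 8 lines long:

```shell
# NR counts lines cumulatively across all input files, so NR % 8 == 0
# marks the last line of each 8-line record; join with commas otherwise
awk '{ printf "%s%s", $0, (NR % 8 ? "," : "\n") }' file1 file2
```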

cat without line breaks: why does tr '\n' not work?

I generated 1000 output files, each containing a single line with (mistakenly) no line break at the end, so that
cat filename_* > outfile
generates a file with a single line. I attempted to remedy this using
cat filename_* | tr '\n' ' ' > outfile
but I get exactly the same result: a file with a single line of output. Why doesn't the latter command (which ought to add a line break for each filename_* file) accomplish what I'm trying to do?
I think you could manually append a line break to your 1000 output files, and then cat them all afterwards:
echo | tee -a filename_*
cat filename_* > outfile
Edit:
Changed the first step to echo | tee -a filename_* as @rowboat suggested
If all your files are missing the final linefeed then you can use sed to add it on the fly:
# with GNU sed
sed '$s/$/\n/' filename_* > outfile
# with standard sed and bash, zsh, etc...
sed $'$s/$/\\\n/' filename_* > outfile
# with standard sed and a POSIX shell
sed '$s/$/\
/' filename_* > outfile
tr '\n' ' ' says to replace each \n with a space, but you've already stated that the inputs do not contain any \n, so the tr does nothing and the final output is just a copy of the input.
Setup:
for ((i=1;i<=5;i++))
do
printf 'abcd' > out${i}
done
$ cat out*
abcdabcdabcdabcdabcd
Many commands can process a file and add a \n, it just depends on how much typing you want to do, eg:
$ sed 's/$/&/' out* # or: sed -n '/$/p' out*
abcd
abcd
abcd
abcd
abcd
$ awk '1' out*
abcd
abcd
abcd
abcd
abcd
I'm not coming up with any ideas on how to make cat itself append a \n, but one option is a user-defined function; assume we want to name our new function catn (cat and add \n on end):
$ type -a catn # verify name "catn" not currently in use
-bash: type: catn: not found
$ catn() { awk '1' "${@:--}"; } # wrap function definition around the awk solution; "$@" passes the files through, defaulting to stdin
$ catn out*
abcd
abcd
abcd
abcd
abcd

Searching specific lines of files using GREP

I have a directory with many text files. I want to search for a given string in specific lines of each file (like searching for 'abc' in only the 2nd and 3rd line of each file). Then, when I find a match, I want to print line 1 of the matching file.
My approach: I'm doing a grep search with the -n option, storing the output in a different file, and then searching that file for the line number. Then I'm trying to get the file name and print out its first line.
With this approach I'm not able to get the name of the right file, and even if I could, the approach is very lengthy.
Is there a better and faster solution?
Eg.
1.txt
file 1
one
two
2.txt
file 2
two
three
I want to search for "two" in line 2 of each file using grep and then print the first line of the file with match. In this example that would be 2.txt and the output should be "file 2"
I know it is easier using sed/awk but is there any way to do this using grep?
Use sed instead (GNU sed):
parse.sed
1h # Save the first line to hold space
2,3 { # On lines 2 and 3
/my pattern/ { # Match `my pattern`
x # If there is a match bring back the first line
p # and print it
:a; n; ba # Loop to the end of the file
}
}
Run it like this:
sed -snf parse.sed file1 file2 ...
Or as a one-liner:
sed -sn '1h; 2,3 { /my pattern/ { x; p; :a; n; ba; } }' file1 file2 ...
You might want to emit the filename as well, e.g. with your example data:
parse2.sed
1h # Save the first line to hold space
2,3 { # On lines 2 and 3
/two/ { # Match `two`
F # Output the filename of the file currently being processed
x # If there is a match bring back the first line
p # and print it
:a; n; ba # Loop to the end of the file
}
}
Run it like this:
sed -snf parse2.sed file1 file2 | paste -d: - -
Output:
file1:file 1
file2:file 2
$ awk 'FNR==2{if(/one/) print line; nextfile} FNR==1{line=$0}' 1.txt 2.txt
file 1
$ awk 'FNR==2{if(/two/) print line; nextfile} FNR==1{line=$0}' 1.txt 2.txt
file 2
FNR will have line number for the current file being read
use FNR>=2 && FNR<=3 if you need a range of lines
FNR==1{line=$0} will save the contents of first line for future use
nextfile should be supported by most implementations, but the solution will still work (slower though) if you need to remove it
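Recreating the question's two sample files shows the behaviour (the temp paths are throwaway):

```shell
tmp=$(mktemp -d)
printf 'file 1\none\ntwo\n'   > "$tmp/1.txt"
printf 'file 2\ntwo\nthree\n' > "$tmp/2.txt"

# Only 2.txt has "two" on its second line, so only its first line prints
awk 'FNR==2{if(/two/) print line; nextfile} FNR==1{line=$0}' "$tmp/1.txt" "$tmp/2.txt"
# file 2

rm -rf "$tmp"
```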
With grep and bash:
# Grep for a pattern and print filename and line number
grep -Hn one file[12] |
# Loop over matches where f=filename, n=match-line-number and s=matched-line
while IFS=: read f n s; do
# If match was on line 2 or line 3
# print the first line of the file
(( n == 2 || n == 3 )) && head -n1 "$f"
done
Output:
file 1
Only using grep, cut and | (pipe):
grep -rnw pattern dir | grep ":line_num:" | cut -d':' -f 1
Explanation
grep -rnw pattern dir
It returns the name(s) of the file(s) where the pattern was found, along with the line number.
Its output will be something like this:
path/to/file/file1(.txt):8:some pattern 1
path/to/file/file2(.txt):4:some pattern 2
path/to/file/file3(.txt):2:some pattern 3
Now I'm using another grep to get the file with the right line number (e.g. the file that contains the pattern in line 2):
grep -rnw pattern dir | grep ":2:"
Its output will be:
path/to/file/file3(.txt):2:line
Now I'm using cut to get the filename:
grep -rnw pattern dir | grep ":2:" | cut -d':' -f 1
It will output the file name like this:
path/to/file/file3(.txt)
P.S. If you want to remove the "path/to/file/" prefix from the filename, you can use rev, then cut, then rev again; you can try this yourself or see the code below.
grep -rnw pattern dir | grep ":2:" | cut -d':' -f 1 | rev | cut -d'/' -f 1 | rev
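The rev/cut/rev round-trip can also be replaced with basename; a sketch (pattern and dir are placeholders, as above):

```shell
# basename strips the directory prefix from each matching path
grep -rnw pattern dir | grep ':2:' | cut -d':' -f1 |
while IFS= read -r f; do
    basename "$f"
done
```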

Test if ALL The Contents of One File exist in a Second File

I have found a few examples on Stack Overflow of how to do this, but none of them work for me.
bash text search: find if the content of one file exists in another file
I want to test whether ALL the contents of one text file exist in the same format/block/style somewhere in a second file, and if not, add the contents of $SRC >> $TGT.
If I execute these commands manually in the console, then it returns the contents of $SRC:
SRC="mytextfile1.txt"
TGT="mytextfile2.txt"
grep -F -f $SRC $TGT
cat $TGT|grep -f $SRC
And this returns nothing:
grep $SRC -q -f $TGT
And this keeps appending each time it is executed:
function append {
f1=$(wc -c < "$SRC")
diff -y <(od -An -tx1 -w1 -v "$SRC") <(od -An -tx1 -w1 -v "$TGT") | \
rev | cut -f2 | uniq -c | grep -v '[>|]' | numgrep /${f1}../ | \
grep -q -m1 '.+*' || cat "$SRC" >> "$TGT";
}
So how can I do this so that it can then be tested in an if statement?
EDIT
Here's an example of the file contents:
$SRC File
text 1
text 2
text d
text e
text f
text g
$TGT File Before Modified
text 1
text 2
text 3
text 4
text a
text b
text c
$TGT File After Modified
text 1
text 2
text 3
text 4
text a
text b
text c
text 1
text 2
text d
text e
text f
text g
I would use perl's index for this:
if ! perl -0 -we '
open my $f1, "<", "mytextfile1.txt";
open my $f2, "<", "mytextfile2.txt";
exit( index(<$f2>, <$f1>) == -1 )'
then
cat mytextfile1.txt >> mytextfile2.txt
fi
The key here is -0, which makes the <> operator read the entire file instead of just one line. Note that the logic is somewhat convoluted: if index returns -1, the content is not matched and perl exits non-zero, which the shell treats as failure, so the if condition is inverted. It might seem more natural for perl to succeed when the content matches; it would arguably be cleaner to use != and remove the outer negation.
Could you please try the following, based on your logic (explained by the OP in the comments) that all the contents of the src file should be present, in the same order, in the tgt file:
awk '
FNR==NR{
a[FNR,$0]
val1=(val1?val1 ORS:"")$0
next
}
((FNR,$0) in a){
count++
val2=(val2?val2 ORS:"")$0
}
END{
if(count==length(a)){
print val1 ORS val2
}
}
' file_src file_tgt
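A plain-shell variant of the same containment test can be sketched with a case pattern match (this assumes both files are text without NUL bytes, and note that command substitution strips trailing newlines from both sides, which is harmless here since it happens symmetrically):

```shell
SRC=mytextfile1.txt
TGT=mytextfile2.txt

src=$(cat "$SRC")
tgt=$(cat "$TGT")

# *"$src"* asks: does the target's content contain the source block verbatim?
case $tgt in
    *"$src"*) : ;;                  # already present, nothing to do
    *)        cat "$SRC" >> "$TGT" ;;
esac
```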

Cut matching line and X successive lines until newline and paste into file

I would like to match all lines from a file containing a word, and take all lines below each match until reaching two newline characters in a row.
I have the following sed code to cut and paste specific lines, but not subsequent lines:
sed 's|.*|/\\<&\\>/{w results\nd}|' teststring | sed -i.bak -f - testfile
How could I modify this to take all subsequent lines?
For example, say I wanted to match lines with 'dog', the following should take the first 3 lines of the 5:
The best kind of an animal is a dog, for sure
-man's best friend
-related to wolves
Racoons are not cute
Is there a way to do this?
This should do:
awk '/dog/ {f=1} /^$/ {f=0} f {print > "new"} !f {print > "tmp"}' file && mv tmp file
It sets f to true if the word dog is found, and back to false when a blank line is found.
If f is true, it prints to the new file.
If f is false, it prints to the tmp file.
Finally, the tmp file replaces the original file.
Edit: This can be shortened some:
awk '/dog/ {f=1} /^$/ {f=0} {print > (f?"new":"tmp")}' file && mv tmp file
Edit2: as requested add space for every section in the new file:
awk '/dog/ {f=1;print ""> "new"} /^$/ {f=0} {print > (f?"new":"tmp")}' file && mv tmp file
If the original file contains tabs or spaces (instead of a truly blank line) after each dog section, change /^$/ to /^[ \t]*$/.
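Running the shortened one-liner against the question's sample text (in a scratch directory, since it writes new and tmp in the current directory):

```shell
tmp=$(mktemp -d) && cd "$tmp"
cat > file <<'EOF'
The best kind of an animal is a dog, for sure
-man's best friend
-related to wolves

Racoons are not cute
EOF

# dog section goes to "new"; everything from the blank line on goes to "tmp",
# which then replaces the original file
awk '/dog/ {f=1} /^$/ {f=0} {print > (f?"new":"tmp")}' file && mv tmp file

cat new
# The best kind of an animal is a dog, for sure
# -man's best friend
# -related to wolves
```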
This might work for you (GNU sed):
sed 's|.*|/\\<&\\>/ba|' stringFile |
sed -f - -e 'b;:a;w resultFile' -e 'n;/^$/!ba' file
Build a set of regexps from the stringFile and send matches to :a. Then write the matched line and any further lines until an empty line (or end of file) to the resultFile.
N.B. The results could be sent directly to resultFile,using:
sed 's#.*#/\\<&\\>/ba#' stringFile |
sed -nf - -e 'b;:a;p;n;/^$/!ba' file > resultFile
To cut the matches from the original file use:
sed 's|.*|/\\<&\\>/ba|' stringFile |
sed -f - -e 'b;:a;N;/\n\s*$/!ba;w resultFile' -e 's/.*//p;d' file
Is this what you're trying to do?
$ awk -v RS= '/dog/' file
The best kind of an animal is a dog, for sure
-man's best friend
-related to wolves
Could you please try the following:
awk '/dog/{count="";found=1} found && ++count<4' Input_file > temp && mv temp Input_file

How to concatenate multiple lines of output to one line?

If I run the command cat file | grep pattern, I get many lines of output. How do you concatenate all lines into one line, effectively replacing each "\n" with "\" " (end with " followed by space)?
cat file | grep pattern | xargs sed s/\n/ /g
isn't working for me.
Use tr '\n' ' ' to translate all newline characters to spaces:
$ grep pattern file | tr '\n' ' '
Note: grep reads files, cat concatenates files. Don't cat file | grep!
Edit:
tr can only handle single character translations. You could use awk to change the output record separator like:
$ grep pattern file | awk '{print}' ORS='" '
This would transform:
one
two
three
to:
one" two" three"
Piping output to xargs will concatenate each line of output to a single line with spaces:
grep pattern file | xargs
Or any command, e.g. ls | xargs. The default limit of xargs output is ~4096 characters, but it can be increased with e.g. xargs -s 8192.
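One caveat: xargs performs its own quote and backslash processing, so input lines containing an apostrophe can make it fail. With GNU xargs, -d '\n' disables that processing (the -d option is a GNU extension):

```shell
# Plain xargs: fine for simple lines
printf 'one\ntwo\nthree\n' | xargs               # one two three

# GNU xargs with -d '\n': safe for lines with quotes
printf "it's\nfine\n" | xargs -d '\n' echo       # it's fine
```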
In Bash, echo without quotes removes carriage returns, tabs and multiple spaces:
echo $(cat file)
This could be what you want
cat file | grep pattern | paste -sd' '
As to your edit, I'm not sure what it means, perhaps this?
cat file | grep pattern | paste -sd'~' | sed -e 's/~/" "/g'
(this assumes that ~ does not occur in file)
This is an example which produces output separated by commas. You can replace the comma by whatever separator you need.
cat <<EOD | xargs | sed 's/ /,/g'
> 1
> 2
> 3
> 4
> 5
> EOD
produces:
1,2,3,4,5
The fastest and easiest ways I know to solve this problem:
When we want to replace the new line character \n with the space:
xargs < file
xargs has own limits on the number of characters per line and the number of all characters combined, but we can increase them. Details can be found by running this command: xargs --show-limits and of course in the manual: man xargs
When we want to replace one character with another exactly one character:
tr '\n' ' ' < file
When we want to replace one character with many characters:
tr '\n' '~' < file | sed s/~/many_characters/g
First, we replace the newline characters \n for tildes ~ (or choose another unique character not present in the text), and then we replace the tilde characters with any other characters (many_characters) and we do it for each tilde (flag g).
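For example, joining lines with a two-character separator (note that a trailing separator remains, since the file's final newline is translated too):

```shell
# newline -> ~ -> ", "  (the ~ must not occur in the input)
printf 'a\nb\nc\n' | tr '\n' '~' | sed 's/~/, /g'
```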
Here is another simple method using awk:
# cat > file.txt
a
b
c
# cat file.txt | awk '{ printf("%s ", $0) }'
a b c
Also, if your file has columns, this gives an easy way to concatenate only certain columns:
# cat > cols.txt
a b c
d e f
# cat cols.txt | awk '{ printf("%s ", $2) }'
b e
I like the xargs solution, but if it's important to not collapse spaces, then one might instead do:
sed ':b;N;$!bb;s/\n/ /g'
That will replace newlines for spaces, without substituting the last line terminator like tr '\n' ' ' would.
This also allows you to use other joining strings besides a space, like a comma, etc, something that xargs cannot do:
$ seq 1 5 | sed ':b;N;$!bb;s/\n/,/g'
1,2,3,4,5
Here is the method using ex editor (part of Vim):
Join all lines and print to the standard output:
$ ex +%j +%p -scq! file
Join all lines in-place (in the file):
$ ex +%j -scwq file
Note: This will concatenate all lines inside the file itself!
Probably the best way to do it is using the awk tool, which will generate the output on one line:
$ awk ' /pattern/ {print}' ORS=' ' /path/to/file
It will merge all lines into one with a space delimiter.
paste -sd'~' was giving an error for me.
Here's what worked for me on macOS using bash:
cat file | grep pattern | paste -d' ' -s -
From man paste:
-d list Use one or more of the provided characters to replace the newline characters instead of the default tab. The characters
in list are used circularly, i.e., when list is exhausted the first character from list is reused. This continues until
a line from the last input file (in default operation) or the last line in each file (using the -s option) is displayed,
at which time paste begins selecting characters from the beginning of list again.
The following special characters can also be used in list:
\n newline character
\t tab character
\\ backslash character
\0 Empty string (not a null character).
Any other character preceded by a backslash is equivalent to the character itself.
-s Concatenate all of the lines of each separate input file in command line order. The newline character of every line
except the last line in each input file is replaced with the tab character, unless otherwise specified by the -d option.
If ‘-’ is specified for one or more of the input files, the standard input is used; standard input is read one line at a time,
circularly, for each instance of ‘-’.
On Red Hat Linux I just use echo:
echo $(cat /some/file/name)
This gives me all records of a file on just one line.
