This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet. For help making this question more broadly applicable, visit the help center.
Closed 10 years ago.
This is a sample line from my file:
42001232 2011-07-01 51 100001 0 100002 0 2011-07-02 51 100003 0 100004 0
How do I arrange it to look like this
42001232 2011-07-01 51 100001 0
42001232 2011-07-01 51 100002 0
42001232 2011-07-02 51 100003 0
42001232 2011-07-02 51 100004 0
Apart from the first column, all the columns are repeating starting with a date.
I need to organize it in a tabular form. Also, the delimiter here is TAB.
Here's one way using awk. Run like:
awk -f script.awk file
Contents of script.awk:
BEGIN {
FS=OFS="\t"
}
{
for(i=2;i<=NF;i++) {
if ($i ~ /^[0-9]{4}-[0-9]{2}-[0-9]{2}$/) {
for (j=i+2;j<=NF;j+=2) {
if ($j ~ /^[0-9]{4}-[0-9]{2}-[0-9]{2}$/) {
break
}
else {
print $1, $i, $(i+1), $j, $(j+1)
}
}
}
}
}
Results:
42001232 2011-07-01 51 100001 0
42001232 2011-07-01 51 100002 0
42001232 2011-07-02 51 100003 0
42001232 2011-07-02 51 100004 0
Alternatively, here's the one-liner:
awk 'BEGIN { FS=OFS="\t" } { for(i=2;i<=NF;i++) if ($i ~ /^[0-9]{4}-[0-9]{2}-[0-9]{2}$/) for (j=i+2;j<=NF;j+=2) if ($j ~ /^[0-9]{4}-[0-9]{2}-[0-9]{2}$/) break; else print $1, $i, $(i+1), $j, $(j+1) }' file
This works on the given data:
#!/usr/bin/env perl
use strict;
use warnings;
use English qw( -no_match_vars );
$OFS = qq"\t";
while (<>)
{
chomp;
my(#fields) = split /\s+/, $_;
my $col1 = shift #fields;
my $date = shift #fields;
my $col3 = shift #fields;
while (scalar(#fields) > 1)
{
if ($fields[0] =~ /^\d{4}-\d\d-\d\d$/)
{
$date = shift #fields;
$col3 = shift #fields;
next;
}
else
{
my $col4 = shift #fields;
my $col5 = shift #fields;
print $col1, $date, $col3, $col4, "$col5\n";
}
}
print STDERR "oops - debris $fields[0] left over\n" if (scalar(#fields) != 0);
}
The output I got is:
42001232 2011-07-01 51 100001 0
42001232 2011-07-01 51 100002 0
42001232 2011-07-02 51 100003 0
42001232 2011-07-02 51 100004 0
That's a perfectly horrid format to have to parse. I've had to make some assumptions about the way the repetitions are handled, so that the column after a date is fixed until the next date, for example.
Related
I have a file with a wide range of information and I want to extract some data from here. I only will post here the interesting part. I want to extract IQ and JQ values as well as the J_ij[meV] value which is two lines above. I read this question How to print 5 consecutive lines after a pattern in file using awk where a pattern is used to extract information bellow and I was thinking doing something similar. My initial idea was:
awk '/IQ =/ { print $6,$12 } /IQ =/ {for(i=2; i<=2; i++){ getline; print $11 }}' input.text > output.txt
Loop appears not to working
IT IQ JT JQ N1 N2 N3 DRX DRY DRZ DR J_ij [mRy] J_ij [meV]
IT = 1 IQ = **1** JT = 1 JQ = **1**
->Q = ( -0.250, 0.722, 0.203) ->Q = ( -0.250, 0.722, 0.203)
1 1 1 1 0 0 0 0.000 0.000 0.000 0.000 0.000000000 **0.000000000**
IT = 1 IQ = **1** JT = 6 JQ = **6**
->Q = ( -0.250, 0.722, 0.203) ->Q = ( 0.000, 1.443, 0.609)
1 1 6 6 -1 0 -1 -0.250 -0.144 -0.406 0.498 0.135692822 **1.846194885**
IT = 1 IQ = **1** JT = 8 JQ = **8**
->Q = ( -0.250, 0.722, 0.203) ->Q = ( 0.000, 0.577, 0.609)
1 1 8 8 0 0 -1 0.250 -0.144 -0.406 0.498 0.017676555 **0.240501782**
My expected output is:
IQ JQ J_ij [meV]
1 1 0.000000000
1 6 1.846194885
1 8 0.240501782
It comes from the bold words (** **), first line is only indicative.
Could you please try following. Written and tested with shown examples.
awk '
BEGIN{
print "IQ JQ J_ij [meV]"
}
FNR>1 && /IQ =/{
value=$6 OFS $12
found=1
next
}
found && NF && !/ ->Q/{
if(value){
print value OFS $NF
}
value=found=""
}' Input_file
Output will be as follows.
IQ JQ J_ij [meV]
1 1 0.000000000
1 6 1.846194885
1 8 0.240501782
Hi Wonderful People/My Gurus and all kind-hearted people.
I've a fixed width file and currently i'm trying to find the length of those rows that contain x bytes. I tried couple of awk commands but, it is not giving me the result that i wanted. My fixed width contains 208bytes, but there are few rows that don't contain 208 bytes. I"m trying to discover those records that doesn't have 208bytes.
this cmd gave me the file length
awk '{print length;exit}' file.text
here i tried to print rows that contain 101 bytes, but it didn't work.
awk '{print length==101}' file.text
Any help/insights here would be highly helpful
With awk:
awk 'length() < 208' file
Well, length() gives you the number of characters, not bytes. This number can differ in unicode context. You can use the LANG environment variable to force awk to use bytes:
LANG=C awk 'length() < 208' file
Perl to the rescue!
perl -lne 'print "$.:", length if length != 208' -- file.text
-n reads the input line by line
-l removes newlines from the input before processing it and adds them to print
The one-liner will print line number ($.) and the length of the line for each line whose length is different than 208.
if you're using gawk, then it's no issue, even in typical UTF-8 locale mode :
length(s) = # chars native to locale,
# typically that means # utf-8 chars
match(s, /$/) - 1 = # raw bytes # this also work for pure-binary
# inputs, without triggering
# any error messages in gawk Unicode mode
Best illustrated by example :
0000000 3347498554 3381184647 3182945161 171608122
: Ɔ ** LJ ** Ȉ ** ɉ ** 㷽 ** ** : 210 : \n
072 306 206 307 207 310 210 311 211 343 267 275 072 210 072 012
: ? 86 ? 87 ? 88 ? 89 ? ? ? : 88 : nl
58 198 134 199 135 200 136 201 137 227 183 189 58 136 58 10
3a c6 86 c7 87 c8 88 c9 89 e3 b7 bd 3a 88 3a 0a
0000020
# gawk profile, created Sat Oct 29 20:32:49 2022
BEGIN {
1 __ = "\306\206\307\207\310" (_="\210") \
"\311\211\343\267\275"
1 print "",__,_
1 STDERR = "/dev/stderr"
1 print ( match(_, /$/) - 1, "_" ) > STDERR # *A
1 print ( length(__), match(__, /$/) - 1 ) > STDERR # *B
1 print ( (__~_), match(__, (_) ".*") ) > STDERR # *C
1 print ( RSTART, RLENGTH ) > STDERR # *D
}
1 | _ *A # of bytes off "_" because it was defined as 0x88 \210
5 | 11 *B # of chars of "__", and
# of bytes of it :
# 4 x 2-byte UC
# + 1 x 3-byte UC = 11
1 | 3 *C # does byte \210 exist among larger string (true/1),
# and which unicode character is 1st to
# contain \210 - the 3rd one, by original definition
3 | 3 *D # notice I also added a ".*" to the tail of this match() :
# if the left-side string being tested is valid UTF-8,
# then this will match all the way to the end of string,
# inclusive, in which you can deduce :
#
# "\210 first appeared in 3rd-to-last utf-8 character"
Combining that inferred understanding :
RLENGTH = "3 chars to the end, inclusive",
with knowledge of how many to its left :
RSTART - 1 = "2 chars before",
yields a total count of 3 + 2 = 5, affirming length()'s result
I have the following Perl code::
#!/usr/bin/perl
use threads;
use Thread::Queue;
use DateTime;
$| = 1; my $numthreads = 20;
$min = 1;
$max = 100;
my $fetch_q = Thread::Queue->new();
our $total = 0;
sub fetch {
while ( my $target = $fetch_q->dequeue() ) {
print $total++ . " ";
}
}
my #workers = map { threads->create( \&fetch ) } 1 .. $numthreads;
$fetch_q->enqueue( $min .. $max );
$fetch_q->end();
foreach my $thr (#workers) {$thr->join();}
The code creates 20 threads and then increments a variable $total.
The current output is something like:
0 0 0 0 0 1 0 0 1 1 2 0 0 1 0 2 0 3 0 1 0 2 1 0 2 1 0 0 3 0
But the desired output is:
1 2 3 4 5 6 7 8 9 10 .... 30
Is there a way to have Perl increment the variable? The order does not matter (i.e. its fine if it is 1 2 4 5 3).
use threads::shared;
my $total :shared = 0;
lock $total;
print ++$total . " ";
I have two text files:
File1.txt
dadads 434 43 {"4fsdf":"66db1" fdf1:"5834"}
gsgss 45 0 {"gsdg":"8853" sgdfg:"4631"}
fdf 767 4643 {"klhf":"3455" kgs:"4566"}
.
.
File2.txt
8853
6437437567
36265
4566
.
.
Output could be two files
Match.txt
gsgss 45 0 {"gsdg":"8853" sgdfg:"4631"}
fdf 767 4643 {"klhf":"3455" kgs:"4566"}
Non_Match.txt
dadads 434 43 {"4fsdf":"66db1" fdf1:"5834"}
Can someone help me write bash script for this?
I think i have the logic here if it helps:
for (rows in File1.txt) {
bool found = false;
for (id in File2.txt) {
if (row contains id) {
found = true;
echo row >> Match.txt
break;
}
}
if (!found) {
echo row >> Non_Match.txt
}
}
Edit Part:
I also have a bash script but its not helping as it is not putting the row which matches but instead only the ID that matches..
#!/bin/bash
set -e
file1="File2.txt"
file2="File1.txt"
for id in $(tail -n+1 "${file1}"); do
if ! grep "${id}" "${file2}"; then
echo "${id}" >>non_matches.txt
else
echo "${id}" >>matches.txt
fi
done
You could use grep -f to look for search patterns that are listed in a separate file. It'd probably be good to use the -F (fixed strings) and -w (match whole words) flags as well.
grep -Fw -f File2.txt File1.txt > Match.txt
grep -Fwv -f File2.txt File1.txt > Non_Match.txt
This sounds a bit like diff or wdiff if you want to do this on word level.
If you run diff on your two files, you will generate the following output:
< dadads 434 43 {"4fsdf":"66db1" fdf1:"5834"}
< gsgss 45 0 {"gsdg":"8853" sgdfg:"4631"}
< fdf 767 4643 {"klhf":"3455" kgs:"4566"}
---
> 8853
> 6437437567
> 36265
> 4566
It means that the "minimal" way (per line) to modify the first file into the second is removing all lines and add all new lines.
If however the second file would have been:
8853
6437437567
gsgss 45 0 {"gsdg":"8853" sgdfg:"4631"}
36265
4566
The diff output is:
1c1,2
< dadads 434 43 {"4fsdf":"66db1" fdf1:"5834"}
---
> 8853
> 6437437567
3c4,5
< fdf 767 4643 {"klhf":"3455" kgs:"4566"}
---
> 36265
> 4566
So diff no longer asks to remove the second line.
wdiff does approximately the same, but on word level:
[-dadads 434 43 {"4fsdf":"66db1" fdf1:"5834"}-]{+8853
6437437567+}
gsgss 45 0 {"gsdg":"8853" sgdfg:"4631"}
[-fdf 767 4643 {"klhf":"3455" kgs:"4566"}-]
{+36265
4566+}
We have a very simple tcp messaging script that cats some text to a server port which returns and displays a response.
The part of the script we care about looks something like this:
cat someFile | netcat somehost 1234
The response the server returns is 'complete' once we get a certain character code (specifically &001C) returned.
How can I close the connection when I receive this special character?
(Note: The server won't close the connection for me. While I currently just CTRL+C the script when I can tell it's done, I wish to be able to send many of these messages, one after the other.)
(Note: netcat -w x isn't good enough because I wish to push these messages through as fast as possible)
Create a bash script called client.sh:
#!/bin/bash
cat someFile
while read FOO; do
echo $FOO >&3
if [[ $FOO =~ `printf ".*\x00\x1c.*"` ]]; then
break
fi
done
Then invoke netcat from your main script like so:
3>&1 nc -c ./client.sh somehost 1234
(You'll need bash version 3 for the regexp matching).
This assumes that the server is sending data in lines - if not you'll have to tweak client.sh so that it reads and echoes a character at a time.
How about this?
Client side:
awk -v RS=$'\x1c' 'NR==1;{exit 0;}' < /dev/tcp/host-ip/port
Testing:
# server side test script
while true; do ascii -hd; done | { netcat -l 12345; echo closed...;}
# Generate 'some' data for testing & pipe to netcat.
# After netcat connection closes, echo will print 'closed...'
# Client side:
awk -v RS=J 'NR==1; {exit;}' < /dev/tcp/localhost/12345
# Changed end character to 'J' for testing.
# Didn't wish to write a server side script to generate 0x1C.
Client side produces:
0 NUL 16 DLE 32 48 0 64 # 80 P 96 ` 112 p
1 SOH 17 DC1 33 ! 49 1 65 A 81 Q 97 a 113 q
2 STX 18 DC2 34 " 50 2 66 B 82 R 98 b 114 r
3 ETX 19 DC3 35 # 51 3 67 C 83 S 99 c 115 s
4 EOT 20 DC4 36 $ 52 4 68 D 84 T 100 d 116 t
5 ENQ 21 NAK 37 % 53 5 69 E 85 U 101 e 117 u
6 ACK 22 SYN 38 & 54 6 70 F 86 V 102 f 118 v
7 BEL 23 ETB 39 ' 55 7 71 G 87 W 103 g 119 w
8 BS 24 CAN 40 ( 56 8 72 H 88 X 104 h 120 x
9 HT 25 EM 41 ) 57 9 73 I 89 Y 105 i 121 y
10 LF 26 SUB 42 * 58 : 74
After 'J' appears, server side closes & prints 'closed...', ensuring that the connection has indeed closed.
Try:
(cat somefile; sleep $timeout) | nc somehost 1234 | sed -e '{s/\x01.*//;T skip;q;:skip}'
This requires GNU sed.
How it works:
{
s/\x01.*//; # search for \x01, if we find it, kill it and the rest of the line
T skip; # goto label skip if the last s/// failed
q; # quit, printing current pattern buffer
:skip # label skip
}
Note that this assumes there'll be a newline after \x01 - sed won't see it otherwise, as sed operates line-by-line.
Maybe have a look at Ncat as well:
"Ncat is the culmination of many key features from various Netcat incarnations such as Netcat 1.x, Netcat6, SOcat, Cryptcat, GNU Netcat, etc. Ncat has a host of new features such as "Connection Brokering", TCP/UDP Redirection, SOCKS4 client and server supprt, ability to "Chain" Ncat processes, HTTP CONNECT proxying (and proxy chaining), SSL connect/listen support, IP address/connection filtering, plus much more."
http://nmap-ncat.sourceforge.net
This worked best for me. Just read the output with a while loop and then check for "0x1c" using an if statement.
while read i; do
if [ "$i" = "0x1c" ] ; then # Read until "0x1c". Then exit
break
fi
echo $i;
done < <(cat someFile | netcat somehost 1234)