read many lines with specific position

Thank you for taking the time to read this; it may be a beginner's question.
I have a file of 10081 lines. Here is a sample of it (a Nordic seismic bulletin):
2016 1 8 0921 21.5 L -22.382 -67.835 148.9 OSC 18 0.3 4.7LOSC 1
2016 1 8 1515 43.7 L -20.762 -67.475 188.7 OSC 16 .30 3.7LOSC 1
2016 1 9 0529 35.9 L -18.811 -67.278 235.9 OSC 16 0.5 3.9LOSC 1
2016 110 313 55.6 L -22.032 -67.375 172.0 OSC 14 .30 3.0LOSC 1
2016 110 1021 36.5 L -16.923 -66.668 35.0 OSC 16 0.4 4.5LOSC 1
I tried the following command to extract some information from the file and save it in a separate file.
awk 'NR==1 {print substr($0,24,7), substr($0,32,7), substr($0,40,5)}' select.inp > lat_lon_depth.xyz
substr($0,24,7) takes 7 characters starting at the 24th position, which is the latitude (-22.382). The others work the same way: longitude is 7 characters from the 32nd position, and depth is 5 characters from the 40th position.
So the question: is it possible to go through all the lines of the file and extract the latitude, longitude and depth from each one?
Thank you for the time
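In awk, the NR==1 pattern limits the action to the first line, so dropping that pattern should make the same substr calls run on every line. The same fixed-position idea can also be sketched in Python, as below. This is only an illustration: the function names extract_lat_lon_depth and convert are made up here, and the slice positions assume the file really keeps latitude, longitude and depth at the columns given in the question.

```python
def extract_lat_lon_depth(line):
    """Cut latitude, longitude and depth out of one fixed-width line.

    awk's substr($0, 24, 7) is 1-based, so it corresponds to the
    0-based, end-exclusive Python slice line[23:30].
    """
    lat = line[23:30].strip()    # awk: substr($0, 24, 7)
    lon = line[31:38].strip()    # awk: substr($0, 32, 7)
    depth = line[39:44].strip()  # awk: substr($0, 40, 5)
    return lat, lon, depth


def convert(src_path, dst_path):
    """Write one 'lat lon depth' row per input line (hypothetical helper)."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:  # reads line by line, so file size is not an issue
            dst.write(" ".join(extract_lat_lon_depth(line)) + "\n")
```

A call such as convert("select.inp", "lat_lon_depth.xyz") would then mirror the file names used in the awk command.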

Related

gnuplot: how to correctly interpret negative times?

I have an issue with negative times in gnuplot.
Basically, I would like to write a negative time, e.g. as -00:01:00, but gnuplot is not interpreting it as -60 seconds, but as +60 seconds.
I can somehow understand why: because -00 hours is equal to +00 hours and then 01 minutes are counted positive.
Did I overlook something? Is there maybe an easy workaround?
More examples are given below. Let's convert some times in the format %H:%M:%S (actually %tH:%tM:%tS).
I'm fine with all lines, except line 6.
Line 7 will be interpreted as %tH:%tM without seconds that's why it is -3660 seconds.
Code:
### negative times
reset session
$Data <<EOD
1 01:00:00
2 01:00:01
3 -01:00:00
4 -01:00:01
5 00:01:01
6 -00:01:01
7 -01:01
8 00:-01:-01
9 00:-01:01
EOD
myTimeFmt = "%tH:%tM:%tS"
set table $Test
plot $Data u 1:(strcol(2)):(timecolumn(2,myTimeFmt)) w table
unset table
print $Test
### end of code
Result:
1 01:00:00 3600
2 01:00:01 3601
3 -01:00:00 -3600
4 -01:00:01 -3601
5 00:01:01 61
6 -00:01:01 61
7 -01:01 -3660
8 00:-01:-01 -61
9 00:-01:01 -61

The following is an attempt to allow negative times that start with -00 hours in the format %tH:%tM:%tS (or with -00 minutes in %tM:%tS).
It handles cases 4 and 6 differently from how gnuplot currently does.
The workaround handles cases with negative or -00 hours and additionally negative minutes or seconds (cases 7-14 and 16-17) the same way gnuplot does. Well, those are strange formats anyway.
Code:
### workaround for handling negative times
reset session
$Data <<EOD
1 01:00:00
2 -01:00:00
3 00:01:00
4 -00:01:00
5 00:00:01
6 -00:00:01
7 00:00:-01
8 00:-00:01
9 00:-00:-01
10 00:-01:01
11 00:-01:-01
12 -00:-01:-01
13 -00:-01:01
14 -00:-01:-01
15 -01:01:01
16 -01:-01:-01
17 01:-01:-01
EOD
myTimeFmt = "%tH:%tM:%tS"
myTimeSigned(fmt,s) = s[1:1] eq '-' && strptime("%tH",s)==0 && strptime(fmt,s)>0 ? \
-strptime(fmt,s[2:]) : strptime(fmt,s)
myTime(n,fmt) = myTimeSigned(fmt,strcol(n))
set table $Test
plot $Data u 1:(strcol(2)):(timecolumn(2,myTimeFmt)):(myTime(2,myTimeFmt)) w table
unset table
print $Test
### end of code
Result:
input gnuplot workaround
1 01:00:00 3600 3600
2 -01:00:00 -3600 -3600
3 00:01:00 60 60
4 -00:01:00 60 -60 # different
5 00:00:01 1 1
6 -00:00:01 1 -1 # different
7 00:00:-01 -1 -1
8 00:-00:01 1 1
9 00:-00:-01 -1 -1
10 00:-01:01 -61 -61
11 00:-01:-01 -61 -61
12 -00:-01:-01 -61 -61
13 -00:-01:01 -61 -61
14 -00:-01:-01 -61 -61
15 -01:01:01 -3661 -3661
16 -01:-01:-01 -3661 -3661
17 01:-01:-01 3539 3539
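
If the times can be preprocessed before they reach gnuplot, the intended reading (a leading minus sign negates the whole time, including -00 hours) can be sketched outside gnuplot. The Python below is a hedged illustration, not gnuplot's parser: signed_hms_to_seconds is a hypothetical helper, it only accepts H:M:S or H:M, and it assumes that no field after the leading sign is itself negative.

```python
def signed_hms_to_seconds(text):
    """Convert '[-]H:M:S' or '[-]H:M' to seconds, applying the sign to the whole value."""
    text = text.strip()
    negative = text.startswith("-")
    if negative:
        text = text[1:]  # drop the leading sign, parse the magnitude

    parts = [int(p) for p in text.split(":")]
    if not 2 <= len(parts) <= 3 or any(p < 0 for p in parts):
        # Inputs like 00:-01:-01 are deliberately rejected in this sketch
        raise ValueError("expected [-]H:M:S or [-]H:M with non-negative fields")

    parts += [0] * (3 - len(parts))  # missing seconds count as 0
    hours, minutes, seconds = parts
    total = hours * 3600 + minutes * 60 + seconds
    return -total if negative else total
```

With this reading, -00:01:01 becomes -61 rather than the +61 that gnuplot reports for it above.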

Difficulties creating a scatter graph in Excel

I have 4 columns of data to display in a scatter graph in Excel:
Trade Name    Irl DDD  AVE Irl  EU DDD
Amatib        16       15.8     16
AMOXICI       15       15.8     16
Amoxinsol     15       15.8     16
Amoxival      20       15.8     16
Amoxy Activ   20       15.8     16
Bioamoxi      20       15.8     16
Biocillin     15       15.8     16
Citramox      15       15.8     16
CITRAMOX 50   15       15.8     16
MAXYL         15       15.8     16
Octacillin    12       15.8     16
Rhemox        15       15.8     16
SOLAMOCTA     13.1     15.8     16
Trioxyl500    15       15.8     16
I want the y axis to list the names and the x axis to run from roughly 10 to 22, with each value shown as an unconnected point. Alternatively, I could plot just the first two rows of data and then add straight lines for the 15.8 and 16 values. I can't figure out how to do it!
Thanks

lbr, you will have to use a Line Chart and then make it look like a scatter chart. Scatter charts seem to accept only numeric x and y axis values.
Here I selected my Trade Name and Irl DDD data in columns and chose Insert > Line Chart. Then I set:
Line - No Line
Marker Options - Built-in
Let me know if you need more help with this. To set the axis range to 10 to 22, just format the y axis (double-click the axis and set the minimum bound to 10, for example).

Incorrect Empirical Semivariogram Value

My gstat program for calculating the empirical semivariogram of the Walker Lake data is as follows:
library(sp)     # provides coordinates()
library(gstat)  # provides variogram()
data = read.table("C:/Users/chandan/Desktop/walk470.csv",sep=",",header=TRUE);
attach(data);
coordinates(data)=~x+y;
walk.var1 <- variogram(v ~ x+y, data=data,width=5,cutoff=100);
The result is as follows
np dist gamma
1 105 3.836866 32312.63
2 459 8.097102 44486.82
3 1088 12.445035 60230.48
4 985 17.874264 76491.36
5 1579 22.227711 75103.67
6 1360 27.742246 83595.83
7 1747 32.291155 91248.20
8 1447 37.724524 97610.65
9 2233 42.356048 85857.03
10 1794 47.537644 93263.63
11 2180 52.295711 98282.98
12 2075 57.601882 91589.39
13 2848 62.314646 91668.70
14 2059 67.627847 95803.45
15 2961 72.310575 91975.76
16 2240 77.648900 95858.87
17 3067 82.379802 88123.56
18 2463 87.641359 87568.94
19 2746 92.334788 97991.56
20 2425 97.754121 93914.31
I have written my own version of the same calculation using the classical sample variogram estimator. The number of pairs (np) and the distances (dist) match this output exactly, but the gamma values do not. Why is that, and what should I do to make them match the gstat output exactly?
Thanks in advance...
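
One possible cause, offered as a hedged guess rather than a diagnosis: in gstat, a formula with predictors such as v ~ x+y requests the variogram of the residuals from a linear trend in x and y, while v ~ 1 uses the values themselves. That would leave np and dist unchanged but alter gamma, so a comparison against a hand-written estimator on the raw v would need v ~ 1. For reference, here is a minimal Python sketch of the classical estimator, gamma(h) = sum((v_i - v_j)^2) / (2 * N(h)). The function name and the half-open bin convention [k*width, (k+1)*width) are assumptions of this sketch and may differ from gstat's exact binning.

```python
import math
from collections import defaultdict


def empirical_semivariogram(points, width, cutoff):
    """Classical (Matheron) estimator on a list of (x, y, v) tuples.

    Returns one (np, mean_distance, gamma) tuple per non-empty distance bin,
    ordered by increasing distance.
    """
    # bin index -> [pair count, sum of distances, sum of squared differences]
    bins = defaultdict(lambda: [0, 0.0, 0.0])
    for i, (xi, yi, vi) in enumerate(points):
        for xj, yj, vj in points[i + 1:]:  # each unordered pair once
            d = math.hypot(xi - xj, yi - yj)
            if d == 0.0 or d >= cutoff:  # assumption: skip co-located pairs
                continue
            acc = bins[int(d // width)]
            acc[0] += 1
            acc[1] += d
            acc[2] += (vi - vj) ** 2
    return [(n, sum_d / n, sum_sq / (2 * n))
            for _, (n, sum_d, sum_sq) in sorted(bins.items())]
```

The double loop visits every pair, which is fine for a few hundred points but grows quadratically with the number of observations.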

How can I swap numbers inside data block of repeating format using linux commands?

I have a huge data file in the format below, and I want to swap some numbers in the 2nd column only. The file has 25,000,000 datasets of 8768 lines each.
%% Edited: shorter 10-line example. Sorry for the inconvenience. This is one typical data block.
# Dataset 1
#
# Number of lines 10
#
# header lines
5 11 3 10 120 90 0 0.952 0.881 0.898 2.744 0.034 0.030
10 12 3 5 125 112 0 0.952 0.897 0.905 2.775 0.026 0.030
50 10 3 48 129 120 0 1.061 0.977 0.965 3.063 0.001 0.026
120 2 4 5 50 186 193 0 0.881 0.965 0.899 0.917 3.669 0.000 -0.005
125 3 4 10 43 186 183 0 0.897 0.945 0.910 0.883 3.641 0.000 0.003
186 5 4 120 125 249 280 0 0.899 0.910 0.931 0.961 3.727 0.000 -0.001
193 6 4 120 275 118 268 0 0.917 0.895 0.897 0.937 3.799 0.000 0.023
201 8 4 278 129 131 280 0 0.921 0.837 0.870 0.934 3.572 0.000 0.008
249 9 4 186 355 179 317 0 0.931 0.844 0.907 0.928 3.615 0.000 0.008
280 10 4 186 201 340 359 0 0.961 0.934 0.904 0.898 3.700 0.000 0.033
#
# Dataset 1
#
# Number of lines 10
...
As you can see, there are 7 repeating header lines at the top and 1 trailing line at the end of each dataset. The header and trailing lines all begin with #. As a result, each data block has 7 header lines, 8768 data lines, and 1 trailing line, for a total of 8776 lines. The trailing line contains only a single '#'.
I want to swap some numbers in the 2nd column only. First, I want to replace
1, 9, 10, 11 => 666
2, 6, 7, 8 => 333
3, 4, 5 => 222
in the 2nd column, and then,
666 => 6
333 => 3
222 => 2
in the 2nd column. I want to apply this replacement to every repeating dataset.
I tried this with Python, but the data is too big and it raises a memory error. How can I perform this swap with Linux commands such as sed, awk or cat?
Thanks
Best,

This might work for you, but you'd have to use GNU awk, since it uses the gensub function and $0 reassignment.
Put the following into an executable awk file (like script.awk):
#!/usr/bin/awk -f
BEGIN {
a[1] = a[9] = a[10] = a[11] = 6
a[2] = a[6] = a[7] = a[8] = 3
a[3] = a[4] = a[5] = 2
}
function swap( c2, val ) {
val = a[c2]
return( val=="" ? c2 : val )
}
/^( [0-9]+ )/ { $0 = gensub( /^( [0-9]+)( [0-9]+)/, "\\1 " swap($2), 1 ) }
47 # print the line
Here's the breakdown:
BEGIN - set up an array a with the mappings to the new values.
swap - a user-defined function that returns the 2nd-column value from the a array, or the original value if there is no mapping. The c2 argument is passed in, while val is a local variable (because no 2nd argument is passed in the call).
When a line starts with a space followed by a number and a space (the pattern), use gensub to replace the first occurrence of the first two number fields with the first field, a space, and the return value of swap (the action). Here gensub's replacement text preserves the first-column data. The second column is passed to swap via the field identifier $2. Using gensub should preserve the formatting of the data lines.
47 - an expression that evaluates to true triggers the default action of printing $0, which for data lines might have been modified. Any line that wasn't "data" is printed here without modification.
The provided data doesn't show all the cases, so I made up my own test file:
# 2 skip me
9 2 not going to process me
1 1 don't change the for matting
2 2 4 23242.223 data
3 3 data that's formatted
4 4 7 that's formatted
5 5 data that's formatted
6 6 data that's formatted
7 7 data that's formatted
8 8 data that's formatted
9 9 data that's formatted
10 10 data that's formatted
11 11 data that's formatted
12 12 data that's formatted
13 13 data that's formatted
14 s data that's formatted
# some other data
Running the executable awk (like ./script.awk data) gives the following output:
# 2 skip me
9 2 not going to process me
1 6 don't change the for matting
2 3 4 23242.223 data
3 2 data that's formatted
4 2 7 that's formatted
5 2 data that's formatted
6 3 data that's formatted
7 3 data that's formatted
8 3 data that's formatted
9 6 data that's formatted
10 6 data that's formatted
11 6 data that's formatted
12 12 data that's formatted
13 13 data that's formatted
14 s data that's formatted
# some other data
which looks alright to me, but I'm not the one with 25 million datasets.
You'd also most definitely want to try this on a smaller sample of your data first (the first few datasets?) and redirect stdout to a temp file, perhaps like:
head -n 26328 data | ./script.awk - > tempfile
You can learn more about the elements used in this script here:
awk basics (the man page)
Arrays
User defined functions
String functions - gensub()
And of course, you should spend some quality time reviewing awk related questions and answers on Stack Overflow ;)
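
Since the question mentions a memory error in Python, it may be worth noting that a line-by-line approach keeps memory use small there as well. The sketch below is an assumption-laden alternative to the awk script, not a replacement for it: remap_line and remap_stream are hypothetical names, fields are assumed to be whitespace-separated, and lines starting with # are passed through untouched. A dictionary lookup maps each old value directly, so the intermediate 666/333/222 step is not needed.

```python
import re

# Old 2nd-column value -> new value, taken from the mapping in the question
MAPPING = {"1": "6", "9": "6", "10": "6", "11": "6",
           "2": "3", "6": "3", "7": "3", "8": "3",
           "3": "2", "4": "2", "5": "2"}

# Group 1: optional indent, first field and its separator; group 2: numeric 2nd field
SECOND_FIELD = re.compile(r"^(\s*\S+\s+)(\d+)(?=\s|$)")


def remap_line(line):
    """Return the line with its 2nd field remapped; other text is left as-is."""
    if line.lstrip().startswith("#"):
        return line  # header and trailing lines are not data
    return SECOND_FIELD.sub(
        lambda m: m.group(1) + MAPPING.get(m.group(2), m.group(2)),
        line, count=1)


def remap_stream(src, dst):
    """Copy src to dst one line at a time, so only one line is held in memory."""
    for line in src:
        dst.write(remap_line(line))

# Example use (file names are placeholders):
# with open("data") as src, open("tempfile", "w") as dst:
#     remap_stream(src, dst)
```

Note that mapping 10 or 11 to 6 shortens the field by one character, so the column alignment after the 2nd field shifts on those lines.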

linux/shell script

I have written a program which generates a parameter index for 2 variables, say a and b, in steps of 5. I have to do the same for 23 variables, and I don't want to write 23 nested for-loops. How can I turn this into a single for-loop that works for all 23 variables? I hope it can be done with an array, but I don't know how to implement it in a program.
Could you please help me?
Program:
int z, p
float a, b
float a0, an, s, a1, b0, bn, b1
str var
s=5; a0=1; an=10; b0=8; bn=13 // s= steps, a0, b0= initial value, an,bn=final value
z=0
a1=(an-a0)/s
b1=(bn-b0)/s
for (a=(a1+a0);a<=an;a=a+a1)
for (b=(b1+b0);b<=bn;b=b+b1)
echo {z} {a} {b} -format "%25s" >> /home/genesis/genesis-2.3/genesis/Scripts/kinetikit/dhanu19.txt
z=z+1
end
end
output : dhanu19.txt
0 2.8 9
1 2.8 10
2 2.8 11
3 2.8 12
4 2.8 13
5 4.6 9
6 4.6 10
7 4.6 11
8 4.6 12
9 4.6 13
10 6.4 9
11 6.4 10
12 6.4 11
13 6.4 12
14 6.4 13
15 8.2 9
16 8.2 10
17 8.2 11
18 8.2 12
19 8.2 13
20 10 9
21 10 10
22 10 11
23 10 12
24 10 13

Have you considered writing either a script or a program to write the script for you? Generating shell-scripts, then running them can sometimes be a powerful solution to problems.

Which shell are you referring to? Declaring arrays differs syntactically between zsh, bash and others.

Let's assume you write the 23 for-loops.
If you have 5 steps for each loop, you will end up with 5^23 parameter sets!
Even if each iteration outputs only 1 byte, you still need to store something like 10^16 bytes, or ten thousand terabytes.
I think you should reconsider your problem, or reformulate your question.
Edit:
This is not a forum (and even in forums you can edit your post).
Please edit your question instead of posting a new answer. I think it is interesting.
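
The nested loops in the question amount to a Cartesian product of the per-variable value lists, so one loop over that product can replace any number of nested loops. Here is a minimal sketch in Python (not the script language used in the question), assuming every variable uses the same number of steps; axis_values and parameter_grid are hypothetical names. The generator yields rows lazily, but the row count still grows as steps^n, which is the size problem described above.

```python
from itertools import product


def axis_values(start, stop, steps):
    """Values start+h, start+2h, ..., stop with h = (stop - start) / steps.

    This mirrors the loop bounds in the question (the first value is
    start + h, not start).
    """
    h = (stop - start) / steps
    return [start + h * (i + 1) for i in range(steps)]


def parameter_grid(ranges, steps):
    """Yield (index, value_1, ..., value_n) for every combination.

    ranges is a list of (start, stop) pairs, one per variable; the last
    variable varies fastest, like the innermost loop.
    """
    axes = [axis_values(lo, hi, steps) for lo, hi in ranges]
    for index, combo in enumerate(product(*axes)):
        yield (index, *combo)


# Number of rows for 23 variables with 5 steps each
ROWS_FOR_23_VARIABLES = 5 ** 23
```

Calling parameter_grid([(1, 10), (8, 13)], 5) covers the two-variable case from the question; adding a variable only means adding one more (start, stop) pair to the list.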
