I have this data set (a .txt file).
For example:
input:
0 275,276,45,278
1 442,22,455,0,456,457,458
75 62,263,264,265,266,267
80 0,516,294,517,518,519
I would like the following as output:
output:
0 275
0 276
0 45
...
1 442
1 22
...
80 0
I'm working in a Unix terminal. Let me know if you have any ideas. Thanks!
Ignoring the "80 454" part you mentioned, I found a solution that prints the output as required.
Assuming all these values are stored in a file named "stack.txt", the following bash code will do the job.
#!/bin/bash
while read -r i; do
    f=$(awk '{print $1}' <<< "$i")
    line=$(cut -d" " -f2 <<< "$i")
    for m in $(echo "$line" | sed "s/,/ /g"); do
        echo "$f $m"
    done
    echo "..."
done < stack.txt
The output will be:
0 275
0 276
0 45
0 278
...
1 442
1 22
1 455
1 0
1 456
1 457
1 458
...
75 62
75 263
75 264
75 265
75 266
75 267
...
80 0
80 516
80 294
80 517
80 518
80 519
...
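The same loop can also be written without the awk, cut, and sed calls, letting read do the splitting. A sketch of that variant (same file name assumed):
while read -r f line; do
    # Split the comma-separated part into an array using IFS
    IFS=, read -ra parts <<< "$line"
    for m in "${parts[@]}"; do
        echo "$f $m"
    done
done < stack.txt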
Using a Perl one-liner:
perl -lane 'print "$F[0] $_" for split /,/, $F[1]' input.txt
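A sed sketch is possible too, assuming GNU sed: it repeatedly captures the prefix and rewrites the last comma as a newline plus that prefix, looping until no comma remains:
sed -E ':a; s/^(\S+)( .*),/\1\2\n\1 /; ta' input.txt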
Or a terse awk one-liner, which replaces every comma with a newline followed by the first field and a space:
{m,n,g}awk 'gsub(",",RS $!_ FS)^_'
0 275
0 276
0 45
0 278
1 442
1 22
1 455
1 0
1 456
1 457
1 458
75 62
75 263
75 264
75 265
75 266
75 267
80 0
80 516
80 294
80 517
80 518
80 519
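If the golfed version above is too cryptic, a more explicit sketch of the same idea splits the second field on commas and prints one pair per line:
awk '{n = split($2, parts, ","); for (i = 1; i <= n; i++) print $1, parts[i]}' input.txt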
I have multiple measurements per 'Subject' in file 1. I only want to use the highest-quality, singular measurement per Subject. My second file contains the exact list of which measurement is best for each Subject; this information is in the column 'seriesnumber'. The number in the 'seriesnumber' column in file 2 corresponds to the best measurement for a Subject. I need to extract only these rows from my file 1.
I have tried to use awk, join, and merge to accomplish this, but came up with errors and strange, incomplete files.
join code:
join -j2 file1 file2
awk code:
awk 'FILENAME=="file1" {arr[$2]=$0; next}
     FILENAME=="file2" {print arr[$2]}' file1 file2 > newfile
File 1 Example
Subject Seriesnumber
19-1-1001 2 8655 661 15250 60747 8005 3919 7393 2264 1479 1663 22968 4180 1712 689 781 4255 90 1260 7233 154 15643 63421 7361 4384 6932 2062 4526 1742 686 4575 100 1684 0 1194 0 0 5 0 0 147 699 315 305 317 565 1361200 1338210 1338690 304258 308180 612438 250614 255920 506534 66645 802424 1206450 1187010 1185180 1816840 1 1 21 17 38 1765590
19-1-1001 10 8992 507 15722 64032 8728 3929 7208 2075 1529 1529 22503 3993 1819 710 764 3870 87 1247 7361 65 16128 66226 8165 4384 6669 1805 4405 1752 779 4039 103 1705 0 1280 0 0 10 0 0 186 685 300 318 320 598 1370490 1347160 1347520 306588 307188 613775 251704 256521 508225 65808 808802 1208880 1189150 1187450 1827880 1 1 22 26 48 1778960
19-1-1103 2 3303 317 12146 57569 7008 3617 6910 2018 811 1593 18708 4708 1429 408 668 3279 14 1289 2351 85 13730 60206 6731 4137 7034 2038 4407 1483 749 3576 85 1668 0 948 0 0 7 0 0 129 602 288 291 285 748 1250030 1238540 1238820 301810 301062 602872 215029 218080 433108 61555 781150 1107360 1098510 1097220 1635560 1 1 32 47 79 1555850
19-1-1103 9 3236 286 12490 59477 7000 3558 6782 2113 894 1752 19338 4818 1724 387 649 3345 56 1314 2077 133 13885 60414 6628 4078 7063 2031 4269 1709 610 3707 112 1947 0 990 0 0 8 0 0 245 604 279 280 284 693 1269820 1258050 1258320 306856 309614 616469 215658 220876 436534 61859 796760 1124870 1115990 1114510 1630740 1 1 32 42 74 1556790
19-10-1010 2 3344 608 14744 59165 8389 4427 6962 2008 716 1496 21980 4008 1474 769 652 3715 61 1400 3049 1072 15767 61919 8325 4824 7117 1936 4001 1546 684 3935 103 1434 0 1624 0 0 3 0 0 316 834 413 520 517 833 1350760 1337040 1336840 311985 312592 624577 246800 251133 497933 65699 809736 1200320 1189410 1188280 1731270 1 1 17 13 30 1606700
19-10-1010 6 3242 616 15205 61330 8019 4520 6791 2093 735 1558 22824 3981 1546 653 614 3672 96 1227 2992 1070 16450 64189 8489 4407 6953 2099 4096 1668 680 4116 99 1449 0 2161 0 0 19 0 0 263 848 387 525 528 824 1339090 1325830 1325780 309464 311916 621380 239958 244616 484574 65493 810887 1183120 1172600 1171430 1720000 1 1 16 26 42 1587100
File 2 Example
Subject seriesnumber
19-10-1010 2
19-10-1166 2
19-102-10005 2
19-102-10006 2
19-103-10009 2
19-103-10010 2
19-104-10013 11
19-104-10014 2
19-105-10017 6
19-105-10018 6
The desired output would look something like this, where I no longer have duplicate entries per Subject. The second column will look different per Subject because the preferred series number differs.
19-10-1010 2 3344 608 14744 59165 8389 4427 6962 2008 716 1496 21980 4008 1474 769 652 3715 61 1400 3049 1072 15767 61919 8325 4824 7117 1936 4001 1546 684 3935 103 1434 0 1624 0 0 3 0 0 316 834 413 520 517 833 1350760 1337040 1336840 311985 312592 624577 246800 251133 497933 65699 809736 1200320 1189410 1188280 1731270 1 1 17 13 30 1606700
19-10-1166 2 3699 312 15373 61787 8026 4248 6385 1955 608 2194 21394 4260 1563 886 609 3420 25 1101 3415 417 16909 63040 7236 4264 5933 1852 4156 1213 654 4007 53 1336 5 1597 0 0 18 0 0 110 821 300 514 466 854 1193020 1179470 1179420 282241 273236 555477 204883 203228 408111 61343 740736 1036210 1026080 1024910 1563950 1 1 39 40 79 1415890
19-102-10005 2 8733 514 13024 50735 7729 3775 4955 1575 1045 1141 20415 3924 1537 990 651 3515 134 1259 8571 232 13487 51374 7150 4169 5192 1664 3760 1620 596 3919 189 1958 0 1479 0 0 36 0 0 203 837 459 409 439 1072 1224350 1200010 1200120 287659 290445 578104 216976 220545 437521 57457 737161 1095770 1074440 1073050 1637570 1 1 31 22 53 1618600
19-102-10006 2 8347 604 13735 42231 7266 3836 6473 2057 1099 1007 18478 3769 1351 978 639 3332 125 1197 8207 454 13774 43750 6758 4274 6148 1921 3732 1584 614 3521 180 1611 0 1241 0 0 25 0 0 254 813 410 352 372 833 1092800 1069450 1069190 244104 245787 489891 202201 205897 408098 59170 634640 978807 958350 957462 1485600 1 1 19 19 38 1472020
19-103-10009 2 4222 596 14702 52038 7428 4065 6598 2166 835 1854 22613 3397 1387 879 568 3729 93 1315 3414 222 14580 52639 7316 3997 6447 1986 4067 1529 596 3778 113 1689 0 2097 0 0 23 0 0 260 761 326 400 359 772 1204670 1190100 1189780 256560 260381 516941 237316 243326 480642 60653 681040 1070620 1059370 1058440 1605990 1 1 25 23 48 1593730
19-103-10010 2 5254 435 14688 47120 7772 3130 5414 1711 741 1912 20643 3594 1449 882 717 3663 41 999 6465 605 14820 49390 6361 3826 5527 1523 3513 1537 639 3596 80 1261 0 1475 0 0 18 0 0 283 827 383 414 297 627 1135490 1117320 1116990 243367 245896 489263 221809 227084 448893 55338 639719 1009370 994519 993639 1568140 1 1 14 11 25 1542210
19-104-10013 2 7276 341 11836 53018 7912 3942 6105 2334 795 2532 21239 4551 1258 1176 430 3636 83 1184 8811 396 12760 53092 7224 4361 6306 1853 4184 1278 543 3921 175 1814 0 2187 0 0 8 0 0 266 783 381 382 357 793 1011640 987712 987042 206633 228397 435031 170375 191222 361597 61814 601948 879229 859619 859103 1586150 1 1 224 162 386 1557120
19-104-10014 2 5964 355 13297 55439 8599 4081 5628 1730 970 1308 20196 4519 1363 992 697 3474 62 1232 6830 472 14729 59478 7006 4443 6156 1825 4492 1726 827 4017 122 1804 0 1412 0 0 17 0 0 259 672 299 305 319 779 1308470 1288970 1288910 284018 285985 570003 258525 257355 515880 62485 746108 1166160 1149700 1148340 1826660 1 1 33 24 57 1630580
19-105-10017 2 7018 307 13848 53855 8345 3734 6001 2095 899 1932 20712 4196 1349 645 823 4212 72 1475 3346 1119 13970 55202 7411 3975 5672 1737 3778 1490 657 4089 132 1689 0 1318 0 0 23 0 0 234 745 474 367 378 760 1122360 1104380 1104520 235806 233881 469687 217939 220736 438675 61471 639143 985718 970903 969619 1583800 1 1 51 51 102 1558470
19-105-10018 2 16454 1098 12569 52521 8215 3788 5858 1805 788 1147 21028 3496 1492 665 634 3796 39 1614 10700 617 12813 52098 8091 3901 5367 1646 3544 1388 723 3938 47 1819 0 1464 0 0 42 0 0 330 832 301 319 400 788 1148940 1114080 1113560 225179 227218 452397 237056 237295 474351 59172 614884 1019300 986820 986144 1607900 1 1 19 28 47 1591480
19-105-10020 2 4096 451 13042 48597 7601 3228 5665 1582 778 1670 19769 3612 1187 717 617 3672 103 962 2627 467 13208 48466 6619 3461 5217 1360 3575 1388 718 3783 90 1370 0 862 0 0 6 0 0 216 673 386 439 401 682 1081580 1068850 1068890 233290 235396 468686 209666 214472 424139 54781 619447 958522 948737 947554 1493740 1 1 16 11 27 1452900
For file1 containing (I truncated the long trailing columns for brevity):
Subject Seriesnumber
19-1-1001 2 8655 661 15250 60747 800
19-1-1001 10 8992 507 15722 64032 872
19-1-1103 2 3303 317 12146 57569 700
19-1-1103 9 3236 286 12490 59477 700
19-10-1010 2 3344 608 14744 59165 838
19-10-1010 6 3242 616 15205 61330 801
and file2 containing:
Subject seriesnumber
19-10-1010 2
19-10-1166 2
19-102-10005 2
19-102-10006 2
19-103-10009 2
19-103-10010 2
19-104-10013 11
19-104-10014 2
19-105-10017 6
19-105-10018 6
The following awk will output:
$ awk 'NR==FNR{a[$1, $2];next} ($1, $2) in a' file2 file1
19-10-1010 2 3344 608 14744 59165 838
Note that the first file argument to awk is file2, not file1 (a small optimization)! How it works:
NR == FNR - true while the overall record number equals the per-file record number, i.e. only while reading the first file passed to awk.
a[$1, $2] - remember the index $1,$2 in the associative array a.
next - skip the rest of the script and restart with the next line.
($1, $2) in a - check whether $1, $2 is in the associative array a.
Because of next, this test runs only for the second file passed to awk.
If the expression evaluates to true, the line is printed (this is awk's default action).
Alternatively you could do the following, but it stores the whole of file1 in memory, which is rather memory-consuming; the code above stores only the $1, $2 indexes.
awk 'NR==FNR{arr[$1, $2]=$0} NR!=FNR{print arr[$1, $2]}' file1 file2
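For completeness, a join-based sketch in the spirit of the join attempt from the question (assuming bash for the process substitution): build a composite Subject:seriesnumber key in both files, sort on it as join requires, then strip the key again. The NR>1 skips the header lines, and the output comes out sorted by the composite key rather than in file order:
join <(awk 'NR>1{print $1":"$2, $0}' file1 | sort -k1,1) \
     <(awk 'NR>1{print $1":"$2}' file2 | sort -k1,1) |
cut -d' ' -f2-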
I have a really strange problem and don't know how to solve it.
I am using Ubuntu 18.04.2 with Python 3.7.3 (64-bit) and VS Code as my editor.
I am reading data from a database and writing it to a CSV file with csv.writer:
import pandas as pd
import csv

with open(raw_path + station + ".csv", "w+") as f:
    file = csv.writer(f)
    # Write header into csv
    colnames = [par for par in param]
    file.writerow(colnames)
    # Write data into csv
    for row in data:
        file.writerow(row)
This works perfectly fine: it produces a .csv file with all the data read from the database up to the current timestep. However, in a later step I have to read this data into a pandas DataFrame and merge it with another DataFrame. I read the files like this:
data1 = pd.read_csv(raw_path + file1, sep=',')
data2 = pd.read_csv(raw_path + file2, sep=',')
And then merge the data like this:
comb_data = pd.merge(data1, data2, on="datumsec", how="left").fillna(value=-999)
For 5 out of 6 locations that I do this for, everything works perfectly fine: the combined dataset has the same length as the two separate ones. For one location, however, pd.read_csv seems not to read the CSV files properly. I checked whether the problem is already present in the database readout, but everything is OK there: I can open both files with Sublime and they have the same length. Yet when I read them with pandas.read_csv, one shows fewer lines. The best part is that this problem appears totally at random. Sometimes it works and reads the entire file, sometimes not, AND it occurs at different locations in the file: sometimes it stops after approx. 20000 entries, sometimes at 45000, sometimes somewhere else. Just totally random.
Here is an overview of my test output when I print the lengths of the files:
print(len(data1)): 57105
print(len(data2)): 57105
(both values taken directly after the database readout, before writing anything to disk)
After saving the data as CSV as described above and opening it in Excel or Sublime, I can confirm that the data contains 57105 rows; everything is where it is supposed to be.
However, if I read the data back with pd.read_csv:
print(len(data1)): 48612
print(len(data2)): 57105
(both values after reading the data back from the CSV files)
data1 48612
datumsec tl rf ff dd ffx
0 1538352000 46 81 75 288 89
1 1538352600 47 79 78 284 93
2 1538353200 45 82 79 282 93
3 1538353800 44 84 71 284 91
4 1538354400 43 86 77 288 96
5 1538355000 43 85 78 289 91
6 1538355600 46 80 79 286 84
7 1538356200 51 72 68 285 83
8 1538356800 52 71 68 281 73
9 1538357400 48 75 68 276 80
10 1538358000 45 78 62 271 76
11 1538358600 42 82 66 273 76
12 1538359200 43 81 70 274 78
13 1538359800 44 80 68 275 78
14 1538360400 45 78 66 279 72
15 1538361000 45 78 67 282 73
16 1538361600 43 79 63 275 71
17 1538362200 43 81 69 280 74
18 1538362800 42 80 70 281 76
19 1538363400 43 78 69 285 77
20 1538364000 43 78 71 285 77
21 1538364600 44 75 61 288 71
22 1538365200 45 73 56 290 62
23 1538365800 45 72 44 297 57
24 1538366400 44 73 51 286 57
25 1538367000 43 76 61 281 70
26 1538367600 40 79 66 284 73
27 1538368200 39 78 70 291 76
28 1538368800 38 80 71 287 81
29 1538369400 36 81 74 285 81
... ... .. ... .. ... ...
48582 1567738800 7 100 0 210 0
48583 1567739400 6 100 0 210 0
48584 1567740000 5 100 0 210 0
48585 1567740600 6 100 0 210 0
48586 1567741200 4 100 0 210 0
48587 1567741800 4 100 0 210 0
48588 1567742400 5 100 0 210 0
48589 1567743000 4 100 0 210 0
48590 1567743600 4 100 0 210 0
48591 1567744200 4 100 0 209 0
48592 1567744800 4 100 0 209 0
48593 1567745400 5 100 0 210 0
48594 1567746000 6 100 0 210 0
48595 1567746600 5 100 0 210 0
48596 1567747200 5 100 0 210 0
48597 1567747800 5 100 0 210 0
48598 1567748400 5 100 0 210 0
48599 1567749000 6 100 0 210 0
48600 1567749600 6 100 0 210 0
48601 1567750200 5 100 0 210 0
48602 1567750800 4 100 0 210 0
48603 1567751400 5 100 0 210 0
48604 1567752000 6 100 0 210 0
48605 1567752600 7 100 0 210 0
48606 1567753200 6 100 0 210 0
48607 1567753800 5 100 0 210 0
48608 1567754400 6 100 0 210 0
48609 1567755000 7 100 0 210 0
48610 1567755600 7 100 0 210 0
48611 1567756200 7 100 0 210 0
[48612 rows x 6 columns]
datumsec tl rf schnee ival6
0 1538352000 115 61 25 107
1 1538352600 115 61 25 107
2 1538353200 115 61 25 107
3 1538353800 115 61 25 107
4 1538354400 115 61 25 107
5 1538355000 115 61 25 107
6 1538355600 115 61 25 107
7 1538356200 115 61 25 107
8 1538356800 115 61 25 107
9 1538357400 115 61 25 107
10 1538358000 115 61 25 107
11 1538358600 115 61 25 107
12 1538359200 115 61 25 107
13 1538359800 115 61 25 107
14 1538360400 115 61 25 107
15 1538361000 115 61 25 107
16 1538361600 115 61 25 107
17 1538362200 115 61 25 107
18 1538362800 115 61 25 107
19 1538363400 115 61 25 107
20 1538364000 115 61 25 107
21 1538364600 115 61 25 107
22 1538365200 115 61 25 107
23 1538365800 115 61 25 107
24 1538366400 115 61 25 107
25 1538367000 115 61 25 107
26 1538367600 115 61 25 107
27 1538368200 115 61 25 107
28 1538368800 115 61 25 107
29 1538369400 115 61 25 107
... ... ... ... ... ...
57075 1572947400 -23 100 -2 -999
57076 1572948000 -23 100 -2 -999
57077 1572948600 -22 100 -2 -999
57078 1572949200 -23 100 -2 -999
57079 1572949800 -24 100 -2 -999
57080 1572950400 -23 100 -2 -999
57081 1572951000 -21 100 -1 -999
57082 1572951600 -21 100 -1 -999
57083 1572952200 -23 100 -1 -999
57084 1572952800 -23 100 -1 -999
57085 1572953400 -22 100 -1 -999
57086 1572954000 -23 100 -1 -999
57087 1572954600 -22 100 -1 -999
57088 1572955200 -24 100 0 -999
57089 1572955800 -24 100 0 -999
57090 1572956400 -25 100 0 -999
57091 1572957000 -26 100 -1 -999
57092 1572957600 -26 100 -1 -999
57093 1572958200 -27 100 -1 -999
57094 1572958800 -25 100 -1 -999
57095 1572959400 -27 100 -1 -999
57096 1572960000 -29 100 -1 -999
57097 1572960600 -28 100 -1 -999
57098 1572961200 -28 100 -1 -999
57099 1572961800 -27 100 -1 -999
57100 1572962400 -29 100 -2 -999
57101 1572963000 -29 100 -2 -999
57102 1572963600 -29 100 -2 -999
57103 1572964200 -30 100 -2 -999
57104 1572964800 -28 100 -2 -999
[57105 rows x 5 columns]
To me there is no obvious reason in the data why reading the entire file should fail, and apparently there is none, considering that sometimes the entire file is read and sometimes not.
I am really clueless about this. Do you have any idea how to deal with it and what the problem could be?
I finally solved my problem, and as expected it was not within the file itself. I am using multiprocessing to run the named functions and some other things in parallel. The database readout plus CSV writing and the CSV reading are performed in two different processes. The second process (reading the CSV) therefore did not know that the file was still being written and simply read whatever was already there. Because the file had been opened by a different process, opening it again did not throw an exception.
I thought I had already taken care of this, but obviously not thoroughly enough to exclude every possible case.
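A minimal sketch of the fix, with hypothetical file and column names: the reading step must not start until the writer process has finished and closed the file, which Process.join() guarantees:
import csv
from multiprocessing import Process

import pandas as pd

def write_csv(path, colnames, rows):
    # The file is flushed and closed when the with-block exits.
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(colnames)
        writer.writerows(rows)

if __name__ == "__main__":
    rows = [(1538352000 + 600 * i, i % 100) for i in range(57105)]
    p = Process(target=write_csv, args=("station.csv", ["datumsec", "tl"], rows))
    p.start()
    p.join()  # block until the CSV is completely written and closed
    data = pd.read_csv("station.csv")
    print(len(data))  # reliably 57105 now: no partial reads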
I had exactly the same problem in a different application and also did not understand what was wrong, because sometimes it worked and sometimes it didn't.
In a for loop, I was extracting the last two rows of a dataframe that I was creating in the same file. Sometimes the extracted rows were not the last two at all, but most of the time it worked fine. I guess the program started extracting the last two rows before the writing process was done.
I paused the script for half a second to make sure the writing process had finished:
import time
time.sleep(0.5)
However, I don't think this is a very elegant solution, since half a second might not be sufficient on a slower computer, for instance.
Vroni, how did you solve this in the end? Is there a way to specify that a certain task must not run in parallel with other tasks? I did not define anything about parallel processing in my program, so if that is the cause, it must be happening automatically.
I have two files.
file 1:
4
14
18
45
53
60
64
102
106
158
162
file2:
28 1 2
54 1 2
90 1 1
103 1 1
155 1 17
191 1 1
235 1 1
245 4 1
275 4 1
362 4 1
377 18 1
391 18 1
413 18 2
466 18 2
492 18 2
494 18 41
498 45 1
522 45 1
529 57 3
542 53 1
560 58 6
562 164 25
568 164 5
I want to extract the rows from file2 whose second column matches a value in file1.
So the expected output would be:
245 4 1
275 4 1
362 4 1
377 18 1
391 18 1
413 18 2
466 18 2
492 18 2
494 18 41
498 45 1
522 45 1
542 53 1
Many of the solutions I found online use Python or Perl; however, I want to do this with standard Linux commands. Any ideas?
This should do it?
awk 'FNR==NR{a[$0]++};FNR!=NR{if($2 in a){print}}' file1 file2
245 4 1
275 4 1
362 4 1
377 18 1
391 18 1
413 18 2
466 18 2
492 18 2
494 18 41
498 45 1
522 45 1
542 53 1
Explanation:
We hand awk both files (the order matters in this case!).
As long as we are reading the first file (FNR==NR), we store each value as a key in the array a ( a[$0]++ ).
When we reach the second file, we just check whether its second column ($2) is in the array; if it is, we print the line.
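For completeness, a join-based sketch (assuming bash for the process substitution): join needs both inputs sorted lexicographically on the join key, -2 2 joins on file2's second field, and -o picks file2's three columns for the output. Note the result comes out grouped by key order, not in file2's original order:
join -1 1 -2 2 -o 2.1,2.2,2.3 <(sort file1) <(sort -k2,2 file2)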
I need to get the line before the last line of each given prefix across a whole file. Which command can I use to do this (sed or any other option)?
For example, the prefix is the first two numbers of each line: e.g. (16 316), (0 312), etc.
file.txt:
16 316 32
16 316 0
0 312 9997
0 312 0
0 312 21309
0 312 0
0 313 10108
0 313 0
0 313 32732
0 313 0
0 314 9277
0 314 0
0 314 19781
0 314 0
0 315 7
0 315 0
0 315 9380
0 315 0
0 315 30388
0 315 0
The output should be:
16 316 32
0 312 21309
0 313 32732
0 314 19781
0 315 30388
Thanks
I'm assuming those blank lines are not actually in your file.
tac file.txt | awk '$1 " " $2 != prefix {getline; print; prefix = $1 " " $2}' | tac
16 316 32
0 312 21309
0 313 32732
0 314 19781
0 315 30388
Reverse the file; in the reversed stream, the first line of each prefix group is the group's last line in the original, so the line after it (fetched with getline) is the line before the last. Print that, then reverse the output back.
This awk should also work; it keeps the last line with a nonzero third column for each prefix (note that for (i in a) iterates in an unspecified order, so the output order is not guaranteed):
awk '$3>0{a[$1,$2]=$0} END{for (i in a) print a[i]}' file
16 316 32
0 312 21309
0 313 32732
0 314 19781
0 315 30388
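If you need the output in the original input order, a sketch of the same idea that additionally records the order in which prefixes are first seen:
awk '$3>0 && !(($1,$2) in a) {order[++n] = $1 SUBSEP $2}
     $3>0 {a[$1,$2] = $0}
     END {for (i = 1; i <= n; i++) print a[order[i]]}' file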
First, bash is not installed on the system I am using, so no bash-based answers please. Ash does not do if-tests with regexes.
In an ash shell script I have a list of acceptable responses:
# NB: IFS = the default IFS = <space><tab><newline>
# 802.11 channel names
channels="1 2 3 4 5 5u 6 7 7u 8 8u 9 9u 10 10u 11 11u 2l 36 3l 40 40u 44 48
48u 4l 52 56 56u 5l 60 61 64 64u 7l 100 104 104u 108 112
112u 116 132 136 136u 140 144 144u 149 153 153u 157 161 161u 165 36l
44l 52l 60l 100l 108l 132l 140l 149l 157l 36/80 40/80 44/80 48/80 52/80 56/80 60/80
64/80 100/80 104/80 108/80 112/80 132/80 136/80 140/80 144/80 149/80 153/80 157/80 161/80"
A menu routine has returned MENU_response, containing a possibly matching response.
I want to check whether I got back a valid response:
for t in "$channels"; do
    echo "B MENU_response = \"${MENU_response}\" test = \"${t}\""
    if [ "${MENU_response}" = "${t}" ]; then
        break
    fi
done
The echo in the loop is reporting that $t equals all of $channels, which makes no sense to me. I have used this technique in several other places and it works fine.
Can someone tell me why this is happening? Do I need to wrap quotes around each individual channel?
Removing the quotes around "$channels" works for me. With the quotes, the shell suppresses word splitting and the for loop sees the entire list as one single word; unquoted, the default IFS splits it into the individual channels:
$ channels="1 2 3 4 5 5u 6 7 7u 8 8u 9 9u 10 10u 11 11u 2l 36 3l 40 40u 44 48
> 48u 4l 52 56 56u 5l 60 61 64 64u 7l 100 104 104u 108 112
> 112u 116 132 136 136u 140 144 144u 149 153 153u 157 161 161u 165 36l
> 44l 52l 60l 100l 108l 132l 140l 149l 157l 36/80 40/80 44/80 48/80 52/80 56/80 60/80
> 64/80 100/80 104/80 108/80 112/80 132/80 136/80 140/80 144/80 149/80 153/80 157/80 161/80"
$ for t in $channels; do echo $t; done
1
2
3
4
5
# etc.
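Applied to the original validation loop, a sketch keeping the question's variable names (plain POSIX sh, so it runs under ash):
# Unquoted on purpose: the default IFS splits $channels into words.
valid=0
for t in $channels; do
    if [ "${MENU_response}" = "${t}" ]; then
        valid=1
        break
    fi
done
[ "$valid" -eq 1 ] || echo "invalid channel: ${MENU_response}"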