We have deployed a global Apache Cassandra cluster (nodes: 12, RF: 3, version: 3.11.2) in our production environment. We are running into an issue where a major compaction on a column family fails to clear tombstones on one node (out of 3 replicas), even though the metadata shows the minimum timestamp has passed the gc_grace_seconds set on the table.
Here is the sstablemetadata output:
SSTable: mc-4302-big
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.010000
Minimum timestamp: 1
Maximum timestamp: 1560326019515476
SSTable min local deletion time: 1560233203
SSTable max local deletion time: 2147483647
Compressor: org.apache.cassandra.io.compress.LZ4Compressor
Compression ratio: 0.8808303792058351
TTL min: 0
TTL max: 0
First token: -9201661616334346390 (key=bca773eb-ecbb-49ec-9330-cc16da310b58:::)
Last token: 9117719078924671254 (key=7c23b975-5354-4c82-82e5-1762bac75a8d:::)
minClustringValues: [00000f8f-74a9-4ce3-9d87-0a4dabef30c1]
maxClustringValues: [ffffc966-a02c-4e1f-bdd1-256556624288]
Estimated droppable tombstones: 46.31761624099541
SSTable Level: 0
Repaired at: 0
Replay positions covered: {}
totalColumnsSet: 0
totalRows: 618382
Estimated tombstone drop times:
1560233680: 353
1560234658: 237
1560235604: 176
1560236803: 471
1560237652: 402
1560238342: 195
1560239166: 373
1560239969: 356
1560240586: 262
1560241207: 247
1560242037: 387
1560242847: 357
1560243742: 280
1560244469: 283
1560245095: 353
1560245957: 357
1560246773: 362
1560247956: 449
1560249034: 217
1560249849: 310
1560251080: 296
1560251984: 304
1560252993: 239
1560253907: 407
1560254839: 977
1560255761: 671
1560256486: 317
1560257199: 679
1560258020: 703
1560258795: 507
1560259378: 298
1560260093: 2302
1560260869: 2488
1560261535: 2818
1560262176: 2842
1560262981: 1685
1560263708: 1830
1560264308: 808
1560264941: 1990
1560265753: 1340
1560266708: 2174
1560267629: 2253
1560268400: 1627
1560269174: 2347
1560270019: 2579
1560270888: 3947
1560271690: 1727
1560272446: 2573
1560273249: 1523
1560274086: 3438
1560275149: 2737
1560275966: 3487
1560276814: 4101
1560277660: 2012
1560278617: 1198
1560279680: 769
1560280441: 1337
1560281033: 608
1560281876: 2065
1560282546: 2926
1560283128: 6305
1560283836: 824
1560284574: 71
1560285166: 140
1560285828: 118
1560286404: 83
1560295835: 72
1560296951: 456
1560297814: 670
1560298496: 271
1560299333: 473
1560300159: 284
1560300831: 127
1560301551: 536
1560302309: 425
1560303302: 860
1560304064: 465
1560304782: 319
1560305657: 323
1560306552: 236
1560307454: 368
1560308409: 320
1560309178: 210
1560310091: 177
1560310881: 85
1560311970: 147
1560312706: 76
1560313495: 88
1560314847: 687
1560315817: 1618
1560316544: 1245
1560317423: 5361
1560318491: 2060
1560319595: 5853
1560320587: 5390
1560321473: 3868
1560322644: 5784
1560323703: 6861
1560324838: 7200
1560325744: 5642
Count Row Size Cell Count
1 0 3054
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
10 0 0
12 0 0
14 0 0
17 0 0
20 0 0
24 0 0
29 0 0
35 0 0
42 0 0
50 0 0
60 98 0
72 49 0
86 46 0
103 2374 0
124 39 0
149 36 0
179 43 0
215 18 0
258 26 0
310 24 0
372 18 0
446 16 0
535 19 0
642 27 0
770 17 0
924 12 0
1109 14 0
1331 23 0
1597 20 0
1916 12 0
2299 11 0
2759 11 0
3311 11 0
3973 12 0
4768 5 0
5722 8 0
6866 5 0
8239 5 0
9887 6 0
11864 5 0
14237 10 0
17084 1 0
20501 8 0
24601 2 0
29521 2 0
35425 3 0
42510 2 0
51012 2 0
61214 1 0
73457 2 0
88148 3 0
105778 0 0
126934 3 0
152321 2 0
182785 1 0
219342 0 0
263210 0 0
315852 0 0
379022 0 0
454826 0 0
545791 0 0
654949 0 0
785939 0 0
943127 0 0
1131752 0 0
1358102 0 0
1629722 0 0
1955666 0 0
2346799 0 0
2816159 0 0
3379391 1 0
4055269 0 0
4866323 0 0
5839588 0 0
7007506 0 0
8409007 0 0
10090808 1 0
12108970 0 0
14530764 0 0
17436917 0 0
20924300 0 0
25109160 0 0
30130992 0 0
36157190 0 0
43388628 0 0
52066354 0 0
62479625 0 0
74975550 0 0
89970660 0 0
107964792 0 0
129557750 0 0
155469300 0 0
186563160 0 0
223875792 0 0
268650950 0 0
322381140 0 0
386857368 0 0
464228842 0 0
557074610 0 0
668489532 0 0
802187438 0 0
962624926 0 0
1155149911 0 0
1386179893 0 0
1663415872 0 0
1996099046 0 0
2395318855 0 0
2874382626 0
3449259151 0
4139110981 0
4966933177 0
5960319812 0
7152383774 0
8582860529 0
10299432635 0
12359319162 0
14831182994 0
17797419593 0
21356903512 0
25628284214 0
30753941057 0
36904729268 0
44285675122 0
53142810146 0
63771372175 0
76525646610 0
91830775932 0
110196931118 0
132236317342 0
158683580810 0
190420296972 0
228504356366 0
274205227639 0
329046273167 0
394855527800 0
473826633360 0
568591960032 0
682310352038 0
818772422446 0
982526906935 0
1179032288322 0
1414838745986 0
Estimated cardinality: 3054
EncodingStats minTTL: 0
EncodingStats minLocalDeletionTime: 1560233203
EncodingStats minTimestamp: 1
KeyType: org.apache.cassandra.db.marshal.CompositeType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type)
ClusteringTypes: [org.apache.cassandra.db.marshal.UUIDType]
StaticColumns: {}
RegularColumns: {}
So far, here is what we have tried:
1) major compaction with lower gc_grace_seconds
2) nodetool garbagecollect
3) nodetool scrub
None of the above methods has helped. Again, this is happening on only one node (out of the 3 replicas).
The tombstone markers generated during your major compaction are just that: markers. The data has been removed, but a delete marker is left in place so that the other replicas have gc_grace_seconds to process it too. The tombstone markers are fully dropped the next time the SSTable is compacted. Unfortunately, because you've run a major compaction (rarely ever recommended), it may be a long time until there are suitable SSTables to compact with it and clean up the tombstones. Remember also that a tombstone can only be dropped after local_delete_time + gc_grace_seconds, as defined on the table.
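If waiting for compaction to pick that SSTable up again isn't practical, one knob worth knowing about is the tombstone compaction subproperties, which let Cassandra run single-SSTable compactions once enough tombstones are droppable. A minimal sketch, assuming a hypothetical keyspace/table ks.cf and SizeTieredCompactionStrategy (adjust to your schema and strategy):
-- Hypothetical names (ks.cf); tune the thresholds to taste.
ALTER TABLE ks.cf WITH compaction = {
  'class': 'SizeTieredCompactionStrategy',
  'tombstone_threshold': '0.2',              -- consider SSTables whose droppable-tombstone ratio exceeds 20%
  'tombstone_compaction_interval': '86400',  -- don't retry the same SSTable more than once a day
  'unchecked_tombstone_compaction': 'true'   -- skip the overlap pre-check that often blocks the drop
};
Note that unchecked_tombstone_compaction trades extra compaction work for more aggressive tombstone purging, so it is worth reverting once the node has caught up.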
If you're interested in learning more about how tombstones and compaction work together in the context of delete operations, I suggest reading the following articles:
https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/dml/dmlAboutDeletes.html
https://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html
I have data formatted in 10 columns as follows:
# col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
I want to plot all the data individually as a single column, that is, the 11th data point will be 11, and so on. How can I do this directly in gnuplot?
An example of the data can be obtained here: data
This is a bit of a special data format. Well, you could rearrange it with whatever external tools, but you can also rearrange it with gnuplot alone.
For this, you need to have the data in a datablock. For how to get it from a file into a datablock, see the answer to this question: gnuplot: load datafile 1:1 into datablock
Code:
### plot special dataformat
reset session
$Data <<EOD
#SP# 12 10511.100
265 7 2 5 2 10 6 10 4 4
8 8 4 8 8 7 17 16 12 17
9 23 18 16 18 26 18 31 31 38
35 58 48 95 107 156 161 199 282 398
448 704 851 1127 1399 1807 2272 2724 3376 4077
4903 6458 7158 9045 9279 12018 13765 14212 17397 19166
21159 23650 25537 28003 29645 35385 34328 36021 42720 39998
45825 48111 49548 46591 53471 53888 56166 61747 57867 59226
59888 65953 61544 68233 68770 69336 63925 69660 69781 70590
76419 70791 70411 75909 70082 76136 69906 75069 75168 74690
73897 73656 73134 77603 70795 77603 68092 74208 73385 66906
71924 70866 74408 67869 67703 70924 65004 68566 62694 65917
64636 62988 62372 64923 59231 58266 60636 59191 54090 56428
55222 53519 52724 53973 49649 51418 46858 48289 46800 45395
44235 43087 40999 42777 39129 40020 37985 37019 35739 34925
33344 33968 30874 31292 30141 29528 27956 27001 25712 25842
23857 23752 22900 21926 20853 19897 19063 18997 18345 16499
16631 15810 15793 14158 13609 13429 13022 12276 11579 10810
10930 9743 9601 8939 8762 8338 7723 7470 6815 6774
6342 6056 5939 5386 5264 4889 4600 4380 4151 3982
3579 3557 3335 3220 3030 2763 2769 2516 2409 2329
2310 2153 2122 1948 1813 1879 1671 1666 1622 1531
1584 1455 1430 1409 1345 1291 1300 1284 1373 1261
1189 1373 1258 1220 1134 1261 1213 1116 1288 1087
1113 1137 1182 1087 1213 1061 1132 1211 1004 1081
1130 1144 1208 1089 1114 1088 1116 1188 1137 1150
1216 1101 1092 1148 1115 1161 1262 1157 1206 1183
1177 1274 1203 1150 1161 1206 1215 1166 1248 1217
1212 1250 1239 1292 1226 1262 1209 1329 1178 1383
1219 1175 1265 1264 1361 1206 1266 1285 1189 1284
1330 1223 1325 1338 1250 1322 1256 1252 1353 1269
1278 1281 1349 1256 1326 1309 1262 1374 1303 1293
1350 1297 1262 1144 1305 1224 1259 1292 1447 1187
1342 1267 1197 1327 1189 1248 1250 1198 1290 1299
1233 1173 1327 1206 1231 1205 1182 1232 1233 1158
1193 1137 1180 1211 1196 1176 1096 1131 1086 1134
1125 1122 1090 1145 1053 1067 1097 1003 1044 993
1056 1006 915 959 923 943 1026 930 927 929
914 849 920 818 808 888 877 808 848 867
735 785 769 738 744 716 708 677 660 657
589 626 649 581 578 597 580 539 495 541
528 402 457 435 425 417 415 408 366 375
322 341 292 286 272 313 263 255 246 207
213 176 195 180 181 168 153 140 114 130
106 100 97 92 71 71 72 59 57 49
43 42 35 38 36 26 33 29 29 14
22 19 11 11 14 14 6 6 9 4
7 5 2 5 1 3 0 0 0 2
0 1 3 0 2 0 0 0 0 1
1 0 3 1 0 1 2 1 2 0
0 3 0 0 1 0 0 1 2 0
1 0 0 0 0 0 0 1 2 0
2 3 1 0 0 3 1 0 1 0
0 1 0 1 1 0 0 1 0 0
0 1 0 0 1 2 1 2 0 1
1 1 0 0 0 0 0 0 0 2
1 1 0 0 0 1 0 1 0 1
1 1 1 1 1 0 0 3 0 2
1 1 1 0 1 0 1 0 0 2
1 1 0 1 0 0 0 1 0 1
0 0 0 0 2 3 1 2 0 0
1 2 0 1 2 1 1 1 1 1
1 0 1 0 0 2 1 2 2 1
0 0 1 1 1 0 1 1 1 0
0 2 1 1 1 0 0 0 1 1
0 2 1 1 2 0 2 1 1 1
1 1 0 0 0 2 0 0 1 0
1 0 1 1 2 2 0 0 0 3
2 0 0 0 2 0 1 1 0 1
0 0 0 2 1 4 0 1 0 1
2 0 0 0 0 0 1 0 0 2
0 0 0 0 1 0 0 0 0 0
0 1 0 1 0 0 0 0 1 2
0 0 1 0 1 0 0 1 0 1
0 0 2 1 1 0 0 1 0 0
0 0 0 1 0 0 0 0 0 1
0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 0 1 0 0
0 0 0 1 0 0 0 1 0 0
1 0 0 0 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 1
0 0 1 2 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0
1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
0 0 0 0
EOD
# flatten all data lines (skipping the header line) into one long row
set print $Data2
HeaderLines = 1
Line = ''
do for [i=HeaderLines+1:|$Data|] {
    Line = Line.sprintf(' %s',$Data[i][1:strlen($Data[i])-1])
}
print Line
set print

# map the flattened column index onto a wavelength axis
WavStart = 450
WavStep = 0.1
myWav(col) = WavStart + column(col)*WavStep

set grid x,y
plot $Data2 matrix u (myWav(1)):3 w l lc "red" notitle
### end of code
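A note on the design choice: with the matrix keyword, the using spec sees pseudo-columns 1 (column index), 2 (row index) and 3 (cell value). Gnuplot walks the cells row by row, so plotting a multi-row matrix with lines would jump back to the first x at every row break; flattening everything into the single-row datablock $Data2 first is what turns the 10-column layout into one continuous series.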
Result: (resulting plot not included here)
I have a bladefs volume and I just checked /proc/self/mountstats, where I see statistics per operation:
...
opts: rw,vers=3,rsize=131072,wsize=131072,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.0.2.100,mountvers=3,mountport=903,mountproto=tcp,local_lock=all
age: 18129
caps: caps=0x3fc7,wtmult=512,dtsize=32768,bsize=0,namlen=255
sec: flavor=1,pseudoflavor=1
events: 18840 116049 23 5808 22138 21048 146984 13896 287 2181 0 7560 31380 0 9565 5106 0 6471 0 0 13896 0 0 0 0 0 0
bytes: 339548407 48622919 0 0 311167118 48622919 76846 13896
RPC iostats version: 1.0 p/v: 100003/3 (nfs)
xprt: tcp 875 1 7 0 0 85765 85764 1 206637 0 37 1776 35298
per-op statistics
NULL: 0 0 0 0 0 0 0 0
GETATTR: 18840 18840 0 2336164 2110080 92 8027 8817
SETATTR: 0 0 0 0 0 0 0 0
LOOKUP: 21391 21392 0 3877744 4562876 118 103403 105518
ACCESS: 20183 20188 0 2584304 2421960 72 10122 10850
READLINK: 0 0 0 0 0 0 0 0
READ: 3425 3425 0 465848 311606600 340 97323 97924
WRITE: 2422 2422 0 48975488 387520 763 200645 201522
CREATE: 2616 2616 0 447392 701088 21 870 1088
MKDIR: 858 858 0 188760 229944 8 573 705
SYMLINK: 0 0 0 0 0 0 0 0
MKNOD: 0 0 0 0 0 0 0 0
REMOVE: 47 47 0 6440 6768 0 8 76
RMDIR: 23 23 0 4876 3312 0 3 5
RENAME: 23 23 0 7176 5980 0 5 6
LINK: 0 0 0 0 0 0 0 0
READDIR: 160 160 0 23040 4987464 0 16139 16142
READDIRPLUS: 15703 15703 0 2324044 8493604 43 1041634 1041907
FSSTAT: 1 1 0 124 168 0 0 0
FSINFO: 2 2 0 248 328 0 0 0
PATHCONF: 1 1 0 124 140 0 0 0
COMMIT: 68 68 0 9248 10336 2 272 275...
about my bladefs. I am interested in the READ operation statistics. As far as I know, the last column (97924) means:
execute: How long ops of this type take to execute (from
rpc_init_task to rpc_exit_task) (microsecond)
How should I interpret this? Is it the average time of each read operation, regardless of the block size? I have a very strong suspicion that I have problems with NFS: am I right? A value of 0.1 s looks bad to me, but I am not sure how exactly to interpret this time: an average, some sum...?
After reading the kernel source: the statistics are printed from rpc_clnt_show_stats() in net/sunrpc/stats.c, and the 8th column of the per-op statistics seems to be printed from _print_rpc_iostats; it prints the struct rpc_iostats member om_execute. (The newest kernels have 9 columns, with errors in the last column.)
That member appears to be referenced/actually changed only in rpc_count_iostats_metrics, with:
execute = ktime_sub(now, task->tk_start);
op_metrics->om_execute = ktime_add(op_metrics->om_execute, execute);
Assuming ktime_add does what it says, the value of om_execute only ever increases. So the 8th column of mountstats is the cumulative time spent executing operations of this type, not an average: to get the average per-operation latency, divide it by the operation count in the first column.
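As a quick sanity check, here is a minimal sketch that derives that per-op average from a mountstats dump. It assumes the usual per-op layout (ops, transmissions, timeouts, bytes sent, bytes received, queue time, response time, execute time) and that the file contains a single mount; the time is in whatever unit the kernel prints for these counters (milliseconds in current kernels):
import sys

def avg_execute(path="/proc/self/mountstats", op="READ"):
    # Find the per-op line, e.g. "READ: 3425 3425 0 465848 311606600 340 97323 97924",
    # and divide the cumulative execute time (9th field) by the op count (2nd field).
    with open(path) as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] == op + ":":
                ops, execute = int(fields[1]), int(fields[8])
                return execute / ops if ops else 0.0
    return None

# For the READ line above: 97924 / 3425 ~= 28.6 per operation on average.
print(avg_execute(*sys.argv[1:]))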
I've a DataFrame as below (the result of the pivot_table() method):
Location Loc 2 Loc 3 Loc 5 Loc 8 Loc 9
Item
1 404 317 272 113 449
3 1,205 870 846 371 1,632
5 208 218 128 31 268
7 107 54 57 17 179
9 387 564 245 83 571
10 364 280 115 34 252
16 104 80 72 22 143
17 111 85 44 10 209
18 124 182 67 27 256
19 380 465 219 103 596
If you take a closer look at it, there are missing Locations (e.g., Loc 1, Loc 4, etc.) and missing Items (e.g., 2, 4, 8, etc.).
I want to export this to my pre-defined Excel template, which has all the Locations & Items, and fill the table based on Items & Values.
I know I can export the dataframe to a different Excel sheet & use SUMIFS() or INDEX()/MATCH() formulas, but I want to do this directly from Python/pandas to Excel.
Below should be the result after exporting:
Loc 1 Loc 2 Loc 3 Loc 4 Loc 5 Loc 6 Loc 7 Loc 8 Loc 9
1 0 404 317 0 272 0 0 113 449
2 0 0 0 0 0 0 0 0 0
3 0 1205 870 0 846 0 0 371 1632
4 0 0 0 0 0 0 0 0 0
5 0 208 218 0 128 0 0 31 268
6 0 0 0 0 0 0 0 0 0
7 0 107 54 0 57 0 0 17 179
8 0 0 0 0 0 0 0 0 0
9 0 387 564 0 245 0 0 83 571
10 0 364 280 0 115 0 0 34 252
11 0 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 0 0
13 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0
15 0 0 0 0 0 0 0 0 0
16 0 104 80 0 72 0 0 22 143
17 0 111 85 0 44 0 0 10 209
18 0 124 182 0 67 0 0 27 256
19 0 380 465 0 219 0 0 103 596
20 0 0 0 0 0 0 0 0 0
Use DataFrame.reindex with the new index and columns values in arrays or lists:
import numpy as np

idx = np.arange(1, 21)
cols = [f'Loc {x}' for x in np.arange(1, 10)]
df = df.reindex(index=idx, columns=cols, fill_value=0)
print (df)
Loc 1 Loc 2 Loc 3 Loc 4 Loc 5 Loc 6 Loc 7 Loc 8 Loc 9
1 0 404 317 0 272 0 0 113 449
2 0 0 0 0 0 0 0 0 0
3 0 1,205 870 0 846 0 0 371 1,632
4 0 0 0 0 0 0 0 0 0
5 0 208 218 0 128 0 0 31 268
6 0 0 0 0 0 0 0 0 0
7 0 107 54 0 57 0 0 17 179
8 0 0 0 0 0 0 0 0 0
9 0 387 564 0 245 0 0 83 571
10 0 364 280 0 115 0 0 34 252
11 0 0 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 0 0
13 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 0
15 0 0 0 0 0 0 0 0 0
16 0 104 80 0 72 0 0 22 143
17 0 111 85 0 44 0 0 10 209
18 0 124 182 0 67 0 0 27 256
19 0 380 465 0 219 0 0 103 596
20 0 0 0 0 0 0 0 0 0
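If the goal is to land this in the pre-defined template rather than in a fresh sheet, one possible sketch, assuming hypothetical file/sheet names template.xlsx and Data, pandas >= 1.4 with openpyxl installed, and that the template's headers occupy row 1 / column A:
import pandas as pd

# Append into the existing workbook instead of overwriting it;
# startrow/startcol position the values under the template's own headers.
with pd.ExcelWriter('template.xlsx', engine='openpyxl', mode='a',
                    if_sheet_exists='overlay') as writer:
    df.to_excel(writer, sheet_name='Data',
                header=False, index=False,
                startrow=1, startcol=1)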
I wrote two programs to run on Linux, each using a different algorithm, and I want to find a way (preferably using benchmarking software) to compare the CPU usage and I/O operations between these two programs.
Is there such a thing? And if yes, where can I find it? Thanks.
You can try hardinfo.
There are also any number of tools for measuring system performance, if measuring it while running your app solves your purpose.
And you can also check this thread.
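For a per-process comparison (rather than whole-system sampling), GNU time's verbose mode already reports CPU time and filesystem I/O counts. A minimal sketch, assuming your two binaries are ./program_a and ./program_b (hypothetical names):
# Note: /usr/bin/time, not the shell builtin. -v prints user/system CPU time,
# max RSS, and "File system inputs/outputs" counters.
/usr/bin/time -v ./program_a 2> a.stats
/usr/bin/time -v ./program_b 2> b.stats

# Compare the interesting lines side by side:
grep -E 'User time|System time|Maximum resident|File system' a.stats b.stats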
You might try the vmstat command:
vmstat 2 20 > vmstat.txt
This takes 20 samples at 2-second intervals. In the output, bi = KB in and bo = KB out, with wa = time waiting for I/O; I/O can also increase cache demands. %CPU utilisation = us (user) + sy (system).
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 277504 17060 82732 0 0 91 87 1432 236 11 3 84 1
0 0 0 277372 17068 82732 0 0 0 24 1361 399 23 8 59 10
test start
0 1 0 275240 17068 82732 0 0 0 512 1342 305 24 4 69 4
2 1 0 275232 17068 82780 0 0 24 10752 4176 216 7 8 0 85
1 1 0 275240 17076 82732 0 0 12288 2590 5295 243 15 8 0 77
0 1 0 275240 17076 82748 0 0 8 11264 4329 214 6 12 0 82
0 1 0 275240 17076 82780 0 0 16 11264 4278 233 15 10 0 75
0 1 0 275240 17084 82780 0 0 19456 542 6563 255 10 7 0 83
0 1 0 275108 17084 82748 0 0 5128 3072 3501 265 16 37 0 47
3 1 0 275108 17084 82748 0 0 924 5120 8369 3845 12 33 0 55
0 1 0 275116 17092 82748 0 0 1576 85 11483 6645 5 50 0 45
1 1 0 275116 17092 82748 0 0 0 136 2304 689 3 9 0 88
2 1 0 275084 17100 82732 0 0 0 352 2374 800 14 26 0 61
0 0 0 275076 17100 82732 0 0 546 118 2408 1014 35 17 47 1
0 1 0 275076 17104 82732 0 0 0 62 1324 76 3 2 89 7
1 1 0 275076 17108 82732 0 0 0 452 1879 442 8 13 66 12
0 0 0 275116 17108 82732 0 0 800 352 2456 1195 19 17 56 8
0 1 0 275116 17112 82732 0 0 0 54 1325 76 4 1 88 8
test end
1 1 0 275116 17116 82732 0 0 0 510 1717 286 6 10 72 11
1 0 0 275076 17116 82732 0 0 1600 1152 3087 1344 23 29 41 7
I would like to convert this vector with strings in each row and spaces separating the elements within one string:
> v.input_red
[1] "pm 0 100 2.1 59 70 15.5 14.8 31 984 32 0 56 55 0 0 0 0 0 0 -60 -260 0 0 6 0 0 0 0 0 20 8 2ff 0 249 0 0 "
[2] "pm 0 100 2.1 59 70 15.5 14.8 31 984 32 0 56 55 0 0 0 0 0 0 -60 -260 0 0 6 0 0 0 0 0 20 8 2ff 0 249 0 0 "
[3] "pm 0 100 2.1 59 70 15.5 14.8 31 984 32 0 56 55 0 0 0 0 0 0 -60 -260 0 0 6 0 0 0 0 0 20 8 2ff 0 249 0 0 "
to a dataframe with a column for each element. But I'm not quite sure how to extract the elements from the strings. The best way would be to convert the whole thing at once somehow, I guess.
Wanted result-dataframe (created manually):
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33 V34 V35
1 pm 0 100 2.1 59 70 15.5 14.8 31 984 32 0 56 55 0 0 0 0 0 0 -60 -260 0 0 6 0 0 0 0 0 20 8 2ff 0 249
2 pm 0 100 2.1 59 70 15.5 14.8 31 984 32 0 56 55 0 0 0 0 0 0 -60 -260 0 0 6 0 0 0 0 0 20 8 2ff 0 249
3 pm 0 100 2.1 59 70 15.5 14.8 31 984 32 0 56 55 0 0 0 0 0 0 -60 -260 0 0 6 0 0 0 0 0 20 8 2ff 0 249
Thanks in advance!
Matthias
For quite some time, read.table and family have had a text argument that lets you read directly from character vectors. There's no need to write the object to a file first.
Your sample data...
v.input_red <- c("pm 0 100 2.1 59 70 15.5 14.8 31 984 32 0 56 55 0 0 0 0 0 0 -60 -260 0 0 6 0 0 0 0 0 20 8 2ff 0 249 0 0 ",
"pm 0 100 2.1 59 70 15.5 14.8 31 984 32 0 56 55 0 0 0 0 0 0 -60 -260 0 0 6 0 0 0 0 0 20 8 2ff 0 249 0 0 ",
"pm 0 100 2.1 59 70 15.5 14.8 31 984 32 0 56 55 0 0 0 0 0 0 -60 -260 0 0 6 0 0 0 0 0 20 8 2ff 0 249 0 0 ")
... directly read in:
read.table(text = v.input_red, header = FALSE)
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17
# 1 pm 0 100 2.1 59 70 15.5 14.8 31 984 32 0 56 55 0 0 0
# 2 pm 0 100 2.1 59 70 15.5 14.8 31 984 32 0 56 55 0 0 0
# 3 pm 0 100 2.1 59 70 15.5 14.8 31 984 32 0 56 55 0 0 0
# V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33
# 1 0 0 0 -60 -260 0 0 6 0 0 0 0 0 20 8 2ff
# 2 0 0 0 -60 -260 0 0 6 0 0 0 0 0 20 8 2ff
# 3 0 0 0 -60 -260 0 0 6 0 0 0 0 0 20 8 2ff
# V34 V35 V36 V37
# 1 0 249 0 0
# 2 0 249 0 0
# 3 0 249 0 0
Assuming file is a file name that you save on your system:
writeLines(v.input_red, file)
data <- read.table(file)
Is this solution what you were looking for?
s1 <- "pm 0 100 2.1 59 70 15.5 14.8 31 984 32 0 56 55 0 0 0 0 0 0 -60 -260 0 0 6 0 0 0 0 0 20 8 2ff 0 249 0 0 "
s2 <- "pm 0 100 2.1 59 70 15.5 14.8 31 984 32 0 56 55 0 0 0 0 0 0 -60 -260 0 0 6 0 0 0 0 0 20 8 2ff 0 249 0 0 "
s3 <- "pm 0 100 2.1 59 70 15.5 14.8 31 984 32 0 56 55 0 0 0 0 0 0 -60 -260 0 0 6 0 0 0 0 0 20 8 2ff 0 249 0 0 "
df <- t(data.frame(strsplit(s1, " "),strsplit(s2, " "),strsplit(s3, " ")))
row.names(df) <- c("s1", "s2", "s3")
strsplit splits the string at each space character. Concatenating the results as a data.frame gives you a df with 3 columns, so you have to transpose it with t. I changed the row names for better readability.
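One caveat: t() returns a character matrix, so every column comes out as a string. A small follow-up sketch, using base R's type.convert() to restore numeric columns:
# convert the character matrix to a data.frame and let type.convert()
# turn numeric-looking columns back into numbers (V1 stays "pm")
df <- as.data.frame(df, stringsAsFactors = FALSE)
df[] <- lapply(df, type.convert, as.is = TRUE)
str(df[, 1:5])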