SAS - Column Pointer Error when Importing COBOL Structured Data

SAS - Column Pointer Error when Importing COBOL Structured Data - io

I have a structure MF file that I'm now trying to import into SAS. The fixed portion of the record is working, but when I hit the Days line the column pointer is at 30. The variable is at 15. How did the column pointer jump to 30 vs my expected value and how can I update the below.
data test;
INFILE file1 recfm=N;
INPUT seqno s370fPD9.
type $ebcdic2.
record_type $ebcdic1.
occurs s370fPD2.
#;
ARRAY Day{60} Day1-Day60;
ARRAY AMT{60} AMT1-AMT60;
ARRAY CRED{60} CRED1-CRED60;
ARRAY PAY{60} PAY1-PAY60;
DO I = 1 TO OCCURS;
input Days{I} s370fPD2.
AMT{I} s370fPD6. +2
CRED{I} s370fPD6. +2
PAY{I} s370fPD8. #;
end;
run;
I get the following in the log. While there isn't a note for Amt1, it is not populated correctly:
NOTE: INVALID DATA for Days1 at byte position 30-31.
NOTE: INVALID DATA for CRED1 at byte position 52-57.
NOTE: INVALID DATA for PAY1 at byte position 70-71.
For the first record, days1 should be at 15. Cred1 at 25 Pay1 at 33.

Related

How do I fix USER FATAL MESSAGE 740?

How do I fix USER FATAL MESSAGE 740? This error is generated by Nastran when I try to run a BDF/DAT file of mine.
*** USER FATAL MESSAGE 740 (RDASGN)
UNIT NUMBER 5 HAS ALREADY BEEN ASSIGNED TO THE LOGICAL NAME INPUT
USER ACTION: CHANGE THE UNIT NUMBER ON THE ASSIGN STATEMENT AND IF THE UNIT IS USED FOR
PARAM,POST,<0 THEN SPECIFY PARAM,OUNIT2 WITH THE NEW UNIT NUMBER.
AVOID USING THE FOLLOWING UNIT NUMBERS THAT ARE ASSIGNED TO SPECIAL FILES IN MSC.NASTRAN:
1 THRU 12, 14 THRU 22, 40, 50, 51, 91, 92. SEE THE MSC.NASTRAN INSTALLATIONS/OPERATIONS
GUIDE SECTION ON MAKING FILE ASSIGNMENTS OR MSC.NASTRAN QUICK REFERENCE GUIDE ON
ASSIGN PHYSICAL FILE FOR REFERENCE.
Below is the head of my BDF file.
assign userfile='SUB1_PLATE.csv', status=UNKNOWN, form=formatted, unit=52
SOL 200
CEND
ECHO = NONE
DESOBJ(MIN) = 35
set 30=1008,1007,1015,1016
DESMOD=SUB1_PLATE
SUBCASE 1
$! Subcase name : DefaultLoadCase
$LBCSET SUBCASE1 DefaultLbcSet
ANALYSIS = STATICS
SPC = 1
LOAD = 6
DESSUB = 99
DISPLACEMENT(SORT1,PLOT,REAL)=ALL
STRESS(SORT1,PLOT,VONMISES,CORNER)=ALL
BEGIN BULK
param,xyunit,52
[...]
ENDDATA

Below is the solution
Correct
assign userfile='SUB1_PLAT.csv', status=UNKNOWN, form=formatted, unit=52
I shortened the name of CSV file to SUB1_PLAT.csv. This reduced the length of the line to 72 characters.
Incorrect
assign userfile='SUB1_PLATE.csv', status=UNKNOWN, form=formatted, unit=52
The file management section is limited to 72 characters, spaces included. The incorrect line stretches 73 characters. The nastran reader ignores the 73rd character and on. Instead of reading "unit=52" the reader reads "unit=5" which triggers the error.
|<--------------------- 72 Characters -------------------------------->||<- Characters are ignored truncated ->
assign userfile='SUB1_PLATE.csv', status=UNKNOWN, form=formatted, unit=52
References
MSC Nastran Reference Guide
The records of the first four sections are input in free-field format
and only columns 1 through 72 are used for data. Any information in
columns 73 through 80 may appear in the printed echo, but will not be
used by the program. If the last character in a record is a comma,
then the record is continued to the next record.

Why does my PySpark regular expression not give more than the first row?

Taking inspiration from this answer: https://stackoverflow.com/a/61444594/4367851 I have been able to split my .txt file into columns in a Spark DataFrame. However, it only gives me the first game - even though the sample .txt file contains many more.
My code:
basefile = spark.sparkContext.wholeTextFiles("example copy 2.txt").toDF().\
selectExpr("""split(replace(regexp_replace(_2, '\\\\n', ','), ""),",") as new""").\
withColumn("Event", col("new")[0]).\
withColumn("White", col("new")[2]).\
withColumn("Black", col("new")[3]).\
withColumn("Result", col("new")[4]).\
withColumn("UTCDate", col("new")[5]).\
withColumn("UTCTime", col("new")[6]).\
withColumn("WhiteElo", col("new")[7]).\
withColumn("BlackElo", col("new")[8]).\
withColumn("WhiteRatingDiff", col("new")[9]).\
withColumn("BlackRatingDiff", col("new")[10]).\
withColumn("ECO", col("new")[11]).\
withColumn("Opening", col("new")[12]).\
withColumn("TimeControl", col("new")[13]).\
withColumn("Termination", col("new")[14]).\
drop("new")
basefile.show()
Output:
+--------------------+---------------+-----------------+--------------+--------------------+--------------------+-----------------+-----------------+--------------------+--------------------+-----------+--------------------+--------------------+--------------------+
| Event| White| Black| Result| UTCDate| UTCTime| WhiteElo| BlackElo| WhiteRatingDiff| BlackRatingDiff| ECO| Opening| TimeControl| Termination|
+--------------------+---------------+-----------------+--------------+--------------------+--------------------+-----------------+-----------------+--------------------+--------------------+-----------+--------------------+--------------------+--------------------+
|[Event "Rated Cla...|[White "BFG9k"]|[Black "mamalak"]|[Result "1-0"]|[UTCDate "2012.12...|[UTCTime "23:01:03"]|[WhiteElo "1639"]|[BlackElo "1403"]|[WhiteRatingDiff ...|[BlackRatingDiff ...|[ECO "C00"]|[Opening "French ...|[TimeControl "600...|[Termination "Nor...|
+--------------------+---------------+-----------------+--------------+--------------------+--------------------+-----------------+-----------------+--------------------+--------------------+-----------+--------------------+--------------------+--------------------+
Input file:
[Event "Rated Classical game"]
[Site "https://lichess.org/j1dkb5dw"]
[White "BFG9k"]
[Black "mamalak"]
[Result "1-0"]
[UTCDate "2012.12.31"]
[UTCTime "23:01:03"]
[WhiteElo "1639"]
[BlackElo "1403"]
[WhiteRatingDiff "+5"]
[BlackRatingDiff "-8"]
[ECO "C00"]
[Opening "French Defense: Normal Variation"]
[TimeControl "600+8"]
[Termination "Normal"]
1. e4 e6 2. d4 b6 3. a3 Bb7 4. Nc3 Nh6 5. Bxh6 gxh6 6. Be2 Qg5 7. Bg4 h5 8. Nf3 Qg6 9. Nh4 Qg5 10. Bxh5 Qxh4 11. Qf3 Kd8 12. Qxf7 Nc6 13. Qe8# 1-0
[Event "Rated Classical game"]
.
.
.
Each game starts with [Event so I feel like it should be doable as the file has repeating structure, alas I can't get it to work.
Extra points:
I don't actually need the move list so if it's easier they can be deleted.
I only want the content of what is inside the " " for each new line once it has been converted to a Spark DataFrame.
Many thanks.

wholeTextFiles reads each file into a single record. If you read only one file, the result will a RDD with only one row, containing the whole text file. The regexp logic in the question returns only one result per row and this will be the first entry in the file.
Probably the best solution would be to split the file at the os level into one file per game (for example here) so that Spark can read the multiple games in parallel. But if a single file is not too big, splitting the games can also be done within PySpark:
Read the file(s):
basefile = spark.sparkContext.wholeTextFiles(<....>).toDF()
Create a list of columns and convert this list into a list of column expressions using regexp_extract:
from pyspark.sql import functions as F
cols = ['Event', 'White', 'Black', 'Result', 'UTCDate', 'UTCTime', 'WhiteElo', 'BlackElo', 'WhiteRatingDiff', 'BlackRatingDiff', 'ECO', 'Opening', 'TimeControl', 'Termination']
cols = [F.regexp_extract('game', rf'{col} \"(.*)\"',1).alias(col) for col in cols]
Extract the data:
split the whole file into an array of games
explode this array into single records
delete the line breaks within each record so that the regular expression works
use the column expressions defined above to extract the data
basefile.selectExpr("split(_2,'\\\\[Event ') as game") \
.selectExpr("explode(game) as game") \
.withColumn("game", F.expr("concat('Event ', replace(game, '\\\\n', ''))")) \
.select(cols) \
.show(truncate=False)
Output (for an input file containing three copies of the game):
+---------------------+-----+-------+------+----------+--------+--------+--------+---------------+---------------+---+--------------------------------+-----------+-----------+
|Event |White|Black |Result|UTCDate |UTCTime |WhiteElo|BlackElo|WhiteRatingDiff|BlackRatingDiff|ECO|Opening |TimeControl|Termination|
+---------------------+-----+-------+------+----------+--------+--------+--------+---------------+---------------+---+--------------------------------+-----------+-----------+
|Rated Classical game |BFG9k|mamalak|1-0 |2012.12.31|23:01:03|1639 |1403 |+5 |-8 |C00|French Defense: Normal Variation|600+8 |Normal |
|Rated Classical game2|BFG9k|mamalak|1-0 |2012.12.31|23:01:03|1639 |1403 |+5 |-8 |C00|French Defense: Normal Variation|600+8 |Normal |
|Rated Classical game3|BFG9k|mamalak|1-0 |2012.12.31|23:01:03|1639 |1403 |+5 |-8 |C00|French Defense: Normal Variation|600+8 |Normal |
+---------------------+-----+-------+------+----------+--------+--------+--------+---------------+---------------+---+--------------------------------+-----------+-----------+

entering text in a file at specific locations by identifying the number being integer or real in linux

I have an input like below
46742 1 48276 48343 48199 48198
46744 1 48343 48344 48200 48199
46746 1 48344 48332 48201 48200
48283 3.58077402e+01 -2.97697746e+00 1.50878647e+02
48282 3.67231688e+01 -2.97771595e+00 1.50419488e+02
48285 3.58558188e+01 -1.98122787e+00 1.50894850e+02
Each segment with the 2nd entry like 1 being integer is like thousands of lines and then starts the segment with the 2nd entry being real like 3.58077402e+01
Before anything beings I have to input a text like
*Revolved
*Gripped
*Crippled
46742 1 48276 48343 48199 48198
46744 1 48343 48344 48200 48199
46746 1 48344 48332 48201 48200
*Cracked
*Crippled
48283 3.58077402e+01 -2.97697746e+00 1.50878647e+02
48282 3.67231688e+01 -2.97771595e+00 1.50419488e+02
48285 3.58558188e+01 -1.98122787e+00 1.50894850e+02
so I need to enter specific texts at those locations. It is worth mentioning that the file is space delimited and not tabs delimited and that the text starting with * has to be at the very left of the line without spacing. The format of the rest of the file should be kept too.
Any suggestions with sed or awk would be highly appreaciated!
The text in the beginning could entered directly so that is not a prime problem since that is the start of the file, problematic is the second bunch of line so identify that the second entry has turned to real.

An awk with fixed strings:
awk 'BEGIN{print "*Revolved\n*Gripped\n*Crippled"}
match($2,"\+")&&!pr{print "*Cracked\n*Crippled";pr=1}1' yourfile
match($2,"\+")&&!pr : When + char is found at $2 field(real number) and pr flag is null.

Use matlab to search excel data file for time range and copy data into variable

In my excel file I have a time column in 12 hr clock time and a bunch of data columns. I have pasted a snippet of it in this post as a code since i cant attach a file. I am trying to build a gui that will take an input from the user like so:
start time: 7:29:32 AM
End time: 7:29:51 AM
Then do the following:
calculate the time that has passed in seconds (should be just a row count, data is gathered once a second)
copy the data in the time range from the "Data 3" column in to a variable perform other calculations on the data copied as needed
I am having some trouble figuring out what to do to search the time data and find its location since it imports as text with xlsread. any ideas?
The data looks like this:
Time Data 1 Data 2 Data 3 Data 4 Data 5
7:29:25 AM 0.878556385 0.388400561 0.076890401 0.93335277 0.884750618
7:29:26 AM 0.695838393 0.712762566 0.014814069 0.81264949 0.450303694
7:29:27 AM 0.250846937 0.508617941 0.24802015 0.722457624 0.47119616
7:29:28 AM 0.206189924 0.82970364 0.819163787 0.060932817 0.73455323
7:29:29 AM 0.161844331 0.768214077 0.154097877 0.988201094 0.951520263
7:29:30 AM 0.704242494 0.371877481 0.944482485 0.79207359 0.57390951
7:29:31 AM 0.072028024 0.120263127 0.577396985 0.694153791 0.341824004
7:29:32 AM 0.241817775 0.32573323 0.484644494 0.377938298 0.090122672
7:29:33 AM 0.500962945 0.540808907 0.582958676 0.043377373 0.041274613
7:29:34 AM 0.087742217 0.596508236 0.020250297 0.926901109 0.45960323
7:29:35 AM 0.268222071 0.291034947 0.598887588 0.575571111 0.136424853
7:29:36 AM 0.42880255 0.349597405 0.936733938 0.232128788 0.555528823
7:29:37 AM 0.380425154 0.162002488 0.208550466 0.776866494 0.79340504
7:29:38 AM 0.727940393 0.622546124 0.716007768 0.660480612 0.02463804
7:29:39 AM 0.582772435 0.713406643 0.306544291 0.225257421 0.043552277
7:29:40 AM 0.371156954 0.163821476 0.780515577 0.032460418 0.356949005
7:29:42 AM 0.484167263 0.377878242 0.044189636 0.718147456 0.603177625
7:29:43 AM 0.294017186 0.463360581 0.962296024 0.504029061 0.183131098
7:29:44 AM 0.95635086 0.367849494 0.362230918 0.984421096 0.41587606
7:29:45 AM 0.198645523 0.754955312 0.280338922 0.79706146 0.730373691
7:29:46 AM 0.058483961 0.46774544 0.86783339 0.147418954 0.941713252
7:29:47 AM 0.411193343 0.340857813 0.162066261 0.943124515 0.722124394
7:29:48 AM 0.389312994 0.129281042 0.732723258 0.803458815 0.045824426
7:29:49 AM 0.549633038 0.73956852 0.542532728 0.618321989 0.358525184
7:29:50 AM 0.269925317 0.501399748 0.938234302 0.997577871 0.318813506
7:29:51 AM 0.798825842 0.24038537 0.958224157 0.660124357 0.07469288
7:29:52 AM 0.963581196 0.390150081 0.077448543 0.294604314 0.903519943
7:29:53 AM 0.890540963 0.50284339 0.229976565 0.664538451 0.926438543
7:29:54 AM 0.46951573 0.192568637 0.506730373 0.060557482 0.922857391
7:29:55 AM 0.56552394 0.952136998 0.739438663 0.107518765 0.911045415
7:29:56 AM 0.433149875 0.957190309 0.475811126 0.855705733 0.942255155
and this is the code I am using:
[Data,Text] = xlsread('C:\Users\data.xlsx',2);
IndexStart=strmatch('7:29:29 AM',Text,'exact'); %start time
IndexEnd=strmatch('2:30:29 PM',Text,'exact'); %end time
seconds = IndexEnd-IndexStart;
TestData = Data([IndexStart: IndexEnd],:);

You probably need to:
Use strfind to find the relevant string in the data imported
Use datenum to convert the date to serial date numbers, to be able to calculate the elapsed time between the two points.
It would help if you posted your code so far though.
EDIT based on comments:
Here's what I would do for cycling through the list of start and end times:
[Data,Text] = xlsread('C:\Users\data.xlsx',2);
start_times = {'7:29:29 AM','7:29:35 AM','7:29:44 AM','7:29:49 AM'}; % etc...
end_times = {'2:30:29 PM','2:30:59 PM','2:31:22 PM','2:32:49 PM'}; % etc...
elapsed_time = zeros(length(start_times),1);
TestData = cell(length(start_times),1); % need a cell array because data can/will be of unequal lengths
for k=1:length(start_times)
IndexStart=strmatch(start_times{k},Text,'exact'); %start time
IndexEnd=strmatch(end_times{k},Text,'exact'); %end time
elapsed_time(k) = IndexEnd-IndexStart;
TestData{k} = Data([IndexStart: IndexEnd],:);
end

Use the "Import Data" from the Variable Tag in the Home menu. There you can set how you want the data to be imported like. With or without heading and the format.

How do I insert a row with a TimeUUIDType column in Cassandra?

In Cassandra, I have the following Column Family:
<ColumnFamily CompareWith="TimeUUIDType" Name="Posts"/>
I'm trying to insert a record into it as follows using a C++ generated function generated by Thrift:
ColumnPath new_col;
new_col.__isset.column = true; /* this is required! */
new_col.column_family.assign("Posts");
new_col.super_column.assign("");
new_col.column.assign("1968ec4a-2a73-11df-9aca-00012e27a270");
client.insert("Keyspace1", "somekey", new_col, "Random Value", 1234, ONE);
However, I'm getting the following error: "UUIDs must be exactly 16 bytes"
I've even tried the Cassandra CLI with the following command:
set Keyspace1.Posts['somekey']['1968ec4a-2a73-11df-9aca-00012e27a270'] = 'Random Value'
but I still get the following error:
Exception null
InvalidRequestException(why:UUIDs must be exactly 16 bytes)
at org.apache.cassandra.thrift.Cassandra$insert_result.read(Cassandra.java:11994)
at org.apache.cassandra.thrift.Cassandra$Client.recv_insert(Cassandra.java:659)
at org.apache.cassandra.thrift.Cassandra$Client.insert(Cassandra.java:632)
at org.apache.cassandra.cli.CliClient.executeSet(CliClient.java:420)
at org.apache.cassandra.cli.CliClient.executeCLIStmt(CliClient.java:80)
at org.apache.cassandra.cli.CliMain.processCLIStmt(CliMain.java:132)
at org.apache.cassandra.cli.CliMain.main(CliMain.java:173)

Thrift is a binary protocol; 16 bytes means 16 bytes. "1968ec4a-2a73-11df-9aca-00012e27a270" is 36 bytes. You need to get your library to give you the raw, 16 bytes form.
I don't use C++ myself, but "version 1 uuid" is the magic string you want to google for when looking for a library that can do this. http://www.google.com/search?q=C%2B%2B+version+1+uuid

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

SAS - Column Pointer Error when Importing COBOL Structured Data - io

Related

How do I fix USER FATAL MESSAGE 740?

Why does my PySpark regular expression not give more than the first row?

entering text in a file at specific locations by identifying the number being integer or real in linux

Use matlab to search excel data file for time range and copy data into variable

How do I insert a row with a TimeUUIDType column in Cassandra?

Categories

Resources