Sorting records in a CSV file - node.js

How can you sort records in CSV files using TypeScript + Node.js? Sort by Id.
The number of records in a file can be up to 1 million.
Here's an example of the file's entries:

Here's a conceptual solution (sketched in code after the steps):
1. Create a new SQLite db with a table whose schema matches your columns.
2. Stream the data from the source CSV file, reading one line at a time: parse each line and insert its data into the table from step 1.
3. Create the output CSV file and append the header line.
4. Iterate over the table's rows in the desired sort order, one at a time: convert each row back into a CSV line in the correct column order, then append the line to the output file from step 3.
5. Cleanup: (optionally validate your new CSV file, and then) delete the SQLite db.
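A minimal sketch of those steps, assuming the better-sqlite3 package, a numeric Id in the first column, and no quoted fields with embedded commas (for real-world CSV, swap the ad-hoc split for a proper parser such as csv-parse):

import * as fs from "node:fs";
import * as readline from "node:readline";
import Database from "better-sqlite3";

async function sortCsvById(src: string, dest: string): Promise<void> {
  const db = new Database("sort-scratch.db");
  db.exec("CREATE TABLE rows (id INTEGER, rest TEXT)");
  const insert = db.prepare("INSERT INTO rows (id, rest) VALUES (?, ?)");
  // batch inserts inside a transaction: one transaction per row is far slower
  const insertBatch = db.transaction((lines: string[]) => {
    for (const line of lines) {
      const comma = line.indexOf(",");
      insert.run(Number(line.slice(0, comma)), line.slice(comma + 1));
    }
  });

  let header = "";
  let batch: string[] = [];
  const rl = readline.createInterface({ input: fs.createReadStream(src), crlfDelay: Infinity });
  for await (const line of rl) {
    if (header === "") { header = line; continue; } // first line is the header
    if (line === "") continue;
    batch.push(line);
    if (batch.length === 10000) { insertBatch(batch); batch = []; }
  }
  if (batch.length > 0) insertBatch(batch);
  db.exec("CREATE INDEX rows_id ON rows (id)"); // indexing after the load is cheaper

  const out = fs.createWriteStream(dest);
  out.write(header + "\n");
  const sorted = db.prepare("SELECT id, rest FROM rows ORDER BY id");
  for (const row of sorted.iterate() as IterableIterator<{ id: number; rest: string }>) {
    out.write(row.id + "," + row.rest + "\n"); // production code should also honor backpressure ('drain')
  }
  out.end();
  db.close();
  fs.unlinkSync("sort-scratch.db");
}

Storing everything after the Id verbatim sidesteps re-serializing columns, and SQLite does the external sorting, so memory use stays flat regardless of file size.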
If you can fit the entire parsed CSV data in memory at once, you can push each line into an array instead of using a db, then sort the array in place and iterate over its elements.
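For that in-memory variant, a sketch under the same parsing assumptions:

import * as fs from "node:fs";

function sortCsvInMemory(src: string, dest: string): void {
  const [header, ...rows] = fs.readFileSync(src, "utf8").split("\n").filter((l) => l !== "");
  // numeric compare on the first column (the Id)
  rows.sort((a, b) => Number(a.slice(0, a.indexOf(","))) - Number(b.slice(0, b.indexOf(","))));
  fs.writeFileSync(dest, [header, ...rows].join("\n") + "\n");
}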

Related

Automatic change in data type when reopening CSV file

After changing a column's data type from General or Number to Text and saving as a CSV file (the column contains numbers only), the data type changes back to General automatically when the file is reopened.
How do I stop it from changing automatically? I need the change kept in the CSV file for uploading to BigQuery.
Thanks.
I tried VBA, data transformation in Excel, the TEXT function, putting ' in front of the numbers, and the Text to Columns option.
CSV itself has no data types, but you can supply a schema when loading into BigQuery.
For example, using bq load with an inline schema string:
bq load --source_format=CSV mydataset.mytable ./myfile.csv schema:STRING,string:FLOAT
or with a schema file:
bq load --source_format=CSV mydataset.mytable ./myfile.csv schema.json
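For reference, a bq schema file is a JSON array of column definitions. A minimal schema.json matching the inline example above (the field names are just the placeholders used there):

[
  {"name": "schema", "type": "STRING"},
  {"name": "string", "type": "FLOAT"}
]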

Write set of sql files to same table

I have a set of *.sql files (>2000), each containing a table creation script. My goal is to write the contents of all the sql files to one table. I want to do this by creating the table before reading and writing the files. From each file I read, I want to keep only the INSERT INTO lines.
I currently read the files as follows:
with open(file) as f:
    sql = f.read()
This returns a string which holds the table creation statement as well as the INSERT statements. Is there a simple string-filtering way to return only the lines containing the INSERT statements?
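If each INSERT statement sits on its own line, a plain line filter is enough. A sketch of the idea in TypeScript (keeping one language across these examples; in Python the same filter is a list comprehension over the file object):

import * as fs from "node:fs";

// keep only the lines that begin an INSERT statement
function insertLines(sqlPath: string): string[] {
  return fs.readFileSync(sqlPath, "utf8")
    .split("\n")
    .filter((line) => line.trimStart().toUpperCase().startsWith("INSERT INTO"));
}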

Merge multiple CSV files into one CSV file and create a super schema in the final CSV file using a unix shell script or unix awk

I want to merge multiple CSV files into one CSV file using a UNIX shell script. The final CSV file should contain super-schema columns (all columns from every file); values for non-available columns should be empty or blank in the final CSV file.
Example:
file111.csv:
"E_TYPE","TIMESTAMP","EXEC_TIME","DBT_TIME","CALLOUT_TIME","CLIENT_IP"
"BBCout","20191011000022.423","95","0","2019-01-11T00:00:05.300Z","200.50.000.333"
"BBCout","20200403122024.123","96","1","2020-04-03T00:00:05.300Z","300.50.000.333"
"BBCout","20210102083426.543","92","0","2021-01-02T00:00:05.300Z","400.50.000.333"
file222.csv:
"E_TYPE","TIMESTAMP","TYPE","METHOD","TIME","RT_SIZE","URL","UID_DERIVED","CLIENT_IP"
"AACallout","20210215000030.815","REST","POST","61","71","""https://st.aaa.xxx.net/n1/yyy/zzz""","0055QAQ","200.50.000.333"
"AACallout","20201210000012.800","REST","GET","67","75","""https://st.aaa.xxx.net/n1/yyy/zzz""","0055BBBQ","300.00.000.111"
The final merged CSV should contain all columns, with non-available columns holding empty or blank values.
final CSV file:
"E_TYPE","TIMESTAMP","CLIENT_IP","EXEC_TIME","DBT_TIME","CALLOUT_TIME","TYPE","METHOD","TIME","RT_SIZE","URL","UID_DERIVED"
"BBCout","20191011000022.423","200.50.000.333","95","0","2019-01-11T00:00:05.300Z",,,,,,
"BBCout","20200403122024.123","300.50.000.333","96","1","2020-04-03T00:00:05.300Z",,,,,,
"BBCout","20210102083426.543","400.50.000.333","92","0","2021-01-02T00:00:05.300Z",,,,,,
"AACallout","20210215000030.815","200.50.000.333",,,,"REST","POST","61","71","""https://st.aaa.xxx.net/n1/yyy/zzz""","0055QAQ"
"AACallout","20201210000012.800","300.00.000.111",,,,"REST","GET","67","75","""https://st.aaa.xxx.net/n1/yyy/zzz""","0055BBBQ"

pyspark DataFrame get original row CSV string

I'm loading a CSV file into a Spark DataFrame.
At that point I do some parsing and validation; if the validation fails, I want to write the original CSV line to a different file.
Is it possible to get the original string back from the DataFrame object?
I thought about getting the line number from the DataFrame and extracting the line from the original file.
I guess it would be better to use the DF object, but if that's not possible, I'll extract from the file.

Linux: need to do an update instead of an insert in a shell script control file

I have a csv file whose data I want to load into my database table. In my control file I try to load the data, but I get a constraint error saying that one of the columns I am not selecting cannot be null. So instead of an insert, can I do an update in my control file?
The error is that Traffic_Profile_Name cannot be null, but I don't need this column, so I would rather do an update based on ID only:
Record 1: Rejected - Error on table TRAFFIC_PROFILE_TEMP.
ORA-01400: cannot insert NULL into ("SERVICE_USER"."TRAFFIC_PROFILE_TEMP"."TRAFFIC_PROFILE_NAME")
I only have the ID in the first column of the csv; for the row whose ID is 124, I want to update the other 5 columns to Y,Y,Y,Y,STANDARD. Here is my csv file below:
124,Y,Y,Y,Y,STANDARD
125,Y,Y,Y,Y,BENIGN
126,Y,N,N,N,BENIGN
140,Y,Y,N,N,FRAME
141,Y,Y,N,N,FRAME
My control file:
LOAD DATA
INFILE '/home/ye831c/migration/log/conv2015_10_LogicalComponent_CosProfile.csv'
BADFILE '/home/ye831c/migration/bad/conv2015_10_LogicalComponent_CosProfile.bad'
DISCARDFILE '/home/ye831c/migration/bad/conv2015_10_LogicalComponent_CosProfile.dsc'
APPEND
INTO TABLE TRAFFIC_PROFILE_temp
FIELDS TERMINATED BY ","
TRAILING NULLCOLS
(TRAFFIC_PROFILE_ID, PE_INGRESS_FLAG, PE_EGRESS_FLAG, CE_INGRESS_FLAG, CE_EGRESS_FLAG, COS_PROFILE_TYPE)
