ParserError: Error tokenizing data. C error

I'm using a script ScriptGlobal.py that calls and executes two other scripts, script1.py and script2.py, via exec(open("./script2.py").read()) and exec(open("./script1.py").read()).
The output of my script1 is the creation of csv file.
df1.to_csv('file1.csv',index=False)
The output of my script2 is the creation of another csv file.
df2.to_csv('file2.csv',index=False)
In my ScriptGlobal.py I want to read the two files file1.csv and file2.csv, but I get this error:
ParserError: Error tokenizing data. C error: Expected 1 fields in line 16, saw 3
Is there a solution that avoids doing the manipulation manually in Excel?
Thank you

Did you try saving these two .csv files as ANSI? I had problems with .csv files when they were saved as UTF-8.
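Encoding can matter, but "Expected 1 fields in line 16, saw 3" usually means some rows carry more delimiters than the header row (stray commas, a different separator, or free text). A minimal pandas sketch of the symptom and one workaround (on_bad_lines needs pandas 1.3+; whether skipping rows is acceptable depends on your data):

```python
import io

import pandas as pd

# A file shaped like the one that fails: the header has one field,
# but one row has three ("Expected 1 fields ..., saw 3").
raw = "col\nok\nbad,extra,fields\nok2\n"

# For a file on disk you would also pass encoding="utf-8-sig" or
# encoding="cp1252" if the encoding is the suspect.
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")  # pandas >= 1.3
print(df["col"].tolist())  # ['ok', 'ok2']
```

Skipping hides the bad rows rather than fixing them; if the extra fields are real data, the right fix is in the script that writes the file.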

Related

Execute a subprocess that takes an input file and write the output to a file

I am using a third-party C++ program to generate intermediate results for the python program that I am working on. The terminal command that I use looks like follows, and it works fine.
./ukb/src/ukb_wsd --ppr_w2w -K ukb/scripts/wn30g.bin -D ukb/scripts/wn30_dict.txt ../data/glass_ukb_input2.txt > ../data/glass_ukb_output2w2.txt
If I break it down into smaller pieces:
./ukb/src/ukb_wsd - executable program
--ppr_w2w - one of the options/switches
-K ukb/scripts/wn30g.bin - parameter K indicates that the next item is a file (network file)
-D ukb/scripts/wn30_dict.txt - parameter D indicates that the next item is a file (dictionary file)
../data/glass_ukb_input2.txt - input file
> - shell command to write the output to a file
../data/glass_ukb_output2w2.txt - output file
The above works fine for one instance. I am trying to do this for around 70000 items (input files), so I found a way using the subprocess module in Python. The body of the Python function that I created looks like this:
with open('../data/glass_ukb_input2.txt', 'r') as input, open('../data/glass_ukb_output2w2w_subproc.txt', 'w') as output:
    subprocess.run(['./ukb/src/ukb_wsd', '--ppr_w2w', '-K', 'ukb/scripts/wn30g.bin', '-D', 'ukb/scripts/wn30_dict.txt'],
                   stdin=input,
                   stdout=output)
When I execute the function, it gives an error as follows (this error is no longer there; see the EDIT below):
...
STDOUT = subprocess.STDOUT
AttributeError: module 'subprocess' has no attribute 'STDOUT'
Can anyone shed some light on solving this problem?
EDIT
The error was due to a file named subprocess.py in the source directory, which masked Python's subprocess module. Once it was removed, the error went away.
But the program could not identify the input file given in stdin. I am thinking it has to do with having 3 input files. Is there a way to provide more than one input file?
EDIT 2
This problem is now solved with the current approach:
subprocess.run('./ukb/src/ukb_wsd --ppr_w2w -K ukb/scripts/wn30g.bin -D ukb/scripts/wn30_dict.txt ../data/glass_ukb_input2.txt > ../data/glass_ukb_output2w2w_subproc.txt',shell=True)
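shell=True works here, but the same command can also be run without a shell: pass the input file as a positional argument (as the terminal command does, rather than via stdin) and give subprocess.run an open file object for stdout. A portable sketch of that pattern, using the Python interpreter as a stand-in for ukb_wsd:

```python
import os
import subprocess
import sys
import tempfile

# Stand-ins for ../data/glass_ukb_input2.txt and the output file.
work = tempfile.mkdtemp()
in_path = os.path.join(work, "input.txt")
out_path = os.path.join(work, "output.txt")
with open(in_path, "w") as f:
    f.write("hello\n")

# The pattern: [program, *options, input_file], stdout redirected to a file.
# For the question this would be:
#   ['./ukb/src/ukb_wsd', '--ppr_w2w',
#    '-K', 'ukb/scripts/wn30g.bin', '-D', 'ukb/scripts/wn30_dict.txt',
#    in_path]
cmd = [sys.executable, "-c",
       "import sys; print(open(sys.argv[1]).read().upper(), end='')",
       in_path]
with open(out_path, "w") as out:
    subprocess.run(cmd, stdout=out, check=True)

print(open(out_path).read(), end="")  # HELLO
```

Avoiding the shell sidesteps quoting problems when looping over 70000 file names.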

String method tutorial in abinit - no output file produced

I'm trying to follow this tutorial in abinit: https://docs.abinit.org/tutorial/paral_images/
When trying to run abinit for any of the tstring files, no output file is produced. For instance, I copy the files tstring_01.in and tstring.files into the subdirectory work_paral_string, edit tstring.files with the appropriate file names, and run the command mpirun -n 20 abinit < tstring.files > log 2> err. No error message is shown, but no output is produced either. (The expected output file would be tstring_01.out.)
Any suggestions?

Deleting rows from csv based on input file

I have a daily process running on Linux that returns a set of failed user updates, and I need to delete these bad rows from the large user csv before importing it into a database.
My output file contains the USER_ID for each failed user.
I'm trying to create an updated file with these removed.
I have reviewed the multitude of examples available, but none seem to work correctly. I've included a sample of the error file and the user file.
The first row is a header, and should be ignored
My error file:
"USER_ID"
"CA781558"
"LN764767"
My user file:
"USER_ID","FIRSTNAME","LASTNAME","LAST_ACTIVITY","GROUD_UID"
"CA781558","Dani","Roper","2015-07-17 19:47:21","CF93DF0A-BD23AF87D20A"
"BT055163","Alexis","Richardo","2016-04-19 21:23:08","CB71F91E-7E638292ABD5"
"LN764767","Peter","Rajosz","2016-03-18 11:59:29","973C4AD2-63BA12BB91CD"
"TN479717","Jerry","Alindos","2015-06-12 07:37:56","1DA745BA-71CB88AA91EA"
"FR915163","Alexis","Richardo","2016-04-19 21:23:08","DBA8B91E-7A6B8292ABD5"
"GB135767","Peter","Rajosz","2016-03-18 11:59:29","AE3C4AD2-63BA181B91CD"
"SG839717","Jerry","Alindos","2015-06-12 07:37:56","1BA746BA-71CB88AA91EA"
Expected Output:
"USER_ID","FIRSTNAME","LASTNAME","LAST_ACTIVITY","GROUD_UID"
"BT055163","Alexis","Richardo","2016-04-19 21:23:08","CB71F91E-7E638292ABD5"
"TN479717","Jerry","Alindos","2015-06-12 07:37:56","1DA745BA-71CB88AA91EA"
"FR915163","Alexis","Richardo","2016-04-19 21:23:08","DBA8B91E-7A6B8292ABD5"
"GB135767","Peter","Rajosz","2016-03-18 11:59:29","AE3C4AD2-63BA181B91CD"
"SG839717","Jerry","Alindos","2015-06-12 07:37:56","1BA746BA-71CB88AA91EA"
Can you help? Thank you in advance.
You can use awk like this (the first pass stores the USER_IDs from err.txt in a delete set; the second pass prints the header plus every row whose first field is not in that set):
awk -F, 'FNR==NR{del[$1]; next} FNR==1 || !($1 in del)' err.txt file.txt
"USER_ID","FIRSTNAME","LASTNAME","LAST_ACTIVITY","GROUD_UID"
"BT055163","Alexis","Richardo","2016-04-19 21:23:08","CB71F91E-7E638292ABD5"
"TN479717","Jerry","Alindos","2015-06-12 07:37:56","1DA745BA-71CB88AA91EA"
"FR915163","Alexis","Richardo","2016-04-19 21:23:08","DBA8B91E-7A6B8292ABD5"
"GB135767","Peter","Rajosz","2016-03-18 11:59:29","AE3C4AD2-63BA181B91CD"
"SG839717","Jerry","Alindos","2015-06-12 07:37:56","1BA746BA-71CB88AA91EA"

Octave. dlmread: unable to open file 'myfile.txt'

I have a txt file with data in the form 2104,3,399900, i.e. int1,int2,int3. I have 50 rows in the same format.
Now I want to put the data into a variable, say, a.
I am using the command:
a = csvread('ex1data2.txt');
% I have tried a = dlmread('ex1data2.txt'); it doesn't work either
It produces an error:
error: dlmread: unable to open file 'ex1data2.txt'.
I have added the path of the directory that contains the file to Octave's search path.
How can I read the text file?
Thanks.

SAS: Unzipping multiple .dat files at once

I want to unzip a file that contains a .dat file using SAS. I have over 100 files to unzip, and therefore I want to do it automatically with SAS. I've tried the following:
FILENAME ZIPFILE SASZIPAM 'Z:\folder\file';
DATA newdata;
INFILE ZIPFILE(file.dat) dsd DLM='|';
INPUT var1 var2 var$;
RUN;
That doesn't work. Is there a problem when you use FILENAME with SASZIPAM to unzip a .dat file? I have SAS 9.3.
Is there a better alternative?
I guess you could use macros to do it, but it might get a bit messy. SAS has for many years offered the X command for calling out to the command prompt. This method requires a free file archiver/de-compressor (e.g. 7-Zip, WinRAR, etc.).
I like using the command line version of 7-Zip. I use a 32-bit machine, so you might have to use a different .exe file.
The syntax for unzipping multiple files in a directory:
data _null_;
X "cd (7-Zip_installed_location)"; /* the folder that contains 7za.exe */
X "7za e (zip_files_location)*.zip -o(output_destination)";
run;
For Example, I have some zip files in the folder called "Compressed":
data _null_;
X "cd C:\7-Zip Comm";
X "7za e C:\sasdata\Compressed\*.zip -oC:\sasdata\New";
run;
e stands for "extract".
* matches all files.
-o sets the output destination.
I have never seen SASZIPAM as a file reference type.
I would do this like this:
filename zipfile ZIP 'Z:\folder\file.zip';
DATA newdata;
INFILE ZIPFILE(file.dat) dsd DLM='|';
INPUT var1 var2 var$;
RUN;
It's possible I am missing that type in the documentation. Can you paste the ERROR you get in your log?
Double quotes around each path will resolve any issues while extracting .z files on Windows 7 (SAS 9.3). (FYI: the second statement is on one line.)
data _null_;
x '"C:\Program Files\7-Zip\7z.exe" x "E:\sas\config\Lev1\SASApp\Data\*.Z" -o"E:\sas\config\Lev1\SASApp\Data"';
run;
Regards
Bhavin
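For comparison, outside SAS the same batch extraction is a few lines with Python's standard zipfile module (the paths here are temporary stand-ins for Z:\folder and the output directory):

```python
import os
import tempfile
import zipfile

# Hypothetical stand-ins for the source and destination folders.
src = tempfile.mkdtemp()
dst = tempfile.mkdtemp()

# Make one demo archive containing a .dat member.
with zipfile.ZipFile(os.path.join(src, "file.zip"), "w") as z:
    z.writestr("file.dat", "1|2|x\n")

# Extract every .zip in the source folder into the destination.
for name in os.listdir(src):
    if name.lower().endswith(".zip"):
        with zipfile.ZipFile(os.path.join(src, name)) as z:
            z.extractall(dst)

print(os.listdir(dst))  # ['file.dat']
```

Note that zipfile only reads .zip archives, not .Z compress files; for those you still need an external tool such as 7-Zip.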
