CentOS | error apache spark file already exists Sparkcontext - apache-spark

I am unable to write to a file which I create. On Windows it works fine; on CentOS it says the file already exists and does not write anything.
File tempFile = new File("temp/tempfile.parquet");
tempFile.createNewFile();
parquetDataSet.write().parquet(tempFile.getAbsolutePath());
The following is the error (file already exists):
2020-02-29 07:01:18.007 ERROR 1 --- [nio-8090-exec-1] c.gehc.odp.util.JsonToParquetConverter : Stack Trace: {}org.apache.spark.sql.AnalysisException: path file:/temp/myfile.parquet already exists.;
2020-02-29 07:01:18.007 ERROR 1 --- [nio-8090-exec-1] c.gehc.odp.util.JsonToParquetConverter : sparkcontext close

The default save mode in Spark is ErrorIfExists. This means that if a file with the same name you intend to write already exists, Spark throws an exception like the one you got above. This happens in your case because you create the file yourself rather than leaving that task to Spark. There are two ways to resolve the situation:
1) Specify the save mode as "overwrite" or "append" in the write command:
parquetDataSet.write().mode("overwrite").parquet(tempFile.getAbsolutePath());
2) Or remove the createNewFile() call and pass the destination path directly to the Spark write command:
parquetDataSet.write().parquet("temp/tempfile.parquet");
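For reference, the same fix can be written with the SaveMode enum from the Java API; a minimal sketch reusing parquetDataSet from the question:
import org.apache.spark.sql.SaveMode;
// SaveMode.Overwrite replaces existing output; SaveMode.Append would add to it.
parquetDataSet.write().mode(SaveMode.Overwrite).parquet("temp/tempfile.parquet");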

Related

SoapUI specify alternate logdir as a property defined on the command line

I'm upgrading from SoapUI 5.4.0 to 5.7.0 and trying to put the log files in a specific directory. Note: The alternate error logs directory was working prior to the upgrade.
I have both the following specified in my JAVA_OPTS for SoapUITestCaseRunner
-Dsoapui.logroot="%SOAPUI_LOGSDIR%"
-Dsoapui.log4j.config="%SOAPUI_HOME%/soapui-log4j.xml"
In my soapui-log4j.xml I specify the error file as:
<RollingFile name="ERRORFILE"
             fileName="${soapui.logroot}/soapui-errors.log"
             filePattern="${soapui.logroot}/soapui-errors.log.%i"
             append="true">
The error file then gets created without resolving ${soapui.logroot} e.g.
$ find . -name "*errors*"
./${soapui.logroot}/soapui-errors.log
I also tried it as a lookup but ended up with this:
ERROR Unable to create file ${sys:soapui.logroot}/soapui-errors.log java.io.IOException: The filename, directory name, or volume label syntax is incorrect
Am I missing anything? Any ideas for next steps?
I tried replacing
fileName="${soapui.logroot}/soapui-errors.log"
with
fileName="${sys:soapui.logroot}/soapui-errors.log"
and it worked for me: Log4j 2 only resolves Java system properties through the sys: lookup prefix, so the bare ${soapui.logroot} was never substituted.
I no longer see the unresolved '${soapui.logroot}' directory being created.
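For completeness, the corrected appender (everything else in the configuration unchanged):
<RollingFile name="ERRORFILE"
             fileName="${sys:soapui.logroot}/soapui-errors.log"
             filePattern="${sys:soapui.logroot}/soapui-errors.log.%i"
             append="true">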

Writing file denied

I am getting an error writing a file that is driving me crazy.
I have a C# .NET 5 application running on Red Hat Linux.
I mounted a shared Windows folder using: sudo mount -t cifs -o username=MyDomainUsername,password=MyDomainUsernamePassword,domain=MyDomain,dir_mode=0777,file_mode=0777 //ipv4_from_destination/Reports /fileshare/Reports
Then I run the app, using just ./WebApi --urls=http://+:8060
The read/write test executes the following steps:
Creates a text file.
Writes to the text file.
Deletes the text file.
Creates a directory.
Creates a text file inside that directory.
Writes to the text file.
Deletes the text file.
Deletes the directory.
Now the problem:
The text file is created.
The write operation fails.
Here is part of the log:
Creating file: /fileshare/Reports/test.616db7d1-07fb-4599-a0cf-749e6a8b34ec.tmp...Ok
Writing file: /fileshare/Reports/test.616db7d1-07fb-4599-a0cf-749e6a8b34ec.tmp...[16:22:20 ERR] ID:87988856-a765-4474-9ed9-2f04aef35771 PATH:/api/about ERROR:System.UnauthorizedAccessException:Access to the path '/fileshare/Reports/test.616db7d1-07fb-4599-a0cf-749e6a8b34ec.tmp' is denied. TRACE: at System.IO.FileStream.WriteNative(ReadOnlySpan`1 source)
at System.IO.FileStream.FlushWriteBuffer()
at System.IO.FileStream.FlushInternalBuffer()
at System.IO.FileStream.Flush(Boolean flushToDisk)
at System.IO.FileStream.Flush()
at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
at System.IO.StreamWriter.Flush()
at WebApi.Controllers.ApplicationController.TestFileSystem(String folder) in xxxxxxx\WebApi\Controllers\ApplicationController.cs:line 116
What I have discovered so far:
I can create and delete files and directories.
I cannot write to files.
Can someone give me a hint on this?
Solved using the cifs mount option nobrl.
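That is, adding nobrl (which tells the client not to send byte-range lock requests to the server) to the options of the original mount command:
sudo mount -t cifs -o username=MyDomainUsername,password=MyDomainUsernamePassword,domain=MyDomain,dir_mode=0777,file_mode=0777,nobrl //ipv4_from_destination/Reports /fileshare/Reports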

shutil.rmtree() error when trying to remove NFS-mounted directory

Attempting to execute shutil.rmtree(path) on a directory managed by NFS consistently fails. Below you can see that os.rmdir(path) within shutil.rmtree(path) causes the exception. Is there a more robust way for me to achieve the expected result?
It appears that it removes all of the files, yet a hidden .nfs file remains in the directory for a short amount of time. I'm guessing that the process from which I'm calling rmtree has an open file handle to one of the files inside the directory, which, when deleted, apparently causes NFS to write a new hidden file. That would cause os.rmdir to fail on attempting to remove a non-empty directory.
Traceback (most recent call last):
File "/home/me/pre3/lib/python3.6/shutil.py", line 484, in rmtree
onerror(os.rmdir, path, sys.exc_info())
File "/home/me/pre3/lib/python3.6/shutil.py", line 482, in rmtree
os.rmdir(path)
OSError: [Errno 39] Directory not empty:
NFS details:
$ nfsstat -m
/home/me/nfs from XXX.YYY.ZZZ:/mnt/path/to/nfs
Flags: rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=50,retrans=2,sec=sys,mountaddr=REDACTED,mountvers=3,mountport=832,mountproto=udp,local_lock=none,addr=REDACTED
I'm using Python 3.6.6 on Ubuntu 16.04.
If the Python logging module is logging to the target output directory, it will keep an open file handle there. A workaround is to call logging.shutdown() first, then call shutil.rmtree(path). This is not a general answer to the broader question, however.
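A minimal sketch of that workaround (the directory path is a placeholder):
import logging
import shutil
logging.shutdown()                    # flush and close all logging handlers, releasing their open files
shutil.rmtree("/home/me/nfs/output")  # hypothetical NFS-mounted output directory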
You could try defining an error handler function to be passed to the onerror arg for shutil.rmtree: https://docs.python.org/3/library/shutil.html#shutil.rmtree
def handle_rmtree_err(function, path, excinfo):
    # function: the os call that failed (e.g. os.rmdir)
    # path: the path that made it fail
    # excinfo: the exception info, as returned by sys.exc_info()
    ...
shutil.rmtree(my_path, onerror=handle_rmtree_err)
There are all sorts of reasons why a process may be holding onto a file, so I can't tell you what the error handler should do exactly.
If you haven't figured out what is holding onto the file, try $ lsof | grep .nfsXXXX.
If all else fails you could time.sleep(secs) and retry shutil.rmtree.
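A sketch of that last resort (the delay and retry count are arbitrary choices):
import shutil
import time
for attempt in range(5):
    try:
        shutil.rmtree(path)
        break
    except OSError:
        if attempt == 4:
            raise          # still failing after all retries; give up
        time.sleep(1)      # give NFS time to release its hidden .nfs* file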

How to handle reading non-existing file in spark

I am trying to read some files from HDFS using Spark's sc.wholeTextFiles. I pass a list of the required files, yet the job keeps throwing
py4j.protocol.Py4JJavaError: An error occurred while calling o98.showString.
: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist:
if one of the files doesn't exist.
How can I skip the missing files and read only the ones that exist?
To know if a file exists (and delete it, in my case) I do the following:
import org.apache.hadoop.fs.{FileSystem, Path}
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
if (fs.exists(new Path(fullPath))) {
  println("Output directory already exists. Deleting it...")
  fs.delete(new Path(fullPath), true)
}
Use the Hadoop FileSystem from the JVM gateway on the Spark context to check for a file:
fs = sc._jvm.org.apache.hadoop.fs.FileSystem.get(sc._jsc.hadoopConfiguration())
fs.exists(sc._jvm.org.apache.hadoop.fs.Path("path/to/file.csv"))
For example:
fs.exists(sc._jvm.org.apache.hadoop.fs.Path("test.csv"))
True
fs.exists(sc._jvm.org.apache.hadoop.fs.Path("fail_test.csv"))
False
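Applied to the question, a sketch that filters out missing paths before calling sc.wholeTextFiles (the paths list is a placeholder):
paths = ["data/a.json", "data/b.json", "data/missing.json"]  # hypothetical input list
hadoop = sc._jvm.org.apache.hadoop.fs
fs = hadoop.FileSystem.get(sc._jsc.hadoopConfiguration())
existing = [p for p in paths if fs.exists(hadoop.Path(p))]   # keep only the paths that exist
rdd = sc.wholeTextFiles(",".join(existing))                  # the path argument may be a comma-separated list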

Win10: ASDF can't load system (ASDF_OUTPUT_TRANSLATION error)

Update 2
I think Faré is right, it's an output-translations problem.
So I declared the environment variable ASDF_OUTPUT_TRANSLATIONS and set it to E:/. Now (asdf:require-system "my-system") yields a different error: Uneven number of components in source to destination mapping: "E:/" which led me to this SO topic.
Unfortunately, his solution doesn't work for me. So I tried the other answer and set ASDF_OUTPUT_TRANSLATIONS to (:output-translations (t "E:/")). Now I get yet another error:
Invalid source registry (:OUTPUT-TRANSLATIONS (T "E:/")).
One and only one of
:INHERIT-CONFIGURATION or
:IGNORE-INHERITED-CONFIGURATION
is required.
(will be skipped)
Original Posting
I have a simple system definition but can't get ASDF to load it.
(asdf-version 3.1.5, sbcl 1.3.12 (upgraded to 1.3.18 AMD64), slime 2.19, Windows 10)
What I have tried so far
Following the ASDF manual: "4.1 Configuring ASDF to find your systems"
There it says:
For Windows users, and starting with ASDF 3.1.5, start from your
%LOCALAPPDATA%, which is usually ~/AppData/Local/ (but you can ask in
a CMD.EXE terminal echo %LOCALAPPDATA% to make sure) and underneath
create a subpath config/common-lisp/source-registry.conf.d/
That's exactly what I did:
Echoing %LOCALAPPDATA% which evaluates to C:\Users\my-username\AppData\Local
Underneath I created the subfolders config\common-lisp\source-registry.conf.d\ (in total: C:\Users\my-username\AppData\Local\config\common-lisp\source-registry.conf.d\)
The manual continues:
there create a file with any name of your choice but with the type conf, for instance 50-luser-lisp.conf; in this file, add the following line to tell ASDF to recursively scan all the subdirectories under /home/luser/lisp/ for .asd files: (:tree "/home/luser/lisp/")
That’s enough. You may replace /home/luser/lisp/ by wherever you want to install your source code.
In the source-registry.conf.d folder I created the file my.conf and put in it (:tree "C:/Users/my-username/my-systems/"). This folder contains a my-system.asd.
And here comes the weird part:
If I now type (asdf:require-system "my-system") in the REPL I get the following error:
Can't create directory C:\Users\my-username\AppData\Local\common-lisp\sbcl-1.3.12-win-x86\C\Users\my-username\my-systems\C:\
So the problem is not that ASDF doesn't find the file (it does), but that, for whatever reason, it tries to create a really weird subfolder hierarchy, which ultimately fails because at the end it tries to create the folder C: and Windows doesn't allow folder names containing a colon.
Another approach: (push path asdf:*central-registry*)
If I try
> (push #P"C:/Users/my-username/my-systems/" asdf:*central-registry*)
(#P"C:/Users/my-username/my-systems/"
#P"C:/Users/my-username/AppData/Roaming/quicklisp/quicklisp/")
> (asdf:require-system "my-system")
I get the exact same error.
I don't know what to do.
Update
Because of the nature of the weird path ASDF was trying to create I thought maybe I could bypass the problem by specifying a relative path instead of an absolute one.
So I tried
  (:tree "\\Users\\my-username\\my-systems")
in my conf file. Still the same error.
Ahem. It looks like an output-translations problem.
I don't have a Windows machine right now, but this all used to work last time I tried.
Can you set up some ad hoc output-translations for now that will make it work?
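Judging from the error in Update 2, the sexp form wants exactly one of :inherit-configuration or :ignore-inherited-configuration, so an untested sketch of such a value for ASDF_OUTPUT_TRANSLATIONS might look like:
;; Map all output to E:/asdf-cache/ (the directory name is an arbitrary choice)
;; and explicitly ignore any inherited configuration.
(:output-translations
 (t "E:/asdf-cache/")
 :ignore-inherited-configuration)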
