I'm trying to set up data import from files using Apache Tika and Solr. The shared docs folder is on an NFS-mounted share. Unfortunately, I can't perform the data import: one file is processed and then this exception is thrown:
[http-8080-3] ERROR org.apache.solr.handler.dataimport.DocBuilder - Exception while processing: files document : null:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to read content Processing Document # 2
at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:71)
....
at java.lang.Thread.run(Thread.java:744)
Caused by: java.io.IOException: Access denied
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createNewFile(File.java:1006)
at java.io.File.createTempFile(File.java:1989)
at org.apache.tika.io.TemporaryResources.createTemporaryFile(TemporaryResources.java:66)
at org.apache.tika.io.TikaInputStream.getFile(TikaInputStream.java:533)
at org.apache.tika.io.TikaInputStream.getFileChannel(TikaInputStream.java:564)
at org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:373)
at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:165)
at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:140)
... 26 more
So it seems to be a permissions problem while writing temporary files. Unfortunately, I have no idea where exactly Tika tries to write those temporary files, so I can't check the permissions on the NFS share. I checked the permissions on the Tika home folder (core configuration) and on the docs folder and its subfolders - all fine, including the problematic document.
I also tried changing the docs directory in my core config to a different one (on the same NFS share) and everything worked. So, do you have any idea how to track down my issue?
[EDIT]
I just noticed that it's not really a permissions problem on the docs folder. Everything works for .docx and .pdf files, but it fails on .doc files. Do you have any ideas?
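If I'm reading the stack trace right, the .doc case is the one where Tika's POIFSContainerDetector needs a real file on disk, so TikaInputStream spools the stream into a temporary file via File.createTempFile. By default that file is created in whatever java.io.tmpdir points to for the servlet container (Tomcat, judging by the http-8080 thread name), not anywhere under the docs folder, which would explain why the folder permissions all look fine. A minimal probe, run as the same user as the container, to check that directory (the class name is just an illustration):

import java.io.File;
import java.io.IOException;

public class TmpDirProbe {
    public static void main(String[] args) throws IOException {
        // Tika's TemporaryResources falls back to the standard JVM temp directory.
        System.out.println("java.io.tmpdir = " + System.getProperty("java.io.tmpdir"));
        // Same call the stack trace shows failing with "Access denied".
        File probe = File.createTempFile("tika-probe-", ".tmp");
        System.out.println("created " + probe.getAbsolutePath());
        probe.delete();
    }
}

If that probe fails, pointing the JVM's java.io.tmpdir at a directory the container user can write to (how you set that depends on how Tomcat/Solr is started) should let the .doc files go through.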
Related
I am getting an error writing a file that is driving me crazy.
I have a C# .NET 5 application running on Red Hat Linux.
I mounted a shared (Windows) folder using: sudo mount -t cifs -o username=MyDomainUsername,password=MyDomainUsernamePassword,domain=MyDomain,dir_mode=0777,file_mode=0777 //ipv4_from_destination/Reports /fileshare/Reports
Then I run the app using just ./WebApi --urls=http://+:8060
The read/write test executes the following steps:
Create a text file.
Write to the text file.
Delete the text file.
Create a directory.
Create a text file inside that directory.
Write to that text file.
Delete the text file.
Delete the directory.
Now the problem:
The text file is created
The write operation fails.
Here is part of the log:
Creating file: /fileshare/Reports/test.616db7d1-07fb-4599-a0cf-749e6a8b34ec.tmp...Ok
Writing file: /fileshare/Reports/test.616db7d1-07fb-4599-a0cf-749e6a8b34ec.tmp...[16:22:20 ERR] ID:87988856-a765-4474-9ed9-2f04aef35771 PATH:/api/about ERROR:System.UnauthorizedAccessException:Access to the path '/fileshare/Reports/test.616db7d1-07fb-4599-a0cf-749e6a8b34ec.tmp' is denied. TRACE: at System.IO.FileStream.WriteNative(ReadOnlySpan`1 source)
at System.IO.FileStream.FlushWriteBuffer()
at System.IO.FileStream.FlushInternalBuffer()
at System.IO.FileStream.Flush(Boolean flushToDisk)
at System.IO.FileStream.Flush()
at System.IO.StreamWriter.Flush(Boolean flushStream, Boolean flushEncoder)
at System.IO.StreamWriter.Flush()
at WebApi.Controllers.ApplicationController.TestFileSystem(String folder) in xxxxxxx\WebApi\Controllers\ApplicationController.cs:line 116
What I discovered so far:
I can create and delete the files and directories.
I cannot write to files.
Can someone give me a hint on this?
Solved using the cifs mount option nobrl.
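For anyone hitting the same thing: the re-mount differs from the original command only by the extra option at the end of the -o list; nobrl tells the cifs client not to send byte-range lock requests to the server.

sudo mount -t cifs -o username=MyDomainUsername,password=MyDomainUsernamePassword,domain=MyDomain,dir_mode=0777,file_mode=0777,nobrl //ipv4_from_destination/Reports /fileshare/Reports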
I am trying to copy a file from one folder to another on a mounted folder, and I see the following error. Note that this is on a mounted NFS folder, not on HDFS. The error comes from the line of code that does a create() of the destination file. The "No such file" error is not about the source.
java.io.IOException: Cannot run program "chmod": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1059)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:938)
at org.apache.hadoop.util.Shell.run(Shell.java:901)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1213)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1307)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:1289)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:840)
at org.apache.hadoop.fs.RawLocalFileSystem.mkOneDirWithMode(RawLocalFileSystem.java:522)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:562)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:534)
at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:705)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:456)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:443)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1118)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:1098)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:987)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:975)
The issue got resolved after setting the PATH variable so that it includes the directory containing chmod.
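For reference, Hadoop's RawLocalFileSystem shells out to chmod to set permissions (visible in the stack trace above), so the binary has to be resolvable from the environment of the process that launches the JVM. Assuming chmod lives in /bin or /usr/bin, as on most Linux systems, something like this in that environment is enough:

export PATH=/usr/bin:/bin:$PATH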
I have kept the resource files under the resources folder of the main jar, which is a shared module. I am trying to access the file with:
File pdf1File = new File (getClass().getClassLoader().getResource("com/org/pack/Page2_Template.pdf").getFile());
I tried getResourceAsStream() as well.
This is the error:
... 3 more
Caused by: java.io.FileNotFoundException: C:\wildfly\content\my-ear.ear\lib\my-jar-6.8.0.0.11-RELEASE-SNAPSHOT.jar\com\org\pack1\Page1_Template.pdf (The system cannot find the path specified)
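For what it's worth: getResource(...).getFile() only yields a usable path while the resource sits on the plain filesystem; once it is packaged inside a jar (here nested in an ear), the returned URL is not something java.io.File can open, which is why the FileNotFoundException shows the jar as if it were a directory. Reading the resource as a stream avoids that, and if a java.io.File is genuinely required (say, for a PDF library that only accepts files), the stream can be copied to a temp file first. A sketch under those assumptions, not the original code:

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

public class ResourceLoadSketch {
    // Loads the packaged PDF template as a stream and materializes it as a temp file.
    public static File loadTemplate() throws IOException {
        try (InputStream in = ResourceLoadSketch.class.getClassLoader()
                .getResourceAsStream("com/org/pack/Page2_Template.pdf")) {
            if (in == null) {
                throw new IOException("Resource not found on the classpath");
            }
            File tmp = File.createTempFile("Page2_Template", ".pdf");
            tmp.deleteOnExit();
            Files.copy(in, tmp.toPath(), StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        }
    }
}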
I run an Elasticsearch cluster in a production environment using Consul + overlay networking + Docker. I attached to a container, and when I changed elasticsearch.yml, another file named elasticsearch.yml~ appeared. Then when I ran Elasticsearch, I got this error:
Exception in thread "main" ElasticsearchException[Failed to load logging configuration]; nested: NoSuchFileException[/usr/local/biop/elasticsearch/config/elasticsearch.yml~];
Likely root cause: java.nio.file.NoSuchFileException: /usr/local/biop/elasticsearch/config/elasticsearch.yml~
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
at java.nio.file.Files.readAttributes(Files.java:1737)
at java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:225)
at java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
at java.nio.file.FileTreeWalker.next(FileTreeWalker.java:372)
at java.nio.file.Files.walkFileTree(Files.java:2706)
at org.elasticsearch.common.logging.log4j.LogConfigurator.resolveConfig(LogConfigurator.java:142)
at org.elasticsearch.common.logging.log4j.LogConfigurator.configure(LogConfigurator.java:103)
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:243)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)
I don't know why this file named 'elasticsearch.yml~' appears, and it can't be deleted. How can I solve this problem? Thanks.
The extra file is created by your text/code editor; it is a backup copy. Once you are done editing the file, you should simply delete it.
It seems that Elasticsearch tries to pick it up as a configuration file (its logging bootstrap walks the whole config directory, as the stack trace shows).
However, the error "[Failed to load logging configuration]" might not be caused by that file alone; there may also be something wrong with the logging configuration in your elasticsearch.yml.
I'm using tika-app.jar version 1.12 to try to find the list of corrupted files that can't be opened in a specified folder.
The problem is that when I tested on Windows, the log folder contains exceptions that let me know which files can't be opened, like this:
Caused by: org.apache.poi.openxml4j.exceptions.InvalidOperationException: Can't open the specified file: 'folder\mi-am-CV.docx'
But on Linux all I get in the log folder is a broad error like this:
WARN org.apache.tika.batch.FileResourceConsumer - <parse_ex resourceId="test-corrupted-2.doc">org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser#f6e9bd4
So I can't tell specifically which files are really corrupted and can't be opened.
Here's the shell command I use for that on Linux:
java -Dlog4j.debug -Dlog4j.configuration=file:log4j_driver.xml -cp "bin/*" org.apache.tika.cli.TikaCLI -JXX:-OmitStackTraceInFastThrow -JXmx5g -JDlog4j.configuration=file:log4j.xml -bc tika-batch-config-basic-test.xml -i /folder -o outxml -numConsumers 10
Thanks.