Linux untar command shows file names as question marks - linux

A while ago I had compressed an application using Linux "tar -cf" command. At that time some of the file names were in a different language.
Now when I uncompress using "tar -xf" it shows the file names in the other language as question marks.
Is there a way that when I uncompress it keeps the original file names as they were?
Your help is highly appreciated.

Good question! Like most Unix commands, tar can pipe its output to another program, including the file name data. As described in the blog post below, GNU tar supports the --to-command option, which pipes each extracted file to a command instead of writing it directly to the directory.
http://osmanov-dev-notes.blogspot.com.br/2010/07/how-to-handle-filename-encodings-in.html
So it's a matter of writing a script that converts the file name to UTF-8, as is done in the cited post. Another option, also described there, is simply to extract everything and then run a script that converts every file name in the directory. The post includes a short PHP script that does this.
Finally, you can always write your own tar extractor in a scripting language, which is easy. Python, for example, has the tarfile module built into the standard library:
http://docs.python.org/2/library/tarfile.html#examples
You could use TarFile.extractfile(), shutil.copyfileobj() and str.decode() in a loop to extract the files manually while changing the file name encoding.
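The "extract everything, then convert the names" option can be sketched in plain shell as well. This is only a minimal sketch under a few assumptions: the bytes of the names on disk are intact (only the display shows question marks), you know the original encoding, iconv(1) is installed, and your find supports -mindepth (GNU and BSD find do). The function names to_utf8 and rename_tree are made up for this example.

```shell
# Sketch: convert file names under a directory from a known legacy
# encoding to UTF-8. Names containing newlines are not handled here.

# to_utf8 NAME FROM_ENCODING: print NAME re-encoded as UTF-8.
to_utf8() {
    printf '%s' "$1" | iconv -f "$2" -t UTF-8
}

# rename_tree DIR FROM_ENCODING: walk depth-first, so entries are renamed
# before the directories that contain them.
rename_tree() {
    find "$1" -mindepth 1 -depth | while IFS= read -r path; do
        dir=$(dirname "$path")
        old=$(basename "$path")
        new=$(to_utf8 "$old" "$2") || continue   # skip names iconv rejects
        [ "$old" = "$new" ] && continue          # name unchanged (e.g. pure ASCII)
        [ -e "$dir/$new" ] && continue           # never overwrite an existing entry
        mv -- "$path" "$dir/$new"
    done
}
```

After extracting into a scratch directory, something like rename_tree ./extracted CP1251 would do the conversion, where CP1251 is just a placeholder for whatever encoding the names were originally written in (iconv -l lists the supported ones). Run it only once per tree, since converting names that are already UTF-8 would mangle them.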
References:
http://www.gnu.org/software/tar/manual/tar.html#SEC84
http://docs.python.org/2/library/tarfile.html
http://www.manpagez.com/man/1/iconv/

Related

File name multiple extensions order

I want to create some bash scripts. They're actually going to be build scripts for Scala, so I'm going to identify them with my own .bld extension. They will be a sort of sub type of a shell script. Hence I want them to be easily recognised as a shell script. Should I call them
ProjectA.bld.sh //or
ProjectA.sh.bld
Edit: My natural inclination would be to go for the former but .tar.gz files seem to follow the latter naming convention.
A shell script doesn't mind what you call it.
It just needs to:
be executable (chmod +x)
be in your PATH (or be invoked with an explicit path such as ./ProjectA.bld.sh)
contain a "shebang" as its first line, e.g. #!/bin/sh
The shebang determines which program is used to execute your script.
Call it ProjectA.bld.sh (or preferably buildProjectA.sh).
The .sh extension (although not necessary for the script to run) will allow you and everyone else to easily recognise it as a shell script.
While for the most part, naming conventions like this don't really matter at all to Unix/Linux, the usual convention is for the "extensions" to be in the order of the steps used to create the file. So, for example, a file named foo.tar.bz2.gpg.part01 would indicate a sequence of operations like the following:
Use tar to create foo.tar, which contains some other files
Use bzip2 to compress foo.tar into foo.tar.bz2
Use gnupg to encrypt foo.tar.bz2 into foo.tar.bz2.gpg
Use split or something similar to break the file into chunks for transmission/storage, resulting in one or more foo.tar.bz2.gpg.part* files.
The naming conventions are mostly just for human semantic meaning, though, and there's nothing stopping you from doing exactly the opposite, or even something completely random, except your own ability to remember exactly what you did...
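To make the convention concrete, here is a small sketch that peels the extensions off a name with shell parameter expansion. The extensions come out outermost first, which is the order in which the steps would have to be undone. The function name list_extensions is invented for this example.

```shell
# list_extensions NAME: print each extension of NAME on its own line,
# starting with the last (outermost) one.
list_extensions() {
    name=$1
    # "${name%.*}" drops the shortest trailing ".something"; when nothing
    # is dropped there are no extensions left.
    while [ "${name%.*}" != "$name" ]; do
        printf '%s\n' "${name##*.}"   # text after the last dot
        name=${name%.*}
    done
}
```

For foo.tar.bz2.gpg.part01 this prints part01, gpg, bz2 and tar, one per line: reassemble the parts, decrypt, decompress, then untar.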

"batch" files in bash

I want to make a "batch" file, so to speak, for some bash commands (convert.sh). I think it would be best to describe the situation. I have a ton of mp4 videos that I want converted into mp3s. It would take me an unreasonable amount of time to convert them using ffmpeg -i /root/name\ of\ video.mp4 /name\ of\ video.mp3 for every single video, not to mention that all the file names are long and complicated, so typos are a possibility. So I want to know how to make a shell script (for bash) that will take every file with the extension .mp4 and convert it to an .mp3 with the same name, one by one: it converts one, and when it is done it moves on to the next one. I'm using a lightweight version of Linux, so third-party software probably won't work, and I need to use ffmpeg...
Many thanks in advance for any assistance you can provide.
PS: I can't seem to get the formatting syntax on the website to work right, so if someone can format this for me and maybe post a link to a manual on how it works, that would be much appreciated =)
PPS: I understand that questions about using the ffmpeg command should be asked on Super User. However, since I don't really have any questions about the specific command and this relates more to scripting a bash file, I figure this is the right place.
A bash for loop should do it for you in no time:
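SRC_DIR=/root
DST_DIR=/somewhereelse
for FILE in "${SRC_DIR}"/*.mp4
do
    # Skip the unexpanded pattern when no .mp4 files exist.
    [ -e "${FILE}" ] || continue
    # basename strips the directory and the .mp4 suffix; the quotes keep spaces intact.
    ffmpeg -i "${FILE}" "${DST_DIR}/$(basename "${FILE}" .mp4).mp3"
done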
Sorry - I don't know the ffmpeg command line options, so I just copied exactly what's in your post.
1) Use find (the -print0 and -0 options, available in GNU and BSD find/xargs, keep file names with spaces intact):
find . -name '*.mp4' -print0 | xargs -0 -n 1 ./my_recode_script.sh
2) my_recode_script.sh: see this question
so you can easily change the extension of the output file name.
The rest is a trivial scripting job:
ffmpeg -i "$name" "$new_name" # in my_recode_script.sh, after changing the extension
This is enough for a one-time script. If you want something reusable, wrap it in yet another script that receives the path to the directory and the source and target extensions, and calls the other parts :)
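A reusable wrapper along those lines might look like the sketch below. The function names target_name and recode_dir are made up for illustration, and the ffmpeg invocation is simply the one from the question (no extra options are assumed).

```shell
# target_name FILE NEW_EXT: print FILE with its last extension replaced.
target_name() {
    printf '%s\n' "${1%.*}.$2"
}

# recode_dir DIR FROM_EXT TO_EXT: convert every DIR/*.FROM_EXT, one by one.
recode_dir() {
    for src in "$1"/*."$2"; do
        [ -e "$src" ] || continue          # the pattern matched nothing
        dst=$(target_name "$src" "$3")
        # stdin comes from /dev/null so ffmpeg cannot consume input meant
        # for the calling script.
        ffmpeg -i "$src" "$dst" </dev/null
    done
}
```

Calling recode_dir /root mp4 mp3 would then write each .mp3 next to its .mp4. Because every expansion is quoted, names with spaces go through unchanged.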

Getting linux terminal value from my application

I am developing a Qt application in Linux. I wanted to pass Linux commands to a terminal. That worked but now i also want to get a response from the terminal for this specific command.
For example,
ls -a
As you know this command lists the directories and files of the current working directory. I now want to pass the returned values from the ls call to my application. What is a correct way to do this?
QProcess is the Qt class that lets you spawn a process and read its output. There's an example of reading the result of a command on that page.
popen(), a standard C library function available on Linux, returns a FILE * that you can read like any other stream; it may help you.
Parsing ls(1) output is dangerous -- make a few files with funny names in a directory and test it out:
touch "one file"
touch "`printf "\x0a\x0a\x0ahello\x0a world"`"
That creates two files in the current working directory. I expect your attempts to parse ls(1) output won't work. This might be alright if you're showing the results to a human, (though a human will be immensely confused if a filename includes output that looks just like ls(1) output!) but if you're trying to present something like an explorer.exe or Finder.app representation of files in the filesystem, this is horribly broken.
Instead, use opendir(3), readdir(3), and closedir(3) to read directory entries yourself. This will be safer, more portable, and (as a side benefit) slightly better performing.
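In a C or C++ application the readdir(3) family is the way to go. The same idea (ask the filesystem for the entries instead of parsing text) can be tried out from a shell with globbing, which never splits a name on whitespace or newlines. A minimal sketch, with count_entries as an invented name:

```shell
# count_entries DIR: count the entries in DIR, hidden ones included,
# without going through ls(1).
count_entries() {
    n=0
    # The three patterns cover normal names, dot-files, and names that
    # start with two dots, while leaving out "." and "..".
    for entry in "$1"/* "$1"/.[!.]* "$1"/..?*; do
        # An unmatched pattern stays literal; skip it. -L keeps dangling
        # symlinks, for which -e is false.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        n=$((n + 1))
    done
    printf '%s\n' "$n"
}
```

With the two files created above, count_entries . reports exactly two entries for them, whereas counting lines of ls output would be thrown off by the embedded newlines.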

How can you tell what files are currently open by any user?

I am trying to write a script or a piece of code to archive files, but I do not want to archive anything that is currently open. I need to find a way to determine which files in a directory are open. I want to use either Perl or a shell script, but can try to use other languages if needed. It will be in a Linux environment and I do not have the option to use lsof. I have also had inconsistent results with fuser. Thanks for any help.
I am trying to take log files in a directory and move them to another directory. If the files are open however, I do not want to do anything with them.
You are approaching the problem incorrectly. You wish to keep files from being modified underneath you while you are reading, and cannot do that without operating system support. The best that you can hope for in a multi-user system is to keep your archive metadata consistent.
For example, if you are creating the archive directory, make sure that the number of bytes stored in the archive matches the directory. You can checksum the file contents before and after reading the filesystem and compare that with what you wrote to the archive and perhaps flag it as "inconsistent".
What are you trying to accomplish?
Added in response to comment:
Look at logrotate to steal ideas about how to handle this consistently, or just have it do the work for you. If you are concerned that renaming files will break the processes that are currently writing to them, take a look at man 2 rename:
rename() renames a file, moving it between directories if required. Any other hard links to the file (as created using link(2)) are unaffected. Open file descriptors for oldpath are also unaffected.
If newpath already exists it will be atomically replaced (subject to a few conditions; see ERRORS below), so that there is no point at which another process attempting to access newpath will find it missing.
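The "open file descriptors are unaffected" part is easy to check from a shell. In this small sketch (the function name and the app.log file name are invented), a descriptor stands in for a process that is still writing to the log while it gets rotated by a rename within the same directory:

```shell
# demo_rename_keeps_fd DIR: show that a writer keeps writing to a file
# after it has been renamed.
demo_rename_keeps_fd() {
    dir=$1
    exec 3>>"$dir/app.log"                 # the "writer" opens the log on fd 3
    mv "$dir/app.log" "$dir/app.log.1"     # rotate it by renaming
    echo "written after rename" >&3        # the write follows the open file
    exec 3>&-                              # close the descriptor
    cat "$dir/app.log.1"                   # the line ended up in the renamed file
}
```

So moving a log that is still open does not break the writer; it simply keeps appending to the moved file until it reopens its log, which is why logrotate usually signals the process afterwards.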
Try ls -l /proc/*/fd/* as root.
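Building on that, here is a rough sketch that filters the /proc entries down to files below one directory. It assumes a Linux /proc and a readlink(1) command; run as an ordinary user it only sees your own processes. The function name open_files_under is made up, and DIR should be an absolute path without symlinks, since that is what the kernel reports.

```shell
# open_files_under DIR: print each file below DIR that some visible
# process currently has open, one per line, without duplicates.
open_files_under() {
    for fd in /proc/[0-9]*/fd/*; do
        # Processes come and go, and some are not readable; skip failures.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            "$1"/*) printf '%s\n' "$target" ;;
        esac
    done | sort -u
}
```

Keep in mind this is only a snapshot: a file can be opened right after the check, so it narrows the race rather than removing it.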
msw has answered the question correctly, but if you want the list of open files, the lsof command will give it to you.

Mount executable as file on Linux/Unix filesystem

Is it possible to make an executable look like a read-only file on Linux, such that opening the "file" for reading actually executes the file and makes its stdout available for reading as if it were data in the "file"? It should be openable by any program that knows how to open a file for reading, for example 'cat'.
Look at popen. Basically what you are describing is a pipe.
P.S. If you need language specific help, edit the question and add the language/environment you're working in and I'll try to provide more specifics.
Use FUSE
On unix-like OS's you can send the output of a program to a named pipe that is opened by another program. Look at the mkfifo command to create the named pipe. The named pipe works a lot like a file, with some limitations. For example, it is not seekable.
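A minimal sketch of the named-pipe approach follows; serve_once is an invented helper name. The command's stdout is connected to the FIFO, and opening the FIFO for writing blocks until some reader (cat, for instance) opens the other end.

```shell
# serve_once FIFO COMMAND [ARGS...]: create FIFO if needed, then run
# COMMAND with its stdout connected to it, for a single reader.
serve_once() {
    fifo=$1
    shift
    [ -p "$fifo" ] || mkfifo "$fifo"
    "$@" >"$fifo"      # blocks here until a reader opens the FIFO
}
```

For example, run serve_once /tmp/now date & in one shell and cat /tmp/now in another. Each call serves exactly one read, so something that should look like a permanent "file" needs a loop around it, and it still will not be seekable.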
Seems like you could pipe the output of the program into whatever you are using to "read". The problem is if you want to open the executable in say emacs or vim or whatever, it's not a matter of the executable so much as the editor doesn't know any other way to interpret it.

Resources