Add comments next to files in Linux


I'm interested in simply adding a comment next to my files in Linux (Ubuntu). An example would be:
info user ... my_data.csv Raw data which was sent to me.
info user ... my_data_cleaned.csv Raw data with duplicates filtered.
info user ... my_data_top10.csv Cleaned data with only top 10 values selected for each ID.
So, sort of the way you can comment commits in Git. I don't particularly care about searching on these tags, filtering them, etc.; I just want to see them when I list files in a directory. Bonus if the comments/tags follow the document around as I copy or move it.

Most filesystem types support extended attributes where you could store comments.
So for example to create a comment on "foo.file":
xattr -w user.comment "This is a comment" foo.file
The attributes can be copied or moved with the file; just be aware that many utilities require special options to copy extended attributes.
Then to list files with comments use a script or program that grabs the extended attribute. Here is a simple example to use as a starting point, it just lists the files in the current directory:
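#!/bin/sh
# List every file in the current directory, with its comment if it has one.
# A glob is used instead of parsing ls output, so odd file names are safe.
for FILE in *; do
    [ -e "$FILE" ] || continue    # the directory is empty
    comment=$(xattr -p user.comment "$FILE" 2>/dev/null)
    if [ -n "$comment" ]; then
        echo "$FILE Comment: $comment"
    else
        echo "$FILE"
    fi
done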
The xattr command is really slow and poorly written (it doesn't even return error status) so I suggest something else if possible. Use setfattr and getfattr in a more complex script than what I have provided. Or maybe a custom ls command that is aware of the user.comment attribute.
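A minimal sketch of the same listing built on setfattr and getfattr (from the attr package on Ubuntu), assuming the filesystem allows user.* attributes. The function name list_with_comments is made up for illustration and is not part of any tool.

```shell
#!/bin/bash
# Set a comment first, for example:
#   setfattr -n user.comment -v "Raw data which was sent to me." my_data.csv

# list_with_comments [DIR]: hypothetical helper that prints each entry in DIR
# (default: current directory) followed by its user.comment attribute, if any.
list_with_comments() {
    local dir=${1:-.} file comment
    for file in "$dir"/*; do
        [ -e "$file" ] || continue      # unmatched glob: the directory is empty
        # --only-values prints just the attribute value. Errors (no such
        # attribute, no xattr support, getfattr missing) are silenced and
        # simply leave the comment empty.
        comment=$(getfattr --only-values -n user.comment -- "$file" 2>/dev/null)
        if [ -n "$comment" ]; then
            printf '%s  Comment: %s\n' "${file##*/}" "$comment"
        else
            printf '%s\n' "${file##*/}"
        fi
    done
}
```

When copying, GNU cp needs --preserve=xattr and rsync needs -X to carry the attribute along; a mv within the same filesystem keeps it.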

This is a moderately serious challenge. Basically, you want to add attributes to files, keep the attributes when the file is copied or moved, and then modify ls to display the values of these attributes.
So, here's how I would attack the problem.
1) Store the information in an SQLite database. You can probably get away with one table. The table should contain the complete path to the file, and your comment. I'd name the database something like ~/.dirinfo/dirinfo.db. I'd store it in a subfolder, because you may find later on that you need other information in this folder. It'd be nice to use inodes rather than pathnames, but they change too frequently. Still, you might be able to do something where you store both the inode and the pathname, and retrieve by pathname only if the retrieval by inode fails, in which case you'd then update the inode information.
2) Write a bash script to create/read/update/delete the comment for a given file.
3) Write another bash function or script that works with ls. I wouldn't call it "ls" though, because you don't want to mess with all the command line options that are available to ls. You're going to be calling ls always as ls -1 in your script, possibly with some sort options, such as -t and/or -r. Anyway, your script will call ls -1 and loop through the output, displaying the file name, and the comment, which you'll look up using the script from 2). You may also want to add file size, but that's up to you.
4) Write functions to replace mv and cp (and ln??). These would be wrapper functions that would update the information in your table, and then call the regular Unix versions of these commands, passing along any arguments received by the functions (i.e. "$@"). If you're really paranoid, you'd also do it for things like scp, which can be used (inefficiently) to copy files locally. Still, it's unlikely you'll catch all the possibilities. What if someone else does a mv on your file, who doesn't have the function you have? What if some script moves the file by calling /bin/mv? You can't easily get around these kinds of issues.
Or if you really wanted to get adventurous, you'd write some C/C++ code to do this. It'd be faster, and honestly not all that much more challenging, provided you understand fork() and exec(). I can't recall whether sqlite has a C API. I assume it does. You'd have to tangle with that, too, but since you only have one database, and one table, that shouldn't be too challenging.
You could do it in perl, too, but I'm not sure that it would be that much easier in perl, than in bash. Your actual code isn't that complex, and you're not likely to be doing any crazy regex stuff or string manipulations. There are just lots of small pieces to fit together.
Doing all of this is much more work than should be expected for a person answering a question here, but I've given you the overall design. Implementing it should be relatively easy if you follow the design above and can live with the constraints.
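A rough sketch of steps 1, 2 and 4 using the sqlite3 command-line tool, assuming sqlite3 and GNU realpath are installed. The table layout, the DIRINFO_DB variable and all function names are invented for illustration; the wrapper only handles the plain two-argument form of mv, and it keys on the path alone (no inode fallback).

```shell
#!/bin/bash
# Hypothetical database location; override DIRINFO_DB to use another file.
DIRINFO_DB=${DIRINFO_DB:-$HOME/.dirinfo/dirinfo.db}

# Double any single quotes so a value can sit inside an SQL string literal.
sql_quote() {
    local q="'"
    printf '%s' "${1//$q/$q$q}"
}

# Create the database directory and the single table on first use.
comment_init() {
    mkdir -p "$(dirname "$DIRINFO_DB")"
    sqlite3 "$DIRINFO_DB" \
        'CREATE TABLE IF NOT EXISTS comments (path TEXT PRIMARY KEY, comment TEXT);'
}

# comment_set FILE TEXT: create or update the comment for FILE.
comment_set() {
    local path
    path=$(realpath -- "$1") || return
    sqlite3 "$DIRINFO_DB" "INSERT OR REPLACE INTO comments (path, comment)
        VALUES ('$(sql_quote "$path")', '$(sql_quote "$2")');"
}

# comment_get FILE: print the stored comment (prints nothing if there is none).
comment_get() {
    local path
    path=$(realpath -- "$1") || return
    sqlite3 "$DIRINFO_DB" \
        "SELECT comment FROM comments WHERE path = '$(sql_quote "$path")';"
}

# mv_with_comment SRC DEST: move the file, then point its comment at the new path.
mv_with_comment() {
    local old new target=$2
    old=$(realpath -- "$1") || return
    # If DEST is an existing directory, mv places SRC inside it.
    [ -d "$2" ] && target=$2/$(basename -- "$1")
    command mv -- "$1" "$2" || return
    new=$(realpath -- "$target") || return
    sqlite3 "$DIRINFO_DB" "UPDATE OR REPLACE comments
        SET path = '$(sql_quote "$new")' WHERE path = '$(sql_quote "$old")';"
}
```

As the answer warns, anything that moves the file without going through the wrapper leaves a stale path in the table.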

Related

Unix create multiple files with same name in a directory

I am looking for some kind of logic in Linux where I can place files with the same name in a directory or file system.
For e.g. i create a file abc.txt, so the next time if any process creates abc.txt it should automatically check and make the file named as abc.txt.1 should be created, then next time abc.txt.2 and so on...
Is there a way to achieve this?
Any logic or third party tools are also welcomed.

You ask,
For e.g. i create a file abc.txt, so the next time if any process
creates abc.txt it should automatically check and make the file named
as abc.txt.1 should be created
(emphasis added). To obtain such an effect automatically, for every process, without explicit provision by processes, it would have to be implemented as a feature of the filesystem containing the files. Such filesystems are called versioning filesystems, though typically the details are slightly different from what you describe. Most importantly, however, although such filesystems exist for Linux, none of them are mainstream. To the best of my knowledge, none of the major Linux distributions even offers one as a distribution-supported option.
Although it's a bit dated, see also Linux file versioning?
You might be able to approximate that for many programs via a customized version of the C standard library, but that's not foolproof, and you should not expect it to have universal effect.
It would be an altogether different matter for an individual process to be coded for such behavior. It would need to check for existing files and choose an appropriate name when opening each new file. In doing so, some care needs to be taken to avoid related race conditions, but it can be done. Details would depend on the language in which you are writing.
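For the single-process case, here is a minimal bash sketch of choosing abc.txt, abc.txt.1, abc.txt.2 and so on without a check-then-create race. It relies on bash's noclobber option; the function name create_unique is made up for illustration.

```shell
#!/bin/bash
# create_unique NAME: hypothetical helper that creates NAME, or NAME.1,
# NAME.2, ... if the earlier names are taken, and prints the name it created.
create_unique() {
    local base=$1 candidate=$1 n=0
    # Give up early if the directory is not writable, otherwise the loop
    # below would never succeed.
    [ -w "$(dirname -- "$base")" ] || return 1
    # With noclobber (set -C) the ">" redirection refuses to overwrite an
    # existing file, so "does it exist?" and "create it" are a single step
    # rather than a separate test followed by a create.
    until ( set -C; : > "$candidate" ) 2>/dev/null; do
        n=$((n + 1))
        candidate=$base.$n
    done
    printf '%s\n' "$candidate"
}

# Usage:
#   out=$(create_unique abc.txt)   # creates the file and returns its name
#   echo "some data" >> "$out"
```

This only helps processes that call the helper; as noted above, other programs creating abc.txt directly are unaffected.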

You can use Bash brace expansion to achieve this. For example, if I wanted to make 10 files all with the same name, but each having a unique number, I would do the following:
touch my_file{01..10}.txt
This would create 10 files starting at 01 all the way to 10. This method is also handy for looping over files in a sequence or if you're also creating directories.
Now, if I am reading your question right, you're asking whether, when you move or create a file in a directory where that name is already taken, a script can automatically create a new file for you. If that is the case, just test for the file, and if it exists, move it aside and mark it. Personally, I use timestamps to do so.
Logic:
# [ -f ] tests whether the file is present
if [ -f "$MY_FILE_NAME" ]; then
    # If the file is present, move it aside and append this shell's PID,
    # so the new name is unlikely to collide with an existing one.
    # The braces matter: $MY_FILE_NAME_$$ would expand a variable named MY_FILE_NAME_.
    mv "$MY_FILE_NAME" "${MY_FILE_NAME}_$$"
    mv "$MY_NEW_FILE" .
else
    # Move or make the file here
    mv "$MY_NEW_FILE" .
fi
As you can see the logic is very simple. Hope this helps.
Cheers

I don't know about your particular use case, but you may want to look at logrotate:
https://wiki.archlinux.org/index.php/Logrotate

Interactive quiz in Bash (Multiple Q's)

I'm teaching an introductory Linux course and have abandoned the paper-based multiple-choice quizzes and have created interactive quizzes in Bash. My quiz script is functional, but kind of quick-and-dirty, and now I'm in the improvement phase and looking for suggestions.
First off, I'm not looking to automate the grading, which certainly simplifies things.
Currently, I have a different script file for each quiz, and the questions are hard-coded. That's obviously terrible, so I created a .txt file holding the questions, delimited by lines with "question 01" etc. I can loop through and use sed -n "/^quest.*$i\$/,/^quest.*$(($i+1))\$/p", but this prints the delimiter lines. I can pipe through sed "/^q/d" or head -n-1|tail -n+2 to get rid of them, but is there a better way?
Second issue: For questions where the answer is an actual command, I'm printing a [user]$ prompt, but for short-answer, I'm using a >. In my text file, for each question, the last line is the prompt to use. Initially, I was thinking I could store the question in a variable and |tail -1 it to get the prompt, but duh, when you store it it strips newlines. I want the cursor to immediately follow the prompt, so I either need to pass it to read -p or strip the final newline from the output. (Or create some marker in the file to differentiate between the $ and > prompt.) One thought I had was to store each question in a separate file and just cat it to display it, making sure there was no newline at the end. That might be kind of a pain to maintain, but it would solve both problems. Thoughts?
Now to how I'm actually running the quiz. This is a Fedora 20 box, and I tried copying bash and setuid-ing it to me so that it would be able to read the quiz script that the students couldn't normally read, but I couldn't get that to work. After some trial and error, I ended up copying touch and setuid-ing it to me, then using that to create their answer file in a "submit" directory with an ACL so new files have o=w so they can write to their answer file (in the quiz with >> echo) but not read it back or access the directory. The only major loophole I see with this is that they can delete their file by name and start the quiz over with no record of having done so. Since I'm not doing any automatic grading, I'm not terribly concerned with the students being able to read the script file, although if I'm storing the questions separately, I suppose I could make a copy of cat and setuid it to read in files that they can't access.
Also, I realize that Bash is not the best choice for this, and learning the required simple input/output for Python or something better would not take much effort. Perhaps that's my next step.

1) You could use
sed -n "/^quest.*$i\$/,/^quest.*$(($i+1))\$/ { //!p }"
Here // repeats the last attempted pattern, which is the opening pattern in the first line of the range and the closing pattern for the rest.
...by the way, if you really want to do this with sed, you'd better be damn sure that i is a number, or you'll run into code injection problems.
2) You can store multiline command output in a variable without problems. You just have to make sure you quote the variable from then on, so the shell does not word-split it and collapse the newlines. For example,
QUESTION=$(sed -n "/^quest.*$i\$/,/^quest.*$(($i+1))\$/ { //!p }" questions.txt)
echo -n "$QUESTION" # <-- the double quotes are important here.
The -n option to echo tells echo to not append a newline at the end, which should take care of your prompt problem.
3) Yes, well, hackery breeds more hackery. If you want to lock this down, the first order of business would be to not give students a shell on the test machine. You could put your script behind inetd and have the students fill it out with telnet or something, I suppose, but...really, why bash? If it were me, I'd knock something together with a web server and one of the several gazillion php web quiz frameworks. Although I also have to wonder why it's a problem if students can see the questions and the answers they gave. It's not like all students use the same account and can see each other's answers, is it? (is it?) Don't store an answer key on the same machine and you shouldn't have a problem.
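Points 1) and 2) can be combined into one helper. This is a sketch only, meant to run inside a script: it assumes the delimiter lines look exactly like "question 01", "question 02" and so on (zero-padded to two digits), and the function name get_question is invented for illustration.

```shell
#!/bin/bash
# get_question FILE N: hypothetical helper that prints the lines between the
# "question NN" delimiter and the next delimiter, excluding both delimiters.
get_question() {
    local file=$1 n=$2 cur next
    # Refuse anything that is not a plain number before it reaches sed.
    case $n in
        ''|*[!0-9]*) echo "get_question: '$n' is not a number" >&2; return 1 ;;
    esac
    # 10#$n forces base 10 so that input such as 08 is not read as octal.
    cur=$(printf '%02d' "$((10#$n))")
    next=$(printf '%02d' "$((10#$n + 1))")
    # Inside the range, // reuses the last regex tried, so both delimiter
    # lines are skipped and everything in between is printed.
    sed -n "/^question $cur\$/,/^question $next\$/{//!p;}" "$file"
}

# Usage: command substitution drops the final newline, so the cursor stays
# right after the prompt that ends the question.
#   QUESTION=$(get_question questions.txt 1)
#   printf '%s' "$QUESTION"
```

printf '%s' is used in the usage lines instead of echo -n because it prints the text as-is even if a question happens to start with a dash or contain backslashes.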

ln has unexpected behavior when using a wildcard

I am planning on filing a bug on coreutils for this, as this behavior is unexpected, and there isn't any practical use for it in the real world... Although it did make me chuckle at first, as I never even knew one could create files with wildcard in their filename. How practical is a filename with a wildcard in it? Who even uses such a feature?
I recently ran a bash command similar to this:
ln -s ../../avatars/* ./
Unfortunately, I did not add the correct number of "../", so rather than providing me with an informative error, it merely created a link to a "*" file which does not exist. I would expect only the quoted form to do that:
ln -s "../../avatars/*" ./
As this is the proper way to address such a filename.
Before I submit a bug on coreutils, I would like the opinion of others. Is there any practical use for this behavior, or should ln provide a meaningful error message?
And yes, I know one can just link to the entire directory, rather than each file within, but I do not wish newly created files to be replicated to the old location. There are only a few files in there that are being linked right now.
Some might even say that using a wildcard in symlinking is bad practice. However, I know the contents of the directory exactly, and this is much quicker than linking each file manually.

This isn't a bug.
In the shell, if you use a wildcard pattern that doesn't match anything, then the pattern isn't substituted. For example, if you do this:
echo *.c
If you have no .c files in the current directory, it will just print "*.c". If there are .c files in the current directory, then *.c will be replaced with that list.
For many commands, if you specify files that don't exist it is an error, and you get a message that seems to make sense, like "cannot access *.c". But for ln -s, since it is a symbolic link, the actual file doesn't have to exist, and it goes ahead and makes the link.
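If you want an error instead of a dangling "*" link, the check has to happen in the shell, before ln runs. A minimal sketch, with a made-up function name link_all:

```shell
#!/bin/bash
# link_all SRC_DIR DEST_DIR: hypothetical helper that symlinks every entry of
# SRC_DIR into DEST_DIR, and fails if the wildcard matched nothing.
link_all() {
    local src=$1 dest=$2
    set -- "$src"/*
    # An unmatched pattern is left as the literal string "SRC_DIR/*", which
    # does not name an existing file, so this test catches that case.
    if [ ! -e "$1" ] && [ ! -L "$1" ]; then
        echo "link_all: nothing matches $src/*" >&2
        return 1
    fi
    ln -s -- "$@" "$dest"/
}

# Usage:
#   link_all ../../avatars .
```

In bash you can also change the default itself: shopt -s failglob turns an unmatched pattern into an error, and shopt -s nullglob makes it expand to nothing.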

Redirect program output without changing directory

Problem
I'm writing a set of scripts to help with automated batch job execution on a cluster.
The specific thing I have is a $OUTPUT_DIR, and an arbitrary $COMMAND.
I would like to execute the $COMMAND such that its output ends up in $OUTPUT_DIR.
For example, if COMMAND='cp ./foo ./bar; mv ./bar ./baz', I would like to run it such that the end result is equivalent to cp ./foo ./$OUTPUT_DIR/baz.
Ideally, the solution would look something like eval PWD="./$OUTPUT_DIR" $COMMAND, but that doesn't work.
Known solutions
[And their problems]
Editing $COMMAND: In most cases the command will be a script, or a compiled C or FORTRAN executable. Changing the internals of these isn't an option.
unionfs, aufs, etc.: While this is basically perfect, users running this won't have root, and causing thousands+ of arbitrary mounts seems like a questionable choice.
copying/ hard/soft links: This might be the solution I will have to use: some variety of actually duplicating the entire content of ./ into ./$OUTPUT_DIR
cd $OUTPUT_DIR; ../$COMMAND : Fails if $COMMAND ever reads files
pipes : only works if $COMMAND doesn't directly work with files; which it usually does
Is there another solution that I'm missing, or is this request actually impossible?
[EDIT:]Chosen Solution
I'm going to go with something where each object in the directory is symbolic-linked into the output directory, and the command is then run from there.
This has the downside of creating a lot of symbolic links, but it shouldn't be too bad.
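A minimal sketch of that symlink approach, assuming $OUTPUT_DIR sits directly inside the current directory and that hidden files do not need to be linked. The function name run_in_output_dir is made up for illustration.

```shell
#!/bin/bash
# run_in_output_dir OUTPUT_DIR COMMAND: hypothetical helper that symlinks each
# non-hidden entry of the current directory into OUTPUT_DIR, then runs
# COMMAND with OUTPUT_DIR as its working directory.
run_in_output_dir() {
    local out=$1 cmd=$2 entry
    mkdir -p -- "$out" || return
    for entry in ./*; do
        [ -e "$entry" ] || continue            # the directory is empty
        [ "$entry" -ef "$out" ] && continue    # never link the output dir into itself
        # Absolute targets keep the links valid from inside OUTPUT_DIR.
        # Errors are ignored so that names already present are skipped.
        ln -s -- "$PWD/${entry#./}" "$out"/ 2>/dev/null
    done
    # The subshell keeps the caller's working directory unchanged.
    ( cd -- "$out" && eval "$cmd" )
}

# Usage:
#   run_in_output_dir "$OUTPUT_DIR" 'cp ./foo ./bar; mv ./bar ./baz'
```

New files created by the command land in the output directory, but a command that modifies one of its inputs in place will write through the link to the original.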

You can't solve this without making some assumptions about the interface of $COMMAND. There is no single definition of what "output ends up in $OUTPUT_DIR" means. For one program this may be some files, but another program might just print something to stdout and yet another might try sending some data over the internet using some protocol or display something in a GUI and there isn't an obvious way of mapping all of these to "output goes to $OUTPUT_DIR".
So, you need to invent some assumptions and require any $COMMAND implementation to follow them. Then, it may get as simple as requesting that the command accept a parameter such as --target=<DIR>. If your command was some simple command, you would have to create a wrapper script around it to translate that parameter into what the app accepts. cp, mv and a few more utils already accept the parameter --target, so that may be a good starting point.

You cannot set the output directory, you can only set the working directory.
The problem is, once you set the working directory, other references are going to be invalid. For example in your code foo:
cp ./foo ./bar
If you have a specific command, there are workarounds (creating a script that alters arguments, prepending the directory to specific arguments), but in general this is not possible.

Getting linux terminal value from my application

I am developing a Qt application in Linux. I wanted to pass Linux commands to a terminal. That worked, but now I also want to get a response from the terminal for this specific command.
For example,
ls -a
As you know this command lists the directories and files of the current working directory. I now want to pass the returned values from the ls call to my application. What is a correct way to do this?

QProcess is the qt class that will let you spawn a process and read the result. There's an example of usage for reading the result of a command on that page.

popen(), a standard C library function available on Linux, returns a FILE * that you can read from like an ordinary file stream; that may help.

Parsing ls(1) output is dangerous -- make a few files with funny names in a directory and test it out:
touch "one file"
touch "`printf "\x0a\x0a\x0ahello\x0a world"`"
That creates two files in the current working directory. I expect your attempts to parse ls(1) output won't work. This might be alright if you're showing the results to a human, (though a human will be immensely confused if a filename includes output that looks just like ls(1) output!) but if you're trying to present something like an explorer.exe or Finder.app representation of files in the filesystem, this is horribly broken.
Instead, use opendir(3), readdir(3), and closedir(3) to read directory entries yourself. This will be safer, more portable, and (as a side benefit) slightly better performing.
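The same principle applies if the listing is done from a shell script rather than from C: take each name as a whole unit instead of splitting text. A small sketch using a glob, with a made-up function name count_entries:

```shell
#!/bin/bash
# count_entries DIR: hypothetical helper that counts the non-hidden entries
# of DIR by letting the shell expand a glob instead of splitting ls output.
count_entries() {
    local dir=$1 entries
    entries=("$dir"/*)
    # An unmatched glob stays literal, so a single non-existent entry
    # means the directory is empty.
    if [ ! -e "${entries[0]}" ] && [ ! -L "${entries[0]}" ]; then
        echo 0
    else
        echo "${#entries[@]}"
    fi
}

# Usage:
#   count_entries .
```

Run against a directory holding only the two test files above, this reports two entries, whereas counting the lines of piped ls output would be thrown off by the newlines embedded in the second name.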
