Obtaining a flat list of all blobs in a .git/objects/ folder


In the .git/objects/ folder there are many folders containing files, such as ab/cde.... I understand that these are actually blobs abcde...
Is there a way to obtain a flat listing of all blobs under .git/objects/ with no / used as a delimiter between ab and cde in the example above? For example:
abcde....
ab812....
74axs...
I tried
/.git/objects$ du -a .
This does list recursively all folders and files within the /objects/ folder but the blobs are not listed since the command lists the folder followed by the filename (as the OS recognizes them, as opposed to git). Furthermore, the du command does not provide a flat listing in a single column -- it provides the output in two columns with a numeric entry (disk usage) in the first column.

I think you should start round here (git version 2.37.2):
git rev-list --all --objects --filter=object:type=blob
Doing it this way offers the advantage of not only checking the directory where the unpacked objects are but also the objects that are already packed (which are not in that directory anymore).
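As a hedged alternative sketch (a different technique from the rev-list command above): git cat-file can enumerate every object in the repository, loose or packed, together with its type, and awk can keep only the blobs. The function name list_all_blobs is made up for illustration. Unlike rev-list --all, this also reports objects that are no longer reachable from any ref.

```shell
# list_all_blobs is a hypothetical helper name, not a git command.
# Prints the hash of every blob in the current repository, one per line.
list_all_blobs() {
    # --batch-all-objects visits loose and packed objects alike;
    # --batch-check prints "<hash> <type> <size>" for each of them.
    git cat-file --batch-all-objects --batch-check |
        awk '$2 == "blob" { print $1 }'
}
```

Run it from anywhere inside the repository, for example: list_all_blobs > blobs.txt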

If you are in the .git/objects/ folder, try this:
find . -type f | sed -e 's/^\.\///' | sed -e 's/\///'
sed -e takes a sed script, which here is a find/replace pattern of the form s/pattern/replacement/.
's/^\.\///' finds the leading ./ that find prints and replaces it with '', which is nothing. Therefore the sed command removes the pattern.
\ in the pattern is the escape character.
After the first sed command ends,
the results will look like this (on Linux):
61/87c3f3d6c61c1a6ea475afb64265b83e73ec26
To remove the /, which is the directory separator, use
sed -e 's/\///'
If you are in the directory which contains .git, try this:
find .git/objects/ -type f | sed -e 's/\.git\/objects\///' | sed -e 's/\///'
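A minimal sketch of a variant of this find/sed approach, assuming you only want loose objects: the objects directory also holds pack/ and info/, so the find is restricted to the two-hex-digit directories. The name flat_loose_objects is hypothetical. Note that loose objects can be commits, trees, or tags as well as blobs; the file names alone do not tell them apart.

```shell
# flat_loose_objects is a hypothetical helper name.
# $1: path to an objects directory (for example .git/objects).
flat_loose_objects() {
    (
        cd "$1" || exit 1
        # Loose objects live in directories named with two hex digits;
        # the -path test skips pack/ and info/.
        find . -type f -path './[0-9a-f][0-9a-f]/*' |
            sed -e 's|^\./||' -e 's|/||'   # drop leading ./, then the first /
    )
}
```

Example call from the directory that contains .git: flat_loose_objects .git/objects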

Related

How to delete folders except two with linux

I have many directories of backup starting with "backup_".
I want to keep only the two last created folders.
I did this command to show the last two created:
ls -1 -t -d */ | head -2
The problem is I don't know how to exclude the result of that command from the remove command (rm -rf | ...).
I know grep -v only works with strings.

In general, xargs is the tool you want to use to pass a generated list of names as arguments to a command. In your case, you just need to invert the head -2 to a command that prints everything except the first 2 lines. eg:
cmd-to-generate-file-list | sed -e 1,2d | xargs rm
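The sed will delete the first two lines, and xargs will call rm with the remaining names as arguments. By default xargs splits its input on whitespace rather than strictly on lines, so names containing spaces need extra care, and for directories rm needs -r. Note that it is not generally safe to use ls to generate the file list, but that is a different issue entirely.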
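Putting the pieces together for the backup_ directories from the question, a sketch might look like this. It assumes the directory names contain no whitespace (both ls and xargs are used naively here), and prune_old_backups is a made-up name.

```shell
# prune_old_backups is a hypothetical helper name.
# $1: directory that holds the backup_* folders.
# Assumes the folder names contain no whitespace, because the ls output
# is split on whitespace by xargs.
prune_old_backups() {
    (
        cd "$1" || exit 1
        # List newest first, drop the first two lines, remove the rest.
        ls -1 -t -d backup_*/ | sed -e 1,2d | xargs rm -rf --
    )
}
```

Replacing rm -rf -- with echo first shows what would be deleted without touching anything.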

A zsh specific approach:
setopt extended_glob # Turn on extended globbing if it's not already enabled
dirs=( backup_*(#q/om) ) # Match only directories, sorted by modification time - newest first
rm -rf "${dirs[@]:2}" # Delete all but the first two elements of that array of directory names
See the documentation for more on zsh glob qualifiers like the above uses. They can make things with filenames that are tedious or difficult to do in other shell dialects trivial.

Replace spaces in all files in a directory with underscores

I have found some similar questions here but not this specific one and I do not want to break all my files. I have a list of files and I simply need to replace all spaces with underscores. I know this is a sed command but I am not sure how to generically apply this to every file.
I do not want to rename the files, just modify them in place.
Edit: To clarify, just in case it's not clear, I only want to replace whitespace within the files, file names should not be changed.

find . -type f -exec sed -i -e 's/ /_/g' {} \;
find grabs all items in the directory (and subdirectories) that are files, and passes those filenames as arguments to the sed command using the {} \; notation. The sed command it appears you already understand.
If you only want to search the current directory and ignore subdirectories, you can use
find . -maxdepth 1 -type f -exec sed -i -e 's/ /_/g' {} \;
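Since the edit is done in place, it can help to preview which files would actually change before running it. A small sketch, using grep -l to list the files that contain at least one space; the name files_with_spaces is hypothetical.

```shell
# files_with_spaces is a hypothetical helper name.
# $1: directory to scan. Prints the files that contain at least one space,
# i.e. the files the sed substitution would actually modify.
files_with_spaces() {
    find "$1" -type f -exec grep -l ' ' {} + | sort
}
```

For example, files_with_spaces . lists the candidates under the current directory without modifying anything.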

This is a 2 part problem. Step 1 is providing the proper sed command, 2 is providing the proper command to replace all files in a given directory.
Substitution in sed commands follows the form s/ItemToReplace/ItemToReplaceWith/flags, where s stands for substitution and the flags control how the operation takes place. According to this super user post, in order to match whitespace characters you can use either just a space, \s, or [[:space:]] in your sed command. The last two match any whitespace (tabs as well as spaces); \s is a GNU extension, while [[:space:]] is the POSIX-compliant form. Lastly, you need to specify a global operation, which is simply g at the end. This replaces all spaces in a file with underscores.
Therefore, your sed command is
sed 's/ /_/g' FileNameHere
However, this only accomplishes half of your task. You also need to do this for every file within a directory. Unfortunately, wildcards won't save us with plain redirection, as * > * would be ambiguous. Your only solution is to iterate through the files and overwrite each one individually. A for loop used with a wildcard expands to all files in a directory. However, redirecting sed's output back to the file it is reading leaves you with an empty file. To correct this, specify the -i flag so sed edits its files in place. With BSD/macOS sed, the argument you pass after -i is the suffix used to create a backup of the old file; if an empty suffix is passed (-i '' for instance), no backup is created. GNU sed instead expects the suffix attached directly to the flag (-i.bak), and a bare -i means no backup.
Therefore the final command (in its BSD/macOS form) should simply be
for i in *;do sed -i '' 's/ /_/g' "$i";done
This goes through all files in your current directory and writes the sed output back to each file (directories are matched by * too, but they are not modified).
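Because the -i flag behaves differently between sed implementations, a sketch that avoids it altogether may be useful: write the edited copy to a temporary file and copy it back. This is an illustrative alternative, not the command from the answer, and spaces_to_underscores is a made-up name.

```shell
# spaces_to_underscores is a hypothetical helper name.
# $1: directory whose regular files (not subdirectories) are rewritten.
spaces_to_underscores() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue        # skip directories and other non-files
        tmp=$(mktemp) || return 1
        # Write the edited copy first; overwrite the original only if sed succeeded.
        if sed 's/ /_/g' "$f" > "$tmp"; then
            cat "$tmp" > "$f"          # keeps the original file's permissions
        fi
        rm -f "$tmp"
    done
}
```

Quoting "$f" keeps this working for file names that themselves contain spaces; only the file contents are changed.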

Well... since I was trying to get something running I found a method that worked for me:
for file in *; do sed -i 's/ /_/g' "$file"; done

Compare files content with similar names on two folders

I have two folders (I'll use database names as example):
MongoFolder/
CassandraFolder/
These two folders have similar files inside like:
MongoFolder/
MongoFile
MongoStatus
MongoConfiguration
MongoPlugin
CassandraFolder/
CassandraFile
CassandraStatus
CassandraConfiguration
Those files also have very similar content, differing only in the name of the database, so they all contain code or configuration where the only change is the name Mongo for Cassandra.
How can I compare these two folders so that the result shows the files missing from one or the other (for example, the file CassandraPlugin for the CassandraFolder) and also checks that the contents of matching files are similar, differing only in the database name?

This will give you the names of the missing files (minus the database name):
find MongoFolder/ CassandraFolder/ | \
sed -e s/Mongo//g -e s/Cassandra//g | sort | uniq -u
Output:
Folder/Plugin

The following provides a full diff, including missing files and changed content:
cp -r CassandraFolder cmpFolder
# rename files
find cmpFolder -name "Cassandra*" -print | while IFS= read -r file; do
mongoName=$(echo "$file" | sed 's/Cassandra/Mongo/')
mv "$file" "$mongoName"
done
# fix content
find cmpFolder -type f -exec perl -pi -e 's/Cassandra/Mongo/g' {} \;
# inspect result
diff -r MongoFolder cmpFolder # or use a gui tool like kdiff3
I haven't tested this though; feel free to fix bugs or to ask if something specific is unclear.
Instead of mv you can also use rename but that's different on different flavours of linux.
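If copying the whole folder is not desirable, the same comparison can be sketched without touching any files: substitute the database name on the fly and compare each file with its counterpart. This is only an illustration under the naming scheme from the question; compare_folders is a hypothetical name, and it checks one direction only (Mongo files against their Cassandra counterparts).

```shell
# compare_folders is a hypothetical helper name.
# $1: the Mongo-side directory, $2: the Cassandra-side directory.
# Reports Cassandra files that are missing or whose content differs
# after replacing "Mongo" with "Cassandra" in the Mongo file.
compare_folders() {
    for m in "$1"/Mongo*; do
        [ -f "$m" ] || continue
        suffix=${m##*/Mongo}                 # e.g. "Plugin" from ".../MongoPlugin"
        c="$2/Cassandra$suffix"
        if [ ! -f "$c" ]; then
            echo "missing: Cassandra$suffix"
        elif ! sed 's/Mongo/Cassandra/g' "$m" | cmp -s - "$c"; then
            echo "differs: Cassandra$suffix"
        fi
    done
}
```

Example call: compare_folders MongoFolder CassandraFolder. Swapping the names and arguments in a second pass would cover files that exist only on the Cassandra side.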

Listing entries in a directory using grep

I'm trying to list all entries in a directory whose names contain ONLY upper-case letters. Directories need "/" appended.
#!/bin/bash
cd ~/testfiles/
ls | grep -r *.*
Since grep by default looks for upper-case letters only (right?), I'm just recursively searching through the directories under testfiles for all names who contain only upper-case letters.
Unfortunately this doesn't work.
As for appending directories, I'm not sure why I need to do this. Does anyone know where I can start with some detailed explanations on what I can do with grep? Furthermore how to tackle my problem?

No, grep does not only consider uppercase letters.
Your question is a bit unclear, for example:
from your usage of the -r option, it seems you want to search recursively, however you don't say so. For simplicity I assume you don't need to; consider looking into @twm's answer if you need recursion.
you want to look for uppercase (letters) only. Does that mean you don't want to accept any other (non-letter) characters that are still valid in file names (like digits, dashes, dots, etc.)?
since you don't say that it is not permissible to have only one file per line, I am assuming it is OK (thus using ls -1).
The naive solution would be:
ls -1 | grep "^[[:upper:]]\+$"
That is, print all lines containing only uppercase letters. In my TEMP directory that prints, for example:
ALLBIG
LCFEM
WPDNSE
This however would exclude files like README.TXT or FILE001, which depending on your requirements (see above) should most likely be included.
Thus, a better solution would be:
ls -1 | grep -v "[[:lower:]]\+"
That is, print all lines not containing a lowercase letter. In my TEMP directory that prints, for example:
ALLBIG
ALLBIG-01.TXT
ALLBIG005.TXT
CRX_75DAF8CB7768
LCFEM
WPDNSE
~DFA0214428CD719AF6.TMP
Finally, to "properly mark" directories with a trailing '/', you could use the -F (or --classify) option.
ls -1F | grep -v "[[:lower:]]\+"
Again, example output:
ALLBIG
ALLBIG-01.TXT
ALLBIG005.TXT
CRX_75DAF8CB7768
LCFEM/
WPDNSE/
~DFA0214428CD719AF6.TMP
Note that a different option would be to use find (e.g. find ! -regex ".*[a-z].*"), if you can live with its different output format.
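For comparison, the same filter can be sketched without parsing ls output at all, using a shell glob and a case pattern. This is an alternative to the pipeline above, not a replacement for it; list_upper is a hypothetical name. Like plain ls, it skips dot files.

```shell
# list_upper is a hypothetical helper name.
# $1: directory to list. Prints every entry whose name contains no
# lowercase letter, with "/" appended to directories.
list_upper() {
    for entry in "$1"/*; do
        [ -e "$entry" ] || continue          # empty directory: glob did not match
        name=${entry##*/}
        case $name in
            *[[:lower:]]*) continue ;;       # skip names with a lowercase letter
        esac
        if [ -d "$entry" ]; then
            printf '%s/\n' "$name"
        else
            printf '%s\n' "$name"
        fi
    done
}
```

Example call: list_upper ~/testfiles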

The exact regular expression depends on the output format of your ls command. Assuming that you do not use an alias for ls, you can try this:
ls -R | grep -o -w "[A-Z]*"
Note that with -R, ls will recursively list directories and files under the current directory. The grep option -o tells grep to print only the matched part of the text. The -w option tells grep to match only whole words. The "[A-Z]*" is a regexp that matches only upper-case words.
Note that this regexp will also match the upper-case part of a name like TEST.txt, as well as each part of TEXT.TXT. In other words, it only considers words formed by letters, not whole file names.

It's ls which lists the files, not grep, so that is where you need to specify that you want "/" appended to directories. Use ls --classify to append "/" to directories.
grep is used to process the results from ls (or some other source, generally speaking) and only show lines that match the pattern you specify. It is not limited to uppercase characters. You can limit it to just upper-case characters and "/" with grep -E '^[A-Z/]*$', or if you also want numbers, periods, etc., you could instead filter out lines that contain lowercase characters with grep -v -E '[a-z]'.
As grep is not the program which lists the files, it is not where you want to perform the recursion. ls can list paths recursively if you use ls -R. However, you're just going to get the last component of the file paths that way.
You might want to consider using find to handle the recursion. This works for me:
find . -exec ls -d --classify {} \; | egrep -v '[a-z][^/]*/?$'
I should note, using ls --classify to append "/" to the end of directories may also append some other characters to other types of paths that it can classify. For instance, it may append "*" to the end of executable files. If that's not OK, but you're OK with listing directories and other paths separately, this could be worked around by running find twice - once for the directories and then again for other paths. This works for me:
find . -type d | egrep -v '[a-z][^/]*$' | sed -e 's#$#/#'
find . -not -type d | egrep -v '[a-z][^/]*$'

Remove all files of a certain type except for one type in linux terminal

On my computer running Ubuntu, I have a folder full of hundreds files all named "index.html.n" where n starts at one and continues upwards. Some of those files are actual html files, some are image files (png and jpg), and some of them are zip files.
My goal is to permanently remove every single file except the zip archives. I assume it's some combination of rm and file, but I'm not sure of the exact syntax.

If it fits into your argument list and no filenames contain a colon, a simple pipe with xargs should do:
file * | grep -vi zip | cut -d: -f1 | tr '\n' '\0' | xargs -0 rm

First find to find the matching files, then file to get the file types. sed drops the zip archives and strips everything but the filenames from the output of file. Lastly, rm deletes what is left:
find . -name 'index.html.[0-9]*' | \
xargs file | \
sed -n '/: *Zip archive/!s/^\([^:]*\):.*/\1/p' | \
xargs rm

I would run:
for f in index.html.*
do
file "$f" | grep -qi zip
[ $? -ne 0 ] && rm -i "$f"
done
and remove -i option if you feel confident enough

Here's the approach I'd use; it's not entirely automated, but it's less error-prone than some other approaches.
file * > cleanup.sh
or
file index.html.* > cleanup.sh
This generates a list of all files (excluding dot files), or of all index.html.* files, in your current directory and writes the list to cleanup.sh.
Using your favorite text editor (mine happens to be vim), edit cleanup.sh:
Add #!/bin/sh as the first line
Delete all lines containing the string "Zip archive"
On each line, delete everything from the : to the end of the line (in vim, :%s/:.*$//)
Replace the beginning of each line with "rm" followed by a space
Exit your editor, updating the file.
chmod +x cleanup.sh
You should now have a shell script that will delete everything except zip files.
Carefully inspect the script before running it. Look out for typos, and for files whose names contain shell metacharacters. You might need to add quotation marks to the file names.
(Note that if you do this as a one-line shell command, you don't have the opportunity to inspect the list of files you're going to delete before you actually delete them.)
Once you're satisfied that your script is correct, run
./cleanup.sh
from your shell prompt.
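The editing steps above can also be sketched as a sed filter that produces the rm lines for you, still leaving the result in a file to inspect before running. It assumes file names contain no colon, double quote, $, backtick, or backslash; make_cleanup_lines is a hypothetical name.

```shell
# make_cleanup_lines is a hypothetical helper name.
# Reads `file`-style lines ("name: description") on stdin and prints one
# rm command per entry that is not described as a Zip archive.
# Assumes names contain no colon, double quote, $, backtick, or backslash.
make_cleanup_lines() {
    sed -n '/: *Zip archive/!s/^\([^:]*\):.*/rm -- "\1"/p'
}
```

For example, file index.html.* | make_cleanup_lines > cleanup.sh writes the commands to cleanup.sh, which can then be reviewed and run with sh cleanup.sh.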

for i in index.html.*
do
# ask file for the type description only (-b omits the file name)
type=$(file -b "$i")
if [[ ! $type =~ Zip ]]
then
rm "$i"
fi
done
Change the rm to a ls for testing purposes.
