git sparse-checkout ignore specific file type - linux

I have a Git repository containing a bunch of large CSV files, which I don't want to clone, so I came across git sparse-checkout and this post: https://github.blog/2020-01-17-bring-your-monorepo-down-to-size-with-sparse-checkout/
From this post I took the following:
git clone --no-checkout https://github.com/john_doe/repo-with-big-csv.git
cd repo-with-big-csv
git sparse-checkout init --cone
Then I edit the .git/info/sparse-checkout and add the following (adapted from example in page above):
/*
!**/*.csv
But it doesn't seem to work properly. After git pull, some folders are checked out and some are not. I also noticed a warning. When I run git sparse-checkout list, I get:
warning: unrecognized pattern: '**/*.csv'
warning: disabling cone pattern matching
/*
!**/*.csv
What's the proper way to ignore a certain file type only?

See "Git sparse checkout with exclusion" and make sure to use Git 2.26.x or later, which has some fixes for the git sparse-checkout command.
The warning comes from cone mode, which you enabled with init --cone. The documentation says:
When in cone mode, the git sparse-checkout set subcommand takes a list of directories instead of a list of sparse-checkout patterns.
If core.sparseCheckoutCone=true, then Git will parse the sparse-checkout file expecting patterns of these types. Git will warn if the patterns do not match.
In cone mode you can only use restrict patterns based on folder prefix matches. A file-type pattern such as !**/*.csv is not one of them, which is why Git prints the warning and disables cone pattern matching.
The OP Frode Akselsen adds in the comments:
my example is actually working: the folders that don't show up just contain only .csv files, hence, after applying the rules in .git/info/sparse-checkout, nothing is in the folder anymore and therefore Git doesn't show the folder.
I confirm: Git only shows content. If a folder has no files left in it (no "content"), that folder is not visible.
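Since a file-type pattern is not a cone pattern, one option is to use non-cone mode from the start. The following is only a sketch under that assumption: the helper name clone_without_csv and its two arguments are made up for illustration, and '!*.csv' is a gitignore-style pattern that matches CSV files at any depth.

```shell
# Hypothetical helper: clone <url> into <dir> and leave *.csv files
# out of the working tree (they are still part of the repository data).
clone_without_csv() {
    url=$1
    dir=$2
    git clone --quiet --no-checkout "$url" "$dir" || return 1
    # Non-cone mode accepts gitignore-style patterns such as '!*.csv'.
    git -C "$dir" sparse-checkout init --no-cone || return 1
    # Include everything, then exclude CSV files at any depth.
    git -C "$dir" sparse-checkout set '/*' '!*.csv' || return 1
    # Populate the index and working tree from HEAD, honouring the patterns.
    git -C "$dir" read-tree -mu HEAD
}
```

Folders that contain only CSV files will not appear at all, for the reason given above.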

Related

How to check if a file in git commit is a symbolic link without checkout

Is there a way to check if a file in git commit is a symbolic link without checking out the commit content?
Background:
There is a hook which is used to check C++ code formatting in each git commit that is pushed.
So far the algorithm is:
Get a list of files in a commit with git diff-tree --no-commit-id --name-only --diff-filter=d -r ${commit}.
Process each file (content) in the git commit, selected based on its file extension, using git show ${commit}:${file}.
Problem:
A file with .cpp extension may be a symbolic link, in which case it shall not be processed.
NOTE: I know that having source files as a symbolic link is not a good idea.
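One possible approach, sketched here rather than taken from the original thread, is to look at the file mode that git ls-tree reports: Git records symbolic links with mode 120000. The helper name is_symlink_in_commit is hypothetical, and the path is assumed to be relative to the repository root (as it is in the output of git diff-tree).

```shell
# Hypothetical helper: succeeds if <file> in <commit> is a symbolic link.
is_symlink_in_commit() {
    rev=$1
    file=$2
    # ls-tree prints "<mode> <type> <object>" followed by a tab and the path.
    mode=$(git ls-tree "$rev" "$file" | cut -d' ' -f1)
    # Mode 120000 marks a symbolic link; regular files are 100644 or 100755.
    [ "$mode" = "120000" ]
}
```

Inside the hook loop, a call such as is_symlink_in_commit "${commit}" "${file}" && continue would skip the link before git show is run.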

I cannot clone git tree

I have a question about Git: I tried to clone a tree, but without success.
git clone https://github.com/cer/event-sourcing-examples/tree/d2077e21aa677a00095f90250470ff011c132ab8/java-spring
I cloned the project
git clone https://github.com/cer/event-sourcing-examples
and I tried to switch to that tree, but it had no effect.
Would you have any suggestions?
Best regards
Git cannot clone a tree directly. You need to clone the entire repository, and then check out a commit that uses the tree you want. To reduce confusion, though, do note that there is a difference between the terms "tree" and "commit":
A tree is a Git object representing a directory, and contains links to blobs (files) and other trees. A tree is not necessarily the root directory of the repository.
A commit object contains a link to the root tree of the repository, and some extra information such as commit message, dates and other headers.
You can only check out commits. Few Git commands deal directly with tree objects (git cat-file and git ls-tree being among the exceptions). However, the object ID in your GitHub URL is indeed the ID of a commit, so that's not a problem.
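To see the distinction for yourself inside a clone, git cat-file and git rev-parse can be used as in this small sketch. The helper names object_type and root_tree_of are made up for illustration.

```shell
# Hypothetical helper: print the type of the object a revision names
# ("commit", "tree", "blob" or "tag").
object_type() {
    git cat-file -t "$1"
}

# Hypothetical helper: print the ID of the root tree a commit points to.
root_tree_of() {
    git rev-parse "$1^{tree}"
}
```

For example, object_type d2077e21 should report a commit in a clone of the repository above, while object_type "$(root_tree_of d2077e21)" reports a tree.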
What you can do, then, is check out the commit you want into a new branch after you've cloned the repository:
git checkout -b test-branch d2077e21
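Put together, the clone and the checkout could look like the following sketch. The function name clone_at_commit and its parameters are hypothetical; only git clone and git checkout -b come from the answer.

```shell
# Hypothetical helper: clone <url> into <dir>, then create <branch>
# at commit <sha> and switch to it.
clone_at_commit() {
    url=$1
    dir=$2
    sha=$3
    branch=$4
    git clone --quiet "$url" "$dir" &&
    git -C "$dir" checkout --quiet -b "$branch" "$sha"
}
```

A call might look like clone_at_commit https://github.com/cer/event-sourcing-examples event-sourcing-examples d2077e21 test-branch.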
If the problem you're trying to solve is just fetching a single commit (or tree) from a remote repository, then you're mostly out of luck, because Git's remote protocol has traditionally not supported that operation (some servers allow fetching a commit by its full SHA, but that depends on the server's configuration, for example uploadpack.allowReachableSHA1InWant). If you can create a branch in the remote repository at the commit you want, you can clone that branch directly, without any history:
git clone -b test-branch --depth 1 https://github.com/cer/event-sourcing-examples
If you can't do that, however, then you're still out of luck: by default, the remote protocol only allows referencing named refs, not arbitrary commits.
Check if the steps below help. I am using Git Bash here.
Clone the repository.
git clone https://github.com/cer/event-sourcing-examples.git
Enter that directory
cd event-sourcing-examples/
Switch the branch (I am assuming that by tree you mean branch):
git checkout wip-vagrant   # wip-vagrant is a branch name
To get the update you have to issue a pull command.
git pull
If you want to clone the branch directly, then follow the instructions in the above comment (Micheal).
git clone -b <branch> <remote_repo>
Example:
git clone -b my-branch git@github.com:user/myproject.git
Alternative (no public key setup needed):
git clone -b my-branch https://git@github.com/username/myproject.git
If your goal is just to get a copy of the repo at a particular commit...
While you can't use clone, you can download a zip file of the repo at a particular commit.
This method works on GitHub.
This and other approaches can be found at:
https://coderwall.com/p/xyuoza/git-cloning-specific-commits
TL;DR
Navigate to the tree view of the sha you want.
https://github.com/<repo_name>/tree/<commit_sha>
Download the zip file.
Don't clone.
Github Tree View
Open the repo and click the "commits" link
(in the bar that says "commits, branches, packages", etc.).
Select the commit you want. This will take you to the view showing the changes.
In the url you will see something like this:
https://github.com/Colt/webpack-demo-app/commit/eb66c0dc93141080f5b1abb335ec998a1e91d72e
- Note the sha in the url is preceded by the word "commit".
- Replace the word "commit" with the word "tree" to put yourself in the tree view.
- Finally, click on the green "Clone or download" button and Download the ZIP. Don't try to clone.
This will download the entire repo as it was at that commit.
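The ZIP behind that button can also be requested directly. The sketch below only builds the URL, assuming GitHub's usual archive URL layout; the helper name github_zip_url is hypothetical.

```shell
# Hypothetical helper: print the ZIP download URL for <owner>/<repo>
# at <commit_sha>, assuming GitHub's /archive/<sha>.zip URL layout.
github_zip_url() {
    printf 'https://github.com/%s/%s/archive/%s.zip\n' "$1" "$2" "$3"
}
```

The result could then be fetched with something like curl -L -o repo.zip "$(github_zip_url Colt webpack-demo-app eb66c0dc93141080f5b1abb335ec998a1e91d72e)".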
First, you need to clone the complete repo, and then check out commit_sha inside it:
git clone -n <repo_name>
cd <repo_dir>
git checkout <commit_sha>

How can I verify git clone is working correctly?

I'm following the documentation provided here by git to set up a bare git repository in a folder called root.
I started in the root directory where I ran
git init
git add -A
git commit -m "test"
I then ran git status and all appears good.
Next I ran the line from the documentation at a directory one level above the repo I created above.
git clone --bare root root.git
This created root.git, but I cannot see any evidence that anything was cloned. I just see a set of files and directories when I cd root.git.
I don't know how to verify it was actually cloned, and if it was, why can't I see the original files?
When you use --bare, you are telling Git to clone just the repository data (what normally lives in .git), without a working tree.
This is the option you use when you want to have a remote repository that does not include a workspace.
If you want to verify that it actually cloned your changes, clone it again into a different directory, this time without the --bare flag.
I would recommend using the full path to do this:
cd /path/to/some/workspace
git clone /path/to/your/root.git successful-git-clone #that last bit is optional
This will put the workspace contents of root.git into a folder named successful-git-clone/. Without that last bit, it will default to root/.
Even in a bare repository, some Git commands work: you could run git branch to see if you have all your branches, or git log to look at your commits.
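As a rough sketch of such a check, the following prints whether the repository is bare, the commits it contains, and the files recorded in the latest commit. The function name check_bare_clone is made up for illustration.

```shell
# Hypothetical helper: show evidence that a bare clone has content.
check_bare_clone() {
    repo=$1
    # Prints "true" for a bare repository.
    git -C "$repo" rev-parse --is-bare-repository
    # Lists the commits that were cloned.
    git -C "$repo" log --oneline
    # Lists the files recorded in the latest commit (none are checked out).
    git -C "$repo" ls-tree -r --name-only HEAD
}
```

Running check_bare_clone root.git from the directory that contains root.git should list the files you committed in root, even though they are not visible on disk.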

Git ignore and changing the history (on Windows)

I've already read several posts about this here (like Git ignore & changing the past, How to remove files that are listed in the .gitignore but still on the repository?, and Applying .gitignore to committed files), but they have several problems:
Commands that only work on Linux.
Incomplete commands (like the first post I've linked to).
Only for one file.
I have pretty much no experience with Git so I was hoping for some help here.
What I'm basically trying to do is rescue the history of one of my projects. It's currently in Hg, and I converted it to Git with Hg-Git (all very easy), including the history (great!). However, I also added a .gitignore file, and there are several files and folders that I want completely gone from the history (like the bin and obj folders, but also files from ReSharper). So I'm looking for a way to apply the .gitignore file to all of my history. The commands should work on Windows, as I have no intention of installing Linux for this.
There is no need to add the .gitignore to the history (it adds no value there); just add it for your future commits.
To remove files and directories from your history, use BFG Repo-Cleaner, which is fast, easy to use, and works very well on Windows (it is written in Scala and runs on the JVM).
It will do the job for you!
This is working for me:
Install hg-git.
cd HgFolder
hg bookmark -r default master
mkdir ../GitFolder
cd ../GitFolder
git init --bare
cd ../HgFolder
hg push ../GitFolder
Move all files from GitFolder to a '.git' folder (in this GitFolder) and set this folder to hidden (not the subfolders and files).
cd ../GitFolder
git init
git remote add origin https://url.git
Copy all current content (including .gitignore) to GitFolder.
git add .
git commit -m "Added existing content and .gitignore"
git filter-branch --index-filter "git rm --cached -r --ignore-unmatch 'LINES' 'FROM' 'GITIGNORE'" --prune-empty --tag-name-filter cat -- --all
git rm -r --cached .
git add .
git gc --prune=now --aggressive
git push origin master --force
There is probably an easier way to do this and it might not be perfect but this had the result I wanted.
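The git rm -r --cached . and git add . pair in the steps above is what makes a newly added .gitignore take effect for files that are already tracked. A minimal sketch of just that part follows; the function name untrack_ignored is hypothetical, and it only affects the next commit, not the existing history.

```shell
# Hypothetical helper: stop tracking files that .gitignore now matches.
# Run it from the top of the working tree. Files on disk are not deleted.
untrack_ignored() {
    # Remove every path from the index only (--cached keeps the files).
    git rm -r -q --cached . || return 1
    # Re-add everything; paths matched by .gitignore are skipped this time.
    git add .
}
```

After calling it, git status shows the ignored files as deleted from the index, ready to be committed.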

Git clone without including top/parent folder

We have a repo in git where the project is contained in a folder called Project. We'd like to be able to release the code to a production server, by cloning the repo, without including the "Project" folder, but with everything below it. Is this possible? The destination directory name is /var/www, which is unrelated to anything in the project. Unfortunately I can't just do a symbolic link because of the nature of our hosting provider (which we'll change soon).
My answer assumes that you have a Git repository whose content is the following:
/.gitignore
/Project
/Project/index.php
/ProjectB
/ProjectB/pom.xml
If you don't need history at all in that copy of your repository, there is the git archive command, which can do what you want, except that it outputs its data in tar or zip format:
git archive [--format=<fmt>] [--list] [--prefix=<prefix>/] [<extra>]
[-o <file> | --output=<file>] [--worktree-attributes]
[--remote=<repo> [--exec=<git-upload-archive>]] <tree-ish>
[<path>…]
Like:
git archive --format=tar --remote=git@foobar.git master -- Project | tar -xf -
However, the git clone command does not accept a path inside the repository, and I think it's not really Git-like to export only a tree view of some branch. You would probably need a submodule, making Project an independent Git repository, or, like the git archive example, get only what you want but without versioning (which can be questionable on a production server).
Instead, you can do this:
Clone your repository to whatever path, say /opt/foobar.
Create a symbolic link of /opt/foobar/Project in /var/www.
Or reference the /opt/foobar/Project in your apache configuration (to avoid the symlink) instead of plain /var/www.
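If a local clone is available, git archive can also export only what is below Project, without the Project folder itself, by archiving the subtree master:Project. This is a sketch under that assumption; the function name export_subdir and its parameters are made up for illustration.

```shell
# Hypothetical helper: unpack the contents of <subdir> at <rev> from the
# local clone <repo> directly into <dest>, without the <subdir> folder.
export_subdir() {
    repo=$1
    rev=$2
    subdir=$3
    dest=$4
    mkdir -p "$dest" &&
    # "<rev>:<subdir>" names the tree of that folder, so archive paths
    # start below it.
    git -C "$repo" archive --format=tar "$rev:$subdir" | tar -xf - -C "$dest"
}
```

With the layout above, export_subdir /opt/foobar master Project /var/www would place index.php directly in /var/www, with no Git metadata alongside it.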
