Import a GitHub repo into Databricks community edition - apache-spark

I am trying to import some data from a public GitHub repo so that I can use it from my Databricks notebooks.
So far I have tried to connect my Databricks account to my GitHub account as described here, but without success, since it seems that GitHub integration requires a non-community license. I get the following message when I try to set the GitHub token that the integration requires:
The same question has been asked before on the official Databricks forum.
What is the best way to import and store a GitHub repo on databricks community edition?

I managed to solve this using shell commands from the notebook itself. To retrieve the repository for the first time, I did a git clone via HTTPS:
%sh git clone https://github.com/SomeDataRepo/TheData.git --depth 1 --branch=master /dbfs/FileStore/TheData/
Why not SSH? SSH requires setting up SSH keys, which was not necessary in my case.
Finally, every time I need a fresh version of the data, I run a git pull before executing my program:
%sh git -C /dbfs/FileStore/TheData/ pull
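If you would rather have a single cell that does the right thing on every run, the two commands above can be combined in a small Python helper. This is only a sketch under my own assumptions: the function names build_sync_command and sync_repo are made up for illustration, and checking for a .git folder is just one way to decide whether the clone already exists.

```python
import os
import subprocess

# Values copied from the commands above; replace them with your own repo and path.
REPO_URL = "https://github.com/SomeDataRepo/TheData.git"
TARGET_DIR = "/dbfs/FileStore/TheData/"


def build_sync_command(repo_url, target_dir, branch="master"):
    """Return the git command to run: pull if already cloned, else shallow clone."""
    if os.path.isdir(os.path.join(target_dir, ".git")):
        # The repository was cloned earlier, so only fetch the latest changes.
        return ["git", "-C", target_dir, "pull"]
    # First run: shallow clone of a single branch over HTTPS.
    return ["git", "clone", "--depth", "1", "--branch", branch, repo_url, target_dir]


def sync_repo(repo_url=REPO_URL, target_dir=TARGET_DIR, branch="master"):
    """Run the clone or pull; raises CalledProcessError if git fails."""
    subprocess.run(build_sync_command(repo_url, target_dir, branch), check=True)
```

Calling sync_repo() at the top of the notebook then replaces both %sh cells.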

Assuming you have Python installed on your desktop, install the Databricks CLI, clone the Git repo to your local machine, then use the workspace CLI to import the entire repo as a directory.
https://docs.databricks.com/dev-tools/cli/workspace-cli.html

The simplest way is to import the .dbc file directly into your user workspace on Community Edition, as explained by Databricks here:
Import GitHub repo into Community Edition Workspace
1. In GitHub, in the pane to the right, under Releases, click on the Latest link.
2. Under Assets, look for the link to the DBC file.
3. Right-click the DBC file's link and copy the link location (there is no need to download this file).
4. Back in Databricks, click on the Workspace icon in the navigational pane to the left.
5. In the Workspace swimlane, click the Home button to open your home folder. It should open the folder /Users/your-email-address, as in /Users/student@example.com.
6. In the swimlane for your email address, click on the down chevron and select Import.
7. In the Import Notebooks dialog:
   - Select URL
   - Paste in the URL copied in step 3 above
   - Click Import
8. Once the import is done, select the new folder for this course to view this course's notebooks.
Which notebook you should start with depends on your courseware and/or instructor.

Related

Import a CSV file using Databricks CLI in Repos

We are using Databricks to generate ETL scripts. One step requires us to upload small CSV files into a Repos folder. I can do this manually using the import window in the Repos GUI. However, I would like to do this programmatically using the Databricks CLI. Is this possible? I have tried using the Workspace API, but it only works for source code files.
Unfortunately, it's not possible right now, because there is no API for it that databricks-cli could use. But you can add and commit the files to the Git repository, and then use databricks repos update to pull them into the workspace.
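For illustration, here is roughly what that pull step looks like when scripted in Python against the Repos REST API, which, as far as I know, is what the CLI command calls under the hood. Treat it as a sketch: the helper name build_repo_update_request is made up, and the host, token, and repo ID are placeholders you have to supply yourself.

```python
import json
import urllib.request


def build_repo_update_request(host, token, repo_id, branch):
    """Build (but do not send) a PATCH request that updates a repo to the head of `branch`."""
    url = f"{host.rstrip('/')}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Example usage (all values are placeholders):
# req = build_repo_update_request("https://<your-workspace-host>", "<token>", 123, "main")
# urllib.request.urlopen(req)
```

So the flow is: commit and push the CSV files to the Git remote, then send this request (or run the CLI command) so that the workspace copy picks them up.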

jupyter notebooks not seen by GitHub

I have some local Jupyter notebooks in my Linux virtual machine. I would like to make a repository with them on GitHub.
I downloaded GitHub Desktop, as it seems easier than using the command line.
The issue is that when I select the folder containing my notebooks (which I put in a folder shared with my host OS, Windows) to add a repository, GitHub Desktop still shows '0 changed files', so I cannot commit to master. When I publish the repository, it is obviously empty :(.
Any suggestion would help me, I am new to Git.
Thank you very much!
You must add the files to a local clone of your GitHub repository.
Create a repository on GitHub with the name of the project.
Clone project to your desktop using GitHub Desktop.
Move all folder contents from your share folder on Windows to the cloned repository folder in the GitHub folder, probably inside of your Documents folder.
Follow the prompts on the GitHub Desktop client to add files to repository. Commit changes. And push the changes to GitHub's server.
You may find it easier to read the GitHub Desktop tutorial for more information on how to use GitHub. https://help.github.com/desktop/guides/contributing-to-projects/

How do I access the Git included in microclimate?

I want to use microclimate installed on ICP with my local IDE and not the web IDE provided. How do I and my team access the GitLab to work on the code generated by microclimate? How do I commit my changes using my local IDE?
You can find information on how to integrate with your existing IDEs using the following url:
https://microclimate-dev2ops.github.io/howToIDE
Additionally, you can import your project from GitLab and/or GitHub using the Import Project option and referencing the Git repo location. To enable bi-directional code changes between Microclimate and Git, you need to run Microclimate on ICP and enable the Pipeline.
Hope this helps!
Microclimate does not provide GitLab, but it will work with GitLab. https://docs.gitlab.com/ee/install/kubernetes/gitlab_chart.html provides instructions for installing GitLab onto Kubernetes. Once set up you should be able to interact with GitLab from your local IDE in the same way as you would with any other git server.

creating a readthedocs.io repo in sync with a public gitlab repo

I have a public gitlab project here
https://gitlab.com/parmentelat/minisim2
I tried to add a corresponding project in readthedocs.io, so that a new commit being pushed onto gitlab triggers a doc rebuild on readthedocs
I do this routinely with projects hosted at github and it's really easy - at least under my setup - since readthedocs shows me an updated list of github repos right away, and everything goes smoothly after that.
When trying to import this gitlab project under readthedocs though, I have to choose 'Import manually', as my gitlab projects do not show up.
(In the 'connected services' of my readthedocs settings page, I could find a way to connect to github and to bitbucket; gitlab does not seem supported)
Fair enough, I try this manual import, but at that point no matter how I try to spell the project's URL and what method (git or https) I try to use for importing the project, I get this error message
This repository doesn't have a valid webhook set up. That means it won't be rebuilt on commits to the repository.
You can resync your webhook to fix this.
Is what I am trying to do doable at all?
Do I need to do something specific on the gitlab side?
Thanks for any hint.
You can manually set the webhook on gitlab.com:
Click the settings icon for your project
Select "Integrations"
Enter the above URL, select "Push events" and "Enable SSL verification"
Click "Add Webhook"
That should do it.
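If you prefer to script it, the same webhook can, I believe, also be created through GitLab's project hooks API. The sketch below only builds the request. The function name build_add_webhook_request is made up, hook_url stands for the readthedocs URL mentioned above (not reproduced here), and you need a GitLab personal access token with API access.

```python
import json
import urllib.parse
import urllib.request


def build_add_webhook_request(project_path, hook_url, private_token,
                              gitlab_host="https://gitlab.com"):
    """Build (but do not send) a POST request that adds a push webhook to a project."""
    # GitLab accepts the URL-encoded "namespace/project" path in place of the numeric ID.
    project_id = urllib.parse.quote(project_path, safe="")
    api_url = f"{gitlab_host}/api/v4/projects/{project_id}/hooks"
    payload = {
        "url": hook_url,                  # the webhook URL given by readthedocs
        "push_events": True,              # same as ticking "Push events"
        "enable_ssl_verification": True,  # same as ticking "Enable SSL verification"
    }
    return urllib.request.Request(
        api_url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"PRIVATE-TOKEN": private_token, "Content-Type": "application/json"},
    )


# Example usage (hook URL and token are placeholders):
# req = build_add_webhook_request("parmentelat/minisim2", "<webhook-url>", "<token>")
# urllib.request.urlopen(req)
```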

import svn repository to remote server

Hi just a quick question here
I got an account on www.assembla.com which is svn repository hosting website.
I managed to checkout/commit to remote repository.
Now I am trying to import my existing local svn repository to the remote server.
I can't use "svnadmin load", since it expects a local target, not a URL.
I tried svn+ssh, but it failed to connect.
Among other things, I am behind a proxy.
my repository is here: https://subversion.assembla.com/svn/xxx/
Do you know how I can import my old repository?
Thanks!
I believe you can only import into a new SVN repository.
Click the Admin tab.
Click Tools.
For Repositories > Source/SVN, click the Add button.
Click the new Source/SVN tab that appears at the top. If you already have an existing SVN repository, the new tab's name would be appended with "2" or the next available number (e.g., "Source/SVN2").
Click Import/Export.
The Import screen (as shown below) is self-explanatory. Hope it works for you
How can I import or export a subversion repository?
How can I import or export a subversion repository? Trac tickets?
You can find forms for importing and exporting svn repositories in Trac. Go to your Trac and log in as a space owner. You will see an Admin tab on the top right. Select Admin, and select “Data Import/Export” from the left menu. There is a link to export the svn repository, and a form to upload a zipped Subversion repository dump. There are also forms for uploading and exporting trac directories. We currently use trac 0.10.4.

Resources