How can I export all my issues from an Enterprise GitHub repository to an Excel file? I have searched many Stack Overflow answers without success. I also tried this solution (exporting Git issues to CSV), but got "ImportError: No module named requests" errors. Is there any tool or easy way to export all the issues to Excel?
To export from a private repo using curl, you can run the following:
curl -i https://api.github.com/repos/<repo-owner>/<repo-name>/issues --header "Authorization: token <token>"
The token can be generated in your GitHub settings under Personal access tokens.
See the API documentation for all the details.
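By default this returns only the first page of open issues. If you also want closed issues and more results per request, you can add the API's standard state, per_page and page query parameters; a minimal sketch (the output file name is my own choice):
curl "https://api.github.com/repos/<repo-owner>/<repo-name>/issues?state=all&per_page=100&page=1" --header "Authorization: token <token>" > issues-page1.json
Increment page and repeat until an empty array comes back to collect everything.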
With the official GitHub CLI you can easily export all issues into a CSV format.
brew install gh
Log in:
gh auth login
Change directory to a repository and run this command:
gh issue list --limit 1000 --state all | tr '\t' ',' > issues.csv
In European .csv files the separator is a semicolon (';'), not a comma. Modify the separator as you want.
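For example, the same command with a semicolon separator:
gh issue list --limit 1000 --state all | tr '\t' ';' > issues.csv
Note that titles containing the separator character will still shift columns; the --json/--jq variants further down are more robust in that respect.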
The hub command-line wrapper for GitHub makes this pretty simple.
You can do something like this:
$ hub issue -f "%t,%l%n" > list.csv
which gives you something like this
$ more list.csv
Issue 1 title, tag1 tag2
Issue 2 title, tag3 tag2
Issue 3 title, tag1
If this is a one-time task, you can play around with the GitHub Web API. It lets you export the issues in JSON format; then you can convert the JSON to Excel (e.g. using an online converter).
Just open the following URL in a browser substituting the {owner} and {repo} with real values:
https://api.github.com/repos/{owner}/{repo}/issues?page=1&per_page=100
It is unfortunate that github.com does not make this easier.
In the meantime, if you have jq and curl, you can do this in two lines using something like the following example, which outputs issue number, title and labels (tags) and works for private repos as well (if you don't want to filter by label, just remove the labels=$label& part of the URL). You'll need to substitute $owner, $repo, $label, and $username:
# with personal access token = $PAT
echo "number, title, labels" > issues.csv
curl "https://api.github.com/repos/$owner/$repo/issues?labels=$label&page=1&per_page=100" -u "$username:$PAT" \
| jq -r '.[] | [.number, .title, (.labels|map(.name)|join("/"))]|@csv' >> issues.csv
# without PAT (will be prompted for password)
echo "number, title, labels" > issues.csv
curl "https://api.github.com/repos/$owner/$repo/issues?labels=$label&page=1&per_page=100" -u "$username" \
| jq -r '.[] | [.number, .title, (.labels|map(.name)|join("/"))]|@csv' >> issues.csv
Note that if your data exceeds 1 page, it may require additional calls.
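If your data does span multiple pages, a small shell loop over the page parameter collects everything. A minimal sketch under the same assumptions ($owner, $repo, $username and $PAT substituted by you); it drops the labels filter, adds state=all, and stops when a page comes back empty:
echo "number, title, labels" > issues.csv
page=1
while : ; do
  # note: the issues endpoint also lists pull requests
  rows=$(curl -s "https://api.github.com/repos/$owner/$repo/issues?state=all&page=$page&per_page=100" -u "$username:$PAT" \
    | jq -r '.[] | [.number, .title, (.labels|map(.name)|join("/"))] | @csv')
  [ -z "$rows" ] && break
  echo "$rows" >> issues.csv
  page=$((page + 1))
done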
I tried the methods described in other comments regarding exporting issues in JSON format. It worked OK, but the formatting was somewhat screwed up. Then I found in the Excel help that Excel can access APIs directly and load the data from the JSON response neatly into my sheets.
The Google terms I used to find the help I needed were "excel power query web.content GET json". I found a How To Excel video which helped a lot.
URL that worked in the Excel query (same as from other posts):
https://api.github.com/repos/{owner}/{repo}/issues?page=1&per_page=100
Personally, I also add the parameter &state=open, otherwise I need to request hundreds of pages. At one point I reached GitHub's limit on unauthenticated API calls/hour for my IP address.
You can also check out the one-liner that I created (it uses the GitHub CLI and jq):
gh issue list --limit 10000 --state all --json number,title,assignees,state,url | jq -r '["number","title","assignees","state","url"], (.[] | [.number, .title, (.assignees | if .|length==0 then "Unassigned" elif .|length>1 then map(.login)|join(",") else .[].login end) , .state, .url]) | @tsv' > issues-$(date '+%Y-%m-%d').tsv
Gist with documentation
I have tinkered with this for quite some time and found that Power BI is a good way of keeping the data up to date in the spreadsheet. I had to look into Power BI a little to make this work, because getting the right info out of the structured JSON fields, and collapsing lists into concatenated strings, especially for labels, wasn't super intuitive. But this Power BI query works well for me by removing all the noise and getting relevant info into an easily digestible format that can be reviewed with stakeholders:
let
MyJsonRecord = Json.Document(Web.Contents("https://api.github.com/repos/<your org>/<your repo>/issues?&per_page=100&page=1&state=open&filter=all", [Headers=[Authorization="Basic <your auth token>", Accept="application/vnd.github.symmetra-preview+json"]])),
MyJsonTable = Table.FromRecords(MyJsonRecord),
#"Column selection" = Table.SelectColumns(MyJsonTable,{"number", "title", "user", "labels", "state", "assignee", "assignees", "comments", "created_at", "updated_at", "closed_at", "body"}),
#"Expanded labels" = Table.ExpandListColumn(#"Column selection", "labels"),
#"Expanded labels1" = Table.ExpandRecordColumn(#"Expanded labels", "labels", {"name"}, {"labels.name"}),
#"Grouped Rows" = Table.Group(#"Expanded labels1", {"number","title", "user", "state", "assignee", "assignees", "comments", "created_at", "updated_at", "closed_at", "body"}, {{"Label", each Text.Combine([labels.name],","), type text}}),
#"Removed Other Columns" = Table.SelectColumns(#"Grouped Rows",{"number", "title", "state", "assignee", "comments", "created_at", "updated_at", "closed_at", "body", "Label"}),
#"Expanded assignee" = Table.ExpandRecordColumn(#"Removed Other Columns", "assignee", {"login"}, {"assignee.login"})
in
#"Expanded assignee"
I added and then removed columns in this query and did not clean it up; feel free to do that before you use it.
Obviously, you also have to fill in your own organization name and repo name into the URL, and obtain the auth token. I have tested the URL with a Chrome REST plugin and got the token from entering the user and api key there. You can authenticate explicitly from Excel with the user and key if you don't want to deal with the token. I just find it simpler to go the anonymous route in the query setup and instead provide the readily formatted request header.
Also, this works for repos with up to 100 open issues. If you have more, you need to duplicate the query (for page 2 etc) and combine the results.
Steps for using this query:
in a new sheet, on the "Data" tab, open the "Get Data" drop-down
select "Launch Power Query Editor"
in the editor, choose "New Query", "Other Sources", "Blank query"
now you click on "Advanced Editor" and paste the above query
click the "Done" button on the Advanced Editor, then "Close and Load" from the tool bar
the issues load into your spreadsheet and you are in business
no crappy third-party tool needed
You can also try https://github.com/remoteorigin/git-issues-downloader, but be sure to use the develop branch. The npm version and the master branch are buggy.
Or you can use this patched version with
npm install -g https://github.com/mkobar/git-issues-downloader
and then run with (for public repo)
git-issues-downloader -n -p none -u none https://github.com/<user>/<repository>
or for a private repo:
git-issues-downloader -n -p <password or token> -u <user> https://github.com/<user>/<repository>
Works great.
Here is a tool that does it for you (uses the GitHub API):
https://github.com/gavinr/github-csv-tools
Export Pull Requests can export issues to a CSV file, which can be opened with Excel. It also supports GitLab and Bitbucket.
From its documentation:
Export open PRs and issues in sshaw/git-link and
sshaw/itunes_store_transporter:
epr sshaw/git-link sshaw/itunes_store_transporter > pr.csv
Export open pull requests not created by sshaw in padrino/padrino-framework:
epr -x pr -c '!sshaw' padrino/padrino-framework > pr.csv
It has several options for filtering what gets exported.
GitHub's JSON API can be queried directly from Excel using Power Query. It does require some knowledge of how to convert JSON into Excel table format, but that's fairly Googlable.
Here's how to first get to the data:
In Excel, on the ribbon, click Data > Get Data > From JSON. In the dialog box, enter the API URL in a format similar to the following (add parameters as you wish):
https://api.github.com/repos/{owner}/{repo}/issues
A dialog box labeled "Access Web content" will appear.
On the left-hand side, click the Basic tab.
In the User name textbox, enter your GitHub username.
In the Password textbox, enter a GitHub password/Personal Access token.
Click Connect.
Power Query Editor will be displayed with a list of items that say Record.
... now Google around for how to transform accordingly so that the appropriate issue data can be displayed as a single table.
As a one-time task, building on the 'hub'-based recommendation from #Chip... on a Windows system with the Git Bash prompt already installed:
Download the latest hub executable (such as Windows 64 bit) https://github.com/github/hub/releases/ and extract it (hub.exe is in the .../bin directory).
Create a GitHub personal access token at https://github.com/settings/tokens and copy the token text string to the clipboard.
Create a text file (such as in Notepad) to use as the input file to hub.exe: the first line is your GitHub user name, and on the second line paste the personal access token, followed by a newline (so that both lines will be processed when fed to hub). Here I presume the file is infile.txt in the repository's base directory.
Run Git Bash... and remember to cd (change directory) to the repository of interest! Then enter a line like:
<path_to_hub_folder>/bin/hub.exe issue -s all -f "%U|%t|%S|%cI|%uI|%L%n" < infile.txt > outfile.csv
Then open the file in Excel with '|' as the column delimiter (and consider deleting the personal access token on GitHub).
You can do it using the Python package PyGithub:
from github import Github

# Authenticate with a personal access token
gh = Github('personal token key here')
repo = gh.get_repo('repo-owner/repo-name')
issues = repo.get_issues(state='all')  # both open and closed issues
for issue in issues:
    print(issue.url)
Here I print the URL; you can get the content instead by using a different attribute (e.g. issue.title or issue.body). Then just export the issue links or content to CSV.
The GitHub CLI (gh) now integrates jq via --jq <expression> to filter the JSON output using a jq expression, as documented in the GitHub CLI manual: https://cli.github.com/manual/gh_issue_list.
TSV dump:
gh issue list --limit 10 --state all --json title,body --jq '["title","body"], (.[] | [.title,.body]) | @tsv' > issues-$(date '+%Y-%m-%d').tsv
CSV dump:
Surprisingly, the carriage return character (U+000D) needs to be filtered out with tr $'\x0D' ' '.
gh issue list --limit 10 --state all --json title,body --jq '["title","body"], (.[] | [.title,.body]) | @csv' | tr $'\x0D' ' ' > issues-$(date '+%Y-%m-%d').csv
Related
I've downloaded some JSON data from Shodan, and only want to retain some fields from it. To explore what I want, I'm running the following, which works-
shodan parse --fields ip,port --separator , "data.json.gz"
However, I now want to output/export the data, so I'm trying to run the following:
shodan parse --fields ip,port -O "data_processed.json.gz" "data.json.gz"
It's requiring me to specify a filter parameter, which I don't need. If I do add an empty filter like so, it tells me data_processed.json.gz doesn't exist.
shodan parse --fields ip,port -f -O "data_processed.json.gz" "data.json.gz"
I'm a bit stumped on how to export only certain fields of my data; how do I go about doing so?
If you only want to output those 2 properties then you can simply pipe them to a file:
shodan parse --fields ip,port --separator , data.json.gz > data_processed.csv
A few things to keep in mind:
You probably want to export the ip_str property as it's a more user-friendly version of the IP address. The ip property is a numeric version of the IP address and aimed at users storing the information in a database.
You can convert your data file into Excel or CSV format using the shodan convert command, for example shodan convert data.json.gz csv. See here for a quick guide: https://help.shodan.io/guides/how-to-convert-to-excel
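Putting those two notes together, with the file names from the question:
# Export just the two fields, using the human-friendly IP string
shodan parse --fields ip_str,port --separator , data.json.gz > data_processed.csv
# Or convert the whole file to CSV in one step
shodan convert data.json.gz csv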
Can anyone help me with the best way to implement the utility below? I have around 20 sets of usernames and passwords and need to log in to a URL, say http://login.com. If the username/password is valid it redirects to http://login.com/true, otherwise to http://login.com/false.
How can I automate this process and write the status back to an Excel sheet, indicating which username/password combinations are valid?
Is there any way to automate this without opening the URL in a browser (a kind of headless automation)?
You can do all this with shell scripting and no browsers involved. For the
need to log in to a URL
you can utilize cURL, since it specifically has a feature to follow redirects (-L / --location), and it's also free.
curl --user user:pass https://example.com/a
For the part with
automate this process and write the status back to an Excel sheet
you can take the output from the previous step (the page URL) and format it into a CSV file, like so:
echo "$page_url" > results.csv
After that you could rename it to .xls, and Excel, Gnumeric, or other programs will recognize and open it.
curl -s -o /dev/null -L -w '%{url_effective}\n' --user user:pass http://login.com | awk -v OFS="," '{ print $0, ($0 ~ /true/ ? "true" : "false") }' > sample.xls
You can find more options for working with Excel files here.
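To cover all 20 username/password pairs in one go, here is a minimal sketch (assumptions: the credentials sit in a file called creds.txt with one user:pass pair per line, and the site redirects to .../true or .../false as the question describes):
#!/bin/bash
# For each user:pass pair, follow redirects and record where we ended up.
echo "user,status" > results.csv
while IFS=: read -r user pass; do
  final_url=$(curl -s -o /dev/null -L -w '%{url_effective}' --user "$user:$pass" http://login.com)
  case "$final_url" in
    */true) status=valid ;;
    *)      status=invalid ;;
  esac
  echo "$user,$status" >> results.csv
done < creds.txt
The resulting results.csv opens directly in Excel.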
I'm trying to figure out a good way to increase the productivity of my data entry job.
What I am looking to do is come up with a way to scrape data from a PDF and input it into Excel.
More specifically the data I am working with is from grocery store flyers. As it stands now we have to manually enter every deal in the flyer into a database. A sample of a flyer is http://weeklyspecials.safeway.com/customer_Frame.jsp?drpStoreID=1551
What I am hoping to do is have columns for products, price, and predefined options (Loyalty Cards, Coupons, Select Variety... that sort of thing).
Any help would be appreciated, and if I need to be more specific let me know.
After looking at the specific PDF linked to by the OP, I have to say that this is not quite displaying a typical table format.
It contains many images inside the "cells", but the cells are not all strictly vertically or horizontally aligned:
So this isn't even a 'nice' table, but an extremely ugly and awkward one to work with...
Having said that, I'll have to add:
Extracting even 'nice' tables from PDFs in general is extremely difficult...
Standard PDFs do not provide any hints about the semantics of what they draw on a page:
the only distinction that the syntax provides is between vector elements (lines, fills, ...), images, and text.
Whether any character is part of a table or part of a line or just a lonely, single character within an otherwise empty area is not easy to recognize programmatically by parsing the PDF source code.
For a background about why the PDF file format should never, ever be thought of as suitable for hosting extractable, structured data, see this article:
Why Updating Dollars for Docs Was So Difficult (ProPublica-Website)
...but doing so with TabulaPDF works very well!
Having said the above now let me add this:
For an amazing open source family of tools that gets better and better from week to week for extracting tabular data from PDFs (unless they are scanned pages) -- contradicting what I said in my introductory paragraphs! -- check out TabulaPDF. See these links:
Introducing Tabula: Upload a PDF, get back tabular CSV data. Poof!
Tabula-Extractor: A Command Line Interface to Tabula
Tabula source code repository
Tabula API (upcoming, not ready yet)
Tabula-Extractor is written in Ruby.
In the background it makes use of PDFBox (which is written in Java) and a few other third-party libs.
To run, Tabula-Extractor requires JRuby-1.7 installed.
Installing Tabula-Extractor
I'm using the 'bleeding-edge' version of Tabula-Extractor directly from its GitHub source code repository.
Getting it to work was extremely easy, since on my system JRuby-1.7.4_0 is already present:
mkdir ~/svn-stuff
cd ~/svn-stuff
git clone https://github.com/tabulapdf/tabula-extractor.git git.tabula-extractor
Included in this Git clone will already be the required libraries, so no need to install PDFBox.
The command line tool is in the /bin/ subdirectory.
Exploring the command line options:
~/svn-stuff/git.tabula-extractor/bin/tabula -h
Tabula helps you extract tables from PDFs
Usage:
tabula [options] <pdf_file>
where [options] are:
--pages, -p <s>: Comma separated list of ranges, or all. Examples:
--pages 1-3,5-7, --pages 3 or --pages all. Default
is --pages 1 (default: 1)
--area, -a <s>: Portion of the page to analyze
(top,left,bottom,right). Example: --area
269.875,12.75,790.5,561. Default is entire page
--columns, -c <s>: X coordinates of column boundaries. Example
--columns 10.1,20.2,30.3
--password, -s <s>: Password to decrypt document. Default is empty
(default: )
--guess, -g: Guess the portion of the page to analyze per page.
--debug, -d: Print detected table areas instead of processing.
--format, -f <s>: Output format (CSV,TSV,HTML,JSON) (default: CSV)
--outfile, -o <s>: Write output to <file> instead of STDOUT (default:
-)
--spreadsheet, -r: Force PDF to be extracted using spreadsheet-style
extraction (if there are ruling lines separating
each cell, as in a PDF of an Excel spreadsheet)
--no-spreadsheet, -n: Force PDF not to be extracted using
spreadsheet-style extraction (if there are ruling
lines separating each cell, as in a PDF of an Excel
spreadsheet)
--silent, -i: Suppress all stderr output.
--use-line-returns, -u: Use embedded line returns in cells. (Only in
spreadsheet mode.)
--version, -v: Print version and exit
--help, -h: Show this message
Extracting the table which the OP wants
I'm not even trying to extract this ugly table from the OP's monster PDF. I'll leave it as an exercise to those readers who are feeling adventurous enough...
Instead, I'll demo how to extract a 'nice' table. I'll take pages 651-653 from the official PDF-1.7 specification, here represented with screenshots:
I used this command:
~/svn-stuff/git.tabula-extractor/bin/tabula \
-p 651,652,653 -g -n -u -f CSV \
~/Downloads/pdfs/PDF32000_2008.pdf
After importing the generated CSV into LibreOffice Calc, the spreadsheet looks like this:
To me this looks like the perfect extraction of a table which did spread over 3 different PDF pages. (Even the newlines used within table cells made it into the spreadsheet.)
Update
Here is an ASCiinema screencast (which you also can download and re-play locally in your Linux/MacOSX/Unix terminal with the help of the asciinema command line tool), starring tabula-extractor:
I would like to capture all the commands fired by a user in a session. This is needed for the purpose of auditing.
I used something like the snippet below:
LoggedIn=`date +"%B-%d-%Y-%M:%H"`
HostName=`hostname`
UNIX_USER=`who am i | cut -d " " -f 1`
echo " Please enter a Change Request Number for which you are logging in : "
read CR_NUMBER
FileName=$HostName-$LoggedIn-$CR_NUMBER-$UNIX_USER
script $FileName
I have put this snippet in the .profile file, so that as soon as the user logs in to an SU account it creates the file. The plan is to push this file to a central repository where an auditor can look into those files.
But there are a couple of problems with this.
The "script" command spools all the data from the session. For example, if a user cats a property file, it appends all the contents of the property file to the auditing file.
Unless the user runs the 'exit' command, the data is not flushed to the auditing file, so if the user logs out without running exit, the auditing file will be empty.
Is there any better solution for auditing? The history file is not an option since it does not tell me for which Change Request number (internal to my organisation) the commands were fired. Is there any other way to capture only the commands fired, but not their output?
Some of the previous discussions are here and here
I think this software exactly matches your need:
https://github.com/a2o/snoopy
I'm trying to create a script that will format the output of w32tm.exe /monitor and display in a table the server name, NTP offset, and RefID.
I'm not sure how to capture the output of an executable in order to format it, and was wondering if someone here could help me. Right now I'm trying this:
$executable = w32tm.exe /monitor
$executable | Format-Table -View "Server Name", "NTP offset", "RefID"
How can I get the executable's output formatted in a table that displays those specific fields?
Hi, I also needed to do this. Then I found a function someone made to do it; have a look. I am sure you can change it to suit your needs, or just use it as is.
ps-function
Article on how it works