Is it possible to search for code or text inside files in GitLab? I can search for files, issues, milestones, etc., but I could not find a way to search for code in source files or for text in documentation, e.g. .doc files.
To search files on GitLab, open the URL https://gitlab.com/search?utf8=%E2%9C%93&search_code=true&repository_ref={BranchName}
Select the respective group and project from the dropdowns.
Enter the text to search for and click the 'Search' button.
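The URL above can also be assembled programmatically, for example to bookmark searches per branch. This is a minimal sketch that reproduces the parameters shown (utf8, search_code, repository_ref); the `search` parameter carrying the query text and the optional `project_id` parameter are assumptions about GitLab's search page, and newer GitLab releases may expect `scope=blobs` instead of `search_code=true`.

```python
from urllib.parse import urlencode


def gitlab_code_search_url(base_url, text, branch, project_id=None):
    """Build a GitLab code-search URL (hypothetical helper).

    utf8, search_code and repository_ref come from the URL above;
    'search' and 'project_id' are assumed parameter names.
    """
    params = {
        "utf8": "\u2713",           # check mark, percent-encoded as %E2%9C%93
        "search": text,             # assumed name of the query-text parameter
        "search_code": "true",
        "repository_ref": branch,
    }
    if project_id is not None:
        params["project_id"] = project_id  # assumed: restricts the search to one project
    return base_url.rstrip("/") + "/search?" + urlencode(params)


print(gitlab_code_search_url("https://gitlab.com", "my text", "master"))
```

Spaces in the query are encoded as `+` by `urlencode`, which is what a browser form submission would send as well.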
It was announced in 5.2:
https://about.gitlab.com/2013/05/22/gitlab-5-dot-2-released/
I can confirm that code search works in 8.4.
We have a paid GitLab account, and our domain is gitlab.ourcompany.com, so this answer might not apply to everyone. There have been a few occasions where I needed to search for a "string" across all files in a GitLab project, but there is no obvious 'search' button. The search bar only matches project names, which is not what I was looking for. The method below is the easiest to use and to remember:
https://gitlab.xxxxx.com/search
Alternatively you can use the python-gitlab library to search text in the projects that you need:
import gitlab
def search(gitlab_server, token, file_filter, text, group=None, project_filter=None, ref='master'):
    return_value = []
    gl = gitlab.Gitlab(gitlab_server, private_token=token)
    if group:
        group_object = gl.groups.get(group)
        group_projects = group_object.projects.list(search=project_filter, all=True)
        # Group listings return lightweight objects; fetch the full projects
        projects = [gl.projects.get(group_project.id) for group_project in group_projects]
    else:
        # No group given: search all projects visible to the token (optionally filtered by name)
        projects = gl.projects.list(search=project_filter, all=True)
    for project in projects:
        files = []
        try:
            files = project.repository_tree(recursive=True, all=True)
        except Exception as e:
            print(str(e), "Error getting tree in project:", project.name)
        for file in files:
            # Only files (blobs) can be read; skip directories (trees) with a matching name
            if file['type'] == 'blob' and file_filter == file['name']:
                file_content = project.files.raw(file_path=file['path'], ref=ref)
                # raw() returns bytes, so decode before searching
                if text in file_content.decode('utf-8', errors='replace'):
                    return_value.append({
                        "project": project.name,
                        "file": file['path']
                    })
    return return_value
A complete example can be found here: gitlab-search
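The per-file check in the search function can be isolated and tested without a GitLab server. The helpers below are hypothetical; they assume `repository_tree()` returns dicts with 'name', 'path' and 'type' keys ('blob' for files, 'tree' for directories) and that `files.raw()` returns bytes, which is why the content is decoded rather than passed through `str()`.

```python
def matching_paths(tree_entries, file_filter):
    """Return paths of file (blob) entries whose name equals file_filter."""
    return [
        entry["path"]
        for entry in tree_entries
        if entry.get("type") == "blob" and entry["name"] == file_filter
    ]


def contains_text(raw_content, text, encoding="utf-8"):
    """Decode raw file bytes and report whether text occurs in them."""
    # str(b"...") yields "b'...'" with escaped bytes, so non-ASCII text would be missed
    return text in raw_content.decode(encoding, errors="replace")


tree = [
    {"name": "setup.py", "path": "setup.py", "type": "blob"},
    {"name": "docs", "path": "docs", "type": "tree"},
    {"name": "setup.py", "path": "pkg/setup.py", "type": "blob"},
]
print(matching_paths(tree, "setup.py"))  # ['setup.py', 'pkg/setup.py']
print(contains_text("naïve".encode(), "ï"))  # True
```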
I'm trying to access the text content of a ttk.Notebook.
I read an unknown number of text files (fewer than 20) and create a new tab for every .txt file.
Then I add the content of each .txt file to a Text widget in its tab.
os.chdir('C://Users//Public//Documents')
myNotes = glob.glob('*.txt')
myNotes.append('+')
self.notebook = ttk.Notebook(self.master)
for files in myNotes:
    if files != '+':
        with open('C://Users//Public//Documents//'+files,'r') as f:
            value = f.read()
    else:
        value=''
    self.notebookTab = ttk.Frame(self.notebook)
    self.notebook.add(self.notebookTab, text=files)
    self.text = Text(self.notebookTab, bd=0, wrap='word')
    self.text.pack(fill='both', expand=True)
    self.text.insert('1.0', value)
self.notebook.pack(fill='both', expand=True)
I can get the name of the active tab (i.e. the name of the text file) with this:
activeTabName = self.notebook.tab(self.notebook.select(), "text")
But I can't figure out how to get the text of the Text widget associated with the active tab.
What I'd like to accomplish is to be able to modify the content of one or several text files and save the new content to the correct .txt file.
Does anyone have any ideas?
I found a way to achieve what I wanted.
I saved my Text widgets in a list, then fetched the index of the active tab and used it to get the widget from my list, e.g.:
self.textLista = []
...
self.textLista.append(self.text)
...
activeTabNo = self.notebook.index("current")
content = self.textLista[activeTabNo].get('1.0', END + '-1c')
This might not be a good way of doing it, but at least it works for my purposes.
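To write the edited text back to the right files, the list of Text widgets can be paired with the tab names, since both follow the tab order. The helper below is a hypothetical sketch kept free of tkinter so it can be tested on its own; in the application the contents might come from `[t.get('1.0', 'end-1c') for t in self.textLista]` and the names from `self.notebook.tab(tab_id, 'text')` for each `tab_id` in `self.notebook.tabs()`. Writing UTF-8 is an assumption; use whatever encoding the files were read with.

```python
import os


def save_tabs(directory, tab_names, contents, skip=("+",)):
    """Write each tab's content to directory/<tab name>, skipping placeholder tabs.

    Returns the list of paths written.
    """
    written = []
    for name, content in zip(tab_names, contents):
        if name in skip:
            continue  # e.g. the '+' tab has no backing file
        path = os.path.join(directory, name)
        with open(path, "w", encoding="utf-8") as f:
            f.write(content)
        written.append(path)
    return written
```

Relying on list order works as long as tabs are never reordered or closed; a dict keyed by the tab id returned by `self.notebook.select()` would be a more robust alternative.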
I am trying to create a small program that can locate http and https links in a piece of text. I'm just beginning to learn about regular expressions, but I can't understand what I am doing wrong with my code. Rather than displaying the link, it displays "No website was found". Any help is greatly appreciated.
import re
correctURL = re.compile(r'(HTTPS://|HTTP://) \S+', re.I)
myURL = "HTTPS://w"
match = correctURL.search(myURL)
if match:
    print("The website found was:" + match.group(0))
else:
    print("No website was found")
Only two minor modifications are necessary:
(i) Omitting the space from the pattern, and grouping the "rest" of the URL so that it can be referenced later
(ii) When a match is found, we print the 2nd group. (group(0) is the whole match, group(1) is the 1st etc.)
import re
correctURL = re.compile(r'(HTTPS://|HTTP://)(\S+)', re.I)
myURL = "HTTPS://w"
match = correctURL.search(myURL)
if match:
    print("The website found was:" + match.group(2))
else:
    print("No website was found")
I wonder whether this is already what you need, or whether you also want to extract the domain name, as the word "website" suggests.
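If the aim is to locate every link in a longer piece of text and also get the domain, one option (a sketch, not the only approach) is to drop the capturing groups, use `findall`, and hand each match to `urllib.parse.urlparse`. Note that `\S+` still swallows trailing punctuation such as a sentence-ending period.

```python
import re
from urllib.parse import urlparse

# 'https?' matches both schemes; re.IGNORECASE also accepts HTTPS:// etc.
URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def find_links(text):
    """Return every http(s) link in text; with no groups, findall returns whole matches."""
    return URL_PATTERN.findall(text)


def domain_of(url):
    """Return the host part (netloc) of a URL."""
    return urlparse(url).netloc


links = find_links("See HTTPS://example.com/a and http://foo.org/b today")
print(links)                          # ['HTTPS://example.com/a', 'http://foo.org/b']
print([domain_of(u) for u in links])  # ['example.com', 'foo.org']
```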
I have a small Python arcade game that I've converted to a standalone .exe using PyInstaller. It works fine with pygame, but the issue is that I use pickle to save high scores. I was originally using cmd for user input, so cmd would say "Please type your name", and whatever you typed would be stored in a separate file using pickle. There are two problems: I can't use cmd in the standalone .exe (and it looks ugly anyway), and when I store the name in a separate file with pickle, I don't think that file is being included in the standalone. I say "think" because the code never makes its way past the user input section.
How can I get the user input to appear on the screen (in my own font and location) rather than in cmd?
and
How can I include the user input file (which is stored with pickle) to be included in the .exe?
This is what I have currently (all within the main loop):
if lives == 0:
    username = input("Please type your name: ")
    highscore = {username: points}
    try:
        with open("C:/Python35/highscore.txt", "rb") as highscoreBest:
            highscoreBest = pickle.load(highscoreBest)
    except EOFError:
        with open("C:/Python35/highscore.txt", "wb") as highscoreBest:
            pickle.dump(highscore, highscoreBest)
    for (k, v), (k2, v2) in zip(highscore.items(), highscoreBest.items()):
        if v >= v2:
            with open("C:/Python35/highscore.txt", "wb") as highscoreBest:
                pickle.dump(highscore, highscoreBest)
    with open("C:/Python35/highscore.txt", "rb") as highscoreBest:
        highscoreBest = pickle.load(highscoreBest)
    for key, value in highscoreBest.items():
        print("%s has the highscore of %s!" % (key, value))
        highscoreText = highscorefont.render("Highscore %s" % (value), 1, textcolor)
    gameOverText = font.render("GAME OVER", 1, textcolor)
    scoreText = font.render("Score %s" % (points), 1, textcolor)
    while 1:
        screen.blit(gameOverText, (200, 400))
        screen.blit(scoreText, (225, 450))
        screen.blit(highscoreText, (235,500))
        for event in pygame.event.get():
            if event.type == pygame.QUIT: sys.exit()
        pygame.display.flip()
        time.sleep(1)
Thank you to all who reply.
It looks like you want a GUI in your game. Read http://www.pygame.org/wiki/gui; it should help you.
For example,
OcempGUI
Links
Home Page: http://ocemp.sourceforge.net/gui.html
Source: http://sourceforge.net/project/showfiles.php?group_id=100329&package_id=149654
Installation: http://ocemp.sourceforge.net/manual/installation.html
Regarding the text file for high scores, you can include it in the bundle along with the executable file; see https://pythonhosted.org/PyInstaller/spec-files.html#adding-files-to-the-bundle
Adding Data Files
To have data files included in the bundle, provide a list that describes the files as the value of the datas= argument to Analysis. The list of data files is a list of tuples. Each tuple has two values, both of which must be strings:
The first string specifies the file or files as they are in this system now.
The second specifies the name of the folder to contain the files at run-time.
For example, to add a single README file to the top level of a one-folder app, you could modify the spec file as follows:
a = Analysis(...
             datas=[ ('src/README.txt', '.') ],
             ...
             )
You have made the datas= argument a one-item list. The item is a tuple in which the first string says the existing file is src/README.txt. That file will be looked up (relative to the location of the spec file) and copied into the top level of the bundled app.
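Two details matter when the high-score file travels with the executable. In a one-file build, bundled data is unpacked at run time into a temporary folder whose path PyInstaller stores in `sys._MEIPASS`, so reads must be resolved against that folder, and anything written there is lost when the program exits. The sketch below therefore separates a read-only `resource_path()` helper from load/save functions meant to point at a writable location (a file in the user's home directory is an assumption). The loader also returns an empty dict for a missing or empty file, instead of the question's `except EOFError` branch, which leaves `highscoreBest` bound to a closed file object.

```python
import os
import pickle
import sys


def resource_path(relative):
    """Resolve a bundled, read-only data file.

    When frozen by PyInstaller, sys._MEIPASS points at the unpacked bundle;
    otherwise fall back to the current working directory.
    """
    base = getattr(sys, "_MEIPASS", os.path.abspath("."))
    return os.path.join(base, relative)


def load_highscore(path):
    """Return the stored {name: points} dict, or {} if the file is missing or empty."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except (FileNotFoundError, EOFError):
        return {}


def save_if_better(path, name, points):
    """Store {name: points} when it ties or beats the current best; return the best."""
    best = load_highscore(path)
    best_points = max(best.values(), default=None)
    if best_points is None or points >= best_points:
        best = {name: points}
        with open(path, "wb") as f:
            pickle.dump(best, f)
    return best


# Hypothetical writable location for the score file
SCORE_FILE = os.path.join(os.path.expanduser("~"), "highscore.pickle")
```

In the game loop, something like `best = save_if_better(SCORE_FILE, username, points)` would replace the try/except block, and the single entry in `best` supplies the "Highscore" text.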
I want to extract text from word documents that were edited in "Track Changes" mode. I want to extract the inserted text and ignore the deleted text.
Running the below code I saw that paragraphs inserted in "track changes" mode return an empty Paragraph.text
import docx
doc = docx.Document('C:\\test track changes.docx')
for para in doc.paragraphs:
    print(para)
    print(para.text)
Is there a way to retrieve the text in revisioned inserts (w:ins elements) ?
I'm using python-docx 0.8.6, lxml 3.4.0, python 3.4, Win7
Thanks
I had the same problem for years (maybe for as long as this question has existed).
By looking at the code by "etienned" posted by @yiftah and at the attributes of Paragraph, I found a way to retrieve the text after accepting the changes.
The trick is to use p._p.xml to get the XML of the paragraph and then apply "etienned"'s code to it (i.e. retrieve all the <w:t> elements from the XML, which covers both regular runs and <w:ins> blocks).
I hope it helps other souls as lost as I was:
from docx import Document
try:
    from xml.etree.cElementTree import XML
except ImportError:
    # cElementTree was removed in Python 3.9
    from xml.etree.ElementTree import XML
WORD_NAMESPACE = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
TEXT = WORD_NAMESPACE + "t"
def get_accepted_text(p):
    """Return text of a paragraph after accepting all changes"""
    xml = p._p.xml
    if "w:del" in xml or "w:ins" in xml:
        tree = XML(xml)
        # iter() replaces getiterator(), which was removed in Python 3.9
        runs = (node.text for node in tree.iter(TEXT) if node.text)
        return "".join(runs)
    else:
        return p.text
doc = Document("Hello.docx")
for p in doc.paragraphs:
    print(p.text)
    print("---")
    print(get_accepted_text(p))
    print("=========")
Not directly using python-docx; there's no API support yet for tracked changes/revisions.
It's a pretty tricky job, which you'll discover if you search on the element names, perhaps 'open xml w:ins' for a start, that brings up this document as the first result:
https://msdn.microsoft.com/en-us/library/ee836138(v=office.12).aspx
If I needed to do something like that in a pinch I'd get the body element using:
body = document._body._body
and then use XPath on that to return the elements I wanted, something vaguely like this aircode:
from docx.text.paragraph import Paragraph
inserted_ps = body.xpath('./w:ins//w:p')
for p in inserted_ps:
    paragraph = Paragraph(p, None)
    print(paragraph.text)
You'll be on your own for figuring out what XPath expression will get you the paragraphs you want.
opc-diag may be a friend in this, allowing you to quickly scan the XML of the .docx package. http://opc-diag.readthedocs.io/en/latest/index.html
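When only the inserted text is wanted (rather than the accepted view), one possible starting point is to look for `w:t` elements under each `w:ins`. In many documents `w:ins` wraps runs inside a paragraph rather than whole paragraphs, and an empty `w:ins` inside a paragraph's run properties can mark an inserted paragraph mark, so empty results are filtered out. This is a standard-library sketch on illustrative XML, not python-docx API.

```python
import xml.etree.ElementTree as ET

W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
W = '{%s}' % W_NS


def inserted_texts(xml_text):
    """Return the text of each w:ins element that contains w:t runs."""
    root = ET.fromstring(xml_text)
    result = []
    for ins in root.iter(W + 'ins'):
        text = ''.join(t.text or '' for t in ins.iter(W + 't'))
        if text:  # skip markers such as <w:rPr><w:ins .../></w:rPr>
            result.append(text)
    return result


SAMPLE = (
    '<w:body xmlns:w="%s">'
    '<w:p><w:pPr><w:rPr><w:ins w:id="0" w:author="A"/></w:rPr></w:pPr>'
    '<w:ins w:id="1" w:author="A"><w:r><w:t>added</w:t></w:r></w:ins>'
    '<w:r><w:t> kept</w:t></w:r></w:p>'
    '</w:body>' % W_NS
)
print(inserted_texts(SAMPLE))  # ['added']
```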
The code below from Etienne worked for me; it works directly with the document's XML (without python-docx):
http://etienned.github.io/posts/extract-text-from-word-docx-simply/
I needed a quick solution to make text surrounded by "smart tags" visible to docx's text property, and found that the solution could also be adapted to make some tracked changes visible.
It uses lxml.etree.strip_tags to remove surrounding "smartTag" and "ins" tags, and promote the contents; and lxml.etree.strip_elements to remove the whole "del" elements.
import docx
import lxml.etree
def para2text(p, quiet=False):
    # Note: this edits the paragraph's XML in place, so the document object is modified
    if not quiet:
        unsafeText = p.text
    lxml.etree.strip_tags(p._p, "{*}smartTag")
    lxml.etree.strip_elements(p._p, "{*}del")
    lxml.etree.strip_tags(p._p, "{*}ins")
    safeText = p.text
    if not quiet:
        if safeText != unsafeText:
            print()
            print('para2text: unsafe:')
            print(unsafeText)
            print('para2text: safe:')
            print(safeText)
            print()
    return safeText
docin = docx.Document(filePath)
for para in docin.paragraphs:
    text = para2text(para)
Beware that this only works for a subset of tracked changes, but it might be the basis of a more general solution.
If you want to see the XML of a .docx file directly, rename it to .zip, extract word/document.xml, and view it by dropping it into Chrome or your favourite viewer.
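Renaming is not strictly necessary: a .docx file is a ZIP package, so the main document part (word/document.xml) can be read with Python's `zipfile` module. A minimal sketch; the function names are hypothetical:

```python
import zipfile


def read_document_xml(docx_file):
    """Return word/document.xml from a .docx package as text.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as package:
        return package.read('word/document.xml').decode('utf-8')


def list_parts(docx_file):
    """List every part in the package, e.g. to spot headers, footers or comments."""
    with zipfile.ZipFile(docx_file) as package:
        return package.namelist()
```

For example, `print(read_document_xml('test track changes.docx'))` would dump the raw XML, including any `w:ins` and `w:del` elements.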
I am not sure whether I've been missing anything obvious, but I have not found any documentation on how to insert Word elements (tables, for example) at a specific place in a document.
I am loading an existing MS Word .docx document by using:
my_document = Document('some/path/to/my/document.docx')
My use case would be to get the 'position' of a bookmark or section in the document and then proceed to insert tables below that point.
I'm thinking about an API that would allow me to do something along those lines:
insertion_point = my_document.bookmarks['bookmark_name'].position
my_document.add_table(rows=10, cols=3, position=insertion_point+1)
I saw that there are plans to implement something akin to the 'range' object of the MS Word API, this would effectively solve that problem. In the meantime, is there a way to instruct the document object methods where to insert the new elements?
Maybe I can glue some lxml code to find a node and pass that to these python-docx methods? Any help on this subject would be much appreciated! Thanks.
I remembered an old adage, "use the source, Luke!", and could figure it out. A post from python-docx owner on its git project page also gave me a hint: https://github.com/python-openxml/python-docx/issues/7.
The full XML document model can be accessed through its _document_part._element property. It behaves exactly like an lxml etree element. From there, everything is possible.
To solve my specific insertion point problem, I created a temp docx.Document object which I used to store my generated content.
import docx
from docx.oxml.shared import qn
tmp_doc = docx.Document()
# Generate content in tmp_doc document
tmp_doc.add_heading('New heading', 1)
# more content generation using docx API.
# ...
# Reference the tmp_doc XML content
tmp_doc_body = tmp_doc._document_part._element.body
# You could pretty print it by using:
#print(docx.oxml.xmlchemy.serialize_for_reading(tmp_doc_body))
I then loaded my docx template (containing a bookmark named 'insertion_point') into a second docx.Document object.
doc = docx.Document('/some/path/example.docx')
doc_body = doc._document_part._element.body
#print(docx.oxml.xmlchemy.serialize_for_reading(doc_body))
The next step is parsing the doc XML to find the index of the insertion point. I defined a small function for the task at hand, which returns a named bookmark parent paragraph element:
def get_bookmark_par_element(document, bookmark_name):
    """
    Return the named bookmark parent paragraph element. If no matching
    bookmark is found, the result is 1. If an error is encountered, 2
    is returned.
    """
    doc_element = document._document_part._element
    bookmarks_list = doc_element.findall('.//' + qn('w:bookmarkStart'))
    for bookmark in bookmarks_list:
        name = bookmark.get(qn('w:name'))
        if name == bookmark_name:
            par = bookmark.getparent()
            if not isinstance(par, docx.oxml.CT_P):
                return 2
            else:
                return par
    return 1
The newly defined function was used to get the parent paragraph of the bookmark 'insertion_point'. Error control is left to the reader.
bookmark_par = get_bookmark_par_element(doc, 'insertion_point')
We can now use bookmark_par's etree index to insert our tmp_doc generated content at the right place:
bookmark_par_parent = bookmark_par.getparent()
index = bookmark_par_parent.index(bookmark_par) + 1
# Iterate over a snapshot: insert() moves each child out of tmp_doc_body
for child in list(tmp_doc_body):
    bookmark_par_parent.insert(index, child)
    index = index + 1
bookmark_par_parent.remove(bookmark_par)
The document is now finalized, the generated content having been inserted at the bookmark location of an existing Word document.
# Save result
# print(docx.oxml.xmlchemy.serialize_for_reading(doc_body))
doc.save('/some/path/generated_doc.docx')
I hope this can help someone, as the documentation regarding this is still yet to be written.
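The same bookmark-based insertion can be tried on raw XML with only the standard library. Because `xml.etree.ElementTree` has no `getparent()`, the sketch below first builds a child-to-parent map; the function name and sample XML are hypothetical, and unlike the answer above it leaves the bookmark paragraph in place.

```python
import xml.etree.ElementTree as ET

W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
W = '{%s}' % W_NS


def insert_after_bookmark(body, bookmark_name, new_elements):
    """Insert new_elements after the paragraph holding the named bookmark.

    Returns True on success, False if no suitable bookmark is found.
    """
    # ElementTree has no getparent(), so map each child to its parent first
    parents = {child: parent for parent in body.iter() for child in parent}
    for start in body.iter(W + 'bookmarkStart'):
        if start.get(W + 'name') != bookmark_name:
            continue
        par = parents.get(start)
        if par is None or par.tag != W + 'p':
            return False  # bookmark not directly inside a paragraph
        container = parents[par]
        index = list(container).index(par) + 1
        for offset, element in enumerate(new_elements):
            container.insert(index + offset, element)
        return True
    return False


xml = ('<w:body xmlns:w="%s"><w:p><w:bookmarkStart w:id="0" w:name="insertion_point"/>'
       '<w:bookmarkEnd w:id="0"/></w:p><w:p><w:r><w:t>after</w:t></w:r></w:p></w:body>' % W_NS)
body = ET.fromstring(xml)
new_p = ET.Element(W + 'p')
print(insert_after_bookmark(body, 'insertion_point', [new_p]))  # True
```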
Put [image] as a token in your template document, then replace it with a picture:
from docx.shared import Inches
for paragraph in document.paragraphs:
    if "[image]" in paragraph.text:
        paragraph.text = paragraph.text.strip().replace("[image]", "")
        run = paragraph.add_run()
        run.add_picture(image_path, width=Inches(3))
The token may also be in a paragraph inside a table cell; just find the cell and do the same as above.
The python-docx maintainer suggests how to insert a table into the middle of an existing document:
https://github.com/python-openxml/python-docx/issues/156
Here it is with some improvements:
import re
from docx import Document
def move_table_after(document, table, search_phrase):
    regexp = re.compile(search_phrase)
    for paragraph in document.paragraphs:
        if paragraph.text and regexp.search(paragraph.text):
            tbl, p = table._tbl, paragraph._p
            p.addnext(tbl)
            return paragraph
if __name__ == '__main__':
    document = Document('Existing_Document.docx')
    table = document.add_table(rows=..., cols=...)
    ...
    move_table_after(document, table, "your search phrase")
    document.save('Modified_Document.docx')
Have a look at python-docx-template which allows jinja2 style templates insertion points in a docx file rather than Word bookmarks:
https://pypi.org/project/docxtpl/
https://docxtpl.readthedocs.io/en/latest/
Thanks a lot for taking time to explain all of this.
I was going through more or less the same issue. My specific problem was how to merge two or more .docx documents, appending each one at the end of the first.
It's not exactly a solution to your problem, but here is the function I came with:
def combinate_word(main_file, files, output):
    main_doc = Document(main_file)
    for file in files:
        sub_doc = Document(file)
        for element in sub_doc._document_part.body._element:
            main_doc._document_part.body._element.append(element)
    main_doc.save(output)
Unfortunately, it's not yet possible, or at least not easy, to copy images with python-docx. I fall back to win32com ...