Uploading documents to Alfresco CMIS using Python - python-3.x

I am trying to upload documents to an Alfresco CMIS site using Python code:
from cmislib.model import CmisClient
file_to_upload = open(file_full_path, 'r')
doc_in_Alfresco = archive_folder_2.createDocument(file_name, \
contentFile=file_to_upload)
I took the example from this help document: https://chemistry.apache.org/python/docs/examples.html
I have .txt, .rtf, .pdf, .docx documents, etc.
Except for the .txt documents, every upload fails, and the common error is
LookupError: 'base64' is not a text encoding; use codecs.encode() to handle arbitrary codecs
The error will differ in codec name depending on the file extension.
With limited documentation available, can someone hint at a possible solution? I am using cmislib with Python 3 on macOS, and the code is run in a Jupyter Notebook.
Any hints will be highly appreciated.
Thanks

It worked for me with the following changes in place
file_to_upload.read()
file_data = base64.b64encode(content_file.read())
doc_in_Alfresco = archive_folder_2.createDocument(file_name, \
contentFile=file_to_upload)
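The LookupError from the question can be reproduced without Alfresco. The following is a minimal sketch, assuming the failure comes from code that calls .encode("base64") on a str, an idiom that worked in Python 2 but not in Python 3 (where exactly this happens inside cmislib is not confirmed here). The helper names encode_for_upload and py2_style_encode are hypothetical.

```python
import base64
import io


def encode_for_upload(stream):
    """Return base64 text for a binary file-like object (hypothetical helper)."""
    raw = stream.read()  # bytes, because the stream is binary
    return base64.b64encode(raw).decode("ascii")


def py2_style_encode(text):
    """Python 2 idiom; in Python 3 str.encode() only accepts text encodings."""
    return text.encode("base64")


# Bytes that are not valid text, as found in a .pdf or .docx file.
sample = io.BytesIO(b"%PDF-1.4\x00\xff")
encoded = encode_for_upload(sample)

try:
    py2_style_encode("abc")
    error_message = None
except LookupError as exc:
    # Same kind of error as the one reported in the question.
    error_message = str(exc)
```

For real files, open(file_full_path, 'rb') gives such a binary stream. Text mode ('r') tries to decode the bytes first, which can fail or corrupt anything that is not plain text.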

Related

What are Python3 libraries which replace "from scikits.audiolab import Format, Sndfile"

Hope you all are doing well. I am new to Python. I am trying to use the scikits.audiolab library with Python 3. I have a working version of my code in 2.7 (with scikits.audiolab). When I run it with Python 3, I get the error "Import Error: No Module Named 'Version'". As far as I understand, scikits.audiolab is no longer supported on Python 3 (if I am not wrong). Can anyone suggest a replacement library that offers the same functionality as scikits.audiolab, or any other solution that might help me? Thanks in advance.
2.7 Version Code :
from scikits.audiolab import Format, Sndfile
from scipy.signal import firwin, lfilter
array = np.array(all)
fmt = Format('flac', 'pcm16')
nchannels = 1
cd, FileNameTmp = mkstemp('TmpSpeechFile.wav')
# making the file .flac
afile = Sndfile(FileNameTmp, 'w', fmt, nchannels, RawRate)
#writing in the file
afile.write_frames(array)
SendSpeech(FileNameTmp)
To see the entire code, please visit: Google Asterisk Reference Code (I am modifying my code based on it).
I want to modify this code to use libraries that support Python 3. I am doing this for the Asterisk-Microsoft-Speech To Text SDK.
First, the code you linked is for Asterisk-Google-Speech-Recognition, not Microsoft-Speech-To-Text. If you want a sample for Microsoft-Speech-To-Text, you could refer to the official doc: Recognize speech from an audio file.
As for the problem you describe: yes, the library is not completely compatible with Python 3. There is a solution in the GitHub issue; you could refer to this comment.
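If WAV output is acceptable instead of FLAC (an assumption; the original code asks for 'flac'), the file-writing step can be sketched with the Python 3 standard library alone. The function name write_pcm16_wav is hypothetical, and samples is assumed to be a sequence of integers in the 16-bit range. Writing FLAC itself would need a third-party library, which is not shown here.

```python
import os
import struct
import wave
from tempfile import mkstemp


def write_pcm16_wav(samples, sample_rate, nchannels=1):
    """Write 16-bit PCM samples to a temporary .wav file and return its path."""
    fd, path = mkstemp(suffix=".wav")
    os.close(fd)  # the wave module reopens the file by name
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(nchannels)
        wav_file.setsampwidth(2)  # 2 bytes per sample, i.e. pcm16
        wav_file.setframerate(sample_rate)
        # Pack the integers as little-endian signed 16-bit values.
        wav_file.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return path
```

With a NumPy array as in the question, array.astype('int16').tolist() would give a suitable samples argument, and RawRate would be passed as sample_rate.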

Vader Sentiment with multiple PDF

I recently merged 20 PDFs into one PDF with Adobe. I imported the PDF into Python with this code:
from PyPDF2 import PdfFileReader, PdfFileWriter
pdf_file = open ('/Users/cj/Desktop/PEI.pdf','rb')
newfile=open('rjtjj.txt','w')
pdf_reader= PdfFileReader (pdf_file)
pdf_writer= PdfFileWriter()
print(pdf_reader.numPages)
n=pdf_reader.getNumPages()
for i in range(0, n-1):
    # pdf_writer.addPage(pdf_reader.getPage(i))
    gft=pdf_reader.getPage(i)
    newfile.write(gft.extractText())
pdf_file.close()
newfile.close()
I'm trying to use vaderSentiment to analyse the PDF. What I want to do is analyse each of the 20 PDFs that were merged into one individually.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
with open('rjtjj.txt', 'r') as f:
    for line in f.read().split("\n"):
        vs=analyzer.polarity_scores(line)
I know my code is wrong, because it only gives me the first line of the entire PDF. I am new to this, and I would really appreciate your help.
Thank you
Your problem really isn't about Vader sentiment analysis -- it is about correct extraction of text from a PDF.
PostScript is a Turing-complete, Forth-like language, and PDF inherits its page-description model, so some PDF documents are "hard" to parse. You didn't post your PDF, so we can only guess at the issue. You might try using poppler's pdftotext command-line utility instead. Ubuntu calls the package "poppler-utils"; on macOS you would use brew install poppler. Running the file through pdf2ps and ps2ascii will sometimes give different, and helpful, results.
If you continue to find it difficult to retrieve proper text from the PDF, you may want to contact whoever produced the PDF and settle on supplying the same information in a revised format.
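The pdftotext suggestion can be sketched as follows. This assumes poppler is installed and on the PATH, and that the page count of each of the 20 original PDFs is known (the merged file does not record this, so pages_per_document is a hypothetical input). All three function names are illustrative only.

```python
import subprocess


def pdf_to_text(pdf_path):
    """Run poppler's pdftotext and return the extracted text.

    Passing "-" as the output file makes pdftotext write to stdout.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def split_pages(text):
    """Split pdftotext output into pages; pages are separated by form feeds."""
    pages = text.split("\f")
    # The last page also ends with a form feed, which leaves an empty tail.
    if pages and not pages[-1].strip():
        pages.pop()
    return pages


def group_pages(pages, pages_per_document):
    """Join consecutive pages into one text per original document."""
    documents = []
    start = 0
    for count in pages_per_document:
        documents.append("\n".join(pages[start:start + count]))
        start += count
    return documents
```

Each string returned by group_pages can then be passed to analyzer.polarity_scores(...) and the results collected in a list, one per document, rather than overwriting vs on every loop iteration. Note also that range(0, n-1) in the question stops one page before the end.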

Python3 - How to Write metadata to a windows media file

I would like to write some metadata to a video (media) file in Windows 10 using Python 3. After some searching online, I figured out how to read the metadata with the code below, but I could not find any information about writing it. I wonder if this is even possible with Python.
Any help would be appreciated.
from win32com.propsys import propsys, pscon
filename = "Video.mp4"
properties = propsys.SHGetPropertyStoreFromParsingName(filename)
title = properties.GetValue(pscon.PKEY_Title).GetValue()
print (title)

I'm trying to get an excel sheet downloaded using python requests module and getting junk output

I'm trying to download an excel file which is uploaded on a Sharepoint 2013 site.
My code is as follows:
import requests
from requests_ntlm import HttpNtlmAuth
url='https://<sharepoint_site>/<document_name>.xlsx?Web=0'
author = HttpNtlmAuth('<username>','<password>')
response=requests.get(url,auth=author,verify=False)
print(response.status_code)
print(response.content)
This gives me a long output which is something like:
x00docProps/core.xmlPK\x01\x02-\x00\x14\x00\x06\x00\x08\x00\x00\x00!\x00\x7f\x8bC\xc3\xc1\x00\x00\x00"\x01\x00\x00\x13\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xb8\xb9\x01\x00customXml/item1.xmlPK\x05\x06\x00\x00\x00\x00\x1a\x00\x1a\x00\x12\x07\x00\x00\xd2\xba\x01\x00\x00\x00'
I did something like this before for another site and got XML as output, which was acceptable for me, but I'm not sure how to handle this data.
Any ideas on how to process this into xlsx or xml?
Or maybe a way to download the xlsx differently? (I tried doing it through the wget library and the Excel file seems to get corrupted.)
Any ideas would be really helpful.
Regards,
Karan
It's late, but I had a similar issue and thought this might help someone else.
Try writing the output to a file instead of printing it. response.content holds the raw bytes of the workbook, so open the file in binary mode:
file=open("./temp.xlsx", 'wb')
file.write(response.content)
file.close()
If the server returns text instead (for example XML), write response.text with mode 'w':
file=open("./temp.xml", 'w')
file.write(response.text)
file.close()
To print text output with an explicit encoding:
print(response.text.encode("utf-8"))
Note that response.content is already bytes, so it has no encode() method; print it directly if needed.
Make the appropriate imports.
Hope this helps.
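The output in the question contains PK markers and names such as docProps/core.xml, which is what the raw bytes of a ZIP archive look like, and .xlsx files are ZIP archives. So the "junk" is most likely the workbook itself. A minimal sketch of checking and saving such bytes follows; the helper names are hypothetical, and fake_content stands in for response.content.

```python
import io
import zipfile


def looks_like_xlsx(content):
    """Return True if the bytes form a ZIP archive, as every .xlsx file does."""
    return zipfile.is_zipfile(io.BytesIO(content))


def save_download(content, path):
    """Write downloaded bytes to disk unchanged (binary mode, no decoding)."""
    with open(path, "wb") as out_file:
        out_file.write(content)


# Stand-in for response.content: a tiny ZIP built in memory for illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docProps/core.xml", "<xml/>")
fake_content = buffer.getvalue()
```

If looks_like_xlsx(response.content) is False, the server probably returned something else, such as an HTML login or error page, and the saved file would not open in Excel.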
It seems that the file is encrypted and request can't handle this.
Maybe the web service provides an API for downloading and secure decoding.

openxml can't open docx file through sharepoint rest

I'm using the SharePoint REST API to get the contents of a docx file like so:
_api/web/getfolderbyserverrelativeurl('openxmlJsPoc')/files('TemplateDocument.docx')/$value
I get the contents of the file, but I'm having trouble reading it with the Open XML JavaScript API.
This is a sample of the data that is returned:
PK ! î¦o´• )  Í[Content_Types].xml ¢É(  ¼•MKÃ#†ï‚ÿ!ìUš­
"ÒÔƒG¬àuÝLÚÅýbgÚÚï$Ú(Z[iª—#²;ïûì»3dpþâl6ƒ„&øBæ}‘ס4~\ˆ‡ÑuïTdHÊ—Ê…X ŠóáþÞ´ˆ€W{,Ä„(žI‰zNa"x^©BrŠø5eTúYAõû'ROà©Gµ†.¡RSKÙÕ~#IQdok¯B¨­ÑŠ˜TÎ|ùÅ¥÷îse³'&âc¹Ò¡^ùÙà½î–£I¦„ìN%ºQŽ1ä<¤R–AOŸ!_/³‚3T•ÑÐÖ×j1
ˆœ¹³y»â”ñKþ9ˆÙ<;³42-ˆ;Û};úRy#BÅ}1ROvÏÐJo„˜ÃÓýŸEñI|7Ë]
%Gç, ¿Ê÷c„DÚùYÕ­·i‹‹XÎk]ýKÇfòþ¢ùÝuaë)RpÎJCàšÜ:‡ÞŠÖz›Co·0tŸûVtk†ãÿÎá£ùšKÙ‘ýŠ>”Ínø
ÿÿ PK ! ™U~ á  ó_rels/.rels ¢ï(  ¬’ÏJÃ#Æï‚ï°Ì½™´Šˆ4éE„ÞDâ»Ó$˜ýÃîTÛ·w-ˆjÒƒÇùæ›ß|ìzs°ƒzç˜zï*X%(vÚ›Þµ¼6O‹{PIȼã
ŽœS__­_x ÉC©ëCRÙÅ¥
:‘ð€˜tÇ–Rá»ÜÙùhIò3¶H¿Q˸*Ë;Œ¿= yª­© nÍ
¨æòæyo¿Ûõš½Þ[vrfòAØ6‹3[”>_£Š-KÆëç\NH!ð<Ñêr¢¿¯EËB†„PûÈÓ<_Š) åå#ó?é|øh0GtÊvŠæö?iô>‰·3ñœ4ßH8ú˜õ' ÿÿ PK ! v¥S¬" Û Ú word/_rels/document.xml.rels ¢Ö (  ¬”ËjÃ0E÷…þƒÑ¾–í´i)‘³)…l[ºUäñƒêa¤I[ÿ}E ±Cƒ’…6‚¡{W#­Ö¿J&ß]o4#yš‘´0u¯[F>ª×
I'm positive this data is correct, because when I save it as a docx file it opens correctly.
I tried using
openXml.OpenXmlPackage(result);
// and
doc = new openXml.OpenXmlPackage();
doc.openFromArrayBuffer
but I keep getting errors.
Please help!
The problem was with the JZIP.js that comes packaged with the SDK.
A better approach is to save the template as a Word XML file, then download it through ajax and open it.
That worked for me.
