LZMADecompressor not working correctly - python-3.x

I am trying to use lzma to compress and decompress some data in memory. I know that the following approach works:
import lzma
s = 'Lorem ipsum dolor'
bytes_in = s.encode('utf-8')
print(s)
print(bytes_in)
# Compress
bytes_out = lzma.compress(data=bytes_in, format=lzma.FORMAT_XZ)
print(bytes_out)
# Decompress
bytes_decomp = lzma.decompress(data=bytes_out, format=lzma.FORMAT_XZ)
print(bytes_decomp)
The output is:
Lorem ipsum dolor
b'Lorem ipsum dolor'
b'\xfd7zXZ\x00\x00\x04\xe6\xd6\xb4F\x02\x00!\x01\x16\x00\x00\x00t/\xe5\xa3\x01\x00\x10Lorem ipsum dolor\x00\x00\x00\x00\xddq\x8e\x1d\x82\xc8\xef\xad\x00\x01)\x112\np\x0e\x1f\xb6\xf3}\x01\x00\x00\x00\x00\x04YZ'
b'Lorem ipsum dolor'
However, I notice that using lzma.LZMACompressor gives different results. With the following code:
import lzma
s = 'Lorem ipsum dolor'
bytes_in = s.encode('utf-8')
print(s)
print(bytes_in)
# Compress
lzc = lzma.LZMACompressor(format=lzma.FORMAT_XZ)
lzc.compress(bytes_in)
bytes_out = lzc.flush()
print(bytes_out)
# Decompress
bytes_decomp = lzma.decompress(data=bytes_out, format=lzma.FORMAT_XZ)
print(bytes_decomp)
I get this output:
Lorem ipsum dolor
b'Lorem ipsum dolor'
b'\x01\x00\x10Lorem ipsum dolor\x00\x00\x00\x00\xddq\x8e\x1d\x82\xc8\xef\xad\x00\x01)\x112\np\x0e\x1f\xb6\xf3}\x01\x00\x00\x00\x00\x04YZ'
And then the program fails on line 18 with _lzma.LZMAError: Input format not supported by decoder.
I have 3 questions here:
How come the output for lzma.compress is so much longer than lzma.LZMACompressor.compress even though it seemingly does the same thing?
In the second example, why does the decompressor complain about invalid format?
How can I get the second example to decompress correctly?

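In your second example you discard part of the compressed stream: the return value of lzc.compress(bytes_in) is thrown away, so bytes_out only holds what flush() returns. That is why the output is shorter and why the decompressor rejects it: the XZ header (the b'\xfd7zXZ...' prefix visible in your first output) is missing. Concatenating both parts works: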
lzc = lzma.LZMACompressor(format=lzma.FORMAT_XZ)
bytes_out = lzc.compress(bytes_in) + lzc.flush()
print(bytes_out)
Note that this is exactly what your first example does, since the source of lzma.compress is:
def compress(data, format=FORMAT_XZ, check=-1, preset=None, filters=None):
    """Compress a block of data.
    Refer to LZMACompressor's docstring for a description of the
    optional arguments *format*, *check*, *preset* and *filters*.
    For incremental compression, use an LZMACompressor instead.
    """
    comp = LZMACompressor(format, check, preset, filters)
    return comp.compress(data) + comp.flush()
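If the data really does arrive in pieces, the same rule applies: keep every value returned by compress() and append the result of flush() at the end. The following is only a sketch of that pattern; the helper names compress_chunks and decompress_chunks and the chunk size of 16 are made up for illustration and are not part of the lzma module.

```python
import lzma

def compress_chunks(chunks):
    """Compress an iterable of bytes objects into a single XZ stream."""
    compressor = lzma.LZMACompressor(format=lzma.FORMAT_XZ)
    parts = []
    for chunk in chunks:
        # compress() may return an empty bytes object or real data;
        # either way it belongs to the stream and must be kept.
        parts.append(compressor.compress(chunk))
    # flush() returns whatever is still buffered plus the stream footer.
    parts.append(compressor.flush())
    return b"".join(parts)

def decompress_chunks(data, chunk_size=16):
    """Feed the compressed stream to an LZMADecompressor piece by piece."""
    decompressor = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    parts = []
    for start in range(0, len(data), chunk_size):
        parts.append(decompressor.decompress(data[start:start + chunk_size]))
    return b"".join(parts)

packed = compress_chunks([b"Lorem ", b"ipsum ", b"dolor"])
unpacked = decompress_chunks(packed)
print(packed[:6])   # the XZ magic bytes at the start of the stream
print(unpacked)
```

Because nothing returned by the compressor is dropped, the result starts with the XZ header and can also be passed to lzma.decompress directly.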


Strapi >>> Facing "ENAMETOOLONG: name too long" error

System information
Strapi Version: 4.1.12
Operating System: MacOS
Database: Postgres
Node Version: 16.15.1
NPM Version: 8.11.0
Yarn Version: 1.22.11
When I try to upload a file with a long name, I get the error below:
error: ENAMETOOLONG: name too long
File name: Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry’s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book here a.pdf
Can someone help me with the root cause of this error?
It's your file system. Your example filename is exactly 255 characters long, but it has a "fancy quote" in there. It's not an apostrophe (0x27) or backtick (0x60) but a three-byte sequence, 0xe2 0x80 0x99, which brings the name to 257 bytes. The error is entirely correct: name too long.
You can check this list, search for character U+2019 and you'll see this sequence of bytes matches your quote character.
JavaScript's string functions, such as ''.substr(), work on characters and not bytes, so simply using filename.substr(0, 255) will not work.
The best way is to use an external package that knows how to trim UTF-8 strings without breaking special character sequences, like your multi-byte quote or emoji.
const truncate = require('truncate-utf8-bytes');
const { extname, basename } = require('path');
function trimFilenameToBytes(filename, maxBytes = 255) {
  // By extracting the file extension from the filename,
  // it'll trim "verylong.pdf" to "verylo.pdf" and not "verylong.p"
  const ext = extname(filename);
  const base = basename(filename, ext);
  const length = Buffer.byteLength(ext);
  const shorter = truncate(base, Math.max(0, maxBytes - length)) + ext;
  // Just in case the file extension's length is more than maxBytes.
  return truncate(shorter, maxBytes);
}
const filename = 'Lorem Ipsum is simply dummy \
text of the printing and typesetting industry. Lorem Ipsum has \
been the industry’s standard dummy text ever since the 1500s, \
when an unknown printer took a galley of type and scrambled it \
to make a type specimen book here a.pdf';
console.log(
'This string is',
filename.length,
'characters and',
Buffer.byteLength(filename, 'utf-8'),
'bytes'
);
console.log(trimFilenameToBytes(filename));
// Will log the following, note how it's 2 bytes shorter:
// This string is 255 characters and 257 bytes
// Lorem Ipsum is simply dummy \
// text of the printing and typesetting industry. Lorem Ipsum has \
// been the industry’s standard dummy text ever since the 1500s, \
// when an unknown printer took a galley of type and scrambled it \
// to make a type specimen book here.pdf
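The character-versus-byte difference is not specific to Node, so it can be checked in any language. Below is a rough Python sketch of the same idea, using only the standard library. The function name trim_filename_to_bytes and the sample name are invented for this illustration, and unlike the JavaScript version it does not handle an extension that is itself longer than the limit.

```python
import os

def trim_filename_to_bytes(filename, max_bytes=255):
    """Shorten the base name so the whole name fits in max_bytes of UTF-8."""
    base, ext = os.path.splitext(filename)
    budget = max(0, max_bytes - len(ext.encode("utf-8")))
    # Cut on a byte boundary; errors="ignore" drops a multi-byte
    # character that was split in half by the cut.
    trimmed = base.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    return trimmed + ext

# U+2019 (right single quotation mark) takes three bytes in UTF-8.
print("\u2019".encode("utf-8").hex())   # e28099

# Hypothetical long name: 265 characters but 267 bytes.
name = "industry\u2019s " + "x" * 250 + ".pdf"
short = trim_filename_to_bytes(name)
print(len(name), len(name.encode("utf-8")))
print(len(short.encode("utf-8")))
```

The extension is kept intact and only the base name is shortened, which mirrors the design of trimFilenameToBytes above.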

Trying to replace a string in a file using Groovy DSL [duplicate]

I want to replace VERSION placeholders in a file with a variable version value, but I'm running into the error below:
def versions = ["8.8.0", "9.9.0"]
versions.each { version ->
    def file = new File("$Path/test.url")
    def fileText = file.replaceAll("VERSION", "${version}")
    file.write(fileText);
}
Error:
groovy.lang.MissingMethodException: No signature of method: java.io.File.replaceAll() is applicable for argument types: (java.lang.String, org.codehaus.groovy.runtime.GStringImpl) values: [VERSION, 8.8.0]
I'm a newbie to Groovy DSL and not sure what I'm missing. Any suggestions are appreciated!
Another way is to use the groovy file .text property:
def f = new File('sample-file.txt')
f.text = f.text.replaceAll('VERSION', '8.8.0')
And as @cfrick mentioned, there is not much point in performing the replace operation for multiple versions, as only the first one will actually find the VERSION string.
Running the above on a sample file:
─➤ groovy solution.groovy
─➤
will result in the string being replaced:
─➤ diff sample-file.txt_original sample-file.txt
1c1
< Dolore magna aliqua. VERSION Ut enim ad minim veniam.
---
> Dolore magna aliqua. 8.8.0 Ut enim ad minim veniam.
where diff is a linux tool for comparing two files.

Program to scramble all the words in a file using permutations()

Here is the function to scramble the words in a file:
import itertools as it
import random as rdm
def get_permuted_lines(word_list):
    '''this function takes a list of all the words in the file in order they appear in the file and returns another list having all the scrumbled words in the same order they appear in the file'''
    #final list is the list that will hold all the scrumbled words
    final_list=[]
    for word in word_list:
        #words having length<=3 should not be scrumbled
        if len(word)<=3:
            final_list.append(word)
        else:
            if len(word)==4 and (word.endswith('.') or word.endswith(',')):
                final_list.append(word)
            elif len(word)==5 and word.endswith('\n'):
                final_list.append(word)
            else:
                #if a line endswith ,
                if word.endswith(',\n'):
                    first_letter, *middle_letters, last_letter = word[0],word[1:-3],word[-3:len(word)]
                    perm_list = list(it.permutations(middle_letters, len(middle_letters)))
                    join_tup_words=[''.join(tup) for tup in perm_list]
                    final_list.append(first_letter+ join_tup_words[rdm.randint(0,len(join_tup_words)-1)]+last_letter)
                #if a line endswith .
                elif word.endswith('.\n'):
                    first_letter, *middle_letters, last_letter = word[0],word[1:-3],word[-3:len(word)]
                    perm_list= list(it.permutations(middle_letters, len(middle_letters)))
                    join_tup_words= [''.join(tup) for tup in perm_list]
                    final_list.append(first_letter+ join_tup_words[rdm.randint(0,len(join_tup_words)-1)]+last_letter)
                #for remaining words
                else:
                    first_letter, *middle_letters, last_letter=word
                    perm_list= list(it.permutations(middle_letters,len(middle_letters)))
                    join_tup_words=[''.join(tup) for tup in perm_list]
                    final_list.append(first_letter+ join_tup_words[rdm.randint(0,len(join_tup_words)-1)]+last_letter)
    return final_list
def read_write(fname):
    '''here we read from the file fname and write to a new file called fname + Scrumble.txt after creating it'''
    with open(fname,'r') as f:
        lines=f.read()
    #getting a list of scrumbled words in order it appears in the file
    permuted_words=get_permuted_lines(lines.split(' '))
    #joining all the words to form lines
    join_words_list=' '.join(permuted_words)
    #creating a new file with the name (fname + scrumble.txt)
    new_file=fname[:-4]+'Scrumble.txt'
    with open(new_file,'w') as f:
        f.write(join_words_list)
    with open(new_file,'r') as f:
        print(f.read())
if __name__=='__main__':
    '''getting the file name and passing it for reading its content'''
    #file_name is the name of the file we want to scramble
    file_name =input('enter the file_name: ')
    read_write(file_name)
I have tried the same program with the re and random modules, which works fine. Using only the random module also does the task. But using itertools.permutations() works only for files with few lines (say 3), not more.
How can I fix this?
You have a combinatorial explosion at hand when using permutations. Your text probably has some long words in it:
from itertools import permutations
from datetime import datetime, timedelta
for n in range (1,15):
    g = ''.join("k"*n)
    start = datetime.now()
    print()
    print( f' "{g}" feed to permutations leads to {len(list(permutations(g)))} results taking {(datetime.now()-start).total_seconds() * 1000} ms')
Output:
"k" feed to permutations leads to 1 results taking 0.0 ms
"kk" feed to permutations leads to 2 results taking 0.0 ms
"kkk" feed to permutations leads to 6 results taking 0.0 ms
"kkkk" feed to permutations leads to 24 results taking 0.0 ms
<snipp>
"kkkkkkkkk" feed to permutations leads to 362880 results taking 78.126 ms
"kkkkkkkkkk" feed to permutations leads to 3628800 results taking 703.131 ms
"kkkkkkkkkkk" feed to permutations leads to 39916800 results taking 8920.826 ms
... laptop freezes ...
For me the limit is around 12 characters.
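The growth can be estimated without building any list: n distinct letters have n! orderings. A small sketch using math.factorial follows; the helper name middle_permutation_count is invented here, and it assumes the approach from the question, where the first and last letters stay fixed and the middle letters are permuted.

```python
import math

def middle_permutation_count(word):
    """Number of tuples permutations() yields for the letters between
    the first and the last character of word."""
    return math.factorial(max(0, len(word) - 2))

for n in (4, 8, 12, 16):
    print(n, math.factorial(n))
# 4 24
# 8 40320
# 12 479001600
# 16 20922789888000

# A 14-letter word leaves 12 middle letters to permute.
print(middle_permutation_count("reprehensiones"))   # 479001600
```

Materialising hundreds of millions of tuples just to pick one of them at random is what exhausts time and memory.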
How to avoid it: do not use permutations - use a simple shuffle:
import random
def removeSpaces(textList):
    return ' '.join(textList)
def addSpaces(text):
    return text.split(" ")
def needsScrambling(word):
    stripped = word.strip(",.?!")
    return len(stripped) > 3 and stripped.isalpha()
def scramble(words):
    def scrambleWord(oneWord):
        prev = ""
        suff = ""
        if oneWord[0] in ",.?!":
            prev = oneWord[0]
            oneWord = oneWord[1:]
        if oneWord[-1] in ",.?!\n":
            suff = oneWord[-1]
            oneWord = oneWord[:-1]
        return ''.join([prev, oneWord[0], *random.sample(oneWord[1:-1], k=len(oneWord)-2),oneWord[-1],suff])
    return [ scrambleWord(w) if needsScrambling(w) else w for w in words]
def doIt(t):
    return removeSpaces(scramble(addSpaces(t)))
demoText = "Non eram nescius, Brute, cum, quae summis ingeniis exquisitaque doctrina philosophi" + ' \n' + \
"Graeco sermone tractavissent, ea Latinis litteris mandaremus, fore ut hic noster labor in varias" + ' \n' + \
"reprehensiones incurreret. nam quibusdam, et iis quidem non admodum indoctis, totum hoc displicet" + ' \n' + \
"philosophari. quidam autem non tam id reprehendunt, si remissius agatur, sed tantum studium tamque" + ' \n' + \
"multam operam ponendam in eo non arbitrantur. erunt etiam, et ii quidem eruditi Graecis litteris," + ' \n' + \
"contemnentes Latinas, qui se dicant in Graecis legendis operam malle consumere. postremo aliquos" + ' \n' + \
"futuros suspicor, qui me ad alias litteras vocent, genus hoc scribendi, etsi sit elegans, personae" + ' \n' + \
"tamen et dignitatis esse negent." + ' \n\n' + \
"[2] Contra quos omnis dicendum breviter existimo. Quamquam philosophiae quidem vituperatoribus" + ' \n' + \
"satis responsum est eo libro, quo a nobis philosophia defensa et collaudata est, cum esset" + ' \n' + \
"accusata et vituperata ab Hortensio. qui liber cum et tibi probatus videretur et iis, quos" + ' \n' + \
"ego posse iudicare arbitrarer, plura suscepi veritus ne movere hominum studia viderer, retinere" + ' \n' + \
"non posse. Qui autem, si maxime hoc placeat, moderatius tamen id volunt fieri, difficilem" + ' \n' + \
"quandam temperantiam postulant in eo, quod semel admissum coerceri reprimique non potest, ut" + ' \n' + \
"propemodum iustioribus utamur illis, qui omnino avocent a philosophia, quam his, qui rebus" + '\n' + \
"infinitis modum constituant in reque eo meliore, quo maior sit, mediocritatem desiderent." + '\n' + \
"Source: https://la.wikisource.org/wiki/De_finibus_bonorum_et_malorum/Liber_Primus"
print(doIt(demoText))
Output:
Non eram niseucs, Bture, cum, quae siumms igeninis euaqusixtqie driocnta phlpoishoi
Graeco srmenoe tevsricstanat, ea Lniatis liteirts mnurdeaams, fore ut hic noestr lbaor in varias
reprehensiones icrenruert. nam qubsuidam, et iis qeuidm non audmdom itdnoics, toutm hoc dsieiplct
philosophari. qaduim autem non tam id rneedrunepht, si rmesisuis auatgr, sed tntaum sutuidm tqmaue
multam oaerpm pednnoam in eo non attuirranbr. eurnt etaim, et ii qideum edriuti Garceis liettris,
contemnentes Laanits, qui se dinact in Gecrias lgednies orpeam mllae coermusne. psormeto aliuqos
futuros sospciur, qui me ad ailas ltreatis vcnoet, geuns hoc sdrbcneii, etsi sit eaegnls, psneroae
tamen et dgiainitts esse nenegt.
[2] Conrta quos oinms dnuicedm betievrr esimtxio. Qumuqaam pooihhslipae qeduim vupaiteuoirbtrs
satis rnupessom est eo libro, quo a noibs psiohoiplha densfea et cduoallata est, cum esest
accusata et vtiaterupa ab Hirntseoo. qui liebr cum et tbii purotbas videertur et iis, qous
ego posse irucdiae aaeribtrrr, pulra seuspci vterius ne mrovee hmiuonm sduita vdeerir, rntreeie
non pssoe. Qui ateum, si mixmae hoc pclaaet, mairtdueos teamn id vnlout ferii, dciffeiilm
quandam tnmreeaiptam pasounltt in eo, quod smeel aidsmsum cercroei rimriqepue non pteost, ut
propemodum itriuosiubs uuamtr iills, qui omnino aocevnt a pshoihloipa, qaum his, qui rebus
infinitis mdoum caustinontt in rquee eo mierole, quo miaor sit, meretiicodatm desiderent.

BeautifulSoup get element inside li tag

I'm having trouble parsing an HTML element inside an li tag.
This is my code:
from bs4 import BeautifulSoup
import requests
sess = requests.Session()
url = 'http://example.com'
page = sess.get(url)
page = BeautifulSoup(page.text)
soap = page.select('li.item')
print(soap.find('h3').text)
This is html code:
...
<li class="item">
<strong class="item-type">design</strong>
<h3 class="item-title">Item title</h3>
<p class="item-description">
Lorem ipsum dolor sit amet, dicam partem praesent vix ei, ne nec quem omnium cotidieque, omnes deseruisse efficiendi sit te. Mei putant postulant id. Cibo doctus eligendi at vix. Eos nisl exerci mediocrem cu, nullam pertinax petentium sea et. Vim affert feugait an.
</p>
</li>
...
There are more than 10 li tags; I pasted just one of them.
Output error:
Traceback (most recent call last):
File "test.py", line 10, in <module>
print(soap.find('h3').text)
AttributeError: 'list' object has no attribute 'find'
Thanks to @DaveJ, this method worked:
[s.find('h3').text for s in soap]

Dictionary text file Python

text
Donald Trump:
791697302519947264,1477604720,Ohio USA,Twitter for iPhone,5251,1895
Join me live in Springfield, Ohio!
Lit
<<<EOT
781619038699094016,1475201875,United States,Twitter for iPhone,31968,17246
While Hillary profits off the rigged system, I am fighting for you! Remember the simple phrase: #FollowTheMoney...
<<<EOT
def read(text):
    with open(text,'r') as f:
        for line in f:
Is there a way that I can separate the information for each candidate? For example, for Donald Trump it should be:
[
[Donald Trump],
[[791697302519947264,1477604720,'Ohio USA','Twitter for iPhone',5251,1895], ['Join me live in Springfield, Ohio! Lit']],
[[781619038699094016,1475201875,'United States','Twitter for iPhone',31968,17246], ['While Hillary profits off the rigged system, I am fighting for you! Remember the simple phrase: #FollowTheMoney...']]
]
The format of the file is the following:
ID,DATE,LOCATION,SOURCE,FAVORITE_COUNT,RETWEET_COUNT text(the tweet)
So basically after the 6 headings, everything after that is a tweet till '<<
Also, is there a way I can do this for every candidate in the file?
I'm not sure why you need a multi-dimensional list (I would pick tuples and dictionaries if possible) but this seems to produce the output you asked for:
>>> txt = """Donald Trump:
... 791697302519947264,1477604720,Ohio USA,Twitter for iPhone,5251,1895
... Join me live in Springfield, Ohio!
... Lit
... <<<EOT
... 781619038699094016,1475201875,United States,Twitter for iPhone,31968,17246
... While Hillary profits off the rigged system, I am fighting for you! Remember the simple phrase: #FollowTheMoney...
... <<<EOT
... Another Candidate Name:
... 12312321,123123213,New York USA, Twitter for iPhone,123,123
... This is the tweet text!
... <<<EOT"""
>>>
>>>
>>> buffer = []
>>> tweets = []
>>>
>>> for line in txt.split("\n"):
...     if not line.startswith("<<<EOT"):
...         buffer.append(line)
...     else:
...         if buffer[0].strip().endswith(":"):
...             tweets.append([buffer.pop(0).rstrip().replace(":", "")])
...         metadata = buffer.pop(0).split(",")
...         tweet = [" ".join(line for line in buffer).replace("\n", " ")]
...         tweets.append([metadata, tweet])
...         buffer = []
...
>>>
>>> from pprint import pprint
>>>
>>> pprint(tweets)
[['Donald Trump'],
[['791697302519947264',
'1477604720',
'Ohio USA',
'Twitter for iPhone',
'5251',
'1895'],
['Join me live in Springfield, Ohio! Lit']],
[['781619038699094016',
'1475201875',
'United States',
'Twitter for iPhone',
'31968',
'17246'],
['While Hillary profits off the rigged system, I am fighting for you! Remember the simple phrase: #FollowTheMoney... ']],
['Another Candidate Name'],
[['12312321',
'123123213',
'New York USA',
' Twitter for iPhone',
'123',
'123'],
['This is the tweet text!']]]
>>>
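For comparison, here is a rough sketch of the dictionary-based layout mentioned at the start of this answer. The function name parse_tweets and the lowercase field names are made up for the example. It assumes that a candidate line ends with a colon, that the metadata line has exactly six comma-separated fields (so no commas inside the location), and that each tweet ends with a line starting with <<<EOT.

```python
def parse_tweets(text):
    """Group tweets by candidate name; every tweet becomes a dict."""
    fields = ["id", "date", "location", "source",
              "favorite_count", "retweet_count"]
    result = {}
    candidate = None
    buffer = []
    for line in text.splitlines():
        if line.startswith("<<<EOT"):
            if buffer:
                # First buffered line is the metadata, the rest is the tweet.
                tweet = dict(zip(fields, buffer[0].split(",")))
                tweet["text"] = " ".join(buffer[1:])
                result.setdefault(candidate, []).append(tweet)
            buffer = []
        elif not buffer and line.rstrip().endswith(":"):
            # A line ending in ':' between tweets starts a new candidate.
            candidate = line.rstrip()[:-1]
            result.setdefault(candidate, [])
        else:
            buffer.append(line)
    return result

sample = """Donald Trump:
791697302519947264,1477604720,Ohio USA,Twitter for iPhone,5251,1895
Join me live in Springfield, Ohio!
Lit
<<<EOT
781619038699094016,1475201875,United States,Twitter for iPhone,31968,17246
While Hillary profits off the rigged system, I am fighting for you! Remember the simple phrase: #FollowTheMoney...
<<<EOT"""

tweets_by_candidate = parse_tweets(sample)
first = tweets_by_candidate["Donald Trump"][0]
print(first["location"], "|", first["text"])
```

With this layout a value is looked up by name, for example first["retweet_count"], instead of by position in a nested list.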
I don't quite understand the question, but here is my example that reads a file line by line and adds each line to a string of text to post to Twitter.
# file path with doubled backslashes, e.g. "C:\\users\\fox\\desktop\\candidates.txt"
with open("C:\\users\\fox\\desktop\\candidates.txt") as candidates:
    for candidate in candidates:
        candidate = candidate.rstrip('\n')  # removes the newline (this is mandatory)
        # post() stands for whatever function posts to Twitter; it is not defined here
        post("propaganda here " + candidate + "more propaganda")
Note that this code will post to Twitter once for every line in the file,
e.g. 20 lines means twenty Twitter posts.
