Need suggestions on creating lists from image - python-3.x

I have to open the image given in argv[1] and apply the filter given in argv[2].
The image file looks like this as text:
P2
10 4
255
120 0 0 0 0 0 0 0 0 0
120 0 0 0 0 0 255 255 0 0
120 0 0 0 0 0 255 255 0 0
120 0 0 0 0 0 0 0 0 0
What I have to do is organize the lines after the 255 into a list of lists, but all I can manage is a list of strings, which I can't do much with (I will also have to apply a filter and so on, but that is another problem).
I may only use the sys library (it's an assignment).
import sys
class image:
    def __init__(self,a):
        self.cab=[]
        self.img=a
        self.img2=[]
        self.c=[]
        for i in self.img:
            self.img2.append(i)
        self.img3=''.join(self.img2)
        self.img4=self.img3.split('\n')
    def cabec(self,b): # this has no importance in my question (only for the assignment)
        for i in range(3):
            self.c.append(b[i])
class filtro:
    def __init__(self,f):
        self.filt=[]
        for x in f:
            self.filt.append(x)
        self.filt2=''.join(self.filt)
        self.filt3=self.filt2.split('\n')
a = open(sys.argv[1])
b = image(a)
... (this is where I should be able to apply the filters and such, but with a list of strings I don't know what to do)
I am really an amateur, so any suggestions would be welcome.

If I understood correctly, you need a list of ints instead of a list of strings. See this question for how to read ints from a text file.
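A minimal sketch of that idea, assuming the file is a plain-text P2 image with no comment lines (real PGM files may contain `#` comments, which this does not handle). The function name `parse_pgm` is hypothetical, and no imports are needed beyond what the assignment allows:

```python
def parse_pgm(text):
    """Split a plain-text P2 image into its header fields and rows of ints."""
    tokens = text.split()          # split on any whitespace, including newlines
    magic = tokens[0]              # "P2" in the example file
    width = int(tokens[1])
    height = int(tokens[2])
    maxval = int(tokens[3])
    values = [int(tok) for tok in tokens[4:]]
    # Cut the flat list of pixel values into one list per image row.
    rows = [values[r * width:(r + 1) * width] for r in range(height)]
    return magic, width, height, maxval, rows
```

With the file name taken from the command line, this could be called as `parse_pgm(open(sys.argv[1]).read())`; each element of `rows` is then a list of ints that a filter can work on.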


Python3.x, Pandas: creating a list of y values depending on the x values

I have a two data sets that are composed of different x values. It looks like the following.
import pandas as pd
import numpy as np
from functools import reduce
data1=pd.read_csv('Data1.csv')
data2=pd.read_csv('Data2.csv')
print(data1)
data1_x data1_y1 data1_y2 data1_y3
-347.2498 0 2 8
-237.528509 0 3 7
-127.807218 0 0 6
-18.085927 11 5 0
print(data2)
data2_x data2_y1 data2_y2 data2_y3
-394.798507 2 0 0
-285.265994 1 0 0
-175.733482 0 0 1
-66.200969 4 0 0
I am creating a new x that includes all the values by using the following code:
new_x=reduce(np.union1d, (data1.iloc[:,0], data2.iloc[:,0]))
print(new_x)
array([-394.799,-347.25,-285.266,-237.529,-175.733,-127.807,-66.201,-18.0859])
Currently, I am trying to create a new list of y values for each data set that keeps the same y values where the corresponding x value is present, but is left blank where the data set had no corresponding x value initially.
For instance, print(New_data2) would look something like this.
New_x_data2 New_y1_data2 New_y2_data2 New_y3_data2
-394.799 2 0 0
-347.25
-285.266 1 0 0
-237.529
-175.733 0 0 1
-127.807 0 0 6
-66.201 4 0 0
-18.0859 11 5 0
Especially, I am lost in figuring out how to get the new y value. Any ideas?
import pandas as pd
from re import sub
repl = lambda x : sub(r"data\d_(\w+)", r"New_\1_data2", x)
pd.concat([data1.rename(repl, axis='columns'), data2.rename(repl, axis='columns')]).sort_values('New_x_data2')
Out[1024]:
New_x_data2 New_y1_data2 New_y2_data2 New_y3_data2
0 -394.798507 2 0 0
0 -347.249800 0 2 8
1 -285.265994 1 0 0
1 -237.528509 0 3 7
2 -175.733482 0 0 1
2 -127.807218 0 0 6
3 -66.200969 4 0 0
3 -18.085927 11 5 0

in APL how do I turn a vector (of length n) into a diagonal matrix (nxn)?

I had a J program I wrote in 1985 (on vax vms). One section was creating a diagonal matrix from a vector.
a=(n,n)R1,nR0
b=In
a=bXa
Maybe it wasn't J but APL in ASCII, but these lines work in current J (with appropriate changes to the primitive functions). They do not work in APL (GNU, NARS2000 or ELI), where I get a domain error on the last line.
Is there an easy way to do this without looping?
Your code is an ASCII transliteration of APL. The corresponding J code is:
a=.(n,n)$1,n$0
b=.i.n
a=.b*a
However, no APL has major cell extension as of yet (it is being considered for Dyalog APL), and that is required on the last line. You therefore need to specify that the scalars of the vector b should be multiplied with the rows of the matrix a, using bracket axis notation:
a←(n,n)⍴1,n⍴0
b←⍳n
a←b×[1]a
Alternatively, you can use the rank operator (where available):
a←(n,n)⍴1,n⍴0
b←⍳n
a←b(×⍤0 1)a
A more elegant way to address diagonals is ⍉ with repeated axes:
n←5 ◊ z←(n,n)⍴0 ◊ (1 1⍉z)←⍳n ◊ z
1 0 0 0 0
0 2 0 0 0
0 0 3 0 0
0 0 0 4 0
0 0 0 0 5
Given an input vector X, the following works in all APLs (courtesy of @Adám in chat):
(2⍴S)⍴((2×S)⍴1,-S←⍴X)\X
And here's a place where you can run it online.
Here are my old, inefficient versions that use multiplication and the outer product (the latter causes the inefficiency):
((⍴Q)⍴X)×Q←P∘.=P←⍳⍴X
((⍴Q)⍴X)×Q←P P⍴1,(P←≢X)⍴0
Or another way:
(n∘.=n)×(2⍴⍴n)⍴n←⍳5
should give you the following in most APLs
1 0 0 0 0
0 2 0 0 0
0 0 3 0 0
0 0 0 4 0
0 0 0 0 5
This solution works in the old ISO Apl:
a←(n,n)⍴v,(n,n)⍴0
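
For readers who do not have an APL interpreter at hand, the reshape-based solutions above can be sketched in Python. This is only an illustration of the idea (follow each element with n zeros, then cut the flat list into rows of length n), not a translation of any particular APL dialect; the function name `diagonal_matrix` is made up for the example:

```python
def diagonal_matrix(v):
    """Build an n-by-n list of lists with v on the main diagonal."""
    n = len(v)
    flat = []
    for x in v:
        flat.append(x)          # the diagonal element
        flat.extend([0] * n)    # n zeros push the next element one column right
    flat = flat[:n * n]         # reshape takes only the first n*n items
    # Cut the flat list into n rows of n items each.
    return [flat[i * n:(i + 1) * n] for i in range(n)]
```

Calling `diagonal_matrix([1, 2, 3, 4, 5])` gives the same 5-by-5 matrix as the APL examples, with no explicit loop over matrix cells and no multiplication.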

libjpeg not exact pixel values even with quality of 100

I am writing a program that reads some text files and writes them to a JPEG file using libjpeg. When I set the quality to 100 (with jpeg_set_quality), there is no visible quality degradation in grayscale. However, when I move to RGB, even with a quality of 100, there seems to be compression.
When I give this input to convert to a grayscale JPEG image it works nicely and gives me a clean JPEG image:
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 255 0 0 0
255 0 0 0 0
The (horizontally flipped) output is:
Now when I assume that array was the Red color, and use the following two arrays for the Green and Blue colors respectively:
0 0 0 0 0
0 0 0 0 0
0 0 255 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 255
0 0 0 255 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
This is the color output I get:
While only 5 input pixels have any color value, the surrounding pixels have also been given a value in the color output. For both the grayscale image and the RGB image the quality was set to 100.
What is causing this, and how can I fix it so that color appears only in the pixels that actually have an input value?
You are getting errors from the RGB->YCbCr conversion. That is impossible to avoid in general because there is not a 1:1 mapping between the two color spaces.
The fix is easy: just don't use JPEG. PNG is a better choice for your use case.
What you are seeing is a result of how JPEG compression works. There is such a thing as "lossless JPEG", but it's really a completely different file format that isn't well supported.
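
The claim that the color-space conversion is not a 1:1 mapping can be checked with a small sketch. It assumes the floating-point JFIF conversion formulas with 8-bit rounding; libjpeg's own fixed-point arithmetic may round slightly differently, and this does not model the rest of the JPEG pipeline. The helper names are hypothetical:

```python
def clamp8(x):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(x))))

def rgb_to_ycbcr(r, g, b):
    # JFIF full-range conversion (assumed here; not taken from libjpeg source).
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp8(y), clamp8(cb), clamp8(cr)

def ycbcr_to_rgb(y, cb, cr):
    # Inverse of the conversion above, again rounded to 8 bits.
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)

# Pure red does not survive the round trip unchanged,
# while pure black does.
red_back = ycbcr_to_rgb(*rgb_to_ycbcr(255, 0, 0))
black_back = ycbcr_to_rgb(*rgb_to_ycbcr(0, 0, 0))
```

Under these assumptions the grayscale case is unaffected because no color conversion takes place, which is consistent with the clean grayscale output described in the question.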

matlab : working with strings

I have a matrix with string elements
A = [ Jack Sara Bob]
B = [0 0 0 0 0 0
0 0 0 0 0 0]
I wanted to put the elements of A into B like this: B(2,3:6)=A
But it doesn't work. Can anybody help?
It fails because the two sides don't have the same length: B(2,3:6) has 4 elements, while A has more than 10. What A really contains is individual characters rather than strings/words.

How to count the frequency of a element in APL or J without loops

Assume I have two lists, one is the text t, one is a list of characters c. I want to count how many times each character appears in the text.
This can be done easily with the following APL code.
+⌿t∘.=c
However, it is slow: it takes the outer product and then sums each column.
It is an O(nm) algorithm, where n and m are the sizes of t and c.
Of course I can write a procedural program in APL that reads t character by character and solves this problem in O(n+m) (assuming perfect hashing).
Are there ways to do this faster in APL without loops (or conditionals)? I also accept solutions in J.
Edit:
Practically speaking, I'm doing this where the text is much shorter than the list of characters (the characters are non-ASCII). I'm considering a text of length 20 and a character list with a length in the thousands.
There is a simple optimization, given that n is smaller than m.
w ← (∪t)∩c
f ← +⌿t∘.=w
r ← (⍴c)⍴0
r[c⍳w] ← f
r
w contains only the characters in t, so the table size depends only on t and not on c. This algorithm runs in O(n^2+m log m), where m log m is the time for the intersection operation.
However, a sub-quadratic algorithm is still preferred just in case someone gave a huge text file.
NB. Using "key" (/.) adverb w/tally (#) verb counts
#/.~ 'abdaaa'
4 1 1
NB. the items counted are the nub of the string.
~. 'abdaaa'
abd
NB. So, if we count the target along with the string
#/.~ 'abc','abdaaa'
5 2 1 1
NB. We get an extra one for each of the target items.
countKey2=: 4 : '<:(#x){.#/.~ x,y'
NB. This subtracts 1 (<:) from each count of the xs.
6!:2 '''1'' countKey2 10000000$''1234567890'''
0.0451088
6!:2 '''1'' countKey2 1e7$''1234567890'''
0.0441849
6!:2 '''1'' countKey2 1e8$''1234567890'''
0.466857
NB. A tacit version
countKey=. [: <: ([: # [) {. [: #/.~ ,
NB. appears to be a little faster at first
6!:2 '''1'' countKey 1e8$''1234567890'''
0.432938
NB. But repeating the timing 10 times shows they are the same.
(10) 6!:2 '''1'' countKey 1e8$''1234567890'''
0.43914
(10) 6!:2 '''1'' countKey2 1e8$''1234567890'''
0.43964
Dyalog v14 introduced the key operator (⌸):
{⍺,⍴⍵}⌸'abcracadabra'
a 5
b 2
c 2
r 2
d 1
The operand function takes a letter as ⍺ and the occurrences of that letter (vector of indices) as ⍵.
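As a rough illustration of what the key operator does, here is a sketch in Python (not APL): group the indices of each distinct letter in order of first appearance, then report the size of each group. The function name `key_counts` is made up for the example:

```python
def key_counts(text):
    """Return (letter, count) pairs in order of first appearance."""
    groups = {}
    for index, letter in enumerate(text):
        # Each distinct letter collects the indices where it occurs,
        # mirroring the vector of indices passed as the right argument.
        groups.setdefault(letter, []).append(index)
    return [(letter, len(indices)) for letter, indices in groups.items()]
```

For 'abcracadabra' this yields a 5, b 2, c 2, r 2, d 1, matching the table above, and it makes a single pass over the text.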
I think this example, written in J, fits your request. The character list is longer than the text (but both are kept short for convenience during development.) I have not examined timing but my intuition is that it will be fast. The tallying is done only with reference to characters that actually occur in the text, and the long character set is looked across only to correlate characters that occur in the text.
c=: 80{.43}.a.
t=: 'some {text} to examine'
RawIndicies=: c i. ~.t
Mask=: RawIndicies ~: #c
Indicies=: Mask # RawIndicies
Tallies=: Mask # #/.~ t
Result=: Tallies Indicies} (#c)$0
4 20 $ Result
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 4 0
0 0 1 0 0 0 2 1 2 0 0 0 1 3 0 0 0 2 0 0
4 20 $ c
+,-./0123456789:;<=>
?@ABCDEFGHIJKLMNOPQR
STUVWXYZ[\]^_`abcdef
ghijklmnopqrstuvwxyz
As noted in other answers, the key operator does this directly. However the classic APL way of solving this problem is still worth knowing.
The classic solution is "sort, shift, and compare":
c←'missippi'
t←'abcdefghijklmnopqrstuvwxyz'
g←⍋c
g
1 4 7 0 5 6 2 3
s←c[g]
s
iiimppss
b←s≠¯1⌽s
b
1 0 0 1 1 0 1 0
n←b/⍳⍴b
n
0 3 4 6
k←(1↓n,⍴b)-n
k
3 1 2 2
u←b/s
u
imps
And for the final answer:
z←(⍴t)⍴0
z
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
z[t⍳u]←k
z
0 0 0 0 0 0 0 0 3 0 0 0 1 0 0 2 0 0 2 0 0 0 0 0 0 0
This code is off the top of my head, not ready for production. Have to look for empty cases - the boolean shift is probably not right for all cases....
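The same "sort, shift, and compare" steps can be sketched in Python for comparison. This is an illustration under the same naming as the APL session (the first argument is the text, the second the alphabet), and the function name `count_by_sorting` is hypothetical. Unlike the rotate-based comparison, it always treats position 0 as the start of a run, which sidesteps the edge case where the first and last sorted characters are equal:

```python
def count_by_sorting(text, alphabet):
    """Count occurrences of each alphabet character in text via sorting."""
    s = sorted(text)                               # 'missippi' -> i i i m p p s s
    # Positions where a new run of equal characters begins.
    starts = [i for i in range(len(s)) if i == 0 or s[i] != s[i - 1]]
    ends = starts[1:] + [len(s)]                   # each run ends where the next begins
    position = {ch: i for i, ch in enumerate(alphabet)}
    counts = [0] * len(alphabet)
    for start, end in zip(starts, ends):
        ch = s[start]
        if ch in position:                         # ignore characters not in the alphabet
            counts[position[ch]] = end - start     # run length is the frequency
    return counts
```

The cost is dominated by the sort, so it is O(n log n) in the length of the text plus O(m) for the alphabet, rather than the O(nm) of the outer product.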
"Brute force" in J:
count =: (i.~~.) ({,&0) (]+/"1#:=)
Usage:
'abc' count 'abdaaa'
4 1 0
Not sure how it's implemented internally, but here are the timings for different input sizes:
6!:2 '''abcdefg'' count 100000$''abdaaaerbfqeiurbouebjkvwek''' NB. run time for #t = 100000
0.00803909
6!:2 '''abcdefg'' count 1000000$''abdaaaerbfqeiurbouebjkvwek'''
0.0845451
6!:2 '''abcdefg'' count 10000000$''abdaaaerbfqeiurbouebjkvwek''' NB. and for #t = 10^7
0.862423
We don't filter the input data prior to 'self-classify', so:
6!:2 '''1'' count 10000000$''1'''
0.244975
6!:2 '''1'' count 10000000$''1234567890'''
0.673034
6!:2 '''1234567890'' count 10000000$''1234567890'''
0.673864
My implementation in APL (NARS2000):
(∪w),[0.5]∪⍦w←t∩c
Example:
c←'abcdefg'
t←'abdaaaerbfqeiurbouebjkvwek'
(∪w),[0.5]∪⍦w←t∩c
a b d e f
4 4 1 4 1
Note: showing only those characters in c that exist in t
My initial thought was that this was a case for the Find operator:
T←'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
C←'MISSISSIPPI'
X←+/¨T⍷¨⊂C
The used characters are:
(×X)/T
IMPS
Their respective frequencies are:
X~0
4 1 2 4
I've only run toy cases, so I have no idea what the performance is, but my intuition tells me it should be cheaper than the outer product.
Any thoughts?
