I am writing a program that reads some text files and writes their contents to a JPEG file using libjpeg. When I set the quality to 100 (with jpeg_set_quality), there is no visible quality degradation in grayscale. However, when I move to RGB, even with a quality of 100, there seems to be compression.
When I give this input to convert to a grayscale JPEG image it works nicely and gives me a clean JPEG image:
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 255 0 0 0
255 0 0 0 0
The (horizontally flipped) output is:
Now when I assume that array was the Red color, and use the following two arrays for the Green and Blue colors respectively:
0 0 0 0 0
0 0 0 0 0
0 0 255 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 255
0 0 0 255 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
This is the color output I get:
While only 5 input pixels have any color value, the surrounding pixels have also gotten a value when converted to color. For both the grayscale image and the RGB image the quality was set to 100.
I would like to know what is causing this and how I can fix it, so that colors are only used for the pixels that actually have an input value.
You are getting errors from the RGB->YCbCr conversion. That is impossible to avoid in general, because there is no 1:1 mapping between the two color spaces.
The fix is easy: just don't use JPEG. PNG is a better choice for your use case.
What you are seeing is a result of how JPEG compression works. There is such a thing as "lossless JPEG", but it's really a completely different file format that isn't well supported.
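The point about RGB->YCbCr not being a 1:1 mapping can be checked with a small round trip. This is only a sketch: it assumes the commonly quoted JFIF floating-point coefficients and 8-bit rounding, and the helper names are made up for illustration. libjpeg's own integer arithmetic may round slightly differently, and this covers the color conversion step only, not the rest of the JPEG pipeline.

```python
def clamp8(x):
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(x))))


def rgb_to_ycbcr(r, g, b):
    # JFIF-style full-range conversion (coefficients are an assumption here)
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp8(y), clamp8(cb), clamp8(cr)


def ycbcr_to_rgb(y, cb, cr):
    r = y + 1.402 * (cr - 128)
    g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128)
    b = y + 1.772 * (cb - 128)
    return clamp8(r), clamp8(g), clamp8(b)


def roundtrip(rgb):
    """RGB -> 8-bit YCbCr -> RGB, as a stand-in for the conversion step only."""
    return ycbcr_to_rgb(*rgb_to_ycbcr(*rgb))


print(roundtrip((0, 0, 0)))    # black survives: (0, 0, 0)
print(roundtrip((255, 0, 0)))  # pure red does not: (254, 0, 0)
```

Even before any quantization, a fully saturated red pixel does not come back exactly, which is why a format that stores RGB directly (such as PNG) is the safer choice when exact pixel values matter.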
I have a dataframe with 2 columns like:
pd.DataFrame({"action":['asdasd','fgdgddg','dfgdgdfg','dfgdgdfg','nmwaws'],"classification":['positive','negative','neutral','positive','mixed']})
df:
action classification
asdasd positive
fgdgddg negative
dfgdgdfg neutral
sdfsdff positive
nmwaws mixed
What I want to do is create 4 new columns, one for each unique label in the classification column, and assign 1 or 0 depending on whether the row has that label.
I need this as output:
action classification positive negative neutral mixed
asdasd positive 1 0 0 0
fgdgddg negative 0 1 0 0
dfgdgdfg neutral 0 0 1 0
sdfsdff positive 1 0 0 0
nmwaws mixed 0 0 0 1
I tried the MultiLabelBinarizer from sklearn, but it parsed every letter of each word instead of the whole word.
Can anyone help me?
You can use pandas.get_dummies.
pd.get_dummies(df["classification"])
Output:
mixed negative neutral positive
0 0 0 0 1
1 0 1 0 0
2 0 0 1 0
3 0 0 0 1
4 1 0 0 0
If you want to concat it to the DataFrame:
pd.concat([df, pd.get_dummies(df["classification"])], axis=1)
Output:
action classification mixed negative neutral positive
0 asdasd positive 0 0 0 1
1 fgdgddg negative 0 1 0 0
2 dfgdgdfg neutral 0 0 1 0
3 dfgdgdfg positive 0 0 0 1
4 nmwaws mixed 1 0 0 0
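A self-contained version of the above, as a sketch. It passes dtype=int because, as far as I know, pandas 2.0 and later return boolean columns from get_dummies by default, whereas the output shown here has 0/1 integers. The `labels` list is only there to reproduce the column order requested in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "action": ["asdasd", "fgdgddg", "dfgdgdfg", "sdfsdff", "nmwaws"],
    "classification": ["positive", "negative", "neutral", "positive", "mixed"],
})

# Column order asked for in the question (get_dummies sorts alphabetically)
labels = ["positive", "negative", "neutral", "mixed"]

# dtype=int gives 0/1 instead of True/False
dummies = pd.get_dummies(df["classification"], dtype=int)[labels]

result = pd.concat([df, dummies], axis=1)
print(result)
```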
I have data for around 2 million active customers and around 2-5 years worth of transaction data by customer. This data includes features such as what item that customer bought, what store they bought it from, the date they purchased that item, how much they bought, how much they paid, etc.
I need to predict which of our customers will shop in the next 2 weeks.
Right now my data is set up like this
item_a item_b item_c item_d customer_id visit
dates
6/01 1 0 0 0 cust_123 1
6/02 0 0 0 0 cust_123 0
6/03 0 1 0 0 cust_123 1
6/04 0 0 0 0 cust_123 0
6/05 1 0 0 0 cust_123 1
6/06 0 0 0 0 cust_123 0
6/07 0 0 0 0 cust_123 0
6/08 1 0 0 0 cust_123 1
6/01 0 0 0 0 cust_456 0
6/02 0 0 0 0 cust_456 0
6/03 0 0 0 0 cust_456 0
6/04 0 0 0 0 cust_456 0
6/05 1 0 0 0 cust_456 1
6/06 0 0 0 0 cust_456 0
6/07 0 0 0 0 cust_456 0
6/08 0 0 0 0 cust_456 0
6/01 0 0 0 0 cust_789 0
6/02 0 0 0 0 cust_789 0
6/03 0 0 0 0 cust_789 0
6/04 0 0 0 0 cust_789 0
6/05 0 0 0 0 cust_789 0
6/06 0 0 0 0 cust_789 0
6/07 0 0 0 0 cust_789 0
6/08 0 1 1 0 cust_789 1
Should I make the target variable something like
df['target_variable'] = 'no_purchase'
for cust in list(set(df['customer_id'])):
    df['target_variable'] = np.where(df['visit'] > 0, cust, df['target_variable'])
or have my visit feature be my target variable? If it's the latter, should I OHE all 2 million customers? If not, how should I set this up on Keras so that it classifies visits for all 2 million customers?
I think you should try to understand your problem better. It requires strong domain knowledge to model correctly, and it can be modeled in many different ways. Below are just some examples:
Regression problem: given a customer's purchase record containing only relative dates, predict the gap until the next purchase. For example,
construct a sequence like [date2-date1, date3-date2, date4-date3, ...] from your data.
[6, 7, 5, 13, ...] means a customer is likely to buy things on a weekly or biweekly basis.
[24, 30, 33, ...] means a customer is likely to buy things on a monthly basis.
If you organize the problem this way, all you need to do is predict the next number in a given sequence. You can easily get such training data as follows:
randomly select a full sequence, say [a, b, c, d, e, f, ..., z]
randomly select a position to predict, say x
pick the K (say K=6) preceding values [r, s, t, u, v, w] as your network input, and x as your network target.
Once this model has been trained, your ultimate task can be resolved by checking whether the predicted number is greater than 60.
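The data preparation described above could be sketched as follows. The function names are made up for illustration, and the year in the example dates is an assumption (the question only shows month/day); the dates themselves are the visit days of cust_123.

```python
import random
from datetime import date


def purchase_gaps(purchase_dates):
    """Days between consecutive purchases: [date2-date1, date3-date2, ...]."""
    ordered = sorted(purchase_dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


def sample_window(gaps, k, rng=random):
    """Pick a random target position and return (k preceding gaps, target gap)."""
    if len(gaps) <= k:
        raise ValueError("need more than k gaps to build a training example")
    pos = rng.randrange(k, len(gaps))
    return gaps[pos - k:pos], gaps[pos]


# Visit days of cust_123 from the question; the year is hypothetical
dates = [date(2023, 6, 1), date(2023, 6, 3), date(2023, 6, 5), date(2023, 6, 8)]
gaps = purchase_gaps(dates)
print(gaps)                    # [2, 2, 3]
print(sample_window(gaps, 2))  # ([2, 2], 3), the only possible window here
```

Repeating sample_window over many customers would give the (input, target) pairs for whatever sequence model you choose.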
Classification problem: given a customer's purchase record of K months, predict how many purchases the customer will make in the next two months.
Again, you need to create training data from your raw data, but this time the target for a customer is how many items they purchased in months K+1 and K+2, and you may organize the input data of the K-month record in your own way.
Please note that the number of items a customer purchased is a discrete number, but way below 1M. In fact, as in face-image-based age estimation, people often quantize the target into bins, e.g. 0-8, 9-16, 17-24, etc. You may do the same for your problem. Of course, you may also formulate this target as a regression problem to directly predict the number of items.
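As a rough illustration of the binning idea, here is one way to turn a purchase count into a class index using the example bins 0-8, 9-16, 17-24. The function name, the number of bins and the open-ended last bin are all assumptions made for the sketch.

```python
def count_to_bin(count, num_bins=4):
    """Map a purchase count to a class index for bins 0-8, 9-16, 17-24, ..."""
    if count < 0:
        raise ValueError("count must be non-negative")
    # 0..8 -> 0, 9..16 -> 1, 17..24 -> 2, and so on in steps of 8
    index = max(0, (count - 1) // 8)
    # Everything beyond the last boundary falls into the final bin (assumption)
    return min(index, num_bins - 1)


counts = [0, 8, 9, 16, 17, 24, 100]
print([count_to_bin(c) for c in counts])  # [0, 0, 1, 1, 2, 2, 3]
```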
Why do you need to know your problem better?
As you can see, you may come up with a number of problem formulations that all look reasonable at first glance, and it can be very difficult to say which one is best.
It is worth noting the dependence between a problem set-up and its hidden premises (you may not notice such things until you think the problem through carefully). For example, the regression set-up that predicts the gap until the next purchase implies that the number of items a customer purchased does not matter. This assumption may or may not be fair in your problem.
You may come up with a much simpler but more effective solution if you know your problem well.
In most problems like yours, you don't have to use deep learning, or at least not in the first place. Classic approaches may work better.
I would like to plot a 3D picture from the data below. All axes should have equal scale. However, the resulting picture is too small relative to the canvas size.
How can I make the plot fill most of the canvas?
I tried changing the canvas size and setting size 1,1.
Data (5 squares):
0 0 0
1 0 0
1 0 1
0 0 1
0 0 0
0 0 0
1 0 0
1 1 0
0 1 0
0 0 0
0 0 0
0 1 0
0 1 1
0 0 1
0 0 0
0 0 0
-1 0 0
-1 0 1
0 0 1
0 0 0
0 0 0
-1 0 0
-1 1 0
0 1 0
0 0 0
I am using gnuplot 4.4 patchlevel 3,
commands:
set term pdfcairo size 2,1;
set xrange [-1.05:1.05];
set yrange [-0.05:1.05];
set zrange [-0.05:1.05];
unset key; unset border; unset tics;
set lmargin 0; set rmargin 0; set tmargin 0; set bmargin 0;
set view equal xyz;
set output 'address';
splot 'data.txt' w l;
unset output;
Thank you.
The "set view" command takes two scale factors. The first scales the entire plot; the second scales the z axis only.
Syntax:
set view <rot_x>{,{<rot_z>}{,{<scale>}{,<scale_z>}}}
The easiest way to adjust this is using an interactive terminal. Click and drag with the middle mouse button to adjust the scale factors to taste. Then use show view to note the current values. Add those to your script. For example to scale the whole thing by a factor of two:
set view ,,2.0
I've set up Names with the intention of using them to return data ranges for a line chart. The X values are "GI", "IE" and "EE". The Y value is "DATE".
However, my "DATE" and "GI" names are returning "#VALUE!" errors - whereas IE and EE are not.
So far, I have found that this error occurs when the height value (CountIf below) is more than 1.
The cell range, and beyond to 2000-and-something, are dynamically generated from user selections to form a Date Range. Ergo the use of CountIf rather than CountA.
Any help would be much appreciated. This is the last leg of a difficult workbook!
DATE:
=OFFSET(Graph!$B$8,0,0,COUNTIF(Graph!$B$8:$B$2927,">"&0)-1)
GI:
=OFFSET(Graph!$C$8,0,0,COUNTIF(Graph!$C$8:$C$2927,">"&0)-1)
IE:
=OFFSET(Graph!$D$8,0,0,COUNTIF(Graph!$D$8:$D$2927,">"&0)-1)
EE:
=OFFSET(Graph!$E$8,0,0,COUNTIF(Graph!$E$8:$E$2927,">"&0)-1)
Information:
B C D E
7 DATE GI IE EE
8 25/04/2011 0 0 0
9 26/04/2011 0 0 0
10 27/04/2011 0 0 0
11 28/04/2011 0 0 0
12 29/04/2011 0 0 0
13 30/04/2011 0 0 0
14 01/05/2011 0 0 0
15 02/05/2011 0 0 0
16 03/05/2011 0 0 0
17 04/05/2011 0 0 0
18 05/05/2011 0 0 0
19 06/05/2011 0 0 0
20 07/05/2011 0 0 0
21 08/05/2011 0 0 0
22 09/05/2011 0 0 0
23 10/05/2011 18000 0 0
24 11/05/2011 18000 0 0
25 12/05/2011 18000 0 0
26 13/05/2011 18000 0 0
27 14/05/2011 18000 0 0
28 15/05/2011 18000 0 0
29 16/05/2011 18000 0 0
30 17/05/2011 18000 0 0
31 18/05/2011 18000 0 0
32 19/05/2011 18000 0 0
33 20/05/2011 18000 0 0
34 21/05/2011 18000 0 0
35 22/05/2011 18000 0 0
This formula should create the correct named Range for date:
=OFFSET(Sheet1!$B$8,0,0,MATCH(Sheet1!$D$4,Sheet1!$B$8:$B$2927,0),1)
For GI:
=OFFSET(Sheet1!$B$8,0,1,MATCH(Sheet1!$D$4,Sheet1!$B$8:$B$2927,0),1)
For IE:
=OFFSET(Sheet1!$B$8,0,2,MATCH(Sheet1!$D$4,Sheet1!$B$8:$B$2927,0),1)
For EE:
=OFFSET(Sheet1!$B$8,0,3,MATCH(Sheet1!$D$4,Sheet1!$B$8:$B$2927,0),1)
(D4 contains the end date dropdown.)
In the data selection for the graph, it is important to write the named Range including the sheet it's on, e.g.: =Sheet1!nrDate instead of just =nrDate.
Please let me know if this works for you.
So, based on your data, and going a slightly different route than OFFSET (the OFFSET route should work too), I used INDEX.
For the x axis I used
=INDEX($B$9:$B$36,MATCH($C$5,$B$9:$B$36,0)):INDEX($B$9:$B$36,MATCH($D$5,$B$9:$B$36,0))
with a defined name of X_axis.
For the y axis I used
=INDEX($C$9:$C$36,MATCH($C$5,$B$9:$B$36,0)):INDEX($C$9:$C$36,MATCH($D$5,$B$9:$B$36,0))
with a defined name of Y_axis. For your second series on the y axis, you would need to change the reference range from C9:C36 to the appropriate column that is lined up with your dates.
When defining the series, I had to use the workbook name in conjunction with the named range, so the series data looked like this:
Proof of Concept
So, what I have to do is open the image given in argv[1] and apply the filter given in argv[2].
The image file looks like this as text:
P2
10 4
255
120 0 0 0 0 0 0 0 0 0
120 0 0 0 0 0 255 255 0 0
120 0 0 0 0 0 255 255 0 0
120 0 0 0 0 0 0 0 0 0
What I have to do is organize the lines after the 255 into a list of lists, but all I can get is a list of strings, from which I can't do much (I will have to apply a filter and so on, but that is another problem).
I should only use the sys library (it's an assignment).
import sys

class image:
    def __init__(self, a):
        self.cab = []
        self.img = a
        self.img2 = []
        self.c = []
        for i in self.img:
            self.img2.append(i)
        self.img3 = ''.join(self.img2)
        self.img4 = self.img3.split('\n')

    def cabec(self, b):  # this has no importance in my question (only for the assignment)
        for i in range(3):
            self.c.append(b[i])

class filtro:
    def __init__(self, f):
        self.filt = []
        for x in f:
            self.filt.append(x)
        self.filt2 = ''.join(self.filt)
        self.filt3 = self.filt2.split('\n')

a = open(sys.argv[1])
b = image(a)
... (this is where I should be able to apply the filters and such, but with a list of strings I don't know what to do)
I am really an amateur, any suggestions would be nice
If I understood correctly, you need a list of ints instead of a list of strings. See this question for how to read ints from a text file.
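A minimal sketch of what that could look like for the file shown in the question, using only built-ins. The function name parse_pgm is made up, and the sketch assumes a plain P2 file without comment lines (real PGM files may contain lines starting with #, which this does not handle).

```python
def parse_pgm(text):
    """Parse plain (P2) PGM text into (width, height, maxval, rows of ints)."""
    tokens = text.split()  # splits on spaces and newlines alike
    if tokens[0] != "P2":
        raise ValueError("not a plain PGM (P2) file")
    width, height, maxval = int(tokens[1]), int(tokens[2]), int(tokens[3])
    values = [int(t) for t in tokens[4:]]
    if len(values) != width * height:
        raise ValueError("pixel count does not match the header")
    # Cut the flat list of pixels into one list per image row
    rows = [values[r * width:(r + 1) * width] for r in range(height)]
    return width, height, maxval, rows


sample = """P2
10 4
255
120 0 0 0 0 0 0 0 0 0
120 0 0 0 0 0 255 255 0 0
120 0 0 0 0 0 255 255 0 0
120 0 0 0 0 0 0 0 0 0
"""

width, height, maxval, rows = parse_pgm(sample)
print(rows[1])  # [120, 0, 0, 0, 0, 0, 255, 255, 0, 0]
```

With a real file you would pass in the file contents instead of the sample string, e.g. parse_pgm(open(sys.argv[1]).read()). Once the pixels are ints in a list of lists, a filter can index them as rows[y][x].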