Filter in PySpark/Python RDD - python-3.x

I have a list like this:
["Dhoni 35 WC 785623", "Sachin 40 Batsman 4500", "Dravid 45 Batsman 50000", "Kumble 41 Bowler 456431", "Srinath 41 Bowler 65465"]
After applying a filter, I want this:
["Dhoni WC", "Sachin Batsman", "Dravid Batsman", "Kumble Bowler", "Srinath Bowler"]
I tried it this way:
m = sc.parallelize(["Dhoni 35 WC 785623","Sachin 40 Batsman 4500","Dravid 45 Batsman 50000","Kumble 41 Bowler 456431","Srinath 41 Bowler 65465"])
n = m.map(lambda k:k.split(' '))
o = n.map(lambda s:(s[0]))
o.collect()
['Dhoni', 'Sachin', 'Dravid', 'Kumble', 'Srinath']
q = n.map(lambda s:s[2])
q.collect()
['WC', 'Batsman', 'Batsman', 'Bowler', 'Bowler']

Provided all your list items have the same format, one way to achieve this is with map:
rdd = sc.parallelize(["Dhoni 35 WC 785623","Sachin 40 Batsman 4500","Dravid 45 Batsman 50000","Kumble 41 Bowler 456431","Srinath 41 Bowler 65465"])
rdd.map(lambda x: x.split(' ')).map(lambda s: s[0] + ' ' + s[2]).collect()
Output:
['Dhoni WC', 'Sachin Batsman', 'Dravid Batsman', 'Kumble Bowler', 'Srinath Bowler']
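The per-element logic in that map call does not depend on Spark, so it can be checked as a plain Python function first. This is a minimal sketch: the name name_and_role is hypothetical, the function splits each line only once, and it assumes every line has at least three space-separated fields.

```python
def name_and_role(line: str) -> str:
    """Keep the 1st and 3rd space-separated fields of a record."""
    fields = line.split(" ")
    return fields[0] + " " + fields[2]

records = ["Dhoni 35 WC 785623", "Sachin 40 Batsman 4500",
           "Dravid 45 Batsman 50000", "Kumble 41 Bowler 456431",
           "Srinath 41 Bowler 65465"]

# Same per-element transformation that rdd.map applies, run locally
result = [name_and_role(r) for r in records]
print(result)
```

With a SparkContext sc, the same function can be passed straight to Spark: sc.parallelize(records).map(name_and_role).collect().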

Related

Mocha: using chai to test object equality doesn't work as expected

I'm using chai to test the response data of an https request. The data is obtained like this:
let req = https.request(options, (res) => {
res.on('data', (data) => {
return callback(null, data);
});
});
The test code looks like this:
let chai = require("chai");
let util = require("util");
let expect = chai.expect;
console.log("data=" + data);
console.log("typeof data = " + typeof(data));//object
console.log("util.isObject(data) = " + util.isObject(data));//true
console.log("util.isString(data) = " + util.isString(data));//false
// assert.isObject(data, "object");
expect(JSON.stringify(data)).to.be.a("string");//ok
expect(JSON.parse(data)).to.be.an("object");//ok
expect(data).to.be.an("object");//error
The Mocha test fails at expect(data).to.be.an("object") with this log:
data={"data":{"isDulp":false,"bindTb":false},"req_id":"REQ_APP-1487212987084_2851"}
typeof data = object
util.isObject(data) = true
util.isString(data) = false
Uncaught AssertionError: expected <Buffer 7b 22 64 61 74 61 22 3a 7b 22 69 73 44 75 6c 70 22 3a 66 61 6c 73 65 2c 22 62 69 6e 64 54 62 22 3a 66 61 6c 73 65 7d 2c 22 72 65 71 5f 69 64 22 3a 22 ... > to be an object
I thought data was an object. When I test it with typeof, it prints "object", but when I use chai's expect(data).to.be.an("object"), the test case fails.
If I use expect(JSON.parse(data)).to.be.an("object"), the test case passes.
Can someone tell me why? What is the type of data?
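The assertion error shows that data is a Buffer (raw bytes), not a parsed object: the https response's 'data' event delivers Buffer chunks. typeof reports "object" for a Buffer, but chai's a/an assertion checks a more specific type than typeof does, so a Buffer does not pass as "object". JSON.stringify(data) returns a string, so that assertion passes trivially, and JSON.parse(data) converts the Buffer to a string before parsing it, which is why its result is a plain object.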
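The same distinction between raw bytes and a parsed object shows up in Python, so here is an analogous sketch. It is not Node.js code. It also joins all chunks before parsing, because a response body may arrive in more than one chunk. The two-chunk split below is an assumption for illustration.

```python
import json

# Raw body as it might arrive in two chunks (bytes, like a Node.js Buffer)
chunks = [b'{"data":{"isDulp":false,"bindTb":false},',
          b'"req_id":"REQ_APP-1487212987084_2851"}']

body = b"".join(chunks)      # still bytes, not a dict
parsed = json.loads(body)    # json.loads accepts bytes (Python 3.6+)

print(type(body).__name__)
print(type(parsed).__name__)
print(parsed["data"]["isDulp"])
```

In Node.js, the equivalent pattern is to push each 'data' chunk into an array and call JSON.parse(Buffer.concat(chunks).toString()) in the 'end' handler.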

Avoid some unexpected results from Association Rules

I'm trying to extract some association rules from this dataset:
49
70
27,66
6
27
66,8,64
32
82
66
71
44
1
33
17
31,83
50,29
22
72
8
8,16
56
83,61
85,63,37
50,57
2
50
96,6
73
57
12
62
96
3
47,50,73
35
85,45
25,96,22,17
85
24
17,57
34,4
60,96,45
25
85,66,73
30
14
73,85
64
48
5
37
13,55
37,17
I have this code:
val transactions = sc.textFile("/user/cloudera/dataset1")
import org.apache.spark.mllib.fpm.AssociationRules
import org.apache.spark.mllib.fpm.FPGrowth.FreqItemset
val freqItemsets = transactions.flatMap(xs =>
(xs.combinations(1) ++ xs.combinations(2) ++ xs.combinations(3) ++ xs.combinations(4) ++ xs.combinations(5)).map(x => (x.toList, 1L))
).reduceByKey(_ + _).map{case (xs, cnt) => new FreqItemset(xs.toArray, cnt)}
val ar = new AssociationRules().setMinConfidence(0.4)
val results = ar.run(freqItemsets)
results.collect().foreach { rule =>
println("[" + rule.antecedent.mkString(",")
+ "=>"
+ rule.consequent.mkString(",") + "]," + rule.confidence)
}
But I'm getting some unexpected lines in my output:
[2,9=>5],0.5
[8,5,,,3=>6],1.0
[8,5,,,3=>7],0.5
[8,5,,,3=>7],0.5
[,,,=>6],0.5
[,,,=>7],0.5
[,,,=>5],0.5
[,,,=>3],0.5
[4,3=>7],1.0
[4,3=>,,,],1.0
[4,3=>,,,],1.0
[4,3=>5],1.0
[4,3=>7,7],1.0
[4,3=>7,7],1.0
[4,3=>0],1.0
Why am I getting outputs like this:
[,,,=>3],0.5
I don't understand the issue. Does anyone know how to solve this problem?
Many thanks!
All of these results should be unexpected, because you have a bug in your code!
You need to create combinations of the items. As it stands, your code is creating combinations of characters in the string (like "25,96,22,17"), which of course won't give the right result (and that's why you see the "," as an element).
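The difference between combining characters and combining items can be reproduced outside Spark. In this sketch, itertools.combinations stands in for Scala's combinations. The two functions treat repeated elements differently, but both show the separator "," turning up as an element when the raw string is used.

```python
from itertools import combinations

line = "25,96,22,17"

# Buggy: combinations of the string's characters, so "," becomes an "item"
char_pairs = list(combinations(line, 2))

# Fixed: split the line into items first, then combine the items
items = line.split(",")
item_pairs = list(combinations(items, 2))

print(char_pairs[:3])
print(item_pairs)
```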
To fix it, split each line into its items first with transactions.map(_.split(",")).
So instead of:
val freqItemsets = transactions.flatMap(xs =>
(xs.combinations(1) ++ xs.combinations(2) ++ xs.combinations(3) ++ xs.combinations(4) ++ xs.combinations(5)).map(x => (x.toList, 1L))
).reduceByKey(_ + _).map{case (xs, cnt) => new FreqItemset(xs.toArray, cnt)}
use:
val freqItemsets = transactions.map(_.split(",")).flatMap(xs =>
(xs.combinations(1) ++ xs.combinations(2) ++ xs.combinations(3) ++ xs.combinations(4) ++ xs.combinations(5)).filter(_.nonEmpty).map(x => (x.toList, 1L))
).reduceByKey(_ + _).map{case (xs, cnt) => new FreqItemset(xs.toArray, cnt)}
This gives the expected output:
[96,17=>22],1.0
[96,17=>25],1.0
[85,37=>63],1.0
[47,73=>50],1.0
[31=>83],1.0
[60,45=>96],1.0
[60=>45],1.0
[60=>96],1.0
[96,45=>60],1.0
[22,17=>25],1.0
[22,17=>96],1.0
[66,8=>64],1.0
[63,37=>85],1.0
[66,64=>8],1.0
[25,22,17=>96],1.0
[27=>66],0.5
[96,22,17=>25],1.0
[61=>83],1.0
[64=>66],0.5
[64=>8],0.5
[45=>60],0.5
[45=>96],0.5
[45=>85],0.5
[6=>96],0.5
[47=>73],1.0
[47=>50],1.0
[50,73=>47],1.0
[96,22=>17],1.0
[96,22=>25],1.0
[66,73=>85],1.0
[8,64=>66],1.0
[29=>50],1.0
[83=>31],0.5
[83=>61],0.5
[25,96,17=>22],1.0
[85,66=>73],1.0
[25,96,22=>17],1.0
[25,96=>17],1.0
[25,96=>22],1.0
[22=>17],0.5
[22=>96],0.5
[22=>25],0.5
[85,73=>66],1.0
[55=>13],1.0
[60,96=>45],1.0
[63=>37],1.0
[63=>85],1.0
[25,22=>17],1.0
[25,22=>96],1.0
[16=>8],1.0
[25=>96],0.5
[25=>22],0.5
[25=>17],0.5
[34=>4],1.0
[85,63=>37],1.0
[47,50=>73],1.0
[13=>55],1.0
[4=>34],1.0
[25,17=>22],1.0
[25,17=>96],1.0
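The number after each rule is its confidence, support(A ∪ B) / support(A). This Python sketch is a cross-check, not Spark's implementation. Counting transactions by hand reproduces several lines of the output above. The transactions listed are every line of the dataset that contains 27, 45, 60 or 64.

```python
def support(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def confidence(transactions, antecedent, consequent):
    """support(A | B) / support(A) for the rule A => B."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)

# Every transaction from the dataset that contains 27, 45, 60 or 64
lines = ["27,66", "27", "66,8,64", "64", "85,45", "60,96,45"]
transactions = [set(line.split(",")) for line in lines]

print(confidence(transactions, {"27"}, {"66"}))  # 0.5
print(confidence(transactions, {"64"}, {"66"}))  # 0.5
print(confidence(transactions, {"45"}, {"85"}))  # 0.5
print(confidence(transactions, {"60"}, {"45"}))  # 1.0
```

These match the [27=>66], [64=>66], [45=>85] and [60=>45] lines above.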

Octave advanced textread usage, bash

I have the following text file:
079082084072079032084069067072000000000,0
082078032049050032067072065082071069000,1
076065066032065083083084000000000000000,0
082078032049050072082000000000000000000,1
082078032049050072082000000000000000000,1
082078032049050072082000000000000000000,1
070083087032073073032080068000000000000,0
080067065032049050032072082000000000000,0
082078032056072082000000000000000000000,1
070083087032073073073000000000000000000,0
082078032087069069075069078068000000000,1
082078032049050072082000000000000000000,1
077065073078084032077069067072032073073,0
082078032049050072082000000000000000000,1
080067065032049050032072082000000000000,0
082078032049050072082000000000000000000,1
I need two matrices:
X size 16x13
Y size 16x1
I want to separate each row of the file into 13 values, example:
079 082 084 072 079 032 084 069 067 072 000 000 000
Is it possible to import it into Octave using the textread function?
If not, can it be done using a Linux bash command?
Yes, you can do this with textscan (see the end of this answer if you really want to use textread):
octave> txt = "079082084072079032084069067072000000000,0\n082078032049050032067072065082071069000,1";
octave> textscan (txt, repmat ("%3d", 1, 13))
ans =
{
[1,1] =
79
82
[1,2] =
82
78
[1,3] =
84
32
[1,4] =
72
49
[...]
Note that you are reading them as numeric values, so you do not get the leading zeros. If you want them, you can read them as strings by using "%3s" in the format, at the cost of extra handling and reduced performance, since you will then be working with cell arrays.
Since you are reading from a file:
[fid, msg] = fopen ("data.txt", "r");
## fopen returns -1 on failure, so test for a negative id
if (fid < 0)
error ("failed to fopen 'data.txt': %s", msg);
endif
data = textscan (fid, repmat ("%3d", 1, 13));
fclose (fid);
If you really want to use textread:
octave> [d1, d2, d3, d4, d5, d6, d7, d8, d9, d10, d11, d12, d13] = textread ("data.txt", repmat ("%3d", 1, 13))
d1 =
79
82
76
[...]
d2 =
82
78
65
[...]
or:
octave> data = cell (1, 13);
octave> [data{:}] = textread ("data.txt", repmat ("%3d", 1, 13))
data =
{
[1,1] =
79
82
76
[...]
[1,2] =
82
78
65
[...]
If you need to capture the value after the comma (not really part of your original question), you can use:
octave> textscan (txt, [repmat("%3d", 1, 13) ",%1d"])
ans =
{
[1,1] =
79
82
[1,2] =
82
78
[1,3] =
84
32
[...]
[1,14] =
0
1
}
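The same fixed-width split into X rows and Y labels can be sketched outside Octave as well. This is a Python illustration, and the helper parse_line is hypothetical. Like %3d, int() drops the leading zeros.

```python
def parse_line(line, n_fields=13, width=3):
    """Split 'ddd...ddd,label' into n_fields integers and an integer label."""
    digits, label = line.strip().split(",")
    row = [int(digits[i:i + width]) for i in range(0, n_fields * width, width)]
    return row, int(label)

lines = [
    "079082084072079032084069067072000000000,0",
    "082078032049050032067072065082071069000,1",
]

X, Y = [], []
for line in lines:
    row, label = parse_line(line)
    X.append(row)    # one 13-value row of X
    Y.append(label)  # the matching entry of Y

print(X[0])  # [79, 82, 84, 72, 79, 32, 84, 69, 67, 72, 0, 0, 0]
print(Y)     # [0, 1]
```

For the whole file, lines could come from open("data.txt").read().splitlines(), which yields 16 rows for X and 16 labels for Y.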
You can do this pretty easily by reading three characters at a time using read in the shell:
while IFS="${IFS}," read -rn3 val tail; do
[[ $tail ]] && echo || printf '%s ' "$val"
done < file
This implementation assumes that if we encounter a value after the comma, we should go to the next line.

Fortran: How to read to an array from a file

I'm trying to read integers from a file into an array, but I get an error when I run the program.
PROGRAM MINTEM
INTEGER TEMP(4,7), I, J, MINIMUM, CURRENT
OPEN(UNIT=1, FILE='temps.dat')
READ (1,*) ((TEMP(I,J),J=1,7),I=1,4)
MINIMUM = TEMP(1,1)
DO I = 1,4
DO J = 1,7
IF (TEMP(I,J) < MINIMUM) THEN
MINIMUM = TEMP(I,J)
END IF
END DO
END DO
PRINT *, "MINIMUM TEMPERATURE = ", MINIMUM
END PROGRAM MINTEM
Input file looks like this:
22
100 90 80 70 60 100 90 80 70 60 100 90 80 70 60 100 90 80 70
100 90
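Your READ statement asks for 28 values (TEMP is 4x7), but the file contains only 22 numbers, so the read runs past the end of the file. The file you provided can be read like this:
program read_temps
implicit none
integer :: N
integer, allocatable :: t(:)
open(1, file='temps.dat')
read(1,*) N ! your first line with 22
allocate( t(N-1) ) ! further on you only have 21 elements
read(1,*) t ! so, read them in (list-directed read continues across lines)
print*, t
deallocate(t)
close(1)
end program read_temps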
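List-directed READ keeps consuming whitespace-separated values across line breaks until its target is full. This Python sketch is a rough analogue of that reading logic, not Fortran semantics in every detail. It uses the file contents from the question and shows why 28 values cannot be read from it.

```python
text = """22
100 90 80 70 60 100 90 80 70 60 100 90 80 70 60 100 90 80 70
100 90
"""

values = [int(tok) for tok in text.split()]  # ignore line boundaries, like list-directed READ
n = values[0]                                 # first value: 22
t = values[1:n]                               # the N-1 = 21 values that follow

print(len(values))  # 22, fewer than the 4*7 = 28 that TEMP(4,7) needs
print(len(t))       # 21
print(min(t))       # 60
```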

Need to convert Image file back to X Y co-ordinate format

I am drawing a signature as shown below, taking the X/Y coordinates and saving them to an ArrayList.
Bitmap bmp;
//Graphics object
Graphics graphics;
//Pen object
Pen pen = new Pen(Color.Black);
// Array List of line segments
ArrayList pVector = new ArrayList();
//Point object
Point lastPoint = new Point(0, 0);
protected override void OnMouseDown(MouseEventArgs e)
{
base.OnMouseDown(e);
// process if currently drawing signature
if (!drawSign)
{
// start collecting points
drawSign = true;
// use current mouse click as the first point
lastPoint.X = e.X;
lastPoint.Y = e.Y;
}
}
protected override void OnMouseMove(MouseEventArgs e)
{
base.OnMouseMove(e);
// process if drawing signature
if (drawSign)
{
if (graphics != null)
{
// draw the new segment on the memory bitmap
graphics.DrawLine(pen, lastPoint.X, lastPoint.Y, e.X, e.Y);
pVector.Add(lastPoint.X + " " + lastPoint.Y + " " + e.X + " " + e.Y);
// update the current position
lastPoint.X = e.X;
lastPoint.Y = e.Y;
// display the updated bitmap
Invalidate();
}
}
}
Using the ArrayList (pVector), I save the values to the database as a string (signature) and also as an image, as shown below:
//Saving value to Database
ArrayList arrSign = new ArrayList();
arrSign = this.signatureControl.getPVector();
string signature = "";
for (int i = 0; i < arrSign.Count; i++)
{
signature = signature + arrSign[i].ToString() + "*";
}
The signature string looks like this:
60 46 59 48*59 48 59 51*59 51 59 53*59 53 60 49*60 49 61 44*61 44 62 38*62 38 64 31*64 31 67 23*67 23 70 14*70 14 72 10*72 10 75 3*75 3 77 -2*77 -2 76 2*76 2 75 6*75 6 72 17*72 17 71 24*71 24 69 31*69 31 68 46*68 46 67 59*67 59 68 71*68 71 69 79*69 79 70 86*70 86 71 89*71 89 71 93*71 93 71 95*71 95 71 97*71 97 70 95*70 95 69 88*69 88 68 81*68 81 69 77*69 77 69 68*69 68 71 60
//Saving as Image file
Pen pen = new Pen(Color.Black);
string[] arrStr = signature.Split('*');
Graphics graphics;
Bitmap bmp = new Bitmap(300, 200);
graphics = Graphics.FromImage(bmp);
graphics.Clear(Color.White);
foreach (string segment in arrStr)
{
// skip the empty piece left by the trailing '*'
if (segment.Length == 0) continue;
// each segment is "x1 y1 x2 y2"
string[] strArr = segment.Split(' ');
graphics.DrawLine(pen, Convert.ToInt32(strArr[0]), Convert.ToInt32(strArr[1]),
Convert.ToInt32(strArr[2]), Convert.ToInt32(strArr[3]));
}
string pathToCopyImage = systemBus.TempFile;
bmp.Save(pathToCopyImage + "\\" + dsReportDetails.Tables["tblDelivery"].Rows[0]["PKDelivery"].ToString() + "_Signature.bmp", System.Drawing.Imaging.ImageFormat.Bmp);
bmp.Dispose();
My problem is that after saving the signature as an image file, I am not able to convert it back to the ArrayList format that I used to save the value in the database,
i.e. I need to convert the image file back into the following format:
60 46 59 48*59 48 59 51*59 51 59 53*59 53 60 49*60 49 61 44*61 44 62 38*62 38 64 31*64 31 67 23*67 23 70 14*70 14 72 10*72 10 75 3*75 3 77 -2*77 -2 76 2*76 2 75 6*75 6 72 17*72 17 71 24*71 24 69 31*69 31 68 46*68 46 67 59*67 59 68 71*68 71 69 79*69 79 70 86*70 86 71 89*71 89 71 93*71 93 71 95*71 95 71 97*71 97 70 95*70 95 69 88*69 88 68 81*68 81 69 77*69 77 69 68*69 68 71 60
Can anyone help me, please?
It is not easy to recover your signature string from the image. Instead, you can add the signature string to the saved image as a metadata tag (as a Description, for example). Then, when you read the image back, you do not need to recognize the signature string from the image; you can just read it from the metadata as a string. MSDN has a good article about image metadata and the API for working with it: http://msdn.microsoft.com/en-us/library/ms748873.aspx
By the way, your code for concatenating the signature string is slow and memory-consuming, because each + creates a new string. It is better to use StringBuilder in such situations in .NET. More generally, strings are not the best data structure for storing a list of points, but that depends on your app's requirements.
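The StringBuilder advice amounts to building the string in one pass instead of with repeated +. The following language-neutral sketch is written in Python, where str.join plays the role of StringBuilder. encode and decode are hypothetical helpers, and the format is the "x1 y1 x2 y2*..." string from the question. The sketch also shows that the string round-trips back to points, so storing it alongside the image (for example in its metadata) is enough to recover them.

```python
def encode(segments):
    """Join (x1, y1, x2, y2) segments into 'x1 y1 x2 y2*...' in one pass."""
    return "*".join(" ".join(str(v) for v in seg) for seg in segments)

def decode(text):
    """Parse the '*'-separated string back into integer 4-tuples, skipping empty pieces."""
    return [tuple(int(v) for v in piece.split(" "))
            for piece in text.split("*") if piece]

segments = [(60, 46, 59, 48), (59, 48, 59, 51), (75, 3, 77, -2)]
text = encode(segments)
print(text)                      # 60 46 59 48*59 48 59 51*75 3 77 -2
print(decode(text) == segments)  # True
```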
