I have a list 'L'
L = [('1', '2'), ('1', '3'), ('1', '4'), ('1', '5'), ('2', '3'), ('2', '4'), ('2', '5'), ('3', '4'), ('3', '5'), ('4', '5')]
I want to turn this list into
[ 3, 4, 5, 6, 5, 6, 7, 7, 8, 9]
which is the sum of the elements of each tuple.
What is the appropriate code? I want the shortest code possible.
Should I use map?
Seems like homework, but OK.
We need to convert the strings inside the tuples to ints and then sum them. There are many ways of doing this. The solutions below are general, so they work even if the tuples in the original list have different lengths.
First, using map as you asked:
list(map(lambda tup: sum(map(int,tup)), L))
The list call is just used to create a list from the map object.
You can also use list comprehensions
[sum(int(x) for x in tup) for tup in L]
You can also combine map with a list comprehension, which gives the shortest code:
[sum(map(int,tup)) for tup in L]
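As a quick check, the three variants above can be run side by side. This is only a minimal sketch: the names via_map, via_comprehension, via_mixed, ragged and ragged_sums are made up for illustration, and ragged is a hypothetical input used to show that tuples of different lengths are handled too.

```python
# The list from the question: tuples of numeric strings.
L = [('1', '2'), ('1', '3'), ('1', '4'), ('1', '5'), ('2', '3'),
     ('2', '4'), ('2', '5'), ('3', '4'), ('3', '5'), ('4', '5')]

# Variant 1: map with a lambda, wrapped in list() to materialise the result.
via_map = list(map(lambda tup: sum(map(int, tup)), L))

# Variant 2: list comprehension with a generator expression inside sum().
via_comprehension = [sum(int(x) for x in tup) for tup in L]

# Variant 3: list comprehension with map() inside sum().
via_mixed = [sum(map(int, tup)) for tup in L]

print(via_mixed)

# Hypothetical input (not from the question) with tuples of varying length.
ragged = [('1',), ('1', '2', '3'), ('4', '5')]
ragged_sums = [sum(map(int, tup)) for tup in ragged]
print(ragged_sums)
```

All three variants build the same list, so the choice between them is mostly a matter of taste.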
Here you go. :)
casting = [ (int(i), int(j)) for i,j in L ]
sumVariable = [ sum(i) for i in casting ]
You can try this:
alpha = [int(x[0])+int(x[1]) for x in L]
Suppose I want to create a list like the following with a list comprehension:
["2", "2", "2", "3", "3", "3", "4", "4", "4"]
I have tried:
>>> [*[str(n)] * 3 for n in range(2, 5)]
File "<stdin>", line 1
SyntaxError: iterable unpacking cannot be used in comprehension
and
>>> [str(n) * 3 for n in range(2, 5)]
['222', '333', '444']
where I got the numbers, but together in one string, and
>>> [[str(n)] * 3 for n in range(2, 5)]
[['2', '2', '2'], ['3', '3', '3'], ['4', '4', '4']]
where I got a nested list, but I want a flat one. Can this be done in a simple way, or do I have to build the nested list and then flatten it?
A nested for loop is your best bet. I think this is the simplest solution:
[str(n) for n in range(2, 5) for i in range(3)]
Another simple way to do this with a list comprehension is integer division: n // 3 yields 2 for n from 6 to 8, 3 for n from 9 to 11, and 4 for n from 12 to 14.
[str(n // 3) for n in range(6, 15)]
You can use the chain function from the itertools library to flatten the list.
>>> from itertools import chain
>>> list(chain(*[[str(n)] * 3 for n in range(2, 5)]))
['2', '2', '2', '3', '3', '3', '4', '4', '4']
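A variation on the chain approach is itertools.chain.from_iterable, which accepts the nested iterable directly, so neither the * unpacking nor the intermediate list is needed. The sketch below compares it with the nested-loop comprehension; the variable names nested_loop and flattened are chosen here for illustration.

```python
from itertools import chain

# Nested-loop comprehension: the inner loop repeats each value three times.
nested_loop = [str(n) for n in range(2, 5) for _ in range(3)]

# chain.from_iterable consumes the generator of small lists lazily
# and yields their items one after another.
flattened = list(chain.from_iterable([str(n)] * 3 for n in range(2, 5)))

print(nested_loop)
print(flattened)
```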
marksheet = [['harry',87], ['bob', 76], ['bucky', 98]]
print(set([marks for name, marks in marksheet]))
output: {98, 76, 87}
Can someone please explain how this works?
You're iterating over marksheet and unpacking each inner list into name and marks. You ignore name and build a list from marks. The list you just created is passed to set, which creates a set from it. You can break the code down step by step:
marksheet = [['harry',87], ['bob', 76], ['bucky', 98]]
In [40]: marksheet
Out[40]: [['harry', 87], ['bob', 76], ['bucky', 98]]
In [41]: l = [marks for name, marks in marksheet]
In [42]: l
Out[42]: [87, 76, 98]
You can also surround the values you're extracting in parentheses to help make it more clear:
In [43]: l = [marks for (name, marks) in marksheet]
In [44]: l
Out[44]: [87, 76, 98]
Some people use _ to denote the returned value is ignored:
In [45]: l = [marks for (_, marks) in marksheet]
In [46]: l
Out[46]: [87, 76, 98]
The above is an example of a list comprehension. It is equivalent to:
In [47]: l=[]
In [48]: for (name, marks) in marksheet:
    ...:     l.append(marks)
    ...:
In [49]: l
Out[49]: [87, 76, 98]
From there you are simply passing the list to set, which can take an iterable. In this case, the list you just created is the iterable:
In [50]: set(l)
Out[50]: {76, 87, 98}
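If the intermediate list is not needed, the same result can be written as a set comprehension. This is a small sketch; unique_marks is a name picked for illustration. Sets have no guaranteed order, which is why the printed order can differ from the order in marksheet.

```python
marksheet = [['harry', 87], ['bob', 76], ['bucky', 98]]

# Two-step version from the question: build a list, then pass it to set().
from_list = set([marks for name, marks in marksheet])

# Set comprehension: unpack each pair, ignore the name, collect the marks.
unique_marks = {marks for _, marks in marksheet}

# Both are the same set, regardless of the order in which they print.
print(unique_marks == from_list)
```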
I have a list of tuples of the format, [(ID, Date), (ID, Date)...], with dates in datetime format. As an example of the RDD I'm working with:
[('1', datetime.date(2012, 1, 1)),
 ('2', datetime.date(2012, 1, 1)),
 ('3', datetime.date(2012, 1, 1)),
 ('4', datetime.date(2012, 1, 1)),
 ('5', datetime.date(2012, 1, 1)),
 ('1', datetime.date(2011, 1, 1)),
 ('2', datetime.date(2013, 1, 1)),
 ('3', datetime.date(2015, 1, 1)),
 ('4', datetime.date(2010, 1, 1)),
 ('5', datetime.date(2018, 1, 1))]
I need to gather the IDs and the minimum date associated with each ID. Presumably, this is a reduceByKey action, but I've not been able to sort out the associated function. I'm guessing I'm just over-complicating things, but help would be appreciated in identifying the appropriate lambda (or method if reduceByKey is not most efficient in this scenario).
I've scoured StackOverflow and found similar answers here, here, and here, but again, I've not been able to successfully modify these answers to fit my particular scenario. Often times, the datetime format seems to trip things up (the datetime format itself is due to the way I parsed the xml, so I can go back and have it parsed as a string if that's helpful).
I've attempted the following, and receive errors for each:
.reduceByKey(min) - IndexError: tuple index out of range
.reduceByKey(lambda x, y: (x, min(y))) - IndexError: tuple index out of range (if datetime is converted to string, or the error below if in datetime format)
.reduceByKey(lambda x, y: (x[0], min(y))) - TypeError: 'datetime.date' object is not subscriptable
I expect the final result to be as follows:
[('1', datetime.date(2011, 1, 1)),
 ('2', datetime.date(2012, 1, 1)),
 ('3', datetime.date(2012, 1, 1)),
 ('4', datetime.date(2010, 1, 1)),
 ('5', datetime.date(2012, 1, 1))]
I figured it out. There were several problems. For starters, here's the applicable syntax. First off (and after creating a SparkSession of course), I converted the RDD to a dataframe with:
df = spark.createDataFrame(df, ['col1', 'col2'])
And then carried out a groupBy and aggregation function. These you'll see on other SO answers, but I thought I'd post this here since it's in the context of my particular scenario.
from pyspark.sql import functions as F
df = df.groupBy('col1').agg(F.min('col2'))
To then return the data to the RDD format, I used
result = df.rdd.map(lambda x: (x[0], x[4]))
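In this particular case, I also mapped the elements of the 0th and 4th columns of the dataframe back to a tuple assigned to result. (With the two-column example above, the grouped dataframe only has columns 0 and 1, so the lambda would be (x[0], x[1]).)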
In this process, I also discovered some interesting bits that may be helpful to others:
I had some Null values in my dataframe that I wasn't aware of, and they repeatedly caused errors, mostly "NoneType is not subscriptable". While this made sense, it took me a while to figure out where the NoneType was located.
Some of my XML had been incorrectly parsed, so it was returning a tuple of (None) instead of a tuple of (None, None) as required by the format of the data above.
These corrections allowed me to .show() the dataframes (and not just .printSchema()). The .groupBy and its associated objects were never a problem.
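To make the intended result concrete without a Spark session, here is a plain-Python sketch of the per-key minimum. It is not Spark code and not the method used above; records, min_by_key and earliest are illustrative names. It assumes every record is a well-formed (ID, date) pair, which, as noted above, was not true of the original data.

```python
import datetime

# Sample records from the question, as (ID, date) pairs.
records = [
    ('1', datetime.date(2012, 1, 1)),
    ('2', datetime.date(2012, 1, 1)),
    ('3', datetime.date(2012, 1, 1)),
    ('4', datetime.date(2012, 1, 1)),
    ('5', datetime.date(2012, 1, 1)),
    ('1', datetime.date(2011, 1, 1)),
    ('2', datetime.date(2013, 1, 1)),
    ('3', datetime.date(2015, 1, 1)),
    ('4', datetime.date(2010, 1, 1)),
    ('5', datetime.date(2018, 1, 1)),
]


def min_by_key(pairs):
    """Return a dict mapping each key to the smallest value seen for it."""
    result = {}
    for key, value in pairs:
        # Keep the value if the key is new or the value is smaller.
        if key not in result or value < result[key]:
            result[key] = value
    return result


# Sort by ID so the output order is deterministic.
earliest = sorted(min_by_key(records).items())
for record in earliest:
    print(record)
```

A record that is missing its date, such as a one-element tuple, would make the unpacking in the loop fail, which is the same kind of malformed input described in the notes above.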
I'm new to Python, and I need your help with this.
I have user input like:
5 72 245 62
And I need to split these integers into a dictionary like this:
{1=5;2=72;3=245;4=62}
I tried something like:
sequence = dict(x ,input().split())
where x is an incrementing counter.
If your desired end result is a Python dictionary, then I think you're pretty close.
You can actually use a Python builtin called enumerate to achieve this:
>>> values = input().split()
1 2 3 4
>>> values
['1', '2', '3', '4']
>>>
>>> sequence = dict(enumerate(values))
>>> sequence
{0: '1', 1: '2', 2: '3', 3: '4'}
enumerate just goes through any iterable (such as a list of strings) and outputs the index of each item and the item itself in a tuple:
>>> for x in enumerate(values):
... print(x)
...
(0, '1')
(1, '2')
(2, '3')
(3, '4')
You can then call dict on an iterable of tuples (which is what enumerate produces) in order to turn them into a dictionary.
Of course, enumerate, like most things, is zero-indexed, so you can also pass in a starting number if you would like to start at 1:
>>> sequence = dict(enumerate(values, 1))
>>> sequence
{1: '1', 2: '2', 3: '3', 4: '4'}
The problem with what you have
Let's say, as above, we have a list of strings. In order to match up numbers with each string in the list, we need something like the following:
>>> dict([(1, '1'), (2, '2')...])
Notice that I am passing one argument to dict: a list of tuples where each item in the list looks like (1, '1') and I have one container holding all of them.
Your attempt was the following:
>>> sequence = dict(x ,input().split())
This is probably interpreted as something like the following (guessing at the value of x):
>>> dict(1, ['1', '2', '3'])
Which produces the following Traceback:
>>> dict(1, [1, 2, 3])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: dict expected at most 1 arguments, got 2
You are passing two arguments to dict, not one, which is what it expects.
It expects some kind of container with a bunch of things in it where the first element of each thing is mapped to the second element, such as the following:
>>> [(1, '1'), (2, '2')]
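Since the question asks for integers rather than strings, the values can be converted with int before they are numbered. The sketch below assumes the input is whitespace-separated integers. It uses a hard-coded string instead of input() so it can run unattended, and the helper name numbered_ints is made up for illustration.

```python
def numbered_ints(line, start=1):
    """Map 1-based positions to the integers found in a line of text."""
    # split() breaks on whitespace, int converts each piece, and
    # enumerate pairs every value with a counter beginning at `start`.
    return dict(enumerate(map(int, line.split()), start))


# Stand-in for input(); the values are the ones from the question.
sequence = numbered_ints("5 72 245 62")
print(sequence)  # {1: 5, 2: 72, 3: 245, 4: 62}
```

Note that int raises a ValueError if any piece of the input is not a valid integer.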
I am trying to read a CSV file and have the data come back as a list of list of ints. The CSV file is 6 columns wide and 2 rows. Using Python 3.4.
It always comes out as a list of list of strs. Searching StackOverflow and Google shows 7 different ways to do this, none of which work. These are shown below the code as what I have tried.
import csv
b = []
# Raw string so the backslashes in the Windows path are taken literally
with open(r'C:\Python34\DataforProgramstorun\csv data6x2.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        b.append(row)  # each row is a list of str, as print(b) shows
print(b)
The result is:
[['2', '5', '15', '17', '19', '20'], ['6', '8', '14', '18', '21', '30']]
I would like it to be:
[[2, 5, 15, 17, 19, 20], [6, 8, 14, 18, 21, 30]]
None of the following work:
[ int(x) for y in b for x in y.split() ] #builtins.AttributeError: 'list' object has no attribute 'split'
[int(x) for x in ' '.join(b).split ()] #builtins.TypeError: sequence item 0: expected str instance, list found
import itertools as it; new =list(it.imap(int,b)) #builtins.AttributeError: 'module' object has no attribute 'imap'
for i in range (0,len(b)): b[i] = int (b[i]) #builtins.TypeError: int() argument must be a string or a number, not 'list'
results = b; results = [int(i) for i in results] ##builtins.TypeError: int() argument must be a string or a number, not'list'
b = list(map(int,b)) #builtins.TypeError: int() argument must be a string or a number, not 'list'
[int(i) for i in b] #builtins.TypeError: int() argument must be a string or a number, not 'list'
>>> lst = [['2', '5', '15', '17', '19', '20'], ['6', '8', '14', '18', '21', '30']]
>>> [[int(x) for x in inner] for inner in lst]
[[2, 5, 15, 17, 19, 20], [6, 8, 14, 18, 21, 30]]
The problem with all the solutions you tried is that they only go one level deep. They iterate over the outer list but then try to convert each inner list as a whole, which fails because int() cannot take a list. To solve this, you need to convert the elements of each inner list individually. You could also solve it like this:
for i, sublist in enumerate(b):
    b[i] = [int(x) for x in sublist]
And instead of [int(x) for x in sublist] you could also use one of the many other ways to convert all strings in a (sub)list to ints, for example list(map(int, sublist)).
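The conversion can also happen while the file is being read, so the list of strings is never stored. The sketch below uses io.StringIO as a stand-in for the real file so that it is self-contained; csv_text and rows are illustrative names, and the contents are assumed to match the two rows shown in the question.

```python
import csv
import io

# Stand-in for the CSV file on disk; same two rows as in the question.
csv_text = "2,5,15,17,19,20\n6,8,14,18,21,30\n"

with io.StringIO(csv_text) as f:
    # Outer comprehension walks the rows, inner one converts each cell.
    rows = [[int(cell) for cell in row] for row in csv.reader(f)]

print(rows)  # [[2, 5, 15, 17, 19, 20], [6, 8, 14, 18, 21, 30]]
```

With a real file, the io.StringIO(csv_text) call would be replaced by the open(...) call from the question.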