Remove sequential duplicate word separated by delimiter - python-3.x

I am trying to remove sequential duplicates separated by the delimiter '>' from the journey column, and also to aggregate the values in the uu and convs columns for rows that collapse to the same journey.
INPUT
a=[['journey', 'uu', 'convs'],
['Ct', '10', '2'],
['Ct>Ct', '100', '3'],
['Ct>Pt>Ct', '200', '10'],
['Ct>Pt>Ct>Ct', '40', '5'],
['Ct>Pt>Bu', '1000', '8']]
OUTPUT
a=[['journey', 'uu', 'convs'],
['Ct', '110', '5'],
['Ct>Pt>Ct', '240', '15'],
['Ct>Pt>Bu', '1000', '8']]
I tried the following to split the string, but it didn't work:
a='>'.join(set(a.split()))

You need to split your string by > and then use itertools.groupby to collapse consecutive duplicate items. For example:
from itertools import groupby
x = ['Ct>Pt>Ct>Ct', '40', '5']
print(">".join([i for i, _ in groupby(x[0].split(">"))]))
# Ct>Pt>Ct
You can use this expression as the key function of a second groupby to group the rows, and then sum the values column by column using zip. Note that groupby only merges adjacent rows, so rows with the same collapsed journey must be next to each other (as they are here):
a=[['journey', 'uu', 'convs'],
['Ct', '10', '2'],
['Ct>Ct', '100', '3'],
['Ct>Pt>Ct', '200', '10'],
['Ct>Pt>Ct>Ct', '40', '5'],
['Ct>Pt>Bu', '1000', '8']]
from itertools import groupby

result = [a[0]]  # Add header
groups = groupby(
    a[1:],
    key=lambda x: ">".join([i for i, _ in groupby(x[0].split(">"))])
)
# groups:
# ['Ct', [['Ct', '10', '2'], ['Ct>Ct', '100', '3']]]
# ['Ct>Pt>Ct', [['Ct>Pt>Ct', '200', '10'], ['Ct>Pt>Ct>Ct', '40', '5']]]
# ['Ct>Pt>Bu', [['Ct>Pt>Bu', '1000', '8']]]
for key, items in groups:
    row = [key]
    # zip(*items) yields one tuple per column; only numeric columns are summed
    for i in zip(*items):
        if i[0].isdigit():
            row.append(str(sum(map(int, i))))
    result.append(row)
print(result)
Prints:
[['journey', 'uu', 'convs'],
['Ct', '110', '5'],
['Ct>Pt>Ct', '240', '15'],
['Ct>Pt>Bu', '1000', '8']]
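If rows that collapse to the same journey are not guaranteed to be adjacent, a dictionary keyed by the collapsed journey may be safer than a second groupby. The following is a minimal sketch under that assumption; the helper names collapse and aggregate are made up for illustration, and the sample rows are reordered on purpose to show the difference.

```python
from itertools import groupby


def collapse(journey, sep=">"):
    """Drop consecutive repeats, e.g. 'Ct>Pt>Ct>Ct' -> 'Ct>Pt>Ct'."""
    return sep.join(step for step, _ in groupby(journey.split(sep)))


def aggregate(table):
    """Sum the numeric columns of rows whose collapsed journey is equal."""
    header, rows = table[0], table[1:]
    totals = {}  # collapsed journey -> running integer sums, in first-seen order
    for journey, *values in rows:
        key = collapse(journey)
        numbers = [int(v) for v in values]
        if key in totals:
            totals[key] = [t + n for t, n in zip(totals[key], numbers)]
        else:
            totals[key] = numbers
    return [header] + [[key, *map(str, sums)] for key, sums in totals.items()]


a = [['journey', 'uu', 'convs'],
     ['Ct', '10', '2'],
     ['Ct>Pt>Bu', '1000', '8'],   # moved up so equal journeys are not adjacent
     ['Ct>Ct', '100', '3'],
     ['Ct>Pt>Ct', '200', '10'],
     ['Ct>Pt>Ct>Ct', '40', '5']]

result = aggregate(a)
print(result)
```

The output rows follow the order in which each collapsed journey is first seen, so no prior sorting is needed.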

Related

How can I determine the user who provided the correct arguments when using the command?

I am making a roulette game, which everyone probably knows.
Problem:
The command takes arguments that must be supplied correctly, and I need to get the user who supplied them correctly.
Question:
How can I do it?
I haven't tried anything yet because I don't know where to start :) I hope you can help, thanks! Code below:
@commands.command(brief = '''
Command usage:
Bet on a number: JM!wheel number (number) (bet)
Bet on a color: JM!wheel color (red or black) (bet)
Bet on even/odd: JM!wheel vs (even or odd) (bet)''')
async def wheel(self, ctx, mode = None, value = None, bet = None):
    # The pockets below are stored as strings, so the result is a string too
    result = str(random.randint(0, 36))
    numbers_red = ['1', '3', '5', '7', '9', '12', '14', '16', '18', '19', '21', '23', '25', '27', '30', '32', '34',
                   '36']
    numbers_black = ['2', '4', '6', '10', '11', '13', '15', '17', '20', '22', '24', '26', '28', '29', '31', '33',
                     '35']
    numbers_green = ['0']  # 0 is the only green pocket
    numbers_even = ['2', '4', '6', '8', '10', '12', '14', '16', '18', '20', '22', '24', '26', '28', '30', '32', '34', '36']
    numbers_odd = ['1', '3', '5', '7', '9', '11', '13', '15', '17', '19', '21', '23', '25', '27', '29', '31', '33', '35']
    if mode is None and value is None and bet is None:
        await ctx.send('''
Command usage:
Bet on a number: JM!wheel number (number) (bet)
Bet on a color: JM!wheel color (red or black) (bet)
Bet on even/odd: JM!wheel vs (even or odd) (bet)''')
    if mode and value and bet:
        if mode == 'color':
            if value in numbers_red and result in numbers_red:
                pass
            elif value in numbers_black and result in numbers_black:
                pass
            elif value in numbers_green and result == '0':
                pass
        elif mode == 'number':
            if value == result:
                pass
        if mode == 'vs':
            if value in numbers_odd and result in numbers_odd:
                pass
            if value in numbers_even and result in numbers_even:
                pass
Maybe this is what you need:
user = ctx.message.author
This tells you who used the command.
I'm not sure if this is what you were asking for...
Otherwise, if you need the author of some other message, you can fetch that message and then read its author:
message = await ctx.channel.fetch_message(ID)
user = message.author
Hope this was useful.
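To tie the author to a correct bet, it may help to move the win check into a plain function and pass the player in explicitly. This is only a sketch with no Discord dependency: is_winning_bet and spin are hypothetical helpers, the red pockets are copied from the question, and treating every other pocket from 1 to 36 as black is an assumption.

```python
import random

# Red pockets as listed in the question; every other pocket from 1 to 36 is
# treated as black here, and 0 is green (assumptions of this sketch).
RED = {1, 3, 5, 7, 9, 12, 14, 16, 18, 19, 21, 23, 25, 27, 30, 32, 34, 36}
BLACK = set(range(1, 37)) - RED


def is_winning_bet(mode, value, result):
    """Return True if a bet of type `mode` on `value` wins for pocket `result`."""
    if mode == "color":
        return ((value == "red" and result in RED)
                or (value == "black" and result in BLACK))
    if mode == "number":
        return value.isdigit() and int(value) == result
    if mode == "vs":
        if result == 0:  # zero counts as neither even nor odd for betting
            return False
        return ((value == "even" and result % 2 == 0)
                or (value == "odd" and result % 2 == 1))
    return False


def spin(player, mode, value, rng=random):
    """Spin once and report (player, result, won) for a single bet."""
    result = rng.randint(0, 36)
    return player, result, is_winning_bet(mode, value, result)


print(spin("some_user", "color", "red"))
```

Inside the command, the player argument would be ctx.message.author, so the returned tuple already says which user made the bet and whether it won.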

How to filter string from all column from csv file using python

I have a csv file and I need to check all columns for ? and remove the rows that contain it.
Below is an example:
Column 1  Column 2  Column 3
1         ?         3
2         ?..       1
?         2         ?.
?         4         4
I tried the following, but it does not work:
data = readData("text.csv")
print(data)
def Filter(string, substr):
    return [str for str in string if
            any(sub not in str for sub in substr)]
string = data
substr = ['?', '?.', '? ', '? ']
filter_data = Filter(string, substr)
My code to get the output as tuples is below.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
def readData(filename):
    data = pd.read_csv(filename, skipinitialspace=True)
    return [d for d in data.itertuples(index=False, name=None)]
data = readData("problem2.csv")
print(data)
[('18.0', 8, '307.0 ', '130.0 ', '3504.', '12.0', 70, 1, 'chevrolet chevelle malibu'), ('15.0', 8, '350.0 ', '165.0 ', '3693.', '11.5', 70, 1, 'buick skylark 320'), ('18.0', 8, '318.0 ', '150.0 ', '?.', '11.0', 70, 1, 'plymouth satellite'), ('16.0', 8, '304.0 ', '150.0 ', '3433.', '12.0', 70, 1, 'amc rebel sst'), ('17.0', 8, '302.0 ', '140.0 ', '3449.', '10.5', 70, 1, 'ford torino'), ('15.0', 8, '429.0 ', '198.0 ', '4341.', '10.0', 70, 1, 'ford galaxie 500'), ('14.0', 8, '454.0 ', '220.0 ', '4354.', '9.0', 70, 1, 'chevrolet impala'), ('14.0', 8, '440.0 ', '215.0 ', '4312.', '8.5', 70, 1, 'plymouth fury iii'),
Next, I want to remove the rows that contain '?' in any column and get the output as tuples in the same form.
My input file is as follows:
mpg,cylinder,displace,horsepower,weight,accelerate,year,origin,name
18,8,307,130,3504,12,70,1,chevy malibu
18,8,308,140,?.,14,70,1,plymoth satellite
18,8,309,150,?,15,70,1,ford torino
18,8,310,150,? ,16,70,1,ford galaxy
18,8,310,150, ?,17,70,1,pontiac catalina
18,8,310,150,3505,18,70,1,ford maverick
The code to replace any of the following occurrences ['?','?.',' ?','? '] is as follows:
import csv
qs = ['?','?.',' ?','? ']
with open('abc.txt') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        row = ['' if r in qs else r for r in row]
        print (row)
The output of this will be as follows:
['mpg', 'cylinder', 'displace', 'horsepower', 'weight', 'accelerate', 'year', 'origin', 'name']
['18', '8', '307', '130', '3504', '12', '70', '1', 'chevy malibu']
['18', '8', '308', '140', '', '14', '70', '1', 'plymoth satellite']
['18', '8', '309', '150', '', '15', '70', '1', 'ford torino']
['18', '8', '310', '150', '', '16', '70', '1', 'ford galaxy']
['18', '8', '310', '150', '', '17', '70', '1', 'pontiac catalina']
['18', '8', '310', '150', '3505', '18', '70', '1', 'ford maverick']
As you can see, the ? values in rows 3 through 6 were replaced with ''.
I ran it with one more sample dataset:
mpg,cylinder,displace,horsepower,weight,accelerate,year,origin,name
18,8,307,130,3504,12,70,1,chevy malibu
18,8,308,140,?.,14,70,1,plymoth satellite
18,8,309,?,3506,15,70,1,ford torino
18,8,310,160,? ,16,70,1,ford galaxy
18,8,311,170,3508, ?,70,1,pontiac catalina
18,8,312,180,3509,18,70,1,ford maverick
Output is:
['mpg', 'cylinder', 'displace', 'horsepower', 'weight', 'accelerate', 'year', 'origin', 'name']
['18', '8', '307', '130', '3504', '12', '70', '1', 'chevy malibu']
['18', '8', '308', '140', '', '14', '70', '1', 'plymoth satellite']
['18', '8', '309', '', '3506', '15', '70', '1', 'ford torino']
['18', '8', '310', '160', '', '16', '70', '1', 'ford galaxy']
['18', '8', '311', '170', '3508', '', '70', '1', 'pontiac catalina']
['18', '8', '312', '180', '3509', '18', '70', '1', 'ford maverick']
In this scenario, the ? appears in various columns, and the code still handles it.
If you want to process all the rows in one go, you can read all the lines into one variable and process them:
qs = {'?.':'',' ?':'','? ':'','?':''}
with open('abc.txt') as csv_file:
    lines = csv_file.readlines()
for i, text in enumerate(lines):
    # Apply each replacement in turn; the two-character patterns come first in qs
    for a, b in qs.items():
        text = text.replace(a, b)
    lines[i] = text
print (lines)
Your output data will be as follows:
['mpg,cylinder,displace,horsepower,weight,accelerate,year,origin,name\n', '18,8,307,130,3504,12,70,1,chevy malibu\n', '18,8,308,140,,14,70,1,plymoth satellite\n', '18,8,309,,3506,15,70,1,ford torino\n', '18,8,310,160,,16,70,1,ford galaxy\n', '18,8,311,170,3508,,70,1,pontiac catalina\n', '18,8,312,180,3509,18,70,1,ford maverick\n']
tuple output
Looks like you are expecting tuples as output.
Here's the code to do it:
import csv
qs = {'?.':'',' ?':'','? ':'','?':''}
final_list = []
with open('abc.txt') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for row in csv_reader:
        row = ['' if r in qs else r for r in row]
        final_list.append(tuple(row))
print (final_list)
The output will be as follows:
[('mpg', 'cylinder', 'displace', 'horsepower', 'weight', 'accelerate', 'year', 'origin', 'name'), ('18', '8', '307', '130', '3504', '12', '70', '1', 'chevy malibu'), ('18', '8', '308', '140', '', '14', '70', '1', 'plymoth satellite'), ('18', '8', '309', '', '3506', '15', '70', '1', 'ford torino'), ('18', '8', '310', '160', '', '16', '70', '1', 'ford galaxy'), ('18', '8', '311', '170', '3508', '', '70', '1', 'pontiac catalina'), ('18', '8', '312', '180', '3509', '18', '70', '1', 'ford maverick')]
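The snippets above blank out the ? values but keep the rows. If the goal is to drop those rows entirely, as the question asks, a filter along these lines might do. It is a sketch that assumes a ? placeholder always sits at the start of a field once surrounding spaces are stripped; has_placeholder and read_clean_rows are illustrative names, and the sample text stands in for the file.

```python
import csv
import io

# Stand-in for the input file; in practice pass an open file to csv.reader.
SAMPLE = """mpg,cylinder,displace,horsepower,weight,accelerate,year,origin,name
18,8,307,130,3504,12,70,1,chevy malibu
18,8,308,140,?.,14,70,1,plymoth satellite
18,8,309,?,3506,15,70,1,ford torino
18,8,310,160,? ,16,70,1,ford galaxy
18,8,311,170,3508, ?,70,1,pontiac catalina
18,8,312,180,3509,18,70,1,ford maverick
"""


def has_placeholder(row):
    """True if any field, ignoring surrounding spaces, starts with '?'."""
    return any(field.strip().startswith("?") for field in row)


def read_clean_rows(lines):
    """Return the rows of a CSV source as tuples, skipping rows with '?'."""
    return [tuple(row) for row in csv.reader(lines) if not has_placeholder(row)]


rows = read_clean_rows(io.StringIO(SAMPLE))
print(rows)
```

With the sample data only the header, the chevy malibu row and the ford maverick row remain.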

How to sort dictionary of list based on certain column?

I have a table stored in a dictionary. The dictionary looks like this:
d = {
'A': ['45', '70', '5', '88', '93', '79', '87', '69'],
'B': ['99', '18', '91', '3', '92', '2', '67', '15'],
'C': ['199200128', '889172415', '221388292', '199200128', '889172415', '889172415', '199200128', '221388292'],
'D': ['10:27:05', '07:10:29', '17:04:48', '10:25:42', '07:11:18', '07:11:37', '10:38:11', '17:08:55'],
'E': ['73', '6', '95', '21', '29', '15', '99', '9']
}
I'd like to sort the dictionary by the times in column D, from lowest to highest, and sum columns A, B and E over the rows that share the same value in column C, with the sums of A, B and E added as an extra row after each group.
Then, the resulting dictionary would look like this:
{
'A': ['70', '93', '79', '242', '88', '45', '133', '87', '5', '69', '161'],
'B': ['18', '92', '2', '112', '3', '99', '102', '67', '91', '15', '173'],
'C': ['889172415', '889172415', '889172415', '', '199200128', '199200128', '', '199200128', '221388292', '221388292', ''],
'D': ['07:10:29', '07:11:18', '07:11:37', '', '10:25:42', '10:27:05', '', '10:38:11', '17:04:48', '17:08:55', ''],
'E': ['6', '29', '15', '50', '21', '73', '94', '99', '95', '9', '203']
}
I am currently trying to sort the input dictionary with this code, but it doesn't seem to work for me.
>>> sorted(d.items(), key=lambda e: e[1][4])
[
('D', ['10:27:05', '07:10:29', '17:04:48', '10:25:42', '07:11:18', '07:11:37', '10:38:11', '17:08:55']),
('E', ['73', '6', '95', '21', '29', '15', '99', '9']),
('C', ['199200128', '889172415', '221388292', '199200128', '889172415', '889172415', '199200128', '221388292']),
('B', ['99', '18', '91', '3', '92', '2', '67', '15']),
('A', ['45', '70', '5', '88', '93', '79', '87', '69'])
]
>>>
Could someone help me with this? Thanks.
Are you allowed to use pandas for this task?
If so, you can convert your data to a pd.DataFrame object and sort it by column D:
import pandas as pd
data = pd.DataFrame.from_dict(d, orient='columns')
data = data.sort_values(by='D')
Then convert it back to a dictionary of lists using
_dict = data.to_dict(orient='list')
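The sort alone does not produce the subtotal rows. Here is a pandas-free sketch of one way to read the request: order the C groups by their earliest time, sort each group by D, and append one subtotal row per distinct C value. That reading is an assumption, and the listing in the question places some of its subtotals differently; sort_with_subtotals and SUM_COLS are names invented for this sketch.

```python
from itertools import groupby

d = {
    'A': ['45', '70', '5', '88', '93', '79', '87', '69'],
    'B': ['99', '18', '91', '3', '92', '2', '67', '15'],
    'C': ['199200128', '889172415', '221388292', '199200128',
          '889172415', '889172415', '199200128', '221388292'],
    'D': ['10:27:05', '07:10:29', '17:04:48', '10:25:42',
          '07:11:18', '07:11:37', '10:38:11', '17:08:55'],
    'E': ['73', '6', '95', '21', '29', '15', '99', '9'],
}

SUM_COLS = ("A", "B", "E")  # columns to total; other columns get '' in subtotal rows


def sort_with_subtotals(table, time_col="D", group_col="C"):
    cols = list(table)
    rows = [dict(zip(cols, values)) for values in zip(*table.values())]

    # Earliest time per group; zero-padded HH:MM:SS strings sort correctly as text.
    earliest = {}
    for row in rows:
        key = row[group_col]
        earliest[key] = min(earliest.get(key, row[time_col]), row[time_col])
    rows.sort(key=lambda r: (earliest[r[group_col]], r[group_col], r[time_col]))

    out = {c: [] for c in cols}
    for _, group in groupby(rows, key=lambda r: r[group_col]):
        group = list(group)
        for row in group:
            for c in cols:
                out[c].append(row[c])
        for c in cols:  # one subtotal row after each group
            total = str(sum(int(r[c]) for r in group)) if c in SUM_COLS else ""
            out[c].append(total)
    return out


result = sort_with_subtotals(d)
print(result["A"])
```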

Sorting a list of tuples without using lambda

I have yet to learn the 'lambda' concept in Python. I tried to look for answers, and every answer includes lambda. This is my code; can you please suggest a way to sort it by values?
sorted_dict = {'sir': '113', 'to': '146', 'my': '9', 'jesus': '4', 'saving': '275', 'changing': '72', 'apologize': '285', 'pain': '308', 'sisters': '27', 'forgiving': '36', 'can': '62', 'family': '77', 'sorry': '8', 'is': '360', 'too': '15', 'her': '37', 'wanted': '18', 'being': '44', 'into': '208', 'are': '17', 'just': '97', 'so': '148', 'now': '112', 'be': '19', 'right': '189', 'been': '105', 'no': '56', 'because': '74', 'forgive': '52', 'keep': '88', 'wish': '12', "i'm": '67', 'always': '53', 'ask': '29'}
new_list = list()
for key,value in sorted_dict.items():
    new_tup = (key, value)
    new_list.append(new_tup)
new_list = sorted(new_list)
How do I proceed further?
lambda is often used as the key function when sorting, so that each item is compared by a value derived from it.
Turning a dictionary into a list of tuples, as your loop does, can be done directly with the dict method dict.items().
I used a lambda as the key to tell sorted that I want to sort by the value at index 1 of each tuple, converted to int so that the numbers compare numerically rather than as strings.
sorted_dict = {'sir': '113', 'to': '146', 'my': '9', 'jesus': '4', 'saving': '275', 'changing': '72', 'apologize': '285', 'pain': '308', 'sisters': '27', 'forgiving': '36', 'can': '62', 'family': '77', 'sorry': '8', 'is': '360', 'too': '15', 'her': '37', 'wanted': '18', 'being': '44', 'into': '208', 'are': '17', 'just': '97', 'so': '148', 'now': '112', 'be': '19', 'right': '189', 'been': '105', 'no': '56', 'because': '74', 'forgive': '52', 'keep': '88', 'wish': '12', "i'm": '67', 'always': '53', 'ask': '29'}
new_list = sorted_dict.items()
new_list = sorted(new_list, key=lambda x: int(x[1]))
print(new_list)
If you are familiar with other programming languages, you may have heard of what is called an "inline function".
A lambda is the equivalent of an inline function in Python.
It is a function that has no name and is restricted to a single expression.
Now, coming to the problem of sorting: sorting in Python involves two things,
the list to be sorted
a function where you can define how to sort the list.
If it is a list of numbers, you don't need the second one at all.
But if, as in your case, it is a list of tuples or, say, a list of dictionaries, you need to tell Python how to sort that list.
That is accomplished with the 'key' argument of the sort function.
The code below illustrates this.
In [1]: l1 = [('a',1), ('b', 3), ('c', 2)]
In [2]: def sortHelper(x):
   ...:     return x[1]
   ...:
In [3]: l1.sort(key=sortHelper)
In [4]: l1
Out[4]: [('a', 1), ('c', 2), ('b', 3)]
As you can see, sortHelper is just a one-line function, which can very well be written as a lambda:
lambda x: x[1]
So it is common to use lambda functions, but it is not compulsory. You can accomplish the same thing with normal Python functions.
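Applied to the data in the question, a lambda-free version could look like the sketch below. It uses a shortened copy of the dictionary; count_of is a hypothetical helper name, and operator.itemgetter from the standard library is shown as an alternative for values that are already numbers.

```python
from operator import itemgetter

word_counts = {'sir': '113', 'to': '146', 'my': '9', 'jesus': '4', 'sorry': '8'}


def count_of(pair):
    """Key function: the numeric value of a (word, count) tuple."""
    return int(pair[1])


# Sort the (word, count) tuples by count, comparing as integers
by_count = sorted(word_counts.items(), key=count_of)
print(by_count)

# When the values are already numbers, itemgetter(1) can serve as the key
numeric_pairs = [(word, int(count)) for word, count in word_counts.items()]
by_count_numeric = sorted(numeric_pairs, key=itemgetter(1))
print(by_count_numeric)
```

The int conversion matters because the counts are stored as strings: without it, '113' would sort before '9'.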

Remove multiple rows from an array in Python

array([
['192', '895'],
['14', '269'],
['1', '23'],
['1', '23'],
['50', '322'],
['19', '121'],
['17', '112'],
['12', '72'],
['2', '17'],
['5,250', '36,410'],
['2,546', '17,610'],
['882', '6,085'],
['571', '3,659'],
['500', '3,818'],
['458', '3,103'],
['151', '1,150'],
['45', '319'],
['44', '335'],
['30', '184']
])
How can I remove some of the rows so that the array looks like this:
Table3=array([
['192', '895'],
['14', '269'],
['1', '23'],
['50', '322'],
['17', '112'],
['12', '72'],
['2', '17'],
['5,250', '36,410'],
['882', '6,085'],
['571', '3,659'],
['500', '3,818'],
['458', '3,103'],
['45', '319'],
['44', '335'],
['30', '184']
])
I removed indices 2, 4 and 6. I am not sure how I should do it. I have tried a few ways, but none of them worked.
It seems like you actually deleted indices 2, 5, 10 and 15 (not 2, 4 and 6). To do this you can use np.delete, pass it a list of the indices you want to delete, and apply it along axis=0:
Table3 = np.delete(arr, [2, 5, 10, 15], axis=0)
>>> Table3
array([['192', '895'],
       ['14', '269'],
       ['1', '23'],
       ['50', '322'],
       ['17', '112'],
       ['12', '72'],
       ['2', '17'],
       ['5,250', '36,410'],
       ['882', '6,085'],
       ['571', '3,659'],
       ['500', '3,818'],
       ['458', '3,103'],
       ['45', '319'],
       ['44', '335'],
       ['30', '184']],
      dtype='<U6')
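For a version that runs from start to finish, here is a sketch that rebuilds the array from the question and removes the rows with np.delete, then repeats the removal with a boolean mask. Comparing the two arrays in the question, the dropped rows appear to sit at positions 2, 5, 10 and 15; that list, and the names drop and keep, are assumptions of this sketch.

```python
import numpy as np

arr = np.array([
    ['192', '895'], ['14', '269'], ['1', '23'], ['1', '23'],
    ['50', '322'], ['19', '121'], ['17', '112'], ['12', '72'],
    ['2', '17'], ['5,250', '36,410'], ['2,546', '17,610'],
    ['882', '6,085'], ['571', '3,659'], ['500', '3,818'],
    ['458', '3,103'], ['151', '1,150'], ['45', '319'],
    ['44', '335'], ['30', '184'],
])

drop = [2, 5, 10, 15]  # row positions to remove (read off the desired output)

# np.delete returns a new array; arr itself is left unchanged
table3 = np.delete(arr, drop, axis=0)

# Equivalent approach: a boolean mask that is False for the rows to drop
keep = np.ones(len(arr), dtype=bool)
keep[drop] = False
table3_masked = arr[keep]

print(table3.shape)
```

The mask form can be handy when the rows to drop come from a condition rather than a fixed list of positions.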
