Erlang: timer:sleep(1000) appears to kill a spawned process - multithreading

I wrote some code in Erlang that uses timer:sleep(). It works when I call timer:sleep(100), but I need the thread to pause for longer. When I increase the time (for example to timer:sleep(1000)) in a function that is started with spawn, the code stops working.
In the code below, the second io:format call never prints to the console, so the thread seems to die at the line timer:sleep(1000).
request_handler ( Name, Amount, Banks ) ->
io:format("~p ~p ~p ~n", [Name, Amount, Banks]),
timer:sleep(1000),
io:format("~p ~p ~p ~n", [Name, Amount, Banks]),
if Amount < 50 ->
Ask_amount = rand:uniform(Amount);
true ->
Ask_amount = rand:uniform(50)
end,
Bank = lists:nth(rand:uniform(length(Banks)), Banks),
Pid = whereis(Bank),
Pid ! {self(), Ask_amount},
receive
{accept, Ask_amount} ->
request_handler(Name, (Amount - Ask_amount), Banks);
{reject} ->
request_handler(Name, Amount, lists:filter(fun (Elem) -> not lists:member(Elem, [Bank]) end, Banks ))
end.
Does anybody know how I can make the thread sleep for 1000 ms in Erlang?

In the code the second io does not print in console, so the thread
dead in the line time:sleep(1000).
First off, in Erlang we call them processes, not threads. If the process that is running request_handler() dies while it is sleeping, i.e. before the second format statement shows any output, then you should see an error message somewhere.
You haven't provided enough information to help you. Your question is essentially this:
I have the following function running in a process:
go() ->
io:format("hello"),
timer:sleep(1000),
io:format("goodbye").
I don't see the second output. Why is my process dying during the sleep?
The answer is: because something killed your process.
Here's an example that shows how you can make your request_handler() process sleep for 1 second:
handlers.erl:
-module(handlers).
-compile([export_all]).
request_handler ( Name, Amount, Banks ) ->
io:format("Format1: ~p ~p ~p ~n", [Name, Amount, Banks]),
timer:sleep(1000),
io:format("Format2: ~p ~p ~p ~n", [Name, Amount, Banks]),
% This is how you write an if statement when you want to assign
% the result to a variable:
AskAmount = if
Amount < 50 -> rand:uniform(Amount);
true -> rand:uniform(50)
end,
Bank = lists:nth(rand:uniform(length(Banks)), Banks),
Pid = whereis(Bank),
Pid ! {self(), AskAmount},
receive
{accept, AskAmount} ->
request_handler(Name, (Amount - AskAmount), Banks);
{reject} ->
request_handler(
Name,
Amount,
lists:filter(fun (Elem) -> not lists:member(Elem, [Bank]) end,
Banks
)
)
end.
The way the if statement is written above will get rid of the warning:
Warning: variable 'Ask_amount' exported from 'if'
bank.erl:
-module(bank).
-compile(export_all).
init(Name) ->
register(Name, spawn(?MODULE, loop, [ok])).
loop(State) ->
receive
{From, Amount} ->
From ! {accept, Amount},
loop(State)
end.
create_banks(Names) ->
lists:foreach(
fun(Name) -> init(Name) end,
Names %=> [bank1, bank2, bank3]
).
my.erl:
-module(my).
-compile([export_all]).
go() ->
BankNames = [bank1, bank2, bank3],
bank:create_banks(BankNames),
handlers:request_handler("Hello", 500, BankNames).
In the shell:
~/erlang_programs$ erl
Erlang/OTP 20 [erts-9.3] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V9.3 (abort with ^G)
1> c(bank).
bank.erl:2: Warning: export_all flag enabled - all functions will be exported
{ok,bank}
2> c(handlers).
handlers.erl:2: Warning: export_all flag enabled - all functions will be exported
{ok,handlers}
3> c(my).
my.erl:2: Warning: export_all flag enabled - all functions will be exported
{ok,my}
4> my:go().
Format1: "Hello" 500 [bank1,bank2,bank3]
Format2: "Hello" 500 [bank1,bank2,bank3]
Format1: "Hello" 499 [bank1,bank2,bank3]
Format2: "Hello" 499 [bank1,bank2,bank3]
Format1: "Hello" 488 [bank1,bank2,bank3]
Format2: "Hello" 488 [bank1,bank2,bank3]
Format1: "Hello" 468 [bank1,bank2,bank3]
Format2: "Hello" 468 [bank1,bank2,bank3]
Format1: "Hello" 460 [bank1,bank2,bank3]
Format2: "Hello" 460 [bank1,bank2,bank3]
Format1: "Hello" 456 [bank1,bank2,bank3]
Format2: "Hello" 456 [bank1,bank2,bank3]
Format1: "Hello" 419 [bank1,bank2,bank3]
...
...
^C^C
Response to comment:
I want to run function 'request_handler' more than one time concurrently.
...but when I run the request handler:
spawn(handlers, request_handler, [hello, 450, Banks])
the second io not working.
It works for me:
-module(my).
-compile([export_all]).
go() ->
BankNames = [bank1, bank2, bank3],
bank:create_banks(BankNames),
lists:foreach(
fun(N) ->
spawn(handlers, request_handler, [N, 500, BankNames])
end,
lists:seq(1, 5) %=> [1, 2, 3, 4, 5]
).
In the shell:
~/erlang_programs$ erl
Erlang/OTP 20 [erts-9.3] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:10] [hipe] [kernel-poll:false]
Eshell V9.3 (abort with ^G)
1> c(bank).
bank.erl:2: Warning: export_all flag enabled - all functions will be exported
{ok,bank}
2> c(handlers).
handlers.erl:2: Warning: export_all flag enabled - all functions will be exported
{ok,handlers}
3> c(my).
my.erl:2: Warning: export_all flag enabled - all functions will be exported
{ok,my}
4> my:go().
process(<0.84.0>): Format1: 1 500 [bank1,bank2,bank3]
process(<0.85.0>): Format1: 2 500 [bank1,bank2,bank3]
process(<0.86.0>): Format1: 3 500 [bank1,bank2,bank3]
process(<0.87.0>): Format1: 4 500 [bank1,bank2,bank3]
process(<0.88.0>): Format1: 5 500 [bank1,bank2,bank3]
ok
process(<0.84.0>): Format2: 1 500 [bank1,bank2,bank3]
process(<0.85.0>): Format2: 2 500 [bank1,bank2,bank3]
process(<0.86.0>): Format2: 3 500 [bank1,bank2,bank3]
process(<0.87.0>): Format2: 4 500 [bank1,bank2,bank3]
process(<0.88.0>): Format2: 5 500 [bank1,bank2,bank3]
process(<0.84.0>): Format1: 1 467 [bank1,bank2,bank3]
process(<0.87.0>): Format1: 4 471 [bank1,bank2,bank3]
process(<0.85.0>): Format1: 2 465 [bank1,bank2,bank3]
process(<0.86.0>): Format1: 3 460 [bank1,bank2,bank3]
process(<0.88.0>): Format1: 5 453 [bank1,bank2,bank3]
process(<0.84.0>): Format2: 1 467 [bank1,bank2,bank3]
process(<0.87.0>): Format2: 4 471 [bank1,bank2,bank3]
process(<0.85.0>): Format2: 2 465 [bank1,bank2,bank3]
process(<0.86.0>): Format2: 3 460 [bank1,bank2,bank3]
process(<0.88.0>): Format2: 5 453 [bank1,bank2,bank3]
process(<0.84.0>): Format1: 1 422 [bank1,bank2,bank3]
process(<0.86.0>): Format1: 3 441 [bank1,bank2,bank3]
process(<0.87.0>): Format1: 4 461 [bank1,bank2,bank3]
process(<0.88.0>): Format1: 5 413 [bank1,bank2,bank3]
process(<0.85.0>): Format1: 2 442 [bank1,bank2,bank3]
process(<0.84.0>): Format2: 1 422 [bank1,bank2,bank3]
process(<0.86.0>): Format2: 3 441 [bank1,bank2,bank3]
process(<0.87.0>): Format2: 4 461 [bank1,bank2,bank3]
process(<0.88.0>): Format2: 5 413 [bank1,bank2,bank3]
process(<0.85.0>): Format2: 2 442 [bank1,bank2,bank3]
process(<0.84.0>): Format1: 1 405 [bank1,bank2,bank3]
process(<0.86.0>): Format1: 3 416 [bank1,bank2,bank3]
process(<0.87.0>): Format1: 4 439 [bank1,bank2,bank3]
process(<0.88.0>): Format1: 5 376 [bank1,bank2,bank3]
process(<0.85.0>): Format1: 2 419 [bank1,bank2,bank3]
process(<0.84.0>): Format2: 1 405 [bank1,bank2,bank3]
process(<0.86.0>): Format2: 3 416 [bank1,bank2,bank3]
process(<0.87.0>): Format2: 4 439 [bank1,bank2,bank3]
process(<0.88.0>): Format2: 5 376 [bank1,bank2,bank3]
process(<0.85.0>): Format2: 2 419 [bank1,bank2,bank3]
process(<0.84.0>): Format1: 1 397 [bank1,bank2,bank3]
process(<0.86.0>): Format1: 3 394 [bank1,bank2,bank3]
process(<0.85.0>): Format1: 2 389 [bank1,bank2,bank3]
process(<0.87.0>): Format1: 4 401 [bank1,bank2,bank3]
process(<0.88.0>): Format1: 5 340 [bank1,bank2,bank3]
process(<0.84.0>): Format2: 1 397 [bank1,bank2,bank3]
process(<0.86.0>): Format2: 3 394 [bank1,bank2,bank3]
process(<0.85.0>): Format2: 2 389 [bank1,bank2,bank3]
process(<0.87.0>): Format2: 4 401 [bank1,bank2,bank3]
process(<0.88.0>): Format2: 5 340 [bank1,bank2,bank3]
process(<0.84.0>): Format1: 1 367 [bank1,bank2,bank3]
process(<0.87.0>): Format1: 4 400 [bank1,bank2,bank3]
process(<0.86.0>): Format1: 3 355 [bank1,bank2,bank3]
process(<0.85.0>): Format1: 2 370 [bank1,bank2,bank3]
process(<0.88.0>): Format1: 5 313 [bank1,bank2,bank3]
process(<0.84.0>): Format2: 1 367 [bank1,bank2,bank3]
process(<0.87.0>): Format2: 4 400 [bank1,bank2,bank3]
process(<0.86.0>): Format2: 3 355 [bank1,bank2,bank3]
process(<0.85.0>): Format2: 2 370 [bank1,bank2,bank3]
process(<0.88.0>): Format2: 5 313 [bank1,bank2,bank3]
process(<0.84.0>): Format1: 1 337 [bank1,bank2,bank3]
process(<0.87.0>): Format1: 4 381 [bank1,bank2,bank3]
process(<0.88.0>): Format1: 5 299 [bank1,bank2,bank3]
process(<0.86.0>): Format1: 3 329 [bank1,bank2,bank3]
process(<0.85.0>): Format1: 2 367 [bank1,bank2,bank3]
^C^C
As you can see, both io:format() statements display their output. The ok is the return value of my:go() which returns whatever lists:foreach() returns, which is ok.


Retaining bad_lines identified by pandas in the output file instead of skipping those lines

I have to convert text files to CSV files after processing their contents as a pandas DataFrame.
Below is the code I am using. out_txt is my input text file and out_csv is my output CSV file.
df = pd.read_csv(out_txt, sep='\s', header=None, on_bad_lines='warn', encoding = "ANSI")
df = df.replace(r'[^\w\s]|_]/()|~"{}="', '', regex=True)
df.to_csv(out_csv, header=None)
If on_bad_lines='warn' is not declared, the CSV files are not created. But if I use it, the bad lines are skipped (as expected) with the warning
Skipping line 6: Expected 8 fields in line 7, saw 9. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
I would like to retain these bad lines in the CSV. I have highlighted the bad lines detected in the image below (my input text file).
Below are the contents of the text file that is being saved. In this content I would like to remove characters like #, &, (, ).
75062 220 8 6 110 220 250 <1
75063 260 5 2 584 878 950 <1
75064 810 <2 <2 456 598 3700 <1
75065 115 5 2 96 74 5000 <1
75066 976 <5 2 5 68 4200 <1
75067 22 210 4 348 140 4050 <1
75068 674 5 4 - 54 1130 3850 <1
75069 414 5 y) 446 6.6% 2350 <1
75070 458 <5 <2 548 82 3100 <1
75071 4050 <5 2 780 6430 3150 <1
75072 115 <7 <1 64 5.8% 4050 °#&4«x<i1
75073 456 <7 4 46 44 3900 <1
75074 376 <7 <2 348 3.8% 2150 <1
75075 378 <6 y) 30 40 2000 <1
I would split on \s later with str.split rather than read_csv :
df = (
pd.read_csv(out_txt, header=None, encoding='ANSI')
.replace(r'[^\w\s]|_]/()|~"{}="', '', regex=True)
.squeeze().str.split(expand=True)
)
Another variant (skipping everything that comes in-between the numbers):
df = (
    pd.read_csv(out_txt, header=None, encoding='ANSI')
    [0].str.findall(r"\b(\d+)\b")  # list of digit runs per line
    .str.join(' ')                 # back to one string per line
    .str.split(expand=True)        # one column per number
)
Output :
print(df)
0 1 2 3 4 5 6 7
0 375020 1060 115 38 440 350 7800 1
1 375021 920 80 26 310 290 5000 1
2 375022 1240 110 28 460 430 5900 1
3 375023 830 150 80 650 860 6200 1
4 375024 185 175 96 800 1020 2400 1
5 375025 680 370 88 1700 1220 172 1
6 375026 550 290 72 2250 1460 835 2
7 375027 390 120 60 1620 1240 158 1
8 375028 630 180 76 820 1360 180 1
9 375029 460 280 66 380 790 3600 1
10 375030 660 260 62 11180 1040 300 1
11 375031 530 200 84 1360 1060 555 1
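The idea of reading each line whole and splitting afterwards can be tried without the input file. The sketch below is only an illustration: it uses three lines copied from the question, held in an in-memory buffer, and it swaps read_csv for plain line reading so that no line can be rejected as "bad". The names raw and lines are made up for the example.

```python
import io

import pandas as pd

# Three sample lines from the question; the middle one has 9 fields, not 8.
raw = io.StringIO(
    "75067 22 210 4 348 140 4050 <1\n"
    "75068 674 5 4 - 54 1130 3850 <1\n"
    "75069 414 5 y) 446 6.6% 2350 <1\n"
)

# Keep every physical line as one string, so nothing is skipped.
lines = pd.Series(raw.read().splitlines())

# Split on runs of whitespace; shorter rows are padded with missing values.
df = lines.str.split(expand=True)
print(df.shape)
```

The long line keeps all of its fields and the shorter lines get a missing value in the extra column, so the cleanup regex can still be applied to every row afterwards.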

How can I select only the rows In file 1 that match column values in file 2?

I have multiple measurements per 'Subject' in file 1. I only want to use the single highest-quality measurement per Subject. My second file lists exactly which measurement is the best for each Subject, in the column 'seriesnumber': the number in that column corresponds to the best measurement for that Subject. I need to extract only these rows from file 1.
I have tried to use awk, join, and merge to try and accomplish this but came up with errors and strange incomplete files.
join code:
join -j2 file1 file2
awk code:
awk ' FILENAME=="file1" {arr[$2]=$0; next}
FILENAME=="file2" {print arr[$2]} ' file1 file2 > newfile
File 1 Example
Subject Seriesnumber
19-1-1001 2 8655 661 15250 60747 8005 3919 7393 2264 1479 1663 22968 4180 1712 689 781 4255 90 1260 7233 154 15643 63421 7361 4384 6932 2062 4526 1742 686 4575 100 1684 0 1194 0 0 5 0 0 147 699 315 305 317 565 1361200 1338210 1338690 304258 308180 612438 250614 255920 506534 66645 802424 1206450 1187010 1185180 1816840 1 1 21 17 38 1765590
19-1-1001 10 8992 507 15722 64032 8728 3929 7208 2075 1529 1529 22503 3993 1819 710 764 3870 87 1247 7361 65 16128 66226 8165 4384 6669 1805 4405 1752 779 4039 103 1705 0 1280 0 0 10 0 0 186 685 300 318 320 598 1370490 1347160 1347520 306588 307188 613775 251704 256521 508225 65808 808802 1208880 1189150 1187450 1827880 1 1 22 26 48 1778960
19-1-1103 2 3303 317 12146 57569 7008 3617 6910 2018 811 1593 18708 4708 1429 408 668 3279 14 1289 2351 85 13730 60206 6731 4137 7034 2038 4407 1483 749 3576 85 1668 0 948 0 0 7 0 0 129 602 288 291 285 748 1250030 1238540 1238820 301810 301062 602872 215029 218080 433108 61555 781150 1107360 1098510 1097220 1635560 1 1 32 47 79 1555850
19-1-1103 9 3236 286 12490 59477 7000 3558 6782 2113 894 1752 19338 4818 1724 387 649 3345 56 1314 2077 133 13885 60414 6628 4078 7063 2031 4269 1709 610 3707 112 1947 0 990 0 0 8 0 0 245 604 279 280 284 693 1269820 1258050 1258320 306856 309614 616469 215658 220876 436534 61859 796760 1124870 1115990 1114510 1630740 1 1 32 42 74 1556790
19-10-1010 2 3344 608 14744 59165 8389 4427 6962 2008 716 1496 21980 4008 1474 769 652 3715 61 1400 3049 1072 15767 61919 8325 4824 7117 1936 4001 1546 684 3935 103 1434 0 1624 0 0 3 0 0 316 834 413 520 517 833 1350760 1337040 1336840 311985 312592 624577 246800 251133 497933 65699 809736 1200320 1189410 1188280 1731270 1 1 17 13 30 1606700
19-10-1010 6 3242 616 15205 61330 8019 4520 6791 2093 735 1558 22824 3981 1546 653 614 3672 96 1227 2992 1070 16450 64189 8489 4407 6953 2099 4096 1668 680 4116 99 1449 0 2161 0 0 19 0 0 263 848 387 525 528 824 1339090 1325830 1325780 309464 311916 621380 239958 244616 484574 65493 810887 1183120 1172600 1171430 1720000 1 1 16 26 42 1587100
File 2 Example
Subject seriesnumber
19-10-1010 2
19-10-1166 2
19-102-10005 2
19-102-10006 2
19-103-10009 2
19-103-10010 2
19-104-10013 11
19-104-10014 2
19-105-10017 6
19-105-10018 6
The desired output would look something like this, where I no longer have duplicate entries per subject. The second column will look different because the preferred series number will differ per subject.
19-10-1010 2 3344 608 14744 59165 8389 4427 6962 2008 716 1496 21980 4008 1474 769 652 3715 61 1400 3049 1072 15767 61919 8325 4824 7117 1936 4001 1546 684 3935 103 1434 0 1624 0 0 3 0 0 316 834 413 520 517 833 1350760 1337040 1336840 311985 312592 624577 246800 251133 497933 65699 809736 1200320 1189410 1188280 1731270 1 1 17 13 30 1606700
19-10-1166 2 3699 312 15373 61787 8026 4248 6385 1955 608 2194 21394 4260 1563 886 609 3420 25 1101 3415 417 16909 63040 7236 4264 5933 1852 4156 1213 654 4007 53 1336 5 1597 0 0 18 0 0 110 821 300 514 466 854 1193020 1179470 1179420 282241 273236 555477 204883 203228 408111 61343 740736 1036210 1026080 1024910 1563950 1 1 39 40 79 1415890
19-102-10005 2 8733 514 13024 50735 7729 3775 4955 1575 1045 1141 20415 3924 1537 990 651 3515 134 1259 8571 232 13487 51374 7150 4169 5192 1664 3760 1620 596 3919 189 1958 0 1479 0 0 36 0 0 203 837 459 409 439 1072 1224350 1200010 1200120 287659 290445 578104 216976 220545 437521 57457 737161 1095770 1074440 1073050 1637570 1 1 31 22 53 1618600
19-102-10006 2 8347 604 13735 42231 7266 3836 6473 2057 1099 1007 18478 3769 1351 978 639 3332 125 1197 8207 454 13774 43750 6758 4274 6148 1921 3732 1584 614 3521 180 1611 0 1241 0 0 25 0 0 254 813 410 352 372 833 1092800 1069450 1069190 244104 245787 489891 202201 205897 408098 59170 634640 978807 958350 957462 1485600 1 1 19 19 38 1472020
19-103-10009 2 4222 596 14702 52038 7428 4065 6598 2166 835 1854 22613 3397 1387 879 568 3729 93 1315 3414 222 14580 52639 7316 3997 6447 1986 4067 1529 596 3778 113 1689 0 2097 0 0 23 0 0 260 761 326 400 359 772 1204670 1190100 1189780 256560 260381 516941 237316 243326 480642 60653 681040 1070620 1059370 1058440 1605990 1 1 25 23 48 1593730
19-103-10010 2 5254 435 14688 47120 7772 3130 5414 1711 741 1912 20643 3594 1449 882 717 3663 41 999 6465 605 14820 49390 6361 3826 5527 1523 3513 1537 639 3596 80 1261 0 1475 0 0 18 0 0 283 827 383 414 297 627 1135490 1117320 1116990 243367 245896 489263 221809 227084 448893 55338 639719 1009370 994519 993639 1568140 1 1 14 11 25 1542210
19-104-10013 2 7276 341 11836 53018 7912 3942 6105 2334 795 2532 21239 4551 1258 1176 430 3636 83 1184 8811 396 12760 53092 7224 4361 6306 1853 4184 1278 543 3921 175 1814 0 2187 0 0 8 0 0 266 783 381 382 357 793 1011640 987712 987042 206633 228397 435031 170375 191222 361597 61814 601948 879229 859619 859103 1586150 1 1 224 162 386 1557120
19-104-10014 2 5964 355 13297 55439 8599 4081 5628 1730 970 1308 20196 4519 1363 992 697 3474 62 1232 6830 472 14729 59478 7006 4443 6156 1825 4492 1726 827 4017 122 1804 0 1412 0 0 17 0 0 259 672 299 305 319 779 1308470 1288970 1288910 284018 285985 570003 258525 257355 515880 62485 746108 1166160 1149700 1148340 1826660 1 1 33 24 57 1630580
19-105-10017 2 7018 307 13848 53855 8345 3734 6001 2095 899 1932 20712 4196 1349 645 823 4212 72 1475 3346 1119 13970 55202 7411 3975 5672 1737 3778 1490 657 4089 132 1689 0 1318 0 0 23 0 0 234 745 474 367 378 760 1122360 1104380 1104520 235806 233881 469687 217939 220736 438675 61471 639143 985718 970903 969619 1583800 1 1 51 51 102 1558470
19-105-10018 2 16454 1098 12569 52521 8215 3788 5858 1805 788 1147 21028 3496 1492 665 634 3796 39 1614 10700 617 12813 52098 8091 3901 5367 1646 3544 1388 723 3938 47 1819 0 1464 0 0 42 0 0 330 832 301 319 400 788 1148940 1114080 1113560 225179 227218 452397 237056 237295 474351 59172 614884 1019300 986820 986144 1607900 1 1 19 28 47 1591480
19-105-10020 2 4096 451 13042 48597 7601 3228 5665 1582 778 1670 19769 3612 1187 717 617 3672 103 962 2627 467 13208 48466 6619 3461 5217 1360 3575 1388 718 3783 90 1370 0 862 0 0 6 0 0 216 673 386 439 401 682 1081580 1068850 1068890 233290 235396 468686 209666 214472 424139 54781 619447 958522 948737 947554 1493740 1 1 16 11 27 1452900
For file1 containing (I removed long useless lines):
Subject Seriesnumber
19-1-1001 2 8655 661 15250 60747 800
19-1-1001 10 8992 507 15722 64032 872
19-1-1103 2 3303 317 12146 57569 700
19-1-1103 9 3236 286 12490 59477 700
19-10-1010 2 3344 608 14744 59165 838
19-10-1010 6 3242 616 15205 61330 801
and file2 containing:
Subject seriesnumber
19-10-1010 2
19-10-1166 2
19-102-10005 2
19-102-10006 2
19-103-10009 2
19-103-10010 2
19-104-10013 11
19-104-10014 2
19-105-10017 6
19-105-10018 6
The following awk will output:
$ awk 'NR==FNR{a[$1, $2];next} ($1, $2) in a' file2 file1
19-10-1010 2 3344 608 14744 59165 838
Note that the first file argument to awk is file2, not file1 (a small optimization). How it works:
NR == FNR - true while the overall record number equals the per-file record number, i.e. only while reading the first file passed to awk.
a[$1, $2] - creates the index $1,$2 in the associative array a
next - skips the rest of the script and continues with the next line
($1, $2) in a - checks whether $1,$2 is an index in the associative array a
because of next, this is evaluated only for the second file passed to awk
if this expression is true, the line is printed (a pattern without an action prints the line by default).
Alternatively you could do the following, but it stores every line of file1 in memory, whereas the code above stores only the $1, $2 indexes.
awk 'NR==FNR{arr[$1, $2]=$0; next} ($1, $2) in arr{print arr[$1, $2]}' file1 file2
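For readers more comfortable in Python, the key-lookup idea behind the first awk command can be sketched as below. This is an illustrative translation, not part of the original answer: select_rows is a hypothetical helper, and the lists stand in for the lines of file2 and file1 (shortened from the question).

```python
def select_rows(best_lines, data_lines):
    """Keep the data lines whose first two fields appear in best_lines."""
    # Build the set of (Subject, seriesnumber) keys, like a[$1, $2] in awk.
    keys = {tuple(line.split()[:2]) for line in best_lines if line.strip()}
    # Emit a data line only when its first two fields form a known key.
    return [line for line in data_lines if tuple(line.split()[:2]) in keys]


# Stand-ins for the two files, shortened from the question.
file2 = ["Subject seriesnumber", "19-10-1010 2", "19-10-1166 2"]
file1 = [
    "Subject Seriesnumber",
    "19-1-1001 2 8655 661",
    "19-10-1010 2 3344 608",
    "19-10-1010 6 3242 616",
]

result = select_rows(file2, file1)
print(result)
```

As in the awk version, only the keys from the smaller file are held in memory, and the header line is dropped because its spelling differs between the two files.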

Please suggest approaches and code to solve the defined problem statement

x y z amount absolute_amount
121 abc def 500 500
131 fgh xyz -800 800
121 abc xyz 900 900
131 fgh ijk 800 800
141 obc pqr 500 500
151 mbr pqr -500 500
141 obc pqr -500 500
151 mbr pqr 900 900
I need to find the duplicate rows in the dataset where x and y are the same, subject to these conditions:
sum(amount) !=0
abs(sum(amount)) != absolute_amount
I tried grouping them, and the code I used in R works, but I need it to work in Python
logic1 <- tablename %>%
group_by('x','y')%>%
filter(n()>1 && sum(`amount`) != 0 && abs(sum(`amount`)) != absolute_amount)
Expected output
x y z amount absolute_amount
121 abc def 500 500
121 abc xyz 900 900
151 mbr pqr -500 500
151 mbr pqr 900 900
Use groupby with transform('sum') to broadcast each group's sum back onto its rows, then test the two conditions you have:
c=df.groupby(['x','y'])['amount'].transform('sum')
df[c.ne(0) & c.abs().ne(df.absolute_amount)]
x y z amount absolute_amount
0 121 abc def 500 500
2 121 abc xyz 900 900
5 151 mbr pqr -500 500
7 151 mbr pqr 900 900
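A self-contained version of the same approach, for anyone who wants to run it, might look like this. The DataFrame is rebuilt from the question's table, and the sketch also adds the n()>1 check from the R code, which the two-line answer leaves out. The names group_sum, group_size and mask are illustrative.

```python
import pandas as pd

# Data copied from the question's table.
df = pd.DataFrame({
    "x": [121, 131, 121, 131, 141, 151, 141, 151],
    "y": ["abc", "fgh", "abc", "fgh", "obc", "mbr", "obc", "mbr"],
    "z": ["def", "xyz", "xyz", "ijk", "pqr", "pqr", "pqr", "pqr"],
    "amount": [500, -800, 900, 800, 500, -500, -500, 900],
})
df["absolute_amount"] = df["amount"].abs()

grouped = df.groupby(["x", "y"])["amount"]
group_sum = grouped.transform("sum")     # per-row copy of the group's total
group_size = grouped.transform("count")  # per-row copy of the group's size

# Duplicate (x, y), non-zero total, and |total| differs from the row's value.
mask = (
    group_size.gt(1)
    & group_sum.ne(0)
    & group_sum.abs().ne(df["absolute_amount"])
)
result = df[mask]
print(result)
```

On this data the size check changes nothing, because every (x, y) pair occurs twice; it only matters if some pair appears once.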

Delete rows according to condition

Using columns 1 and 2 as the key, I want to delete all rows in which the value increments by one.
input
1000 1001 140
1000 1002 140
1000 1003 140
1000 1004 140
1000 1005 140
1000 1006 140
1000 1201 140
1000 1202 140
1000 1203 140
1000 1204 140
1000 1205 140
2000 1002 140
2000 1003 140
2000 1004 140
2000 1005 140
2000 1006 140
output desired
1000 1001 140
1000 1006 140
1000 1201 140
1000 1205 140
2000 1002 140
2000 1006 140
I have tried
awk '{if (a[$1] < $2)a[$1]=$2;}END{for(i in a){print i,a[i];}}' <file>
But for some reason, it keeps only the maximum value.
Your problem statement doesn't describe your output. You want to print the first and last row of each contiguous range. Like this:
$ awk '$1 > A || $2 > B + 1 {
if(row){print row}; print}
{A=$1; B=$2; row=$0}
END {print}' dat
1000 1001 140
1000 1006 140
1000 1201 140
1000 1205 140
2000 1002 140
2000 1006 140
The basic problem is just to determine if a line is only 1 more than the prior one. The only way to do that is to have both lines to compare. By storing the value of each line as it's read, you can compare the current line to the prior one.
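The same compare-with-the-previous-row logic can be written out in Python to make the run boundaries explicit. This is a sketch under the assumption that the rows are already sorted as in the question; range_endpoints is a hypothetical helper, and it tests for "not exactly one more" rather than the ">" comparisons used in the awk script.

```python
def range_endpoints(rows):
    """Return the first and last row of each run in which column 1 is
    constant and column 2 increases by exactly 1 from row to row."""
    out = []
    start = 0  # index of the first row of the current run
    for i in range(1, len(rows) + 1):
        run_ends = (
            i == len(rows)
            or rows[i][0] != rows[i - 1][0]
            or rows[i][1] != rows[i - 1][1] + 1
        )
        if run_ends:
            out.append(rows[start])
            if i - 1 != start:  # a one-row run is emitted only once
                out.append(rows[i - 1])
            start = i
    return out


# The input from the question, as (col1, col2, col3) tuples.
rows = (
    [(1000, c, 140) for c in range(1001, 1007)]
    + [(1000, c, 140) for c in range(1201, 1206)]
    + [(2000, c, 140) for c in range(1002, 1007)]
)
for row in range_endpoints(rows):
    print(*row)
```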

Adding rows that match a criteria in another column in Excel

This is a sample data
Polling_Booth INC SAD BSP PS_NO
1 89 47 2 1
2 97 339 6 1
3 251 485 8 1
4 356 355 25 2
5 290 333 9 2
6 144 143 4 3
7 327 196 1 4
8 370 235 1 5
And this is what I'm trying to achieve
Polling_Booth INC SAD BSP PS_NO OP_INC OP_SAD OP_BSP
1 89 47 2 1
2 97 339 6 1
3 251 485 8 1 437 871 16
4 356 355 25 2
5 290 333 9 2 646 688 34
6 144 143 4 3 144 143 4
7 327 196 1 4 327 196 1
8 370 235 1 5 370 235 1
This is achieved by adding up the rows that have the same PS_NO. This is what I have tried:
=if(E2=E3,sum(B2,B3),0) #same for all the rows
Any help would be much appreciated. Thanks
You could get it to look like your table by adding another condition that checks whether the row is the last occurrence of its PS_NO in column E, and setting the result to an empty string if it is not:
=IF(COUNTIF($E$2:$E2,$E2)=COUNTIF($E$2:$E$10,$E2),SUMIF($E$2:$E$10,$E2,B$2:B$10),"")
If the data is sorted by PS_No, you can do it more easily by
=IF($E3<>$E2,SUMIF($E$2:$E$10,$E2,B$2:B$10),"")
which I think is what you were trying in your question
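To cross-check the expected totals outside Excel, the same "sum per PS_NO, shown only on the last row of each group" rule can be sketched in pandas. This is an assumption-laden illustration rather than part of the answer: the DataFrame is rebuilt from the question's sample data, and the names totals, is_last and result are made up.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Polling_Booth": [1, 2, 3, 4, 5, 6, 7, 8],
    "INC": [89, 97, 251, 356, 290, 144, 327, 370],
    "SAD": [47, 339, 485, 355, 333, 143, 196, 235],
    "BSP": [2, 6, 8, 25, 9, 4, 1, 1],
    "PS_NO": [1, 1, 1, 2, 2, 3, 4, 5],
})
cols = ["INC", "SAD", "BSP"]

# Per-row copy of each PS_NO group's column sums (the SUMIF part).
totals = df.groupby("PS_NO")[cols].transform("sum")

# True only on the last row of each PS_NO (the COUNTIF comparison).
is_last = ~df.duplicated("PS_NO", keep="last")

# Keep totals on the last rows only; the other rows become NaN after the join.
result = df.join(totals[is_last].add_prefix("OP_"))
print(result)
```

The rows that Excel shows as empty strings appear here as NaN.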
