Convert column pattern - linux

I have this kind of file:
1 0 1
2 0 3
2 1 2
3 0 3
4 0 1
4 1 1
4 2 1
4 3 1
5 0 1
8 0 1
10 0 1
11 0 1
By default, the record separator (RS) is a single empty line.
If there is a double blank line, one of the blank lines has to be replaced by the pattern $1 0 0, where $1 is the number from the preceding $1 0 * record, increased by one.
If the separator is an empty line + 1 extra empty line, we have to increase $1 by 1.
If the separator is an empty line + 2 extra empty lines, we have to increase $1 by 2.
...
and I need to get this output:
1 0 1
2 0 3
2 1 2
3 0 3
4 0 1
4 1 1
4 2 1
4 3 1
5 0 1
6 0 0
7 0 0
8 0 1
9 0 0
10 0 1
11 0 1
Thanks in advance!

awk 'NF{f=0;n=$1;print;next}f{print ++n " 0 0"}{print;f=1}' ./infile
Output
$ awk 'NF{f=0;n=$1;print;next}f{print ++n " 0 0"}{print;f=1}' ./infile
1 0 1
2 0 3
2 1 2
3 0 3
4 0 1
4 1 1
4 2 1
4 3 1
5 0 1
6 0 0
7 0 0
8 0 1
9 0 0
10 0 1
11 0 1
Explanation
NF{f=0;n=$1;print;next}: if the current line has data, unset the flag f, save the number in the first field to n, print the line, and skip the rest of the script.
f{print ++n " 0 0"}: we only reach this action on a blank line, and it only runs if the flag f is set, which happens only when the previous line was also blank. If so, print the missing record with n incremented by one.
{print;f=1}: we reach this action on every blank line (including right after the previous action, since that action has no next). Print the blank line and set the flag f.
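For anyone who finds the awk hard to read, here is a minimal Python sketch of the same flag logic. It assumes the input is a list of lines in which records are separated by one blank line and each additional blank line stands for one missing number; the function name fill_gaps_by_blank_lines and the sample lines are made up for illustration.

```python
# Python rendering of the awk one-liner above (illustrative sketch).
def fill_gaps_by_blank_lines(lines):
    out = []
    f = False  # awk's flag: was the previous line blank?
    n = 0      # awk's n: first field of the most recent data line
    for line in lines:
        fields = line.split()
        if fields:                     # NF > 0: a data line
            f = False
            n = int(fields[0])
            out.append(line)
            continue                   # awk's `next`
        if f:                          # a blank line right after another blank line
            n += 1
            out.append(f"{n} 0 0")
        out.append(line)               # every blank line is echoed ...
        f = True                       # ... and sets the flag


    return out


# Hypothetical input: 3 blank lines between 5 and 8, 2 between 8 and 10.
sample = ["5 0 1", "", "", "", "8 0 1", "", "", "10 0 1"]
for line in fill_gaps_by_blank_lines(sample):
    print(line)
```

As with the awk version, the blank separator lines are kept in the output, so the filler records end up separated by blank lines too.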

You can try something like this. The benefit of this approach is that your input file does not need an extra empty line for each missing number.
awk -v RS="" -v ORS="\n\n" -v OFS="\n" '
BEGIN{getline; col=$1;line=$0;print line}
$1==col{print $0;next }
($1==col+1){print $0;col=$1;next}
{x=$1;y=$0; col++; while (col < x) {print col" 0 0";col++};print y;next}' file
Input File:
[jaypal:~/Temp] cat file
1 0 1
2 0 3
2 1 2
3 0 3
4 0 1
4 1 1
4 2 1
4 3 1
5 0 1
8 0 1
10 0 1
11 0 1
Script Output:
[jaypal:~/Temp] awk -v RS="" -v ORS="\n\n" -v OFS="\n" '
BEGIN{getline; col=$1;line=$0;print line}
$1==col{print $0;next }
($1==col+1){print $0;col=$1;next}
{x=$1;y=$0; col++; while (col < x) {print col" 0 0";col++};print y;next}' file
1 0 1
2 0 3
2 1 2
3 0 3
4 0 1
4 1 1
4 2 1
4 3 1
5 0 1
6 0 0
7 0 0
8 0 1
9 0 0
10 0 1
11 0 1
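The same paragraph-mode idea can be sketched in Python as below. This is an illustration under assumptions, not a drop-in replacement: it assumes the first field of each paragraph is an integer and that those numbers never decrease; fill_gaps_by_number and the sample text are hypothetical.

```python
import re


def fill_gaps_by_number(text):
    # Paragraph mode (awk's RS=""): one or more blank lines separate records.
    records = [r for r in re.split(r"\n\s*\n", text.strip()) if r.strip()]
    out = []
    col = None  # first field of the previous record (awk's col)
    for rec in records:
        x = int(rec.split()[0])
        if col is not None:
            # Emit an "N 0 0" placeholder for every number skipped between col and x.
            out.extend(f"{missing} 0 0" for missing in range(col + 1, x))
        out.append(rec)
        col = x
    return "\n\n".join(out)  # awk's ORS="\n\n"


# Hypothetical input: only single blank lines, yet 6, 7 and 9 are still filled in.
text = "4 0 1\n4 1 1\n\n5 0 1\n\n8 0 1\n\n10 0 1\n"
print(fill_gaps_by_number(text))
```

Because the gaps are detected from the numbers themselves, the count of blank lines between records no longer matters.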


Count number of non zero columns in a given set of columns of a data frame - pandas

I have a df as shown below
df:
Id Jan20 Feb20 Mar20 Apr20 May20 Jun20 Jul20 Aug20 Sep20 Oct20 Nov20 Dec20 Amount
1 20 0 0 12 1 3 1 0 0 2 2 0 100
2 0 0 2 1 0 2 0 0 1 0 0 0 500
3 1 2 1 2 3 1 1 2 2 3 1 1 300
From the above, I would like to calculate an Activeness value, which is the number of non-zero values in the month columns listed below.
'Jan20', 'Feb20', 'Mar20', 'Apr20', 'May20', 'Jun20', 'Jul20',
'Aug20', 'Sep20', 'Oct20', 'Nov20', 'Dec20'
Expected Output:
Id Jan20 Feb20 Mar20 Apr20 May20 Jun20 Jul20 Aug20 Sep20 Oct20 Nov20 Dec20 Amount Activeness
1 20 0 0 12 1 3 1 0 0 2 2 0 100 7
2 0 0 2 1 0 2 0 0 1 0 0 0 500 4
3 1 2 1 2 3 1 1 2 2 3 1 1 300 12
I tried the code below:
import numpy as np
import pandas as pd
df['Activeness'] = pd.Series(index=df.index, data=np.count_nonzero(df[['Jan20', 'Feb20',
'Mar20', 'Apr20', 'May20', 'Jun20', 'Jul20',
'Aug20', 'Sep20', 'Oct20', 'Nov20', 'Dec20']], axis=1))
This works well, but I would like to know whether there is a faster method.
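You can try:
df['Activeness'] = df.filter(like='20').ne(0).sum(axis=1)
Here filter(like='20') selects every column whose name contains '20', which in this df is exactly the twelve month columns; ne(0) marks the non-zero cells and sum(axis=1) counts them per row.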
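A quick self-contained check, assuming pandas is installed and using the data from the question. Instead of filter(like='20') it builds an explicit list of month columns (here: names ending in '20'), which is a safer choice if other column names could also contain '20'; that selection rule is an assumption about the real column names.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Jan20": [20, 0, 1], "Feb20": [0, 0, 2], "Mar20": [0, 2, 1],
    "Apr20": [12, 1, 2], "May20": [1, 0, 3], "Jun20": [3, 2, 1],
    "Jul20": [1, 0, 1], "Aug20": [0, 0, 2], "Sep20": [0, 1, 2],
    "Oct20": [2, 0, 3], "Nov20": [2, 0, 1], "Dec20": [0, 0, 1],
    "Amount": [100, 500, 300],
})

# Assumption: every month column name ends in "20" and no other column does.
months = [c for c in df.columns if c.endswith("20")]

# ne(0) gives a boolean frame; summing along axis=1 counts non-zero months per row.
df["Activeness"] = df[months].ne(0).sum(axis=1)
print(df[["Id", "Activeness"]])
```

Both this and the one-liner stay vectorised, so either should be fast; the explicit column list only guards against accidental matches.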

All boolean possibilities of given length in J

I want the simplest verb that gives a list of all boolean lists of a given length.
e.g.
f=. NB. Insert magic here
f 2
0 0
0 1
1 0
1 1
f 3
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
This functionality has been recently added to the stats/base addon.
load 'stats/base/combinatorial' NB. or just load 'stats'
permrep 2 NB. permutations of size 2 from 2 items with replacement
0 0
0 1
1 0
1 1
3 permrep 2 NB. permutations of size 3 from 2 items with replacement
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
permrep NB. display definition of permrep
$:~ :(# #: i.#^~)
Using the Qt IDE, you can view the script defining permrep and friends by entering open 'stats/base/combinatorial' in the Term window. Alternatively, you can view it on GitHub.
To define f as specified in your question, the following should suffice:
f=: permrep&2
f=: (# #: i.#^~)&2 NB. alternatively
f 3
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
The #: ("Antibase 2") vocabulary page has an example close to what I want. I don't really understand that primitive, but the following verb gives a table of the base-2 digits of the numbers 0 to 2^n-1:
f=. #:@i.@(2^])
(Thanks to Dan for getting me to look up #:.)
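The idea behind both J verbs (write each number 0 to k^n-1 in base k, using n digits) can be sketched in Python for comparison. permrep here is a hypothetical Python helper named after the J verb, not part of any library; itertools.product is only used to cross-check the result.

```python
from itertools import product


def permrep(n, k=2):
    """Hypothetical helper: all length-n lists over 0..k-1 (roughly J's  n permrep k)."""
    rows = []
    for i in range(k ** n):          # roughly J's  i. k^n
        value, digits = i, []
        for _ in range(n):           # n base-k digits, roughly J's  (n # k) #: i
            digits.append(value % k)
            value //= k
        rows.append(digits[::-1])    # most significant digit first
    return rows


for row in permrep(3):
    print(*row)

# Cross-check against itertools.product, which yields the same lexicographic order.
assert permrep(3) == [list(p) for p in product(range(2), repeat=3)]
```

In plain Python, [list(p) for p in product(range(k), repeat=n)] gives the same table directly; the digit loop is there only to mirror the antibase formulation.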

countif unique values using multiple criteria

I have read a bunch of other posts, but haven't found what I need. I need to do a COUNTIFS on 2 criteria, but only count the unique "EventIds" that meet them. Currently it counts the total number of rows that meet the criteria.
EventID (E) DR Ref (F) Number Locations (G)
110000018 1 13
110000018 2 2
110000018 3 8
110000018 4 5
110000252 1 3
110000252 2 3
110000354 1 1
110000366 1 2
110000366 3 1
I have the data above and am trying to display it in the matrix below. Currently I am using COUNTIFS, with F being the "DR Ref" column, "AB" being DR1 a
=COUNTIFS($F$4:$F$12,">="&AB$2,$G$4:$G$12,">="&$Y4)
This is the current output:
DR
Locat 1(AB) 2 3
> 1 9 5 3
> 2 7 4 2
> 3 5 3 2
> 4 3 2 2
> 5 3 2 2
> 6 2 1 1
> 7 2 1 1
> 8 2 1 1
> 9 1 0 0
> 10 1 0 0
> 11 1 0 0
> 12 1 0 0
> 13 1 0 0
Desired output:
DR
Locat 1(AB) 2 3
> 1 4 3 2
> 2 3 2 1
> 3 2 2 1
> 4 1 1 1
> 5 1 1 1
> 6 1 1 1
> 7 1 1 1
> 8 1 1 1
> 9 1 0 0
> 10 1 0 0
> 11 1 0 0
> 12 1 0 0
> 13 1 0 0
Use the following formula:
=SUM(IF(($F$4:$F$12>=AB$2)*($G$4:$G$12>=$Y4),1/COUNTIFS($E$4:$E$12,$E$4:$E$12,$F$4:$F$12,">="&AB$2,$G$4:$G$12,">="&$Y4),0))
and enter it as an array formula by pressing Ctrl+Shift+Enter.
But I have to say that you phrased the question in a very typical manner.
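To see why the 1/COUNTIFS trick counts unique IDs, here is a small Python sketch of the same arithmetic over the sample data: every matching row contributes 1/(number of matching rows with the same EventID), so each EventID adds up to exactly 1. The function name unique_event_count is made up; Fraction is used only to keep the sum exact.

```python
from fractions import Fraction

# (EventID, DR Ref, Number Locations) rows copied from the question.
rows = [
    (110000018, 1, 13), (110000018, 2, 2), (110000018, 3, 8), (110000018, 4, 5),
    (110000252, 1, 3), (110000252, 2, 3),
    (110000354, 1, 1),
    (110000366, 1, 2), (110000366, 3, 1),
]


def unique_event_count(rows, min_dr, min_locations):
    """Mimic SUM(IF(criteria, 1/COUNTIFS(same EventID and criteria), 0))."""
    matching = [r for r in rows if r[1] >= min_dr and r[2] >= min_locations]
    total = Fraction(0)
    for event_id, _, _ in matching:
        # COUNTIFS part: how many matching rows share this EventID.
        same_id = sum(1 for r in matching if r[0] == event_id)
        total += Fraction(1, same_id)  # each EventID contributes 1 in total
    return int(total)


# Rebuild the desired matrix: one line per location threshold, columns DR >= 1, 2, 3.
for loc in range(1, 14):
    print(">", loc, *(unique_event_count(rows, dr, loc) for dr in (1, 2, 3)))
```

In Excel the division is done in floating point, which is why the formula is usually wrapped in SUM rather than compared for exact equality.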

Is there a way to make one integer increase when a second integer increases by a set amount in python 3?

I'm trying to create a simple game in Python 3, and I'm trying to build in an EXP system: for example, for every 50 experience points, your health (which is already an integer) increases by one. Is there a command for this?
(I'm coding this on repl.it if that matters)
I've never shunned guessing. :)
Let me suppose that you are incrementing a variable called experience_points and that, once for every 50 times you increment it, you want to increment a variable called health by one.
experience_points += 1
if experience_points % 50 == 0:
    health += 1
This bit of code shows how this might work. Notice how health goes up by one for every 50 times that experience_points goes up by one.
Welcome to the modulus operator!
>>> experience_points = 0
>>> health = 0
>>> while True:
...     # do something in the game
...     experience_points += 1
...     if experience_points % 50 == 0:
...         health += 1
...     print(experience_points, health, '<--', end='')
...     if experience_points > 160:
...         break
...
1 0 <--2 0 <--3 0 <--4 0 <--5 0 <--6 0 <--7 0 <--8 0 <--9 0 <--10 0 <--11 0 <--12 0 <--13 0 <--14 0 <--15 0 <--16 0 <--17 0 <--18 0 <--19 0 <--20 0 <--21 0 <--22 0 <--23 0 <--24 0 <--25 0 <--26 0 <--27 0 <--28 0 <--29 0 <--30 0 <--31 0 <--32 0 <--33 0 <--34 0 <--35 0 <--36 0 <--37 0 <--38 0 <--39 0 <--40 0 <--41 0 <--42 0 <--43 0 <--44 0 <--45 0 <--46 0 <--47 0 <--48 0 <--49 0 <--50 1 <--51 1 <--52 1 <--53 1 <--54 1 <--55 1 <--56 1 <--57 1 <--58 1 <--59 1 <--60 1 <--61 1 <--62 1 <--63 1 <--64 1 <--65 1 <--66 1 <--67 1 <--68 1 <--69 1 <--70 1 <--71 1 <--72 1 <--73 1 <--74 1 <--75 1 <--76 1 <--77 1 <--78 1 <--79 1 <--80 1 <--81 1 <--82 1 <--83 1 <--84 1 <--85 1 <--86 1 <--87 1 <--88 1 <--89 1 <--90 1 <--91 1 <--92 1 <--93 1 <--94 1 <--95 1 <--96 1 <--97 1 <--98 1 <--99 1 <--100 2 <--101 2 <--102 2 <--103 2 <--104 2 <--105 2 <--106 2 <--107 2 <--108 2 <--109 2 <--110 2 <--111 2 <--112 2 <--113 2 <--114 2 <--115 2 <--116 2 <--117 2 <--118 2 <--119 2 <--120 2 <--121 2 <--122 2 <--123 2 <--124 2 <--125 2 <--126 2 <--127 2 <--128 2 <--129 2 <--130 2 <--131 2 <--132 2 <--133 2 <--134 2 <--135 2 <--136 2 <--137 2 <--138 2 <--139 2 <--140 2 <--141 2 <--142 2 <--143 2 <--144 2 <--145 2 <--146 2 <--147 2 <--148 2 <--149 2 <--150 3 <--151 3 <--152 3 <--153 3 <--154 3 <--155 3 <--156 3 <--157 3 <--158 3 <--159 3 <--160 3 <--161 3 <--
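One caveat, assuming XP can be awarded in chunks rather than one point at a time: the % 50 == 0 test only fires when experience_points lands exactly on a multiple of 50, so going from 30 to 60 would skip the health increase. A sketch that compares how many 50-point thresholds were crossed instead (gain_experience is a hypothetical helper):

```python
XP_PER_HEALTH = 50  # from the question: +1 health per 50 experience points


def gain_experience(experience_points, health, amount):
    """Add `amount` XP and award one health point per 50-XP threshold crossed."""
    thresholds_before = experience_points // XP_PER_HEALTH
    experience_points += amount
    thresholds_after = experience_points // XP_PER_HEALTH
    health += thresholds_after - thresholds_before
    return experience_points, health


xp, hp = 0, 10
for reward in (30, 30, 100):
    xp, hp = gain_experience(xp, hp, reward)
    print(xp, hp)
```

With rewards of exactly 1 XP this behaves the same as the modulus version; it only differs when a single reward jumps over a multiple of 50.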

replace the first N dots of a string

I would like to replace the first 14 dots of my.string with 14 zeroes when region = 2. All other dots should be kept the way they are.
df.1 = read.table(text = "
city county state region my.string reg1 reg2
1 1 1 1 123456789012345678901234567890 1 0
1 2 1 1 ...................34567890098 1 0
1 1 2 1 112233..............0099887766 1 0
1 2 2 1 ..............2020202020202020 1 0
1 1 1 2 ..............00.............. 0 1
1 2 1 2 ..............0987654321123456 0 1
1 1 2 2 ..............9999988888777776 0 1
1 2 2 2 ..................555555555555 0 1
", sep = "", header = TRUE, stringsAsFactors = FALSE)
df.1
I do not think this question has been asked here. Sorry if it has, and sorry also not to have spent more time looking for the solution; a quick Google search did not turn up an answer. I did ask a similar question here earlier: "R: removing the last three dots from a string". Thank you for any help.
I should clarify that I only want to replace 14 consecutive dots at the far left of the string. If a string begins with a number that is followed by 14 dots, then those 14 dots should remain the way they are.
Here is how my.string would look:
123456789012345678901234567890
...................34567890098
112233..............0099887766
..............2020202020202020
0000000000000000..............
000000000000000987654321123456
000000000000009999988888777776
00000000000000....555555555555
Have you tried:
sub("^\\.{14}", "00000000000000", df.1$my.string )
For conditional replacement try:
> df.1[ df.1$region ==2, "mystring"] <-
sub("^\\.{14}", "00000000000000", df.1$my.string[ df.1$region==2] )
> df.1
city county state region my.string reg1 reg2
1 1 1 1 1 123456789012345678901234567890 1 0
2 1 2 1 1 ...................34567890098 1 0
3 1 1 2 1 112233..............0099887766 1 0
4 1 2 2 1 ..............2020202020202020 1 0
5 1 1 1 2 ..............00.............. 0 1
6 1 2 1 2 ..............0987654321123456 0 1
7 1 1 2 2 ..............9999988888777776 0 1
8 1 2 2 2 ..................555555555555 0 1
mystring
1 <NA>
2 <NA>
3 <NA>
4 <NA>
5 0000000000000000..............
6 000000000000000987654321123456
7 000000000000009999988888777776
8 00000000000000....555555555555
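For comparison, the same anchored, region-conditional substitution in Python's re module, as a sketch using the rows from the question's df.1 (fix_string is a hypothetical helper name):

```python
import re

# (region, my.string) pairs copied from the question's df.1.
rows = [
    (1, "123456789012345678901234567890"),
    (1, "...................34567890098"),
    (1, "112233..............0099887766"),
    (1, "..............2020202020202020"),
    (2, "..............00.............."),
    (2, "..............0987654321123456"),
    (2, "..............9999988888777776"),
    (2, "..................555555555555"),
]

LEADING_14_DOTS = re.compile(r"^\.{14}")  # exactly 14 dots at the very start


def fix_string(region, s):
    """Replace the first 14 leading dots with zeros, but only for region 2."""
    if region != 2:
        return s
    return LEADING_14_DOTS.sub("0" * 14, s, count=1)


result = [fix_string(region, s) for region, s in rows]
for s in result:
    print(s)
```

The ^ anchor is what keeps a string like 112233.............. untouched even when its dots come later in the string.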
gsub('^[.]{14,14}',paste(rep(0,14),collapse=''),df.1$my.string)
"123456789012345678901234567890" "00000000000000.....34567890098" "112233..............0099887766"
[4] "000000000000002020202020202020" "0000000000000000.............." "000000000000000987654321123456"
[7] "000000000000009999988888777776" "00000000000000....555555555555"
DWin's answer is awesome. Here's one that's easier to understand but not nearly as spiffy:
# restrict the substitution to only region == 2..
# then replace the 'my.string' column with..
df.1[ df.1$region == 2 , 'my.string' ] <-
  # substitute.. (only the first instance!)
  # (use gsub for multiple instances)
  sub(
    # fourteen dots
    '..............' ,
    # with fourteen zeroes
    '00000000000000' ,
    # in the same object (also restricted to region == 2)
    df.1[ df.1$region == 2 , 'my.string' ] ,
    # and don't use regex or anything special:
    # just exactly 14 dots. note that fixed = TRUE is not anchored, so the
    # first run of 14 dots is replaced wherever it occurs; that is fine here
    # because every region-2 string starts with at least 14 dots.
    fixed = TRUE
  )
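The fixed = TRUE version is not anchored to the start of the string, which matters for the case mentioned in the question (a number followed by 14 dots). A small Python sketch contrasting the two behaviours, with str.replace standing in for sub(..., fixed = TRUE) and re.sub with ^ standing in for the anchored regex; the test string s is hypothetical:

```python
import re

DOTS = "." * 14
ZEROS = "0" * 14


def replace_fixed(s):
    # Like sub(DOTS, ZEROS, s, fixed = TRUE): first literal run, anywhere in s.
    return s.replace(DOTS, ZEROS, 1)


def replace_anchored(s):
    # Like sub("^\\.{14}", ZEROS, s): only a run at the very start of s.
    return re.sub(r"^\.{14}", ZEROS, s)


s = "1" + DOTS + "234"               # hypothetical: a digit followed by 14 dots
t = DOTS + "0987654321123456"         # a region-2 string from the question
print(replace_fixed(s), replace_anchored(s))
print(replace_fixed(t), replace_anchored(t))
```

For the region-2 rows in the question both give the same result; they only diverge when the 14 dots are not at the start.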
A data.table solution:
require(data.table)
dt <- data.table(df.1)
# solution:
dt[, mystring := ifelse(region == 2, sub("^[.]{14}",
paste(rep(0,14), collapse=""), my.string),
my.string), by=1:nrow(dt)]
# city county state region my.string reg1 reg2 mystring
# 1: 1 1 1 1 123456789012345678901234567890 1 0 123456789012345678901234567890
# 2: 1 2 1 1 ...................34567890098 1 0 ...................34567890098
# 3: 1 1 2 1 112233..............0099887766 1 0 112233..............0099887766
# 4: 1 2 2 1 ..............2020202020202020 1 0 ..............2020202020202020
# 5: 1 1 1 2 ..............00.............. 0 1 0000000000000000..............
# 6: 1 2 1 2 ..............0987654321123456 0 1 000000000000000987654321123456
# 7: 1 1 2 2 ..............9999988888777776 0 1 000000000000009999988888777776
# 8: 1 2 2 2 ..................555555555555 0 1 00000000000000....555555555555
