Replace the first N dots of a string

I hope to replace the first 14 dots of my.string with 14 zeroes when region = 2. All other dots should be kept the way they are.
df.1 = read.table(text = "
city county state region my.string reg1 reg2
1 1 1 1 123456789012345678901234567890 1 0
1 2 1 1 ...................34567890098 1 0
1 1 2 1 112233..............0099887766 1 0
1 2 2 1 ..............2020202020202020 1 0
1 1 1 2 ..............00.............. 0 1
1 2 1 2 ..............0987654321123456 0 1
1 1 2 2 ..............9999988888777776 0 1
1 2 2 2 ..................555555555555 0 1
", sep = "", header = TRUE, stringsAsFactors = FALSE)
df.1
I do not think this question has been asked here; sorry if it has, and sorry not to have spent more time looking for the solution. A quick Google search did not turn up an answer. I did ask a similar question here earlier: R: removing the last three dots from a string. Thank you for any help.
I should clarify that I only want to remove 14 consecutive dots at the far left of the string. If a string begins with a number that is followed by 14 dots, then those 14 dots should remain the way they are.
Here is how my.string would look:
123456789012345678901234567890
...................34567890098
112233..............0099887766
..............2020202020202020
0000000000000000..............
000000000000000987654321123456
000000000000009999988888777776
00000000000000....555555555555

Have you tried:
sub("^\\.{14}", "00000000000000", df.1$my.string )
For conditional replacement try:
> df.1[ df.1$region == 2, "my.string"] <-
    sub("^\\.{14}", "00000000000000", df.1$my.string[ df.1$region == 2 ])
> df.1
  city county state region                      my.string reg1 reg2
1    1      1     1      1 123456789012345678901234567890    1    0
2    1      2     1      1 ...................34567890098    1    0
3    1      1     2      1 112233..............0099887766    1    0
4    1      2     2      1 ..............2020202020202020    1    0
5    1      1     1      2 0000000000000000..............    0    1
6    1      2     1      2 000000000000000987654321123456    0    1
7    1      1     2      2 000000000000009999988888777776    0    1
8    1      2     2      2 00000000000000....555555555555    0    1

gsub('^[.]{14,14}',paste(rep(0,14),collapse=''),df.1$my.string)
"123456789012345678901234567890" "00000000000000.....34567890098" "112233..............0099887766"
[4] "000000000000002020202020202020" "0000000000000000.............." "000000000000000987654321123456"
[7] "000000000000009999988888777776" "00000000000000....555555555555"

DWin's answer is awesome. Here's one that's easy to understand, but not nearly as spiffy:
# restrict the substitution to only region == 2..
# then replace the 'my.string' column with..
df.1[ df.1$region == 2 , 'my.string' ] <-
    # substitute.. (only the first instance!)
    # (use gsub for multiple instances)
    sub(
        # fourteen dots
        '..............' ,
        # with fourteen zeroes
        '00000000000000' ,
        # in the same object (also restricted to region == 2)
        df.1[ df.1$region == 2 , 'my.string' ] ,
        # and don't use regex or anything special.
        # just exactly 14 dots.
        fixed = TRUE
    )

A data.table solution:
require(data.table)
dt <- data.table(df.1)
# solution:
dt[, mystring := ifelse(region == 2,
                        sub("^[.]{14}", paste(rep(0, 14), collapse=""), my.string),
                        my.string),
   by = 1:nrow(dt)]
# city county state region my.string reg1 reg2 mystring
# 1: 1 1 1 1 123456789012345678901234567890 1 0 123456789012345678901234567890
# 2: 1 2 1 1 ...................34567890098 1 0 ...................34567890098
# 3: 1 1 2 1 112233..............0099887766 1 0 112233..............0099887766
# 4: 1 2 2 1 ..............2020202020202020 1 0 ..............2020202020202020
# 5: 1 1 1 2 ..............00.............. 0 1 0000000000000000..............
# 6: 1 2 1 2 ..............0987654321123456 0 1 000000000000000987654321123456
# 7: 1 1 2 2 ..............9999988888777776 0 1 000000000000009999988888777776
# 8: 1 2 2 2 ..................555555555555 0 1 00000000000000....555555555555

Related

Cumulative count using grouping, sorting, and condition

I want a cumulative count of the zeros in column c, grouped by column a and sorted by column b; when any other number appears, the count should reset to 1.
Here is a sample:
df = pd.DataFrame({'a':[1,1,1,1,2,2,2,2],
'b':[1,2,3,4,1,2,3,4],
'c':[10,0,0,5,1,0,1,0]}
)
I tried the code below. It works, but when zeros appear more than once in a row, the shift function does not see the newly assigned value, so it has to be re-run as many times as the longest run of zeros:
df.loc[df.c == 0 ,'n'] = df.n.shift(1)+1
I also tried the loop below. It works on a small data frame, but on a large one it takes a long time and never finishes:
for ind in df.index:
    if df.loc[ind, 'c'] == 0:
        df.loc[ind, 'new'] = df.loc[ind - 1, 'new'] + 1
    else:
        df.loc[ind, 'new'] = 1
The desired result
a b c n
0 1 1 10 1
1 1 2 0 2
2 1 3 0 3
3 1 4 5 1
4 2 1 1 1
5 2 2 0 2
6 2 3 1 1
7 2 4 0 2
Try using cumsum to create a group key and then groupby.cumcount to create the new column (a small sketch of the group key follows the output below):
df.sort_values(['a', 'b'], inplace=True)
df['n'] = df['c'].groupby([df.a, df['c'].ne(0).cumsum()]).cumcount() + 1
df
a b c n
0 1 1 10 1
1 1 2 0 2
2 1 3 0 3
3 1 4 5 1
4 2 1 1 1
5 2 2 0 2
6 2 3 1 1
7 2 4 0 2
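
For reference, here is a minimal sketch of what that intermediate group key looks like on the sample frame (it just re-runs the idea above step by step; the column names match the question):
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 1, 2, 2, 2, 2],
                   'b': [1, 2, 3, 4, 1, 2, 3, 4],
                   'c': [10, 0, 0, 5, 1, 0, 1, 0]})
df.sort_values(['a', 'b'], inplace=True)

# a new group starts at every non-zero c, so runs of zeros share a group
key = df['c'].ne(0).cumsum()
print(key.tolist())        # [1, 1, 1, 2, 3, 3, 4, 4]

# counting positions within each (a, key) group restarts the count after
# every non-zero value
df['n'] = df.groupby([df['a'], key]).cumcount() + 1
print(df['n'].tolist())    # [1, 2, 3, 1, 1, 2, 1, 2]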

What is a good way to add 1 to column values if the value is greater than 2 in Python?

I want to add 1 to a column value if it is greater than 2.
Here is my dataframe:
df=pd.DataFrame({'A':[1,1,1,1,1,1,3,2,2,2,2,2,2],'flag':[1,1,0,1,1,1,5,1,1,0,1,1,1]})
Expected output:
df_out = pd.DataFrame({'A':[1,1,1,1,1,1,3,2,2,2,2,2,2],'flag':[1,1,0,1,1,1,6,1,1,0,1,1,1]})
Use DataFrame.loc and add 1:
df.loc[df.A.gt(2), 'flag'] += 1
print (df)
A flag
0 1 1
1 1 1
2 1 0
3 1 1
4 1 1
5 1 1
6 3 6
7 2 1
8 2 1
9 2 0
10 2 1
11 2 1
12 2 1
Or:
df['flag'] = np.where(df.A.gt(2), df['flag'] + 1, df['flag'])
EDIT:
mean = df.groupby(pd.cut(df['x'], bins))['y'].transform('mean')
df['flag'] = np.where(mean.gt(2), df['y'] + 1, df['y'])
And then:
x= df.groupby(pd.cut(df['x'], bins))['y'].apply(lambda x:abs(x-np.mean(x)))
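
The EDIT above refers to columns x and y and a bins object that are not defined in this question; presumably they come from a follow-up comment. A minimal, self-contained sketch of the same pattern, with made-up data and assumed bin edges:
import numpy as np
import pandas as pd

# hypothetical data: x is the variable being binned, y holds the values
df = pd.DataFrame({'x': [0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
                   'y': [1, 4, 2, 5, 3, 6]})
bins = [0, 2, 4, 6]   # assumed bin edges

# per-row mean of y within the x-bin that the row falls into
mean = df.groupby(pd.cut(df['x'], bins))['y'].transform('mean')

# add 1 to y wherever the bin mean is greater than 2
df['flag'] = np.where(mean.gt(2), df['y'] + 1, df['y'])

# absolute deviation of each y from its bin mean
dev = df.groupby(pd.cut(df['x'], bins))['y'].apply(lambda s: (s - s.mean()).abs())
print(df)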

Change every CSV file value

I'm sure there's a simple solution to this but I'm struggling. I want to set the values of a csv file I've created to 1s and 0s so that I can work out the probability based on each row.
Here's the csv data:
0 1 2 3 4 5 6 7 \
0 Reference China Greece Japan S Africa S Korea Sri lanka Taiwan
1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1
3 1 1 1 1 1 1 1 1
4 1 1 1 1 1 1 1 1
... ... ... ... ... ... ... ... ...
14898 1 1 1 1 1 1 1 1
14899 1 1 1 1 1 1 1 1
14900 1 1 1 1 1 1 1 1
14901 1 1 1 1 1 1 1 1
14902 1 1 1 1 1 1 1 1
8 9 10 11 12 13 14 15 16
0 USA Ecuador Egypt Ghana India Isreal Pakistan Taiwan USA Ohio
1 1.031 1 1 1 1 1 1 1 1
2 1.031 1 1 1 1 1 1 1 1
3 1.031 1 1 1 1 1 1 1 1
4 1.031 1 1 1 1 1 1 1 1
... ... ... ... ... ... ... ... ... ...
14898 1 1 1 1 1 1 1 1 1
14899 1 1 1 1 1 1 1 1 1
14900 1 1 1 1 1 1 1 1 1
14901 1 1 1 1 1 1 1 1 1
14902 1 1 1 1 1 1 1 1 1
[14903 rows x 17 columns]
And I've tried this:
data = pd.DataFrame(pd.read_csv('IEratios.csv', header=None, sep=','))
for x in data:
    if x == 1:
        x = 0
    else:
        x = 1
Which I thought would be simple, but I was wrong, and nothing I can find seems to apply to all columns and rows, so I am lost.
You can use the .map() function in pandas; it lets you run a function through an entire DataFrame column, like so:
def changeNumber(x):
    if x == 1:
        return 0
    else:
        return 1

df = pd.read_csv('IEratios.csv', sep=',')
df['China'] = df['China'].map(changeNumber)
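
The .map() call above only handles one column at a time. If the goal is to flip every cell in the frame, here is a sketch using DataFrame.applymap, which applies a function element-wise over the whole frame (newer pandas versions expose the same thing as DataFrame.map); the small frame here is a made-up stand-in for the numeric part of the CSV:
import pandas as pd

# made-up stand-in for two numeric columns of the CSV
data = pd.DataFrame({'China': [1, 1.031, 1],
                     'Greece': [1, 1, 1]})

# 1 becomes 0, anything else becomes 1
flipped = data.applymap(lambda v: 0 if v == 1 else 1)
print(flipped)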
I don't know if I understand what you want to do.
Do you want to replace the values that are one by zero, and the zeros by one?
If I understand correctly, since you are using pandas, you can use the following statement:
df.replace({"0": "1", "1": "0"}, inplace=True)
You have to be careful with the data type of your dataframe
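
If the cells were parsed as numbers rather than strings, the dictionary needs numeric keys; replace applies the mapping to the original values rather than chaining, so the swap works in one call. A small sketch with a made-up numeric frame:
import pandas as pd

data = pd.DataFrame({'a': [1, 0, 1], 'b': [0, 1, 1]})   # made-up numeric frame
data.replace({1: 0, 0: 1}, inplace=True)
print(data)   # the ones and zeros are swapped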
Have you tried using the numpy.where function?
data = pd.DataFrame(pd.read_csv('IEratios.csv', header=None, sep=','))
data = np.where((data == 1), 0, 1)

How to take mean of 3 values before flag change 0 to 1 in Python

I have a dataframe with columns A, B, and flag. I want to calculate the mean of the 2 values before the flag changes from 0 to 1, record the value where the flag changes from 0 to 1, and record the value where the flag changes from 1 to 0.
# Input dataframe
df=pd.DataFrame({'A':[1,3,4,7,8,11,1,15,20,15,16,87],
'B':[1,3,4,6,8,11,1,19,20,15,16,87],
'flag':[0,0,0,0,1,1,1,0,0,0,0,0]})
# Expected output
df_out = pd.DataFrame({'A_mean_before_flag_change':[5.5],
'B_mean_before_flag_change':[5],
'A_value_before_change_flag':[7],
'B_value_before_change_flag':[6]})
I'll try to create a more general solution:
df=pd.DataFrame({'A':[1,3,4,7,8,11,1,15,20,15,16,87],
'B':[1,3,4,6,8,11,1,19,20,15,16,87],
'flag':[0,0,0,0,1,1,1,0,0,1,0,1]})
print (df)
A B flag
0 1 1 0
1 3 3 0
2 4 4 0
3 7 6 0
4 8 8 1
5 11 11 1
6 1 1 1
7 15 19 0
8 20 20 0
9 15 15 1
10 16 16 0
11 87 87 1
First, create groups using a mask of 0 values whose next flag value is 1:
m1 = df['flag'].eq(0) & df['flag'].shift(-1).eq(1)
df['g'] = m1.iloc[::-1].cumsum()
print (df)
A B flag g
0 1 1 0 3
1 3 3 0 3
2 4 4 0 3
3 7 6 0 3
4 8 8 1 2
5 11 11 1 2
6 1 1 1 2
7 15 19 0 2
8 20 20 0 2
9 15 15 1 1
10 16 16 0 1
11 87 87 1 0
Then filter out groups whose size is less than N:
N = 4
df1 = df[df['g'].map(df['g'].value_counts()).ge(N)].copy()
print (df1)
A B flag g
0 1 1 0 3
1 3 3 0 3
2 4 4 0 3
3 7 6 0 3
4 8 8 1 2
5 11 11 1 2
6 1 1 1 2
7 15 19 0 2
8 20 20 0 2
Take the last N rows of each group:
df2 = df1.groupby('g').tail(N)
And aggregate with last and mean:
d = {'mean':'_mean_before_flag_change', 'last': '_value_before_change_flag'}
df3 = df2.groupby('g')[['A','B']].agg(['mean','last']).sort_index(axis=1, level=1).rename(columns=d)
df3.columns = df3.columns.map(''.join)
print (df3)
A_value_before_change_flag B_value_before_change_flag \
g
2 20 20
3 7 6
A_mean_before_flag_change B_mean_before_flag_change
g
2 11.75 12.75
3 3.75 3.50
I'm assuming that this needs to work for cases with more than one rising edge and that the consecutive values and averages get appended to the output lists:
# the first step is to extract the rising and falling edges using diff(), identify sections and length
df['flag_diff'] = df.flag.diff().fillna(0)
df['flag_sections'] = (df.flag_diff != 0).cumsum()
df['flag_sum'] = df.flag.groupby(df.flag_sections).transform('sum')
# then you can get the relevant indices by checking for the rising edges
rising_edges = df.index[df.flag_diff==1.0]
val_indices = [i-1 for i in rising_edges]
avg_indices = [(i-2,i-1) for i in rising_edges]
# and finally iterate over the relevant sections
df_out = pd.DataFrame()
df_out['A_mean_before_flag_change'] = [df.A.loc[tpl[0]:tpl[1]].mean() for tpl in avg_indices]
df_out['B_mean_before_flag_change'] = [df.B.loc[tpl[0]:tpl[1]].mean() for tpl in avg_indices]
df_out['A_value_before_change_flag'] = [df.A.loc[idx] for idx in val_indices]
df_out['B_value_before_change_flag'] = [df.B.loc[idx] for idx in val_indices]
df_out['length'] = [df.flag_sum.loc[idx] for idx in rising_edges]
df_out.index = rising_edges

Is there a way to make one integer increase when a second integer increases by a set amount in python 3?

I'm trying to create a simple game in python 3 and I'm trying to build in an EXP system, for example, every 50 experience points, your health (Which is already an integer) increases by one. Is there a command for this?
(I'm coding this on repl.it if that matters)
I've never shunned guessing. :)
Let me suppose that you are incrementing a variable called experience_points and that, once for every 50 times you increment it, you want to increment a variable called health by one.
experience_points += 1
if experience_points % 50 == 0:
    health += 1
This bit of code shows how this might work. Notice how health goes up one for every 50 times that experience_points goes up one.
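
One thing to keep in mind: if experience can be awarded in chunks larger than 1, the modulus check can jump past a multiple of 50 without ever hitting it exactly. Here is a sketch using floor division instead, with made-up starting values:
base_health = 10          # hypothetical starting health
experience_points = 0

def current_health(xp):
    # one extra health point for every full 50 XP earned
    return base_health + xp // 50

experience_points += 120                  # e.g. a quest reward of 120 XP
print(current_health(experience_points))  # 10 + 2 = 12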
Welcome to the modulus operator!
>>> experience_points = 0
>>> health = 0
>>> while True:
...     # do something in the game
...     experience_points += 1
...     if experience_points % 50 == 0:
...         health += 1
...     print(experience_points, health, '<--', end='')
...     if experience_points > 160:
...         break
...
1 0 <--2 0 <--3 0 <--4 0 <--5 0 <--6 0 <--7 0 <--8 0 <--9 0 <--10 0 <--11 0 <--12 0 <--13 0 <--14 0 <--15 0 <--16 0 <--17 0 <--18 0 <--19 0 <--20 0 <--21 0 <--22 0 <--23 0 <--24 0 <--25 0 <--26 0 <--27 0 <--28 0 <--29 0 <--30 0 <--31 0 <--32 0 <--33 0 <--34 0 <--35 0 <--36 0 <--37 0 <--38 0 <--39 0 <--40 0 <--41 0 <--42 0 <--43 0 <--44 0 <--45 0 <--46 0 <--47 0 <--48 0 <--49 0 <--50 1 <--51 1 <--52 1 <--53 1 <--54 1 <--55 1 <--56 1 <--57 1 <--58 1 <--59 1 <--60 1 <--61 1 <--62 1 <--63 1 <--64 1 <--65 1 <--66 1 <--67 1 <--68 1 <--69 1 <--70 1 <--71 1 <--72 1 <--73 1 <--74 1 <--75 1 <--76 1 <--77 1 <--78 1 <--79 1 <--80 1 <--81 1 <--82 1 <--83 1 <--84 1 <--85 1 <--86 1 <--87 1 <--88 1 <--89 1 <--90 1 <--91 1 <--92 1 <--93 1 <--94 1 <--95 1 <--96 1 <--97 1 <--98 1 <--99 1 <--100 2 <--101 2 <--102 2 <--103 2 <--104 2 <--105 2 <--106 2 <--107 2 <--108 2 <--109 2 <--110 2 <--111 2 <--112 2 <--113 2 <--114 2 <--115 2 <--116 2 <--117 2 <--118 2 <--119 2 <--120 2 <--121 2 <--122 2 <--123 2 <--124 2 <--125 2 <--126 2 <--127 2 <--128 2 <--129 2 <--130 2 <--131 2 <--132 2 <--133 2 <--134 2 <--135 2 <--136 2 <--137 2 <--138 2 <--139 2 <--140 2 <--141 2 <--142 2 <--143 2 <--144 2 <--145 2 <--146 2 <--147 2 <--148 2 <--149 2 <--150 3 <--151 3 <--152 3 <--153 3 <--154 3 <--155 3 <--156 3 <--157 3 <--158 3 <--159 3 <--160 3 <--161 3 <--
