Change every CSV file value - python-3.x

I'm sure there's a simple solution to this, but I'm struggling. I want to set the values of a CSV file I've created to 1s and 0s so that I can work out the probability based on each row.
Here's the csv data:
0 1 2 3 4 5 6 7 \
0 Reference China Greece Japan S Africa S Korea Sri lanka Taiwan
1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1
3 1 1 1 1 1 1 1 1
4 1 1 1 1 1 1 1 1
... ... ... ... ... ... ... ... ...
14898 1 1 1 1 1 1 1 1
14899 1 1 1 1 1 1 1 1
14900 1 1 1 1 1 1 1 1
14901 1 1 1 1 1 1 1 1
14902 1 1 1 1 1 1 1 1
8 9 10 11 12 13 14 15 16
0 USA Ecuador Egypt Ghana India Isreal Pakistan Taiwan USA Ohio
1 1.031 1 1 1 1 1 1 1 1
2 1.031 1 1 1 1 1 1 1 1
3 1.031 1 1 1 1 1 1 1 1
4 1.031 1 1 1 1 1 1 1 1
... ... ... ... ... ... ... ... ... ...
14898 1 1 1 1 1 1 1 1 1
14899 1 1 1 1 1 1 1 1 1
14900 1 1 1 1 1 1 1 1 1
14901 1 1 1 1 1 1 1 1 1
14902 1 1 1 1 1 1 1 1 1
[14903 rows x 17 columns]
And I've tried this:
data = pd.DataFrame(pd.read_csv('IEratios.csv', header=None, sep=','))
for x in data:
    if x == 1:
        x = 0
    else:
        x = 1
I thought this would be simple and would work, but I was wrong, and nothing I find seems to apply to all columns and rows at once, so I am lost.
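The reason the loop above changes nothing can be seen on a tiny in-memory frame (the column names and values here are made up for illustration): iterating over a DataFrame yields its column labels rather than its cells, and assigning to the loop variable only rebinds a local name.

```python
import pandas as pd

# Small made-up frame standing in for IEratios.csv
df = pd.DataFrame({"China": [1, 0], "Japan": [1, 1]})

# Iterating over a DataFrame yields column labels, not cell values
labels = [x for x in df]
print(labels)  # ['China', 'Japan']

# Rebinding the loop variable never writes anything back into the frame
for x in df:
    x = 0
print(df["China"].tolist())  # [1, 0]
```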

You can use the .map() function in pandas, which runs a function on every element of a DataFrame column (a Series), like so:
import pandas as pd
def changeNumber(x):
    if x == 1:
        return 0
    else:
        return 1
df = pd.read_csv('IEratios.csv', sep=',')
df['China'] = df['China'].map(changeNumber)
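Since the question asks for every column and row, a sketch of the same rule applied to the whole frame might look like this. The frame is a made-up stand-in for IEratios.csv; `DataFrame.apply` hands each column to `Series.map`, and the comparison-based line is a vectorized alternative that gives the same 0/1 result without calling a Python function per cell.

```python
import pandas as pd

def changeNumber(x):
    # 1 -> 0, anything else -> 1 (same rule as the answer above)
    return 0 if x == 1 else 1

# Made-up sample standing in for IEratios.csv
df = pd.DataFrame({"China": [1, 1, 0], "USA": [1.031, 1, 1]})

# Apply the rule column by column, covering every column
flipped = df.apply(lambda col: col.map(changeNumber))

# Vectorized equivalent: True where the value is not 1, cast to 0/1
flipped_fast = (df != 1).astype(int)

print(flipped)
```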

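I'm not sure I understand what you want to do.
Do you want to replace the ones with zeros and the zeros with ones?
If so, since you are using pandas, you can use the following statement:
df.replace({"0": "1", "1": "0"}, inplace=True)
Be careful with the data type of your DataFrame: string keys like these only match values that were read as text (as happens with header=None, where the country names in row 0 cause every column to be read as strings). For numeric columns, use integer keys such as {0: 1, 1: 0}.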

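Have you tried the numpy.where function? It returns a plain NumPy array, so wrap the result back into a DataFrame. Read the file with the default header so the values stay numeric; with header=None the country names end up in row 0, every value is read as text, and data == 1 never matches.
import numpy as np
import pandas as pd
data = pd.read_csv('IEratios.csv')
data = pd.DataFrame(np.where(data == 1, 0, 1), index=data.index, columns=data.columns)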
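A self-contained check of the np.where approach, using a small made-up CSV read from memory with `io.StringIO` instead of the real file. It also shows why `header=None` matters: the header row is then read as data, so the columns are not numeric.

```python
import io

import numpy as np
import pandas as pd

# Made-up two-column stand-in for IEratios.csv
csv_text = "Reference,China\n1,1\n1,0\n0,1\n"

# Default header=0: the first row becomes the column names, values stay numeric
data = pd.read_csv(io.StringIO(csv_text))

# np.where returns a bare NumPy array, so rebuild a DataFrame with the same labels
flipped = pd.DataFrame(np.where(data == 1, 0, 1), index=data.index, columns=data.columns)
print(flipped)

# With header=None the names land in row 0, so each column is read as text
raw = pd.read_csv(io.StringIO(csv_text), header=None)
print(raw.dtypes)
```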

Related

How to count each value of one column for each value of another column

I want to count how often each value of a given column occurs for each value of another column.
Use pd.crosstab, then sum each row to get a total:
output = pd.crosstab(df['Rel_ID'], df['Values'])
output['total'] = output.sum(axis=1)
print(output)
Values 400.0 500.0 1700.0 6300.0 total
Rel_ID
TESTA 1 1 1 1 4
TESTB 1 0 1 1 3
TESTC 0 1 1 0 2
TESTD 1 0 1 1 3
TESTE 1 1 0 0 2
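The question's DataFrame is not shown, so the sketch below uses made-up long-format data chosen to reproduce the table above. `pd.crosstab` counts each (Rel_ID, Values) pair, and summing across each row gives the total number of distinct values per ID.

```python
import pandas as pd

# Made-up long-format data chosen to reproduce the table above
df = pd.DataFrame({
    "Rel_ID": ["TESTA"] * 4 + ["TESTB"] * 3 + ["TESTC"] * 2 + ["TESTD"] * 3 + ["TESTE"] * 2,
    "Values": [400.0, 500.0, 1700.0, 6300.0,
               400.0, 1700.0, 6300.0,
               500.0, 1700.0,
               400.0, 1700.0, 6300.0,
               400.0, 500.0],
})

# One row per Rel_ID, one column per distinct value, cells are counts
output = pd.crosstab(df["Rel_ID"], df["Values"])

# Row-wise sum of the counts
output["total"] = output.sum(axis=1)
print(output)
```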

Cumulative count using grouping, sorting, and condition

I want a cumulative count of zeros only in column c, grouped by column a and sorted by b; whenever another number appears, the count resets to 1.
Here is a sample:
df = pd.DataFrame({'a':[1,1,1,1,2,2,2,2],
'b':[1,2,3,4,1,2,3,4],
'c':[10,0,0,5,1,0,1,0]}
)
I tried the following code, which works, but if zeros appear several times in a row, shift doesn't see the newly assigned values, so I have to run it as many times as the longest run of zeros:
df.loc[df.c == 0 ,'n'] = df.n.shift(1)+1
I also tried the following loop. It works on a small DataFrame, but on a large one it takes a long time and never finished:
for ind in df.index:
    if df.loc[ind, 'c'] == 0:
        df.loc[ind, 'new'] = df.loc[ind - 1, 'new'] + 1
    else:
        df.loc[ind, 'new'] = 1
The desired result
a b c n
0 1 1 10 1
1 1 2 0 2
2 1 3 0 3
3 1 4 5 1
4 2 1 1 1
5 2 2 0 2
6 2 3 1 1
7 2 4 0 2
Try using cumsum to create a group variable, then groupby.cumcount to create the new column:
df.sort_values(['a', 'b'], inplace=True)
df['n'] = df['c'].groupby([df.a, df['c'].ne(0).cumsum()]).cumcount() + 1
df
a b c n
0 1 1 10 1
1 1 2 0 2
2 1 3 0 3
3 1 4 5 1
4 2 1 1 1
5 2 2 0 2
6 2 3 1 1
7 2 4 0 2
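A runnable version of the answer, wrapped in a hypothetical helper named `count_zero_runs`. Each non-zero value in c starts a new run id via `cumsum`, zeros inherit the id of the row before them, and `cumcount` numbers the rows inside each (a, run) pair. Because a is part of the grouping key, the count also restarts when a changes, even if the new group begins with a zero.

```python
import pandas as pd

def count_zero_runs(df):
    """Hypothetical helper wrapping the groupby/cumcount answer above."""
    df = df.sort_values(["a", "b"])
    # Every non-zero c starts a new run; zeros keep the run id of the row before
    run_id = df["c"].ne(0).cumsum()
    # Number rows inside each (a, run) pair, starting at 1
    df["n"] = df.groupby([df["a"], run_id]).cumcount() + 1
    return df

df = pd.DataFrame({"a": [1, 1, 1, 1, 2, 2, 2, 2],
                   "b": [1, 2, 3, 4, 1, 2, 3, 4],
                   "c": [10, 0, 0, 5, 1, 0, 1, 0]})
print(count_zero_runs(df))
```

Unlike the row-by-row loop in the question, this stays vectorized, so it should scale to large frames.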

What is a good way to add 1 to column values if a value is greater than 2 in Python?

I want to add 1 to the flag column wherever column A is greater than 2.
Here is my DataFrame:
df=pd.DataFrame({'A':[1,1,1,1,1,1,3,2,2,2,2,2,2],'flag':[1,1,0,1,1,1,5,1,1,0,1,1,1]})
Desired output:
df=pd.DataFrame({'A':[1,1,1,1,1,1,3,2,2,2,2,2,2],'flag':[1,1,0,1,1,1,6,1,1,0,1,1,1]})
Use DataFrame.loc with a boolean mask and add 1 in place:
df.loc[df.A.gt(2), 'flag'] += 1
print (df)
A flag
0 1 1
1 1 1
2 1 0
3 1 1
4 1 1
5 1 1
6 3 6
7 2 1
8 2 1
9 2 0
10 2 1
11 2 1
12 2 1
Or:
df['flag'] = np.where(df.A.gt(2), df['flag'] + 1, df['flag'])
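Both options can be checked side by side on the question's data. Each works on a copy so the original frame stays untouched. `.loc` modifies only the selected rows, while `np.where` rebuilds the whole column, keeping the old flag wherever the condition is false.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 1, 1, 1, 3, 2, 2, 2, 2, 2, 2],
                   'flag': [1, 1, 0, 1, 1, 1, 5, 1, 1, 0, 1, 1, 1]})

# Option 1: select the matching rows with a boolean mask and add in place
via_loc = df.copy()
via_loc.loc[via_loc['A'].gt(2), 'flag'] += 1

# Option 2: build the whole column at once, keeping flag where the test fails
via_where = df.copy()
via_where['flag'] = np.where(via_where['A'].gt(2), via_where['flag'] + 1, via_where['flag'])

print(via_loc['flag'].tolist())
print(via_where['flag'].tolist())
```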
EDIT:
mean = df.groupby(pd.cut(df['x'], bins))['y'].transform('mean')
df['flag'] = np.where(mean.gt(2), df['y'] + 1, df['y'])
And then:
x= df.groupby(pd.cut(df['x'], bins))['y'].apply(lambda x:abs(x-np.mean(x)))

all possible steps by 1 or 2 to reach nth stair

I am working on a Python program where I want to find all possible ways to reach the nth stair.
Here is my program, taken from here:
# A program to count the number of ways to reach n'th stair
# Recursive program to find n'th Fibonacci number
def fib(n):
    if n <= 1:
        return n
    return fib(n-1) + fib(n-2)
# returns no. of ways to reach s'th stair
def countWays(s):
    return fib(s + 1)
# Driver program
s = 10
print("Number of ways = ", countWays(s))
This gives me the total number of ways to reach the nth stair, but I want a function that returns an array of all the possible ways.
Example:
1) For s = 3, the output should be the possible step sequences {1,1,1}, {2,1}, {1,2}.
2) s = 10 has 89 combinations:
1 1 1 1 1 1 1 1 1 1
2 1 1 1 1 1 1 1 1
1 2 1 1 1 1 1 1 1
1 1 2 1 1 1 1 1 1
2 2 1 1 1 1 1 1
1 1 1 2 1 1 1 1 1
2 1 2 1 1 1 1 1
1 2 2 1 1 1 1 1
1 1 1 1 2 1 1 1 1
2 1 1 2 1 1 1 1
1 2 1 2 1 1 1 1
1 1 2 2 1 1 1 1
2 2 2 1 1 1 1
1 1 1 1 1 2 1 1 1
2 1 1 1 2 1 1 1
1 2 1 1 2 1 1 1
1 1 2 1 2 1 1 1
2 2 1 2 1 1 1
1 1 1 2 2 1 1 1
2 1 2 2 1 1 1
1 2 2 2 1 1 1
1 1 1 1 1 1 2 1 1
2 1 1 1 1 2 1 1
1 2 1 1 1 2 1 1
1 1 2 1 1 2 1 1
2 2 1 1 2 1 1
1 1 1 2 1 2 1 1
2 1 2 1 2 1 1
1 2 2 1 2 1 1
1 1 1 1 2 2 1 1
2 1 1 2 2 1 1
1 2 1 2 2 1 1
1 1 2 2 2 1 1
2 2 2 2 1 1
1 1 1 1 1 1 1 2 1
2 1 1 1 1 1 2 1
1 2 1 1 1 1 2 1
1 1 2 1 1 1 2 1
2 2 1 1 1 2 1
1 1 1 2 1 1 2 1
2 1 2 1 1 2 1
1 2 2 1 1 2 1
1 1 1 1 2 1 2 1
2 1 1 2 1 2 1
1 2 1 2 1 2 1
1 1 2 2 1 2 1
2 2 2 1 2 1
1 1 1 1 1 2 2 1
2 1 1 1 2 2 1
1 2 1 1 2 2 1
1 1 2 1 2 2 1
2 2 1 2 2 1
1 1 1 2 2 2 1
2 1 2 2 2 1
1 2 2 2 2 1
1 1 1 1 1 1 1 1 2
2 1 1 1 1 1 1 2
1 2 1 1 1 1 1 2
1 1 2 1 1 1 1 2
2 2 1 1 1 1 2
1 1 1 2 1 1 1 2
2 1 2 1 1 1 2
1 2 2 1 1 1 2
1 1 1 1 2 1 1 2
2 1 1 2 1 1 2
1 2 1 2 1 1 2
1 1 2 2 1 1 2
2 2 2 1 1 2
1 1 1 1 1 2 1 2
2 1 1 1 2 1 2
1 2 1 1 2 1 2
1 1 2 1 2 1 2
2 2 1 2 1 2
1 1 1 2 2 1 2
2 1 2 2 1 2
1 2 2 2 1 2
1 1 1 1 1 1 2 2
2 1 1 1 1 2 2
1 2 1 1 1 2 2
1 1 2 1 1 2 2
2 2 1 1 2 2
1 1 1 2 1 2 2
2 1 2 1 2 2
1 2 2 1 2 2
1 1 1 1 2 2 2
2 1 1 2 2 2
1 2 1 2 2 2
1 1 2 2 2 2
2 2 2 2 2
Update:
I found this working code in Java, but I don't understand how to convert it to Python:
import java.util.ArrayList;
import java.util.List;
public class Stairs {  // class name is arbitrary; the snippet needs an enclosing class to compile
    public static void main(String args[]) {
        int s = 10;
        List<Integer> vals = new ArrayList<>();
        ClimbWays(s, 0, new int[s], vals);
        vals.sort(null);
        System.out.println(vals);
    }
    public static void ClimbWays(int n, int currentIndex, int[] currectClimb, List<Integer> vals) {
        if (n < 0)
            return;
        if (n == 0) {
            vals.add(currentIndex);
            int last = 0;
            for (int i = currentIndex - 1; i >= 0; i--) {
                int current = currectClimb[i];
                int res = current - last;
                last = current;
                System.out.print(res + " ");
            }
            System.out.println();
            return;
        }
        currectClimb[currentIndex] = n;
        ClimbWays(n - 1, currentIndex + 1, currectClimb, vals);
        ClimbWays(n - 2, currentIndex + 1, currectClimb, vals);
    }
}
It seems like you are looking for a modification of the partitions of a number:
import itertools as it
def partitions(n, I=1):
    yield (n,)
    for i in range(I, n//2 + 1):
        for p in partitions(n-i, i):
            yield (i,) + p
def countWays(s):
    for i in partitions(s):
        if max(i) > 2: continue  # only steps of 1 or 2 are allowed
        yield from set(it.permutations(i))  # set to remove duplicates
print(list(countWays(3)))
Displays:
[(1, 2), (2, 1), (1, 1, 1)]
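Note that this will return them in no particular order. It also builds every permutation before removing duplicates, which gets slow quickly: the all-ones partition of 10 alone has 10! = 3,628,800 permutations.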
(Partitions algorithm from here.)
Here is a conversion of your Java code into Python:
def climbWays(n, currentIndex, currentClimb, vals):
    if n < 0:
        return
    if n == 0:
        vals.append(currentIndex)
        last = 0
        for i in range(currentIndex - 1, -1, -1):
            current = currentClimb[i]
            res = current - last
            last = current
            print(res, end=" ")
        print()
        return
    currentClimb[currentIndex] = n
    climbWays(n - 1, currentIndex + 1, currentClimb, vals)
    climbWays(n - 2, currentIndex + 1, currentClimb, vals)
s = 10
vals = []
climbWays(s, 0, [0] * s, vals)
vals.sort()
print(vals)
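If the goal is to get the sequences back as a list rather than printing them, the same recursion behind the Fibonacci count can build them directly: every way to climb n stairs is either a step of 1 followed by a way to climb n - 1, or a step of 2 followed by a way to climb n - 2. A minimal sketch, where `climb_ways` is a hypothetical name:

```python
def climb_ways(n):
    """Return every sequence of 1- and 2-steps that sums to n."""
    if n == 0:
        return [[]]  # exactly one way: take no more steps
    if n < 0:
        return []    # overshot the top: not a valid way
    # First step 1, then any way to finish n - 1; or first step 2, then n - 2
    return ([[1] + rest for rest in climb_ways(n - 1)] +
            [[2] + rest for rest in climb_ways(n - 2)])

print(climb_ways(3))        # [[1, 1, 1], [1, 2], [2, 1]]
print(len(climb_ways(10)))  # 89, matching countWays(10)
```

The number of sequences grows like the Fibonacci numbers, so for large n the size of the output itself becomes the limiting factor.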

replace the first N dots of a string

I want to replace the first 14 dots of my.string with 14 zeros when region is 2. All other dots should be kept as they are.
df.1 = read.table(text = "
city county state region my.string reg1 reg2
1 1 1 1 123456789012345678901234567890 1 0
1 2 1 1 ...................34567890098 1 0
1 1 2 1 112233..............0099887766 1 0
1 2 2 1 ..............2020202020202020 1 0
1 1 1 2 ..............00.............. 0 1
1 2 1 2 ..............0987654321123456 0 1
1 1 2 2 ..............9999988888777776 0 1
1 2 2 2 ..................555555555555 0 1
", sep = "", header = TRUE, stringsAsFactors = FALSE)
df.1
I do not think this question has been asked here; sorry if it has, and sorry for not spending more time looking for a solution. A quick Google search did not turn up an answer. I asked a similar question here earlier: R: removing the last three dots from a string. Thank you for any help.
To clarify: I only want to replace the 14 consecutive dots at the far left of the string. If a string begins with a number that is followed by 14 dots, those 14 dots should remain as they are.
Here is how my.string should look afterwards:
123456789012345678901234567890
...................34567890098
112233..............0099887766
..............2020202020202020
0000000000000000..............
000000000000000987654321123456
000000000000009999988888777776
00000000000000....555555555555
Have you tried:
sub("^\\.{14}", "00000000000000", df.1$my.string )
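For conditional replacement, try the following. Note that it writes the result to a new column, mystring, which is NA outside region 2; use "my.string" as the column name to overwrite the original column instead: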
> df.1[ df.1$region ==2, "mystring"] <-
sub("^\\.{14}", "00000000000000", df.1$my.string[ df.1$region==2] )
> df.1
city county state region my.string reg1 reg2
1 1 1 1 1 123456789012345678901234567890 1 0
2 1 2 1 1 ...................34567890098 1 0
3 1 1 2 1 112233..............0099887766 1 0
4 1 2 2 1 ..............2020202020202020 1 0
5 1 1 1 2 ..............00.............. 0 1
6 1 2 1 2 ..............0987654321123456 0 1
7 1 1 2 2 ..............9999988888777776 0 1
8 1 2 2 2 ..................555555555555 0 1
mystring
1 <NA>
2 <NA>
3 <NA>
4 <NA>
5 0000000000000000..............
6 000000000000000987654321123456
7 000000000000009999988888777776
8 00000000000000....555555555555
Or with gsub (this version is applied to every row, not only region 2):
gsub('^[.]{14,14}', paste(rep(0, 14), collapse=''), df.1$my.string)
[1] "123456789012345678901234567890" "00000000000000.....34567890098" "112233..............0099887766"
[4] "000000000000002020202020202020" "0000000000000000.............." "000000000000000987654321123456"
[7] "000000000000009999988888777776" "00000000000000....555555555555"
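The ^ anchor is what leaves a string like 112233..............0099887766 untouched. To see the difference outside R, here is the same pattern with Python's re.sub on a few of the question's strings; without the anchor, the first run of 14 dots is replaced even when it follows the leading digits.

```python
import re

samples = [
    "." * 14 + "00" + "." * 14,          # ..............00..............
    "." * 18 + "555555555555",           # ..................555555555555
    "112233" + "." * 14 + "0099887766",  # 112233..............0099887766
]

# Anchored: only 14 dots at the very start of the string are replaced
anchored = [re.sub(r"^\.{14}", "0" * 14, s) for s in samples]

# Unanchored: the first run of 14 dots anywhere in the string is replaced
unanchored = [re.sub(r"\.{14}", "0" * 14, s, count=1) for s in samples]

for before, after in zip(samples, anchored):
    print(before, "->", after)
```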
dwin's answer is awesome. Here's one that's easy to understand, but not nearly as spiffy:
# restrict the substitution to only region == 2..
# then replace the 'my.string' column with..
df.1[ df.1$region == 2 , 'my.string' ] <-
# substitute.. (only the first instance!)
# (use gsub for multiple instances)
sub(
# fourteen dots
'..............' ,
# with fourteen zeroes
'00000000000000' ,
# in the same object (also restricted to region == 2)
df.1[ df.1$region == 2 , 'my.string' ] ,
# and don't use regex or anything special.
# just exactly 14 dots.
fixed = TRUE
)
A data.table solution:
require(data.table)
dt <- data.table(df.1)
# solution:
dt[, mystring := ifelse(region == 2, sub("^[.]{14}",
paste(rep(0,14), collapse=""), my.string),
my.string), by=1:nrow(dt)]
# city county state region my.string reg1 reg2 mystring
# 1: 1 1 1 1 123456789012345678901234567890 1 0 123456789012345678901234567890
# 2: 1 2 1 1 ...................34567890098 1 0 ...................34567890098
# 3: 1 1 2 1 112233..............0099887766 1 0 112233..............0099887766
# 4: 1 2 2 1 ..............2020202020202020 1 0 ..............2020202020202020
# 5: 1 1 1 2 ..............00.............. 0 1 0000000000000000..............
# 6: 1 2 1 2 ..............0987654321123456 0 1 000000000000000987654321123456
# 7: 1 1 2 2 ..............9999988888777776 0 1 000000000000009999988888777776
# 8: 1 2 2 2 ..................555555555555 0 1 00000000000000....555555555555
