When I tried to remove " from a text, I noticed that translate and tranwrd insert a blank instead. Only compress delivers the wanted result. Why do the other two replace " with a blank?
data test;
a = 'this is my:"funny text"';
b = translate(a,"",'"');
c = tranwrd(a,'"',"");
d = compress(a,'"');
run;
The results (d is the wanted one):
b: this is my: funny text
c: this is my: funny text
d: this is my:funny text
Please note the blank after the : for b and c.
I'm writing a Python 3 program to count strings in a specified field within each line of one or more csv files.
Where the csv file contains:
Field1, Field2, Field3, Field4
A, B, C, D
A, E, F, G
Z, E, C, D
Z, W, C, Q
the script is executed, for example:
$ ./script.py 1,2,3,4 file.csv
And the result is:
A 10
C 7
D 2
E 2
Z 2
B 1
Q 1
F 1
G 1
W 1
ERROR
the script is executed, for example:
$ ./script.py 1,2,3,4 file.csv file.csv file.csv
Where the error occurs:
for rowitem in reader:
    for pos in field:
        pos = rowitem[pos] ##<---LINE generating error--->##
        if pos not in fieldcnt:
            fieldcnt[pos] = 1
        else:
            fieldcnt[pos] += 1
TypeError: list indices must be integers or slices, not str
Thank you!
Judging from the output, I'd say that the fields in the csv file do not influence the count of each string. If string uniqueness is case-insensitive, remember to call yourstring.lower() so that matches differing only in case are counted as one. Also keep in mind that if your text is large, the number of unique strings can be very large as well, so some sort of sorting must be in place to make sense of it. (Otherwise you might get a long list of counts in no particular order, with a large portion of them being just 1s.)
An easy way to get a count of unique strings is the collections module.
# read the whole file into one string
with open('yourfile.txt', encoding="utf8") as file:
    a = file.read()
# if you have some words you'd like to exclude
with open('stopwords.txt') as stopfile:
    stopwords = set(line.strip() for line in stopfile)
stopwords = stopwords.union({'<media', 'omitted>', "it's", 'two', 'said'})
# make an empty key-value dict to contain matched words and their counts
wordcount = {}
for word in a.lower().split():  # use the delimiter you want (a comma I think?)
    # strip punctuation so it isn't counted as part of a word
    word = word.replace(".", "")
    word = word.replace(",", "")
    word = word.replace("\"", "")
    word = word.replace("!", "")
    if word not in stopwords:
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
That should do it. The wordcount dict should contain each word and its frequency. After that, just sort it using collections and print it out.
import collections

# wrap the dict in a Counter so most_common() can sort by frequency
word_counter = collections.Counter(wordcount)
for word, count in word_counter.most_common(20):
    print(word, ": ", count)
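The TypeError in the question says that a list is being indexed with a str, which suggests the field positions from the command line are still strings when they reach rowitem[pos]. A minimal sketch of one way around that follows, assuming the positions are 1-based and comma-separated as in the example call. The function name count_fields and the in-memory sample are hypothetical, and lowercasing the values follows the case-insensitive suggestion above.

```python
import csv
import io
from collections import Counter


def count_fields(lines, positions):
    """Count the values found at the given 1-based field positions."""
    # Command-line arguments arrive as strings ("1,2,3,4"), so convert
    # each position to a 0-based int before using it as a list index.
    indexes = [int(p) - 1 for p in positions.split(",")]
    counts = Counter()
    for row in csv.reader(lines, skipinitialspace=True):
        for i in indexes:
            if 0 <= i < len(row):  # ignore positions the row does not have
                counts[row[i].strip().lower()] += 1
    return counts


# Hypothetical in-memory stand-in for file.csv (header row omitted).
sample = io.StringIO("A, B, C, D\nA, E, F, G\nZ, E, C, D\nZ, W, C, Q\n")
for value, count in count_fields(sample, "1,2,3,4").most_common():
    print(value, count)
```

Counter adds to existing keys, so calling update() with the result for each further file would accumulate the counts across several csv files.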
I hope this solves your problem. Lemme know if you face problems.
I feel like I've searched high and low for answers to what feels like an easy issue, with no luck.
I am trying to format a number in VBA to include a specific text, just like I can do in Excel.
E.g. I have a number 3, which I want to format to show "Workday 3"
Excel: "Workday" Standard = Workday 3
Example 1: Range(A1)=Format(MyNumber, "a #") = a 3
Example 2: Range(A1)=Format(MyNumber, "# Workday") = 4 Workday
Issue: Range(A1)=Format(MyNumber, "Workday #") = 3ork2a2
Thanks!
w, d and y are special characters within number formatting. You can escape each of them with a backslash (\) to display it as a literal character.
Range("A1").Value = Format(myNumber, "\Work\da\y #")
More detail from the Format documentation:
To display a character that has special meaning as a literal character, precede it with a backslash (\)... Examples of characters that can't be displayed as literal characters are the date-formatting and time-formatting characters (a, c, d, h, m, n, p, q, s, t, w, y, /, and :)...
Note, Format returns a String - so you could just do the following:
Range("A1").Value = "Workday " & myNumber
I'm trying to clean up a column of data containing postal codes before processing the values. The data contains all kinds of crazy formatting or input like the following and is a CHAR datatype:
12345
12.345
1234-5678
12345 6789
123456789
12345-6789
.
[blank]
I would like to remove all of the special characters and have tried the following code, but my script fails after many iterations of the logic. When I say it fails, let's say sOriginalZip = '.', but it gets past my empty string check and nil check as if it is not empty even after I have replaced all special characters, control characters and space characters. So my output looks like this:
" 2 sZip5 = "
code:
nNull = nil
sZip5 = string.gsub(sOriginalZip,"%p","")
sZip5 = string.gsub(sZip5,"%c","")
sZip5 = string.gsub(sZip5,"%s","")
print("sZip5 = " .. sZip5)
if sZip5 ~= sBlank or tonumber(sZip5) ~= nNull then
    print(" 2 sZip5 = " .. sZip5)
else
    print("3 sZip5 = " .. sZip5)
end
I think there are different ways to go; the following should work:
sZip5 = string.gsub(sOriginalZip, '.', function(d) return tonumber(d) and d or '' end)
It returns a string containing only the digits, or an empty string if there are none.
Thanks! I ended up using a combination of csarr and Egor's suggestions to get this:
sZip5 = string.gsub(sOriginalZip,"%W",function(d)return tonumber(d) and d or "" end)
Looks like it is evaluating correctly. Thanks again!
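For comparison, here is a minimal Python sketch of the same digit-only filtering, with the empty case handled in one place. The function name clean_zip and the choice to return None for an empty result are assumptions made for illustration, not part of the original Lua script.

```python
import re


def clean_zip(raw):
    """Return only the digits of raw, or None if no digits remain."""
    if raw is None:
        return None
    digits = re.sub(r"[^0-9]", "", raw)  # drop every non-digit character
    return digits or None  # an empty string becomes None


# Sample inputs taken from the question.
for raw in ["12345", "12.345", "1234-5678", "12345 6789", ".", ""]:
    print(repr(raw), "->", clean_zip(raw))
```

Returning a single sentinel for "nothing usable" means the caller needs only one check instead of separate empty-string and nil tests.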
I'm trying to write a conditional statement where I can skip a specific space then start reading all the characters after it.
I was thinking of using substring, but that only works if I know the exact number of characters I want to skip. In this case, I want to skip past a specific space and read the characters after it.
For example:
String text = "ABC DEF W YZ" //number of characters before the spaces are unknown
String test = "A"
if ( test == "A") {
    return text (/*skip the first two spaces and return anything after that*/)
}
You can split your string on " " with tokenize, remove the first N elements from the returned list (where N is the number of spaces you want to skip) and join what's left with " ".
Supposing your N is 2:
String text = "ABC DEF W YZ" //number of characters before the spaces are unknown
String test = "A"
if ( test == "A") {
    return text.tokenize(" ").drop(2).join(" ")
}
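The same split, drop and join idea can be sketched in Python. The helper name skip_spaces is hypothetical. Note that split() with no argument splits on any whitespace, so it is only roughly equivalent to tokenize(" ").

```python
def skip_spaces(text, n):
    """Return what follows the first n whitespace-separated tokens."""
    # split() with no argument ignores empty tokens, so repeated spaces
    # do not produce empty entries in the list.
    return " ".join(text.split()[n:])


print(skip_spaces("ABC DEF W YZ", 2))
```

If n is larger than the number of tokens, the slice is empty and the function returns an empty string rather than raising an error.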
If I have a string, for example "Tiger," what could I write that would return T + i + g + e + r? It would be nice if I could put each letter inside of an array.
I need this because I'm writing a program that analyzes an inputted string and determines how many times repeated letters occur.
Try the String.split() method with an empty delimiter:
var str:String = "Tiger";
var letters:Array = str.split('');
//result-> ["T","i","g","e","r"]
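Since the stated goal is to find repeated letters, here is a small Python sketch of the follow-up step: split the string into letters, then count them. The function name repeated_letters is hypothetical, and treating upper and lower case as the same letter is an assumption the question does not state.

```python
from collections import Counter


def repeated_letters(word):
    """Return a dict of letters that occur more than once, with counts."""
    letters = list(word)  # one list element per character
    counts = Counter(letter.lower() for letter in letters)
    return {letter: n for letter, n in counts.items() if n > 1}


print(list("Tiger"))
print(repeated_letters("Mississippi"))
```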