Hello, my question is: how do I keep the one-item-per-line format of a string after .split has been run on it? What I want:
$test="a.b.c.d.e"
$test2="abc"
#split test
#append split to test2
#desired output
abc
a
b
c
d
e
I know that if I perform a split on a string such as
$test="a.b.c.d.e"
$splittest=$test.split(".")
$splittest
#output
a
b
c
d
e
However, when I try to append the split result above to a string:
$test2="abc"
$test2+$splittest
#output
abca b c d e
while
$splittest+$test2
#output
a
b
c
d
e
abc
Is there a way to append the split string to another string while keeping this split format, or will I have to loop through the split string with foreach and append each item to $test2 one by one?
foreach ($line in $splittest)
{
    $test2="$($test2)`n$($line)"
}
I would prefer not to use the foreach method, as it seems to slow down a script I am working on, which requires text to be split and appended over 500k times on the small end.
What you're seeing is the effect of PowerShell's operator overload resolution.
When PowerShell sees +, it needs to decide whether + means sum (1 + 1 = 2), concatenate (1 + 1 = "11"), or add (1 + 1 = [1,1]) in the given context.
It does so by looking at the type of the left hand side argument, and attempts to convert the right hand side argument to a type that the chosen operator overload expects.
When you use + in the order you need, the string value is on the left, so the result is a string concatenation: the array on the right is converted to a single string.
There are multiple ways of prepending the string to the existing array:
# Convert scalar to array before +
$newarray = @($abc) + $splittest
# Flatten items inside an array subexpression
$newarray = @($abc;$splittest)
Now all you have to do is join the strings by a newline:
$newarray -join [System.Environment]::NewLine
Or you can change the output field separator ($OFS) to a newline and have it joined implicitly:
$OFS = [System.Environment]::NewLine
"$newarray"
Finally, you could pipe the array to Out-String, but that will add a trailing newline to the entire string:
@($abc;$splittest) |Out-String
def tokenize(s):
    string = s.lower().split()
    getVals = list([val for val in s if val.isalnum()])
    result = "".join(getVals)
    print (result)
tokenize('AKKK#eastern B!##est!')
I'm trying to get the output ('akkkeastern', 'best'), but my output for the above code is AKKKeasternBest.
What changes should I make?
Using a list comprehension is a good way to filter elements out of a sequence like a string. In the example below, the list comprehension is used to build a list of characters (characters are also strings in Python) that are either alphanumeric or a space - we are keeping the space around to use later to split the list. After the filtered list is created, what's left to do is make a string out of it using join and last but not least use split to break it in two at the space.
Example:
string = 'AKKK#eastern B!##est!'
# Removes non-alphanumeric chars, but preserves spaces
filtered = [
    char.lower()
    for char in string
    if char.isalnum() or char == " "
]
# String-ifies filtered list, and splits on space
result = "".join(filtered).split()
print(result)
Output:
['akkkeastern', 'best']
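Wrapped back into the tokenize function from the question, the same filter-then-split idea might look like the sketch below. The function name comes from the question; returning a tuple (to match the output shape the question asks for) is an assumption.

```python
def tokenize(s):
    # Keep alphanumeric characters and spaces, lowercasing as we go
    filtered = [ch.lower() for ch in s if ch.isalnum() or ch == " "]
    # Join back into one string, then split on whitespace
    return tuple("".join(filtered).split())


print(tokenize('AKKK#eastern B!##est!'))
```

Returning the value instead of printing it inside the function lets the caller decide what to do with the tokens.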
I am writing a Python (version 3) program to count the strings in specified fields within each line of one or more CSV files.
Where the csv file contains:
Field1, Field2, Field3, Field4
A, B, C, D
A, E, F, G
Z, E, C, D
Z, W, C, Q
The script is executed like this, for example:
$ ./script.py 1,2,3,4 file.csv
And the result is:
A 10
C 7
D 2
E 2
Z 2
B 1
Q 1
F 1
G 1
W 1
ERROR
the script is executed, for example:
$ ./script.py 1,2,3,4 file.csv file.csv file.csv
Where the error occurs:
for rowitem in reader:
    for pos in field:
        pos = rowitem[pos] ##<---LINE generating error--->##
        if pos not in fieldcnt:
            fieldcnt[pos] = 1
        else:
            fieldcnt[pos] += 1
TypeError: list indices must be integers or slices, not str
Thank you!
Judging from the output, I'd say that the fields in the CSV file do not influence the count of each string. If string uniqueness is case-insensitive, remember to use yourstring.lower() so that matches differing only in case are counted as one. Also keep in mind that if your text is large, the number of unique strings could be very large as well, so some sort of sorting must be in place to make sense of it (or else it might be a long list of counts in arbitrary order, with a large portion of them being just 1).
To get a count of unique strings, the collections module is an easy way to go.
import collections

with open('yourfile.txt', encoding="utf8") as file:
    a = file.read()
# if you have some words you'd like to exclude
with open('stopwords.txt') as f:
    stopwords = set(line.strip() for line in f)
stopwords = stopwords.union({'<media', 'omitted>', "it's", 'two', 'said'})
# make an empty key-value dict to contain matched words and their counts
wordcount = {}
for word in a.lower().split():  # use the delimiter you want (a comma I think?)
    # strip punctuation so it isn't counted as part of a word
    word = word.replace(".", "")
    word = word.replace(",", "")
    word = word.replace("\"", "")
    word = word.replace("!", "")
    if word not in stopwords:
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
That should do it. The wordcount dict now contains each word and its frequency. After that, just sort it using collections.Counter and print it out.
word_counter = collections.Counter(wordcount)
for word, count in word_counter.most_common(20):
    print(word, ": ", count)
I hope this solves your problem. Lemme know if you face problems.
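As for the TypeError in the question itself: it comes from indexing a row with a string, because field positions read from the command line (such as 1,2,3,4) stay strings until they are converted. Below is a minimal sketch of that fix, assuming 1-based positions and comma-separated input. The names count_fields, positions and files are hypothetical and not taken from the original script, and an in-memory sample stands in for file.csv.

```python
import collections
import csv
import io


def count_fields(positions, files):
    """Count the values found at the given 1-based column positions."""
    # "1,2,3,4" -> [0, 1, 2, 3]; the int() conversion is what avoids the TypeError
    indices = [int(p) - 1 for p in positions.split(",")]
    counts = collections.Counter()
    for f in files:
        # skipinitialspace drops the blank that follows each comma
        for row in csv.reader(f, skipinitialspace=True):
            for i in indices:
                if i < len(row):
                    counts[row[i]] += 1
    return counts


# In-memory stand-in for file.csv (assumption: header row already removed)
sample = "A, B, C, D\nA, E, F, G\nZ, E, C, D\nZ, W, C, Q\n"
counts = count_fields("1,2,3,4", [io.StringIO(sample)])
for value, n in counts.most_common():
    print(value, n)
```

Passing several open files in the list accumulates the counts across all of them, which matches the multi-file invocation shown in the question.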
I want to tokenize a string up to the 3rd occurrence of some delimiter and then return the rest of the string as the last element of the resulting array.
Example:
I have a String which looks like this:
String someString = '1.22.33.4'
Now I'm tokenizing it by the delimiter '.' like this:
def (a, b, c, d) = someString.tokenize('.')
And it works, but only if there are exactly 3 dots.
Now if someone puts in more dots, like:
String someString = '1.22.33.4.55'
then it won't work, because the number of variables won't match. So I want to make sure it only tokenizes up to the 3rd dot and then gives back whatever is left. What I want to achieve in this case would be:
a = 1, b = 22, c = 33, d = 4.55
How can I do that?
You can use the version of split that takes a second argument (a limit) to restrict
the number of returned items, e.g.:
def (a,b,c,d) = '1.22.33.4.55'.split("\\.", 4)
assert ["1","22","33","4.55"] == [a,b,c,d]
Not a one liner but it works:
String someString= '1.22.33.4.55'
def stringArray = someString.tokenize('.')
def (a,b,c) = stringArray
def d = stringArray.drop(3).join('.')
println "a=$a, b=$b, c=$c, d=$d"
result:
a=1, b=22, c=33, d=4.55
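For comparison only, the same "limit the number of pieces" idea exists in Python, where str.split takes a maximum number of splits rather than a maximum number of items. This is a side sketch in Python, not Groovy, and the variable name some_string is just an adaptation of the one in the question.

```python
some_string = "1.22.33.4.55"
# maxsplit=3 performs at most three splits, so the remainder stays in the last item
a, b, c, d = some_string.split(".", 3)
print(a, b, c, d)
```

Note the off-by-one difference: a limit of 4 items in the Groovy/Java split corresponds to a maxsplit of 3 here.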
I have a list containing string patterns for the digits 0-3. I am trying to print them on the same line, so that print(digits[1]+col+digits[2]+col+digits[3]) prints '1 2 3' side by side, using the # pattern strings at the respective list indexes, but I can only get the number patterns printed one below the other.
# Create strings for each number 0-3 and store in digits list.
zero = '#'*3+'\n'+'#'+' '+'#'+'\n'+'#'+' '+'#'+'\n'+'#'+' '+'#'+'\n'+'#'*3
one = '#\n'.rjust(4)*6
two = '#'*3+'\n'+'#'.rjust(3)+'\n'+'#'*3+'\n'+'#'.ljust(3)+'\n'+'#'*3
three = '#'*3+'\n'+'#'.rjust(3)+'\n'+'#'*3+'\n'+'#'.rjust(3)+'\n'+'#'*3
digits = [zero, one, two, three]
col = '\n'.ljust(1)*6 # A divider column between each printed digit.
print(digits[1]+col+digits[2]+col+digits[3],end='')
The result of the above code.
One way to solve this is to transpose the digits: right now each index in the digits list holds a complete multi-line digit, but if we keep the rows of each digit separately, we can print the digits row by row, side by side.
I think it is better represented in code: https://repl.it/@pavanskipo/DirectTriangularSlash
# Digits split into rows so they can be printed horizontally
digits_rev = [digit.split("\n") for digit in digits]
# zip stops at the shortest digit, so each pass yields one row from every digit
for row in zip(*digits_rev):
    print('\t'.join(row))
Click on the link and hit run, and let me know if it works.
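A variation on the same row-by-row idea, as a sketch: padding every row to its digit's width with ljust keeps the columns aligned even when a row such as '#' is shorter than the others. The helper name side_by_side and its gap parameter are hypothetical, and the patterns below are written as literals rather than built with rjust/ljust as in the question.

```python
def side_by_side(patterns, gap="  "):
    """Return multi-line patterns joined row by row into one block of text."""
    # Split each pattern into rows, ignoring any trailing newline
    blocks = [p.rstrip("\n").split("\n") for p in patterns]
    height = max(len(b) for b in blocks)
    lines = []
    for r in range(height):
        cells = []
        for b in blocks:
            width = max(len(row) for row in b)
            # Pad short or missing rows so every column keeps its width
            cell = b[r] if r < len(b) else ""
            cells.append(cell.ljust(width))
        lines.append(gap.join(cells).rstrip())
    return "\n".join(lines)


two = "###\n  #\n###\n#\n###"
three = "###\n  #\n###\n  #\n###"
print(side_by_side([two, three]))
```

Unlike zip, this version does not truncate to the shortest pattern; shorter digits are filled with blank rows instead.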
I am trying to read a text file containing digits and strings using Octave. The file format is something like this:
A B C
a 10 100
b 20 200
c 30 300
d 40 400
e 50 500
but the delimiter can be space, tab, comma or semicolon. The textread function works fine if the delimiter is space/tab:
[A,B,C] = textread ('test.dat','%s %d %d','headerlines',1)
However, it does not work if the delimiter is a comma or semicolon. I tried to use dlmread:
dlmread ('test.dat',';',1,0)
but it does not work because the first column is a string.
Basically, with textread I can't specify the delimiter and with dlmread I can't specify the format of the first column. Not with the versions of these functions in Octave, at least. Has anybody ever had this problem before?
textread allows you to specify the delimiter; it honors the property arguments of strread. The following code worked for me:
[A,B,C] = textread( 'test.dat', '%s %d %d', 'delimiter', ',', 'headerlines', 1 )
I couldn't find an easy way to do this in Octave currently. You could use fopen() to loop through the file and manually extract the data. I wrote a function that would do this on arbitrary data:
function varargout = coltextread(fname, delim)
  % Initialize the variable output argument
  varargout = cell(nargout, 1);
  % Initialize elements of the cell array to nested cell arrays
  % This syntax is due to {:} producing a comma-separated list
  [varargout{:}] = deal(cell());
  fid = fopen(fname, 'r');
  while true
    % Get the current line
    ln = fgetl(fid);
    % Stop if EOF (fgetl then returns -1 instead of a string)
    if ~ischar(ln)
      break;
    endif
    % Split the line string into components and parse numbers
    elems = strsplit(ln, delim);
    nums = str2double(elems);
    nans = isnan(nums);
    % Special case of all strings (header line)
    if all(nans)
      continue;
    endif
    % Find the indices of the NaNs
    % (i.e. the indices of the strings in the original data)
    idxnans = find(nans);
    % Assign each corresponding element in the current line
    % into the corresponding cell array of varargout
    for i = 1:nargout
      % Detect if the current index is a string or a num
      if any(ismember(idxnans, i))
        varargout{i}{end+1} = elems{i};
      else
        varargout{i}{end+1} = nums(i);
      endif
    endfor
  endwhile
  % Release the file handle once all lines are read
  fclose(fid);
endfunction
It accepts two arguments: the file name, and the delimiter. The function is governed by the number of return variables that are specified, so, for example, [A B C] = coltextread('data.txt', ';'); will try to parse three different data elements from each row in the file, while A = coltextread('data.txt', ';'); will only parse the first elements. If no return variable is given, then the function won't return anything.
The function ignores rows that have all-strings (e.g. the 'A B C' header). Just remove the if all(nans)... section if you want everything.
By default, the 'columns' are returned as cell arrays, although the numbers within those arrays are actually converted numbers, not strings. If you know that a cell array contains only numbers, then you can easily convert it to a column vector with: cell2mat(A)'.