This code works, but the duplicate find seems less than optimal. Is it possible to implement the same functionality without the duplication?
def pattern = ~'some_regex'
def inFile = new File('in')
inFile.eachLine { String line ->
    if (line.find(pattern)) {
        line.find(pattern) { match ->
            ... // do something
        }
    }
    else {
        ... // do something (else)
    }
}
I'd suggest using eachMatch():
inFile.eachLine { String line ->
    String matched
    line.eachMatch( pattern ){
        matched = it[ 0 ]
        doSomethingWithMatch matched
    }
    if( !matched ) doNoMatch()
}
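If only the first match per line matters, another option is to call find once and branch on its result. The following is a minimal sketch, not a drop-in replacement: the pattern and the list of lines are hypothetical stand-ins for the regex and the file in the question.

```groovy
import java.util.regex.Pattern

// Hypothetical pattern and input lines, standing in for the question's regex and file.
Pattern pattern = ~/\d+/
def lines = ['order 42 shipped', 'no digits here']

def results = lines.collect { String line ->
    // find() returns the first match, or null when nothing matches,
    // so the regex is evaluated only once per line.
    String match = line.find(pattern)
    // Compare against null explicitly: an empty match ('') is falsy in Groovy.
    match != null ? "matched: $match".toString() : 'no match'
}

println results
```

The explicit null check matters when the pattern can match an empty string, because Groovy truth treats '' as false.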
I know how to check whether a string is contained in another string, like this:
when (myString) {
    in "FirstString" -> {
        // some stuff
    }
    in "SecondString" -> {
        // some stuff
    }
    else -> {
        // some stuff
    }
}
Under the hood, the in keyword calls CharSequence.contains(other: CharSequence, ignoreCase: Boolean = false).
My question is: is there any way to set ignoreCase = true in this case?
You can declare an ad-hoc local operator function contains for strings before when:
fun main() {
    operator fun String.contains(other: String): Boolean = this.contains(other, ignoreCase = true)
    when(myString) {
        in "firstString" -> ...
    }
}
This way, that function will be invoked for the in operator instead of the one declared in the standard library, because it is declared in the closer scope.
Note, however, that this trick works only if the original contains function is an extension. If it's a member function, it cannot be overridden with an extension.
You can use the toLowerCase() function here (lowercase() in Kotlin 1.5 and later, where toLowerCase() is deprecated):
when (myString.toLowerCase()) {
    in "firststring" -> {
        // some stuff
    }
    in "secondstring" -> {
        // some stuff
    }
    else -> {
        // some stuff
    }
}
If the when branches use variables, toLowerCase() needs to be called on each of them. If they're constants, simply writing them in lower case ("firststring", "secondstring") is enough.
I have a method that detects URLs in a string and returns both the URLs and the ranges where they can be found. Everything works perfectly until there are emojis in the string. For example:
"I'm gonna do this callenge as soon as I can swing again 😂😂😂\n http://youtu.be/SW_d3fGz1hk"
Because of the emojis, the URL extracted from the text is http://youtu.be/SW_d3fGz1 instead of http://youtu.be/SW_d3fGz1hk. I figured that the easiest solution was to just replace the emojis in the string with whitespace characters (because I need the range to be correct for some text styling stuff). The problem is that this is extremely hard to accomplish with Swift (most likely because my abilities with the Swift String API are lacking).
I've been trying to do it like this but it seems that I cannot create a string from an array of unicode points:
var emojilessStringWithSubstitution: String {
    let emojiRanges = [0x1F601...0x1F64F, 0x2702...0x27B0]
    let emojiSet = Set(emojiRanges.flatten())
    let codePoints: [UnicodeScalar] = self.unicodeScalars.map {
        if emojiSet.contains(Int($0.value)) {
            return UnicodeScalar(32)
        }
        return $0
    }
    return String(codePoints)
}
Am I approaching this problem the wrong way? Is replacing emojis the best solution here? If so, how can I do it?
Swift 5
Don't detect emojis with hardcoded ranges like this. In Swift 5 you can do it easily:
let inputText = "Some 😀string 😂😂😂 with 😹😹 😹 emoji 😎"
let textWithoutEmoji = inputText.unicodeScalars
    .filter { !$0.properties.isEmojiPresentation }
    .reduce("") { $0 + String($1) }
print(textWithoutEmoji) // Some string with emoji
You can use pattern matching (for emoji patterns) to filter out emoji characters from your String.
extension String {
    var emojilessStringWithSubstitution: String {
        let emojiPatterns = [UnicodeScalar(0x1F601)...UnicodeScalar(0x1F64F),
                             UnicodeScalar(0x2702)...UnicodeScalar(0x27B0)]
        return self.unicodeScalars
            .filter { ucScalar in !(emojiPatterns.contains{ $0 ~= ucScalar }) }
            .reduce("") { $0 + String($1) }
    }
}
/* example usage */
let str = "I'm gonna do this callenge as soon as I can swing again 😂😂😂\n http://youtu.be/SW_d3fGz1hk"
print(str.emojilessStringWithSubstitution)
/* I'm gonna do this callenge as soon as I can swing again
http://youtu.be/SW_d3fGz1hk */
Note that the above only makes use of the emoji intervals presented in your question and is in no way representative of all emojis, but the method is general and can swiftly be extended by adding further emoji intervals to the emojiPatterns array.
Reading your question again, I realize that you'd prefer substituting emojis with whitespace characters rather than removing them (which is what the filtering solution above does). We can achieve this by replacing the .filter operation above with a conditional .map operation instead, much like in your question:
extension String {
    var emojilessStringWithSubstitution: String {
        let emojiPatterns = [UnicodeScalar(0x1F600)...UnicodeScalar(0x1F64F),
                             UnicodeScalar(0x1F300)...UnicodeScalar(0x1F5FF),
                             UnicodeScalar(0x1F680)...UnicodeScalar(0x1F6FF),
                             UnicodeScalar(0x2600)...UnicodeScalar(0x26FF),
                             UnicodeScalar(0x2700)...UnicodeScalar(0x27BF),
                             UnicodeScalar(0xFE00)...UnicodeScalar(0xFE0F)]
        return self.unicodeScalars
            .map { ucScalar in
                emojiPatterns.contains{ $0 ~= ucScalar } ? UnicodeScalar(32) : ucScalar }
            .reduce("") { $0 + String($1) }
    }
}
In the above, the emoji intervals have been extended as per your comment to this post (which lists these intervals), so that the emoji check is now possibly exhaustive.
Swift 4:
extension String {
    func stringByRemovingEmoji() -> String {
        return String(self.filter { !$0.isEmoji() })
    }
}
extension Character {
    fileprivate func isEmoji() -> Bool {
        return Character(UnicodeScalar(UInt32(0x1d000))!) <= self && self <= Character(UnicodeScalar(UInt32(0x1f77f))!)
            || Character(UnicodeScalar(UInt32(0x2100))!) <= self && self <= Character(UnicodeScalar(UInt32(0x26ff))!)
    }
}
Unicode classifies emojis as symbols, and character sets are typically used in search operations, so we can use the symbols property of CharacterSet:
var emojiString = "Hey there 🙂, welcome"
emojiString = emojiString.components(separatedBy: CharacterSet.symbols).joined()
print(emojiString)
Output is
Hey there , welcome
Removing an emoji can leave two adjacent spaces where it used to be. Collapse them like this:
emojiString = emojiString.replacingOccurrences(of: "  ", with: " ")
The call above replaces every occurrence of two spaces (the of: argument) with a single space (the with: argument).
Getting all emoji is more complicated than you would think. For more info on how to figure out which characters are emoji, check out this Stack Overflow post or this article.
Building on that information, I would propose to use the extension on Character to more easily let us understand which characters are emoji. Then add a String extension to easily replace found emoji with another character.
extension Character {
    var isSimpleEmoji: Bool {
        guard let firstProperties = unicodeScalars.first?.properties else {
            return false
        }
        return unicodeScalars.count == 1 &&
            (firstProperties.isEmojiPresentation ||
                firstProperties.generalCategory == .otherSymbol)
    }
    var isCombinedIntoEmoji: Bool {
        return unicodeScalars.count > 1 &&
            unicodeScalars.contains {
                $0.properties.isJoinControl ||
                    $0.properties.isVariationSelector
            }
    }
    var isEmoji: Bool {
        return isSimpleEmoji || isCombinedIntoEmoji
    }
}
extension String {
    func replaceEmoji(with character: Character) -> String {
        return String(map { $0.isEmoji ? character : $0 })
    }
}
Using it would simply become:
"Some string 😂😂😂 with emoji".replaceEmoji(with: " ")
I found that the solutions given above did not work for certain characters such as 🏋️🏻‍♂️ and 🧰.
To find the emoji ranges, using regex I converted the full list of emoji characters to a file with just hex values. Then I converted them to decimal format and sorted them. Finally, I wrote a script to find the ranges.
Here is the final Swift extension for isEmoji().
extension Character {
    func isEmoji() -> Bool {
        let emojiRanges = [
            (8205, 11093),
            (12336, 12953),
            (65039, 65039),
            (126980, 129685)
        ]
        let codePoint = self.unicodeScalars[self.unicodeScalars.startIndex].value
        for emojiRange in emojiRanges {
            if codePoint >= emojiRange.0 && codePoint <= emojiRange.1 {
                return true
            }
        }
        return false
    }
}
For reference, here are the Python scripts I wrote to parse the hex strings to integers and then find the ranges.
convert-hex-to-decimal.py
decimals = []
with open('hex.txt') as hexfile:
    for line in hexfile:
        num = int(line, 16)
        if num < 256:
            continue
        decimals.append(num)
decimals = list(set(decimals))
decimals.sort()
with open('decimal.txt', 'w') as decimalfile:
    for decimal in decimals:
        decimalfile.write(str(decimal) + "\n")
make-ranges.py
first_line = True
range_start = 0
prev = 0
with open('decimal.txt') as decimalfile:
    for line in decimalfile:
        if first_line:
            prev = int(line)
            range_start = prev
            first_line = False
            continue
        curr = int(line)
        if prev + 1000 < curr:  # the gap of 1000 is arbitrary; it reduces the number of ranges
            print("(" + str(range_start) + ", " + str(prev) + ")")
            range_start = curr
        prev = curr
# Print the last range, which the loop itself never closes
if not first_line:
    print("(" + str(range_start) + ", " + str(prev) + ")")
Don't hard-code the emoji ranges; use this instead:
func 去除表情符号(字符串: String) -> String {
    let 转换为Unicode = 字符串.unicodeScalars // https://developer.apple.com/documentation/swift/string
    let 去除表情后的结果 = 转换为Unicode.filter { (item) -> Bool in
        // Note: isEmoji is also true for the digits 0-9, # and *
        let 判断是否表情 = item.properties.isEmoji
        return !判断是否表情 // drop the scalar if it is an emoji
    }
    return String(去除表情后的结果)
}
I have a class that parses a CSV-based file, but I would like to add a parameter for the separator (token) symbol.
Please let me know how I can change the function and how to call it from my program.
class CSVParser{
    static def parseCSV(file,closure) {
        def lineCount = 0
        file.eachLine() { line ->
            def field = line.tokenize(';')
            lineCount++
            closure(lineCount,field)
        }
    }
}
use(CSVParser.class) {
    File file = new File("test.csv")
    file.parseCSV { index,field ->
        println "row: ${index} | ${field[0]} ${field[1]} ${field[2]}"
    }
}
You'll have to add the parameter in between the file and closure parameters.
When you create a category class with static methods, the first parameter is the object the method is being called on so file must be first.
Having a closure as the last parameter allows the syntax where the open brace of the closure follows the function invocation without parentheses.
Here's how it would look:
class CSVParser{
    static def parseCSV(file,separator,closure) {
        def lineCount = 0
        file.eachLine() { line ->
            def field = line.tokenize(separator)
            lineCount++
            closure(lineCount,field)
        }
    }
}
use(CSVParser) {
    File file = new File("test.csv")
    file.parseCSV(',') { index,field ->
        println "row: ${index} | ${field[0]} ${field[1]} ${field[2]}"
    }
}
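To try the parameter ordering without a test.csv on disk, the same category shape can be sketched against a String. The LineParser class and its parseLines method below are hypothetical names made up for illustration; only the layout (receiver first, separator in the middle, closure last) is taken from the answer above.

```groovy
class LineParser {
    // Category method: the first parameter is the receiver, the closure comes last.
    static void parseLines(String text, String separator, Closure closure) {
        text.readLines().eachWithIndex { line, i ->
            closure(i + 1, line.tokenize(separator))
        }
    }
}

def rows = []
use(LineParser) {
    // The trailing closure is written after the parenthesised arguments.
    "a,b,c\nd,e,f".parseLines(',') { index, fields ->
        rows << [index, fields]
    }
}

println rows
```

Note that tokenize treats every character of the separator string as a delimiter and drops empty tokens, so rows with empty fields will yield fewer elements.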
Just add the separator as the second parameter to the parseCSV method:
class CSVParser{
    static def parseCSV(file, sep, closure) {
        def lineCount = 0
        file.eachLine() { line ->
            def field = line.tokenize(sep)
            closure(++lineCount, field)
        }
    }
}
use(CSVParser.class) {
    File file = new File("test.csv")
    file.parseCSV(";") { index,field ->
        println "row: ${index} | ${field[0]} ${field[1]} ${field[2]}"
    }
}
Imagine I have this structure:
class Foo {
    String bar
}
Now imagine I have several instances of Foo whose bar values are baz_1, baz_2, and zab_3.
I want to write a collect statement that only collects the bar values which contain the text baz. I cannot get it to work, but it would look something like this:
def barsOfAllFoos = Foo.getAll().bar
assert barsOfAllFoos == [ 'baz_1', 'baz_2', 'zab_3' ]
def barsWithBaz = barsOfAllFoos.collect{ if( it.contains( "baz" ) { it } ) } // What is the correct syntax for this?
assert barsWithBaz == [ 'baz_1', 'baz_2' ]
You need findAll:
barsOfAllFoos.findAll { it.contains 'baz' }
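A small sketch of the difference between the two methods, assuming a plain list of strings as a stand-in for Foo.getAll().bar from the question:

```groovy
// Hypothetical stand-in for Foo.getAll().bar from the question.
def barsOfAllFoos = ['baz_1', 'baz_2', 'zab_3']

// collect always produces one result per element, so non-matching
// elements show up as null instead of being dropped.
def collected = barsOfAllFoos.collect { it.contains('baz') ? it : null }

// findAll keeps only the elements for which the closure is true.
def barsWithBaz = barsOfAllFoos.findAll { it.contains('baz') }

println collected
println barsWithBaz
```

This is why collect alone cannot express the filter: it transforms, it does not remove.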
If you want to both filter and transform, there are lots of ways to do this. From Groovy 1.8.1 on, I'd go with #findResults and a closure that returns null for the elements I want to skip.
def frob(final it) { "frobbed $it" }
final barsWithBaz = barsOfAllFoos.findResults {
    it.contains('baz')? frob(it) : null
}
In earlier versions you can use #findAll and #collect:
final barsWithBaz = barsOfAllFoos
    . findAll { it.contains('baz') }
    . collect { frob(it) }
Or #sum:
final barsWithBaz = barsOfAllFoos.sum([]) {
    it.contains('baz')? [frob(it)] : []
}
Or #inject:
final barsWithBaz = barsOfAllFoos.inject([]) {
    l, it -> it.contains('baz')? l << frob(it) : l
}
Using findResults did not work for me... If you want to collect a transformed version of the values matching the condition (for instance a regex search of many lines) you can use collect followed by find or findAll as follows.
def html = """
<p>this is some example data</p>
<script type='text/javascript'>
form.action = 'http://www.example.com/'
// ...
</script>
"""
println("Getting url from html...")
// Extract the url needed to upload the form
def url = html.split("\n").collect{line->
    def m = line =~/.*form\.action = '(.+)'.*/
    if (m.matches()) {
        println "Match found!"
        return m[0][1]
    }
}.find()
println "url = '${url}'"
This returns the part of the line matching the given pattern.
Getting url from html...
Match found!
url = 'http://www.example.com/'
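When only the first matching line is needed, findResult (singular) might be a shorter route than collect followed by find, since it stops at the first non-null closure result. This is a sketch under that assumption, reusing the sample HTML and the regex from the answer above:

```groovy
// Sample input standing in for the html string used in the answer above.
def html = """
<p>this is some example data</p>
<script type='text/javascript'>
form.action = 'http://www.example.com/'
</script>
"""

// findResult returns the first non-null value produced by the closure
// and does not visit the remaining lines.
def url = html.readLines().findResult { line ->
    def m = line =~ /.*form\.action = '(.+)'.*/
    m.matches() ? m[0][1] : null
}

println "url = '${url}'"
```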
I'm trying to work with these methods with no success, and I'd be happy if someone could help me.
I'm using Groovy and I have 2 maps of strings.
I want to match the strings of the 2 maps using threads (with GPars).
For example:
def firstMap = ["a":"A", "b":"B"]
def secondMap = ["c":"C", "a":"A"]
The normal way to compare the maps is:
firstMap.findAll().each { first ->
    secondMap.findAll().each { second ->
        if (first.key.equals(second.key) && first.value.equals(second.value)) {
            //saveItIntoArray
        }
    }
}
I want to do it with GPars threads, so I tried:
withPool(2) {
    runForkJoin(firstMap) { task ->
        task.each {
            secondMap.each {
                //equals
            }
            forChild(?)
        }
    }
}
I'm kind of new to this and I really don't know how to make it work.
I'd appreciate any help.
Thanks,
Or.
What I'd suggest is using parallel collections:
import static groovyx.gpars.GParsPool.withPool

def firstMap = ["a":"A", "b":"B"]
def secondMap = ["c":"C", "a":"A"].asImmutable()
withPool {
    println firstMap.findAllParallel { fk, fv ->
        secondMap.findResult { sk, sv -> fk == sk && fv == sv ? [(fk):fv] : null }
    }
}
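As a baseline to compare the parallel result against, here is a sequential sketch that needs no GPars at all. It looks each key up directly in the second map instead of scanning every entry, which may already be fast enough for small maps. The variable name common is made up for illustration.

```groovy
def firstMap = [a: 'A', b: 'B']
def secondMap = [c: 'C', a: 'A']

// Keep the entries of firstMap whose key exists in secondMap with an equal value.
// containsKey guards against a null value matching a missing key.
def common = firstMap.findAll { key, value ->
    secondMap.containsKey(key) && secondMap[key] == value
}

println common
```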