Map repeated values in Presto

I'm extracting data from JSON and mapping two arrays in Presto. It works fine when there are no repeated values in the array, but fails with the error "Duplicate map keys are not allowed" if any of the values are repeated. I need those values and cannot remove any of them from the array. Is there a workaround for this scenario?
Sample values:
array1 -- [Rewards,NEW,Rewards,NEW]
array2 -- [losg1,losg2,losg3,losg4]
The key/value map has to be generated like this: [Rewards=>losg1,NEW=>losg2,Rewards=>losg3,NEW=>losg4]

A map cannot hold duplicate keys, but the associations can be returned as an array of pairs (rows) instead:
SELECT ARRAY[ROW('Rewards', 'losg1'), ROW('NEW', 'losg2'), ROW('Rewards', 'losg3')]
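As a rough illustration of why pairs work where a map does not, here is a minimal Python sketch. It is an analogy only, not Presto itself, and the variable names are made up for this example. A map keeps one value per key, whereas a list of pairs keeps every association in order:

```python
# Sample values from the question.
array1 = ["Rewards", "NEW", "Rewards", "NEW"]
array2 = ["losg1", "losg2", "losg3", "losg4"]

# A list of (key, value) pairs keeps repeated keys and their order.
pairs = list(zip(array1, array2))

# A Python dict silently keeps only the last value for each repeated key
# (the Presto map in the question fails with an error instead).
as_dict = dict(pairs)

print(pairs)
print(as_dict)
```

The list of pairs still has four entries, while the dict is left with only two.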

Related

Sorting based on certain value in a string.

I have a file with contents like this:
666500872101_002.log
738500861101_003.log
738500861101_002.log
666500872101_001.log
741500881101_001.log
738500861101_001.log
741500881101_002.log
666500872101_003.log
741500881101_003.log
666500872101_004.log
I need to sort the rows first on the value in characters 5 to 8 (for example, 0088 in 741500881101_003.log) and then on the part number of the log (for example, 003 in 741500881101_003.log), to get something like this:
738500861101_001.log
738500861101_002.log
738500861101_003.log
666500872101_001.log
666500872101_002.log
666500872101_003.log
666500872101_004.log
741500881101_001.log
741500881101_002.log
741500881101_003.log
I can't get any good results using sort. Please help.
You can use the sort command with the following options:
sort -n -k1.5,1.8 -n -k1.14,1.16 fileToSort.log
Options:
-n for numerical sorting
-k1.5,1.8 and -k1.14,1.16 to define your sorting keys
Example:
$ sort -n -k1.5,1.8 -n -k1.14,1.16 fileToSort
738500861101_001.log
738500861101_002.log
738500861101_003.log
666500872101_001.log
666500872101_002.log
666500872101_003.log
666500872101_004.log
741500881101_001.log
741500881101_002.log
741500881101_003.log
I solved this problem as part of learning Spark. I am not a UNIX shell programmer, so I thought of solving the problem using Spark:
val logList = Array("666500872101_002.log","738500861101_003.log","738500861101_002.log","666500872101_001.log","741500881101_001.log","738500861101_001.log","741500881101_002.log","666500872101_003.log","741500881101_003.log","666500872101_004.log")
val logListRDD = sc.parallelize(logList)
logListRDD.map(x=>((x.substring(4,8), x.slice(x.indexOfSlice("_") +1, x.indexOfSlice("."))),x)).sortByKey().values.collect.take(20)
Output:
Array[String] = Array(738500861101_001.log, 738500861101_002.log, 738500861101_003.log, 666500872101_001.log, 666500872101_002.log, 666500872101_003.log, 666500872101_004.log, 741500881101_001.log, 741500881101_002.log, 741500881101_003.log)
Explaining what I did:
sc.parallelize(logList) - creates an RDD, which is the core abstraction of Spark.
map(x=>((x.substring(4,8), x.slice(x.indexOfSlice("_") +1, x.indexOfSlice("."))),x)) - extracts the sort fields from each file name and generates a key/value pair. The value is the ***.log name, and the key is a tuple of the substrings we want to sort on, e.g. (0086, 001). A key/value pair looks like ((0086, 001), 738500861101_001.log).
sortByKey() - sorts the data by the key generated above.
values - keeps only the value of each key/value pair.
collect.take(20) - collects the results to the driver and takes up to 20 of them for display.
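For comparison, the same key-extraction idea can be sketched in plain Python, without Spark. This is only an illustration: the helper name sort_key is made up, and it assumes every file name has the layout shown in the question (at least 8 leading characters, then "_", a part number, and ".log").

```python
log_list = [
    "666500872101_002.log", "738500861101_003.log", "738500861101_002.log",
    "666500872101_001.log", "741500881101_001.log", "738500861101_001.log",
    "741500881101_002.log", "666500872101_003.log", "741500881101_003.log",
    "666500872101_004.log",
]


def sort_key(name):
    """Return (characters 5 to 8, part number) as integers for sorting."""
    # Characters 5 to 8 (1-based) of the file name, e.g. "0086".
    group = int(name[4:8])
    # Part number between "_" and ".", e.g. "001".
    part = int(name[name.index("_") + 1:name.index(".")])
    return (group, part)


sorted_logs = sorted(log_list, key=sort_key)
print("\n".join(sorted_logs))
```

Sorting on a tuple compares the first element and falls back to the second on ties, which is the same ordering that sortByKey() applies to the tuple key above.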

Is a list in dictionary values?

a = {0:[[1,2,3], [1,3,4,5]]}
print([1,2,3] in a.values())
I get False, but I need True because this list is inside one of the values. Is it possible to check all the lists nested in the dictionary's values? Maybe without loops?
Since you're using Python 3, you can do this:
[1,2,3] in list(a.values())[0]
a.values() returns a dictionary view (see "dictionary views" in the Python documentation).
You can wrap the dictionary view in a list, but here that list contains only one element, which is accessed by index 0. If your dictionary contains several keys, list(a.values()) will contain one element (the mapped value) per key.
Note that when you use the some_value in some_collection construct without an explicit loop, Python still iterates through the collection.
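If the dictionary can have several keys, a minimal sketch like the following checks every value rather than only index 0. The second dictionary b and the names target and found_in_* are made up for illustration. any() still iterates under the hood; it just hides the loop in a generator expression.

```python
a = {0: [[1, 2, 3], [1, 3, 4, 5]]}
b = {0: [[1, 2]], 1: [[7, 8], [1, 2, 3]]}  # hypothetical multi-key dictionary
target = [1, 2, 3]

# True as soon as any value (a list of lists) contains the target list.
found_in_a = any(target in value for value in a.values())
found_in_b = any(target in value for value in b.values())
missing = any([9, 9] in value for value in a.values())

print(found_in_a, found_in_b, missing)
```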

Is there a simple way to remove sublists

I have a list (rs_data) with sublists obtained from a DataFrame, and some rows of the DataFrame contain multiple elements, like these:
print(rs_data)
rs1791690, rs1815739, rs2275998
rs6552828
rs1789891
rs1800849, rs2016520, rs2010963, rs4253778
rs1042713, rs1042714, rs4994, rs1801253
I want to obtain a collection in which each element (rs…) is separate, something like this:
{'rs1791690', 'rs1815739', 'rs227599', 'rs401681', 'rs2180062', 'rs9018'….}
How can I eliminate the sublists, or generate a new list without sublists in which each element is unique?
To generate a new list you could iterate over the old one and throw out the elements you don't like.
Something like this:
for i in rs_data:
    if i in bad_values:
        pass  # do something
    else:
        pass  # do something else
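If you just want to eliminate duplicates, it is best to use a set. Note that set elements must be hashable, so this works on a flat list of strings but not on a list that still contains sublists.
Like this: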
mynewset = set(rs_data)
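To also split the multi-element rows, one option is a set comprehension. The sketch below assumes that each entry of rs_data is a single comma-separated string, as the printed output in the question suggests. That shape is an assumption, and the names unique_ids and flat_sorted are made up here.

```python
# Assumed shape: one comma-separated string per DataFrame row.
rs_data = [
    "rs1791690, rs1815739, rs2275998",
    "rs6552828",
    "rs1789891",
    "rs1800849, rs2016520, rs2010963, rs4253778",
    "rs1042713, rs1042714, rs4994, rs1801253",
]

# Split every row on commas, strip surrounding whitespace,
# and collect the pieces into a set so that each ID appears once.
unique_ids = {rs.strip() for row in rs_data for rs in row.split(",")}

# A sorted list, in case a list is needed rather than a set.
flat_sorted = sorted(unique_ids)
print(flat_sorted)
```

If the entries are real Python lists instead of strings, the same idea becomes {rs for sub in rs_data for rs in sub}.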

How to convert a string array into floats in PHP

I'm selecting some strings from my SQLite DB and storing them in an array. Now I want to plot the values with the framework "Razorflow". I think that is only possible if the values of the array are floats; am I wrong?
In my DB I'm storing temperature and humidity of 12 different sensor nodes, in this form:
id|temperature|humidity|...|...
1 | 22.50C| 47.50%|...|...
..
I heard something about the floatval() function, but I also heard that this function is not made for objects like arrays. Is there a simple solution? I'm very new to PHP :-D
Depends on how you're getting the values back from the database.
id|temperature|humidity|...|...
That looks like a string to me, so I would first explode it into an array and then iterate over the array, applying floatval to every element.
What you do after that is up to you.
EDIT:
If your query is:
$humi = $database->query("SELECT humidity FROM Measurement WHERE topic_hum='WSN9/humi'");
You will get back only 1 column (humidity) and 0 or more rows depending on your database. Let's say it's only 1 for now:
$resul = mysql_query($humi,$link);
$rows = mysql_fetch_array($resul);
$myarray = explode("|", $rows["humidity"]);
This should give us an array called $myarray containing X elements, each holding a single string value. We can then iterate over it and parse each element. There is also the shorthand array_map, which applies a callback function to each element of an array and returns the results as a new array:
$myparsedarray = array_map('floatval', $myarray);
Now you have an array containing only float values (and possibly some bad conversions, so check your data!).
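The sample values in the question carry units (22.50C, 47.50%), so it is worth checking how the conversion treats them. As a language-neutral illustration, here is a minimal Python sketch of the same "map a converter over the array" idea. The helper to_float and the sample list are made up for this example. Python's float() rejects a trailing unit, so the sketch extracts the leading number first.

```python
import re

readings = ["22.50C", "47.50%"]  # sample values in the question's format


def to_float(text):
    """Parse the leading number of a reading and ignore a trailing unit."""
    match = re.match(r"\s*[-+]?\d+(?:\.\d+)?", text)
    if match is None:
        # Fail loudly instead of silently producing a bad value.
        raise ValueError(f"no leading number in {text!r}")
    return float(match.group())


# Equivalent in spirit to mapping a conversion function over the array.
values = [to_float(r) for r in readings]
print(values)
```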

How to combine a list of objects to a map of list with a custom key based on a field value in Groovy?

Hi, I'm new to Groovy and have been trying this out, but could not come up with a correct solution.
Basically, I have a list of objects that I need to group by part of a specific field and put in a map with a transformed key. In the example below, I need to group the objects by the first four characters of the third field (e.g. key3, key4). All key3 objects and all key4 objects should end up in one map, with key3 and key4 as the keys and the original objects collected in a list under each key.
Foo[] foo = [
["field1a", "field2a", "key3a"],
["field1b", "field2b", "key3b"],
["field1c", "field2c", "key4c"]
]
into
result = [
"key3":[
["field1a", "field2a", "key3a"],
["field1b", "field2b", "key3b"]
],
"key4":[
["field1c", "field2c", "key4c"]
]
]
So far I've been able to get the unique keys by using a combination of collect(), substring() and unique(), but I am unable to build the map properly. I've used collectEntries(), but it only creates a map of the object and not a map of lists.
If anyone can point me in the right direction, it would really be a big help. Thanks!
Use groupBy:
assert foo.groupBy { it[-1][0..-2] } == [
key3:[
['field1a', 'field2a', 'key3a'],
['field1b', 'field2b', 'key3b']
],
key4:[
['field1c', 'field2c', 'key4c']
]
]
Explanation:
groupBy groups the lists by the value the closure returns. it[-1] is the third (last) element of each list, and [0..-2] drops its final character, so key3a becomes key3; hence it[-1][0..-2].
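For readers more familiar with Python, the same grouping can be sketched with a defaultdict. This is an illustration of the idea, not a Groovy replacement, and it assumes that the key is always the last field minus its final character.

```python
from collections import defaultdict

foo = [
    ["field1a", "field2a", "key3a"],
    ["field1b", "field2b", "key3b"],
    ["field1c", "field2c", "key4c"],
]

grouped = defaultdict(list)
for row in foo:
    # Key: last field without its final character, e.g. "key3a" -> "key3".
    grouped[row[-1][:-1]].append(row)

result = dict(grouped)
print(result)
```

Each row is appended under its derived key, so rows that share a key end up in the same list, in their original order.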
