Add a variable with an alphanumeric code in order of value
I have data from counties, and for a pseudonymized plot I want to add an alphanumeric code in the order of a sort variable. It is not so important what the code will look like, but I want to have a letter at the beginning so that it will not be confused with the numeric information in the chart.
In the original data, I have more than 26 observations. Therefore the code needs to have two digits.
# example data
county <- c("all", "Berkshire", "Blackpool", "Bournemouth", "Bristol",
"Cambridgeshire", "Cheshire", "Devon", "Dorset", "Essex",
"Gloucestershire", "Hampshire", "Kent", "Lincolnshire",
"Norfolk", "Oxfordshire", "Suffolk", "Wiltshire", "Worcestershire",
"Yorkshire")
sort <- c(-2, 16.5, 400, 331, 375.2, 13.1, 400, 376.4,
128.3, 400, 48.6, 6.7, 113.5, 43.7, 295.9,400,
261.5, 100, 183.3, 400)
df <- data.frame(county, sort)
This is how I would like the result to look:
Why does my PySpark regular expression not give more than the first row?
Taking inspiration from this answer: https://stackoverflow.com/a/61444594/4367851 I have been able to split my .txt file into columns in a Spark DataFrame. However, it only gives me the first game - even though the sample .txt file contains many more.
My code:
basefile = spark.sparkContext.wholeTextFiles("example copy 2.txt").toDF().\
    selectExpr("""split(replace(regexp_replace(_2, '\\\\n', ','), ""),",") as new""").\
    withColumn("Event", col("new")[0]).\
    withColumn("White", col("new")[2]).\
    withColumn("Black", col("new")[3]).\
    withColumn("Result", col("new")[4]).\
    withColumn("UTCDate", col("new")[5]).\
    withColumn("UTCTime", col("new")[6]).\
    withColumn("WhiteElo", col("new")[7]).\
    withColumn("BlackElo", col("new")[8]).\
    withColumn("WhiteRatingDiff", col("new")[9]).\
    withColumn("BlackRatingDiff", col("new")[10]).\
    withColumn("ECO", col("new")[11]).\
    withColumn("Opening", col("new")[12]).\
    withColumn("TimeControl", col("new")[13]).\
    withColumn("Termination", col("new")[14]).\
    drop("new")
basefile.show()
Output:
+--------------------+---------------+-----------------+--------------+--------------------+--------------------+-----------------+-----------------+--------------------+--------------------+-----------+--------------------+--------------------+--------------------+
|               Event|          White|            Black|        Result|             UTCDate|             UTCTime|         WhiteElo|         BlackElo|     WhiteRatingDiff|     BlackRatingDiff|        ECO|             Opening|         TimeControl|         Termination|
+--------------------+---------------+-----------------+--------------+--------------------+--------------------+-----------------+-----------------+--------------------+--------------------+-----------+--------------------+--------------------+--------------------+
|[Event "Rated Cla...|[White "BFG9k"]|[Black "mamalak"]|[Result "1-0"]|[UTCDate "2012.12...|[UTCTime "23:01:03"]|[WhiteElo "1639"]|[BlackElo "1403"]|[WhiteRatingDiff ...|[BlackRatingDiff ...|[ECO "C00"]|[Opening "French ...|[TimeControl "600...|[Termination "Nor...|
+--------------------+---------------+-----------------+--------------+--------------------+--------------------+-----------------+-----------------+--------------------+--------------------+-----------+--------------------+--------------------+--------------------+
Input file:
[Event "Rated Classical game"]
[Site "https://lichess.org/j1dkb5dw"]
[White "BFG9k"]
[Black "mamalak"]
[Result "1-0"]
[UTCDate "2012.12.31"]
[UTCTime "23:01:03"]
[WhiteElo "1639"]
[BlackElo "1403"]
[WhiteRatingDiff "+5"]
[BlackRatingDiff "-8"]
[ECO "C00"]
[Opening "French Defense: Normal Variation"]
[TimeControl "600+8"]
[Termination "Normal"]
1. e4 e6 2. d4 b6 3. a3 Bb7 4. Nc3 Nh6 5. Bxh6 gxh6 6. Be2 Qg5 7. Bg4 h5 8. Nf3 Qg6 9. Nh4 Qg5 10. Bxh5 Qxh4 11. Qf3 Kd8 12. Qxf7 Nc6 13. Qe8# 1-0
[Event "Rated Classical game"]
. . .
Each game starts with [Event so I feel like it should be doable as the file has repeating structure, alas I can't get it to work.
Extra points: I don't actually need the move list so if it's easier they can be deleted. I only want the content of what is inside the " " for each new line once it has been converted to a Spark DataFrame.
Many thanks.
wholeTextFiles reads each file into a single record. If you read only one file, the result will be an RDD with only one row, containing the whole text file. The regexp logic in the question returns only one result per row, and this will be the first entry in the file.
Probably the best solution would be to split the file at the OS level into one file per game (for example here) so that Spark can read the multiple games in parallel. But if a single file is not too big, splitting the games can also be done within PySpark.
Read the file(s):
basefile = spark.sparkContext.wholeTextFiles(<....>).toDF()
Create a list of columns and convert this list into a list of column expressions using regexp_extract:
from pyspark.sql import functions as F
cols = ['Event', 'White', 'Black', 'Result', 'UTCDate', 'UTCTime', 'WhiteElo', 'BlackElo', 'WhiteRatingDiff', 'BlackRatingDiff', 'ECO', 'Opening', 'TimeControl', 'Termination']
cols = [F.regexp_extract('game', rf'{col} \"(.*)\"',1).alias(col) for col in cols]
Extract the data:
- split the whole file into an array of games
- explode this array into single records
- delete the line breaks within each record so that the regular expression works
- use the column expressions defined above to extract the data
basefile.selectExpr("split(_2,'\\\\[Event ') as game") \
    .selectExpr("explode(game) as game") \
    .withColumn("game", F.expr("concat('Event ', replace(game, '\\\\n', ''))")) \
    .select(cols) \
    .show(truncate=False)
Output (for an input file containing three copies of the game):
+---------------------+-----+-------+------+----------+--------+--------+--------+---------------+---------------+---+--------------------------------+-----------+-----------+
|Event                |White|Black  |Result|UTCDate   |UTCTime |WhiteElo|BlackElo|WhiteRatingDiff|BlackRatingDiff|ECO|Opening                         |TimeControl|Termination|
+---------------------+-----+-------+------+----------+--------+--------+--------+---------------+---------------+---+--------------------------------+-----------+-----------+
|Rated Classical game |BFG9k|mamalak|1-0   |2012.12.31|23:01:03|1639    |1403    |+5             |-8             |C00|French Defense: Normal Variation|600+8      |Normal     |
|Rated Classical game2|BFG9k|mamalak|1-0   |2012.12.31|23:01:03|1639    |1403    |+5             |-8             |C00|French Defense: Normal Variation|600+8      |Normal     |
|Rated Classical game3|BFG9k|mamalak|1-0   |2012.12.31|23:01:03|1639    |1403    |+5             |-8             |C00|French Defense: Normal Variation|600+8      |Normal     |
+---------------------+-----+-------+------+----------+--------+--------+--------+---------------+---------------+---+--------------------------------+-----------+-----------+
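The split-then-extract idea can be tried out without a Spark session. The following is only a rough sketch in plain Python using the re module, not the PySpark code from the answer: the sample text is a shortened, made-up two-game file, and parse_games and FIELDS are hypothetical names introduced for illustration.

```python
import re

# Made-up, shortened sample with two games (assumption: real files have more tags).
SAMPLE = '''[Event "Rated Classical game"]
[White "BFG9k"]
[Black "mamalak"]
[Result "1-0"]
[WhiteElo "1639"]

1. e4 e6 2. d4 b6 3. a3 Bb7 1-0

[Event "Rated Classical game2"]
[White "BFG9k"]
[Black "mamalak"]
[Result "1-0"]
[WhiteElo "1639"]

1. e4 e6 1-0
'''

FIELDS = ["Event", "White", "Black", "Result", "WhiteElo"]


def parse_games(text, fields=FIELDS):
    """Split the text into games at each [Event tag and pull out the quoted values."""
    # The lookahead keeps the [Event tag at the start of every chunk.
    chunks = [c for c in re.split(r'(?=\[Event ")', text) if c.strip()]
    games = []
    for chunk in chunks:
        row = {}
        for field in fields:
            # [^"]* stops at the closing quote, so only the tag's own value is captured.
            match = re.search(rf'\[{field} "([^"]*)"\]', chunk)
            row[field] = match.group(1) if match else None
        games.append(row)
    return games


games = parse_games(SAMPLE)
for game in games:
    print(game)
```

Because each pattern requires a space and a quote directly after the tag name, "White" does not accidentally match the "WhiteElo" line. The move list is simply ignored, since no pattern refers to it.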
python3: Split time series by diurnal periods
I have the following dataset:
01/05/2020,00,26.3,27.5,26.3,80,81,73,22.5,22.7,22.0,993.7,993.7,993.0,0.0,178,1.2,-3.53,0.0
01/05/2020,01,26.1,26.8,26.1,79,80,75,22.2,22.4,21.9,994.4,994.4,993.7,1.1,22,2.0,-3.54,0.0
01/05/2020,02,25.4,26.1,25.4,80,81,79,21.6,22.3,21.6,994.7,994.7,994.4,0.1,335,2.3,-3.54,0.0
01/05/2020,03,23.3,25.4,23.3,90,90,80,21.6,21.8,21.5,994.7,994.8,994.6,0.9,263,1.5,-3.54,0.0
01/05/2020,04,22.9,24.2,22.9,89,90,86,21.0,22.1,21.0,994.2,994.7,994.2,0.3,268,2.0,-3.54,0.0
01/05/2020,05,22.8,23.1,22.8,90,91,89,21.0,21.4,20.9,993.6,994.2,993.6,0.7,264,1.5,-3.54,0.0
01/05/2020,06,22.2,22.8,22.2,92,92,90,20.9,21.2,20.8,993.6,993.6,993.4,0.8,272,1.6,-3.54,0.0
01/05/2020,07,22.6,22.6,22.0,91,93,91,21.0,21.2,20.7,993.4,993.6,993.4,0.4,284,2.3,-3.49,0.0
01/05/2020,08,21.6,22.6,21.5,92,92,90,20.2,20.9,20.1,993.8,993.8,993.4,0.4,197,2.1,-3.54,0.0
01/05/2020,09,22.0,22.1,21.5,92,93,92,20.7,20.8,20.2,994.3,994.3,993.7,0.0,125,2.1,-3.53,0.0
01/05/2020,10,22.7,22.7,21.9,91,92,91,21.2,21.2,20.5,995.0,995.0,994.3,0.0,354,0.0,70.99,0.0
01/05/2020,11,25.0,25.0,22.7,83,91,82,21.8,22.1,21.1,995.5,995.5,995.0,0.8,262,1.5,744.8,0.0
01/05/2020,12,27.9,28.1,24.9,72,83,70,22.3,22.8,21.6,996.1,996.1,995.5,0.7,228,1.9,1392.,0.0
01/05/2020,13,30.4,30.4,27.7,58,72,55,21.1,22.6,20.4,995.9,996.2,995.9,1.6,134,3.7,1910.,0.0
01/05/2020,14,31.7,32.3,30.1,50,58,48,20.2,21.3,19.7,995.8,996.1,995.8,3.0,114,5.4,2577.,0.0
01/05/2020,15,32.9,33.2,31.8,44,50,43,19.1,20.5,18.6,994.9,995.8,994.9,0.0,128,5.6,2853.,0.0
01/05/2020,16,33.2,34.4,32.0,46,48,41,20.0,20.0,18.2,994.0,994.9,994.0,0.0,125,4.3,2700.,0.0
01/05/2020,17,33.1,34.5,32.7,44,46,39,19.2,19.9,18.5,993.4,994.1,993.4,0.0,170,1.6,2806.,0.0
01/05/2020,18,33.6,34.2,32.6,41,47,40,18.5,20.0,18.3,992.6,993.4,992.6,0.0,149,0.0,2319.,0.0
01/05/2020,19,33.5,34.7,32.1,43,49,39,19.2,20.4,18.3,992.3,992.6,992.3,0.3,168,4.1,1907.,0.0
01/05/2020,20,32.1,33.9,32.1,49,51,41,20.2,20.7,18.5,992.4,992.4,992.3,0.1,192,3.7,1203.,0.0
01/05/2020,21,29.9,32.2,29.9,62,62,49,21.8,21.9,20.2,992.3,992.4,992.2,0.0,188,2.9,408.0,0.0
01/05/2020,22,28.5,29.9,28.4,67,67,62,21.8,22.0,21.7,992.5,992.5,992.3,0.4,181,2.3,6.817,0.0
01/05/2020,23,27.8,28.5,27.8,71,71,66,22.1,22.1,21.5,993.1,993.1,992.5,0.0,225,1.6,-3.39,0.0
02/05/2020,00,27.4,28.2,27.3,75,75,68,22.5,22.5,21.7,993.7,993.7,993.1,0.5,139,1.5,-3.54,0.0
02/05/2020,01,27.3,27.7,27.3,72,75,72,21.9,22.6,21.9,994.3,994.3,993.7,0.0,126,1.1,-3.54,0.0
02/05/2020,02,25.4,27.3,25.2,85,85,72,22.6,22.8,21.9,994.4,994.5,994.3,0.1,256,2.6,-3.54,0.0
02/05/2020,03,25.5,25.6,25.3,84,85,82,22.5,22.7,22.1,994.3,994.4,994.2,0.0,329,0.7,-3.54,0.0
02/05/2020,04,24.5,25.5,24.5,86,86,82,22.0,22.5,21.9,993.9,994.3,993.9,0.0,290,1.2,-3.54,0.0
02/05/2020,05,24.0,24.5,23.5,87,88,86,21.6,22.1,21.3,993.6,993.9,993.6,0.7,285,1.3,-3.54,0.0
02/05/2020,06,23.7,24.1,23.7,87,87,85,21.3,21.6,21.3,993.1,993.6,993.1,0.1,305,1.1,-3.51,0.0
02/05/2020,07,22.7,24.1,22.5,91,91,86,21.0,21.7,20.7,993.1,993.3,993.1,0.6,220,1.1,-3.54,0.0
02/05/2020,08,22.9,22.9,22.6,92,92,91,21.5,21.5,21.0,993.2,993.2,987.6,0.0,239,1.5,-3.53,0.0
02/05/2020,09,22.9,23.0,22.8,93,93,92,21.7,21.7,21.4,993.6,993.6,993.2,0.0,289,0.4,-3.53,0.0
02/05/2020,10,23.5,23.5,22.8,92,93,92,22.1,22.1,21.6,994.3,994.3,993.6,0.0,256,0.0,91.75,0.0
02/05/2020,11,26.1,26.2,23.5,80,92,80,22.4,23.1,22.2,995.0,995.0,994.3,1.1,141,1.9,789.0,0.0
02/05/2020,12,28.7,28.7,26.1,69,80,68,22.4,22.7,22.1,995.5,995.5,995.0,0.0,116,2.2,1468.,0.0
02/05/2020,13,31.4,31.4,28.6,56,69,56,21.6,22.9,21.0,995.5,995.7,995.4,0.0,65,0.0,1762.,0.0
02/05/2020,14,32.1,32.4,30.6,48,58,47,19.8,22.0,19.3,995.0,995.6,990.6,0.0,105,0.0,2657.,0.0
02/05/2020,15,34.0,34.2,31.7,43,48,42,19.6,20.1,18.6,993.9,995.0,993.9,3.0,71,6.0,2846.,0.0
02/05/2020,16,34.7,34.7,32.3,38,48,38,18.4,20.3,18.3,992.7,993.9,992.7,1.4,63,6.3,2959.,0.0
02/05/2020,17,34.0,34.7,32.7,42,46,38,19.2,20.0,18.4,991.7,992.7,991.7,2.2,103,4.8,2493.,0.0
02/05/2020,18,34.3,34.7,33.6,41,42,38,19.1,19.4,18.0,991.2,991.7,991.2,2.0,141,4.8,2593.,0.0
02/05/2020,19,33.5,34.5,32.5,42,47,39,18.7,20.0,18.4,990.7,991.4,989.9,1.8,132,4.2,1317.,0.0
02/05/2020,20,32.5,34.2,32.5,47,48,40,19.7,20.3,18.7,990.5,990.7,989.8,1.3,191,4.2,1250.,0.0
02/05/2020,21,30.5,32.5,30.5,59,59,47,21.5,21.6,20.0,979.8,990.5,979.5,0.1,157,2.9,345.5,0.0
02/05/2020,22,28.6,30.5,28.6,67,67,59,21.9,21.9,21.5,978.9,980.1,978.7,0.6,166,2.2,1.122,0.0
02/05/2020,23,27.2,28.7,27.2,74,74,66,22.1,22.2,21.6,978.9,979.3,978.6,0.0,246,1.7,-3.54,0.0
03/05/2020,00,26.5,27.2,26.0,77,80,74,22.2,22.5,22.0,979.0,979.1,978.7,0.0,179,1.4,-3.54,0.0
03/05/2020,01,26.0,26.6,26.0,80,80,77,22.4,22.5,22.1,979.1,992.4,978.7,0.0,276,0.6,-3.54,0.0
03/05/2020,02,26.0,26.5,26.0,79,81,75,22.1,22.5,21.7,978.8,979.1,978.5,0.0,290,0.6,-3.53,0.0
03/05/2020,03,25.3,26.0,25.3,83,83,79,22.2,22.4,21.8,978.6,989.4,978.5,0.5,303,1.0,-3.54,0.0
03/05/2020,04,25.3,25.6,24.6,81,85,81,21.9,22.5,21.7,978.1,992.7,977.9,0.7,288,1.5,-3.00,0.0
03/05/2020,05,23.7,25.3,23.7,88,88,81,21.5,21.9,21.5,977.6,991.8,977.3,1.2,256,1.8,-3.54,0.0
03/05/2020,06,23.3,23.7,23.3,91,91,88,21.7,21.7,21.5,976.9,977.6,976.7,0.4,245,1.8,-3.54,0.0
03/05/2020,07,23.0,23.6,23.0,91,91,89,21.4,21.9,21.3,976.7,977.0,976.4,0.9,257,1.9,-3.54,0.0
03/05/2020,08,23.4,23.4,22.9,90,92,90,21.7,21.7,21.3,976.8,976.9,976.5,0.4,294,1.6,-3.52,0.0
03/05/2020,09,23.0,23.5,23.0,88,90,87,21.0,21.6,20.9,992.1,992.1,976.7,0.8,263,1.6,-3.54,0.0
03/05/2020,10,23.2,23.2,22.5,91,92,88,21.6,21.6,20.8,993.0,993.0,992.2,0.1,226,1.5,29.03,0.0
03/05/2020,11,26.0,26.1,23.2,77,91,76,21.6,22.1,21.5,993.8,993.8,982.1,0.0,120,0.9,458.1,0.0
03/05/2020,12,26.6,27.0,25.5,76,80,76,22.1,22.5,21.4,982.7,994.3,982.6,0.3,121,2.3,765.3,0.0
03/05/2020,13,28.5,28.7,26.6,66,77,65,21.5,23.1,21.2,982.5,994.2,982.4,1.4,130,3.2,1219.,0.0
03/05/2020,14,31.1,31.1,28.5,55,66,53,21.0,21.8,19.9,982.3,982.7,982.1,1.2,129,3.7,1743.,0.0
03/05/2020,15,31.6,31.8,30.7,50,55,49,19.8,20.8,19.2,992.9,993.5,982.2,1.1,119,5.1,1958.,0.0
03/05/2020,16,32.7,32.8,31.1,46,52,46,19.6,20.7,19.2,991.9,992.9,991.9,0.8,122,4.4,1953.,0.0
03/05/2020,17,32.3,33.3,32.0,44,49,42,18.6,20.2,18.2,990.7,991.9,979.0,2.6,133,5.9,2463.,0.0
03/05/2020,18,33.1,33.3,31.9,44,50,44,19.3,20.8,18.9,989.9,990.7,989.9,1.1,170,5.4,2033.,0.0
03/05/2020,19,32.4,33.2,32.2,47,47,44,19.7,20.0,18.7,989.5,989.9,989.5,2.4,152,5.2,1581.,0.0
03/05/2020,20,31.2,32.5,31.2,53,53,46,20.6,20.7,19.4,989.5,989.7,989.5,1.7,159,4.6,968.6,0.0
03/05/2020,21,29.7,32.0,29.7,62,62,51,21.8,21.8,20.5,989.7,989.7,989.4,0.8,154,4.0,414.2,0.0
03/05/2020,22,28.3,29.7,28.3,69,69,62,22.1,22.1,21.7,989.9,989.9,989.7,0.3,174,2.0,6.459,0.0
03/05/2020,23,26.9,28.5,26.9,75,75,67,22.1,22.5,21.7,990.5,990.5,989.8,0.2,183,1.0,-3.54,0.0
The second column is time (hour). I want to separate the dataset by morning (06-11), afternoon (12-17), evening (18-23) and night (00-05). How can I do it?
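You can use pd.cut:
bins = [-1, 5, 11, 17, 24]
labels = ['night', 'morning', 'afternoon', 'evening']
df['day_part'] = pd.cut(df['hour'], bins=bins, labels=labels)
The labels are listed in bin order, so hours 0-5 fall into night, 6-11 into morning, 12-17 into afternoon and 18-23 into evening.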
I added column names, including Hour for the second column. Then I used read_csv, which reads the source text and "drops" the leading zeroes, so that the Hour column is just int.
To split rows (add a column marking the diurnal period), use:
df['period'] = pd.cut(df.Hour, bins=[0, 6, 12, 18, 24], right=False, labels=['night', 'morning', 'afternoon', 'evening'])
Then you can e.g. use groupby to process your groups.
Because I used the right=False parameter, the bins are closed on the left side, so the bin limits are more natural (no need for -1 as an hour). The bin limits (except for the last) are just the starting hours of each period, which is quite a natural notation.
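To check where the boundary hours end up, the same left-closed binning can be reproduced without pandas. This is a standard-library sketch that swaps pd.cut for bisect; EDGES, LABELS, period_of and split_by_period are names made up for this illustration, and the sample rows are shortened to the first three columns of the question's data.

```python
import csv
import io
from bisect import bisect_right
from collections import defaultdict

# Starting hour of each period, mirroring bins=[0, 6, 12, 18, 24] with right=False.
EDGES = [0, 6, 12, 18]
LABELS = ["night", "morning", "afternoon", "evening"]


def period_of(hour):
    """Return the diurnal period for an integer hour in 0..23."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    # bisect_right counts how many period starts are <= hour.
    return LABELS[bisect_right(EDGES, hour) - 1]


def split_by_period(csv_text):
    """Group CSV rows by period, taking the hour from the second column."""
    groups = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text)):
        if row:
            groups[period_of(int(row[1]))].append(row)
    return groups


# Rows shortened from the question's data (assumption: remaining columns omitted).
SAMPLE = "01/05/2020,00,26.3\n01/05/2020,06,22.2\n01/05/2020,17,33.1\n01/05/2020,18,33.6\n"
groups = split_by_period(SAMPLE)
print({name: len(rows) for name, rows in groups.items()})
```

Each group keeps the original rows untouched, so they can be written out or processed separately afterwards.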
Write Past Value in data.txt
I want to write a list to data.txt. The output from the program is:
Triangle ('(a1, b1)', '(a2, b2)', '(a3, b3)')
Triangle ('(a4, b4)', '(a5, b5)', '(a6, b6)')
I use these lines of code to build what is written to data.txt:
data = {}
data['shapes'] = []
data['shapes'].append({
    'name': str(triangle.name),
    'Vertices': list(triangle.get_points())
})
I need the output in data.txt in JSON format, like this:
{"shapes": [{"name": "Triangle", "Vertices": ["(a1, b1)", "(a2, b2)", "(a3, b3)"]}, {"name": "Triangle", "Vertices": ["(a4, b4)", "(a5, b5)", "(a6, b6)"]}]}
But this is what I get:
{"shapes": [{"name": "Triangle", "Vertices": ["(a4, b4)", "(a5, b5)", "(a6, b6)"]}]}
So, how can I also write the earlier triangle, the one with vertices (a1, b1)...(a3, b3)?
This part of your code should be executed only once:
data = {}
data['shapes'] = []
The following part of your code should be executed repeatedly:
data['shapes'].append({
    'name': str(triangle.name),
    'Vertices': list(triangle.get_points())
})
probably in a loop similar to this one:
for triangle in triangles:
    data['shapes'].append({
        'name': str(triangle.name),
        'Vertices': list(triangle.get_points())
    })
It seems like you're overwriting the variable referencing the first triangle object with the next triangle object before appending the first triangle object's information to data['shapes']. That block of code where you append to your data['shapes'] should be executed twice, once for each triangle object.
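Putting both answers together, a minimal end-to-end sketch might look like the following. The Triangle class here is a hypothetical stand-in, since the asker's class is not shown; only name and get_points() are taken from the question, and shapes_to_json and write_shapes are helper names invented for this example.

```python
import json
from dataclasses import dataclass


@dataclass
class Triangle:
    # Hypothetical stand-in for the asker's class (assumption).
    name: str
    points: tuple

    def get_points(self):
        return self.points


def shapes_to_json(triangles):
    """Build the JSON text with one entry per triangle."""
    data = {"shapes": []}  # created once, before the loop
    for triangle in triangles:  # one append per triangle object
        data["shapes"].append({
            "name": str(triangle.name),
            "Vertices": list(triangle.get_points()),
        })
    return json.dumps(data)


def write_shapes(triangles, path):
    """Write the JSON text to the given file, replacing earlier content."""
    with open(path, "w") as handle:
        handle.write(shapes_to_json(triangles))


triangles = [
    Triangle("Triangle", ("(a1, b1)", "(a2, b2)", "(a3, b3)")),
    Triangle("Triangle", ("(a4, b4)", "(a5, b5)", "(a6, b6)")),
]
print(shapes_to_json(triangles))
```

Calling write_shapes(triangles, "data.txt") would then store both triangles in the file. The key point is that every triangle object is still reachable (here through the list) at the moment its entry is appended.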
running Discrete wavelet transform in R Language
Please can someone help with a solution for running a discrete wavelet transform in R? I have tried with the following data format: Year, Rain. Year is in the form 1970, 1972, 1973, ... and Rain in the form 200, 85, 34, 56, 23, 0.5, ... etc. I don't know if my data frame is correct, or if I need to do something to the data before I run it. Having saved the data frame as wave, I ran: rain.dwt.01 <- wavDWT(wave)
Here is my code:
getwd()
setwd("C:\\Users\\dell\\Desktop\\ANN")
wave<-read.csv(file.choose(),header = T)
library(wmtsa)
library(wavelets)
library(waveslim)
library(MASS)
library(wavethresh)
### loaded auxiliary functions from Internet
con <- url("faculty.washington.edu/dbp/R-CODE/workshop.Rdata")
print(load(con))
close(con)
lplot(wave)
abline(h=mean(wave),lty="dotted",col="red")
rain.dwt.01 <- wavDWT(wave)
I got this error:
Error in itCall("RS_wavelets_transform_discrete_wavelet_convolution", : (list) object cannot be coerced to type 'double'
Please help with an example so I can understand why this error appears.
Assigning values to imported variables from excel
I need to import an Excel document into Mathematica which has 2000 compounds in it, with each compound having 6 numerical constants assigned to it. The end goal is to type a compound name into Mathematica and have the 6 numerical constants output. So far my code is:
t = Import["Titles.txt.", {"Text", "Lines"}] (* imports compound names *)
n = Import["NA.txt.", "List"] (* imports the 6 values for each compound *)
n[[2]] (* outputs the second compound's 6 values *)
Instead of n[[#]], I would like to know how to type in a compound from the imported compound names and have the 6 values output.
I'm not sure if I understand your question: you have two text files rather than an Excel file, for example, and it's not clear what the data looks like. But there are probably plenty of ways to do this. Here's a suggestion (it might not be the best way).
Let's assume that you've got all your data into a table (a list of lists):
pt = {
  {"Hydrogen", "H", 1, 1.0079, -259, -253, 0.09, 0.14, 1776, 1, 13.5984},
  {"Helium", "He", 2, 4.0026, -272, -269, 0, 0, 1895, 18, 24.5874},
  {"Lithium" , "Li", 3, 6.941, 180, 1347, 0.53, 0, 1817, 1, 5.3917}
}
To find the information associated with a particular string:
Cases[pt, {"Helium", rest__} -> rest]
{"He", 2, 4.0026, -272, -269, 0, 0, 1895, 18, 24.5874}
where the pattern rest__ holds everything that was found after "Helium". To look up a row by its second item:
Cases[pt, {_, "Li", rest__} -> rest]
{3, 6.941, 180, 1347, 0.53, 0, 1817, 1, 5.3917}
If you add more information to the patterns, you have more flexibility in how you choose elements from the table:
Cases[pt, {name_, symbol_, aNumber_, aWeight_, mp_, bp_, density_, crust_, discovered_, rest__} /; discovered > 1850 -> {name, symbol, discovered}]
{{"Helium", "He", 1895}}
For something interactive, you could knock up a Manipulate:
elements = pt[[All, 1]];
headings = {"symbol", "aNumber", "aWeight", "mp", "bp", "density", "crust", "discovered", "group", "ion"};
Manipulate[
  Column[{
    elements[[x]],
    TableForm[{
      headings,
      Cases[pt, {elements[[x]], rest__} -> rest]}]}],
  {x, 1, Length[elements], 1}]