Is there an fread analog for reading from stdin? - linux

I'd like to use fread in a (R)script that would get input data via the linux pipe mechanism. Is there an fread analog for the following?
read.csv(file = 'stdin', ...)
I'll also settle for reading stdin some other way and then using fread to parse it, as I mainly want this for fread's superior separator and header logic.

Turns out it's as simple as:
fread('file:///dev/stdin')
This works, because fread actually creates a temporary file when the first 7 characters are "file://" or "http://" and uses download.file to copy the data there and then fread that.
Update: As of version 1.8.11 one can use shell commands in fread, making another solution possible:
fread('cat /dev/stdin')

All of the read.* functions use 'scan' under their hoods. scan is fairly low level but does have the capacity for parsing lines of data into different classes.
> mat <- matrix(scan(), 4,4) # will paste in block of data
1: 0.5 0.1428571 0.25
4: 0.5 0.1428571 0.25
7: 0.5 0.1428571 0.25
10: 0.5 0.1428571 0.25
13: 0.5 0.1428571 0.25
16: 0.5
17: # Terminate with two <cr>'s
Read 16 items
> mat
[,1] [,2] [,3] [,4]
[1,] 0.5000000 0.1428571 0.2500000 0.5000000
[2,] 0.1428571 0.2500000 0.5000000 0.1428571
[3,] 0.2500000 0.5000000 0.1428571 0.2500000
[4,] 0.5000000 0.1428571 0.2500000 0.5000000
> lst <- scan(what=list(double(0), "a"))
1: 4 t
2: 6 h
3: 8 l
4: 8 8
5:
Read 4 records
> lst
[[1]]
[1] 4 6 8 8
[[2]]
[1] "t" "h" "l" "8"
You should also look at the ?connections page.

Related

mean! may return incorrect results

I recently had the following question
regarding the function mean! of the Statistics.jl package.
The following bug is reported regarding the behavior of mean!. As indicated, the mean! function does not properly consider that its arguments may alias each other. In some such cases the result from mean! is not correct:
julia> let a = [1 2 3]
mean!(a, a)
end
1×3 Array{Int64,2}:
0 0 0
julia> let a = [1 2 3]
mean!(copy(a), a)
end
1×3 Array{Int64,2}:
1 2 3
julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin18.7.0)
CPU: Intel(R) Core(TM) i9-9980HK CPU # 2.40GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
However, I think that this behavior is normal since viewing the definition of mean!, the result of the operation mean!(r, v) is written in r. Therefore it seems logical to me that if you use the same object as variable r and as variable v, the result is unpredictable.
I have seen that this also happens with the sum! function.
Someone can tell me if I am right or indeed, there is something that I am not understanding.
mean! behaves the way you observe because it internally calls sum!.
Now sum! behaves this way for the following reason. It was designed to perform summation without making any allocations. Therefore the first thing sum! does is initializing the target vector to 0 (a neutral element of summation). After this is done your a vector contains only 0s, and thus later you get all 0s also.
However, indeed it would make sense that the sum! (and similar) functions docstring should mention that the target should not alias with the source. Here is another example of the same you have observed:
julia> x = [1 2 3
4 5 6
7 8 9]
3×3 Matrix{Int64}:
1 2 3
4 5 6
7 8 9
julia> y = view(x, :, 1)
3-element view(::Matrix{Int64}, :, 1) with eltype Int64:
1
4
7
julia> sum!(y, x)
3-element view(::Matrix{Int64}, :, 1) with eltype Int64:
5
11
17
julia> x
3×3 Matrix{Int64}:
5 2 3
11 5 6
17 8 9

What conversion is being used in these commands-- multiples of 16?

I'm controlling a board that has 16 outputs in two groups #1: 1-8, #2: 9-16.
The first part of the command is the group [1,...] or [2,...]
The second part is the output, so in the limited examples I am shown [1,1,...] would turn on [group 1, channel 1, ....]-- so far so good.
The next example is Output 8: [1, 128, ...]
The next example is Outputs 1 & 8: [1, 129, ...]
What is this conversion called? I'm assuming the outputs map as follows, is this correct:
Output 1 (or 9): 1
Output 2 (or 10): 16
Output 3 (or 11): 32
...
Output 8 (or 16): 128
So if I wanted outputs 2 & 3 the command would be [1, 48]?
You work with bits. 10000000 means turn on output 8. 10000000 in binary means 128 in decimal. If you wanted output 7 you’d do 1000000 which is 64 in decimal. Most programming languages would let you do something like 0x01001100 for example to turn on 7,4 and 3. Easier to work in binary.
The generic formula for you to find the value to set 1 output would be:
second_part = 2^(output_to_set - 1)
For multiple outputs, you just need to add them.
So if you want to set, outputs 2 and 3:
second_part = 2^1 + 2^2 = 6

Spatially adaptive smoothing using mgcv package in R

I am using the gam function in the mgcv package to fit spatially adaptive smoothing for heterogeneous data. This is my R code for fitting.
library(MASS)
data(mcycle)
fit <- gam(accel ~ s(times, k = 20, bs = 'ad'), data = mcycle, method = 'REML')
The output contains 5 smoothing parameters. I am trying to extract the values for each smoothing parameter ( S[[i]] for $i =1,..5$) and I used fit$S[[1]] to get the first smoothing parameter values, but it does not work. Could someone help me with this?
You want the $sp component
> fit$sp
s(times)1 s(times)2 s(times)3 s(times)4 s(times)5
1.364206e+01 5.204389e-04 2.036490e-03 8.565542e+00 2.428618e+03
The $S component of the $smooth list contains the penalty matrices associated with the five smoothing parameters.
See ?gamObject and ?smooth.construct for further details on what is returned in the fit.
If you really want the penalty matrices, then look at the structure of the smooth component:
> str(fit$smooth, max = 1)
List of 1
$ :List of 26
..- attr(*, "class")= chr [1:2] "pspline.smooth" "mgcv.smooth"
..- attr(*, "qrc")=List of 4
.. ..- attr(*, "class")= chr "qr"
..- attr(*, "nCons")= int 1
Even if there is a single smooth, the $smooth is a list. So we need fit$smooth[[1]] to access this smooth. Now if we look at the $S component of the smooth we see
> str(fit$smooth[[1]]$S, max = 1)
List of 5
$ : num [1:19, 1:19] 0.4446 -0.2845 0.0913 0.0426 0.0943 ...
$ : num [1:19, 1:19] 0.3417 -0.2441 0.0845 0.0341 0.0654 ...
$ : num [1:19, 1:19] 0.0913 -0.0734 0.0271 0.0109 0.0141 ...
$ : num [1:19, 1:19] 4.13e-05 -3.46e-05 4.10e-05 1.32e-04 -3.96e-05 ...
$ : num [1:19, 1:19] 1.68e-06 2.43e-06 3.49e-06 4.62e-06 1.08e-05 ...
Which indicates that there are five penalty matrices associated with this smooth and that each matrix is a component of the S list. Hence, for the ith penalty matrix we need
fit$smooth[[1]]$S[[ i ]]
Hence for the second penalty matrix we need
fit$smooth[[1]]$S[[2]]
the first six rows and columns of which look like this
> fit$smooth[[1]]$S[[2]][1:6, 1:6]
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.34168394 -0.24407752 0.084500619 0.03412496 0.06538967 0.054028500
[2,] -0.24407752 0.36254851 -0.255915616 0.05368650 -0.03418746 -0.019895116
[3,] 0.08450062 -0.25591562 0.352961000 -0.21961696 0.04421239 0.001082056
[4,] 0.03412496 0.05368650 -0.219616955 0.35168761 -0.18138207 0.077301400
[5,] 0.06538967 -0.03418746 0.044212389 -0.18138207 0.25012833 -0.178018503
[6,] 0.05402850 -0.01989512 0.001082056 0.07730140 -0.17801850 0.264159096

Using APCluster on Pre-Classified Data - Coloring Dendograms and Nicely Formatted Output

Question 1:
I am trying to work with the plot() function on an AggExResult object and the clusters in the documentation (https://cran.r-project.org/web/packages/apcluster/apcluster.pdf) work as expected.
In my own data, I have an additional column in the input which provides a pre-defined “target” for classification purposes, and I am wondering if there is a way to have the dendogram labels highlighted by color (e.g. red=class 0, blue=class 1) with the class of the targets being factors (or characters). I am ultimately trying to visually display how many clusters contain "pure" vs. "mixed" classes. Here is some slightly modified code from the online documentation to show roughly what my input data looks like:
cl1Targ <- matrix(nrow=50,ncol=1)
for(c1t in 1:nrow(cl1Targ)){ cl1Targ[c1t] <- as.factor(0) }
cl2Targ <- matrix(nrow=50,ncol=1)
for(c2t in 1:nrow(cl2Targ)){ cl2Targ[c2t] <- as.factor(1) }
## create two Gaussian clouds
#cl1 <- cbind(rnorm(50,0.2,0.05),rnorm(50,0.8,0.06))
#cl2 <- cbind(rnorm(50,0.7,0.08),rnorm(50,0.3,0.05))
cl1 <- cbind(rnorm(50,0.2,0.05),rnorm(50,0.8,0.06),cl1Targ)
cl2 <- cbind(rnorm(50,0.7,0.08),rnorm(50,0.3,0.05),cl2Targ)
x <- rbind(cl1,cl2)
colnames(x) <- c('Column 1','Column 2','Class_ID')
## compute similarity matrix (negative squared Euclidean)
sim <- negDistMat(x, r=2)
## run affinity propagation
apres <- apcluster(sim, q=0.7)
## compute agglomerative clustering from scratch
aggres1 <- aggExCluster(sim)
## plot dendrogram
plot(aggres1, main='aggres1 w/ target') #
How would I color the dendogram by the target defined in the input?
Question 2:
When I show() the example data’s APResult, I see the following:
show(apres)
APResult object
Number of samples = 100
Number of iterations = 165
Input preference = -0.01281384
Sum of similarities = -0.1222309
Sum of preferences = -0.1409522
Net similarity = -0.2631832
Number of clusters = 11
Exemplars:
8 17 24 37 43 52 58 68 92 95 99
Clusters:
Cluster 1, exemplar 8:
7 8 9 25 31 36 39 42 47 48
Cluster 2, exemplar 17:
6 11 13 15 17 18 19 23 32 35
Cluster 3, exemplar 24:
2 5 10 24 45
When I use my own data, I see the following (the row.names, which are the drugs being clustered by gene expression mean fold change values)
show(apclr2q05_mean)
APResult object
Number of samples = 1045
Number of iterations = 429
Input preference = -390.0822
Sum of similarities = -89326.99
Sum of preferences = -83477.58
Net similarity = -172804.6
Number of clusters = 214
Exemplars:
amantadine_58mg6h_fc amiodarone_147mg3d_fc clarithromycin_56mg1d_fc fluconazole_394mg5d_fc ketoconazole_114mg5d_fc ketoconazole_2274mg1d_fc
pantoprazole_1100mg1d_fc pantoprazole_1100mg3d_fc quetiapine_500mg5d_fc roxithromycin_312mg5d_fc torsemide_3mg3d_fc acetazolamide_250mg3d_fc
Clusters:
Cluster 1, exemplar amantadine_58mg6h_fc:
amantadine_58mg6h_fc promazine_100mg1d_fc cyproteroneAcetate_2500mg6h_fc danazol_2g5d_fc ivermectin_7500ug1d_fc letrozole_250mg6h_fc
mefenamicAcid_93mg3d_fc olanzapine_23mg1d_fc secobarbital_20mg6h_fc zaleplon_100mg3d_fc
Cluster 2, exemplar amiodarone_147mg3d_fc:
amiodarone_147mg3d_fc amiodarone_147mg5d_fc aspirin_375mg5d_fc betaNapthoflavone_80mg5d_fc clofibrate_130mg3d_fc finasteride_800mg5d_fc
Cluster 3, exemplar clarithromycin_56mg1d_fc:
ciprofloxacin_72mg5d_fc ciprofloxacin_450mg6h_fc clarithromycin_56mg1d_fc clarithromycin_56mg3d_fc clarithromycin_56mg5d_fc
Cluster 4, exemplar fluconazole_394mg5d_fc:
fluconazole_394mg5d_fc
Also what I would expect in terms of content but I would like to format this for reporting purposes. I have tried to export this using dput() but I get a lot of extra unnecessary information in the output file. I am wondering how I might be able to export the same type of information from above along with the object name and target classifier mentioned above into a table that would look like the following (and add the name of the object to the output):
Name of object = apclr2q05_mean
Number of samples = 1045
Number of iterations = 429
Input preference = -390.0822
Sum of similarities = -89326.99
Sum of preferences = -83477.58
Net similarity = -172804.6
Number of clusters = 214
Exemplars: Target
amantadine_58mg6h_fc 1
amiodarone_147mg3d_fc 1
clarithromycin_56mg1d_fc 1
fluconazole_394mg5d_fc 0
ketoconazole_114mg5d_fc 0
ketoconazole_2274mg1d_fc 0
Clusters:
Cluster 1, exemplar amantadine_58mg6h_fc:
Drug Target
amantadine_58mg6h_fc 1
promazine_100mg1d_fc 1
cyproteroneAcetate_2500mg6h_fc 1
danazol_2g5d_fc 0
ivermectin_7500ug1d_fc 0
Cluster 2, exemplar amiodarone_147mg3d_fc:
Drug Target
Etc…
A big THANK YOU to Ulrich for his quick response to these questions by email and we wanted to share our discussion with the community so I will let him respond with his solution so that he gets the credit he deserves :-)
As an update, I tried to implement the answer to Question 1 and the sample code works as expected, but I am having trouble getting this to work on my data. The input data has two parts. The first is a matrix with the numeric measurement data including column and row labels:
> fci[1:3,1:3]
M30596_PROBE1 AI231309_PROBE1 NM_012489_PROBE1
amantadine_58mg1d_fc 0.05630744 -0.10441722 0.41873201
amantadine_58mg6h_fc -0.42780274 -0.26222322 0.02703001
amantadine_220mg1d_fc 0.35260779 -0.09902214 0.04067055
The second is the "target" values in Factor format, each of which corresponds to same row in fci above:
> targs[1:3]
amantadine_58mg1d_fc amantadine_58mg6h_fc amantadine_220mg1d_fc
0 0 0
Levels: 0 1
From here, the tree was built as below:
# build the AggExResult:
aglomr1 <- aggExCluster(negDistMat(r=2), fci)
# convert the data
tree <- as.dendrogram(aglomr1)
# assign the color codes
colorCodes <- c("0"="red", "1"="green")
names(targs) <- rownames(fci)
xColor <- colorCodes[as.character(targs)]
names(xColor) <- rownames(fci)
# plot the colored tree
labels_colors(tree) <- xColor[order.dendrogram(tree)]
plot(tree, main="Colored Tree")
The tree was generated but the leaves were not colored. Doing some digging:
> head(xColor)
0 0 0 0 0 0
"red" "red" "red" "red" "red" "red"
That part seems to work as expected in terms of the targets having the correct colors assigned, but the rownames are not in xColor, and the line labels_colors(tree) <- xColor[order.dendrogram(tree)] does not return similar labels, but rather what appear to be row numbers, or NAs:
> head(order.dendrogram(tree))
[1] "295" "929" "488" "493" "233" "235"
> head(labels_colors(tree))
295 929 488 493 233 235
> head(xColor[order.dendrogram(tree)])
<NA> <NA> <NA> <NA> <NA> <NA>
NA NA NA NA NA NA
How would I get the line labels_colors(tree) <- xColor[order.dendrogram(tree)] to behave in the same way as the example provided? Specifically, what I am trying to show is the leaf lables such as amantadine_58mg1d_fc being highlighted in the color that corresponds to the target (0/1).
Here is my answer to your Question 1: the plot() method for 'AggExResult' objects internally uses the plot.dendrogram() method. Since this method does not allow for coloring leaves of dendrograms, this will not work. However, there is the 'dendextend' package which offers such a functionality. (BTW, I found that solution in another thread: Label and color leaf dendrogram in r) Since 'apcluster' offers some casts to 'hclust' and 'dendrogram' objects, this package's functionality can be used more or less directly.
So, here is some sample code:
library(apcluster)
## create two Gaussian clouds along with class labels 0/1
cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06))
cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
x <- cbind(Columns=data.frame(rbind(cl1, cl2)),
"Class_ID"=factor(as.character(c(rep(0, 50), rep(1, 50)))))
## compute similarity matrix (negative squared Euclidean)
sim <- negDistMat(x[, 1:2], r=2)
## compute agglomerative clustering from scratch
aggres1 <- aggExCluster(sim)
## load 'dendextend' package
## install.packages("dendextend") ## if not yet installed
library(dendextend)
## convert object
tree <- as.dendrogram(aggres1)
## assign color codes
colorCodes <- c("0"="red", "1"="green")
xColor <- colorCodes[x$Class_ID]
names(xColor) <- rownames(x)
## plot color-labeled tree
labels_colors(tree) <- xColor[order.dendrogram(tree)]
plot(tree)
Here is my answer to your Question 2: Sorry, no such functionality is implemented in the 'apcluster' package. And since this is quite a special request, I am reluctant to include it the package (let alone the fact that show() methods cannot have additional arguments). So, alternatively, I want to provide you with a custom function that allows for labeling/grouping exemplars and samples:
library(apcluster)
## create two Gaussian clouds along with class labels 0/1
cl1 <- cbind(rnorm(50, 0.2, 0.05), rnorm(50, 0.8, 0.06))
cl2 <- cbind(rnorm(50, 0.7, 0.08), rnorm(50, 0.3, 0.05))
x <- cbind(Columns=data.frame(rbind(cl1, cl2)),
"Class_ID"=factor(as.character(c(rep(0, 50), rep(1, 50)))))
## compute similarity matrix (negative squared Euclidean)
sim <- negDistMat(x[, 1:2], r=2)
## special show() function with labeled data
show.ExClust.labeled <- function(object, labels=NULL)
{
if (!is(object, "ExClust"))
stop("'object' is not of class 'ExClust'")
if (is.null(labels))
{
show(object)
return(invisible(NULL))
}
cat("\n", class(object), " object\n", sep="")
if (!is.finite(object#l) || !is.finite(object#it))
stop("object is not result of an affinity propagation run; ",
"it is pointless to create 'APResult' objects yourself.")
cat("\nNumber of samples = ", object#l, "\n")
if (length(object#sel) > 0)
{
cat("Number of sel samples = ", length(object#sel),
paste(" (", round(100*length(object#sel)/object#l,1),
"%)\n", sep=""))
cat("Number of sweeps = ", object#sweeps, "\n")
}
cat("Number of iterations = ", object#it, "\n")
cat("Input preference = ", object#p, "\n")
cat("Sum of similarities = ", object#dpsim, "\n")
cat("Sum of preferences = ", object#expref, "\n")
cat("Net similarity = ", object#netsim, "\n")
cat("Number of clusters = ", length(object#exemplars), "\n\n")
if (length(object#exemplars) > 0)
{
if (length(names(object#exemplars)) == 0)
{
cat("Exemplars:\n")
df <- data.frame("Sample"=object#exemplars,
Label=labels[object#exemplars])
print(df, row.names=FALSE)
for (i in 1:length(object#exemplars))
{
cat("\nCluster ", i, ", exemplar ",
object#exemplars[i], ":\n", sep="")
df <- data.frame(Sample=object#clusters[[i]],
Label=labels[object#clusters[[i]]])
print(df, row.names=FALSE)
}
}
else
{
df <- data.frame("Exemplars"=names(object#exemplars),
Label=labels[names(object#exemplars)])
print(df, row.names=FALSE)
for (i in 1:length(object#exemplars))
{
cat("\nCluster ", i, ", exemplar ",
names(object#exemplars)[i], ":\n", sep="")
df <- data.frame(Sample=names(object#clusters[[i]]),
Label=labels[names(object#clusters[[i]])])
print(df, row.names=FALSE)
}
}
}
else
{
cat("No clusters identified.\n")
}
}
## create label vector (with proper names)
label <- x$Class_ID
names(label) <- rownames(x)
## run apcluster()
apres <- apcluster(sim, q=0.3)
## show with labels
show.ExClust.labeled(apres, label)

Column not lining up

I am trying to achieve a column-style for converting celcius to fahrenheit.
My question: When I run it, the numbers are lining up nicely. However the c and f are aligned left and are not above the numbers. Why is this?
Here is my code:
def convert(celc):
fahr = celc * 1.8 + 32
return fahr
def table():
format_str = '{0:10} {1:10}'
c = 'c'
f = 'f'
cfhead = format_str.format(c, f)
print(cfhead)
for graden in range(-40,30,10):
result = format_str.format(convert(graden), graden)
print(result)
table()
Use > in your format to always align to the right:
format_str = '{0:>10} {1:>10}'
The default is for strings to be left-aligned, numbers to be right-aligned.
From the Format Specification Mini-Language documentation:
'<'
Forces the field to be left-aligned within the available space (this is the default for most objects).
'>'
Forces the field to be right-aligned within the available space (this is the default for numbers).
With an explicit alignment formatter you get your desired output:
>>> table()
c f
-40.0 -40
-22.0 -30
-4.0 -20
14.0 -10
32.0 0
50.0 10
68.0 20

Resources