Extremely energy-efficient artificial intelligence is now closer to reality after a study by UCL researchers found a way to improve the accuracy of a brain-inspired computing system.
The system, which uses memristors to create artificial neural networks, is at least 1,000 times more energy efficient than conventional transistor-based AI hardware, but has until now been more prone to error.
Existing AI is extremely energy-intensive — training one AI model can generate 284 tonnes of carbon dioxide, equivalent to the lifetime emissions of five cars. Replacing the transistors that make up all digital devices with memristors, a novel electronic device first built in 2008, could reduce this to a fraction of a tonne of carbon dioxide — equivalent to emissions generated in an afternoon’s drive.
Since memristors are so much more energy-efficient than existing computing systems, they can potentially pack huge amounts of computing power into hand-held devices, removing the need to be connected to the Internet.
This is especially important as over-reliance on the Internet is expected to become problematic in future due to ever-increasing data demands and the difficulties of increasing data transmission capacity past a certain point.
In the new study, published in Nature Communications, engineers at UCL found that accuracy could be greatly improved by getting memristors to work together in several sub-groups of neural networks and averaging their calculations, meaning that flaws in each of the networks could be cancelled out.
Memristors are described as “resistors with memory” because they remember the amount of electric charge that flowed through them even after being switched off. They were considered revolutionary when first built over a decade ago, hailed as a “missing link” in electronics to supplement the resistor, capacitor and inductor. They have since been manufactured commercially in memory devices, but the research team say they could be used to develop AI systems within the next three years.
Memristors offer vastly improved efficiency because they operate not just in a binary code of ones and zeros, but at multiple levels between zero and one at the same time, meaning more information can be packed into each bit.
Moreover, memristors are often described as a neuromorphic (brain-inspired) form of computing because, like in the brain, processing and memory are implemented in the same adaptive building blocks, in contrast to current computer systems that waste a lot of energy in data movement.
In the study, Dr Adnan Mehonic, PhD student Dovydas Joksas (both UCL Electronic & Electrical Engineering), and colleagues from the UK and the US tested the new approach in several different types of memristors and found that it improved the accuracy of all of them, regardless of material or particular memristor technology. It also worked for a number of different problems that may affect memristors’ accuracy.
Researchers found that their approach increased the accuracy of the neural networks for typical AI tasks to a comparable level to software tools run on conventional digital hardware.
Dr Mehonic, director of the study, said: “We hoped that there might be more generic approaches that improve not the device-level, but the system-level behaviour, and we believe we found one. Our approach shows that, when it comes to memristors, several heads are better than one. Arranging the neural network into several smaller networks rather than one big network led to greater accuracy overall.”
Dovydas Joksas further explained: “We borrowed a popular technique from computer science and applied it in the context of memristors. And it worked! Using preliminary simulations, we found that even simple averaging could significantly increase the accuracy of memristive neural networks.”
Professor Tony Kenyon (UCL Electronic & Electrical Engineering), a co-author on the study, added: “We believe now is the time for memristors, on which we have been working for several years, to take a leading role in a more energy-sustainable era of IoT devices and edge computing.”
from ScienceDaily – Artificial Intelligence https://www.sciencedaily.com/releases/2020/08/200827150957.htm
[This article was first published on R – Myscape, and kindly contributed to R-bloggers.]
Exact tests often are a good place to start with differential expression analysis of genomic data sets.
As usual, the types of contrasts you can make will depend on the design of your study and data set. In the following example we will use the raw counts of differentially expressed (DE) genes to compare the following Daphnia genotypes.
After normalization of raw counts we will perform genewise exact tests for differences in the means between two groups of gene counts. Specifically, the two experimental groups of treatment and control for the E05 Daphnia genotype.
Exact Test and Plotting Script – exactTest_edgeR.r
The following R script will be used to normalize raw gene counts produced by a read-counting program such as htseq-count, typically run on alignments from HISAT2 or TopHat2. Then, we will perform exact tests for a specified genotype data set and plot both intermediate and final results.
We will use several different plotting functions to generate informative plots of our input data sets and exact test results with edgeR.
barplot – Bar plots of sequence read library sizes
plotMDS – Multi-dimensional scaling (MDS) scatterplots of distances between the samples
heatmap – Heatmaps of moderated log counts per million (logcpm)
plotBCV – Genewise biological coefficient of variation (BCV) plots against gene abundance
plotMD – Mean difference (MD) plots of all log fold changes (logFC) against average count size
plotSmear – MA plots of log-intensity ratios (M-values) versus log-intensity averages (A-values)
Script Inputs
A csv formatted file of raw gene counts is expected as an input to the script. Below is an example raw gene count table where columns are samples, rows are gene IDs, and raw counts are represented by “N#” for simplicity here.
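For illustration, a table of this shape might look as follows. The sample names here are hypothetical: we assume three UV-treated and three VIS-control replicates for the E05 genotype in columns 1–6, with additional genotypes occupying the later columns.

```csv
gene,E05_UV_1,E05_UV_2,E05_UV_3,E05_VIS_1,E05_VIS_2,E05_VIS_3,Y05_UV_1,...
gene1,N#,N#,N#,N#,N#,N#,N#,...
gene2,N#,N#,N#,N#,N#,N#,N#,...
gene3,N#,N#,N#,N#,N#,N#,N#,...
```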
Additionally, the column numbers for the range of samples you wish to perform exact tests on need to be specified. So for example, the column numbers 1 and 6 should be input to perform an exact test on the E05 Daphnia genotype for the example raw gene count table above. This selects all replicates of the UV treatment and VIS control for the E05 genotype.
R Script
First, we should add some comments to the top of the R script describing how to run the script. We should also add a comment with a reminder of the purpose of the script.
#!/usr/bin/env Rscript
#Usage: Rscript exactTest_edgeR.r rawGeneCountsFile startColumn endColumn
#Usage Ex: Rscript exactTest_edgeR.r daphnia_rawGeneCounts_htseq.csv 1 6
#R script to perform exact test DE analysis of raw gene counts using edgeR
Before proceeding with plotting and exact tests we will need to import the edgeR library.
#Load the edgeR library
library("edgeR")
Next, we retrieve the input path for our raw gene counts and the column range for the exact test. The number of input arguments to the script is verified to help ensure that the required data is supplied to the script.
#Retrieve inputs to the script
args = commandArgs(trailingOnly=TRUE)
#Check that the expected three input arguments were supplied
if (length(args)!=3) {
stop("One file name and a range of two column numbers must be supplied.\n", call.=FALSE)
}
Now we can import our data using the input path to our raw gene counts and the column range for the sample we want to test. Note that command-line arguments arrive as character strings, so the column numbers are converted with as.numeric before they are used to subset the table.
#Import gene count data
countsTable <- read.csv(file=args[1], row.names="gene")[ ,as.numeric(args[2]):as.numeric(args[3])]
head(countsTable)
The samples that represent our treatment and control need to be specified as a grouping factor to perform the exact tests with edgeR. In our example input data set the samples are ordered as:
Treatment replicate 1
Treatment replicate 2
Treatment replicate 3
Control replicate 1
Control replicate 2
Control replicate 3
#Add grouping factor
group <- factor(c(rep("treat",3),rep("ctrl",3)))
#Create DGE list object
list <- DGEList(counts=countsTable,group=group)
This is a good point to generate some interesting plots of our input data set before we begin preparing the raw gene counts for the exact test.
First, we will plot the library sizes of our sequencing reads before normalization using the barplot function. The resulting plot is saved to the plotBarsBefore jpg file in your working directory.
#Plot the library sizes before normalization
jpeg("plotBarsBefore.jpg")
barplot(list$samples$lib.size*1e-6, names=1:6, ylab="Library size (millions)")
dev.off()
Next, we will use the plotMDS function to display the relative similarities of the samples and view batch and treatment effects before normalization. The resulting plot is saved to the plotMDSBefore jpg file in your working directory.
#Draw a MDS plot to show the relative similarities of the samples
jpeg("plotMDSBefore.jpg")
plotMDS(list, col=rep(1:3, each=3))
dev.off()
It can also be useful to view the moderated log counts per million before normalization by passing the output of the cpm function to heatmap. The resulting plot is saved to the plotHeatMapBefore jpg file in your working directory.
#Draw a heatmap of individual RNA-seq samples
jpeg("plotHeatMapBefore.jpg")
logcpm <- cpm(list, log=TRUE)
heatmap(logcpm)
dev.off()
There is no purpose in analyzing genes that are not expressed in either experimental condition (treatment or control), so raw gene counts are first filtered by expression levels. The normalized gene counts are output to the stats_normalizedCounts csv file.
#Filter raw gene counts by expression levels
keep <- filterByExpr(list)
table(keep)
list <- list[keep, , keep.lib.sizes=FALSE]
#Calculate normalization factors
list <- calcNormFactors(list)
#Write normalized counts to file
normList <- cpm(list, normalized.lib.sizes=TRUE)
write.table(normList, file="stats_normalizedCounts.csv", sep=",", row.names=TRUE)
#View normalization factors
list$samples
dim(list)
Now that we have normalized gene counts for our samples we should generate the same set of previous plots for comparison.
First, we plot the library sizes of our sequencing reads after normalization using the barplot function. The resulting plot is saved to the plotBarsAfter jpg file in your working directory.
#Plot the library sizes after normalization
jpeg("plotBarsAfter.jpg")
barplot(list$samples$lib.size*1e-6, names=1:6, ylab="Library size (millions)")
dev.off()
Next, we will use the plotMDS function to display the relative similarities of the samples and view batch and treatment effects after normalization. The resulting plot is saved to the plotMDSAfter jpg file in your working directory.
#Draw a MDS plot to show the relative similarities of the samples
jpeg("plotMDSAfter.jpg")
plotMDS(list, col=rep(1:3, each=3))
dev.off()
It can also be useful to view the moderated log counts per million (logcpm) after normalization by passing the output of the cpm function to heatmap. The resulting plot is saved to the plotHeatMapAfter jpg file in your working directory.
#Draw a heatmap of individual RNA-seq samples
jpeg("plotHeatMapAfter.jpg")
logcpm <- cpm(list, log=TRUE)
heatmap(logcpm)
dev.off()
With the normalized gene counts we can estimate the common and tagwise dispersions. This allows us to use the plotBCV function to generate a genewise biological coefficient of variation (BCV) plot of the dispersion estimates. The resulting plot is saved to the plotBCV jpg file in your working directory.
#Estimate common and tagwise dispersions
list <- estimateDisp(list)
list$common.dispersion
#View dispersion estimates and biological coefficient of variation
jpeg("plotBCV.jpg")
plotBCV(list)
dev.off()
Finally, we are ready to perform exact tests with edgeR using the exactTest function. The resulting table of differentially expressed (DE) genes is written to the stats_exactTest csv file in your working directory.
#Perform an exact test for treat vs ctrl
tested <- exactTest(list, pair=c("ctrl", "treat"))
topTags(tested)
#Create results table of DE genes
resultsTbl <- topTags(tested, n=nrow(tested$table))$table
#Output resulting table
write.table(resultsTbl, file="stats_exactTest.csv", sep=",", row.names=TRUE)
Using the resulting DE genes from the exact test we can view the counts per million for the top genes of each sample.
#Look at the counts per million in individual samples for the top genes
o <- order(tested$table$PValue)
cpm(list)[o[1:10],]
#View the total number of differentially expressed genes at 5% FDR
summary(decideTests(tested))
We can also generate a mean difference (MD) plot of the log fold change (logFC) against the log counts per million (logcpm) using the plotMD function. DE genes are highlighted and the blue lines indicate 2-fold changes. The resulting plot is saved to the plotMDResults jpg file in your working directory.
#Make a MD plot of logFC against logcpm
jpeg("plotMDResults.jpg")
plotMD(tested)
abline(h=c(-1, 1), col="blue")
dev.off()
As a final step, we will produce a MA plot of the libraries of count data using the plotSmear function. Genes with very low counts, particularly those with zero counts in one of the groups, are plotted as a smear at the left edge of the plot. The resulting plot is saved to the plotMAResults jpg file in your working directory.
#Make a MA plot of the libraries of count data
jpeg("plotMAResults.jpg")
plotSmear(tested)
dev.off()