Making big data a little smaller

When we think about digital information, we often think about size. A daily email newsletter, for example, may be 75 to 100 kilobytes in size. But data also has dimensions, based on the number of variables in a piece of data. An email, for example, can be viewed as a high-dimensional vector where there’s one coordinate for each word in the dictionary and the value in that coordinate is the number of times that word is used in the email. So, a 75-kilobyte email that is 1,000 words long would correspond to a vector with millions of coordinates.
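
As a toy illustration of this word-count view of an email, the snippet below builds such a vector over a tiny stand-in vocabulary; the word list and email text are invented for the example, and a real dictionary would have vastly more entries:

```python
from collections import Counter

# A toy vocabulary standing in for "the dictionary"; a real one would
# contain hundreds of thousands to millions of words.
vocabulary = ["meeting", "schedule", "offer", "free", "prize", "report"]

email_text = "free prize offer free offer"  # invented example text

counts = Counter(email_text.lower().split())

# One coordinate per dictionary word; the value is how often the word appears.
vector = [counts.get(word, 0) for word in vocabulary]
print(vector)  # [0, 0, 2, 2, 1, 0]
```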

This geometric view of data is useful in some applications, such as learning spam classifiers, but the more dimensions the data has, the longer an algorithm can take to run and the more memory it uses.

As data processing became more and more complex in the mid-to-late 1990s, computer scientists turned to pure mathematics to help speed up the algorithmic processing of data. In particular, researchers found a solution in a theorem proved in the 1980s by mathematicians William B. Johnson and Joram Lindenstrauss, who were working in the area of functional analysis.

Known as the Johnson-Lindenstrauss lemma (JL lemma), the theorem has been used by computer scientists to reduce the dimensionality of data and speed up all types of algorithms across many different fields, from streaming and search algorithms to fast approximation algorithms for statistics and linear algebra, and even algorithms for computational biology.

But as data has grown even larger and more complex, many computer scientists have asked: Is the JL lemma really the best approach to pre-process large data into a manageably low dimension for algorithmic processing?

Now, Jelani Nelson, the John L. Loeb Associate Professor of Engineering and Applied Sciences at the Harvard John A. Paulson School of Engineering and Applied Sciences, has put that debate to rest. In a paper presented this week at the annual IEEE Symposium on Foundations of Computer Science in Berkeley, California, Nelson and co-author Kasper Green Larsen, of Aarhus University in Denmark, found that the JL lemma really is the best way to reduce the dimensionality of data.

“We have proven that there are ‘hard’ data sets for which dimensionality reduction beyond what’s provided by the JL lemma is impossible,” said Nelson.

Essentially, the JL lemma showed that for any finite collection of points in high dimension, there is a collection of points in a much lower dimension which preserves all distances between the points, up to a small amount of distortion. Years after its original impact in functional analysis, computer scientists found that the JL lemma can act as a preprocessing step, allowing the dimensions of data to be significantly reduced before running algorithms.

Rather than working through each and every dimension, like the millions of dimensions in an email vector, the JL lemma speeds things up by mapping the data into a much lower-dimensional space, in practice typically via a random projection. In this geometry, the individual dimensions matter less than the relationships between data points. Because the mapping approximately preserves those relationships, the geometry of the data and the angles between data points are preserved, just in fewer dimensions.
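
A minimal sketch of this kind of preprocessing, using a random Gaussian projection; the data sizes and distortion target below are arbitrary choices for illustration, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 200, 10_000      # 200 points in 10,000 dimensions (arbitrary example sizes)
eps = 0.2               # allowed distortion
k = int(np.ceil(8 * np.log(n) / eps**2))  # target dimension on the order of log(n)/eps^2

X = rng.normal(size=(n, d))               # original high-dimensional data
P = rng.normal(size=(d, k)) / np.sqrt(k)  # random Gaussian projection matrix
Y = X @ P                                 # reduced-dimension data

# Spot-check one pairwise distance before and after projection.
i, j = 3, 17
orig = np.linalg.norm(X[i] - X[j])
proj = np.linalg.norm(Y[i] - Y[j])
print(f"original distance {orig:.1f}, projected distance {proj:.1f}, "
      f"ratio {proj / orig:.3f}")  # ratio should land within roughly 1 ± eps
```

With high probability every pairwise distance is preserved up to a factor of about 1 ± eps, which is the kind of guarantee the lemma provides.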

Of course, the JL lemma has a wide range of applications that go far beyond spam filters. It is used in compressed sensing for reconstructing sparse signals using few linear measurements; clustering high-dimensional data; and DNA motif finding in computational biology.

“We still have a long way to go to understand the best dimension reduction possible for specific data sets as opposed to comparing to the worst case,” said Nelson. “I think that’s a very interesting direction for future work. There are also some interesting open questions related to how quickly we can perform the dimensionality reduction, especially when faced with high-dimensional vectors that are sparse, i.e. have many coordinates equal to zero. This sparse case is very relevant in many practical applications. For example, vectors arising from e-mails are extremely sparse, since a typical email does not contain every word in the dictionary.”

“The Johnson-Lindenstrauss Lemma is a fundamental result in high dimensional geometry but an annoying logarithmic gap remained between the upper and lower bounds for the minimum possible dimension required as a function of the number of points and the distortion allowed,” said Noga Alon, professor of Mathematics at Tel Aviv University, who had proven the previous best lower bound for the problem. “The recent work of Jelani Nelson and Kasper Green Larsen settled the problem. It is a refreshing demonstration of the power of a clever combination of combinatorial reasoning with geometric tools in the solution of a classical problem.”

Attitudes on human genome editing vary, but reach consensus on holding talks

An international team of scientists announced they had successfully edited the DNA of human embryos. As people process the political, moral and regulatory issues of the technology, which nudges us closer to science fact than science fiction, researchers at the University of Wisconsin-Madison and Temple University show that now is the time to involve the American public in discussions about human genome editing.

In a study published Aug. 11 in the journal Science, the researchers assessed what people in the United States think about the uses of human genome editing and how their attitudes may drive public discussion. They found a public divided on its uses but united in the importance of moving conversations forward.

“There are several pathways we can go down with gene editing,” says UW-Madison’s Dietram Scheufele, lead author of the study and member of a National Academy of Sciences committee that compiled a report focused on human gene editing earlier this year. “Our study takes an exhaustive look at all of those possible pathways forward and asks where the public stands on each one of them.”

Compared to previous studies on public attitudes about the technology, the new study takes a more nuanced approach, examining public opinion about the use of gene editing for disease therapy versus for human enhancement, and about editing that becomes hereditary versus editing that does not.

The research team, which included Scheufele and Dominique Brossard — both professors of life sciences communication — along with Michael Xenos, professor of communication arts, first surveyed study participants about the use of editing to treat disease (therapy) versus for enhancement (creating so-called “designer babies”). While about two-thirds of respondents expressed at least some support for therapeutic editing, only one-third expressed support for using the technology for enhancement.

Diving even deeper, researchers looked into public attitudes about gene editing on specific cell types, somatic or germline, either for therapy or enhancement. Somatic cells are non-reproductive, so edits made in those cells do not affect future generations. Edits made in germline cells, however, are heritable and would be passed on to children.

Public support of therapeutic editing was high both in cells that would be inherited and those that would not, with 65 percent of respondents supporting therapy in germline cells and 64 percent supporting therapy in somatic cells. When considering enhancement editing, however, support depended more upon whether the changes would affect future generations. Only 26 percent of people surveyed supported enhancement editing in heritable germline cells and 39 percent supported enhancement of somatic cells that would not be passed on to children.

“A majority of people are saying that germline enhancement is where the technology crosses that invisible line and becomes unacceptable,” says Scheufele. “When it comes to therapy, the public is more open, and that may partly be reflective of how severe some of those genetically inherited diseases are. The potential treatments for those diseases are something the public at least is willing to consider.”

Beyond questions of support, researchers also wanted to understand what was driving public opinions. They found that two factors were related to respondents’ attitudes toward gene editing as well as their attitudes toward the public’s role in its emergence: the level of religious guidance in their lives, and factual knowledge about the technology.

Those with a high level of religious guidance in their daily lives had lower support for human genome editing than those with low religious guidance. Additionally, those with high knowledge of the technology were more supportive of it than those with less knowledge.

While respondents with high religious guidance and those with high knowledge differed on their support for the technology, both groups highly supported public engagement in its development and use. These results suggest broad agreement that the public should be involved in questions of political, regulatory and moral aspects of human genome editing.

“The public may be split along lines of religiosity or knowledge with regard to what they think about the technology and scientific community, but they are united in the idea that this is an issue that requires public involvement,” says Scheufele. “Our findings show very nicely that the public is ready for these discussions and that the time to have the discussions is now, before the science is fully ready and while we have time to carefully think through different options regarding how we want to move forward.”

Scientists discover unknown virus in ‘throwaway’ DNA

A chance discovery has opened up a new method of finding unknown viruses.

In research published in the journal Virus Evolution, scientists from Oxford University’s Department of Zoology have revealed that Next-Generation Sequencing and its associated online DNA databases could be used in the field of viral discovery. They have developed algorithms that detect DNA from viruses that happen to be in fish blood or tissue samples, and could be used to identify viruses in a range of different species.
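
The general idea, scanning assembled sequence data for stretches that resemble known viral sequences, can be sketched roughly as follows; this is a toy k-mer-matching illustration with invented sequences, not the authors’ actual algorithms:

```python
# Toy screen of host sequence data for exact matches to known viral k-mers.
# Sequences and k are invented; real pipelines use far more sensitive
# similarity searches against full viral reference databases.

def kmers(seq, k):
    """All overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

viral_reference = "ATGGCGTACGTTAGC"           # stand-in for a known viral sequence
host_contig = "TTTATGGCGTACGTTAGCAAACCC"      # stand-in for assembled fish sequence

k = 8
viral_kmers = kmers(viral_reference, k)
hits = [i for i in range(len(host_contig) - k + 1)
        if host_contig[i:i + k] in viral_kmers]

print(f"{len(hits)} viral-looking {k}-mers found at positions {hits}")
```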

Next-Generation Sequencing has revolutionised genomics research and is currently used to study and understand genetic material. It allows scientists to gather vast amounts of data from a single piece of DNA, which is then collated into huge online genome databases that are publicly accessible.

Dr Aris Katzourakis and Dr Amr Aswad, Research Associates at Oxford’s Department of Zoology, initially discovered this new use for the databases by chance. While looking for an ancient herpes virus in primates, they found evidence of two new undocumented viruses.

Spurred by their accidental discovery, they set out to see if they could intentionally achieve the same result. In a separate project to find new fish-infecting herpes viruses, they used the technique to examine more than 50 fish genomes for recognisable viral DNA. Sure enough, in addition to the herpes viruses they were expecting to find, the researchers identified a distant lineage of unusual viruses that may even constitute a new viral family. Fragments of these viruses were found scattered across the genomes of 15 different species of fish, including the Atlantic salmon and rainbow trout.

To confirm that the viral evidence was not simply a fluke, or a data processing error, they tested additional samples from a local supermarket and sushi restaurant. The same viral fragments were found in the bought samples.

Study author Dr Aris Katzourakis, from Oxford University’s Department of Zoology, said: ‘In the salmon genome we found what seems to be a complete and independent viral genome, as well as dozens of fragments of viral DNA that had integrated into the fish DNA. We know from recent studies that viruses are able to integrate into the genome of their host, sometimes remaining there for millions of years. In this case, it looks like the virus may have acquired the ability to integrate by stealing a gene from the salmon itself, which explains how it has become so widespread in the salmon genome.’

The key to the success of this research is in its inter-disciplinary approach, combining techniques from two fields: evolutionary biology and genomics. Together, these are at the core of the new field of paleovirology – the study of ancient viruses that have integrated their DNA into that of their hosts, sometimes millions of years ago. Each technique used has been developed to analyse huge quantities of DNA sequence data.

Co-author and Research Associate at Oxford’s Department of Zoology and St. Hilda’s College, Dr Amr Aswad, said: ‘Discovering new viruses has historically been biased towards people and animals that exhibit symptoms of disease. But, our research shows how useful next generation DNA sequencing can be in viral identification. To many, viral DNA in say, chimp or falcon data is a nuisance, and a rogue contaminant that needs to be filtered from results. But we consider these an opportunity waiting to be exploited, as they could include novel viruses that are worth studying – as we have found in our research. We could be throwing away very valuable data.’

Finding new viruses has historically not been an easy process. Viruses do not grow on their own, so they must be cultured in host cells in a laboratory before they can be analysed, which can involve months of work. But the Oxford research represents a massive opportunity for the future.

Beyond this study, the approach could be used to identify viruses in a range of different species, particularly those known to harbour transmissible disease. Bats and rodents, for example, are notorious carriers of infectious disease that they are seemingly immune to. Insects such as mosquitoes are also carriers of viral diseases that harm humans, such as Zika. If applied effectively the method could uncover other viruses before an outbreak even happens.

Dr Katzourakis added: ‘One of the real strengths of this technique, as compared to more traditional virology approaches, is the speed of discovery, and the lack of reliance on identifying a diseased individual. The viral data collected, that may otherwise be discarded as a nuisance, is a unique resource for looking for both pathogenic and benign viruses that would otherwise have remained undiscovered.’

The team will next begin to identify the impact of the viruses and whether they have any long-term implications for disease or commercial fish farming. While an infectious virus may not cause disease in its natural host (in this case, fish), there is a risk of cross-species transmission to either farmed fish or wild populations.

However, the risk to humans is minimal. Dr Aris Katzourakis said: ‘Put it this way, I’m not going to stop eating sashimi.’

The glass transition caught in the act

We learn in school that matter comes in three states: solid, liquid and gas. A bored and clever student (we’ve all met one) then sometimes asks whether glass is a solid or a liquid.

The student has a point. Glasses are weird “solid liquids” that are cooled so fast that their atoms or molecules jam before they can organize themselves into the regular patterns of a crystalline solid. So a glass has the mechanical properties of a solid, but its atoms or molecules are disorganized, like those in a liquid.

One sign of the weirdness of glass is that the transition from liquid to glass is much fuzzier than the transition from liquid to crystalline solid. In fact, the glass transition is arbitrarily defined as the point where the glass-forming material has a viscosity of 10¹³ poise. (The viscosity of water at room temperature is about 0.01 poise; a thick oil might have a viscosity of about 1.0 poise.) At this point, the material is too thick to flow and so meets the practical definition of a solid.

Scientists hate definitions this vague, but they’ve been stuck with this one because nobody really understood the glass transition, which frequently makes it onto lists of the top-10 unsolved problems in physics.

For the most part, scientists have been able to measure only bulk properties of glass-forming liquids, such as viscosity and specific heat, and the interpretations they came up with depended in part on the measurements they took. The glass literature is notoriously full of contradictory findings and workshops about glass are the venue for lively debate.

But in the past fifteen years, new experimental setups that scatter X-rays or neutrons off the atoms in a droplet of liquid held without a container (contact with a container would provoke the liquid to crystallize) have allowed scientists at long last to measure the atomic properties of the liquid. And that is the level at which they suspect the secrets of the glass transition are hidden.

In one such study, Ken Kelton, the Arthur Holly Compton Professor in Arts & Sciences at Washington University in St. Louis, and his research team (Chris Pueblo, Washington University and Minhua Sun, Harbin Normal University, China) compared a measure of the interaction of atoms for different glass-forming liquids. Their results, published online in Nature Materials, reconcile several measures of glass formation, a sign that they are on the right track.

“We have shown that the concept of fragile and strong liquids, which was invented to explain why viscosity changes in markedly different ways as a liquid cools, actually goes much deeper than just the viscosity,” Kelton said. “It is ultimately related to the repulsion between atoms, which limits their ability to move cooperatively. This is why the distinction between fragile and strong liquids also appears in structural properties, elastic properties and dynamics. They’re all just different manifestations of that atomic interaction.”

This is the first time the connection between viscosity and atomic interactions has been demonstrated experimentally, he said. Intriguingly, his studies and work by others suggest that the glass transition begins not at the conventional glass transition temperature but rather at a temperature approximately two times higher in metallic glasses (more than two times higher in the silicate glasses, such as window glass). It is at that point, Kelton said, the atoms first begin to move cooperatively.

Drilling down to the atomic level

Kelton’s latest discoveries follow earlier investigations of a characteristic of glass-forming liquids called fragility. To most people, all glasses are fragile, but to physicists some are “strong” and others are “fragile.”

The distinction was first introduced in 1995 by Austen Angell, a professor of chemistry at Arizona State University, who felt that a new term was needed to capture dramatic differences in the way a liquid’s viscosity increases as it approaches the glass transition.

The viscosities of some liquids change gradually and smoothly as they approach this transition. Other liquids, as they are cooled, show very little change in viscosity at first, but then their viscosity takes off like a rocket as the transition temperature approaches.

At the time, Angell could only measure viscosity, but he called the first type of liquid “strong” and the second type “fragile” because he suspected a structural difference underlay the differences that he saw.

“It’s easier to explain what he meant if you think of a glass becoming a liquid rather than the other way around,” Kelton said. “Suppose a glass is heated through the glass transition temperature. If it’s a ‘strong’ system, it ‘remembers’ the structure it had as a glass–which is more ordered than in a liquid–and that tells you that the structure does not change much through the transition. In contrast, a ‘fragile’ system quickly ‘forgets’ its glass structure, which tells you that its structure changes a lot through the transition.

“People argued that the change in viscosity had to be related to the structure — through several intermediate concepts, some of which are not well defined,” Kelton added. “What we did was hop over these intermediate steps to show directly that fragility was related to structure.”

In 2014, he and members of his group published in Nature Communications the results of experiments showing that the fragility of a glass-forming liquid is reflected in something called the structure factor: a quantity, measured by scattering X-rays off a droplet of the liquid, that contains information about the positions of the atoms in the droplet.
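
For reference, the standard textbook definition of the static structure factor for N atoms at positions r_j (the general formula, not a result specific to this study) is

\[
S(\mathbf{q}) \;=\; \frac{1}{N}\,\left\langle \left| \sum_{j=1}^{N} e^{\, i\, \mathbf{q}\cdot\mathbf{r}_j} \right|^{2} \right\rangle ,
\]

where q is the scattering wavevector and the angle brackets denote an average over atomic configurations; the peaks of S(q) encode how the atoms are positioned relative to one another.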

“It was just as Angell had suspected,” Kelton said. “The rate of atomic ordering in the liquid near the transition temperature determines whether a liquid is ‘fragile’ or ‘strong.'”

Sharp little atomic elbows

But Kelton wasn’t satisfied. Other scientists were finding correlations between the fragility of a liquid and its elastic properties and dynamics, as well as its structure. “There has to be something in common,” he thought. “What’s the one thing that could underlie all of these things?” The answer, he believed, had to be the changing attraction and repulsion between atoms as they moved closer together, which is called the atomic interaction potential.

If two atoms are well separated, Kelton explained, there is little interaction between them and the interatomic potential is nearly zero. When they get closer together, they are attracted to one another for a variety of reasons. The potential energy goes down, becoming negative (or attractive). But then as they move closer still, the cores of the atoms start to interact, repelling one another. The energy shoots way up.
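
As a rough numerical illustration of that shape (using the generic Lennard-Jones 12-6 form purely as an example, not the potentials extracted in Kelton’s experiments), the energy is near zero at large separation, dips to a negative minimum where the atoms attract, and climbs steeply once the cores begin to overlap:

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Generic 12-6 pair potential: attractive well plus steeply repulsive core."""
    return 4 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)

for r in [0.95, 1.0, 1.12, 1.5, 2.5]:
    print(f"r = {r:4.2f}  V(r) = {lennard_jones(r):+8.3f}")
# Small r: large positive energy (repulsive core); r ~ 1.12: minimum near -1
# (attraction); large r: potential approaches zero (little interaction).
```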

“It’s that repulsive part of the potential we were seeing in our experiments,” Kelton said.

What they found when they measured the repulsive potential of 10 different metallic alloys at the Advanced Photon Source at Argonne National Laboratory is that “strong” liquids have steeper repulsive potentials, and the slope of their repulsive potential changes more rapidly than that of “fragile” ones. “What this means,” Kelton said, “is that ‘strong’ liquids order more rapidly at high temperatures than ‘fragile’ ones. That is the microscopic underpinning of Angell’s fragility.

“What’s interesting,” Kelton continued, “is that we see atoms beginning to respond cooperatively — showing awareness of one another — at temperatures approximately double the glass transition temperature and close to the melting temperature.

“That’s where the glass transition really starts,” he said. “As the liquid cools more and more, atoms move cooperatively until rafts of cooperation extend from one side of the liquid to the other and the atoms jam. But that point, the conventional glass transition, is only the end point of a continuous process that begins at a much higher temperature.”

Kelton will soon attend a workshop in Poland where he expects lively discussion of his findings, which contradict those of some of his colleagues. But he is convinced that he has hold of the thread that will lead out of the labyrinth because different levels of understanding are beginning to line up. “It’s exciting that things are coming together so well,” he said.

Empowering robots for ethical behavior

Scientists at the University of Hertfordshire in the UK have developed a concept called Empowerment to help robots to protect and serve humans, while keeping themselves safe.

Robots are becoming more common in our homes and workplaces and this looks set to continue. Many robots will have to interact with humans in unpredictable situations. For example, self-driving cars need to keep their occupants safe, while protecting the car from damage. Robots caring for the elderly will need to adapt to complex situations and respond to their owners’ needs.

Recently, thinkers such as Stephen Hawking have warned about the potential dangers of artificial intelligence, and this has sparked public discussion. “Public opinion seems to swing between enthusiasm for progress and downplaying any risks, to outright fear,” says Daniel Polani, a scientist involved in the research, which was recently published in Frontiers in Robotics and AI.

However, the concept of “intelligent” machines running amok and turning on their human creators is not new. In 1942, science fiction writer Isaac Asimov proposed his three laws of robotics, which govern how robots should interact with humans. Put simply, these laws state that a robot should not harm a human, or allow a human to be harmed. The laws also aim to ensure that robots obey orders from humans, and protect their own existence, as long as this doesn’t cause harm to a human.

The laws are well-intentioned, but they are open to misinterpretation, especially as robots don’t understand nuanced and ambiguous human language. In fact, Asimov’s stories are full of examples where robots misinterpreted the spirit of the laws, with tragic consequences.

One problem is that the concept of “harm” is complex, context-specific and difficult to explain clearly to a robot. If a robot doesn’t understand “harm”, how can it avoid causing it? “We realized that we could use different perspectives to create ‘good’ robot behavior, broadly in keeping with Asimov’s laws,” says Christoph Salge, another scientist involved in the study.

The concept the team developed is called Empowerment. Rather than trying to make a machine understand complex ethical questions, it is based on robots always seeking to keep their options open. “Empowerment means being in a state where you have the greatest potential influence on the world you can perceive,” explains Salge. “So, for a simple robot, this might be getting safely back to its power station, and not getting stuck, which would limit its options for movement. For a more futuristic, human-like robot this would not just include movement, but could incorporate a variety of parameters, resulting in more human-like drives.”
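
In the research literature, Empowerment is usually formalized information-theoretically, as the channel capacity between an agent’s possible action sequences and its subsequent sensor states. The grid world below is an invented toy example of the deterministic special case, where this reduces to counting how many distinct states a robot can still reach; it is only a sketch of the idea, not the authors’ implementation:

```python
import math
from itertools import product

# Toy deterministic grid world (an invented example). In a deterministic world,
# Empowerment reduces to the log of the number of distinct states the robot can
# reach with n-step action sequences: more reachable states means more options.

SIZE = 5
WALLS = {(2, 1), (2, 2), (2, 3)}          # a wall segment that limits movement
ACTIONS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0), "stay": (0, 0)}

def step(state, action):
    x, y = state
    dx, dy = ACTIONS[action]
    nxt = (x + dx, y + dy)
    if nxt in WALLS or not (0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE):
        return state                       # blocked moves leave the robot in place
    return nxt

def empowerment(state, n=3):
    """log2 of the number of distinct states reachable by n-step action sequences."""
    reachable = set()
    for seq in product(ACTIONS, repeat=n):
        s = state
        for a in seq:
            s = step(s, a)
        reachable.add(s)
    return math.log2(len(reachable))

print(f"corner cell:    {empowerment((0, 0)):.2f} bits")   # fewer reachable states
print(f"open-area cell: {empowerment((3, 3)):.2f} bits")   # more options kept open
```

A robot boxed into a corner or pressed against a wall can reach fewer distinct cells, so its Empowerment drops; keeping Empowerment high therefore amounts to keeping its options open.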

The team formalized the Empowerment concept mathematically so that it can be adopted by a robot. While the researchers originally developed the Empowerment concept in 2005, in a recent key development, they expanded the concept so that the robot also seeks to maintain a human’s Empowerment. “We wanted the robot to see the world through the eyes of the human with which it interacts,” explains Polani. “Keeping the human safe consists of the robot acting to increase the human’s own Empowerment.”

“In a dangerous situation, the robot would try to keep the human alive and free from injury,” says Salge. “We don’t want to be oppressively protected by robots to minimize any chance of harm, we want to live in a world where robots maintain our Empowerment.”

This altruistic Empowerment concept could power robots that adhere to the spirit of Asimov’s three laws, from self-driving cars, to robot butlers. “Ultimately, I think that Empowerment might form an important part of the overall ethical behaviour of robots,” says Salge.