How much can artificial intelligence and machine learning accelerate polymer science?

I’ve been at the annual High Polymer Research Group meeting at Pott Shrigley this week; this year it had the very timely theme “Polymers in the age of data”. Some great talks have really brought home to me both the promise of machine learning and laboratory automation in polymer science, and some of the practical barriers. Given the general interest in accelerated materials discovery using artificial intelligence, it’s interesting to focus on this specific class of materials to get a sense of the promise – and the pitfalls – of these techniques.

Debra Audus, from the USA’s National Institute of Standards and Technology, started the meeting off with a great talk on using machine learning to predict polymer properties from information about molecular structure. She described three difficulties for machine learning – the availability of enough reliable data, the problem of extrapolation outside the parameter space of the training set, and the problem of explainability.

A striking feature of Debra’s talk for me was its exploration of the interaction between old-fashioned theory and new-fangled machine learning (ML). This goes in two directions. On the one hand, Debra demonstrated that incorporating knowledge from theory can greatly speed up the training of a ML model, as well as improving its ability to extrapolate beyond the training set. On the other hand, given a trained ML model – essentially a black box of weights for a neural network – Debra emphasised the value of symbolic regression, which converts the black box into a closed-form expression built from simple functional forms of the kind a theorist would hope to derive from physical principles, providing something a scientist might recognise as an explanation of the regularities that the machine learning model encapsulates.
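To make the idea concrete, here’s a minimal sketch of the spirit of symbolic regression. Real tools do a genetic-programming search over the space of expressions; this toy version just does sparse regression over a hand-picked library of candidate terms, and everything in it – the “black box” function, the descriptors – is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained black-box model: imagine a neural network that
# predicts a polymer property from two molecular descriptors x1 and x2.
def black_box(x1, x2):
    return 3.0 * x1**2 + 0.5 * np.log(x2)

# Query the black box across the training domain.
x1 = rng.uniform(0.1, 2.0, 200)
x2 = rng.uniform(0.1, 2.0, 200)
y = black_box(x1, x2)

# A library of simple functional forms a theorist might propose.
library = {
    "1": np.ones_like(x1),
    "x1": x1, "x2": x2,
    "x1^2": x1**2, "x2^2": x2**2,
    "log(x1)": np.log(x1), "log(x2)": np.log(x2),
    "x1*x2": x1 * x2,
}
A = np.column_stack(list(library.values()))

# Least-squares fit, then discard negligible coefficients, leaving a sparse,
# human-readable expression (a crude stand-in for the full symbolic search).
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
terms = [f"{c:.2f}*{name}" for name, c in zip(library, coeffs) if abs(c) > 1e-6]
print("recovered expression: y =", " + ".join(terms))
# recovered expression: y = 3.00*x1^2 + 0.50*log(x2)
```

Here the closed form is recovered exactly because it was in the candidate library all along; the hard part of real symbolic regression is searching the space of possible expressions.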

But any machine learning model needs data – lots of data – so where does that data come from? One answer is to look at the records of experiments done in the past – the huge corpus of experimental data contained within the scientific literature. Jacqui Cole from Cambridge has developed software to extract numerical data and chemical reaction schemes, and to analyse images, from the scientific literature. For specific classes of (non-polymeric) materials she’s been able to create data sets with thousands of entries, using automated natural language processing to extract some of the contextual information that makes the data useful. Jacqui conceded that polymeric materials are particularly challenging for this approach; they have complex properties that are difficult to pin down to a single number, and what to the outsider may seem to be a single material (polyethylene, for example) may actually be a category that encompasses molecules with a wide variety of subtle variations arising from different synthesis methods and reaction conditions. And Debra and Jacqui shared some sighs of exasperation at the horribly inconsistent naming conventions used by polymer science researchers.
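As a toy illustration of the extraction idea – this is emphatically not Jacqui’s actual software, which uses far more sophisticated natural language processing – a sketch like the following pulls numerical property values, with a little context, out of literature-style text. The sentences and patterns are invented.

```python
import re

abstract = ("The glass transition temperature of polystyrene was measured "
            "as Tg = 105 °C. For the PMMA sample, Tg = 118 °C was observed.")

# Match a material name, then the nearest following Tg value and unit,
# within a single sentence (no full stops in between).
pattern = re.compile(
    r"(?P<material>polystyrene|PMMA)[^.]*?Tg\s*=\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*°C",
    re.IGNORECASE,
)

for m in pattern.finditer(abstract):
    print(m.group("material"), "Tg:", float(m.group("value")), "°C")
# polystyrene Tg: 105.0 °C
# PMMA Tg: 118.0 °C
```

Even this toy hints at why polymers are hard: “polystyrene” alone says nothing about molecular weight, tacticity or dispersity, so the extracted number is only as useful as the context captured alongside it.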

My suspicion on this (informed a little by the outcomes of a large scale collaboration with a multinational materials company that I’ve been part of over the last five years) is that the limitations of existing data sets mean that the full potential of machine learning will only be unlocked by the production of new, large scale datasets designed specifically for the problem in hand. For most functional materials the parameter space to be explored is vast and multidimensional, so considerable thought needs to be given to how best to sample this parameter space to provide the training data that a good machine learning model needs. In some circumstances theory can help here – Kim Jelfs from Imperial described an approach where the outputs from very sophisticated, compute-intensive theoretical models were used to train a ML model that could then interpolate properties at much lower compute cost. But we will always need to connect to the physical world and make some stuff.
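A minimal sketch of that surrogate-model pattern, with an invented one-dimensional function standing in for the expensive calculation, might look like this:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_theory(x):
    """Stand-in for a compute-intensive calculation (e.g. a quantum
    chemistry code); in reality each call might take hours."""
    return np.sin(3 * x) * np.exp(-0.3 * x)

# A handful of expensive evaluations...
X_train = np.linspace(0, 5, 8).reshape(-1, 1)
y_train = expensive_theory(X_train).ravel()

# ...then a Gaussian process surrogate interpolates at negligible cost,
# and reports its own uncertainty as a bonus.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
gp.fit(X_train, y_train)

X_query = np.linspace(0, 5, 100).reshape(-1, 1)
y_pred, y_std = gp.predict(X_query, return_std=True)
print(f"max predictive uncertainty: {y_std.max():.3f}")
```

The uncertainty estimate matters: it tells you where the surrogate can be trusted, and where another expensive calculation would be best spent.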

This means we will need automated chemical synthesis – the ability to synthesise many different materials with systematic variation of the reactants and reaction conditions, and then rapidly determine the properties of this library of materials. How do you automate a synthetic chemistry lab? Currently, laboratory synthesis involves a human measuring out materials, setting up the right reaction conditions, analysing and purifying the products, and finally determining their properties. There’s a fundamental choice here – you can automate the glassware, or automate the researcher. In the UK, Lee Cronin at Glasgow (not at the meeting) has been a pioneer of the former approach, while Andy Cooper at Liverpool has championed the latter. Andy’s approach involves using commercial industrial robots to carry out the tasks a human researcher would do, while using minimally adapted synthesis and analytical equipment. His argument in favour of this approach is essentially an economic one – the world market for general purpose industrial robots is huge, leading to substantial falls in price, while custom built automated chemistry labs represent a much smaller market, so one should expect slower progress and higher prices.

Some aspects of automating the equipment are already commercially available. Automatic liquid handling systems are widely available, allowing one, for example, to pipette reactants into multiwell plates, so if one’s synthesis isn’t sensitive to air one can use this approach to do combinatorial chemistry. Adam Gormley from Rutgers described using this approach to make libraries of copolymers with varying molecular weight and composition, by an oxygen-tolerant adaptation of reversible addition-fragmentation chain-transfer (RAFT) polymerisation. Another approach uses flow chemistry, in which reactions take place not in a fixed piece of glassware, but as the solvents containing the reactants travel down pipes, as described by Tanja Junkers from Monash and Nick Warren from Leeds. This approach allows in-line reaction monitoring, so it’s possible to build in a feedback loop, adjusting the ingredients and reaction conditions on the fly in response to what is being produced.
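To make the feedback idea concrete, here’s a toy sketch of such a closed loop: in-line monitoring feeds back into the reaction conditions on the fly. The “reactor” model and the simple proportional controller are both invented for illustration; real systems use proper process control driven by real in-line analytics.

```python
import math

def reactor_conversion(residence_time_min):
    """Stand-in for in-line monitoring: in this toy model, monomer
    conversion rises smoothly with residence time in the flow reactor."""
    return 1.0 - math.exp(-0.05 * residence_time_min)

target = 0.80          # target monomer conversion
residence_time = 5.0   # initial residence time (minutes)
gain = 20.0            # proportional gain (invented, tuned for this toy)

for step in range(20):
    measured = reactor_conversion(residence_time)   # "measure" the product
    error = target - measured
    residence_time += gain * error                  # adjust conditions on the fly
    residence_time = max(residence_time, 0.5)       # keep it physical

print(f"settled at {residence_time:.1f} min, "
      f"conversion {reactor_conversion(residence_time):.2f}")
```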

It seems to me, as a non-chemist, that there is still a lot of specific work to be done to adapt the automation approach to any particular synthetic method, so we are still some way from a universal synthesis machine. Andy Cooper’s talk title perhaps alluded to this: “The mobile robotic polymer chemist: nice, but does it do RAFT?” This may be a chemist’s joke.

But whatever approach one uses to produce a library of molecules with different characteristics, and to analyse their properties, there remains the question of how to sample what is likely to be a huge parameter space in order to provide the most effective training set for machine learning. We were reminded by the odd heckle from a very distinguished industrial scientist in the audience that there is a very classical body of theory to underpin this kind of experimental strategy – the Design of Experiments methodology. In these approaches, one selects the set of parameter combinations that spans parameter space most effectively.
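For concreteness, here is a minimal sketch of two classical designs – a two-level full factorial, and a Latin hypercube that spreads a fixed budget of runs evenly through a continuous parameter space – using scipy. The parameter names and ranges are invented.

```python
import itertools
import numpy as np
from scipy.stats import qmc

# Two-level full factorial over three factors: 2^3 = 8 runs.
factorial = list(itertools.product([-1, +1], repeat=3))
print(len(factorial), "factorial runs")

# Latin hypercube: 16 runs over (temperature, monomer ratio, catalyst loading).
sampler = qmc.LatinHypercube(d=3, seed=0)
unit_samples = sampler.random(n=16)
lower = [60.0, 0.1, 0.001]   # °C, mole fraction, mol%
upper = [120.0, 0.9, 0.05]
runs = qmc.scale(unit_samples, lower, upper)
print("first suggested experiment:", runs[0])
```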

But an automated laboratory offers the possibility of adapting the sampling strategy in response to the results as one gets them. Kim Jelfs set out the possible approaches very clearly. You can take the brute force approach, and just calculate everything – but this is usually prohibitively expensive in compute. You can use an evolutionary algorithm, using mutation and crossover steps to find a way through parameter space that optimises the output. Bayesian optimisation is popular, and generative models can be useful for taking a few more random leaps. Whatever the details, there needs to be a balance between optimisation and exploration – between taking a good formulation and making it better, and searching widely across parameter space for a possibly unexpected set of conditions that provides a step-change in the properties one is looking for.
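A minimal sketch of a single Bayesian optimisation step shows where that balance enters: the acquisition function (here, expected improvement) rewards both points predicted to be good and points where the model is uncertain. The objective function below is invented, standing in for a real synthesis-and-measurement cycle.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def run_experiment(x):
    """Stand-in for an automated synthesis-and-measurement cycle."""
    return -(x - 0.3) ** 2 + 0.05 * np.sin(20 * x)

X = np.array([[0.1], [0.5], [0.9]])   # experiments performed so far
y = run_experiment(X).ravel()

# Fit a Gaussian process to the results so far.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6)
gp.fit(X, y)

candidates = np.linspace(0, 1, 200).reshape(-1, 1)
mu, sigma = gp.predict(candidates, return_std=True)

# Expected improvement over the best result so far: large where the predicted
# mean is high (exploitation) or the uncertainty is large (exploration).
best = y.max()
z = (mu - best) / np.maximum(sigma, 1e-9)
ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

next_x = candidates[np.argmax(ei)]
print("next experiment at x =", float(next_x[0]))
```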

It’s this combination of automated chemical synthesis and analysis, with algorithms for directing a search through parameter space, that some people call a “self-driving lab”. I think the progress we’re seeing now suggests that this isn’t an unrealistic aspiration. My somewhat tentative conclusions from all this:

  • We’re still a long way from an automated lab that can flexibly handle many different types of chemistry, so for a while it’s going to be a question of designing specific set-ups for particular synthetic problems (though of course there will be a lot of transferable learning).
  • There is still a lot of craft in designing algorithms to search parameter space effectively.
  • Theory still has its uses, both in accelerating the training of machine learning models, and in providing satisfactory explanations of their output.
  • It’s going to take significant effort, computing resources and money to develop these methods further, so it’s going to be important to select use cases where the value of an optimised molecule makes the investment worthwhile. Amongst the applications discussed at the meeting were drug excipients, membranes for gas separation, fuel cells and batteries, and optoelectronic polymers.
  • Finally, the physical world matters – there’s value in the existing scientific literature, but it’s not going to be enough just to process text; for artificial intelligence to fulfil its promise for accelerating materials discovery, you need to make stuff and test its properties.

From self-stratifying films to levelling up: A random walk through polymer physics and science policy

After more than two and a half years at the University of Manchester, last week I finally got round to giving an in-person inaugural lecture, which is now available to watch on YouTube. The abstract follows:

How could you make a paint-on solar cell? How could you propel a nanobot? Should the public worry about the world being consumed by “grey goo”, as portrayed by the most futuristic visions of nanotechnology? Is the highly unbalanced regional economy of the UK connected to the very uneven distribution of government R&D funding?

In this lecture I will attempt to draw together some themes both from my career as an experimental polymer physicist, and from my attempts to influence national science and innovation policy. From polymer physics, I’ll discuss the way phase separation in thin polymer films is affected by the presence of surfaces and interfaces, and how in some circumstances this can result in films that “self-stratify” – spontaneously separating into two layers, a favourable morphology for an organic solar cell. I’ll recall the public controversies around nanotechnology in the 2000s. There were some interesting scientific misconceptions underlying these debates, and addressing these suggested some new scientific directions, such as the discovery of new mechanisms for self-propelling nano- and micro-scale particles in fluids. Finally, I will cover some issues around the economics of innovation and the UK’s current problems of stagnant productivity and regional inequality, reflecting on my experience as a scientist attempting to influence national political debates.

Rubber City Rebels [1]

I’m currently teaching a course on the theory of what makes rubber elastic to Materials Science students at Manchester, and this has reminded me of two things. The first is that this is a great topic for introducing a number of the most central concepts of polymer physics – the importance of configurational entropy, the universality of the large scale statistical properties of macromolecules, the role of entanglements. The second is that the city of Manchester has played a recurring role in the history of the development of this bit of science, which, as always, interacts with technological development in interesting and complex ways.

One of the earliest quantitative studies of the mechanical properties of rubber was published by that great Manchester physicist, James Joule, in 1859. As part of his investigations of the relationship between heat and mechanical work, he measured the temperature change that occurs when rubber is stretched. As anyone can find out for themselves with a simple experiment, rubber is an unusual material in this respect. If you take an elastic band (or, better, a rubber balloon folded into a narrow strip), hold it close to your upper lip, suddenly stretch it and then put it to your lip, you can feel that it heats up significantly – and then, if you release the tension, it cools down again. This is a crucial observation for understanding how it is that the elasticity of rubber arises from the reduction in entropy that occurs when a randomly coiled polymer strand is stretched.
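The thermodynamic argument connecting the heating to entropy is short enough to sketch (this is the standard textbook treatment, not Joule’s own analysis; f is the tension, L the length, U the internal energy, S the entropy, C_L the heat capacity at constant length):

```latex
% The tension in a stretched strip splits into energetic and entropic parts:
f = \left(\frac{\partial U}{\partial L}\right)_T
  - T\left(\frac{\partial S}{\partial L}\right)_T
% For an ideal rubber the internal energy barely changes on stretching, so
% the force is almost purely entropic:
f \approx -T\left(\frac{\partial S}{\partial L}\right)_T > 0
% since stretching reduces the entropy. Stretch the strip quickly enough to
% be adiabatic and the lost configurational entropy appears as heat:
\left(\frac{\partial T}{\partial L}\right)_S
  = -\frac{T}{C_L}\left(\frac{\partial S}{\partial L}\right)_T > 0
```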

But this wasn’t the first observation of the effect – Joule himself referred to an 1805 article by John Gough, in the Memoirs of the Manchester Literary and Philosophical Society, drawing attention to this property of natural rubber, and to the related property that a strand of the material held under tension will contract on being heated. John Gough was himself a fascinating figure: a Quaker from Kendal, a town on the edge of England’s Lake District, blind as a result of a childhood illness, he made a living as a mathematics tutor, and was a friend of John Dalton, the Manchester-based pioneer of the atomic hypothesis. All of this is a reminder of the intellectual vitality of that time in the fast industrialising provinces, truly an “age of improvement”, while the universities of Oxford and Cambridge had slipped into the torpor of qualifying the dim younger offspring of the upper classes to become Anglican clergymen.

Joule’s experiments were remarkably precise, but there was another important difference from Gough’s pioneering observation. Joule was able to use a much improved version of the raw natural rubber (or caoutchouc) that Gough used; the recently invented process of vulcanisation produced a much stronger, more stable material than the rather gooey natural precursor. The original discovery of the process of vulcanisation was made by the self-taught American inventor Charles Goodyear, who found in 1839 that rubber could be transformed by being heated with sulphur. It took nearly another century for the chemical basis of this process to be understood – the sulphur creates chemical bridges between the long polymer molecules, forming a covalently bound network. Goodyear’s process was rediscovered – or possibly reverse engineered – by the industrialist Thomas Hancock, who obtained the English patents for it in 1843 [2].

Appropriately for Manchester, the market that Hancock was serving was for improved raincoats. The Scottish industrialist Charles Macintosh had created his eponymous garment from a waterproof fabric consisting of a sandwich of rubber between two textile sheets; Hancock meanwhile had developed a number of machines and technologies for processing natural rubber, so it was natural for the two to enter into partnership, with their Manchester factory making waterproof fabric. Their firm prospered; Goodyear, though, failed to make money from his invention and died in poverty (the Goodyear tire company was named after him, but only some years after his death).

At that time, rubber was a product of the Amazonian rain forest, harvested from wild trees by indigenous people. In a well-known story of colonial adventurism, 70,000 seeds of the rubber tree were smuggled out of Brazil by the explorer Henry Wickham and successfully cultivated at Kew Gardens, with the plants then exported to the British colonies of Malaya and Ceylon to form the basis of a new plantation rubber industry. This expansion and industrialisation of the cultivation of rubber came at an opportune time – the invention of the pneumatic tyre and the development of the automobile industry led to a huge new demand for rubber around the turn of the century, which the new plantations were in a position to meet.

Wild rubber was also being harvested to meet this demand in the Belgian Congo, involving an atrocious level of violent exploitation of the indigenous population by the colonisers. But most of the rubber being produced to meet the new demand came from the British Empire plantations; this cultivation may not have been accompanied by the atrocities committed in the Congo, but the competitive prices at which plantation rubber could be produced reflected not just the capital invested and the high productivity achieved, but also the barely subsistence wages paid to the workforce, imported from India and China.

Back in England, in 1892 the Birmingham-based chemist William Tilden had demonstrated that rubber could be synthesised from turpentine [3]. But this invention created little practical interest in England. And why would it, given that the natural product was of a very high quality, and the British Empire had successfully secured ample supplies through its colonial plantations? The process was rediscovered by the Russian chemist Kondakov in 1901, and taken up by the German chemical company Bayer in time for the synthetic product to play a role in the First World War, when German access to plantation rubber was blocked by the Allies. At that time the quality of the synthetic product was much worse than that of natural rubber; nonetheless German efforts to improve synthetic rubber continued in the 1920’s and 30’s, with important consequences in the Second World War.

It’s sobering[4] to realise that by 1919, rubber constituted a global industry with an estimated value of £250 million (perhaps £12 billion in today’s money), on the cusp of a further massive expansion driven by the mass adoption of the automobile – and yet scientists were completely ignorant, not just of the molecular origins of rubber’s elasticity, but even of the very nature of its constituent molecules. It was the German chemist Hermann Staudinger who, in 1920, suggested that rubber was composed of very long, linear molecules – polymers. Obvious though this may be now, it was a controversial suggestion at the time, creating bitter disputes in the community of German chemists, disputes that gained a political tinge with the rise of the Nazi regime. Staudinger remained in Germany throughout the Second World War, despite being regarded as deeply ideologically suspect.

Staudinger was right about rubber being made up of long-chain molecules, but he was wrong about the form those molecules would take, believing that they would naturally adopt the form of rigid rods. The Austrian scientist Herman Mark, who was working for the German chemical combine IG Farben on synthetic rubber and other early polymers, realised that these long molecules would be very flexible and take up a random coil conformation. Mark’s father was Jewish, so he left IG Farben, first for Austria, and then after the Anschluss he escaped to Canada. At the University of Vienna in the 1930’s, Mark developed, with Eugene Guth, the statistical theory that explains the elastic behaviour of rubber in terms of the entropy changes in the chains as they are stretched and unstretched. This, at last, provided the basic explanation for the effect Gough discovered more than a century before, and that Joule quantified – the rise of temperature that occurs when rubber is stretched.
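In modern notation – my compressed summary, not Mark and Guth’s original presentation – the core of the entropic theory fits in three lines (N is the number of chain segments, b their length, R the end-to-end distance):

```latex
% Entropy of a Gaussian (random walk) chain as a function of its
% end-to-end distance R:
S(R) = \mathrm{const} - \frac{3 k_B R^2}{2 N b^2}
% The free energy F = U - TS then gives a Hookean, purely entropic force,
% proportional to temperature (hence the Gough–Joule effect):
f = -T\frac{\partial S}{\partial R} = \frac{3 k_B T}{N b^2}\,R
% Summed over the n network strands per unit volume, this yields the classical
% stress–strain relation for rubber at extension ratio \lambda:
\sigma = n k_B T\left(\lambda^2 - \frac{1}{\lambda}\right)
```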

By the start of the Second World War, both Mark and Guth found themselves in the USA, where the study of rubber was suddenly to become very strategically important indeed. The entry of Japan into the war and the fall of British Malaya cut off Allied supplies of natural rubber, leading to a massive scale-up of synthetic rubber production. Somewhat ironically, this was based on the pre-war discovery by IG Farben of a version of synthetic rubber – styrene-butadiene rubber (Buna-S) – whose properties were a great improvement on previous versions. Standard Oil of New Jersey had an agreement with IG Farben to codevelop and market Buna-S in the USA.

The creation, almost from scratch, of a massive synthetic rubber industry in the USA was, of course, just one dimension of the USA’s World War 2 production miracle, but its scale is still astonishing [5]. The industry scaled up, under government direction, from producing 231 tons of general purpose rubber in 1941 to a monthly output of 70,000 tons in 1945. Fifty-one new plants were built to produce the massive amounts of rubber needed for aircraft, tanks, trucks and warships. The programme was backed up by an intensive R&D effort, involving Mark, Guth, Paul Flory (later to win the Nobel prize for chemistry for his work on polymer science) and many others.

There was no significant synthetic rubber programme in the UK in the 1920’s and 1930’s. The British Empire was at its widest extent, providing ample supplies of natural rubber, as well as new potential markets for the material. That didn’t mean that there was no interest in improving scientific understanding of the material – on the contrary, the rubber producers in Malaya first sponsored research at Cambridge and Imperial, then collectively created a research laboratory in England, led by a young physical chemist from near Manchester, Geoffrey Gee. Gee, together with Leslie Treloar, applied the new understanding of polymer physics to understand and control the properties of natural rubber. After the war, realising that synthetic rubber was no longer just an inferior substitute, but a major threat to the markets for natural rubber, Gee introduced a programme of standardisation of rubber grades which helped the natural product maintain its market position.

Gee moved to the University of Manchester in 1953, and some time later Treloar moved to the neighbouring institution, UMIST, where he wrote the classic textbook on rubber elasticity. Manchester in the 1950’s and 60’s was a centre of research into rubber and networks of all kinds. Perhaps the most significant new developments were made in theory, by Sam Edwards, who joined Manchester’s physics department in 1958. Edwards was a brilliant theoretical physicist, who had learnt the techniques of quantum field theory with Julian Schwinger as a postdoc at Harvard. Edwards, having been interested by Gee in the fundamental problems of polymer physics, realised that there are some deep analogies between the mathematics of polymer chains and the quantum mechanical description of the behaviour of electrons. He was able to rederive, in a much more rigorous way that demonstrated the universality of the results, some of the fundamental predictions of polymer physics that had been postulated by Flory, Mark, Guth and others, before going on to results of his own of great originality and importance.

Edwards’s biggest contribution to the theory of rubber elasticity was to introduce methods for dealing with the topological constraints that occur in dense, cross-linked systems of linear chains. Polymer chains are physical objects that can’t cross each other, something that the classical theories of Guth and Mark completely neglect. But it was by then obvious that the entanglements of polymer molecules could themselves behave as cross-links, even in the absence of the chemical cross-linking of vulcanisation (in fact, this is already suggested by Gough’s original 1805 observations, which were made on raw, unvulcanised rubber). Edwards introduced the idea of a “tube” to represent those topological constraints. Combined with the insight of the French physicist Pierre-Gilles de Gennes, this led not just to improved models for rubber elasticity taking account of entanglements, but to a complete molecular theory of the complex viscoelastic behaviour of polymer melts [6].
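The scaling predictions that emerged from the tube picture are strikingly simple (standard results of the Edwards–de Gennes theory, quoted here rather than derived; N is the chain length, M_e the entanglement molecular weight):

```latex
% Each chain is confined by its neighbours to a tube, and escapes by sliding
% along its own contour ("reptation"). The longest relaxation time and the
% melt viscosity then scale as
\tau_d \sim N^3, \qquad \eta \sim N^3
% (experiments give an exponent closer to 3.4), while the plateau modulus is
% set by the entanglement molecular weight, not the chain length:
G_N^0 \approx \frac{4}{5}\,\frac{\rho R T}{M_e}
```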

Another leading physicist who emerged from this Manchester school was Julia Higgins, who learnt about polymers while she was a research fellow in the chemistry department in the 1960’s. Higgins subsequently worked in Paris, where in 1974 she carried out, with Cotton, des Cloizeaux, Benoit and others, what I think might be one of the most important single experiments in polymer science. Using a neutron source to study the scattering from a melt of polymer molecules, some of which were deuterium labelled, they were able to show that even in the dense, entangled environment of a polymer melt, a single polymer chain still behaves as a classical random walk. This is in contrast with the behaviour of polymers in solution, where the chains are expanded by a so-called “excluded volume” interaction – arising from the fact that two segments of a polymer chain can’t be in the same place at the same time. This result had been anticipated by Flory, in a rather intuitive and non-rigorous way, but it was Edwards who proved it rigorously.
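The difference this experiment probed can be stated in one line of scaling (standard results; ν ≈ 3/5 is Flory’s estimate of the excluded-volume exponent):

```latex
% Size R of a chain of N segments, each of length b:
R \sim b\,N^{1/2}   % ideal random walk, as found in the melt
R \sim b\,N^{\nu}, \quad \nu \approx 3/5   % self-avoiding walk, dilute good solvent
```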

[1] My apologies for the rather contrived title. No-one calls Manchester “Rubber City” – it is traditionally a city built on cotton. The true Rubber City is, of course, Akron, Ohio. Neither can anyone really describe any of the figures I talk about here as “rebels” (with the possible exception of Staudinger, who in his way is rather a heroic figure). But as everyone knows [7], Akron was a centre of musical creativity in the mid-to-late 1970s, producing bands such as Devo, Pere Ubu, and the Rubber City Rebels, whose eponymous song has remained a persistent earworm for me since the late 1970s, and from which I’ve taken my title.
[2] And I do mean “English” here, rather than British or UK – it seems that Scotland had its own patent laws then, which, it turns out, influenced the subsequent development of the rubber boot industry.
[3] It’s usually stated that Tilden succeeded in polymerising isoprene, but a more recent reanalysis of the original sample of synthetic rubber has revealed that it is actually poly(2,3-dimethylbutadiene) (https://www.sciencedirect.com/science/article/pii/S0032386197000840).
[4] At least, it’s sobering for scientists like me, who tend to overestimate the importance of having a scientific understanding to make a technology work.
[5] See “U.S. Synthetic Rubber Program: National Historic Chemical Landmark” – https://www.acs.org/content/acs/en/education/whatischemistry/landmarks/syntheticrubber.html
[6] de Gennes won the 1991 Nobel Prize for Physics for his work on polymers and liquid crystals. Many people, including me, strongly believed that this prize should have been shared with Sam Edwards. It has to be said that both men, who were friends and collaborators, dealt with this situation with great grace.
[7] “Everyone” here meaning those people (like me) born between 1958 and 1962 who spent too much of their teenage years listening to the John Peel show.