Open Science is rwx science (reloaded)

mbuliga@pm.me or @xorasimilarity on telegram

These are motivational materials in favor of OS, written during several years when I struggled to practice what I preached. If this inspires you then the goal is achieved.

The gist is that it is much easier to do Open Science than to wait for the perfect Open Access infrastructure.

also available at: https://github.com/mbuliga/writings/blob/main/os-is-rwx.md

Open Science is rwx science

Preamble: this is a short text on Open Science, written a while ago (2017), which I now put here. It is taken from this version. The link (not the content) appeared in the Chemlambda for the people post. I can't find other traces, except the empty github repository creat, described as "framework for research output as a living creature".

I am a big fan of Open Science. For me, a good piece of research is one which I can Read Write eXecute.

Researchers use articles to communicate. Articles are not eXecutable. I can either Read others' articles or Write mine. I have to trust an editor who tells me that somebody else, whom I don't know, read the article and made a peer-review.

No. Articles are stories told by researchers about how they did the work. And since the micromanagement era, they are even less: fungible units to be used in funding applications, by the number or by the keyword.

This is so strange. I'm a mathematician and you probably know that mathematics is the most economical way to explain something clearly.

Take a 10-page research article. It contains the intensive work of many months. Now, compress the article further by the following ridiculous algorithm: throw away everything but the first several bits. Keep only the title, the name of the journal, keywords, maybe the Abstract. That's not science communication, that's massive misuse of brain material.

So I'm an Open Science fan, what should I do instead of writing articles? Maybe I should push my article in public and then wait for somebody to review it. That's called Open Access and it's very good for the readers. So what? The article is still only Readable or Writable, pick only one option, otherwise it's bad practice. What about my time? It looks like I have to wait and wait for all the bosses, managers, politicians and my fellow researchers to switch to OA first.

It's actually much easier to do Open Science. Remember: something that you can Read, Write and eXecute. As an author, you don't have to wait for the whole society to leave the old ways and to embrace the new ones.

You can just push what you did: stories, programs, data, everything. Any reader can pull the content and validate it, independently. EXecute what you pushed, Read your research story and Write derivative works.

I tried this! Want to know how to build a molecular computer which is indiscernible from how we are made? Use this playground called chemlambda. It's a made up, simple chemistry. It works like the real chemistry does, that is locally, randomly, without any externally imposed control. My bet is that chemlambda can be done in real life.

Now, or in a few years.

I used everything available to turn this project into Open Science. You name it:

old form articles,
old html and javascript articles, new official page,
research blog,
old github repository, new quine graphs repository,
figshare data repository,
funny animations obtained from simulations. Those simulations can be run on your computer, so you can validate my research. I put them in a Google+ collection, later deleted, then republished in a better form.

During this project I realized that it went beyond a Read Write Execute thing. What I did was to design many interesting molecules. They work by themselves, without any external control. Each molecule is like a theorem and the chemical evolution is the proof of the theorem, done by a blind, random, stupid, universal algorithm.

Therefore my Open Science attempt was to create molecules, some of them exhibiting a metabolism, some of them alive. Maybe this is the future of Open Science. To create a living organism which embodies in its metabolism the programs and research data. It's valid if it lives, grows, reproduces, even dies. Let it cross breed with other living creatures. In time the natural selection will do marvels. Life is not different than Science. Science is not different than life.

An interview

I save here a copy of the archived version of this interview from 2016, thanks to Guillaume Dumas (Hack your PhD). (I found the original has a new address.)

Q: Hi Marius, can you present yourself and what you are studying?

Hi, I’m a wandering geometer by heart, so I move from one problem to another driven by the feeling that there is a hidden geometric bit which needs to be revealed. My PhD is on the Mumford-Shah functional, which can be used to find edges in images and also to model the appearance and propagation of fractures. Nonlinear elasticity (think about big deformations, like in rubber) is about minimizing energy functionals over infinite dimensional groups. Convex analysis and hamiltonian mechanics can be mixed into the study of non-smooth, dissipative systems. Sub-riemannian geometry is actually only a model for a non-commutative differential calculus, not studied enough yet (despite fundamental contributions from great mathematicians like Gromov, Mostow, Margulis), where the space, even at a local level, even if trivial from a topological point of view, has a fractal behavior which is smooth in an unusual sense.

Now I’m interested in distributed, decentralized computing, biology and artificial life. It’s related because if you try to put back geometry into computation (and not trying to eliminate it with one dimensional thinking) then you’ll get models which resemble a lot both molecular computing and decentralized networks. The artificial chemistry chemlambda is my latest toy.

Q: What is open science for you?

I got hooked by the web in 1994, at Ecole Polytechnique, Paris. The Mosaic browser, the possibilities… At that point I understood that we pass through a revolution like the one started by the invention of the press, but much, much bigger and faster. Everybody knows this, or should. (Who doesn’t? Well, all these people from academia who take disastrous decisions for their students, are hand in hand with legacy publishers and eventually will go into a yummy retirement, leaving the academia in a far worse state than decades ago.)

It is natural then to try to do open science. For me this means that I put almost everything I do online, with the hope that if there is something valuable for anybody else, then this is the best strategy.

Science does not happen in isolation, but it is (it was) hard to pass the barriers of time and space. Books are wonderful means to discuss with people who lived long ago or who are not even born. But this is a slow channel. Moreover, it became very noisy, due to legacy publishers and their minions from the academia. The geographic neighborhood of the researcher was enlarged by travels and correspondence by letters, in a small, not very well connected world.

With the means of the web, communication of science exploded. Open science is a matter of evolution of ideas. I can’t stop thinking about it as a mating strategy. What is best to propagate one’s genetic traits? An open behavior or an ornate and arthritic mating ritual, performed in secrecy inside a very small community? There is no question which is better.

Q: You have put all your research online. Can you tell us more about the open notebooks?

I have most of my articles on arXiv.org. This is the NASA of open science, old technology but still better than anything else, for the moment. I used figshare.com. Of course, there is my homepage, which has existed since 1995. Then I have an open notebook at chorasimilarity.wordpress.com. There are more than 500 posts there. Of these, about 300 are mostly research notes, some of them transformed into articles, but there is a lot there. Many other posts are about OA.

More important, I have a recent GitHub account. That’s the best variant, because it allows me to play with validation as an alternative to peer-review.

Q: How did you decide to put all your research online, even before publication?

I believed in the web from the first moment, as I said before. Besides this, it was a gradual process. I use(d) to write articles which mix several fields and I was told, repeatedly, that such articles are difficult to publish. I have articles which took years to get published (my record is 17 years). Practically, I got again and again into the same problem: by the time the article is accepted for publication I have a secret backlog of many better results and moreover I am already bored by the subject and looking for a new one. The arXiv.org variant solved this problem only partially. I still have many (physical) notebooks which have not turned into arXiv articles because I can’t stand to explain once more why and how and what about various subjects.

But for some time the arXiv was the solution for me to avoid the years lost waiting for the rare (I wonder why so rare) reviewer who is willing to spend some time with my article. I just put them in the arXiv and moved along.

Then, in 2010 I had very strong feelings that I should go much more open, regardless of the outcome. I was visiting Rio de Janeiro, a great city, and perhaps the amazing exuberance of life there made me start to try to move towards computation and biology, from a mathematical point of view. So, when back in Romania I opened chorasimilarity.wordpress.com and entered into disputes about OA.

More recently, by a slow process I understood that programs are better (more rigorous) than proofs, another idea that everybody has heard about but which is hard to internalize. So now I use GitHub as a validation means for my research and I write a lot about it on Google+.

Q: Did your openness have any consequences on your career (good or bad)?

There have been good and bad consequences. Good, because all the people I met, scientifically, are among those contacted via the web. Bad, because a lot of my work does not appear on the bureaucratic radar.

Q: When we first interacted, it was about peer review. You mentioned an alternative or complementary concept: validation. Can you explain what it is and why it could be better than traditional peer review?

Yes, validation is one of those ideas which float in the air.

People from biology, medicine, neuroscience, etc, remarked that despite the avalanche of peer-reviewed research, a significant proportion cannot be validated by other researchers, independently.

Validation, as I see it, means that the researcher has to provide to the reader as much as possible so that the reader can validate (by reproduction, or by reasoning, or by any other means the reader has) the research.

Peer-review is a social validation. From the point of view of the reader it looks like this: hm, I’m reading an article in the journal. That means that the editor interacted with some reviewers who told the editor that the article is good for publication. I don’t know why the reviewer thought so, nor how the reviewer arrived at this conclusion. A good guess is that the reviewer arrived at this conclusion by reading the words and nothing deeper than this. The reviewer had the same means as the reader of the article to arrive at a conclusion, with the implication that the reader has access to the article with the prestige attached by the appearance in the journal, only because another reader recommended the article for publication. Otherwise said, if the prestige stamp of the journal is ignored, the reader has exactly the same means to make sense of the article as the reviewer had. The role of peer review is to suspend the disbelief of the reader because there has been another reader before.

This is a very weird practice, right? We can enhance it by making open, perpetual, pre- and post- publication peer-reviews. This way, if it works, we might see the reviews and get input from other people's impressions about the article.

But this is not solving the validation side. What are the means that anybody, first reader (the reviewer) or second reader, has to validate it? What if the reader could replay the research process, instead of just reading some words which describe it?

From here comes the idea of an article which runs in the browser. Suppose that you could play with the research, as you read the article. Suppose you have access to as much data as possible about this research. Then, from the point of view of the reader, this would be great. It does not exclude peer-review, it provides instead a more rigorous foundation for the opinions expressed through peer review.

This is a new form of research communication, a sort of super-article. Easy to say, but not at all clear what this means. There are proposals, one which I like is the PeerJ-paper now, but there are others as well. Open notebooks, why not? My version is simply a github.io collection of pages, with all means to run the programs instead of reading the proofs.

Q: To finish, what would be the most important advice you would give to young researchers and students interested in becoming scientists?

Think about the future. If you want to do research then do it.

Nobody dreamt, as a child, of spending life writing N articles/year in journals with impact greater than p.

Really, this is reserved to people who use research as a means towards something else. If you want to do research then do it, inside or outside academia. You are important. You have a dream, do it.

Bemis and the bull

from: https://chorasimilarity.wordpress.com/2015/05/10/bemis-and-the-bull/

Bemis said:

"I fell at the foot of the only solitary tree there was in nine counties adjacent (as any creature could see with the naked eye), and the next second I had hold of the bark with four sets of nails and my teeth, and the next second after that I was astraddle of the main limb and blaspheming my luck in a way that made my breath smell of brimstone. I had the bull, now, if he did not think of one thing. But that one thing I dreaded. I dreaded it very seriously. There was a possibility that the bull might not think of it, but there were greater chances that he would. I made up my mind what I would do in case he did. It was a little over forty feet to the ground from where I sat. I cautiously unwound the lariat from the pommel of my saddle------"

"Your saddle? Did you take your saddle up in the tree with you?"

"Take it up in the tree with me? Why, how you talk. Of course I didn't. No man could do that. It fell in the tree when it came down."

"Oh---exactly."

"Certainly. I unwound the lariat, and fastened one end of it to the limb. It was the very best green raw-hide, and capable of sustaining tons. I made a slip-noose in the other end, and then hung it down to see the length. It reached down twenty-two feet---half way to the ground. I then loaded every barrel of the Allen with a double charge. I felt satisfied. I said to myself, if he never thinks of that one thing that I dread, all right---but if he does, all right anyhow---I am fixed for him. But don't you know that the very thing a man dreads is the thing that always happens? Indeed it is so. I watched the bull, now, with anxiety---anxiety which no one can conceive of who has not been in such a situation and felt that at any moment death might come.

Presently a thought came into the bull's eye. I knew it! said I---if my nerve fails now, I am lost. Sure enough, it was just as I had dreaded, he started in to climb the tree------"

"What, the bull?"

"Of course---who else?""

[Mark Twain, Roughing It, chapter VII]

Like Bemis, legacy publishers hope you'll not think the unthinkable.

That we can pass to a new form of research sharing.

In publicity they say that the public is like a bull, in the sense that a bull can be taken anywhere by the ring in his nose. Certainly not like the bull from Bemis' story.

When you read an article you are like a passive couch potato in front of the TV. They (the publishers, hand in hand with academic managers) cast the shows, you have the dubious freedom to tap on the remote control.

Now it is possible to do more: hard, but possible and doable on a case by case basis. The difference is comparable to the experience you have in a computer game versus the one you have in front of the TV.

You can experience research actively, via research works which run in the browser. I'll call them "articles" for lack of the right name, but articles they are not.

An article which runs in the browser should have the following features:

you, the reader-gamer, can verify the findings by running (playing) the article
so there has to be some part, if not all of the content, in a form which is executed during gameplay, not only as an attached library of programs which can be downloaded and run by the interested reader (although such an attachment is already a huge advance over the pitiful offer of the legacy publisher)
verification (aka validation) is up to you, and not limited to a yes/no answer. By playing the game (as well as other related articles) you can discover, and you'll be interested in discovering, more, or different, or opposing results than the ones present in the passive version of the article and, why not, in the mind of the author
as validation is an effect of playing the article, peer review becomes an obsolete, much weaker form of validation
peer review is anyways a very weird form of validation: the publisher, by the fact it publishes an article, implies that some anonymous members of the research guild have read the article. So when you read the article in the legacy journal you are not even told, only hinted, that somebody from the editorial staff exchanged messages with somebody who's a specialist, who perhaps read the article and thought it worthy of publication. This is so ridiculous, but that is why you'll find in many reviews, which you see as an author, so many irrelevant remarks from the reviewer, like my pet example of the reviewer who's put off by my use of quotation marks. What the reviewer can do is very limited, so in order to give the impression that he/she did something, to give some proof that he/she read the article, the reviewer comes up with this sort of circumstantial proof. Actually, even for the most honest reviewer, the ideally patient and clever fellow who validates the work of the author, there is not much else to do. The reviewer has to decide whether he believes it or not, from the passive form of the article he received from the editor, and in the presence of the conflict of interests which comes from extreme specialisation and the low number of experts on a tiny subject. Peer review is not even a bad joke.
the licence should be something comparable to CC-BY-4.0, and surely not CC-BY-NC-ND. Something which leaves free both the author and the reader/gamer/author of derivative works, and at the same time allows the propagation of the authorship of the work
finally, the article which runs in the browser does not need a publisher, nor a DRM manager. What for?

Reproducibility vs peer review

from: https://chorasimilarity.wordpress.com/2015/04/09/reproducibility-vs-peer-review/

Here are my thoughts about replacing peer review by validation.

Peer review is the practice where the work of a researcher is commented on by peers. The content of the commentaries (reviews) is clearly not important. The social practice is to not make them public, nor to keep a public record about those. The only purpose of peer review is to signal that at least one, two, three or four members of the professional community (peers) declare that they believe that the said work is valid.

Validation by reproducibility is much more than this peer review practice. Validation means the following:

a researcher makes public (i.e. "publishes") a body of work, call it W. The work contains text, links, video, databases, experiments, anything. By making it public, the work is claimed to be valid, provided that the external resources used (as other works, for example) are valid. In itself, validation has no meaning.
a second party (anybody) can also publish a validation assessment of the work W. The validation assessment is a body of work as well, and thus is potentially submitted to the same validation practices described here. In particular, by publishing the validation assessment, call it W1, it is also claimed to be valid, provided the external resources (other works used, excepting W) are valid.
the validation assessment W1 makes claims of the following kind: provided that external works A,B,C are valid, then this piece D of the work W is valid because it has been reproduced in the work W1. Alternatively, under the same hypothesis about the external works, the work W1 claims that another piece E of the work W cannot be reproduced in the same way.
the means for reproducibility have to be provided by each work. They can be proofs, programs, experimental data.
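
The scheme above can be sketched as a small data structure. This is only an illustration under stated assumptions, not a description of any existing tool: the names Work, Claim and relative_status are hypothetical, and the four status labels are my own choice. The point it shows is that a verdict about a piece of W is always relative to the set of external works the reader is willing to accept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One claim from a validation assessment (hypothetical structure)."""
    work: str            # name of the assessed work, e.g. "W"
    piece: str           # piece of that work, e.g. "D"
    reproduced: bool     # True: reproduced; False: could not be reproduced
    assumes: frozenset   # external works assumed valid, e.g. {"A", "B", "C"}


@dataclass(frozen=True)
class Work:
    """A published body of work; an assessment is a Work that carries claims."""
    name: str
    pieces: tuple = ()
    externals: frozenset = frozenset()
    claims: tuple = ()


def relative_status(work, piece, works, trusted):
    """Status of a piece, relative to the external works the reader trusts.

    Only claims whose assumptions are all in `trusted` are taken into account.
    """
    verdicts = {
        claim.reproduced
        for w in works
        for claim in w.claims
        if claim.work == work and claim.piece == piece and claim.assumes <= trusted
    }
    if not verdicts:
        return "unassessed"
    if verdicts == {True}:
        return "reproduced"
    if verdicts == {False}:
        return "not reproduced"
    return "contested"


# The situation from the text: W has pieces D and E, W1 assesses both.
ABC = frozenset({"A", "B", "C"})
W = Work("W", pieces=("D", "E"), externals=ABC)
W1 = Work(
    "W1",
    externals=ABC,
    claims=(
        Claim("W", "D", reproduced=True, assumes=ABC),
        Claim("W", "E", reproduced=False, assumes=ABC),
    ),
)
works = [W, W1]
```

A reader who accepts A, B and C gets "reproduced" for D and "not reproduced" for E. A reader who accepts only A gets "unassessed" for both, because no claim applies under that hypothesis. Since W1 is itself a Work, a third party can publish claims about W1 in exactly the same way.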

As you can see the validation can be only relative, not absolute. I am sure that scientific results are never amenable to an acyclic graph of validations by reproducibility. Compared to peer review, which is only a social claim that somebody from the guild checked it, validation through reproducibility is much more, even if it does not provide means to absolute truths. What is preferable: to have a social claim that something is true, or to have a body of works where "relative truth" dependencies are exposed? This is moreover technically possible, in principle. However, this is not easy to do, at least because:

traditional means of publication and its practices are based on social validation (peer review)
there is this illusion that there is somehow an absolute semantical categorification of knowledge, pushed forward by those who are technically able to implement a validation reproducibility scheme at a large scale.

The mentioned illusion is also related to outdated parts of the cartesian method. It is therefore a manifestation of the "cartesian disease" (at github, at telegra.ph).

Let's take several researchers who produce works, some works related to others, as explained in the validation procedure.

Unlike in the time of Descartes, there are plenty of researchers who think at the same time, and moreover the body of works they produce is huge.

Every piece of the cartesian method has to be considered relative to each researcher and this is what causes many problems.

Parts (1a),(1b), (1c)

(1a) "never to accept anything for true which I did not clearly know to be such"

(1b) "to comprise nothing more in my judgement than what was presented to my mind"

(1c) "so clearly and distinctly as to exclude all ground of doubt"

can be seen as part of the validation technique, but with the condition to see "true" and "exclude all ground of doubt" as relative to the reproducibility of work W1 by a reader who tries to validate it up to external resources.

Parts (2a), (2b)

(2a) "to divide each of the difficulties under examination into as many parts as possible"

(2b) "and as might be necessary for its adequate solution"

are clearly researcher dependent; in an interconnected world these parts may introduce far more complexity than the original research work W1.

Combined with (1c), this leads to the illusion that the algorithm which embodies the cartesian method, when run in a decentralized and asynchronous world of users, HALTS.

There is no ground for that.

The most damaging part is (3d)

"assigning in thought a certain order even to those objects which in their own nature do not stand in a relation of antecedence and sequence"

First, every researcher embeds a piece of work into a narrative in order to explain the work. There is nothing "objective" about that. In a connected world, with the help of Google and the like, who impose or seek global coherence, the parts (3d) and (2a), (2b) transform the cartesian method into a global echo chamber.

The management of work bloats and spills over the work itself, and at the same time the cartesian method always HALTS, but for no scientific reason at all.
