Wednesday, September 05, 2007

Definitions in Open Science SFLO Transcript

We had our third SciFoo Lives On session on Monday (September 3, 2007).

The topic was "Definitions in Open Science" and there were two posters: mine on Open Notebook Science and Bill Hooker's on broader considerations. Richard Akerman also presented, using some of Bill's slides, where he was cited.

As I expected, there were fewer attendees than last week (about a dozen compared to over 30 for the Medicine 2.0 session). Defining terms is not exactly the sexiest of topics. As Bill mentioned, it is more interesting to do the work than to philosophize about it. I agree with him for the most part - in fact I think that defining terms in a top-down way can be counterproductive, especially if we hold back from doing work while waiting for standards and terms to be entirely established.

Terms are useful to succinctly communicate ideas and it is even fine for terms to be "fuzzy", as long as everyone communicating understands that. For example, the term "open science" is extremely broad and as such it is useful to describe categories (such as sessions here and at SciFoo).

Problems arise when people use terms with multiple meanings. For example, "Open Source Science" may mean Open Source scientific software or science done openly. Even in the latter definition, confusion can arise. For me, openly means that the raw data should be available on the open internet, but that is not an assumption shared by everyone. That's why I started using the term "Open Notebook Science".

A recent confusion over definitions has surprised many people. The term "Open Access" has had a fairly unambiguous definition: universal free online access to traditional journal articles. However, as Peter Murray-Rust has reported lately, the term has been abused by some publishers.

Next week - Monday September 10, 2007 at 16:00 - the SFLO session is "Video in Science". We should have a good turnout. For information on other sessions, see the SFLO wiki.

Here is the transcript:

[9:00] You: ok lets get started
[9:00] You: you will notice some changes from last week
[9:00] You: the posters are ordered and will remain there
[9:00] You: so that we can have poster sessions anytime
[9:00] You: there are bells on some of them
[9:01] You: they tell you if the presenter is online
[9:01] You: and give you a way to summon them if you want to talk
[9:01] You: I thought we would give the "poster session" a try in breakout groups after the talks
[9:01] You: we have a smaller set this week anyway - 3 speakers
[9:02] You: so this week is about definitions in open science
[9:02] You: I have 3 slides here to discuss my concern
[9:02] You: Bill Hooker and Richard Akerman will discuss their points in a few minutes
[9:03] You: the confusion has been (I noticed this at Scifoo also)
[9:03] You: that one person uses the term open science
[9:03] You: and the listener interprets it very differently
[9:03] You: there are many shades of open science
[9:04] You: the most closed form of science I think is the private lab notebook of a researcher
[9:04] You: then the traditional article is shared with the world but is not free
[9:04] You: Open Access has come to mean something very specific
[9:05] You: that of standard articles made free to view
[9:05] You: well Peter Murray-Rust would have a lot more to say about that but couldn't make it
[9:05] You: finally on this plot the term that is of most interest to me
[9:06] You: is Open Notebook Science, where the researcher's actual lab notebook is made public in real time
[9:06] You: I think it is important that ALL experiments are made public for ONS
[9:06] You: the selection of experiments when writing an article can leave one with the impression
[9:07] You: that the science is a lot simpler than it really is
[9:07] You: by doing ONS, people can also study how science actually gets done, messiness and all
[9:07] You: we do ours in a wiki
[9:07] You: but other people have interesting approaches
[9:08] You: Cameron Neylon is using a blog to record experiments
[9:08] You: any comments or issues you guys want to discuss?
[9:08] Troy McLuhan noticed that the blogs now support LaTeX markup - handy for math etc.
[9:09] Hiro Sheridan: I love the idea of how students can view the research process
[9:09] Berci Dryke: and how they can be tracked :)
[9:09] Hiro Sheridan: I wish I had that available when I was a student
[9:09] You: yes it is a way of interacting with my students that is handy
[9:09] CW Underwood: there's a caution to be voiced, though, about laws and sausages...
[9:09] Vidal Loon: tracked, corrected, discussed, etc
[9:09] Emile Pintens: Is everyone else hearing the side conversation?
[9:09] You: I leave comments in bold and italics and they can respond
[9:09] CW Underwood: there's a side discussion?
[9:09] Vidal Loon: wikis allow for all this and more.
[9:10] You: we're not using right now guys - maybe later in the breakout sessions
[9:10] WhiteWizard Chemistry: Agreed. My concern is that we might end up in a situation where different ways of recording science will not interact with each other and not be transportable
[9:10] Duriel Akula: from an evolutionary point of view it also makes the process modular. the importance becomes the individual experiments and not the paper
[9:10] Troy McLuhan: When I said blogs, I meant wikis
[9:10] WhiteWizard Chemistry: Troy that is excellent
[9:10] You: the standardization is an issue
[9:10] Hiro Sheridan: Horace do you want to mention how we used google docs to collaborate?
[9:11] CW Underwood: modularity is very important -- i think the future of the scientific paper as we know it is limited
[9:11] Emile Pintens: We have integrated wikispaces into Knowble as an FYI
[9:11] You: the way we are approaching it is to first worry about recording all the science
[9:11] Emile Pintens: It is a work in progress though..
[9:11] CW Underwood: i think science will (soon?) be reported experiment-by-experiment
[9:11] You: so we use free text with links to spectra and molecules
[9:11] You: but we are going to add automation to process much of that
[9:12] Hiro Sheridan: :)
[9:12] You: for example we now use InChI tags in all experiments
[9:12] You: to track molecules
[9:12] WhiteWizard Chemistry: How does this tie into Neil Saunders' desire for an open source electronic lab notebook? I can see one with specific modules for different science/data types
[9:13] Duriel Akula: i guess that goes a bit into the standards , microformats etc
[9:13] You: usually when people refer to open source electronic notebooks they mean open source software I think
[9:13] CW Underwood: frankly, i haven't seen an ELN that does anything better than a wiki...
[9:14] You: from my experience i think a general purpose wiki works very well as lab notebook
[9:14] Duriel Akula: I cant speak for Neil , but I think he does mean a ELN system. that really does not exist per se.
[9:14] CW Underwood: though I haven't looked at the Neylon lab's attempt to use custom blog software as an ELN
[9:14] WhiteWizard Chemistry: The cool thing is that we are just at the start and there is such an opportunity here
[9:14] Vidal Loon: Wikis can be extended with many new functions. Mediawiki allows for extensions and these can make the wiki a better tool for science research information. One very good example of this is the work being done at OpenWetWare.
[9:14] You: the Neylon lab is based on a blog format
[9:15] You: ww I think that is the point
[9:15] You: there are lots of opportunities
[9:15] Rakerman Yellowjacket: I think eSciDoc may be working on an e-lab notebook
[9:15] You: to experiment with science
[9:15] CW Underwood: " experiment with science" -- that, after all, is what we do, no? :-)
[9:16] Duriel Akula: I am hoping one of the big guys comes out with a system for your scientists to manage their labs online, so that this can lift off
[9:16] You: I think scientists take a lot for granted in how science can be done
[9:16] Duriel Akula: *young scientists"
[9:17] You: Richard, would you like to say a few words before Bill?
[9:17] Rakerman Yellowjacket: sure
[9:17] Rakerman Yellowjacket: let me run some ideas up the virtual flagpole, as it were
[9:17] Vidal Loon: I personally think that these tools and much more can be weaved together to make a real "office" type of setup for the scientist to keep tabs on the info and have it available for other.
[9:17] Rakerman Yellowjacket: my perspective comes from software engineering and enterprise architecture
[9:18] Rakerman Yellowjacket: both of which have big aspects of technology planning based on requirements gathering
[9:18] Vidal Loon: there are just so many tools available to juggle data nowadays. it's a question of knitting them all together.
[9:18] CW Underwood: VL: as long as each module stands alone and doesn't lock anyone into anything, that's a great idea (don't like the idea of Office as model though...)
[9:18] WhiteWizard Chemistry: Agrees with Rakerman
[9:18] Rakerman Yellowjacket: so when I saw the discussions at SciFoo I thought it would be useful to apply that kind of thinking, to systematise the discussion
[9:19] Rakerman Yellowjacket: I found a lot of things were getting mixed together - desire for speedy publication, desire for recognition for grants/tenure/post-doc, and issues with the current scholarly communication system
[9:19] Rakerman Yellowjacket: so i think the key is to focus on what problem you're trying to solve
[9:19] Rakerman Yellowjacket: To me, when I hear the core of the discussions about open science
[9:19] Rakerman Yellowjacket: it's about making better *science* through collaboration and open sharing
[9:20] CW Underwood: better science, yep
[9:20] Rakerman Yellowjacket: that is quite different from some of the goals of open access, which address availability of science later on
[9:20] Rakerman Yellowjacket: so I would suggest focusing around the aspect of making better science happen through open communication
[9:21] Rakerman Yellowjacket: then that leads to a clearer discussion of the kinds of tools that would support that
[9:21] Rakerman Yellowjacket: we have already seen a good discussion about using wikis
[9:21] Rakerman Yellowjacket: from a library perspective, I think one of the most useful aspects is to preserve more of the scientific work from the get-go
[9:21] You: yes Richard is right - first and foremost get the science done then worry about technology
[9:22] Rakerman Yellowjacket: often today data and failed experiments are lost from the record, even if they had some original digital format
[9:22] Rakerman Yellowjacket: we don't want the current science work to be a digital dark age when seen from the future
[9:22] WhiteWizard Chemistry: Scientists should worry about the science ... someone else should (working with the scientists) think about the technology that makes lives easier for scientists
[9:22] Rakerman Yellowjacket: I see some positive developments in terms of science communication
[9:23] Rakerman Yellowjacket: one example is eSciDoc, a project in Germany to develop end-to-end communication - from lab notebook to publication and beyond
[9:23] Rakerman Yellowjacket:
[9:23] You: As long as authors retain copyright - they don't have to worry about re-formatting their work :)
[9:23] Rakerman Yellowjacket: I also think Fedora Commons presents a lot of opportunities in this area
[9:23] Rakerman Yellowjacket:
[9:24] Rakerman Yellowjacket: as a mostly-technology guy these days, I'm interested in gathering the best requirements so that the tools support the real goals!
[9:24] You: have you looked at the escidoc richard?
[9:24] Rakerman Yellowjacket: So I agree with WhiteWizard
[9:24] Rakerman Yellowjacket: What I'm concerned about is if we're not careful, often we end up with closed systems - like Facebook, or even Blackboard and WebCT
[9:25] Rakerman Yellowjacket: Horace I have seen the eSciDoc people present, I think they're doing great work
[9:25] You: I think redundancy is a way to safeguard getting locked in
[9:25] You: is their work open to anyone
[9:25] Duriel Akula: I would not mind to have a company making a profit with such a system
[9:25] Duriel Akula: as long as the data was open to anyone
[9:25] WhiteWizard Chemistry: and examples like Aordpress/Automattic provide excellent examples for profitable open source platforms
[9:26] Rakerman Yellowjacket: as far as I know, they plan to share the Scholarly Workbench once it is developed
[9:26] WhiteWizard Chemistry: oope Wordpress
[9:26] You: the company interface is interesting - for example ChemSpider
[9:26] CW Underwood: I don't mind who makes what money, but I'd like to see the software on SourceForge...
[9:26] You: is a company but giving out database for free
[9:27] Rakerman Yellowjacket: I think there are lots of different models, the main goal is that the information should be open to improve science collaboration and preservation
[9:27] WhiteWizard Chemistry: There are models oand made available to other developers to build on top of
[9:27] Rakerman Yellowjacket: Horace, do you want to move to Bill's presentation now?
[9:28] You: sure
[9:28] Duriel Akula: it's actually an interesting business model, so much would be known about each group's research that it would make ads much more valuable
[9:28] You: thanks Richard
[9:28] CW Underwood: heh, presentation?
[9:28] CW Underwood: I'm not sure what I can say that isn't in the slides
[9:28] CW Underwood: they are so full of text because I had no idea what SL was like
[9:29] You: do you have any thoughts Bill that have not been discussed?
[9:29] CW Underwood: I think the easiest way to go through it would be as questions for discussion
[9:29] CW Underwood: first question being: do we need/want to define Open Science
[9:29] CW Underwood: further than it already is (à la Richard)
[9:30] Hiro Sheridan: Do you think there is a need for an open-science format - some sort of xml data format?
[9:30] WhiteWizard Chemistry: I think that is a key point. We tend to get bogged down in philosophical arguments
[9:30] CW Underwood: i think that most of the tools are already available, if perhaps a bit clumsy
[9:30] CW Underwood: think of wikis and what JC is doing
[9:30] CW Underwood: a fully featured ELN suite might be slicker
[9:31] CW Underwood: but it would not capture any more science
[9:31] Hiro Sheridan: It would be nice to be able to mash things up though
[9:31] CW Underwood: and once it becomes available it will be able to slurp up the wikified content
[9:31] CW Underwood: Hiro's idea of a standard format is very important though
[9:31] Rakerman Yellowjacket: I think we are going to need lots of different formats depending on the discipline (for data interchange)
[9:31] You: yes that is the point Bill - reprocess info
[9:31] CW Underwood: without standards and metadata, data interchange is virtually impossible
[9:32] Troy McLuhan: There is MathML in mathematics
[9:32] WhiteWizard Chemistry: It's why I like the idea of a core platform (s) that adhere to some simple standards and allow people to build on top of that
[9:32] Rakerman Yellowjacket: I do think standard formats are *essential* to enable more automated processing and interchange of science
[9:32] CW Underwood: so in answer to Hiro, I would like to see such a format, but I wonder if (say) XML + MathML and
[9:32] WhiteWizard Chemistry: and those standards exist, MIAME, etc
[9:32] CW Underwood: a few others that already exist (MIAME, etc)
[9:32] CW Underwood: would not do the job already?
[9:33] Duriel Akula: but a minimal standard would be .. too minimal. just to say that here goes science. there is little else that would be common to all
[9:33] Doolin Chemistry: I think the emphasis on standards is way overemphasized
[9:33] Rakerman Yellowjacket: Dr. Liz Lyon has done quite a bit of work in the UK on open scholarship as well as some work specifically on crystallography formats
[9:33] Rakerman Yellowjacket:
[9:33] You: the differences between fields makes it very difficult
[9:33] WhiteWizard Chemistry: I think they would ... we just need the hooks ... I think the standards should focus on interoperability and let the scientific standards stay with the individual scientific areas
[9:33] Doolin Chemistry: I think a very flexible infrastructure that allows for rapid prototyping is the most essential thing.
[9:34] CW Underwood: WW, yes, interoperability is the reason for having the standards -- that is the job they should do
[9:34] Doolin Chemistry: Scientists can always figure out what some other data is about as long as the format is completely described.
[9:34] Rakerman Yellowjacket: I think it may be useful to work on the idea of "semantically aware science wiki"
[9:35] You: we would love to collaborate with people with standard models
[9:35] CW Underwood: Richard's got a point, we do seem to be coming back to wikis time and again
[9:35] WhiteWizard Chemistry: Rakerman ... I really like your line of thought
[9:35] You: we have all the data there in the wiki already - just tell us or add the tags/structure
[9:35] CW Underwood: perhaps that would provide a good starting platform
[9:35] CW Underwood: yup, great minds and all that
[9:36] CW Underwood: speaking of wikis then
[9:36] CW Underwood: perhaps we can skip most of the verbiage on my slides
[9:36] Rakerman Yellowjacket: there's a project in this area - a commercial effort -
[9:36] CW Underwood: and go to what I think is the best point
[9:36] Hiro Sheridan: Ok next question, what is the role that journals such as nature have or should have with open science?
[9:36] CW Underwood: again comes from Richard
[9:36] Rakerman Yellowjacket: more info at
[9:37] CW Underwood: which is that we could get a "good enough" solution to the "define Open Science" (non)problem
[9:37] CW Underwood: using the NodalPoint wiki
[9:37] CW Underwood: and that would be in keeping with the whole "small pieces loosely joined" approach
[9:37] CW Underwood: that seems to be coming to the fore in these discussions
[9:37] WhiteWizard Chemistry: Great place to have this discussion, but we might leave out non-life scientists
[9:38] You: the only time definitions become a problem is when people make assumptions
[9:38] CW Underwood: sure, but people do that all the time
[9:38] CW Underwood: look at the confusion over Open Access
[9:38] CW Underwood: (Hiro: I think Nature is taking the lead as far as the role of journals goes)
[9:38] You: Nature Precedings
[9:39] CW Underwood: the first job for journals is Open Access, on which Open Science depends
[9:39] Duriel Akula: a good way to hammer some of the definitions would be to write them together and then try to all together promote them in the blogs or open letters
[9:39] CW Underwood: after that, Nature Precedings is a *wonderful* thing, a preprint server for life sciences
[9:40] You: actually it is more than pre-print because they take almost any format
[9:40] You: it could be instead-of-print in a lot of cases
[9:40] Troy McLuhan: There are already things like Precedings in other branches of science, notably
[9:40] Hiro Sheridan: yes
[9:40] You: Troy no arxiv is different
[9:40] CW Underwood: Horace: yes, Precedings goes further than arXiv
[9:40] You: Arxiv has a format
[9:41] CW Underwood: Precedings could, for instance, capture failed experiments and observations too small to constitute a formal paper
[9:41] You: exactly bill
[9:41] Hiro Sheridan: good point
[9:41] CW Underwood: such things could be put into Precedings more or less directly from
[9:41] CW Underwood: a lab wiki or ELN
[9:42] You: yes we would like to do that when experiments reach conclusions
[9:42] You: but they did take some blog posts of mine
[9:42] CW Underwood: can you update a document on Precedings?
[9:42] You: about work in progress
[9:42] You: Yes Bill
[9:42] You: there are versions in Precedings
[9:42] Duriel Akula: my particular view would be that a preceding would be a good place to post solutions, once they are found. giving a time stamp and DOI
[9:42] You: actually Precedings has a poster here - 3 I think
[9:42] CW Underwood: ah, good, then you could add a story result-by-result
[9:43] Hiro Sheridan: does precedings have rss/subscription capabilities?
[9:43] CW Underwood: in fact, Precedings could almost do what JC's wiki does
[9:43] CW Underwood: except that we would not want to fill Precedings up with "history"
[9:43] CW Underwood: the side-comments and thinking out loud and so on
[9:43] You: well it does not have a wiki style version system I think
[9:43] CW Underwood: that will be so valuable to historians of science
[9:44] CW Underwood: studying Open Notebook systems
[9:44] You: I think people like Heather Piwowar are interested in that
[9:44] Hiro Sheridan: so really its a place to store data?
[9:44] You: Hiro they won't take massive data sets (yet)
[9:44] Rakerman Yellowjacket: the Google people are also interested in hosting datasets for people
[9:44] You: they will take posters
[9:45] CW Underwood: Google is interesting as a data repository
[9:45] You: last time I checked Precedings took ppt, pdf, doc
[9:45] Hiro Sheridan: Maybe they need to be a little more 'social'
[9:45] CW Underwood: we seem to be more interested in tools than definitions
[9:45] CW Underwood: which I think is a good thing
[9:45] You: that was 2 weeks ago :)
[9:45] CW Underwood: it is about getting things done after all
[9:46] Duriel Akula: we keep talking about the tools instead of definitions :)
[9:46] CW Underwood: so i'm not sure my slides are of much further use
[9:46] CW Underwood: perhaps we could move to the next person?
[9:46] You: Bill, people will read your slides when they visit
[9:47] Hiro Sheridan: Sorry guys I have to go, I'll catch up on the blog, keep it interesting :)
[9:47] CW Underwood: Horace, yep, that's what I was hoping -- there are a lot of links in there that may be useful
[9:47] You: Bill you are the last speaker - let me make some announcements
[9:47] CW Underwood: OK, i'm done
[9:47] You: next week Monday is video and science
[9:47] You: I think the turnout will be good
[9:47] CW Underwood: be sure to get Deepak!
[9:47] You: and we have several speakers - Berci is helping out with that
[9:47] Berci Dryke: we'll get him for sure
[9:48] You: yes Deepak will be there :)
[9:48] Berci Dryke: Videjog, JoVe, SciVee, Bioscreencast all seem to be interested
[9:48] You: to my left there is another 36 poster area
[9:48] You: right now I'm collecting posters for a ChemFoo area
[9:48] You: if other fields want to contribute that would be cool
[9:48] Berci Dryke: anyway, I've been blogging live about this session (in case you're interested to hear back your thoughts :)
[9:49] Duriel Akula: other fields ? general science ?
[9:49] You: the idea is to have "regular science" posters and discussion
[9:49] You: to "draw out" the unconverted colleagues
[9:49] You: if we have posters in their areas maybe they will visit
[9:49] Troy McLuhan: Like a discussion about the Homologies of Pn3?
[9:50] You: exactly Troy
[9:50] You: I have some organic chemistry I want to share
[9:50] You: on the synthesis of our anti-malarials
[9:50] You: I can help anyone put up their posters - if you have it in ppt you are almost done
[9:51] You: we have special boards to make it easy
[9:51] You: we also now have bells
[9:51] Emile Pintens: JC, sorry I didn't get a ppt done on open innovation
[9:51] You: if someone visits you can set up your second life IM to forward to your email
[9:52] You: do control P then change the setting
[9:52] You: Emile you can add a poster
[9:52] You: if there are any posters from previous weeks that you would like to see
[9:52] You: you can check them out - if the speakers are here they can discuss
[9:52] Emile Pintens: Where should I put it up, I'll try and get it done later today
[9:52] You: like a regular poster session
[9:53] You: you can also play with voice at posters
[9:53] You: if you have it
[9:53] You: you can mute people from adjacent plots if needed
[9:53] You: so any other questions/comments?
[9:54] Rakerman Yellowjacket: post the link to the scifoolives on wiki :)
[9:54] You: yes I'll put the link to the wiki shortly
[9:54] You: next to the big sign
[9:54] Berci Dryke: if you have suggestions for the future session, please leave your note on the wiki
[9:55] You: ok- lets checkout the posters then
