Simfish/InquilineKea's Thoughts

March 4, 2010, 10:08 pm
The Fourth Paradigm: Data-Intensive Scientific Discovery – Microsoft Research

So basically, the two main paradigms used to be experiment and theory. Then in the 1950s came simulations, and now we have data-intensive scientific discovery. Some people have recently written programs that can derive physical formulas from massive amounts of data. Such methods can produce true results without an a priori hypothesis, which runs counter to the traditional hypothesis-first scientific method.
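To make "deriving a formula from data" concrete, here is a minimal sketch. It is not the method those programs use (they search over many possible formula shapes); it assumes a power-law form T = c * a^k up front and only recovers the constants by least squares in log-log space. The planetary values are approximate textbook figures, and `fit_power_law` is a name made up for this illustration.

```python
import math

# Approximate orbital data: (semi-major axis in AU, orbital period in years).
PLANETS = [
    (0.387, 0.241),   # Mercury
    (0.723, 0.615),   # Venus
    (1.000, 1.000),   # Earth
    (1.524, 1.881),   # Mars
    (5.203, 11.862),  # Jupiter
    (9.537, 29.457),  # Saturn
]

def fit_power_law(points):
    """Fit y = c * x**k by ordinary least squares on (log x, log y)."""
    xs = [math.log(x) for x, _ in points]
    ys = [math.log(y) for _, y in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    k = sxy / sxx                        # slope of the log-log line = exponent
    c = math.exp(mean_y - k * mean_x)    # intercept of the log-log line = log(c)
    return c, k

c, k = fit_power_law(PLANETS)
print(f"T ~ {c:.3f} * a^{k:.3f}")
```

The fitted exponent comes out close to 3/2, which is Kepler's third law, even though nothing in the code knows any orbital mechanics. The catch is that the power-law shape was assumed in advance, which is itself a weak a priori hypothesis.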

As an interesting article says:
The End of Theory: The Data Deluge Makes the Scientific Method Obsolete

This is not to say that the scientific method is irrelevant. It isn’t, and data-intensive scientific discovery still needs a priori hypotheses for efficient algorithms that don’t take forever to run. But it does mean that it may be easier for people to develop weaker a priori hypotheses that can serve as the basis of algorithms, or one could, say, try a simulation run on a smaller data set that could produce a conclusion that can serve as the hypothesis for a larger data set.
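The "small run first" idea can be sketched like this. Everything here is an assumption made for illustration: the data is synthetic (y = 2x plus Gaussian noise), the model is a straight line through the origin, and `slope_through_origin` is a hypothetical helper name.

```python
import random

def slope_through_origin(pairs):
    """Least-squares slope for the model y = m * x (no intercept)."""
    return sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic "large" data set: y = 2x + noise. The true slope 2.0 is an assumption.
data = []
for _ in range(100_000):
    x = rng.uniform(0.0, 10.0)
    data.append((x, 2.0 * x + rng.gauss(0.0, 1.0)))

# Step 1: a cheap pilot run on a small random subsample produces a hypothesis.
pilot = rng.sample(data, 500)
hypothesis = slope_through_origin(pilot)

# Step 2: the full data set is used to check (and refine) that hypothesis.
full_estimate = slope_through_origin(data)

print(f"pilot hypothesis: slope ~ {hypothesis:.3f}")
print(f"full data set:    slope ~ {full_estimate:.3f}")
```

The pilot run is 200 times smaller than the full set but already lands near the right slope, so the expensive pass only has to confirm one candidate instead of searching blindly.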

Anyways, statisticians have long been known for extracting small amounts of signal from large amounts of noise. I’ve talked to a number of professors about this, and they all seem to agree that this ability comes from statistical techniques. As a New York Times article says, …y/06stats.html.

Anyways, the science of the future will involve sensors with very high resolution, an ability to distribute those sensors so that they capture representative samples (along with an exponential increase in the number of sensors), exponential growth in storage space per hard drive, and exponential growth in processing speed following Moore’s Law (although this growth will certainly level off in the next few decades). This, of course, allows for the possibility of a data deluge. Then come the algorithms. Regular science will not become obsolete, but rather will be supplemented. Its scope might change.

Then there are neural networks and artificial intelligence. Technically, neural networks are a subset of artificial intelligence, but I like to separate the two. I think one point of discussion in the future will be this: how different is data mining/machine learning from AI? And are the current statistical heuristics merely primitive forms of artificial intelligence? In fact, crowdsourcing (giving the “masses” a means of knowledge creation/discovery; Wikipedia is an excellent example, as is anything user-created) is a sort of distributed artificial intelligence.

So the point of this thread? I’m thinking that the great scientific discoveries of the future will have disproportionate influence (relative to the past) from data mining/pattern recognition. Algorithms, in particular, will also be important (those, too, are probably just a subset of AI). So the aspiring scientist may be wisest to study those fields in particular. And then s/he may be well prepared for any particular field.


Some more links:

[q]We are in the midst of a generational shift in research, and research funding opportunities, driven by new ‘disruptive’ technologies. The rapid emergence of a new world of science driven by very large scale data, next generation sensors, and advanced robotic instruments, in a host of disciplines from the environmental, physical and other sciences and engineering through public health and medicine, requires research universities to make a new set of high-level specialty faculty and technical skills and resources available to research endeavors and proposals in order for them to remain competitive.[/q]

