@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix obo: <http://purl.obolibrary.org/obo/> .
@prefix pmid: <http://denigma.org/resource/PubMed/> .
@prefix : <http://denigma.org/resource/> .
rdfs:subClassOf :Discovery ;
:of :Concept ;
:image <http://denigma.io/media/web-crawler.jpg> .
A major threat is that we are aging much faster than we can reverse it. We are still so far from inferring which information is most likely relevant to reversing aging that we MUST take an undirected approach to counteract this problem, because we do not have any better alternative.
Every day, lots of new pairs of information are added to the web. Anything that defines at least two indivisible pieces of information as a value pair indicating a specific instance can be ingested by a machine learning algorithm. Therefore, we should start developing independently running software that keeps crawling the net for any instance defined by at least two informational units as input data. Even though this software cannot infer the meaning of any event-defining information pair, it can use the pairs' values to predict pretty much any other combination of paired information, attempting to predict any pair from any other pair.
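A minimal sketch of this pairwise-prediction idea, with made-up data and a deliberately simple one-variable linear model standing in for whatever learner the crawler would actually use. Every ordered pair of columns is tried as (input, output), and pairs that predict each other above a threshold are flagged for a human to inspect:

```python
# Sketch of "predict any pair from any other pair". Each row of the
# hypothetical table is one crawled instance; each column is one
# informational unit. All names and values are illustrative only.

def fit_line(xs, ys):
    """Least-squares slope and intercept for y ~ a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx

def r_squared(xs, ys):
    """Coefficient of determination of the one-variable linear fit."""
    a, b = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot if ss_tot else 0.0

def screen_pairs(columns, threshold=0.8):
    """Try every ordered column pair; return pairs predicted above threshold."""
    hits = []
    for xname, xs in columns.items():
        for yname, ys in columns.items():
            if xname != yname and r_squared(xs, ys) >= threshold:
                hits.append((xname, yname))
    return hits

# Toy table: 'b' is a noisy linear function of 'a'; 'c' is unrelated.
table = {
    "a": [1, 2, 3, 4, 5],
    "b": [2.1, 3.9, 6.2, 8.0, 9.9],
    "c": [5, 1, 4, 2, 3],
}
print(screen_pairs(table))  # flags ('a', 'b') and ('b', 'a'), not 'c'
```

A real crawler would substitute an arbitrary supervised learner for `fit_line`; the loop over all ordered pairs is the part the text describes.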
This would allow us to identify even weak correlations and dependencies much sooner than when exclusively selecting features manually in our traditional way, based on logical reasoning. Although logical reasoning and highly directed, targeted manipulations are good to have, it takes us far too long until our understanding and concepts of new correlations have developed far enough to contribute to logically driven feature selection and data manipulation. This continuously web-crawling software keeps adding anything that could serve as an input or output value for any kind of supervised machine learning process. Whenever this software can predict some random feature by whatever means it can possibly think of, it will let us know, so we can check whether the result could possibly make sense. We need to improve the NLP (Natural Language Processing) and semantic-recognition abilities of this randomly feature-adding software so that it can combine the same informational components into a single unit feature. Nevertheless, just as in evolution, random mistakes in grouping the same informational components into a single indivisible feature, i.e. variations in the groupings of informational components that must be predicted all at once, could turn out to be a good thing. For example, treating all transcription factor binding site (TFBS)-associated information as a single informational group may allow for the most accurate prediction rate, but only when our random model contains all the input features needed to define every informational dimension required to sufficiently specify all the parameters that could belong to the TFBS dimension. If our feature-hungry crawler has not yet discovered that TFBS binding is a cooperative rather than a Boolean process, it would fail.
But if it could learn to predict time series plots based only on a Boolean value indicating whether a particular transcription factor (TF) could possibly bind to a promoter, disregarding the number and order of the TFBSs for the same TF in the promoter of one gene, it could still predict time series plots well enough to raise its prediction power far above the threshold at which we would take a look at it. Although this coarse model is still imperfect, it is valuable to have it as soon as possible, instead of waiting until our crawler has found enough input parameters to assign a value to every possible dimension of the TFBS domain. This actually argues in favor of allowing our prediction crawler to randomly vary any specific dimension of any domain suited for training supervised machine learning, because the fewer dimensions making up a domain, the fewer and smaller the information input domains required for building a model based on randomly considered and randomly grouped information domains.
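The coarse Boolean model described above can be sketched in a few lines. Gene profiles, TF names, and expression values here are all invented for illustration; the predictor simply averages the curves of training genes that share the same Boolean TF-binding profile, ignoring TFBS counts and order exactly as the text proposes:

```python
# Hypothetical sketch: predict a gene's expression time series from a
# Boolean vector saying only which TFs can bind its promoter.

def predict_curve(bool_profile, training):
    """Average the curves of training genes sharing the Boolean TF profile."""
    matches = [curve for profile, curve in training if profile == bool_profile]
    if not matches:
        return None  # profile never seen; this coarse model cannot predict it
    n = len(matches)
    return [sum(vals) / n for vals in zip(*matches)]

# (TF_A binds?, TF_B binds?) -> expression levels at t = 0, 1, 2 (made up)
training = [
    ((True, False), [1.0, 2.0, 3.0]),
    ((True, False), [1.2, 2.2, 2.8]),
    ((False, True), [3.0, 2.0, 1.0]),
]

print(predict_curve((True, False), training))  # mean of the two matching curves
```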
Currently, most of us are not aware of the artificial limitations that result from letting humans keep the monopoly on deciding which dimensions may be grouped together to form a meaningful instance for input or output when training a supervised model. It is likely that smaller domains consisting of fewer dimensions, or larger domains combining more dimensions, could be more suitable. But however many humans there are on this planet, our thinking, understanding, conceptualizing and imagining all share the same intuitive preference for including very specific dimensions in an indivisible input or output instance without even worrying about possible alternatives. The way in which our senses, perceptions, imagination, concepts and partial understanding of any phenomenon intuitively assemble dimensions into a larger domain, one that most of us would never even consider predicting in parts, or as a very small dimension of a much larger super-domain, is only one out of very many possible options for combining any number of specific dimensions into a domain from which any number of input or output instances can be formed. One could imagine a domain as a row, like a gene, which can have any number of columns, i.e. its dimensions, which must be considered as a single instance in their combination, because we lack the option to consider only a few of its columns or to combine some of them with columns from an entirely different and unrelated table. A good example is time series plots. Humans tend to be biased toward defining gene expression time series curves by the mRNA abundance measured at each time point. This sounds obvious, but is it the best way to conceptualize the temporal expression signature of each gene? I feel my colorful plots have much more meaning and can carry much more informational value, as well as a more meaningful concept for imagining, comparing and analyzing gene-specific temporal signatures.
But although they look very pretty and are a good way to get a first impression of the similarities between two curves, they are not well suited to finding out whether the plots of genes belonging to the same Gene Ontology term are indeed more correlated with each other than with the rest. Since I felt that a vector can never be the same as a curve, I tried many ways to account for the slopes connecting each pair of time points. But because I could think of so many different ways to achieve this, and could not settle on any one of them as the best possible option, I am still not sure how to convert time series plots into numerical dimensions, which have the very obvious advantage of allowing easy comparison, ranking and quantification. I am not sure how to account for differences between plots along the Y axis. Maybe we should add another dimension to our concept of a time series curve: if, in addition to its time points, we also gave each plot the total area under its curve, maybe we could quantify the plots in a much better and more intuitive way. But how much numerical weight should we give each time point versus the area under the curve? I have been stuck on this problem ever since I first tried to quantify time series plots.
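One hedged way to flatten a curve into a numeric vector along the lines discussed above: keep the raw values, append the slope of each connecting segment, and append the trapezoidal area under the curve. The weights are left as explicit free parameters, since how much weight each component deserves is exactly the open question raised in the text:

```python
# Sketch: convert a time series plot into a flat feature vector of
# values + segment slopes + area under the curve. Weights are free
# parameters, not a recommendation.

def featurize(times, values, w_value=1.0, w_slope=1.0, w_area=1.0):
    # Slope of each straight segment connecting consecutive time points.
    slopes = [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    ]
    # Trapezoidal rule for the total area under the curve.
    area = sum(
        (values[i] + values[i + 1]) / 2 * (times[i + 1] - times[i])
        for i in range(len(times) - 1)
    )
    return (
        [w_value * v for v in values]
        + [w_slope * s for s in slopes]
        + [w_area * area]
    )

# Example: a curve rising then falling over t = 0, 1, 2.
print(featurize([0, 1, 2], [0.0, 2.0, 1.0]))
# values 0, 2, 1; slopes 2, -1; area = 1.0 + 1.5 = 2.5
```

Vectors built this way can be compared with any standard distance or correlation measure, which is the easy-ranking property the text is after.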
But imagine how many more options you would have if you were not human, because then you would not limit the dimensions defining your domain to only those you can easily imagine.
A computer can randomly extract and try out any combination, subset or superset of dimensions, without being limited to those dimensions that can easily be conceptualized as a picture.
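This exhaustive or random regrouping of dimensions is mechanically trivial for a machine. A small sketch, using hypothetical dimension names drawn from the time-series discussion above, enumerates every possible non-empty grouping and also draws one at random:

```python
import random
from itertools import combinations

# A machine can enumerate every subset of a domain's dimensions, or
# sample groupings at random, instead of committing to one
# human-intuitive grouping. Dimension names below are illustrative.

def all_groupings(dimensions):
    """Yield every non-empty subset of dimensions, smallest first."""
    for size in range(1, len(dimensions) + 1):
        yield from combinations(dimensions, size)

def random_grouping(dimensions, rng=random):
    """Return one randomly chosen non-empty subset of dimensions."""
    size = rng.randint(1, len(dimensions))
    return tuple(rng.sample(dimensions, size))

dims = ["value_at_t", "segment_slope", "area_under_curve"]
print(sum(1 for _ in all_groupings(dims)))  # 2**3 - 1 = 7 subsets
```

With d candidate dimensions there are 2^d - 1 possible groupings, which is why random sampling, rather than exhaustive search, becomes the practical strategy as d grows.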
Unsupervised machine learning, which never tires of randomly defining an indivisible domain by any combination of dimensions, might have much more luck at uncovering still imperatively hidden objects/factors (IHO/F) than the entire observationally and perceptually very biased world population of Homo sapiens, which tends to prefer the familiar analytical methods it can most easily relate to, with little regard for whether the most convenient and intuitive-seeming analytical methods, measurements, selected features and research procedures are truly best suited for solving the very specific scientific problem at hand. Even professionally very successful scientists, experimentalists, researchers and data analysts tend to search for the best problem to match their analytical skills, experience and preferred methods of measuring, rather than choosing the best set of research procedures for overcoming a very specific scientific challenge. AI won't suffer from this human methodical bias if trained properly.