The ever-increasing volume of biological data deposited in databases worldwide requires bioinformatics tools that can quickly reveal the information hidden in it. We describe here the Protein Sequence Profiler (PsP), a tool developed to characterize protein sequence entries stored in a database. PsP gives the user a simplified description of the protein sequences and the capability to generate new datasets for prediction purposes, with or without a redundancy check. The system is implemented in PHP and uses arrays as its data structure. It can filter and retrieve entries from the protein sequence database according to the following groupings, individually or in combination: signal peptide, taxonomy, protein type, transmembrane type, non-membrane type, and evidence level. The filtered protein sequence entries can then be downloaded, which in effect creates a new dataset, or can be passed to the integrated redundancy checker to remove “highly” similar protein sequences.
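The filter-then-deduplicate workflow described above can be illustrated with a minimal sketch. It is written in Python rather than the PHP used by PsP, and it is not the PsP implementation: the entry fields, the sample sequences, the function names `filter_entries` and `remove_redundant`, the 0.9 threshold, and the use of `difflib.SequenceMatcher` as a stand-in similarity measure are all assumptions made for illustration, since the abstract does not specify how "highly" similar sequences are detected.

```python
from difflib import SequenceMatcher

# Hypothetical entries; field names and sequences are illustrative only.
ENTRIES = [
    {"id": "P1", "taxonomy": "Eukaryota", "protein_type": "transmembrane",
     "signal_peptide": True, "evidence_level": 1,
     "sequence": "MKTLLVAGALLSAAQA"},
    {"id": "P2", "taxonomy": "Eukaryota", "protein_type": "transmembrane",
     "signal_peptide": True, "evidence_level": 1,
     "sequence": "MKTLLVAGALLSAAQG"},
    {"id": "P3", "taxonomy": "Bacteria", "protein_type": "non-membrane",
     "signal_peptide": False, "evidence_level": 2,
     "sequence": "MSDNELRKQW"},
    {"id": "P4", "taxonomy": "Eukaryota", "protein_type": "transmembrane",
     "signal_peptide": True, "evidence_level": 1,
     "sequence": "MPPRWYHGGEDNCKLF"},
]


def filter_entries(entries, **criteria):
    """Keep entries whose fields equal every given criterion (combined with AND)."""
    return [e for e in entries
            if all(e.get(field) == value for field, value in criteria.items())]


def remove_redundant(entries, threshold=0.9):
    """Greedy redundancy check: keep an entry only if its similarity to every
    previously kept entry is below the threshold (assumed measure: difflib ratio)."""
    kept = []
    for entry in entries:
        is_new = all(
            SequenceMatcher(None, entry["sequence"], k["sequence"],
                            autojunk=False).ratio() < threshold
            for k in kept
        )
        if is_new:
            kept.append(entry)
    return kept


# Combine two groupings, then pass the result to the redundancy check.
selected = filter_entries(ENTRIES, taxonomy="Eukaryota", signal_peptide=True)
nonredundant = remove_redundant(selected, threshold=0.9)

print([e["id"] for e in selected])      # ['P1', 'P2', 'P4']
print([e["id"] for e in nonredundant])  # ['P1', 'P4']
```

In this sketch P2 is dropped because it differs from P1 in a single residue, while P4 survives. A real redundancy checker would more likely rely on alignment-based sequence identity; the string ratio here only stands in for that step.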
Theory and Practice of Computation, pp. 44-58. DOI: 10.1142/9789813234079_000