About GeneCup


GeneCup searches PubMed to find abstracts that contain genes of interest and keywords from the custom ontologies. The titles and abstracts corresponding to the resulting PMIDs are then retrieved from a local archive of PubMed. No limit is set on the date of publication. Each abstract is broken down into sentences, which are then filtered by gene names and keywords. We also parse the GWAS catalog to obtain genetic associations with the keywords of the custom ontology.
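The sentence-level filtering described above can be sketched roughly as follows. This is a minimal illustration and not the GeneCup implementation: the function names, the naive punctuation-based sentence splitter, and the requirement that a sentence contain both a gene name and a keyword are all assumptions made for the example.

```python
import re


def split_sentences(abstract):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", abstract) if s.strip()]


def make_pattern(terms):
    # Whole-word, case-insensitive match of any term; longer terms are tried first.
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def filter_sentences(abstract, gene_names, keywords):
    # Keep only sentences that mention at least one gene name AND one keyword.
    gene_re = make_pattern(gene_names)
    keyword_re = make_pattern(keywords)
    return [
        s for s in split_sentences(abstract)
        if gene_re.search(s) and keyword_re.search(s)
    ]


# Toy abstract written for this example, not taken from PubMed.
abstract = (
    "Nicotine exposure increased Bdnf expression in the hippocampus. "
    "Body weight was unchanged. "
    "The BDNF protein level did not differ between groups."
)
hits = filter_sentences(abstract, {"BDNF"}, {"nicotine", "cocaine"})
print(hits)
```

In this toy input only the first sentence is kept, because it is the only one that mentions both the gene and one of the keywords.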

A list of curated addiction-related keywords can be used to search for addiction-related genes. We compiled the 100 most studied addiction-related genes by searching 29,761 human genes against addiction-related keywords. To ensure comprehensive coverage, gene aliases obtained from the NCBI Gene database were included in the search. The results were extensively curated to remove over 900 aliases that matched words that were not gene names or that referred to the wrong genes. Some incorrect results remained because the same name also produced correct results. The resulting 61,000 sentences are archived locally and can be accessed via the Addiction Genes link. We also archived 5.3 million PMIDs associated with these genes for efficient search of relations between a query gene and the addiction genes. We obtained 23,000 genetic associations with addiction and psychiatric phenotypes from the GWAS catalog. These results are included in the search by default.
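The alias curation step can be pictured as expanding each gene symbol into its aliases and dropping any alias on a curated exclusion list. The sketch below is only an assumption about how such a step might look: `search_terms`, the alias table, and the exclusion list are hypothetical, and the gene names are placeholders rather than real NCBI records.

```python
def search_terms(symbol, aliases, blocked):
    # Always keep the official symbol (assumption), then add each alias
    # that is not on the exclusion list and not already present.
    terms = [symbol]
    for alias in aliases.get(symbol, []):
        if alias.lower() not in blocked and alias not in terms:
            terms.append(alias)
    return terms


# Placeholder data for illustration only.
aliases = {"GENEA": ["GA1", "impact"], "GENEB": ["GB2"]}
blocked = {"impact"}  # alias that collides with an ordinary English word

print(search_terms("GENEA", aliases, blocked))
print(search_terms("GENEB", aliases, blocked))
```

Here the alias that matches a common word is dropped for the first placeholder gene, while the second gene keeps its only alias.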

We plan to update the local PubMed archive daily and the EBI GWAS catalog quarterly.


Cite: Gunturkun MH, Flashner E, Wang T, Mulligan MK, Williams RW, Prins P, Chen H. GeneCup: mining PubMed and GWAS catalog for gene-keyword relationships. G3 (Bethesda). 2022 May 6;12(5):jkac059. doi: 10.1093/g3journal/jkac059. PMID: 35285473; PMCID: PMC9073678.

Source code