The Clean Energy Project Database (CEPDB) was created as the data warehouse for the Harvard Clean Energy Project (CEP). It is an information repository on molecular semiconductors for organic photovoltaics (OPV) and other electronics applications. It is similar in spirit to screening initiatives in the field of inorganic solid-state materials, such as those by Ceder (Materials Project), Curtarolo (Aflowlib), and Jacobsen (Computational Materials Repository), and their respective co-workers.
- 1 The Harvard Clean Energy Project
- 2 The Role of the CEPDB in the Clean Energy Project
- 3 The CEPDB as a Research Tool for the Organic Electronics Community
- 4 The CEPDB as a Research Tool for the Computational Chemistry and Materials Science Community
- 5 Some Technical Details about the CEPDB
- 6 A Roadmap for the CEPDB
- 7 Version History
The Harvard Clean Energy Project
The Harvard Clean Energy Project (CEP) is a virtual high-throughput framework for the in silico design and assessment of carbon-based materials for plastic solar cells. The CEP allows us to characterize the electronic and optical properties of millions of potential organic photovoltaic candidate structures using first-principles quantum chemistry. In addition to these high-level electronic structure calculations, we also adopt strategies and techniques from the drug discovery community, from cheminformatics/materials informatics, pattern recognition, machine learning, and Big Data, in order to assess the quality of our candidate compounds. The data generated within this computational framework is compiled in the CEPDB.
The Role of the CEPDB in the Clean Energy Project
The CEPDB serves as the organizational centerpiece of the CEP: it facilitates project management and bookkeeping, and it dynamically performs data processing and maintenance tasks. The scripts that automatically run the CEP are built around the database, which provides them with all the necessary processing information. In addition to the computational data, the CEPDB contains a significant volume of experimental data from the literature. This experimental data is used as a training and calibration set in the CEP, and it has been referenced in the literature as the Organic Electronics Set.
Note: the initial CEPDB release omits the experimental data tables, but they will be available in the upcoming months.
The CEPDB as a Research Tool for the Organic Electronics Community
The CEPDB web interface allows for the search of materials within a desired parameter set. For example, one can search for molecular candidates with specific properties appropriate to organic solar cells. These materials could also be useful for other applications in organic electronics. The CEPDB can be used for data mining, scoring and analysis of candidates. Our group has employed it to understand global trends and rank top performing candidates using theoretical models. This allows for the development of structure-property relationships and OPV design rules.
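A parameter-window search of the kind described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the CEPDB implementation: the `Candidate` record, its field names, the `search` function, and all numerical values are hypothetical and do not reflect the actual CEPDB schema or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    # Hypothetical record layout; the real CEPDB schema is not described here.
    name: str
    homo_ev: float  # HOMO energy in eV
    lumo_ev: float  # LUMO energy in eV

    @property
    def gap_ev(self) -> float:
        # HOMO-LUMO gap in eV
        return self.lumo_ev - self.homo_ev


def search(candidates, homo_range, gap_range):
    """Return candidates whose HOMO and gap lie in the closed ranges, sorted by gap."""
    homo_lo, homo_hi = homo_range
    gap_lo, gap_hi = gap_range
    hits = [
        c for c in candidates
        if homo_lo <= c.homo_ev <= homo_hi and gap_lo <= c.gap_ev <= gap_hi
    ]
    return sorted(hits, key=lambda c: c.gap_ev)


# Made-up values, for illustration only.
pool = [
    Candidate("mol-A", homo_ev=-5.4, lumo_ev=-3.6),
    Candidate("mol-B", homo_ev=-4.8, lumo_ev=-3.5),  # HOMO outside the window
    Candidate("mol-C", homo_ev=-5.6, lumo_ev=-3.0),  # gap outside the window
    Candidate("mol-D", homo_ev=-5.2, lumo_ev=-3.7),
]

hits = search(pool, homo_range=(-5.8, -5.0), gap_range=(1.2, 2.0))
print([c.name for c in hits])
```

The same pattern extends to any number of property windows; ranking by a figure of merit instead of the gap only requires a different sort key.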
Our current efforts focus on expanding the CEPDB analysis and mining modules, e.g., to generalize the search from materials for single-junction OPVs to materials for tandem solar cells (which require a completely different set of properties).
We are further developing the experimental data collection into a public repository, in which research participants can upload their own results in addition to searching for information (similar to the Protein Data Bank (PDB) in the biology community). We will notify our users when this experimental data is online and the ability to upload datasets is enabled. In this way the CEPDB will allow for a collection of datasets from different researchers to be gathered in the same repository. Feel free to contact us with questions about these upcoming features.
The CEPDB as a Research Tool for the Computational Chemistry and Materials Science Community
A further task of the CEPDB is to supply benchmarks for the performance of the various theoretical methods employed in the CEP. In this function it is comparable in spirit to the NIST Computational Chemistry Comparison and Benchmark Database (CCCBDB). A systematic investigation into the behavior of different computational models and their numerical implementation is useful not only to judge their capabilities but also to guide improvements and new developments. This concerns both methodological advances and algorithmic robustness. The CEPDB is an ideal testbed for the development of new techniques. To this effect, we coordinate our efforts with the developers of the quantum chemistry program package Q-Chem. Finally, in collaboration with other groups, we are employing the CEPDB as an ab initio parameter repository for other types of calculations, such as the parametrization of model Hamiltonians.
Some Technical Details about the CEPDB
The CEPDB uses MySQL as its database backend and Django (Python) as its database management layer and as the engine for the HTML frontend. It contains data on 2.3 million molecular graphs with 22 million geometries generated in 150 million DFT calculations; altogether, this comprises 400 TB of data.
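To give a feel for these numbers, the short calculation below derives a few averages from the figures quoted above. It assumes decimal terabytes (1 TB = 10^12 bytes), which the text does not specify, and the averages say nothing about how the data is actually distributed across molecules.

```python
# Figures quoted in the text above.
molecular_graphs = 2.3e6
geometries = 22e6
dft_calculations = 150e6
total_bytes = 400e12  # assumption: decimal terabytes (1 TB = 10**12 bytes)

geometries_per_graph = geometries / molecular_graphs          # about 9.6
calculations_per_geometry = dft_calculations / geometries     # about 6.8
megabytes_per_calculation = total_bytes / dft_calculations / 1e6  # about 2.7

print(f"{geometries_per_graph:.1f} geometries per molecular graph")
print(f"{calculations_per_geometry:.1f} DFT calculations per geometry")
print(f"{megabytes_per_calculation:.1f} MB of data per DFT calculation")
```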
A Roadmap for the CEPDB
The CEPDB will be extended continually with new data, analysis tools, and features. The following list describes a roadmap of the components to be added over the coming weeks and months:
- add InChI strings and common names
- access to tables with detailed calculation results and the individual calibrations
- embed sketching tool for chemical structures
- access to tables with experimental data
- improved calibration scheme
- better unit conversion capability
- fingerprinting and similarity search
- upload portal for experimental data by users
- custom Scharber model configuration
- Scharber model for multi-junction cells
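As an illustration of what a configurable single-junction Scharber model might look like, here is a minimal sketch. It is not the CEPDB code. The class and function names are hypothetical, and the default parameters (a PCBM-like acceptor LUMO of -4.3 eV, a 0.3 V empirical voltage loss, and a fill factor and external quantum efficiency of 0.65 each) are the values commonly quoted for the Scharber model, stated here as assumptions. The photon-flux spectrum is supplied by the caller; the flat spectrum in the example is a toy, not the AM1.5G reference spectrum.

```python
from dataclasses import dataclass

ELEMENTARY_CHARGE_C = 1.602176634e-19


@dataclass(frozen=True)
class ScharberConfig:
    # All defaults are assumptions (commonly quoted Scharber-model values).
    acceptor_lumo_ev: float = -4.3        # PCBM-like acceptor LUMO in eV
    voltage_loss_v: float = 0.3           # empirical open-circuit voltage loss
    fill_factor: float = 0.65
    eqe: float = 0.65                     # external quantum efficiency above the gap
    incident_power_w_m2: float = 1000.0   # incident light power density


def open_circuit_voltage(donor_homo_ev, cfg):
    """Voc in volts: |E_HOMO(donor)| - |E_LUMO(acceptor)| - loss, floored at zero."""
    voc = abs(donor_homo_ev) - abs(cfg.acceptor_lumo_ev) - cfg.voltage_loss_v
    return max(voc, 0.0)


def short_circuit_current(gap_ev, spectrum, cfg):
    """Jsc in A/m^2 from a photon-flux spectrum.

    spectrum: (energy in eV, flux in photons m^-2 s^-1 eV^-1) pairs sorted by energy.
    Only grid points at or above the gap are used (trapezoidal rule, no
    interpolation at the gap edge).
    """
    absorbed = [(e, f) for e, f in spectrum if e >= gap_ev]
    photons = sum(
        0.5 * (f0 + f1) * (e1 - e0)
        for (e0, f0), (e1, f1) in zip(absorbed, absorbed[1:])
    )
    return ELEMENTARY_CHARGE_C * cfg.eqe * photons


def efficiency(donor_homo_ev, gap_ev, spectrum, cfg=ScharberConfig()):
    """Power conversion efficiency as a fraction: FF * Voc * Jsc / P_in."""
    voc = open_circuit_voltage(donor_homo_ev, cfg)
    jsc = short_circuit_current(gap_ev, spectrum, cfg)
    return cfg.fill_factor * voc * jsc / cfg.incident_power_w_m2


# Toy flat spectrum, for illustration only (not AM1.5G).
toy_spectrum = [(1.0, 1e21), (1.5, 1e21), (2.0, 1e21), (2.5, 1e21), (3.0, 1e21)]

default_pce = efficiency(-5.4, 1.5, toy_spectrum)
custom_pce = efficiency(-5.4, 1.5, toy_spectrum, ScharberConfig(fill_factor=0.70))
print(f"default: {default_pce:.4f}, custom fill factor: {custom_pce:.4f}")
```

Keeping the parameters in a single frozen configuration object means a custom setup (a different acceptor, fill factor, or loss term) is just another `ScharberConfig` instance. A multi-junction variant would need additional physics, such as current matching between subcells, which this sketch does not attempt.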
We are also open to suggestions and feature requests. Feel free to contact us.
Version History
- v 1.0.0 (2013-06-24): original CEPDB release