💡 Key Takeaways

  • Contains 894 records / 8 fields
  • Available for free download in Excel, CSV, and PDF formats
  • Data sourced from: https://github.com/awslabs/open-data-registry

📋 All Data

⚠️ Showing first 500 of 894 records. Download the file for complete data.

| Slug | Name | Description | Managed By | Update Frequency | Tags | License | Deprecated |
|---|---|---|---|---|---|---|---|
| 1000-genomes-data-lakehouse-ready | 1000 Genomes Phase 3 Reanalysis with DRAGEN 3.5 - Data Lakehouse Ready | The 1000 Genomes Project is an international collaboration which has established the most detailed catalogue of human genetic variation, including SNPs, structural variants, and their haplotype contex… | [Amazon Web Services](https://aws.amazon.com/) | Not updated | biology,bioinformatics,genetic,genomic,Homo sapiens,life sciences,parquet,population genetics,vcf | Data from the 1000 Genomes Project is now available without embargo, following the final publication from the project. Use of the data should be cited in the usual way, with current details available… | true |
| 1000-genomes | 1000 Genomes | The 1000 Genomes Project is an international collaboration which has established the most detailed catalogue of human genetic variation, including SNPs, structural variants, and their haplotype contex… | National Institutes of Health | Not updated | aws-pds,genetic,genomic,life sciences,whole genome sequencing,fastq | Data from the 1000 Genomes Project is now available without embargo, following the final publication from the project. Use of the data should be cited in the usual way, with current details available… | |
| 1kg-ont-vienna | 1KG-ONT-VIENNA panel | The 1KG-ONT-VIENNA panel comprises medium coverage ONT sequencing data for 1.019 samples from the 1000 Genomes Project collection, structural variants, and their haplotype context. | Institute of Molecular Pathology | Irregular | genetic,genomic,life sciences,whole genome sequencing,fastq,fast5 | There are no restrictions on the use of this data. Use of the data should be cited in the usual way, with current details available at https://github.com/1kg-ont-vienna/sv-analysis/ | |
| 3dcompat | 3DCoMPaT: Composition of Materials on Parts of 3D Things | 3D CoMPaT is a richly annotated large-scale dataset of rendered compositions of Materials on Parts of thousands of unique 3D Models. This dataset primarily focuses on stylizing 3D shapes at part-level… | [Vision-CAIR, CEMSE, KAUST](https://cemse.kaust.edu.sa/vision-cair) | Continually improving 3D annotations and renderings | aws-pds,computer vision,machine learning | https://3dcompat-dataset.org/LICENSE | |
| 3kricegenome | 3000 Rice Genomes Project | The 3000 Rice Genome Project is an international effort to sequence the genomes of 3,024 rice varieties from 89 countries. | [International Rice Research Institute](https://www.irri.org/) | Not updated | agriculture,food security,aws-pds,genetic,genomic,life sciences | This data is available for anyone to use under the terms of the Toronto Statement, which is available [here](http://www.nature.com/nature/journal/v461/n7261/box/461168a_BX1.html) | |
| 4dnucleome | 4D Nucleome (4DN) | The goal of the National Institutes of Health (NIH) Common Fund’s 4D Nucleome (4DN) program is to study the three-dimensional organization of the nucleus in space and time (the 4th dimension). The nuc… | 4DN Data Coordination and Integration Center (4DN-DCIC) | Daily | biology,bioinformatics,genetic,genomic,imaging,life sciences,aws-pds | External data users may freely download, analyze, and publish results based on any 4DN data provided here without restrictions. | |
| 990-spreadsheets | IRS 990 Filings (Spreadsheets) | Excerpts of electronic Form 990 and 990-EZ filings, converted to spreadsheet form. Additional fields being added regularly. | [Applied Nonprofit Research, LLC](https://www.appliednonprofitresearch.com/) | Quarterly | aws-pds,regulatory,statistics,us,economics | Attribution 4.0 International [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) | true |
| abeja-cc-ja | ABEJA CC JA | A large Japanese language corpus created through preprocessing Common Crawl data | [ABEJA inc.](https://www.abejainc.com/) | None | natural language processing,web archive,internet,japanese | This data is available for anyone to use under the [Common Crawl Terms of Use](https://commoncrawl.org/terms-of-use/) | |
| aef-source | Google Satellite Embedding V1 | COG (Cloud-Optimized GeoTIFF) files that together contain the AlphaEarth Foundations annual Satellite Embedding dataset. It contains the annual embeddings for the years from 2018 to 2024, inclusive. | [Source Cooperative](https://source.coop/) | As new data versions become available | aws-pds,machine learning,satellite imagery,aerial imagery,earth observation,imaging | CC-BY 4.0 | |
| aev-a2d2 | A2D2: Audi Autonomous Driving Dataset | An open multi-sensor dataset for autonomous driving research. This dataset comprises semantically segmented images, semantic point clouds, and 3D bounding boxes. In addition, it contains unlabelled 36… | [Audi AG](http://a2d2.audi/) | The dataset may be updated with additional or corrected data on a need-to-update basis. | autonomous vehicles,deep learning,computer vision,lidar,mapping,machine learning,robotics,aws-pds | https://creativecommons.org/licenses/by-nd/4.0/ | |
| africa-field-boundary-labels | A region-wide, multi-year set of crop field boundary labels for Africa | Crop field boundaries digitized in Planet imagery collected across Africa between 2017 and 2023, developed by [Farmerline](https://farmerline.co/), [Spatial Collective](https://spatialcollective.com/)… | [The Agricultural Impacts Research Group](https://agroimpacts.info/) | Updated versions of the dataset are added as they are developed | agriculture,machine learning,land cover,satellite imagery,cog,labeled | [Planet NICFI participant license agreement](https://assets.planet.com/docs/Planet_ParticipantLicenseAgreement_NICFI.pdf) | |
| afsis | Africa Soil Information Service (AfSIS) Soil Chemistry | This dataset contains soil infrared spectral data and paired soil property reference measurements for georeferenced soil samples that were collected through the Africa Soil Information Service (AfSIS)… | [QED](https://qed.ai/) | As required | agriculture,aws-pds,environmental,food security,machine learning,life sciences | ODC Open Database License ("[ODbL](https://opendatacommons.org/licenses/odbl/summary/index.html)") version 1.0, with attribution to AfSIS | |
| ag-loam | AG-LOAM Dataset | AG-LOAM dataset has been released to facilitate the evaluation of LiDAR-based odometry algorithms in agricultural environments. 1) It was collected by a wheeled mobile robot at the Agricultural Experi… | [Autonomous Robots and Control Systems Lab](https://sites.google.com/view/arcs-lab) | NA | aws-pds,robotics,agriculture,lidar,localization,mapping | Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0). | |
| ai3 | AI3 Protein-Ligand Binding Affinity Dataset | The rapid advancement of computing technologies, particularly artificial intelligence (AI), has revolutionized various domains, including drug discovery. Curated datasets are crucial for developing re… | International Institute of Information Technology Hyderabad | Not updated | pharmaceutical,simulations,health,life sciences,machine learning,protein,molecular dynamics,aws-pds | https://devalab.in/AI3.html | |
| airborne-object-tracking | Airborne Object Tracking Dataset | Airborne Object Tracking (AOT) is a collection of 4,943 flight sequences of around 120 seconds each, collected at 10 Hz in diverse conditions. There are 5.9M+ images and 3.3M+ 2D annotations of airbor… | [Amazon](https://www.amazon.com/) | Not updated | amazon.science,computer vision,deep learning,machine learning | Community Data License Agreement – Permissive, Version 1.0 https://cdla.dev/permissive-1-0/ | |
| aiwp | AI Weather Prediction (AIWP) Model Reforecasts | This is an archive of pure AI-based weather prediction reforecasts produced collaboratively between the Cooperative Institute for Research in the Atmosphere [(CIRA)](https://www.cira.colo… | Dr. Jacob Radford (jacob.radford@noaa.gov) | 2 times a day, every 12 hours starting at midnight UTC | environmental,meteorological,weather | Open Data. There are no restrictions on the use of this data. | |
| allen-brain-observatory | Allen Brain Observatory - Visual Coding AWS Public Data Set | The Allen Brain Observatory – Visual Coding is a large-scale, standardized survey of physiological activity across the mouse visual cortex, hippocampus, and thalamus. It includes datasets collected wi… | [Allen Institute](http://www.alleninstitute.org/) | Annually | aws-pds,neurobiology,neuroimaging,image processing,imaging,life sciences,signal processing,electrophysiology,Mus musculus | http://www.alleninstitute.org/legal/terms-use/ | |
| allen-cell-imaging-collections | Allen Cell Imaging Collections | This bucket contains multiple datasets (as Quilt packages) created by the Allen Institute for Cell Science. The types of data included in this bucket are listed below: 1) Field of view or cropped ima… | [Allen Institute for Cell Science](https://www.allencell.org) | Biweekly | aws-pds,life sciences,biology,cell imaging,cell biology,microscopy,image processing,machine learning,Homo sapiens | https://www.allencell.org/terms-of-use.html | |
| allen-hmba-releases | Human and Mammalian Brain Atlas | Human and Mammalian Brain Atlas (HMBA) is a major atlas of the BRAIN Initiative Cell Atlas Network (BICAN) that proposes to establish a comprehensive, highly granular cell atlas in complete adult huma… | [Allen Institute](http://www.alleninstitute.org/) | Never | aws-pds,biology,gene expression,neurobiology,life sciences,single-cell transcriptomics,Mus musculus,Homo sapiens,non-human primate | http://www.alleninstitute.org/legal/terms-use/ | |
| allen-ivy-glioblastoma-atlas | Allen Ivy Glioblastoma Atlas | This dataset consists of images of glioblastoma human brain tumor tissue sections that have been probed for expression of particular genes believed to play a role in development of the cancer. Each t… | [Allen Institute](http://www.alleninstitute.org/) | Never | aws-pds,biology,genetic,gene expression,imaging,neurobiology,image processing,life sciences,machine learning,computer vision | http://www.alleninstitute.org/legal/terms-use/ | |
| allen-mouse-brain-atlas | Allen Mouse Brain Atlas | The Allen Mouse Brain Atlas is a genome-scale collection of cellular resolution gene expression profiles using in situ hybridization (ISH). Highly methodical data production methods and comprehensive… | [Allen Institute](http://www.alleninstitute.org/) | Never | aws-pds,biology,genetic,gene expression,imaging,neurobiology,image processing,life sciences,transcriptomics,Mus musculus | http://www.alleninstitute.org/legal/terms-use/ | |
| allen-nd-ephys-compression | Allen Institute for Neural Dynamics - Extracellular Electrophysiology Compression Benchmark | Extracellular electrophysiology data is growing at a remarkable pace. This data, collected with neuropixels probes by the Allen Institute and the International Brain Lab can be used to benchmark throughput… | [Allen Institute](http://neuraldynamics.alleninstitute.org/) | Weekly | aws-pds,neurobiology,life sciences,signal processing,electrophysiology,Mus musculus | CC-BY-4.0 | |
| allen-nd-open-data | Allen Institute for Neural Dynamics - Mouse Neuroanatomy and Physiology Data | The Allen Institute for Neural Dynamics (AIND) is committed to FAIR, Open, and Reproducible science. We therefore share all of the raw and derived data we collect publicly with rich metadata, includin… | [Allen Institute](http://neuraldynamics.alleninstitute.org/) | Weekly | aws-pds,neurobiology,neuroimaging,image processing,imaging,life sciences,signal processing,electrophysiology,Mus musculus | CC-BY-4.0 | |
| allen-sea-ad-atlas | Seattle Alzheimer's Disease Brain Cell Atlas (SEA-AD) | The Seattle Alzheimer's Disease Brain Cell Atlas (SEA-AD) consortium strives to gain a deep molecular and cellular understanding of the early pathogenesis of Alzheimer's disease and is funded by the N… | [Allen Institute](http://www.alleninstitute.org/) | Annually | aws-pds,biology,cell biology,cell imaging,epigenomics,gene expression,histopathology,Homo sapiens,imaging,medicine | https://alleninstitute.org/legal/terms-use/ | |
| allen-synphys | Allen Institute for Brain Science - Synaptic Physiology Public Data Set | This is a large-scale survey that describes the physiology (strength, kinetics, and short term plasticity) of thousands of synapses from patch clamp experiments in mouse visual cortex and human middle… | [Allen Institute](http://www.alleninstitute.org/) | Finalized | aws-pds,neurobiology,life sciences,signal processing,electrophysiology,Mus musculus,Homo sapiens | http://www.alleninstitute.org/legal/terms-use/ | |
| allenai-arc | AI2 Reasoning Challenge (ARC) 2018 | 7,787 multiple choice science questions and associated corpora | [Allen Institute for AI](https://allenai.org) | Not updated | aws-pds,machine learning,json,csv | [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/) | |
| allenai-aristo-mini | Aristo Mini Corpus | 1,197,377 science-relevant sentences | [Allen Institute for AI](https://allenai.org) | Not updated | aws-pds,machine learning,json,csv | [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/) | |
| allenai-diagrams | AI2 Diagram Dataset (AI2D) | 4,817 illustrative diagrams for research on diagram understanding and associated question answering. | [Allen Institute for AI](https://allenai.org) | Not updated | aws-pds,machine learning | [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/) | |
| allenai-drop | Discrete Reasoning Over the content of Paragraphs (DROP) | The DROP dataset contains 96k Question and Answer pairs (QAs) over 6.7K paragraphs, split between train (77k QAs), development (9.5k QAs) and a hidden test partition (9.5k QAs). | [Allen Institute for AI](https://allenai.org) | Not updated | aws-pds,machine learning,natural language processing | [CC BY](https://creativecommons.org/licenses/by/4.0) | |
| allenai-meaningful-citations | AI2 Meaningful Citations Data Set | 630 paper annotations | [Allen Institute for AI](https://allenai.org) | Not updated | aws-pds,machine learning,csv | [CC BY](https://creativecommons.org/licenses/by/4.0) | |
| allenai-quoref | Quoref | 24K Question/Answer (QA) pairs over 4.7K paragraphs, split between train (19K QAs), development (2.4K QAs) and a hidden test partition (2.5K QAs). | [Allen Institute for AI](https://allenai.org) | Not updated | aws-pds,machine learning,natural language processing | [CC BY](https://creativecommons.org/licenses/by/4.0) | |
| allenai-ropes | Reasoning Over Paragraph Effects in Situations (ROPES) | 14k QA pairs over 1.7K paragraphs, split between train (10k QAs), development (1.6k QAs) and a hidden test partition (1.7k QAs). | [Allen Institute for AI](https://allenai.org) | Not updated | aws-pds,machine learning,natural language processing,json | [CC BY](https://creativecommons.org/licenses/by/4.0) | |
| allenai-tablestore-questions | AI2 TabMCQ: Multiple Choice Questions aligned with the Aristo Tablestore | 9092 crowd-sourced science questions and 68 tables of curated facts | [Allen Institute for AI](https://allenai.org) | Not updated | aws-pds,machine learning,natural language processing | [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/) | |
| allenai-tablestore | AI2 Tablestore (November 2015 Snapshot) | 68 tables of curated facts | [Allen Institute for AI](https://allenai.org) | Not updated | aws-pds,machine learning,natural language processing | [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/) | |
| allenai-tqa | Textbook Question Answering (TQA) | 1,076 textbook lessons, 26,260 questions, 6229 images | [Allen Institute for AI](https://allenai.org) | Not updated | aws-pds,machine learning | [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/) | |
| allenai-tuple-kb | Aristo Tuple KB | 294,000 science-relevant tuples | [Allen Institute for AI](https://allenai.org) | Not updated | aws-pds,machine learning,natural language processing | [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/) | |
| allenai-zest | ZEST: ZEroShot learning from Task descriptions | ZEST is a benchmark for zero-shot generalization to unseen NLP tasks, with 25K labeled instances across 1,251 different tasks. | [Allen Institute for AI](https://allenai.org) | Not updated | aws-pds,machine learning,natural language processing | [CC BY](https://creativecommons.org/licenses/by/4.0) | |
| alliance-genome-resources | Alliance of Genome Resources | The Alliance of Genome Resources is a consortium that integrates genomic, genetic, and molecular data from leading model organism databases including Drosophila melanogaster, Caenorhabditis elegans, D… | Alliance of Genome Resources Consortium | Quarterly releases (every ~3 months) | aws-pds,genomic,bioinformatics,biology,gene expression,life sciences,genetic,genome,Drosophila melanogaster,Caenorhabditis elegans | Most Alliance data is available under CC0 1.0 Universal (Public Domain Dedication). Some datasets may use CC-BY 4.0 (attribution required). Full details at https://www.alliancegenome.org/terms-of-use | |
| allthebacteria | AllTheBacteria | All bacterial isolate whole-genome sequencing data from INSDC, uniformly assembled, quality-controlled, annotated, and searchable. | [European Bioinformatics Institute](https://www.ebi.ac.uk/) | The current release is for all SRA bacterial isolate data up to August 2024. The collection will be updated occasionally, with no fixed schedule. | assembly,bacteria,bioinformatics,fasta,genomic,life sciences,microbial genomics,short read sequencing,whole genome sequencing | [MIT License](https://opensource.org/license/mit) | |
| amazon-berkeley-objects | Amazon Berkeley Objects Dataset | Amazon Berkeley Objects (ABO) is a collection of 147,702 product listings with multilingual metadata and 398,212 unique catalog images. 8,222 listings come with turntable photography (also referred as… | [Amazon](https://www.amazon.com/) | Not updated | amazon.science,computer vision,deep learning,information retrieval,machine learning,machine translation | Creative Commons Attribution-NonCommercial 4.0 (CC BY-NC 4.0) https://creativecommons.org/licenses/by-nc/4.0/ | |
| amazon-bin-imagery | Amazon Bin Image Dataset | The Amazon Bin Image Dataset contains over 500,000 images and metadata from bins of a pod in an operating Amazon Fulfillment Center. The bin images in this dataset are captured as robot units carry po… | [Amazon](https://www.amazon.com/) | Not updated | amazon.science,computer vision,machine learning | Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States (CC BY-NC-SA 3.0 US) https://creativecommons.org/licenses/by-nc-sa/3.0/us/ | |
| amazon-conversational-product-search | VoiSeR | Voice-based refinements of product search | [Amazon](https://www.amazon.com/) | Not currently being updated | amazon.science,natural language processing,machine learning,information retrieval | [cc-by-sa 4.0](https://creativecommons.org/licenses/by-sa/4.0/) | |
| amazon-last-mile-challenges | 2021 Amazon Last Mile Routing Research Challenge Dataset | The 2021 Amazon Last Mile Routing Research Challenge was an innovative research initiative led by Amazon.com and supported by the Massachusetts Institute of Technology’s Center for Transportation and… | [Amazon](https://www.amazon.com/) | None | transportation,machine learning,deep learning,amazon.science,urban,analytics,geospatial,logistics,last mile,optimization | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. The material for the Amazon Last Mile Routing Research Challenge is provided under a Creative Commons Attribution-NonCommercial 4.0 I… | |
| amazon-pqa | Amazon-PQA | Amazon product questions and their answers, along with the public product information. | Amazon | None | amazon.science,natural language processing,machine learning | https://cdla.dev/permissive-1-0/ | |
| amazon-reviews-ml | The Multilingual Amazon Reviews Corpus | We present a collection of Amazon reviews specifically designed to aid research in multilingual text classification. The dataset contains reviews in English, Japanese, German, French, Chinese and Span… | Amazon | None specified. | amazon.science,natural language processing,machine learning | https://github.com/awslabs/open-data-docs/blob/main/docs/amazon-reviews-ml/license.txt | true |
| amazon-seller-contact-intent-sequence | Amazon Seller Contact Intent Sequence | When sellers need help from Amazon, such as how to create a listing, they often reach out to Amazon seller support through email, chat or phone. For each contact, we assign an intent so that we can ma… | [Amazon](https://www.amazon.com/) | None | amazon.science,machine learning,temporal point process,Hawkes Process | https://cdla.dev/permissive-1-0/ | |
| amazonia | Amazonia EO satellite on AWS | Imagery acquired by Amazonia-1 satellite. The image files are recorded and processed by Instituto Nacional de Pesquisas Espaciais (INPE) and are converted to Cloud Optimized Geotiff format in order to… | [Frederico Liporace](https://github.com/fredliporace) | Daily | aws-pds,agriculture,earth observation,geospatial,imaging,satellite imagery,sustainability,disaster response,stac,cog | https://creativecommons.org/licenses/by-sa/3.0/ | |
| answer-reformulation | Answer Reformulation | Original StackExchange answers and their voice-friendly Reformulation. | [Amazon](https://www.amazon.com/) | Not currently being updated | amazon.science,natural language processing,machine learning | [cc-by-sa 4.0](https://creativecommons.org/licenses/by-sa/4.0/) | |
| anvilproject | NHGRI AnVIL Project | The NHGRI Analysis, Visualization, and Informatics Lab-space (AnVIL) Project (https://anvilproject.org/) is the National Human Genome Research Institute's cloud-based platform for genomic data sharing… | The AnVIL Project, and UC Santa Cruz Genomics Institute, University of California, Santa Cruz (UCSC) | Quarterly | life sciences,biology,genome,genomic,gene expression,Homo sapiens | https://anvilproject.org/faq/data-security | |
| aodn_animal_acoustic_tracking_delayed_qc | Animal Tracking - Acoustic Telemetry - Quality controlled detections | Since 2007, the Integrated Marine Observing System’s Animal Tracking Facility (formerly known as the Australian Animal Tracking And Monitoring System (AATAMS)) has established a permanent array of aco… | AODN | As Needed | aws-pds,oceans,marine mammals,biology | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_animal_ctd_satellite_relay_tagging_delayed_qc | Marine Animal - Satellite Relay Tagging - Quality controlled profiles | CTD (Conductivity-Temperature_Depth)-Satellite Relay Data Loggers (CTD-SRDLs) are used to explore how marine animal behaviour relates to their oceanic environment. Loggers developed at the University… | AODN | As Needed | oceans,marine mammals,biology,chemistry,chemical biology | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_model_sea_level_anomaly_gridded_realtime | OceanCurrent - Gridded sea level anomaly - Near real time | Gridded (adjusted) sea level anomaly (GSLA), gridded sea level (GSL) and surface geostrophic velocity (UCUR,VCUR) for the Australasian region. GSLA is mapped using optimal interpolation of detided, de… | AODN | As Needed | oceans,ocean velocity,ocean sea surface height | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_mooring_ctd_delayed_qc | National Mooring Network - CTD profiles | This collection includes conductivity-temperature-depth (CTD) profiles obtained at the National Reference Stations (NRS) as part of the water sampling program. The instruments used also measure dissol… | AODN | As Needed | oceans,chemistry | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_mooring_hourly_timeseries_delayed_qc | Moorings - Hourly time-series product | Integrated Marine Observing System (IMOS) have moorings across both its National Mooring Network and Deep Water Moorings facilities. The National Mooring Network facility comprises a series of nation… | AODN | As Needed | oceans,chemistry,ocean velocity | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_mooring_satellite_altimetry_calibration_validation | Satellite - Altimetry calibration and validation | High precision satellite altimeter missions including TOPEX/Poseidon (T/P), Jason-1 and now OSTM/Jason-2, have contributed fundamental advances in our understanding of regional and global ocean circul… | AODN | As Needed | oceans,ocean currents,chemistry | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_radar_bonneycoast_velocity_hourly_averaged_delayed_qc | Ocean Radar - Bonney coast site - Sea water velocity - Delayed mode | The Bonney Coast (BONC) HF ocean radar system covers an area of the Bonney Coast, South Australia, which has a recurring annual upwelling feature near to the coast that significantly changes the ecosy… | AODN | As Needed | oceans,ocean currents,ocean velocity | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_radar_capricornbunkergroup_velocity_hourly_averaged_delayed_qc | Ocean Radar - Capricorn bunker group site - Sea water velocity - Delayed mode | The Capricorn Bunker Group site is in the southern region of the Great Barrier Reef Marine Park World Heritage Area (GBR). The HF ocean radar coverage is from the coast to beyond the edge of the con… | AODN | As Needed | oceans,ocean currents,ocean velocity | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_radar_capricornbunkergroup_wave_delayed_qc | Ocean Radar - Capricorn bunker group site - Wave - Delayed mode | The Capricorn Bunker Group site is in the southern region of the Great Barrier Reef Marine Park World Heritage Area (GBR). The HF ocean radar coverage is from the coast to beyond the edge of the con… | AODN | As Needed | oceans,ocean currents | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_radar_capricornbunkergroup_wind_delayed_qc | Ocean Radar - Capricorn bunker group site - Wind - Delayed mode | The Capricorn Bunker Group site is in the southern region of the Great Barrier Reef Marine Park World Heritage Area (GBR). The HF ocean radar coverage is from the coast to beyond the edge of the con… | AODN | As Needed | oceans,ocean currents,meteorological | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_radar_coffsharbour_velocity_hourly_averaged_delayed_qc | Ocean Radar - Coffs Harbour site - Sea water velocity - Delayed mode | The Coffs Harbour (COF) HF ocean radar site is located near the point at which the East Australian Current (EAC) begins to separate from the coast. Here the EAC is at its narrowest and swiftest: to t… | AODN | As Needed | oceans,ocean currents,ocean velocity | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_radar_coffsharbour_wave_delayed_qc | Ocean Radar - Coffs Harbour site - Wave - Delayed mode | The Coffs Harbour (COF) HF ocean radar site is located near the point at which the East Australian Current (EAC) begins to separate from the coast. Here the EAC is at its narrowest and swiftest: to t… | AODN | As Needed | oceans,ocean currents | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_radar_coffsharbour_wind_delayed_qc | Ocean Radar - Coffs Harbour site - Wind - Delayed mode | The Coffs Harbour (COF) HF ocean radar site is located near the point at which the East Australian Current (EAC) begins to separate from the coast. Here the EAC is at its narrowest and swiftest: to t… | AODN | As Needed | oceans,ocean currents,meteorological | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_radar_coralcoast_velocity_hourly_averaged_delayed_qc | Ocean Radar - Coral coast site - Sea water velocity - Delayed mode | The Coral Coast (CORL) HF ocean radar system covers an area of the Western Australia Coast, Western Australia, an area subject to the variability of the Leeuwin Current (LC) and its coupling with coas… | AODN | As Needed | oceans,ocean currents,ocean velocity | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_radar_newcastle_velocity_hourly_averaged_delayed_qc | Ocean Radar - Newcastle site - Sea water velocity - Delayed mode | The Newcastle (NEWC) HF ocean radar system covers an area of the Central Coast, New South Wales, an area subject to the variability of the East Australian Current (EAC) and its coupling with coastal w… | AODN | As Needed | aws-pds,oceans,ocean currents,ocean velocity | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_radar_northwestshelf_velocity_hourly_averaged_delayed_qc | Ocean Radar - Northwest shelf site - Sea water velocity - Delayed mode | The Northwest Shelf (NWA) HF ocean radar system covers an area which includes the Ningaloo Peninsula and the Ningaloo Reef to the west. The Ningaloo Reef is one of the longest and most pristine reefs… | AODN | As Needed | oceans,ocean currents,ocean velocity | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_radar_rottnestshelf_velocity_hourly_averaged_delayed_qc | Ocean Radar - Rottnest shelf site - Sea water velocity - Delayed mode | The Rottnest Shelf (ROT) HF ocean radar system covers an area which includes Rottnest Island and the Perth Canyon to the north-west. The Perth Canyon has the highest marine biodiversity in the region… | AODN | As Needed | oceans,ocean currents,ocean velocity | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_radar_rottnestshelf_wave_delayed_qc | Ocean Radar - Rottnest shelf site - Wave - Delayed mode | The Rottnest Shelf (ROT) HF ocean radar system covers an area which includes Rottnest Island and the Perth Canyon to the north-west. The Perth Canyon has the highest marine biodiversity in the region… | AODN | As Needed | oceans,ocean currents | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_radar_rottnestshelf_wind_delayed_qc | Ocean Radar - Rottnest shelf site - Wind - Delayed mode | The Rottnest Shelf (ROT) HF ocean radar system covers an area which includes Rottnest Island and the Perth Canyon to the north-west. The Perth Canyon has the highest marine biodiversity in the region… | AODN | As Needed | oceans,ocean currents,meteorological | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_radar_southaustraliagulfs_velocity_hourly_averaged_delayed_qc | Ocean Radar - South Australian gulfs site - Sea water velocity - Delayed mode | The South Australia Gulfs (SAG) HF ocean radar system covers the area of about 40,000 square kilometres bounded by Kangaroo Island to the east and the Eyre Peninsula to the north. This is a dynamic r… | AODN | As Needed | oceans,ocean currents,ocean velocity | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_radar_southaustraliagulfs_wave_delayed_qc | Ocean Radar - South Australian gulfs site - Wave - Delayed mode | The South Australia Gulfs (SAG) HF ocean radar system covers the area of about 40,000 square kilometres bounded by Kangaroo Island to the east and the Eyre Peninsula to the north. This is a dynamic r… | AODN | As Needed | oceans,ocean currents | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_radar_southaustraliagulfs_wind_delayed_qc | Ocean Radar - South Australian gulfs site - Wind - Delayed mode | The South Australia Gulfs (SAG) HF ocean radar system covers the area of about 40,000 square kilometres bounded by Kangaroo Island to the east and the Eyre Peninsula to the north. This is a dynamic r… | AODN | As Needed | oceans,ocean currents,meteorological | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_radar_turquoisecoast_velocity_hourly_averaged_delayed_qc | Ocean Radar - Turquoise coast site - Sea water velocity - Delayed mode | The Turquoise Coast (TURQ) HF ocean radar system covers the area of shelf between Seabird and Jurien Bay and is the logical continuation of major research efforts to understand the role of the Leeuwin… | AODN | As Needed | oceans,ocean currents,ocean velocity | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_satellite_chlorophylla_carder_1day_aqua | Satellite - Ocean Colour - MODIS - 1 day - Chlorophyll-a concentration (Carder model) | The Aqua satellite platform carries a MODIS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the… | AODN | As Needed | oceans,satellite imagery,biology | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_satellite_chlorophylla_gsm_1day_aqua | Satellite - Ocean Colour - MODIS - 1 day - Chlorophyll-a concentration (GSM model) | The Aqua satellite platform carries a MODIS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the… | AODN | As Needed | oceans,satellite imagery,biology | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_satellite_chlorophylla_gsm_1day_noaa20 | Satellite - Ocean Colour - NOAA20 - 1 day - Chlorophyll-a concentration (GSM model) | The NOAA20 satellite platform carries a VIIRS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer t… | AODN | As Needed | oceans,satellite imagery,biology | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_satellite_chlorophylla_gsm_1day_snpp | Satellite - Ocean Colour - SNPP - 1 day - Chlorophyll-a concentration (GSM model) | The SNPP satellite platform carries a VIIRS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the… | AODN | As Needed | oceans,satellite imagery,biology | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_satellite_chlorophylla_oc3_1day_aqua | Satellite - Ocean Colour - MODIS - 1 day - Chlorophyll-a concentration (OC3 model) | The Aqua satellite platform carries a MODIS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the… | AODN | As Needed | oceans,satellite imagery,biology | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_satellite_chlorophylla_oc3_1day_noaa20 | Satellite - Ocean Colour - NOAA20 - 1 day - Chlorophyll-a concentration (OC3 model) | The NOAA20 satellite platform carries a VIIRS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer t… | AODN | As Needed | oceans,satellite imagery,biology | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_satellite_chlorophylla_oc3_1day_snpp | Satellite - Ocean Colour - SNPP - 1 day - Chlorophyll-a concentration (OC3 model) | The SNPP satellite platform carries a VIIRS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the… | AODN | As Needed | oceans,satellite imagery,biology | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_satellite_chlorophylla_oci_1day_aqua | Satellite - Ocean Colour - MODIS - 1 day - Chlorophyll-a concentration (OCI model) | The Aqua satellite platform carries a MODIS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the… | AODN | As Needed | oceans,satellite imagery,biology | http://creativecommons.org/licenses/by/4.0/ | |
| aodn_satellite_chlorophylla_oci_1day_noaa20 | Satellite - Ocean Colour - NOAA20 - 1 day - Chlorophyll-a concentration (OCI model) | The NOAA20 satellite platform carries a VIIRS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer t… | AODN | As Needed | oceans,satellite imagery,biology | http://creativecommons.org/licenses/by/4.0/ | |
aodn_satellite_chlorophylla_oci_1day_snppSatellite - Ocean Colour - SNPP - 1 day - Chlorophyll-a concentration (OCI model)The SNPP satellite platform carries a VIIRS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the…AODNAs Neededoceans,satellite imagery,biologyhttp://creativecommons.org/licenses/by/4.0/
aodn_satellite_diffuse_attenuation_coefficent_1day_aquaSatellite - Ocean Colour - MODIS - 1 day - Diffuse attenuation coefficient (k490)The Aqua satellite platform carries a MODIS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the…AODNAs Neededoceans,satellite imageryhttp://creativecommons.org/licenses/by/4.0/
aodn_satellite_diffuse_attenuation_coefficent_1day_noaa20Satellite - Ocean Colour - NOAA20 - 1 day - Diffuse attenuation coefficient (k490)The NOAA20 satellite platform carries a VIIRS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer t…AODNAs Neededoceans,satellite imagery,biologyhttp://creativecommons.org/licenses/by/4.0/
aodn_satellite_diffuse_attenuation_coefficent_1day_snppSatellite - Ocean Colour - SNPP - 1 day - Diffuse attenuation coefficient (k490)The SNPP satellite platform carries a VIIRS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the…AODNAs Neededoceans,satellite imagery,biologyhttp://creativecommons.org/licenses/by/4.0/
aodn_satellite_ghrsst_l3c_1day_nighttime_himawari8Satellite - Sea surface temperature - Level 3 - Single sensor - Himawari-8 - 1 day - Night timeThis is a regional GHRSST level 3 collated (L3C) dataset on 0.02-degree rectangular grid over the Australasian domain (70E to 190E, 70S to 20N) based on retrievals from the AHI imager on board Himawar…AODNAs Neededoceans,satellite imageryhttp://creativecommons.org/licenses/by/4.0/
aodn_satellite_ghrsst_l3s_1day_daynighttime_multi_sensor_australiaSatellite - Sea surface temperature - Level 3 - Multi sensor - 1 day - Day and night timeThis is a multi-sensor SSTfnd L3S product for a single 24 hour period, derived using sea surface temperature retrievals from the VIIRS sensor on the Suomi-NPP satellite and JPSS series of satellites,…AODNAs Neededoceans,satellite imageryhttp://creativecommons.org/licenses/by/4.0/
aodn_satellite_ghrsst_l3s_1day_daynighttime_single_sensor_australiaSatellite - Sea surface temperature - Level 3 - Single sensor - 1 day - Day and night timeThis is a single-sensor multi-satellite SSTfnd product for a single 24 hour period, derived using observations from AVHRR instruments on all available NOAA polar-orbiting satellites. It is provided…AODNAs Neededoceans,satellite imageryhttp://creativecommons.org/licenses/by/4.0/
aodn_satellite_ghrsst_l3s_1day_daynighttime_single_sensor_southernoceanSatellite - Sea surface temperature - Level 3 - Single sensor - 1 day - Day and night time - Southern OceanThis is a single-sensor SSTfnd product for a single 24 hour period, derived using observations from AVHRR instruments on all available NOAA polar-orbiting satellites. It is provided as a 0.02deg x 0.…AODNAs Neededoceans,satellite imageryhttp://creativecommons.org/licenses/by/4.0/
aodn_satellite_ghrsst_l3s_1month_daytime_single_sensor_australiaSatellite - Sea surface temperature - Level 3 - Single sensor - 1 month - Day timeThis is a single-sensor multi-satellite SSTskin product for 1 month of consecutive day-time periods, derived using observations from AVHRR instruments on all available NOAA polar-orbiting satellites.…AODNAs Neededoceans,satellite imageryhttp://creativecommons.org/licenses/by/4.0/
aodn_satellite_ghrsst_l3s_3day_daynighttime_multi_sensor_australiaSatellite - Sea surface temperature - Level 3 - Multi sensor - 3 day - Day and night timeThis is a multi-sensor SSTfnd L3S product for a single 72 hour period, derived using sea surface temperature retrievals from the VIIRS sensor on the Suomi-NPP satellite and JPSS series of satellites,…AODNAs Neededoceans,satellite imageryhttp://creativecommons.org/licenses/by/4.0/
aodn_satellite_ghrsst_l3s_6day_daynighttime_single_sensor_australiaSatellite - Sea surface temperature - Level 3 - Single sensor - 6 day - Day and night timeThis is a single-sensor multi-satellite SSTfnd product for a 144 hour period, derived using observations from AVHRR instruments on all available NOAA polar-orbiting satellites. It is provided as a 0.…AODNAs Neededoceans,satellite imageryhttp://creativecommons.org/licenses/by/4.0/
aodn_satellite_ghrsst_l4_gamssa_1day_multi_sensor_worldSatellite - Sea surface temperature - Level 4 - Multi sensor - Global AustralianAn International Group for High-Resolution Sea Surface Temperature (GHRSST) Level 4 sea surface temperature analysis, produced daily on an operational basis at the Australian Bureau of Meteorology usi…AODNAs Neededoceans,satellite imageryhttp://creativecommons.org/licenses/by/4.0/
aodn_satellite_ghrsst_l4_ramssa_1day_multi_sensor_australiaSatellite - Sea surface temperature - Level 4 - Multi sensor - Regional AustralianAn International Group for High Resolution Sea Surface Temperature (GHRSST) Level 4 sea surface temperature analysis, produced daily on an operational basis at the Australian Bureau of Meteorology usi…AODNAs Neededoceans,satellite imageryhttp://creativecommons.org/licenses/by/4.0/
aodn_satellite_nanoplankton_fraction_oc3_1day_aquaSatellite - Ocean Colour - MODIS - 1 day - Nanoplankton fraction (OC3 model and Brewin et al 2012 algorithm)The Aqua satellite platform carries a MODIS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the…FILL UP MANUALLY - CHECK DOCUMENTATIONAs Neededoceans,satellite imagery,biologyhttp://creativecommons.org/licenses/by/4.0/
aodn_satellite_net_primary_productivity_gsm_1day_aquaSatellite - Ocean Colour - MODIS - 1 day - Net Primary Productivity (GSM model and Eppley-VGPM algorithm)The Aqua satellite platform carries a MODIS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the…AODNAs Neededoceans,satellite imagery,biologyhttp://creativecommons.org/licenses/by/4.0/
aodn_satellite_net_primary_productivity_oc3_1day_aquaSatellite - Ocean Colour - MODIS - 1 day - Net Primary Productivity (OC3 model and Eppley-VGPM algorithm)The Aqua satellite platform carries a MODIS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the…AODNAs Neededoceans,satellite imagery,biologyhttp://creativecommons.org/licenses/by/4.0/
aodn_satellite_optical_water_type_1day_aquaSatellite - Ocean Colour - MODIS - 1 day - Optical Water Type (Moore et al 2009 algorithm)The Aqua satellite platform carries a MODIS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These measurements at discrete wavelengths represent th…AODNAs Neededoceans,satellite imageryhttp://creativecommons.org/licenses/by/4.0/
aodn_satellite_picoplankton_fraction_oc3_1day_aquaSatellite - Ocean Colour - MODIS - 1 day - Picoplankton fraction (OC3 model and Brewin et al 2012 algorithm)The Aqua satellite platform carries a MODIS sensor that observes sunlight reflected from within the ocean surface layer at multiple wavelengths. These multi-spectral measurements are used to infer the…FILL UP MANUALLY - CHECK DOCUMENTATIONAs Neededoceans,satellite imagery,biologyhttp://creativecommons.org/licenses/by/4.0/
aodn_slocum_glider_delayed_qcOcean Gliders - Delayed modeThe Australian National Facility for Ocean Gliders (ANFOG), with IMOS/NCRIS funding, deploys a fleet of eight gliders around Australia. The data represented by this record, are presented in delayed mo…AODNAs Neededoceans,ocean currents,chemistry,ocean velocityhttp://creativecommons.org/licenses/by/4.0/
aodn_vessel_air_sea_flux_product_delayedShips of Opportunity - Air-sea fluxes - Meteorological and flux - Delayed modeEnhancement of Measurements on Ships of Opportunity (SOOP)-Air Sea Flux sub-facility collects underway meteorological and oceanographic observations during scientific and Antarctic resupply voyages in…AODNAs Neededoceans,air temperature,atmosphere,meteorological,radiationhttp://creativecommons.org/licenses/by/4.0/
aodn_vessel_air_sea_flux_sst_meteo_realtimeShips of Opportunity - Air-sea fluxes - Meteorological and sea surface temperature - Real timeEnhancement of Measurements on Ships of Opportunity (SOOP)-Air Sea Flux sub-facility collects underway meteorological and oceanographic observations during scientific and Antarctic resupply voyages in…AODNAs Neededoceans,air temperature,atmosphere,meteorological,precipitation,radiationhttp://creativecommons.org/licenses/by/4.0/
aodn_vessel_co2_delayed_qcShips of Opportunity - Biogeochemical sensors - Delayed modeThe IMOS Ship of Opportunity Underway CO2 Measurements group is a research and data collection project working within the IMOS Ship of Opportunity Multi-Disciplinary Underway Network sub-facility. The…AODNAs Neededoceans,chemistry,atmosphere,meteorologicalhttp://creativecommons.org/licenses/by/4.0/
aodn_vessel_fishsoop_realtime_qcShips of Opportunity - Fisheries vessels - Real timeFisheries Vessels as Ships of Opportunities (FishSOOP) is an IMOS Sub-Facility working with fishers to collect real-time temperature and depth data by installing equipment on a network of commercial f…AODNAs Neededoceanshttp://creativecommons.org/licenses/by/4.0/
aodn_vessel_sst_delayed_qcShips of Opportunity - Sea surface temperature - 1-minute average data productsThe Sea Surface Temperature (SST) sub-facility produces 1-minute average data products. Observed data are 1-minute median SST values and are retrieved from the vessel once an hour. High-resolution 1-m…AODNAs Neededoceans,air temperature,atmospherehttp://creativecommons.org/licenses/by/4.0/
aodn_vessel_trv_realtime_qcShips of Opportunity - Tropical research vessels - Real timeThe research vessels (RV Cape Ferguson and RV Solander) of the Australian Institute of Marine Science (AIMS) routinely record along-track (underway) measurements of near-surface water temperature, sal…AODNAs Neededoceans,chemistryhttp://creativecommons.org/licenses/by/4.0/
aodn_vessel_xbt_delayed_qcShips of Opportunity - Expendable bathythermographs - Delayed modeIMOS Ship of Opportunity Underway Expendable Bathythermographs (XBT) group is a research and data collection project working within the IMOS Ship of Opportunity Multi-Disciplinary Underway Network sub…AODNAs Neededoceanshttp://creativecommons.org/licenses/by/4.0/
aodn_vessel_xbt_realtime_nonqcShips of Opportunity - Expendable bathythermographs - Real timeXBT real-time data is available through the IMOS portal. Data is acquired by technicians who ride the ships of opportunity in order to perform high density sampling along well established transit line…AODNAs Neededoceanshttp://creativecommons.org/licenses/by/4.0/
aodn_wave_buoy_realtime_nonqcWave buoys observations - Real timeBuoys provide integral wave parameters. Buoy data from the following organisations contribute to the National Wave Archive: Manly Hydraulics Laboratory (part of the NSW Department of Planning and Envi…AODNAs Neededoceanshttp://creativecommons.org/licenses/by/4.0/
apd_galaxymorphAstrophysics Division Galaxy Morphology Benchmark DatasetHubble Space Telescope imaging data and associated identification labels for galaxy morphology derived from citizen scientist labels from the Galaxy Zoo: Hubble project. [NASA](https://osdr.nasa.gov/)No updatesaws-pds,astronomy,machine learning,satellite imagery,NASA SMD AIThere are no restrictions on the use of this data.
apd_galaxysegmentationAstrophysics Division Galaxy Segmentation Benchmark DatasetPan-STARSS imaging data and associated labels for galaxy segmentation into galactic centers, galactic bars, spiral arms and foreground stars derived from citizen scientist labels from the Galaxy Zoo:…[NASA](https://osdr.nasa.gov/)No updatesaws-pds,astronomy,machine learning,segmentation,NASA SMD AIThere are no restrictions on the use of this data.
apexAPEX-CONNECTSThe BRAIN Initiative Connectivity Across Scales (CONNECTS) program is working to create detailed maps of brain wiring across different species and scales, using advanced imaging technologies. APEX s…[Brainlife Team](https://brainlife.io/team/)New datasets are added monthlyneuroscience,neuroimaging,microscopy,life sciences,zarr,metadata,machine learning,infrastructure,json,imaging[CC BY](https://creativecommons.org/licenses/by/4.0)
argo-gdac-marinedataArgo marine floats data and metadata from Global Data Assembly Centre (Argo GDAC)Argo is an international program to observe the interior of the ocean with a fleet of profiling floats drifting in the deep ocean currents (https://argo.ucsd.edu). Argo GDAC is a dataset of 5 billion…[Euro-Argo](https://www.euro-argo.eu/)Data is updated daily.aws-pds,climate,oceans,chemical biology,chemistry,datacenter,digital assets,geochemistry,geophysics,geoscienceOpen data, there are no restrictions on the use of this data. https://creativecommons.org/licenses/by/4.0/
argoverseArgoverseHome of the Argoverse datasets. Public datasets supported by detailed maps to test, experiment, and teach self-driving vehicles how to understand the world around them. This bucket includes the foll…[Argoverse](https://argoverse.org)Infrequentlyaws-pds,autonomous vehicles,computer vision,lidar,robotics,geospatial[CC-BY-NC-SA-4.0](https://spdx.org/licenses/CC-BY-NC-SA-4.0.html)
arpa-e-performARPA-E PERFORM Forecast dataThe ARPA-E PERFORM Program is an ARPA-E funded program that aims to use time-coincident power and load and seeks to develop innovative management systems that represent the relative delivery risk of each a…[National Renewable Energy Laboratory](https://www.nrel.gov/)As neededaws-pds,energy,environmental,geospatial,model,solarCreative Commons Attribution 3.0 United States License
asem-projectAutomated Segmentation of Intracellular Substructures in Electron Microscopy (ASEM) on AWSThe Automated Segmentation of intracellular substructures in Electron Microscopy (ASEM) project provides deep learning models trained to segment structures in 3D images of cells acquired by Focused Io…Kirchhausen Lab at Harvard Medical SchoolData is added as it becomes availableaws-pds,biology,cell biology,segmentation,microscopy,electron microscopy,computer vision,imaging,life sciencesAll available datasets and models are licensed under a Creative Commons Attribution-ShareAlike 4.0 International License
asf-event-dataASF SAR Data Products for Disaster EventsSynthetic Aperture Radar (SAR) data is a powerful tool for monitoring and assessing disaster events and can provide valuable insights for researchers, scientists, and emergency response teams. The Al…[The Alaska Satellite Facility (ASF)](https://asf.alaska.edu/)Irregular, in response to disaster events aws-pds,disaster response,satellite imagery,geospatial,cog,stacThis data falls under the terms and conditions of the [Creative Commons Zero (CC0) 1.0 Universal License](https://creativecommons.org/publicdomain/zero/1.0/) unless otherwise noted.
askapASKAP Radio Telescope ASKAP is the CSIRO’s newest radio telescope. It is situated at the Inyarrimanha Ilgari Bundara, the CSIRO Murchison Radio-astronomy Observatory on Wajarri Yamaji Country in the Murchison region of We…[Australia Telescope National Facility, CSIRO](http://www.atnf.csiro.au/)Roughly quarterlyaws-pds,astronomy,archivesCC-BY-4.0. Attribution required for refereed scientific papers.
asl_1000ASL 1000This dataset provides a high-fidelity collection of American Sign Language (ASL) videos annotated with 2D landmarks for hands, pose, and face. The data is designed to train advanced research and devel…[NVIDIA Corporation](https://www.nvidia.com/en-us/)New data added as soon as it is available.aws-pds,video,machine learningPlease see the [NVIDIA Dataset License](https://github.com/NVIDIA/Trustworthy-AI/blob/main/ASL%20Developer%20Community/NVIDIA%20Data%20License%20for%20ASL%20Project.pdf)
asr-error-robustnessAutomatic Speech Recognition (ASR) Error RobustnessSentence classification datasets with ASR Errors.[Amazon](https://www.amazon.com/)N/Aamazon.science,natural language processing,deep learning,machine learning,speech recognitionSee https://github.com/anjiefang/asr-error-robustness
asset-data-igp-coal-plantIGP Coal PlantThis dataset includes detailed information about coal power plants, their locations, capacities, emissions, and other relevant attributes around the Indian Gangetic Plain.APADas neededair quality,energy,meteorological,environmentalhttps://creativecommons.org/licenses/by/4.0/
aster-l1tASTER L1T Cloud-Optimized GeoTIFFsThe Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Level 1 Precision Terrain Corrected Registered At-Sensor Radiance (AST_L1T) data contains calibrated at-sensor radiance, whic…[EarthDaily](https://earthdaily.com/)Dailyaws-pds,earth observation,satellite imagery,geospatial,natural resource,sustainability,mining,cogThere are no restrictions on the use of data, unless expressly identified prior to or at the time of receipt.
aurora_msdsAurora Multi-Sensor DatasetThe Aurora Multi-Sensor Dataset is an open, large-scale multi-sensor dataset with highly accurate localization ground truth, captured between January 2017 and February 2018 in the metropolitan area of…Aurora Operations, Inc.This dataset is complete.aws-pds,autonomous vehicles,computer vision,lidar,mapping,robotics,transportation,urban,weather,trafficThis data is intended for non-commercial academic use only. It is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
australasian-genomicsAustralasian GenomesAustralasian Genomes is the genomic data repository for the Threatened Species Initiative (TSI) and the ARC Centre for Innovations in Peptide and Protein Science (CIPPS). This repository contains refe…[Australasian Wildlife Genomics Group at The University of Sydney](https://www.sydney.edu.au/science/our-research/research-areas/life-and-environmental-sciences/wildlife-genomics-group.html)As new genomes are producedaws-pds,biology,biodiversity,conservation,genetic,genomic,life sciences,transcriptomics,wildlifehttps://threatenedspeciesinitiative.com/threatened-species-initiative-data-policy-v1-0-june-2020/
aws-covid19-lakeCOVID-19 Data LakeA centralized repository of up-to-date and curated datasets on or related to the spread and characteristics of the novel corona virus (SARS-CoV-2) and its associated illness, COVID-19. Globally, there…[Amazon Web Services](https://aws.amazon.com/)Periodicallyamazon.science,bioinformatics,biology,coronavirus,COVID-19,health,life sciences,MERS,medicine,SARSVaries by dataset
aws-igenomesAWS iGenomesCommon reference genomes hosted on AWS S3. Can be used when aligning and analysing raw DNA sequencing data.[SciLifeLab](https://opensource.scilifelab.se/)New data are added when available.amazon.science,agriculture,biology,genetic,genomic,life sciences,reference index,Caenorhabditis elegans,Danio rerio,Homo sapiensMultiple - please see [data origins](https://github.com/ewels/AWS-iGenomes#data-origin).
aws-public-blockchainAWS Public Blockchain DataThe AWS Public Blockchain Data initiative provides free access to blockchain datasets through collaboration with data providers. The data is optimized for analytics by being transformed into compre…[Amazon Web Services](https://aws.amazon.com/)New data is delivered daily to the current date folders Parquet files.aws-pds,blockchain,web3https://github.com/aws-samples/digital-assets-examples/blob/main/LICENSE
bdsp-harvard-eegHarvard Electroencephalography DatabaseThe Harvard EEG Database will encompass data gathered from four hospitals affiliated with Harvard University: Massachusetts General Hospital (MGH), Brigham and Women's Hospital (BWH), Beth Israel Deaco…[Brain Data Science Platform](https://bdsp.io/)New data is added as soon as it is available.aws-pds,neurophysiology,medicine,machine learning,neuroscience,deep learning,life sciences,bioinformaticsBDSP Restricted Health Data License 1.0.0 "[BDSP Licence](https://bdsp.io/content/harvard-eeg-db/view-license/1.0/); [Data User Agreement:](https://bdsp.io/content/harvard-eeg-db/view-dua/1.0/)"
bdsp-heedbHarvard-Emory ECG DatabaseThe Harvard-Emory ECG database (HEEDB) is a large collection of 12-lead electrocardiography (ECG) recordings, prepared through a collaboration between Harvard University and Emory University investiga…[Brain Data Science Platform](https://bdsp.io/)New data is added as soon as it is available.aws-pds,neurophysiology,medicine,machine learning,neuroscience,deep learning,life sciences,bioinformaticsBDSP Restricted Health Data License 1.0.0 "[BDSP Licence](https://bdsp.io/content/heedb/view-license/1.0/); [Data User Agreement:](https://bdsp.io/content/heedb/view-dua/1.0/)"
bdsp-hspThe Human Sleep ProjectThe Human Sleep Project (HSP) sleep physiology dataset is a growing collection of clinical polysomnography (PSG) recordings. Beginning with PSG recordings from ~15K patients evaluated at the Mass…[Brain Data Science Platform](https://bdsp.io/)New data is added as soon as it is available.aws-pds,neurophysiology,medicine,machine learning,neuroscience,deep learning,life sciences,bioinformatics[BDSP Restricted Health Data License 1.0.0](https://bdsp.io/content/hsp/view-license/1.0/); [terms of use](https://github.com/bdsp-core/bdsp-license-and-dua/blob/main/terms-of-use.md)
bdsp-icareI-CARE: International Cardiac Arrest REsearch consortium Electroencephalography DatabaseThe International Cardiac Arrest REsearch consortium (I-CARE) Database includes baseline clinical information and continuous electroencephalography (EEG) recordings from 1,020 comatose patients with a…[Brain Data Science Platform](https://bdsp.io/)New data is added as soon as it is available.aws-pds,neurophysiology,medicine,machine learning,neuroscience,deep learning,life sciences,bioinformaticsBDSP Restricted Health Data License 1.0.0 "[BDSP Licence](https://bdsp.io/content/bdsp-icare/view-license/1.0/); [Data User Agreement:](https://bdsp.io/content/bdsp-icare/view-dua/1.0/)"
bdsp-sparcnetSPaRCNet data: Seizures, Rhythmic and Periodic Patterns in ICU ElectroencephalographyThe IIIC dataset includes 50,697 labeled EEG samples from 2,711 patients and 6,095 EEGs that were annotated by physician experts from 18 institutions. These samples were used to train SPaRCNet (Seiz…[Brain Data Science Platform](https://bdsp.io/)New data is added as soon as it is available.aws-pds,neurophysiology,medicine,machine learning,neuroscience,deep learning,life sciences,bioinformaticsBDSP Restricted Health Data License 1.0.0 "[BDSP Licence](https://bdsp.io/content/bdsp-sparcnet/view-license/1.1/); [Data User Agreement:](https://bdsp.io/content/bdsp-sparcnet/view-dua/1.1/)"
beatamlBeat Acute Myeloid Leukemia (AML) 1.0Beat AML 1.0 is a collaborative research program involving 11 academic medical centers who worked collectively to better understand drugs and drug combinations that should be prioritized for further…[Center for Translational Data Science at The University of Chicago](https://ctds.uchicago.edu/)Genomic Data Commons (GDC) is source of truth for this dataset; GDC offers monthly data releases, although this dataset may not be updated at every release. aws-pds,life sciences,cancer,genetic,genomic,Homo sapiens,STRIDESNIH Genomic Data Sharing Policy: https://gdc.cancer.gov/access-data/data-access-policies
bhl-open-dataBiodiversity Heritage Library Metadata and Page ImagesThe Biodiversity Heritage Library (BHL) is the world’s largest open access digital library for biodiversity literature and archives. BHL operates as a worldwide consortium of natural history, botanica…[The Biodiversity Heritage Library](https://biodiversitylibrary.org/)Metadata is updated monthly. Images are updated weekly.biodiversity,bioinformatics,life sciencesPublic Domain, CC0, or Creative Commons. Exact licenses are found in the related metadata files and [documentation](https://github.com/gbhl/bhl-open-data).
binding-dbBinding DB - Data Lakehouse ReadyThis is a parquet representation of The Binding Database's [Full BindingDB Database Dump](https://www.bindingdb.org/bind/chemsearch/marvin/SDFdownload.jsp?all_download=yes) that you can query straight fr…[Amazon Web Services](https://aws.amazon.com/)Within 2 months after a new BindingDB release.aws-pds,chemistry,genetic,genomic,molecule,life sciences,biotech blueprint,parquethttps://github.com/aws-samples/data-lake-as-code/blob/roda/docs/roda_attributions.txttrue
biolipBioLiPBioLiP is a semi-manually curated database for high-quality, biologically relevant ligand-protein binding interactions. The structure data are collected primarily from the Protein Data Bank (PDB), wit…[Zhang Lab](https://zhanggroup.org/)No regular schedule; updated upon availability of major dataset revisionsprotein,structural biology,molecular docking,bioinformatics,molecule,life sciences,chemistryNo explicit license stated (publicly available for academic and research use).
black_marble_combustionNighttime-Fire-FlareDetection of nighttime combustion (fire and gas flaring) from daily top of atmosphere data from NASA's Black Marble VNP46A1 product using VIIRS Day/Night Band and VIIRS thermal bands.[USRA](https://srijachakraborty.com/) and [NASA Black Marble](https://blackmarble.gsfc.nasa.gov/)New combustion detections are added whenever they are available and with model updates.aws-pds,anomaly detection,classification,earth observation,satellite imagery,disaster response,socioeconomic,environmental,urban,NASA SMD AIThere are no restrictions on the use of this data.
blended-tropomi-gosat-methaneBlended TROPOMI+GOSAT Satellite Data Product for Atmospheric MethaneA dataset of satellite retrievals of atmospheric methane that extends from 30 April 2018 to present.Nicholas BalasusMonthlyaws-pds,climate,environmental,satellite imageryThere are no restrictions on the use of this data, but please contact nicholasbalasus@g.harvard.edu before its use in a publication.
blue_etIWMI DIWASA Blue ET for AfricaBlue evapotranspiration (Blue ET) is the portion of ET derived from blue water sources, including surface water (rivers, lakes, reservoirs) and groundwater used for irrigation. It is a key component o…[IWMI](https://www.iwmi.org/)Noneaws-pds,surface water,irrigated cropland,ground water,evapotranspiration,waterCreative Commons open license
bluebrain_opendataBlue Brain Open DataThe Blue Brain Open Data represents an extensive neuroscience dataset encompassing a diverse range of data types, including experimental, model, and simulation data, along with images and videos depic…Open Brain InstituteNo updatesneuroscience,simulation neuroscience,brain models,morphological reconstructions,electrophysiology,life sciences,single neuron models,ion channels,brain images,microcircuit modeling and simulationCC-BY-4.0
bobsrepositoryBaby Open Brains (BOBs) Repository on AWSManually curated and reviewed infant brain segmentations and accompanying T1w and T2w images for a range of 1-9 month old participants from the Baby Connectome Project (BCP)Masonic Institute for the Developing Brain (MIDB) Open Data InitiativeThe repository is updated when: (1) all brain segmentations have undergone further rounds of manual correction, which may include refinement of existing ROIs and/or delineation of additional FreeSurfe…neuroimaging,magnetic resonance imaging,neuroscience,pediatric,nifti,segmentation,life sciencesCC-By Attribution 4.0 International
bodymBodyM DatasetThe first large public body measurement dataset including 8978 frontal and lateral silhouettes for 2505 real subjects, paired with height, weight and 14 body measurements. The following artifacts are…AmazonNoneamazon.science,computer vision,deep learningCreative Commons Attribution-Non Commercial 4.0 International Public License - https://creativecommons.org/licenses/by/4.0/legalcode
boltz1Boltz-1 Training DataThis is the data used to train the Boltz-1 model. It contains the following datasets: - Our pre-processed version of the Protein Data Bank - Our pre-processed version of the multiple sequence alig…MIT CSAIL - Regina Barzilay GroupNonedeep learning,protein folding,molecular docking,open source software,life sciencesMIT License
boreasBoreas Autonomous Driving DatasetThis autonomous driving dataset includes data from a 128-beam Velodyne Alpha-Prime lidar, a 5MP Blackfly camera, a 360-degree Navtech radar, and post-processed Applanix POS LV GNSS data. This dataset…[ASRL](http://asrl.utias.utoronto.ca)New driving sequences will be added as they are collected.autonomous vehicles,robotics,computer vision,lidar,aws-pds[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/legalcode)
bossdbBossDB Open Neuroimagery DatasetsThis data ecosystem, Brain Observatory Storage Service & Database (BossDB), contains several neuro-imaging datasets across multiple modalities and scales, ranging from nanoscale (electron microscopy),…[Johns Hopkins University Applied Physics Laboratory](https://jhuapl.edu)New datasets are added as soon as they are available. Minor updates on existing datasets occur sporadically.aws-pds,life sciences,imaging,neuroscience,neuroimaging,electron microscopy,x-ray tomography,x-ray microtomography,x-ray,magnetic resonance imagingCreative Commons 4.0 International (CC BY 4.0); Creative Commons CC0 1.0 Universal (CC0-1.0); Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)
bps_microscopyBiological and Physical Sciences (BPS) Microscopy Benchmark Training DatasetFluorescence microscopy images of individual nuclei from mouse fibroblast cells, irradiated with Fe particles or X-rays with fluorescent foci indicating 53BP1 positivity, a marker of DNA damage. These…[NASA](https://osdr.nasa.gov/)New fluorescence microscopy mouse fibroblast nuclei data is added whenever it is available.aws-pds,fluorescence imaging,genetic,genetic maps,microscopy,GeneLab,NASA SMD AI,life sciencesThere are no restrictions on the use of this data.
bps_rnaseqBiological and Physical Sciences (BPS) RNA Sequencing Benchmark Training DatasetRNA sequencing data from spaceflown and control mouse liver samples, sourced from NASA GeneLab and augmented with generative adversarial network.[NASA](https://osdr.nasa.gov/)New spaceflight liver RNA sequencing data is added whenever it is available.aws-pds,space biology,gene expression,genetic,genetic maps,GeneLab,NASA SMD AI,life sciencesThere are no restrictions on the use of this data.
braidyn-bc_cued-lever-pullBraiDyn-BC: Cued lever-pull task datasetThe BraiDyn-BC (Brain Dynamics underlying emergence of Behavioral Change) Database offers an extensive, multimodal dataset that links wide-field calcium imaging of the mouse neocortex to comprehensive…[BraiDyn-BC Database Project](https://boatneck-weeder-7b7.notion.site/BraiDyn-BC-Database-303cf08c89f94d81bb2eaed4c3c50345)NAMus musculus,neuroscience,calcium imaging,video,imaging,life sciences,aws-pdsCreative Commons Attribution 4.0 International (CC-BY 4.0)
brain-encoding-response-generatorBrain Encoding Response Generator (BERG)Brain Encoding Response Generator (BERG) is a resource consisting of multiple pre-trained encoding models of the brain and an accompanying Python package to generate accurate in silico neural response…[Neural Dynamics of Visual Cognition](https://www.ewi-psy.fu-berlin.de/en/psychologie/arbeitsbereiche/neural_dyn_of_vis_cog/index.html), [CVAI](https://www.cvai.cs.uni-frankfurt.de/)Updates will be released on a per-model or dataset basis as new models are trained and tested.neuroscience,machine learning,brain models,deep learning,neuroimaging,computer vision,life sciencesCC BY-NC 4.0. For Terms & Conditions, see https://brain-encoding-response-generator.readthedocs.io/en/latest/about/terms_and_conditions.html
brainglobeBrainGlobe AtlasesBrainGlobe provides an archive and standardised interface to anatomical atlases from multiple species. This dataset includes these atlases, and other data (e.g. sample neuroanatomy data) to allow the…[BrainGlobe](https://brainglobe.info/)When new atlases are packagedbiology,life sciences,digital preservation,Homo sapiens,image processing,imaging,light-sheet microscopy,magnetic resonance imaging,medical imaging,microscopyCreative Commons CC0 1.0 Universal
brainminds-marmoset-connectivityBrain/MINDS Marmoset Connectivity Resource on AWSBrain/MINDS Marmoset Connectivity Resource (BMCR) is a resource that provides access to anterograde and retrograde neuronal tracer data, made available by Brain/MINDS project. It is currently restrict…[RIKEN Center for Brain Science](https://cbs.riken.jp)As new data become available.brain images,imaging,microscopy,neurobiology,neuroimaging,neuroscience,nifti,non-human primate,life sciencesCreative Commons Attribution 4.0 International
brazil-data-cubesEarth Observation Data Cubes for BrazilEarth observation (EO) data cubes produced from analysis-ready data (ARD) of CBERS-4, Sentinel-2 A/B and Landsat-8 satellite images for Brazil. The datacubes are regular in time and use a hierarchical…[INPE - Brazil Data Cube](http://brazildatacube.org/)New EO data cubes are added as soon as they are produced by the Brazil Data Cube project.earth observation,satellite imagery,geoscience,geospatial,image processing,open source software,cog,stac,aws-pdsThe EO data cubes are produced from free and open images of CBERS-4, Landsat-8 and Sentinel-2 satellites. Data usage is subject to Terms and Conditions for the use and distribution of [Landsat data](h…
broad-gnomadGenome Aggregation Database (gnomAD)The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators that aggregates and harmonizes both exome and genome data from a wide range of large-sca…gnomAD Production Team at the Broad InstituteData from new releases are made public as soon as they are available. New releases, including both minor and major versions, have historically been issued on the order of once per year.aws-pds,population genetics,population,whole genome sequencing,genomic,genetic,life sciences,bioinformatics,short read sequencing[MIT](https://github.com/broadinstitute/gnomad_methods/blob/master/LICENSE); [terms of use](https://gnomad.broadinstitute.org/terms)
broad-pan-ukbUK Biobank Pan-Ancestry Summary StatisticsA multi-ancestry analysis of 7,221 phenotypes using a generalized mixed model association testing framework, spanning 16,119 genome-wide association studies. We provide standard meta-analysis across a…Analytic and Translational Genetics Unit, Massachusetts General Hospital and the Broad InstituteOccasionalaws-pds,genetic,genome wide association study,genomic,life sciences,population geneticsCC BY-4.0 (usage may be restricted by UK Biobank, more details on the "[Downloads page](https://pan.ukbb.broadinstitute.org/downloads)")
broad-referencesBroad Genome ReferencesBroad maintained human genome reference builds hg19/hg38 and decoy references.Broad InstituteMonthlyaws-pds,biology,bioinformatics,cancer,genetic,genomic,life sciences,reference index,Homo sapiensCC0 1.0 Universal (CC0 1.0) Public Domain Dedication
busco-dataBUSCO DatasetsLineage datasets for use with BUSCO software package. Each dataset contains HMM profiles for clade specific, universal, single-copy marker genes. Datasets are available across archaea, bacteria, eukar…Computational Evolutionary Genomics Group, University of GenevaNew datasets are released to correspond with updates in OrthoDB versions. Maintenance updates occur a few times a year if necessary to fix any bugs or update metadata.assembly,bacteria,bioinformatics,genomic,life sciences,metagenomics,open source software,protein,virus,aws-pdsThe BUSCO datasets are licensed under the Creative Commons Attribution-NoDerivatives 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nd/4.0/ or…
c2smsfloodsCloud to Street - Microsoft Flood and Clouds DatasetThis dataset consists of chips of Sentinel-1 and Sentinel-2 satellite data. Each Sentinel-1 chip contains a corresponding label for water and each Sentinel-2 chip contains a corresponding label for wa…[Radiant Earth Foundation](https://radiant.earth/)Not updatedaws-pds,computer vision,deep learning,machine learning,floods,geospatial,earth observation,satellite imagery,cog,synthetic aperture radarCC-BY-4.0 https://creativecommons.org/licenses/by/4.0/
caendrCaenorhabditis Diversity Natural ResourceThe Caenorhabditis Natural Diversity Resource (CaeNDR) is a data repository and analysis hub of wild strains of selfing Caenorhabditis species C. elegans, C. briggsae, and C. tropicalis from around th…The Andersen Lab, Johns Hopkins University, Baltimore, MD, USAAnnuallyaws-pds,bam,bioinformatics,biology,Caenorhabditis elegans,fastq,gatk-sv,genetic maps,genome,genome wide association studyAll CaeNDR data is available under the MIT License. Use of the data should be cited in the usual way, with current details available at https://caendr.org/#citationModal
caladapt-coproduced-climate-dataCo-Produced Climate Data to Support California's Resilience InvestmentsDownscaled future and historical climate projections for California and her environs in support of California's Fifth Climate AssessmentCal-Adapt: Analytics Engine (by Eagle Rock Analytics, Inc.)Infrequent, Irregularatmosphere,aws-pds,climate,climate model,earth observation,geoscience,geospatial,meteorological,simulations,weatherVaries, see dataset specific metadata
caladapt-wildfire-datasetWildfire Projections to Support Climate ResilienceWildfire projections for California and her environs in support of California's Fifth Climate Assessment supported with historical weather observations and renewable energy capacity profiles for gri…Cal-Adapt: Analytics Engine (by Eagle Rock Analytics, Inc.)Infrequent, Irregularaws-pds,climate,climate model,climate projections,weather,solar,energy,electricity,sustainability,agricultureVaries, see dataset specific metadata
camelyonCAncer MEtastases in LYmph nOdes challeNge (CAMELYON) DatasetThis dataset contains all the data for the [CAncer MEtastases in LYmph nOdes challeNge or CAMELYON](https://camelyon17.grand-challenge.org). CAMELYON was the first challenge using whole-slide images…Radboud University Medical CenterAs requiredaws-pds,life sciences,cancer,computational pathology,grand-challenge.org,histopathology,deep learning,computer visionCC0
canelevation-demCanElevation - Canada Digital Elevation ModelsThe Canadian DEM represents the current coverage of elevation data available. This dataset includes a Digital Terrain Model (DTM), a Digital Surface Model (DSM) and other derived products. This datase…[Natural Resources Canada](https://nrcan.gc.ca/)The dataset is updated as new DEM models become available. L'ensemble de données est mis à jour à mesure que des nouveaux modèles numérique d'élévation deviennent disponibles. aws-pds,canada,elevation,geospatial,stac,land,dsm,dtm,dem[Open Government License (OGL)](https://open.canada.ca/en/open-government-licence-canada)
canelevation-pointcloudCanElevation - LiDAR Point CloudsThe [LiDAR Point Clouds](https://open.canada.ca/data/en/dataset/7069387e-9986-4297-9f55-0288e9676947) is a product that is part of the CanElevation Series created to support the [National Elevation Da…[Natural Resources Canada](https://www.nrcan.gc.ca/)The dataset is updated as new LiDAR data becomes available. L'ensemble de données est mis à jour à mesure que de nouvelles données LiDAR deviennent disponibles.aws-pds,lidar,elevation,geospatial,floods,land,urbanThe LiDAR Point Clouds dataset is available free of charge under the [Open Government License - Canada](https://open.canada.ca/en/open-government-licence-canada). Le jeu de données Nuages de points Li…
canoeCANOE (Canadian Aquatic Navigation for Observation of the Environment) DatasetThis autonomous marine navigation dataset includes data from a 360-degree Navtech radar, a 128-beam Ouster OS1 lidar with integrated IMU, a Teledyne Bumblebee stereo camera, Oculus M3000d imaging sona…[Autonomous Space Robotics Laboratory (ASRL)](http://asrl.utias.utoronto.ca)New sequences will be added as they are collected.aws-pds,autonomous vehicles,robotics,computer vision,lidar,radar[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/legalcode)
capella_opendataCapella Space Synthetic Aperture Radar (SAR) Open DatasetOpen Synthetic Aperture Radar (SAR) data from Capella Space. Capella Space is an information services company that provides on-demand, industry-leading, high-resolution synthetic aperture radar (SAR…[Capella Space](https://www.capellaspace.com/)New data is added quarterly.aws-pds,cog,stac,earth observation,satellite imagery,geospatial,image processing,computer vision,synthetic aperture radar[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
carbonpdfCarbonPDFA carbon question-answering (QA) dataset specifically designed to facilitate the extraction and analysis of data from real-world carbon reports of computing products. The dataset features annotated me…Pittcps labData for a new company is added once collected.aws-pds,environmental,product comparison,csv,information retrieval,industryCC BY 4.0 (https://creativecommons.org/licenses/by/4.0/)
cartostoreCartoStoreCross-Platform Repository for High-resolution Spatial Transcriptomics Datasets. [Hyun Min Kang](https://scholar.google.com/citations?user=8e0jy0IAAAAJ&hl=en)Monthlyspatial transcriptomics,spatial omics,genomic,bioinformatics,life sciences[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
catalyst-cooperative-pudlPublic Utility Data Liberation ProjectThe Public Utility Data Liberation Project (PUDL) provides analysis-ready U.S. energy system data in bulk for programmatic use. Sources include the U.S. Energy Information Administration (EIA), the En…[Catalyst Cooperative](https://catalyst.coop/)New outputs published nightly as part of our continuous integration process. New data is integrated on a rolling basis as it is published by the original sources. Agencies typically release data month…aws-pds,economics,electricity,energy,energy modeling,environmental,geospatial,government records,industry,industrial[CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) (Creative Commons Attribution 4.0 International License).
cbersCBERS on AWSImagery acquired by the China-Brazil Earth Resources Satellite (CBERS), 4 and 4A. The image files are recorded and processed by Instituto Nacional de Pesquisas Espaciais (INPE) and are converted to Cl…[Frederico Liporace](https://github.com/fredliporace)Dailyaws-pds,agriculture,earth observation,geospatial,imaging,satellite imagery,disaster response,stac,coghttps://creativecommons.org/licenses/by-sa/3.0/
ccicChalmers Cloud Ice ClimatologyThe Chalmers Cloud Ice Climatology (CCIC) is a novel, deep-learning-based climate record of ice-particle concentrations in the atmosphere. CCIC results are available at high spatial and temporal resol…[Geoscience and Remote Sensing at Chalmers University of Technology](https://www.chalmers.se/en/departments/see/research/geo)Quarterlyatmosphere,aws-pds,climate,deep learning,environmental,exploration,geophysics,geoscience,geospatial,global[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
ccleCancer Cell Line Encyclopedia (CCLE)The Cancer Cell Line Encyclopedia (CCLE) project is an effort to conduct a detailed genetic characterization of a large panel of human cancer cell lines. The CCLE provides public access to genomic dat…[Center for Translational Data Science at The University of Chicago](https://ctds.uchicago.edu/)Genomic Data Commons (GDC) is source of truth for this dataset; GDC offers monthly data releases, although this dataset may not be updated at every release. aws-pds,cancer,genetic,genomic,life sciences,transcriptomics,whole genome sequencing,Homo sapiens,STRIDESNIH Genomic Data Sharing Policy: https://gdc.cancer.gov/access-data/data-access-policies
ccrsmodisalbedoCCRS MODIS albedo over Canada | Albédo MODIS du CCT couvrant le CanadaTime series of 10-day spectral and broadband albedo products derived at 250-m spatial resolution over Canadian territory and neighboring areas produced at the Canada Centre for Remote Sensing (CCRS)…Canada Centre for Remote Sensing (CCRS), Canada Centre for Mapping and Earth Observation (CCMEO), Department of Natural Resources Canada (NRCan) https://natural-resources.canada.ca/science-data/sciencSemi-annually, until the end of MODIS operations Deux fois par an, jusqu'à la fin des opérations MODIS aws-pds,analysis ready data,broadband,cog,earth observation,satellite imageryCreative Commons Licence. Creative Commons BY 4.0 https://creativecommons.org/licenses/by/4.0/
cell-painting-image-collectionCell Painting Image CollectionThe Cell Painting Image Collection is a collection of freely downloadable microscopy image sets. Cell Painting is an unbiased high throughput imaging assay used to analyze perturbations in cell models…The Broad Instituteirregularlyaws-pds,microscopy,biology,life sciences,imaging,high-throughput imaging,cell imaging,cell painting,fluorescence imagingCC0 1.0 Universal (CC0 1.0) Public Domain Dedication
cellpainting-galleryCell Painting GalleryThe Cell Painting Gallery is a collection of image datasets created using the [Cell Painting](https://pubmed.ncbi.nlm.nih.gov/27560178/) assay. The images of cells are captured by microscopy imaging,…Carpenter-Singh and Cimini Labs at the Broad InstituteTypically when an associated publication is posted on biorxivaws-pds,bioinformatics,biology,cancer,cell biology,cell imaging,cell painting,chemical biology,computer vision,csvCC0 1.0 Universal (CC0 1.0) Public Domain Dedication, but please do cite the corresponding publication for each dataset, as listed [here](https://github.com/broadinstitute/cellpainting-gallery#citatio…
census-2010-amc-mdf-replicatesEstimating Confidence Intervals for 2020 Census Statistics Using Approximate Monte Carlo Simulation (2010 Census Proof of Concept)The 2010 Census Production Settings Demographic and Housing Characteristics (DHC) Approximate Monte Carlo (AMC) method seed Privacy Protected Microdata File (PPMF0) and PPMF replicates (PPMF1, PPMF2,…[United States Census Bureau](http://www.census.gov/)Not Updatedcensus,differential privacy,disclosure avoidance,ethnicity,group quarters,hispanic,latino,housing,housing units,noisy measurementsCC0 1.0 Universal
census-2010-dhc-nmf2010 Census Production Settings Demographic and Housing Characteristics (DHC) Demonstration Noisy Measurement FileThe 2010 Census Production Settings Demographic and Housing Characteristics (DHC) Demonstration Noisy Measurement File (2023-06-30) is an intermediate output of the 2020 Census Disclosure Avoidance Sy…[United States Census Bureau](http://www.census.gov/)Not Updatedaws-pds,census,differential privacy,disclosure avoidance,ethnicity,group quarters,hispanic,latino,housing,housing unitsCC0 1.0 Universal
census-2010-pl94-nmf2010 Census Production Settings Redistricting Data (P.L. 94-171) Demonstration Noisy Measurement FileThe 2010 Census Production Settings Redistricting Data (P.L. 94-171) Demonstration Noisy Measurement File (2023-04-03) is an intermediate output of the 2020 Census Disclosure Avoidance System (DAS) To…[United States Census Bureau](http://www.census.gov/)Last updated November 10, 2023: Modifications to identifiers within the parquet metadata used to support internal tracking of source data.aws-pds,census,differential privacy,disclosure avoidance,ethnicity,group quarters,hispanic,latino,housing,housing unitsCC0 1.0 Universal
census-2020-amc-mdf-replicatesEstimating Confidence Intervals for 2020 Census Statistics Using Approximate Monte Carlo Simulation (2020 Census Production Run)The 2020 Census Production Settings Demographic and Housing Characteristics (DHC) Approximate Monte Carlo (AMC) method seed Privacy Protected Microdata File (PPMF0) and PPMF replicates (PPMF1, PPMF2,…[United States Census Bureau](http://www.census.gov/)Not Updatedcensus,decennial census,2020 census,differential privacy,disclosure avoidance,ethnicity,group quarters,hispanic,latino,housingCC0 1.0 Universal
census-2020-dhc-nmf2020 Census Demographic and Housing Characteristics (DHC) Noisy Measurement FileThe 2020 Census Demographic and Housing Characteristics Noisy Measurement File is an intermediate output of the 2020 Census Disclosure Avoidance System (DAS) TopDown Algorithm (TDA) (as described in A…[United States Census Bureau](http://www.census.gov/)Not Updatedaws-pds,census,differential privacy,disclosure avoidance,ethnicity,group quarters,housing,housing units,noisy measurements,populationCC0 1.0 Universal
census-2020-pl94-gls2020 Redistricting Data File Least Squares EstimatesThe 2020 Redistricting Data File Least Squares Estimates data product provides count estimates, and their standard deviations, for each tabulation that was published as part of the persons universe of…[United States Census Bureau](http://www.census.gov/)Not Updatedaws-pds,census,differential privacy,disclosure avoidance,ethnicity,group quarters,housing,housing units,noisy measurements,populationCC0 1.0 Universal
census-2020-pl94-nmf2020 Census Redistricting Data (P.L. 94-171) Noisy Measurement FileThe 2020 Census Redistricting Data (P.L. 94-171) Noisy Measurement File (NMF) is an intermediate output of the 2020 Census Disclosure Avoidance System (DAS) TopDown Algorithm (TDA) (as described in Ab…[United States Census Bureau](http://www.census.gov/)Not Updatedaws-pds,census,differential privacy,disclosure avoidance,ethnicity,group quarters,housing,housing units,noisy measurements,populationCC0 1.0 Universal
census-dataworld-pumsU.S. Census ACS PUMSU.S. Census Bureau American Community Survey (ACS) Public Use Microdata Sample (PUMS) available in a linked data format using the Resource Description Framework (RDF) data model.Data.worldYearly, after ACS 1-year PUMS raw data are releasedaws-pds,statistics,census,survey[Creative Commons Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)
cesm-hrCESM-HRThis dataset provides several global fields describing the state of atmosphere, ocean, land and ice from a high-resolution (0.1° for the ocean/ice models, 0.25° for the land/atmosphere models) numerica…[TAMU](https://www.tamu.com/)Rare. The CESM-HR PI-CTRL experiment is complete. Updates are expected only if any issues with the copy of the data on AWS are reported in the future. Other experiments will be shared in the future…aws-pds,climate,climate model,climate projections,CMIP6,ocean circulation,ocean currents,ocean velocity,ocean sea surface height,ocean simulationThis dataset is created in collaboration with NCAR and the NCAR’s “Creative Commons Attribution 4.0 International license” used for their CESM2 LENS data product on AWS (https://www.ucar.edu/terms-of-…
cgciCancer Genome Characterization Initiatives - Burkitt Lymphoma, HIV+ Cervical CancerThe Cancer Genome Characterization Initiatives (CGCI) program supports cutting-edge genomics research of adult and pediatric cancers. CGCI investigators develop and apply advanced sequencing methods t…[Center for Translational Data Science at The University of Chicago](https://ctds.uchicago.edu/)Genomic Data Commons (GDC) is source of truth for this dataset; GDC offers monthly data releases, although this dataset may not be updated at every release. aws-pds,cancer,genomic,life sciences,transcriptomics,STRIDESNIH Genomic Data Sharing Policy: https://gdc.cancer.gov/access-data/data-access-policies
cgiardataCCAFS-Climate DataHigh resolution climate data to help assess the impacts of climate change primarily on agriculture. These open access datasets of climate projections will help researchers make climate change impact a…[International Center for Tropical Agriculture](https://ciat.cgiar.org/)Every three monthsaws-pds,agriculture,food security,climate,sustainabilityCreative Commons Attribution-NonCommercial 4.0 International License http://creativecommons.org/licenses/by-nc/4.0/
challenge-2021Will Two Do? Varying Dimensions in Electrocardiography: The PhysioNet/Computing in Cardiology Challenge 2021The electrocardiogram (ECG) is a non-invasive representation of the electrical activity of the heart. Although the twelve-lead ECG is the standard diagnostic screening system for many cardiological is…[PhysioNet](https://physionet.org/)Not updatedaws-pdsCreative Commons Attribution 4.0 International Public License
chammiCHAMMI-75Quantifying cell morphology using images and machine learning models has proven to be a powerful tool to study the response of cells to treatments. However, the models used to quantify cellular morph…Morgridge Institute for ResearchEvery 2 yearsmicroscopy,machine learning,biology,life sciences,imaging,high-throughput imaging,cell imaging,fluorescence imaging,aws-pdsCC BY 4.0 License
chemblChEMBL - Data Lakehouse ReadyChEMBL is a manually curated database of bioactive molecules with drug-like properties. It brings together chemical, bioactivity and genomic data to aid the translation of genomic information into eff…[Amazon Web Services](https://aws.amazon.com/)Upon request. We try to keep it updated to every odd version.chemistry,genomic,molecule,life sciences,biotech blueprint,parquethttps://github.com/aws-samples/data-lake-as-code/blob/roda/docs/roda_attributions.txttrue
chimeraCHIMERAThis dataset contains the training data for the [CHIMERA - Combining HIstology, Medical imaging (radiology) and molEcular data for medical pRognosis and diAgnosis](https://chimera.grand-challenge.org/…Radboud University Medical CenterAs requiredaws-pds,life sciences,cancer,computational pathology,deep learning,grand-challenge.org,histopathology,computer vision,digital pathology,medical image computingCC BY-NC-SA 4.0
citrus-farmCitrusFarm DatasetCitrusFarm is a multimodal agricultural robotics dataset that provides both multispectral images and navigational sensor data for localization, mapping and crop monitoring tasks. 1) It was collected…[Autonomous Robots and Control Systems Lab](https://sites.google.com/view/arcs-lab)NAaws-pds,robotics,computer vision,agriculture,localization,mapping,lidar,IMUCreative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
civicCIViC (Clinical Interpretation of Variants in Cancer)Precision medicine refers to the use of prevention and treatment strategies that are tailored to the unique features of each individual and their disease. In the context of cancer this might involve t…The Griffith Lab at Washington University School of MedicineFirst of each monthaws-pds,genetic,genomic,life sciences,vcf,cancer[CC0](https://creativecommons.org/publicdomain/zero/1.0/)
clay-model-v0-embeddingsClay Model v0 EmbeddingsMachine learning model embeddings dataset providing pre-computed feature representations for satellite and aerial imagery analysis.[Source Cooperative](https://source.coop/)As new model versions become availablemachine learning,computer vision,satellite imagery,aerial imagery,earth observation,imagingCreative Commons Attribution 4.0 International License
clay-v1-5-naip-2Clay v1.5 NAIP-2National Agriculture Imagery Program (NAIP) dataset providing high-resolution aerial imagery for agricultural monitoring, land use analysis, and natural resource management.[Source Cooperative](https://source.coop/)As new NAIP data becomes availableaerial imagery,agriculture,land use,natural resource,environmentalCreative Commons Attribution 4.0 International License
clay-v1-5-sentinel2Clay v1.5 Sentinel-2Sentinel-2 satellite imagery dataset providing high-resolution optical data for land monitoring, agriculture, and environmental applications.[Source Cooperative](https://source.coop/)As new Sentinel-2 data becomes availablesatellite imagery,earth observation,agriculture,land use,environmentalCreative Commons Attribution 4.0 International License
clinical-ultrasound-image-dataClinical Ultrasound Image RepositoryGeneric Clinical Ultrasound Data from Random Subjects acquired for Clinical Reasons, to be used for Developing Artificial Intelligence Applications. This dataset is complete with 2000 studies from 200…[MONAI Development Team](https://github.com/Project-MONAI/MONAI)This is a static dataset; however, tutorials and resources will be updated as they are developed.medicine,medical imaging,machine learning,life sciences,aws-pds[CC-BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)
clinvarClinVar - Data Lakehouse ReadyClinVar is a freely accessible, public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. ClinVar thus facilitates access to and communication abo…[Amazon Web Services](https://aws.amazon.com/)Every Sunday at 1AM UTCchemistry,genetic,genomic,life sciences,biotech blueprint,parquethttps://github.com/aws-samples/data-lake-as-code/blob/roda/docs/roda_attributions.txttrue
cmas-data-warehouseCMAS Data WarehouseCMAS Data Warehouse on AWS collects and disseminates meteorology, emissions and air quality model input and output for Community Multiscale Air Quality (CMAQ) Model Applications. This dataset is avail…[CMAS CENTER](https://cmascenter.org/)New data is added as soon as it is available.aws-pds,air quality,meteorological,geospatial,environmental,climateThere are no restrictions on the use of this data. US EPA License (https://pasteur.epa.gov/license/sciencehub-license.html)
cmip6-era5-hybrid-southeast-asiaHybrid statistical-dynamic downscaling based on multi-model ensembles in Southeast AsiaGCMs under CMIP6 have been widely used to investigate climate change impacts and put forward associated adaptation and mitigation strategies. However, the relatively coarse spatial resolutions (usuall…[PREP-NexT Lab](https://github.com/PREP-NexT)Update when needed.climate,netcdf,precipitation,aws-pdsAll the code in this repository is [MIT](https://choosealicense.com/licenses/mit/) licensed, but we request that you please provide attribution if reusing any of our digital content (graphics, logo, c…
cmip6Coupled Model Intercomparison Project 6The sixth phase of global coupled ocean-atmosphere general circulation model ensemble. ESGF and PangeoCore CMIP6 datasets are added as soon as they are available.aws-pds,agriculture,atmosphere,climate,earth observation,environmental,model,oceans,simulations,weatherSee [docs](https://pangeo-data.github.io/pangeo-cmip6-cloud/licensing_citation.html) for further info
cmsdesynpuf-omopCMS 2008-2010 Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF) in OMOP Common Data ModelDE-SynPUF is provided here as 1,000 person (1k), 100,000 person (100k), and 2,300,000 person (2.3m) data sets in the [OMOP Common Data Model](https://www.ohdsi.org/data-standardization/) format. Th…[Amazon Web Services](https://aws.amazon.com/)Not updatedamazon.science,bioinformatics,health,life sciences,natural language processing,ushttps://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/Downloads/SynPUF_DUG.pdf
coawstUSGS COAWST (Coupled Ocean Atmosphere Wave and Sediment Transport) Forecast Model Archive, US East and Gulf CoastsThe COAWST modeling system has been used to simulate ocean, wave and sediment transport processes along the US East Coast and Gulf of Mexico. The grid has a horizontal resolution of approximately 5…Fathom ScienceNoneaws-pds,oceansCC0
cobraCOBRAThis page describes the COBRA (Classification Of Basal cell carcinoma, Risky skin cancers and Abnormalities) skin pathology dataset, which comprises over 7000 histopathology whole-slide-images related…Radboud University Medical CenterAs requiredaws-pds,life sciences,cancer,computational pathology,deep learning,histopathology,computer visionCC BY-SA-NC 4.0
code-mixed-nerMultilingual Name Entity Recognition (NER) Datasets with GazetteerName Entity Recognition datasets containing short sentences and queries with low-context, including LOWNER, MSQ-NER, ORCAS-NER and Gazetteers (1.67 million entities). This release contains the multili…[Amazon](https://www.amazon.com/)N/Aamazon.science,natural language processing[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
colorado-elevation-dataState of Colorado Elevation DataThe State of Colorado has gathered public historical elevation data.State of Colorado Governors Office of Information Technology OIT GIS teamPeriodicallyaws-pds,geospatial,imaging,mappinghttps://creativecommons.org/publicdomain/zero/1.0/legalcode
colorado-imageryState of Colorado ImageryThe State of Colorado has gathered public historical imagery ranging from 2005 to 2021.State of Colorado, Governor's Office of Information Technology OIT GIS teamPeriodicallyaws-pds,aerial imagery,geospatial,imaging,mappinghttps://creativecommons.org/publicdomain/zero/1.0/legalcode
commoncrawlCommon CrawlA corpus of web crawl data composed of over 300 billion web pages.[Common Crawl](https://commoncrawl.org/)Monthlyaws-pds,encyclopedic,natural language processing,internet,web archiveThis data is available for anyone to use under the [Common Crawl Terms of Use](https://commoncrawl.org/terms-of-use/)
comonscreensCommon ScreensA corpus of web screenshots and metadata composed of over 70 million websites.[Common Screens](https://commonscreens.com/)Monthlyaws-pds,encyclopedic,natural language processing,internet[Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)
copernicus-demCopernicus Digital Elevation Model (DEM)The Copernicus DEM is a Digital Surface Model (DSM) which represents the surface of the Earth including buildings, infrastructure and vegetation. We provide two instances of Copernicus DEM named GLO-3…[Sinergise](https://www.sinergise.com/)None, except GLO-30 Public can be updated if the public tile list changes.aws-pds,agriculture,elevation,earth observation,satellite imagery,geospatial,disaster response,cogGLO-30 Public and GLO-90 are available on a free basis for the general public under the terms and conditions of the Licence found on [here](https://dataspace.copernicus.eu/explore-data/data-collection…
coralreef-image-classification-trainingCommunity coral reef image classification training dataCommunity-sourced repository of coral reef image classification training data, including continually updated confirmed annotations from [MERMAID](https://datamermaid.org/)[MERMAID](https://datamermaid.org/)Each partner organization updates on their own cadence. MERMAID updates once per day.aws-pds,coastal,conservation,coral reef,csv,global,machine learning,marine,parquet,survey[Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)](https://creativecommons.org/licenses/by-nc-sa/4.0/)
cord-19COVID-19 Open Research Dataset (CORD-19)Full-text and metadata dataset of COVID-19 and coronavirus-related research articles optimized for machine readability.Allen Institute for AIWeeklyaws-pds,COVID-19,coronavirus,life sciences,SARS,MERSOpen (see license file for details)
cornell-eas-data-lakeCornell EAS Data LakeEarth & Atmospheric Sciences at Cornell University has created a public data lake of climate data. The data is stored in columnar storage formats (ORC) to make it straightforward to query using standa…Not currently managedHourlyagriculture,aws-pds,climate,earth observation,elevation,environmental,geospatial,mapping,meteorological,weatherhttps://datalake.eas.cornell.edu/license.txttrue
cotonoha-dicJapanese Tokenizer DictionariesJapanese Tokenizer Dictionaries for use with MeCab.CotonohaInfrequently (typically less than once a year)aws-pds,natural language processing,csv,japaneseVersions of Unidic offered here are available under the GPL/LGPL/BSD license. IPADic is offered under a unique BSD-like license. See below. https://github.com/polm/ipadic-py/blob/master/ipadic/…
covers-brCoversBRCoversBR is the first large audio database with predominantly Brazilian music for the tasks of Cover Song Identification (CSI) and Live Song Identification (LSI). Due to copyright restrictions aud…Dirceu G SilvaNew metadata, song feature files, and audio streams for live song identification will be added as soon as available.aws-pds,copyright monitoring,cover song identification,live song identification,music,music features dataset,music information retrieval,music recognitionThe code in this repository is licensed under Apache 2.0. The metadata and the pre-extracted features are licensed under CC BY-NC-SA 4.0.
cptac-2Clinical Proteomic Tumor Analysis Consortium 2 (CPTAC-2)The Clinical Proteomic Tumor Analysis Consortium (CPTAC) is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genom…[Center for Translational Data Science at The University of Chicago](https://ctds.uchicago.edu/)Genomic Data Commons (GDC) is source of truth for this dataset; GDC offers monthly data releases, although this dataset may not be updated at every release. aws-pds,cancer,genomic,life sciences,transcriptomics,STRIDESNIH Genomic Data Sharing Policy: https://gdc.cancer.gov/access-data/data-access-policies
cptac-3Clinical Proteomic Tumor Analysis Consortium 3 (CPTAC-3)The Clinical Proteomic Tumor Analysis Consortium (CPTAC) is a national effort to accelerate the understanding of the molecular basis of cancer through the application of large-scale proteome and genom…[Center for Translational Data Science at The University of Chicago](https://ctds.uchicago.edu/)Genomic Data Commons (GDC) is source of truth for this dataset; GDC offers monthly data releases, although this dataset may not be updated at every release. aws-pds,cancer,genomic,life sciences,transcriptomics,STRIDESNIH Genomic Data Sharing Policy: https://gdc.cancer.gov/access-data/data-access-policies
craam-open-vlfOpen VLF: Scientific Open Data Initiative for CRAAM's SAVNET and AWESOME VLF Data.This platform is maintained by [CRAAM](https://www.mackenzie.br/centro-de-radio-astronomia-e-astrofisica-mackenzie) (Mackenzie Radio Astronomy and Astrophysics Center), a research center operated by […[CRAAM Mackenzie](https://www.mackenzie.br/centro-de-radio-astronomia-e-astrofisica-mackenzie)Various. Data since 2006, and still updated. Follow the announcements and what is new on the project website [Open VLF](https://open-vlf.web.app).archives,astronomy,atmosphere,aws-pds,global,open source software,signal processing,life sciencesThere are no restrictions on the use of this data.
cropland_partitioiningIWMI DIWASA Rainfed and Irrigated Cropland Map for AfricaA framework integrating the Budyko model has been developed to distinguish between rainfed and irrigated cropland areas across Africa. This expands on remote sensing land cover products available for…[IWMI](https://www.iwmi.org/)Nonecropland partitioning,irrigated cropland,rainfed cropland,agriculture,land use,land coverThere are no restrictions on the use of this data.
cryoet-data-portalCryoET Data PortalCryo-electron tomography (cryoET) is a powerful technique for visualizing 3D structures of cellular macromolecules at near atomic resolution in their native environment. Observing the inner workings o…[Chan Zuckerberg Initiative Foundation](http://www.chanzuckerberg.com/)New releases are published on a rolling basis. Please contact the team via email for any questions.cryo electron tomography,electron tomography,cell biology,structural biology,machine learning,segmentation,czi,life sciencesCC0
cse-cic-ids2018A Realistic Cyber Defense Dataset (CSE-CIC-IDS2018)This dataset is the result of a collaborative project between the Communications Security Establishment (CSE) and The Canadian Institute for Cybersecurity (CIC) that use the notion of profiles to gene…Canadian Institute for CybersecurityAnnuallyaws-pds,network traffic,internet,intrusion detection,cyber security,aws-pdshttp://www.unb.ca/cic/datasets/ids-2018.html
csiro-cafe60CAFE60 reanalysisThe CSIRO Climate retrospective Analysis and Forecast Ensemble system: version 1 (CAFE60v1) provides a large ensemble retrospective analysis of the global climate system from 1960 to present with suff…[CSIRO](http://csiro.au/)6 Monthly (Approx)aws-pds,climate,sustainabilityCreative Commons Attribution-ShareAlike 4.0 International Licence
ctrees-amazon-canopy-heightCanopy Tree Height Map for the Amazon Forest (mean height composite 2020-2024) by CTrees.orgMean canopy tree height for the Amazon Forest for the period 2020-2024 at a spatial resolution of 4.78 m. Created using a deep learning model on high-resolution Planet imagery from Norway's Internati…[CTrees](https://ctrees.org/)TBDaws-pds,cog,earth observation,land cover,deep learning,lidar,satellite imagery,image processing,environmental,conservationOur canopy height map is a derivative product of Planet-NICFI and follows the same licence: [NICFI licensing agreement](https://planet.widen.net/s/zfdpf8qxwk/participantlicenseagreement_nicfi_2024)
ctrees-california-vhr-tree-heightSub-Meter Canopy Tree Height of California in 2020 by CTrees.orgCanopy Tree Height maps for California in 2020. Created using a deep learning model on very-high-resolution airborne imagery from the National Agriculture Imagery Program (NAIP) by United States Depar…[CTrees](https://ctrees.org/)TBDaws-pds,cog,earth observation,land cover,deep learning,aerial imagery,image processing,environmental,conservation,geospatialhttps://creativecommons.org/licenses/by/4.0/
ctsp-dlbclClinical Trial Sequencing Project - Diffuse Large B-Cell LymphomaThe goal of the project is to identify recurrent genetic alterations (mutations, deletions, amplifications, rearrangements) and/or gene expression signatures. National Cancer Institute (NCI) utilized…[Center for Translational Data Science at The University of Chicago](https://ctds.uchicago.edu/)Genomic Data Commons (GDC) is source of truth for this dataset; GDC offers monthly data releases, although this dataset may not be updated at every release. aws-pds,cancer,genomic,life sciences,transcriptomics,whole genome sequencing,STRIDESNIH Genomic Data Sharing Policy: https://gdc.cancer.gov/access-data/data-access-policies
cwa_opendataCentral Weather Administration OpenDataVarious kinds of weather raw data and charts from Central Weather Administration.[Central Weather Administration](https://www.cwa.gov.tw/)Data is updated as soon as newer one is available.aws-pds,climate,earth observation,earthquakes,satellite imagery,weatherhttp://data.gov.tw/license
cwb_opendataCentral Weather Bureau OpenDataVarious kinds of weather raw data and charts from Central Weather Bureau.[Central Weather Bureau](https://www.cwb.gov.tw/)Data is updated as soon as newer one is available.aws-pds,climate,earth observation,earthquakes,satellite imagery,weatherhttp://data.gov.tw/license
czb-opencellOpenCell on AWSThe OpenCell project is a proteome-scale effort to measure the localization and interactions of human proteins using high-throughput genome engineering to endogenously tag thousands of proteins in the…[Chan Zuckerberg Biohub](https://www.czbiohub.org/)This is the final version of the dataset.aws-pds,biology,cell biology,life sciences,imaging,cell imaging,fluorescence imaging,microscopy,computer vision,machine learningCC BY-SA 4.0
czi-benchmarkingCZ Grand Challenges - Model BenchmarkingThis dataset includes data and models relevant to benchmarking multimodal biological models. The data has been sourced and curated by a team of experts at CZI and is provided as part of these datasets…[Chan Zuckerberg Initiative Foundation](http://www.chanzuckerberg.com/)Update cadence may vary for each resourcebenchmark,biology,biomolecular modeling,czi,life sciences,machine learning,model,cell biologyCC0 1.0
czi-cellxgene-censusCZ CELLxGENE Discover CensusCZ CELLxGENE Discover ([cellxgene.cziscience.com](https://cellxgene.cziscience.com/)) is a free-to-use platform for the exploration, analysis, and retrieval of single-cell data. CZ CELLxGENE Discover…[Chan Zuckerberg Initiative Foundation](http://www.chanzuckerberg.com/)New releases are published weekly. Long-term supported (LTS) releases are published every 6 months.aws-pds,single-cell transcriptomics,transcriptomics,cell biology,bioinformatics,life sciencesCC BY license
czi-imaging-bsdCZ Grand Challenges - Imaging BSD licensed data and modelsThis dataset contains a diverse range of imaging biological data and models. The data is sourced and curated by a team of experts at CZI and is made available as part of these datasets only when it is…[Chan Zuckerberg Initiative Foundation](http://www.chanzuckerberg.com/)Update cadence may vary for each resourcemicroscopy,cell imaging,machine learning,model,czi,imaging,biodiversity,bioinformatics,biology,biomolecular modeling[BSD-3-Clause License](https://opensource.org/license/bsd-3-clause)
czi-imagining-mitCZ Grand Challenges - Imaging MIT Licensed data and modelsThis dataset contains a diverse range of imaging biological data and models. The data is sourced and curated by a team of experts at CZI and is made available as part of these datasets only when it is…[Chan Zuckerberg Initiative Foundation](http://www.chanzuckerberg.com/)Update cadence may vary for each resourcemicroscopy,cell imaging,machine learning,model,czi,imaging,biodiversity,bioinformatics,biology,biomolecular modeling[MIT License](https://opensource.org/license/mit)
czi-transcriptomics-mitCZ Grand Challenges - Transcriptomic MIT Licensed data and modelsThis dataset contains transcriptomic biological data and models. The models embed transcriptomic data and facilitate transcriptomic analysis. The data is sourced and curated by a team of experts at…[Chan Zuckerberg Initiative Foundation](http://www.chanzuckerberg.com/)Update cadence may vary for each resourcetranscriptomics,machine learning,model,czi,hdf5,biodiversity,biology,biomolecular modeling,cell biology,life sciences[MIT License](https://opensource.org/license/mit)
dandiarchiveDistributed Archives for Neurophysiology Data Integration (DANDI)DANDI is a public archive of neurophysiology datasets, including raw and processed data, and associated software containers. Datasets are shared according to Creative Commons CC0 or CC-BY licenses. Th…[DANDI Archive](https://about.dandiarchive.org/team)New datasets deposited every monthaws-pds,biology,calcium imaging,cell imaging,electrophysiology,hdf5,life sciences,neuroimaging,neurophysiology,neuroscienceCC0, CC-BY
darpa-invisible-headlightsDARPA Invisible Headlights DatasetThe DARPA Invisible Headlights Dataset is a large-scale multi-sensor dataset annotated for autonomous, off-road navigation in challenging off-road environments. It features simultaneously collected o…[Kitware](http://www.kitware.com/)No updates anticipated at this time. aws-pds,autonomous vehicles,broadband,computer vision,lidar,machine learning,segmentation,usThis work is licensed under CC BY 4.0. This dataset was developed with funding from the Defense Advanced Research Projects Agency (DARPA). Distribution Statement A: Approved for public release. Distri…
data-to-scienceData to Science CatalogA user-generated geospatial data collection maintained by the Data to Science platform. Contributions vary by project, but typically include cloud-optimized datasets such as Cloud-Optimized GeoTIFFs…Geospatial Data Science Lab at Purdue UniversityNoneaws-pds,aerial imagery,agriculture,cog,dsm,dtm,earth observation,geospatial,high-throughput imaging,image processingCC-BY-SA-4.0 (https://creativecommons.org/licenses/by-sa/4.0/)
dataforgood-fb-forestsHigh Resolution Canopy Height Maps by WRI and MetaGlobal and regional Canopy Height Maps (CHM). Created using machine learning models on high-resolution worldwide Maxar satellite imagery. [Meta](https://dataforgood.fb.com/)TBDaws-pds,cog,earth observation,climate,land cover,agriculture,machine learning,aerial imagery,satellite imagery,image processinghttps://creativecommons.org/licenses/by/4.0/
dataforgood-fb-forestsv2Version 2 High Resolution Canopy Height Maps by WRI and MetaVersion 2 Global and regional Canopy Height Maps (CHMv2). Created using machine learning models on high-resolution worldwide Vantor satellite imagery. [Meta](https://dataforgood.fb.com/)TBDaws-pds,cog,earth observation,climate,land cover,agriculture,machine learning,aerial imagery,satellite imagery,image processinghttps://creativecommons.org/licenses/by/4.0/
dataforgood-fb-hrslHigh Resolution Population Density Maps + Demographic Estimates by CIESIN and MetaPopulation data for a selection of countries, allocated to 1 arcsecond blocks and provided in a combination of CSV and Cloud-optimized GeoTIFF files. This refines [CIESIN’s Gridded Population of the W…[Meta](https://dataforgood.fb.com/)Quarterlypopulation,demographics,machine learning,aerial imagery,satellite imagery,image processing,geospatial,disaster response,aws-pdshttps://creativecommons.org/licenses/by/4.0/
daylight-osmDaylight Map Distribution of OpenStreetMapDaylight is a complete distribution of global, open map data that’s freely available with support from community and professional mapmakers. Meta combines the work of global contributors to projects l…[Meta](https://dataforgood.fb.com/)Quarterlygeospatial,osm,mapping,disaster response,aws-pds[Open Database License (ODbL)](https://opendatacommons.org/licenses/odbl/1-0/)
dc-lidar-2015District of Columbia - Classified Point Cloud LiDARLiDAR point cloud data for Washington, DC is available for anyone to use on Amazon S3. This dataset, managed by the Office of the Chief Technology Officer (OCTO), through the direction of the Distric…[Washington DC government](https://dc.gov/)The most recent data is from 2018 and 2015 data is available as well. A new data acquisition is planned for 2020.aws-pds,geospatial,cities,us-dc,disaster responseSee Washington, DC [Terms of Use](https://dc.gov/page/terms-and-conditions-use)true
dc-lidarDistrict of Columbia - Classified Point Cloud LiDARLiDAR point cloud data for Washington, DC is available for anyone to use on Amazon S3. This dataset, managed by the Office of the Chief Technology Officer (OCTO), through the direction of the District…[Washington DC government](https://dc.gov/)The most recent data is from 2018 and 2015 data is available as well. A new data acquisition is planned for 2020.aws-pds,geospatial,cities,us-dc,disaster responseSee Washington, DC [Terms of Use](https://dc.gov/page/terms-and-conditions-use)
dccThe MIT Supercloud DatasetCollection of parsed datacenter logs and time series data of hardware utilization from the MIT Supercloud system.Siddharth SamsiData will be updated annuallydatacenter,HPC,cloud computing,workload analysis,energy,aws-pdshttp://creativecommons.org/licenses/by-nc-nd/4.0/
deafrica-alos-jersDigital Earth Africa ALOS PALSAR, ALOS-2 PALSAR-2 and JERS-1The ALOS/PALSAR annual mosaic is a global 25 m resolution dataset that combines data from many images captured by JAXA’s PALSAR and PALSAR-2 sensors on ALOS-1 and ALOS-2 satellites respectively. This…[Digital Earth Africa](https://www.digitalearthafrica.org/)As available, generally annually.aws-pds,agriculture,earth observation,satellite imagery,geospatial,natural resource,disaster response,synthetic aperture radar,deafrica,stacData is available for free under the [terms of use](https://earth.jaxa.jp/policy/en.html).
deafrica-chirpsDigital Earth Africa CHIRPS RainfallDigital Earth Africa (DE Africa) provides free and open access to a copy of the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS) monthly and daily products over Africa. The CHIR…[Digital Earth Africa](https://www.digitalearthafrica.org/)Monthly.aws-pds,agriculture,climate,earth observation,food security,geospatial,meteorological,satellite imagery,sustainability,deafricaTo the extent possible under the law, Pete Peterson has waived all copyright and related or neighboring rights to CHIRPS. CHIRPS data is in the public domain as registered with Creative Commons.
deafrica-clgm-lwqDigital Earth Africa - Copernicus Global Land Service - Lake Water QualityThe Copernicus Global Land Service – Lake Water Quality products offer a comprehensive, satellite-derived monitoring system for assessing key water quality indicators in major large lakes, typically t…[Digital Earth Africa](https://www.digitalearthafrica.org/)New scene-level data is added regularly, as the Lake Water Quality (LWQ) datasets are updated every 10 days (dekadal composites), with near-real-time versions typically available within 3 to 4 days af…aws-pds,agriculture,disaster response,earth observation,geospatial,natural resource,satellite imagery,water,deafrica,stacDE Africa makes this data available under the Creative Commons Attribution 4.0 license https://creativecommons.org/licenses/by/4.0/.
deafrica-coastlinesDigital Earth Africa CoastlinesAfrica's long and dynamic coastline is subject to a wide range of pressures, including extreme weather and climate, sea level rise and human development. Understanding how the coastline responds to th…[Digital Earth Africa](https://www.digitalearthafrica.org/)To be defined.aws-pds,climate,coastal,earth observation,geospatial,satellite imagery,sustainability,deafricaDE Africa makes this data available under the Creative Commons Attribution 4.0 license https://creativecommons.org/licenses/by/4.0/.
deafrica-crop-extentDigital Earth Africa Cropland Extent Map (2019)Digital Earth Africa's cropland extent map (2019) shows the estimated location of croplands in Africa for the period January to December 2019. Cropland is defined as: "a piece of land of minimum 0.01…[Digital Earth Africa](https://www.digitalearthafrica.org/)To be defined.aws-pds,agriculture,earth observation,food security,geospatial,satellite imagery,sustainability,deafrica,stac,cogDE Africa makes this data available under the Creative Commons Attribution 4.0 license https://creativecommons.org/licenses/by/4.0/.
deafrica-fractional-coverDigital Earth Africa Fractional CoverFractional cover (FC) describes the landscape in terms of coverage by green vegetation, non-green vegetation (including deciduous trees during autumn, dry grass, etc.) and bare soil. It provides insig…[Digital Earth Africa](https://www.digitalearthafrica.org/)New scene-level data is added as new Landsat data is available. New summaries are available soon after data is available for a year.aws-pds,agriculture,disaster response,earth observation,geospatial,natural resource,satellite imagery,sustainability,deafrica,stacDE Africa makes this data available under the Creative Commons Attribution 4.0 license https://creativecommons.org/licenses/by/4.0/.
deafrica-geomadDigital Earth Africa GeoMADGeoMAD is the Digital Earth Africa (DE Africa) surface reflectance geomedian and triple Median Absolute Deviation data service. It is a cloud-free composite of satellite data compiled over specific ti…[Digital Earth Africa](https://www.digitalearthafrica.org/)GeoMADs for Sentinel-2 and Landsat are updated as their respective time periods are available.aws-pds,agriculture,earth observation,satellite imagery,geospatial,natural resource,disaster response,deafrica,stac,cogDE Africa makes this data available under the Creative Commons Attribution 4.0 license https://creativecommons.org/licenses/by/4.0/.
deafrica-landsatDigital Earth Africa Landsat Collection 2 Level 2Digital Earth Africa (DE Africa) provides free and open access to a copy of Landsat Collection 2 Level-2 products over Africa. These products are produced and provided by the United States Geological…[Digital Earth Africa](https://www.digitalearthafrica.org/)New Landsat data are added regularly, usually within a few hours of them being available in the usgs-landsat bucket.aws-pds,agriculture,earth observation,satellite imagery,geospatial,natural resource,disaster response,deafrica,stac,cogThere are no restrictions on Landsat data downloaded from the USGS; it can be used or redistributed as desired. USGS request that you include a [statement of the data source](https://www.usgs.gov/cen…
deafrica-mangroveDigital Earth Africa Global Mangrove WatchThe Global Mangrove Watch (GMW) dataset is a result of the collaboration between Aberystwyth University (U.K.), solo Earth Observation (soloEO; Japan), Wetlands International, the World Conservation Mo…[Digital Earth Africa](https://www.digitalearthafrica.org/)To be defined.aws-pds,natural resource,earth observation,coastal,geospatial,satellite imagery,sustainability,deafrica,stac,cogDE Africa makes this data available under the Creative Commons Attribution 4.0 license https://creativecommons.org/licenses/by/4.0/.
deafrica-ndvi_anomalyDigital Earth Africa Monthly Normalised Difference Vegetation Index (NDVI) AnomalyDigital Earth Africa’s Monthly NDVI Anomaly service provides an estimate of vegetation condition for each calendar month, against the long-term baseline condition measured for the month from 1984 to 20…[Digital Earth Africa](https://www.digitalearthafrica.org/)From September 2022, the Monthly NDVI Anomaly is generated as a low latency product, i.e. anomaly for a month is generated on the 5th day of the following month. This ensures data is available shortly…aws-pds,agriculture,disaster response,earth observation,geospatial,natural resource,satellite imagery,deafrica,stac,cogDE Africa makes this data available under the Creative Commons Attribution 4.0 license https://creativecommons.org/licenses/by/4.0/.
deafrica-ndvi_climatology_lsDigital Earth Africa Normalised Difference Vegetation Index (NDVI) ClimatologyDigital Earth Africa’s NDVI climatology product represents the long-term average baseline condition of vegetation for every Landsat pixel over the African continent. Both mean and standard deviation N…[Digital Earth Africa](https://www.digitalearthafrica.org/)N/A.aws-pds,agriculture,disaster response,earth observation,geospatial,natural resource,agriculture,satellite imagery,deafrica,stacDE Africa makes this data available under the Creative Commons Attribution 4.0 license https://creativecommons.org/licenses/by/4.0/.
deafrica-sentinel-1-mosaicSentinel-1 Monthly MosaicSynthetic Aperture Radar (SAR) sensors have the advantage of operating at wavelengths not impeded by cloud cover and can acquire data over a site during the day or night. The Sentinel-1 mission, part o…[Digital Earth Africa](https://www.digitalearthafrica.org/)N/A.aws-pds,agriculture,earth observation,satellite imagery,geospatial,natural resource,disaster response,deafrica,stac,cogAccess to S1 Monthly Mosaic data is free, full and open for the broad Regional, National, European and International user community. View [Terms and Conditions](https://scihub.copernicus.eu/twiki/do/v…
deafrica-sentinel-1Digital Earth Africa Sentinel-1 Radiometrically Terrain CorrectedDE Africa’s Sentinel-1 backscatter product is developed to be compliant with the CEOS Analysis Ready Data for Land (CARD4L) specifications. The Sentinel-1 mission, composed of a constellation of two C…[Digital Earth Africa](https://www.digitalearthafrica.org/)New Sentinel-1 data are added regularly.aws-pds,agriculture,earth observation,satellite imagery,geospatial,natural resource,disaster response,deafrica,stac,cogAccess to Sentinel data is free, full and open for the broad Regional, National, European and International user community. View [Terms and Conditions](https://scihub.copernicus.eu/twiki/do/view/SciHu…
deafrica-sentinel-2-c1Digital Earth Africa Sentinel-2 Level-2A Surface Reflectance Collection 1The Sentinel-2 mission is part of the European Union Copernicus programme for Earth observations. Sentinel-2 consists of twin satellites, Sentinel-2A (launched 23 June 2015) and Sentinel-2B (launched…[Digital Earth Africa](https://www.digitalearthafrica.org/)New Sentinel-2 scenes are added regularly, usually within a few hours after they are available on Copernicus OpenHub.aws-pds,agriculture,earth observation,satellite imagery,geospatial,natural resource,disaster response,deafrica,stac,cogAccess to Sentinel data is free, full and open for the broad Regional, National, European and International user community. View [Terms and Conditions](https://scihub.copernicus.eu/twiki/do/view/SciHu…
deafrica-sentinel-2Digital Earth Africa Sentinel-2 Level-2AThe Sentinel-2 mission is part of the European Union Copernicus programme for Earth observations. Sentinel-2 consists of twin satellites, Sentinel-2A (launched 23 June 2015) and Sentinel-2B (launched…[Digital Earth Africa](https://www.digitalearthafrica.org/)New Sentinel-2 scenes are added regularly, usually within a few hours after they are available on Copernicus OpenHub.aws-pds,agriculture,earth observation,satellite imagery,geospatial,natural resource,disaster response,deafrica,stac,cogAccess to Sentinel data is free, full and open for the broad Regional, National, European and International user community. View [Terms and Conditions](https://scihub.copernicus.eu/twiki/do/view/SciHu…
deafrica-waterbodiesDE Africa Waterbodies Monitoring ServiceThe Digital Earth Africa continental Waterbodies Monitoring Service identifies more than 700,000 water bodies from over three decades of satellite observations. This service maps persistent and season…[Digital Earth Africa](https://www.digitalearthafrica.org/)Single historical extent derived from the full temporal range.aws-pds,agriculture,disaster response,earth observation,geospatial,natural resource,satellite imagery,water,deafrica,stacDE Africa makes this data available under the Creative Commons Attribution 4.0 license https://creativecommons.org/licenses/by/4.0/.
deafrica-wofsDigital Earth Africa Water Observations from SpaceWater Observations from Space (WOfS) is a service that draws on satellite imagery to provide historical surface water observations of the whole African continent. WOfS allows users to understand the l…[Digital Earth Africa](https://www.digitalearthafrica.org/)New scene-level data is added as new Landsat data is available. New summaries are available soon after data is available for a year.aws-pds,agriculture,disaster response,earth observation,geospatial,natural resource,satellite imagery,water,deafrica,stacDE Africa makes this data available under the Creative Commons Attribution 4.0 license https://creativecommons.org/licenses/by/4.0/.
deepdrug-dpebDeepDrug Protein Embeddings Bank (DPEB)DPEB is a multimodal database of human protein embeddings integrating four biologically complementary representations—AlphaFold2, BioEmbeddings, ESM-2, and ProtVec—designed for enhanced protein-protei…Louisiana State UniversityInitial release; maintained for at least 2 years with updates planned based on new embedding models and protein coverage.bioinformatics,protein,structural biology,machine learning,life sciences,aws-pdsMIT
dendritic-consortiumDendritic Consortium Multimodal DatasetThe Dendritic Consortium provides a multimodal dataset integrating calcium and voltage imaging, electrophysiology, electron microscopy, proteomics, and computational models of Baz1a pyramidal neurons…Dendritic ConsortiumContinuously updated as new experimental and computational data are generated.brain images,brain models,electrophysiology,electron microscopy,imaging,life sciences,Mus musculus,neuroscience,neurobiology,neuroimagingThere are no restrictions on the use of this data.
dep-coastlinesPacific Coastlines ChangePacific Coastlines beta version product includes coastline change detection since the year 2000 for Pacific Island Country and Territories (PICTs). This product will provide ongoing monitoring of coas…[Pacific Community (SPC)](https://www.spc.int/)Annuallyearth observation,environmental,coastal,geoscience,geospatialDigital Earth Pacific Data is available under the Creative Commons Attribution 4.0 International license https://creativecommons.org/licenses/by/4.0/
dep-ls-geomadsLandsat Geometric Median and Absolute Deviations (GeoMAD) over the Pacific.The GeoMAD is derived from Landsat surface reflectance data. The data are masked for cloud, shadows and other image artefacts using the associated pixel quality product to help provide as clear a set…[Pacific Community (SPC)](https://www.spc.int/)Annuallyearth observation,geoscience,geospatialDigital Earth Pacific Data is available under the Creative Commons Attribution 4.0 International license https://creativecommons.org/licenses/by/4.0/
dep-mangrovesDigital Earth Pacific Mangroves Extent and DensityPacific Mangroves beta version product is an extension of the Global Mangrove Watch (GMW v3, 2020), which shows the extent of mangrove ecosystems across Pacific Island Countries and Territories (PICTs…[Pacific Community (SPC)](https://www.spc.int/)Annuallyearth observation,environmental,climate,geoscience,geospatialDigital Earth Pacific Data is available under the Creative Commons Attribution 4.0 International license https://creativecommons.org/licenses/by/4.0/
dep-s1-annual-mosaicsSentinel-1 Mean and Median Annual MosaicSentinel-1 carries a Synthetic Aperture RADAR (SAR) that operates on the C-band. This platform offers SAR data day and night and in all-weather conditions. [Pacific Community (SPC)](https://www.spc.int/)Annuallyearth observation,environmental,climate,geoscience,geospatialDigital Earth Pacific Data is available under the Creative Commons Attribution 4.0 International license https://creativecommons.org/licenses/by/4.0/
dep-s2-geomadsSentinel-2 Geometric Median and Absolute Deviations (GeoMAD) over the PacificThe Geometric Median and Absolute Deviations (GeoMAD) product is a cloud-free annual mosaic that uses a more robust method of determining the median observation than a simple median. Along with the me…[Pacific Community (SPC)](https://www.spc.int/)Annuallyearth observation,geoscience,geospatialDigital Earth Pacific Data is available under the Creative Commons Attribution 4.0 International license https://creativecommons.org/licenses/by/4.0/
dep-wofsDigital Earth Pacific Water Observations from Space (WOfS)The Water Observations from Space (WOfS) beta version product is an annual summary of the temporal and spatial extent of surface water over landscapes. In essence,…[Pacific Community (SPC)](https://www.spc.int/)Annuallyearth observation,environmental,water,geoscience,geospatialDigital Earth Pacific Data is available under the Creative Commons Attribution 4.0 International license https://creativecommons.org/licenses/by/4.0/
depmap-omics-ccleThe Cancer Dependency Map (DepMap) Cancer Cell Line Encyclopedia (CCLE) DatasetThis dataset consists of whole genome sequencing (WGS), whole exome sequencing (WES), and RNA sequencing files generated from ~1000 cancer cell lines described in Ghandi et al., 2019.[Broad Institute](https://www.broadinstitute.org/)occasionally (as additional sequencings are generated for publicly releasable CCLE models)aws-pds,bam,biology,bioinformatics,cancer,genetic,genomic,Homo sapiens,life sciences,short read sequencingBy downloading this data you agree to our [Terms and Conditions](https://depmap.org/portal/terms/)
deutsche-boerse-pdsDeutsche Börse Public DatasetThe Deutsche Börse Public Data Set consists of trade data aggregated to one minute intervals from the Eurex and Xetra trading systems. It provides the initial price, lowest price, highest price, final…Not currently managedThe data is updated every minute during trading hours.aws-pds,market data,financial markets,tradingNon-commercial (NC) - licensees may copy, distribute, display, and perform the work and make derivative works and remixes based on it only for non-commercial purposes.true
dharani-brain-datasetDHARANI Developing Human-Brain AtlasWe introduce DHARANI, the first online platform with three-dimensional (3D) histological reconstructions of the developing human brain from 14 to 24 gestational weeks (GW) across the five fetal brains…[Sudha Gopalakrishnan Brain Centre, IIT Madras](http://iitm.humanbrain.in/)Neverlife sciences,brain images,microscopy,neurobiology,segmentation,computer visionCC-by-4.0
dialoglueDialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented DialogueThis bucket contains the checkpoints used to reproduce the baseline results reported in the DialoGLUE benchmark hosted on EvalAI (https://evalai.cloudcv.org/web/challenges/challenge-page/708/overview)…[Amazon](https://www.amazon.com/)Not currently being updatedamazon.science,natural language processing,machine learning,conversation data[CDLA-Sharing](https://cdla.io/sharing-1-0/)
dig-open-analysis-dataKnowledge Portal Network Bottom-line Genetic AssociationsAt the Knowledge Portal Network, we aggregate and analyze genetic association results for a wide range of diseases and traits. For any given disease, a large number of individual genetic association d…Jason Flannick lab, Broad Institute of MIT and HarvardThree times per yeargenome wide association study,life sciences,genetichttps://a2f.hugeamp.org/policies.html
digital-globe-open-dataMaxar Open Data ProgramPre and post event high-resolution satellite imagery in support of emergency planning, risk assessment, monitoring of staging areas and emergency response, damage assessment, and recovery. Also includ…[Maxar](https://www.maxar.com/)New data is released in response to activations.earth observation,disaster response,geospatial,satellite imagery,cogCreative Commons Attribution Non Commercial 4.0true
digitalcorporaDigitalCorporaDisk images, memory dumps, network packet captures, and files for use in digital forensics research and education. All of this information is accessible through the digitalcorpora.org website, and mad…[Simson Garfinkel](https://simson.net/)Quarterlyaws-pds,computer security,cyber security,CSI,information retrieval,computer forensics,digital forensics,machine learning,machine translation,image processingThere are no restrictions on the use of this data.
dmi-danra-05Danish Meteorological Institute (DMI) Reanalysis dataset v0.5DANRA is a high-resolution meteorological reanalysis dataset for Denmark and Northwestern Europe covering the period September 1990 to December 2023[Danish Meteorological Institute](https://www.dmi.dk/)Not updatedaws-pds,air temperature,atmosphere,geospatial,global,land,meteorological,near-surface air temperature,near-surface relative humidity,near-surface specific humidityDMI Reanalysis dataset v0.5 is distributed under the [Creative Commons License CC BY 4.0](https://creativecommons.org/licenses/by/4.0/legalcode.en)
dmi-opendataDanish Meteorological Institute (DMI) Open Data ForecastsDMI forecast data consist of various models, where each model contains a different set of parameters relating to a specific domain like ocean (WAM), storm flooding (DKSS) or weather (HARMONIE)[Danish Meteorological Institute](https://www.dmi.dk/)Every hour, 3 hours or 6 hours depending on modelaws-pds,air temperature,atmosphere,forecast,meteorological,near-surface air temperature,near-surface relative humidity,near-surface specific humidity,model,ocean circulationDMI's Open Data are distributed under the [Creative Commons License CC BY 4.0](https://creativecommons.org/licenses/by/4.0/legalcode.en)
dmspssjDefense Meteorology Satellite Program (DMSP) Auroral Particle FluxThe United States Air Force (USAF) Defense Meteorological Satellite Program (DMSP) SSJ precipitating particle instrument measures in-situ total flux and energy distribution of electrons and ions at lo…Space Weather Technology, Research and Education Center (TREC) at University of Colorado, BoulderInfrequentaws-pds,solar,space weather,geospatial,earth observationThis data is in the '[public domain](https://creativecommons.org/publicdomain/zero/1.0/)'
dnastack-covid-19-sra-dataDNAStack COVID19 SRA DataThe [Sequence Read Archive (SRA)](https://www.ncbi.nlm.nih.gov/sra/) is the primary archive of high-throughput sequencing data, hosted by the National Institutes of Health (NIH). The SRA represents th…[DNAstack](https://dnastack.com/)Rollingaws-pds,bam,bioinformatics,coronavirus,COVID-19,fasta,fastq,global,genetic,genomic[DNAstack terms of use](https://dnastack.com/terms-of-use/)
dynamical-ecmwf-aifs-singleECMWF AIFS Single - dynamical.org Icechunk ZarrThe Artificial Intelligence Forecasting System (AIFS) is a data driven forecast model developed by the European Centre for Medium-Range Weather Forecasts (ECMWF). This is the non-ensemble co…[dynamical.org](https://dynamical.org)ECMWF AIFS Single forecast: Forecasts initialized every 6 hoursaws-pds,weather,atmosphere,meteorological,climate,forecast,zarr[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
dynamical-ecmwf-ifs-ensECMWF IFS ENS - dynamical.org Icechunk ZarrThe Integrated Forecasting System (IFS) is a global forecast model developed by ECMWF. ENS is an ensemble configuration of IFS, containing 51 ensemble members. IFS consists of a numerica…[dynamical.org](https://dynamical.org)ECMWF IFS ENS Forecast, 15 day, 0.25 degree: Forecasts initialized every 24 hoursaws-pds,weather,atmosphere,meteorological,climate,forecast,zarr[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
dynamical-noaa-gefsNOAA GEFS - dynamical.org Icechunk ZarrThe Global Ensemble Forecast System (GEFS) is a National Oceanic and Atmospheric Administration (NOAA) National Centers for Environmental Prediction (NCEP) weather forecast model. GEFS creat…[dynamical.org](https://dynamical.org)NOAA GEFS forecast, 35 day: Forecasts initialized every 24 hours; NOAA GEFS analysis: 3.0 hoursaws-pds,weather,atmosphere,meteorological,climate,forecast,zarr[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
dynamical-noaa-gfsNOAA GFS - dynamical.org Icechunk ZarrThe Global Forecast System (GFS) is a National Oceanic and Atmospheric Administration (NOAA) National Centers for Environmental Prediction (NCEP) weather forecast model that generates data f…[dynamical.org](https://dynamical.org)NOAA GFS analysis: 1 hour; NOAA GFS forecast: Forecasts initialized every 6 hoursaws-pds,weather,atmosphere,meteorological,climate,forecast,zarr[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
dynamical-noaa-hrrrNOAA HRRR - dynamical.org Icechunk ZarrThe High-Resolution Rapid Refresh (HRRR) is a NOAA real-time 3-km resolution, hourly updated, cloud-resolving, convection-allowing atmospheric model, initialized by 3km grids with 3km radar…[dynamical.org](https://dynamical.org)NOAA HRRR forecast, 48 hour: Forecasts initialized every 6 hours; NOAA HRRR analysis: 1 houraws-pds,weather,atmosphere,meteorological,climate,forecast,zarr[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
dynamical-noaa-mrmsNOAA MRMS - dynamical.org Icechunk ZarrThe NOAA Multi-Radar/Multi-Sensor System (MRMS) integrates data from multiple radars and radar networks, surface observations, numerical weather prediction (NWP) models, and climatology to g…[dynamical.org](https://dynamical.org)NOAA MRMS CONUS analysis, hourly: 1 houraws-pds,weather,atmosphere,meteorological,climate,forecast,zarr[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
e11bio-prismE11bio PRISMThis dataset was generated using E11.bio's PRISM technology (Protein Reconstruction and Identification through Multiplexing), a platform that combines viral barcoding, expansion microscopy, and iterat…[E11.bio](https://e11.bio)As requiredbioinformatics,biology,brain images,cell imaging,computer vision,fluorescence imaging,high-throughput imaging,image processing,imaging,ion channelshttps://e11.bio/terms-of-use
eai-essential-web-v1Essential-Web v1.0: 24T tokens of organized web dataA 24-trillion-token dataset in which every document is annotated with a twelve-category taxonomy covering topic, format, content complexity, and quality.[EssentialAI](https://www.essential.ai)Not updatedaws-pds,machine learning,natural language processing,web archive,text analysisEssential-Web-v1.0 contributions are made available under the [ODC attribution license](https://opendatacommons.org/licenses/by/odc_by_1.0_public_text.txt); however, users should also abide by the [Co…
ebd-sentinel-1-global-coherence-backscatterGlobal Seasonal Sentinel-1 Interferometric Coherence and Backscatter Data SetThis data set is the first-of-its-kind spatial representation of multi-seasonal, global SAR repeat-pass interferometric coherence and backscatter signatures. Global coverage comprises all land masses…[Earth Big Data LLC](https://earthbigdata.com/)The data set covers the time period from 1-Dec-2019 to 30-Nov-2020. No updates are currently produced.global,satellite imagery,ecosystems,agriculture,urban,infrastructure,earth observation,earthquakes,environmental,geologyThe use of these data fall under the terms and conditions of the [Creative Commons Attribution 4.0 International Public License](https://creativecommons.org/licenses/by/4.0/legalcode). Contains modifi…
ecmwf-era5ECMWF ERA5 ReanalysisERA5 is the fifth generation of ECMWF atmospheric reanalyses of the global climate, and the first reanalysis produced as an operational service. It utilizes the best available observation data from sa…[Intertrust](https://www.intertrust.com/)Monthlyaws-pds,agriculture,climate,earth observation,meteorological,weatherGenerated using Copernicus Climate Change Service Information 2018. See http://apps.ecmwf.int/datasets/licences/copernicus/ for additional information.true
ecmwf-forecastsECMWF real-time forecastsThese products are a subset of the ECMWF real-time forecast data and are made available to the public free of charge. They are based on the medium-range (high-resolution and ensembles) forecast models…[European Centre for Medium-Range Weather Forecasts](https://www.ecmwf.int/)IFS data is published according to the [real-time dissemination schedule](https://confluence.ecmwf.int/display/DAC/Dissemination+schedule), while AIFS data is published as soon as it is available for…aws-pds,air temperature,atmosphere,meteorological,near-surface air temperature,near-surface relative humidity,near-surface specific humidity,precipitation,weatherThis ECMWF data is published under a Creative Commons Attribution 4.0 International license ([CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/)) and the [ECMWF Terms of Use](https://apps.ecmwf.…
eegdashEEGDash on AWSThe EEG-DaSh (EEG Data Sharing) data archive is a large-scale data-sharing resource for magnetoencephalography and electroencephalography (MEEG) data hosted at the Swartz Center for Computational Neur…[Swartz Center for Computational Neuroscience](https://sccn.ucsd.edu/)About once a weekaws-pds,life sciences,machine learning,deep learning,neuroscience,neuroimagingThere are no restrictions on the use of this data.
elp-nouabale-landscapeSounds of Central African landscapesArchival soundscapes recorded in the rainforest landscapes of Central Africa, with a focus on the vocalizations of African forest elephants (Loxodonta cyclotis). Center for Conservation Bioacoustics, Cornell University (https://elephantlisteningproject.org)New sound data spanning 4-month time periods added as soon as possibleaws-pds,biodiversity,ecosystems,biology,land,life sciences,natural resource,survey,geospatialThese sound files are freely available for scientific study and exploration, including for the development of detection algorithms. Derivative works based on use of these sounds (e.g. publications, re…
emberEMBER Open DatasetsThis is data from the Ecosystem for Multi-modal Brain-behavior Experimentation and Research (EMBER). It contains time series behavioral and neuroscience data from animal and deidentified human subjects a…[Johns Hopkins University Applied Physics Laboratory](https://www.jhuapl.edu)New datasets are added as soon as they are available. Minor updates to existing datasets occur sporadically.neuroscience,neurobiology,neuroimaging,neurophysiology,electrophysiology,machine learning,magnetic resonance imaging,json,hdf5,zarrCreative Commons 4.0 International (CC BY 4.0)
emearthEnsemble Meteorological Dataset for Planet Earth, EM-EarthEM-Earth provides data for precipitation, mean air temperature, air temperature range, and dew-point temperature at 0.1° spatial resolution over global land areas from 1950 to 2019. EM-Earth provides…[Computational Hydrology at the University of Saskatchewan](https://uofs-comphyd.github.io/)N/Aaws-pds,atmosphere,netcdf,near-surface air temperature,precipitation,meteorologicalCreative Commons Attribution 4.0 International (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
emory-breast-imaging-dataset-embedEMory BrEast Imaging Dataset (EMBED)EMBED is a racially diverse mammography dataset containing 3.4M screening and diagnostic images from 110,000 patients collected from 2013-2020, with an equal representation of black and white women. T…Health Innovation and Translational Informatics Lab at Emory University (hitilab.com)New data to be added annuallyaws-pds,health,cancer,mammography,imaging,x-ray,bias,biology,life sciences[Custom License](https://github.com/Emory-HITI/EMBED_Open_Data/blob/main/EMBED_license.md)
encode-projectEncyclopedia of DNA Elements (ENCODE)The Encyclopedia of DNA Elements (ENCODE) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build…ENCODE Data Coordinating CenterDailyaws-pds,biology,bioinformatics,genetic,genomic,life sciencesExternal data users may freely download, analyze, and publish results based on any ENCODE data without restrictions.
enhance-pet-1-6kENHANCE.PET 1.6k - Whole-/Total-Body [18F]FDG-PET/CT with CT-Derived SegmentationsOpen, multi-center dataset of 1,597 whole-/total-body FDG-PET/CT studies with 130 CT-derived, expert-verified anatomical segmentations per scan (~250 GB). Provided as anonymized NIfTI (PET, CT, labels…ENHANCE.PET initiative (LMU Klinikum & partners)Ad hoc (new releases aligned with additional cohort availability)medical imaging,segmentation,nifti,cancer,radiology,life sciencesDataset licensing per originating site: - AutoPET Challenge: CC BY-NC 4.0 (non-commercial use) - University Hospital Leipzig: CC BY 4.0 - Azienda Ospedaliero Universitaria Careggi: CC BY 4.0 Software…
eot-web-archiveEnd of Term Web Archive DatasetThe End of Term Web Archive (EOT) captures and saves U.S. Government websites at the end of presidential administrations. The EOT has thus far preserved websites from administration changes in 2008,…[End of Term Web Archive](https://eotarchive.org)Every four years after a US Presidential Electionaws-pds,natural language processing,internet,web archive,archivesThere are no restrictions on the use, access, and/or download of data from the End of Term Web Archive Dataset. We request that you cite the End of Term Web Archive project when using the data provid…
epa-2022-modeling-platformOSAP 2022 Modeling Platform The data are part of the 2022 Modeling Platform used to support regulatory actions and technical analyses conducted by the EPA's Office of State Air Partnerships (OSAP). Specifically, this data inclu…U.S. Environmental Protection Agency (https://www.epa.gov)As neededaws-pds,air quality,regulatory,weather,meteorological,environmentalThese datasets are products of the U.S. Government and are intended for public access and use. Unless otherwise specified, all data produced by the U.S EPA is, by default, in the public domain and are…
epa-ccte-httrU.S. Environmental Protection Agency (EPA) Center for Computational Toxicology and Exposure High Throughput Transcriptomics DataHigh-throughput transcriptomics (HTTr) data generated by US EPA Office of Research and Development, Center for Computational Toxicology and Exposure (CCTE), Biomolecular and Computational Toxicology D…U.S. Environmental Protection Agency (https://www.epa.gov)Quarterlyaws-pds,transcriptomics,gene expression,fastq,bioinformaticsThese datasets are products of the U.S. Government and are intended for public access and use. Unless otherwise specified, all data produced by the U.S EPA is, by default, in the public domain and are…
epa-edde-v1EPA Dynamically Downscaled Ensemble (EDDE) Version 1The data are a subset of the EPA Dynamically Downscaled Ensemble (EDDE), Version 1. EDDE is a collection of physics-based modeled data that represent 3D atmospheric conditions for historical and futu…U.S. Environmental Protection Agency (https://www.epa.gov)Quarterlyaws-pds,weather,climate,climate model,climate projections,CMIP5,CMIP6,us,atmosphere,environmentalThese datasets are products of the U.S. Government and are intended for public access and use. Unless otherwise specified, all data produced by the U.S EPA is, by default, in the public domain and are…
epa-edde-v2EPA Dynamically Downscaled Ensemble (EDDE) Version 2The data are a subset of the EPA Dynamically Downscaled Ensemble (EDDE), Version 2. EDDE is a collection of physics-based modeled data that represent 3D atmospheric conditions for historical and futu…U.S. Environmental Protection Agency (https://www.epa.gov)Quarterlyaws-pds,weather,climate,climate model,climate projections,CMIP5,CMIP6,us,atmosphere,environmentalThese datasets are products of the U.S. Government and are intended for public access and use. Unless otherwise specified, all data produced by the U.S EPA is, by default, in the public domain and are…
epa-equates-v1Community Multiscale Air Quality (CMAQ) 2019 3D Gridded and Column data from the EPA's Air Quality Time Series (EQUATES) ProjectThe data are part of EPA’s Air Quality Time Series (EQUATES) Project. The data consist of hourly gridded pollutant concentration estimates by the Community Multiscale Air Quality (CMAQ) model versio…U.S. Environmental Protection Agency (https://www.epa.gov)Annualaws-pds,air quality,atmosphere,modelThese datasets are products of the U.S. Government and are intended for public access and use. Unless otherwise specified, all data produced by the U.S EPA is, by default, in the public domain and are…
epa-hourly-prognostic-meteorologyEPA Hourly Prognostic Meteorological Data The data are hourly outputs from the Weather Research and Forecasting (WRF) model generated by the EPA's Office of State Air Partnerships (OSAP), Air Quality Assessment Division, Air Quality Modeling…U.S. Environmental Protection Agency (https://www.epa.gov)Annuallyaws-pds,air quality,regulatory,weather,meteorological,environmentalThese datasets are products of the U.S. Government and are intended for public access and use. Unless otherwise specified, all data produced by the U.S EPA is, by default, in the public domain and are…
epa-rsei-pdsEPA Risk-Screening Environmental IndicatorsDetailed air model results from EPA’s Risk-Screening Environmental Indicators (RSEI) model.U.S. Environmental Protection Agency (https://www.epa.gov)Updated infrequentlyaws-pds,environmentalUS Government work
epilepsy-scienceEpilepsy.ScienceEpilepsy.Science comprises a set of datasets focused on Epilepsy Research that span both Clinical Data and Pre-clinical data. Datasets are contributed by the Epilepsy Research community and published…[Pennsieve Labs](https://discover.pennsieve.io/)Continuously as new data is publishedaws-pds,life sciences,bioinformatics,neuroscience,medicine,electrophysiologyCreative Commons BY 4.0
epoch-of-reionizationEpoch of Reionization DatasetThe data are from observations with the Murchison Widefield Array (MWA) which is a Square Kilometer Array (SKA) precursor in Western Australia. This particular dataset is from the Epoch of Reionizati…University of Washington Radio Cosmology GroupIrregularlyastronomy,aws-pdsBSD 2-Clause
era5-for-wrfERA5-for-WRF Open Data on AWSERA5 reanalysis data on AWS, preprocessed for use with the Weather Research and Forecasting (WRF) model.[Veer Renewables](http://www.veer.eco/)Monthly.aws-pds,weather,sustainability,atmosphere,electricity,meteorological,modelCC BY-SA 4.0
esa-worldcover-vito-compositesESA WorldCover Sentinel-1 and Sentinel-2 10m Annual CompositesThe WorldCover 10m Annual Composites were produced, as part of the European Space Agency (ESA) WorldCover project, from the yearly Copernicus Sentinel-1 and Sentinel-2 archives for both years 2020 and…[VITO](https://vito.be)Not updated.aws-pds,earth observation,agriculture,satellite imagery,geospatial,natural resource,sustainability,cog,disaster response,mappingCC-BY 4.0
esa-worldcover-vitoESA WorldCoverThe European Space Agency (ESA) WorldCover product provides global land cover maps for 2020 & 2021 at 10 m resolution based on Copernicus Sentinel-1 and Sentinel-2 data. The WorldCover product comes w…[VITO](https://vito.be)Yearly.aws-pds,earth observation,agriculture,satellite imagery,geospatial,natural resource,sustainability,cog,disaster response,mappingCC-BY 4.0
esa-worldcoverESA WorldCoverThe European Space Agency (ESA) WorldCover is a global land cover map with 11 different land cover classes produced at 10m resolution based on combination of both Sentinel-1 and Sentinel-2 data. In ar…[Sinergise on behalf of VITO](https://vito.be)Not yet defined.aws-pds,earth observation,agriculture,satellite imagery,geospatial,natural resource,cog,disaster response,mapping,synthetic aperture radarCC-BY 4.0, for Citation see [Citation section](https://esa-worldcover.org/en/data-access) / [DOI](https://doi.org/10.5281/zenodo.5571936) true
euclid-q1Euclid Quick Release 1 (Q1)Euclid launched in July 2023 as a European Space Agency (ESA) mission with involvement by NASA. The primary science goals of Euclid are to better understand the composition and evolution of the dark U…NASA/IPAC Infrared Science Archive ([IRSA](https://irsa.ipac.caltech.edu)) at CaltechEuclid Q1 has been finalized and will not be updated. The data may be presented in new ways as the products become available.astronomy,imaging,object detection,satellite imagery,surveyhttps://irsa.ipac.caltech.edu/data_use_terms.html
euro-cordexEURO-CORDEX - European component of the Coordinated Regional Downscaling ExperimentThe EURO-CORDEX dataset contains regional climate model data for Europe, for use in impacts, decision-making, and climate science. Currently, the bucket contains monthly datasets of 2m air temperatur…[Helmholtz Centre Hereon / GERICS](https://www.climate-service-center.de)We will add more datasets on demand.aws-pds,climate,model,climate model,atmosphere,geospatial,zarrhttps://is-enes-data.github.io/cordex_terms_of_use.pdf
exceptional-respondersExceptional Responders InitiativeThe Exceptional Responders Initiative is a pilot study to investigate the underlying molecular factors driving exceptional treatment responses of cancer patients to drug therapies. Study researchers w…[Center for Translational Data Science at The University of Chicago](https://ctds.uchicago.edu/)Genomic Data Commons is the source of truth for this dataset and it offers monthly releases, although this particular dataset may not be updated at every releaseaws-pds,STRIDES,cancer,life sciences,genomic,whole genome sequencing,whole exome sequencing,transcriptomics,epigenomicsNIH Genomic Data Sharing Policy: https://gdc.cancer.gov/access-data/data-access-policies
fashionlocaltripletsFashionLocalTripletsFine-grained localized visual similarity and search for fashion.[Amazon](https://www.amazon.com/)Not updatedamazon.science,computer vision,machine learningCreative Commons Non-Commercial 4.0
fast-ai-cocoCOCO - Common Objects in Context - fast.ai datasetsCOCO is a large-scale object detection, segmentation, and captioning dataset. This is part of the fast.ai datasets collection hosted by AWS for convenience of fast.ai students. If you use this dataset…[fast.ai](http://www.fast.ai/)As requiredaws-pds,deep learning,computer vision,machine learningCreative Commons http://cocodataset.org/#termsofuse
fast-ai-imageclasImage classification - fast.ai datasetsSome of the most important datasets for image classification research, including CIFAR 10 and 100, Caltech 101, MNIST, Food-101, Oxford-102-Flowers, Oxford-IIIT-Pets, and Stanford-Cars. This is part…[fast.ai](http://www.fast.ai/)As requiredaws-pds,deep learning,computer vision,machine learningVaries by dataset - see documentation link
fast-ai-imagelocalImage localization - fast.ai datasetsSome of the most important datasets for image localization research, including Camvid and PASCAL VOC (2007 and 2012). This is part of the fast.ai datasets collection hosted by AWS for convenience of…[fast.ai](http://www.fast.ai/)As requiredaws-pds,deep learning,computer vision,machine learningVaries by dataset - see documentation link
fast-ai-nlpNLP - fast.ai datasetsSome of the most important datasets for NLP, with a focus on classification, including IMDb, AG-News, Amazon Reviews (polarity and full), Yelp Reviews (polarity and full), Dbpedia, Sogou News (Pinyin)…[fast.ai](http://www.fast.ai/)As requiredaws-pds,deep learning,natural language processing,machine learningVaries by dataset - see documentation link
fcp-indiInternational Neuroimaging Data-Sharing Initiative (INDI)This bucket contains multiple neuroimaging datasets that are part of the International Neuroimaging Data-Sharing Initiative. Raw human and non-human primate neuroimaging data include 1) Structural MRI…[Child Mind Institute](https://childmind.org/our-research/)Each dataset within INDI has its own release schedule. See release date and frequency for [each dataset](http://fcon_1000.projects.nitrc.org/indi/IndiPro.html)aws-pds,life sciences,imaging,neuroimaging,neuroscience,magnetic resonance imaging,Homo sapiensODC-By v1.0 for imaging data and BSD 3-Clause for CPAC, unless otherwise specified
flabFLAb: Fitness Landscapes for AntibodiesFLAb is the largest publicly available therapeutic antibody dataset designed to train and benchmark protein AI models. It provides open-access, high-quality developability data on diverse therapeutic…[Jeffrey Gray Lab, Johns Hopkins University](https://graylab.jhu.edu/)Any new public release of antibody developability data is deposited into FLAbprotein,protein template,machine learning,life sciences,aws-pdshttps://creativecommons.org/licenses/by/4.0/
fm-adFoundation Medicine Adult Cancer Clinical Dataset (FM-AD)The Foundation Medicine Adult Cancer Clinical Dataset (FM-AD) is a study conducted by Foundation Medicine Inc (FMI). Genomic profiling data for approximately 18,000 adult patients with a diverse array…[Center for Translational Data Science at The University of Chicago](https://ctds.uchicago.edu/)Genomic Data Commons (GDC) is the source of truth for this dataset; GDC offers monthly data releases, although this dataset may not be updated at every release. aws-pds,cancer,genomic,life sciencesNIH Genomic Data Sharing Policy: https://gdc.cancer.gov/access-data/data-access-policies
fmi-radarFinnish Meteorological Institute Weather Radar DataThe up-to-date weather radar from the FMI radar network is available as Open Data. The data contain both single radar data along with composites over Finland in GeoTIFF and HDF5-formats. Available com…[Finnish Meteorological Institute](https://www.ilmatieteenlaitos.fi/)5 minutesaws-pds,agriculture,earth observation,weather,meteorologicalCreative Commons Attribution 4.0 International (CC BY 4.0)
foldingathome-covid19Foldingathome COVID-19 Datasets[Folding@home](http://foldingathome.org) is a massively distributed computing project that uses biomolecular simulations to investigate the [molecular origins of disease](https://foldingathome.org/dis…Folding@homeDatasets will be updated periodically as additional simulations are completed.alchemical free energy calculations,aws-pds,biomolecular modeling,coronavirus,COVID-19,foldingathome,health,life sciences,molecular dynamics,protein[CC0](https://creativecommons.org/share-your-work/public-domain/cc0/)
fomo-norlabFoMo - A Multi-Season Dataset for Robot Navigation in Forêt MontmorencyThe FoMo dataset is a multi-season collection recorded in a boreal forest environment, featuring deep snow, off-road terrain, steep slopes, and highly variable weather. It provides synchronized multi-…[Norlab, Université Laval](https://norlab.ulaval.ca)The dataset is considered complete and stable. Minor updates or corrections may occur, but they are expected to be infrequent.aws-pds,robotics,autonomous vehicles,localization,mapping,perception,benchmark,lidar,radar,IMUCreative Commons Attribution 4.0 International (CC BY 4.0). See https://creativecommons.org/licenses/by/4.0/
ford-multi-av-seasonalFord Multi-AV Seasonal DatasetThis research presents a challenging multi-agent seasonal dataset collected by a fleet of Ford autonomous vehicles at different days and times during 2017-18. The vehicles were manually d…[Ford Motor Company](https://avdata.ford.com)New data will be added until the entire dataset is released online.autonomous vehicles,computer vision,lidar,mapping,robotics,transportation,urban,weather,aws-pdsThis data is intended for non-commercial academic use only. It is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
frag-strucRNA structure by fragmentation frequencyThe fragSTRUC project devises a software to extract RNA secondary structure information from Illumina datasets, based on divalent ions in standard RNA-seq library preparation fragmenting sequences at…The Genome Institute of Singapore (https://www.a-star.edu.sg/gis) and UMass Chan Medical School's RNA Therapeutics Institute (https://www.umassmed.edu/rti/)Datasets will be updated periodically as additional data is generated.genomic,transcriptomics,life sciences,bioinformatics,aws-pds[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
fvcom_gom3UMASSD-FVCOM-GOM3-HindcastThe Finite Volume Community Ocean Model (FVCOM) was used to simulate ocean water levels, velocity, temperature and salinity over a multi-decadal period (1984-present) in the waters of the Northeast US…Open Science Computing, LLCNoneaws-pds,oceansCC0
gadalGrid Algorithms and Data Analytics Library (GADAL)The aim of this project is to create an easy-to-use platform where various types of analytics can be performed on a wide range of electrical grid datasets. The aim is to establish an open-source libra…[National Renewable Energy Laboratory](https://www.nrel.gov/)As neededaws-pds,energy,environmental,sustainability,modelCreative Commons Attribution 3.0 United States License
gatk-sv-dataGATK Structural Variation (SV) DataThis dataset holds the data needed to run a [structural variation discovery pipeline](https://github.com/broadinstitute/gatk-sv) for Illumina short-read whole-genome sequencing (WGS) data in AWS. [Loka Inc.](https://loka.com/)Every 3 monthsaws-pds,biology,bioinformatics,genetic,genomic,life sciences,structural variation,gatk-sv,cromwellhttps://github.com/LokaHQ/aws-open-data/blob/main/LICENSE
gatk-test-dataGATK Test DataThe GATK test data resource bundle is a collection of files for resequencing human genomic data with the Broad Institute's [Genome Analysis Toolkit (GATK)](https://software.broadinstitute.org/gatk/). Broad InstituteEvery 3 monthsaws-pds,biology,bioinformatics,cancer,genetic,genomic,life sciencesCC0 1.0 Universal (CC0 1.0) Public Domain Dedication
gbifGlobal Biodiversity Information Facility (GBIF) Species OccurrencesThe Global Biodiversity Information Facility (GBIF) is an international network and data infrastructure funded by the world's governments providing global data that document the occurrence of species.…The Global Biodiversity Information Facility ([GBIF](https://www.gbif.org))Snapshots of GBIF are taken on a monthly basisaws-pds,earth observation,biodiversity,bioinformatics,conservation,life sciences[CC-BY-NC](https://creativecommons.org/licenses/by-nc/4.0/) under the GBIF [terms of use](https://www.gbif.org/terms).
gcbo-datasetGlobal Carbon Budget DataThe Global Carbon Budget (GCB) is recognised globally as the most comprehensive report on global carbon emissions and sinks. This dataset, updated every year, includes estimates of land and ocean carb…Global Carbon Budget Office at the University of Exeter, UKAnnualclimate,land,oceansOpen Data. There are no restrictions on the use of this data.
gdeltGlobal Database of Events, Language and Tone (GDELT)This project monitors the world's broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, counts, themes, so…UnmanagedNot currently being updatedaws-pds,events,disaster responsehttp://www.gdeltproject.org/about.html#termsofuse
gdr-data-lakeDepartment of Energy’s Geothermal Data Repository (GDR) Data LakeData released from projects funded by the Department of Energy's Geothermal Technologies Office (DOE GTO) that are too large or complex to be conveniently accessed by traditional means. The GDR data…[National Laboratory of the Rockies](https://www.nrel.gov/)As neededaws-pds,energy,geothermalCreative Commons Attribution 4.0 United States License
genomearkGenome ArkThe Genome Ark hosts genomic information for the Vertebrate Genomes Project (VGP) and other related projects. The VGP is an international collaboration that aims to generate complete and near error-fr…The Genome10K community of scientistsData will be continually updated as it is generated.aws-pds,biodiversity,bioinformatics,biology,conservation,genetic,genomic,life sciences[G10K Data Use Policy](https://genome10k.soe.ucsc.edu/data-use-policies/)
genomekitGenomeKit genomic dataGenomeKit is Deep Genomics’ Python library for fast and easy access to genomic resources such as sequence, data tracks, and annotations. The goal is to let machine learning researchers build data sets…Deep GenomicsData is updated when popular new genome versions (assemblies or annotations) are releasedgenome,genomic,Homo sapiens,Mus musculus,non-human primate,Rattus norvegicus,variant annotation,bioinformatics,life sciences,open source softwareApache License Version 2.0 https://www.apache.org/licenses/LICENSE-2.0
geo_tide_geojsonsGeoJSON Files for Geo-TIDEGeoJSON files for the MIT Climate & Sustainability Consortium's Geospatial Trucking Industry Decarbonization ExplorerMIT Climate & Sustainability ConsortiumQuarterlyaws-pds,electricity,energy,environmental,geospatial,supply chain,sustainability,transportationCreative Commons Attribution 4.0 International
geoglows-v2GEOGLOWS Hydrological Model Version 2GEOGLOWS is the Group on Earth Observation's Global Water Sustainability Program. It coordinates efforts from public and private entities to make application ready river data more accessible and sust…Riley HalesMonthlyaws-pds,hydrology,hydrologic model,simulations,zarr,hydrography,geopackage[Creative Commons BY 4 (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)
geonetGeoNet Aotearoa New Zealand DataGeoNet provides geological hazard information for Aotearoa New Zealand. This dataset contains data and products recorded by the GeoNet sensor network. GNSS (Global Navigation Satellite S…[GeoNet](https://www.geonet.org.nz/)Daily for majority of datasets.aws-pds,broadband,coastal,Continuously Operating Reference Station (CORS),earthquakes,geophysics,geoscience,geoscience,GNSS,GPSGeoNet data are made available free of charge under '[CC3 licence](https://creativecommons.org/licenses/by/3.0/nz/)' to facilitate research and risk assessment. Please acknowledge the GeoNet programme…
geoschem-input-dataGEOS-Chem Input DataInput data for the GEOS-Chem Chemical Transport Model, includes NASA/GMAO MERRA-2 and GEOS-FP [meteorological products](https://geos-chem.readthedocs.io/en/latest/gcclassic-user-guide/input-overview.h…[GEOS-Chem Support Team](https://geoschem.github.io/support-team.html)New meteorological and emission data will be added when available.aws-pds,climate,weather,meteorological,environmental,air quality,chemistry,atmosphere,modelhttps://geoschem.github.io/license.html
geoschem-nested-input-dataGEOS-Chem Nested Input DataInput data for nested-grid simulations using the GEOS-Chem Chemical Transport Model. This includes the NASA/GMAO MERRA-2 and GEOS-FP [meteorological products](https://geos-chem.readthedocs.io/en/lates…[GEOS-Chem Support Team](https://geoschem.github.io/support-team.html)New meteorological and emission data will be added when available.aws-pds,climate,weather,meteorological,environmental,air quality,chemistry,atmosphere,modelhttp://geos-chem.org/license
giabGenome in a Bottle on AWSSeveral reference genomes to enable translation of whole human genome sequencing to clinical practice. On 11/12/2020 these data were updated to reflect the most [up to date GIAB release](https://www.n…[National Institute of Standards and Technology](https://www.nist.gov/)New data are added as soon as they are available.aws-pds,genetic,genomic,life sciences,reference index,vcfThere are no restrictions on the use of this data. More information on citation is available [here](https://www.nist.gov/programs-projects/genome-bottle).
glad-landsat-ardGLAD Landsat ARDThe Landsat Analysis Ready Data (ARD) created by the Global Land Analysis and Discovery Lab (GLAD) at the University of Maryland serves as a spatially and temporally consistent input for land cover ma…[Global Land Analysis and Discovery Lab](https://glad.umd.edu/)New 16-day composites are added monthly.agriculture,earth observation,satellite imagery,geospatial,natural resource,cogThe GLAD ARD imposes no restrictions on subsequent redistribution or use, provided proper citation is given following the Creative Commons Attribution License (CC BY).
glo-30-handGlobal 30m Height Above Nearest Drainage (HAND)Height Above Nearest Drainage (HAND) is a terrain model that normalizes topography to the relative heights along the drainage network and is used to describe the relative soil gravitational potentials…[The Alaska Satellite Facility (ASF)](https://asf.alaska.edu/)None, except HAND may be updated if the [Copernicus GLO-30 Public](https://registry.opendata.aws/copernicus-dem/) dataset is updated. aws-pds,elevation,hydrology,agriculture,disaster response,satellite imagery,geospatial,cog,stacCopyright 2022 Alaska Satellite Facility (ASF). Produced using the Copernicus WorldDEM™-30 © DLR e.V. 2010-2014 and © Airbus Defence and Space GmbH 2014-2018 provided under COPERNICUS by the European…
global-drought-flood-catalogueA Global Drought and Flood Catalogue from 1950 to 2016Hydrological extremes, in the form of droughts and floods, have impacts on a wide range of sectors including water availability, food security, and energy production, among others. Given continuing la…[PREP-NexT Lab](https://github.com/PREP-NexT)No future updates planned.aws-pds,floods,global,netcdf,precipitation,near-surface specific humidity,near-surface air temperatureGDFC archive is made available under [CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/).
gmsdataThe Genome Modeling SystemThe Genome Institute at Washington University has developed a high-throughput, fault-tolerant analysis information management system called the Genome Modeling System (GMS), capable of executing compl…Genome Institute at the Washington University School of Medicine in St. LouisNot updatedaws-pds,genetic,genomic,life sciences[GNU Lesser General Public License v3.0](https://github.com/genome/gms/blob/ubuntu-12.04/LICENSE)
gnomad-data-lakehouse-readyGenome Aggregation Database (gnomAD) - Data Lakehouse ReadyThe [Genome Aggregation Database (gnomAD)](https://gnomad.broadinstitute.org/) is a resource developed by an international coalition of investigators that aggregates and harmonizes both exome and geno…[Amazon Web Services](https://aws.amazon.com/)Not updatedbiology,bioinformatics,biotech blueprint,genomic,genetic,life sciences,parquet,population genetics,vcf,whole genome sequencing[MIT](https://github.com/broadinstitute/gnomad_methods/blob/master/LICENSE); [terms of use](https://gnomad.broadinstitute.org/terms)true
gnss-ro-opendataEarth Radio OccultationThis is an updating archive of radio occultation (RO) data using the transmitters of the Global Navigation Satellite Systems (GNSS) as generated and processed by the COSMIC DAAC (ucar), the Jet Propul…Verisk Atmospheric and Environmental Research, Inc.The dataset is updated monthly for UCAR and ROM SAF contributions only. The update frequency for the JPL contribution is to be determined.aws-pds,atmosphere,climate,earth observation,global,signal processing,weatherA Creative Commons open-use licence (https://www.ucar.edu/terms-of-use/data) applies to the data contributed by UCAR; the EUMETSAT Data Policy (https://www.eumetsat.int/eumetsat-data-licensing) applie…
google-brain-genomics-publicGoogle Brain Genomics Sequencing Dataset for Benchmarking and DevelopmentTo facilitate benchmarking and development, the Google Brain group has sequenced 9 human samples covering the Genome in a Bottle truth sets on different sequencing instruments, sequencing modalities (…Amazon Web ServicesOccasionally as new derived files (alignment files or variant call files) are generatedamazon.science,bioinformatics,life sciences,genetic,genomic,fastq,short read sequencing,long read sequencing,whole exome sequencing,whole genome sequencing[CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/)
google-ngramsGoogle Books NgramsN-grams are fixed size tuples of items. In this case the items are words extracted from the Google Books corpus. The n specifies the number of elements in the tuple, so a 5-gram contains five words or…Not managedNot updatedamazon.science,natural language processingCreative Commons Attribution 3.0 Unported License
graf-reforecastGRAF ReforecastA zarr-formatted dataset of 1836 reforecast cases (approx. 5 years) from The Weather Company GRAF (Global high-Resolution Atmospheric Forecasting) model, a version of the National Center for Atmospher…[The Weather Company](https://www.weathercompany.com/)One time push onlyatmosphere,forecast,geoscience,geospatial,model,near-surface air temperature,near-surface relative humidity,precipitation,wind speeds,cloud amount[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
green_etIWMI DIWASA Green ET for AfricaGreen evapotranspiration (Green ET) is the portion of ET derived from green water, which includes soil moisture and rainfall used by vegetation. It represents a key component of green water fluxes in…[IWMI](https://www.iwmi.org/)Nonesoil moisture,rainfed cropland,interception loss,evapotranspiration,waterCreative commons open license
gretel-synthetic-safety-alignment-en-v1Gretel Synthetic Safety Alignment DatasetA comprehensive dataset designed for aligning language models with safety and ethical guidelines. Contains 8,361 curated triplets of prompts, responses, and safe responses across various risk categori…[Gretel.ai](https://gretel.ai)Static dataset, version 1.0 (Released December 2024)machine learning,natural language processing,ai safety,synthetic dataApache License 2.0 (https://www.apache.org/licenses/LICENSE-2.0)
grillo-openeewOpenEEWGrillo has developed an IoT-based earthquake early-warning system, with sensors currently deployed in Mexico, Chile, Puerto Rico and Costa Rica, and is now opening its entire archive of unprocessed ac…[Grillo](https://grillo.io/)Approximately every 5 minutesdisaster response,earth observation,earthquakes,aws-pdshttps://github.com/openeew/openeew#license
gtgseqGarvan Institute Long Read Sequencing Benchmark DataThe dataset contains reference samples that will be useful for benchmarking and comparing bioinformatics tools for genome analysis. Examples include: NA12878 (HG001) and NA24385 (HG002) sequenced on a…Genomic Technologies Group, Garvan Institute of Medical Research (https://www.garvan.org.au/research/labs-groups/genomic-technologies-lab)We plan to extend this open dataset with additional samples, including sequencing runs from vendors other than ONT. We will continue to provide updated basecalls when there is a major update to the ba…genomic,life sciences,long read sequencing,bioinformatics,aws-pds[CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/)
gulfwide-avian-monitoringGulfwide Avian Colony Monitoring Survey PhotosFor this project, The Water Institute (the Institute) and subcontractor Colibri Ecological Consulting, LLC (Colibri) utilized established methods and protocols capable of assessing changes of colonial…[CPRA](https://coastal.la.gov/) and [The Water Institute](https://thewaterinstitute.org/)~2 yearsbiology,conservation,ecosystems,object detection,labeled,environmental,aws-pdsCreative Commons BY-SA
guys-breast-cancer-lymph-nodesGuy's Breast Cancer Lymph Nodes (GRAPE)This is a retrospective dataset of 1523 H&E-stained whole slide images (WSI) of lymph nodes from breast cancer patients. The cohort consisted of 177 patients (122 LN-positive - metastasis was reported…http://www.cancerbioinformatics.co.ukNew data will be added as soon as it's availableaws-pds,biology,cancer,life sciences,breast cancer,computational pathology,histopathologyThis work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
hail-vep-pipelineVariant Effect Predictor (VEP) and the Loss-Of-Function Transcript Effect Estimator (LOFTEE) PluginVEP determines the effect of genetic variants (SNPs, insertions, deletions, CNVs or structural variants) on genes, transcripts, and protein sequence, as well as regulatory regions. The European Bioinf…[Tennex](https://www.tennex.io/)New packages are added as soon as they are available and confirmed to work with recent versions of Hail.aws-pds,genome wide association study,genomic,life sciences,vep,loftee[VEP](https://uswest.ensembl.org/info/about/publications.html) use is governed by the Apache 2.0 licenses, and [LOFTEE](https://github.com/konradjk/loftee/blob/master/LICENSE) use is governed by the M…
hcmi-cmdcHuman Cancer Models Initiative (HCMI) Cancer Model Development CenterThe Human Cancer Models Initiative (HCMI) is an international consortium that is generating novel, next-generation, tumor-derived culture models annotated with genomic and clinical data. HCMI-develope…[Center for Translational Data Science at The University of Chicago](https://ctds.uchicago.edu/)Genomic Data Commons (GDC) is source of truth for this dataset; GDC offers monthly data releases, although this dataset may not be updated at every release. aws-pds,cancer,genomic,life sciences,whole genome sequencing,STRIDESNIH Genomic Data Sharing Policy https://gdc.cancer.gov/access-data/data-access-policies
hcp-openaccessThe Human Connectome ProjectThe Human Connectome Project (HCP Young Adult, HCP-YA) is mapping the healthy human connectome by collecting and freely distributing neuroimaging and behavioral data on 1,200 normal young adults, aged…[Connectome Coordination Facility](https://www.humanconnectome.org/ccf-staff)Uncertainaws-pds,biology,imaging,neurobiology,neuroimaging,neuroscience,life sciences[HCP Data Use Agreement](https://www.humanconnectome.org/storage/app/media/data_use_terms/DataUseTerms-HCP-Open-Access-26Apr2013.pdf)
hecatombHecatomb DatabasesPreprocessed databases for use with the Hecatomb pipeline for viral and phage sequence annotation.[Washington University in St. Louis](https://wustl.edu/)Every 6 to 12 monthsaws-pds,life sciences,genetic,genomic,metagenomics,bioinformatics,whole genome sequencing,virus[MIT](https://opensource.org/licenses/MIT)
helpful-sentences-from-reviewsHelpful Sentences from ReviewsA collection of sentences extracted from customer reviews labeled with their helpfulness score.[Amazon](https://www.amazon.com/)Not updatedamazon.science,information retrieval,natural language processing,text analysis,jsonThis data is available for anyone to use under the terms of the CDLA-Sharing license, which is available [here](https://cdla.dev/sharing-1-0/)
hirlamHIRLAM Weather ModelHIRLAM (High Resolution Limited Area Model) is an operational synoptic and mesoscale weather prediction model managed by the Finnish Meteorological Institute.[Finnish Meteorological Institute](https://www.ilmatieteenlaitos.fi/)The data is updated four times a day with analysis hours 00, 06, 12 and 18. Corresponding model runs are available roughly five hours after analysis time (~ after model run has started).aws-pds,agriculture,earth observation,climate,weather,meteorologicalCreative Commons Attribution 4.0 International (CC BY 4.0)
hpgp-dataHuman PanGenomics ProjectThis dataset includes sequencing data, assemblies, and analyses for the offspring of ten parent-offspring trios.[Human Pangenome Reference Consortium](https://humanpangenome.org/)Data will be added and updated as technologies improve or new data uses are encountered.aws-pds,genomic,genetic,life sciences,fastq,fast5,cram[Creative Commons CC0 1.0 Universal](https://creativecommons.org/publicdomain/zero/1.0/deed.en)
hprc-epigenomeEpigenomes of the Human Pangenome Reference Consortium (HPRC) Release 2The Human Pangenome Reference Consortium (HPRC) Release 2 represents a landmark achievement in genomics, providing high-quality phased genome assemblies from over 200 individuals with comprehensive fu…Ting Wang Lab (https://wang.wustl.edu/)Annual. The repository will be updated with each new batch of data as it is generated and released under the next HPRC yearly cycle.biology,bioinformatics,genetic,genomic,epigenomics,life sciences,aws-pdsExternal data users may freely download, analyze, and publish results based on any HPRC data provided here without restrictions.
hsip-lidar-us-citiesHomeland Security and Infrastructure US CitiesThe U.S. Cities elevation data collection program supported the US Department of Homeland Security Homeland Security and Infrastructure Program (HSIP). As part of the HSIP Program, there were 133+ U.S…[Hobu, Inc.](https://hobu.co)Periodicallyaws-pds,elevation,disaster response,geospatial,lidarUS Government Public Domain https://www.usgs.gov/faqs/what-are-terms-uselicensing-map-services-and-data-national-map
huj-herbariumNational Herbarium of IsraelOur collection encompasses approximately one million vascular plant specimens from the Mediterranean and Middle East biodiversity hotspot, representing flora from Israel, Jordan, Hermon, Sinai, Egypt,…National Natural History Collections, The Hebrew University of JerusalemMonthlybiology,life sciences,biodiversity,environmental,climate,digital preservation,imaging,image processing,aws-pdsCC-BY-SA 4.0
human-microbiome-projectThe Human Microbiome ProjectThe NIH-funded Human Microbiome Project (HMP) is a collaborative effort of over 300 scientists from more than 80 organizations to comprehensively characterize the microbial communities inhabiting the…[The National Institutes of Health Office of Strategic Coordination - The Common Fund](https://commonfund.nih.gov/hmp)Uncertainaws-pds,life sciences,genetic,genomic,metagenomics,microbiome,fasta,amino acid,fastqThe data is publicly available to the community free of charge.
humancellatlasHuman Cell AtlasThe Human Cell Atlas (HCA) is a collaborative community of international scientists. Our mission is to create comprehensive reference maps of all the cells in the human body as a basis for both unders…UC Santa Cruz Genomics Institute, University of California, Santa Cruz, UCSCMonthlylife sciences,biology,cell biology,genome,genomic,transcriptomics,gene expression,single-cell transcriptomics,cell imaging,Homo sapienshttps://data.humancellatlas.org/about/data-use-agreement
humor-detectionHumor Detection from Product Question Answering SystemsThis dataset provides labeled humor detection from product question answering systems. The dataset contains 3 csv files: [Humorous.csv](https://humor-detection-pds.s3-us-west-2.amazonaws.com/Humorous.…[Amazon](https://www.amazon.com/)Not currently being updatedamazon.science,natural language processing,machine learning[CDLA-Sharing](https://cdla.io/sharing-1-0/)
humor-patternsHumor patterns used for querying Alexa trafficHumor patterns used for querying Alexa traffic when creating the taxonomy described in the paper "“Alexa, Do You Want to Build a Snowman?” Characterizing Playful Requests to Conversational Agents" by…[Amazon](https://www.amazon.com/)Not currently updatedamazon.science,dialog,natural language processing,machine learningCDLA-Sharing license
hycom-global-driftersHYCOM-OceanTrack Integrated HYCOM Eulerian Fields and Lagrangian Trajectories DatasetA combined dataset of simulated ocean sea surface height, near-surface velocities, and particle trajectories from a global 1/25th degree HYbrid Coordinate Ocean Model (HYCOM) 1-year run.Shane ElipotNot updatedaws-pds,drifters,Eulerian,HYCOM,Lagrangian,oceans,ocean simulation,ocean circulation,ocean currents,ocean velocityThere are no restrictions on the use of this data.
hycom-gofs-3pt1-reanalysisHYbrid Coordinate Ocean Model Global Ocean Forecast System ReanalysisGlobal Ocean Forecasting System (GOFS) 3.1 output on the GLBv0.08 grid. The resolution is 0.08° between 40°S and 40°N and 0.04° poleward of these latitudes. The temporal frequency is 3 hourly…[COAPS](https://www.coaps.fsu.edu/)Static Dataset Covering 1994-01-01 to 2015-12-31aws-pds,global,oceansThere are no restrictions on the use of this data.
ibl-autismIBL Neuropixels Brainwide Map on AWSElectrophysiological recordings of mouse brain activity acquired during a decision making task in multiple autism mice models.[International Brain Laboratory](https://www.internationalbrainlab.com)TBDaws-pds,life sciences,neuroscience,neurophysiology,open source software,Mus musculus,autism spectrum disorderCC-BY 4.0
ibl-behaviourIBL Behavioral Data on AWSBehavioral data of mice performing a decision-making task, associated with 2020 publication of the IBL.[International Brain Laboratory](https://www.internationalbrainlab.com)TBDaws-pds,life sciences,neuroscience,neurophysiology,open source software,Mus musculusCC-BY 4.0
ibl-brain-wide-mapIBL Neuropixels Brainwide Map on AWSElectrophysiological recordings of mouse brain activity acquired during a decision making task.[International Brain Laboratory](https://www.internationalbrainlab.com)TBDaws-pds,life sciences,neuroscience,neurophysiology,open source software,Mus musculusCC-BY 4.0
ibl-reproducible-ephysIBL Neuropixels Reproducible Ephys Data on AWSElectrophysiological recordings acquired using Neuropixels probes in different mice and labs, targeting the same brain locations (including posterior parietal cortex, hippocampus, and thalamus).[International Brain Laboratory](https://www.internationalbrainlab.com)TBDaws-pds,life sciences,neuroscience,neurophysiology,open source software,Mus musculusCC-BY 4.0
iceye-opendataICEYE Synthetic Aperture Radar (SAR) Open DatasetICEYE operates the world’s largest constellation of synthetic aperture radar (SAR) satellites, delivering unmatched access to persistent, high-resolution Earth observation data regardless of time of d…[ICEYE](https://www.iceye.com)New data is added frequently.aws-pds,synthetic aperture radar,stac,earth observation,satellite imagery,image processing,geospatial,computer vision,disaster responseThe data is provided under the Creative Commons License [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/), which gives the user the right to share, copy, and redistribute the material in any m…
icgcICGC on AWSThe International Cancer Genome Consortium (ICGC) coordinates projects with the common aim of accelerating research into the causes and control of cancer. The PanCancer Analysis of Whole Genomes (PCAW…[International Cancer Genome Collaboratory](https://dcc.icgc.org/)New data is added as soon as it is available.aws-pds,cancer,genetic,genomic,life sciences,bam,vcfData use is subject to the access and publication policies of the source. Distribution of the data is subject to ICGC Trusted Partner Approval. More information on terms of use is available at https:/…
ichangemycityIChangeMyCity Complaints Data from JanaagrahaThe [IChangeMyCity](https://www.ichangemycity.com) project provides insight into the complaints raised by citizens from different cities of India related to the issues in their neighbourhoods and the re…Not currently managedDailyaws-pds,cities,civic,complaintsThere are no restrictions on the use of data received from IChangeMyCity.com. More information on licensing and IChangeMyCity data citation is available from IChangeMyCity.comtrue
ideam-radaresIDEAM - Colombian Radar NetworkHistorical and one-day delay data from the IDEAM radar network.[IDEAM](http://www.ideam.gov.co/)Updated level II data is added as soon as it is available.aws-pds,agriculture,earth observation,natural resource,weather,meteorologicalCreative Commons Attribution 4.0 International (CC BY 4.0)
igvf-consortiumThe Impact of Variation on Function Consortium (IGVF)The IGVF (Impact of Genomic Variation on Function) Consortium aims to understand how genomic variation affects genome function, which in turn impacts phenotype. The NHGRI is funding this collaborativ…IGVF Data Administration and Coordination Center at Stanford UniversityDailyaws-pds,biology,bioinformatics,genetic,genomic,life sciencesCC BY 4.0 - https://creativecommons.org/licenses/by/4.0/ - You are free to share and adapt this data with proper attribution
ihartiHART Whole Genome Sequencing Data SetiHART is the [Hartwell Foundation](http://www.thehartwellfoundation.org/)’s Autism Research and Technology Initiative. This release contains whole genome data from over 1000 families with 2 or more ch…[Stanford University](https://wall-lab.stanford.edu/projects/ihart/)The dataset may be updated with additional or corrected data on a need-to-update basis.aws-pds,autism spectrum disorder,genetic,genomic,life sciences,whole genome sequencing,bam,vcfData use is subject to the access and publication policies of iHART. More information on terms of use is available at the [iHART website](http://www.ihart.org/)
ilmn-dragen-1kgp1000 Genomes Phase 3 Reanalysis with DRAGEN 3.5, 3.7, 4.0, 4.2, and 4.4This dataset contains alignment files and small variant (includes single nucleotide variants (SNV) and indels), copy number variant (CNV), short tandem repeat (*i.e.*, repe…[Illumina, Inc.](https://www.illumina.com/products/by-type/informatics-products/dragen-bio-it-platform.html)Files may be updated subsequent to changes to the 1000 Genomes Project data set or select new DRAGEN features or offerings.aws-pds,bam,bioinformatics,biology,cram,genetic,genomic,genotyping,life sciences,machine learningTBD
in-elevationIndiana Statewide Elevation CatalogThe State of Indiana Geographic Information Office and IOT Office of Technology manage a series of digital LiDAR LAS files stored in AWS, dating back to the 2011-2013 collection and including the NRCS…Indiana Geographic Information OfficeThe State of Indiana has another four-year cycle of collecting orthoimagery and Lidar starting in 2025 and continuing through 2028. The collections are designated by counties in three groups that cov…lidar,aws-pds,earth observation,geospatial,imaging,mapping,natural resource,sustainability,agricultureAccess to Indiana Geographic Information Office Lidar is governed by Creative Commons 0 (CC0): https://creativecommons.org/publicdomain/zero/1.0/legalcode
in-imageryIndiana Statewide Digital Aerial Imagery CatalogThe State of Indiana Geographic Information Office and IOT Office of Technology manage a series of digital orthophotography dating back to 2005. Every year's worth of imagery is available as Cloud Op…Indiana Geographic Information OfficeThe State of Indiana has had a 4-year cycle collecting imagery. The collections are designated by counties in three groups that cover Indiana, South to North. These areas are frequently referred to…aerial imagery,aws-pds,earth observation,geospatial,imaging,mapping,cog,natural resource,sustainability,agricultureAccess to Indiana Geographic Information Office Orthoimagery is governed by Creative Commons 0 (CC0): https://creativecommons.org/publicdomain/zero/1.0/legalcode
inaturalist-open-dataiNaturalist Licensed Observation ImagesiNaturalist is a community science effort in which participants share observations of living organisms that they encounter and document with photographic evidence, location, and date. The community wo…[iNaturalist](https://www.inaturalist.org/) is an independent, tax-exempt, 501(c)(3), not-for-profit organization based in the United States of America (EIN/Tax ID: 92-1296468).Images are posted in real time, and we are currently copying over images from existing observations. Metadata is updated monthly. More information on the metadata can be found in the [documentation](h…aws-pds,earth observation,biodiversity,bioinformatics,conservation,life sciencesCreative Commons or Public Domain (CC0), varying by image. More information on how to query and properly treat licenses can be found in the <a href="https://github.com/inaturalist/inaturalist-open-dat…
indian-high-court-judgmentsIndian High Court JudgmentsThis dataset contains judgements from the Indian High Courts, downloaded from ecourts website. It contains judgments of 25 high courts, along with raw metadata (in json format) and structured metadata…[Dattam Labs](https://dattam.in)Quarterlylegal dataCC-BY-4.0
indian-supreme-court-judgmentsIndian Supreme Court JudgmentsThis dataset contains judgements from the Indian Supreme Court, downloaded from ecourts website. It contains judgments from 1950 to 2025, along with raw metadata (in json format) and structured metada…[Dattam Labs](https://dattam.in)Bi-monthlylegal data,aws-pdsCC-BY-4.0
inlab-covid-19-images-datasetInRad COVID-19 X-Ray and CT ScansThis dataset is a collection of anonymized thoracic radiographs (X-Rays) and computed tomography (CT) scans of patients with suspected COVID-19. Images are accompanied by a positive or negative diagno…[Faculdade de Medicina da Universidade de São Paulo Institute of Radiology (InRad)](http://inrad.hc.fm.usp.br)As Necessaryaws-pds,bioinformatics,coronavirus,COVID-19,health,life sciences,medicine,SARSCreative Commons Attribution 4.0 International (CC BY 4.0)
intelinair_agriculture_visionAgricultureVisionAgriculture-Vision aims to be a publicly available large-scale aerial agricultural image dataset that is high-resolution, multi-band, and with multiple types of patterns annotated by agronomy experts.…Intelinair, Inc.Periodicallyaws-pds,aerial imagery,agriculture,computer vision,deep learning,machine learningProvided in the bucket.
intelinair_corn_kernel_countingCorn Kernel Counting DatasetDataset associated with the March 2021 Frontiers in Robotics and AI paper "Broad Dataset and Methods for Counting and Localization of On-Ear Corn Kernels", DOI: 10.3389/frobt.2021.627009Intelinair, Inc.Periodicallyaws-pds,agriculture,computer vision,machine learningProvided in the bucket.
intelinair_longitudinal_nutrient_deficiencyLongitudinal Nutrient DeficiencyDataset associated with the 2021 AAAI Paper- Detection and Prediction of Nutrient Deficiency Stress using Longitudinal Aerial Imagery. The dataset contains 3 image sequences of aerial imagery from 38…Intelinair, Inc.Periodicallyaws-pds,aerial imagery,agriculture,computer vision,deep learning,machine learningProvided in the bucket.
io-lulc10m Annual Land Use Land Cover (9-class)This dataset, produced by Impact Observatory, Microsoft, and Esri, displays a global map of land use and land cover (LULC) derived from ESA Sentinel-2 imagery at 10 meter resolution for the years 2017…[Impact Observatory](https://www.impactobservatory.com/)A new year is made available annually, each January. A new time series was provided July 2023 to reduce anomalous change.aws-pds,earth observation,environmental,geospatial,satellite imagery,sustainability,stac,cog,land cover,land use[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
irs990IRS 990 Filings**On December 16, 2021 the IRS announced that it would [discontinue updates to the IRS 990 Filings dataset on AWS, starting December 31, 2021](https://www.irs.gov/newsroom/irs-makes-tax-exempt-organiz…The Internal Revenue ServiceNew filings are added regularlyaws-pds,regulatory,statisticsNonetrue
isdasoiliSDAsoiliSDAsoil is a resource containing soil property predictions for the entire African continent, generated using machine learning. Maps for over 20 different soil properties have been created at 2 differ…[Innovative Solutions for Decision Agriculture (iSDA)](https://www.isda-africa.com/)Based upon the availability of new dataagriculture,analytics,aws-pds,biodiversity,conservation,deep learning,food security,geospatial,machine learning,satellite imageryCC-BY 4.0
iservISERVISS SERVIR Environmental Research and Visualization System (ISERV) was a fully-automated prototype camera aboard the International Space Station that was tasked to capture high-resolution Earth imager…[Radiant Earth Foundation](https://www.radiant.earth/)Not updatedaws-pds,geospatial,earth observation,satellite imagery,environmentalThe data is released under a ODC Public Domain Dedication & License 1.0 ([PDDL-1.0](https://spdx.org/licenses/PDDL-1.0.html)).
isic-archiveInternational Skin Imaging Collaboration (ISIC) ArchiveA public-access archive of skin lesion images, supporting teaching, research, and the development and evaluation of diagnostic algorithms.International Skin Imaging Collaboration (ISIC)Upon new data ingest from contributors.biology,cancer,classification,computational pathology,dicom,grand-challenge.org,health,Homo sapiens,imaging,life sciencesCreative Commons licenses (CC-0, CC-BY, or CC-BY-NC) are defined per-image.
its-live-dataInter-mission Time Series of Land Ice Velocity and Elevation (ITS_LIVE)The Inter-mission Time Series of Land Ice Velocity and Elevation (ITS_LIVE) project has a singular mission: to accelerate ice sheet and glacier research by producing globally comprehensive, high resol…[The Alaska Satellite Facility (ASF)](https://asf.alaska.edu/)Up to daily, as new satellite imagery is made available.aws-pds,ice,earth observation,satellite imagery,geophysics,geospatial,global,cog,netcdf,zarr[Creative Commons Zero (CC0) 1.0 Universal License](https://creativecommons.org/publicdomain/zero/1.0/)
janelia-cosemCell Organelle Segmentation in Electron Microscopy (COSEM) on AWSHigh resolution images of subcellular structures.[Janelia Research Campus](https://www.janelia.org/)New datasets and derived data are added as soon as they are available.aws-pds,cell biology,computer vision,electron microscopy,life sciences,imaging,organelleCC-BY-4.0
janelia-flylightFly Brain Anatomy: FlyLight Gen1 and Split-GAL4 ImageryThis data set, made available by Janelia's FlyLight project, consists of fluorescence images of Drosophila melanogaster driver lines, aligned to standard templates, and stored in formats suitable fo…[Janelia Research Campus](https://www.janelia.org)As requiredaws-pds,biology,life sciences,imaging,neuroscience,neurobiology,neuroimaging,fluorescence imaging,microscopy,image processing[Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/)
janelia-mouselightMouse Brain Anatomy: MouseLight ImageryThis data set, made available by Janelia's MouseLight project, consists of images and neuron annotations of the Mus musculus brain, stored in formats suitable for viewing and annotation using the Hor…[Janelia Research Campus](https://www.janelia.org)As requiredaws-pds,biology,life sciences,imaging,neuroscience,neurobiology,neuroimaging,fluorescence imaging,microscopy,image processing[Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/)
japan_pointcloudJapan Prefectures, 3D Point Cloud DataThis dataset comprises high-precision 3D point cloud data that covers all prefectures throughout Japan. The data is produced through aerial laser surveys, airborne laser bathymetry, and mobile mapping…[AIGID](https://aigid.jp/)Currently not scheduledaws-pds,disaster response,elevation,geospatial,japanese,land,lidar,mappingCreative Commons Attribution 4.0 International (CC-BY 4.0)
jaxa-alos-palsar2-scansarPALSAR-2 ScanSAR CARD4L (L2.2)The 25 m PALSAR-2 ScanSAR is normalized backscatter data of PALSAR-2 broad area observation mode with observation width of 350 km. The SAR imagery was ortho-rectified and slope corrected using t…[JAXA](https://www.jaxa.jp/)Every month after 42 days observedaws-pds,agriculture,earth observation,satellite imagery,geospatial,natural resource,sustainability,disaster response,synthetic aperture radar,deafricaData is available for free under the [terms of use](https://earth.jaxa.jp/policy/en.html).
jaxa-usgs-nasa-kaguya-tc-dtmsJAXA / USGS / NASA Kaguya/SELENE Terrain Camera Digital Terrain ModelsThe Japan Aerospace EXploration Agency (JAXA) SELenological and ENgineering Explorer (SELENE) mission’s Kaguya spacecraft was launched on September 14, 2007 and science operations around the Moon star…[NASA](https://www.nasa.gov)The Kaguya/SELENE mission has completed. At least one update to this dataset is planned to address identified issues with the nominal swath width evening observations.aws-pds,planetary,elevation,stac,cog[CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/)
jaxa-usgs-nasa-kaguya-tcJAXA / USGS / NASA Kaguya/SELENE Terrain Camera ObservationsThe Japan Aerospace EXploration Agency (JAXA) SELenological and ENgineering Explorer (SELENE) mission’s Kaguya spacecraft was launched on September 14, 2007 and science operations around the Moon star…[NASA](https://www.nasa.gov)The Kaguya/SELENE mission has completed. No updates to this dataset are planned.aws-pds,planetary,satellite imagery,stac,cog[CC0 1.0](https://creativecommons.org/publicdomain/zero/1.0/)
jhu-indexesCloud Indexes for Bowtie, Kraken, HISAT, and CentrifugeGenomic tools use reference databases as indexes to operate quickly and efficiently, analogous to how web search engines use indexes for fast querying. Here, we aggregate genomic, pan-genomic and meta…Langmead Lab at Johns Hopkins University & Kim Lab at University of Texas SouthwesternAs new data becomes available; roughly quarterlyaws-pds,genomic,bioinformatics,biology,whole genome sequencing,medicine,reference index,mapping,life sciencesPublic Domain
kaiju-indexesIndexes for KaijuThis dataset comprises pre-built indexes for the bioinformatics software Kaiju, which is used for taxonomic classification of metagenomic sequencing data. Various indexes for different source referenc…[Peter Menzel](https://github.com/pmenzel)roughly yearlyaws-pds,bioinformatics,biology,genomic,life sciences,whole genome sequencing,reference index,metagenomics,microbiomePublic Domain
kanagawa_pointcloudKanagawa, 3D Point Cloud DataThis dataset comprises high-precision 3D point cloud data that encompasses the entire Kanagawa prefecture in Japan. The data is produced through aerial laser survey, airborne laser bathymetry and mobi…[AIGID](https://aigid.jp/)Currently not scheduledaws-pds,disaster response,elevation,geospatial,japanese,land,lidar,mappingCreative Commons Attribution 4.0 International (CC-BY 4.0)
keplerKepler Mission DataThe Kepler mission observed the brightness of more than 180,000 stars near the Cygnus constellation at a 30 minute cadence for 4 years in order to find transiting exoplanets, study variable stars, and…[Space Telescope Science Institute](http://www.stsci.edu/)Neverastronomy,aws-pdsPublic domain. Attribution required for refereed scientific papers.
kids-firstGabriella Miller Kids First Pediatric Research Program (Kids First)The NIH Common Fund's Gabriella Miller Kids First Pediatric Research Program’s (“Kids First”) vision is to “alleviate suffering from childhood cancer and structural birth defects by fostering collabor…[The Gabriella Miller Kids First Data Resource Center at the Children's Hospital of Philadelphia](https://kidsfirstdrc.org/)Data is updated on a rolling basis by the KFDRC to make data available as rapidly as possible under the NIH Genomic Data Sharing Policy. aws-pds,life sciences,cancer,genetic,genomic,Homo sapiens,pediatric,structural birth defect,whole genome sequencing,STRIDESNIH Genomic Data Sharing Policy: https://gdc.cancer.gov/access-data/data-access-policies
kittiKITTI Vision Benchmark SuiteDataset and benchmarks for computer vision research in the context of autonomous driving. The dataset has been recorded in and around the city of Karlsruhe, Germany using the mobile platform AnnieWay…[Max Planck Campus Tübingen](http://tue.mpg.de/)Not updatedaws-pds,autonomous vehicles,computer vision,robotics,machine learning,deep learningCreative Commons Attribution-NonCommercial-ShareAlike 3.0 http://creativecommons.org/licenses/by-nc-sa/3.0/
klarna_productpage_datasetThe Klarna Product-Page DatasetA collection of 51,701 product pages from 8175 e-commerce websites across 8 markets (US, GB, SE, NL, FI, NO, DE, AT) with 5 manually labelled elements, specifically, the product price, name and image,…Web Automation Research, KlarnaThe dataset is not expected to update frequently.aws-pds,internet,natural language processing,computer vision,commerce,deep learning,machine learning,information retrieval,graphCC BY-NC-SAtrue
kraken2-ncbi-refseq-complete-v205Kraken2 NCBI RefSeq Complete V205 database on AWSDatabase for use with Kraken2 (taxonomic annotation of metagenomic sequencing reads) including all NCBI RefSeq genomes available in release V205Robyn WrightThis database is currently what was published in our 2023 paper comparing the performance of MetaPhlAn3 and Kraken2, but an updated database may be added in the future.aws-pds,metagenomics,microbiome,benchmark,bioinformatics,life sciencesThere are no restrictions on the use of this data.
krepprefReference Indexes for kreppkrepp is an alignment-free method for estimating distances and phylogenetic placement of individual reads to many thousands of reference genomes in a scalable manner using k-mers. This dataset include…Mirarab Lab at UC San DiegoQuarterly or as new data becomes availablebioinformatics,metagenomics,microbiome,reference index,aws-pds,life sciencesGPL-3.0 license. Use of the data should be cited in the usual way, following https://github.com/bo1929/krepp/tree/master?tab=readme-ov-file#citation.
kyfromaboveKyFromAbove on AWSThe KyFromAbove initiative is focused on building and maintaining a current basemap for Kentucky that can meet the needs of its users at the state, federal, local, and regional level. A common basemap…[Kentucky Division of Geographic Information](https://kygeonet.ky.gov)KyFromAbove data are typically updated on an annual basis. Each year, a portion of the state is acquired with an overall update cycle of every three to four years. This update cadence is determined by…aerial imagery,aws-pds,cog,dtm,disaster response,earth observation,elevation,geopackage,geospatial,lidarPublic Domain with Attribution
lab41-sri-voicesVoices Obscured in Complex Environmental Settings (VOiCES)VOiCES is a speech corpus recorded in acoustically challenging settings, using distant microphone recording. Speech was recorded in real rooms with various acoustic features (reverb, echo, HVAC system…[In-Q-Tel](https://www.iqt.org/)Data from two additional rooms will be added to the corpus Fall 2018.aws-pds,machine learning,automatic speech recognition,speaker identification,denoising,speech processingCreative Commons BY 4.0 (see [here](https://voices.lab41.org) for more details)
ladiLow Altitude Disaster Imagery (LADI) DatasetThe Low Altitude Disaster Imagery (LADI) Dataset consists of human and machine annotated airborne images collected by the Civil Air Patrol in support of various disaster responses from 2015-2023. Two…[MIT Lincoln Laboratory Humanitarian Assistance and Disaster Relief group](https://www.ll.mit.edu/r-d/biotechnology-and-human-systems/humanitarian-assistance-and-disaster-relief-systems)Periodicallyaws-pds,aerial imagery,coastal,computer vision,disaster response,earth observation,earthquakes,geospatial,imaging,image processingCreative Commons Attribution 4.0 International (CC BY 4.0)
landsat-8Landsat 8An ongoing collection of satellite imagery of all land on Earth produced by the Landsat 8 satellite. [Planet](https://www.planet.com/)New Landsat 8 scenes are added regularly as soon as they are available.aws-pds,agriculture,earth observation,satellite imagery,geospatial,natural resource,disaster responseThere are no restrictions on the use of data received from the U.S. Geological Survey's Earth Resources Observation and Science (EROS) Center or NASA's Land Processes Distributed Active Archive Center…true
leiLegal Entity Identifier (LEI) and Legal Entity Reference Data (LE-RD)The Legal Entity Identifier (LEI) is a 20-character, alpha-numeric code based on the ISO 17442 standard developed by the International Organization for Standardization (ISO). It connects to key refere…[GLEIF](http://www.gleif.org/)Three times daily (about every 8 hours)analytics,aws-pds,blockchain,climate,commerce,copyright monitoring,csv,financial markets,governance,government spendingCreative Commons (CC0) license
leopardLEarning biOchemical Prostate cAncer Recurrence from histopathology sliDes challenge (LEOPARD) DatasetThis dataset contains all the data for the [LEarning biOchemical Prostate cAncer Recurrence from histopathology sliDes challenge or LEOPARD](https://leopard.grand-challenge.org/). Prostate cancer, imp…Radboud University Medical CenterAs requiredaws-pds,life sciences,cancer,computational pathology,grand-challenge.org,histopathology,deep learning,computer visionCC BY NC SA
lgnd-clay-v1-5-sentinel2LGND Clay v1.5 Sentinel-2A global dataset of Clay v1.5 embeddings for Sentinel2.[Source Cooperative](https://source.coop/)As new data versions become availableaws-pds,satellite imagery,earth observation,machine learning,imaging,computer visionCreative Commons Attribution 4.0 International License
loc-sanborn-mapsSanborn Maps Data PackageThe dataset contains metadata records for 50,600 maps from the [Sanborn Fire Insurance Maps collection](https://www.loc.gov/collections/sanborn-maps/) and their corresponding 440,048 JPEG images. The…[Library of Congress](https://www.loc.gov/)As new and significant changes to the underlying digital collection occuraws-pds,archives,cities,computer vision,conservation,culture,cultural preservation,demographics,digital assets,geospatialThe content of the Library of Congress online Sanborn Maps Collection is in the public domain and is free to use and reuse. For more information, see https://www.loc.gov/collections/sanborn-maps/about…
lofar-elais-n1LOFAR ELAIS-N1 cycle 2 observations on AWSThese data correspond to the [International LOFAR Telescope](http://www.lofar.org/) observations of the sky field ELAIS-N1 (16:10:01 +54:30:36) during the cycle 2 of observations. There are 11 runs of…Institute for Astronomy, University of EdinburghNot updatedastronomy,aws-pds,survey,imagingThe data are considered "LOFAR data in the public domain" and their use must adhere to the [LOFAR data policy](https://www.astron.nl/radio-observatory/observing-proposals/lofar-data-policy/lofar-data-…
longbenchLongBench - cross-platform reference dataset profiling cancer cell lines with bulk and single-cell approachesLongBench is a comprehensive benchmark dataset of the latest long-read transcriptomics technologies from Oxford Nanopore (ON) and Pacific Biosciences, alongside a comparison with next-generation seque…Richie Lab, Walter and Eliza Hall Institute of Medical ResearchNew data will be added as soon as they are available.benchmark,long read sequencing,single-cell transcriptomics,short read sequencing,bioinformatics,fastq,bam,vcf,cancer,life sciencesCC BY-4.0
lowcontext-ner-gazLow Context Name Entity Recognition (NER) Datasets with GazetteerSee https://lowcontext-ner-gaz.s3.amazonaws.com/readme.html[Amazon](https://www.amazon.com/)N/Aamazon.science,natural language processing[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
ltrf-cqa-datasetLearning to Rank and Filter - community question answeringThis dataset provides product related questions and answers, including answers' quality labels, as part of the paper 'IR Evaluation and Learning in the Presence of Forbidden Documents'.[Amazon](https://www.amazon.com/)Not currently being updated.amazon.science,natural language processing,machine learning[CDLA-Permissive](https://cdla.dev/permissive-1-0/)
luad-eagleIntegrative Analysis of Lung Adenocarcinoma in Environment and Genetics Lung cancer Etiology (Phase 2)We performed whole genome sequencing and whole exome sequencing of 31 lung adenocarcinoma (LUAD) samples from the Environment And Genetics in Lung cancer Etiology (EAGLE) study. The EAGLE study is mad…[Center for Translational Data Science at The University of Chicago](https://ctds.uchicago.edu/)Genomic Data Commons (GDC) is source of truth for this dataset; GDC offers monthly data releases, although this dataset may not be updated at every release. cancer,whole exome sequencing,whole genome sequencing,aws-pds,life sciences,STRIDES,genomic,epigenomics[NIH Genomic Data Sharing Policy](https://gdc.cancer.gov/access-data/data-access-policies)
lwi-model-dataLouisiana Watershed Initiative (LWI) Model DataGeographic (land cover, land elevation, etc.), meteorologic (pluvial, wind, etc.), hydrologic (fluvial, tidal, etc.), hydrodynamic (water surface elevations, flow velocities), and built environment (…The Water Instituteyearlyforecast,bathymetry,climate,coastal,disaster response,elevation,floods,geospatial,hydrologic model,hydrologyhttps://creativecommons.org/licenses/by/4.0/ with attribution to Louisiana Watershed Council
m3edMulti-robot, Multi-Sensor, Multi-Environment Event Dataset (M3ED)M3ED is the first multi-sensor event camera (EC) dataset focused on high-speed dynamic motions in robotics applications. M3ED provides high-quality synchronized data from multiple platforms (car, legg…[Daniilidis Group](https://www.grasp.upenn.edu/people/kostas-daniilidis/), [KumarRobotics](https://www.kumarrobotics.org/)The dataset will be uploaded sporadically, when bugs are found and new features are implemented (see [updates](https://m3ed.io/#updates)).aws-pds,autonomous vehicles,computer vision,deep learning,event camera,global shutter camera,GNSS,GPS,h5,hdf5Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
maf-genomeGolden Retriever Lifetime Study: Whole genome genotyping of Golden Retrievers on Axiom HD ArraysMorris Animal Foundation’s Golden Retriever Lifetime Study is a longitudinal, prospective study following 3044 golden retrievers. The Study’s purpose is to identify the nutritional, environmental, lif…[Morris Animal Foundation](https://www.morrisanimalfoundation.org/)Staticaws-pds,genome,genotyping,golden retriever lifetime study,morris animal foundation,life sciencesCC-BY-SA
man-truckscenesMAN TruckScenesA large scale multimodal dataset for Autonomous Trucking. Sensor data was recorded with a heavy truck from MAN equipped with 6 lidars, 6 radars, 4 cameras and a high-precision GNSS. MAN TruckScenes al…[MAN Truck and Bus SE](https://www.man.eu)The dataset may be updated with additional or corrected data on a need-to-update basis.autonomous vehicles,radar,lidar,IMU,GPS,computer vision,machine learning,deep learning,perception,object detection[CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/)
mapping-africaHigh resolution, annual cropland and landcover maps for selected African countriesHigh resolution, annual cropland and landcover maps for selected African countries developed by [Clark University](https://clarku.edu)'s [Agricultural Impacts Research Group](https://agroimpacts.info/…[The Agricultural Impacts Research Group](https://agroimpacts.info/)New maps are added as developedaws-pds,agriculture,land cover,satellite imagery,machine learning,deep learning,cog,labeled[Planet NICFI participant license agreement](https://assets.planet.com/docs/Planet_ParticipantLicenseAgreement_NICFI.pdf)
marine-energy-dataDepartment of Energy's Marine Energy Data LakeData released from projects funded by the Department of Energy's Water Power Technologies Office (DOE WPTO) that are too large or complex to be conveniently accessed by traditional means. The Marine…[National Laboratory of the Rockies](https://www.nrel.gov/)As neededaws-pds,energy,marine,waterCreative Commons Attribution 4.0 United States License
mast-gaia-dr3Gaia DR3[Gaia DR3 data](https://www.cosmos.esa.int/web/gaia/dr3) were originally released by the European Space Agency in December 2020. This [HATS](https://hats.readthedocs.io/en/stable)-formatted catalog wa…[Space Telescope Science Institute](http://www.stsci.edu/)NeverastronomyAttribution required.
mast-galexGalaxy Evolution Explorer Satellite (GALEX)The Galaxy Evolution Explorer Satellite (GALEX) was a NASA mission led by the California Institute of Technology, whose primary goal was to investigate how star formation in galaxies evolved from the…[Space Telescope Science Institute](http://www.stsci.edu/)Neverastronomy,aws-pdsPublic Domain. Attribution required for refereed scientific papers.
mast-hstHubble Space TelescopeThe Hubble Space Telescope (HST) is one of the most productive scientific instruments ever created. This dataset contains calibrated and raw data for all currently active instruments on HST: ACS, COS,…[Space Telescope Science Institute](http://www.stsci.edu/)Hourlyastronomy,aws-pdsPublic domain. Attribution required for refereed scientific papers.
mast-jadesThe JWST Advanced Extragalactic Survey JADESJADES is an infrared imaging and multi-object spectroscopy survey focused on two deep fields: the Hubble Deep Field (GOODS-N) and Hubble Ultra Deep Field (GOODS-S). JADES conducted NIRCam imaging in…[Space Telescope Science Institute](http://www.stsci.edu/)Neveraws-pds,astronomyAll HLSPs hosted at MAST are subject to a [CC By 4.0 license](https://creativecommons.org/licenses/by/4.0/).
mast-jwstJames Webb Space Telescope (JWST)The James Webb Space Telescope (JWST) is the world's next flagship infrared observatory led by NASA with its partners, ESA (European Space Agency), and CSA (Canadian Space Agency). JWST offers scienti…[Space Telescope Science Institute](http://www.stsci.edu/)Hourlyastronomy,aws-pdsPublic domain. Attribution required for refereed scientific papers.
mast-k2K2 Mission DataThe K2 mission observed 100 square degrees for 80 days each across 20 different pointings along the ecliptic, collecting high-precision photometry for a selection of targets within each field. The mis…[Space Telescope Science Institute](http://www.stsci.edu/)Neverastronomy,aws-pdsPublic domain. Attribution required for refereed scientific papers.
mast-keplerKepler Mission DataThe Kepler mission observed the brightness of more than 180,000 stars near the Cygnus constellation at a 30 minute cadence for 4 years in order to find transiting exoplanets, study variable stars, and…[Space Telescope Science Institute](http://www.stsci.edu/)Neverastronomy,aws-pdsPublic domain. Attribution required for refereed scientific papers.
mast-panstarrsPan-STARRS PS1 SurveyThe PS1 surveys used a 1.8 meter telescope and its 1.4 Gigapixel camera to image the sky in five broadband filters. The largest of these surveys provides coverage of the entire sky north of -30 degree…[Space Telescope Science Institute](http://www.stsci.edu/)Neveraws-pds,astronomySTScI hereby grants the non-exclusive, royalty-free, non-transferable, worldwide right and license to use, reproduce, and publicly display in all media data from the PS1 surveys.
mast-tess-spocTESS-SPOCThe data products for the TESS-SPOC FFI targets are the same as for the [TESS](https://archive.stsci.edu/missions-and-data/tess) two-minute cadence targets: calibrated target pixel files, simple apert…[Space Telescope Science Institute](http://www.stsci.edu/)Monthlyaws-pds,astronomyAll HLSPs hosted at MAST are subject to a [CC By 4.0 license](https://creativecommons.org/licenses/by/4.0/).
mast-tessTransiting Exoplanet Survey Satellite (TESS)The Transiting Exoplanet Survey Satellite (TESS) is a multi-year survey that has discovered exoplanets in orbit around bright stars across the entire sky using high-precision photometry. The survey al…[Space Telescope Science Institute](http://www.stsci.edu/)Monthlyastronomy,aws-pdsPublic domain. Attribution required for refereed scientific papers.
mast-tglcTESS-GAIA Light Curve (TGLC)TESS-Gaia Light Curve (TGLC) is a PSF-based TESS full-frame image (FFI) light curve product. Using Gaia DR3 as priors, the team forward models the FFIs with the effective point spread function to remo…[Space Telescope Science Institute](http://www.stsci.edu/)Monthlyastronomy,aws-pdsAll HLSPs hosted at MAST are subject to a [CC By 4.0 license](https://creativecommons.org/licenses/by/4.0/).
materials-projectMaterials Project DataMaterials Project is an open database of computed materials properties aiming to accelerate materials science research. The resources in this OpenData dataset contain the raw, parsed, and build data p…[Materials Project](https://materialsproject.org)New versions and objects added as we continuously calculate, parse and build new materials and their properties.aws-pds,chemistry,cloud computing,data assimilation,digital assets,digital preservation,energy,environmental,free software,genome[Materials Project Terms of Use](https://materialsproject.org/about/terms)
maxar-open-dataMaxar Open Data ProgramPre and post event high-resolution satellite imagery in support of emergency planning, risk assessment, monitoring of staging areas and emergency response, damage assessment, and recovery. These imag…[Maxar](https://www.maxar.com/)New data is released in response to activations. Older data may be migrated to the ARD format as needed.aws-pds,earth observation,disaster response,geospatial,satellite imagery,cog,stacCreative Commons Attribution Non Commercial 4.0
mbers-open-dataMarginal Build Emissions Rates (MBERs) for ElectricityThe Climate TRACE coalition has developed and maintains free global hourly Build Margin data, also known as MBERs, that are compliant with the Greenhouse Gas Protocol's Project Protocol electricity se…Climate TRACEAnnuallyaws-pds,carbon,climate,csv,electricity,energy,energy modeling,environmentalAll data are free and provided without license restrictions.
mcrpcGenomic Characterization of Metastatic Castration Resistant Prostate CancerBiopsies of castration resistant prostate cancer metastases were subjected to whole genome sequencing (WGS), along with RNA-sequencing (RNA-Seq). The overarching goal of the study is to illuminate mol…[Center for Translational Data Science at The University of Chicago](https://ctds.uchicago.edu/)Genomic Data Commons (GDC) is source of truth for this dataset; GDC offers monthly data releases, although this dataset may not be updated at every release. aws-pds,life sciences,cancer,genomic,whole genome sequencing,STRIDESNIH Genomic Data Sharing Policy: https://gdc.cancer.gov/access-data/data-access-policies
megascenesMegaScenesThe MegaScenes Dataset is an extensive collection of around 430k scenes, featuring over 100k structure-from-motion reconstructions and over 2 million registered images. MegaScenes includes a diverse a…Cornell University (https://www.cs.cornell.edu/~snavely/)The dataset will be updated periodically.internet,benchmark,computer vision,deep learningCreative Commons Attribution 4.0 International License. The photos have their own licenses.
met-office-global-deterministicMet Office Global Deterministic 10km on a 2-year rolling archiveTHIS DATASET IS CHANGING. Files uploaded from late January 2026 onward will contain changes including: - precision changes - new parameters - changes to existing parameters e.g. adding vertica…[Met Office](https://www.metoffice.gov.uk/)The model is run four times each day, with forecast reference times of 00:00, 06:00, 12:00 and 18:00 (UTC). The runs at 00:00 and 12:00 provide data for the next 168 hours. The runs at 06:00…aws-pds,air temperature,atmosphere,forecast,geoscience,geospatial,model,near-surface air temperature,near-surface relative humidity,netcdfBritish Crown copyright 2023-2025, the Met Office, is licensed under [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/deed.en)
met-office-global-ensembleMet Office Global Ensemble Prediction System (MOGREPS-G) on a 30-day rolling archiveTHIS DATASET IS CHANGING. Files uploaded from late January 2026 onward will contain changes including: - precision changes - new parameters - changes to existing parameters e.g. adding vertica…[Met Office](https://www.metoffice.gov.uk/)The MOGREPS-G model runs four times per day at 00:00, 06:00, 12:00 and 18:00 UTC.aws-pds,air temperature,atmosphere,forecast,geoscience,geospatial,global,meteorological,model,near-surface air temperatureBritish Crown copyright 2024-2025, the Met Office, is licensed under [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/deed.en)
met-office-global-oceanMet Office Global Ocean model on a 2-year rolling archiveThe Global Ocean component of the Met Office Global Coupled Atmosphere-Land-Ocean-Ice system which has been running in operations since May 2022. The system provides a global physical analysis and cou…[Met Office](https://www.metoffice.gov.uk/)The Global Ocean model runs once daily. T+00h is the midnight prior to production, the following are produced. T-48h to T-24h best estimate; T-24h to T+00h analyses; 7 day forecast (from T+00h).aws-pds,forecast,geoscience,geospatial,global,marine,model,netcdf,ocean sea surface height,oceansBritish Crown copyright 2024-2025, the Met Office, is licensed under [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/deed.en)
met-office-global-waveMet Office Global Wave model on a 2-year rolling archiveThe Met Office runs global wave forecast models to support marine safety and operational decision making. Met Office configurations are developed to be run using the community wave model WAVEWATCH III…[Met Office](https://www.metoffice.gov.uk/)The Global Wave model runs at 00:00, 12:00 (UTC) to T+144h, and at 06:00, 18:00 (UTC) to T+66h.aws-pds,forecast,geoscience,geospatial,global,marine,model,netcdf,ocean sea surface height,oceansBritish Crown copyright 2024-2025, the Met Office, is licensed under [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/deed.en)
met-office-nws-oceanMet Office NWS Ocean model on a 2-year rolling archiveThe Northwest European continental shelf physical ocean model predicts temperature, salinity and circulation for waters surrounding the UK. Ocean physics analysis provides a 6-day forecast for the No…[Met Office](https://www.metoffice.gov.uk/)The NWS Ocean model runs once daily.aws-pds,forecast,geoscience,geospatial,marine,model,netcdf,ocean sea surface height,oceans,weatherBritish Crown copyright 2025, the Met Office, is licensed under [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/deed.en)
met-office-nws-waveMet Office NWS Wave model on a 2-year rolling archiveNorthwest European continental shelf regional wave model predicting sea-state and various sea and swell wave characteristics for waters surrounding the UK. The Met Office runs global and regional wav…[Met Office](https://www.metoffice.gov.uk/)The model runs at 00:00, 06:00, 12:00, 18:00 (UTC) from T-48h to T+143haws-pds,forecast,geoscience,geospatial,marine,model,netcdf,ocean sea surface height,oceans,weatherBritish Crown copyright 2025, the Met Office, is licensed under [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/deed.en)
met-office-uk-deterministicMet Office UK Deterministic (UKV)2km on a 2-year rolling archiveTHIS DATASET IS CHANGING. Files uploaded from late January 2026 onward will contain changes including: - precision changes - new parameters - changes to existing parameters e.g. adding vertica…[Met Office](https://www.metoffice.gov.uk/)There are three lengths of model run, each with its own update frequency: - Nowcast: forecasts the next 12 hours and runs at 0100, 0200, 0400, 0500, 0700, 0800, 1000, 1100, 1300, 1400, 1600, 1700, 190…aws-pds,air temperature,atmosphere,forecast,geoscience,geospatial,model,near-surface air temperature,near-surface relative humidity,netcdfBritish Crown copyright 2023-2025, the Met Office, is licensed under [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/deed.en)
met-office-uk-ensembleMet Office Global and Regional Ensemble Prediction System - UK (MOGREPS-UK) on a 30-day rolling archiveTHIS DATASET IS CHANGING. Files uploaded from late January 2026 onward will contain changes including: - precision changes - new parameters - changes to existing parameters e.g. adding vertica…[Met Office](https://www.metoffice.gov.uk/)The MOGREPS-UK model runs 24 times per day.aws-pds,air temperature,atmosphere,forecast,geoscience,geospatial,global,meteorological,model,near-surface air temperatureBritish Crown copyright 2025, the Met Office, is licensed under [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/deed.en)
met-office-uk-land-observationsMet Office UK Land Surface ObservationsLand surface weather observations for 31 parameters from over 250 locations across the Met Office UK land observation network. The data is available as CSV files. You can use it to monitor the latest…[Met Office](https://www.metoffice.gov.uk/)60 minutes of new data for all locations is ingested into the Amazon Registry of Open Data every hour. For example, at 12:00 new data will arrive for observations collected between 10:00-11:00. This is…aws-pds,atmosphere,geospatial,csv,precipitation,weatherBritish Crown copyright 2024-2025, the Met Office, is licensed under [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/deed.en)
met-office-uk-marine-observationsMet Office UK Marine ObservationsMarine surface weather observations for 32 parameters from 69 locations across the Met Office marine observation network. Observations are available for a rolling 7-day period (168 hours). The data is…[Met Office](https://www.metoffice.gov.uk/)Every hour, at 15 minutes past the hour, we retrieve a single set of parameter values for each location. For example, at 12:15 new data is retrieved for observations collected between 11:00 and 12:00.…aws-pds,atmosphere,geospatial,csv,precipitation,weatherBritish Crown copyright 2024-2025, the Met Office, is licensed under [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/deed.en)
met-office-uk-radar-observationsMet Office UK Radar Observations on a 2-year rolling archiveThe United Kingdom Composite, Surface Rain Rate Estimate is an international radar composite produced by Met Office (UK). This is a composite, radar reflectivity derived, surface rain rate estimate pr…[Met Office](https://www.metoffice.gov.uk/)Four images per hour (every 15 minutes). The data is available within 20 minutes of the validity time of the product.aws-pds,atmosphere,geospatial,h5,hdf5,precipitation,radar,weatherBritish Crown copyright 2024-2025, the Met Office, is licensed under [CC BY-SA](https://creativecommons.org/licenses/by-sa/4.0/deed.en)
met-office-ukesm1-ariseMet Office UK Earth System Model (UKESM1) ARISE-SAI geoengineering experiment dataData from the UK Earth System Model (UKESM1) ARISE-SAI experiment. The UKESM1 ARISE-SAI experiment explores the impacts of geoengineering via the injection of sulphur dioxide (SO2) into the stratosphe…[Met Office](https://www.metoffice.gov.uk)Rare once completeclimate,model,climate model,atmosphere,oceans,land,ice,geospatial,aws-pds,sustainabilityCMIP6 data included is licensed under CC-BY 4.0 (see [here](https://wcrp-cmip.github.io/CMIP6_CVs/docs/CMIP6_source_id_licenses.html)). Additional ARISE-SAI data is licenced under the [Open Government…
metagraphMetaGraph Sequence IndexesThe MetaGraph Sequence Indexes dataset comprises full-text searchable index files for raw sequencing data hosted in major public repositories. These include the European Nucleotide Archive (ENA) manag…[Biomedical Informatics Lab, ETH Zurich, Switzerland](https://bmi.inf.ethz.ch)Continuously as new sequencing data becomes available.biodiversity,bioinformatics,biology,analysis ready data,fasta,genome,genomic,graph,information retrieval,life sciences[CC BY-SA 4.0](https://creativecommons.org/licenses/by-sa/4.0/)
meteo-france-modelsAtmospheric Models from Météo-FranceGlobal and high-resolution regional atmospheric models from Météo-France. - ARPEGE World covers the entire world at a base horizontal resolution of 0.5° (~55km) between grid points, it predicts weath…[OpenMeteoData](https://openmeteodata.com)Every 6 hoursaws-pds,agriculture,climate,disaster response,earth observation,environmental,meteorological,model,weatherhttps://mf-models-on-aws.org/en/doc/license
mevadataMultiview Extended Video with Activities (MEVA)The Multiview Extended Video with Activities (MEVA) dataset consists of video data of human activity, both scripted and unscripted, collected with roughly 100 actors over several weeks. The data was col…[Kitware](http://www.kitware.com/)We anticipate two or three updates as more video data is released to the public. aws-pds,computer vision,urban,us,videohttp://mevadata.org/resources/MEVA-data-license.txt
mimic-iv-demoMIMIC-IV Clinical Database DemoThe Medical Information Mart for Intensive Care (MIMIC)-IV database is comprised of deidentified electronic health records for patients admitted to the Beth Israel Deaconess Medical Cen…[PhysioNet](https://physionet.org/)Not updatedaws-pdsOpen Data Commons Open Database License v1.0
mimic-iv-ecgMIMIC-IV-ECG: Diagnostic Electrocardiogram Matched SubsetThe MIMIC-IV-ECG module contains approximately 800,000 diagnostic electrocardiograms across nearly 160,000 unique patients. These diagnostic ECGs use 12 leads and are 10 seconds in length. They are sa…[PhysioNet](https://physionet.org/)Not updatedaws-pdsOpen Data Commons Open Database License v1.0
mimiciiiMIMIC-III (‘Medical Information Mart for Intensive Care’)MIMIC-III (‘Medical Information Mart for Intensive Care’) is a large, single-center database comprising information relating to patients admitted to critical care units at a large tertiary care hosp…[MIT Laboratory for Computational Physiology](https://lcp.mit.edu/)Not updatedaws-pds,bioinformatics,health,life sciences,natural language processing,ushttps://physionet.org/content/mimiciii/view-license/1.4/
mirrulationsmirrulationsThe regulations.gov website allows users to view proposed rules and supporting documents for the federal rule-making process. In addition, users can post and view comments about those proposed rules.…[Moravian University Computer Science](https://www.moravian.edu/computer-science)Hourlygovernment recordshttps://creativecommons.org/publicdomain/mark/1.0/deed.en
mmidThe Massively Multilingual Image Dataset (MMID)MMID is a large-scale, massively multilingual dataset of images paired with the words they represent collected at the [University of Pennsylvania](https://upenn.edu). The dataset is doubly parallel: f…[Penn NLP](https://github.com/penn-nlp)Language data is added as it is ready for distribution.aws-pds,computer vision,machine learning,machine translation,natural language processingSee citation instructions at http://multilingual-images.org
mmrf-commpassCoMMpass from the Multiple Myeloma Research FoundationThe Relating Clinical Outcomes in Multiple Myeloma to Personal Assessment of Genetic Profile study is the Multiple Myeloma Research Foundation (MMRF)’s landmark personalized medicine initiative. CoMM…[Center for Translational Data Science at The University of Chicago](https://ctds.uchicago.edu/)Genomic Data Commons (GDC) is source of truth for this dataset; GDC offers monthly data releases, although this dataset may not be updated at every release. aws-pds,cancer,genomic,genetic,whole genome sequencing,STRIDES,life sciencesNIH Genomic Data Sharing Policy: https://gdc.cancer.gov/access-data/data-access-policies
modis-astraeaMODIS MYD13A1, MOD13A1, MYD11A1, MOD11A1, MCD43A4Data from the Moderate Resolution Imaging Spectroradiometer (MODIS), managed by the U.S. Geological Survey and NASA. Five products are included: MCD43A4 (MODIS/Terra and Aqua Nadir BRDF-Adjusted Refle…[Astraea](https://astraea.earth/)New scenes are added daily.aws-pds,agriculture,geospatial,satellite imagery,natural resource,disaster responseThere are no restrictions on the use of data, unless expressly identified prior to or at the time of receipt.
modisMODISThe Moderate Resolution Imaging Spectroradiometer (MODIS) MCD43A4 Version 6 Nadir Bidirectional Reflectance Distribution Function (BRDF)-Adjusted Reflectance (NBAR) dataset is produced daily using 16…Not currently managedDailyaws-pds,agriculture,geospatial,satellite imagery,natural resource,disaster responseThere are no restrictions on the use of data, unless expressly identified prior to or at the time of receipt.true
mogrepsUK Met Office Global and Regional Weather ForecastsData from two models is available: MOGREPS-UK, a high resolution weather forecast covering the United Kingdom, and MOGREPS-G, a global weather forecast.[UK Met Office](https://www.metoffice.gov.uk/)Not currently updatedaws-pds,agriculture,earth observation,climate,weather,meteorologicalThis data is free to use for non-commercial research purposes only under the terms of the [Non-Commercial Government Licence](http://www.nationalarchives.gov.uk/doc/non-commercial-government-licence/n…true
molssi-covid19-hubCOVID-19 Molecular Structure and Therapeutics HubAggregating critical information to accelerate drug discovery for the molecular modeling and simulation community. A community-driven data repository and curation service for molecular structures, mod…[Molecular Sciences Software Institute (MolSSI)](https://molssi.org/) and [BioExcel](https://bioexcel.eu/)Data contributions come from external researchers and groups at a roughly weekly cadence.aws-pds,biology,bioinformatics,coronavirus,COVID-19,life sciences,molecular docking,pharmaceuticalMost data will be in an open license provided by the contributing individual(s). Neither MolSSI nor BioExcel hold the license(s) for data in the data set unless explicitly noted.
monkeyMONKEYThis dataset contains the training data for the [Machine learning for Optimal detection of iNflammatory cells in the KidnEY or MONKEY](https://monkey.grand-challenge.org/) challenge. The MONKEY challe…Radboud University Medical CenterAs requiredaws-pds,life sciences,cancer,computational pathology,deep learning,grand-challenge.org,histopathology,computer vision,digital pathology,medical image computingCC BY-NC-SA 4.0
mosaicMeta-Organized Stimuli And fMRI Imaging data for Computational modeling (MOSAIC)This extensible dataset, MOSAIC, aggregates individual functional magnetic resonance imaging (fMRI) datasets by leveraging a shared preprocessing pipeline and stimulus curation procedure. This dataset…Massachusetts Institute of Technology, Georgia TechNew data is uploaded as researchers preprocess their fMRI data according to MOSAIC format and submit.aws-pds,brain images,brain models,hdf5,neuroimaging,neuroscience,machine learningCC BY 4.0
motional-nuplannuPlannuPlan is the world's first large-scale planning benchmark for autonomous driving.[Motional, Inc.](https://motional.com)Finalizedaws-pds,autonomous vehicles,lidar,robotics,transportation,urban[Commercial](https://www.nuscenes.org/terms-of-use)
motional-nuscenesnuScenesPublic large-scale dataset for autonomous driving. It enables researchers to study challenging urban driving situations using the full sensor suite of a real self-driving car.[Motional, Inc.](https://motional.com)Finalizedaws-pds,autonomous vehicles,computer vision,lidar,robotics,transportation,urban[Commercial](https://www.nuscenes.org/terms-of-use)
mp2prtMolecular Profiling to Predict Response to Treatment (phs001965)The Molecular Profiling to Predict Response to Treatment (MP2PRT) program is part of the NCI's Cancer Moonshot Initiative. The aim of this program is the retrospective characterization and analysis of…[Center for Translational Data Science at The University of Chicago](https://ctds.uchicago.edu/)Genomic Data Commons (GDC) is source of truth for this dataset; GDC offers monthly data releases, although this dataset may not be updated at every release. aws-pds,life sciences,cancer,genomic,whole genome sequencing,STRIDES[NIH Genomic Data Sharing Policy](https://gdc.cancer.gov/access-data/data-access-policies)
mrkrEmory Knee Radiograph (MRKR) datasetThe Emory Knee Radiograph (MRKR) dataset is a large, demographically diverse collection of 503,261 knee radiographs from 83,011 patients, 40% of which are African American. This dataset provides imagi…Health Innovation and Translational Informatics Lab at Emory University (hitilab.com)New data is added as soon as it is available.aws-pds,radiology,medical imaging,medical image computing,machine learning,computer vision,health,imaging,biology,bioinformaticsCC-BY-SA
msdMedical Segmentation DecathlonWith recent advances in machine learning, semantic segmentation algorithms are becoming increasingly general purpose and translatable to unseen tasks. Many key algorithmic advances in the field of med…[MONAI Development Team](https://github.com/Project-MONAI/MONAI)This is a static dataset; however, tutorials and resources will be updated as they are developed.aws-pds,health,life sciences,medicine,imaging,magnetic resonance imaging,nifti,computed tomography,segmentation[CC-BY-SA 4.0 International](https://creativecommons.org/licenses/by/4.0/)
multi-token-completionMulti Token CompletionThis dataset provides masked sentences and multi-token phrases that were masked-out of these sentences. We offer 3 datasets: a general purpose dataset extracted from the Wikipedia and Books corpora, a…[Amazon](https://www.amazon.com/)Not currently being updatedamazon.science,natural language processing,machine learningDatasets are published under [CC-NC-SA-3.0](https://creativecommons.org/licenses/by-nc-sa/3.0/). Human evaluation is published under [CC-SA-4.0](https://creativecommons.org/licenses/by-sa/4.0/).
multiconerMultiCoNER DatasetsMultiCoNER 1 is a large multilingual dataset (11 languages) for Named Entity Recognition. It is designed to represent some of the contemporary challenges in NER, including low-context scenarios (short…[Amazon](https://www.amazon.com/)N/Anatural language processing[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
multimedia-commonsMultimedia CommonsThe Multimedia Commons is a collection of audio and visual features computed for the nearly 100 million Creative Commons-licensed Flickr images and videos in the YFCC100M dataset from Yahoo! Labs, alo…[Multimedia Commons](http://mmcommons.org/)Not updated.aws-pds,computer vision,machine learning,multimedia,videoThe International Computer Science Institute and Lawrence Livermore National Laboratory have released the feature corpus and annotations under Creative Commons 0 (public domain), so there are no restr…
murMulti-Scale Ultra High Resolution (MUR) Sea Surface Temperature (SST)A global, gap-free, gridded, daily 1 km Sea Surface Temperature (SST) dataset created by merging multiple Level-2 satellite SST datasets. Those input datasets include the NASA Advanced Microwave Scann…[Farallon Institute](https://faralloninstitute.org)The temporal extent of the Zarr store is 2002-06-01 to 2020-01-20.aws-pds,earth observation,environmental,natural resource,oceans,satellite imagery,climate,water,weatherThere are no restrictions on the use of these data.
mwis-vr-instancesMWIS VR InstancesLarge-scale node-weighted conflict graphs for maximum weight independent set solvers[Amazon](https://www.amazon.com/)Infrequentamazon.science,traffic,transportation,graphMIT-0
naipNAIP on AWSThe National Agriculture Imagery Program (NAIP) acquires aerial imagery during the agricultural growing seasons in the continental U.S. This "leaf-on" imagery typically ranges from 30 centimeters t…[Esri](https://www.esri.com/en-us/home)NAIP data is provided state by state at varying time intervals. Each year, a variable number of states are updated with an overall update cycle of every two to three years for each state. This catalog…aws-pds,agriculture,earth observation,aerial imagery,geospatial,natural resource,regulatory,cogPublic Domain with Attribution
nanoporeNanopore Reference Human GenomeThis dataset includes the sequencing and assembly of a reference standard human genome (GM12878) using the MinION nanopore sequencing instrument with the R9.4 1D chemistry.Nanopore Whole Genome Sequencing ConsortiumData will be added as methodology improves or new data uses are encountered.aws-pds,genetic,genomic,life sciences,whole genome sequencingNanopore Human Reference data is released under the Creative Commons CC-BY license and allows free, full and open access to all. For more details please refer to the data reuse and license section of…
napieroneNapierOne Mixed File DatasetNapierOne is a modern cybersecurity mixed file data set, primarily aimed at, but not limited to, ransomware detection and forensic analysis. The dataset contains over 500,000 distinct files, represent…[School of Computing at Edinburgh Napier University](https://www.napier.ac.uk/about-us/our-schools/school-of-computing/)Data will be added as methodology improves or new common or required file types are encountered.computer forensics,computer security,cyber security,digital forensics,ransomware,malware,mixed file dataset,aws-pdsNapierOne is released under the Edinburgh Napier University License Agreement and allows free, full and open access to all. For more details please refer to the License and Attribution section of the…
nara-1940-census1940 Census Population Schedules, Enumeration District Maps, and Enumeration District DescriptionsThe 1940 Census population schedules were created by the Bureau of the Census in an attempt to enumerate every person living in the United States on April 1, 1940, although some persons were missed. T…National Archives and Records Administration (NARA)Not updatednara,census,archives,1940 census,demography,aws-pdsUS Government work
nara-1950-census1950 Census Population Schedules, Enumeration District Maps, and Enumeration District DescriptionsThe 1950 Census population schedules were created by the Bureau of the Census in an attempt to enumerate every person living in the United States on April 1, 1950, although some persons were missed. T…National Archives and Records Administration (NARA)Not updatednara,census,archives,1950 census,demography,aws-pdsUS Government work
nara-national-archives-catalogNational Archives CatalogThe National Archives Catalog dataset contains all of the descriptions; authority records; digitized and electronic records; and tags, transcriptions and comments for NARA’s archival holdings availabl…National Archives and Records Administration (NARA)Biannualnara,national archives catalog,archives,government records,aws-pdsUS Government work
nasa-airibradAIRS/Aqua L1B Infrared (IR) geolocated and calibrated radiances V005 (AIRIBRAD) at GES DISCWARNING: On 2021/09/23 the EOS Aqua executed a Deep Space Maneuver (DSM). In the DSM, the spacecraft is turned such that the normal Earth field of regard is deep space. The thermal impact of the DSM…NASAFrom 2002-08-30 to Ongoingaws-pds,atmosphere,datacenter,earth observation,global,hdf,ice,land,metadata,opendap[Creative Commons BY 4.0](https://creativecommons.org/licenses/by/4.0/)
nasa-airicradAIRS/Aqua L1C Infrared (IR) resampled and corrected radiances V6.7 (AIRICRAD) at GES DISCThe Atmospheric Infrared Sounder (AIRS) is a grating spectrometer (R = 1200) aboard the second Earth Observing System (EOS) polar-orbiting platform, EOS Aqua. In combination with the Advanced Microwav…NASAFrom 2002-08-30 to Ongoingaws-pds,atmosphere,climate,datacenter,earth observation,global,metadata,opendap,orbit,hdf[Creative Commons BY 4.0](https://creativecommons.org/licenses/by/4.0/)
nasa-astl1tASTER Level 1T Precision Terrain Corrected Registered At-Sensor Radiance V004The Terra Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) Level 1 Precision Terrain Corrected Registered At-Sensor Radiance (AST_L1T) data contains calibrated at-sensor radiance…NASAFrom 2000-03-04 to Ongoing (Varies)aws-pds,cog,earth observation,global,land,orbit,cog[Creative Commons BY 4.0](https://creativecommons.org/licenses/by/4.0/)
nasa-atl03ATLAS/ICESat-2 L2A Global Geolocated Photon Data V006This data set (ATL03) contains height above the WGS 84 ellipsoid (ITRF2014 reference frame), latitude, longitude, and time for all photons downlinked by the Advanced Topographic Laser Altimeter System…NASAFrom 2018-10-13 to Ongoingaws-pds,atmosphere,datacenter,earth observation,global,hdf,ice,land,water[Creative Commons BY 4.0](https://creativecommons.org/licenses/by/4.0/)
nasa-atl08ATLAS/ICESat-2 L3A Land and Vegetation Height V006This data set (ATL08) contains along-track heights above the WGS84 ellipsoid (ITRF2014 reference frame) for the ground and canopy surfaces. The canopy and ground surfaces are processed in fixed 100 m…NASAFrom 2018-10-14 to Ongoingaws-pds,atmosphere,datacenter,earth observation,global,ice,land,hdf[Creative Commons BY 4.0](https://creativecommons.org/licenses/by/4.0/)
nasa-egaMars Spectrometry: Detect Evidence for Past HabitabilityNASA missions like the Curiosity and Perseverance rovers carry a rich array of instruments suited to collect data and build evidence towards answering if Mars ever had livable environmental conditions…NASAStatic dataset for the commercial data. When more data is collected on the SAM instrument on Mars, the data will be available on the Planetary Data System website: Data from all SAM experiments are arc…aws-pds,analytics,archives,deep learning,planetary,machine learning,NASA SMD AIThere are no restrictions on the use, access, and/or download of data from the NASA project.
nasa-gcmsMars Spectrometry 2: Gas Chromatography for the Sample Analysis at Mars Data (SAM) InstrumentNASA missions like the Curiosity and Perseverance rovers carry a rich array of instruments suited to collect data and build evidence towards answering if Mars ever had livable environmental conditions…NASAStatic dataset for the commercial data. When more data is collected on the SAM instrument on Mars, the data will be available on the Planetary Data System [website](https://pds-geosciences.wustl.edu/…aws-pds,analytics,archives,deep learning,planetary,machine learning,NASA SMD AIThere are no restrictions on the use, access, and/or download of data from the NASA project.
nasa-gedi02aGEDI L2A Elevation and Height Metrics Data Global Footprint Level V002The Global Ecosystem Dynamics Investigation ([GEDI](https://gedi.umd.edu/)) mission aims to characterize ecosystem structure and dynamics to enable radically improved quantification and understanding…NASAFrom 2019-04-04 to 2023-03-16 (Varies)aws-pds,biodiversity,carbon,datacenter,earth observation,energy,global,hdf,ice,land[Creative Commons BY 4.0](https://creativecommons.org/licenses/by/4.0/)