NOTE: This is a trimmed dataset for the purpose of reducing runtime for examples.

50% of the data from the original dataset has been removed; the removed
data was chosen pseudo-randomly.
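
The exact procedure used to trim the data is not recorded in this file. As a
rough sketch only, one way to drop half of a set of files pseudo-randomly and
reproducibly is shown below. The function name, the fixed seed and the flat
list of file names are all assumptions made for illustration.

```python
import random


def subsample(names, keep_fraction=0.5, seed=0):
    """Return a reproducible pseudo-random subset of names.

    Hypothetical helper: the seed and keep_fraction defaults are
    illustrative, not the values used to build this dataset.
    """
    # Sort first so the result does not depend on the input order.
    ordered = sorted(names)
    keep_count = int(len(ordered) * keep_fraction)
    # A private Random instance keeps the global random state untouched.
    rng = random.Random(seed)
    return sorted(rng.sample(ordered, keep_count))
```

Sampling within each category separately, rather than across the whole set,
would preserve the class ratio exactly. Whether that was done here is not
stated.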

The performance of models trained against this dataset WILL BE LOWER THAN THE
ORIGINAL EXPERIMENT.

If you are not pressed for time (this dataset was created for an in-person
workshop, where iteration time was more important), you should download the
original dataset at:

https://infrastructure.fedoraproject.org/infra/fedora-openqa-data/20231116-openqa-2119970-composite_failures-1x-resolution.tar.xz

Information on original dataset
===============================

The images in this archive make up a dataset of composite screenshots that
represent 3082 failed modules collected from Fedora's OpenQA instance [1]
between August 30 and September 13, 2022.

Every image in this dataset is 1024x768 pixels.
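
If you want to confirm the image size before training, the sketch below reads
the width and height from a file header using only the standard library. It
assumes the images are stored as PNG files, which this file does not state;
png_size is a hypothetical helper name.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Return (width, height) from the first 24 bytes of a PNG file.

    Assumes PNG input; raises ValueError for anything else.
    """
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type
    # ("IHDR"), then big-endian 32-bit width and height.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

For a file on disk, pass in the first 24 bytes, for example
png_size(open(path, "rb").read(24)), and compare the result with (1024, 768).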

The images are split into two categories. 'rescheduled' represents modules
which were rescheduled due to a failure that can be traced back to a
long-running issue [2]. 'not_rescheduled' represents the failed modules which
had a different root cause.

The breakdown of these 3082 images is as follows:
  - 2214 not_rescheduled
  - 560 rescheduled
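
To check the class balance of whichever copy of the dataset you have, the
files in each category can be counted with a short script. This is a sketch
that assumes the archive unpacks into one directory per category, named
'not_rescheduled' and 'rescheduled'. That layout is a guess, and
count_per_category is a hypothetical helper.

```python
from pathlib import Path

# Assumed directory names, taken from the category names in this file.
CATEGORIES = ("not_rescheduled", "rescheduled")


def count_per_category(root):
    """Count regular files directly inside each category directory."""
    counts = {}
    for category in CATEGORIES:
        directory = Path(root) / category
        if directory.is_dir():
            counts[category] = sum(1 for p in directory.iterdir() if p.is_file())
        else:
            # A missing directory is reported as zero rather than an error.
            counts[category] = 0
    return counts
```

The trimmed dataset should report roughly half the number of files that the
original does.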

The original code used to generate these images can be found at:
https://pagure.io/fedora-qa/openqa_classifier

This dataset is licensed under the CC-BY-4.0 license [3].


[1] https://openqa.fedoraproject.org/
[2] https://bugzilla.redhat.com/show_bug.cgi?id=2119970
[3] https://creativecommons.org/licenses/by/4.0/
