CIDAR Exploration

CIDAR aims to create an Arabic version of the Alpaca dataset. We used the Alpagasus dataset to select around 9,000 samples for review, and we added around 1,000 samples that contain linguistic instructions.

  1. Fix Grammatical Issues
    Some words might not be translated correctly, especially at the start of an instruction. Every instruction should open with a direct request or a question. For example, تحديد should be replaced by حدد or قم بتحديد.
  2. Fix Translation Issues
    Some inputs or outputs might not be translated. Also, some instructions might be specific to English; please provide corresponding examples in Arabic.
  3. Adapt Cultural Content
    Some examples in the original English Alpaca dataset reflect Western cultures. We want to replace them with instructions that represent the Arab region and its culture.
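
As a rough aid for the second guideline, the sketch below flags fields that still look untranslated by measuring how many of their letters are Latin. It is only an illustration, not part of the CIDAR tooling: the field names `instruction`, `input`, and `output` follow the Alpaca layout and are assumed here, and the helper names and the `0.5` threshold are hypothetical.

```python
import unicodedata


def latin_ratio(text: str) -> float:
    """Return the fraction of alphabetic characters in `text` that are Latin."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    latin = sum(1 for ch in letters if "LATIN" in unicodedata.name(ch, ""))
    return latin / len(letters)


def needs_translation_review(sample: dict, threshold: float = 0.5) -> list:
    """List the fields of `sample` whose text is mostly Latin script.

    Field names are assumed (Alpaca-style); the threshold is arbitrary.
    """
    fields = ("instruction", "input", "output")
    return [f for f in fields if latin_ratio(sample.get(f, "")) > threshold]


# Hypothetical sample: the instruction is Arabic, the output was left in English.
sample = {
    "instruction": "حدد عاصمة مصر",
    "input": "",
    "output": "The capital of Egypt is Cairo.",
}
flagged = needs_translation_review(sample)
```

A check like this only narrows down candidates. Instructions that are legitimately about English text would also be flagged, so a reviewer still has to make the final call.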