CIDAR aims to create an Arabic version of the Alpaca
dataset. We used the Alpagasus dataset to select
around 9,000 samples for review. We also added around 1,000 samples that contain linguistic instructions.
Fix Grammatical Issues
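Some words might not be translated correctly, especially at the start of an instruction. We want every
instruction to start as a command, that is, with an imperative verb.
For example, تحديد ("determining", a verbal noun) should be replaced by حدد ("determine") or قم بتحديد (literally "perform the determination of").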
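As a rough aid for reviewers, the check above could be partly automated by looking up the first word of each instruction in a hand-maintained list of verbal nouns. The sketch below is hypothetical: the names `VERBAL_NOUN_TO_IMPERATIVE` and `suggest_imperative` are illustrative, and the map contains only the تحديد → حدد pair from this guideline. It only suggests a rewrite; a reviewer still decides whether the result is grammatical.

```python
from typing import Dict, Optional

# Hypothetical, hand-maintained map from verbal nouns to imperative verbs.
# Only the pair from the guideline is included; reviewers would extend it.
VERBAL_NOUN_TO_IMPERATIVE: Dict[str, str] = {
    "تحديد": "حدد",
}


def suggest_imperative(
    instruction: str,
    mapping: Dict[str, str] = VERBAL_NOUN_TO_IMPERATIVE,
) -> Optional[str]:
    """Return the instruction with its first word swapped for an imperative,
    or None if the first word is not a known verbal noun."""
    words = instruction.split(maxsplit=1)
    if not words or words[0] not in mapping:
        return None
    rest = words[1] if len(words) > 1 else ""
    return (mapping[words[0]] + " " + rest).strip()


print(suggest_imperative("تحديد الفكرة الرئيسية للنص"))
```

Instructions that already start with an imperative return `None`, so the function can be used to list only the samples that need attention.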
Fix Translation Issues
Some inputs or outputs might not have been translated. Also, some instructions might be specific
to English; please replace them with corresponding examples in Arabic.
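Untranslated fields can often be spotted by their script. The sketch below assumes each sample is a dictionary with Alpaca-style `instruction`, `input`, and `output` keys, and flags any field in which Latin letters outnumber characters from the basic Arabic Unicode block (U+0600 to U+06FF). The function names and the 0.5 threshold are illustrative assumptions, not part of the CIDAR pipeline; fields that legitimately contain English (code, names, or instructions about English) will also be flagged and need a human decision.

```python
import re
from typing import Dict, List, Sequence, Tuple

LATIN = re.compile(r"[A-Za-z]")
# Basic Arabic Unicode block; also matches Arabic punctuation and digits.
ARABIC = re.compile(r"[\u0600-\u06FF]")


def latin_ratio(text: str) -> float:
    """Share of Latin letters among Latin letters plus basic-Arabic-block characters."""
    latin = len(LATIN.findall(text))
    arabic = len(ARABIC.findall(text))
    total = latin + arabic
    return latin / total if total else 0.0


def find_untranslated(
    samples: Sequence[Dict[str, str]],
    fields: Tuple[str, ...] = ("instruction", "input", "output"),
    threshold: float = 0.5,  # assumed cut-off; tune on real data
) -> List[Tuple[int, str]]:
    """Return (sample index, field name) pairs that look mostly untranslated."""
    flagged = []
    for i, sample in enumerate(samples):
        for field in fields:
            if latin_ratio(sample.get(field, "")) > threshold:
                flagged.append((i, field))
    return flagged


samples = [
    {"instruction": "ترجم الجملة", "input": "The cat sat on the mat", "output": "جلست القطة على الحصيرة"},
    {"instruction": "اكتب قصيدة", "input": "", "output": "شعر"},
]
print(find_untranslated(samples))  # [(0, 'input')]
```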
Adapt Cultural Content
Some examples in the original English Alpaca dataset might reflect
Western culture. We want to replace them with instructions that represent the Arabic region and its
culture.