The Vashantor dataset consists of 32,500 sentences: a core set in Bangla, Banglish, and English, plus regional data from five regions (Chittagong, Noakhali, Sylhet, Barishal, and Mymensingh). The regional data comes in two language formats, "Bangla" and "Banglish." Each region and language combination has a fixed number of training, testing, and validation samples. The dataset details are as follows:

Specifics of the Core Data:

| Type | Bangla | Banglish | English |
|:----------: |:------: |:--------: |:-------: |
| Train | 1875 | 1875 | 1875 |
| Test | 375 | 375 | 375 |
| Validation | 250 | 250 | 250 |

Specifics of the Regional Data:

<table class="tg">
<thead>
<tr>
<th class="tg-c3ow">Region</th>
<th class="tg-c3ow">Type</th>
<th class="tg-c3ow">Train</th>
<th class="tg-c3ow">Test</th>
<th class="tg-c3ow">Validation</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tg-c3ow" rowspan="2">Chittagong</td>
<td class="tg-0pky">Bangla</td>
<td class="tg-dvpl">1875</td>
<td class="tg-dvpl">375</td>
<td class="tg-dvpl">250</td>
</tr>
<tr>
<td class="tg-0pky">Banglish</td>
<td class="tg-dvpl">1875</td>
<td class="tg-dvpl">375</td>
<td class="tg-dvpl">250</td>
</tr>
<tr>
<td class="tg-c3ow" rowspan="2">Noakhali</td>
<td class="tg-0pky">Bangla</td>
<td class="tg-dvpl">1875</td>
<td class="tg-dvpl">375</td>
<td class="tg-dvpl">250</td>
</tr>
<tr>
<td class="tg-0pky">Banglish</td>
<td class="tg-dvpl">1875</td>
<td class="tg-dvpl">375</td>
<td class="tg-dvpl">250</td>
</tr>
<tr>
<td class="tg-c3ow" rowspan="2">Sylhet</td>
<td class="tg-0pky">Bangla</td>
<td class="tg-dvpl">1875</td>
<td class="tg-dvpl">375</td>
<td class="tg-dvpl">250</td>
</tr>
<tr>
<td class="tg-0pky">Banglish</td>
<td class="tg-dvpl">1875</td>
<td class="tg-dvpl">375</td>
<td class="tg-dvpl">250</td>
</tr>
<tr>
<td class="tg-c3ow" rowspan="2">Barishal</td>
<td class="tg-0pky">Bangla</td>
<td class="tg-dvpl">1875</td>
<td class="tg-dvpl">375</td>
<td class="tg-dvpl">250</td>
</tr>
<tr>
<td class="tg-0pky">Banglish</td>
<td class="tg-dvpl">1875</td>
<td class="tg-dvpl">375</td>
<td class="tg-dvpl">250</td>
</tr>
<tr>
<td class="tg-c3ow" rowspan="2">Mymensingh</td>
<td class="tg-0pky">Bangla</td>
<td class="tg-dvpl">1875</td>
<td class="tg-dvpl">375</td>
<td class="tg-dvpl">250</td>
</tr>
<tr>
<td class="tg-0pky">Banglish</td>
<td class="tg-dvpl">1875</td>
<td class="tg-dvpl">375</td>
<td class="tg-dvpl">250</td>
</tr>
</tbody>
</table>
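
As a quick consistency check, the per-split counts in the two tables can be summed to confirm the stated total of 32,500 sentences. The sketch below hard-codes the numbers from the tables; the variable names are illustrative and are not part of the dataset's own tooling.

```python
# Split sizes copied from the tables above (identical for every
# region/language combination).
SPLIT_SIZES = {"train": 1875, "test": 375, "validation": 250}

# Language types in the core data and in the regional data.
CORE_TYPES = ["Bangla", "Banglish", "English"]
REGIONS = ["Chittagong", "Noakhali", "Sylhet", "Barishal", "Mymensingh"]
REGIONAL_TYPES = ["Bangla", "Banglish"]

# Sentences in one region/language combination across all three splits.
per_combination = sum(SPLIT_SIZES.values())

# Totals for the core and regional parts, and for the whole dataset.
core_total = per_combination * len(CORE_TYPES)
regional_total = per_combination * len(REGIONS) * len(REGIONAL_TYPES)
total = core_total + regional_total

# Share of each split within one combination.
split_share = {name: size / per_combination for name, size in SPLIT_SIZES.items()}

print(f"per combination: {per_combination}")
print(f"core: {core_total}, regional: {regional_total}, total: {total}")
print(f"split shares: {split_share}")
```

Under these numbers, each combination holds 2,500 sentences split 75% / 15% / 10% across train, test, and validation; the core data contributes 7,500 sentences and the regional data 25,000.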