BigDocs: A Permissively-Licensed Dataset for Training Vision-Language Models on Document and Code Tasks

Juan Rodriguez ⋅ Xiangru Jian ⋅ Siba Smarak Panigrahi ⋅ Tianyu Zhang ⋅ Aarash Feizi ⋅ Abhay Puri ⋅ Akshay Kalkunte Suresh ⋅ François Savard ⋅ Amirhossein Abaskohi ⋅ Ahmed Masry ⋅ Shravan Nayak ⋅ Mahsa Massoud ⋅ Rabiul Awal ⋅ Pierre-André Noël ⋅ Mats L Richter ⋅ Saverio Vadacchino ⋅ Shubham Agarwal ⋅ Sanket Biswas ⋅ Ying Zhang ⋅ Sathwik Tejaswi Madhusudhan ⋅ Joao Monteiro ⋅ Krishnamurthy Dvijotham ⋅ Torsten Scholak ⋅ Nicolas Chapados ⋅ Sean Hughes ⋅ M. Tamer Özsu ⋅ Aishwarya Agrawal ⋅ Marco Pedersoli ⋅ Chris Pal ⋅ Perouz Taslakian ⋅ David Vazquez ⋅ Issam Hadj Laradji ⋅ Spandana Gella ⋅ Sai Rajeswar Mudumba

Abstract
