Chapter 3: Small data and few-shot machine learning
“…people are individual people and not an ‘average’.” (footnote 114)
Gary Marsden, Andrew Maunder and Munier Parker
Small data analysis is the use of tools and techniques for data analysis in settings where only limited amounts of data and information are available. While disabled people are a large group, making up an estimated 16% of the global population (footnote 115), the wide variety of disabilities, and of how they are experienced, means that in practice datasets for specific disabilities are typically small. This contrasts with dominant approaches in technology development, which emphasise the importance and use of large datasets, typically referred to as ‘big data’. Small data offers an alternative to this paradigm: advances in small data methods and techniques could create opportunities for new DigAT and enable more inclusive analysis of data.
What is small data and what are its benefits?
Small data analysis refers to the ability to derive insights from, and analyse detailed context-specific information in, smaller datasets. These approaches have always been important to scientific research: many early discoveries, such as those based on early astronomical observations, relied on small numbers of observations. However, since the early 2000s, big data approaches, which rely on the ability to search and analyse vast datasets, have become increasingly popular thanks to advances in computing power and access to large quantities of data.
Small data techniques can be invaluable where large datasets are simply not available, such as in research on rare diseases or when creating products for niche markets. Big data approaches often rely on the misleading assumption that bigger datasets lead to more reliable conclusions (footnote 116). In practice, big data techniques often fail when confronted with outliers or unique scenarios. For example, AI models used for self-driving cars have failed to recognise a wheelchair being propelled backwards, despite being trained on datasets that included wheelchair users (footnote 117). Small data techniques can preserve more contextual information and improve reliability when datasets are small and contain large variations.
Small data and more personalised approaches can better capture the unique and diverse experiences of individuals. Big data approaches can lead to an overemphasis on the average needs of a population, neglecting those who fall outside the ‘norm’. This is particularly concerning when big data and statistical averages are used for decision-making and policy-making. While averages can be useful summaries, they mask important variation and can lead to decisions that prioritise the needs of the majority without adequately addressing the needs of all individuals, particularly disabled people who may have less common needs (footnote 118).
What are the techniques for small data? (footnote 119)
Small data research is currently undertaken across many disciplines, so a wide range of methods is used. Three key concepts underpin the application of small data methods: similarity, transfer and uncertainty.
- Similarity
Determining the similarity between datasets is important when working with small data. Several quantitative methods (footnote 120) have been developed to assess similarity between datasets, which can help in deciding whether datasets can be combined and whether insights from one group apply to another. For example, in rare disease research, assessing the similarity between patient groups can help researchers draw on evidence from similar cases to improve treatment predictions.
- Transfer
Transfer of information is key for small data, whether information is transferred between similar datasets or a small dataset is enriched with information from external sources, such as databases or other models. Relevant methods include few-shot learning, representation learning and neuro-symbolic AI.
- Uncertainty
Uncertainty is particularly important in small data settings because of the limited information available for modelling. Several methods can quantify and estimate uncertainty in model parameters, though more work is needed to assess the uncertainty that arises from model selection. One important approach for reducing uncertainty is meta-learning, in which a model learns across many datasets.
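The similarity and uncertainty ideas above can be made concrete with a short Python sketch. It uses two illustrative choices that are assumptions, not the specific methods referred to in footnote 120: the maximum mean discrepancy (MMD) with a Gaussian kernel as a similarity measure between two small datasets, and a permutation test to express how uncertain that comparison is when each group has only a handful of members. The patient groups and their two standardised measurements are hypothetical, and the kernel bandwidth of 1.0 is an arbitrary choice.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def rbf_kernel(x: Sequence[float], y: Sequence[float], bandwidth: float = 1.0) -> float:
    """Gaussian kernel: 1.0 for identical points, falling towards 0 as they move apart."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * bandwidth ** 2))


def mmd_squared(xs: List[Point], ys: List[Point], bandwidth: float = 1.0) -> float:
    """Biased estimate of squared MMD: near 0 when the two samples look alike."""
    k_xx = sum(rbf_kernel(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(rbf_kernel(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(rbf_kernel(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2 * k_xy


def permutation_p_value(xs, ys, n_permutations=1000, bandwidth=1.0, seed=0) -> float:
    """Share of random re-splits of the pooled data whose MMD^2 is at least as large
    as the observed one. A large value means the groups cannot be told apart with
    this much data; it does not prove that they are the same."""
    rng = random.Random(seed)
    observed = mmd_squared(xs, ys, bandwidth)
    pooled = list(xs) + list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if mmd_squared(pooled[: len(xs)], pooled[len(xs):], bandwidth) >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Hypothetical groups of six patients, each described by two standardised measurements.
group_a = [(1.0, 2.0), (1.2, 1.8), (0.9, 2.1), (1.1, 2.2), (1.0, 1.9), (0.8, 2.0)]
group_b = [(1.1, 2.0), (0.9, 1.9), (1.0, 2.1), (1.2, 2.0), (1.0, 1.8), (0.9, 2.2)]
group_c = [(x + 5.0, y + 5.0) for x, y in group_a]  # a clearly different group

for name, other in [("B", group_b), ("C", group_c)]:
    print(f"A vs {name}: MMD^2 = {mmd_squared(group_a, other):.3f}, "
          f"p = {permutation_p_value(group_a, other):.3f}")
```

Group B comes out far closer to group A than group C does. With only six points per group, the p-value is a reminder that failing to detect a difference is weak evidence that two groups are alike.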
The following are examples of techniques and methods that are useful for small data:
Few-shot learning
Few-shot learning is a machine learning technique for learning a task or category from a small number of examples (footnote 121). Humans are natural small data learners: given a few images of a car, children can generalise the concept and recognise similar objects. Few-shot learning techniques aim to bring this ability to machine learning systems. Few-shot learning is an attractive tool for tackling small data challenges because it aims to optimise performance when data is scarce, as is often the case for disabled communities.
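A minimal Python sketch of the few-shot idea follows. It uses a nearest-prototype classifier, the decision rule used by prototypical networks, which is a different method from the matching networks cited in footnote 121. It is a simplification: the feature vectors are hypothetical two-number summaries of short sound clips, whereas a real system would compute them with a network pre-trained on a large dataset. Each class is learned from three labelled examples, and a new class can be added from two examples without any retraining.

```python
import math
from typing import Dict, List, Sequence

Features = Sequence[float]


def build_prototypes(support: Dict[str, List[Features]]) -> Dict[str, List[float]]:
    """Average the few labelled examples of each class into a single prototype."""
    prototypes = {}
    for label, examples in support.items():
        dims = len(examples[0])
        prototypes[label] = [sum(e[i] for e in examples) / len(examples) for i in range(dims)]
    return prototypes


def classify(query: Features, prototypes: Dict[str, List[float]]) -> str:
    """Assign the label of the nearest prototype (Euclidean distance)."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Three examples per class: a "3-shot" problem. Feature values are made up.
support = {
    "doorbell": [[0.9, 0.1], [0.8, 0.2], [1.0, 0.15]],
    "kettle": [[0.1, 0.9], [0.2, 0.8], [0.15, 1.0]],
}
prototypes = build_prototypes(support)
print(classify([0.85, 0.2], prototypes))  # doorbell

# A user-specific sound can be added from just two examples, with no retraining.
support["smoke alarm"] = [[0.9, 0.9], [0.95, 0.85]]
prototypes = build_prototypes(support)
print(classify([0.88, 0.9], prototypes))  # smoke alarm
```

Because each prototype is only an average, adding a class costs almost nothing, which is what makes this style of model attractive when users supply the examples themselves.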
Meta-learning
Meta-learning, or ‘learning to learn’, refers to a technique for training machine learning models using knowledge from several (potentially small) datasets. By training on several datasets, the aim is for the model to adapt more readily to new tasks with only a few examples, which is especially useful in small data settings.
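The sketch below illustrates ‘learning to learn’ with Reptile, one simple meta-learning algorithm chosen here for brevity; the report does not specify a particular algorithm. Each hypothetical task is fitting a straight line whose slope and intercept are similar, but not identical, across tasks. Reptile learns a shared starting point from many such tasks, so that a new task can be fitted from only three examples and two gradient steps. The task family, learning rates and step counts are all assumptions for illustration.

```python
import random
from typing import List, Tuple

Params = Tuple[float, float]  # (slope, intercept) of a straight-line model


def mse(params: Params, xs: List[float], ys: List[float]) -> float:
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def adapt(params: Params, xs, ys, steps: int = 5, lr: float = 0.1) -> Params:
    """A few steps of gradient descent on one task's examples, starting from `params`."""
    w, b = params
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def sample_task(rng: random.Random) -> Params:
    """Hypothetical family of related tasks: lines with similar slopes and intercepts."""
    return rng.uniform(1.5, 2.5), rng.uniform(0.5, 1.5)


def sample_data(task: Params, k: int, rng: random.Random):
    slope, intercept = task
    xs = [rng.uniform(-1.0, 1.0) for _ in range(k)]
    return xs, [slope * x + intercept for x in xs]


def reptile(iterations: int = 1000, k: int = 5, meta_lr: float = 0.1, seed: int = 0) -> Params:
    """Reptile-style meta-learning: learn an initialisation that adapts quickly."""
    rng = random.Random(seed)
    meta: Params = (0.0, 0.0)
    for _ in range(iterations):
        xs, ys = sample_data(sample_task(rng), k, rng)
        adapted = adapt(meta, xs, ys)
        # Move the shared starting point a little towards this task's adapted parameters.
        meta = (meta[0] + meta_lr * (adapted[0] - meta[0]),
                meta[1] + meta_lr * (adapted[1] - meta[1]))
    return meta


meta_init = reptile()

# A new task with only three training examples.
rng = random.Random(42)
new_task = sample_task(rng)
train_x, train_y = sample_data(new_task, 3, rng)
test_x = [i / 10 for i in range(-10, 11)]
test_y = [new_task[0] * x + new_task[1] for x in test_x]

from_meta = adapt(meta_init, train_x, train_y, steps=2)
from_zero = adapt((0.0, 0.0), train_x, train_y, steps=2)
print(f"meta-learned start: {meta_init[0]:.2f}, {meta_init[1]:.2f}")
print(f"test error after 2 steps: {mse(from_meta, test_x, test_y):.3f} (meta) "
      f"vs {mse(from_zero, test_x, test_y):.3f} (from zero)")
```

The meta-learned starting point settles near the typical task, so a few examples suffice to finish the job; starting from zero, the same few examples and steps leave the model far from the answer.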
Neuro-symbolic AI
Neuro-symbolic AI combines two approaches to AI: neural networks, which rely on data-driven modelling, and symbolic AI, which builds explicit knowledge or rules into a system. The combination draws on the strengths of neural networks, which can learn from large amounts of data but are difficult to interpret, and of symbolic AI techniques, whose reliance on knowledge and assumptions explicitly coded into a system increases explainability and efficiency. This approach can be useful for integrating small data with big data, with the small data supplying explicit knowledge that is built into a larger neural network trained on big data.
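A deliberately simple Python sketch of the combination is shown here. The ‘neural’ part is a hypothetical stand-in function returning scores of the kind a network trained on large datasets might produce; the symbolic part is a list of explicit, human-readable rules encoding an individual’s own small data (what they have told the system about themselves). Real neuro-symbolic systems integrate the two far more tightly, and the setting names, profile fields and scores below are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Profile = Dict[str, float]


def neural_scores(profile: Profile) -> Dict[str, float]:
    """Stand-in for a trained neural network (hypothetical): a score in [0, 1] per setting.
    A real model would learn these scores from large usage datasets."""
    return {
        "large_text": min(1.0, profile.get("zoom_gestures_per_day", 0) / 20),
        "captions": 0.3,
        "screen_reader": 0.2,
    }


# Symbolic part: explicit rules (description, condition, setting, score to impose).
Rule = Tuple[str, Callable[[Profile], bool], str, float]
RULES: List[Rule] = [
    ("user reports being Deaf", lambda p: p.get("reports_deaf", 0) == 1, "captions", 1.0),
    ("user reports no useful vision", lambda p: p.get("reports_no_vision", 0) == 1,
     "screen_reader", 1.0),
]


def recommend(profile: Profile, threshold: float = 0.5):
    """Combine learned scores with rules, keeping a human-readable reason for each choice."""
    scores = neural_scores(profile)
    reasons = {setting: f"learned score {score:.2f}" for setting, score in scores.items()}
    for description, condition, setting, score in RULES:
        if condition(profile):
            scores[setting] = score  # explicit knowledge overrides the learned estimate
            reasons[setting] = f"rule: {description}"
    chosen = sorted(s for s, v in scores.items() if v >= threshold)
    return chosen, {s: reasons[s] for s in chosen}


settings, why = recommend({"zoom_gestures_per_day": 15, "reports_deaf": 1})
print(settings)  # ['captions', 'large_text']
print(why)
```

The returned reasons show the explainability benefit: each recommendation can be traced either to a learned score or to a named rule.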
What is the potential of small data analytics for supporting disabled people?
Advancements in small data research and techniques could significantly improve analysis of disability data and create better DigAT.
Small data for research and policy
Small data approaches are a necessity when analysing small datasets, such as in rare disease research (diseases affecting fewer than one in 2,000 people). For example, a clinician may need to assess the right dose of a treatment for a new child patient with a rare genetic condition. Given the small number of previous patients with the condition, small data approaches could be used to match the new patient to the most similar subgroup of patients (eg patients under 10 years old) or to draw on relevant information (eg age) to enable better predictions of the right dose (footnote 122). Small data techniques could also be used to combine data on individual patients from several different studies, creating a relatively larger dataset for dose predictions (footnote 123).
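Both ideas in the paragraph above, matching a new patient to similar previous patients and pooling records from several small studies, are sketched below in Python. The method (averaging the doses of the k most similar patients) is a simple illustrative choice, not the approach described in the cited explainer, and the distance scaling is an assumption. All patients, doses and study labels are invented numbers, not clinical values or guidance.

```python
import math
from typing import List, NamedTuple, Tuple


class Patient(NamedTuple):
    age_years: float
    weight_kg: float
    dose: float   # arbitrary units; invented numbers, not clinical values
    study: str    # which (hypothetical) study the record came from


def predict_dose(age: float, weight: float, patients: List[Patient],
                 k: int = 3) -> Tuple[float, List[Patient]]:
    """Average the doses of the k previous patients most similar to the new one."""
    def distance(p: Patient) -> float:
        # Assumption: 10 years of age and 10 kg of weight count as equally different.
        return math.hypot((p.age_years - age) / 10, (p.weight_kg - weight) / 10)

    nearest = sorted(patients, key=distance)[:k]
    return sum(p.dose for p in nearest) / len(nearest), nearest


# Two small studies, each too small to rely on alone, pooled into one dataset.
study_a = [Patient(4, 16, 2.0, "A"), Patient(6, 20, 2.4, "A"), Patient(35, 70, 8.0, "A")]
study_b = [Patient(5, 18, 2.2, "B"), Patient(40, 80, 9.0, "B"), Patient(12, 40, 4.5, "B")]
pooled = study_a + study_b

dose, matches = predict_dose(age=5, weight=18, patients=pooled)
print(f"suggested dose: {dose:.2f}")       # suggested dose: 2.20
print(sorted({p.study for p in matches}))  # ['A', 'B']
```

The prediction relies on the closest patients from both studies, which is only possible because the records were pooled; neither study alone contains three young children.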
Analysis of small data can also be useful for evidence-informed policy-making. The use of big data analysis for policy-making prioritises the needs of the average individual, reinforcing the ‘invisibility’ of marginalised groups, such as disabled people, in decisions around spending priorities. Small data approaches could help ensure policy-making is more contextual and inclusive, leading to better outcomes for both individuals and society as a whole (footnote 124). However, there are trade-offs involved since insights from small data will need to be balanced against other data to ensure small datasets are not skewed or biased.
Personalising DigAT
Small data approaches can also help personalise DigAT to better suit disabled people’s unique needs. Few-shot learning, meta-learning and neuro-symbolic AI can enable systems to learn from smaller datasets, creating opportunities for new, adaptable DigAT.
Emerging research uses few-shot learning and meta-learning for automated sign language recognition systems (footnote 125), (footnote 126), personalisation of sound recognition systems used by D/deaf and hard-of-hearing users (footnote 127) and to design Augmentative and Alternative Communication (AAC) systems for people with complex speech and communication needs (footnote 128), (footnote 129).
For example, WESPER is a zero-shot AI tool that converts whispers into normal speech, which can be useful for people with hearing loss (footnote 130). Find My Things, developed by Microsoft, is an AI object recognition tool designed to help people with vision loss find their personal items; it uses few-shot learning to reduce the number of examples users need to provide, keeping the effort required of them to a minimum (footnote 131). Neuro-symbolic AI could be used to personalise devices, such as smartphones, for example by suggesting optimal accessibility settings based on data provided by a disabled user (footnote 132).
Analysing small data is also key to advances in wearables and remote monitoring devices, such as those used in social care settings to detect falls. Fall detection is challenging because of significant variation in human bodies and in how movement is recorded (footnote 133). Small data approaches that compare small and large datasets can be used to personalise these devices, using the data collected from an individual to learn how they normally move and so improve accuracy (footnote 134).
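The Python sketch below illustrates personalisation under strong simplifying assumptions: each movement is reduced to a single peak acceleration in g, a fixed population threshold stands in for a model built from a large dataset, and a person’s own recordings of normal movement form the small dataset. The blending rule, which trusts the personal estimate more as recordings accumulate, is one simple way of combining the two and is an assumption, as are all the numbers. Real fall detectors use much richer signals.

```python
import statistics
from typing import Sequence

# Hypothetical one-size-fits-all threshold (peak acceleration, in g) from a population model.
POPULATION_THRESHOLD_G = 2.5


def personal_threshold(normal_peaks: Sequence[float], z: float = 4.0) -> float:
    """Threshold learned from one person's own recordings of normal movement."""
    return statistics.mean(normal_peaks) + z * statistics.stdev(normal_peaks)


def blended_threshold(normal_peaks: Sequence[float], prior_strength: float = 4.0) -> float:
    """Lean on the population threshold when there is little personal data,
    and trust the personal estimate more as recordings accumulate."""
    n = len(normal_peaks)
    weight = n / (n + prior_strength)
    return weight * personal_threshold(normal_peaks) + (1 - weight) * POPULATION_THRESHOLD_G


def possible_fall(peak_g: float, threshold_g: float) -> bool:
    return peak_g > threshold_g


# Invented recordings for someone who moves slowly, e.g. with a walking frame.
normal_peaks = [1.05, 1.10, 1.08, 1.12, 1.07, 1.09]
threshold = blended_threshold(normal_peaks)
fall_peak = 1.9  # a hypothetical low-impact fall

print(f"personalised threshold: {threshold:.2f} g")
print(possible_fall(fall_peak, POPULATION_THRESHOLD_G))  # False: the generic rule misses it
print(possible_fall(fall_peak, threshold))               # True
```

For this hypothetical user, the population threshold misses a low-impact fall that the personalised threshold catches, while their ordinary movements still stay below it.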
What are the limitations of small data approaches?
Small data problems occur in a range of fields, and hence small data methodologies have been developed across many research areas. While this shows the relevance of small data research in many domains, it also means that research may be impeded by a lack of interdisciplinary communication, including the lack of a shared language for small data approaches.
One key limitation of small data approaches is the risk of overfitting, where a model learns patterns that are too closely aligned to the training data and fails to generalise to other datasets. While this is also a risk in big data approaches, the limited information in small datasets means the data may not be diverse enough to cover a wide range of situations, making it more likely that a model will learn specific patterns that do not generalise (footnote 135). This risk is heightened when certain categories are overrepresented in a dataset, leading to biased predictions, or when a model relies on historical data whose underlying patterns could change in the future (footnote 136).
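Overfitting on a small dataset can be shown in a few lines of Python. In this toy example, six training points are drawn from the line y = 2x + 1 with fixed, made-up noise. A polynomial that passes exactly through every training point (a maximally flexible model) is compared with an ordinary least-squares straight line, and both are evaluated on new points between and just beyond the training data.

```python
from typing import List, Tuple


def interpolate(xs: List[float], ys: List[float], x: float) -> float:
    """Lagrange polynomial through every training point: a maximally flexible model."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares straight line: a simple model with two parameters."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def mse(predict, xs, ys) -> float:
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Six training points from the line y = 2x + 1, plus fixed, made-up measurement noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.3, -0.4, 0.2, -0.3, 0.4, -0.2]
train_y = [2 * x + 1 + e for x, e in zip(train_x, noise)]
# New points between and just beyond the training data, without noise.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
test_y = [2 * x + 1 for x in test_x]

slope, intercept = fit_line(train_x, train_y)
models = {
    "polynomial (fits noise)": lambda x: interpolate(train_x, train_y, x),
    "straight line": lambda x: slope * x + intercept,
}
for name, model in models.items():
    print(f"{name}: train MSE {mse(model, train_x, train_y):.3f}, "
          f"test MSE {mse(model, test_x, test_y):.3f}")
```

The polynomial has zero training error because it has memorised the noise, yet it does far worse than the straight line on the new points, most dramatically just beyond the range of the training data.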
Another limitation of small data approaches is the challenge of validating models trained on small data. To validate a model, there should be no overlap between the dataset used to train the model and the dataset used to test it. An overlap, known as data leakage, leads to an overestimate of the model’s accuracy and can result in models that generalise poorly to new data, an effect that is amplified for small datasets. External validation, where a model is tested on new, similar datasets, is difficult in small data settings, where there may be too little data for training, let alone validation.
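The effect of data leakage is easy to demonstrate with a Python sketch. A 1-nearest-neighbour classifier (an illustrative choice) is evaluated in two ways on six examples whose labels are deliberately arbitrary, so an honest estimate should be poor. When test examples are also in the training set, every example simply finds itself. Leave-one-out cross-validation, one common option for small datasets, keeps training and test data apart, though it is no substitute for external validation.

```python
import math
from typing import List, Tuple

Example = Tuple[Tuple[float, float], str]


def nearest_neighbour(train: List[Example], x: Tuple[float, float]) -> str:
    """1-nearest-neighbour: copy the label of the closest training example."""
    return min(train, key=lambda example: math.dist(example[0], x))[1]


def leaky_accuracy(data: List[Example]) -> float:
    """Data leakage: every test example is also in the training set."""
    return sum(nearest_neighbour(data, x) == label for x, label in data) / len(data)


def leave_one_out_accuracy(data: List[Example]) -> float:
    """Hold out each example in turn and train on the rest, so train and test never overlap."""
    correct = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        correct += nearest_neighbour(train, x) == label
    return correct / len(data)


# Six examples whose labels are deliberately arbitrary: an honest estimate should be poor.
data: List[Example] = [
    ((0.0, 0.0), "a"), ((0.2, 0.1), "b"), ((0.1, 0.4), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.2), "a"), ((1.1, 0.9), "b"),
]
print(f"with leakage:  {leaky_accuracy(data):.2f}")          # with leakage:  1.00
print(f"leave-one-out: {leave_one_out_accuracy(data):.2f}")  # leave-one-out: 0.33
```

The leaky evaluation reports perfect accuracy on labels that carry no real pattern, which is exactly the overestimate described above.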
To address these challenges, assessment of similarity between datasets is crucial. Policies facilitating data exchange such as encouraging collaboration and providing data sharing infrastructure for researchers can help with addressing data scarcity (footnote 137). The creation of datasets in accordance with the FAIR (Findable, Accessible, Interoperable, Reusable) principles enables easier comparisons for similarity assessments (footnote 138).
Case study 3: DigAT for travel and tourism
For travel and tourism, DigAT can be used by disabled people to address challenges related to navigation, anxiety and communication. DigAT can improve how disabled travellers locate and interact with relevant information and create more opportunities for accessible experiences while travelling.
Opportunities
One key challenge disabled travellers face is navigation, as it can be difficult to know in advance whether a route or location meets their accessibility needs. While mainstream travel search sites allow search results to be filtered for ‘accessibility’, these results are often unreliable owing to a lack of standardisation (footnote 139), and navigation apps often don’t include accessibility features (footnote 140). Online platforms, such as AccessibleGO (footnote 141) and Wheelmap (footnote 142), include more specific information about facilities, such as accessible bathrooms and showers, so travellers with mobility issues can accurately check whether a location meets their needs. Companies such as Ocean 3D create virtual tours of airports, hotels and bars, which people who experience anxiety can view on a computer or virtual reality headset to explore a route and practise ahead of a trip (footnote 143).
While travelling or staying in unfamiliar environments, disabled travellers can face communication issues or barriers to essential information, which may be communicated in only one format, such as audio announcements in a noisy environment. Audio-to-text systems can help: Spoke, for example, is a mobile app that integrates with public address systems to convert live announcements and sounds into written text for D/deaf and hard-of-hearing people while travelling (footnote 144). Navigation apps can increase the confidence of Blind and partially sighted people when travelling (footnote 145). For example, Waymap uses location technologies that don’t require Wi-Fi or a mobile signal to help users navigate locations such as train stations with audio instructions (footnote 146).
On reaching a destination, DigAT can be used to create more customised, accessible experiences for disabled people. Using 3D printing and near-field communication (NFC) technologies, PictureLive creates interactive audio-tactile experiences of visual information and artefacts for blind and partially sighted people, who are often excluded from traditional “sightseeing” when travelling (footnote 147). Virtual reality headsets can also be used to provide tours of archaeological sites that are often inaccessible to wheelchair users (footnote 148).
Challenges
One challenge for using DigAT for travel is a lack of consistent, high-quality data across countries for navigation and about accessibility requirements. Data on specific accessibility requirements, such as sensory accommodations, are often not included in general datasets, and existing data lack standardisation: a ‘wheelchair accessible’ label, for example, doesn’t guarantee any standard measurements. Accessibility information must also be kept up to date so that it does not create a false sense of accessibility. International differences in how disability is defined and measured can also limit disabled travellers’ ability to make well-informed decisions about whether their accessibility needs will be met in different locations.
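One way to picture the standardisation and freshness problems described above is as a data structure. The Python sketch below is a hypothetical record format, not an existing standard or any platform’s schema: it stores measurements rather than a yes/no ‘wheelchair accessible’ flag, records when the information was last checked, and compares it with thresholds supplied by the traveller. The field names, the one-year freshness limit and all values are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class EntranceRecord:
    """Hypothetical record that stores measurements instead of a yes/no label."""
    venue: str
    door_width_cm: Optional[float]   # None when nobody has measured it
    step_height_cm: Optional[float]
    last_verified: date


@dataclass
class TravellerNeeds:
    """Set by the traveller for their own wheelchair; no standard values are assumed."""
    min_door_width_cm: float
    max_step_height_cm: float


def assess(record: EntranceRecord, needs: TravellerNeeds, today: date,
           max_age_days: int = 365) -> str:
    if record.door_width_cm is None or record.step_height_cm is None:
        return "unknown: measurements missing"
    if (today - record.last_verified).days > max_age_days:
        return "unknown: information may be out of date"
    fits = (record.door_width_cm >= needs.min_door_width_cm
            and record.step_height_cm <= needs.max_step_height_cm)
    return "meets stated needs" if fits else "does not meet stated needs"


needs = TravellerNeeds(min_door_width_cm=80, max_step_height_cm=2)
today = date(2025, 4, 15)
records = [
    EntranceRecord("Hotel A", 85, 0, date(2025, 1, 10)),
    EntranceRecord("Hotel B", 72, 0, date(2025, 3, 2)),
    EntranceRecord("Museum C", 90, 0, date(2021, 6, 1)),
    EntranceRecord("Station D", None, 0, date(2025, 2, 1)),
]
for record in records:
    print(record.venue, "->", assess(record, needs, today))
```

Reporting “unknown” for missing or stale data, rather than defaulting to “accessible”, is the design choice that avoids the false sense of accessibility the paragraph warns about.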
Developing robust DigAT that can be used in a wide range of contexts and locations is hindered by a lack of globally comprehensive datasets. For example, audio-to-text AI systems used to transcribe real-time information when travelling require datasets that include audio and text data in multiple languages, which can be expensive to create or access. Inaccuracies can mislead disabled travellers, causing frustrating or dangerous situations when a system is unable to recognise announcements or information in different languages. Using DigAT also requires reliable access to electricity and the internet, which often cannot be guaranteed while travelling, particularly in low-resource settings; this reinforces digital exclusion.
Example: Transport for London
Local public transportation authorities, such as Transport for London (TfL), can use and support DigAT to address navigation challenges when travelling. One recent TfL initiative has used Google Street View to visually map London’s busiest stations so wheelchair users can virtually navigate and plan their travel routes (footnote 149). In 2023, TfL also trialled NaviLens, an app that detects special QR codes to provide voice guidance for Blind and partially sighted people while travelling (footnote 150). NaviLens is already used in several cities, such as Barcelona and New York, to provide access to real-time travel information in underground stations and at bus stops (footnote 151).
Conclusion
There are significant opportunities to use DigAT for addressing challenges currently encountered by disabled travellers and tourists, such as for navigation and lack of accessible experiences. However, using DigAT in this context requires better collection and standardisation of accessibility data, access to relevant datasets for the creation of globally comprehensive DigAT and support for infrastructure so disabled travellers can reliably use DigAT.
Footnotes
114. Marsden G, Maunder A, Parker M. 2008 People Are People, but Technology Is Not Technology. Philosophical Transactions: Mathematical, Physical and Engineering Sciences 366, 3795–3804.
115. The World Health Organization. 2023 Disability. See https://www.who.int/news-room/fact-sheets/detail/disability-and-health (accessed 14 April 2025).
116. Boyd d, Crawford K. 2012 Critical Questions for Big Data: Provocations for a cultural, technological, and scholarly phenomenon. Inf. Commun. Soc. 15, 662–679. (doi:10.1080/1369118X.2012.678878)
117. Treviranus J. 2019 The value of being different. Proceedings of the 16th International Web for All Conference, 1–7.
118. Hackenberg M et al. Small data explainer – The impact of small data methods in everyday life. See https://royalsociety.org/news-resources/projects/disability-data-assistive-technology/ (accessed 15 April 2025).
119. This section draws extensively on Hackenberg M et al. Small data explainer – The impact of small data methods in everyday life. See https://royalsociety.org/news-resources/projects/disability-data-assistive-technology/ (accessed 15 April 2025).
120. Ibid.
121. Vinyals O, Blundell C, Lillicrap T, Wierstra D. 2016 Matching networks for one shot learning. Adv. Neural Inf. Process. Syst. 29.
122. Hackenberg M et al. Small data explainer – The impact of small data methods in everyday life. See https://royalsociety.org/news-resources/projects/disability-data-assistive-technology/ (accessed 15 April 2025).
123. Ibid.
124. Hackenberg M et al. Small data explainer – The impact of small data methods in everyday life. See https://royalsociety.org/news-resources/projects/disability-data-assistive-technology/ (accessed 15 April 2025).
125. Nihal R A, Broti N M. 2023 A Few-Shot Approach to Sign Language Recognition: Can Learning One Language Enable Understanding of All? In: Lu H, Blumenstein M, Cho S-B, Liu C-L, Yagi Y, Kamiya T (eds) Pattern Recognition, Springer Nature Switzerland. (doi:10.1007/978-3-031-47637-2_11)
126. Zhou H, Lu T, DeHaan K, Gowda M. 2024 ASLRing: American Sign Language Recognition with Meta-Learning on Wearables. 2024 IEEE/ACM Ninth International Conference on Internet-of-Things Design and Implementation (IoTDI), 203–214. (doi:10.1109/IoTDI61053.2024.00022)
127. Jain D et al. 2022 ProtoSound: A Personalized and Scalable Sound Recognition System for Deaf and Hard-of-Hearing Users. CHI Conference on Human Factors in Computing Systems, 1–16. (doi:10.1145/3491102.3502020)
128. Paola A D, Muraro S, Marinelli R, Pilato C. 2024 Foundation Models in Augmentative and Alternative Communication: Opportunities and Challenges. arXiv:2401.08866. (doi:10.48550/arXiv.2401.08866)
129. Pereira J A, Pereira J A, Zanchettin C, do Nascimento Fidalgo R. 2024 PrAACT: Predictive Augmentative and Alternative Communication with Transformers. Expert Systems with Applications 240. (doi:10.1016/j.eswa.2023.122417)
130. Rekimoto J. 2023 WESPER: Zero-shot and Realtime Whisper to Normal Voice Conversion for Whisper-based Speech Interactions. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 1–12. (doi:10.1145/3544548.3580706)
131. Wen L Y, Morrison C, Grayson M, Marques R F, Massiceti D, Longden C, Cutrell E. 2024 Find My Things: Personalized Accessibility through Teachable AI for People who are Blind or Low Vision. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 1–6. (doi:10.1145/3613905.3648641)
132. Wald M. 2021 AI Data-Driven Personalisation and Disability Inclusion. Frontiers in Artificial Intelligence 3. (doi:10.3389/frai.2020.571955)
133. Igual R, Medrano C, Plaza I. 2013 Challenges, issues and trends in fall detection systems. Biomed. Eng. OnLine 12, 66. (doi:10.1186/1475-925X-12-66)
134. Hackenberg M et al. Small data explainer – The impact of small data methods in everyday life. See https://royalsociety.org/news-resources/projects/disability-data-assistive-technology/ (accessed 15 April 2025).
135. Pothuganti S. 2018 Review on over-fitting and under-fitting problems in Machine Learning and solutions. Int J Adv Res Electr Electron Instrum Eng 7, 3692–3695.
136. Vollmer S et al. 2020 Machine learning and artificial intelligence research for patient benefit: 20 critical questions on transparency, replicability, ethics, and effectiveness. BMJ 368.
137. Champieux R et al. 2023 Ten simple rules for organizations to support research data sharing. PLOS Computational Biology 19, e1011136. (doi:10.1371/journal.pcbi.1011136)
138. Wilkinson M D et al. 2016 The FAIR guiding principles for scientific data management and stewardship. Sci. Data 3, 1–9.
139. AccessAble. New AccessAble Survey highlights crucial insights on accessibility challenges and the need for action. See https://www.accessable.co.uk/articles/accessibility-and-you-survey-results-2023-2024 (accessed 15 April 2025).
140. Warbox Creative. Leading the way in digital inclusion: Top apps for accessibility in 2024. See https://warboxcreative.co.uk/app-accessibility-2024/ (accessed 15 April 2025).
141. AccessibleGO. See https://accessiblego.com/ (accessed 15 April 2025).
142. Wheelmap. See https://wheelmap.org/ (accessed 15 April 2025).
143. Berti A. 2019 The digital twin: Creating virtual airport tours with Ocean3D. Airport Technology. 15 August 2019. See https://www.airport-technology.com/features/virtual-reality-at-airports/ (accessed 15 April 2025).
144. Spoke. See https://www.thespokeapp.com/ (accessed 15 April 2025).
145. Royal National Institute of Blind People (RNIB). 2023 Inclusive Journeys: Improving the accessibility of public transport for people with sight loss. See https://www.rnib.org.uk/professionals/health-social-care-education-professionals/knowledge-and-research-hub/reports-and-insight/inclusive-journeys-improving-the-accessibility-of-public-transport-for-people-with-sight-loss/ (accessed 15 April 2025).
146. Waymap. See https://www.waymapnav.com/ (accessed 15 April 2025).
147. PictureLive. See https://www.picturelive.org/ (accessed 15 April 2025).
148. Kyrlitsias C, Christofi M, Michael-Grigoriou D, Banakou D, Ioannou A. 2020 A Virtual Tour of a Hardly Accessible Archaeological Site: The Effect of Immersive Virtual Reality on User Experience, Learning and Attitude Change. Frontiers in Computer Science 2. (doi:10.3389/fcomp.2020.00023)
149. Edwards T. 2024 Busiest London stations visually mapped by Google. BBC News. 5 December 2024. See https://www.bbc.co.uk/news/articles/c4g2d0x1098o (accessed 15 April 2025).
150. Transport for London. Transport for London and KeolisAmey Docklands trial new NaviLens technology at DLR stations. See https://tfl.gov.uk/info-for/media/press-releases/2023/july/transport-for-london-and-keolisamey-docklands-trial-new-navilens-technology-at-dlr-stations (accessed 15 April 2025).
151. Royal National Institute of Blind People (RNIB). NaviLens. See https://www.rnib.org.uk/living-with-sight-loss/assistive-aids-and-technology/navigation-and-communication/navilens/ (accessed 15 April 2025).