Alexander E. Siemenn [1]
[1] Department of Mechanical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
[2] Research Laboratory of Electronics, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
A Self-Supervised Robotic System for Autonomous Contact-Based Spatial Mapping of Semiconductor Properties
Abstract
Integrating robotically driven contact-based material characterization techniques into self-driving laboratories can enhance measurement quality, reliability, and throughput. While deep learning models support robust autonomy, current methods lack reliable pixel-precision positioning and require extensive labeled data. To overcome these challenges, we propose an approach for building self-supervised autonomy into contact-based robotic systems that teaches the robot to follow domain expert measurement principles at high throughput. First, we design a vision-based, self-supervised convolutional neural network (CNN) architecture that uses differentiable image priors to optimize domain-specific objectives, refining the pixel precision of predicted robot contact poses by 20.0% relative to existing approaches. Second, we design a reliable graph-based planner that generates distance-minimizing paths to accelerate the robot measurement throughput and decrease planning variance by 6x. We demonstrate the performance of this approach by autonomously driving a 4-degree-of-freedom robotic probe for 24 hours to characterize semiconductor photoconductivity at 3,025 uniquely predicted poses across a gradient of drop-casted perovskite film compositions, achieving throughputs over 125 measurements per hour. Spatially mapping photoconductivity onto each drop-casted film reveals compositional trends and regions of inhomogeneity, which is valuable for identifying manufacturing process defects. With this self-supervised CNN-driven robotic system, we enable high-precision and reliable automation of contact-based characterization techniques at high throughputs, thereby allowing the measurement of previously inaccessible yet important semiconductor properties for self-driving laboratories.
keywords:
autonomous robotics, pose prediction, self-supervised, spatial differentiability, reliable path planning, high-throughput, contact-based characterization
Contact-based characterization techniques such as contact profilometry, four-point probes, and nanoindentation, among many others, are valuable tools in quantifying materials’ surface [1, 2, 3, 4, 5] and electrical properties [6, 7, 8, 9, 10, 11, 12]. By integrating deep learning and autonomous robotics into these methods, we can improve the reliability and quality of measurements [13, 14, 15], relieving researchers from the burden of constantly monitoring experiments to ensure optimal performance. However, integrating autonomy into these contact-based methods of characterization faces the challenges of reliably predicting high-precision contact positions [16, 17, 18], establishing high-throughput feedback control [19, 20], and collecting large labeled training datasets [21, 22]. Deep learning-controlled robotic measurement of material and molecular properties has been widely implemented across a range of optical characterization techniques [23, 24, 25, 26, 27, 28, 29], due to their non-contact nature, which simplifies mechanical complexity and increases data acquisition throughput compared to contact-based methods. For example, Su et al. [6] propose a robotic non-contact atomic force microscopy (nc-AFM) probe with positional control driven by the Faster Region-based convolutional neural network (Faster R-CNN) [30], which detects a general spatial bounding box around a target molecule for fast real-time positioning. However, the general bounding box is insufficient for orienting the pose of the robot to pixel-precise positions, and the model may require collecting additional labeled image data for fine-tuning [6]. Instead, the implementation of a self-supervised approach designed specifically for spatial positioning tasks has the potential to address these challenges, resulting in high-precision predictive robotic control that more closely emulates the intuition of a human domain expert.
Here, we propose the design of a self-supervised and spatially differentiable convolutional neural network (SDCNN) for optimal pose prediction of contact-based robotic characterization systems and a reliable graph-based path planner to maximize measurement throughput across predicted poses. We utilize this SDCNN and path planner to precisely and autonomously control a 4-degree-of-freedom (4DOF) robot with a four-point probe end effector so that it makes optimal contact with each film and measures photoconductivity without going out of bounds, i.e., at a valid robot contact pose. Each film is drop-casted using only 4 µL of chemical precursor, which allows us to maximize the number of combinatorial perovskite compositions explored but produces small-area films that are difficult to characterize using contact-based approaches. Hence, implementing spatial differentiability into a CNN enables the computer vision-segmented films to be used as shape priors in the loss function for the refinement of predictions directly in image space, transforming an unsupervised learning problem into a self-supervised one. We demonstrate the general-use nature of the proposed SDCNN on two different characterization tasks of perovskite semiconductors: (1) surface profilometry and (2) photoconductivity. The robot pose prediction performance is then evaluated across four metrics: (1) positional accuracy, (2) rotational accuracy, (3) valid pose generation, and (4) inference time. Our model’s performance across these metrics is compared to seven other CNN models with either conventional loss functions or robust loss functions from the literature, such as Wing [31], Reverse Huber [32], and Barron [33], that are designed for spatial tasks. Our approach achieves a 20.0% improvement in the generation of valid poses, a 1.5% improvement in positional accuracy, and equivalent rotational accuracy and inference time compared to the robust loss functions from the literature [31, 32, 33].
We develop a path planner to maximize the throughput of the autonomous characterization procedure by using stochastic noise to reliably generate distance-minimizing robot travel routes across predicted contact poses. Generated path lengths are compared against four other planners from the literature: A*, Christofides, Dijkstra, and a genetic algorithm. Our approach achieves a 5.0% decrease in median generated path length with a 529.1% tighter variance. These performance improvements of the SDCNN and planner make autonomous robotic execution of contact-based measurements practically viable. We demonstrate this by autonomously characterizing photoconductivity curves of drop-casted perovskite semiconductors within 24 hours using the SDCNN-controlled 4DOF robotic system, achieving high throughputs of over 125 measurements per hour. These experimentally characterized photoconductivity values are then spatially mapped back onto each semiconductor film, resolving compositional trends of photoconductivity and identifying regions of non-uniformity where potential defects or degradation may exist. By achieving high-precision robotic spatial control for the characterization of semiconductors through self-supervised deep learning models and reliable path planning, we unlock new potential for integrating autonomous robotics into the semiconductor development and discovery pipeline, ultimately improving the reliability and quality of measurements at high throughputs without human supervision.
Results
Autonomous robotic control through vision and deep learning

Through optimal pose prediction from computer vision inputs, we autonomously drive a 4DOF robot with an illuminating four-point probe end effector to spatially characterize the photoconductive properties of semiconductors. Figure 2a illustrates this autonomous control pipeline. First, semiconductor films are drop-casted offline, outside the autonomous feedback loop, using an OpenTrons volumetric pipetter [34]. Then, an on-board camera takes an image of the semiconductors, $\mathcal{I}$, which is rectified from the camera reference frame to the robot reference frame using a series of calibration matrices, $M_{C \to I}$ and $M_{I \to R}$:
$$ \mathcal{I}_{R} = M_{I \to R} \, M_{C \to I} \, \mathcal{I} \tag{1} $$
where $M_{C \to I}$ rectifies from the camera to the image reference frame and $M_{I \to R}$ then rectifies from the image to the robot reference frame. Once the image is in the same coordinate frame as the robot, the Fast Segment Anything Model (FastSAM) [35] is used to quickly find the edges of each drop-casted film, creating image segments, $S$. Next, optimal poses are predicted directly onto the image segments using the proposed SDCNN model, an 8-layer CNN with an objective-based spatially differentiable loss function derived using domain information. The optimality of each pose is determined using $S$ as a prior in image space, which can be back-propagated into the network as a loss due to spatial differentiability. We develop a noisy Dijkstra’s planner to find a distance-minimizing path plan for traversing all predicted poses across all drop-casted films. Finally, the robot is driven to each SDCNN-predicted pose to characterize the properties of the semiconductor at that location in space. The average positional and rotational accuracies of predicted poses by the SDCNN are shown to be higher than predictions by CNNs using existing loss functions (Fig. 2c), with improvements between 1.5% and 8.9%. Designing the SDCNN to predict poses with high positional accuracy and rotational uniqueness enables comprehensive spatial mapping of important measured semiconductor properties in the fewest number of poses.
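As an illustration of the two-step rectification described above, the following minimal Python sketch chains two 3x3 homogeneous transforms and applies them to a single pixel coordinate. The matrix values, the names `CAMERA_TO_IMAGE` and `IMAGE_TO_ROBOT`, and the choice to transform a point rather than a full image are assumptions made for this example; they are not the calibration used in this work.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul3(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 3x3 matrices (a @ b)."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply_transform(m: Matrix, point: Tuple[float, float]) -> Tuple[float, float]:
    """Apply a 3x3 homogeneous transform to a 2D point."""
    x, y = point
    xh = m[0][0] * x + m[0][1] * y + m[0][2]
    yh = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return (xh / w, yh / w)


# Hypothetical calibration matrices (placeholder values for illustration only).
CAMERA_TO_IMAGE: Matrix = [[1.0, 0.0, -20.0],   # shift the origin by (20, 10) pixels
                           [0.0, 1.0, -10.0],
                           [0.0, 0.0, 1.0]]
IMAGE_TO_ROBOT: Matrix = [[0.1, 0.0, 0.0],      # assumed scale of 0.1 mm per pixel
                          [0.0, 0.1, 0.0],
                          [0.0, 0.0, 1.0]]

# The camera-to-image step is applied first, so it sits on the right of the product.
CAMERA_TO_ROBOT = matmul3(IMAGE_TO_ROBOT, CAMERA_TO_IMAGE)

robot_xy = apply_transform(CAMERA_TO_ROBOT, (120.0, 60.0))
print(robot_xy)
```

Composing the matrices once and reusing the product keeps the per-pixel cost to a single transform, which is the reason for writing the rectification as a matrix product.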
The spatial mapping capabilities of this autonomous workflow can be generalized to function with different contact-based end effectors. Figure 2b highlights two characterization use cases: photoconductivity and surface profilometry. Photoconductivity is measured at each predicted pose by taking the difference between illuminated and dark current-voltage curves (Fig. 2b, left). The blue-shaded regions highlight the spread of the photoconductive properties across the area of a drop-casted methylammonium lead iodide (MAPbI3) perovskite film for a set of poses, predicted by the SDCNN to maximize the unique spatial area measured. These results indicate that photoconductivity is generally uniform across the area of this particular film. Conversely, the thickness of the film varies largely by tens of micrometers, measured using surface profilometry at the same spatially predicted contact poses (Fig. 2b, right). Utilizing different end effectors with the same driving SDCNN demonstrates the model’s generalizable nature for spatially resolving important properties of semiconductor materials in an autonomous fashion.
Spatial differentiability for optimal robot pose prediction

Spatial differentiation ensures the loss function is differentiable across image pixels, enabling back-propagation of image-based computations through SDCNN neurons. Here, we use spatial differentiability to convert unsupervised learning into self-supervised learning by creating a pixel-based loss function using domain information that accepts shape priors as inputs to refine a prediction. We aim to have the SDCNN predict a set of valid robot poses that will maximize the number of pixels making unique contact with the spatial area of each film. The image segments, $S$, and predicted poses, $P$, are the shape priors to the loss function. This method is valid for convex shapes in general and does not assume a perfect circle (Fig. S-5). These shapes are first passed through Gaussian filters to ensure they are smooth and differentiable before being passed to the loss function, generating $\tilde{S}$ and $\tilde{P}$ (Fig. 3a). A predicted pose is considered a valid contact if all its pixels fall within the measurable area (non-zero valued region shown in Fig. 3b) of the differentiable segment $\tilde{S}$, which can be tuned by modifying the standard deviation of the Gaussian filter, $\sigma$ (Fig. 3a). If any predicted contact pose pixels fall within the background area (zero-valued region), the pose is considered invalid. The spatially differentiable loss function consists of a tunable weighted sum of two optimization objectives derived from domain information commonly followed when characterizing materials using contact-based methods: (1) maximize the coverage of all poses in the measurable area without overlap and (2) maximize the spatial and rotational uniqueness of all poses. By embedding this domain information as an objective-based loss function, the network learns to predict poses more aligned with the measurement intuition of a domain expert, improving the autonomous decision-making capabilities of the robot:
(2) |
where the two objectives are combined through tunable weights, and the composition term superimposes the differentiable segment pixels onto the differentiable pose pixels. The $(x, y)$-pixel coordinates and the yaw-rotation angles, $\theta$, are defined for each unique differentiable pose. The objectives are negated to form a loss and are subject to the constraint of reducing the number of overlapping pixels between any two unique predicted poses.
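To make the validity rule and the two objectives concrete, the sketch below evaluates them on a small discrete grid. It is a toy, non-differentiable stand-in and not the loss of Eq. 2: the `Pose` class, the three-pixel footprint, the restriction of yaw to 0 or 90 degrees, the weights `w_cover` and `w_unique`, and the use of mean pairwise centre distance as a uniqueness term are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot
from typing import List, Set, Tuple

Grid = List[List[int]]


@dataclass(frozen=True)
class Pose:
    x: int            # centre column in pixels
    y: int            # centre row in pixels
    horizontal: bool  # toy stand-in for yaw: 0 degrees (True) or 90 degrees (False)


def footprint(p: Pose, half_len: int = 1) -> Set[Tuple[int, int]]:
    """Pixels (col, row) covered by a one-pixel-wide probe footprint."""
    span = range(-half_len, half_len + 1)
    if p.horizontal:
        return {(p.x + d, p.y) for d in span}
    return {(p.x, p.y + d) for d in span}


def on_film(segment: Grid, col: int, row: int) -> bool:
    """True if the pixel lies inside the image and on a non-zero segment pixel."""
    return 0 <= row < len(segment) and 0 <= col < len(segment[0]) and segment[row][col] > 0


def is_valid(p: Pose, segment: Grid) -> bool:
    """A pose is valid only if every footprint pixel is in the measurable area."""
    return all(on_film(segment, c, r) for c, r in footprint(p))


def toy_loss(poses: List[Pose], segment: Grid,
             w_cover: float = 1.0, w_unique: float = 1.0) -> float:
    """Negated weighted sum of (coverage minus overlap) and centre spread."""
    prints = [footprint(p) for p in poses]
    covered = set().union(*prints)
    inside = sum(1 for c, r in covered if on_film(segment, c, r))
    overlap = sum(len(a & b) for a, b in combinations(prints, 2))
    pairs = list(combinations(poses, 2))
    spread = sum(hypot(a.x - b.x, a.y - b.y) for a, b in pairs) / len(pairs) if pairs else 0.0
    return -(w_cover * (inside - overlap) + w_unique * spread)


SEGMENT: Grid = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

clustered = [Pose(2, 2, True), Pose(3, 2, True)]    # overlapping neighbours
spread_out = [Pose(2, 1, True), Pose(4, 3, True)]   # separated, no overlap
print(toy_loss(clustered, SEGMENT), toy_loss(spread_out, SEGMENT))
```

In this toy setting the spread-out poses receive the lower loss, mirroring the qualitative behaviour described above. A trainable version would replace the hard set operations with smooth pixel-wise operations so that gradients can flow back to the network.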

Figure 4a illustrates this loss minimization procedure in image space. Before training, the network has a high loss since it has not learned how to use the shape priors, resulting in placing poses randomly. After training on computer vision-segmented semiconductor film borders, the network learns to place poses in unique positions within the measurable regions without overlap using the segments as priors to the objective-based loss function. Figure 4b compares the performance on the valid pose prediction task of our SDCNN against seven other CNN models that use existing loss functions: recent loss functions from literature designed for robust and spatial tasks (Reverse Huber [32], Wing [31], and Barron [33]) and conventional loss functions (mean squared error (MSE), mean absolute error (MAE), Huber, and Poisson negative log-likelihood). Corresponding photoconductivity curves are experimentally measured on the perovskite film using the 4DOF robotic system for each valid predicted pose. The models using existing loss functions tend to cluster predicted poses tightly together, often resulting in close or partial overlaps. In contrast, our approach encourages predictions to be more spread apart, effectively utilizing the full segment prior, while staying within the measurable area. Furthermore, in addition to tight clustering, as shown in Fig. 4b, the existing models also have lower positional accuracy (Fig. 2c), resulting in fewer valid poses being successfully generated for this subset of nine perovskite films, compared to our spatially differentiable model.
In Fig. 4c (middle), we expand this analysis to the full set of 35 films and perform 100 replicate trials of valid pose generation. Across these 100 trials, we demonstrate that our SDCNN model achieves improvements of 20.0% and 16.2% in the median success rate of generating valid poses over robust loss functions and conventional loss functions, respectively. Given these performance improvements, the inference time of our loss function is negligible compared to the hardware response time, increasing by only 2.4 ns, relative to the slowest tested loss function in our evaluation, Reverse Huber (Fig. 4c, right). Hence, utilizing spatial differentiability in a neural network for pose prediction tasks improves the reliability of autonomous robotics by successfully generating valid predictions without sacrificing compute speed while also reducing the data labeling burden through self-supervision.
Reliable graph-based robot path planning
A reliable path planner consistently generates optimal path plans with low variance across multiple executions of a given task. Optimality is defined here as minimizing the total travel distance of the robotic probe to execute a plan. Hence, a planner that can reliably generate time-efficient plans for the robotic probe increases the overall system throughput across autonomous experimental cycles. The planning task at hand is to move from a fixed start node and visit all other nodes in the graph exactly once while accumulating the shortest travel distance. Each pose predicted by the SDCNN represents a node in a graph, where all nodes together form the full graph. This problem is akin to the classic Traveling Salesman Problem (TSP), which is an NP-hard optimization problem [36]. Our application differs from the traditional TSP in that we relax the constraint of returning to the start node; we define this as an Open Loop Traveling Salesman Problem (OTSP) [37].

Several methods exist for generating global approximations of the TSP and OTSP, including genetic algorithms (GA) [38], A* [39], the Christofides algorithm [40], and Dijkstra’s algorithm [41]. GAs introduce randomness into the optimization of distance-minimizing graphs by iteratively combining and mutating parent graphs to create new offspring graphs. The A* algorithm uses a distance heuristic, $h$, estimating the cost from the current node to the target node to guide the search and find the shortest path. The Christofides algorithm uses the triangle inequality to connect all nodes while minimizing added distance, ensuring the total path distance does not exceed 1.5x the optimal path length. Dijkstra’s algorithm calculates the distances, $d$, between a current node and all other unvisited nodes to build a graph that connects all nodes together with the shortest distances. However, for our task, predicted contact poses, i.e., nodes, are clustered within the area of each semiconductor film, so these existing methods have trouble finding efficient paths. Redundant overlaps and loops create inefficiencies that increase the total travel distance and path variance across multiple executions (Fig. 5a–d). To overcome this challenge, we introduce a noisy Dijkstra’s algorithm that blends the distance-minimizing graph-building procedure from Dijkstra’s algorithm with the stochastic process of the GA by adding noise, $\varepsilon$, to each distance value to minimize the total travel distance over several generations, generating more reliable path plans without overlaps or loops (Fig. 5e–f):
$$ L(\pi) = \sum_{i \in \pi} \left( d_i + \varepsilon_i \right) \tag{3} $$
where $L(\pi)$ is the total length of a generated path, $\pi$, in the graph, $d_i$ is the length of each edge in the graph, and $\varepsilon_i$ is the stochastic noise added to each edge length, independently sampled for each edge from a uniform distribution bounded by a fixed fraction of the edge length.
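The idea of perturbing edge lengths with uniform noise and keeping the shortest route over several generations can be sketched as follows. This is a simplified greedy construction written for illustration, not the planner evaluated in this work: the function names, the noise fraction of 0.2, the number of generations, and the decision to score candidates by their true (noise-free) length are all assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def path_length(points: Sequence[Point], order: Sequence[int]) -> float:
    """True (noise-free) length of an open path visiting nodes in the given order."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))


def noisy_greedy_path(points: Sequence[Point], start: int,
                      noise: float, rng: random.Random) -> List[int]:
    """Build an open path by stepping to the unvisited node with the smallest noisy distance."""
    order = [start]
    unvisited = set(range(len(points))) - {start}
    while unvisited:
        here = points[order[-1]]

        def noisy_distance(j: int, here: Point = here) -> float:
            d = math.dist(here, points[j])
            # Noise is uniform within +/- (noise * d) of the edge length.
            return d + rng.uniform(-noise * d, noise * d)

        nxt = min(sorted(unvisited), key=noisy_distance)
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def plan(points: Sequence[Point], start: int = 0, noise: float = 0.2,
         generations: int = 200, seed: int = 0) -> Tuple[List[int], float]:
    """Keep the shortest path found across noisy generations (noise-free baseline included)."""
    rng = random.Random(seed)
    best = noisy_greedy_path(points, start, 0.0, rng)
    best_len = path_length(points, best)
    for _ in range(generations):
        candidate = noisy_greedy_path(points, start, noise, rng)
        candidate_len = path_length(points, candidate)
        if candidate_len < best_len:
            best, best_len = candidate, candidate_len
    return best, best_len


# Hypothetical contact poses clustered on three films (coordinates in mm).
POINTS: List[Point] = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1), (5, 8), (6, 8)]
best_order, best_length = plan(POINTS)
print(best_order, round(best_length, 3))
```

Because the noise-free greedy path is included as the first candidate, the returned path is never longer than the deterministic baseline; the noise only serves to explore alternative visiting orders within and between clusters.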
In Fig. 5g, we compare the generated total path lengths and GPU compute time for all five methods across 115 uniquely SDCNN-predicted graphs for arrays of 35 drop-casted gradients of semiconductor films. By stochastically minimizing total path length through the addition of edge length noise, our proposed noisy Dijkstra approach reliably generates direct and non-overlapping distance-minimizing paths for the OTSP task. Across the 115 graphs, noisy Dijkstra’s algorithm achieves a 5.0% shorter median total path length than standard Dijkstra’s algorithm and a 529.1% tighter variance than the Christofides algorithm, which are the highest-performing methods tested from the literature for these metrics. Figure 5f illustrates the robot following an efficient path plan generated by this noisy Dijkstra approach across the graph of SDCNN-predicted contact poses. Furthermore, for the same number of generations, noisy Dijkstra and the GA achieve comparable GPU compute times per path, even though noisy Dijkstra improves both the median path length and the variance relative to the GA. These results demonstrate that our noisy Dijkstra’s algorithm is both an efficient and reliable path-planning approach for autonomous contact-based robotic measurement, outperforming existing methods in generating shorter paths with tighter variance while maintaining comparable computational speeds.
Robotic spatial mapping of photoconductivity with autonomy
Coupling our developed self-supervised SDCNN for pose prediction with the noisy Dijkstra path planner, this 4DOF robot – using an illuminating four-point probe end effector – enables high-throughput photoconductivity characterization by emulating expert intuition in selecting optimal contact points and generating efficient path plans without human intervention. This implementation of autonomy enables the robot to make domain-informed decisions about where to contact each material with its end effector and in what order to make these contacts, ultimately improving the reliability, quality, and throughput of the measurements. As many semiconducting materials such as solar cells and light-emitting diodes (LEDs) can now be rapidly synthesized and manufactured with self-driving laboratories [42, 43, 44, 45, 46], it is critical to ensure accompanying measurement techniques also increase in throughput while upholding human-level quality and reliability.

To demonstrate measurement performance, we run the developed robotic system autonomously for 24 hours in continuous operation without human intervention to measure the photoconductivity of drop-casted gradients of methylammonium lead bromide (MAPbBr3) to methylammonium lead iodide (MAPbI3) mixed-halide perovskite films, MAPb(Br1-xIx)3. Over the course of this 24-hour campaign, 3,025 unique photoconductivity measurements are taken by the SDCNN-controlled robotic system, resulting in a characterization throughput of over 125 measurements per hour. At each SDCNN-predicted pose, the photocurrent, $I_{\mathrm{ph}}$, is measured as a function of voltage, $V$, by taking the difference between the illuminated and dark current-voltage curves. Then, photoconductance, $G_{\mathrm{ph}}$, can be characterized at each pose by computing the slope of each photocurrent-voltage curve:
$$ G_{\mathrm{ph}} = \frac{\mathrm{d} I_{\mathrm{ph}}}{\mathrm{d} V} = \frac{\mathrm{d} \left( I_{\mathrm{light}} - I_{\mathrm{dark}} \right)}{\mathrm{d} V} \tag{4} $$
where $I_{\mathrm{light}}$ is the current measured under illumination and $I_{\mathrm{dark}}$ is the current measured in the dark. Figure 6a shows all characterized values of $G_{\mathrm{ph}}$ as a scatter plot across the gradient of drop-casted perovskites. The distribution of $G_{\mathrm{ph}}$ along the $y$-axis for each composition illustrates the spatial variance of photoconductance across the area of each film, with median values shown as black diamonds. An increasing trend in $G_{\mathrm{ph}}$ is observed (dashed curve in Fig. 6a) as the composition shifts from MAPbBr3 to MAPbI3 under broad-band white light illumination. This is consistent with the decreasing bandgaps of the corresponding perovskite compositions from 2.3 to 1.6 eV [47], assuming similar thicknesses between films. These results confirm that the robotic system not only achieves a high measurement throughput but also reliably captures the expected trends in photoconductivity across composition gradients.
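As a worked example of extracting photoconductance from a pair of current-voltage sweeps, the sketch below subtracts the dark current from the illuminated current and fits a straight line through the resulting photocurrent. The use of an ordinary least-squares slope, the function name `photoconductance`, and the sample values are assumptions for illustration; the fitting procedure used for the reported data may differ.

```python
from typing import Sequence


def photoconductance(voltage: Sequence[float],
                     i_light: Sequence[float],
                     i_dark: Sequence[float]) -> float:
    """Least-squares slope of photocurrent (I_light - I_dark) versus voltage, in siemens."""
    n = len(voltage)
    if n < 2 or len(i_light) != n or len(i_dark) != n:
        raise ValueError("need at least two points and equal-length sweeps")
    i_ph = [light - dark for light, dark in zip(i_light, i_dark)]
    v_mean = sum(voltage) / n
    i_mean = sum(i_ph) / n
    s_vv = sum((v - v_mean) ** 2 for v in voltage)
    if s_vv == 0.0:
        raise ValueError("voltage sweep must contain distinct values")
    s_vi = sum((v - v_mean) * (i - i_mean) for v, i in zip(voltage, i_ph))
    return s_vi / s_vv


# Hypothetical sweep: 1 nS dark conductance and 5 nS conductance under illumination.
VOLTS = [0.0, 1.0, 2.0, 3.0]
I_DARK = [1e-9 * v for v in VOLTS]
I_LIGHT = [5e-9 * v for v in VOLTS]

g_ph = photoconductance(VOLTS, I_LIGHT, I_DARK)
print(g_ph)
```

Fitting the slope over the whole sweep, rather than differencing two points, reduces sensitivity to noise in any single current reading.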
Figure 6b displays the full experimentally measured photocurrent-voltage curves, measured at each predicted pose. The spatially characterized curves are highlighted for three perovskite films within the MAPb(Br1-xIx)3 gradient: bromine-rich (Fig. 6c), mixed (Fig. 6d), and iodine-rich (Fig. 6e). Data collected across several spatially distinct contact points for a given film allow us to generate a spatial map of experimental values using a simple Gaussian interpolation (Fig. 6f, bottom). Detailed spatial mapping is critical for identifying defects or non-uniformities in material synthesis, which can significantly affect device performance. For example, based on the trend observed in Fig. 6a, we expect iodine-rich films to have higher photoconductance values across the film area, with some expected spatial variation. However, in certain instances of iodine-rich films (Fig. 6g), we observe regions of unexpectedly low photoconductance, likely induced by early degradation or pinhole defects in the film. Hence, by rapidly resolving spatial properties and accelerating the detection of critical non-uniformities, the developed autonomous and high-throughput robotic characterization system ensures that measurement techniques keep pace with the rapid synthesis of semiconductors in self-driving laboratories while upholding the human-level quality and reliability needed to identify performance-limiting defects [48, 49].
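One simple way to turn a handful of point measurements into a spatial map is a Gaussian-weighted average of nearby samples. The sketch below shows this form of interpolation; the kernel-average formulation, the `length_scale` parameter, and the sample coordinates and values are assumptions for illustration, since the exact interpolation settings are not specified here.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[float, float], float]  # ((x, y) position, measured value)


def gaussian_interpolate(samples: Sequence[Sample],
                         query: Tuple[float, float],
                         length_scale: float) -> float:
    """Gaussian-weighted average of sample values at a query position."""
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in samples:
        sq_dist = (query[0] - x) ** 2 + (query[1] - y) ** 2
        weight = math.exp(-sq_dist / (2.0 * length_scale ** 2))
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("query is too far from all samples for this length scale")
    return weighted_sum / weight_total


def spatial_map(samples: Sequence[Sample], xs: Sequence[float], ys: Sequence[float],
                length_scale: float) -> List[List[float]]:
    """Evaluate the interpolant on a grid; rows follow ys, columns follow xs."""
    return [[gaussian_interpolate(samples, (x, y), length_scale) for x in xs] for y in ys]


# Hypothetical measurements at two contact points on one film (arbitrary units).
SAMPLES: List[Sample] = [((0.0, 0.0), 1.0), ((2.0, 0.0), 3.0)]
grid = spatial_map(SAMPLES, xs=[0.0, 1.0, 2.0], ys=[0.0, 1.0], length_scale=1.0)
print(grid)
```

A smaller `length_scale` keeps the map close to the nearest measurement, while a larger one smooths across the film; in practice it would be chosen relative to the probe footprint and the spacing between contact poses.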
Discussion
In this paper, we develop an approach to enable autonomous and domain-informed characterization of semiconductor materials using contact-based robotics to improve measurement reliability, quality, and throughput. First, we develop a self-supervised spatially differentiable convolutional neural network (SDCNN) for reliable and accurate pose prediction of contact-based autonomous robotics that optimizes predictions based on domain information to more closely emulate a human expert’s measurement intuition. We demonstrate that this self-supervised SDCNN improves the generation of valid robot contact poses by up to 20.0% with improved positioning and spatial coverage compared to existing supervised methods [31, 32, 33]. Second, we develop a noisy Dijkstra’s algorithm graph-based path planner that combines distance-minimizing graph-building approaches with stochastic genetic approaches to generate efficient measurement routes that reliably minimize the total robot travel distance. We demonstrate that this noisy Dijkstra planner generates efficient measurement routes with few to no overlaps or loops while decreasing planning variance across experiments by over 6x compared to existing graph-based methods [50, 39, 40, 41]. Through the coupling of these developed methods, we perform autonomous characterization of over 3,000 photoconductivity curves from drop-casted mixed-halide MAPb(Br1-xIx)3 perovskite semiconductor films in 24 hours using a 4DOF robotic probe, achieving high throughputs of over 125 measurements per hour without human intervention. By using the domain-informed SDCNN to predict a minimal set of robot contact poses that adequately cover each uniquely shaped film, we quickly obtain a high-resolution spatial map of experimentally measured material properties.
Rapidly resolving spatial properties offers early insight into the formation of defects and degradation, providing a tool to quickly and autonomously detect performance issues in semiconductor fabrication lines and self-driving laboratories [48, 49]. Altogether, this approach advances the methods of high-precision and efficient robotic spatial control in material characterization, taking a step toward fully autonomous integration while ensuring high-throughput and reliable measurements without human supervision.
Although the proposed robotic system is designed for autonomous operation from input image to experimental measurement, its calibration and swapping of end effectors are currently limited to manual operation (Fig. S-3). This manual calibration process is often tedious and relies heavily on the skill and precision of the user, leading to varying results and potential inconsistencies in experimental outcomes [16]. To minimize this source of variance and enhance the reliability of the robotic system, automated calibration techniques could be employed in future developments. For instance, automated bed leveling and height mapping are widely used in current 3D printing systems to ensure reliable performance between prints or tool changes [51], utilizing sensors and control algorithms to detect and compensate for misalignments. Integrating similar methods, such as vision-based calibration systems [52], or machine learning for calibration tasks [53, 54], could further enhance precision and adaptability. These advancements would reduce dependency on user expertise, improve overall efficiency, and broaden the system’s applicability to quantify a wider range of significant material properties.
Moreover, the number of predicted robot poses by the SDCNN is currently limited to the output vector length specified during the training procedure. This means that changing the number of output poses requires retraining the entire model, which can be time-consuming. To address this limitation, future developments may involve incorporating conditional rules through mixtures of experts (MoE) [55] or transformer-based architectures [56, 57], which facilitate the generation of a conditional number of poses without requiring complete retraining. Incorporating dynamic neural networks [58] and conditional variational autoencoders [59] could further enhance the model’s ability to adaptively generate poses tailored to specific experimental requirements. Additionally, integrating techniques from active learning [60] could enable the model to selectively update itself with new data, reducing the need for extensive retraining. These future works aim to extend the accessibility of the developed system to users without domain expertise in robotics while broadening the versatility of the applied deep learning models.
With the developed self-supervised SDCNN model and graph-based noisy Dijkstra path planner integrated into a 4DOF robotic system, we have demonstrated the implementation of reliable autonomy with minimal data overhead into the contact-based characterization of semiconductor films, measuring at high throughputs of over 125 photoconductivity curves per hour. Our proposed method improves the automated quantification of critical surface and electrical material properties, addressing prior challenges in achieving reliable and fast automation. This advancement in coupling deep learning and robotics for materials science takes a key step toward improving and accelerating the tools and methods for autonomous spatial characterization of critical semiconductor properties, in turn, galvanizing the pipeline of self-driving materials research.
Methods
Perovskite material preparation
To prepare the MAPb(Br1-xIx)3 gradient of perovskite semiconductors, we use OpenTrons mixing and drop-casting of 0.6 M MAPbI3 and 0.6 M MAPbBr3 precursor solutions. The MAPbI3 precursor is prepared using a 4:1 ratio of dimethylformamide (DMF, 99.8%, Sigma-Aldrich) to dimethylsulfoxide (DMSO, 99.9%, Sigma-Aldrich) solvent and then dissolving a 1:1 ratio of methylammonium iodide (MAI, 99.9%, Greatcell Solar Materials) to lead iodide (PbI2, 99.999% trace metal basis, Sigma-Aldrich) solutes into the solvent mixture using a vortex mixer. The MAPbBr3 precursor is prepared using a 4:1 ratio of DMF:DMSO solvent and then dissolving a 1:1 ratio of MAI to lead bromide (PbBr2, 99.999% trace metal basis, Sigma-Aldrich) solutes into the solvent mixture using a vortex mixer. Once the precursors are prepared, the OpenTrons serially pipettes graded concentrations of each precursor into 35 smaller-volume vials. Then, each of the 35 unique compositions is mixed by aspirating and dispensing the fluid in its vial three times. Once mixed, 4 µL of each solution is pipetted onto a glass slide that is pre-heated to 55 °C to form an array of individual films on the glass slide. Before heating, the glass slide is washed with isopropyl alcohol (IPA, 99.5%, VWR). After drop-casting, the glass slide is transferred to a pre-heated hot plate and annealed at 150 °C for 20 minutes.
Robot and end effector design
The 4DOF robot controlled by the SDCNN and used for the measurement of semiconductor properties is custom-built. The frame of the robot positioning system is built using 80/20 T-slotted aluminum extrusion rails. Three standard Nema 17 stepper motors and timing belts control the $(x, y)$-positioning of the robot. A Bigtreetech Direct Octopus V1.1 control board with TMC2209 stepper motor drivers, each tuned for 0.75 A of output current, is used to drive all Nema motors. The poses predicted by the SDCNN are converted to Marlin G-code, which is sent to the control board via Python serial communication to execute motion commands. The motor that controls the $z$-positioning is affixed to a 3D-printed chassis and uses a ball screw to drive the $z$-positioning of the end effector mount. The last motor, which controls the $\theta$-positioning (yaw), is a Nema 17 pancake motor affixed to a 3D-printed end effector mount. To this mount, an Ossila four-point probe head is affixed. We design an anti-cantilever attachment for the neck of the probe using rigid stereolithography 3D printing to mitigate positioning drift. Attached to the head of the probe is a 3D-printed mount for three high-powered LEDs, angled to illuminate the film in contact during photoconductivity measurement. The LED mount is designed to maximize uniform spread of light while minimizing shading, in turn improving the consistency of the measured photoconductivity.
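The conversion from a predicted pose to motion commands can be sketched as simple string formatting. The example below uses the standard `G90` (absolute positioning) and `G1` (linear move) commands; the mapping of the yaw axis to the `E` axis letter, the travel and contact heights, and the feed rate are hypothetical choices for illustration and do not describe the firmware configuration of the robot in this work. Sending the lines over a serial port is omitted.

```python
from typing import List


def pose_to_gcode(x_mm: float, y_mm: float, yaw_deg: float,
                  z_contact_mm: float, z_travel_mm: float,
                  feed_mm_min: int = 3000) -> List[str]:
    """Return G-code lines that lift, move and rotate, then lower the probe.

    Assumption: the yaw motor is driven through the E axis letter.
    """
    return [
        "G90",                                                        # absolute positioning
        f"G1 Z{z_travel_mm:.3f} F{feed_mm_min}",                      # lift to a safe travel height
        f"G1 X{x_mm:.3f} Y{y_mm:.3f} E{yaw_deg:.3f} F{feed_mm_min}",  # translate and rotate
        f"G1 Z{z_contact_mm:.3f} F{feed_mm_min}",                     # lower to the contact height
    ]


gcode_lines = pose_to_gcode(x_mm=10.0, y_mm=5.0, yaw_deg=45.0,
                            z_contact_mm=0.5, z_travel_mm=5.0)
print("\n".join(gcode_lines))
```

Lifting before each lateral move keeps the probe tips clear of neighbouring films, which is the reason for emitting the travel-height command first.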
Spatially differentiable loss function construction
The spatially differentiable loss function accepts shapes as priors to refine predictions. However, to maintain differentiability within the spatial image domain, each shape’s edge must be smoothed. The shapes passed through the loss function are the computer vision segmentations of the semiconductor films and the set of poses predicted by the SDCNN. To maintain differentiability, the pixels of these shapes pass through a 2D Gaussian filter:
(5)
where is the standard deviation of the Gaussian and are the non-differentiable -pixels of a segment. A composition is created by superimposing the predicted poses directly onto a differentiable segment:
(6)
where is the sigmoid function, used as a soft threshold for placement of the poses onto the segment, are the differentiable -pixels of a segment, and are the non-differentiable -pixels of a predicted pose. With this composition, differentiability within pixel space is ensured, and now all differentiable computations performed on the image can be backpropagated to the network weights during training. The loss function that we aim to minimize to train the network is expressed in Eq. 2.
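The smoothing and composition steps can be illustrated numerically. The sketch below is not the expression in Eqs. 5 and 6. It assumes a separable Gaussian blur with zero padding and a hypothetical soft-AND composition, in which the sigmoid is high only where a pose pixel lands on the smoothed segment. The names `gaussian_smooth`, `compose`, and `sharpness`, and the 1.5 threshold, are illustrative choices.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalized 1D Gaussian kernel truncated at three standard deviations."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    k = np.exp(-(t ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_smooth(mask, sigma=2.0):
    """Separable 2D Gaussian blur of a binary mask (zero padding at borders).

    The mask must be at least as large as the kernel in both dimensions.
    """
    k = gaussian_kernel_1d(sigma)
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, k, mode="same"), 1, mask.astype(float))
    return np.apply_along_axis(
        lambda c: np.convolve(c, k, mode="same"), 0, rows)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def compose(segment_mask, pose_mask, sigma=2.0, sharpness=10.0):
    """Hypothetical soft-AND of a smoothed segment and hard pose pixels."""
    soft_segment = gaussian_smooth(segment_mask, sigma)
    return sigmoid(sharpness * (soft_segment + pose_mask - 1.5))
```

In a training setting the same operations would be written in an automatic differentiation framework so that gradients flow back to the network weights. NumPy is used here only to show the arithmetic.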
Neural network architecture and training
We evaluate the performance of 8 different CNN models on the pose prediction task, each with the same 8-layer network architecture but with a different loss function: SDCNN (ours), robust spatial loss functions from literature (Wing [31], Reverse Huber [32], Barron [33]), and conventional loss functions (MSE, MAE, Poisson, Huber). Our 8-layer network consists of the following architecture: (1) convolution with batch normalization, (2) spatial attention mechanism, (3) convolution with batch normalization, (4) max pooling, (5) 50% dropout, (6) fully connected (FC), (7) FC, and (8) FC, with the final layer outputting the (1) x-coordinate midpoint, (2) y-coordinate midpoint, and (3) rotation angle of each predicted pose (Fig. 3b, top). The spatial attention module, derived from the Convolutional Block Attention Module (CBAM) [61], is placed at the beginning of the network to emphasize or suppress large-scale geometric features of the input images to help place contact poses within the film boundaries. Our self-supervised SDCNN model is trained on 8,500 augmented images of experimentally synthesized perovskite drop-casted films with an 80/20 training-validation split. The spatially differentiable loss function maps the predicted poses directly to the optimization objective in Eq. 2, which training minimizes. For the remaining 7 CNNs, a set of labels is generated for the 8,500 training images using a stochastic process. Each label is generated by inputting randomly generated poses into Eq. 2 for every image. The pose with the lowest loss becomes the image label for training:
(7)
Although this trial-and-error process is slow, it generates effective data labels for benchmark purposes using the same loss function construction as our SDCNN but without spatial differentiability, thus enabling meaningful comparisons between model results. Figure 4c (left) shows an example of the performance of this stochastic process for generating labeled image data.
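The stochastic labeling procedure amounts to a random search followed by an argmin over the sampled poses. The sketch below shows that structure for a single pose per image. The loss `toy_loss` is a hypothetical stand-in for Eq. 2, and the sampling ranges and number of trials are assumptions made for illustration.

```python
import numpy as np

def generate_label(loss_fn, image, n_trials=1000, rng=None):
    """Random-search labeling: sample poses and keep the lowest-loss one."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    # Each candidate is (x midpoint, y midpoint, rotation angle in radians).
    candidates = np.column_stack([
        rng.uniform(0, w, n_trials),
        rng.uniform(0, h, n_trials),
        rng.uniform(0, np.pi, n_trials),
    ])
    losses = np.array([loss_fn(image, pose) for pose in candidates])
    best = int(np.argmin(losses))
    return candidates[best], float(losses[best])

def toy_loss(image, pose):
    """Stand-in for Eq. 2 (hypothetical): squared distance to the mask centroid."""
    ys, xs = np.nonzero(image)
    return (pose[0] - xs.mean()) ** 2 + (pose[1] - ys.mean()) ** 2
```

Because every candidate requires a full loss evaluation, the cost grows linearly with the number of trials, which is consistent with the slow trial-and-error character noted above.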
Path planning experiments
Path plans are generated by five methods and tested across 115 unique graphs. Each graph consists of new contact poses predicted by the SDCNN using computer vision-segmented images of arrays of 35 drop-casted mixed-halide MAPb(Br1-xIx)3 perovskite semiconductor films from different experimental cycles. The five tested algorithms include a genetic algorithm (GA), A*, Christofides algorithm, Dijkstra’s algorithm, and our proposed noisy Dijkstra’s algorithm. The GA is implemented using the built-in functions available from the scikit-opt optimization package in Python [50], A* is implemented based on [39], Christofides algorithm is implemented based on [40], and Dijkstra’s algorithm is implemented based on [41]. However, each approach has been slightly modified for solving an OTSP rather than a TSP by relaxing the constraint of the generated route returning to the start node. The GA hyperparameters have been hand-tuned to generations, a population size of , and a mutation rate of % to produce optimal results on the planning task of minimizing total travel distance. Similarly, the hyperparameters of the proposed noisy Dijkstra’s algorithm have been hand-tuned to generations with a noise level of .
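The OTSP relaxation (no return to the start node) can be made concrete with a small sketch. The planner shown is a plain nearest-neighbor heuristic, which is not one of the five tested methods and is used only to illustrate how an open path differs from a closed tour. The function names are illustrative.

```python
import numpy as np

def path_length(points, order, closed=False):
    """Total Euclidean length of a route; closed=True adds the return leg (TSP)."""
    route = points[list(order)]
    length = np.linalg.norm(np.diff(route, axis=0), axis=1).sum()
    if closed:
        length += np.linalg.norm(route[-1] - route[0])
    return float(length)

def greedy_open_path(points, start=0):
    """Nearest-neighbor heuristic for an open path (no return to the start)."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda j: np.linalg.norm(points[j] - last))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

points = np.array([[0.0, 0.0], [3.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
order = greedy_open_path(points)
open_len = path_length(points, order)                  # OTSP objective
closed_len = path_length(points, order, closed=True)   # TSP objective
```

For these four collinear points the open path has length 3 while the closed tour has length 6, which shows why the two objectives can favor different routes.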
Photoconductivity characterization
To characterize the photoconductive properties of each perovskite film, a Python-controlled Keithley 2425 source meter measures the resultant current from our SDCNN-driven robotic system with an illuminating four-point probe end effector. The current is measured at 40 unique voltage steps across a -40 V to 40 V voltage sweep for each contact pose to capture detailed current-response curves. This sweep is repeated twice for each contact, once in the dark and once under illumination, to measure the photocurrent (Eq. 4). Attached to the gold-tipped four-point probe end effector is a 3D-printed LED mount (Fig. S-4), whose LEDs are controlled using Python commands sent to an Arduino microcontroller with solid-state relays. Illumination is provided by probe-mounted high-power white LEDs positioned 4 mm above the film surface. This setup ensures consistent and uniform lighting at an intensity of approximately 200 mW cm⁻². The 3D-printed LED mount for the probe orients the LEDs to maximize the distribution of uniform light across the measurement area while avoiding shading effects. Additionally, the LEDs have a large viewing angle of 120° to optimize the overlap of light beams, further improving the light distribution uniformity during each measurement.
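The dark and illuminated sweeps can be post-processed along the following lines. This sketch assumes the photocurrent is the illuminated current minus the dark current at each matched voltage step, and that an ohmic response lets a linear fit stand in for the conductance. Eq. 4 in the main text remains the authoritative definition, and the function names are illustrative.

```python
import numpy as np

def photocurrent(i_light, i_dark):
    """Illuminated minus dark current at matched voltage steps (assumed form)."""
    return np.asarray(i_light, dtype=float) - np.asarray(i_dark, dtype=float)

def conductance(voltage, current):
    """Slope of a least-squares line through the I-V sweep, in siemens."""
    slope, _intercept = np.polyfit(voltage, current, 1)
    return float(slope)

# 40 voltage steps across the -40 V to 40 V sweep described in the text.
voltage = np.linspace(-40.0, 40.0, 40)

# Synthetic ohmic responses, used only to exercise the functions.
i_dark = 1e-9 * voltage
i_light = 5e-9 * voltage
i_ph = photocurrent(i_light, i_dark)
g_ph = conductance(voltage, i_ph)
```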
Data Availability
All result files and model weights have been deposited in the OSF database under accession code: https://osf.io/sdy7k.
Code Availability
All code used to develop the SDCNN models is available publicly with complete working examples on GitHub: https://github.com/PV-Lab/SDCNN.
References
- [1] Piegari, A. & Masetti, E. Thin film thickness measurement: a comparison of various techniques. Thin solid films 124, 249–257 (1985).
- [2] Brown, C. A. & Savary, G. Describing ground surface texture using contact profilometry and fractal analysis. Wear 141, 211–226 (1991).
- [3] Zhu, W., Hughes, J. J., Bicanic, N. & Pearce, C. J. Nanoindentation mapping of mechanical properties of cement paste and natural rocks. Materials characterization 58, 1189–1198 (2007).
- [4] Minor, A. M. et al. A new view of the onset of plasticity during the nanoindentation of aluminium. Nature materials 5, 697–702 (2006).
- [5] Custance, O., Perez, R. & Morita, S. Atomic force microscopy as a tool for atom manipulation. Nature nanotechnology 4, 803–810 (2009).
- [6] Su, J. et al. Intelligent synthesis of magnetic nanographenes via chemist-intuited atomic robotic probe. Nature Synthesis 3, 466–476 (2024).
- [7] Ebbesen, T. et al. Electrical conductivity of individual carbon nanotubes. Nature 382, 54–56 (1996).
- [8] Wang, Y. et al. Probing photoelectrical transport in lead halide perovskites with van der Waals contacts. Nature Nanotechnology 15, 768–775 (2020).
- [9] Bash, D. et al. Accelerated automated screening of viscous graphene suspensions with various surfactants for optimal electrical conductivity. Digital Discovery 1, 139–146 (2022).
- [10] Chandra, H. et al. Open-source automated mapping four-point probe. Materials 10, 110 (2017).
- [11] Shimanovich, K. et al. Four-point probe electrical resistivity scanning system for large area conductivity and activation energy mapping. Review of scientific instruments 85 (2014).
- [12] Sun, L., Wang, J. & Bonaccurso, E. Conductivity of individual particles measured by a microscopic four-point-probe method. Scientific reports 3, 1991 (2013).
- [13] Soori, M., Arezoo, B. & Dastres, R. Artificial intelligence, machine learning and deep learning in advanced robotics, a review. Cognitive Robotics 3, 54–70 (2023).
- [14] Chen, A. I., Balter, M. L., Maguire, T. J. & Yarmush, M. L. Deep learning robotic guidance for autonomous vascular access. Nature Machine Intelligence 2, 104–115 (2020).
- [15] Hippalgaonkar, K. et al. Knowledge-integrated machine learning for materials: lessons from gameplaying and robotics. Nature Reviews Materials 8, 241–260 (2023).
- [16] Nejat, G. & Benhabib, B. High-precision task-space sensing and guidance for autonomous robot localization. 2003 IEEE International Conference on Robotics and Automation 1, 1527–1532 (2003).
- [17] Ünal, İ. & Topakci, M. International Journal of Advanced Robotic Systems 12, 194 (2015).
- [18] Li, R. & Qiao, H. A survey of methods and strategies for high-precision robotic grasping and assembly tasks—some new trends. IEEE/ASME Transactions on Mechatronics 24, 2718–2732 (2019).
- [19] Leveziel, M., Haouas, W., Laurent, G. J., Gauthier, M. & Dahmouche, R. Migribot: A miniature parallel robot with integrated gripping for high-throughput micromanipulation. Science Robotics 7, eabn4292 (2022).
- [20] Kuwata, Y. et al. Real-time motion planning with applications to autonomous urban driving. IEEE Transactions on control systems technology 17, 1105–1118 (2009).
- [21] Sünderhauf, N. et al. The limits and potentials of deep learning for robotics. The International journal of robotics research 37, 405–420 (2018).
- [22] Thompson, N. C., Greenewald, K., Lee, K. & Manso, G. F. The computational limits of deep learning. arXiv preprint arXiv:2007.05558 10 (2020).
- [23] Rapp, J. T., Bremer, B. J. & Romero, P. A. Self-driving laboratories to autonomously navigate the protein fitness landscape. Nature chemical engineering 1, 97–107 (2024).
- [24] Wang, P. et al. Data-driven process characterization and adaptive control in robotic arc welding. CIRP Annals 71, 45–48 (2022).
- [25] Siemenn, A. E. et al. Using scalable computer vision to automate high-throughput semiconductor characterization. Nature Communications 15, 4654 (2024).
- [26] Omidvar, M. et al. Accelerated discovery of perovskite solid solutions through automated materials synthesis and characterization. Nature Communications 15, 6554 (2024).
- [27] Zhao, H. et al. A robotic platform for the synthesis of colloidal nanocrystals. Nature Synthesis 2, 505–514 (2023).
- [28] Azizi, S. et al. Autonomous hyperspectral characterisation station: Robot aided measuring of polymer degradation. IEEE Transactions on Automation Science and Engineering (2024).
- [29] Mahjour, B. et al. Rapid planning and analysis of high-throughput experiment arrays for reaction discovery. Nature Communications 14, 3924 (2023).
- [30] Siradjuddin, I. A., Muntasa, A. et al. Faster region-based convolutional neural network for mask face detection. 2021 5th international conference on informatics and computational sciences (ICICoS) 282–286 (2021).
- [31] Feng, Z.-H., Kittler, J., Awais, M., Huber, P. & Wu, X.-J. Wing loss for robust facial landmark localisation with convolutional neural networks. Proceedings of the IEEE conference on computer vision and pattern recognition 2235–2245 (2018).
- [32] Zwald, L. & Lambert-Lacroix, S. The berhu penalty and the grouped effect (2012).
- [33] Barron, J. T. A general and adaptive robust loss function. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition 4331–4339 (2019).
- [34] McGee, J. Screening robotics and automation. Journal of Biomolecular Screening 19, 1131–1132 (2014).
- [35] Zhao, X. et al. Fast segment anything (2023).
- [36] Jünger, M., Reinelt, G. & Rinaldi, G. The traveling salesman problem. Handbooks in operations research and management science 7, 225–330 (1995).
- [37] Chieng, H. H. & Wahid, N. A performance comparison of genetic algorithm’s mutation operators in n-cities open loop travelling salesman problem. Recent Advances on Soft Computing and Data Mining: Proceedings of The First International Conference on Soft Computing and Data Mining (SCDM-2014) Universiti Tun Hussein Onn Malaysia, Johor, Malaysia June 16th-18th, 2014 89–97 (2014).
- [38] Potvin, J.-Y. Genetic algorithms for the traveling salesman problem. Annals of Operations Research 63, 337–370 (1996).
- [39] Hart, P., Nilsson, N. & Raphael, B. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics 4, 100–107 (1968).
- [40] Christofides, N. Worst-case analysis of a new heuristic for the traveling salesman problem. Proc. Symposium on New Directions and Recent Results in Algorithms and Complexity (1976).
- [41] Dijkstra, E. W. A note on two problems in connexion with graphs. Numerische mathematik 1, 269–271 (1959).
- [42] MacLeod, B. P. et al. Self-driving laboratory for accelerated discovery of thin-film materials. Science Advances 6, eaaz8867 (2020).
- [43] Langner, S. et al. Beyond ternary opv: high-throughput experimentation and self-driving laboratories optimize multicomponent systems. Advanced Materials 32, 1907801 (2020).
- [44] Bag, M. et al. Rapid combinatorial screening of inkjet-printed alkyl-ammonium cations in perovskite solar cells. Materials Letters 164, 472–475 (2016).
- [45] Luo, S., Li, T., Wang, X., Faizan, M. & Zhang, L. High-throughput computational materials screening and discovery of optoelectronic semiconductors. Wiley Interdisciplinary Reviews: Computational Molecular Science 11, e1489 (2021).
- [46] Son, K. H., Singh, S. P. & Sohn, K.-S. Discovery of novel phosphors for use in light emitting diodes using heuristics optimization-assisted combinatorial chemistry. Journal of Materials Chemistry 22, 8505–8511 (2012).
- [47] Jang, D. M. et al. Reversible halide exchange reaction of organometal trihalide perovskite colloidal nanocrystals for full-range band gap tuning. Nano letters 15, 5191–5199 (2015).
- [48] Wieghold, S., Morishige, A. E., Meyer, L., Buonassisi, T. & Sachs, E. M. Crack detection in crystalline silicon solar cells using dark-field imaging. Energy Procedia 124, 526–531 (2017).
- [49] Kunze, P., Rein, S., Hemsendorf, M., Ramspeck, K. & Demant, M. Learning an empirical digital twin from measurement images for a comprehensive quality inspection of solar cells. Solar RRL 6, 2100483 (2022).
- [50] Guofei9987. scikit-opt: Python library for optimization algorithms. https://github.com/guofei9987/scikit-opt/ (2023). Accessed: 2024-04-27.
- [51] Hofbauer, C., Aburaia, A., Stuja, K. & Aburaia, M. Automatic print bed leveling for industrial robot systems. Annals of DAAAM & Proceedings 34 (2023).
- [52] Enebuse, I. et al. A comparative review of hand-eye calibration techniques for vision guided robots. IEEE Access 9, 113143–113155 (2021).
- [53] Li, Z., Li, S. & Luo, X. Data-driven industrial robot arm calibration: a machine learning perspective. 2021 IEEE International Conference on Networking, Sensing and Control (ICNSC) 1, 1–6 (2021).
- [54] Kong, L.-B. & Yu, Y. Precision measurement and compensation of kinematic errors for industrial robots using artifact and machine learning. Advances in Manufacturing 10, 397–410 (2022).
- [55] Zhu, J. et al. Uni-perceiver-moe: Learning sparse generalist models with conditional moes (2022).
- [56] Vaswani, A. et al. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).
- [57] Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
- [58] Han, Y. et al. Dynamic neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 7436–7456 (2021).
- [59] Kingma, D. P., Mohamed, S., Jimenez Rezende, D. & Welling, M. Semi-supervised learning with deep generative models. Advances in neural information processing systems 27 (2014).
- [60] Baranes, A. & Oudeyer, P.-Y. Active learning of inverse models with intrinsically motivated goal exploration in robots. Robotics and Autonomous Systems 61, 49–73 (2013).
- [61] Woo, S., Park, J., Lee, J.-Y. & Kweon, I. S. CBAM: Convolutional block attention module. Proceedings of the European conference on computer vision (ECCV) 3–19 (2018).
Acknowledgements
The authors thank Julia Hsu for her contribution to the methodology of this research and Tianran Liu for providing perovskite materials to help calibrate measurements. The authors acknowledge funding support from: First Solar; Eni S.p.A. through the MIT Energy Initiative; University of Toronto’s Acceleration Consortium; and U.S. Department of Energy’s Office of Energy Efficiency and Renewable Energy (EERE) under the Solar Energy Technology Office (SETO) Award Number DE-EE0010503. This work made use of the MRSEC Shared Experimental Facilities at MIT, supported by the National Science Foundation under award number DMR-1419807.
Author Contributions
A.E.S. conceptualized the work. A.E.S., B.D., K.J., and T.B. designed the methodology. A.E.S. wrote the software. A.E.S., B.D., and F.S. prepared the experimental materials. A.E.S. and B.D. conducted experiments. A.E.S. performed the analysis. A.E.S. wrote the manuscript. All authors reviewed and edited the manuscript. B.D., K.J., and T.B. provided guidance.
Competing Interests
The authors declare no competing interests.
Supplementary Information:
A Self-Supervised Robotic System for Autonomous Contact-Based Spatial Mapping of Semiconductor Properties
Alexander E. Siemenn1∗, Basita Das1, Kangyu Ji1,2, Fang Sheng1, Tonio Buonassisi1
1Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
2Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
∗Corresponding author: asiemenn@mit.edu
Design
Robot control workflow

In this paper, we autonomously control a 4-degree-of-freedom (4DOF) robot with an illuminating four-point probe end effector to measure the photoconductivity of semiconductors. As shown in Fig. S-1, shape priors are used to output optimal poses for the robot to move to and measure each material. A geometric transfer function is used to control the motion of the end effector about the z-axis (yaw).
Robot calibration

Figure S-2 illustrates the mathematical procedure of achieving a target pose from an input set of coordinates. A geometric transformation is applied to ensure the contact point down the arm of the end effector meets the coordinates at the correct position in space:
(S-2)
(S-3)
(S-4)
where , , and are the target coordinates, given the initial coordinates , , and and is the measured length of the end effector arm from the pivot point to the desired contact point.
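The idea behind this transformation can be sketched in code. Because the exact form of Eqs. S-2 to S-4 is not reproduced here, the sketch assumes a planar convention in which the arm extends from the pivot along the yaw direction, with yaw measured from the positive x-axis. The sign and axis conventions, as well as the function name, are assumptions.

```python
import math

def pivot_from_contact(x_contact, y_contact, yaw_rad, arm_length):
    """Pivot position that places the arm tip on the desired contact point.

    Assumed convention: tip = pivot + arm_length * (cos(yaw), sin(yaw)).
    """
    x_pivot = x_contact - arm_length * math.cos(yaw_rad)
    y_pivot = y_contact - arm_length * math.sin(yaw_rad)
    return x_pivot, y_pivot, yaw_rad

# Example: contact at (10, 5), arm pointing along +y, arm length 2.
x_p, y_p, yaw = pivot_from_contact(10.0, 5.0, math.pi / 2, 2.0)
```

Under this convention, commanding the pivot to the returned position and rotating to the given yaw brings the contact point of the probe onto the target coordinates.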


To fully calibrate the robot positioning system, image-robot coordinate pairs are collected. These coordinate pairs establish the robot’s location in space, given a set of image coordinates. Coordinate pairs are collected along the outer perimeter of the imaging field so that the corners of the imaging field align with the corners of the robot reference frame (Fig. S-3a). Due to lens effects and imperfect image rectification, a residual image-robot coordinate mismatch remains after this initial calibration. We therefore measure 15 different image-robot coordinate pairs and generate a calibration mesh (Fig. S-3b) to apply to the coordinates to ensure proper alignment of the robot within image space.
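A simplified version of mapping image coordinates to robot coordinates from measured pairs can be sketched as a least-squares affine fit. This is a swapped-in technique for illustration: an affine map cannot capture the lens distortion that the calibration mesh corrects, and the function names are hypothetical.

```python
import numpy as np

def fit_affine(image_xy, robot_xy):
    """Least-squares 2D affine map from image to robot coordinates."""
    image_xy = np.asarray(image_xy, dtype=float)
    robot_xy = np.asarray(robot_xy, dtype=float)
    # Design matrix with columns [x, y, 1] so the fit includes a translation.
    design = np.hstack([image_xy, np.ones((len(image_xy), 1))])
    coeffs, *_ = np.linalg.lstsq(design, robot_xy, rcond=None)  # shape (3, 2)
    return coeffs

def apply_affine(coeffs, image_xy):
    """Map one or more image points into the robot frame."""
    image_xy = np.atleast_2d(np.asarray(image_xy, dtype=float))
    design = np.hstack([image_xy, np.ones((len(image_xy), 1))])
    return design @ coeffs
```

The residuals of such a fit at the measured pairs indicate how much nonlinear correction a mesh would still need to supply.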
3D-printed LED mount


Fused deposition modeling (FDM) 3D printing is used to create the LED mount for the four-point probe (Fig. S-4). Three LEDs are positioned at the vertices of an isosceles triangle with the aim of maximizing illumination uniformity at the probe tips and minimizing shading. To minimize shading due to the probe tips, placing two LEDs, one in front of and one behind the tips, would suffice. However, it is mechanically infeasible to place an LED behind the tips and still make contact with a material. Thus, we achieve this goal by positioning two LEDs on either side of the probe arm. To ensure uniform light distribution, these back LEDs are mounted farther away from the center point such that their illumination is not stronger than that of the front LED, using the following relation:
(S-5)
where and are the distances from the center point to either back LED, is the distance between the back LEDs, and is the distance from the center point to the front LED.
Experiments
Varying image inputs


The shape of the input image segments to the SDCNN governs the placement of poses. With the presented version of the model, the weights are trained on rounded, convex shapes due to the nature of predicting poses for drop-casted semiconductor films. However, when we take this pre-trained model and apply it to input shapes with different formats, we get varying results. Figure S-5 illustrates how well this SDCNN performs for input shapes varying in edge sharpness and convexity. Model predictions are augmented with poses randomly selected from four predictions for each input to increase randomness and demonstrate robustness to the task. We see that, in general, this SDCNN performs best on convex and rounded input shapes (upper left), as these are most similar to its training set. Robustness to edge sharpness is demonstrated along the horizontal axis, but model performance breaks down as input shapes become more non-convex along the vertical axis. Here, predicted poses start to drift towards the edges, with many falling outside of the shape boundary. These results highlight the robustness of the model to certain features but also show the importance of training the SDCNN using shape priors that more closely align with the expected testing conditions.
Material characterization

We design the photoconductivity end effector of the 4DOF robot to have variable illumination control. Although we only need dark and illuminated conditions to characterize photoconductance, we calibrate the measurement at all quantized illumination levels. Figure S-6 illustrates the dependence of the current response on illumination intensity for a spin-coated FAPbI3 (formamidinium lead iodide) perovskite thin film. Six different LED illumination intensities are tested, as well as dark conditions. As the voltage supplied to the LEDs increases, the illumination intensity increases and, with it, the photocurrent response of the FAPbI3 film.
We characterize the crystal phase of the semiconductor films drop-casted by the OpenTrons overhead volumetric pipetter used in this study. Figure S-7 illustrates 14 X-ray diffraction (XRD) traces measured from 14 of the total 35 drop-casted methylammonium lead bromide (MAPbBr3) to methylammonium lead iodide (MAPbI3) mixed-halide perovskite semiconductor films used in this study for characterizing photoconductance. We see a clear trend of the XRD peaks shifting along the MAPb(Br1-xIx)3 gradient, validating that the perovskites do form a gradient.
