-
Natural language processing on customer note data
Authors:
Andrew Hilditch,
David Webb,
Jozef Baca,
Tom Armitage,
Matthew Shardlow,
Peter Appleby
Abstract:
Automatic analysis of customer data for businesses is an area that is of interest to companies. Business to business data is studied rarely in academia due to the sensitive nature of such information. Applying natural language processing can speed up the analysis of prohibitively large sets of data. This paper addresses this subject and applies sentiment analysis, topic modelling and keyword extraction to a B2B data set. We show that accurate sentiment can be extracted from the notes automatically and the notes can be sorted by relevance into different topics. We see that without clear separation topics can lack relevance to a business context.
Submitted 3 May, 2023;
originally announced May 2023.
-
The Benchmark M Dwarf Eclipsing Binary CM Draconis With TESS: Spots, Flares and Ultra-Precise Parameters
Authors:
David V. Martin,
Ritika Sethi,
Tayt Armitage,
Gregory J. Gilbert,
Romy Rodriguez Martinez,
Emily A. Gilbert
Abstract:
A gold standard for the study of M dwarfs is the eclipsing binary CM Draconis. It is rare because it is bright ($J_{\rm mag}=8.5$) and contains twin fully convective stars on an almost perfectly edge-on orbit. Both masses and radii were previously measured to better than $1\%$ precision, amongst the best known. We use 15 sectors of data from the Transiting Exoplanet Survey Satellite (TESS) to show that CM Draconis is the gift that keeps on giving. Our paper has three main components. First, we present updated parameters, with radii and masses constrained to previously unheard of precisions of $\approx 0.06\%$ and $\approx 0.12\%$, respectively. Second, we discover strong and variable spot modulation, suggestive of spot clustering and an activity cycle on the order of $\approx 4$ years. Third, we discover 163 flares. We find a relationship between the spot modulation and flare rate, with flares more likely to occur when the stars appear brighter. This may be due to a positive correlation between flares and the occurrence of bright spots (plages). The flare rate is surprisingly not reduced during eclipse, but one flare may show evidence of being occulted. We suggest the flares may be preferentially polar, which has positive implications for the habitability of planets orbiting M dwarfs.
Submitted 11 January, 2024; v1 submitted 25 January, 2023;
originally announced January 2023.
-
The EBLM project X. Benchmark masses, radii and temperatures for two fully convective M-dwarfs using K2
Authors:
Alison Duck,
David V. Martin,
Sam Gill,
Tayt Armitage,
Romy Rodríguez Martínez,
Pierre F. L. Maxted,
Daniel Sebastian,
Ritika Sethi,
Matthew I. Swayne,
Andrew Collier Cameron,
Georgina Dransfield,
B. Scott Gaudi,
Michael Gillon,
Coel Hellier,
Vedad Kunovac,
Christophe Lovis,
James McCormac,
Francesco A. Pepe,
Don Pollacco,
Lalitha Sairam,
Alexandre Santerne,
Damien Ségransan,
Matthew R. Standing,
John Southworth,
Amaury H. M. J. Triaud
, et al. (1 additional author not shown)
Abstract:
M-dwarfs are the most abundant stars in the galaxy and popular targets for exoplanet searches. However, their intrinsic faintness and complex spectra inhibit precise characterisation. We only know of dozens of M-dwarfs with fundamental parameters of mass, radius and effective temperature characterised to better than a few per cent. Eclipsing binaries remain the most robust means of stellar characterisation. Here we present two targets from the Eclipsing Binary Low Mass (EBLM) survey that were observed with K2: EBLM J0055-00 and EBLM J2217-04. Combined with HARPS and CORALIE spectroscopy, we measure M-dwarf masses with precisions better than 5%, radii better than 3% and effective temperatures on order 1%. However, our fits require invoking a model to derive parameters for the primary star. By investigating three popular models, we determine that the model uncertainty is of similar magnitude to the statistical uncertainty in the model fits. Therefore, whilst these can be considered benchmark M-dwarfs, we caution the community to consider model uncertainty when pushing the limits of precise stellar characterisation.
Submitted 11 January, 2024; v1 submitted 22 August, 2022;
originally announced August 2022.
-
Revised Temperatures For Two Benchmark M-dwarfs -- Outliers No More
Authors:
David V. Martin,
Tayt Armitage,
Alison Duck,
Matthew I. Swayne,
Romy Rodríguez Martínez,
Ritika Sethi,
B. Scott Gaudi,
Sam Gill,
Daniel Sebastian,
Pierre F. L. Maxted
Abstract:
Well-characterised M-dwarfs are rare, particularly with respect to effective temperature. In this letter we re-analyse two benchmark M-dwarfs in eclipsing binaries from Kepler/K2: KIC 1571511AB and HD 24465AB. Both have temperatures reported to be hotter or colder by approximately 1000 K in comparison with both models and the majority of the literature. By modelling the secondary eclipses with both the original data and new data from TESS we derive significantly different temperatures which are not outliers. Removing this discrepancy allows these M-dwarfs to be truly benchmarks. Our work also provides relief to stellar modellers. We encourage more measurements of M-dwarf effective temperatures with robust methods.
Submitted 22 August, 2022;
originally announced August 2022.
-
An application of machine learning techniques to galaxy cluster mass estimation using the MACSIS simulations
Authors:
Thomas J. Armitage,
Scott T. Kay,
David J. Barnes
Abstract:
Machine learning (ML) techniques, in particular supervised regression algorithms, are a promising new way to use multiple observables to predict a cluster's mass or other key features. To investigate this approach we use the \textsc{MACSIS} sample of simulated hydrodynamical galaxy clusters to train a variety of ML models, mimicking different datasets. We find that compared to predicting the cluster mass from the $σ-M$ relation, the scatter in the predicted-to-true mass ratio is reduced by a factor of 4, from $0.130\pm0.004$ dex (${\simeq} 35$ per cent) to $0.031 \pm 0.001$ dex (${\simeq} 7$ per cent) when using the same, interloper contaminated, spectroscopic galaxy sample. Interestingly, omitting line-of-sight galaxy velocities from the training set has no effect on the scatter when the galaxies are taken from within $r_{200c}$. We also train ML models to reproduce estimated masses derived from mock X-ray and weak lensing analyses. While the weak lensing masses can be recovered with a similar scatter to that when training on the true mass, the hydrostatic mass suffers from significantly higher scatter of ${\simeq} 0.13$ dex (${\simeq} 35$ per cent). Training models using dark matter only simulations does not significantly increase the scatter in predicted cluster mass compared to training on simulated clusters with hydrodynamics. In summary, we find ML techniques to offer a powerful method to predict masses for large samples of clusters, a vital requirement for cosmological analysis with future surveys.
Submitted 19 October, 2018;
originally announced October 2018.
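The abstract above quotes each scatter both in dex and as a percentage. A minimal sketch of that conversion follows, assuming the percentage is the fractional change corresponding to a factor of $10^{x}$ for a scatter of $x$ dex; the abstract does not state the convention, and the function name `dex_to_percent` is hypothetical. Under this assumption the quoted pairs are reproduced after rounding.

```python
def dex_to_percent(scatter_dex: float) -> float:
    """Convert a logarithmic scatter in dex to a fractional scatter in per cent.

    Assumption (not stated in the abstract): x dex corresponds to a
    multiplicative factor of 10**x, so the percentage is (10**x - 1) * 100.
    """
    return (10.0 ** scatter_dex - 1.0) * 100.0


# Scatter values quoted in the abstract, in dex.
for dex in (0.130, 0.031):
    print(f"{dex:.3f} dex ~ {dex_to_percent(dex):.0f} per cent")
```

The alternative small-scatter approximation, $\ln(10)\,x$, gives about 30 per cent for 0.130 dex, so it does not match the quoted 35 per cent; this is why the exponential form is used in the sketch.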
-
The Cluster-EAGLE project: a comparison of dynamical mass estimators using simulated clusters
Authors:
Thomas J. Armitage,
Scott T. Kay,
David J. Barnes,
Yannick M. Bahé,
Claudio Dalla Vecchia
Abstract:
Forthcoming large-scale spectroscopic surveys will soon provide data on thousands of galaxy clusters. It is important that the systematics of the various mass estimation techniques are well understood and calibrated. We compare three different dynamical mass estimators using the C-EAGLE galaxy clusters, a set of high resolution simulations with resolved galaxies and a median total mass, $M_{200c} = 10^{14.7} \, \mathrm{M_\odot}$. We quantify the bias and scatter of the Jeans, virial, and caustic mass estimators using all galaxies with a stellar mass $M_*> 10^9 \, \mathrm{M_\odot}$, both in the ideal 3D case and in the more realistic projected case. On average we find our mass estimates are unbiased, though relative to the true mass within $r_{200c}$ the scatter is large with a range of $0.09$ - $0.15$ dex. We see a slight increase in the scatter when projecting the clusters. Selecting galaxies using the same criteria, we find no significant difference in the mass bias or scatter when comparing results from hydrodynamical and dark matter only simulations. However, selecting galaxies by stellar mass reduces the bias compared to selecting by total mass. Comparing X-ray derived hydrostatic and dynamical masses, the former are ${\sim} 30$ per cent lower. We find a slight dependence between substructure, measured using two different metrics, and mass bias. In conclusion, we find that dynamical mass estimators, when averaged together, are unbiased with a scatter of $0.11 \pm 0.02$ dex when including interloper galaxies and with no prior knowledge of $r_{200c}$.
Submitted 25 October, 2018; v1 submitted 5 September, 2018;
originally announced September 2018.
-
The Cluster-EAGLE project: velocity bias and the velocity dispersion - mass relation of cluster galaxies
Authors:
Thomas Joshua Armitage,
David J. Barnes,
Scott T. Kay,
Yannick M. Bahé,
Claudio Dalla Vecchia,
Robert A. Crain,
Tom Theuns
Abstract:
We use the Cluster-EAGLE simulations to explore the velocity bias introduced when using galaxies, rather than dark matter particles, to estimate the velocity dispersion of a galaxy cluster, a property known to be tightly correlated with cluster mass. The simulations consist of 30 clusters spanning a mass range $14.0 \le \log_{10}(M_{\rm 200c}/\mathrm{M_\odot}) \le 15.4$, with their sophisticated sub-grid physics modelling and high numerical resolution (sub-kpc gravitational softening) making them ideal for this purpose. We find that selecting galaxies by their total mass results in a velocity dispersion that is 5-10 per cent higher than the dark matter particles. However, selecting galaxies by their stellar mass results in an almost unbiased ($<5$ per cent) estimator of the velocity dispersion. This result holds out to $z=1.5$ and is relatively insensitive to the choice of cluster aperture, varying by less than 5 per cent between $r_{\rm 500c}$ and $r_{\rm 200m}$. We show that the velocity bias is a function of the time spent by a galaxy inside the cluster environment. Selecting galaxies by their total mass results in a larger bias because a larger fraction of objects have only recently entered the cluster and these have a velocity bias above unity. Galaxies that entered more than $4 \, \mathrm{Gyr}$ ago become progressively colder with time, as expected from dynamical friction. We conclude that velocity bias should not be a major issue when estimating cluster masses from kinematic methods.
Submitted 21 November, 2017; v1 submitted 1 August, 2017;
originally announced August 2017.