Category: science

  • Imaging 20 Galaxies

    This report is on what I found while imaging the M96 Group. I ended up capturing 20 galaxies. The report details issues in both imaging and image processing, and is a fairly typical example of the experimentation involved in astrophotography. I’m going to share the final product with labels now, then show how the picture evolved.

    Background

    I’ve been working my way through the Messier objects, a list of 110 “comet-like” entities cataloged by the French astronomer Charles Messier in the 1700s. They’re a great list for an amateur astronomer to knock out: since they were visible with the optics of the 1700s, many are visible with modern binoculars. They were cataloged as “comet-like” objects because the notion of a galaxy was not formalized until Edwin Hubble’s work in 1926.

    “20 Galaxies” – The final processed image with labels.

    On February 8, I set out to image M95, M96, and M105, collectively known as the “M96 Group”. These three galaxies are really close together and of a decent size, making them easy to image at once. They are all about 37 million light-years away, which makes long exposures a necessity.

    Imaging

    As the first 5-minute exposures came back, I ran into some issues right away: the dust on my camera sensor, which I had been delaying cleaning, was creating little distortions all across the deep field of stars. For most of the objects I had targeted so far, I was able to crop around the dirt. However, these objects were perfectly positioned to make that impossible.

    Original framing, centered on M96. Notice all the circles caused by sensor dust.

    Another issue was guiding. To get clear, pinpoint stars in a long-exposure image, the movement of the stars must be counteracted. There are two mechanisms at work. First, an electronic “tracking” mount moves in lockstep with the stars; however, accurate tracking requires precise polar alignment to set the reference point, which is often difficult. Second, “guiding” is used to correct for imperfect polar alignment: a secondary imaging system is mounted on the primary telescope and attached to a computer, which calculates the drift of the star field using a guide star and sends small “pulse” corrections to the tracking mount.
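
    For the programmers in the audience, here is a rough sketch of what a guiding loop does conceptually. This is not KStars/EKOS code; the camera and mount objects are hypothetical stand-ins, and the simple proportional correction ignores the calibration, axis coupling, and backlash that real guiders handle.

        import time
        import numpy as np

        ARCSEC_PER_PIXEL = 3.5   # guide-camera plate scale (assumed value)
        PULSE_GAIN_MS = 120      # ms of pulse per arcsecond of drift (assumed value)

        def centroid(frame):
            """Return (x, y) of the brightest pixel -- a crude stand-in for a real star centroid."""
            y, x = np.unravel_index(np.argmax(frame), frame.shape)
            return float(x), float(y)

        def guide_loop(camera, mount, exposure_s=2.0):
            # Lock onto a guide star in the first frame.
            ref_x, ref_y = centroid(camera.capture(exposure_s))
            while True:
                x, y = centroid(camera.capture(exposure_s))
                # Drift of the star field, in arcseconds, since the reference frame.
                dx = (x - ref_x) * ARCSEC_PER_PIXEL
                dy = (y - ref_y) * ARCSEC_PER_PIXEL
                # Nudge the mount back with short pulses (sign conventions depend on calibration).
                mount.pulse("west" if dx > 0 else "east", abs(dx) * PULSE_GAIN_MS)
                mount.pulse("south" if dy > 0 else "north", abs(dy) * PULSE_GAIN_MS)
                time.sleep(exposure_s)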

    Performing a polar alignment in KStars/EKOS for tracking accuracy. I was 1° 17′ 46″ off in my initial calibration. In order to bring the scope into alignment, I had to move the scope so that the star at the left end of the pink line is in the crosshair.

    The first time I used guiding, everything “just worked”: I stayed within the error tolerance with no effort. The second time, it was failing quite regularly, with more than 8 arc-seconds of drift, and constantly trying to find new guide stars. What happened? Did I forget to balance the scope? Was the guide scope going out of focus? No. It was just clouds!

    The guiding module in KStars/EKOS. On the left is the configuration info. The picture shows the guide camera view and the guide star. The chart shows the corrective pulses and the RMS error. The bullseye is a nice visualization of where the scope is drifting.

    That’s when I discovered a really cool feature of my imaging software, KStars: it can abort an exposure if guiding error goes above a threshold and retry once the guiding settles down. An interesting consequence is that if a cloud appears, the star will disappear, and errors will rapidly go up, aborting the exposure. Essentially, I can use this to automatically image even if there are sporadic clouds.
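
    In pseudocode, the behavior looks something like the sketch below. The names are mine, not KStars’; the logic is simply to watch the RMS guide error, abort the running exposure when it crosses a threshold (a cloud hiding the guide star sends the error through the roof), and start a fresh exposure once guiding settles down again.

        import time

        # A minimal sketch of "abort and retry on guide error" -- not actual KStars code.
        # `camera` and `guider` are hypothetical interfaces; thresholds are illustrative.
        ABORT_RMS_ARCSEC = 2.0    # abort the exposure above this RMS error
        RESUME_RMS_ARCSEC = 1.0   # start a new exposure only below this

        def capture_with_guard(camera, guider, exposure_s=300, frames_wanted=17):
            captured = 0
            while captured < frames_wanted:
                if guider.rms_error() > RESUME_RMS_ARCSEC:
                    time.sleep(5)             # wait for guiding to settle (clouds passing)
                    continue
                camera.start_exposure(exposure_s)
                while camera.exposing():
                    if guider.rms_error() > ABORT_RMS_ARCSEC:
                        camera.abort_exposure()   # toss the frame and try again later
                        break
                    time.sleep(1)
                else:
                    camera.save_frame()           # exposure completed without an abort
                    captured += 1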

    The analytics module of KStars/EKOS. The bottom graph shows the drift and RMS. The first imaging run was largely successful, with 14 exposures of M67. I then moved my scope, which is what the yellow and blue boxes on top show. Next I slewed to M96 and ran alignment (the two teal boxes). This is when the clouds started appearing; you can see the aborted exposures in red on the bottom row, corresponding to very high RMS in the graph below. The hashed green box is an in-progress shot.

    Finally, all these errors gave me a chance to look at the framing of my shot. When I took my first shots and started processing, I noticed two more galaxies near M105: NGC3384 and NGC3389. As I looked at the shot alignment, I noticed a few more galaxies to the left of M105 on my star charts. By changing the center of the frame from M96 to M105, I was able to pick up 3 more galaxies in frame. Rather than imaging 3 galaxies, I would now be targeting 8 galaxies!

    Field of view for the M96 vs. M105 framing. This mosaic shot is from an early experiment in stacking frames from different imaging sessions.

    In summary, there were 3 issues that came up during the imaging process on February 8:

    • Cleaning – Dust directly on the camera sensor causing distortions.
    • Guiding – Clouds causing loss of guide-star.
    • Framing – Better framing by moving to a different target.

    Processing

    After I got my camera back from cleaning at Albuquerque Photo-Tech, I got the scope out again on February 11th and properly calibrated it. I was able to capture seventeen 5-minute exposures to use for the image. I used DeepSkyStacker to automatically align and stack the top 12 exposures, resulting in a 1-hour total exposure.

    For the first time, I experimented with a technique called “drizzling”. The method was pioneered for the Hubble Space Telescope and released as open-source software. It uses slightly offset images to create much higher resolution images than the camera sensor can capture on its own, by exploiting the sensor’s undersampling of the telescope’s resolution. By deliberately varying the telescope position slightly between exposures, each frame samples the scene at slightly different sub-pixel positions, and these differences can be used to upscale the image. The position variation is called “dithering”; the upscaling is “drizzling”.
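
    To make the idea concrete, here is a toy version of the accumulation step, written from the description above rather than from the actual Drizzle or DeepSkyStacker implementations (the real algorithm also shrinks the input pixels into “drops” and weights them by overlap area):

        import numpy as np

        def toy_drizzle(frames, offsets, scale=2):
            """Combine dithered low-resolution frames onto a finer output grid.

            frames  -- list of 2D arrays, all the same shape
            offsets -- list of (dy, dx) sub-pixel dither offsets, in input pixels
            scale   -- upscaling factor (2x turns 12MP frames into a 48MP master)
            """
            h, w = frames[0].shape
            out = np.zeros((h * scale, w * scale))
            weight = np.zeros_like(out)
            for frame, (dy, dx) in zip(frames, offsets):
                for y in range(h):
                    for x in range(w):
                        # Where this input pixel lands on the fine grid, given its dither offset.
                        oy = int(round((y + dy) * scale))
                        ox = int(round((x + dx) * scale))
                        if 0 <= oy < out.shape[0] and 0 <= ox < out.shape[1]:
                            out[oy, ox] += frame[y, x]
                            weight[oy, ox] += 1
            return out / np.maximum(weight, 1)   # average where frames overlap

    Because the offsets differ by fractions of a pixel, different frames populate different cells of the fine grid, which is where the extra resolution comes from.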

    Undersampling occurs when the sensor resolution is lower than the telescope’s resolving power or the atmospheric seeing. This allows for relaxed tolerance in guiding and is a great scenario for drizzling.
    Ideal sampling occurs when the sensor approximates the telescope’s resolving power. This is also a good scenario for drizzle, as the atmospheric seeing is still undersampled.
    Oversampling occurs when the sensor resolution is greater than the telescope’s resolving power. This results in “soft” images, as stars are spread across multiple pixels, rather than being a point source of light. Rather than drizzling to increase resolution, oversampling is addressed via “binning” pixels together, reducing resolution.

    For example, my imaging telescope has a resolving power of 1.9 arcsec, while my camera sensor resolves 2.97 arcsec/pixel. That means I am undersampling what the telescope is capable of resolving by roughly a third. By dithering and drizzling, I was able to create a 48MP master from my 12MP sensor and get signal from a lot more galaxies than anticipated.
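
    The arithmetic behind those numbers is standard pixel-scale bookkeeping; only the 1.9″ and 2.97″/pixel values come from my setup, and the formula in the comment is there just for reference:

        # Back-of-the-envelope sampling check (values from the text above).
        telescope_resolution = 1.9   # arcsec the telescope can resolve
        pixel_scale = 2.97           # arcsec per sensor pixel
        # For reference, pixel scale is usually computed as:
        #   pixel_scale = 206.265 * pixel_size_um / focal_length_mm

        sampling_fraction = telescope_resolution / pixel_scale
        print(f"Sampling at {sampling_fraction:.0%} of the telescope's resolution")
        print(f"Undersampled by roughly {1 - sampling_fraction:.0%}")   # about a third

        # Drizzling at 2x quadruples the pixel count (12MP -> 48MP) and halves the pixel scale.
        print(f"Drizzled pixel scale: {pixel_scale / 2:.2f} arcsec/pixel")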

    Vignette removal – before and after comparison.

    The major struggle for me in processing is light pollution and the uneven background it casts on my images. While I can get to dark skies pretty easily, outings require a lot of coordination around family duties, so I took this image in my backyard, almost directly under the neighbor’s flood light. No matter how I adjusted the color curves, I couldn’t remove the light pollution without also eliminating the galaxies. Finally, I found a tutorial on removing gradients by Astrobackyard. It detailed how to remove the light pollution by using a threshold layer to create an artificial “flat” image representing the uneven light field. With the help of the tutorial, I was able to get a mostly-uniform background, although some light vignetting remains.
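
    The same recipe can be expressed outside of Photoshop. The sketch below is not the Astrobackyard workflow itself, just the general idea it implements: mask out the stars and galaxies (the threshold step), smooth what remains into a synthetic “flat” describing the light-pollution gradient, and divide it out.

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def remove_gradient(image, threshold_percentile=90, blur_sigma=100):
            """Rough stand-in for the threshold-layer gradient removal technique."""
            # 1. Mask bright objects (stars, galaxies) so they don't bias the background model.
            threshold = np.percentile(image, threshold_percentile)
            background = np.where(image < threshold, image, np.nan)

            # 2. Fill the masked pixels with the median sky level, then blur heavily so only
            #    the large-scale light-pollution gradient survives as a synthetic "flat".
            filled = np.where(np.isnan(background), np.nanmedian(background), background)
            synthetic_flat = gaussian_filter(filled, sigma=blur_sigma)

            # 3. Divide out the gradient and rescale back to the original sky level.
            return image / synthetic_flat * np.median(synthetic_flat)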

    The telescope in its ultra-light-polluted, backyard habitat. Seriously, I think it might literally be the worst place in New Mexico to take pictures.

    On the positive side, by zooming in on so many parts of the image to examine the vignetting, I discovered another 12(!) galaxies in the frame, bringing the total to 20!

    • Messier: M95, M96, M105
    • NGC: 3338, 3357, 3367, 3377, 3377a, 3384, 3389, 3412
    • PGC: 31937, 32371, 32393, 32488, 1403591
    • UGC: 5832, 5869, 5897
    • IC: 643

    Two tools helped me identify these galaxies: astrometry.net and Stellarium Web. Astrometry.net is awesome. You upload an image and it reports the celestial coordinates and objects in view. It’s completely open source, so I can run the solver locally. Astrometry is how I ensure proper alignment of the scope and accurate go-to movements while in the field. Stellarium Web is a browser-based planetarium with a huge object database. While processing, I centered my view on M105, just like my camera, and used it to walk across the image to see which objects had resolved.
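
    Running the solver locally is basically a one-liner once the index files are installed. The snippet below just shells out to astrometry.net’s solve-field tool; the file name is a placeholder, and the scale hints are ones I would set from the roughly 3″/pixel plate scale discussed earlier.

        import subprocess

        # Plate-solve a frame with the local astrometry.net solver.
        subprocess.run([
            "solve-field", "m105_frame.fits",              # placeholder file name
            "--scale-units", "arcsecperpix",
            "--scale-low", "2.5", "--scale-high", "3.5",   # hints around ~3"/pixel
            "--downsample", "2",
            "--no-plots", "--overwrite",
        ], check=True)
        # On success, solve-field writes out WCS files with the celestial coordinates
        # of the frame, which is what drives accurate alignment and go-to movements.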

    Screenshot of Stellarium Web showing the sky at 10:55 PM on February 11th, with one of the fainter galaxies in the image’s field of view highlighted. It’s interesting to note that there are even more galaxies in this shot that my imaging system couldn’t resolve.

    In summary, I used three new techniques on the processing side for the exposures I captured on February 11th:

    • Drizzle – Boosting resolution through slightly offset, undersampled exposures.
    • Vignette removal – Photoshop threshold filters, combined with selective removal of deep space objects from the background field to produce a gradient mask.
    • Locating objects – Astrometry and Stellarium Web to ensure alignment and get object names.

    And here it is! 20 galaxies in a single image.

    20 galaxies in a single image, unlabeled.

    Conclusions

    Two final notes on this field report:

    1. Starting with a simple, manual scope was absolutely the right decision. I wrote about this for anyone considering a telescope purchase, especially if they want to share it with their family. The depth of understanding necessary to identify problems in the imaging process is enormous, and learning one step at a time is highly recommended. Dip a toe in first. Don’t drop $2,000 and get frustrated because you can’t get it to work.
    2. Less-than-perfect equipment makes room for experimentation. I started taking images with my cell phone and a manual mount. Right now I’m using my wife’s Canon EOS Rebel T3 from 2011. I’m not going to get Hubble-quality images in my backyard, but it’s amazing to learn the limits of our consumer technology and then push those limits. I’ve been astonished at what’s possible and how quickly my knowledge has grown out of the need to get to the next level.

    Thanks for reading!

  • Photographing Saturn

    I’ve made a lot of progress in 2 months with astrophotography. Here’s my best attempt at Saturn, first from June 17 and then from August 12.

    What changed? Phone and telescope stayed the same. However, I learned how to use the exposure settings and focus lock on the Pixel 3a’s video mode, meaning that my photos were no longer blown out. I also increased the resolution to 4k.

    At the eyepiece, I switched from a Celestron X-Cel 12mm to an Explore Scientific 14mm. The biggest difference here is the field of view – moving from a 60-degree to an 82-degree apparent field means it’s easier to keep the planet within the frame (0.3° vs. 0.47° true field of view), even though magnification is down (200x vs. 171x when Barlowed). This means that focus and exposure are more consistent when I split and stack the frames from the camera using Registax.
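
    The numbers in that comparison fall out of two standard formulas: magnification = telescope focal length ÷ eyepiece focal length, and true field of view ≈ apparent field of view ÷ magnification. The focal length and Barlow below are assumptions chosen to reproduce the figures above:

        # Standard eyepiece arithmetic; the 1200 mm focal length and 2x Barlow are
        # assumptions that reproduce the magnifications quoted in the text.
        focal_length_mm = 1200 * 2   # ~1200 mm Dobsonian with a 2x Barlow

        for name, eyepiece_fl_mm, apparent_fov_deg in [
            ("Celestron X-Cel 12mm", 12, 60),
            ("Explore Scientific 14mm", 14, 82),
        ]:
            magnification = focal_length_mm / eyepiece_fl_mm
            true_fov_deg = apparent_fov_deg / magnification
            print(f"{name}: {magnification:.0f}x, {true_fov_deg:.2f} deg true field of view")

        # Celestron X-Cel 12mm: 200x, 0.30 deg true field of view
        # Explore Scientific 14mm: 171x, 0.48 deg true field of view (rounded to 0.47 above)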

    I’m still not happy with my Jupiter photos, but I’m starting to pick up color details. The problem has been edge resolution, which is a focus issue.

    Definitely having fun!

  • Astronomy

    One thing about frontier life is that you can’t always struggle against the environment. New Mexico is hot, dry, and high elevation. Being outdoors in the summer is physically taxing. What about the night though? All the isolation out here makes this one of the best places to go stargazing.

    At the start of quarantine, I decided to buy a telescope – something to get me outside, away from screens, and a chance to quiet my mind. I got an Apertura 8″ Dobsonian reflector from High Point Scientific, largely on the strength of AstroBackyard’s review video. It’s a fantastic beginner scope, and the manual mount is forcing me to really learn the skies.

    The moon at 80x.

    Seeing the moon, even at 40x magnification, is incredible. There are so many craters! Finding the planets has been a really neat adventure: I can’t believe that I’m able to separate Saturn’s rings and see Jupiter’s moons.

    Jupiter and the 4 Galilean moons.
    Saturn and its moon, Titan.

    I also got a cell-phone adapter for my eyepieces. It essentially lets me use the entire telescope as a gigantic camera lens. My Google Pixel 3a has an astrophotography mode that has helped get long exposure photos of the sky.

    The Milky Way, shot in astrophotography mode on a Pixel 3a XL.

    Another cool thing is the discovery of Comet NEOWISE. It’s not really visible from the city, so it’s been a good excuse to get out of town into the wild.

    Comet NEOWISE.

    As I get deeper into this hobby, I’m realizing that something I originally started to get away from screens might get me into more screens. The basic calculations around optics have led to a gnarly spreadsheet. The notion of astrophotography as data collection is mind-blowing. Digital sensors have evolved to where we are literally measuring the number of photons hitting a 3 square-micron pixel, down to the level of a single photon in 5 minutes. This is all possible with consumer hardware too!

    I wanted to share some beginner resources that have helped me.

    • AstroBackyard review of Apertura AD8 — Trevor Jones has a great channel for astrophotography and conveys the wonder of it all well. This video really sealed my purchase.
    • Allen’s Stuff on choosing a beginner telescope — Allan Hall reviews pretty much every kind of beginner scope and the pros and cons of each.
    • A Beginner’s Guide to Solar System Photography — Particularly useful article focusing on alt-az mounts. A Dobsonian is a fancy alt-az mount, and one of the big challenges is that stars are not tracked on such a mount, so your exposure times are limited.
    • Astrophotography with a Dobsonian? — Video demonstrating reasonable expectations for a beginner with the same type of telescope that I have.
    • The Deep-Sky Imaging Primer — Fantastic guide with a university-course level of detail; it far exceeded expectations and gave me a glimpse of just how engrossing this hobby can be. All of the author’s books are stunningly beautiful – his Sky Atlas is also great!

  • Towards Cultural-Scale Models of Full Text

    For the past year, Colin and I have been on a HathiTrust Advanced Collaborative Support (ACS) Grant. This project has examined how topic models differ between library subject areas. For example, some areas may have a “canon”, meaning that a model with a low number of topics selects the same themes no matter the corpus size. In contrast, still-emerging fields may not agree on the overall thematic structure. We also looked at how sample size affects these models. We’ve uploaded the initial technical report to the arXiv:

    Towards Cultural Scale Models of Full Text
    Jaimie Murdock, Jiaan Zeng, Colin Allen
    In this preliminary study, we examine whether random samples from within given Library of Congress Classification Outline areas yield significantly different topic models. We find that models of subsamples can equal the topic similarity of models over the whole corpus. As the sample size increases, topic distance decreases and topic overlap increases. The requisite subsample size differs by field and by number of topics. While this study focuses on only five areas, we find significant differences in the behavior of these areas that can only be investigated with large corpora like the Hathi Trust.
    http://arxiv.org/abs/1512.05004

  • Psychonomics 2015

    This weekend I was in Chicago for the Psychonomic Society and Society for Computers in Psychology meetings. Emily and I stayed Thursday through Saturday and experienced a record first snow of the season. I hope that our fellow conference-goers made it back safely as well.

    Chicago is one of the best food towns we’ve ever been to: we cannot recommend Gino’s East deep-dish pizza and Santorini’s Greek restaurant enough.

    Below are some conference observations and highlights.

    Conference Impressions
    Because it is an abstract-only, non-proceedings conference, it is a great opportunity to showcase developing or under-review work. For an idea of the breadth of the conference, please look at the abstract book. The talks were of varying quality, but the rapt attention of the audience and the quality of questions were excellent. Next year it will be in Boston on November 17-20.

    Distributed Cognition
    One of the best talks was by Steven Sloman on “The Illusion of Explanatory Depth and the Community of Knowledge”:

    Asking people to explain how something works reveals an illusion of explanatory depth: Typically, people know less about the causal mechanism they are describing than they think they do (Rozenblit & Keil, 2002). I report studies showing that explanation shatters people’s sense of understanding in politics. I also show that people’s sense of understanding increases when they are informed that someone else understands and that this effect is not attributable to task demands or understandability inferences. The evidence suggests that our sense of understanding resides in a community of knowledge: People fail to distinguish the knowledge inside their heads from the knowledge in other people’s heads.

    The article detailing how explanation shatters political understanding is quite accessible. The further results about “a community of knowledge” are under review.

    Prof. Sloman is the conference chair for the International Conference on Thinking on August 3-6, 2016 at Brown University. Submission deadline is March 31, 2016.

    The Science of Narrative
    Another excellent talk was by Mark Finlayson, who studies “the science of narrative”. He developed “Analogical Story Merging” (ASM), which can replicate Vladimir Propp’s theory of the structure of folktale plots. The process is described in his dissertation, which is an excellent synthesis of literary theory and computer science.

    Prof. Finlayson is hosting the 7th International Workshop on Computational Models of Narrative at Digital Humanities 2016 in Kraków, Poland on July 11-12. The call for papers is pending.

    Bilingualism

    There were two talks in the Bilingualism track that were particularly interesting. Conor McLennan and Sara Incera reported that mouse-tracking behavior in bilinguals doing a word discrimination task shows the same sort of reaction delay as in expert discrimination tasks. This correlates with confidence in answers – experts may take longer, but they move directly to their answers. The results are published in Bilingualism.

    Another talk looked at how multilingualism affects vocabulary size, using a massive online experiment. While the task of identifying whether a word is known or not is riddled with false positives, the results were interesting in and of themselves. Multilinguals tended to have larger vocabularies across languages, and L2 learners tended to have a larger vocabulary than L1 native speakers within a language. The results are published in The Quarterly Journal of Experimental Psychology.

  • Darwin’s Semantic Voyage

    The preprint of my project “Exploration and Exploitation of Victorian Science in Darwin’s Reading Notebooks” was released on arXiv on Friday. The paper is joint work with my advisors Colin Allen and Simon DeDeo.

    This has consumed my life for the past year and I’m incredibly proud of the results. It’s an entertaining read — printing pages “1-11,24-28” gives the main body and references. Pages 12-23 are the “supporting information” explaining some of the archival work, mathematics, and model verification, but they are not central to the key points of the paper.

    The key point for digital humanities is that we’ve come up with a way to characterize an individual’s reading behaviors and identify key biographical periods from their life. Darwin is incredibly well-studied, so our results largely confirm existing history of science work. However, by adjusting the granularity we can also suggest hypotheses for further investigation – in this case, the period of Darwin’s life from 1851-1853 after his daughter’s death. For less well-studied individuals, this may help humanists gain traction on narrative organization when interacting with large historical archives.

    The key point for cognitive scientists is that we can now characterize information-foraging behaviors on multiple timescales using an information-theoretic measure of cognitive surprise. While many people have studied foraging behavior in individuals on the order of minutes, or in cultures on the order of decades, this is the first study that looks at how an individual interacts with the products of their culture over the course of a lifetime.

    It’s important to note that we don’t say anything about how his reading affected his writing – that’s for paper #2!

    Also, I’ll be presenting this work at the 2015 Conference on Complex Systems this Friday at Arizona State University, with slides available on Google Slides.

    Exploration and Exploitation of Victorian Science in Darwin’s Reading Notebooks
    Jaimie Murdock, Colin Allen, Simon DeDeo
    Abstract: Search in an environment with an uncertain distribution of resources involves a trade-off between local exploitation and distant exploration. This extends to the problem of information foraging, where a knowledge-seeker shifts between reading in depth and studying new domains. To study this, we examine the reading choices made by one of the most celebrated scientists of the modern era: Charles Darwin. Darwin built his theory of natural selection in part by synthesizing disparate parts of Victorian science. When we analyze his extensively self-documented reading we find shifts, on multiple timescales, between choosing to remain with familiar topics and seeking cognitive surprise in novel fields. On the longest timescales, these shifts correlate with major intellectual epochs of his career, as detected by Bayesian epoch estimation. When we compare Darwin’s reading path with publication order of the same texts, we find Darwin more adventurous than the culture as a whole.

  • Topic Modeling Tutorial at JCDL2015

    Join the HathiTrust Research Center (HTRC) and InPhO Project for a half-day tutorial on HathiTrust data access and topic modeling at JCDL 2015 in Knoxville, TN on Sunday, June 21, 2015, 9am-12pm!
    Topic Exploration with the HTRC Data Capsule for Non-Consumptive Research
    Organizers: Jaimie Murdock, Jiaan Zeng and Robert McDonald
    Abstract: In this half-day tutorial, we will show 1) how the HathiTrust Research Center (HTRC) Data Capsule can be used for non-consumptive research over collection of texts and 2) how integrated tools for LDA topic modeling and visualization can be used to drive formulation of new research questions. Participants will be given an account in the HTRC Data Capsule and taught how to use the workset manager to create a corpus, and then use the VM’s secure mode to download texts and analyze their contents. [tutorial paper]

    We draw your attention to the astonishingly low half-day tutorial fees:

    Half-Day Tutorial/Workshop Early Registration (by May 22!)
    ACM/IEEE/SIG/ASIS&T Members – $70
    Non-ACM/IEEE/SIG/ASIS&T Members – $95
    ACM/IEEE/SIG/ASIS&T Student – $20
    Non-member Student – $40

    Half-Day Tutorial/Workshop Late/Onsite Registration
    ACM/IEEE/SIG/ASIS&T Members – $95
    Non-ACM/IEEE/SIG/ASIS&T Members – $120
    ACM/IEEE/SIG/ASIS&T Student – $40
    Non-member Student – $60

    Hope to see you there!

  • Six Upcoming Talks

    For the past 6 months, I’ve been very busy working on a number of collaborations with Simon DeDeo and Colin Allen. Now, I’m taking to the road to show the fruit of my labors. Below are 6 upcoming talks, tutorials, and workshops about this work on topic modeling, Charles Darwin, information foraging, and the HathiTrust. I hope to see you there!

    Topics over Time: Into Darwin’s Mind (Local)
    Network Science @ IU Talks
    Monday, March 9 — 12:30-1pm
    Social Science Research Commons
    Slides: http://jamr.am/DarwinIUNetSci
    Video coming soon!

    Topic Modeling with the HathiTrust Data Capsule
    HathiTrust UnCamp 2015
    Monday, March 30
    Ann Arbor, MI
    Presenters: Jaimie Murdock, Colin Allen

    Topic-driven Foraging (Local)
    Goldstone, Todd, Landy Lab
    Friday, April 10 — 9-10a
    MSB II Gill Conference Room

    Visualization Techniques for LDA (Local)
    Cognitive Science 25th Anniversary
    Interactive Systems Open House
    Friday, April 17 — 3:30-5:15pm
    Location TBD

    Topic Modeling & Network Analysis (Local)
    Catapult Center Workshops
    Friday, April 24 — 1-4pm
    Wells Library E159
    Presenter: Colin Allen

    HT Data Capsule & Topic Modeling for Non-consumptive Research
    JCDL 2015 Tutorial
    Sunday, June 21 — 9am-noon
    Knoxville, TN
    Presenters: Jaimie Murdock, Jiaan Zeng, Robert McDonald

  • Wisdom of the Few?

    Wisdom of the Few? “Supertaggers” in Collaborative Tagging Systems

    Jared Lorince, Sam Zorowitz, Jaimie Murdock, Peter M. Todd

    A folksonomy is ostensibly an information structure built up by the “wisdom of the crowd”, but is the “crowd” really doing the work? Tagging is in fact a sharply skewed process in which a small minority of “supertagger” users generate an overwhelming majority of the annotations. Using data from three large-scale social tagging platforms, we explore (a) how to best quantify the imbalance in tagging behavior and formally define a supertagger, (b) how supertaggers differ from other users in their tagging patterns, and (c) if effects of motivation and expertise inform our understanding of what makes a supertagger. Our results indicate that such prolific users not only tag more than their counterparts, but in quantifiably different ways. These findings suggest that we should question the extent to which folksonomies achieve crowdsourced classification via the “wisdom of the crowd”, especially for broad folksonomies like Last.fm as opposed to narrow folksonomies like Flickr.

    Preprint of article in review available at arXiv:1502.02777 [cs.SI]

  • Topic Explorer at AAAI

    Next week, I’ll be headed to Austin, TX for AAAI-15 to present a demo of the Topic Explorer. Accompanying the demo is a short paper:

    Topic models remain a black box both for modelers and for end users in many respects. From the modelers’ perspective, many decisions must be made which lack clear rationales and whose interactions are unclear – for example, how many topics the algorithms should find (K), which words to ignore (aka the “stop list”), and whether it is adequate to run the modeling process once or multiple times, producing different results due to the algorithms that approximate the Bayesian priors. Furthermore, the results of different parameter settings are hard to analyze, summarize, and visualize, making model comparison difficult. From the end users’ perspective, it is hard to understand why the models perform as they do, and information-theoretic similarity measures do not fully align with humanistic interpretation of the topics. We present the Topic Explorer, which advances the state-of-the-art in topic model visualization for document-document and topic-document relations. It brings topic models to life in a way that fosters deep understanding of both corpus and models, allowing users to generate interpretive hypotheses and to suggest further experiments. Such tools are an essential step toward assessing whether topic modeling is a suitable technique for AI and cognitive modeling applications.

    Jaimie Murdock and Colin Allen. (2015) Visualization Techniques for Topic Model Checking. [demo track] in Proceedings of the 29th AAAI Conference (AAAI-15). Austin, Texas, USA, January 25-29, 2015.
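
    For readers who want to poke at the modeling decisions mentioned in the abstract, here is a minimal, generic illustration using gensim (purely as an example LDA implementation, not the code behind the Topic Explorer) of three of the knobs discussed: the number of topics K, the stop list, and the run-to-run variation introduced by approximate inference.

        # Generic LDA example (gensim) illustrating K, the stop list, and seed-dependent results.
        from gensim import corpora, models

        documents = [
            "darwin read widely across victorian science",
            "topic models summarize large text corpora",
            "victorian science spanned geology and biology",
        ]
        stop_list = {"and", "the", "across"}   # words to ignore

        texts = [[w for w in doc.split() if w not in stop_list] for doc in documents]
        dictionary = corpora.Dictionary(texts)
        corpus = [dictionary.doc2bow(text) for text in texts]

        # Two runs with different seeds: the approximate inference means the
        # resulting topics are not guaranteed to be identical across runs.
        for seed in (1, 2):
            lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, random_state=seed)
            print(f"seed={seed}:", lda.print_topics())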