Subdomain.center – discover all subdomains for a domain
385 points by adam_gyroscope 1 year ago | 124 comments

- gnyman 1 year ago You cannot hide anything on the internet anymore: the full IPv4 range is scanned regularly by multiple entities. If you open a port on a public IP, it will get found.
If it's an obscure non-standard port it might take longer, but anything on a standard port will get probed very quickly and included in tools like shodan.io
The reason I'm repeating this is that not everyone knows it. People still (albeit less often) put up Elasticsearch and MongoDB instances with no authentication on public IPs.
The second thing which isn't well known is the Certificate Transparency logs. This is the reason why you can't hide any HTTPS service (without a wildcard cert). When you ask Let's Encrypt (or any CA, actually) to issue a certificate for veryobscure.domain.tld, they will send it to the Certificate Transparency logs. You can find every certificate ever issued for a domain with a tool like https://crt.sh
There are many tools like subdomain.center; https://hackertarget.com/find-dns-host-records/ comes to mind. The most impressive one I've seen, which found much more than expected, is Detectify (a paid service, no affiliation); they seem to combine passive data collection (like subdomain.center) with active brute-forcing to find even more subdomains.
But you can probably get 95% of the way there by using CT and a brute-force tool like https://github.com/aboul3la/Sublist3r
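A minimal sketch of the CT part, assuming crt.sh's public JSON endpoint (example.com and the parsing details are illustrative, not the site's actual pipeline):

    import json
    import urllib.request

    def ct_subdomains(domain):
        # crt.sh CT-log search; %25 is a URL-encoded "%" wildcard
        url = "https://crt.sh/?q=%25." + domain + "&output=json"
        with urllib.request.urlopen(url, timeout=30) as resp:
            entries = json.load(resp)
        names = set()
        for entry in entries:
            # name_value may contain several newline-separated SAN entries
            for name in entry["name_value"].split("\n"):
                if not name.startswith("*."):
                    names.add(name.lower())
        return names

    print(sorted(ct_subdomains("example.com")))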
- fuzzy2 1 year ago The Certificate Transparency log is very important. I recently spun up a service with HTTPS certs from Let's Encrypt. By coincidence I was watching the logs. Within just 80 seconds of the certificate being issued I could see the first automated "attacks".
If you get a certificate, be ready for the consequences.
- mixdup 1 year ago Were these automated "attacks" hitting you by hostname or IP? Because there's a chance you would've been getting them regardless, just from people scanning the entire IPv4 space.
- fuzzy2 1 year ago They would not have been reverse proxied to the Docker container without the hostname.
- tikkabhuna 1 year ago This is really interesting. For my homelab I've been playing around with using Let's Encrypt rather than spinning up my own CA. "What's the worst that could happen?"
Guess I'll be looking to spin up my own CA now!
- dspillett 1 year ago Getting a wildcard certificate from LE might be a better option, depending on how easy the extra bit of plumbing is with your lab setup.
You need to use DNS-based domain validation, and once you have a cert, distribute it to all your services. The former can be automated using various common tools (look at https://github.com/joohoi/acme-dns; self-host it, unless you are only securing toys you don't really care about, if you self-host DNS or your registrar doesn't have useful API access), or you can leave it as a manual job every ~ten weeks. The latter involves scripts to update your various services when a new certificate is available (either pushing from where you receive the certificate, or picking it up from elsewhere). I have a little VM that holds the couple of wildcard certificates (renewing them via DNS-01 and acme-dns on a separate machine, so this one is impossible to see from the outside world); it pushes the new key and certificate out to the other hosts (simple SSH to copy over, then restart nginx/Apache/other).
Of course you may decide that signing with your own CA is easier than setting all this up, as you can sign long-lived certificates for yourself. I prefer this because I don't need to switch to something else if I decide to give friends/others access to something.
Your top level (sub)domain for the wildcard is still in the transparency logs of course, but nothing under it is.
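A rough sketch of that renew-and-push loop, assuming certbot's default /etc/letsencrypt layout; the host names and target paths are placeholders:

    import subprocess

    HOSTS = ["web1.lan", "web2.lan"]            # placeholder internal hosts
    LIVE = "/etc/letsencrypt/live/example.com"  # certbot's default layout

    # Renew the wildcard cert; DNS-01 is handled by whichever
    # plugin/hook (e.g. acme-dns) certbot was configured with.
    subprocess.run(["certbot", "renew", "--quiet"], check=True)

    # Push key + chain to each host over SSH, then reload the web server.
    for host in HOSTS:
        for f in ("fullchain.pem", "privkey.pem"):
            subprocess.run(["scp", f"{LIVE}/{f}", f"root@{host}:/etc/nginx/certs/"], check=True)
        subprocess.run(["ssh", f"root@{host}", "systemctl reload nginx"], check=True)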
- rbut 1 year ago If you're homelab'ing then you should be using private IPs to host your services anyway. Don't put them on a public IP unless you absolutely have to (e.g. port 25 for mail).
Use your internal DNS server (e.g. your router's) for DNS entries for each service. Or, if you wish, you can put them in public DNS too, e.g. gitlab.myhome.com A 192.168.33.11
You can then access your services over an always-on VPN like WireGuard when you're away from home.
Then it doesn't matter if anyone knows what subdomains you have, they can't access them anyway.
- KronisLV 1 year ago > Guess I'll be looking to spin up my own CA now!
I was looking for a lazy/easy way to do this manually and settled on KeyStore Explorer, a GUI tool that lets you work with various keystores and do everything from making your own CA to signing and exporting certificates in various formats: https://github.com/kaikramer/keystore-explorer (to me it feels easier than working with OpenSSL directly, provided I trust the tool)
In addition, I set up mTLS or even basic auth at the web server (reverse proxy) level for some of my sites, which seems to help that little bit more, given that some automated attacks might ignore TLS errors but won't be able to provide my client certs or the username/password. I also run fail2ban and mod_security, though that's more opinionated.
- oooyay 1 year ago I use a wildcard certificate for my home infrastructure. For all the talk of hiding, though, it's wise not to count on hiding behind a wildcard. Properly configure your firewalls and network policy. For the services you do have exposed, implement rate limiting and privileged access. I stuck most of my LE services behind Tailscale, so they get their certificates but aren't routable outside my Tailscale network.
- ahoka 1 year ago Didn't you read the original comment? It's just a matter of time until someone starts to poke your IPs. Your own CA will be harder to get right.
- teekert 1 year ago Can Tailscale MagicDNS + tunnel obscure things? Or only when you keep a service within the tailnet? (Still a + for selfhosters)
- implements 1 year ago Recently, I opened 80 and 443 so I could use Let's Encrypt's acme-client to get a certificate (and then test it). Tightening up security a bit, I configured an HTTP relay to filter out clients accessing port 80 by IP address rather than domain name; some scanners are still trying domain and sub-domain names I was using weeks ago, which goes to show how organised hackers are about attacking targets.
- BoberMod 1 year ago You can use the DNS-01 challenge [1] to get a certificate. You just need to add a temporary TXT record to your DNS. It also supports wildcard certificates.
Most popular DNS providers (like Cloudflare) have APIs, so it can be easily automated.
I'm using it in my local network: I have a publicly available domain for it (intranet.domain.com) and I don't want to expose my local services to the world just to issue a certificate trusted by the root CAs on all my devices. This method allows me to issue a valid Let's Encrypt wildcard cert (*.intranet.domain.com) for all my internal services without opening any ports to the world.
[1]: https://letsencrypt.org/docs/challenge-types/#dns-01-challen...
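A hedged sketch of the TXT-record step against Cloudflare's v4 API; the token and zone id are placeholders, and real ACME clients (certbot DNS plugins, acme.sh, etc.) handle this, including cleanup, for you:

    import requests

    TOKEN = "..."    # API token with DNS-edit rights (placeholder)
    ZONE_ID = "..."  # Cloudflare zone id (placeholder)

    def publish_challenge(domain, challenge_value):
        # Create the _acme-challenge TXT record the CA checks during DNS-01
        resp = requests.post(
            f"https://api.cloudflare.com/client/v4/zones/{ZONE_ID}/dns_records",
            headers={"Authorization": f"Bearer {TOKEN}"},
            json={
                "type": "TXT",
                "name": f"_acme-challenge.{domain}",
                "content": challenge_value,
                "ttl": 120,
            },
            timeout=30,
        )
        resp.raise_for_status()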
- intothemild 1 year ago Once you expose something long enough to get scanned, it's going to keep getting scanned pretty much forever.
I self-host a couple of web services, but none are open; you need strong authentication to get in.
It's not ideal; ideally I'd close the HTTPS web traffic and use some form of VPN to get in. But sadly that's just not feasible in my use case. So strong auth it is.
- fragmede 1 year ago Not to underestimate the power of Shodan, and oh god don't spin up a default Mongo with no auth, but port knocking would seem to counteract this to enough of a degree, not to mention having a service only accessible via Tor.
https://wiki.archlinux.org/title/Port_knocking#:~:text=Port%....
- gnyman 1 year ago Yes, you can hide with a little bit of effort. Port knocking or Tor will stop almost anything (but don't rely on it as the sole protection, just as another layer).
I like to prefix anything "I don't want scraped" with a random prefix, like domain.com/kwo4sx_grafana/, and nobody will find it (as long as you don't link to it anywhere). I still keep auth enabled, but at least I don't have to worry about automated attacks exploiting it before I have time to patch.
Something as simple as moving SSH to a non-standard port reduces the amount of noise from automated scanners by 99% (made-up number, but a lot).
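For the port-knocking idea above, a toy client-side sketch; the host and ports are placeholders, and the server side would typically be knockd or firewall rules watching for the sequence:

    import socket

    TARGET = "server.example.com"        # placeholder
    KNOCK_SEQUENCE = [7000, 8000, 9000]  # placeholder ports agreed with the server

    # Fire a short-lived TCP SYN at each port in order; the server's firewall
    # watches for the sequence and only then opens the real service port.
    for port in KNOCK_SEQUENCE:
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(0.5)
        try:
            s.connect((TARGET, port))  # expected to fail; the SYN itself is the knock
        except OSError:
            pass
        finally:
            s.close()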
- Vorh 1 year ago Have you had any problems with browsers leaking the prefixed sites, as seen here?
- sgjohnson 1 year ago You don't even need "multiple entities". Absolutely anyone can do that. Scanning a single port across the entire IPv4 internet takes about 40 minutes.
- codethief 1 year ago > You cannot hide anything on the internet anymore, the full IPv4 range is scanned regularly by multiple entities. If you open a port on a public IP it will get found.
Sure, but you might still host multiple virtual hosts (e.g. subdomains) on the same web server. Unless an attacker knows their exact hostnames, they won't be able to access them.
- gustavus 1 year ago There are several easy ways to skirt that.
First, you can simply try brute-forcing subdomains; second, if you are using HTTPS you can simply pull the cert and look at the aliases (SANs) listed there. Two ways off the top of my head.
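The second technique is a few lines with Python's standard library; a small sketch (example.com is a placeholder):

    import socket
    import ssl

    def cert_sans(host, port=443):
        # Complete a TLS handshake and read the subjectAltName entries
        ctx = ssl.create_default_context()
        with socket.create_connection((host, port), timeout=10) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                cert = tls.getpeercert()
        return [v for k, v in cert.get("subjectAltName", ()) if k == "DNS"]

    print(cert_sans("example.com"))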
- codethief 1 year ago Of course, but my point was that none of them involve IP scanning.
- tamimio 1 year ago > This is the reason why you can't (without a wildcard cert)
Guess being security conscious pays off: testing those on some domains I have, they only managed to show what I want to show, since the wildcard just masks the rest.
That being said, I don't think anyone should consider a subdomain hidden; it's an address, after all. Assume it's accessible, or put it behind a firewall or VPN with proper authentication. Security by obscurity never works.
- matheusmoreira 1 year ago > the full IPv4 range is scanned regularly by multiple entities
Single packet authorization. The server just drops any and all packets unless you first send it a cryptographically signed packet. To all these observers, it's as if the server isn't even there.
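A deliberately simplified sketch of the idea, assuming a shared HMAC key; real implementations (fwknop, for example) add encryption and proper replay protection:

    import hashlib
    import hmac
    import socket
    import subprocess
    import time

    SECRET = b"shared-secret"  # placeholder; real SPA uses per-client keys

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 62201))

    while True:
        data, (src_ip, _) = sock.recvfrom(1024)
        ts, _, mac = data.partition(b"|")  # client sends "timestamp|HMAC(timestamp)"
        expected = hmac.new(SECRET, ts, hashlib.sha256).hexdigest().encode()
        try:
            fresh = abs(time.time() - float(ts.decode())) < 30
        except ValueError:
            continue
        if fresh and hmac.compare_digest(mac, expected):
            # Whitelist the sender's IP for SSH; everything else stays dropped
            subprocess.run(["iptables", "-I", "INPUT", "-s", src_ip,
                            "-p", "tcp", "--dport", "22", "-j", "ACCEPT"], check=True)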
- danielvaughn 1 year ago At my company we got bit by this several months ago. Luckily the database was either empty or only had testing data, but like you said, the port was exposed and someone found it.
- pid-1 1 year ago > full IPv4 range is scanned regularly by multiple entities.
Yet another good reason to use IPv6.
- gnyman 1 year ago IPv6 won't get found by brute force, but there are a few projects which try to gather IPv6 addresses by various means and scan them as they are found.
Shodan did (maybe still does) provide NTP servers to some NTP pools and scanned anyone who sent requests to them.
https://arstechnica.com/information-technology/2016/02/using...
So, as with everything, layer the defences; don't rely on your IPv6 address being secret as the only defence.
- banana_giraffe 1 year ago Cute, it managed to find 121,486 subdomains for amazonaws.com [1], and somehow I suspect that's a tiny fraction of what's in use.
https://gist.githubusercontent.com/Q726kbXuN/bf8a9a22b81fe65...
- Brananarchy 1 year ago As others have said, Certificate Transparency seems to be doing some heavy lifting here. It reports subdomains for me that have never had a public CNAME or A record, but have had Let's Encrypt certs issued for internal use.
It's also missing some that have not had certs issued, but that are in public DNS.
- TekMol 1 year ago That's why HTTPS is still a pain in the butt, 30 years after it was invented.
I don't want internally used subdomains to be public. Because of certificate transparency, the only way to achieve that is via wildcard certs.
Let's Encrypt only supports cumbersome validation methods for those, like changing DNS records every time you need to renew the cert.
Pretty annoying.
- proto_lambda 1 year ago If the subdomains aren't supposed to be public, the public also doesn't need to trust the TLS certs. Sign them with your own CA and trust it on the devices that should be able to access the domains.
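A minimal sketch of that with the pyca/cryptography package; the names and lifetimes are arbitrary choices, not recommendations:

    from datetime import datetime, timedelta

    from cryptography import x509
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import ec
    from cryptography.x509.oid import NameOID

    def make_ca():
        # Self-signed CA certificate, ten-year lifetime
        key = ec.generate_private_key(ec.SECP256R1())
        name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "Homelab CA")])
        cert = (
            x509.CertificateBuilder()
            .subject_name(name).issuer_name(name)
            .public_key(key.public_key())
            .serial_number(x509.random_serial_number())
            .not_valid_before(datetime.utcnow())
            .not_valid_after(datetime.utcnow() + timedelta(days=3650))
            .add_extension(x509.BasicConstraints(ca=True, path_length=None), critical=True)
            .sign(key, hashes.SHA256())
        )
        return key, cert

    def sign_leaf(ca_key, ca_cert, hostname):
        # Leaf cert for one internal hostname, signed by the CA above
        key = ec.generate_private_key(ec.SECP256R1())
        cert = (
            x509.CertificateBuilder()
            .subject_name(x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, hostname)]))
            .issuer_name(ca_cert.subject)
            .public_key(key.public_key())
            .serial_number(x509.random_serial_number())
            .not_valid_before(datetime.utcnow())
            .not_valid_after(datetime.utcnow() + timedelta(days=365))
            .add_extension(x509.SubjectAlternativeName([x509.DNSName(hostname)]), critical=False)
            .sign(ca_key, hashes.SHA256())
        )
        return key, cert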
- paranoidrobot 1 year ago Adding CAs to trust stores on devices and in apps is a major pain.
If you have unmanaged devices this becomes even more painful.
"Oh, hi, welcome to the company, please install this Root CA onto your machine to access <internal service>"
Because you can't scope CAs to specific domains, this causes everyone with any idea about security to start being concerned.
- Figs 1 year ago Non-public usage doesn't necessarily mean that only devices under your direct control need access. Slack needs access to some of my organization's systems, for example, to support the way we collaborate on our projects -- but the general public doesn't and would likely just be confused if they stumbled into one of our infrastructure subdomains instead of visiting our public website.
- withinboredom 1 year ago Yeah. In that case, it's just easier to get a really cheap wildcard cert signed by a low-cost reseller for <50 bucks. The only reason to care about big-name certs is compatibility with all the devices out there, but if you don't need compatibility, then get the cheapest thing you can.
- justsomehnguy 1 year ago > and trust it on the devices that should be able to access the domains
Sometimes that's not an option. I spent too many hours trying to figure out why some Android apps didn't want to talk to a service I self-hosted. They just ignored my root CA cert installed on the phone.
- speedgoose 1 year ago I think you are supposed to automate the renewal with the DNS record method.
- tjoff 1 year ago Which most registrars don't support.
- Hamuko 1 year ago I have a single wildcard certificate for my internal domain name and ~10 CNAMEs for various service subdomains on the network (plex.server.com, grafana.server.com, etc.). This tool found zero subdomains for my internal domain.
- rft 1 year ago I have a similar setup (*.home.domain.com, DNS auth with LE -> service1.home.domain.com etc.) for my personal, but externally reachable, domain, and I get the same results. I went the wildcard route out of a bit of paranoia; nice to see that it actually worked out in this case.
As this (I expect) heavily uses Certificate Transparency in the background, I want to point out another use case for that service. You can search the CT logs with wildcards to find your domain's "neighbors" on other TLDs: https://crt.sh/?Identity=google.%25&match=ILIKE This usually gives you somewhat more active websites than just checking whether you can register the domain, and somewhat weeds out squatted domains. I found that way that for our company, one TLD contained an NSFW games store.
- SushiHippie 1 year ago You should check again after some time. The first time I looked up my domain there were no results; a few minutes later it found some of my subdomains.
- Hamuko 1 year ago Still nothing.
- Symbiote 1 year ago At work we have a wildcard certificate for most services we host on our own infrastructure. Most public websites have been detected, and some internal ones which have probably been referenced in public GitHub issues and so on.
They've done simple reverse DNS lookups on our public IP range and indexed all those hostnames.
Certificate transparency logs have found names used for externally hosted websites.
There are some pretty old hostnames which haven't been used for 5 years or more, and were probably found with reverse DNS at the time.
- TheHappyOddish 1 year ago Hardly "all subdomains". Unless it's doing an AXFR of my zone file (unlikely), this isn't possible.
It's a scraper/guesser, using cert transparency, common names, etc. Cute toy, but a false claim.
- panki27 1 year ago You are correct, I've tested it with my own domains. It does not know the ones running with a wildcard certificate, for example.
- wlonkly 1 year ago It knows many of the wildcard-served customer subdomains of one of my former employers. (They're probably just scraped from search or something, but a wildcard is not sufficient to prevent discovery.)
- hankchinaski 1 year ago I would be keen to know what techniques are used. Usually subdomain discovery is done with a DNS AXFR transfer request, which leaks the entire DNS zone (but this only works on ancient and unpatched nameservers), or with dictionary attacks. There are some other techniques you can check if you look at the source code of amass (an open-source Go reconnaissance/security tool), or CT logs. DNSdumpster is one of the tools I used, alongside pentest tools (commercial) and amass (OSS).
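An AXFR attempt is short with dnspython; a sketch (any sane nameserver will refuse the transfer):

    import dns.query
    import dns.resolver
    import dns.zone

    def try_axfr(domain):
        # Ask each authoritative nameserver for a full zone transfer
        for ns in dns.resolver.resolve(domain, "NS"):
            ns_ip = str(dns.resolver.resolve(str(ns), "A")[0])
            try:
                zone = dns.zone.from_xfr(dns.query.xfr(ns_ip, domain))
                return sorted(f"{n}.{domain}" for n in zone.nodes if str(n) != "@")
            except Exception:
                continue  # refused or timed out, the usual case
        return []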
- cobertos 1 year ago I mean, doesn't it say right on the front page?
* Apache Nutch - So they're crawling either some part of the domain itself or some other websites to find subdomains. Honestly, it might help to query Common Crawl too.
* Calidog's Certstream - As you said, you can look at the CT logs
* OpenAI Embeddings - So I guess it also uses an LLM to try to generate candidates to test, too.
* Proprietary Tools - your guess is as good as mine
Probably a common list of subdomains to test against too.
Seems like multiple techniques to try to squeeze out as much info as possible.
- Zuiii 1 year ago I'd also add insecure DNSSEC implementations that allow you to "walk" the entire record chain for the domain.
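A sketch of that zone walk with dnspython, assuming the zone uses plain NSEC (not NSEC3); each NSEC record names the next entry in the zone:

    import dns.resolver

    def walk_nsec(domain, limit=100):
        # Follow the NSEC chain: each record points at the next name in the zone
        found, current = [], domain
        for _ in range(limit):
            try:
                answer = dns.resolver.resolve(current, "NSEC")
            except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
                break
            next_name = str(answer[0].next).rstrip(".")
            if next_name == domain or next_name in found:
                break  # chain wrapped back to the apex
            found.append(next_name)
            current = next_name
        return found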
- kevincox 1 year ago Calling this "insecure" is a bit harsh. This is required for offline signing, which provides better security but worse privacy.
- piffey 1 year ago Proprietary tools means passive DNS.
- smarx007 1 year ago How can one avoid their browsing ending up in passive DNS logs? For example, is using 1.1.1.1, 8.8.8.8, or 9.9.9.9 (CF, Google, and Quad9, respectively) good or bad in this regard?
For example, where does Spamhaus get their passive DNS data? They write [1] that it comes from "trusted third parties, including hosting companies, enterprises, and ISPs." But that's rather vague. Are CF, Google, and Quad9 some of those "hosting companies, enterprises, and ISPs"?
[1]: https://www.spamhaus.com/resource-center/what-is-passive-dns...
- derefr 1 year ago Interesting. Our domain has some subdomains with a numeric suffix, and the API response here has entries in that pattern not only for the particular subdomains that exist or ever existed, but also for subdomains of the same pattern beyond any suffix number we've ever actually used.
You'd think they'd at least filter their response by checking which subdomains actually have an A/AAAA/CNAME record on them...
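Such a filter is cheap to sketch with dnspython (the candidate names are hypothetical):

    import dns.resolver

    def resolves(name):
        # True if the name currently has an A, AAAA or CNAME record
        for rtype in ("A", "AAAA", "CNAME"):
            try:
                dns.resolver.resolve(name, rtype)
                return True
            except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN):
                continue
        return False

    candidates = ["app1.example.com", "app2.example.com"]  # hypothetical API output
    live = [n for n in candidates if resolves(n)]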
- blueflow 1 year ago I entered my own domains and got so many garbage entries. It feels like an AI reading Let's Encrypt logs and then adding made-up shit to it.
- internet2000 1 year ago For my personal domain: it got the ones in the SSL cert's subject alternative names, made up three, returned one I deleted more than a year ago, and didn't find two. Very curious.
- DaiPlusPlus 1 year ago Those SAN and CN names will appear in publicly visible Certificate Transparency lists ( https://en.wikipedia.org/wiki/Certificate_Transparency ): so if you ever get a TLS certificate for a super-seeekret internal sub-sub-sub-domain-name from a major CA, it won't be secret for long. The only way to keep a publicly-resolvable DNS subdomain confidential is to either get a wildcard cert for the parent domain, find a dodgy (yet somehow widely-trusted) CA that doesn't participate in CT, or use a self-signed cert.
This subdomain.center database returned one of my "private" sub-sub-domains (which just points to my NAS) for which I did get a cert from Let's Encrypt, but it doesn't have any of my other sub-sub-domains listed (despite them resolving to the same A IPv4 address as the listed subdomain), because those subdomains have only ever been secured by a wildcard cert.
- donatj 1 year ago Interesting. It only found less than a quarter of the subdomains of the site I work on, and everything it did find is public-facing. I wonder if that's maybe something to do with how we set up certificates for public vs internal subdomains? It even missed "staging.", which should be nearly identical in configuration to www.
- SushiHippie 1 year ago Note: if you looked up a domain and it had no results, check back again after some minutes. I looked my domain up and had zero results, which was weird as it should at least find some in the CT logs, but a few minutes later it showed some subdomains.
- LinuxBender 1 year ago It took about 5 minutes for me. It found my apex domain and a sub-domain that must have belonged to the previous renter of my domain name. [1] So I was curious, and it turns out the previous renter's pages were in the Wayback Machine. [2] That page renders as mostly little boxes for me. Funny, I had never bothered to check that. I should check if any of my other domains have snapshots from before I rented them.
[1] - https://api.subdomain.center/?domain=ohblog.net
[2] - https://web.archive.org/web/20090302094112/https://ohblog.ne...
- SushiHippie 1 year ago The Web Archive can also somewhat act as a subdomain finder (not really in this case, only the www subdomain, but still interesting): https://web.archive.org/web/*/ohblog.net*
- RockRobotRock 1 year ago This is certificate transparency doing most of the work, right?
- zootboy 1 year ago I would assume so. I tested it on one of my private domains that generally isn't linked to anywhere, and it just returned the few subdomains that I generate Let's Encrypt certs for, plus my nameservers.
Interestingly, I did not receive any DNS queries on my authoritative nameservers during the query, so they don't seem to be doing any active DNS probes.
- out-of-ideas 1 year ago it may utilize a few techniques, as there are subdomains I am aware of that've never been published other than in the zone config at my registrar that are returned from the API query
- pbhjpbhj 1 year ago I use SiteGround and it has a staging server that AFAIK hasn't been used for at least 6 years...
Nothing at the host has any details of it, archive.org doesn't have it in their site URLs, it's not in DNS records, not in .well-known; it was a transient test years ago... really curious, it must be historic data from somewhere?
- RockRobotRock 1 year ago I use Cloudflare for DNS and the only ones it found had LE certs. It's not doing a simple brute-force on common names, I don't think; otherwise it probably would have found a lot more. Curious about how it works.
- Arubis 1 year ago If this were able to determine which wildcard subdomains were active for a given domain, you could use it to figure out a lot of B2B companies' client/customer lists.
- Xorakios 1 year ago Just for giggles, does anyone else remember when "subdomains" were called "machine names", because physical devices were limited to one service?
www. ftp. mail.
... weren't theoretical or merely mnemonic.
Felt like an old coot when using "machine name" with a 40-year-old IT professional and she was perplexed!
- semi 1 year ago I'm a somewhat old coot and do remember those days, but I think the term still makes sense, just only in a LAN environment.
Machines still have hostnames, and home routers will often trust your DHCP clients' machine names.
So I can still look up steamdeck.lan and find the IP of my Steam Deck, and in that context calling it a machine name is perfectly apt and I think would still be well understood.
- p4bl0 1 year ago It gave me empty results for some of my domains that have multiple subdomains with TLS certificates associated with them, so those must appear in the Certificate Transparency logs.
I guess it should be "discover some subdomains for some domains".
- Semaphor 1 year ago Empty for all my and my work's domains. Then I tested random .com domains and got results. Seems pretty useless.
- pabs3 1 year ago More options here: https://wiki.archiveteam.org/index.php/Finding_subdomains
- sea-gold 1 year ago Thanks. This is a really helpful list which includes many of the sites/tools listed here.
- ohuf 1 year ago The subdomain explorer may be fun, but their Exploit Observer is really useful: https://www.exploit.observer/
- g147 1 year ago thanks!
- keepamovin 1 year ago This is fantastic!!!
What kind of security considerations are there to having multi-tenant user applications on subdomains and then having them exposed like this?
I'm building a SaaS right now, and I guess one thing is that a given username can then be discovered as a valid login for the system... but obviously that's only part of the login credential.
Maintaining a list of mappings to opaque subdomains seems to reduce targeting and conceal partial login credentials, but doesn't seem to offer much besides.
Analysis?
- thorum 1 year ago It doesn't seem to detect subdomains set up with Kubernetes ingresses, based on results for one of my domains, so that might be a place to start research.
- davidkuennen 1 year ago It also doesn't find any subdomains for my domain.
In my case I use Google Cloud DNS. Maybe they have some sort of protection in place (I wouldn't be surprised).
- cm2187 1 year ago One thing I noticed looking at my logs is that there is almost no unsolicited traffic (i.e. failed authentication attempts, exploits of various WordPress bugs, etc.) over IPv6. I think it's a function of 1) that traffic coming from networks (compromised home devices, etc.) that don't support v6, and 2) the v6 address space being too large to scan (the size of an encryption key), so good security by obscurity. This would nullify 2).
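Back-of-the-envelope, reusing the ~40-minute IPv4 sweep figure mentioned upthread:

    # ~40 minutes to sweep one port across all of IPv4 (figure from upthread)
    rate = 2**32 / (40 * 60)               # ~1.8 million probes per second
    years = 2**128 / rate / (3600 * 24 * 365)
    print(f"{years:.1e}")                  # ~6.0e24 years to sweep all of IPv6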
- weird-eye-issue 1 year ago I got back an empty list for my domain on Cloudflare with several subdomains (non-wildcard)
edit: I retried on my computer (was on my phone earlier) and now it returns all of our subdomains, even picking up our test R2 bucket. I'm guessing I was rate-limited because I accidentally loaded the example file a few times.
- hbcondo714 1 year ago Wolfram Alpha is similar yet still useful: just enter a domain and click on the "Subdomains" button.
- franky47 1 year ago Sublist3r [1] does a similar job, as long as you have the authorisation to use it on a particular domain, as it uses more aggressive discovery techniques.
[1]: https://github.com/aboul3la/Sublist3r
- asmor 1 year ago https://github.com/projectdiscovery/subfinder does this, but it explains all the methods and lets you choose to only do a passive scan.
- johntiger1 1 year ago Took a while, but I was impressed it detected all of ours: https://api.subdomain.center/?domain=radiantai.health
- DistractionRect 1 year ago Certificate transparency does a lot of the heavy lifting.
- Semaphor 1 year ago Only that actually works. I get hundreds of entries for my domain there, including entries from before Let's Encrypt was a thing, while the subdomain checker returns an empty array.
- perryizgr8 1 year ago It detects only some of mine. To be precise, it does not detect subdomains being served by a service behind a Cloudflare Tunnel.
- xg15 1 year ago I think as soon as cert transparency was introduced, it was pretty clear we would eventually get something like this.
- TechBro8615 1 year ago I get a rate-limit error when I click the text input (I'm on a VPN).
- 867-5309 1 year ago use an obscure country like North Macedonia
- mmarquezs 1 year ago Nice, last time I used WolframAlpha for this.
- webprofusion 1 year ago This is a CT log search, right?
- zX41ZdbW 1 year ago How can I download the entire dataset from this service?
- maul666 1 year ago dpd.co.uk
- Ocha 1 year ago Missed some for me
- Ocha 1 year ago Maybe because I use wildcard certs with Let's Encrypt
- ThePowerOfFuet 1 year ago Instead of replying to yourself, try editing your first comment!
- tobinfekkes 1 year ago This is crazy, I was just looking for this exact thing a couple of days ago. Thank you for sharing. Brilliant work.