
Automatically detecting the domain extension in PHP

Detecting domain extensions in URLs is no longer easy, since there are already more than 1,000 different TLDs. The most common extensions could be matched with a regex, but that approach fails at the latest with multi-part endings such as "co.uk". I needed a working solution for this, so I looked into the problem, and since I could not find a good ready-made solution, I put together my own. This post walks through it.

Because parse_url() is not enough for me

PHP's built-in parse_url() function can already extract some information. Here is an example using 'http://sub1.sub2.domain.co.uk/folder/file.php'.

PHP code

print_r(parse_url('http://sub1.sub2.domain.co.uk/folder/file.php'));

Output

Array
(
    [scheme] => http
    [host] => sub1.sub2.domain.co.uk
    [path] => /folder/file.php
)

However, this only gives you the host of a URL, not the registered domain or the domain extension on its own. That is not enough for me.
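
To make the gap concrete, here is a minimal sketch of the obvious next step: take the host from parse_url() and treat the last dot-separated label as the domain extension. The variable names ($naiveTld, $naiveDomain) are only illustrative and not part of the script in this post. For the example URL, this naive approach reports "uk" as the extension and "co.uk" as the domain, which is not what we want.

```php
<?php
$url = 'http://sub1.sub2.domain.co.uk/folder/file.php';

// PHP_URL_HOST makes parse_url() return only the host component.
$host = parse_url($url, PHP_URL_HOST);

// Naive assumption: the extension is the last label,
// and the domain consists of the last two labels.
$labels = explode('.', $host);
$naiveTld = end($labels);
$naiveDomain = implode('.', array_slice($labels, -2));

echo $naiveTld . "\n";    // uk
echo $naiveDomain . "\n"; // co.uk
```

Without a list of known endings, the code cannot tell that "co.uk" belongs together.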

Extracting all information from a URL

With all the new domain extensions, working with complex regex filters is no longer practical either. There are simply too many of them. The following script contains a very extensive list of the current domain extensions as well as those that are already firmly planned. The function uses this list to determine the domain extension. In addition, the array it returns contains all subdomains as individual entries.

Unfortunately, the long list of domain extensions makes the script very long. 😀

PHP code

$domain = 'http://sub1.sub2.domain.co.uk/folder/file.php';
print_r(getDomainInfos($domain));

function getDomainInfos($domain)
{
    $parse = parse_url($domain);
     
    $tlds = json_decode('["ac","ac.in","academy","accountants","active","actor","ad","ae","ae.org","aero","af","africa","ag","agency","ai","airforce","al","allfinanz","alsace","am","amsterdam","an","android","ao","app","aq","ar","archi","army","arpa","art","as","asia","associates","at","attorney","au","auction","audio","auto","autos","aw","ax","axa","az","ba","baby","band","bar","bargains","basketball","bayern","bb","bd","be","beauty","beer","berlin","best","bet","bf","bg","bh","bi","bible","bid","bike","bingo","bio","biz","biz.tr","bj","bl","black","blackfriday","blog","bloomberg","blue","bm","bmw","bn","bnpparibas","bo","boo","book","boutique","bq","br","brussels","bs","bt","budapest","build","builders","business","buy","buzz","bv","bw","by","bz","bzh","ca","cab","cafe","cal","cam","camera","camp","cancerresearch","capetown","capital","car","caravan","cards","care","career","careers","cars","casa","cash","casino","cat","catering","cc","cd","center","ceo","cern","cf","cg","ch","channel","charity","chat","cheap","christmas","chrome","church","ci","citic","city","ck","cl","claims","cleaning","click","clinic","clothing","cloud","club","cm","cn","cn.com","co","co.at","co.cr","co.gl","co.gy","co.hu","co.id","co.il","co.in","co.jp","co.kr","co.mg","co.ms","co.nz","co.uk","co.vi","co.za","co.zw","codes","coffee","college","cologne","com","com.ag","com.ai","com.ar","com.au","com.bo","com.br","com.cn","com.cy","com.de","com.do","com.ec","com.es","com.fj","com.gl","com.gt","com.gy","com.hk","com.hr","com.kg","com.ki","com.lc","com.mg","com.ms","com.mt","com.mu","com.mx","com.my","com.nf","com.ng","com.ni","com.pa","com.pe","com.ph","com.ps","com.py","com.sa","com.sb","com.sc","com.sg","com.sv","com.tr","com.tw","com.uy","com.ve","community","company","computer","condos","construction","consulting","contractors","cooking","cool","coop","country","coupon","cr","credit","creditcard","cricket","crs","cruises","cu","cuisinella","cv","cw","cx","cy","cymru","cz","dad","dance",
"data","date","dating","day","de","de.com","deal","deals","degree","delivery","democrat","dental","dentist","desi","design","diamonds","diet","digital","direct","directory","discount","diy","dj","dk","dm","dnp","do","docs","doctor","dog","domains","download","drive","durban","dvag","dz","earth","eat","ec","eco","edu","education","ee","eg","eh","email","emerck","energy","engineer","engineering","enterprises","equipment","er","es","esq","estate","et","eu","eu.com","eus","events","exchange","expert","exposed","fail","faith","family","fan","farm","fashion","feedback","fi","film","finance","financial","firm.in","fish","fishing","fit","fitness","fj","fk","flights","florist","flowers","flsmidth","fly","fm","fo","foo","food","forsale","forum","foundation","fr","free","frl","frogans","fun","fund","furniture","futbol","ga","gal","gallery","game","games","garden","gay","gb","gb.com","gbiz","gd","ge","gen.in","gent","gf","gg","gh","gi","gift","gifts","gives","gl","glass","gle","global","globo","gm","gmail","gmbh","gmo","gmx","gn","gold","golf","google","gop","gov","gp","gq","gr","graphics","gratis","green","gripe","gs","gt","gu","guide","guitars","guru","gw","gy","hamburg","haus","health","healthcare","help","here","hiphop","hiv","hk","hm","hn","hockey","holdings","holiday","home","homes","horse","host","hosting","hot","hotel","hotels","house","how","hr","ht","hu","ibm","id","idv.tw","ie","il","im","immo","immobilien","in","inc","ind.in","industries","info","ing","ink","institute","insure","int","international","investments","io","iq","ir","is","istanbul","it","je","jetzt","jewelry","jm","jo","jobs","joburg","joy","jp","juegos","kaufen","ke","kg","kh","ki","kid","kim","kitchen","kiwi","km","kn","koeln","kp","kr","kr.com","krd","kred","kw","ky","kz","la","lacaixa","land","lat","law","lawyer","lb","lc","lds","lease","legal","lgbt","li","life","lighting","like","limited","limo","link","live","living","lk","loans","lol","london","lotto","love","lr","ls","lt","ltda","lu","luxe","luxury",
"lv","ly","ma","madrid","mail","maison","management","mango","map","market","marketing","mc","md","me","me.uk","med","media","meet","melbourne","meme","men","menu","mf","mg","mh","miami","mil","mini","mk","ml","mm","mn","mo","mobi","mobile","moda","moe","mom","monash","money","mormon","mortgage","moscow","motorcycles","mov","movie","mp","mq","mr","ms","mt","mu","museum","music","mv","mw","mx","my","mz","na","nagoya","name","navy","nc","ne","net","net.ag","net.ai","net.au","net.br","net.cn","net.do","net.gl","net.gy","net.in","net.kg","net.ki","net.lc","net.mg","net.mu","net.nf","net.ni","net.ps","net.sa","net.sb","net.sc","network","neustar","new","news","nexus","nf","ng","ngo","nhk","ni","ninja","nl","no","nom.ni","now","np","nr","nra","nrw","nu","nyc","nz","off.ai","okinawa","om","one","ong","onl","online","ooo","or.at","org","org.ag","org.ai","org.cn","org.do","org.es","org.gl","org.in","org.kg","org.ki","org.lc","org.mg","org.ms","org.mu","org.ni","org.ps","org.sa","org.sb","org.sc","org.tw","org.uk","organic","otsuka","ovh","pa","page","paris","partners","parts","party","pe","pf","pg","ph","pharmacy","phone","photo","photography","photos","physio","pics","pictures","ping","pink","pizza","pk","pl","place","play","plumbing","plus","pm","pn","pohl","poker","porn","post","pr","praxi","press","pro","prod","productions","prof","properties","property","ps","pt","pub","pw","py","qa","qpon","quebec","racing","radio","re","realtor","recipes","red","rehab","reise","reisen","reit","ren","rentals","repair","report","republican","rest","restaurant","review","reviews","rich","rio","rip","ro","rocks","rodeo","room","rs","rsvp","ru","ruhr","run","rw","ryukyu","sa","saarland","sale","sarl","sb","sc","sca","scb","schmidt","schule","science","scot","sd","se","search","secure","services","sex","sexy","sg","sh","shiksha","shoes","shop","shopping","show","si","singles","site","sj","sk","ski","sl","sm","sn","so","soccer","social","software","sohu","solar","solutions","soy","space",
"spiegel","sport","sports","sr","ss","st","store","studio","style","su","sucks","supplies","supply","support","surf","surgery","suzuki","sv","sx","sy","sydney","systems","sz","taipei","talk","tatar","tattoo","tax","taxi","tc","td","team","tech","technology","tel","tennis","tf","tg","th","theater","tickets","tienda","tips","tirol","tj","tk","tl","tm","tn","to","today","tokyo","tools","top","tour","town","toys","tp","tr","trade","training","travel","tt","tui","tv","tw","tz","ua","ug","uk","uk.com","um","university","uno","uol","us","us.com","uy","uz","va","vacations","vc","ve","vegas","ventures","verm\u00f6gensberater","verm\u00f6gensberatung","versicherung","vet","vg","vi","viajes","video","villas","vip","vision","vlaanderen","vn","vodka","vote","voting","voto","voyage","vu","wales","wang","watch","web","webcam","website","wed","wedding","wf","whoswho","wien","wiki","williamhill","win","wine","wme","work","works","world","ws","wtc","wtf","xxx","xyz","yachts","yandex","ye","yoga","yokohama","you","youtube","yt","za","zip","zm","zone","zw","\u03b4\u03bf\u03ba\u03b9\u03bc\u03ae","\u0431\u0435\u043b","\u0434\u0435\u0442\u0438","\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435","\u043c\u043a\u0434","\u043c\u043e\u043d","\u043c\u043e\u0441\u043a\u0432\u0430","\u043e\u043d\u043b\u0430\u0439\u043d","\u043e\u0440\u0433","\u0440\u0443\u0441","\u0440\u0444","\u0441\u0430\u0439\u0442","\u0441\u0440\u0431","\u0443\u043a\u0440","\u049b\u0430\u0437","\u05d8\u05e2\u05e1\u05d8","\u0622\u0632\u0645\u0627\u06cc\u0634\u06cc","\u0625\u062e\u062a\u0628\u0627\u0631","\u0627\u0644\u0627\u0631\u062f\u0646","\u0627\u0644\u062c\u0632\u0627\u0626\u0631","\u0627\u0644\u0633\u0639\u0648\u062f\u064a\u0629","\u0627\u0644\u0645\u063a\u0631\u0628","\u0627\u0645\u0627\u0631\u0627\u062a","\u0627\u06cc\u0631\u0627\u0646","\u0628\u0627\u0632\u0627\u0631","\u0628\u06be\u0627\u0631\u062a","\u062a\u0648\u0646\u0633","\u0633\u0648\u062f\u0627\u0646","\u0633\u0648\u0631\u064a\u0629",
"\u0634\u0628\u0643\u0629","\u0639\u0631\u0627\u0642","\u0639\u0645\u0627\u0646","\u0641\u0644\u0633\u0637\u064a\u0646","\u0642\u0637\u0631","\u0645\u0635\u0631","\u0645\u0644\u064a\u0633\u064a\u0627","\u0645\u0648\u0642\u0639","\u067e\u0627\u06a9\u0633\u062a\u0627\u0646","\u092a\u0930\u0940\u0915\u094d\u0937\u093e","\u092d\u093e\u0930\u0924","\u0938\u0902\u0917\u0920\u0928","\u09ac\u09be\u0982\u09b2\u09be","\u09ad\u09be\u09b0\u09a4","\u0a2d\u0a3e\u0a30\u0a24","\u0aad\u0abe\u0ab0\u0aa4","\u0b87\u0ba8\u0bcd\u0ba4\u0bbf\u0baf\u0bbe","\u0b87\u0bb2\u0b99\u0bcd\u0b95\u0bc8","\u0b9a\u0bbf\u0b99\u0bcd\u0b95\u0baa\u0bcd\u0baa\u0bc2\u0bb0\u0bcd","\u0baa\u0bb0\u0bbf\u0b9f\u0bcd\u0b9a\u0bc8","\u0c2d\u0c3e\u0c30\u0c24\u0c4d","\u0dbd\u0d82\u0d9a\u0dcf","\u0e44\u0e17\u0e22","\u10d2\u10d4","\u307f\u3093\u306a","\u30c6\u30b9\u30c8","\u4e16\u754c","\u4e2d\u4fe1","\u4e2d\u56fd","\u4e2d\u570b","\u4e2d\u6587\u7f51","\u4f01\u4e1a","\u4f5b\u5c71","\u516b\u5366","\u516c\u53f8","\u516c\u76ca","\u53f0\u6e7e","\u53f0\u7063","\u5546\u57ce","\u5546\u6807","\u5728\u7ebf","\u5e7f\u4e1c","\u6211\u7231\u4f60","\u624b\u673a","\u653f\u52a1","\u65b0\u52a0\u5761","\u673a\u6784","\u6d4b\u8bd5","\u6e2c\u8a66","\u6e38\u620f","\u79fb\u52a8","\u7ec4\u7ec7\u673a\u6784","\u7f51\u5740","\u7f51\u7edc","\u96c6\u56e2","\u9999\u6e2f","\uc0bc\uc131","\ud14c\uc2a4\ud2b8","\ud55c\uad6d"]');
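    // parse_url() returns false for badly malformed input and only sets
    // 'host' when the URL has a scheme (or starts with //).
    if(!is_array($parse))
    {
        $parse = array();
    }
    $host = isset($parse['host']) ? strtolower($parse['host']) : '';
    $parts = ($host === '') ? array() : explode('.', $host);
    $count = count($parts)-1; // index of the last label
    $tld = '';
    $name = ''; // registered domain; stays empty if no known ending matches

    // Try two-part endings such as co.uk first. Checking $count before
    // reading $parts[$count-1] avoids an undefined-offset notice.
    if($count > 1 && in_array($parts[($count-1)].'.'.$parts[$count], $tlds, true))
    {
        $tld = $parts[($count-1)].'.'.$parts[$count];
        $name = $parts[$count-2].'.'.$tld;
        unset($parts[($count-2)], $parts[($count-1)], $parts[$count]);
    }
    // Otherwise fall back to a single-label ending such as de or com.
    elseif($count > 0 && in_array($parts[$count], $tlds, true))
    {
        $tld = $parts[$count];
        $name = $parts[$count-1].'.'.$tld;
        unset($parts[($count-1)], $parts[$count]);
    }

    // A separate variable is used so the input URL is never returned as the domain.
    $parse['domain'] = $name;
    $parse['tld'] = $tld;
    $parse['subdomains'] = $parts;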
     
    return $parse;
}

Output

Array
(
    [scheme] => http
    [host] => sub1.sub2.domain.co.uk
    [path] => /folder/file.php
    [domain] => domain.co.uk
    [tld] => co.uk
    [subdomains] => Array
        (
            [0] => sub1
            [1] => sub2
        )

)
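
The function tests exactly two cases: a two-part ending first, then a one-part ending. As a possible generalization, the same idea can be written as a longest-suffix search that works for endings of any depth and uses a hash lookup instead of scanning the list. This is only a sketch under assumptions: the name splitHost() and the four-entry list are hypothetical stand-ins, not part of the script above, and the function expects a host name rather than a full URL.

```php
<?php
/**
 * Splits a host name using the longest matching ending from $suffixes.
 * $suffixes is a hypothetical, heavily shortened stand-in for the full list.
 */
function splitHost($host, array $suffixes)
{
    // Flip the list once so each lookup is a hash check instead of a scan.
    $known = array_flip($suffixes);
    $labels = explode('.', strtolower($host));
    $total = count($labels);

    // Start at index 1 so at least one label remains for the domain name.
    // The smallest index yields the longest candidate ending.
    for ($i = 1; $i < $total; $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (isset($known[$candidate])) {
            return array(
                'domain'     => $labels[$i - 1] . '.' . $candidate,
                'tld'        => $candidate,
                'subdomains' => array_slice($labels, 0, $i - 1),
            );
        }
    }

    // No known ending found.
    return array('domain' => '', 'tld' => '', 'subdomains' => array());
}

$suffixes = array('uk', 'co.uk', 'com', 'de');
print_r(splitHost('sub1.sub2.domain.co.uk', $suffixes));
print_r(splitHost('example.de', $suffixes));
```

For the first call this yields the same domain, ending and subdomains as the output above. For example.de the subdomain array is empty.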

Conclusion

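This function returns considerably more information about a URL than PHP's built-in functions do: in addition to the parse_url() fields, it returns the domain, the domain extension and the individual subdomains. Even if not every piece of information is needed each time, the function extracts the values quickly from any URL that includes a scheme, since parse_url() only reports a host in that case.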

Hints and improvements are welcome in the comments.
