On "Page Info > Links", saving Name and Address

berwin · Post by **berwin** » April 5th, 2006, 11:16 pm

When right-clicking on a page > View Page Info > Links, a window shows up with a description on the left side and its link on the right side. When there are many links on the page, many descriptions and their links are listed.
My question is: Is it possible to save both descriptions and their links?
So far, using "Select All > Copy" and pasting into Notepad or Excel, I have only been able to get the links, never the descriptions of the links.

the-edmeister · Post by **the-edmeister** » April 6th, 2006, 6:40 am

Maximize the window and take a Screenshot.

Ed

berwin · Post by **berwin** » April 6th, 2006, 9:39 am

Thanks, I know it would work, but on a page with hundreds of links the process would become cumbersome, and then the conversion to type from graphics, oh boy...
I guess there is not a simple method...

jscher2000 · Post by **jscher2000** » April 6th, 2006, 10:07 am

Didn't Netscape 4.x have the option to print pages with a list of all the links at the end? That was handy.

With a little bit of DOM programming, it definitely should be possible to extract all the links, leaving just the question of how to format the results. When I do applications like this, I tend to use the internet controls supplied with IE, programmed from a VBA host like Microsoft Word (just 'cause I'm so familiar with it). But JavaScript should work, too. I'm not familiar with all the respective powers of bookmarklets, Greasemonkey scripts, and extensions, but at least one of them should have the privileges necessary to do it.

texmex · Post by **texmex** » April 6th, 2006, 1:52 pm

Well here's a quick and dirty solution:
Copy the code below to your clipboard.
Go into the Bookmark Manager and create a new bookmark.
Give it any name you like.
Paste my code into the "Location" box
and close.

Now if you click that bookmark, it will open a new window and write the links and their text strings into it as a table. You can then copy and paste the table.

Code:

javascript:x=document.getElementsByTagName(%22A%22);y=window.open();y.document.write(%22<HTML><HEAD></HEAD><BODY><table>%22);for(n=0;n<x.length;n++){y.document.write(%22<tr><td>%22+x[n].text+%22</td><td>%22+x[n].href+%22</td></tr>%22);}y.document.write(%22</table></BODY></HTML>%22);y.document.close();void 0;

Sorry it's so wide, but it's imperative that you have no spaces in this line of code.

It only checks for anchor tags (not other types of links).
Hope this helps.
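
For anyone who wants to see what the one-liner is doing, here is a readable sketch of the same idea. The function names `escapeHtml` and `linksToTableHtml` are made up for illustration, and the HTML escaping is an addition that the bookmarklet above does not do; treat this as one possible layout under those assumptions, not as the posted code itself.

```javascript
// Readable sketch of the table bookmarklet. `links` is any array-like of
// objects with `text` and `href` properties (in a browser, the result of
// document.getElementsByTagName("a")).

// Escape characters that would otherwise be parsed as markup.
// (An addition for safety; the original one-liner writes the text unescaped.)
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build one table row per link: description in the first cell, URL in the second.
function linksToTableHtml(links) {
  const rows = Array.from(links, (a) =>
    "<tr><td>" + escapeHtml(a.text) + "</td><td>" + escapeHtml(a.href) + "</td></tr>"
  );
  return "<html><head></head><body><table>" + rows.join("") + "</table></body></html>";
}

// Browser-only part: open a new window and write the table into it.
if (typeof document !== "undefined") {
  const w = window.open();
  w.document.write(linksToTableHtml(document.getElementsByTagName("a")));
  w.document.close();
}
```

To use something like this as a bookmarklet it would still have to be squeezed onto one line and prefixed with `javascript:`, as in the code above.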

Thumper · Post by **Thumper** » April 6th, 2006, 1:56 pm

Remind me to file a bug about the Page Info UI by the way. It's been in dire need of refactoring for years.

- Chris

jscher2000 · Post by **jscher2000** » April 6th, 2006, 2:24 pm

texmex wrote: Well here's a quick and dirty solution:

Way cool. Here's an alternate version that doesn't use a table, skips the "name" type anchors that lack an href, and adds a little clickable link.

Code:

javascript:loc=location.href;x=document.getElementsByTagName(%22A%22);y=window.open();y.document.write(%22<html><head><title>Links!</title></head><body><h3>Links from %22+loc+%22</h3>%22);for(n=0;n<x.length;n++){if(x[n].href!="")y.document.write(%22<p>Text: %22+x[n].text+%22<br>\nURL: %22+x[n].href+%22 <a href=\%22%22+x[n].href+%22\%22>Go!</a></p>\n%22);}y.document.write(%22</body></html>%22);y.document.close();void 0;

(The few spaces in there are correct as some phrases and tags do have spaces in them. For purity, you could replace them with %20 to create a truly correct URL.)

I wonder why the new document appears behind the original one?

texmex · Post by **texmex** » April 6th, 2006, 3:14 pm

OK so it's not as dirty, but since I got in first, it wasn't as quick either ;-)

I chose to put it into a table as I noticed that berwin mentioned Excel. If you select all and copy my resultant window, you can then go to Excel and do a Paste Special > Text, and it will all split up nicely into columns. Wish I'd thought of removing the empty links though. I daresay a hybrid of the two solutions could be quite useful.
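
A hybrid along those lines might look something like the sketch below: it skips anchors with no href and produces tab-separated text, which spreadsheet programs generally split into columns on paste. The function name `linksToTsv` and the sample data are hypothetical, and the whitespace clean-up is an assumption about what makes pasting reliable, not something taken from either posted script.

```javascript
// Hybrid sketch: skip anchors without an href and emit one
// "description<TAB>url" line per link.
function linksToTsv(links) {
  // Collapse tabs and newlines inside the description so they cannot
  // break the column layout.
  const clean = (s) => String(s).replace(/\s+/g, " ").trim();
  return Array.from(links)
    .filter((a) => a.href) // drop "name"-only anchors
    .map((a) => clean(a.text) + "\t" + a.href)
    .join("\n");
}

// Made-up sample data standing in for document.getElementsByTagName("a").
const sample = [
  { text: "Home\n page", href: "http://example.com/" },
  { text: "top", href: "" },
];
console.log(linksToTsv(sample));
```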

jscher2000 wrote: I wonder why the new document appears behind the original one?

I was wondering that too. I tried adding the line y.focus(); but to no avail, even though it doesn't stop the code from running.

jscher2000 · Post by **jscher2000** » April 6th, 2006, 6:08 pm

texmex wrote: OK so it's not as dirty, but since I got in first, it wasn't as quick either

The problem with this project is, it's a bottomless pit. I decided I wanted to get images when there's no text...

Code:

javascript:loc=location.href;x=document.getElementsByTagName(%22A%22);y=window.open();y.document.write(%22<html><head><title>Links!</title></head>\n<body><h3>Links from %22+loc+%22</h3>\n%22);for(n=0;n<x.length;n++){if(x[n].href!=""){if(x[n].text.replace(/\s+/,%22%22).length<1){for(j=0;j<x[n].childNodes.length;j++){if(x[n].childNodes[j].nodeName=="IMG"){y.document.write(%22<p>Image: <img src=\%22%22+x[n].childNodes[j].src+%22\%22 alt=\%22%22+x[n].childNodes[j].alt+%22\%22>%22); break;}}}else y.document.write(%22<p>Text: %22+x[n].text);y.document.write(%22<br>\nURL: %22+x[n].href+%22 <a href=\%22%22+x[n].href+%22\%22>=Go=&gt;</a></p>\n%22);}}y.document.write(%22</body></html>%22);y.document.close();void 0;

Thanks again for showing the way.
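
The image fallback is easier to follow when pulled out into a small helper. The sketch below is a simplification, not the script above: the name `labelFor` is hypothetical, and it returns a plain-text label (alt text, or the image URL if there is no alt) where the bookmarklet writes an actual img tag. Like the bookmarklet, it only looks at direct children of the link.

```javascript
// Pick a label for a link: its text if it has any, otherwise something
// describing the first image that is a direct child of the link.
function labelFor(link) {
  const text = (link.text || "").replace(/\s+/g, " ").trim();
  if (text) return "Text: " + text;

  const kids = Array.from(link.childNodes || []);
  const img = kids.find((node) => node.nodeName === "IMG");
  if (img) return "Image: " + (img.alt || img.src);

  return "(no text)";
}
```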

berwin · Post by **berwin** » April 6th, 2006, 8:18 pm

texmex, a thousand thanks to you, this works perfectly. Easy to paste into Excel and edit it there with two columns. Are you a genius? :-)

jscher2000, thanks for your effort and ideas.

I tried all three codes on a few web sites, and so far one website came back empty. Just a report, I am not complaining.

Is this forum great, or what?

dickvl · Post by **dickvl** » April 6th, 2006, 9:55 pm

berwin wrote: I tried all three codes on a few web sites, and so far one website came back empty. Just a report, I am not complaining.

Could be a frame issue?
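
If frames are the cause, the scripts above would come back empty because they only look at the top document, while the links live in the documents inside the frames. A rough sketch of walking into the frames as well is below. The function name `collectLinks` is made up, and it assumes the frames come from the same site as the page; the browser refuses access to documents from other sites, which is why the lookup is wrapped in try/catch.

```javascript
// Gather { text, href } pairs from a window and, recursively, from its frames.
function collectLinks(win, out = []) {
  try {
    const anchors = win.document.getElementsByTagName("a");
    for (const a of Array.from(anchors)) {
      if (a.href) out.push({ text: a.textContent, href: a.href });
    }
  } catch (e) {
    // Frames from another site deny access to their document; skip them.
  }
  const frames = win.frames || [];
  for (let i = 0; i < frames.length; i++) {
    collectLinks(frames[i], out);
  }
  return out;
}
```

In a browser this would be called as `collectLinks(window)`.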

the-edmeister · Post by **the-edmeister** » April 6th, 2006, 11:48 pm

jscher & texmex,

Didn't realize it could be done with a Bookmarklet.
Thanks for these Bookmarklets, I am adding them to my collection.

Ed

tester123 · Post by **tester123** » April 2nd, 2008, 6:41 am

I am trying this on Japanese HTML pages.
The page code has:

(a href="http://xyz.net/index.html")(STRONG)高級 (/STRONG)食通(/a)

Note: I have replaced the tag start and end <> signs with () since the forum was showing the effect of the above line instead of the line itself.

The JavaScript:

"javascript:x=document.getElementsByTagName(%22A%22);y=window.open();y.document.write(%22<HTML><HEAD></HEAD><BODY><table>%22);for(n=0;n<x.length;n++){y.document.write(%22<tr><td>%22+x[n].text+%22</td><td>%22+x[n].href+%22</td></tr>%22);}y.document.write(%22</table></BODY></HTML>%22);y.document.close();void 0;"

I get the list as

食通 http://xyz.net/index.html

instead of

高級食通 http://xyz.net/index.html

Please note: I have faked the URL and the words, as I am not supposed to disclose them, but you can try this on any Japanese page.

For a quick try, save the following as an HTML page and see:

<html lang="ja-JP">
<head>
<meta http-equiv="Content-type" content="text/html; charset=Shift_JIS" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<meta http-equiv="Content-Script-Type" content="text/javascript" />
<title>test</title>
</head>
<body>
<a href="http://xyz.net/index.html"><STRONG>高級 </STRONG>食通</a>
</body>
</html>

What makes this worse is that I don't know Japanese or JavaScript.
I need to get the list of all the URLs and the respective anchor texts from many Japanese HTML pages for verification, and doing this manually has already become a nightmare!

Please help!

thanks,

- tstr

jscher2000 · Post by **jscher2000** » April 2nd, 2008, 5:49 pm

Good point. The script looks for the direct .text child of the link, and if there are other tags in there, that text is missed.

To solve that, try this. Change x[n].text to x[n].textContent (which is the Firefox equivalent of IE's innerText). Does it work?

If you actually wanted the full HTML from inside the link, to preserve the exact appearance, you could in theory change it to x[n].innerHTML, but I can't recommend it. Unless you thoroughly cleanse the HTML, you might end up moving untrusted code into a trusted context and creating a security problem for yourself later.
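
To illustrate the difference, here is a small sketch that runs outside the browser on a hand-built stand-in for the link in question. It assumes, as described above, that the old behaviour only picked up text sitting directly inside the link; `directText` and `allText` are hypothetical names, and `allText` only approximates what textContent does on a real element.

```javascript
const TEXT_NODE = 3; // same value as Node.TEXT_NODE in the DOM

// Only the text nodes that are direct children of the node.
function directText(node) {
  return Array.from(node.childNodes || [])
    .filter((child) => child.nodeType === TEXT_NODE)
    .map((child) => child.nodeValue)
    .join("");
}

// Text from the node and everything nested inside it, in document order.
function allText(node) {
  if (node.nodeType === TEXT_NODE) return node.nodeValue;
  let out = "";
  for (const child of Array.from(node.childNodes || [])) {
    out += allText(child);
  }
  return out;
}

// Minimal stand-in for <a><strong>高級 </strong>食通</a>
const anchor = {
  nodeType: 1,
  childNodes: [
    { nodeType: 1, childNodes: [{ nodeType: 3, nodeValue: "高級 " }] },
    { nodeType: 3, nodeValue: "食通" },
  ],
};

console.log(directText(anchor)); // "食通"
console.log(allText(anchor));    // "高級 食通"
```

Note that the space inside the STRONG tag is kept, so the result is "高級 食通" rather than "高級食通".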

tester123 · Post by **tester123** » April 3rd, 2008, 7:00 pm

jscher,
Sorry for the late reply.
"textContent" has worked for me! It is fetching the correct text now.
Cool!
Thanks a ton!
-tstr
