ScrapyFSharp


Documentation

The ScrapyFSharp library can be installed from NuGet:
PM> Install-Package ScrapyFSharp

Example

This example demonstrates how to search Bitbucket for the old library repository and its forks.

#r "FSharp.Data.dll"
#r "ScrapyFSharp.dll"

open System
open System.Net
open FSharp.Data
open ScrapyFSharp.CssSelectorExtensions
open ScrapyFSharp.Network

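// Create a browser that identifies itself as Internet Explorer 8.
let b = browser (fun c -> { c with UserAgent=FakeUserAgent.InternetExplorer8 })

let links =
    async {
        // Request the repository search page, passing the search term as form data.
        let! state1 = b.NavigateTo(Uri "https://bitbucket.org/repo/all/1",
                        Get, HttpRequestData.FormData ["name", "scrapysharp"])
        let homePage = state1.WebPage()
        return
            match homePage.Html() with
            | Some html ->
                // Each matching anchor carries a relative href; prefix the host.
                [ for div in html.CssSelect "a.repo-link" do
                    yield "https://bitbucket.org" + (div.Attribute "href").Value() ]
            // No parsable HTML in the response: return an empty list.
            | None -> List.empty
    } |> Async.RunSynchronously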

The value of links is:

["https://bitbucket.org/rflechner/scrapysharp";
 "https://bitbucket.org/wei_zhou/scrapysharp";
 "https://bitbucket.org/NeilMeredith/scrapysharp";
 "https://bitbucket.org/huoxudong125/scrapysharp";
 "https://bitbucket.org/jonny_guapo/scrapysharp";
 "https://bitbucket.org/twudi/scrapysharp";
 "https://bitbucket.org/dpriest/scrapysharp";
 "https://bitbucket.org/greenoaktree/scrapysharp";
 "https://bitbucket.org/appppppa/scrapysharp";
 "https://bitbucket.org/yuqiang/scrapysharp"]
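
Once the links are collected, plain .NET is enough to post-process them. The following is a minimal sketch, not part of ScrapyFSharp, that extracts the owner of each fork from its URL. The names sampleLinks and repoOwner are hypothetical, and the sketch assumes every URL has the shape https://bitbucket.org/<owner>/<repo>, as in the result above.

```fsharp
open System

// A few of the URLs returned by the example, copied here so the
// snippet runs on its own without the ScrapyFSharp assemblies.
let sampleLinks =
    [ "https://bitbucket.org/rflechner/scrapysharp"
      "https://bitbucket.org/wei_zhou/scrapysharp"
      "https://bitbucket.org/NeilMeredith/scrapysharp" ]

// Hypothetical helper: the owner is assumed to be the first path segment.
// Uri.Segments yields [| "/"; "rflechner/"; "scrapysharp" |] for the first URL.
// Note that the Uri constructor throws on a malformed absolute URL.
let repoOwner (url: string) : string option =
    let segments = Uri(url).Segments
    if segments.Length > 1 then Some (segments.[1].TrimEnd('/')) else None

// Keep only the URLs for which an owner segment was found.
let owners = sampleLinks |> List.choose repoOwner
// owners = ["rflechner"; "wei_zhou"; "NeilMeredith"]
```

Returning an option keeps the helper total for URLs that have no path segment, so List.choose silently skips them instead of raising an index error.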

Samples & documentation

The library comes with comprehensive documentation. Tutorials are generated automatically from the *.fsx files in the content folder, and the API reference is generated automatically from the Markdown comments in the library implementation.

  • Tutorial contains a further explanation of this sample library.

  • API Reference contains automatically generated documentation for all types, modules and functions in the library. This includes additional brief samples on using most of the functions.

Contributing and copyright

The project is hosted on GitHub, where you can report issues, fork the project and submit pull requests. If you're adding a new public API, please also consider adding samples that can be turned into documentation. You might also want to read the library design notes to understand how it works.

The library is available under the Public Domain license, which allows modification and redistribution for both commercial and non-commercial purposes.
