This work should still be considered experimental and is subject to change without notice. These files are not yet updated on any kind of automated schedule. Once they are, a machine-readable "index" file will be provided.
This is a minimalist landing page documenting the vector embedding releases produced by SFO Museum. These files are provided "as-is" for testing and to help understand what the "shape" of shared vector embeddings within the cultural heritage community might look like. Embeddings have been exported as Apache Parquet files.
For background please consult:
Each row in a Parquet file is a single record containing one vector embedding. Records have the following definition:
| Field name | Type | Parquet type | Description |
|---|---|---|---|
| provider | string | dict,zstd | The name (or context) of the provider responsible for depiction_id. |
| depiction_id | string | dict,zstd | The unique identifier for the depiction for which embeddings have been generated. |
| subject_id | string | dict,zstd | The unique identifier associated with the record that depiction_id depicts. |
| model | string | dict,zstd | The label for the model used to generate embeddings for depiction_id. |
| embeddings | []float32 | plain,list | The vector embeddings generated for depiction_id using model. |
| created | int64 | | The Unix timestamp when the vector embeddings were generated. |
| attributes | map[string]string | | An arbitrary map of key-value properties associated with the embeddings. Attributes are encouraged, but not required, to include the required "OEmbeddings" properties described below. |
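The record definition above maps naturally onto a small typed structure. The sketch below is a hypothetical Python representation of one row, assuming rows have already been read from a Parquet file with whatever reader you prefer. The class name, its helper methods, and the example values are illustrative only and not part of any SFO Museum tooling; it also assumes `created` is expressed in seconds.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class EmbeddingRecord:
    # Hypothetical in-memory form of one Parquet row; field names follow the table above.
    provider: str
    depiction_id: str
    subject_id: str
    model: str
    embeddings: list[float]
    created: int  # assumed to be seconds since the Unix epoch
    attributes: dict[str, str] = field(default_factory=dict)

    def created_at(self) -> datetime:
        # Convert the int64 `created` value to a timezone-aware UTC datetime.
        return datetime.fromtimestamp(self.created, tz=timezone.utc)

    def check_dimensions(self, expected: int) -> None:
        # The file listings give 1152 dimensions for SigLIP 2 files and 512 for MobileCLIP files.
        if len(self.embeddings) != expected:
            raise ValueError(
                f"{self.depiction_id}: expected {expected} dimensions, got {len(self.embeddings)}"
            )


# Hypothetical values, for illustration only.
rec = EmbeddingRecord(
    provider="example-provider",
    depiction_id="example-depiction",
    subject_id="example-subject",
    model="apple/mobileclip_s0",
    embeddings=[0.0] * 512,
    created=1700000000,
)
rec.check_dimensions(512)
print(rec.created_at().isoformat())  # 2023-11-14T22:13:20+00:00
```

Checking the vector length against the dimensions listed for each file is a cheap way to catch rows that were paired with the wrong model.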
Record attributes are a free-form set of key-value pairs. They are encouraged, but not required, to include the minimum set of common "OEmbeddings" properties, which are meant to provide the least metadata necessary for suitable attribution and provenance of object records. These properties are defined in the OEmbeddings JSON schema. Here is that definition in table form:
| Name | Type | Required | Notes |
|---|---|---|---|
| type | string | yes | Either “image” or “text”. |
| preview | string | yes | The preview content for the vector embeddings. If type is “text” then this is expected to be a string. If type is “image” this is expected to be a string conforming to the JSON Schema “uri” type. |
| depiction_url | uri | no | A web page (or resource) for the depiction used to create the vector embeddings. |
| subject_url | uri | yes | A web page (or resource) for the subject of the depiction used to create the vector embeddings. |
| subject_title | string | yes | The title of the subject of the depiction. This may be an empty string. |
| subject_creditline | string | yes | The creditline or attribution for the subject of the depiction. This may be an empty string. |
| provider_name | string | yes | The name of the provider (holder) of the subject being depicted. |
| provider_url | uri | yes | The primary web page for the provider (holder) of the subject being depicted. |
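As a rough illustration of how the table above might be applied to a record's `attributes` map, here is a minimal checker sketch. It is not the OEmbeddings JSON schema itself, and the function names are hypothetical. Its URI test is a deliberate approximation that only accepts absolute URLs with a host (so it would reject, for example, `urn:` URIs that JSON Schema's "uri" format allows); a real deployment would validate against the published schema with a proper JSON Schema validator.

```python
from urllib.parse import urlparse

# Required OEmbeddings properties, taken from the table above.
REQUIRED = (
    "type",
    "preview",
    "subject_url",
    "subject_title",
    "subject_creditline",
    "provider_name",
    "provider_url",
)


def _looks_like_uri(value: str) -> bool:
    # Approximation of JSON Schema's "uri" format: require a scheme and a host.
    parsed = urlparse(value)
    return bool(parsed.scheme) and bool(parsed.netloc)


def oembeddings_problems(attrs: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    # Presence only: subject_title and subject_creditline may be empty strings.
    problems = [f"missing required property: {k}" for k in REQUIRED if k not in attrs]
    kind = attrs.get("type")
    if kind is not None and kind not in ("image", "text"):
        problems.append(f"type must be 'image' or 'text', got {kind!r}")
    if kind == "image" and "preview" in attrs and not _looks_like_uri(attrs["preview"]):
        problems.append("preview must be a URI when type is 'image'")
    for key in ("subject_url", "provider_url", "depiction_url"):
        if key in attrs and not _looks_like_uri(attrs[key]):
            problems.append(f"{key} is not a URI")
    return problems


# Hypothetical attribute values, for illustration only.
example = {
    "type": "image",
    "preview": "https://example.org/preview.jpg",
    "subject_url": "https://example.org/object/1",
    "subject_title": "",
    "subject_creditline": "",
    "provider_name": "Example Museum",
    "provider_url": "https://example.org/",
}
print(oembeddings_problems(example))  # []
```

Returning a list of problems, rather than raising on the first one, suits the "encouraged but not required" status of these properties: a consumer can report what is missing without rejecting the record.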
Most of these files were created using the tools provided by the sfomuseum/go-embeddings-harvest package.
| Name | Type | Description | Dimensions | Size | Signatures (x.509) |
|---|---|---|---|---|---|
| sfomuseum-collection-1152-siglip2-naflex-20260423.parquet | image | Object images from the SFO Museum Aviation Collection. Included models: google/siglip2-so400m-patch16-naflex. | 1152 | 374MB | sfomuseum-collection-1152-siglip2-naflex-20260423-signatures.parquet (public key) experimental |
| sfomuseum-collection-1152-siglip2-patch14-20260519.parquet | image | Object images from the SFO Museum Aviation Collection. Included models: google/siglip2-so400m-patch14-384. | 1152 | 374MB | sfomuseum-collection-1152-siglip2-patch14-20260519-signatures.parquet (public key) experimental |
| sfomuseum-collection-512-mobileclip-20260608.parquet | image | Object images from the SFO Museum Aviation Collection. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. | 512 | 526MB | sfomuseum-collection-512-mobileclip-20260608-signatures.parquet (public key) experimental |

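The MobileCLIP files bundle embeddings from three models (apple/mobileclip_s0, apple/mobileclip_s1 and apple/mobileclip_s2) in a single file, distinguished only by the `model` column. Vectors from different models are generally not comparable even when they share a dimension count, so a similarity search should filter on `model` first. The sketch below assumes rows have already been loaded as plain dicts keyed by the column names in the record definition; the `nearest` and `cosine` helpers and the toy two-dimensional vectors are hypothetical, and it uses a simple linear scan with cosine similarity rather than any particular vector index.

```python
import math
from typing import Iterable


def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot(a, b) / (|a| * |b|); returns 0.0 for zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(rows: Iterable[dict], query: list[float], model: str, k: int = 5) -> list[tuple[float, str]]:
    """Return up to k (score, depiction_id) pairs, best first, for rows produced by `model`."""
    scored = [
        (cosine(row["embeddings"], query), row["depiction_id"])
        for row in rows
        # Only compare vectors that came from the same model, with matching length.
        if row["model"] == model and len(row["embeddings"]) == len(query)
    ]
    scored.sort(reverse=True)
    return scored[:k]


# Toy rows with 2-dimensional vectors; real MobileCLIP vectors have 512 dimensions.
rows = [
    {"depiction_id": "a", "model": "apple/mobileclip_s0", "embeddings": [1.0, 0.0]},
    {"depiction_id": "b", "model": "apple/mobileclip_s0", "embeddings": [0.0, 1.0]},
    {"depiction_id": "c", "model": "apple/mobileclip_s1", "embeddings": [1.0, 0.0]},
]
print(nearest(rows, [1.0, 0.0], "apple/mobileclip_s0"))  # [(1.0, 'a'), (0.0, 'b')]
```

Row "c" has the same vector as the query but is excluded because it was produced by a different model.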
| Name | Type | Description | Dimensions | Size | Signatures (x.509) |
|---|---|---|---|---|---|
| sfomuseum-exhibitions-1152-siglip2-naflex-20260424.parquet | image | Object images from SFO Museum exhibitions. Included models: google/siglip2-so400m-patch16-naflex. | 1152 | 17MB | sfomuseum-exhibitions-1152-siglip2-naflex-20260424-signatures.parquet (public key) experimental |
| sfomuseum-exhibitions-1152-siglip2-patch14-20260518.parquet | image | Object images from SFO Museum exhibitions. Included models: google/siglip2-so400m-patch14-384. | 1152 | 17MB | sfomuseum-exhibitions-1152-siglip2-patch14-20260518-signatures.parquet (public key) experimental |
| sfomuseum-exhibitions-512-mobileclip-20260414.parquet | image | Object images from SFO Museum exhibitions. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. | 512 | 25MB | sfomuseum-exhibitions-512-mobileclip-20260414-signatures.parquet (public key) experimental |

| Name | Type | Description | Dimensions | Size | Signatures (x.509) |
|---|---|---|---|---|---|
| sfomuseum-instagram-1152-siglip2-naflex-20260424.parquet | image | Images and photographs from the SFO Museum Instagram account. Included models: google/siglip2-so400m-patch16-naflex. | 1152 | 24MB | sfomuseum-instagram-1152-siglip2-naflex-20260424-signatures.parquet (public key) experimental |
| sfomuseum-instagram-1152-siglip2-patch14-20260511.parquet | image | Images and photographs from the SFO Museum Instagram account. Included models: google/siglip2-so400m-patch14-384. | 1152 | 24MB | sfomuseum-instagram-1152-siglip2-patch14-20260511-signatures.parquet (public key) experimental |
| sfomuseum-instagram-512-mobileclip-20260414.parquet | image | Images and photographs from the SFO Museum Instagram account. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. | 512 | 47MB | sfomuseum-instagram-512-mobileclip-20260414-signatures.parquet (public key) experimental |

| Name | Type | Description | Dimensions | Size | Signatures (x.509) |
|---|---|---|---|---|---|
| sfomuseum-media-1152-siglip2-naflex-20260518.parquet | image | Installation photos from SFO Museum exhibitions. Included models: google/siglip2-so400m-patch16-naflex. | 1152 | 25MB | sfomuseum-media-1152-siglip2-naflex-20260518-signatures.parquet (public key) experimental |
| sfomuseum-media-1152-siglip2-patch14-20260519.parquet | image | Installation photos from SFO Museum exhibitions. Included models: google/siglip2-so400m-patch14-384. | 1152 | 25MB | sfomuseum-media-1152-siglip2-patch14-20260519-signatures.parquet (public key) experimental |
| sfomuseum-media-512-mobileclip-20260414.parquet | image | Installation photos from SFO Museum exhibitions. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. | 512 | 35MB | sfomuseum-media-512-mobileclip-20260414-signatures.parquet (public key) experimental |

| Name | Type | Description | Dimensions | Size |
|---|---|---|---|---|
| nga-opendata-1152-siglip2-naflex-20260502.parquet | image | Object images from the National Gallery of Art collection. Included models: google/siglip2-so400m-patch16-naflex. | 1152 | 652MB |
| nga-opendata-1152-siglip2-patch14-naflex-20260522.parquet | image | Object images from the National Gallery of Art collection. Included models: google/siglip2-so400m-patch14-384. | 1152 | 659MB |
| nga-opendata-512-mobileclip-20260413.parquet | image | Object images from the National Gallery of Art collection. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. | 512 | 903MB |

| Name | Type | Description | Dimensions | Size |
|---|---|---|---|---|
| moma-collection-1152-siglip2-naflex-20260606.parquet | image | Object images from the Museum of Modern Art collection. Included models: google/siglip2-so400m-patch16-naflex. | 1152 | 450MB |
| moma-collection-1152-siglip2-patch14-20260523.parquet | image | Object images from the Museum of Modern Art collection. Included models: google/siglip2-so400m-patch14-384. | 1152 | 444MB |
| moma-collection-512-mobileclip-20260423.parquet | image | Object images from the Museum of Modern Art collection. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. | 512 | 656MB |

| Name | Type | Description | Dimensions | Size |
|---|---|---|---|---|
| si-fsg-openaccess-1152-siglip2-naflex-20260607.parquet | image | Object images from the Smithsonian's National Museum of Asian Art. Included models: google/siglip2-so400m-patch16-naflex. | 1152 | 38MB |
| si-fsg-openaccess-1152-siglip2-patch14-20260608.parquet | image | Object images from the Smithsonian's National Museum of Asian Art. Included models: google/siglip2-so400m-patch14-384. | 1152 | 38MB |
| si-fsg-openaccess-512-mobileclip-20260608.parquet | image | Object images from the Smithsonian's National Museum of Asian Art. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. | 512 | 55MB |

| Name | Type | Description | Dimensions | Size |
|---|---|---|---|---|
| si-nasm-openaccess-1152-siglip2-naflex-20260608.parquet | image | Object images from the National Air and Space Museum. Included models: google/siglip2-so400m-patch16-naflex. | 1152 | 32MB |
| si-nasm-openaccess-1152-siglip2-patch14-20260607.parquet | image | Object images from the National Air and Space Museum. Included models: google/siglip2-so400m-patch14-384. | 1152 | 32MB |
| si-nasm-openaccess-512-mobileclip-20260608.parquet | image | Object images from the National Air and Space Museum. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. | 512 | 47MB |

| Name | Type | Description | Dimensions | Size |
|---|---|---|---|---|
| si-nmah-openaccess-1152-siglip2-naflex-20260606.parquet | image | Object images from the National Museum of American History. Included models: google/siglip2-so400m-patch16-naflex. | 1152 | 95MB |
| si-nmah-openaccess-1152-siglip2-patch14-20260607.parquet | image | Object images from the National Museum of American History. Included models: google/siglip2-so400m-patch14-384. | 1152 | 95MB |
| si-nmah-openaccess-512-mobileclip-20260608.parquet | image | Object images from the National Museum of American History. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. | 512 | 160MB |

| Name | Type | Description | Dimensions | Size |
|---|---|---|---|---|
| si-saam-openaccess-1152-siglip2-naflex-20260607.parquet | image | Object images from the Smithsonian American Art Museum. Included models: google/siglip2-so400m-patch16-naflex. | 1152 | 60MB |
| si-saam-openaccess-1152-siglip2-patch14-20260807.parquet | image | Object images from the Smithsonian American Art Museum. Included models: google/siglip2-so400m-patch14-384. | 1152 | 60MB |
| si-saam-openaccess-512-mobileclip-20260608.parquet | image | Object images from the Smithsonian American Art Museum. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. | 512 | 88MB |

| Name | Type | Description | Dimensions | Size |
|---|---|---|---|---|
| 20260410-flickr-49487266@N07-72157710813888403.parquet | image | Photos from the "California's Aviation History" photoset published by the San Diego Air and Space Museum on Flickr Commons. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. | 512 | 4MB |
| 20260410-flickr-group-95693046@N00.parquet | image | Openly licensed or Creative Commons licensed photos from the Airports SFO Flickr group pool. Included models: apple/mobileclip_s0, apple/mobileclip_s1, apple/mobileclip_s2. | 512 | 29MB |